Title: | Read-Write API Client Library for Wikidata |
---|---|
Description: | Read from, interrogate, and write to Wikidata <https://www.wikidata.org> - the multilingual, interdisciplinary, semantic knowledgebase. Includes functions to: read from Wikidata (single items or properties); query Wikidata (retrieving all items that match a set of criteria via the Wikidata SPARQL query service); write to Wikidata (adding new items or statements via QuickStatements); and handle and manipulate Wikidata objects (as lists and tibbles). Uses the Wikidata and QuickStatements APIs. |
Authors: | Thomas Shafee [aut, cre], Os Keyes [aut], Serena Signorelli [aut], Alex Lum [ctb], Christian Graul [ctb], Mikhail Popov [ctb] |
Maintainer: | Thomas Shafee <[email protected]> |
License: | MIT + file LICENSE |
Version: | 2.3.3 |
Built: | 2024-11-09 04:08:55 UTC |
Source: | https://github.com/ts404/wikidatar |
Convert an input string to the most likely property PID
as_pid(x)
x |
a vector, data frame, or tibble of strings representing Wikidata properties |
If the input string is a valid PID, return the string. If the input string matches a property label, return its PID. If the input string matches the labels of multiple properties, return the PID of the first hit.
# if input string is a valid PID
as_pid("P50")
# if input string matches multiple property labels
as_pid("author")
# if input string matches a single unique label
as_pid("Scopus author ID")
Convert an input string to the most likely item QID
as_qid(x)
x |
a vector, data frame, or tibble of strings representing Wikidata items |
If the input string is a valid QID, return the string. If the input string matches an item label, return its QID. If the input string matches the labels of multiple items, return the QID of the first hit.
# if input string is a valid QID
as_qid("Q42")
# if input string matches multiple item labels
as_qid("Douglas Adams")
# if input string matches a single unique label
as_qid("Douglas Adams and the question of arterial blood pressure in mammals")
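The "already a valid QID" branch described above can be illustrated offline. This is only a minimal sketch: `looks_like_qid` is a hypothetical helper, not a WikidataR function, and it assumes that a QID is the letter "Q" followed by digits only. The label lookup that as_qid performs for other strings needs network access and is not shown.

```r
# Hypothetical helper (not part of WikidataR): TRUE where a string has the
# shape of a QID, i.e. "Q" followed by one or more digits and nothing else.
looks_like_qid <- function(x) {
  grepl("^Q[0-9]+$", x)
}

# Only the first string is shaped like a QID; the second is a label and
# the third is shaped like a PID.
looks_like_qid(c("Q42", "Douglas Adams", "P50"))
```

Strings that fail this kind of check are the ones that would need to be resolved by a label search.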
Add escaped quotation marks around strings that need them, ready for submission to an API
as_quot(x, format = "tibble")
x |
a vector, data frame, or tibble of strings |
format |
either "tibble" / "csv" to use plain quotation marks (default), or "api" / "website" to use '%22' |
tibble of items wrapped in escaped quotation marks, unless an item is already in escaped quotation marks or is a QID (in which case it is returned unchanged)
as_quot("text")
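The quoting rule can be approximated in base R. The following is a rough sketch under stated assumptions, not the package's implementation: `quote_for_api` is a hypothetical name, it works on a plain character vector rather than a tibble, and it only treats QIDs and already-quoted strings as exempt.

```r
# Hypothetical sketch of the quoting rule (not the WikidataR source).
quote_for_api <- function(x, format = "tibble") {
  # "api" / "website" use the URL-encoded quotation mark, otherwise a plain one
  q <- if (format %in% c("api", "website")) "%22" else "\""
  already_quoted <- startsWith(x, q) & endsWith(x, q)
  is_qid <- grepl("^Q[0-9]+$", x)
  # Leave QIDs and already-quoted strings alone; wrap everything else
  ifelse(already_quoted | is_qid, x, paste0(q, x, q))
}

quote_for_api(c("text", "Q42"))
quote_for_api("text", format = "api")
```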
Convert an input string to the most likely source SID (equivalent to PID)
as_sid(x)
x |
a vector, data frame, or tibble of strings representing Wikidata source properties |
If the input string is a valid SID, return the string. If the input string matches a property label, return its SID. If the input string matches the labels of multiple properties, return the SID of the first hit.
# if input string is a valid SID
as_sid("S854")
# if input string matches multiple property labels
as_sid("URL")
# if input string matches a single unique label
as_sid("Reference URL")
Utility function to handle namespaces. Used by get_item and get_property
check_input(input, substitution)
input |
string to check |
substitution |
string for what's been looked for |
boolean indicating whether the checked string contains a match for the substitution string
Add in empty lines for QuickStatements CREATE rows that mint new QIDs. This is a slightly messy quirk of the QuickStatements format, which mints new QIDs via a line containing only "CREATE", so this function is a way to approximate that behaviour in a tibble
createrows(items, vector)
items |
a vector, data frame or tibble of items (which may or may not contain the keyword "CREATE") |
vector |
a vector of properties or values which may be expanded based on the items vector |
if the vector is NULL, return NULL. Otherwise, if the "CREATE" keyword appears in the items vector, insert blank strings at those positions in the vector.
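The padding behaviour can be sketched in a few lines of base R. This is an illustration only: `pad_for_create` is a hypothetical name, and it assumes that the vector holds exactly one entry for every item that is not the "CREATE" keyword.

```r
# Hypothetical sketch (not the WikidataR source).
# Assumption: `vector` has one entry per non-"CREATE" element of `items`.
pad_for_create <- function(items, vector) {
  if (is.null(vector)) {
    return(NULL)
  }
  out <- character(length(items))          # starts as all blank strings
  out[items != "CREATE"] <- as.character(vector)
  out
}

# The "CREATE" row receives a blank property; the other rows keep theirs
pad_for_create(c("CREATE", "LAST", "LAST", "Q42"), c("P31", "P569", "P21"))
```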
Add in QuickStatements CREATE rows that mint new QIDs from tidy input data. New items are created by any item that starts with the text "CREATE" followed by any unique ID.
createrows.tidy(QS.tib)
QS.tib |
a tibble of items, values and properties (optionally qualifiers and sources). |
a tibble in which each item that starts with "CREATE" followed by any unique text causes a "CREATE" line to be added above it and is itself replaced with "LAST", as the QuickStatements format requires to create new QIDs.
Interactive function that presents alternative possible QID matches for a list of text strings and provides options for choosing between alternatives, rejecting all presented alternatives, or creating new items. Useful in cases where a list of text strings may have either missing wikidata items or multiple alternative potential matches that need to be manually disambiguated. Can also be used on lists of lists (see examples). For long lists of items, the process can be stopped partway through and the returned vector will indicate where the process was stopped.
disambiguate_QIDs(
  list,
  variablename = "variables",
  variableinfo = NULL,
  filter_property = NULL,
  filter_variable = NULL,
  filter_firsthit = FALSE,
  Q_min = NULL,
  auto_create = FALSE,
  limit = 10
)
list |
a list or vector of text strings to find potential QID matches to. Can also be a list of lists (see examples) |
variablename |
type of items in the list that are being disambiguated (used in messages) |
variableinfo |
additional information about items that are being disambiguated (used in messages) |
filter_property |
property to filter on (e.g. "P31" to filter on "instance of") |
filter_variable |
values of that property to use to filter out (e.g. "Q571" to filter out books) |
filter_firsthit |
apply filter to the first match presented or only if alternatives requested? (default = FALSE, note: true is slower if filter not needed on most matches) |
Q_min |
return only possible hits with QIDs above the provided value |
auto_create |
if no match found, automatically assign "CREATE" |
limit |
number of alternative possible wikidata items to present if multiple potential matches |
a vector of:
Selected QID (for when an appropriate Wikidata match exists)
Mark that a new Wikidata item should be created (for when no appropriate Wikidata match yet exists)
Mark that no Wikidata item is needed
Mark that the process was halted at this point (so that output can be used as input to the function later)
## Not run:
# Disambiguating possible QID matches for these music genres
# Results should be:
# "Q22731" as the first match
# "Q147538" as the first match
# "Q3947" as the second alternative match
disambiguate_QIDs(list=c("Rock","Pop","House"),
                  variablename="music genre")

# Disambiguating possible QID matches for these three words, but not the music genres
# This will take longer as the filtering step is slower
# Results should be:
# "Q22731" (the material) as the first match
# "Q147538" (the soft drink) as the second alternative match
# "Q3947" (the building) as the first match
disambiguate_QIDs(list=c("Rock","Pop","House"),
                  filter_property="instance of",
                  filter_variable="music genre",
                  filter_firsthit=TRUE,
                  variablename="concept, not the music genre")

# Disambiguating possible QID matches for the multiple expertise of
# these three people as list of lists
disambiguate_QIDs(list=list(alice=list("physics","chemistry","maths"),
                            barry=list("history"),
                            clair=list("law","genetics","ethics")),
                  variablename="expertise")
## End(Not run)
Extract claim information from data returned using get_item.
extract_claims(items, claims)
items |
a list of one or more Wikidata items returned with get_item |
claims |
a vector of claims (in the form "P321", "P12") to look for and extract. |
a list containing one sub-list for each entry in items, and (below that) the found data for each claim. In the event a claim cannot be found for an item, an NA will be returned instead.
# Get item data
adams_data <- get_item("42")
# Get claim data
claims <- extract_claims(adams_data, "P31")
Return the nth paragraph of a section of text. Useful for extracting information from Wikipedia or other wikimarkup text
extract_para(text, para = 1, templ = NULL)
text |
the input text as a string |
para |
number indicating which paragraph(s) to return (default=1) |
templ |
an optional string specifying a MediaWiki template within which to restrict the search |
the nth paragraph of the input text.
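As an illustration of paragraph selection, here is a small base-R sketch. It is not the package's implementation: `nth_paragraph` is a hypothetical name, it ignores the templ argument, and it assumes that paragraphs are separated by at least one blank line.

```r
# Hypothetical sketch (not the WikidataR source).
# Assumption: paragraphs are separated by one or more blank lines.
nth_paragraph <- function(text, para = 1) {
  parts <- strsplit(text, "\n[[:space:]]*\n")[[1]]
  parts <- trimws(parts)            # drop stray leading/trailing whitespace
  parts <- parts[nzchar(parts)]     # drop empty fragments
  parts[para]
}

txt <- "First paragraph.\n\nSecond paragraph,\nstill second.\n\nThird."
nth_paragraph(txt, 2)
nth_paragraph(txt, c(1, 3))
```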
For a QID or vector of QIDs, remove ones that match a particular statement (e.g. remove all that are instances of academic publications or books).
filter_qids(
  ids,
  property = "P31",
  filter = c("Q737498", "Q5633421", "Q7725634", "Q13442814", "Q18918145"),
  message = NULL
)
ids |
QIDs to check |
property |
property to check (default = P31 to filter on "instance of") |
filter |
values of that property to use to filter out (default = Q737498, Q5633421, Q7725634, Q13442814, and Q18918145 to remove academic publications or books) |
message |
message to return (useful for disambiguate_QIDs function) |
a vector of QIDs that do not match the property filter
## Not run:
# Filter three items called "Earth Science" to show only those that aren't
# books, journals or journal articles
filter_qids(ids = c("Q96695546","Q8008","Q58966429"),
            property = "P31",
            filter = c("Q737498","Q5633421","Q7725634","Q13442814","Q18918145"))
## End(Not run)
find_item and find_property allow you to retrieve a set of Wikidata items or properties where the aliases or descriptions match a particular search term. As with other WikidataR code, custom print methods are available; use str to manipulate and see the underlying structure of the data.
find_item(search_term, language = "en", limit = 10, response_language = "en", ...)
find_property(search_term, language = "en", response_language = "en", limit = 10)
search_term |
a term to search for. |
language |
the language to return the labels and descriptions in; this should consist of an ISO language code. Set to "en" by default. |
limit |
the number of results to return; set to 10 by default. |
... |
further arguments to pass to httr's GET. |
get_random for selecting a random item or property, or get_item for selecting a specific item or property.
# Check for entries relating to Douglas Adams in some way
adams_items <- find_item("Douglas Adams")
# Check for properties involving the peerage
peerage_props <- find_property("peerage")
Gets the specified example(s) from the [SPARQL query service examples page](https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/queries/examples) using [Wikidata's MediaWiki API](https://www.wikidata.org/w/api.php).
get_example(example_name)
example_name |
the names of the examples as they appear on [this page](https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/queries/examples) |
If you are planning on extracting multiple examples, please provide all the names as a single vector for efficiency.
The SPARQL query as a character vector.
[query_wikidata]
## Not run:
sparql_query <- get_example(c("Cats", "Horses"))
query_wikidata(sparql_query)
# returns a named list with two data frames
# one called "Cats" and one called "Horses"
sparql_query <- get_example("Largest cities with female mayor")
cat(sparql_query)
query_wikidata(sparql_query)
## End(Not run)
get_geo_box retrieves all geographic entities in Wikidata that fall within a bounding box defined by two existing items with geographic attributes (usually cities).
get_geo_box(
  first_city_code,
  first_corner,
  second_city_code,
  second_corner,
  language = "en",
  ...
)
first_city_code |
a Wikidata item, or series of items, to use for one corner of the bounding box. |
first_corner |
the direction of |
second_city_code |
a Wikidata item, or series of items, to use for one corner of the bounding box. |
second_corner |
the direction of |
language |
the two-letter language code to use for the name of the item. "en" by default. |
... |
further arguments to pass to httr's GET. |
a data.frame of 5 columns:
item: the Wikidata identifier of each object associated with entity.
name: the name of the item, if available, in the requested language. If it is not available, NA will be returned instead.
latitude: the latitude of item
longitude: the longitude of item
entity: the entity the item is associated with (necessary for multi-entity queries).
get_geo_entity for using an unrestricted search or simple radius, rather than a bounding box.
# Simple bounding box
bruges_box <- get_geo_box("Q12988", "NorthEast", "Q184287", "SouthWest")
# Custom language
bruges_box_fr <- get_geo_box("Q12988", "NorthEast", "Q184287", "SouthWest",
                             language = "fr")
get_geo_entity retrieves the item ID, latitude and longitude of any object with geographic data associated with another object with geographic data (example: all the locations around/near/associated with a city).
get_geo_entity(entity, language = "en", radius = NULL, limit = 100, ...)
entity |
a Wikidata item ( |
language |
the two-letter language code to use for the name of the item. "en" by default, because we're imperialist anglocentric westerners. |
radius |
optionally, a radius (in kilometers) around |
limit |
the maximum number of results to return. |
... |
further arguments to pass to httr's GET. |
a data.frame of 5 columns:
item: the Wikidata identifier of each object associated with entity.
name: the name of the item, if available, in the requested language. If it is not available, NA will be returned instead.
latitude: the latitude of item
longitude: the longitude of item
entity: the entity the item is associated with (necessary for multi-entity queries).
get_geo_box for using a bounding box rather than an unrestricted search or simple radius.
# All entities
sf_locations <- get_geo_entity("Q62")
# Entities with French, rather than English, names
sf_locations <- get_geo_entity("Q62", language = "fr")
# Entities within 1km
sf_close_locations <- get_geo_entity("Q62", radius = 1)
# Multiple entities
multi_entity <- get_geo_entity(entity = c("Q62", "Q64"))
get_item and get_property allow you to retrieve the data associated with individual Wikidata items and properties, respectively. As with other WikidataR code, custom print methods are available; use str to manipulate and see the underlying structure of the data.
get_item(id, ...)
get_property(id, ...)
id |
the ID number(s) of the item or property you're looking for. This can be in various formats; either a numeric value ("200"), the full name ("Q200") or even with an included namespace ("Property:P10") - the function will format it appropriately. This function is vectorised and will happily accept multiple IDs. |
... |
further arguments to pass to httr's GET. |
get_random for selecting a random item or property, or find_item for using search functionality to pull out item or property IDs where the descriptions or aliases match a particular search term.
# Retrieve a specific item
adams_metadata <- get_item("42")
# Retrieve a specific property
object_is_child <- get_property("P40")
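The id argument accepts several spellings of the same identifier (a bare number, a prefixed ID, or an ID with a namespace). The sketch below shows one way such inputs could be normalised in base R. `normalise_id` is a hypothetical helper, not the package's internal code, and the way get_item and get_property actually format IDs may differ.

```r
# Hypothetical helper (not the WikidataR source): normalise the ID spellings
# listed for the `id` argument to a prefixed identifier.
normalise_id <- function(id, prefix = "Q") {
  id <- sub("^[A-Za-z]+:", "", id)      # drop a namespace such as "Property:"
  bare <- grepl("^[0-9]+$", id)         # purely numeric IDs lack a prefix
  toupper(ifelse(bare, paste0(prefix, id), id))
}

normalise_id(c("200", "Q200"))
normalise_id("Property:P10", prefix = "P")
```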
For a claim or set of claims, return the names of the properties
get_names_from_properties(properties)
properties |
a claims list from |
tibble of labels for each property for a set of claims
get_random_item and get_random_property allow you to retrieve the data associated with randomly-selected Wikidata items and properties, respectively. As with other WikidataR code, custom print methods are available; use str to manipulate and see the underlying structure of the data.
get_random_item(limit = 1, ...)
get_random_property(limit = 1, ...)
limit |
how many random items to return. 1 by default, but can be higher. |
... |
arguments to pass to httr's GET. |
get_item for selecting a specific item or property, or find_item for using search functionality to pull out item or property IDs where the descriptions or aliases match a particular search term.
## Not run:
# Random item
random_item <- get_random_item()
# Random property
random_property <- get_random_property()
## End(Not run)
Convert unique identifiers to other unique identifiers
identifier_from_identifier(
  property = "ORCID iD",
  return = "IMDb ID",
  value = "0000-0002-7865-7235"
)
property |
the identifier property to search (for caveats, see |
return |
the identifier property to convert to |
value |
the identifier value to match |
vector of identifiers corresponding to identifiers submitted
identifier_from_identifier('ORCID iD','IMDb ID',c('0000-0002-7865-7235','0000-0003-1079-5604'))
Converting names into first initial and surname, or just initials
initials(x, format = "FLast")
x |
a vector of people's names as strings |
format |
a vector of strings of either "FLast" or "FL" to indicate the output format |
the inputted name strings with first names shortened based on the selected format.
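The two output formats can be illustrated with a short base-R sketch. This is not the package's implementation: `to_initials` is a hypothetical name, it takes a single format string, and it assumes that "FLast" means the first initial, a space, and the last word of the name, while "FL" means the initials of every word run together.

```r
# Hypothetical sketch (not the WikidataR source).
# Assumptions: "FLast" -> first initial + space + last word; "FL" -> all initials.
to_initials <- function(x, format = "FLast") {
  vapply(strsplit(trimws(x), "[[:space:]]+"), function(parts) {
    if (format == "FL") {
      paste(substr(parts, 1, 1), collapse = "")
    } else {
      paste(substr(parts[1], 1, 1), parts[length(parts)])
    }
  }, character(1))
}

to_initials("Thomas Shafee")
to_initials(c("Ada Lovelace", "Douglas Noel Adams"), format = "FL")
```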
For a downloaded Wikidata item, list the properties of all statements
list_properties(item, names = FALSE)
item |
a list of one or more Wikidata items returned with
|
names |
a boolean for whether to return property names, or just P numbers. |
a list containing one sub-list for each entry in items, and (below that) the found data for each claim. In the event a claim cannot be found for an item, an NA will be returned instead.
# Get item data
adams_data <- get_item("42")
# Get claim data
claims <- extract_claims(adams_data, "P31")
Print found items.
## S3 method for class 'find_item'
print(x, ...)
x |
find_item object with search results |
... |
Arguments to be passed to methods |
Print found properties.
## S3 method for class 'find_property'
print(x, ...)
x |
find_property object with search results |
... |
Arguments to be passed to methods |
Print found objects generally.
## S3 method for class 'wikidata'
print(x, ...)
x |
wikidata object from get_item, get_random_item, get_property or get_random_property |
... |
Arguments to be passed to methods |
get_item, get_random_item, get_property or get_random_property
Simple converter from DOIs to QIDs (for items in Wikidata)
qid_from_DOI(DOI = "10.15347/WJM/2019.001")
DOI |
digital object identifiers submitted as strings |
vector of QIDs corresponding to DOIs submitted
Convert unique identifiers to QIDs (for items in Wikidata).
qid_from_identifier(
  property = "DOI",
  value = c("10.15347/WJM/2019.001", "10.15347/WJM/2020.002")
)
property |
the identifier property to search (for caveats, see |
value |
the identifier value to match |
vector of QIDs corresponding to identifiers submitted
qid_from_identifier('ISBN-13','978-0-262-53817-6')
Simple converter from label names to QIDs (for items in Wikidata). Essentially a simplification of find_item
qid_from_name(name = "Thomas Shafee", limit = 100, format = "vector")
name |
name labels submitted as strings |
limit |
if multiple QIDs match each submitted name, how many to return |
format |
output format ('vector' to return a simple vector, or 'list' to return a nested list) |
vector of QIDs corresponding to names submitted. Note: some names may return multiple QIDs.
Simple converter from ORCIDs to QIDs (for items in Wikidata)
qid_from_ORCID(ORCID = "0000-0002-2298-7593")
ORCID |
ORCID identifiers submitted as strings |
vector of QIDs corresponding to ORCIDs submitted
Makes a POST request to the Wikidata Query Service SPARQL endpoint.
query_wikidata(sparql_query, format = "tibble", ...)
sparql_query |
SPARQL query (can be a vector of queries) |
format |
'tibble' (default) returns a pure character data frame, 'simple' returns a pure character vector, while 'smart' fetches JSON-formatted data and returns a tibble with datetime columns converted to 'POSIXct' |
... |
Additional parameters to supply to [httr::POST] |
A 'tibble' or 'vector'. Note: QID values will be returned as QIDs, rather than URLs.
There is a hard query deadline configured, which is set to 60 seconds. There are also the following limits:
- One client (user agent + IP) is allowed 60 seconds of processing time each 60 seconds
- One client is allowed 30 error queries per minute
See the [query limits section](https://www.mediawiki.org/wiki/Wikidata_Query_Service/User_Manual#Query_limits) in the WDQS user manual for more information.
# R's versions and release dates:
sparql_query <- 'SELECT DISTINCT
  ?softwareVersion ?publicationDate
  WHERE {
    BIND(wd:Q206904 AS ?R)
    ?R p:P348 [
      ps:P348 ?softwareVersion;
      pq:P577 ?publicationDate
    ] .
}'
query_wikidata(sparql_query)

## Not run:
# "smart" format converts all datetime columns to POSIXct
query_wikidata(sparql_query, format = "smart")
## End(Not run)
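Because sparql_query is an ordinary character string, a query can be assembled programmatically before it is passed to query_wikidata. The sketch below builds the same kind of version query for any item. `build_version_query` is a hypothetical helper introduced only for illustration, and it runs offline since it only constructs the string.

```r
# Hypothetical helper (not part of WikidataR): build a "software versions and
# release dates" query for a given QID.
build_version_query <- function(qid) {
  # Guard against anything that is not shaped like a QID
  stopifnot(grepl("^Q[0-9]+$", qid))
  sprintf(
    "SELECT DISTINCT ?softwareVersion ?publicationDate WHERE { BIND(wd:%s AS ?item) ?item p:P348 [ ps:P348 ?softwareVersion; pq:P577 ?publicationDate ] . }",
    qid
  )
}

# Q206904 is the item used for R in the example above
cat(build_version_query("Q206904"))
```

The resulting string could then be supplied as the sparql_query argument.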
Convert an input string to the most likely item QID
searcher(search_term, language, limit, response_language, type, ...)
search_term |
a term to search for. |
language |
the language to conduct the search in; this should consist of an ISO language code. Set to "en" by default. |
limit |
the number of results to return; set to 10 by default. |
response_language |
the language to return the labels and descriptions in; this should consist of an ISO language code. Set to "en" by default. |
type |
type of wikidata object to return (default = "item") |
... |
Additional parameters to supply to [httr::POST] |
If the inputted string matches an item label, return its QID. If the inputted string matches multiple labels of multiple items, return the QID of the first hit. If the inputted string is already a QID, return the string.
# if input string is a valid QID
as_qid("Q42")
# if input string matches multiple item labels
as_qid("Douglas Adams")
# if input string matches a single unique label
as_qid("Douglas Adams and the question of arterial blood pressure in mammals")
Utility wrapper for the Wikidata SPARQL endpoint to download items. Used by get_geo_entity and get_geo_box
sparql_query(query, ...)
query |
the sparql query as a string |
... |
Additional parameters to supply to [httr::POST] |
a download of the full wikidata objects formatted as a nested json list
Special characters can otherwise mess up wikidata read-writes
unspecial(x)
x |
a vector of strings to check for special characters |
the inputted strings with special characters replaced with closest match plain characters.
Convert a URL ending in an identifier (returned by SPARQL queries) to just the plain identifier (QID or PID).
url_to_id(x)
x |
a vector of strings representing wikidata URLs |
if the URL ends in a QID or PID, return that PID or QID, else return the original string
url_to_id("http://www.wikidata.org/entity/42")
url_to_id("http://www.wikidata.org/Q42")
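The documented behaviour (keep a trailing QID or PID, otherwise return the input unchanged) can be approximated with a single regular expression. This is a minimal sketch, not the package's implementation, and `strip_to_id` is a hypothetical name.

```r
# Hypothetical sketch (not the WikidataR source): keep the trailing QID or PID
# of a URL, or return the string unchanged if it does not end in one.
strip_to_id <- function(x) {
  sub("^.*/([QP][0-9]+)$", "\\1", x)
}

strip_to_id(c(
  "http://www.wikidata.org/entity/Q42",        # ends in a QID
  "http://www.wikidata.org/prop/direct/P31",   # ends in a PID
  "http://www.wikidata.org/entity/42"          # no Q or P prefix: unchanged
))
```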
Utility wrapper for the Wikidata API to download an item. Used by get_item and get_property
wd_query(title, ...)
title |
the wikidata item or property as a string |
... |
Additional parameters to supply to [httr::POST] |
a download of the full wikidata object (item or property) formatted as a nested json list
Utility wrapper for wikidata API to download random items. Used by random_item
wd_rand_query(ns, limit, ...)
ns |
string indicating namespace, most commonly "Main" for QID items, "Property" for PID properties |
limit |
how many random objects to return |
... |
Additional parameters to supply to [httr::POST] |
a download of the full wikidata objects (items or properties) formatted as nested json lists
A dataset of Wikidata global variables.
A list of tibbles documenting key property constraints from wikidata
valid reference source properties
required data type for each property
expected regex match for each property
language abbreviations
language abbreviations for current wikis
Wikimedia abbreviations for current wikis
...
This package serves as an API client for reading from and writing to Wikidata (including via the QuickStatements format), as well as for reading from Wikipedia.
get_random for selecting a random item or property, get_item for a specific item or property, or find_item for using search functionality to pull out item or property IDs where the descriptions or aliases match a particular search term.
Upload data to a Wikibase instance, including creating items and adding statements to existing items (via the QuickStatements format and API).
write_wikibase(
  items,
  properties = NULL,
  values = NULL,
  qual.properties = NULL,
  qual.values = NULL,
  src.properties = NULL,
  src.values = NULL,
  remove = FALSE,
  format = "tibble",
  format.csv.file = NULL,
  api.username = NULL,
  api.token = NULL,
  api.format = "v1",
  api.batchname = NULL,
  api.submit = TRUE,
  quickstatements.url = NULL,
  coordinate_pid = NULL
)
items |
a vector of strings indicating the items to which to add statements (as QIDs or labels).
Note: In contrast to |
properties |
a vector of strings indicating the properties to add as statements (as PIDs or labels).
Note: In contrast to |
values |
a vector of strings indicating the values to add as statements (as QIDs). Note: if strings are provided, they will be treated as plain text. |
qual.properties |
a vector, data frame, or tibble of strings indicating the properties to add as qualifiers to statements (as PIDs). |
qual.values |
a vector, data frame, or tibble of strings indicating the values to add as statements (as QIDs or strings). Note: if strings are provided, they will be treated as plain text. |
src.properties |
a vector, data frame, or tibble of strings indicating the properties to add as reference sources to statements (as SIDs or labels).
Note: if labels are provided, and multiple items match, the first matching item will be used
(see |
src.values |
a vector, data frame, or tibble of strings indicating the values to add as reference sources to statements (as QIDs or strings). Note: if strings are provided, they will be treated as plain text. |
remove |
a vector of booleans, one per statement, indicating whether that statement should be removed from the item rather than added (default = FALSE) |
format |
output format as a string. Options include:
|
format.csv.file |
path to save the csv file. If none is provided, then printed to console. |
api.username |
a string indicating your wikimedia username |
api.token |
a string indicating your api token (the unique identifier that you can find listed at [your user page](https://quickstatements.toolforge.org/#/user)) |
api.format |
a string indicating which version of the QuickStatements format to use when submitting to the API (default = "v1") |
api.batchname |
a string used to create a named batch (listed at [your batch history page](https://quickstatements.toolforge.org/#/batches)) and to tag the edit summaries |
api.submit |
a boolean indicating whether to submit the instructions directly to wikidata (otherwise returns the URL, which can be copy-pasted into a web browser) |
quickstatements.url |
url to access quickstatements of the corresponding Wikibase instance. |
coordinate_pid |
PID of a geocoordinate property; values of this property need different formatting |
data formatted to upload to wikidata (via QuickStatements),
optionally also directly uploaded to wikidata (see format
parameter).
# Add a statement to item Q24 with property P2 and value Q8
# (entity IDs as defined in the target Wikibase instance).
# The instruction will submit directly to the Wikibase instance via the API
# (if you include your Wikibase/Wikimedia username and token)
## Not run:
write_wikibase(
  items = "Q24",
  properties = "P2",
  values = "Q8",
  format = "api",
  api.username = "myusername",
  api.token = "mytoken",
  api.submit = TRUE,
  quickstatements.url = NULL
)
## End(Not run)
# note:
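To make the link between the `items`, `properties`, `values` and `remove` arguments and the QuickStatements V1 text more concrete, here is a minimal base-R sketch. It is not the package's implementation: the helper name `qs_v1_lines` is hypothetical, it handles only entity IDs and plain strings (no dates, coordinates, qualifiers or sources), and it assumes the V1 conventions of tab-separated columns, double-quoted strings, and a leading `-` for removal.

```r
# Hypothetical helper (not part of WikidataR): build QuickStatements V1
# command lines from parallel vectors of items, properties and values.
qs_v1_lines <- function(items, properties, values, remove = FALSE) {
  n <- length(items)
  stopifnot(length(properties) == n, length(values) == n)
  remove <- rep_len(remove, n)  # recycle a single flag across all statements

  # Values that look like entity IDs (e.g. Q123, P45) pass through unchanged;
  # anything else is treated as plain text and wrapped in double quotes.
  is_id <- grepl("^[QP][0-9]+$", values)
  values <- ifelse(is_id, values, paste0('"', values, '"'))

  # A leading "-" marks a statement for removal rather than addition.
  prefix <- ifelse(remove, "-", "")
  paste0(prefix, items, "\t", properties, "\t", values)
}

lines <- qs_v1_lines(
  items      = c("Q4115189", "Q4115189", "Q4115189"),
  properties = c("P31", "P31", "P373"),
  values     = c("Q1", "Q2", "Sandbox"),
  remove     = c(FALSE, TRUE, FALSE)
)
cat(lines, sep = "\n")
```

Inspecting the generated lines like this before submitting anything is one way to catch a statement that would be removed when it was meant to be added.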
Upload data to wikidata, including creating items and adding statements to existing items (via the QuickStatements format and API).
write_wikidata( items, properties = NULL, values = NULL, qual.properties = NULL, qual.values = NULL, src.properties = NULL, src.values = NULL, remove = FALSE, format = "tibble", api.username = NULL, api.token = NULL, api.format = "v1", api.batchname = NULL, api.submit = TRUE )
items |
a vector of strings indicating the items to which to add statements (as QIDs or labels).
Note: if labels are provided, and multiple items match, the first matching item will be used
(see |
properties |
a vector of strings indicating the properties to add as statements (as PIDs or labels).
Note: if labels are provided, and multiple items match, the first matching item will be used
(see |
values |
a vector of strings indicating the values to add as statements (as QIDs or strings). Note: if strings are provided, they will be treated as plain text. |
qual.properties |
a vector, data frame, or tibble of strings indicating the properties to add as qualifiers to statements (as PIDs or labels).
Note: if labels are provided, and multiple items match, the first matching item will be used
(see |
qual.values |
a vector, data frame, or tibble of strings indicating the values of the qualifiers to add to statements (as QIDs or strings). Note: if strings are provided, they will be treated as plain text. |
src.properties |
a vector, data frame, or tibble of strings indicating the properties to add as reference sources to statements (as SIDs or labels).
Note: if labels are provided, and multiple items match, the first matching item will be used
(see |
src.values |
a vector, data frame, or tibble of strings indicating the values to add as reference sources to statements (as QIDs or strings). Note: if strings are provided, they will be treated as plain text. |
remove |
a vector of booleans, one per statement, indicating whether that statement should be removed from the item rather than added (default = FALSE) |
format |
output format as a string. Options include:
|
api.username |
a string indicating your wikimedia username |
api.token |
a string indicating your api token (the unique identifier that you can find listed at [your user page](https://quickstatements.toolforge.org/#/user)) |
api.format |
a string indicating which version of the QuickStatements format to use when submitting to the API (default = "v1") |
api.batchname |
a string used to create a named batch (listed at [your batch history page](https://quickstatements.toolforge.org/#/batches)) and to tag the edit summaries |
api.submit |
a boolean indicating whether to submit the instructions directly to wikidata (otherwise returns the URL, which can be copy-pasted into a web browser) |
data formatted to upload to wikidata (via QuickStatements),
optionally also directly uploaded to wikidata (see format
parameter).
# Add a statement to the "Wikidata sandbox" item (Q4115189)
# to say that it is an "instance of" (P31) Q1 (the universe).
# The instruction will submit directly to wikidata via the API
# (if you include your wikimedia username and token)
## Not run:
write_wikidata(
  items = "Wikidata Sandbox",
  properties = "instance of",
  values = "Q1",
  format = "api",
  api.username = "myusername",
  api.token = "mytoken" # placeholder; replace with your own token
)
## End(Not run)
#note:
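Because `api.username` and `api.token` are passed as plain strings, it can be convenient to keep them out of the script itself. The following base-R sketch reads them from environment variables. The helper `qs_credentials` and the variable names `QS_USERNAME` and `QS_TOKEN` are assumptions made for illustration; they are not part of WikidataR.

```r
# Hypothetical helper (not part of WikidataR): read QuickStatements
# credentials from environment variables instead of hard-coding them.
# The variable names QS_USERNAME and QS_TOKEN are assumptions for this sketch.
qs_credentials <- function(user_var = "QS_USERNAME", token_var = "QS_TOKEN") {
  username <- Sys.getenv(user_var, unset = NA)
  token    <- Sys.getenv(token_var, unset = NA)
  # Fail early if either value is missing or empty.
  if (is.na(username) || is.na(token) || username == "" || token == "") {
    stop("Set ", user_var, " and ", token_var, " before submitting via the API.")
  }
  list(username = username, token = token)
}

# Demonstration with placeholder values; in practice these would be set
# outside the script (for example in ~/.Renviron).
Sys.setenv(QS_USERNAME = "myusername", QS_TOKEN = "mytoken")
creds <- qs_credentials()

# The credentials could then be passed on, e.g. (not run):
# write_wikidata(items = "Q4115189", properties = "P31", values = "Q1",
#                format = "api",
#                api.username = creds$username, api.token = creds$token)
```

Failing early when a credential is missing avoids building a batch only to have the submission rejected.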