Query Researchers and Affiliations by ORCID
This notebook shows how to fetch a researcher and their co-authors by ORCID. It then extracts affiliation data to plot the locations on a world map. Finally, it shows researcher-affiliation relationships in a graph.
Related Notebooks:
- ORCID Notebook
Query for researchers' data by passing an ORCID to the Augment API. Visualise co-author relationships in a graph.
- Publications Notebook
Extract a publications list for a researcher in BibTeX format. Visualise publication counts with a bar plot and generate a keyword word cloud.
- DOI Notebook
Query publications data by passing a DOI to the API.
import sys
sys.path.append('../')
# Package for mapping data on world map
# !{sys.executable} -m pip install folium
import folium
# Packages for plotting charts, graphs
import ast
import altair as alt
import networkx as nx
import nx_altair as nxa
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
# Packages for data manipulation
import pandas as pd
from datetime import datetime, date
# Packages for calling the API and parsing its responses
import requests
import json
# packages to read API_KEY
import os
from os.path import join, dirname
from dotenv import load_dotenv
load_dotenv();
API Errors
To use the API, we load the API_KEY and the ORCID ID to search for into variables and insert them into the URL string. The Python requests package then sends those values to the API and returns the data. This section shows the two common errors you might get when using the Augment API: either the ORCID ID passed is invalid, or the API_KEY was not loaded successfully from your environment file.
ORCID ID Not Found
Here we assign an invalid value to the ORCID variable. When an error occurs, requests.get() still returns a response object, with a status code indicating the error type and an error message in the body.
# ORCID ID not found
API_KEY = os.environ.get("API_KEY")
ORCID = "0000-0003-XXXX-XXXX"
url = f'https://augmentapi.researchgraph.com/v1/orcid/{ORCID}?subscription-key={API_KEY}'
r = requests.get(url)
# print a short confirmation on completion
print('Augment API query complete ', r.status_code)
if r.status_code == 400:
    print(r.json()[0]["error"])
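An obviously malformed ORCID such as the placeholder above can also be caught locally, before any request is sent. The sketch below is an illustration and not part of the Augment API: `is_valid_orcid` is a hypothetical helper name, and it checks only the syntax and the ISO 7064 MOD 11-2 check digit that ORCID identifiers use. A well-formed ORCID can still be unknown to the API.

```python
import re

# Four groups of four characters; only the last character may be 'X'
ORCID_PATTERN = re.compile(r"[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{3}[0-9X]")


def is_valid_orcid(orcid):
    """Return True if `orcid` is well formed and its check digit is correct."""
    if not ORCID_PATTERN.fullmatch(orcid):
        return False
    digits = orcid.replace("-", "")
    # ISO 7064 MOD 11-2 over the first 15 digits
    total = 0
    for ch in digits[:-1]:
        total = (total + int(ch)) * 2
    check = (12 - total % 11) % 11
    expected = "X" if check == 10 else str(check)
    return digits[-1] == expected


print(is_valid_orcid("0000-0003-XXXX-XXXX"))  # placeholder from the cell above
print(is_valid_orcid("0000-0002-0068-716X"))  # ORCID queried later in this notebook
```

A check like this only saves a round trip for typos; the status code returned by the API remains the authoritative answer.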
Missing API_KEY
You will receive an authentication error if the API_KEY is missing or invalid.
# Missing API_KEY
API_KEY = ''
ORCID = "0000-0002-0715-6126"
url = f'https://augmentapi.researchgraph.com/v1/orcid/{ORCID}?subscription-key={API_KEY}'
r = requests.get(url)
# print a short confirmation on completion
print('Augment API query complete ', r.status_code)
if r.status_code == 401:
    print('Authentication error.', r.json()['message'])
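The two error cases can be folded into one small helper. The sketch below is a hedged illustration rather than an official client: `describe_response` is a hypothetical name, and the payload shapes (a list whose first element carries "error" for status 400, a dictionary carrying "message" for status 401) are assumed from the two cells above.

```python
def describe_response(status_code, payload):
    """Return a short, human-readable summary of an API response.

    Hypothetical helper; the payload shapes are assumptions based on
    the error cells in this notebook.
    """
    if status_code == 200:
        return "OK"
    if status_code == 400:
        # Assumed shape: [{"error": "..."}]
        return f"Bad request: {payload[0]['error']}"
    if status_code == 401:
        # Assumed shape: {"message": "..."}
        return f"Authentication error: {payload['message']}"
    return f"Unexpected status code {status_code}"


# Made-up payloads for illustration; no network call is made here
print(describe_response(400, [{"error": "example error text"}]))
print(describe_response(401, {"message": "example message text"}))
```

In the notebook, this would be called as `describe_response(r.status_code, r.json())` right after `requests.get(url)`.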
Data Extraction for Valid ORCID ID
For a valid ORCID, the API returns a nested dictionary structure containing all data connected to the requested ORCID. The first level has 3 keys, as shown in the output of the block below.
# ORCID ID does exist
API_KEY = os.environ.get("API_KEY")
ORCID = "0000-0002-0068-716X"
url = f'https://augmentapi.researchgraph.com/v1/orcid/{ORCID}?subscription-key={API_KEY}'
r = requests.get(url)
# print a short confirmation on completion
print('Augment API query complete ', r.status_code)
# Show the top-level fields
print('The data returned has the fields below: ', r.json()[0].keys())
In 'nodes', data is stored in 5 labels from the ResearchGraph schema:
r.json()[0]["nodes"].keys()
Each of the labels above holds a list of dictionaries. To extract the researcher we need, iterate through the researchers list and check for the ORCID.
if r.status_code == 200 and r.json()[0]["nodes"]["researchers"]:
    researchers = r.json()[0]["nodes"]["researchers"]
    # Find the record whose ORCID matches the one we queried
    researcher = None
    for candidate in researchers:
        if candidate["orcid"] == ORCID:
            researcher = candidate
            break
    if researcher is None:
        # Guard against a response that does not contain the queried ORCID
        print(f'No researcher record found for ORCID {ORCID}.')
    else:
        print()
        print(f'ORCID: {researcher["orcid"]}')
        print(f'First name: {researcher["first_name"]}')
        print(f'Last name: {researcher["last_name"]}')
        print()
        print(f'The researcher {researcher["full_name"]} is connected to {r.json()[0]["stats"]}.')
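The lookup can also be written as a single expression with `next()`, which returns a default value when nothing matches. This is a minimal sketch: `find_researcher` is a hypothetical helper name, and the sample records are made up to mirror the "orcid" and "full_name" fields used above.

```python
def find_researcher(researchers, orcid):
    """Return the first record whose "orcid" field equals `orcid`, or None."""
    return next((rec for rec in researchers if rec.get("orcid") == orcid), None)


# Made-up records for illustration only
sample = [
    {"orcid": "0000-0000-0000-0001", "full_name": "Example Researcher A"},
    {"orcid": "0000-0000-0000-0002", "full_name": "Example Researcher B"},
]

match = find_researcher(sample, "0000-0000-0000-0002")
print(match["full_name"] if match else "not found")
```

Returning `None` instead of raising keeps the caller in control of how a missing record is reported.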
List of co-authors
The researchers list contains the researcher we queried and the other researchers connected to them. Note that it only includes researchers who have an ORCID.
rf = pd.DataFrame(r.json()[0]["nodes"]["researchers"], columns=['first_name', 'last_name', 'full_name', 'orcid'])
dfStyler = rf.style.set_properties(**{'text-align': 'left'})
dfStyler.set_table_styles([dict(selector='th', props=[('text-align', 'left')])])
List of co-author affiliations
Researcher affiliations can be extracted from the organisation nodes. An example record looks like this:
r.json()[0]["nodes"]["organisations"][0]
Note that the key includes a ResearchGraph prefix. To extract only the WikiData ID, we strip the prefix from the string using force_wikidata().
# Strip the prefix from the key, leaving only the WikiData ID
def force_wikidata(n):
    n['key'] = n['key'].split('/')[-1]
    return n
# Named `organisations` so that the imported json module is not shadowed
organisations = map(force_wikidata, r.json()[0]["nodes"]["organisations"])
of = pd.DataFrame(organisations, columns=['name', 'country', 'key', 'ror', 'latitude', 'longitude'])
of = of.rename(columns={'key': 'wikidata'})
dfStyler = of.style.set_properties(**{'text-align': 'left'})
dfStyler.set_table_styles([dict(selector='th', props=[('text-align', 'left')])])
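The prefix stripping can also be done without modifying the input record, which is easier to reason about if the parsed response is stored in a variable and reused. This is a sketch under assumptions: `strip_prefix` and `with_wikidata_id` are hypothetical names, and the sample key is made up, so the real prefix returned by the API may look different.

```python
def strip_prefix(key):
    """Return the part of `key` after the last '/', or `key` itself if there is none."""
    return key.rsplit("/", 1)[-1]


def with_wikidata_id(record):
    """Return a copy of `record` with an added "wikidata" field; the input is left untouched."""
    cleaned = dict(record)
    cleaned["wikidata"] = strip_prefix(record["key"])
    return cleaned


# Made-up record for illustration; the prefix shown is an assumption
sample = {"name": "Example University", "key": "example-prefix/wikidata/Q0000"}
print(with_wikidata_id(sample))
print(sample["key"])  # the original record still holds the full key
```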
We can use the longitude and latitude data to visualise the affiliations on a world map. Note that organisations with empty coordinates cannot be mapped, so they are skipped.
# Map affiliations on a world map, centred on the home institution (Curtin University, for now selected manually by row label)
m = folium.Map(tiles='cartodbpositron', location=[of.loc[4, 'latitude'], of.loc[4, 'longitude']], zoom_start=3)
# Add a marker for every organisation that has both coordinates
for index, row in of.iterrows():
    if isinstance(row['latitude'], str) and isinstance(row['longitude'], str):
        folium.CircleMarker(location=[row['latitude'], row['longitude']], popup=row['name'], fill=True,
                            color="#8248C6", radius=2).add_to(m)
m
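The type check in the loop skips rows without coordinates, but it would still pass an empty or non-numeric string on to the map. A stricter check is sketched below, assuming coordinates arrive as strings or `None`. `parse_coordinates` is a hypothetical helper name and the sample values are made up.

```python
import math


def parse_coordinates(latitude, longitude):
    """Return (lat, lon) as floats, or None if either value is missing or invalid."""
    try:
        lat, lon = float(latitude), float(longitude)
    except (TypeError, ValueError):
        # None raises TypeError; empty or non-numeric strings raise ValueError
        return None
    if math.isnan(lat) or math.isnan(lon):
        return None
    # Reject values outside the valid latitude/longitude ranges
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        return None
    return lat, lon


# Made-up values for illustration
print(parse_coordinates("-32.0", "115.9"))
print(parse_coordinates(None, "115.9"))
print(parse_coordinates("", "115.9"))
```

Inside the marker loop, a row would then be drawn only when `parse_coordinates(row['latitude'], row['longitude'])` returns a tuple.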
We can also visualise the researcher-affiliation relationships as a graph.
# Generate a graph from the co-authors and their affiliations
G = nx.Graph()
# add researchers as graph nodes
for index, row in rf.iterrows():
    G.add_node(row['orcid'], name=row['full_name'], node_color='#54C48C', type='researcher')
# add organisations as graph nodes
for index, row in of.iterrows():
    G.add_node(row['wikidata'], name=row['name'], node_color='#8248C6', type='organisation')
# Reduce the 'from' and 'to' fields of a relationship to bare identifiers (to match the node labels)
def force_pid(n):
    n['from'] = n['from'].split('/')[-1]
    n['to'] = n['to'].split('/')[-1]
    return n
# get co-author relationships with the requested researcher
coauthor_edges = map(force_pid, r.json()[0]['relationships']['researcher-researcher'])
ef = pd.DataFrame(coauthor_edges, columns=['from', 'to'])
# get affiliation relationships for researchers
affiliation_edges = map(force_pid, r.json()[0]['relationships']['researcher-organisation'])
eo = pd.DataFrame(affiliation_edges, columns=['from', 'to'])
# add relationships as graph edges
G.add_edges_from(ef.to_numpy())
G.add_edges_from(eo.to_numpy())
# Compute positions for viz.
pos = nx.spring_layout(G)
options = {
    "font_size": 12,
    "node_size": 50,
    "edge_color": "lightgray",
    "linewidths": 0.1,
    "width": 1
}
# Show information about the graph
# (nx.info() was removed in NetworkX 3.0; printing the graph gives a similar summary)
print(G)
print("Network density:", nx.density(G))
# export graph to a gephi file
nx.write_gexf(G, "affiliationss.gexf")
# Draw the graph using altair
viz = nxa.draw_networkx(G, pos=pos, node_tooltip='name', node_color='node_color', **options).properties(width=800, height=800)
viz.interactive()
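The network density printed above is, for a simple undirected graph, the number of edges divided by the number of possible edges: 2m / (n(n - 1)) for n nodes and m edges. The sketch below recomputes it in plain Python to make the number easier to interpret. `graph_density` is a hypothetical helper, and the node and edge labels are made up.

```python
def graph_density(nodes, edges):
    """Density of a simple undirected graph: 2m / (n * (n - 1))."""
    n = len(set(nodes))
    if n < 2:
        return 0.0
    # Treat (a, b) and (b, a) as the same edge and ignore self-loops
    unique_edges = {frozenset(edge) for edge in edges if edge[0] != edge[1]}
    return 2 * len(unique_edges) / (n * (n - 1))


# Made-up graph: two researchers who are co-authors, each affiliated with one organisation
nodes = ["researcher-1", "researcher-2", "organisation-1", "organisation-2"]
edges = [
    ("researcher-1", "researcher-2"),
    ("researcher-1", "organisation-1"),
    ("researcher-2", "organisation-2"),
]
print(graph_density(nodes, edges))  # 2 * 3 / (4 * 3) = 0.5
```

A density close to 0 is expected for affiliation graphs like this one, since most researchers are linked to only a few organisations.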