Query Researchers and Co-author Relationships by ORCID¶
This notebook demonstrates how to query a researcher's data by passing an ORCID iD to the Augment API. The returned data can be processed in many ways; here we give one example, plotting the co-author relationships as a network graph.
Related Notebooks:
- Publications Notebook
  Extract a publications list for a researcher in Bibtex Format. Visualise publication counts with a bar plot and generate a keyword word-cloud.
- DOI Notebook
  Query publications data by passing a DOI to the API.
- Affiliations Notebook
  Query researchers and affiliations by passing an ORCID to the API. Extract the geolocation data and map affiliations data on a world map. Plot researcher-organisation relationships in a graph.
import sys
sys.path.append('../')
# packages to read API_KEY
import os
from os.path import join, dirname
from dotenv import load_dotenv
load_dotenv();
# Packages to use API
import requests
import json
# Packages for data manipulation
import pandas as pd
from datetime import datetime, date
# Packages for chart plotting
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
# Packages for graph plotting
import ast
import altair as alt
import networkx as nx
import nx_altair as nxa
API Errors¶
To use the API, we load the API_KEY and the ORCID iD to search for into variables and insert them into the URL string. The Python requests package then sends the request and returns the data. This section shows two common errors you might get when using the Augment API: either the ORCID iD passed is invalid, or the API_KEY was not loaded successfully from your environment file.
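One way to catch a missing key before any request is sent is to build the URL in a small helper. The sketch below assumes the URL pattern used throughout this notebook; `build_orcid_url` is a hypothetical helper name, not part of the Augment API.

```python
from urllib.parse import quote

# URL pattern taken from the cells in this notebook
BASE_URL = "https://augmentapi.researchgraph.com/v1/orcid"


def build_orcid_url(orcid, api_key):
    """Return the request URL, or raise if the key is missing (hypothetical helper)."""
    if not api_key:
        # os.environ.get("API_KEY") returns None when the variable is not set
        raise ValueError("API_KEY is empty; check that the .env file was loaded")
    return f"{BASE_URL}/{quote(orcid)}?subscription-key={quote(api_key)}"


# "demo-key" is a placeholder, not a working subscription key
print(build_orcid_url("0000-0002-0068-716X", "demo-key"))
```

Failing early like this avoids sending a request with the literal text `None` as the subscription key.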
ORCID ID Not Found¶
Here we assign an invalid value to the ORCID variable. When an error occurs, requests.get() still returns a response object: its status code indicates the type of error, and its body carries an explanatory message.
# pass an invalid ORCID
API_KEY = os.environ.get("API_KEY")
ORCID = "0000-0003-XXXX-XXXX"
url = f'https://augmentapi.researchgraph.com/v1/orcid/{ORCID}?subscription-key={API_KEY}'
r = requests.get(url)
# print a short confirmation on completion
print('Augment API query complete ', r.status_code)
if r.status_code == 400:
    print(r.json()[0]["error"])
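Malformed identifiers such as the one above can also be rejected locally, before calling the API. ORCID iDs end in a check character computed with the ISO 7064 MOD 11-2 algorithm; the sketch below checks the format and that character. The function name `is_valid_orcid` is an illustrative choice, and passing this check does not mean that a record exists for the iD.

```python
import re

# four groups of four characters; only the last character may be X
ORCID_PATTERN = re.compile(r"[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{3}[0-9X]")


def is_valid_orcid(orcid):
    """Check the format and the ISO 7064 MOD 11-2 check character of an ORCID iD."""
    if not ORCID_PATTERN.fullmatch(orcid):
        return False
    chars = orcid.replace("-", "")
    total = 0
    for digit in chars[:15]:
        total = (total + int(digit)) * 2
    result = (12 - total % 11) % 11
    expected = "X" if result == 10 else str(result)
    return chars[15] == expected


print(is_valid_orcid("0000-0002-0068-716X"))  # True
print(is_valid_orcid("0000-0003-XXXX-XXXX"))  # False
```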
Missing API_KEY¶
You will receive an authentication error (status code 401) if the API_KEY is missing or invalid.
# Missing API_KEY
API_KEY = ''
ORCID = "0000-0002-0068-716X"
url = f'https://augmentapi.researchgraph.com/v1/orcid/{ORCID}?subscription-key={API_KEY}'
r = requests.get(url)
# print a short confirmation on completion
print('Augment API query complete ', r.status_code)
if r.status_code == 401:
    print('Authentication error.', r.json()['message'])
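The two error cases can be handled in one place. The following sketch assumes the response shapes seen in the cells above (a list with an "error" key for status 400, a dict with a "message" key for status 401); `explain_response` is a hypothetical helper, and the payloads passed to it here are hand-written for illustration rather than real API output.

```python
def explain_response(status_code, payload):
    """Summarise a response using the error shapes shown in this notebook."""
    if status_code == 200:
        return "OK"
    if status_code == 400:
        # 400 responses above carry a list whose first item has an "error" key
        return f"Bad request: {payload[0]['error']}"
    if status_code == 401:
        # 401 responses above carry a dict with a "message" key
        return f"Authentication error: {payload['message']}"
    return f"Unexpected status {status_code}"


# Made-up payloads, used only to exercise the two branches
print(explain_response(400, [{"error": "example error text"}]))
print(explain_response(401, {"message": "example message text"}))
```

With a real response `r`, the call would be `explain_response(r.status_code, r.json())`.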
Data Extraction for Valid ORCID ID¶
For a valid ORCID, the response is a nested dictionary holding all the data connected to the requested ORCID. The first level has 3 keys, as printed by the block below.
# ORCID ID does exist
API_KEY = os.environ.get("API_KEY")
ORCID = "0000-0002-0068-716X"
url = f'https://augmentapi.researchgraph.com/v1/orcid/{ORCID}?subscription-key={API_KEY}'
r = requests.get(url)
# print a short confirmation on completion
print('Augment API query complete ', r.status_code)
# Shows data
print('The data returned has the following fields: ', r.json()[0].keys())
In 'nodes', data is stored in 5 labels from the ResearchGraph schema:
r.json()[0]["nodes"].keys()
Each label above is stored as a list of dictionaries. To extract the researcher we need, iterate through the list and check for the ORCID.
# Extract Researcher information
if r.status_code == 200 and r.json()[0]["nodes"]["researchers"]:
    researchers = r.json()[0]["nodes"]["researchers"]
    researcher = None
    # find the record whose ORCID matches the one requested
    for candidate in researchers:
        if candidate["orcid"] == ORCID:
            researcher = candidate
    # only print when a matching record was found
    if researcher is not None:
        print()
        print(f'ORCID: {researcher["orcid"]}')
        print(f'First name: {researcher["first_name"]}')
        print(f'Last name: {researcher["last_name"]}')
        print()
        print(f'The researcher {researcher["full_name"]} is connected to {r.json()[0]["stats"]}.')
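The lookup by ORCID can also be written as a small function that returns None when nothing matches. This is a sketch only: `find_researcher` is a hypothetical name, and the sample records are made up to mirror the "orcid" and "full_name" fields used above.

```python
def find_researcher(researchers, orcid):
    """Return the first record whose "orcid" matches, or None if absent."""
    return next((rec for rec in researchers if rec.get("orcid") == orcid), None)


# Made-up records, not real API output
sample = [
    {"orcid": "0000-0000-0000-0001", "full_name": "Researcher A"},
    {"orcid": "0000-0000-0000-0002", "full_name": "Researcher B"},
]

match = find_researcher(sample, "0000-0000-0000-0002")
print(match["full_name"] if match else "not found")  # Researcher B
```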
List of co-authors¶
The researchers list contains the researcher we queried and the other researchers connected to them. Note that it only includes researchers who have an ORCID.
rf = pd.DataFrame(r.json()[0]["nodes"]["researchers"], columns=['first_name', 'last_name', 'full_name', 'orcid'])
dfStyler = rf.style.set_properties(**{'text-align': 'left'})
dfStyler.set_table_styles([dict(selector='th', props=[('text-align', 'left')])])
Co-author Relationship¶
Now we can visualise the co-author relationships of our target researcher by extracting data from the relationships list. However, the relationship keys need some formatting to yield ORCID iDs. Some example relationships are shown below.
r.json()[0]['relationships']['researcher-researcher'][:5]
# Format keys from relationship list to ORCID IDs (to map the node labels)
def force_orcid(n):
    n['from'] = n['from'].split('/')[-1]
    n['to'] = n['to'].split('/')[-1]
    return n
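Note that force_orcid modifies the dictionaries it receives. Where the original keys should be preserved, a non-mutating variant can return plain (from, to) tuples instead. The sketch below is illustrative: the helper names are hypothetical, and the key prefix in the sample is an assumption, since only the part after the last "/" matters here.

```python
def orcid_from_key(key):
    """Return the part of a relationship key after the last '/'."""
    return key.rsplit("/", 1)[-1]


def to_edge(relationship):
    """Build a (from, to) tuple of ORCID iDs without modifying the input."""
    return (orcid_from_key(relationship["from"]), orcid_from_key(relationship["to"]))


# Made-up relationship; the "example.org/orcid/" prefix is an assumption
sample_rel = {
    "from": "example.org/orcid/0000-0000-0000-0001",
    "to": "example.org/orcid/0000-0000-0000-0002",
}

print(to_edge(sample_rel))
print(sample_rel["from"])  # the input still holds the full key
```

A list of such tuples can be passed directly to `G.add_edges_from`.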
Note that the requested researcher is the 'from' node. These are the connections between the requested researcher and other researchers; we now add them to the graph.
# Generate a graph from the co-authors
G = nx.Graph()
# add co-author researcher nodes to the graph
for index, row in rf.iterrows():
    G.add_node(row['orcid'], name=row['full_name'], color='#54C48C')
# format the relationship data
# use a name other than `json` so the imported json module is not shadowed
relationships = list(map(force_orcid, r.json()[0]['relationships']['researcher-researcher']))
ef = pd.DataFrame(relationships, columns=['from', 'to'])
# add them into graph as edges
G.add_edges_from(ef.to_numpy())
Show current graph:
# Compute positions
pos = nx.spring_layout(G)
options = {
"font_size": 12,
"node_size": 50,
"edge_color": "lightgray",
"node_color": "#54C48C",
"linewidths": 0.1,
"width": 1
}
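# Show information about the graph
# (nx.info() was removed in NetworkX 3.0, so report the counts directly)
print(f"Nodes: {G.number_of_nodes()}, edges: {G.number_of_edges()}")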
print("Network density:", nx.density(G))
# Disable maximum row check for big dataset
alt.data_transformers.disable_max_rows()
# Draw the graph using altair
viz = nxa.draw_networkx(G, pos=pos, node_tooltip='name', **options).properties(width=800, height=800)
viz.interactive()
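For an undirected graph with n nodes and m edges, the network density printed above is 2m / (n(n - 1)), the share of possible connections that are actually present. The sketch below reproduces that formula in plain Python for a small star-shaped graph like the one built at this stage; `undirected_density` is a hypothetical helper name.

```python
def undirected_density(n_nodes, n_edges):
    """Density of a simple undirected graph: 2m / (n * (n - 1))."""
    if n_nodes < 2:
        # with fewer than two nodes no edge is possible
        return 0.0
    return 2 * n_edges / (n_nodes * (n_nodes - 1))


# One researcher linked to 4 co-authors: 5 nodes, 4 edges
print(undirected_density(5, 4))  # 0.4
```

A star with n nodes always has density 2 / n, so the value falls as the number of co-authors grows until edges between co-authors are added.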
Now we have a connection graph where the requested researcher is the center. But what about the connections between co-authors? If we want to learn more about the connections, we can repeat the process above for the list of co-authors and get researcher relationships for each of them.
API_KEY = os.environ.get("API_KEY")
# Fetch relationships between all co-authors
# This may take a while depending on the number of requests
for a in rf['orcid']:
    url = f'https://augmentapi.researchgraph.com/v1/orcid/{a}?subscription-key={API_KEY}'
    r = requests.get(url)
    # print a short confirmation on completion
    print('Augment API query complete ', r.status_code)
    if r.status_code != 200:
        # skip co-authors whose record could not be retrieved
        continue
    # use a name other than `json` so the imported json module is not shadowed
    relationships = list(map(force_orcid, r.json()[0]['relationships']['researcher-researcher']))
    ef = pd.DataFrame(relationships, columns=['from', 'to'])
    # filter the relationships by start node in the co-author list
    ef = ef[ef['from'].isin(rf['orcid'].to_list())]
    # add them into graph as edges
    G.add_edges_from(ef.to_numpy())
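The filter above checks only the 'from' side of each relationship. If an edge's 'to' node is not in the co-author list, networkx creates that node implicitly, without the name attribute used for the tooltip. Whether that is wanted depends on the analysis; the sketch below shows one possible stricter filter that keeps an edge only when both endpoints are known and drops duplicate pairs. `edges_within` is a hypothetical helper and the identifiers are placeholders.

```python
def edges_within(edges, known_ids):
    """Keep edges whose two endpoints are both known, once per unordered pair."""
    known = set(known_ids)
    seen = set()
    kept = []
    for source, target in edges:
        # drop edges leaving the known set, and self-loops
        if source not in known or target not in known or source == target:
            continue
        pair = tuple(sorted((source, target)))
        if pair not in seen:
            seen.add(pair)
            kept.append(pair)
    return kept


# Placeholder identifiers standing in for ORCID iDs
known = ["A", "B", "C"]
edges = [("A", "B"), ("B", "A"), ("B", "C"), ("C", "Z")]
print(edges_within(edges, known))  # [('A', 'B'), ('B', 'C')]
```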
Finally, we show the graph and store it as a GEXF file.
# Compute positions
pos = nx.spring_layout(G)
options = {
"font_size": 12,
"node_size": 50,
"edge_color": "lightgray",
"node_color": "#54C48C",
"linewidths": 0.1,
"width": 1
}
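# Show information about the graph
# (nx.info() was removed in NetworkX 3.0, so report the counts directly)
print(f"Nodes: {G.number_of_nodes()}, edges: {G.number_of_edges()}")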
print("Network density:", nx.density(G))
# Disable maximum row check for big dataset
alt.data_transformers.disable_max_rows()
# export the graph to a GEXF file (readable by Gephi)
nx.write_gexf(G, "co-authors.gexf")
# Draw the graph using altair
viz = nxa.draw_networkx(G, pos=pos, node_tooltip='name', **options).properties(width=800, height=800)
viz.interactive()