Lesson 7. Crossref - Data Visualization

Crossref is a nonprofit organization that manages a registry of Digital Object Identifiers (DOIs). Publishers collaborate with Crossref to assign a unique DOI to each journal article, book, conference paper, or dataset they publish. This DOI acts like a permanent web address, enabling seamless linking between references, citations, research outputs, funding information, and more.

The Crossref REST API offers free access to the nonprofit’s metadata. This tutorial introduces two useful tools: JSON, a simple data format that resembles Python dictionaries and is easy to read and use, and Python’s built-in logging module.

Data skills | concepts

APIs
logging
JSON data

Learning objectives

Interpret documentation and apply concepts to write functional code.
Extract and work with JSON data using Python’s built-in tools.
Use Python’s logging module to capture and report errors that interrupt code execution.

This tutorial is designed to support multi-session workshops hosted by The Ohio State University Libraries Research Commons. It assumes you already have a basic understanding of Python, including how to use a for loop to iterate through lists and dictionaries and extract data. To learn basic Python concepts, visit the Python - Mastering the Basics tutorial.

LESSON 7

Crossref

Crossref provides detailed documentation and a wide range of robust learning resources to help users effectively work with its REST API.

JSON

Crossref queries return data in JSON format, which is easy to read and looks similar to Python dictionaries. You can work with JSON data by looping through its key-value pairs to access the information you need.
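
To make the idea concrete, here is a minimal sketch of reading JSON with Python's built-in json module. The record below is made up for illustration; it only mimics the layout that the solution code in this lesson expects (a 'message' key holding fields such as 'title' and 'published'), and the values are not real Crossref data.

```python
import json

# Made-up record for illustration only; it mimics the layout of a Crossref reply
sample_text = '''
{
  "status": "ok",
  "message": {
    "publisher": "Example Press",
    "title": ["An Example Article"],
    "container-title": ["Journal of Examples"],
    "published": {"date-parts": [[2020, 5, 17]]},
    "reference-count": 42
  }
}
'''

record = json.loads(sample_text)  # Convert JSON text into a Python dictionary
entry = record["message"]         # The record itself sits under the 'message' key

# Loop through the key-value pairs to see what is available
for key, value in entry.items():
    print(key, "->", value)

# Some values are nested lists, so index into them to reach a single value
article_title = entry["title"][0]
year = entry["published"]["date-parts"][0][0]
print(article_title, year)
```

Printing the keys first is a quick way to discover which fields a record contains before you write code that depends on them.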

Solution

import requests
import pandas as pd

def lookup(target_doi):
    base_url='https://api.crossref.org/works/'
    url=base_url+target_doi
    response=requests.get(url)
    response.raise_for_status() #Raise an HTTP Error for bad responses
    json_data = response.json() #Parse JSON response
    return json_data

file=pd.read_csv('C:/Users/murphy.465/Documents/GitHub/data_visualization/data/dois.csv')
dois=file['doi'].tolist()
columns=['doi','publisher','article_title','journal_title','year','reference_count']
rows=[] #Collect one dictionary per DOI

for doi in dois:
    data={}
    response=lookup(doi)
    entry=response['message'] #The record itself sits under the 'message' key
    data['doi']=doi
    data['publisher']=entry['publisher']
    data['article_title']=entry['title'][0]
    data['journal_title']=entry['container-title'][0]
    data['year']=entry['published']['date-parts'][0][0]
    data['reference_count']=entry['reference-count']
    rows.append(data)

results=pd.DataFrame(rows, columns=columns) #Build the DataFrame once, after the loop

Logging

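APIs sometimes return error codes that interrupt a program's execution. A try/except block tells Python how to handle these errors, and the logging module records them in a log file so you can review what went wrong after the program finishes. The log can also help you identify issues with your own code.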
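
Before looking at the full solution, a small sketch may help show how try/except and logging work together. The function name first_title and the two records are hypothetical and used only for illustration. This sketch writes log messages to the console; passing filename= to logging.basicConfig sends them to a file instead.

```python
import logging

# Report messages at ERROR level or higher, with a timestamp on each line
logging.basicConfig(level=logging.ERROR,
                    format="%(asctime)s - %(levelname)s - %(message)s")

def first_title(entry):
    """Return the first title in a record, or None if it cannot be read."""
    try:
        return entry["title"][0]
    except (KeyError, IndexError) as err:
        # Record the problem, then let the program continue
        logging.error(f"Could not read title: {err!r}")
        return None

# Made-up records for illustration; the second one has no 'title' key
records = [{"title": ["An Example Article"]}, {"publisher": "Example Press"}]
titles = [first_title(r) for r in records]
print(titles)
```

Without the try/except block, the second record would stop the program with a KeyError. With it, the error is logged and the loop moves on to the next record.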

Solution

import requests
import pandas as pd
import logging
import time

#  Configure logging
formatstring="%(asctime)s - %(levelname)s - %(message)s"
datestring="%m/%d/%Y %I:%M:%S %p"
logging.basicConfig(filename="cr_errors_find_dois.log", level=logging.ERROR, format=formatstring, datefmt=datestring)

# Define function to request url and log HTTP errors
def lookup(target_doi):
    try:
        base_url='https://api.crossref.org/works/'
        url=base_url+target_doi
        response=requests.get(url)
        response.raise_for_status() #Raise an HTTP Error for bad responses
        json_data = response.json() #Parse JSON response
        return json_data
    except requests.exceptions.HTTPError as http_err:
        logging.error(f"HTTP Error = {http_err}") # Log the HTTP error
        time.sleep(10)
    except Exception as err:
        logging.error(f"Other error = {err}") #Log any other errors
        time.sleep(10)
        
file=pd.read_csv('C:/Users/murphy.465/Documents/GitHub/data_visualization/data/dois.csv')
dois=file['doi'].tolist()
columns=['doi','publisher','article_title','journal_title','year','reference_count']
rows=[] #Collect one dictionary per DOI

for doi in dois[0:2]: #Test with the first two DOIs; use dois to process the full list
    response=lookup(doi)
    if response is None: #lookup() returns None after logging an error, so skip this DOI
        continue
    entry=response['message']
    data={}
    data['doi']=doi
    data['publisher']=entry['publisher']
    data['article_title']=entry['title'][0]
    data['journal_title']=entry['container-title'][0]
    data['year']=entry['published']['date-parts'][0][0]
    data['reference_count']=entry['reference-count']
    rows.append(data)

results=pd.DataFrame(rows, columns=columns) #Build the DataFrame once, after the loop
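
The loop above assumes that every record contains every field. Some records may lack a field such as 'container-title', or return an empty list for it, which would raise a KeyError or IndexError. One possible way to tolerate missing fields is sketched below. The helper names first_or_none and extract_fields, the DOI, and the record are all hypothetical and not part of the lesson's solution.

```python
def first_or_none(values):
    """Return the first item of a list, or None for a missing or empty list."""
    return values[0] if values else None

def extract_fields(doi, entry):
    """Build one row of results, using None for any field that is missing."""
    date_parts = entry.get("published", {}).get("date-parts", [[None]])
    return {
        "doi": doi,
        "publisher": entry.get("publisher"),
        "article_title": first_or_none(entry.get("title")),
        "journal_title": first_or_none(entry.get("container-title")),
        "year": date_parts[0][0],
        "reference_count": entry.get("reference-count"),
    }

# Made-up record with no 'container-title', for illustration only
entry = {"publisher": "Example Press", "title": ["An Example Dataset"],
         "published": {"date-parts": [[2021]]}, "reference-count": 0}
row = extract_fields("10.0000/example", entry)
print(row)
```

Using dict.get returns None instead of raising an error when a key is absent, so one incomplete record leaves a gap in the table rather than stopping the whole run.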
