
Lesson 8. Scopus

The Ohio State University Libraries

Elsevier provides API access to its Scopus database to academic researchers. This allows researchers to programmatically retrieve metadata about publications, authors, institutions, and more.

This tutorial introduces pybliometrics, an API wrapper designed to simplify retrieving data from Scopus’s multiple API access points. An API wrapper is a Python library or module that handles requests, authentication, parsing, and more.

Pybliometrics will help you to ...

  • Construct URLs for Scopus API calls
  • Store and handle API keys and institutional tokens
  • Parse JSON responses
  • Handle rate limits and errors

Data skills | concepts

  • API keys
  • API wrappers

Learning objectives

  1. Install and use an API wrapper to authenticate, request, parse, and store data.
  2. Interpret documentation and apply concepts to write functional code.

This tutorial is designed to support multi-session workshops hosted by The Ohio State University Libraries Research Commons. It assumes you already have a basic understanding of Python, including how to iterate through lists and dictionaries to extract data using a for loop. To learn basic Python concepts, visit the Python - Mastering the Basics tutorial.


Getting started

To use the Scopus APIs, researchers must first request an API key via the Elsevier Developer Portal and agree to comply with Elsevier’s API usage policies.

When requesting an API key, be ready to provide a few details:

  • Your use case - what you’re planning to do with the data.
  • The type of Scopus metadata you want to access - like publications, author or institutional profiles, or citations.
  • How much data you expect to retrieve - for example, “around 3,500 records.”
  • What your final product will be – such as a research paper, website, or something else.

It’s a good idea to read through the Getting Started guide for Scopus APIs before submitting your request. It will help you understand how the API works and what to expect. Also, be aware that if you usually work off campus, you may need to request an institutional token and answer a few additional questions. Otherwise, you will need to use your API key while connected to the university’s network.

Once you have your API key (and institutional token, if needed), install the stable version of pybliometrics from PyPI:

pip install pybliometrics

The first time you use pybliometrics, you will be prompted to input your API key and institutional token. These will be saved in ~/.config/pybliometrics.cfg.

import pybliometrics

pybliometrics.scopus.init() #prompts for your key on first run and writes the config file

ScopusSearch

ScopusSearch is one of the 11 API interfaces available to interact with Elsevier’s Scopus database through the pybliometrics library. The ScopusSearch class in pybliometrics allows you to run a search query against the Scopus database and work with the matching document records.

The search returns a list of named tuples that can be converted into a DataFrame with pandas for further analysis or export to CSV.

Step 1. Construct query

To get started with the ScopusSearch class in pybliometrics, we will begin by searching for publications that were:

  • Funded by the National Science Foundation (NSF)
  • Authored by researchers affiliated with The Ohio State University
  • Published in 2021 (after 2020 and before 2022).
#import the libraries needed for this project
import pybliometrics
from pybliometrics.scopus import ScopusSearch
import pandas as pd

#initialize the library (reads your API key from the config file)
pybliometrics.scopus.init()

#query: NSF-funded publications with an Ohio State affiliation, published in 2021
q='(FUND-SPONSOR ( "National Science Foundation") AND AFFIL ("Ohio State University")) AND PUBYEAR > 2020 AND PUBYEAR < 2022'

#search (creates an object); setting verbose to True turns on a progress bar
s=ScopusSearch(q, verbose=True)
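Because the query is just a string, you can assemble it from variables, which makes it easy to rerun the same search for a different sponsor, affiliation, or year. A small sketch of this idea (the field names FUND-SPONSOR, AFFIL, and PUBYEAR come from Scopus Advanced Search syntax; the helper function build_query is our own, not part of pybliometrics):

```python
def build_query(sponsor, affiliation, year):
    """Assemble a Scopus Advanced Search query for one publication year."""
    return (
        f'(FUND-SPONSOR ("{sponsor}") AND AFFIL ("{affiliation}")) '
        f'AND PUBYEAR > {year - 1} AND PUBYEAR < {year + 1}'
    )

q = build_query("National Science Foundation", "Ohio State University", 2021)
print(q)
```

The resulting string can then be passed to ScopusSearch exactly as above.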

Step 2. Retrieve and store results

s.results retrieves the list of named tuples. Each item in s.results is a named tuple whose fields you access as attributes.

article_title=s.results[0].title looks at the first tuple in the list, finds its title attribute, and assigns it to the variable article_title.

journal_title=s.results[5].publicationName looks at the sixth tuple in the list, finds its publicationName attribute, and assigns it to the variable journal_title.

You can loop through the list of tuples or use list indexing to pull specific attributes out of s.results, or you can immediately create a DataFrame to filter, analyze, and store your search results.
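Looping over the result tuples might look like the sketch below. Because running ScopusSearch requires an API key, the sketch uses a small hand-made list of named tuples (sample_results, with made-up titles) standing in for s.results; the field names title and publicationName mirror the real ones:

```python
from collections import namedtuple

# Stand-in for s.results: a list of named tuples with the same
# field names ScopusSearch uses (title, publicationName)
Doc = namedtuple("Doc", ["title", "publicationName"])
sample_results = [
    Doc("A study of rivers", "Journal of Hydrology"),
    Doc("A study of birds", "The Auk"),
]

# Pull one attribute out of each tuple with a for loop
titles = []
for doc in sample_results:
    titles.append(doc.title)

print(titles)
```

The same loop works on the real s.results unchanged.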

results=pd.DataFrame(s.results)

#examine DataFrame shape
print(results.shape)

#examine column names
print(results.columns)

#export results to csv file
results.to_csv('results.csv', encoding='utf-8')
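Once the results are in a DataFrame, you can filter them like any other pandas data. The sketch below uses a small hand-made DataFrame standing in for pd.DataFrame(s.results); the column names title and coverDate mirror fields ScopusSearch returns, but the rows are made up:

```python
import pandas as pd

# Hypothetical stand-in for results = pd.DataFrame(s.results)
results = pd.DataFrame({
    "title": ["A study of rivers", "A study of birds", "A study of rocks"],
    "coverDate": ["2021-03-01", "2021-07-15", "2021-11-30"],
})

# Keep only the columns you care about
subset = results[["title", "coverDate"]]

# Filter rows, e.g. papers published in the second half of the year
late_2021 = results[results["coverDate"] >= "2021-07-01"]
print(len(late_2021))
```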

AuthorSearch

Learning to read and interpret documentation is an essential skill for anyone working with data. Good documentation can:

  • Uncover powerful or lesser-known features that can enhance your project.
  • Introduce optional parameters that help you fine-tune your queries - for example, setting encoding='utf-8', specifying column headers, or filtering results.
  • Define error messages and guide you through troubleshooting when things don’t work as expected.
  • Save you time by offering accurate, up-to-date information - often more reliable than what you’ll find in scattered or outdated online forums.
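As an exercise in reading documentation, try the AuthorSearch class yourself. The pybliometrics documentation shows that it takes a query string built from Scopus author-search fields such as AUTHLAST and AUTHFIRST. A sketch of constructing such a query (the name is made up, and the actual search call is commented out because it needs a valid API key):

```python
# Build an author query from Scopus author-search fields
last, first = "Smith", "John"          # made-up example name
q = f"AUTHLAST({last}) and AUTHFIRST({first})"
print(q)

# Running the search requires an initialized API key, so it is
# commented out here:
# from pybliometrics.scopus import AuthorSearch
# a = AuthorSearch(q)
```

Compare what you write against the AuthorSearch page of the pybliometrics documentation to see which attributes the result object exposes.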