Scopus API in Python#
By Vincent F. Scalfani and Avery Fernandez
The Scopus API, provided by Elsevier, offers programmatic access to a comprehensive database of abstracts and citations from peer-reviewed literature. It supports advanced search capabilities, author and affiliation retrieval, and citation analysis, facilitating a wide range of academic and research applications.
This tutorial content is intended to facilitate academic research.
Please see the following resources for more information on API usage:
- Documentation
- Terms
- Data Reuse
- Scopus Platform
NOTE: The Scopus API limits requests to a maximum of 2 per second.
These recipe examples were tested on May 7, 2025.
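Since the limit is 2 requests per second, one option is to route all calls through a small throttling helper. Below is a minimal sketch; the helper name and structure are our own convention, not part of the Scopus API:

import time
import requests

MIN_INTERVAL = 0.5  # seconds between calls, i.e., at most 2 requests/second
_last_call = 0.0

def throttled_get(url, **kwargs):
    """Send a GET request, sleeping as needed to stay under 2 requests/second."""
    global _last_call
    wait = MIN_INTERVAL - (time.time() - _last_call)
    if wait > 0:
        time.sleep(wait)
    _last_call = time.time()
    return requests.get(url, **kwargs)

For simplicity, the examples in this tutorial instead add an explicit one-second sleep between calls.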
Setup#
Import Libraries#
The following external libraries need to be installed into your environment to run the code examples in this tutorial:

- requests
- python-dotenv
- pandas

We import the libraries used in this tutorial below:
import requests
from time import sleep
from pprint import pprint
from dotenv import load_dotenv
import os
import pandas as pd
Import API Key#
An API key is required to access the Scopus API. You can sign up for one at the Scopus Developer Portal.
We keep our API key in a separate .env file and use the dotenv library to access it. If you use this method, create a file named .env in the same directory as this notebook and add the following line to it:
SCOPUS_API_KEY=PUT_YOUR_API_KEY_HERE
load_dotenv()
try:
    API_KEY = os.environ["SCOPUS_API_KEY"]
except KeyError:
    print("API key not found. Please set 'SCOPUS_API_KEY' in your .env file.")
else:
    print("Environment and API key successfully loaded.")
Environment and API key successfully loaded.
Get References via a Title Search#
Number of Title Match Records#
# Scopus Search API endpoint
BASE_URL = "https://api.elsevier.com/content/search/scopus"

# Search Scopus for all references containing 'ChemSpider' in the record title
params = {
    "query": "TITLE(ChemSpider)",
    "apiKey": API_KEY,
    "httpAccept": "application/json"
}

try:
    response = requests.get(BASE_URL, params=params)
    response.raise_for_status()  # Raise an error for bad responses
    data = response.json()
    print(data["search-results"]["opensearch:totalResults"])
except requests.exceptions.RequestException as e:
    print(f"An error occurred: {e}")
7
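As an aside, TITLE() restricts matching to the record title, but Scopus supports other search fields as well, such as TITLE-ABS-KEY(), which also searches abstracts and keywords. A quick sketch comparing the two (the loop structure here is our own; the examples below continue with TITLE() searches):

# Compare title-only matches with title/abstract/keyword matches for the same term
for field in ["TITLE", "TITLE-ABS-KEY"]:
    params = {
        "query": f"{field}(ChemSpider)",
        "apiKey": API_KEY,
        "httpAccept": "application/json"
    }
    response = requests.get(BASE_URL, params=params)
    response.raise_for_status()
    total = response.json()["search-results"]["opensearch:totalResults"]
    print(f"{field}: {total} records")
    sleep(1)  # stay under the 2 requests/second limit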
# Repeat this in a loop
titleWord_list = ['ChemSpider', 'PubChem', 'ChEMBL', 'Reaxys', 'SciFinder']

# Get number of Scopus records for each title search
num_records_title = []
for titleWord in titleWord_list:
    # Set up query parameters
    params = {
        "query": f"TITLE({titleWord})",
        "apiKey": API_KEY,
        "httpAccept": "application/json"
    }
    try:
        # Make the API request
        response = requests.get(BASE_URL, params=params)
        response.raise_for_status()  # Raise an error for bad responses
        data = response.json()

        # Extract the total number of results
        numt = data["search-results"]["opensearch:totalResults"]

        # Compile saved Scopus data into a list of lists
        num_records_title.append([titleWord, numt])

        # Delay 1 second between API calls to be nice to Elsevier servers
        sleep(1)
    except requests.exceptions.RequestException as e:
        print(f"An error occurred for {titleWord}: {e}")
        num_records_title.append([titleWord, None])

num_records_title
[['ChemSpider', '7'],
['PubChem', '102'],
['ChEMBL', '64'],
['Reaxys', '9'],
['SciFinder', '34']]
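As an optional convenience (not part of the original recipe), the list of lists can be loaded into a pandas DataFrame for easier viewing; the column names below are our own:

# Display the title-word counts as a small DataFrame
num_records_title_df = pd.DataFrame(num_records_title, columns=["titleWord", "numRecords"])
print(num_records_title_df)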
Download Title Match Record Data#
# Download records and create a list of selected metadata
titleWord_list = ['ChemSpider', 'PubChem', 'ChEMBL', 'Reaxys', 'SciFinder']

scopus_title_data = []
for titleWord in titleWord_list:
    # Set up query parameters
    params = {
        "query": f"TITLE({titleWord})",
        "apiKey": API_KEY,
        "httpAccept": "application/json"
    }
    try:
        # Make the API request
        response = requests.get(BASE_URL, params=params)

        # Delay 1 second between API calls to be nice to Elsevier servers
        sleep(1)

        # Raise an error for bad responses
        response.raise_for_status()
        data = response.json()

        # Extract the 'entry' data from the search results
        entries = data['search-results'].get('entry', [])
        for entry in entries:
            # Extract relevant metadata
            doi = entry.get('prism:doi', None)
            title = entry.get('dc:title', None)
            coverDate = entry.get('prism:coverDate', None)

            # Append to the list
            scopus_title_data.append([titleWord, doi, title, coverDate])
    except requests.exceptions.RequestException as e:
        print(f"An error occurred for {titleWord}: {e}")
        scopus_title_data.append([titleWord, None, None, None])

# Add to DataFrame
scopus_title_data_df = pd.DataFrame(scopus_title_data)
scopus_title_data_df.rename(columns={0: "titleWord", 1: "doi", 2: "title", 3: "coverDate"},
                            inplace=True)
scopus_title_data_df
|    | titleWord  | doi                        | title                                             | coverDate  |
|----|------------|----------------------------|---------------------------------------------------|------------|
| 0  | ChemSpider | 10.1039/c5np90022k         | Editorial: ChemSpider-a tool for Natural Produ... | 2015-08-01 |
| 1  | ChemSpider | 10.1021/bk-2013-1128.ch020 | ChemSpider: How a free community resource of d... | 2013-01-01 |
| 2  | ChemSpider | 10.1007/s13361-011-0265-y  | Identification of "known unknowns" utilizing a... | 2012-01-01 |
| 3  | ChemSpider | 10.1002/9781118026038.ch22 | Chemspider: A Platform for Crowdsourced Collab... | 2011-05-03 |
| 4  | ChemSpider | 10.1021/ed100697w          | Chemspider: An online chemical information res... | 2010-11-01 |
| ...| ...        | ...                        | ...                                               | ...        |
| 86 | SciFinder  | None                       | SciFinder not affordable [1]                      | 2006-03-13 |
| 87 | SciFinder  | 10.1021/ci050481b          | SciFinder Scholar 2006: An empirical analysis ... | 2006-01-01 |
| 88 | SciFinder  | 10.2174/1570163054064693   | Exploration tools for drug discovery and beyon... | 2005-06-01 |
| 89 | SciFinder  | 10.1021/ed082p652          | A literature exercise using SciFinder Scholar ... | 2005-01-01 |
| 90 | SciFinder  | 10.1002/asi.10192          | Analysis of SciFinder scholar and web of scien... | 2002-12-01 |

91 rows × 4 columns
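Note that the Scopus Search API returns results one page at a time (25 records per page by default), so the loop above only collects the first page of each search. That is why 91 rows appear here even though the title searches matched 216 records in total (7 + 25 + 25 + 9 + 25 = 91). Retrieving every match requires paging through the results with the start and count parameters. A minimal sketch for a single title search, using the same request pattern as above:

# Page through all results for one title search using 'start' and 'count'
all_entries = []
start = 0
count = 25  # records per page
total = None

while total is None or start < total:
    params = {
        "query": "TITLE(PubChem)",
        "apiKey": API_KEY,
        "httpAccept": "application/json",
        "start": start,
        "count": count
    }
    response = requests.get(BASE_URL, params=params)
    response.raise_for_status()
    data = response.json()

    # Total number of matching records, reported with every page
    total = int(data["search-results"]["opensearch:totalResults"])
    all_entries.extend(data["search-results"].get("entry", []))

    start += count
    sleep(1)  # stay under the 2 requests/second limit

print(f"Retrieved {len(all_entries)} of {total} records")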