Crossref API in Python#
By Avery Fernandez, Vincent F. Scalfani, and Michael T. Moen
The Crossref API provides metadata about publications, including articles, books, and conference proceedings. This metadata spans items such as author details, journal details, references, and DOIs (Digital Object Identifiers). Working with Crossref allows for programmatic access to bibliographic information and can streamline large-scale metadata retrieval.
Please see the following resources for more information on API usage:
Documentation
Terms
Data Reuse
NOTE: The Crossref API limits requests to a maximum of 50 per second.
These recipe examples were tested on January 28, 2026.
Note: From our testing, we have found that the Crossref metadata across publishers and even journals can vary considerably. As a result, it can be easier to work with one journal at a time when using the Crossref API (particularly when trying to extract selected data from records).
Setup#
The following external libraries need to be installed into your environment to run the code examples in this tutorial:
python-dotenv
requests
We import the libraries used in this tutorial below:
from dotenv import load_dotenv
import os
from pprint import pprint
import requests
from time import sleep
Import Email#
It is important to provide an email address when making requests to the Crossref API. This places your requests in Crossref's "polite pool", which receives more reliable service, and allows Crossref to contact you in case of any issues with your requests.
We keep our email in a .env file and use the dotenv library to access it. If you use this method, create a file named .env in the same directory as this notebook and add the following line to it:
CROSSREF_EMAIL=PUT_YOUR_EMAIL_HERE
load_dotenv()
try:
    email = os.environ['CROSSREF_EMAIL']
except KeyError:
    print("Email not found in environment. Please set CROSSREF_EMAIL in your .env file.")
else:
    print("Environment and email successfully loaded.")
Environment and email successfully loaded.
1. Basic Crossref API Call#
In this section, we perform a basic API call to the Crossref service to retrieve metadata for a single DOI.
We will:
Build the Crossref endpoint using our base URL, DOI, and the mailto parameter.
Retrieve the response.
Examine and parse the JSON data.
# Base URL for Crossref works
WORKS_URL = "https://api.crossref.org/works/"
# Example DOI to retrieve metadata for
doi = "10.1186/1758-2946-4-12"
response = requests.get(f"{WORKS_URL}{doi}?mailto={email}")
# Status code 200 indicates success
response.status_code
200
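The f-string above works, but requests can also build and percent-encode the query string for us via its params argument. The sketch below shows the same request prepared that way (the email address is a placeholder); letting requests handle encoding avoids problems if the email or other parameter values contain special characters.

```python
import requests

# Same Crossref works request, with the query string built by requests.
# "name@example.org" is a placeholder email, not a real address.
url = "https://api.crossref.org/works/10.1186/1758-2946-4-12"
req = requests.Request("GET", url, params={"mailto": "name@example.org"})
prepared = req.prepare()
print(prepared.url)
# https://api.crossref.org/works/10.1186/1758-2946-4-12?mailto=name%40example.org
```

Note that the "@" is encoded as "%40" automatically; with the manual f-string approach, the raw "@" happens to be accepted, but delegating encoding is the safer habit.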
This calls the Crossref API to retrieve metadata for a single DOI. The response body is JSON, which we can parse into a Python dictionary with .json().
data = response.json()
# Print response structure
pprint(data, depth=1)
{'message': {...},
'message-type': 'work',
'message-version': '1.0.0',
'status': 'ok'}
Extract Data from API Response#
In the snippet below, we parse and extract some key fields from the response:
Journal title via the container-title key.
Article title via the title key.
Author names via the author key.
Bibliographic references via the reference key.
# Extract journal title
data["message"]["container-title"]
['Journal of Cheminformatics']
# Extract article title
data["message"]["title"]
['The Molecule Cloud - compact visualization of large collections of molecules']
# Extract author names
for author in data["message"]["author"]:
    print(f"{author['given']} {author['family']}")
Peter Ertl
Bernhard Rohde
# Extract the first 75 characters of each reference for demonstration
bib_refs = [ref["unstructured"][:75] for ref in data["message"]["reference"]]
bib_refs
['Martin E, Ertl P, Hunt P, Duca J, Lewis R: Gazing into the crystal ball; th',
'Langdon SR, Brown N, Blagg J: Scaffold diversity of exemplified medicinal c',
'Blum LC, Reymond J-C: 970 Million druglike small molecules for virtual scre',
'Dubois J, Bourg S, Vrain C, Morin-Allory L: Collections of compounds - how ',
'Medina-Franco JL, Martinez-Mayorga K, Giulianotti MA, Houghten RA, Pinilla ',
'Schuffenhauer A, Ertl P, Roggo S, Wetzel S, Koch MA, Waldmann H: The Scaffo',
'Langdon S, Ertl P, Brown N: Bioisosteric replacement and scaffold hopping i',
'Lipkus AH, Yuan Q, Lucas KA, Funk SA, Bartelt WF, Schenck RJ, Trippe AJ: St',
'mib 2010.10, Molinspiration Cheminformatics: \n http://ww',
'Bernhard R: Avalon Cheminformatics Toolkit. \n http://sou',
'Wang Y, Bolton E, Dracheva S, Karapetyan K, Shoemaker BA, Suzek TO, Wang J,',
'Irwin JJ, Shoichet BK: ZINC\u2009−\u2009a free database of commercially available com',
'Gaulton A, Bellis LJ, Bento AP, Chambers J, Davies M, Hersey A, Light Y, Mc',
'Welsch ME, Snyder SA, Stockwell BR: Privileged scaffolds for library design',
'Ertl P: Cheminformatics analysis of organic substituents: Identification of',
'TagCrowd: \n http://tagcrowd.com']
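As noted earlier, Crossref metadata varies considerably across publishers and journals, so direct indexing like data["message"]["title"] can raise a KeyError or IndexError on sparse records. A small helper of our own (first_field is not part of the Crossref API) makes extraction of list-valued fields safe:

```python
# Hypothetical helper for pulling list-valued Crossref fields (e.g. "title",
# "container-title") out of a works record without raising on missing keys.
def first_field(record, key, default="Not available"):
    values = record.get("message", {}).get(key, [])
    return values[0] if values else default

# Minimal stand-in for an API response, for demonstration:
sample = {"message": {"title": ["The Molecule Cloud"], "author": []}}
print(first_field(sample, "title"))            # The Molecule Cloud
print(first_field(sample, "container-title"))  # Not available
```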
2. Crossref API Call with a Loop#
In this section, we want to request metadata from multiple DOIs at once. We will:
Create a list of several DOIs.
Loop through that list, calling the Crossref API for each DOI.
Store each response in a new list.
Parse specific data, such as article titles and affiliations.
Note: We include a one-second sleep (time.sleep(1)) between requests to respect Crossref's policies. Crossref has usage guidelines that discourage extremely rapid repeated requests. Please also check out Crossref's public data file for bulk downloads.
dois = [
    '10.1021/acsomega.1c03250',
    '10.1021/acsomega.1c05512',
    '10.1021/acsomega.8b01647',
    '10.1021/acsomega.1c04287',
    '10.1021/acsomega.8b01834'
]
# Loop over each DOI, request metadata, and store the data
doi_metadata = []
for doi in dois:
    response = requests.get(f"{WORKS_URL}{doi}?mailto={email}")
    data = response.json()
    doi_metadata.append(data)
    sleep(1)  # Add a short delay to avoid overwhelming the API
# Extract article titles
titles = [article["message"]["title"] for article in doi_metadata]
titles
[['Navigating into the Chemical Space of Monoamine Oxidase Inhibitors by Artificial Intelligence and Cheminformatics Approach'],
['Impact of Artificial Intelligence on Compound Discovery, Design, and Synthesis'],
['How Precise Are Our Quantitative Structure–Activity Relationship Derived Predictions for New Query Chemicals?'],
['Applying Neuromorphic Computing Simulation in Band Gap Prediction and Chemical Reaction Classification'],
['QSPR Modeling of the Refractive Index for Diverse Polymers Using 2D Descriptors']]
# Extract author affiliations for each article
for idx, entry in enumerate(doi_metadata):
    authors = entry.get("message", {}).get("author", [])
    print(f"DOI {idx + 1}:")
    for author in authors:
        # Some authors may not have an affiliation key, so we use get with a default
        affiliation_list = author.get("affiliation", [])
        if affiliation_list:
            print(f" - {affiliation_list[0].get('name', 'No affiliation name')}")
        else:
            print(" - No affiliation provided")
    print()
DOI 1:
- Department of Pharmaceutical Chemistry and Analysis, Amrita School of Pharmacy, Amrita Vishwa Vidyapeetham, AIMS Health Sciences Campus, Kochi 682041, India
- Department of Pharmaceutical Chemistry and Analysis, Amrita School of Pharmacy, Amrita Vishwa Vidyapeetham, AIMS Health Sciences Campus, Kochi 682041, India
- Department of Pharmaceutical Chemistry and Analysis, Amrita School of Pharmacy, Amrita Vishwa Vidyapeetham, AIMS Health Sciences Campus, Kochi 682041, India
- Department of Pharmaceutical Chemistry and Analysis, Amrita School of Pharmacy, Amrita Vishwa Vidyapeetham, AIMS Health Sciences Campus, Kochi 682041, India
- Department of Pharmaceutical Chemistry and Analysis, Amrita School of Pharmacy, Amrita Vishwa Vidyapeetham, AIMS Health Sciences Campus, Kochi 682041, India
- Department of Pharmaceutics and Industrial Pharmacy, College of Pharmacy, Taif University, P.O. Box 11099, Taif 21944, Saudi Arabia
- Department of Pharmaceutical Chemistry, College of Pharmacy, Jouf University, Sakaka, Al Jouf 72341, Saudi Arabia
- Department of Pharmaceutical Chemistry and Analysis, Amrita School of Pharmacy, Amrita Vishwa Vidyapeetham, AIMS Health Sciences Campus, Kochi 682041, India
DOI 2:
- Department of Life Science Informatics and Data Science, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Friedrich-Hirzebruch-Allee 6, D-53115 Bonn, Germany
- Department of Life Science Informatics and Data Science, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Friedrich-Hirzebruch-Allee 6, D-53115 Bonn, Germany
- Department of Life Science Informatics and Data Science, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Friedrich-Hirzebruch-Allee 6, D-53115 Bonn, Germany
DOI 3:
- Drug Theoretics and Cheminformatics Laboratory, Department of Pharmaceutical Technology, Jadavpur University, Kolkata 700 032, India
- Drug Theoretics and Cheminformatics Laboratory, Department of Pharmaceutical Technology, Jadavpur University, Kolkata 700 032, India
- Interdisciplinary Center for Nanotoxicity, Department of Chemistry, Physics and Atmospheric Sciences, Jackson State University, Jackson, Mississippi 39217, United States
DOI 4:
- Department of Chemical and Biomolecular Engineering, The Ohio State University, Columbus, Ohio 43210, United States
- Department of Chemical and Biomolecular Engineering, The Ohio State University, Columbus, Ohio 43210, United States
- Department of Chemical and Biomolecular Engineering, The Ohio State University, Columbus, Ohio 43210, United States
- Department of Chemical and Biomolecular Engineering, The Ohio State University, Columbus, Ohio 43210, United States
DOI 5:
- Department of Pharmacoinformatics, National Institute of Pharmaceutical Educational and Research (NIPER), Chunilal Bhawan, 168, Manikata Main Road, 700054 Kolkata, India
- Department of Coatings and Polymeric Materials, North Dakota State University, Fargo, North Dakota 58108-6050, United States
- Drug Theoretics and Cheminformatics Laboratory, Division of Medicinal and Pharmaceutical Chemistry, Department of Pharmaceutical Technology, Jadavpur University, 700032 Kolkata, India
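The printouts above repeat the same affiliation once per author. When we only care about the distinct institutions on a paper, dict.fromkeys gives order-preserving de-duplication; the list below is a shortened stand-in for one article's affiliations.

```python
# Shortened stand-in data for one article's author affiliations
affiliations = [
    "Department A, University X",
    "Department A, University X",
    "Department B, University Y",
]

# dict.fromkeys keeps the first occurrence of each value, in order
unique_affiliations = list(dict.fromkeys(affiliations))
print(unique_affiliations)
# ['Department A, University X', 'Department B, University Y']
```

Unlike set(), this approach preserves the order in which affiliations first appear.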
3. Retrieve Journal Information#
Crossref also provides an endpoint to query journal metadata using the ISSN. In this section, we:
Use the journals endpoint.
Provide an ISSN.
Inspect the returned JSON data.
# Base URL for journal queries
JOURNALS_URL = "https://api.crossref.org/journals/"
# Example ISSN for the journal BMC Bioinformatics
issn = "1471-2105"
response = requests.get(f"{JOURNALS_URL}{issn}?mailto={email}")
data = response.json()
# Print structure of the response message
pprint(data["message"], depth=1)
{'ISSN': [...],
'breakdowns': {...},
'counts': {...},
'coverage': {...},
'coverage-type': {...},
'flags': {...},
'issn-type': [...],
'last-status-check-time': 1759277629723,
'publisher': 'Springer (Biomed Central Ltd.)',
'subjects': [],
'title': 'BMC Bioinformatics'}
# Extract total number of articles from the journal in Crossref
data["message"]["counts"]["total-dois"]
12831
# Extract percentage of articles from the journal with abstracts in Crossref
data["message"]["coverage"]["abstracts-current"]
0.787422497785651
4. Get Article DOIs for a Journal#
We can get all article DOIs for a given journal and year range by combining the journals endpoint with filters.
For example, to retrieve all DOIs for BMC Bioinformatics published in 2014, we filter between the start date (from-pub-date) and end date (until-pub-date) of 2014.
Note: By default, the API only returns the first 20 results. We can specify rows to increase this up to 1000. If the total number of results is greater than 1000, we can use the offset parameter to page through the results in multiple calls.
Below, we demonstrate:
Filtering to get only DOIs from 2014.
Increasing the rows parameter to 700.
Pushing beyond the 1000-row limit by using offset.
Retrieve and Display First 20 DOIs#
params = {
    "filter": "from-pub-date:2014,until-pub-date:2014",
    "select": "DOI",
    "mailto": email
}
response = requests.get(f"{JOURNALS_URL}{issn}/works", params=params)
doi_data_2014 = response.json()
# Print DOIs from the response
doi_data_2014["message"]["items"]
[{'DOI': '10.1186/1471-2105-15-38'},
{'DOI': '10.1186/1471-2105-15-s10-p35'},
{'DOI': '10.1186/1471-2105-15-s10-p24'},
{'DOI': '10.1186/1471-2105-15-122'},
{'DOI': '10.1186/1471-2105-15-24'},
{'DOI': '10.1186/s12859-014-0397-8'},
{'DOI': '10.1186/1471-2105-15-16'},
{'DOI': '10.1186/s12859-014-0411-1'},
{'DOI': '10.1186/1471-2105-15-268'},
{'DOI': '10.1186/1471-2105-15-119'},
{'DOI': '10.1186/1471-2105-15-s6-s3'},
{'DOI': '10.1186/1471-2105-15-310'},
{'DOI': '10.1186/1471-2105-15-335'},
{'DOI': '10.1186/1471-2105-15-222'},
{'DOI': '10.1186/1471-2105-15-337'},
{'DOI': '10.1186/1471-2105-15-95'},
{'DOI': '10.1186/1471-2105-15-s9-s12'},
{'DOI': '10.1186/1471-2105-15-254'},
{'DOI': '10.1186/1471-2105-15-152'},
{'DOI': '10.1186/1471-2105-15-333'}]
Increase Rows to Retrieve More Than 20 DOIs#
# Add the rows parameter to increase the number of results
params = {
    "filter": "from-pub-date:2014,until-pub-date:2014",
    "select": "DOI",
    "rows": 700,
    "mailto": email,
}
response = requests.get(f"{JOURNALS_URL}{issn}/works", params=params)
response.raise_for_status()
doi_data_all = response.json()
# Extract the DOIs from the response
dois_list = []
for item in doi_data_all["message"]["items"]:
    dois_list.append(item.get("DOI", "NoDOI"))
print("Number of DOIs retrieved:", len(dois_list))
print("First 20 DOIs:")
pprint(dois_list[:20])
Number of DOIs retrieved: 619
First 20 DOIs:
['10.1186/1471-2105-15-38',
'10.1186/1471-2105-15-s10-p35',
'10.1186/1471-2105-15-s10-p24',
'10.1186/1471-2105-15-122',
'10.1186/1471-2105-15-24',
'10.1186/s12859-014-0397-8',
'10.1186/1471-2105-15-16',
'10.1186/s12859-014-0411-1',
'10.1186/1471-2105-15-268',
'10.1186/1471-2105-15-119',
'10.1186/1471-2105-15-s6-s3',
'10.1186/s12859-014-0376-0',
'10.1186/1471-2105-15-310',
'10.1186/1471-2105-15-335',
'10.1186/1471-2105-15-192',
'10.1186/1471-2105-15-95',
'10.1186/1471-2105-15-s9-s12',
'10.1186/1471-2105-15-254',
'10.1186/1471-2105-15-152',
'10.1186/1471-2105-15-333']
Paged Retrieval with Offsets#
If we need more than 1000 records, we can combine rows=1000 with the offset parameter. We:
Determine the total number of results (total-results).
Calculate how many pages we need based on 1000 items per page.
For each page, adjust the offset by 1000 * n.
Collect all DOIs into one large list.
# First, get total number of results to see if we exceed 1000
params = {
    "filter": "from-pub-date:2014,until-pub-date:2016",
    "select": "DOI",
    "mailto": email,
    "rows": 1000
}
response = requests.get(f"{JOURNALS_URL}{issn}/works", params=params)
initial_data = response.json()
num_results = initial_data["message"].get("total-results", 0)
print("Total results for 2014-2016:", num_results)
Total results for 2014-2016: 1772
# Page through results if more than 1000
journal_dois = []

# Ceiling division: number of 1000-item pages needed to cover all results
pages_needed = (num_results + 999) // 1000

for n in range(pages_needed):
    # Build the request, shifting the offset by 1000 for each page
    params = {
        "filter": "from-pub-date:2014,until-pub-date:2016",
        "select": "DOI",
        "rows": 1000,
        "mailto": email,
        "offset": 1000 * n
    }
    response = requests.get(f"{JOURNALS_URL}{issn}/works", params=params)
    response.raise_for_status()
    page_data = response.json()
    for record in page_data["message"]["items"]:
        journal_dois.append(record.get("DOI", "NoDOI"))
    sleep(1)  # Important to respect Crossref usage guidelines
# Print number of DOIs extracted
len(journal_dois)
1772
# Sample DOIs from 1000-1010
journal_dois[1000:1010]
['10.1186/1471-2105-15-116',
'10.1186/s12859-016-1178-3',
'10.1186/1471-2105-15-s12-s9',
'10.1186/1471-2105-15-316',
'10.1186/s12859-016-1233-0',
'10.1186/s12859-015-0656-3',
'10.1186/s12859-016-1327-8',
'10.1186/s12859-016-1039-0',
'10.1186/s12859-016-1035-4',
'10.1186/s12859-015-0646-5']
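For very deep result sets, Crossref also supports cursor-based deep paging: pass cursor=* on the first request, then the next-cursor value returned in each response's message on subsequent requests, until a page comes back empty. The sketch below shows the paging logic with the HTTP call abstracted into a get_page callable (that abstraction is our own design choice, made so the loop can be demonstrated with canned pages rather than live requests).

```python
# Sketch of cursor-based deep paging against Crossref-style responses.
# get_page takes a cursor string and returns the "message" dict of a page.
def collect_dois(get_page, max_pages=100):
    dois = []
    cursor = "*"  # Crossref's starting cursor value
    for _ in range(max_pages):
        message = get_page(cursor)
        items = message.get("items", [])
        if not items:
            break  # an empty page signals the end of the result set
        dois.extend(item.get("DOI", "NoDOI") for item in items)
        cursor = message.get("next-cursor")
        if not cursor:
            break
    return dois

# A live get_page could wrap requests.get on the works endpoint with
# params={"select": "DOI", "rows": 1000, "cursor": cursor, "mailto": email},
# calling raise_for_status() and sleeping between pages as shown earlier.

# Demonstration with canned pages standing in for API responses:
pages = {
    "*":  {"items": [{"DOI": "10.1/a"}, {"DOI": "10.1/b"}], "next-cursor": "c2"},
    "c2": {"items": [{"DOI": "10.1/c"}], "next-cursor": "c3"},
    "c3": {"items": [], "next-cursor": None},
}
print(collect_dois(lambda cur: pages[cur]))  # ['10.1/a', '10.1/b', '10.1/c']
```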