Crossref API in Python#

By Avery Fernandez and Vincent F. Scalfani

The Crossref API provides metadata about publications, including articles, books, and conference proceedings. This metadata spans items such as author details, journal details, references, and DOIs (Digital Object Identifiers). Working with Crossref allows for programmatic access to bibliographic information and can streamline large-scale metadata retrieval.

Please see the Crossref REST API documentation for more information on API usage: https://www.crossref.org/documentation/retrieve-metadata/rest-api/

NOTE: The Crossref API limits requests to a maximum of 50 per second.
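If you send many requests in a row, a small client-side throttle will keep you comfortably under this limit. Below is a minimal sketch of our own (throttled_get and the 0.05-second interval are illustrative choices, not part of the Crossref API):

import requests
from time import sleep, monotonic

_last_call = 0.0

def throttled_get(url, min_interval=0.05, **kwargs):
    # Wait so that successive calls are at least min_interval seconds apart
    global _last_call
    wait = _last_call + min_interval - monotonic()
    if wait > 0:
        sleep(wait)
    _last_call = monotonic()
    return requests.get(url, **kwargs)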

These recipe examples were tested on April 18, 2025.

Note: From our testing, we have found that the Crossref metadata across publishers and even journals can vary considerably. As a result, it can be easier to work with one journal at a time when using the Crossref API (particularly when trying to extract selected data from records).

Setup#

The following external libraries need to be installed into your environment to run the code examples in this tutorial: requests and python-dotenv.

We import the libraries used in this tutorial below:

import json
import requests
from pprint import pprint
from time import sleep
from dotenv import load_dotenv
import os

Import Email#

It is important to provide an email address when making requests to the Crossref API. This is used to contact you in case of any issues with your requests.

We keep our email in a separate .env file and use the dotenv library to access it. If you use this method, create a file named .env in the same directory as this notebook and add the following line to it:

CROSSREF_EMAIL=PUT_YOUR_EMAIL_HERE

We then load the environment file and read the email in Python:
load_dotenv()
try:
    email = os.environ['CROSSREF_EMAIL']
except KeyError:
    print("Email not found in environment. Please set CROSSREF_EMAIL in your .env file.")
else:
    print("Environment and email successfully loaded.")
Environment and email successfully loaded.
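Providing contact information also routes your requests to Crossref's "polite" pool, which offers more reliable service. As an alternative to the mailto query parameter used throughout this tutorial, Crossref's API etiquette also allows supplying the email in the User-Agent header; a brief sketch (the script name is illustrative):

import requests

headers = {"User-Agent": f"CrossrefTutorial/1.0 (mailto:{email})"}
response = requests.get("https://api.crossref.org/works/10.1186/1758-2946-4-12", headers=headers)
print(response.status_code)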

1. Basic Crossref API Call#

In this section, we perform a basic API call to the Crossref service to retrieve metadata for a single DOI.

We will:

  1. Build the Crossref endpoint using our base URL, DOI, and the mailto parameter.

  2. Retrieve the response.

  3. Examine and parse the JSON data.

# Base URL for Crossref works
base_url = "https://api.crossref.org/works/" 
# Example DOI to retrieve metadata for
doi = "10.1186/1758-2946-4-12"

try:
    response = requests.get(f"{base_url}{doi}?mailto={email}")
    response.raise_for_status()  # Raises an HTTPError for unsuccessful status codes
except requests.exceptions.RequestException as e:
    print({"error": f"Request failed: {str(e)}"})

This calls the Crossref API to retrieve metadata for a single DOI and returns the data in JSON format. We can extract the information we need from the response using Python.

try:
    api_data = response.json()
    print(api_data['status'])
except json.JSONDecodeError as e:
    print({"error": f"Failed to decode JSON: {str(e)}"})
    api_data = {}
ok
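Before selecting specific fields, it can help to list the top-level keys available in the record (the exact set varies by record and publisher):

# Show which metadata fields this record provides
if api_data:
    pprint(sorted(api_data.get("message", {}).keys()))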

Select Some Specific Data#

In the snippet below, we parse and extract some key fields from the response:

  1. Journal title via the container-title key.

  2. Article title via the title key.

  3. Author names via the author key.

  4. Bibliographic references via the reference key.

if api_data:
    # Extract Journal title
    try:
        journal_title = api_data["message"].get("container-title", ["Not available"])
        print("Journal Title:", journal_title)
    except KeyError:
        print("Error: 'container-title' not found in response.")

    # Extract Article title
    try:
        article_title = api_data["message"].get("title", ["Not available"])
        print("Article Title:", article_title)
    except KeyError:
        print("Error: 'title' not found in response.")

    # Extract Author Names
    print("\nAuthors:")
    try:
        authors = api_data["message"].get("author", [])
        for au in authors:
            given = au.get("given", "")
            family = au.get("family", "")
            print(f" - {given} {family}")
    except KeyError:
        print("Error: 'author' not found in response.")

    # Extract Bibliography References
    print("\nBibliography References (first 5):")
    bib_refs = []
    try:
        references = api_data["message"].get("reference", [])
        for ref in references:
            bib_refs.append(ref.get("unstructured", ""))
        pprint(bib_refs[:5])
    except KeyError:
        print("Error: 'reference' not found in response.")
Journal Title: ['Journal of Cheminformatics']
Article Title: ['The Molecule Cloud - compact visualization of large collections of molecules']

Authors:
 - Peter Ertl
 - Bernhard Rohde

Bibliography References (first 5):
['Martin E, Ertl P, Hunt P, Duca J, Lewis R: Gazing into the crystal ball; the '
 'future of computer-aided drug design. J Comp-Aided Mol Des. 2011, 26: 77-79.',
 'Langdon SR, Brown N, Blagg J: Scaffold diversity of exemplified medicinal '
 'chemistry space. J Chem Inf Model. 2011, 26: 2174-2185.',
 'Blum LC, Reymond J-C: 970 Million druglike small molecules for virtual '
 'screening in the chemical universe database GDB-13. J Am Chem Soc. 2009, '
 '131: 8732-8733. 10.1021/ja902302h.',
 'Dubois J, Bourg S, Vrain C, Morin-Allory L: Collections of compounds - how '
 'to deal with them?. Cur Comp-Aided Drug Des. 2008, 4: 156-168. '
 '10.2174/157340908785747410.',
 'Medina-Franco JL, Martinez-Mayorga K, Giulianotti MA, Houghten RA, Pinilla '
 'C: Visualization of the chemical space in drug discovery. Cur Comp-Aided '
 'Drug Des. 2008, 4: 322-333. 10.2174/157340908786786010.']
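As noted at the start of this tutorial, the available fields vary across publishers and journals. A small helper can make this kind of defensive extraction less repetitive; below is a sketch (pluck is our own illustrative name, not a Crossref or standard-library function):

def pluck(record, *keys, default=None):
    # Walk nested dictionary keys, returning default if any key is missing
    current = record
    for key in keys:
        if isinstance(current, dict) and key in current:
            current = current[key]
        else:
            return default
    return current

print(pluck(api_data, "message", "container-title", default=["Not available"]))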

Save and Load JSON Data#

It can be handy to store the response to a file so that you do not need to call the API again for the same metadata. Below, we show how to save the JSON data and load it back from disk.

# Save JSON data to a file
try:
    with open('my_data.json', 'w') as outfile:
        json.dump(api_data, outfile)
    print("Successfully saved JSON data to 'my_data.json'.")
except IOError as e:
    print(f"Error saving to file: {str(e)}")

# Load JSON data from a file
try:
    with open('my_data.json','r') as infile:
        loaded_data = json.load(infile)
    print("Successfully loaded JSON data from 'my_data.json'.")
    # Optionally, verify a field
    pprint(loaded_data.get("message", {}).get("title", "Not found"))
except IOError as e:
    print(f"Error loading from file: {str(e)}")
Successfully saved JSON data to 'my_data.json'.
Successfully loaded JSON data from 'my_data.json'.
['The Molecule Cloud - compact visualization of large collections of molecules']
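Building on this, a common pattern is to check for a cached file before calling the API again. A minimal sketch under our own assumptions (get_cached_work and the file-per-DOI naming are our choices; it reuses the email variable loaded earlier):

import os
import json
import requests

def get_cached_work(doi, cache_file):
    # Return cached metadata if available; otherwise fetch from Crossref and cache it
    if os.path.exists(cache_file):
        with open(cache_file, 'r') as infile:
            return json.load(infile)
    response = requests.get(f"https://api.crossref.org/works/{doi}?mailto={email}")
    response.raise_for_status()
    data = response.json()
    with open(cache_file, 'w') as outfile:
        json.dump(data, outfile)
    return data

work = get_cached_work("10.1186/1758-2946-4-12", "my_data.json")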

2. Crossref API Call with a Loop#

In this section, we want to request metadata from multiple DOIs at once. We will:

  1. Create a list of several DOIs.

  2. Loop through that list, calling the Crossref API for each DOI.

  3. Store each response in a new list.

  4. Parse specific data, such as article titles and affiliations.

Note: We include a one-second delay (sleep(1)) between requests to respect Crossref's usage guidelines, which discourage extremely rapid repeated requests. For bulk downloads, please also check out Crossref's public data file.

doi_list = [
    '10.1021/acsomega.1c03250',
    '10.1021/acsomega.1c05512',
    '10.1021/acsomega.8b01647',
    '10.1021/acsomega.1c04287',
    '10.1021/acsomega.8b01834'
]

doi_metadata = []
# Loop over each DOI, request metadata, and store the data
for d in doi_list:
    try:
        response = requests.get(f"{base_url}{d}?mailto={email}")
        response.raise_for_status()
        data = response.json()
        doi_metadata.append(data)
    except requests.exceptions.RequestException as e:
        print({"error": f"Request failed for DOI {d}: {str(e)}"})
    except json.JSONDecodeError as e:
        print({"error": f"Failed to decode JSON for DOI {d}: {str(e)}"})
    # Adding a short delay to avoid overwhelming the API
    sleep(1)

# Extract article titles
print("Article Titles:\n")
for item in doi_metadata:
    title = item.get("message", {}).get("title", ["No Title"])[0]
    print(title)

# Extract author affiliations for each article
print("\nAuthor Affiliations:\n")
for idx, entry in enumerate(doi_metadata):
    authors = entry.get("message", {}).get("author", [])
    print(f"DOI {idx + 1}:")
    for au in authors:
        # Some authors may not have an affiliation key, so we use get with a default
        affiliation_list = au.get("affiliation", [])
        if affiliation_list:
            print(" -", affiliation_list[0].get("name", "No affiliation name"))
        else:
            print(" - No affiliation provided")
    print()
Article Titles:

Navigating into the Chemical Space of Monoamine Oxidase Inhibitors by Artificial Intelligence and Cheminformatics Approach
Impact of Artificial Intelligence on Compound Discovery, Design, and Synthesis
How Precise Are Our Quantitative Structure–Activity Relationship Derived Predictions for New Query Chemicals?
Applying Neuromorphic Computing Simulation in Band Gap Prediction and Chemical Reaction Classification
QSPR Modeling of the Refractive Index for Diverse Polymers Using 2D Descriptors

Author Affiliations:

DOI 1:
 - Department of Pharmaceutical Chemistry and Analysis, Amrita School of Pharmacy, Amrita Vishwa Vidyapeetham, AIMS Health Sciences Campus, Kochi 682041, India
 - Department of Pharmaceutical Chemistry and Analysis, Amrita School of Pharmacy, Amrita Vishwa Vidyapeetham, AIMS Health Sciences Campus, Kochi 682041, India
 - Department of Pharmaceutical Chemistry and Analysis, Amrita School of Pharmacy, Amrita Vishwa Vidyapeetham, AIMS Health Sciences Campus, Kochi 682041, India
 - Department of Pharmaceutical Chemistry and Analysis, Amrita School of Pharmacy, Amrita Vishwa Vidyapeetham, AIMS Health Sciences Campus, Kochi 682041, India
 - Department of Pharmaceutical Chemistry and Analysis, Amrita School of Pharmacy, Amrita Vishwa Vidyapeetham, AIMS Health Sciences Campus, Kochi 682041, India
 - Department of Pharmaceutics and Industrial Pharmacy, College of Pharmacy, Taif University, P.O. Box 11099, Taif 21944, Saudi Arabia
 - Department of Pharmaceutical Chemistry, College of Pharmacy, Jouf University, Sakaka, Al Jouf 72341, Saudi Arabia
 - Department of Pharmaceutical Chemistry and Analysis, Amrita School of Pharmacy, Amrita Vishwa Vidyapeetham, AIMS Health Sciences Campus, Kochi 682041, India

DOI 2:
 - Department of Life Science Informatics and Data Science, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Friedrich-Hirzebruch-Allee 6, D-53115 Bonn, Germany
 - Department of Life Science Informatics and Data Science, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Friedrich-Hirzebruch-Allee 6, D-53115 Bonn, Germany
 - Department of Life Science Informatics and Data Science, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Friedrich-Hirzebruch-Allee 6, D-53115 Bonn, Germany

DOI 3:
 - Drug Theoretics and Cheminformatics Laboratory, Department of Pharmaceutical Technology, Jadavpur University, Kolkata 700 032, India
 - Drug Theoretics and Cheminformatics Laboratory, Department of Pharmaceutical Technology, Jadavpur University, Kolkata 700 032, India
 - Interdisciplinary Center for Nanotoxicity, Department of Chemistry, Physics and Atmospheric Sciences, Jackson State University, Jackson, Mississippi 39217, United States

DOI 4:
 - Department of Chemical and Biomolecular Engineering, The Ohio State University, Columbus, Ohio 43210, United States
 - Department of Chemical and Biomolecular Engineering, The Ohio State University, Columbus, Ohio 43210, United States
 - Department of Chemical and Biomolecular Engineering, The Ohio State University, Columbus, Ohio 43210, United States
 - Department of Chemical and Biomolecular Engineering, The Ohio State University, Columbus, Ohio 43210, United States

DOI 5:
 - Department of Pharmacoinformatics, National Institute of Pharmaceutical Educational and Research (NIPER), Chunilal Bhawan, 168, Manikata Main Road, 700054 Kolkata, India
 - Department of Coatings and Polymeric Materials, North Dakota State University, Fargo, North Dakota 58108-6050, United States
 - Drug Theoretics and Cheminformatics Laboratory, Division of Medicinal and Pharmaceutical Chemistry, Department of Pharmaceutical Technology, Jadavpur University, 700032 Kolkata, India
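With doi_metadata in hand, you can also flatten the records into simple rows, which is convenient for export to CSV or a DataFrame; a short sketch:

# Build one summary row per article
rows = []
for entry in doi_metadata:
    msg = entry.get("message", {})
    rows.append({
        "doi": msg.get("DOI", ""),
        "title": (msg.get("title") or ["No Title"])[0],
        "num_authors": len(msg.get("author", [])),
    })
pprint(rows)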

3. Retrieve Journal Information#

Crossref also provides an endpoint to query journal metadata using the ISSN. In this section, we:

  1. Use the journals endpoint.

  2. Provide an ISSN.

  3. Inspect the returned JSON data.

# Base URL for journal queries
jbase_url = "https://api.crossref.org/journals/"
# Example ISSN for the journal BMC Bioinformatics
issn = "1471-2105"

jour_data = {}  # initialize so later references work even if the request fails
try:
    response = requests.get(f"{jbase_url}{issn}?mailto={email}")
    response.raise_for_status()
    jour_data = response.json()
    print(jour_data['status'])
except requests.exceptions.RequestException as e:
    print({"error": f"Request failed: {str(e)}"})
except json.JSONDecodeError as e:
    print({"error": f"Failed to decode JSON: {str(e)}"})
    jour_data = {}
ok
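The journal record includes fields such as the journal title, publisher, and DOI counts. An example of pulling a few of them (we use .get with defaults, since field availability can vary):

if jour_data:
    msg = jour_data.get("message", {})
    print("Journal:", msg.get("title", "Not available"))
    print("Publisher:", msg.get("publisher", "Not available"))
    print("Total DOIs:", msg.get("counts", {}).get("total-dois", "Not available"))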

4. Get Article DOIs for a Journal#

We can get all article DOIs for a given journal and year range by combining the journals endpoint with filters. For example, to retrieve all DOIs for BMC Bioinformatics published in 2014, we filter between the start date (from-pub-date) and end date (until-pub-date) of 2014.

Note: By default, the API only returns the first 20 results. We can specify rows to increase this up to 1000. If the total number of results is greater than 1000, we can use the offset parameter to page through the results in multiple calls.

Below, we demonstrate:

  1. Filtering to get only DOIs from 2014.

  2. Increasing the rows to 700.

  3. Pushing beyond the 1000-row limit by using offset.

Retrieve and Display First 20 DOIs#

try:
    # We will use params to make the query string more readable
    params = {
        "filter": "from-pub-date:2014,until-pub-date:2014",
        "select": "DOI",
        "mailto": email
    }
    response = requests.get(f"{jbase_url}{issn}/works",params=params)
    response.raise_for_status()
    doi_data_2014 = response.json()
    pprint(doi_data_2014)
    print("\nThe default is 20 results.")
except requests.exceptions.RequestException as e:
    print({"error": f"Request failed: {str(e)}"})
except json.JSONDecodeError as e:
    print({"error": f"Failed to decode JSON: {str(e)}"})
    doi_data_2014 = {}
{'message': {'facets': {},
             'items': [{'DOI': '10.1186/1471-2105-15-s10-p32'},
                       {'DOI': '10.1186/1471-2105-15-s6-s3'},
                       {'DOI': '10.1186/1471-2105-15-s16-s13'},
                       {'DOI': '10.1186/s12859-014-0411-1'},
                       {'DOI': '10.1186/1471-2105-15-s10-p24'},
                       {'DOI': '10.1186/1471-2105-15-318'},
                       {'DOI': '10.1186/1471-2105-15-s4-s1'},
                       {'DOI': '10.1186/1471-2105-15-s11-i1'},
                       {'DOI': '10.1186/1471-2105-15-230'},
                       {'DOI': '10.1186/s12859-014-0376-0'},
                       {'DOI': '10.1186/1471-2105-15-192'},
                       {'DOI': '10.1186/1471-2105-15-s14-s1'},
                       {'DOI': '10.1186/1471-2105-15-s10-p33'},
                       {'DOI': '10.1186/1471-2105-15-122'},
                       {'DOI': '10.1186/1471-2105-15-105'},
                       {'DOI': '10.1186/1471-2105-15-s10-p6'},
                       {'DOI': '10.1186/1471-2105-15-101'},
                       {'DOI': '10.1186/1471-2105-15-s10-p35'},
                       {'DOI': '10.1186/1471-2105-15-61'},
                       {'DOI': '10.1186/1471-2105-15-24'}],
             'items-per-page': 20,
             'query': {'search-terms': None, 'start-index': 0},
             'total-results': 619},
 'message-type': 'work-list',
 'message-version': '1.0.0',
 'status': 'ok'}

The default is 20 results.

Increase Rows to Retrieve More Than 20 DOIs#

doi_data_all = {}  # initialize so the extraction step below works even if the request fails
try:
    # Add the rows parameter to increase the number of results
    params = {
        "filter": "from-pub-date:2014,until-pub-date:2014",
        "select": "DOI",
        "rows": 700,
        "mailto": email,
    }
    response = requests.get(f"{jbase_url}{issn}/works", params=params)
    response.raise_for_status()
    doi_data_all = response.json()
except requests.exceptions.RequestException as e:
    print({"error": f"Request failed: {str(e)}"})
except json.JSONDecodeError as e:
    print({"error": f"Failed to decode JSON: {str(e)}"})
    doi_data_all = {}

# Extract the DOIs from the response
dois_list = []
if "message" in doi_data_all and "items" in doi_data_all["message"]:
    for item in doi_data_all["message"]["items"]:
        dois_list.append(item.get("DOI", "NoDOI"))

print("Number of DOIs retrieved:", len(dois_list))
print("First 20 DOIs:")
pprint(dois_list[:20])
Number of DOIs retrieved: 619
First 20 DOIs:
['10.1186/1471-2105-15-s10-p32',
 '10.1186/1471-2105-15-s6-s3',
 '10.1186/1471-2105-15-s16-s13',
 '10.1186/s12859-014-0411-1',
 '10.1186/1471-2105-15-s10-p24',
 '10.1186/1471-2105-15-318',
 '10.1186/1471-2105-15-s4-s1',
 '10.1186/1471-2105-15-s11-i1',
 '10.1186/1471-2105-15-230',
 '10.1186/s12859-014-0376-0',
 '10.1186/1471-2105-15-192',
 '10.1186/1471-2105-15-s14-s1',
 '10.1186/1471-2105-15-s10-p33',
 '10.1186/1471-2105-15-122',
 '10.1186/1471-2105-15-105',
 '10.1186/1471-2105-15-s10-p6',
 '10.1186/s12859-014-0397-8',
 '10.1186/1471-2105-15-s10-p35',
 '10.1186/1471-2105-15-61',
 '10.1186/1471-2105-15-24']

Paged Retrieval with Offsets#

If we need more than 1000 records, we can combine rows=1000 with the offset parameter. We:

  1. Determine the total number of results (total-results).

  2. Calculate how many loops we need based on 1000 items per page.

  3. For each page, we adjust the offset by 1000 * n.

  4. Collect all DOIs into one large list.

# First, get the total number of results to see if we exceed 1000.
initial_data = {}  # initialize so later steps work even if the request fails
try:
    params = {
        "filter": "from-pub-date:2014,until-pub-date:2016",
        "select": "DOI",
        "mailto": email,
        "rows": 1000
    }
    response = requests.get(f"{jbase_url}{issn}/works", params=params)
    response.raise_for_status()
    initial_data = response.json()
except requests.exceptions.RequestException as e:
    print({"error": f"Request failed: {str(e)}"})
except json.JSONDecodeError as e:
    print({"error": f"Failed to decode JSON: {str(e)}"})
    initial_data = {}

num_results = 0
try:
    num_results = initial_data["message"].get("total-results", 0)
except (KeyError, TypeError):
    print("Could not retrieve total-results from the initial response.")

print("Total results for 2014-2016:", num_results)

# Page through results if more than 1000
doi_list2 = []
# Calculate how many pages we might need
pages_needed = -(-num_results // 1000)  # ceiling division: one page per up to 1000 results

for n in range(pages_needed):
    try:
        # Build URL using offset
        params = {
            "filter": "from-pub-date:2014,until-pub-date:2016",
            "select": "DOI",
            "rows": 1000,
            "mailto": email,
            "offset": 1000 * n
        }
        response = requests.get(f"{jbase_url}{issn}/works", params=params)
        response.raise_for_status()
        page_data = response.json()
    except requests.exceptions.RequestException as e:
        print({"error": f"Request failed: {str(e)}"})
        continue    
    # If there's an error or no "message" key, we skip.
    if "message" not in page_data or "items" not in page_data["message"]:
        continue

    items = page_data["message"]["items"]
    for record in items:
        doi_list2.append(record.get("DOI", "NoDOI"))
        
    # Important to respect Crossref usage guidelines
    sleep(1)

print(f"\nTotal DOIs gathered: {len(doi_list2)}")
print("Sample DOIs from 1000-1020:")
pprint(doi_list2[1000:1020])
Total results for 2014-2016: 1772

Total DOIs gathered: 1772
Sample DOIs from 1000-1020:
['10.1186/s12859-016-1224-1',
 '10.1186/s12859-016-1113-7',
 '10.1186/s12859-016-1363-4',
 '10.1186/s12859-015-0861-0',
 '10.1186/s12859-016-1011-z',
 '10.1186/1471-2105-15-77',
 '10.1186/1471-2105-15-322',
 '10.1186/s12859-015-0636-7',
 '10.1186/1471-2105-16-s3-a4',
 '10.1186/1471-2105-15-334',
 '10.1186/s12859-014-0428-5',
 '10.1186/1471-2105-15-114',
 '10.1186/1471-2105-15-332',
 '10.1186/1471-2105-15-237',
 '10.1186/s12859-015-0644-7',
 '10.1186/s12859-016-1120-8',
 '10.1186/s12859-015-0526-z',
 '10.1186/s12859-016-1164-9',
 '10.1186/s12859-016-1012-y',
 '10.1186/1471-2105-15-291']
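Finally, note that Crossref also supports cursor-based deep paging as an alternative to offset: pass cursor=* on the first request, then the next-cursor value returned in each response, stopping when a page comes back empty. Crossref recommends cursors for very large result sets, since offsets are capped. A minimal sketch, reusing jbase_url, issn, and email from above:

cursor = "*"
all_dois = []
while True:
    params = {
        "filter": "from-pub-date:2014,until-pub-date:2016",
        "select": "DOI",
        "rows": 1000,
        "cursor": cursor,
        "mailto": email,
    }
    response = requests.get(f"{jbase_url}{issn}/works", params=params)
    response.raise_for_status()
    message = response.json().get("message", {})
    items = message.get("items", [])
    if not items:
        break
    for record in items:
        all_dois.append(record.get("DOI", "NoDOI"))
    cursor = message.get("next-cursor")
    if cursor is None:
        break
    sleep(1)

print(f"Total DOIs gathered with cursors: {len(all_dois)}")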