ScienceDirect API in Python#
By Avery Fernandez and Vincent F. Scalfani
These recipe examples demonstrate how to use Elsevier’s ScienceDirect API to retrieve full-text articles in various formats (XML, text).
This tutorial content is intended to help facilitate academic research. Please check with your institution regarding its Text and Data Mining or related license agreement with Elsevier.
Please see the following resources for more information on API usage:
Documentation
Terms
Data Reuse
NOTE: Check the ScienceDirect API Terms of Use for your institution’s rate limits.
If you have copyright or other related text and data mining questions, please contact The University of Alabama Libraries or your respective library/institution.
These recipe examples were tested on May 7, 2025.
Setup#
Import Libraries#
The following external libraries need to be installed into your environment to run the code examples in this tutorial: requests and python-dotenv.
We import the libraries used in this tutorial below:
import requests
from time import sleep
from dotenv import load_dotenv
import os
import xml.etree.ElementTree as ET
Import API Key#
An API key is required to access the ScienceDirect API. You can sign up for one at the Elsevier Developer Portal.
We keep our API key in a .env file and use the dotenv library to access it. If you would like to use this method, create a file named .env in the same directory as this notebook and add the following line to it:
SCIENCE_DIRECT_API_KEY=PUT_YOUR_API_KEY_HERE
load_dotenv()
try:
API_KEY = os.environ["SCIENCE_DIRECT_API_KEY"]
except KeyError:
print("API key not found. Please set 'SCIENCE_DIRECT_API_KEY' in your .env file.")
else:
print("Environment and API key successfully loaded.")
Environment and API key successfully loaded.
1. Retrieving the Full-Text of a Single Article#
In this section, we’ll show how to retrieve full-text XML and plain text for a single DOI using the ScienceDirect Article API.
Identifier Note#
We will use DOIs as the article identifiers throughout this tutorial. The Elsevier ScienceDirect Article (Full-Text) API also accepts other identifiers like Scopus IDs and PubMed IDs. For more details on constructing custom DOI lists from other sources, refer to Crossref or Scopus API tutorials.
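Since the Article API distinguishes identifier types by URL path segment, a small helper can make that explicit. The `article_url` function below is our own convenience sketch (not part of any Elsevier library); the path segments listed come from the Article Retrieval API documentation, and you should verify them there before relying on this set:

```python
# Sketch: build Article API URLs for different identifier types.
BASE = "https://api.elsevier.com/content/article"

def article_url(identifier: str, id_type: str = "doi") -> str:
    """Return the Article API URL for a given identifier type."""
    # Path segments per the Article Retrieval API docs; verify before use
    supported = {"doi", "pii", "eid", "pubmed_id"}
    if id_type not in supported:
        raise ValueError(f"Unsupported identifier type: {id_type}")
    return f"{BASE}/{id_type}/{identifier}"

print(article_url("10.1016/j.mtcata.2025.100092"))
# https://api.elsevier.com/content/article/doi/10.1016/j.mtcata.2025.100092
```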
Retrieve Full-Text XML of an Article#
Steps:
Construct the API endpoint URL.
Make a GET request specifying text/xml in the httpAccept query.
Save the retrieved data to an XML file.
ELSEVIER_URL = "https://api.elsevier.com/content/article/doi/"
# This article is under CC-BY-4.0 license https://creativecommons.org/licenses/by/4.0/
# https://doi.org/10.1016/j.mtcata.2025.100092
doi1 = '10.1016/j.mtcata.2025.100092'
try:
fulltext_xml_response = requests.get(
f"{ELSEVIER_URL}{doi1}?APIKey={API_KEY}&httpAccept=text/xml"
)
fulltext_xml_response.raise_for_status()
except requests.exceptions.RequestException as e:
print(f"Error retrieving XML data for DOI {doi1}: {e}")
fulltext_xml_response = None
if fulltext_xml_response:
with open('fulltext1.xml', 'w', encoding='utf-8') as outfile:
outfile.write(fulltext_xml_response.text)
print("XML full text downloaded and saved as fulltext1.xml.")
XML full text downloaded and saved as fulltext1.xml.
Retrieve Plain Text of an Article#
To get the plain text from the article, specify text/plain in the query instead of text/xml. The steps are the same as above, but we will:
Use a different DOI.
Request text/plain.
Save the output to a .txt file.
# This article is under CC-BY-4.0 license https://creativecommons.org/licenses/by/4.0/
# https://doi.org/10.1016/j.mtcata.2025.100092
doi2 = '10.1016/j.mtcata.2025.100092'
try:
fulltext_plain_response = requests.get(
f"{ELSEVIER_URL}{doi2}?APIKey={API_KEY}&httpAccept=text/plain"
)
fulltext_plain_response.raise_for_status()
except requests.exceptions.RequestException as e:
print(f"Error retrieving plain text data for DOI {doi2}: {e}")
fulltext_plain_response = None
if fulltext_plain_response:
with open('fulltext2.txt', 'w', encoding='utf-8') as outfile:
outfile.write(fulltext_plain_response.text)
print("Plain text full text downloaded and saved as fulltext2.txt.")
Plain text full text downloaded and saved as fulltext2.txt.
2. Retrieve Multiple Articles in a Loop#
In many research scenarios, you’ll want to retrieve multiple articles at once. Here, we:
Loop over a list of DOIs.
Request full text (plain or XML).
Save each article to a separate file.
Sleep for 1 second between calls to respect rate limits.
Tip: For large-scale text/data mining, always follow Elsevier’s usage policies, which include rate limits, usage quotas, and more advanced usage guidelines.
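Beyond a fixed sleep, a retry with exponential backoff is a common way to handle transient failures such as HTTP 429 (too many requests). The `fetch_with_backoff` helper below is our own sketch, not part of requests or the Elsevier API; it takes any callable that returns a payload on success and None on a retryable failure:

```python
from time import sleep

def fetch_with_backoff(fetch, max_retries=3, base_delay=1.0):
    """Call fetch() repeatedly with exponential backoff between attempts.

    fetch should return the payload on success and None when the call
    should be retried (e.g. after an HTTP 429 response)."""
    for attempt in range(max_retries):
        result = fetch()
        if result is not None:
            return result
        # Wait base_delay, 2*base_delay, 4*base_delay, ... before retrying
        sleep(base_delay * (2 ** attempt))
    return None

# Example with a stub that succeeds on the third call:
calls = []
def flaky_fetch():
    calls.append(1)
    return "payload" if len(calls) == 3 else None

print(fetch_with_backoff(flaky_fetch, max_retries=5, base_delay=0.01))
# payload
```

In the real loop below, `fetch` would wrap `requests.get(...)` and return None when `response.status_code == 429`.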
dois = [
    # This article is under CC-BY-4.0 license https://creativecommons.org/licenses/by/4.0/
    # https://doi.org/10.1016/j.gresc.2024.11.007
'10.1016/j.gresc.2024.11.007',
    # This article is under CC-BY-4.0 license https://creativecommons.org/licenses/by/4.0/
    # https://doi.org/10.1016/j.ultsonch.2025.107257
'10.1016/j.ultsonch.2025.107257',
    # This article is under CC-BY-4.0 license https://creativecommons.org/licenses/by/4.0/
    # https://doi.org/10.1016/j.ces.2025.121347
'10.1016/j.ces.2025.121347',
    # This article is under CC-BY-4.0 license https://creativecommons.org/licenses/by/4.0/
    # https://doi.org/10.1016/j.gresc.2025.02.001
'10.1016/j.gresc.2025.02.001',
    # This article is under CC-BY-4.0 license https://creativecommons.org/licenses/by/4.0/
    # https://doi.org/10.1016/j.jobcr.2025.02.005
'10.1016/j.jobcr.2025.02.005'
]
for i, doi in enumerate(dois, start=1):
print(f"Retrieving article {i} of {len(dois)}: DOI = {doi}")
try:
# Here, we request plain text. For XML, replace 'text/plain' with 'text/xml'.
article_response = requests.get(
f"{ELSEVIER_URL}{doi}?APIKey={API_KEY}&httpAccept=text/plain"
)
article_response.raise_for_status()
except requests.exceptions.RequestException as e:
print(f"Failed to retrieve article {doi}: {e}")
continue
# Replace '/' in the DOI with underscores to form a valid filename
doi_filename = doi.replace('/', '_')
output_filename = f"{doi_filename}_plain_text.txt"
with open(output_filename, 'w', encoding='utf-8') as outfile:
outfile.write(article_response.text)
print(f"Saved full text to {output_filename}.")
# Sleep to avoid hitting rate limits too quickly
sleep(1)
print("Finished retrieving all DOIs.")
Retrieving article 1 of 5: DOI = 10.1016/j.gresc.2024.11.007
Saved full text to 10.1016_j.gresc.2024.11.007_plain_text.txt.
Retrieving article 2 of 5: DOI = 10.1016/j.ultsonch.2025.107257
Saved full text to 10.1016_j.ultsonch.2025.107257_plain_text.txt.
Retrieving article 3 of 5: DOI = 10.1016/j.ces.2025.121347
Saved full text to 10.1016_j.ces.2025.121347_plain_text.txt.
Retrieving article 4 of 5: DOI = 10.1016/j.gresc.2025.02.001
Saved full text to 10.1016_j.gresc.2025.02.001_plain_text.txt.
Retrieving article 5 of 5: DOI = 10.1016/j.jobcr.2025.02.005
Saved full text to 10.1016_j.jobcr.2025.02.005_plain_text.txt.
Finished retrieving all DOIs.
3. Parsing XML#
Many researchers need structured data (e.g., titles, abstracts, authors) from the XML. Below, we:
Send a GET request for XML.
Use Python’s xml.etree.ElementTree to parse the XML.
Extract desired metadata (title, abstract, authors, open access status, etc.).
Note: The XML structure can vary between journals. For more robust parsing, consider libraries like lxml or specialized text-mining pipelines.
Elsevier provides full-text content as XML; see their API documentation for an example response and more details on the schema used.
# This article is under CC-BY-4.0 license https://creativecommons.org/licenses/by/4.0/
# https://doi.org/10.1016/j.mtcata.2025.100092
# Ambient urea synthesis via electrocatalytic C–N coupling
# Chen Chen
doi_example = '10.1016/j.mtcata.2025.100092'
xml_data = None
try:
# Request the XML content
response = requests.get(
f"{ELSEVIER_URL}{doi_example}?APIKey={API_KEY}&httpAccept=text/xml"
)
response.raise_for_status()
xml_data = response.content
print("XML data retrieved successfully.")
except requests.exceptions.RequestException as e:
print(f"Error retrieving XML data for DOI {doi_example}: {e}")
except Exception as e:
print(f"Unexpected error occurred: {e}")
root = None
if xml_data:
try:
root = ET.fromstring(xml_data)
print("XML data parsed successfully.")
except ET.ParseError as e:
print(f"Error parsing XML data: {e}")
else:
print("No XML data to parse.")
XML data retrieved successfully.
XML data parsed successfully.
Inspecting the XML Root#
Below, we print part of the top-level XML structure to see how the data are organized. Often, coredata contains most of the relevant metadata.
if root is not None: # Check if the XML root element exists
# Get the first child of the root element, if it exists
first_child = list(root)[0] if len(list(root)) > 0 else None
if first_child is not None: # Check if the first child exists
# Print the tag name of the first child
print(f"{first_child.tag}")
# Iterate over the subchildren of the first child
for i, subchild in enumerate(list(first_child)):
# Print the tag name of each subchild
print(f"Subchild tag: {subchild.tag}")
# Stop printing after the first 7 subchildren
if i > 5:
break
else:
# If the root has no children, print a message
print("Root has no children.")
else:
# If the root element does not exist, print a message
print("No root element to inspect.")
{http://www.elsevier.com/xml/svapi/article/dtd}coredata
Subchild tag: {http://prismstandard.org/namespaces/basic/2.0/}url
Subchild tag: {http://purl.org/dc/elements/1.1/}identifier
Subchild tag: {http://www.elsevier.com/xml/svapi/article/dtd}eid
Subchild tag: {http://prismstandard.org/namespaces/basic/2.0/}doi
Subchild tag: {http://www.elsevier.com/xml/svapi/article/dtd}pii
Subchild tag: {http://purl.org/dc/elements/1.1/}title
Subchild tag: {http://prismstandard.org/namespaces/basic/2.0/}publicationName
XML Helper Functions#
These functions allow us to look up elements whose tags end with a specific string (e.g., title, subject, etc.), which is common in Elsevier’s XML structure.
def get_element(parent_element: ET.Element, element_end: str):
"""
Retrieve the first occurrence of an element whose tag ends with `element_end`.
:param parent_element: ET.Element
The parent element to search within.
:param element_end: str
The suffix of the child element's tag to look for.
:return: ET.Element or None
The first matching element, or None if none is found.
"""
if parent_element is not None:
for child in parent_element:
if child.tag.endswith(element_end):
return child
return None
def get_elements(parent_element: ET.Element, element_end: str) -> list:
"""
Retrieve all occurrences of elements whose tag ends with `element_end`.
:param parent_element: ET.Element
The parent element to search within.
:param element_end: str
The suffix of the child elements' tag to look for.
:return: list
A list of all matching elements.
"""
elements = []
if parent_element is not None:
for child in parent_element:
if child.tag.endswith(element_end):
elements.append(child)
return elements
print("Helper functions for XML parsing defined.")
Helper functions for XML parsing defined.
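As an alternative to suffix matching, ElementTree can search with an explicit namespace map. The namespace URIs below are the ones that appeared when we inspected the root above (Dublin Core and PRISM); the XML snippet is a toy stand-in for a real API response, used here only to illustrate the lookup:

```python
import xml.etree.ElementTree as ET

# Prefix-to-URI map for namespace-aware find() calls
NAMESPACES = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "prism": "http://prismstandard.org/namespaces/basic/2.0/",
}

# Toy coredata fragment mimicking the structure seen above
sample = """<coredata xmlns:dc="http://purl.org/dc/elements/1.1/"
                      xmlns:prism="http://prismstandard.org/namespaces/basic/2.0/">
  <dc:title>Example title</dc:title>
  <prism:doi>10.1016/example</prism:doi>
</coredata>"""

core = ET.fromstring(sample)
title = core.find("dc:title", NAMESPACES)
doi = core.find("prism:doi", NAMESPACES)
print(title.text, doi.text)
# Example title 10.1016/example
```

This approach is stricter than suffix matching (it will not confuse two elements whose tags merely end the same way), at the cost of having to keep the namespace map in sync with the schema.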
Extracting Core Data#
Using get_element and get_elements, we can parse out key metadata:
Title
Abstract
Authors
Open Access Status
Subjects
if root is not None:
core_data = get_element(root, "coredata")
if core_data is not None:
# Title
title_elem = get_element(core_data, "title")
title_text = title_elem.text if title_elem is not None else 'N/A'
# Abstract
abstract_elem = get_element(core_data, "description")
if abstract_elem is not None and abstract_elem.text:
abstract_text = abstract_elem.text.strip()
else:
abstract_text = 'N/A'
# Authors
authors_elems = get_elements(core_data, "creator")
authors_list = [elem.text for elem in authors_elems if elem.text]
# Open Access status
open_access_elem = get_element(core_data, "openaccessArticle")
open_access_status = open_access_elem.text if open_access_elem is not None else 'N/A'
# Subjects
subjects_elems = get_elements(core_data, "subject")
subjects_list = [elem.text for elem in subjects_elems if elem.text]
# Print retrieved metadata
print("\n--- Extracted Metadata ---")
print(f"Title: {title_text}")
print(f"Abstract: {abstract_text}")
print("Authors:")
for author in authors_list:
print(f" - {author}")
print(f"Open Access: {open_access_status}")
print("Subjects:")
for subject in subjects_list:
print(f" - {subject}")
else:
print("No core data found in the XML.")
else:
print("No root element; cannot extract metadata.")
--- Extracted Metadata ---
Title: Ambient urea synthesis via electrocatalytic C–N coupling
Abstract: The construction of C–N bond and synthesis of N-containing compounds directly from N2 is an extremely attractive subject. The co-electrolysis system coupled with renewable electricity provides one of the potential options for the green and controllable C–N bond construction under ambient conditions, bypassing the intermediate process of ammonia synthesis. In this review, we have summarized the recent progress in ambient urea synthesis via electrocatalytic C–N coupling from CO2 and nitrogenous species. The reaction mechanisms studies of N2 and CO2 coupling has been mainly highlighted, and the coupling enhancement strategies are emphasized for the coupling of nitrate and CO2, including intermediate adsorption regulation, functional synergy, site reconstitution and local-environment construction. Moreover, promising directions and remaining challenges are outlined, encompassing the mechanism study combining theory and experiment, reactant source and product application, optimization of urea synthesis evaluation system and the development of devices aiming to coupling system. This review aims to guide further advancements in electrocatalytic C–N coupling, facilitating the efficient and sustainable synthesis of urea for a broad spectrum of applications.
Authors:
- Chen, Chen
Open Access: true
Subjects:
- Electrocatalysis
- Urea synthesis
- C–N coupling
- Adsorption configuration
- Reaction mechanism
4. Extract Figure Captions#
This example shows how to isolate figure captions within the XML.
We retrieve XML for a specified DOI.
We navigate to the floats > figure elements.
We extract figure labels and captions, then write them to a file.
# This article is under CC-BY-4.0 license https://creativecommons.org/licenses/by/4.0/
# https://doi.org/10.1016/j.mtcata.2025.100092
# Ambient urea synthesis via electrocatalytic C–N coupling
# Chen Chen
doi_example = '10.1016/j.mtcata.2025.100092'
xml_data = None
try:
# Request XML content
response = requests.get(
f"{ELSEVIER_URL}{doi_example}?APIKey={API_KEY}&httpAccept=text/xml"
)
response.raise_for_status()
xml_data = response.content
print("XML data retrieved successfully.")
except requests.exceptions.RequestException as e:
print(f"Error retrieving XML data for DOI {doi_example}: {e}")
except Exception as e:
print(f"Unexpected error occurred: {e}")
root = None
if xml_data:
try:
root = ET.fromstring(xml_data)
print("XML data parsed successfully.")
except ET.ParseError as e:
print(f"Error parsing XML data: {e}")
else:
print("No XML data to parse.")
def get_text_from_figure(figure_elem: ET.Element) -> str:
    """
    Retrieve the label and caption text from a figure element.
    :param figure_elem: ET.Element
        A figure element from the XML.
    :return: str
        Combined label and caption text.
    """
    text = ""
    # Attempt to get the label
    label_el = get_element(figure_elem, "label")
    if label_el is not None and label_el.text:
        text += label_el.text + "\n"
    # Attempt to get the main caption
    caption_text_el = get_element(figure_elem, "caption")
    if caption_text_el is not None:
        text += "".join(caption_text_el.itertext()).strip() + "\n"
    return text
if root is not None:
original_text_section = get_element(root, "originalText")
doc_section = get_element(original_text_section, "doc")
serial_item_section = get_element(doc_section, "serial-item")
article_section = get_element(serial_item_section, "article")
floats_section = get_element(article_section, "floats")
figures = get_elements(floats_section, "figure")
if figures:
print("\n--- Figure Captions ---")
with open("captions.txt", 'w', encoding='utf-8') as outfile:
outfile.write("--- Figure Captions ---\n")
for figure in figures:
figure_text = get_text_from_figure(figure)
print(figure_text)
with open("captions.txt", 'a', encoding='utf-8') as outfile:
outfile.write(figure_text + "\n")
else:
print("No figures found.")
else:
print("No root element to inspect.")
XML data retrieved successfully.
XML data parsed successfully.
--- Figure Captions ---
Fig. 1
(a) Comparison of industrial urea synthesis route and alternative electrocatalytic protocol. The rt and atm indicate room temperature and atmospheric pressure respectively. (b) Present-day ammonia and urea volumes and uses. All data are in million metric tonnes of nitrogen (Mt N) per year using production data for 2020.
Fig. 2
Electrocatalytic C−N coupling of N2 and CO2 and reaction mechanisms.
Fig. 3
Premise towards C–N coupling: side-on adsorption of N2. (a) Screening of N2 adsorption configuration and adsorption energy over diatomic sites. (b) Mass-to-charge ratio analysis of isotope-labelled urea products. (c) Schematic diagram illustrating one-step and two-step coupling processes.
Fig. 4
Strategies towards electrocatalytic C−N coupling from nitrate and CO2. (a) Adsorption regulation. (b) Functional synergy. (c) Site reconstitution and (d) Local-environment construction.
Fig. 5
Adsorption configuration regulation of N-containing species by oxygen vacancy. (a) SFG signals of intermediate species on oxygen vacancy enriched catalyst. Comparison of the coupling energy barrier of *NO, *N, *NH, and *NH2 with *CO and protonation on (b) Oxygen vacancy enriched sample and (c) Oxygen vacancy deficient sample. (d) Schematic diagram of oxygen vacancy mediated reaction pathway changes.
Fig. 6
Function synergy to boost urea synthesis. (a) Synergy of nitrate and CO2 co-activation. (b) Synergy of electrochemical and non-electrochemical steps.
Fig. 7
Electrochemical reconstitution towards urea synthesis. (a) Reconstitution resistance. (b) Partial reconstitution. (c) Dynamic reconstitution.
Fig. 8
Local microenvironment construction induced by alkaline cations. (a) Urea yield rates, (b) Faradaic efficiencies and (c) partial currents of urea with various concentrations of K+ at –1.5 V. (d) Schematic diagram of K+-participated urea synthesis path.
5. Extract Article Full-Text from XML#
Finally, we’ll demonstrate how to retrieve the entire body of the article from the XML by:
Locating the head and body sections.
Iterating through all text nodes.
Writing out a combined string to a local file.
# This article is under CC-BY-4.0 license https://creativecommons.org/licenses/by/4.0/
# https://doi.org/10.1016/j.mtcata.2025.100092
# Ambient urea synthesis via electrocatalytic C–N coupling
# Chen Chen
doi_example = '10.1016/j.mtcata.2025.100092'
xml_data = None
try:
# Request XML content
response = requests.get(
f"{ELSEVIER_URL}{doi_example}?APIKey={API_KEY}&httpAccept=text/xml"
)
response.raise_for_status()
xml_data = response.content
print("XML data retrieved successfully.")
except requests.exceptions.RequestException as e:
print(f"Error retrieving XML data for DOI {doi_example}: {e}")
except Exception as e:
print(f"Unexpected error occurred: {e}")
root = None
if xml_data:
try:
root = ET.fromstring(xml_data)
print("XML data parsed successfully.")
except ET.ParseError as e:
print(f"Error parsing XML data: {e}")
else:
print("No XML data to parse.")
if root is not None:
original_text_section = get_element(root, "originalText")
doc_section = get_element(original_text_section, "doc")
serial_item_section = get_element(doc_section, "serial-item")
article_section = get_element(serial_item_section, "article")
header_section = get_element(article_section, "head")
body_section = get_element(article_section, "body")
full_text = ""
# Helper function to gather text from elements
def gather_text(element: ET.Element) -> str:
"""
Gather all text from an Element, replacing whitespace-only text nodes with single newlines.
"""
if element is None:
return ""
text_chunks = []
for subelement in element:
for txt in subelement.itertext():
if not txt.strip():
text_chunks.append("\n")
else:
text_chunks.append(txt)
combined = "".join(text_chunks).strip()
return combined
# Extract header text
if header_section is not None:
header_text = gather_text(header_section)
full_text += header_text + "\n\n\n"
# Extract body text
if body_section is not None:
body_text = gather_text(body_section)
full_text += body_text
with open("fulltext.txt", 'w', encoding='utf-8') as outfile:
outfile.write(full_text)
print("Full text saved to fulltext.txt.")
else:
print("No root element to inspect.")
XML data retrieved successfully.
XML data parsed successfully.
Full text saved to fulltext.txt.
# Output a portion of the full text to the console
print(full_text[:250])
Ambient urea synthesis via electrocatalytic C–N coupling
Chen
Chen
Writing – original draft
Validation
Resources
Investigation
Data curation
Conceptualization
State Key Laboratory of Chemo/Bio-Sensing and Chemometrics, College of Chemistry and Che