Springer Nature API in Python
By Avery Fernandez and Vincent F. Scalfani
These recipe examples use the Springer Nature Open Access API to retrieve metadata and full-text content. The Springer Nature Open Access API includes about 1.5 million full-text records.
There is also a Full-Text API for subscription content. Please check with your institution for their Text and Data Mining or related License Agreement with Springer Nature.
Please see the following resources for more information on API usage:
Documentation
Terms
Data Reuse
NOTE: Check with your institution to determine your API rate limit with Springer Nature.
If you have copyright or other related text and data mining questions, please contact The University of Alabama Libraries or your respective library/institution.
These recipe examples were tested on August 26, 2025.
Setup
Import Libraries
The following external libraries need to be installed into your environment to run the code examples in this tutorial: requests and python-dotenv (imported as dotenv).
We import the libraries used in this tutorial below:
import requests
from time import sleep
from pprint import pprint
import xml.etree.ElementTree as ET
from dotenv import load_dotenv
import os
Import API Key
An API key is required to access the Springer Nature API. You can sign up for one at the Springer Nature Developer Portal.
We keep our API key in a separate .env file and use the dotenv library to access it. If you use this method, create a file named .env in the same directory as this notebook and add the following line to it:
SPRINGER_API_KEY=PUT_YOUR_API_KEY_HERE
load_dotenv()
try:
    API_KEY = os.environ["SPRINGER_API_KEY"]
except KeyError:
    print("API key not found. Please set 'SPRINGER_API_KEY' in your .env file.")
else:
    print("Environment and API key successfully loaded.")
Environment and API key successfully loaded.
1. Retrieve Full-Text JATS XML of an Article
In this section, we demonstrate how to retrieve the JATS XML content for a specific article by its DOI.
JATS (Journal Article Tag Suite) is an XML standard for tagging, archiving, and exchanging journal articles. The Springer Nature Open Access API can return articles in this format.
Key parameters:
- base_url: The base URL for the Springer Nature API (Open Access JATS endpoint).
- q=(doi:"DOI"): The query parameter used to search for an article by its DOI.
- api_key: The query parameter used to pass your API key.
More details about the parameters can be found at the Springer Nature Developer Portal.
You can also experiment with the API using the Springer API Playground.
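The code below builds the query URL with an f-string. Another option is to hand the parameters to the HTTP client as a dictionary and let it percent-encode them. The sketch below uses only the standard library's urlencode, so the encoding is visible without a network call. build_doi_params is a hypothetical helper name and DEMO_KEY is a placeholder, not a real key. With requests, you would pass the same dictionary as `requests.get(base_url, params=params, timeout=30)`.

```python
from urllib.parse import parse_qs, urlencode, urlparse


def build_doi_params(doi: str, api_key: str) -> dict:
    """Hypothetical helper: query parameters for a single-DOI lookup."""
    # The DOI is wrapped in double quotes, matching the q=(doi:"...") form in this tutorial
    return {"q": f'(doi:"{doi}")', "api_key": api_key}


base_url = "https://api.springernature.com/openaccess/jats"
params = build_doi_params("10.1186/s40708-025-00250-5", "DEMO_KEY")  # placeholder key

# urlencode percent-encodes the parentheses, colon, quotes, and slash in the q value
url = f"{base_url}?{urlencode(params)}"
print(url)

# Decoding the query string recovers the original parameter values unchanged
decoded = parse_qs(urlparse(url).query)
print(decoded["q"])
```

Passing a dictionary avoids hand-escaping DOIs that contain characters such as `&` or `#`, which would otherwise break the query string.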
base_url = 'https://api.springernature.com/openaccess/jats'
# Example article from SpringerOpen Brain Informatics
# This article is under CC-BY-4.0 license https://creativecommons.org/licenses/by/4.0/
# https://doi.org/10.1186/s40708-025-00250-5
doi = '10.1186/s40708-025-00250-5'
try:
    response = requests.get(f'{base_url}?q=(doi:"{doi}")&api_key={API_KEY}', timeout=30)
    response.raise_for_status()  # Raise an HTTPError if the response was unsuccessful
    with open('fulltext.jats', 'w', encoding='utf-8') as outfile:
        outfile.write(response.text)
    print(f"JATS XML successfully retrieved for DOI {doi}. Saved to fulltext.jats")
except requests.exceptions.RequestException as e:
    print(f"Error retrieving JATS XML for DOI {doi}: {e}")
JATS XML successfully retrieved for DOI 10.1186/s40708-025-00250-5. Saved to fulltext.jats
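A successful HTTP status does not by itself show that the DOI was found. A hedged extra check is to confirm that the response contains at least one <article> under <records> (the structure parsed later in this tutorial) before saving or parsing it. This assumes that a lookup with no match returns XML without an <article> element. has_article is a hypothetical helper, and the XML strings below are simplified stand-ins, not real API responses.

```python
import xml.etree.ElementTree as ET


def has_article(xml_text: str) -> bool:
    """Hypothetical helper: True if the XML holds at least one <article> under <records>."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        # Malformed XML is treated as "no article"
        return False
    return root.find(".//records/article") is not None


# Simplified stand-ins for a response with and without a matching record
with_match = "<response><records><article><front/></article></records></response>"
no_match = "<response><records/></response>"

print(has_article(with_match))  # True
print(has_article(no_match))    # False
```

For real data, you would call `has_article(response.text)` after `raise_for_status()`.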
2. Retrieve Full-Text in a Loop
In many cases, you may have a list of DOIs and want to retrieve the full-text for each of them. Below, we loop over a set of DOIs, retrieve the JATS XML, and store each one in a separate file.
A short delay (sleep(1)) is used between requests to avoid hitting rate limits.
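A fixed pause keeps the request rate low, but it does not recover a request that was rejected. A common pattern is exponential backoff: wait, retry, and double the wait each time. The sketch below assumes the API signals rate limiting with HTTP status 429. That is a widespread convention, not something this tutorial confirms for Springer Nature. get_with_backoff is a hypothetical helper. It accepts any zero-argument callable that returns an object with a status_code, so in practice you could pass `lambda: requests.get(url, timeout=30)`.

```python
import time
from types import SimpleNamespace
from typing import Any, Callable


def get_with_backoff(fetch: Callable[[], Any], max_retries: int = 3,
                     base_delay: float = 1.0,
                     sleep: Callable[[float], None] = time.sleep) -> Any:
    """Hypothetical helper: call fetch(); on status 429, wait and retry with doubling delays."""
    response = fetch()
    for attempt in range(max_retries):
        if response.status_code != 429:
            break
        # With the defaults the waits are 1 s, 2 s, 4 s, ...
        sleep(base_delay * 2 ** attempt)
        response = fetch()
    # Either a non-429 response or the last 429 once retries are exhausted
    return response


# Offline demo: a fake fetch that is rate-limited twice, then succeeds
status_codes = iter([429, 429, 200])
recorded_delays = []
demo = get_with_backoff(lambda: SimpleNamespace(status_code=next(status_codes)),
                        sleep=recorded_delays.append)
print(demo.status_code, recorded_delays)  # 200 [1.0, 2.0]
```

Injecting `sleep` keeps the demo instant and testable. In real use, leave it at the default `time.sleep`.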
base_url = 'https://api.springernature.com/openaccess/jats'
dois = [
# Licensed under CC-BY-4.0 license https://creativecommons.org/licenses/by/4.0/
# https://doi.org/10.1186/s40708-025-00250-5
'10.1186/s40708-025-00250-5',
# Licensed under CC-BY-4.0 license https://creativecommons.org/licenses/by/4.0/
# https://doi.org/10.1186/s40708-024-00247-6
'10.1186/s40708-024-00247-6',
# Licensed under CC-BY-4.0 license https://creativecommons.org/licenses/by/4.0/
# https://doi.org/10.1186/s40708-024-00243-w
'10.1186/s40708-024-00243-w',
# Licensed under CC-BY-4.0 license https://creativecommons.org/licenses/by/4.0/
# https://doi.org/10.1186/s40708-023-00202-x
'10.1186/s40708-023-00202-x',
# Licensed under CC-BY-4.0 license https://creativecommons.org/licenses/by/4.0/
# https://doi.org/10.1186/s40708-023-00204-9
'10.1186/s40708-023-00204-9',
]
for i, doi in enumerate(dois, start=1):
    print(f"Retrieving JATS XML for DOI {doi} ({i}/{len(dois)})...")
    try:
        response = requests.get(f'{base_url}?q=(doi:"{doi}")&api_key={API_KEY}', timeout=30)
        response.raise_for_status()
        # Prepare filename by safely replacing potential invalid path characters
        doi_name = doi.replace('/', '_').replace('"', '')
        output_file = f'{doi_name}_jats_text.jats'
        with open(output_file, 'w', encoding='utf-8') as outfile:
            outfile.write(response.text)
        print(f"JATS XML retrieved for DOI {doi}. Saved to {output_file}.")
    except requests.exceptions.RequestException as e:
        print(f"Error retrieving JATS XML for DOI {doi}: {e}")
    # Delay to avoid hitting rate limits
    sleep(1)
Retrieving JATS XML for DOI 10.1186/s40708-025-00250-5 (1/5)...
JATS XML retrieved for DOI 10.1186/s40708-025-00250-5. Saved to 10.1186_s40708-025-00250-5_jats_text.jats.
Retrieving JATS XML for DOI 10.1186/s40708-024-00247-6 (2/5)...
JATS XML retrieved for DOI 10.1186/s40708-024-00247-6. Saved to 10.1186_s40708-024-00247-6_jats_text.jats.
Retrieving JATS XML for DOI 10.1186/s40708-024-00243-w (3/5)...
JATS XML retrieved for DOI 10.1186/s40708-024-00243-w. Saved to 10.1186_s40708-024-00243-w_jats_text.jats.
Retrieving JATS XML for DOI 10.1186/s40708-023-00202-x (4/5)...
JATS XML retrieved for DOI 10.1186/s40708-023-00202-x. Saved to 10.1186_s40708-023-00202-x_jats_text.jats.
Retrieving JATS XML for DOI 10.1186/s40708-023-00204-9 (5/5)...
JATS XML retrieved for DOI 10.1186/s40708-023-00204-9. Saved to 10.1186_s40708-023-00204-9_jats_text.jats.
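The loop above replaces only / and " in each DOI. DOIs can contain other characters that are awkward or invalid in file names on some systems, such as :, <, >, or ; in older SICI-style DOIs. A stricter alternative keeps only letters, digits, ., -, and _, and replaces everything else with an underscore. doi_to_filename is a hypothetical helper, and the second DOI below is an illustrative SICI-style string.

```python
import re


def doi_to_filename(doi: str, suffix: str = "_jats_text.jats") -> str:
    """Hypothetical helper: turn a DOI into a conservative file name."""
    # Any character outside A-Z, a-z, 0-9, '.', '-', '_' becomes '_'
    safe = re.sub(r"[^A-Za-z0-9._-]", "_", doi)
    return f"{safe}{suffix}"


print(doi_to_filename("10.1186/s40708-025-00250-5"))
# 10.1186_s40708-025-00250-5_jats_text.jats
print(doi_to_filename("10.1002/(SICI)1097-4571(199806)49:8<693::AID-ASI4>3.0.CO;2-0"))
```

Because several characters collapse to the same underscore, two different DOIs could in principle map to one file name. If that matters, keep a DOI-to-file mapping alongside the downloads.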
3. Acquire and Parse Metadata (JSON)
Alternatively, you can retrieve only the metadata in JSON format by switching the base URL to the json endpoint. You can then parse the relevant fields (e.g., abstract, publication date).
base_url = 'https://api.springernature.com/openaccess/json'
# Example article from SpringerOpen Brain Informatics
# This article is under CC-BY-4.0 license https://creativecommons.org/licenses/by/4.0/
# https://doi.org/10.1186/s40708-025-00250-5
# Harnessing the synergy of statistics and \
# deep learning for BCI competition 4 dataset 4: a novel approach
# Gauttam Jangir, Nisheeth Joshi & Gaurav Purohit
doi = '10.1186/s40708-025-00250-5'
metadata_response = {}
try:
    response = requests.get(f'{base_url}?q=(doi:"{doi}")&api_key={API_KEY}', timeout=30)
    response.raise_for_status()
    metadata_response = response.json()
except requests.exceptions.RequestException as e:
    print(f"Error retrieving JSON metadata for DOI {doi}: {e}")
    metadata_response = {}
metadata_response.keys()
dict_keys(['apiMessage', 'query', 'result', 'records', 'facets'])
Below is an example of how to retrieve specific fields from the metadata, such as the article’s abstract, DOI, publication date, publication name, and title.
# As shown above, metadata_response has the keys: ['apiMessage', 'query', 'result', 'records', 'facets']
api_message = metadata_response.get('apiMessage')
query_info = metadata_response.get('query')
records = metadata_response.get('records', [])
print("API Message:", api_message)
print("Query:", query_info)
if records:
    # Take the first record if available
    first_record = records[0]
    print("Abstract:", first_record.get('abstract', {}).get('p', ''))
    print("DOI:", first_record.get('doi'))
    print("Online Date:", first_record.get('onlineDate'))
    print("Print Date:", first_record.get('printDate'))
    print("Publication Name:", first_record.get('publicationName'))
    print("Title:", first_record.get('title'))
    # Get the authors
    authors = [author.get('creator') for author in first_record.get('creators', [])]
    print("Authors:", authors)
else:
    print("No 'records' were returned in the JSON response.")
API Message: This JSON was provided by Springer Nature
Query: (doi:"10.1186/s40708-025-00250-5")
Abstract: Human brain signal processing and finger’s movement coordination is a complex mechanism. In this mechanism finger’s movement is mostly performed for every day’s task. It is well known that to capture such movement EEG or ECoG signals are used. In this order to find the patterns from these signals is important. The BCI competition 4 dataset 4 is one such standard dataset of ECoG signals for individual finger movement provided by University of Washington, USA. In this work, this dataset is, statistically analyzed to understand the nature of data and outliers in it. Effectiveness of pre-processing algorithm is then visualized. The cleaned dataset has dual polarity and gaussian distribution nature which makes Tanh activation function suitable for the neural network BC4D4 model. BC4D4 uses Convolutional neural network for feature extraction, dense neural network for pattern identification and incorporating dropout & regularization making the proposed model more resilient. Our model outperforms the state of the art work on the dataset 4 achieving 0.85 correlation value that is 1.85X (Winner of BCI competition 4, 2012) & 1.25X (Finger Flex model, 2022).
DOI: 10.1186/s40708-025-00250-5
Online Date: 2025-02-15
Print Date: None
Publication Name: Brain Informatics
Title: Harnessing the synergy of statistics and deep learning for BCI competition 4 dataset 4: a novel approach
Authors: ['Jangir, Gauttam', 'Joshi, Nisheeth', 'Purohit, Gaurav']
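When a query returns several records, it can be convenient to flatten each one into a table row. The sketch below reuses the field names accessed above (doi, title, publicationName, onlineDate, creators/creator, abstract/p) and writes CSV to an in-memory buffer. record_to_row is a hypothetical helper, and the sample record is an invented placeholder with the same keys, not real API output. The helper also accepts an abstract given as a plain string rather than a dictionary with a p key. That variation is an assumption, not something shown in the response above.

```python
import csv
import io


def record_to_row(record: dict) -> dict:
    """Hypothetical helper: flatten one JSON record into a dict for csv.DictWriter."""
    abstract = record.get("abstract", "")
    if isinstance(abstract, dict):
        # Matches the abstract -> p access used above
        abstract = abstract.get("p", "")
    authors = [c.get("creator", "") for c in record.get("creators", [])]
    return {
        "doi": record.get("doi", ""),
        "title": record.get("title", ""),
        "publication": record.get("publicationName", ""),
        "online_date": record.get("onlineDate", ""),
        "authors": "; ".join(authors),
        "abstract": abstract,
    }


# Invented placeholder record shaped like the fields used above
sample_records = [{
    "doi": "10.0000/example",
    "title": "Example title",
    "publicationName": "Example Journal",
    "onlineDate": "2025-01-01",
    "creators": [{"creator": "Doe, Jane"}, {"creator": "Roe, Richard"}],
    "abstract": {"p": "Example abstract."},
}]

rows = [record_to_row(r) for r in sample_records]
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
writer.writeheader()
writer.writerows(rows)
print(buffer.getvalue())
```

With real data, you would build rows from `metadata_response.get('records', [])` and write to a file opened with `newline=''` instead of the StringIO buffer.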
4. Parsing XML for Metadata
Sometimes you may want to extract specific pieces of data (e.g., title, abstract, authors, subjects) directly from the JATS XML instead of the JSON. In this example, we use Python’s xml.etree.ElementTree to parse the XML.
The XML structure has a <records> tag containing one or more <article> tags. Each <article> has a <front> section for metadata, a <body> for the main text, and possibly a <back> section for references and other back matter.
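To see how the ElementTree paths used below map onto this nesting (<records>, then <article>, then <front> and <body>), here is a deliberately tiny hand-written document. It is illustrative only: real JATS responses contain many more elements, and the outer <response> wrapper name is an assumption.

```python
import xml.etree.ElementTree as ET

# Hand-written, simplified stand-in (the <response> wrapper name is an assumption)
toy_xml = """
<response>
  <records>
    <article>
      <front>
        <article-meta>
          <title-group><article-title>A Toy Title</article-title></title-group>
          <kwd-group><kwd>alpha</kwd><kwd>beta</kwd></kwd-group>
        </article-meta>
      </front>
      <body>
        <sec><title>Introduction</title><p>First paragraph.</p></sec>
      </body>
    </article>
  </records>
</response>
"""

toy_root = ET.fromstring(toy_xml.strip())
toy_article = toy_root.find(".//records/article")

# Same style of relative paths as the metadata example that follows
toy_title = toy_article.find(".//front/article-meta/title-group/article-title").text
toy_keywords = [k.text for k in toy_article.findall(".//front/article-meta/kwd-group/kwd")]
toy_section_titles = [t.text for t in toy_article.findall(".//body/sec/title")]

print(toy_title)           # A Toy Title
print(toy_keywords)        # ['alpha', 'beta']
print(toy_section_titles)  # ['Introduction']
```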
base_url = 'https://api.springernature.com/openaccess/jats'
# Example article from SpringerOpen Brain Informatics
# This article is under CC-BY-4.0 license https://creativecommons.org/licenses/by/4.0/
# https://doi.org/10.1186/s40708-025-00250-5
# Harnessing the synergy of statistics and \
# deep learning for BCI competition 4 dataset 4: a novel approach
# Gauttam Jangir, Nisheeth Joshi & Gaurav Purohit
doi = '10.1186/s40708-025-00250-5'
xml_data = None
try:
    response = requests.get(f'{base_url}?q=(doi:"{doi}")&api_key={API_KEY}', timeout=30)
    response.raise_for_status()  # Raise an HTTPError if the response was unsuccessful
    xml_data = response.text
    print(f"JATS XML successfully retrieved for DOI {doi}.")
except requests.exceptions.RequestException as e:
    print(f"Error retrieving JATS XML for DOI {doi}: {e}")
JATS XML successfully retrieved for DOI 10.1186/s40708-025-00250-5.
root = None
if xml_data:
    try:
        root = ET.fromstring(xml_data)
        print("XML data successfully parsed.")
    except ET.ParseError as e:
        print(f"Error parsing XML data: {e}")
XML data successfully parsed.
article_data = {
'title': None,
'abstract': None,
'authors': [],
'subjects': [],
}
if root is not None:
    # Assume there's at least one article under records.
    first_article = root.find('.//records/article')
    if first_article is not None:
        # Title
        title_elem = first_article.find('.//front/article-meta/title-group/article-title')
        if title_elem is not None:
            article_data['title'] = title_elem.text
        # Abstract
        abstract_elem = first_article.find('.//front/article-meta/abstract/p')
        if abstract_elem is not None:
            article_data['abstract'] = abstract_elem.text
        # Authors
        authors = first_article.findall('.//front/article-meta/contrib-group/contrib/name')
        for author in authors:
            # Each author element may have multiple child tags (given, surname, etc.).
            full_name = " ".join(author.itertext())
            article_data['authors'].append(full_name)
        # Subjects (keywords)
        subjects = first_article.findall('.//front/article-meta/kwd-group/kwd')
        for subject in subjects:
            article_data['subjects'].append(subject.text)
    else:
        print("No article data found in the XML.")
else:
    print("No valid XML data to parse.")
pprint(article_data)
{'abstract': 'Human brain signal processing and finger’s movement coordination '
'is a complex mechanism. In this mechanism finger’s movement is '
'mostly performed for every day’s task. It is well known that to '
'capture such movement EEG or ECoG signals are used. In this '
'order to find the patterns from these signals is important. The '
'BCI competition 4 dataset 4 is one such standard dataset of ECoG '
'signals for individual finger movement provided by University of '
'Washington, USA. In this work, this dataset is, statistically '
'analyzed to understand the nature of data and outliers in it. '
'Effectiveness of pre-processing algorithm is then visualized. '
'The cleaned dataset has dual polarity and gaussian distribution '
'nature which makes Tanh activation function suitable for the '
'neural network BC4D4 model. BC4D4 uses Convolutional neural '
'network for feature extraction, dense neural network for pattern '
'identification and incorporating dropout & regularization making '
'the proposed model more resilient. Our model outperforms the '
'state of the art work on the dataset 4 achieving 0.85 '
'correlation value that is 1.85X (Winner of BCI competition 4, '
'2012) & 1.25X (Finger Flex model, 2022).',
'authors': ['Jangir Gauttam', 'Joshi Nisheeth', 'Purohit Gaurav'],
'subjects': ['BCI (Brain Computer Interface)',
'EEG (electroencephalogram)',
'Electrocorticography (ECoG)',
'Event Related Potential (ERP)',
'Motor Imagery (MI)',
'Visual Evoked Potential (VEP)',
'Psychology and Cognitive Sciences'],
'title': 'Harnessing the synergy of statistics and deep learning for BCI '
'competition 4 dataset 4: a novel approach'}
5. Parsing XML for Figure Captions
Figure captions often appear under <fig> tags inside the <body> element. Each figure may have a <label> tag for the figure number and a <caption> tag for the figure’s description.
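In JATS, a <caption> may itself contain a <title> and one or more <p> elements. Joining itertext() with an empty string, as the code below does, can run those pieces together without a space. A hedged alternative joins the pieces with spaces and then collapses whitespace. It also reads the figure's id attribute, which JATS <fig> elements commonly carry. parse_figures is a hypothetical helper, and the XML is a hand-written stand-in.

```python
import xml.etree.ElementTree as ET

# Hand-written stand-in for a <body> containing one figure
fig_demo_body = ET.fromstring(
    "<body><sec>"
    '<fig id="Fig1"><label>Fig. 1</label>'
    "<caption><title>Setup</title><p>Electrode placement.</p></caption></fig>"
    "</sec></body>"
)


def parse_figures(body) -> list[dict]:
    """Hypothetical helper: id, label, and whitespace-normalized caption for every <fig>."""
    figures = []
    for fig in body.iter("fig"):
        label = fig.find("label")
        caption = fig.find("caption")
        # Join text pieces with spaces so <title> and <p> do not run together
        caption_text = " ".join(caption.itertext()) if caption is not None else ""
        figures.append({
            "id": fig.get("id", ""),
            "label": label.text if label is not None else "",
            # split()/join collapses repeated whitespace, including newlines
            "caption": " ".join(caption_text.split()),
        })
    return figures


print(parse_figures(fig_demo_body))
# [{'id': 'Fig1', 'label': 'Fig. 1', 'caption': 'Setup Electrode placement.'}]
```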
base_url = 'https://api.springernature.com/openaccess/jats'
# Example article from SpringerOpen Brain Informatics
# This article is under CC-BY-4.0 license https://creativecommons.org/licenses/by/4.0/
# https://doi.org/10.1186/s40708-025-00250-5
# Harnessing the synergy of statistics and \
# deep learning for BCI competition 4 dataset 4: a novel approach
# Gauttam Jangir, Nisheeth Joshi & Gaurav Purohit
doi = '10.1186/s40708-025-00250-5'
xml_data = None
try:
    response = requests.get(f'{base_url}?q=(doi:"{doi}")&api_key={API_KEY}', timeout=30)
    response.raise_for_status()  # Raise an HTTPError if the response was unsuccessful
    xml_data = response.text
    print(f"JATS XML successfully retrieved for DOI {doi}.")
except requests.exceptions.RequestException as e:
    print(f"Error retrieving JATS XML for DOI {doi}: {e}")
JATS XML successfully retrieved for DOI 10.1186/s40708-025-00250-5.
root = None
if xml_data:
    try:
        root = ET.fromstring(xml_data)
        print("XML data successfully parsed.")
    except ET.ParseError as e:
        print(f"Error parsing XML data: {e}")
else:
    print("No valid XML data to parse.")
XML data successfully parsed.
# Initialize an empty list to store figure data
figures_data = []
if root is not None:
    # Find all <fig> elements within the <body> of the XML
    figures = root.findall('.//body//fig')
    for fig in figures:
        # Extract the <label> element (e.g., "Fig. 1") if it exists
        label = fig.find('label')
        # Extract the <caption> element (e.g., description of the figure) if it exists
        caption = fig.find('caption')
        # Get the text content of the label, or use an empty string if not present
        label_text = label.text if label is not None else ""
        # Get the text content of the caption, joining all inner text, or use an empty string if not present
        caption_text = "".join(caption.itertext()) if caption is not None else ""
        # Append the figure's label and caption as a dictionary to the figures_data list
        figures_data.append({
            'label': label_text,
            'caption': caption_text.strip()  # Remove any leading/trailing whitespace
        })
else:
    # If the XML root is not valid, print an error message
    print("No valid XML data to parse.")
# Check if any figures were found and processed
if figures_data:
    print("Figures data:")
    # Iterate through the collected figures and print their details
    for i, fig_data in enumerate(figures_data, start=1):
        print(f"Figure {i}:")
        print(f"Label: {fig_data['label']}")
        print(f"Caption: {fig_data['caption']}")
        print()
else:
    # If no figures were found, print a message indicating this
    print("No figures data found in the XML.")
Figures data:
Figure 1:
Label: Fig. 1
Caption: Capturing individual finger flexion [57, 58]
Figure 2:
Label: Fig. 2
Caption: Subject 1 fingers
Figure 3:
Label: Fig. 3
Caption: Box Plot (five-point summary)
Figure 4:
Label: Fig. 4
Caption: Subject 2 fingers
Figure 5:
Label: Fig. 5
Caption: Subject 3 fingers
Figure 6:
Label: Fig. 6
Caption: Unusual data point (Outlier) in dataset
Figure 7:
Label: Fig. 7
Caption: Isolation forest tree
Figure 8:
Label: Fig. 8
Caption: Histogram of subject 1
Figure 9:
Label: Fig. 9
Caption: Subject 1 fingers after isolation forest
Figure 10:
Label: Fig. 10
Caption: Histogram of subject 2
Figure 11:
Label: Fig. 11
Caption: Subject 2 fingers after isolation forest
Figure 12:
Label: Fig. 12
Caption: Histogram of subject 3
Figure 13:
Label: Fig. 13
Caption: Subject 3 fingers after isolation forest
Figure 14:
Label: Fig. 14
Caption: BC4D4 model architecture
Figure 15:
Label: Fig. 15
Caption: Activation functions
Figure 16:
Label: Fig. 16
Caption: BC4D4 model layered architecture
Figure 17:
Label: Fig. 17
Caption: Correlation value of BC4D4 with softsign & tanh
Figure 18:
Label: Fig. 18
Caption: Models comparison
6. Extracting Full-Text from the Body
Finally, we can extract a rough “plain text” version of the article body by iterating through its elements (e.g., <sec>, <p>), capturing the text, and joining it into a single string. This can help with quick text-based analyses.
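The extraction code that follows turns each child of a top-level <sec> into one string. A nested subsection (a <sec> inside a <sec>) is therefore flattened into a single entry, and its headings and paragraphs run together. If you want to keep the section structure, one hedged alternative is to walk <sec> elements recursively and emit each <title> and <p> on its own line, marking titles with one # per nesting level. sections_to_lines and normalize are hypothetical helpers, and the XML is a hand-written stand-in.

```python
import xml.etree.ElementTree as ET

# Hand-written stand-in with one nested subsection
sec_demo_body = ET.fromstring(
    "<body>"
    "<sec><title>Methods</title><p>Overview.</p>"
    "<sec><title>Data</title><p>Five subjects.</p></sec>"
    "</sec>"
    "</body>"
)


def normalize(elem) -> str:
    """Hypothetical helper: all nested text with whitespace collapsed."""
    return " ".join("".join(elem.itertext()).split())


def sections_to_lines(elem, depth: int = 0) -> list[str]:
    """Hypothetical helper: section titles and paragraphs in document order, one per line."""
    lines = []
    for child in elem:
        if child.tag == "sec":
            # Recurse into (sub)sections one level deeper
            lines.extend(sections_to_lines(child, depth + 1))
        elif child.tag == "title":
            lines.append("#" * depth + " " + normalize(child))
        elif child.tag == "p":
            lines.append(normalize(child))
        # Other direct children (e.g., figures, tables) are skipped in this sketch
    return lines


sec_demo_lines = sections_to_lines(sec_demo_body)
print("\n".join(sec_demo_lines))
```

For the downloaded article, you would call `sections_to_lines(root.find('.//body'))` once `root` has been parsed.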
base_url = 'https://api.springernature.com/openaccess/jats'
# Example article from SpringerOpen Brain Informatics
# This article is under CC-BY-4.0 license https://creativecommons.org/licenses/by/4.0/
# https://doi.org/10.1186/s40708-025-00250-5
# Harnessing the synergy of statistics and \
# deep learning for BCI competition 4 dataset 4: a novel approach
# Gauttam Jangir, Nisheeth Joshi & Gaurav Purohit
doi = '10.1186/s40708-025-00250-5'
xml_data = None
try:
    response = requests.get(f'{base_url}?q=(doi:"{doi}")&api_key={API_KEY}', timeout=30)
    response.raise_for_status()  # Raise an HTTPError if the response was unsuccessful
    xml_data = response.text
    print(f"JATS XML successfully retrieved for DOI {doi}.")
except requests.exceptions.RequestException as e:
    print(f"Error retrieving JATS XML for DOI {doi}: {e}")
JATS XML successfully retrieved for DOI 10.1186/s40708-025-00250-5.
root = None
if xml_data:
    try:
        root = ET.fromstring(xml_data)
        print("XML data successfully parsed.")
    except ET.ParseError as e:
        print(f"Error parsing XML data: {e}")
else:
    print("No valid XML data to parse.")
XML data successfully parsed.
full_text = None
if root is not None:
    # Find the body element
    body = root.find('.//body')
    if body is not None:
        # We'll store text in a list, then join them.
        text_parts = []
        # We can iterate over each top-level child in the body.
        # Typically <sec> tags hold paragraphs, etc.
        for section in body:
            # For each child of the section (e.g., <title>, <p>, or a nested <sec>), gather all text.
            # Using .itertext() collects all text within that element.
            for sub_section in section:
                text_parts.append("".join(sub_section.itertext()))
        # Combine everything
        full_text = "\n".join(text_parts)
        full_text = full_text.strip()
    else:
        print("No body content found in the XML.")
else:
    print("No valid XML data to parse.")
if full_text:
    with open('fulltext.txt', 'w', encoding='utf-8') as outfile:
        outfile.write(full_text)
    print("Full text extracted and saved to 'fulltext.txt'.")
else:
    print("No full text content found in the XML.")
Full text extracted and saved to 'fulltext.txt'.
# Output a portion of the full text to the console
print(full_text[:395])
Introduction
THe brain is the most active organ of the human body that takes input, processes them, and gives output. Fingers play an important role in human life that is why one of the active fields for rehabilitation is the Brain-Computer Interface (BCI) where fingers and EEG signals are studied together for the normal routine return of a physically challenged or locomotive disabled person.
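As one example of a quick text-based analysis, the standard library's collections.Counter can tally word frequencies in extracted text. The tokenization here is a deliberately naive assumption: it lowercases the text, keeps runs of letters only, and drops words shorter than four letters. top_words is a hypothetical helper, and the sample string is invented. With the real data, you would pass full_text instead.

```python
import re
from collections import Counter


def top_words(text: str, n: int = 5, min_length: int = 4) -> list[tuple[str, int]]:
    """Hypothetical helper: the n most common lowercase words of at least min_length letters."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if len(w) >= min_length)
    return counts.most_common(n)


sample = "Signals, signals everywhere: brain signals and finger signals. Brain data."
print(top_words(sample, n=2))
# [('signals', 4), ('brain', 2)]
```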