Sage Journals Text and Data Mining in Python


by Michael T. Moen

Sage Journals allows you to download articles to which you have legitimate access (e.g., open access articles and those included in your institution’s subscription) for non-commercial text and data mining (see the restrictions in the terms below). Text and data mining with Sage resources requires prior approval. Contact UA Libraries or your institution to check their agreement and enable access. Please see the resources below for more information on Sage text and data mining, API usage, and policies/terms:

This tutorial content is intended to help facilitate academic research. Please check with your institution about its Text and Data Mining or related license agreement with Sage Journals.

Documentation
- Sage Journals
Terms
Data Reuse
- Sage Policy on Text and Data Mining (TDM) and Artificial Intelligence (AI)

NOTE: Please see the official documentation for access details and rate limits for this API.

These recipe examples were tested on February 10, 2026.

This recipe uses the CrossRef API to obtain the full-text URLs of the articles, as recommended in Sage’s Text and Data Mining overview. For more information on usage for this API, please see our CrossRef cookbook tutorials and the text and data mining for researchers page of CrossRef’s API documentation.

Setup#

Import Libraries#

The following external libraries need to be installed into your environment to run the code examples in this tutorial:

- requests
- python-dotenv

We import the libraries used in this tutorial below:

import requests
from dotenv import load_dotenv
import os
from time import sleep

Import Email#

The CrossRef API asks users to identify themselves by providing an email address in API requests (the mailto parameter used in the functions below).

We keep our email address in a .env file and use the dotenv library to access it. If you would like to use this method, create a .env file and add the following line to it:

EMAIL=PUT_YOUR_EMAIL_HERE

load_dotenv()
try:
    EMAIL = os.environ["EMAIL"]
except KeyError:
    print("EMAIL not found. Please set 'EMAIL' in your .env file.")
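As a minimal sketch of how the email ends up in a CrossRef request, the helper below builds the request URL with the standard library so that the DOI and the email are URL-encoded. The name `build_crossref_url`, the prefix list, and the example address are our own assumptions for illustration and are not part of the CrossRef API. The functions later in this tutorial pass the full https://doi.org/ URL directly; this sketch strips that prefix first.

```python
from urllib.parse import quote, urlencode

# Prefixes that are commonly written in front of a bare DOI (an assumption)
DOI_PREFIXES = ("https://doi.org/", "http://doi.org/", "doi:")

def build_crossref_url(doi: str, email: str) -> str:
    """Build a CrossRef works URL for a DOI, identifying the caller by email"""
    bare = doi.strip()
    for prefix in DOI_PREFIXES:
        if bare.lower().startswith(prefix):
            bare = bare[len(prefix):]
            break
    # Keep "/" unescaped so the DOI remains readable in the URL path
    path = quote(bare, safe="/")
    return f"https://api.crossref.org/works/{path}?{urlencode({'mailto': email})}"

print(build_crossref_url("https://doi.org/10.1177/14759217221075241", "name@example.edu"))
```

Building the query string with `urlencode` escapes characters such as `@` in the email address instead of leaving them raw in the URL.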

Enable Text and Data Mining with Sage#

Access to text and data mining on Sage requires approval. Contact UA Libraries or your institution to check their agreement and enable access.

1. Retrieve a Full-Text Article as a PDF#

To begin, let’s consider a simple example where we retrieve the full text of an article.

For this example, we look at the following article licensed under CC BY 4.0:

https://doi.org/10.1177/14759217221075241

Sage permits non-commercial TDM of articles to which you have legitimate access. If you can view the full text of the article at the DOI above in your browser, you should be able to access it programmatically below once you receive approval from Sage.

def get_pdf_url(doi: str) -> str | None:
    """Use the CrossRef API to obtain the PDF TDM link for the given DOI"""
    data = requests.get(f"https://api.crossref.org/works/{doi}?mailto={EMAIL}").json()
    # Not every CrossRef record lists full-text links, so default to an empty list
    for link in data["message"].get("link", []):
        if (link["content-type"] == "application/pdf" and
            link["intended-application"] == "text-mining"):
            return link["URL"]
    # No PDF text-mining link was found for this DOI
    return None

doi = "https://doi.org/10.1177/14759217221075241"
full_text_url = get_pdf_url(doi)
full_text_url

'https://journals.sagepub.com/doi/pdf/10.1177/14759217221075241'
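To make the link selection easier to follow without calling the API, the sketch below applies the same filter to a small hand-written record. The function name `pick_tdm_link` and every value in `sample_message` are made up for illustration. Only the field names ("link", "content-type", "intended-application", "URL") come from the function above.

```python
from typing import Optional

def pick_tdm_link(message: dict, content_type: str) -> Optional[str]:
    """Return the first text-mining link with the given content type, if any"""
    for link in message.get("link", []):
        if (link.get("content-type") == content_type and
            link.get("intended-application") == "text-mining"):
            return link.get("URL")
    return None

# Hypothetical record shaped like the "message" object used above (values are invented)
sample_message = {
    "link": [
        {"URL": "https://example.org/doi/pdf/10.1234/abcd",
         "content-type": "application/pdf",
         "intended-application": "text-mining"},
        {"URL": "https://example.org/doi/full-xml/10.1234/abcd",
         "content-type": "application/xml",
         "intended-application": "text-mining"},
        {"URL": "https://example.org/doi/pdf/10.1234/abcd",
         "content-type": "unspecified",
         "intended-application": "similarity-checking"},
    ]
}

print(pick_tdm_link(sample_message, "application/pdf"))
print(pick_tdm_link(sample_message, "text/html"))
```

Returning `None` when nothing matches lets the caller decide how to handle a DOI that has no text-mining link.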

With the URL for the article full text, we can now retrieve the data from Sage.

def get_article_full_text(url : str):
    """Retrieve the full text of an article from Sage"""
    response = requests.get(url)
    if response.status_code == 200:
        # Status code 200 indicates success
        if "https://journals.sagepub.com/doi/abs/" in response.url:
            # If you do not have access to an article, your query will redirect to the abstract
            print("ERROR: You do not appear to have access to this article's full text.")
        else:
            return response
    elif response.status_code == 403:
        # Status code 403 indicates that the requested object is forbidden
        print("ERROR: Access to TDM on Sage requires approval.")
        print("Contact UA Libraries or your institution for more guidance.")
    else:
        print(f"ERROR: {response.status_code}")
    return None
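The checks in the function above can also be expressed as a small pure function, which makes the decision logic testable without sending a request. This is only a sketch: the name `classify_response` and the returned labels are our own, and the rules simply mirror the status codes and the abstract redirect described above.

```python
def classify_response(status_code: int, final_url: str) -> str:
    """Label the outcome of a full-text request from its status code and final URL"""
    if status_code == 403:
        # Forbidden: TDM access has likely not been approved
        return "needs_approval"
    if status_code != 200:
        return "http_error"
    if "https://journals.sagepub.com/doi/abs/" in final_url:
        # A redirect to the abstract suggests no access to the full text
        return "no_access"
    return "ok"

print(classify_response(200, "https://journals.sagepub.com/doi/pdf/10.1177/14759217221075241"))
print(classify_response(200, "https://journals.sagepub.com/doi/abs/10.1177/14759217221075241"))
print(classify_response(403, "https://journals.sagepub.com/doi/pdf/10.1177/14759217221075241"))
```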

response = get_article_full_text(full_text_url)

Since our query was successful, we download the full-text article as a PDF below:

def download_full_text(response : requests.models.Response, filename : str) -> None:
    """Download the full text for an article"""
    with open(filename, "wb") as f:
        f.write(response.content)

download_full_text(response, "article.pdf")
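Before relying on a saved file, it can help to confirm that the downloaded bytes are actually a PDF rather than, for example, an HTML error page. The sketch below checks for the "%PDF-" marker that PDF files begin with. The helper name `looks_like_pdf` is hypothetical, and tolerating leading whitespace is an assumption on our part.

```python
def looks_like_pdf(content: bytes) -> bool:
    """Return True if the bytes begin with the PDF header marker"""
    # PDF files start with "%PDF-"; leading whitespace is tolerated as an assumption
    return content.lstrip()[:5] == b"%PDF-"

print(looks_like_pdf(b"%PDF-1.7\n..."))
print(looks_like_pdf(b"<!DOCTYPE html><html>...</html>"))
```

With the objects from this section, the check could be applied as `looks_like_pdf(response.content)` before calling `download_full_text`.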

2. Retrieve Full-Text PDF Articles in a Loop#

Using the functions defined in the previous example, we can retrieve the full text of several articles in a loop.

# These articles are licensed under CC BY 4.0: https://creativecommons.org/licenses/by/4.0/
dois = [
    "https://doi.org/10.3233/NAI-240767",
    "https://doi.org/10.1177/20539517221145372",
    "https://doi.org/10.1177/09544062231164575",
    "https://doi.org/10.1177/2053951717743530",
    "https://doi.org/10.1177/00405175221145571"
]

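for idx, doi in enumerate(dois):
    url = get_pdf_url(doi)
    if not url:
        # CrossRef returned no PDF text-mining link for this DOI
        print(f"ERROR: No PDF link found for {doi}")
        continue
    response = get_article_full_text(url)
    # Pause between requests to avoid overloading the server
    sleep(1)
    if not response:
        print(f"ERROR: Could not download {url}")
        continue
    filename = f"article{idx+1}.pdf"
    download_full_text(response, filename)
    print(f"{url} downloaded as {filename}")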

https://journals.sagepub.com/doi/pdf/10.3233/NAI-240767 downloaded as article1.pdf
https://journals.sagepub.com/doi/pdf/10.1177/20539517221145372 downloaded as article2.pdf
https://journals.sagepub.com/doi/pdf/10.1177/09544062231164575 downloaded as article3.pdf
http://journals.sagepub.com/doi/pdf/10.1177/2053951717743530 downloaded as article4.pdf
https://journals.sagepub.com/doi/pdf/10.1177/00405175221145571 downloaded as article5.pdf
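The loop above names files by position in the list, which makes it hard to tell later which file belongs to which article. As one possible alternative, the sketch below derives a filename from the DOI itself. The helper `doi_to_filename` and its character rules are our own assumptions, not a Sage or CrossRef convention.

```python
import re

def doi_to_filename(doi: str, extension: str) -> str:
    """Turn a DOI (with or without a doi.org prefix) into a filesystem-friendly name"""
    # Drop a leading resolver prefix such as "https://doi.org/" or "doi:"
    bare = re.sub(r"^(https?://(dx\.)?doi\.org/|doi:)", "", doi.strip(), flags=re.IGNORECASE)
    # Replace anything other than letters, digits, ".", "_" and "-" with "_"
    safe = re.sub(r"[^A-Za-z0-9._-]+", "_", bare)
    return f"{safe}.{extension}"

print(doi_to_filename("https://doi.org/10.3233/NAI-240767", "pdf"))
```

In the loop, `filename = doi_to_filename(doi, "pdf")` could replace the index-based name.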

3. Retrieve a Full-Text Article as XML#

This example uses the same article as section 1, retrieving the data as XML rather than a PDF.

def get_xml_url(doi: str) -> str | None:
    """Use the CrossRef API to obtain the XML TDM link for the given DOI"""
    data = requests.get(f"https://api.crossref.org/works/{doi}?mailto={EMAIL}").json()
    # Not every CrossRef record lists full-text links, so default to an empty list
    for link in data["message"].get("link", []):
        if (link["content-type"] == "application/xml" and
            link["intended-application"] == "text-mining"):
            return link["URL"]
    # No XML text-mining link was found for this DOI
    return None

doi = "https://doi.org/10.1177/14759217221075241"
full_text_url = get_xml_url(doi)
full_text_url

'https://journals.sagepub.com/doi/full-xml/10.1177/14759217221075241'

response = get_article_full_text(full_text_url)

download_full_text(response, "article.xml")
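Once the XML is saved, it can be parsed with Python's standard library. The sketch below pulls the text of the first element with a given tag name from an XML document. The sample document and the tag name "article-title" are assumptions for illustration; this tutorial does not state the schema of Sage's XML, so inspect a real file to find the element names you need.

```python
import xml.etree.ElementTree as ET
from typing import Optional

def first_text(xml_bytes: bytes, tag: str) -> Optional[str]:
    """Return the text of the first element whose local tag name matches"""
    root = ET.fromstring(xml_bytes)
    for element in root.iter():
        # Strip any "{namespace}" prefix before comparing tag names
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name == tag:
            # itertext() also collects text inside nested inline elements
            return "".join(element.itertext()).strip()
    return None

# Hypothetical document; the structure is invented for this example
sample = b"""<?xml version="1.0" encoding="UTF-8"?>
<article><front><article-meta><title-group>
<article-title>An <italic>illustrative</italic> title</article-title>
</title-group></article-meta></front></article>"""

print(first_text(sample, "article-title"))
```

For the file saved above, the bytes could be read with `open("article.xml", "rb").read()` and passed to the same function.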

4. Retrieve Full-Text XML Articles in a Loop#

This example uses the same articles from section 2, retrieving the data as XML rather than PDFs.

# We use the same list of DOIs from section 2
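for idx, doi in enumerate(dois):
    url = get_xml_url(doi)
    if not url:
        # CrossRef returned no XML text-mining link for this DOI
        print(f"ERROR: No XML link found for {doi}")
        continue
    response = get_article_full_text(url)
    # Pause between requests to avoid overloading the server
    sleep(1)
    if not response:
        print(f"ERROR: Could not download {url}")
        continue
    filename = f"article{idx+1}.xml"
    download_full_text(response, filename)
    print(f"{url} downloaded as {filename}")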

https://journals.sagepub.com/doi/full-xml/10.3233/NAI-240767 downloaded as article1.xml
https://journals.sagepub.com/doi/full-xml/10.1177/20539517221145372 downloaded as article2.xml
https://journals.sagepub.com/doi/full-xml/10.1177/09544062231164575 downloaded as article3.xml
http://journals.sagepub.com/doi/full-xml/10.1177/2053951717743530 downloaded as article4.xml
https://journals.sagepub.com/doi/full-xml/10.1177/00405175221145571 downloaded as article5.xml