Wiley Text and Data Mining (TDM) in Python

by Michael T. Moen

Wiley TDM Terms of Use: https://onlinelibrary.wiley.com/library-info/resources/text-and-datamining

Please check with your institution for their Text and Data Mining Agreement with Wiley. This tutorial content is intended to help facilitate academic research.

The Wiley Text and Data Mining (TDM) API allows users to retrieve the full text of Wiley articles as PDFs.

These recipe examples were tested on January 19, 2024.

NOTE: The Wiley TDM API limits requests to a maximum of 3 requests per second.
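One way to respect this limit in your own scripts is to space requests out explicitly. The sketch below uses a hypothetical `RateLimiter` class, which is not part of the Wiley API or of `requests`. It assumes the 3 requests per second limit above is the only constraint, and it keeps successive calls at least 1/3 second apart. The clock and sleep functions are parameters so the timing logic can be checked without actually waiting.

```python
import time


class RateLimiter:
    """Keep successive calls to wait() at least 1 / max_per_second seconds apart.

    Hypothetical helper for this tutorial; not part of the Wiley API or requests.
    """

    def __init__(self, max_per_second=3, clock=time.monotonic, sleeper=time.sleep):
        self.min_interval = 1.0 / max_per_second  # minimum seconds between calls
        self.clock = clock          # returns the current time in seconds
        self.sleeper = sleeper      # pauses for a given number of seconds
        self.last_call = None       # time of the previous call, if any

    def wait(self):
        """Block until enough time has passed since the previous call."""
        now = self.clock()
        if self.last_call is not None:
            remaining = self.min_interval - (now - self.last_call)
            if remaining > 0:
                self.sleeper(remaining)
                now = self.clock()
        self.last_call = now
```

To use it, create one limiter (`limiter = RateLimiter(3)`) and call `limiter.wait()` immediately before each `requests.get(...)`.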

Setup

Text and Data Mining Token

A token is required to access the Wiley TDM API. Sign-up instructions can be found on the Wiley TDM page linked above. Import your token below:

# First add the token as a variable called wiley_token in a file wiley_token.py
from wiley_token import wiley_token
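Storing the token in `wiley_token.py` keeps it out of the notebook itself. Some users prefer an environment variable instead. The sketch below reads a hypothetical `WILEY_TDM_TOKEN` variable. That name is our choice and is not defined by Wiley. If the variable is not set, the sketch falls back to the `wiley_token.py` file used in this tutorial.

```python
import os


def load_wiley_token(env_var="WILEY_TDM_TOKEN"):
    """Return the Wiley TDM token from an environment variable, if set.

    WILEY_TDM_TOKEN is a hypothetical variable name chosen for this sketch.
    Falls back to the wiley_token.py file used in this tutorial.
    """
    token = os.environ.get(env_var)
    if token:
        return token
    # Same file as in the tutorial's setup step
    from wiley_token import wiley_token
    return wiley_token
```

With this helper, `wiley_token = load_wiley_token()` can replace the import above.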

Import Libraries

This tutorial uses the following libraries:

import requests                     # Manages API requests
from time import sleep              # Allows staggering of API requests to conform to rate limits

1. Retrieve full-text of an article

The Wiley TDM API returns the full-text of an article as a PDF when given the article’s DOI.
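The examples below insert the DOI directly into the URL path, which works for ordinary DOIs like the ones used here. DOI suffixes may also legally contain characters such as `#`, `?`, `<` and `>`, which have special meaning in URLs. The sketch below uses a hypothetical helper, `build_tdm_url`, which percent-encodes such characters with Python's `urllib.parse.quote` and leaves `/` untouched. Ordinary DOIs therefore produce exactly the same URL as the tutorial's f-string. We have not tested whether Wiley's API accepts percent-encoded DOIs in this form; treat that as an assumption.

```python
from urllib.parse import quote

# Base URL used throughout this tutorial
TDM_BASE_URL = "https://api.wiley.com/onlinelibrary/tdm/v1/articles/"


def build_tdm_url(doi):
    """Build the TDM article URL for a DOI (hypothetical helper).

    '/' is left as-is, matching the tutorial's f-string; characters with special
    meaning in URLs (such as '#', '?', '<', '>') are percent-encoded.
    """
    return TDM_BASE_URL + quote(doi, safe="/")
```

For example, `build_tdm_url('10.1002/net.22207')` returns the same URL that the first example constructs.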

In the first example, we download the full-text of the article with the DOI “10.1002/net.22207”. This article was found on the Wiley Online Library.

# DOI of article to download
doi = '10.1002/net.22207'

# Construct URL
url = f'https://api.wiley.com/onlinelibrary/tdm/v1/articles/{doi}'

# Include token in header
headers = {
    "Wiley-TDM-Client-Token": wiley_token
}

# Make a GET request to the Wiley TDM API
response = requests.get(url, headers=headers)

# Download PDF if status code indicates success
if response.status_code == 200:

    # Name file after the DOI ('/' is not allowed in file names)
    filename = f"{doi.replace('/', '_')}.pdf"

    # Write data to PDF file
    with open(filename, 'wb') as file:
        file.write(response.content)

    print(f'{filename} downloaded successfully')

# Print status code if unsuccessful
else:
    print(f'Failed to download PDF. Status code: {response.status_code}')
10.1002_net.22207.pdf downloaded successfully
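The example above treats any 200 response as a PDF. A slightly more defensive check also confirms that the body starts with the `%PDF-` signature that PDF files begin with. This prevents an unexpected non-PDF body, such as an HTML page, from being saved with a `.pdf` extension. The sketch below implements this check in a hypothetical helper, `is_pdf_download`. This tutorial does not document whether Wiley ever returns such a body with status 200; the check is only a precaution.

```python
def is_pdf_download(status_code, content):
    """Return True if a response looks like a successful PDF download.

    Hypothetical helper: checks for HTTP 200 and for the b'%PDF-' signature
    that PDF files begin with.
    """
    return status_code == 200 and content.startswith(b"%PDF-")
```

In the example above, `if response.status_code == 200:` could become `if is_pdf_download(response.status_code, response.content):`.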

2. Retrieve full-text of multiple articles

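In this example, we attempt to download six articles from the Wiley Online Library. Five of the DOIs are valid. The last one does not correspond to any article and is included to show how a failed request is handled: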

# DOIs of articles to download
dois = [
    '10.1111/j.1467-8624.2010.01564.x',
    '10.1111/1467-8624.00164',
    '10.1111/cdev.12864',
    '10.1111/j.1467-8624.2007.00995.x',
    '10.1111/j.1467-8624.2010.01499.x',
    '10.1111/j.1467-8624.2010.0149.x'       # Invalid DOI; the API responds with status code 404
]

# Include token in header
headers = {
    "Wiley-TDM-Client-Token": wiley_token
}

# Send an HTTP request for each DOI
for doi in dois:

    # Construct URL
    url = f'https://api.wiley.com/onlinelibrary/tdm/v1/articles/{doi}'

    # Make a GET request to the Wiley TDM API
    response = requests.get(url, headers=headers)

    # Download PDF if status code indicates success
    if response.status_code == 200:

        # Name file after the DOI ('/' is not allowed in file names)
        filename = f"{doi.replace('/', '_')}.pdf"

        # Write data to PDF file
        with open(filename, 'wb') as file:
            file.write(response.content)

        print(f'{filename} downloaded successfully')

    # Print status code if unsuccessful
    else:
        print(f'Failed to download PDF for {doi}. Status code: {response.status_code}')

    # Wait 1 second between requests to stay well under the 3 requests/second limit
    sleep(1)
10.1111_j.1467-8624.2010.01564.x.pdf downloaded successfully
10.1111_1467-8624.00164.pdf downloaded successfully
10.1111_cdev.12864.pdf downloaded successfully
10.1111_j.1467-8624.2007.00995.x.pdf downloaded successfully
10.1111_j.1467-8624.2010.01499.x.pdf downloaded successfully
Failed to download PDF for 10.1111/j.1467-8624.2010.0149.x. Status code: 404
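The loop above waits a fixed second between requests and gives up on any non-200 response. If a request is rejected because the server is overloaded or the rate limit was exceeded, a common pattern is to retry after a growing delay. The sketch below assumes the API signals this with HTTP 429 (Too Many Requests) or 503 (Service Unavailable). Those codes are a generic HTTP convention, not behavior documented for the Wiley TDM API in this tutorial. The hypothetical `get_with_backoff` function takes the request function as its `get` parameter (for example `requests.get`), so the retry logic can be tested with a stand-in.

```python
import time

# Assumed "try again later" status codes (generic HTTP convention)
RETRY_STATUSES = {429, 503}


def get_with_backoff(url, headers, get, max_retries=3, base_delay=1.0, sleeper=time.sleep):
    """Send a GET request, retrying with exponential backoff on RETRY_STATUSES.

    `get` is the function that sends the request, e.g. requests.get.
    Returns the last response received, whatever its status code.
    """
    response = get(url, headers=headers)
    for attempt in range(max_retries):
        if response.status_code not in RETRY_STATUSES:
            break
        # Wait base_delay, then 2x, 4x, ... before the next attempt
        sleeper(base_delay * 2 ** attempt)
        response = get(url, headers=headers)
    return response
```

Inside the loop, `response = requests.get(url, headers=headers)` would become `response = get_with_backoff(url, headers, requests.get)`. A 404, like the one for the last DOI above, is returned immediately without retrying.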