PubChem API in Python#

By Avery Fernandez and Michael T. Moen

PubChem provides programmatic access to chemical data and bioactivity information from the National Center for Biotechnology Information (NCBI), enabling efficient retrieval and analysis of chemical structures, identifiers, properties, and associated biological activities.

Please see the following resources for more information on API usage:

NOTE: PubChem limits usage to a maximum of 5 requests per second.
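One way to stay under this limit is to space out calls with a small throttle. The sketch below is illustrative only: `Throttle` is a hypothetical helper (not part of `requests` or any PubChem client), and it assumes that a minimum gap of 1/5 second between calls is enough to respect the limit.

```python
import time


class Throttle:
    """Hypothetical helper: space out calls to at most `max_per_second`."""

    def __init__(self, max_per_second=5, clock=time.monotonic, sleeper=time.sleep):
        self.min_interval = 1.0 / max_per_second
        self._clock = clock
        self._sleeper = sleeper
        self._last_call = None

    def wait(self):
        """Block until at least `min_interval` has passed since the last call."""
        now = self._clock()
        if self._last_call is not None:
            remaining = self.min_interval - (now - self._last_call)
            if remaining > 0:
                self._sleeper(remaining)
                now = self._clock()
        self._last_call = now


throttle = Throttle(max_per_second=5)
for _ in range(3):
    throttle.wait()  # call this immediately before each API request
```

Calling `throttle.wait()` before every request keeps the pacing logic in one place instead of scattering fixed `sleep` calls through the code.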

These recipe examples were tested on January 20, 2026.

Attribution: This tutorial was adapted from supporting information in:

Scalfani, V. F.; Ralph, S. C.; Alshaikh, A. A.; Bara, J. E. Programmatic Compilation of Chemical Data and Literature From PubChem Using Matlab. Chemical Engineering Education, 2020, 54, 230. https://doi.org/10.18260/2-1-370.660-115508 and vfscalfani/MATLAB-cheminformatics

Setup#

The following external libraries need to be installed into your environment to run the code examples in this tutorial: pandas, requests, and matplotlib.

We import the libraries used in this tutorial below:

import pandas as pd
import requests
from time import sleep
import matplotlib.pyplot as plt
import matplotlib.image as mpimg

1. PubChem Similarity#

Get Compound Image#

We can search for a compound and display an image. In this example, we look at 1-Butyl-3-methyl-imidazolium, which has a compound ID (CID) of 2734162.

BASE_URL = "https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/"
compoundID = "2734162"

# Request the structure image and stop early on an HTTP error
response = requests.get(f"{BASE_URL}cid/{compoundID}/PNG")
response.raise_for_status()
img = response.content

# Save PNG to file
with open(f"{compoundID}.png", "wb") as out:
    out.write(img)
# Display compound PNG with matplotlib
img = mpimg.imread(f"{compoundID}.png")
plt.imshow(img)
plt.axis("off")
plt.show()

Retrieve InChI and Isomeric SMILES#

An International Chemical Identifier (InChI) is a textual representation of a substance’s molecular structure.

properties = ["IsomericSMILES", "InChI"]
response = requests.get(
    f"{BASE_URL}cid/{compoundID}/property/{','.join(properties)}/JSON"
)
data = response.json()

# Display the response data
data
{'PropertyTable': {'Properties': [{'CID': 2734162,
    'SMILES': 'CCCCN1C=C[N+](=C1)C',
    'InChI': 'InChI=1S/C8H15N2/c1-3-4-5-10-7-6-9(2)8-10/h6-8H,3-5H2,1-2H3/q+1'}]}}
# Extract InChI
data["PropertyTable"]["Properties"][0]["InChI"]
'InChI=1S/C8H15N2/c1-3-4-5-10-7-6-9(2)8-10/h6-8H,3-5H2,1-2H3/q+1'

Isomeric SMILES is a textual representation of molecules that includes stereochemical and isotopic information.

# Extract Isomeric SMILES
data["PropertyTable"]["Properties"][0]["SMILES"]
'CCCCN1C=C[N+](=C1)C'
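The request above asks for `IsomericSMILES`, while the response shown stores the value under the key `SMILES`. A lookup that tolerates either key name may be more robust if the returned key differs from the requested one. The sketch below uses the sample response copied from the output above; `get_property` is a hypothetical helper introduced only for illustration.

```python
def get_property(record, *names):
    """Return the value of the first key in `names` that exists in `record`."""
    for name in names:
        if name in record:
            return record[name]
    raise KeyError(f"None of {names} found in record")


# Sample response copied from the output shown in this section
sample = {
    "PropertyTable": {
        "Properties": [
            {
                "CID": 2734162,
                "SMILES": "CCCCN1C=C[N+](=C1)C",
                "InChI": "InChI=1S/C8H15N2/c1-3-4-5-10-7-6-9(2)8-10/h6-8H,3-5H2,1-2H3/q+1",
            }
        ]
    }
}

record = sample["PropertyTable"]["Properties"][0]
smiles = get_property(record, "IsomericSMILES", "SMILES")
print(smiles)
```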

Retrieve Identifier and Property Data#

Get the following data for the retrieved compounds: InChI, IsomericSMILES, MolecularWeight, HeavyAtomCount, RotatableBondCount, and Charge.
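The loop that follows iterates over `id_list`, which is not defined in this section. The sketch below shows one way such a list could be produced, assuming it holds the CIDs returned by a PubChem 2D similarity search seeded with the compound above. The function names and the threshold of 95 are illustrative assumptions. The sketch uses the standard library's `urllib` so that it has no third-party dependency; `requests.get` could be substituted.

```python
import json
from urllib.request import urlopen

BASE_URL = "https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/"


def similarity_url(cid, threshold=95):
    """Build a PUG REST 2D similarity search URL (threshold is a percentage)."""
    return f"{BASE_URL}fastsimilarity_2d/cid/{cid}/cids/JSON?Threshold={threshold}"


def parse_cids(payload):
    """Pull the list of CIDs out of an IdentifierList response."""
    return payload.get("IdentifierList", {}).get("CID", [])


def fetch_similar_cids(cid, threshold=95, timeout=30):
    """Run the search and return matching CIDs (makes a network request)."""
    with urlopen(similarity_url(cid, threshold), timeout=timeout) as response:
        return parse_cids(json.load(response))


# Example (assumption: uncomment to query the live service):
# id_list = fetch_similar_cids("2734162", threshold=95)
```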

compound_data = []
properties = ["InChI", "IsomericSMILES", "MolecularWeight", 
              "HeavyAtomCount", "RotatableBondCount", "Charge"]

for cid in id_list[:25]:
    try:
        response = requests.get(
            f"{BASE_URL}cid/{cid}/property/{','.join(properties)}/JSON"
        )
        sleep(.25)
        response.raise_for_status()
        data = response.json()
        compound_data.append(data["PropertyTable"]["Properties"][0])
    except requests.exceptions.RequestException as e:
        print(f"Error fetching properties for CID {cid}: {e}")

# Convert results to a DataFrame
df = pd.DataFrame(compound_data)
df.head()
   CID       MolecularWeight  SMILES                     InChI                                              Charge  RotatableBondCount  HeavyAtomCount
0  61347     124.18           CCCCN1C=CN=C1              InChI=1S/C7H12N2/c1-2-3-5-9-6-4-8-7-9/h4,6-7H,...  0       3                   9
1  529334    138.21           CCCCCN1C=CN=C1             InChI=1S/C8H14N2/c1-2-3-4-6-10-7-5-9-8-10/h5,7...  0       4                   10
2  2734161   174.67           CCCCN1C=C[N+](=C1)C.[Cl-]  InChI=1S/C8H15N2.ClH/c1-3-4-5-10-7-6-9(2)8-10;...  0       3                   11
3  118785    110.16           CCCN1C=CN=C1               InChI=1S/C6H10N2/c1-2-4-8-5-3-7-6-8/h3,5-6H,2,...  0       2                   8
4  12971008  252.10           CCCN1C=C[N+](=C1)C.[I-]    InChI=1S/C7H13N2.HI/c1-3-4-9-6-5-8(2)7-9;/h5-7...  0       2                   10
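The loop above sends one request per CID. PUG REST also accepts a comma-separated list of CIDs in a single property request, which can reduce the number of calls. The sketch below only builds the request URLs; `chunked` and `batch_property_urls` are hypothetical helpers, and the batch size of 25 is an arbitrary choice rather than a documented limit.

```python
BASE_URL = "https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/"


def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def batch_property_urls(cids, properties, batch_size=25):
    """Build one property-request URL per batch of CIDs."""
    prop_part = ",".join(properties)
    urls = []
    for batch in chunked(list(cids), batch_size):
        cid_part = ",".join(str(cid) for cid in batch)
        urls.append(f"{BASE_URL}cid/{cid_part}/property/{prop_part}/JSON")
    return urls


# CIDs taken from the table above
example_urls = batch_property_urls(
    [61347, 529334, 2734161], ["InChI", "MolecularWeight"], batch_size=2
)
for url in example_urls:
    print(url)
```

Each batched response should contain one entry per CID under `PropertyTable` -> `Properties`, so the list can be passed to `pd.DataFrame` in the same way as `compound_data`.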