PubChem API in R

by Vishank Patel, Adam M. Nguyen, and Michael T. Moen

PubChem provides programmatic access to chemical data and bioactivity information from the National Center for Biotechnology Information (NCBI), enabling efficient retrieval and analysis of chemical structures, identifiers, properties, and associated biological activities.

Please see the following resources for more information on API usage:

NOTE: Please see the official documentation for access details and rate limits for this API.

These recipe examples were tested on March 24, 2026.

Attribution: This tutorial was adapted from supporting information in:

Scalfani, V. F.; Ralph, S. C.; Alshaikh, A. A.; Bara, J. E. Programmatic Compilation of Chemical Data and Literature From PubChem Using Matlab. Chemical Engineering Education, 2020, 54, 230. https://doi.org/10.18260/2-1-370.660-115508 and UA-Libraries-Research-Data-Services/MATLAB-cheminformatics

Setup

The httr, jsonlite, and magick packages need to be installed in your environment to run the code examples in this tutorial. Any missing packages can be installed with install.packages().
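If you are unsure which of these packages are already installed, you can check first and install only the missing ones. This is a minimal sketch that uses only base R's `requireNamespace()`. `missing_packages()` is a hypothetical helper name and is not part of any package. The install line is left commented out so that running the block never installs anything by accident.

```r
# Hypothetical helper: return the names of packages that cannot be loaded.
# requireNamespace() returns FALSE (quietly) when a package is not installed.
missing_packages <- function(pkgs) {
  available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!available]
}

required <- c("httr", "jsonlite", "magick")
to_install <- missing_packages(required)

# Uncomment to install anything that is missing:
# if (length(to_install) > 0) install.packages(to_install)
```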

We load the libraries used in this tutorial below:

library(httr)
library(jsonlite)
library(magick)

1. PubChem Similarity

Get Compound Image

We can look up a compound and display an image of its structure. In this example, we look at 1-butyl-3-methylimidazolium, which has a PubChem compound ID (CID) of 2734162.

BASE_URL <- "https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/"
compoundID <- "2734162"

cid_url <- paste0(BASE_URL, "cid/", compoundID, "/PNG")

# Read the PNG at cid_url; printing the resulting magick image displays it
image_read(cid_url)
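PUG REST also appears to accept an `image_size` query option for PNG output, for example `small`, `large`, or an explicit `WIDTHxHEIGHT` such as `500x500`. The sketch below assumes that option and builds the URL with a hypothetical helper, `png_url()`. It repeats `BASE_URL` so that the block runs on its own. The commented lines show how the image could then be saved to disk with magick's `image_write()`. The file name `bmim.png` is only an example.

```r
BASE_URL <- "https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/"

# Hypothetical helper: build a PNG URL for a CID, optionally adding the
# image_size query option (assumed values: "small", "large", or "WIDTHxHEIGHT").
png_url <- function(cid, image_size = NULL) {
  url <- paste0(BASE_URL, "cid/", cid, "/PNG")
  if (!is.null(image_size)) {
    url <- paste0(url, "?image_size=", image_size)
  }
  url
}

large_url <- png_url("2734162", image_size = "large")

# With magick installed, the image can be read and saved to a file:
# img <- magick::image_read(large_url)
# magick::image_write(img, path = "bmim.png", format = "png")
```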

Retrieve InChI and Isomeric SMILES

An International Chemical Identifier (InChI) is a textual representation of a substance’s molecular structure.

inchi_url <- paste0(BASE_URL, "cid/", compoundID, "/property/inchi/TXT")

# "$content" holds the raw bytes of the HTTP response body; rawToChar() converts them to text
raw_inchi <- rawToChar(GET(inchi_url)$content)

# Remove the trailing newline from the plain-text response
inchi <- sub("\n", "", raw_inchi)
inchi
## [1] "InChI=1S/C8H15N2/c1-3-4-5-10-7-6-9(2)8-10/h6-8H,3-5H2,1-2H3/q+1"
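A standard InChI string is built from layers separated by `/`. The `InChI=1S` prefix (version 1, standard) is followed by the molecular formula and then by layers identified by a leading letter, such as `c` (connectivity), `h` (hydrogen atoms), and `q` (charge). Splitting on `/` makes this structure visible. `inchi_layers()` is a hypothetical helper for illustration only. It labels each layer by its first character and does not interpret stereo, isotope, or multi-component layers.

```r
# Hypothetical helper: split a standard InChI string into its "/"-separated layers.
inchi_layers <- function(inchi) {
  parts <- strsplit(sub("^InChI=", "", inchi), "/", fixed = TRUE)[[1]]
  # parts[1] is the version (e.g. "1S") and parts[2] is the molecular formula;
  # each remaining layer is named by its leading letter (c, h, q, ...).
  rest <- parts[-(1:2)]
  layers <- substring(rest, 2)
  names(layers) <- substr(rest, 1, 1)
  c(version = parts[1], formula = parts[2], layers)
}

example_inchi <- "InChI=1S/C8H15N2/c1-3-4-5-10-7-6-9(2)8-10/h6-8H,3-5H2,1-2H3/q+1"
inchi_layers(example_inchi)
```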

Isomeric SMILES is a textual representation of molecules that includes stereochemical and isotopic information.

IS_url <- paste0(BASE_URL, "cid/", compoundID, "/property/IsomericSMILES/TXT")

raw_IS <- rawToChar(GET(IS_url)$content)
IS <- sub("\n", "", raw_IS)
IS
## [1] "CCCCN1C=C[N+](=C1)C"
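The property loop in the next section iterates over a vector `CIDs` that is not defined in the code shown here. Given the section title, these CIDs presumably come from a similarity search on the query compound. One way to produce such a vector is PUG REST's `fastsimilarity_2d` search, which returns CIDs whose PubChem 2D fingerprints meet a Tanimoto `Threshold` (0 to 100). The sketch below assumes that endpoint and a CID as input. The helper names are hypothetical, and the threshold of 95 is an illustrative choice that may differ from the one used to produce the results shown later. The network function is defined but not called.

```r
BASE_URL <- "https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/"

# Hypothetical helper: URL for a 2D similarity search seeded by a CID,
# asking for the matching CIDs as plain text (one per line).
similarity_url <- function(cid, threshold = 95) {
  paste0(BASE_URL, "fastsimilarity_2d/cid/", cid, "/cids/TXT?Threshold=", threshold)
}

# Split a plain-text CID list into a character vector, dropping blank lines.
parse_cid_list <- function(txt) {
  lines <- trimws(strsplit(txt, "\n", fixed = TRUE)[[1]])
  lines[nzchar(lines)]
}

# Network call (requires httr); stops with an error on a non-2xx response.
get_similar_cids <- function(cid, threshold = 95) {
  resp <- httr::GET(similarity_url(cid, threshold))
  httr::stop_for_status(resp)
  parse_cid_list(httr::content(resp, as = "text", encoding = "UTF-8"))
}

# Example usage (not run here):
# CIDs <- get_similar_cids("2734162", threshold = 95)
```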

Retrieve Identifier and Property Data

Get the following data for the retrieved compounds: InChI, ConnectivitySMILES, MolecularWeight, IUPACName, HeavyAtomCount, CovalentUnitCount, and Charge.

properties <- c("InChI", "ConnectivitySMILES", "MolecularWeight", "IUPACName",
                "HeavyAtomCount", "CovalentUnitCount", "Charge")

# Merge properties into a comma-delimited string for the API call
properties_arg <- paste(properties, collapse = ",")

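# CIDs holds compound IDs from a similarity search (that step is not shown in this section).
# In this example, we only look at the first 10; head() avoids NA values if fewer are returned
results <- list()
for (cid in head(CIDs, 10)) {
  url <- paste0(BASE_URL, "cid/", cid, "/property/", properties_arg, "/JSON")
  result <- sub("\n", "", rawToChar(GET(url)$content))
  results <- append(results, list(fromJSON(result)$PropertyTable$Properties))
  # Pause between requests to stay under PubChem's limit of 5 requests per second
  Sys.sleep(0.25)
}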

similarity_results_df <- do.call(rbind, results)
head(similarity_results_df)
##        CID MolecularWeight        ConnectivitySMILES
## 1    61347          124.18             CCCCN1C=CN=C1
## 2   529334          138.21            CCCCCN1C=CN=C1
## 3  2734161          174.67 CCCCN1C=C[N+](=C1)C.[Cl-]
## 4   118785          110.16              CCCN1C=CN=C1
## 5 12971008          252.10   CCCN1C=C[N+](=C1)C.[I-]
## 6   304622          138.21            CCCCN1C=CN=C1C
##                                                                          InChI
## 1                    InChI=1S/C7H12N2/c1-2-3-5-9-6-4-8-7-9/h4,6-7H,2-3,5H2,1H3
## 2                InChI=1S/C8H14N2/c1-2-3-4-6-10-7-5-9-8-10/h5,7-8H,2-4,6H2,1H3
## 3 InChI=1S/C8H15N2.ClH/c1-3-4-5-10-7-6-9(2)8-10;/h6-8H,3-5H2,1-2H3;1H/q+1;/p-1
## 4                        InChI=1S/C6H10N2/c1-2-4-8-5-3-7-6-8/h3,5-6H,2,4H2,1H3
## 5      InChI=1S/C7H13N2.HI/c1-3-4-9-6-5-8(2)7-9;/h5-7H,3-4H2,1-2H3;1H/q+1;/p-1
## 6                InChI=1S/C8H14N2/c1-3-4-6-10-7-5-9-8(10)2/h5,7H,3-4,6H2,1-2H3
##                                 IUPACName Charge HeavyAtomCount
## 1                        1-butylimidazole      0              9
## 2                       1-pentylimidazole      0             10
## 3 1-butyl-3-methylimidazol-3-ium chloride      0             11
## 4                       1-propylimidazole      0              8
## 5  1-methyl-3-propylimidazol-1-ium iodide      0             10
## 6               1-butyl-2-methylimidazole      0             10
##   CovalentUnitCount
## 1                 1
## 2                 1
## 3                 2
## 4                 1
## 5                 2
## 6                 1
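The loop above sends one request per CID. PUG REST also accepts a comma-separated list of CIDs in a single property request, which cuts the number of calls counted against the rate limit. The sketch below assumes the same `BASE_URL` and property list, both repeated here so that the block runs on its own. `batch_property_url()` and `get_properties()` are hypothetical helper names, and the network function is defined but not called.

```r
BASE_URL <- "https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/"

properties <- c("InChI", "ConnectivitySMILES", "MolecularWeight", "IUPACName",
                "HeavyAtomCount", "CovalentUnitCount", "Charge")

# Hypothetical helper: a single URL requesting all properties for several CIDs.
batch_property_url <- function(cids, properties) {
  paste0(BASE_URL, "cid/", paste(cids, collapse = ","),
         "/property/", paste(properties, collapse = ","), "/JSON")
}

# Network call (requires httr and jsonlite): returns a data frame, one row per CID.
get_properties <- function(cids, properties) {
  resp <- httr::GET(batch_property_url(cids, properties))
  httr::stop_for_status(resp)
  json <- httr::content(resp, as = "text", encoding = "UTF-8")
  jsonlite::fromJSON(json)$PropertyTable$Properties
}

# Example usage (not run here):
# similarity_results_df <- get_properties(head(CIDs, 10), properties)
```

Because `fromJSON()` simplifies the whole array of records at once, a property that is missing for one compound becomes `NA` instead of causing a row-bind mismatch. Very long CID lists may run into URL length limits, so the list may need to be split into chunks.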