College Scorecard API in Python#

by Michael T. Moen

The College Scorecard API is an online tool hosted by the U.S. Department of Education that contains data concerning higher education institutions.

Please see the following resources for more information on API usage:

NOTE: The College Scorecard API limits each IP address to a maximum of 1,000 requests per hour.

These recipe examples were tested on February 3, 2026.

Setup#

Import Libraries#

The following external libraries need to be installed into your environment to run the code examples in this tutorial:

  • requests

  • pandas

  • numpy

  • matplotlib

  • python-dotenv

We import the libraries used in this tutorial below:

import requests
from pprint import pprint
from time import sleep
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from dotenv import load_dotenv
import os

Import API Key#

An API key is required to access the College Scorecard API. You can sign up for one at the College Scorecard Website.

We keep our API key in a .env file and use the dotenv library to access it. If you would like to use this method, create a .env file and add the following line to it:

COLLEGE_SCORECARD_API_KEY=PUT_YOUR_API_KEY_HERE

load_dotenv()
try:
    API_KEY = os.environ["COLLEGE_SCORECARD_API_KEY"]
except KeyError:
    print("API key not found. Please set 'COLLEGE_SCORECARD_API_KEY' in your .env file.")
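If the key is missing, the block above only prints a message, so later cells would fail with a NameError when they reference API_KEY. As a minimal sketch of a stricter alternative, the helper below raises a descriptive error instead. The name get_api_key is hypothetical and is not part of dotenv or the API.

```python
import os


def get_api_key(var_name="COLLEGE_SCORECARD_API_KEY"):
    """Return the API key stored in the given environment variable.

    Hypothetical helper for illustration: it raises instead of printing,
    so a missing key is caught before any request is sent.
    """
    key = os.environ.get(var_name)
    if not key:
        raise RuntimeError(
            f"API key not found. Please set '{var_name}' in your .env file."
        )
    return key
```

After calling load_dotenv(), this could be used as API_KEY = get_api_key().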

1. Get Names of All Institutions#

To start, we’ll use a basic query to find the names of all educational institutions recognized by the College Scorecard API.

All of the data for the API can be found using the v1/schools endpoint.

Fields in the College Scorecard API are accessed with a <time>.<category>.<name> sequence:

  • <time> indicates the year of the data to be accessed. To access the most recent data, use latest.

  • <category> and <name> can be found in the Data Dictionary file that can be downloaded from the API’s documentation. The <category> of a field is given by the dev-category column in the Institution_Data_Dictionary section, and the <name> is given by the developer-friendly name column.
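To make the naming scheme concrete, here is a small sketch that assembles field names from their parts. The helper name build_field is hypothetical. Some fields used in this tutorial, such as school.name, are written without a <time> prefix, so the helper treats that part as optional.

```python
def build_field(category, name, time=None):
    """Join the parts of a College Scorecard field name with dots.

    Hypothetical helper: `time` is a year or 'latest'; pass None for
    fields that are written without a time prefix.
    """
    parts = [category, name] if time is None else [str(time), category, name]
    return '.'.join(parts)


print(build_field('student', 'size', time='latest'))  # latest.student.size
print(build_field('school', 'name'))                  # school.name
```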

BASE_URL = 'https://api.data.gov/ed/collegescorecard/v1/schools'
params = {
    'fields': 'school.name',
    'api_key': API_KEY
}

names = requests.get(BASE_URL, params=params).json()

# Display resulting metadata
names['metadata']
{'page': 0, 'total': 6429, 'per_page': 20}

The total value indicates the total number of results matched by this query. The results are paginated, so each request returns at most the number of results indicated by per_page, which has a default value of 20 and a maximum value of 100. The page number is indicated by page, which starts at 0 by default.
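As a quick worked example of the pagination arithmetic, the sketch below computes how many requests are needed to retrieve all 6429 results reported in the metadata. The function name pages_needed is our own. It uses ceiling division so that a total that is an exact multiple of the page size does not produce an extra, empty page.

```python
import math


def pages_needed(total, per_page):
    """Number of pages required to cover `total` results."""
    return math.ceil(total / per_page)


print(pages_needed(6429, 20))   # 322 requests at the default page size
print(pages_needed(6429, 100))  # 65 requests at the maximum page size
print(pages_needed(200, 100))   # 2, not 3
```

At the maximum page size, the full listing takes 65 requests, well within the hourly limit of 1000.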

We can use a loop to create an API request for each page:

field = 'school.name'
sort_key = 'school.name'
page_size = 100

# Calculate the number of requests needed to page through every result (ceiling division)
total_pages = (names['metadata']['total'] + page_size - 1) // page_size

institution_names = []

# Loop through each page of the dataset, sending a request for each page
for page_number in range(total_pages):

    params = {
        'fields': field,
        'page': page_number,
        'per_page': page_size,
        'sort': sort_key,
        'api_key': API_KEY
    }
    name_data = requests.get(BASE_URL, params=params).json()['results']

    for university in name_data:
        institution_names.append(university['school.name'])

    # Wait 1 second between API calls to be nicer on the host servers
    sleep(1)

# Display number of institution names found
len(institution_names)
6429
# Print first 10 institution names
institution_names[:10]
['A Better U Beauty Barber Academy',
 'A T Still University of Health Sciences',
 'Aaniiih Nakoda College',
 'ABC Adult School',
 'ABC Adult School - Cabrillo Lane',
 'ABC Beauty Academy',
 'ABCO Technology',
 'Abcott Institute',
 'Abilene Christian University',
 'Abilene Christian University-Undergraduate Online']
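The paging pattern above recurs throughout this tutorial, so it can be wrapped in a reusable generator. This is a sketch under stated assumptions rather than part of the API: iter_results and its get_json argument are hypothetical names, and get_json stands for any callable that takes a params dictionary and returns the decoded JSON response. It is passed in so that the paging logic can be tried without network access, here against made-up data.

```python
from time import sleep


def iter_results(get_json, params, per_page=100, delay=1.0):
    """Yield every result for a query, requesting one page at a time.

    get_json: callable taking a params dict and returning the decoded JSON
              response (an assumption made for this sketch).
    """
    page = 0
    while True:
        response = get_json({**params, 'page': page, 'per_page': per_page})
        results = response['results']
        yield from results
        total = response['metadata']['total']
        # Stop once the pages requested so far cover every result
        if not results or (page + 1) * per_page >= total:
            break
        page += 1
        sleep(delay)  # Pause between calls to be nicer on the host servers


# Offline demonstration with a stand-in for the API (made-up data)
fake_names = [f'School {i}' for i in range(5)]


def fake_get_json(params):
    start = params['page'] * params['per_page']
    page_items = fake_names[start:start + params['per_page']]
    return {
        'metadata': {'total': len(fake_names)},
        'results': [{'school.name': n} for n in page_items],
    }


demo_names = [
    r['school.name']
    for r in iter_results(fake_get_json, {}, per_page=2, delay=0)
]
print(demo_names)
```

With the real API, get_json could be a small wrapper such as lambda p: requests.get(BASE_URL, params={**p, 'api_key': API_KEY}).json().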

2. Get Names of All Universities#

College Scorecard API requests can also take conditions to only select certain institutions.

In this example, we limit the results to institutions that award graduate degrees. To do this, we set the latest.school.degrees_awarded.highest parameter to 4, which indicates that the highest degree awarded by an institution is a graduate degree. This code is documented in the Institution_Data_Dictionary section of the College Scorecard data dictionary.

field = 'school.name'
sort_key = 'school.name'
page_size = 100

# Calculate the number of loops needed to page through every result
params = {
    'fields': field,
    'latest.school.degrees_awarded.highest': 4,
    'api_key': API_KEY
}
name_metadata = requests.get(BASE_URL, params=params).json()['metadata']
total_pages = (name_metadata['total'] + page_size - 1) // page_size  # Ceiling division

university_names = []

for page_number in range(total_pages):

    params = {
        'fields': field,
        'latest.school.degrees_awarded.highest': 4,
        'sort': sort_key,
        'page': page_number,
        'per_page': page_size,
        'api_key': API_KEY
    }
    name_data = requests.get(BASE_URL, params=params).json()['results']

    for university in name_data:
        university_names.append(university['school.name'])

    # Wait 1 second between API calls to be nicer on the host servers
    sleep(1)

# Display the number of university names found
len(university_names)
1985
# Print first 10 university names
university_names[:10]
['A T Still University of Health Sciences',
 'Abilene Christian University',
 'Abraham Lincoln University',
 'Academy for Five Element Acupuncture',
 'Academy for Jewish Religion',
 'Academy for Jewish Religion California',
 'Academy of Art University',
 'Academy of Chinese Culture and Health Sciences',
 'Academy of Vocal Arts',
 'Acupuncture and Integrative Medicine College-Berkeley']

3. Find Number of Universities by State#

The school.state_fips data element contains a numeric FIPS code identifying the state or territory where each institution is located. The mapping from codes to names is given below:

states = {
    1: 'Alabama', 2: 'Alaska', 4: 'Arizona', 5: 'Arkansas', 6: 'California', 8: 'Colorado',
    9: 'Connecticut', 10: 'Delaware', 11: 'District of Columbia', 12: 'Florida', 13: 'Georgia',
    15: 'Hawaii', 16: 'Idaho', 17: 'Illinois', 18: 'Indiana', 19: 'Iowa', 20: 'Kansas',
    21: 'Kentucky', 22: 'Louisiana', 23: 'Maine', 24: 'Maryland', 25: 'Massachusetts',
    26: 'Michigan', 27: 'Minnesota', 28: 'Mississippi', 29: 'Missouri', 30: 'Montana',
    31: 'Nebraska', 32: 'Nevada', 33: 'New Hampshire', 34: 'New Jersey', 35: 'New Mexico',
    36: 'New York', 37: 'North Carolina', 38: 'North Dakota', 39: 'Ohio', 40: 'Oklahoma',
    41: 'Oregon', 42: 'Pennsylvania', 44: 'Rhode Island', 45: 'South Carolina',
    46: 'South Dakota', 47: 'Tennessee', 48: 'Texas', 49: 'Utah', 50: 'Vermont',
    51: 'Virginia', 53: 'Washington', 54: 'West Virginia', 55: 'Wisconsin', 56: 'Wyoming',
    60: 'American Samoa', 64: 'Federated States of Micronesia', 66: 'Guam',
    69: 'Northern Mariana Islands', 70: 'Palau', 72: 'Puerto Rico', 78: 'Virgin Islands'
}

Using this mapping, we can find the number of universities in each state:

field = 'latest.school.state_fips'
page_size = 100

# Calculate the number of loops needed to page through every result
params = {
    'latest.school.degrees_awarded.highest': 4,
    'fields': field,
    'api_key': API_KEY
}
name_metadata = requests.get(BASE_URL, params=params).json()['metadata']
total_pages = (name_metadata['total'] + page_size - 1) // page_size  # Ceiling division

state_freq = {}
for page_number in range(total_pages):

    params = {
        'latest.school.degrees_awarded.highest': 4,
        'fields': field,
        'page': page_number,
        'per_page': page_size,
        'api_key': API_KEY
    }
    state_data = requests.get(BASE_URL, params=params).json()['results']

    for university in state_data:
        state = states[university['latest.school.state_fips']]
        state_freq[state] = state_freq.get(state, 0) + 1
    
    # Wait 1 second between API calls to be nicer on the host servers
    sleep(1)

Now, we can sort and display the results:

# Sort states by number of universities in descending order
sorted_states = sorted(state_freq.items(), key=lambda x: x[1], reverse=True)

# Print the top 20 states/territories with the most universities
for state_name, num_universities in sorted_states[:20]:
    print(f'{state_name:<15} {num_universities}')
California      201
New York        154
Pennsylvania    112
Texas           103
Illinois        79
Florida         75
Massachusetts   73
Ohio            69
North Carolina  57
Missouri        55
Virginia        53
Indiana         50
Georgia         49
Tennessee       47
Puerto Rico     47
Michigan        46
Minnesota       39
New Jersey      38
Wisconsin       37
South Carolina  35
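The tally above builds the counts by hand with dict.get. As an alternative sketch, collections.Counter from the standard library performs the same counting and sorting. The FIPS codes in sample_fips are made up for illustration and stand in for the values returned by the API. The shortened states_subset mapping is likewise only for the demo.

```python
from collections import Counter

# Made-up sample of FIPS codes, standing in for API results
sample_fips = [1, 6, 6, 36, 6, 1, 48]
states_subset = {1: 'Alabama', 6: 'California', 36: 'New York', 48: 'Texas'}

state_counts = Counter(states_subset[code] for code in sample_fips)

# most_common returns (state, count) pairs sorted by count, descending
for state_name, num_universities in state_counts.most_common(2):
    print(f'{state_name:<15} {num_universities}')
```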

4. Retrieving Multiple Data Points in a Single Query#

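The following example uses multiple conditions and multiple fields. In the request URL, conditions are separated by & while fields are separated by ,. Because we pass a params dictionary to requests, the & separators are added for us, so we only need to join the fields with commas ourselves.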
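To see what these separators look like in an actual query string, the sketch below encodes a params dictionary with urllib.parse.urlencode from the standard library. This only approximates what requests does when it is given params, and the api_key value is a placeholder rather than a working key.

```python
from urllib.parse import urlencode

demo_params = {
    'fields': ','.join(['school.name', 'latest.student.size']),
    'latest.school.degrees_awarded.highest': 4,
    'latest.student.size__range': '1000..',
    'api_key': 'YOUR_API_KEY',  # Placeholder, not a real key
}

# Key/value pairs are joined with '&'; the comma is percent-encoded as %2C
query_string = urlencode(demo_params)
print(query_string)
```

The printed string joins the four parameters with &, and the comma between the two field names appears in its percent-encoded form, %2C.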

# Use .join to add ',' between the elements in the list of fields
fields = ','.join([
    'school.name',
    'latest.admissions.admission_rate.overall',
    'latest.student.size',
    'latest.cost.tuition.out_of_state',
    'latest.cost.tuition.in_state',
    'latest.student.demographics.median_hh_income',
    'latest.school.endowment.begin'
])
sort_key = 'school.name'
page_size = 100

# Calculate the number of loops needed to page through every result
params = {
    'fields': fields,
    'latest.school.degrees_awarded.highest': 4,
    'latest.student.size__range': '1000..', # Schools with 1000 or more students
    'api_key': API_KEY
}
name_metadata = requests.get(BASE_URL, params=params).json()['metadata']
total_pages = (name_metadata['total'] + page_size - 1) // page_size  # Ceiling division

rows = []

for page_number in range(total_pages):

    params = {
        'fields': fields,
        'latest.school.degrees_awarded.highest': 4,
        'latest.student.size__range': '1000..',
        'page': page_number,
        'per_page': page_size,
        'sort': sort_key,
        'api_key': API_KEY
    }
    data = requests.get(BASE_URL, params=params).json()['results']

    for university in data:
        rows.append([
            university['school.name'],
            university['latest.admissions.admission_rate.overall'],
            university['latest.student.size'],
            university['latest.cost.tuition.out_of_state'],
            university['latest.cost.tuition.in_state'],
            university['latest.student.demographics.median_hh_income'],
            university['latest.school.endowment.begin']
        ])
    
    # Wait 1 second between API calls to be nicer on the host servers
    sleep(1)

columns = ['Name', 'Admission Rate', 'Size', 'Tuition Out of State', 'Tuition In State',
           'Median Household Income', 'Endowment']
df = pd.DataFrame(rows, columns=columns)

# Display the dataframe
df
      Name                            Admission Rate  Size   Tuition Out of State  Tuition In State  Median Household Income  Endowment
0     Abilene Christian University    0.6388          3129   42380.0               42380.0           67136.0                  6.055984e+08
1     Academy of Art University       NaN             4131   28024.0               28024.0           74015.0                  NaN
2     Adams State University          NaN             1239   21848.0               9776.0            50726.0                  6.267400e+04
3     Adelphi University              0.7751          5077   47290.0               47290.0           80864.0                  2.123457e+08
4     Adrian College                  0.6837          1635   40556.0               40556.0           66915.0                  4.106072e+07
...   ...                             ...             ...    ...                   ...               ...                      ...
1145  Xavier University of Louisiana  0.7448          2534   27868.0               27868.0           55657.0                  1.627134e+08
1146  Yale University                 0.0450          6811   64700.0               64700.0           75345.0                  4.138326e+10
1147  Yeshiva University              0.6375          2942   49900.0               49900.0           78671.0                  5.183480e+08
1148  York College of Pennsylvania    0.9420          3187   24606.0               24606.0           73378.0                  1.709745e+08
1149  Youngstown State University     0.8032          7340   11151.0               10791.0           51307.0                  3.175548e+08

1150 rows × 7 columns

We can query the resulting dataframe to find the data for specific universities:

df[df['Name'] == 'The University of Alabama']
      Name                       Admission Rate  Size   Tuition Out of State  Tuition In State  Median Household Income  Endowment
829   The University of Alabama  0.7582          32323  33200.0               11900.0           57928.0                  1.144633e+09

We can also query the dataframe to find the data for universities that satisfy certain conditions:

df[df['Admission Rate'] < 0.1]
      Name                                         Admission Rate  Size   Tuition Out of State  Tuition In State  Median Household Income  Endowment
95    Brown University                             0.0523          7273   68230.0               68230.0           79027.0                  6.141243e+09
104   California Institute of Technology           0.0314          1023   63255.0               63255.0           81448.0                  3.390504e+09
183   Columbia University in the City of New York  0.0423          8899   69045.0               69045.0           76971.0                  1.327985e+10
196   Cornell University                           0.0816          15935  66014.0               66014.0           80346.0                  9.349247e+09
215   Dartmouth College                            0.0623          4367   65739.0               65739.0           79834.0                  8.065743e+09
235   Duke University                              0.0678          6417   65805.0               65805.0           78468.0                  1.211626e+10
344   Harvard University                           0.0345          7755   59076.0               59076.0           76879.0                  5.087768e+10
392   Johns Hopkins University                     0.0756          5617   63340.0               63340.0           81539.0                  8.244472e+09
468   Massachusetts Institute of Technology        0.0474          4571   60156.0               60156.0           77426.0                  2.460081e+10
549   New York University                          0.0941          29430  60438.0               60438.0           82106.0                  5.235504e+09
566   Northeastern University                      0.0565          15719  63141.0               63141.0           80190.0                  1.434086e+09
578   Northwestern University                      0.0715          8960   65997.0               65997.0           81811.0                  1.087985e+10
630   Princeton University                         0.0450          5579   59710.0               59710.0           81428.0                  3.512622e+10
652   Rice University                              0.0788          4562   58128.0               58128.0           77707.0                  7.844941e+09
761   Stanford University                          0.0391          7841   62484.0               62484.0           80275.0                  3.633879e+10
888   University of California-Los Angeles         0.0873          33040  44524.0               13747.0           72896.0                  2.903804e+09
899   University of Chicago                        0.0479          7540   66939.0               66939.0           74573.0                  8.569802e+09
999   University of Pennsylvania                   0.0587          10768  66104.0               66104.0           78252.0                  2.072435e+10
1069  Vanderbilt University                        0.0628          7143   63946.0               63946.0           76279.0                  1.020607e+10
1133  Williams College                             0.0999          2060   64860.0               64860.0           77966.0                  3.341413e+09
1146  Yale University                              0.0450          6811   64700.0               64700.0           75345.0                  4.138326e+10

df[df['Endowment'] > 1.0e+10]
      Name                                         Admission Rate  Size   Tuition Out of State  Tuition In State  Median Household Income  Endowment
183   Columbia University in the City of New York  0.0423          8899   69045.0               69045.0           76971.0                  1.327985e+10
235   Duke University                              0.0678          6417   65805.0               65805.0           78468.0                  1.211626e+10
265   Emory University                             0.1110          7275   60774.0               60774.0           80509.0                  1.115540e+10
344   Harvard University                           0.0345          7755   59076.0               59076.0           76879.0                  5.087768e+10
468   Massachusetts Institute of Technology        0.0474          4571   60156.0               60156.0           77426.0                  2.460081e+10
578   Northwestern University                      0.0715          8960   65997.0               65997.0           81811.0                  1.087985e+10
630   Princeton University                         0.0450          5579   59710.0               59710.0           81428.0                  3.512622e+10
761   Stanford University                          0.0391          7841   62484.0               62484.0           80275.0                  3.633879e+10
809   Texas A&M University-College Station         0.6325          59099  40328.0               13099.0           67194.0                  1.721950e+10
955   University of Michigan-Ann Arbor             0.1794          33488  58072.0               17228.0           77145.0                  1.710443e+10
995   University of Notre Dame                     0.1238          8923   62693.0               62693.0           76710.0                  1.710111e+10
999   University of Pennsylvania                   0.0587          10768  66104.0               66104.0           78252.0                  2.072435e+10
1069  Vanderbilt University                        0.0628          7143   63946.0               63946.0           76279.0                  1.020607e+10
1091  Washington University in St Louis            0.1196          7897   62982.0               62982.0           79298.0                  1.224276e+10
1146  Yale University                              0.0450          6811   64700.0               64700.0           75345.0                  4.138326e+10
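Conditions can also be combined. The sketch below rebuilds a four-row sample from the output shown in this section, so it runs without the API, and selects the universities that satisfy both conditions at once. The name sample_df is introduced only for this illustration.

```python
import pandas as pd

# Four rows copied from the results shown in this section
sample_df = pd.DataFrame(
    [
        ['Emory University', 0.1110, 1.115540e+10],
        ['Harvard University', 0.0345, 5.087768e+10],
        ['Texas A&M University-College Station', 0.6325, 1.721950e+10],
        ['Williams College', 0.0999, 3.341413e+09],
    ],
    columns=['Name', 'Admission Rate', 'Endowment'],
)

# Each condition needs its own parentheses; & means "and", | means "or"
selective_and_wealthy = sample_df[
    (sample_df['Admission Rate'] < 0.1) & (sample_df['Endowment'] > 1.0e+10)
]
print(selective_and_wealthy['Name'].tolist())

# Sort by endowment, largest first
by_endowment = sample_df.sort_values('Endowment', ascending=False)
print(by_endowment['Name'].tolist())
```

In this sample, only Harvard University meets both conditions. Williams College falls below the endowment threshold, and the other two have admission rates above 0.1.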

5. Retrieving All Data for an Institution#

The College Scorecard API can also be used to retrieve all of the data for a particular institution. The example below finds all data for The University of Alabama:

params = {
    'school.name': 'The University of Alabama',
    'api_key': API_KEY
}
api_data = requests.get(BASE_URL, params=params).json()

# Print structure of the result
pprint(api_data["results"][0], depth=1)
{'fed_sch_cd': '001051',
 'id': 100751,
 'latest': {...},
 'location': {...},
 'ope6_id': '001051',
 'ope8_id': '00105100',
 'school': {...}}
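Because the full record is a nested dictionary, values are reached by walking down one key at a time. A minimal sketch of a helper that follows a dotted path is shown below. The name get_nested is hypothetical, and sample_record is a heavily trimmed stand-in that reuses a few values printed earlier in this tutorial.

```python
def get_nested(record, dotted_path, default=None):
    """Follow a dotted path such as 'latest.student.size' through nested dicts."""
    current = record
    for key in dotted_path.split('.'):
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


# Trimmed stand-in for api_data['results'][0]
sample_record = {
    'id': 100751,
    'school': {'name': 'The University of Alabama'},
    'latest': {'student': {'size': 32323}},
}

print(get_nested(sample_record, 'school.name'))          # The University of Alabama
print(get_nested(sample_record, 'latest.student.size'))  # 32323
print(get_nested(sample_record, 'latest.cost.tuition'))  # None
```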

Finally, we’ll look at the share of each program area at The University of Alabama, as given by the program_percentage data:

program_percentage_data = api_data['results'][0]['latest']['academics']['program_percentage']
programs = list(program_percentage_data.keys())
percentages = list(program_percentage_data.values())

smallest_percent_allowed = 0.015  # Any slice under this threshold will be added to "other"

# Count the number of slices below the threshold
small_slices = sum(1 for p in percentages if p < smallest_percent_allowed)

# If there is more than one slice smaller than the threshold, combine those slices into "other"
if small_slices > 1:
    other_percentage = sum(p for p in percentages if p < smallest_percent_allowed)
    kept = [(name, p) for name, p in zip(programs, percentages) if p >= smallest_percent_allowed]
    programs = [name for name, _ in kept] + ["other"]
    percentages = [p for _, p in kept] + [other_percentage]

# Configure color of pie chart
cmap = plt.get_cmap('YlGn_r')
colors = cmap(np.linspace(0.2, 0.8, len(percentages)))

fig, ax = plt.subplots()
ax.pie(percentages, labels=programs, autopct='%1.1f%%', colors=colors)
ax.set_title("Program Percentage at The University of Alabama")

plt.show()
[Figure: pie chart titled "Program Percentage at The University of Alabama"]

# Sort the dictionary by value in descending order
sorted_program_percentage_data = dict(
    sorted(program_percentage_data.items(), key=lambda x: x[1], reverse=True)
)

# Print the sorted dictionary
for key, value in sorted_program_percentage_data.items():
    print(f'{key}: {value}')
business_marketing: 0.3011
health: 0.0966
engineering: 0.0966
communication: 0.0942
social_science: 0.0799
family_consumer_science: 0.0696
psychology: 0.0513
biological: 0.0389
parks_recreation_fitness: 0.0286
education: 0.0272
visual_performing: 0.0217
multidiscipline: 0.0172
computer: 0.0144
history: 0.0117
public_administration_social_service: 0.0109
physical_science: 0.0106
english: 0.0097
mathematics: 0.0089
language: 0.0045
resources: 0.0038
philosophy_religious: 0.0018
ethnic_cultural_gender: 0.0009
legal: 0
library: 0
military: 0
humanities: 0
agriculture: 0
architecture: 0
construction: 0
transportation: 0
personal_culinary: 0
science_technology: 0
precision_production: 0
engineering_technology: 0
security_law_enforcement: 0
communications_technology: 0
mechanic_repair_technology: 0
theology_religious_vocation: 0