College Scorecard API in Python

by Michael T. Moen

The College Scorecard API is an online tool hosted by the U.S. Department of Education that contains data concerning higher education institutions.

Please see the following resources for more information on API usage:

Documentation
- College Scorecard API Documentation
Data Reuse
- College Scorecard Copyright Status
- Data.gov Privacy Policy

NOTE: The College Scorecard API limits each IP address to a maximum of 1000 requests per hour.

These recipe examples were tested on February 3, 2026.

Setup#

Import Libraries#

The following external libraries need to be installed into your environment to run the code examples in this tutorial:

- requests
- pandas
- numpy
- matplotlib
- python-dotenv

We import the libraries used in this tutorial below:

import requests
from pprint import pprint
from time import sleep
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from dotenv import load_dotenv
import os

Import API Key#

An API key is required to access the College Scorecard API. You can sign up for one at the College Scorecard Website.

We keep our API key in a .env file and use the dotenv library to access it. If you would like to use this method, create a .env file and add the following line to it:

COLLEGE_SCORECARD_API_KEY=PUT_YOUR_API_KEY_HERE

load_dotenv()
try:
    API_KEY = os.environ["COLLEGE_SCORECARD_API_KEY"]
except KeyError:
    print("API key not found. Please set 'COLLEGE_SCORECARD_API_KEY' in your .env file.")
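Note that the snippet above only prints a message when the key is missing, so any later request would then fail because API_KEY was never defined. One possible alternative is to stop immediately with a clear error. The sketch below assumes a helper named get_api_key, which is hypothetical and not part of the tutorial or of any library; it reads from a mapping so it can be tried without touching the real environment.

```python
import os


def get_api_key(env=os.environ, name="COLLEGE_SCORECARD_API_KEY"):
    """Return the API key stored under `name`, or raise if it is missing.

    `get_api_key` is a hypothetical helper introduced for illustration.
    """
    key = env.get(name)
    if not key:
        raise RuntimeError(f"API key not found. Please set '{name}' in your .env file.")
    return key


# Example with a stand-in mapping instead of the real environment
print(get_api_key({"COLLEGE_SCORECARD_API_KEY": "PUT_YOUR_API_KEY_HERE"}))
```

In the tutorial's setting, this would be called as get_api_key() after load_dotenv().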

1. Get Names of All Institutions#

To start, we’ll use a basic query to find the names of all educational institutions recognized by the College Scorecard API.

All of the data for the API can be found using the v1/schools endpoint.

Fields in the College Scorecard API are accessed with a <time>.<category>.<name> sequence:

- <time> indicates the year of the data to be accessed. To access the most recent data, use latest.
- <category> and <name> can be found in the Data Dictionary file that can be downloaded from the API’s documentation. The <category> of a field is given by the dev-category column in the Institution_Data_Dictionary section, and the <name> is given by the developer-friendly name column.
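The naming scheme above can be sketched as a small string-building helper. The name build_field is hypothetical and exists only for illustration; the time part is optional here because this tutorial also uses fields without a time prefix, such as school.name.

```python
def build_field(category, name, time=None):
    """Join the parts of a College Scorecard field name with dots.

    Hypothetical helper for illustration; `time` may be omitted for fields
    that the tutorial requests without a time prefix (e.g. 'school.name').
    """
    parts = [time, category, name] if time else [category, name]
    return '.'.join(parts)


print(build_field('student', 'size', time='latest'))  # latest.student.size
print(build_field('school', 'name'))                  # school.name
```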

BASE_URL = 'https://api.data.gov/ed/collegescorecard/v1/schools'
params = {
    'fields': 'school.name',
    'api_key': API_KEY
}

names = requests.get(BASE_URL, params=params).json()

# Display resulting metadata
names['metadata']

{'page': 0, 'total': 6429, 'per_page': 20}

The total value indicates the total number of results matched by this query. These results are paginated, so each request returns at most the number of results given by the per_page parameter, which has a default value of 20 and a maximum value of 100. The page number is given by page, which defaults to 0 (the first page).
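As a quick worked example of the pagination arithmetic, the number of requests needed is the total divided by the page size, rounded up. The function name count_pages is a hypothetical helper; the total of 6429 is the value from the metadata shown above.

```python
def count_pages(total, per_page):
    """Number of pages needed to cover `total` results (ceiling division)."""
    return -(-total // per_page)


print(count_pages(6429, 100))  # 65
print(count_pages(6429, 20))   # 322
print(count_pages(200, 100))   # 2
```

Rounding up this way means that a total which is an exact multiple of the page size does not trigger an extra request for an empty page.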

We can use a loop to create an API request for each page:

field = 'school.name'
sort_key = 'school.name'
page_size = 100

# Calculate the number of requests needed to page through every result (ceiling division)
total_pages = (names['metadata']['total'] + page_size - 1) // page_size

institution_names = []

# Loop through each page of the dataset, sending a request for each page
for page_number in range(total_pages):

    params = {
        'fields': field,
        'page': page_number,
        'per_page': page_size,
        'sort': sort_key,
        'api_key': API_KEY
    }
    name_data = requests.get(BASE_URL, params=params).json()['results']

    for university in name_data:
        institution_names.append(university['school.name'])

    # Wait 1 second between API calls to be nicer on the host servers
    sleep(1)

# Display number of institution names found
len(institution_names)

# Print first 10 institution names
institution_names[:10]

['A Better U Beauty Barber Academy',
 'A T Still University of Health Sciences',
 'Aaniiih Nakoda College',
 'ABC Adult School',
 'ABC Adult School - Cabrillo Lane',
 'ABC Beauty Academy',
 'ABCO Technology',
 'Abcott Institute',
 'Abilene Christian University',
 'Abilene Christian University-Undergraduate Online']
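Since the same paging pattern recurs throughout this tutorial, it could be wrapped in a reusable function. The following is a minimal sketch, not the tutorial's own method: fetch_all and fake_fetch_page are hypothetical names, and the page-fetching function is passed in as an argument so the sketch can run offline against stand-in data rather than the real API.

```python
def fetch_all(fetch_page, per_page=100):
    """Collect the results from every page of a paginated query.

    `fetch_page(page, per_page)` is a caller-supplied function (an assumption
    of this sketch) returning the parsed JSON for one page: a dict with
    'metadata' and 'results' keys, as in the responses shown above.
    """
    results = []
    page = 0
    while True:
        payload = fetch_page(page, per_page)
        results.extend(payload['results'])
        total = payload['metadata']['total']
        # Stop once everything is collected or the API returns an empty page
        if len(results) >= total or not payload['results']:
            break
        page += 1
    return results


# Stand-in for the real API so the sketch runs offline: 250 placeholder records
def fake_fetch_page(page, per_page):
    records = [{'school.name': f'School {i}'} for i in range(250)]
    start = page * per_page
    return {
        'metadata': {'page': page, 'total': len(records), 'per_page': per_page},
        'results': records[start:start + per_page],
    }


all_names = [r['school.name'] for r in fetch_all(fake_fetch_page)]
print(len(all_names))  # 250
```

In practice, fetch_page would wrap the requests.get call with the appropriate params and include the one-second sleep between calls.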

2. Get Names of All Universities#

College Scorecard API requests can also take conditions to only select certain institutions.

In this example, we limit the results to institutions that award graduate degrees. To do this, we set the latest.school.degrees_awarded.highest parameter to 4, which indicates that the highest degree awarded by an institution is a graduate degree. This value is documented in the Institution_Data_Dictionary section of the College Scorecard data dictionary.

field = 'school.name'
sort_key = 'school.name'
page_size = 100

# Calculate the number of loops needed to page through every result
params = {
    'fields': field,
    'latest.school.degrees_awarded.highest': 4,
    'api_key': API_KEY
}
name_metadata = requests.get(BASE_URL, params=params).json()['metadata']
total_pages = (name_metadata['total'] + page_size - 1) // page_size

university_names = []

for page_number in range(total_pages):

    params = {
        'fields': field,
        'latest.school.degrees_awarded.highest': 4,
        'sort': sort_key,
        'page': page_number,
        'per_page': page_size,
        'api_key': API_KEY
    }
    name_data = requests.get(BASE_URL, params=params).json()['results']

    for university in name_data:
        university_names.append(university['school.name'])

    # Wait 1 second between API calls to be nicer on the host servers
    sleep(1)

# Display the number of university names found
len(university_names)

# Print first 10 university names
university_names[:10]

['A T Still University of Health Sciences',
 'Abilene Christian University',
 'Abraham Lincoln University',
 'Academy for Five Element Acupuncture',
 'Academy for Jewish Religion',
 'Academy for Jewish Religion California',
 'Academy of Art University',
 'Academy of Chinese Culture and Health Sciences',
 'Academy of Vocal Arts',
 'Acupuncture and Integrative Medicine College-Berkeley']

3. Find Number of Universities by State#

The school.state_fips data element contains a number that corresponds to each state. This mapping is given below:

states = {
    1: 'Alabama', 2: 'Alaska', 4: 'Arizona', 5: 'Arkansas', 6: 'California', 8: 'Colorado',
    9: 'Connecticut', 10: 'Delaware', 11: 'District of Columbia', 12: 'Florida', 13: 'Georgia',
    15: 'Hawaii', 16: 'Idaho', 17: 'Illinois', 18: 'Indiana', 19: 'Iowa', 20: 'Kansas',
    21: 'Kentucky', 22: 'Louisiana', 23: 'Maine', 24: 'Maryland', 25: 'Massachusetts',
    26: 'Michigan', 27: 'Minnesota', 28: 'Mississippi', 29: 'Missouri', 30: 'Montana',
    31: 'Nebraska', 32: 'Nevada', 33: 'New Hampshire', 34: 'New Jersey', 35: 'New Mexico',
    36: 'New York', 37: 'North Carolina', 38: 'North Dakota', 39: 'Ohio', 40: 'Oklahoma',
    41: 'Oregon', 42: 'Pennsylvania', 44: 'Rhode Island', 45: 'South Carolina',
    46: 'South Dakota', 47: 'Tennessee', 48: 'Texas', 49: 'Utah', 50: 'Vermont',
    51: 'Virginia', 53: 'Washington', 54: 'West Virginia', 55: 'Wisconsin', 56: 'Wyoming',
    60: 'American Samoa', 64: 'Federated States of Micronesia', 66: 'Guam',
    69: 'Northern Mariana Islands', 70: 'Palau', 72: 'Puerto Rico', 78: 'Virgin Islands'
}

Using this mapping, we can find the number of universities in each state:

field = 'latest.school.state_fips'
page_size = 100

# Calculate the number of loops needed to page through every result
params = {
    'latest.school.degrees_awarded.highest': 4,
    'fields': field,
    'api_key': API_KEY
}
name_metadata = requests.get(BASE_URL, params=params).json()['metadata']
total_pages = (name_metadata['total'] + page_size - 1) // page_size

state_freq = {}
for page_number in range(total_pages):

    params = {
        'latest.school.degrees_awarded.highest': 4,
        'fields': field,
        'page': page_number,
        'per_page': page_size,
        'api_key': API_KEY
    }
    state_data = requests.get(BASE_URL, params=params).json()['results']

    for university in state_data:
        state = states[university['latest.school.state_fips']]
        state_freq[state] = state_freq.get(state, 0) + 1
    
    # Wait 1 second between API calls to be nicer on the host servers
    sleep(1)

Now, we can sort and display the results:

# Sort states by number of universities in descending order
sorted_states = sorted(state_freq.items(), key=lambda x: x[1], reverse=True)

# Print the top 20 states/territories with the most universities
for state_name, num_universities in sorted_states[:20]:
    print(f'{state_name:<15} {num_universities}')

California      201
New York        154
Pennsylvania    112
Texas           103
Illinois        79
Florida         75
Massachusetts   73
Ohio            69
North Carolina  57
Missouri        55
Virginia        53
Indiana         50
Georgia         49
Tennessee       47
Puerto Rico     47
Michigan        46
Minnesota       39
New Jersey      38
Wisconsin       37
South Carolina  35
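The tallying and sorting steps above could also be written with collections.Counter from the standard library. The sketch below uses a short, made-up list of FIPS codes in place of the API results, together with a three-entry excerpt of the state mapping; sample_fips and fips_names are hypothetical names used only here.

```python
from collections import Counter

# Hypothetical sample of FIPS codes, standing in for the API results
sample_fips = [6, 36, 6, 48, 6, 36]
fips_names = {6: 'California', 36: 'New York', 48: 'Texas'}

# Count how often each state name occurs
state_counts = Counter(fips_names[code] for code in sample_fips)

# most_common() returns (name, count) pairs sorted by count, descending
for state_name, count in state_counts.most_common():
    print(f'{state_name:<15} {count}')
```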

4. Retrieving Multiple Data Points in a Single Query#

The following example uses multiple conditions and multiple fields. The conditions in the query are separated by & while the fields are separated by ,.

# Use .join to add ',' between the elements in the list of fields
fields = ','.join([
    'school.name',
    'latest.admissions.admission_rate.overall',
    'latest.student.size',
    'latest.cost.tuition.out_of_state',
    'latest.cost.tuition.in_state',
    'latest.student.demographics.median_hh_income',
    'latest.school.endowment.begin'
])
sort_key = 'school.name'
page_size = 100

# Calculate the number of loops needed to page through every result
params = {
    'fields': fields,
    'latest.school.degrees_awarded.highest': 4,
    'latest.student.size__range': '1000..', # Schools with 1000 or more students
    'api_key': API_KEY
}
name_metadata = requests.get(BASE_URL, params=params).json()['metadata']
total_pages = (name_metadata['total'] + page_size - 1) // page_size

rows = []

for page_number in range(total_pages):

    params = {
        'fields': fields,
        'latest.school.degrees_awarded.highest': 4,
        'latest.student.size__range': '1000..',
        'page': page_number,
        'per_page': page_size,
        'sort': sort_key,
        'api_key': API_KEY
    }
    data = requests.get(BASE_URL, params=params).json()['results']

    for university in data:
        rows.append([
            university['school.name'],
            university['latest.admissions.admission_rate.overall'],
            university['latest.student.size'],
            university['latest.cost.tuition.out_of_state'],
            university['latest.cost.tuition.in_state'],
            university['latest.student.demographics.median_hh_income'],
            university['latest.school.endowment.begin']
        ])
    
    # Wait 1 second between API calls to be nicer on the host servers
    sleep(1)

columns = ['Name', 'Admission Rate', 'Size', 'Tuition Out of State', 'Tuition In State',
           'Median Household Income', 'Endowment']
df = pd.DataFrame(rows, columns=columns)

# Display the dataframe
df

	Name	Admission Rate	Size	Tuition Out of State	Tuition In State	Median Household Income	Endowment
0	Abilene Christian University	0.6388	3129	42380.0	42380.0	67136.0	6.055984e+08
1	Academy of Art University	NaN	4131	28024.0	28024.0	74015.0	NaN
2	Adams State University	NaN	1239	21848.0	9776.0	50726.0	6.267400e+04
3	Adelphi University	0.7751	5077	47290.0	47290.0	80864.0	2.123457e+08
4	Adrian College	0.6837	1635	40556.0	40556.0	66915.0	4.106072e+07
...	...	...	...	...	...	...	...
1145	Xavier University of Louisiana	0.7448	2534	27868.0	27868.0	55657.0	1.627134e+08
1146	Yale University	0.0450	6811	64700.0	64700.0	75345.0	4.138326e+10
1147	Yeshiva University	0.6375	2942	49900.0	49900.0	78671.0	5.183480e+08
1148	York College of Pennsylvania	0.9420	3187	24606.0	24606.0	73378.0	1.709745e+08
1149	Youngstown State University	0.8032	7340	11151.0	10791.0	51307.0	3.175548e+08

1150 rows × 7 columns
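To see how conditions and fields like those in the query above end up separated by & and , in a query string, the parameters can be encoded with the standard library's urllib.parse.urlencode. This is only an illustration with a shortened field list; the tutorial itself passes the dictionary to requests, which performs a similar encoding.

```python
from urllib.parse import urlencode

params = {
    'fields': ','.join(['school.name', 'latest.student.size']),
    'latest.school.degrees_awarded.highest': 4,
    'latest.student.size__range': '1000..',
}

# safe=',' keeps the commas readable; by default urlencode percent-encodes them
query_string = urlencode(params, safe=',')
print(query_string)
# fields=school.name,latest.student.size&latest.school.degrees_awarded.highest=4&latest.student.size__range=1000..
```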

We can query the resulting dataframe to find the data for specific universities:

df[df['Name'] == 'The University of Alabama']

	Name	Admission Rate	Size	Tuition Out of State	Tuition In State	Median Household Income	Endowment
829	The University of Alabama	0.7582	32323	33200.0	11900.0	57928.0	1.144633e+09

We can also query the dataframe to find the data for universities that satisfy certain conditions:

df[df['Admission Rate'] < 0.1]

	Name	Admission Rate	Size	Tuition Out of State	Tuition In State	Median Household Income	Endowment
95	Brown University	0.0523	7273	68230.0	68230.0	79027.0	6.141243e+09
104	California Institute of Technology	0.0314	1023	63255.0	63255.0	81448.0	3.390504e+09
183	Columbia University in the City of New York	0.0423	8899	69045.0	69045.0	76971.0	1.327985e+10
196	Cornell University	0.0816	15935	66014.0	66014.0	80346.0	9.349247e+09
215	Dartmouth College	0.0623	4367	65739.0	65739.0	79834.0	8.065743e+09
235	Duke University	0.0678	6417	65805.0	65805.0	78468.0	1.211626e+10
344	Harvard University	0.0345	7755	59076.0	59076.0	76879.0	5.087768e+10
392	Johns Hopkins University	0.0756	5617	63340.0	63340.0	81539.0	8.244472e+09
468	Massachusetts Institute of Technology	0.0474	4571	60156.0	60156.0	77426.0	2.460081e+10
549	New York University	0.0941	29430	60438.0	60438.0	82106.0	5.235504e+09
566	Northeastern University	0.0565	15719	63141.0	63141.0	80190.0	1.434086e+09
578	Northwestern University	0.0715	8960	65997.0	65997.0	81811.0	1.087985e+10
630	Princeton University	0.0450	5579	59710.0	59710.0	81428.0	3.512622e+10
652	Rice University	0.0788	4562	58128.0	58128.0	77707.0	7.844941e+09
761	Stanford University	0.0391	7841	62484.0	62484.0	80275.0	3.633879e+10
888	University of California-Los Angeles	0.0873	33040	44524.0	13747.0	72896.0	2.903804e+09
899	University of Chicago	0.0479	7540	66939.0	66939.0	74573.0	8.569802e+09
999	University of Pennsylvania	0.0587	10768	66104.0	66104.0	78252.0	2.072435e+10
1069	Vanderbilt University	0.0628	7143	63946.0	63946.0	76279.0	1.020607e+10
1133	Williams College	0.0999	2060	64860.0	64860.0	77966.0	3.341413e+09
1146	Yale University	0.0450	6811	64700.0	64700.0	75345.0	4.138326e+10

df[df['Endowment'] > 1.0e+10]

	Name	Admission Rate	Size	Tuition Out of State	Tuition In State	Median Household Income	Endowment
183	Columbia University in the City of New York	0.0423	8899	69045.0	69045.0	76971.0	1.327985e+10
235	Duke University	0.0678	6417	65805.0	65805.0	78468.0	1.211626e+10
265	Emory University	0.1110	7275	60774.0	60774.0	80509.0	1.115540e+10
344	Harvard University	0.0345	7755	59076.0	59076.0	76879.0	5.087768e+10
468	Massachusetts Institute of Technology	0.0474	4571	60156.0	60156.0	77426.0	2.460081e+10
578	Northwestern University	0.0715	8960	65997.0	65997.0	81811.0	1.087985e+10
630	Princeton University	0.0450	5579	59710.0	59710.0	81428.0	3.512622e+10
761	Stanford University	0.0391	7841	62484.0	62484.0	80275.0	3.633879e+10
809	Texas A&M University-College Station	0.6325	59099	40328.0	13099.0	67194.0	1.721950e+10
955	University of Michigan-Ann Arbor	0.1794	33488	58072.0	17228.0	77145.0	1.710443e+10
995	University of Notre Dame	0.1238	8923	62693.0	62693.0	76710.0	1.710111e+10
999	University of Pennsylvania	0.0587	10768	66104.0	66104.0	78252.0	2.072435e+10
1069	Vanderbilt University	0.0628	7143	63946.0	63946.0	76279.0	1.020607e+10
1091	Washington University in St Louis	0.1196	7897	62982.0	62982.0	79298.0	1.224276e+10
1146	Yale University	0.0450	6811	64700.0	64700.0	75345.0	4.138326e+10

5. Retrieving All Data for an Institution#

The College Scorecard API can also be used to retrieve all of the data for a particular institution. The example below finds all data for The University of Alabama:

params = {
    'school.name': 'The University of Alabama',
    'api_key': API_KEY
}
api_data = requests.get(BASE_URL, params=params).json()

# Print structure of the result
pprint(api_data["results"][0], depth=1)

{'fed_sch_cd': '001051',
 'id': 100751,
 'latest': {...},
 'location': {...},
 'ope6_id': '001051',
 'ope8_id': '00105100',
 'school': {...}}

Finally, we’ll look at the relative size of each program at The University of Alabama, given as the share of degrees awarded in each field:

program_percentage_data = api_data['results'][0]['latest']['academics']['program_percentage']
programs = list(program_percentage_data.keys())
percentages = list(program_percentage_data.values())

smallest_percent_allowed = 0.015 # Any sector under this threshold will be added to "other"
small_slices = 0 # The number of slices below the threshold above

# Count the number of small slices
small_slices = sum(1 for i in percentages if i < smallest_percent_allowed)

# If there is more than one slice smaller than the threshold, combine those slices into "other"
if small_slices > 1:
    other_percentage = 0
    i = len(program_percentage_data) - 1
    while i >= 0:
        if percentages[i] < smallest_percent_allowed:
            other_percentage += percentages[i]
            programs.pop(i)
            percentages.pop(i)
        i -= 1
    percentages.append(other_percentage)
    programs.append("other")

# Configure color of pie chart
cmap = plt.get_cmap('YlGn_r')
colors = cmap(np.linspace(0.2, 0.8, len(percentages)))

fig, ax = plt.subplots()
plt.pie(percentages, labels=programs, autopct='%1.1f%%', colors=colors)
plt.title("Program Percentage at The University of Alabama")

plt.show()

[Figure: pie chart titled "Program Percentage at The University of Alabama"]
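The slice-grouping step used for the pie chart can also be expressed as a small function that is easier to test in isolation. This is a sketch rather than the tutorial's own code: group_small_slices is a hypothetical name, and the sample values are a handful of the program_percentage entries for The University of Alabama listed in this section.

```python
def group_small_slices(shares, threshold=0.015, other_label='other'):
    """Merge entries below `threshold` into a single `other_label` entry.

    Mirrors the loop above: grouping happens only when more than one entry
    falls below the threshold. Hypothetical helper for illustration.
    """
    small = {k: v for k, v in shares.items() if v < threshold}
    if len(small) <= 1:
        return dict(shares)
    grouped = {k: v for k, v in shares.items() if v >= threshold}
    grouped[other_label] = sum(small.values())
    return grouped


# A few of the program_percentage values for The University of Alabama
sample = {
    'business_marketing': 0.3011,
    'health': 0.0966,
    'computer': 0.0144,
    'history': 0.0117,
    'legal': 0,
}
print(group_small_slices(sample))
```

The returned dictionary's keys and values can then be passed to plt.pie as the labels and sizes.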

# Sort the dictionary by value in descending order
sorted_program_percentage_data = dict(
    sorted(program_percentage_data.items(), key=lambda x: x[1], reverse=True)
)

# Print the sorted dictionary
for key, value in sorted_program_percentage_data.items():
    print(f'{key}: {value}')

business_marketing: 0.3011
health: 0.0966
engineering: 0.0966
communication: 0.0942
social_science: 0.0799
family_consumer_science: 0.0696
psychology: 0.0513
biological: 0.0389
parks_recreation_fitness: 0.0286
education: 0.0272
visual_performing: 0.0217
multidiscipline: 0.0172
computer: 0.0144
history: 0.0117
public_administration_social_service: 0.0109
physical_science: 0.0106
english: 0.0097
mathematics: 0.0089
language: 0.0045
resources: 0.0038
philosophy_religious: 0.0018
ethnic_cultural_gender: 0.0009
legal: 0
library: 0
military: 0
humanities: 0
agriculture: 0
architecture: 0
construction: 0
transportation: 0
personal_culinary: 0
science_technology: 0
precision_production: 0
engineering_technology: 0
security_law_enforcement: 0
communications_technology: 0
mechanic_repair_technology: 0
theology_religious_vocation: 0