Available Data Sources
The AHEAD Institute warehouses several large databases and assists in identifying, obtaining and maintaining data to meet your research needs.
Many databases are de-identified and using them has been deemed non-human subjects research by the Saint Louis University Institutional Review Board. Other databases require data use agreements and special training. Below is a brief description of the primary data sources available within the institute.
The SLU/SSM Virtual Data Warehouse (VDW) captures academic and non-academic ambulatory and inpatient clinical encounters for more than 5 million patients (from birth to over 90 years of age) from January 1, 2008, to the present. The VDW is updated monthly. SLU/SSM is a member site of the Health Care Systems Research Network (HCSRN), and the VDW was created per HCSRN specifications.
This data source captures electronic health record (EHR) information from rural and urban settings in the St. Louis, Missouri, metropolitan area; mid-Missouri; southern Illinois; Oklahoma City, Oklahoma, and surrounding areas; and southern Wisconsin. The VDW variables include ICD-9 and ICD-10 diagnostic codes; Current Procedural Terminology (CPT), ICD-9-CM, and ICD-10-PCS procedure codes; prescription orders; laboratory orders and results; vital signs; provider and clinic type; and demographics.
VDW usage requires a data use agreement and a data request.
If you have questions about this data source, please email joanne.salas@health.slu.edu.
The Healthcare Cost and Utilization Project (HCUP) includes the largest collection of longitudinal hospital care data in the United States.
The AHEAD Institute warehouses the following HCUP data sources:
- Nationwide Inpatient Sample (1998-2017)
  - The National (Nationwide) Inpatient Sample (NIS) is the largest publicly available all-payer inpatient care database in the United States, containing data on more than seven million hospital stays. Its large sample size is ideal for developing national and regional estimates and enables analyses of rare conditions, uncommon treatments and special populations.
- Kids’ Inpatient Database (1997-2016)
  - The Kids' Inpatient Database (KID) is the largest publicly available all-payer pediatric inpatient care database in the United States, containing data from two to three million hospital stays. Its large sample size is ideal for developing national and regional estimates and enables analyses of rare conditions, such as congenital anomalies, as well as uncommon treatments, such as organ transplantation. The KID has been produced every three years.
- Nationwide Emergency Department Sample (2003-2017)
  - The Nationwide Emergency Department Sample (NEDS) produces national estimates about emergency department visits across the country. The NEDS describes emergency department visits, regardless of whether they result in admission. One of the most distinctive features of the NEDS is its large sample size, which allows for analysis across hospital types and the study of relatively uncommon disorders and procedures.
- Nationwide Readmissions Database (2016-2019)
  - The Nationwide Readmissions Database (NRD) is designed to support various types of analyses of national readmission rates for all patients, regardless of the expected payer for the hospital stay. The NRD includes discharges for patients with and without repeat hospital visits in a year and those who have died in the hospital. Repeat stays may or may not be related. This database addresses a large gap in health care data: the lack of nationally representative information on hospital readmissions for all ages.
HCUP usage requires a data use agreement and a data request.
If you have questions about this data source, please email paula.buchanan@health.slu.edu.
TriNetX is a global research network with a federated architecture. This database provides access to de-identified EHR data for more than 200 million patients, including demographics, diagnoses, procedures, medication orders, lab results and vital signs.
TriNetX Data Snapshot:
- Free of charge to unfunded, contributing health care organizations
- Access to de-identified EHR data for more than 200 million patients
- EHR (demographics, diagnoses, procedures, medication orders, lab results and vital signs)
- Access to linked third-party claims and mortality data (subset of about 11.6 million)
- Limited access: Notes/reports (NLP), cancer registry, tumor registry
AHEAD personnel serve as the liaison between principal investigators and TriNetX to manage the data request process. TriNetX usage requires a data use agreement and a data request approved by TriNetX.
If you have questions about this data source, please email joanne.salas@health.slu.edu.
The All of Us Research Program from the National Institutes of Health (NIH) is one of the most diverse health databases in history. The All of Us Research Hub stores health data from diverse participants across the United States, with a focus on groups historically underrepresented in biomedical research (for example, people of color and LGBTQIA individuals). This database includes EHR, survey, physical measurement, genomic and digital health data from more than 507,000 participants recruited from partner sites (via EHR).
Thanks to our partnership with the All of Us Research Program, any individual with a health.slu.edu email address can sign up online to access data snapshots. Register for the All of Us Research Program to start accessing data snapshots.
New researchers in the All of Us program will receive an initial $300 credit to kickstart their projects, assisting with preliminary storage and computational needs. These initial credits are tied to the original creator of a workspace. Once the credits are exhausted, users can resume analysis by creating or adding their own Google Billing Account. Workspaces can be shared with AHEAD personnel.
Short training courses on the National Institutes of Health website are required. Data cannot be downloaded or used outside the platform. If your research project requires additional advanced programming and analytics, please submit a service request. AHEAD personnel can use the All of Us advanced data tools (e.g., Jupyter Notebook, RStudio) to meet your project needs.
Watch the All of Us tutorial video for an overview and demonstration of the platform's resources. The AHEAD Institute can provide training materials upon request.
If you have questions about this data source, please email ahead@health.slu.edu.
Public Data Sources
The AHEAD Institute works with publicly available databases accessible via organizational websites, including those of the Centers for Disease Control and Prevention (CDC) and the U.S. Department of Health and Human Services. Some examples include CDC WONDER, the CDC National Center for Health Statistics and the Inter-university Consortium for Political and Social Research.
Other Data Sources
The AHEAD team can work with investigator-supplied data, Epic pulls and retrospective chart reviews. The AHEAD team will work alongside investigators to select the most suitable data source for each research project.