May 30, 2019
- Statistical Science Web
- Journal of Statistics Education data archive. These data sets were collected for the purpose of teaching statistics. One advantage is that a JSE article often accompanies the data, giving context and a statistically informed discussion. Here are a few of them:
- Body shape measurements
- African conflict data
- Gardasil vaccine study
- NASCAR
- Profiles from OK Cupid. This is the GitHub repo which contains the PDF of the article, as well as the codebook and dataset. 32 variables (some of which are text) and about 60,000 cases.
- Data and Story Library. A very large number of data sets, many of which are derived from public sources and cleaned for some purpose.
- Framingham Heart Study: cholesterol, smoking, mortality, …
- Our World in Data, which has numerous data sets about health, education, work, violence, media, culture, …
- Data used by William Gosset in his study of what came to be called the t distribution
- New York Bridge inspection data
- Heart attack charges
- Data sets from the Department of Biostatistics at Vanderbilt University. (Many of the other links on this page come from this site.)
- Data Sources on the Web
- Jo Hardin’s Dynamic Data in the Statistics Classroom
- Australasian Data and Story Library, containing a large number of interesting datasets, many pertaining to Australia
- CRASH datasets
- DryadLab
- StatLib Repository at Carnegie Mellon University.
- GapMinder
- Web-scraping programs from students in the NYC Data Science Academy.
Medical research
- Centers for Disease Control
- NIH. You have to request the data, but the site is immediately valuable as a source of data collection forms used in clinical (especially cardiovascular) studies.
- NHLBI
- International Stroke Trial dataset
- Physionet ICU data
- Pooled Resource Open-Access ALS Clinical Trials Database - contains high-quality data with time-to-event and ordinal-scale outcomes. The data may be useful for assessing differential treatment effect (often called HTE - heterogeneous treatment effect) for Riluzole. The database was created by a non-profit organization, Prize4Life.
- Clinical Study Data Request - a wealth of data from clinical trials done by the pharmaceutical industry
- Datasets for research use from the National Heart, Lung, and Blood Institute of the U.S. National Institutes of Health
Other research
- Geospatial and Statistical Data Center of the University of Virginia. See especially the City and County Data Books.
- U.S. Joint Global Ocean Flux Study
- Consortium for International Earth Science Information Network Dataset Guide
- UCI Machine Learning Repository