Astronomy, Aeronautics + Space Exploration
- NASA: Data.nasa.gov – directory of NASA-related datasets
- Space Telescope (optical): Sloan Digital Sky Survey
- Space Telescope (optical): Hubble Legacy Archive – search utility for finding archival Hubble images. Has a nice display feature and gives access to FITS format (raw image data)
- NASA JPL Planetary photojournal
- MAST archive: the Multimission Archive at STScI supports a variety of astronomical data archives, with a primary focus on scientifically related data sets in the optical, ultraviolet, and near-infrared parts of the spectrum. http://archive.stsci.edu/missions.html
- Space Telescope (radio): SETI Institute radio telescope data sets – “We hope that you can analyze this data in different ways, and come up with signals we may have missed.”
- MERLIN Data Archive – the catalogue of processed MERLIN data (UK-based radio interferometer) since June 1991.
- European VLBI Network Archive – radio observations combining data from many telescopes across Europe.
- Exoplanets: Exoplanet orbit database – properties of planets orbiting stars other than the Sun. See also the Extrasolar Planets Encyclopaedia and the Visual Exoplanet Catalogue.
- Satellites (orbital): KML files for satellites.
- Pulsars: The ATNF Pulsar Database (visualization of known pulsars)
- Galaxies: Galaxy Zoo – more than 40 million morphological classifications of 900,000 SDSS galaxies by over 100,000 people.
- Space Telescope Science Institute: STScI (http://archive.stsci.edu/)
- NASA/IPAC Infrared Data Archive (http://irsa.ipac.caltech.edu/)
- Publication data for astronomy and other fields (ADS: http://adswww.harvard.edu/, arXiv: http://arxiv.org/)
- NASA SkyView: multiwavelength, all-sky data server. (http://skyview.gsfc.nasa.gov/)
- VizieR: a very complete collection of astronomical databases. http://vizier.u-strasbg.fr/viz-bin/VizieR. For instance, get the entire Tycho-II catalog (every star in the sky brighter than 12th mag) or the Hipparcos catalog (several hundred thousand stars with distances).
- Space Telescope (Gamma Rays): Fermi gamma-ray telescope — it’s got “fresh data smell” — download photon lists from yesterday! http://fermi.gsfc.nasa.gov/cgi-bin/ssc/LAT/WeeklyFiles.cgi and http://fermi.gsfc.nasa.gov/ssc/data/access/
- Comets: 1,300 images of Comet Holmes, with locations on the sky. See the project list for details about this one, or http://www.astro.princeton.edu/~dstn/temp/scihackday/ (you want the files starting with “holmes”).
- Stars: Tycho-2 catalog — 2.5 million brightest stars — http://www.astro.princeton.edu/~dstn/temp/scihackday/tycho2-cut.fits
- USNO-B all-sky survey images, NGC/IC galaxy catalogs, constellations, and bright named stars: in the code package from http://astrometry.net/downloads (util/ngc2000.py, util/brightstars.c, util/constellations.c, util/usnob_get_images.py), or browse at http://trac.astrometry.net/browser/trunk/src/astrometry/util
- NASA: NASA Ames Open Source Projects
- NASA: NASA Goddard Open Source Projects
- Aeronautics: Discovery in Aeronautics Systems Health (DASHlink) https://c3.nasa.gov/dashlink/resources/?sort=-created&type=28
- Space Telescope (Radar): ESA SAR Toolbox
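Several of the entries above (the Hubble Legacy Archive, the Tycho-2 cut) hand out FITS files. As a rough orientation, a FITS header is a run of 80-character ASCII "cards" packed into 2880-byte blocks and terminated by an END card. The sketch below reads the primary header with the standard library only. The names read_fits_header, _parse_value and make_card are made up for this illustration, the sample header is built in memory rather than downloaded, and quoted strings containing doubled quotes are not handled. For real work a full library such as astropy.io.fits is the better choice.

```python
import io

CARD_LEN = 80      # each FITS header card is 80 ASCII characters
BLOCK_LEN = 2880   # headers are padded to a multiple of 2880 bytes


def _parse_value(raw):
    """Return the value part of a card as a string, without the comment."""
    raw = raw.strip()
    if raw.startswith("'"):
        # Quoted string: take the text up to the next quote.
        # (Doubled quotes inside strings are not handled in this sketch.)
        end = raw.find("'", 1)
        return raw[1:end].rstrip() if end != -1 else raw[1:].rstrip()
    # Anything after "/" is a comment.
    return raw.split("/", 1)[0].strip()


def read_fits_header(stream):
    """Read the primary header of a FITS stream into a dict of strings."""
    header = {}
    while True:
        block = stream.read(BLOCK_LEN)
        if len(block) < BLOCK_LEN:
            raise ValueError("truncated FITS header")
        for i in range(0, BLOCK_LEN, CARD_LEN):
            card = block[i:i + CARD_LEN].decode("ascii")
            keyword = card[:8].strip()
            if keyword == "END":
                return header
            # Cards that carry a value have "= " in columns 9-10.
            if card[8:10] == "= ":
                header[keyword] = _parse_value(card[10:])


def make_card(text):
    """Pad a line of text to a full 80-byte card (demo helper)."""
    return text.ljust(CARD_LEN).encode("ascii")


# Made-up header used only to exercise the parser.
cards = [
    "SIMPLE  =                    T / conforms to FITS standard",
    "BITPIX  =                   16 / bits per pixel",
    "NAXIS   =                    0 / no data array",
    "TELESCOP= 'HST     '           / example value",
    "END",
]
raw = b"".join(make_card(c) for c in cards).ljust(BLOCK_LEN, b" ")
header = read_fits_header(io.BytesIO(raw))
print(header)
```

To try it on a downloaded file, open the file in binary mode and pass the file object to read_fits_header in place of the in-memory stream.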
Biology + Life Sciences
- Wildlife: BBC Wildlife Finder. The resources are available as RDF/XML (either add .rdf to the end of the URL or use content negotiation). Details of the ontology are here: http://purl.org/ontology/wo/ and some background is here: http://www.slideshare.net/derivadow/apis-and-apis-a-wildlife-ontology.
- NBN Gateway is used to explore UK biodiversity data. It contains over 50 million species records covering England, Scotland, Wales and Northern Ireland. Data are available via the website, web services or as tab delimited files. Support is provided via the community forum.
- Bio2RDF has lots of biological information as linked data (RDF).
- Birds: Avian Knowledge Network. Bird monitoring data resources represent arguably the most comprehensive time-series environmental data in existence. These data, gathered by hundreds of independent projects, comprise an estimated 60 million records collected over the past 100 years. Lots of range/distribution data. Download prepackaged data sets or query the database.
- Birds: International Ornithological Congress world bird checklist. Download in Excel, CSV or XML.
- Fish: Fishbase. Checklist of fish available as CSV, Tab-delimited, or Excel.
- IUCN Redlist of Threatened Species: contains assessments for 49,000 species, with spatial data for about 25,000 of them. Available as ESRI shapefiles.
- Amphibians, Birds, Mammals: Species distribution grids from SEDAC. Data are available for global amphibian distributions, and for birds and mammals in the Americas. .BIL images.
- Birds: RSPB garden birdwatch 2010 results: Results from a survey to count the number of birds in your garden. Localised to the United Kingdom. Downloadable as spreadsheets.
- Open 23andMe raw genotyping datasets: SNPedia has links to a number of data sets that people have decided to share. We can probably pool some more from amongst ourselves. Is there a larger repository of open data sets somewhere?
- The 1000 Genomes Project: The 1000 Genomes Project is the first project to sequence the genomes of a large number of people, to provide a comprehensive resource on human genetic variation. As with other major human genome reference projects, data from the 1000 Genomes Project will be made available quickly to the worldwide scientific community through freely accessible public databases. The goal of the 1000 Genomes Project is to find most genetic variants that have frequencies of at least 1% in the populations studied.
- BioChemWeb: List of online databases in Biochemistry, Molecular Biology and Cell Biology
- Genome Interpretation/Annotation Tools:
  - Interpretome (http://interpretome.com)
  - Promethease (http://www.snpedia.com/index.php/Promethease)
  - Trait-o-Matic (http://snp.med.harvard.edu/)
  - GET Evidence (http://evidence.personalgenomes.org/about)
  - SNPTips (http://snptips.5amsolutions.com/)
  - DIYGenomics mobile app (http://www.diygenomics.org/mobile.php)
- Trait Association Databases:
- Population/Ancestry Databases:
- ChemSpider: structures and other compound information (in case we need it)
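The BBC Wildlife Finder entry above mentions two ways of asking for RDF/XML: appending .rdf to the page URL, or content negotiation via the Accept header. A minimal sketch of both, using only the standard library, might look like the following. The page URL is a placeholder and the function names are invented for this example. The requests are only built here, not sent, and whether the service still answers either form has not been checked.

```python
from urllib.request import Request

RDF_XML = "application/rdf+xml"  # registered media type for RDF/XML


def rdf_by_suffix(page_url):
    """Build a request for the RDF document by appending '.rdf'."""
    return Request(page_url.rstrip("/") + ".rdf")


def rdf_by_conneg(page_url):
    """Build a request for the same URL, asking for RDF/XML via Accept."""
    return Request(page_url, headers={"Accept": RDF_XML})


# Placeholder address, not a real Wildlife Finder page.
page = "http://example.org/nature/species/some-species"

suffix_req = rdf_by_suffix(page)
conneg_req = rdf_by_conneg(page)

print(suffix_req.full_url)
print(conneg_req.full_url, conneg_req.get_header("Accept"))
```

Either request object can be passed to urllib.request.urlopen to perform the fetch. The suffix form is easier to paste into a browser, while content negotiation keeps one URL per resource.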
Computer Science + Web Data
Earth Sciences, Climate & Environment
- Data.nasa.gov – directory of NASA-related datasets
- NASA World Wind
- RealClimate.org List of Data Sources on climate change – RealClimate is a blog run by climate scientists to respond to claims by “deniers”. They also have a wiki which lists the names and details of pushers of “climate-related nonsense”.
- Mineral Resources Data System (MRDS). MRDS describes metallic and nonmetallic mineral resources throughout the world. Included are deposit name, location, commodity, deposit description, geologic characteristics, production, reserves, resources, and references. It includes the original MRDS and MAS/MILS data.
- OneGeology. Global geological maps.
- AMEEdiscover: database of global emissions standards and methodologies, integrated with AMEE’s API.
- CIA World Factbook
- US Census Data
- Transport for London data, see http://www.tfl.gov.uk/corporate/media/newscentre/15771.aspx and http://www.tfl.gov.uk/tfl/businessandpartners/syndication/assets/syndication-developer-guidelines.pdf
- BC OpenData Catalogue – all provincial datasets for British Columbia (education, natural resources, health,
Medicine and Health Sciences
- Dailymed Digest
- LinkedCT – clinical trial data as Linked Data
- Diseases: OMIM is a (messy, text-file-based) database of Mendelian human diseases (diseases caused by mutations in a single gene), including descriptions, the genes that cause them, etc.
- OpenfMRI.org: OpenfMRI.org is a project dedicated to the free and open sharing of functional magnetic resonance imaging (fMRI) datasets, including raw data.
- 1000 Functional Connectomes Project and International Neuroimaging Data-sharing Initiative (http://fcon_1000.projects.nitrc.org/ ; copy and paste the URL if the link does not work): neuroimaging scans from 1000s of subjects. Includes resting state functional magnetic resonance imaging (fMRI) data, structural MRIs, and diffusion tensor imaging (DTI). While the FCP dataset is mostly from healthy controls and includes very little phenotypic data, the INDI dataset is well phenotyped and includes data for several patient populations such as ADHD, epilepsy and cocaine addiction.
- ADHD-200 preprocessed data: preprocessed resting state fMRI and structural MRI data from ~ 700 typically developing children and ~ 400 children with ADHD released through INDI. The goal of this project is to release data in a form that is more accessible to those without functional neuroimaging expertise.
- Multi-Modal MRI Reproducibility Resource: scan-rescan imaging sessions from 21 healthy volunteers (no history of neurological disease). Imaging modalities include MPRAGE, FLAIR, DTI, resting state fMRI, B0 and B1 field maps, ASL, VASO, quantitative T1 mapping, quantitative T2 mapping, and magnetization transfer imaging. This is intended to be a resource for statisticians and imaging scientists to be able to quantify the reproducibility of their imaging methods using data available from a generic “1 hour” session at 3T.
- brainmap.org: BrainMap is an online database of published functional neuroimaging (fMRI and PET) experiments with coordinate-based (x,y,z) activation locations in Talairach space. The goal of BrainMap is to provide a vehicle to share methods and results of studies in specific research domains, such as language, memory, attention, emotion, and perception. BrainMap can also be used to perform meta-analyses of similar research studies.
- Open Connectome Project: “Collectively reverse engineering the brain one synapse at a time.” Transmission electron microscopy images of mouse visual cortex.
- Allen Brain Atlas: A growing collection of online public resources integrating extensive gene expression and neuroanatomical data, complete with a novel suite of search and viewing tools.
- PLoS publication datasets – 3700+ datasets from across the sciences, free to experiment with and hack
- Linked data on research publications – Journal metadata from Crossref, Springer, Highwire, and the National Library of Medicine, available as Linked Data in RDF/XML/JSON/Turtle with SPARQL endpoint.
- LIBRIS – the Swedish national library as linked data and SPARQL (useful for bibliometric sampling)
- William Gunn’s list of datasets, APIs, and tools (Google doc).
- World Bank scientific indicators, e.g. http://data.worldbank.org/indicator/IP.JRN.ARTC.SC
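Two of the entries above (the linked data on research publications and LIBRIS) are described as offering SPARQL endpoints. Under the W3C SPARQL protocol, a query can be sent as an HTTP GET with the query text in a `query` parameter. The sketch below only builds such a URL. The endpoint address is a placeholder, not the real address of either service, and sparql_get_url is a name invented for this example.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def sparql_get_url(endpoint, query):
    """Return a GET URL carrying the SPARQL query in the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})


# Placeholder endpoint; substitute the address published by the service.
ENDPOINT = "http://example.org/sparql"

# A generic query that lists ten arbitrary triples.
QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10"

url = sparql_get_url(ENDPOINT, QUERY)
print(url)

# Round-trip check: decoding the URL gives back the original query text.
decoded = parse_qs(urlsplit(url).query)["query"][0]
print(decoded == QUERY)
```

Fetching the URL with an Accept header of application/sparql-results+json is the usual way to ask for JSON results, though each endpoint decides which result formats it supports.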