Data Commons

Context

Data powers everything

- policy
- journalism
- health
- science

How do we make it easier?

Problem is not a shortage of data

Demographics: ACS, Housing Survey, Community Survey, …

Economics: BLS, BEA, World Bank, OECD, …

Health: CDC Wonder, CDC Diabetes, County Health, …

Climate: NOAA, Hurricanes, Flooding, …

Genomics: NCBI, ENCODE, …

Too many formats, schemas, ...

Using Data: current model

Forage for data sources, track down assumptions/provenance

Clean it up, compile data sources, figure out storage, …

High upfront costs, sparse ecosystem, few tools, …

Google Maps made satellite imagery part of everyone’s life

We want to do that for data

From: search for dataset, download, clean, normalize, join, …

To: just ask Google

[Diagram: data sources (Census (ACS), BLS, FBI, CDC, sequence data, Wikidata, Landsat, Medicare) feed an Aggregated Knowledge Graph, which serves journalists, students, researchers, Search, News, and Cloud.]

Data Commons vs collection of datasets

Collections of datasets (à la NIH Data Commons) solve the problem of finding the dataset.

But the other problems (cleaning, joining, …) remain.

Data Commons: a single knowledge graph (KG) built by cleaning, normalizing, and joining all these datasets.
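
The difference between a collection of datasets and a joined graph can be illustrated with a toy sketch. Everything below is an assumption made for illustration: the two source tables, their column names, the `geoId/...` node ID convention, the property names, and the values are made-up placeholders, and this is not how Data Commons itself is implemented. It only shows the clean, normalize, join steps in miniature.

```python
from collections import defaultdict

# Two hypothetical source tables with different schemas and ID conventions.
# All column names and values are made-up placeholders, not real statistics.
census_rows = [
    {"GEOID": "06", "NAME": "California", "POP": "100"},
    {"GEOID": "48", "NAME": "Texas", "POP": "200"},
]
labor_rows = [
    {"state_fips": 6, "unemp_rate_pct": 5.0},
    {"state_fips": 48, "unemp_rate_pct": 4.0},
]


def normalize_id(raw):
    """Normalize: map a source-specific state code to one shared node ID."""
    return f"geoId/{int(raw):02d}"


def build_graph(census, labor):
    """Join: return {node_id: {property: value}} merged across both sources."""
    graph = defaultdict(dict)
    for row in census:
        node = graph[normalize_id(row["GEOID"])]
        node["name"] = row["NAME"]
        node["population"] = int(row["POP"])  # clean: string -> int
    for row in labor:
        node = graph[normalize_id(row["state_fips"])]
        node["unemploymentRate"] = row["unemp_rate_pct"]
    return dict(graph)


graph = build_graph(census_rows, labor_rows)
print(graph["geoId/06"])
```

Once the sources share node IDs, a consumer can read every property of a place from one node instead of re-doing the cleanup for each dataset.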

Data Commons v 0.9

- People, places, …: Integrated view of Census (ACS), CDC, BLS, BEA, FEC, NOAA, DEA, DOL,… on average 5k variables for every state, county, city, zip, school district, congressional district, …

- Education: College Scorecard, NCES, ...

- Disasters: earthquakes, hurricanes, floods, fires

- Scientific Collections: Bronx Botanical Gardens, ENCODE, parts of NCBI, GTEx

Data Commons v 0.9

Applications for 4 categories

- Researchers
- Students
- Journalists
- Google Users

APIs in:

- REST
- Python / Python Notebooks
- SPARQL
- SQL against BigQuery
- Google Sheets

Python Notebooks for students & researchers

Simple Charting Tool

Simple Correlation Tool

Google Sheets API

Biomedical Data Commons

[Diagram: data sources (Census (ACS), UniProt, ChEMBL, CDC, sequence data, Entrez Gene, GTEx, Medicare) feed an Aggregated Knowledge Graph, which serves citizen scientists, students, researchers, doctors, and Cloud.]

Biomedical Data Commons Datasets

- CDC - 500 Cities
- CDC - Diabetes Atlas
- CDC - Wonder
- ChEMBL*
- ClinVar
- dbSNP
- Dartmouth Medicare Atlas
- Disease Ontology*
- FDA - Pharmacologic Class*
- GTEx
- ENCODE
- Entrez Gene
- MeSH*
- NY Times - COVID-19 Cases + Deaths
- SIDER*
- SPOKE*
- UCSC Genome Browser
- US Census - SAHIE
- UniProt*
- WHO - COVID-19 Cases + Deaths
- WHO - ICD-10 Codes*

*Source: UCSF SPOKE

E.g.: Extract 3 data points on given variants

Given a list of genetic variants, find:

1. Clinical Significance
2. Functional Category
3. Significant Gene Associations in Whole Blood

Current Methods to find information

Option 1: Site Search

Use each relevant database's browser and search for every variant individually

Option 2: Download and Analyze Data

Download the data, read it into memory, and parse it programmatically for the needed info

Option 3: Data Commons

Run 1 query on Data Commons
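
To make Option 3 concrete, here is a minimal sketch of what "one query" over an already-joined graph could look like. It is not the Data Commons API: the in-memory `GRAPH`, the `variant/...` IDs, the property names, and the `query_variants` helper are all hypothetical, and the values are placeholders rather than real variant annotations.

```python
# Toy in-memory graph: node ID -> properties. IDs, property names, and values
# are hypothetical placeholders, not real variant annotations.
GRAPH = {
    "variant/A": {
        "clinicalSignificance": "placeholder-significance-A",
        "functionalCategory": "placeholder-category-A",
        "geneAssociations": {"Whole Blood": ["GENE1"], "Liver": ["GENE2"]},
    },
    "variant/B": {
        "clinicalSignificance": "placeholder-significance-B",
        "functionalCategory": "placeholder-category-B",
        "geneAssociations": {"Whole Blood": []},
    },
}


def query_variants(graph, variant_ids, tissue="Whole Blood"):
    """Fetch the three requested data points for every variant in one pass."""
    results = {}
    for vid in variant_ids:
        node = graph.get(vid)
        if node is None:
            results[vid] = None  # variant not present in the graph
            continue
        results[vid] = {
            "clinical_significance": node.get("clinicalSignificance"),
            "functional_category": node.get("functionalCategory"),
            "gene_associations": node.get("geneAssociations", {}).get(tissue, []),
        }
    return results


answers = query_variants(GRAPH, ["variant/A", "variant/B", "variant/C"])
for vid, info in answers.items():
    print(vid, info)
```

The point of the sketch is the shape of the work: because the three properties already hang off the same node, the caller passes a list of variants once instead of visiting three sites or parsing three downloads.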
