"ExpoDB: An Exploratory Data Science Platform"
© 2016 Mohammad Sadoghi (Purdue University)
ExpoDB: An Exploratory Data Science Platform (A New Frontier: From Data Processing to Knowledge Exploration)
Mohammad Sadoghi, Assistant Professor, Department of Computer Science, Purdue University
IBM Cognitive Systems Institute Speaker Series, September 29, 2016
Insight is Lost in Islands of Data
http://www.cpsresearch.eu/clinical-trials/
http://news.mit.edu/2015/mnookin-vaccination-public-health-0227
http://www.healthcarepackaging.com/trends-and-issues/clinical-trials
http://stormercellularloo.gq/evolve-ii-clinical-trial.html
https://www.geneticliteracyproject.org
Data is spread across many disconnected islands of sources (no holistic view)
Sadly, adverse drug reactions (ADRs) are the 4th leading cause of death in the United States, resulting in 100,000 deaths annually
Adverse drug reactions cost over $136 billion in the US annually
Real-time Fusion and Exploration of Data
Real-time Fusion and Exploration of Enriched Data
Real-time Fusion and Exploration of Enriched Data at Web Scale
Drug Safety: Challenges of Real-time Fusion & Exploration of Open Data
[Figure: a graph connecting drugs, genes, and diseases]
Drugs and chemicals: Naproxen (Aleve), Methotrexate, Warfarin, Nicotine
Genes and enzymes: PTGS2 (Gene), TP53 (Gene), DHFR (Gene), VKORC1 (Gene), CYP2C9 (Enzyme)
Diseases: Rheumatoid Arthritis, Arthritis, Osteosarcoma (Bone Cancer), Embolism (Blood Clot)
Type labels: Disease (Immune System, Autoimmune, Joint Diseases, Sarcoma, Neoplasms); Chemical (Carboxylic Acids, Heterocyclic, Aminopterin, Phenylpropionates); Approved Drugs
Edge labels: inhibits, increased degradation, limits cell growth, tumor suppressor
Why capture the semantics/context? Semantics is essential to connect the dots.
How to capture the context?
(1) Instance Layer: Capturing raw data instances including both structured & semi-structured data
(2) Relation Layer: Capturing the interconnectedness of data instances across data sources
(3) Semantic Layer: Capturing conceptual relationships among data instances and their types
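The three layers can be pictured as one small store. The sketch below is only an illustration under stated assumptions: the class `EnrichedStore`, its method names, and the identifier scheme (`drug:...`, `gene:...`) are hypothetical and are not taken from ExpoDB. The instance layer keeps raw records with their source, the relation layer keeps links between instances, and the semantic layer keeps a type hierarchy.

```python
from dataclasses import dataclass, field


@dataclass
class EnrichedStore:
    """Toy three-layer store; every name here is hypothetical."""
    instances: dict = field(default_factory=dict)  # instance layer: id -> (source, record)
    relations: list = field(default_factory=list)  # relation layer: (subject, predicate, object)
    is_a: dict = field(default_factory=dict)       # semantic layer: type -> parent type

    def add_instance(self, iid, source, record):
        self.instances[iid] = (source, record)

    def link(self, subject, predicate, obj):
        # Only link instances that already exist in the instance layer.
        if subject not in self.instances or obj not in self.instances:
            raise KeyError("both ends of a relation must be known instances")
        self.relations.append((subject, predicate, obj))

    def neighbors(self, iid, predicate=None):
        # Follow outgoing relations, optionally filtered by predicate.
        return [o for s, p, o in self.relations
                if s == iid and (predicate is None or p == predicate)]

    def ancestors(self, type_name):
        # Walk the type hierarchy upward; stop on a missing parent or a cycle.
        chain, seen = [], {type_name}
        while type_name in self.is_a and self.is_a[type_name] not in seen:
            type_name = self.is_a[type_name]
            seen.add(type_name)
            chain.append(type_name)
        return chain


store = EnrichedStore()
store.add_instance("drug:methotrexate", "DrugBank", {"name": "Methotrexate"})
store.add_instance("gene:DHFR", "UniProt", {"name": "DHFR", "function": "limits cell growth"})
store.link("drug:methotrexate", "inhibits", "gene:DHFR")
store.is_a.update({"Osteosarcoma": "Sarcoma", "Sarcoma": "Neoplasms", "Neoplasms": "Disease"})

print(store.neighbors("drug:methotrexate", "inhibits"))
print(store.ancestors("Osteosarcoma"))
```

Keeping the layers as separate fields means a new source only touches the instance layer, while linkage and typing can be refined later without rewriting the raw records.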
Enriched Data Model: Semantics is essential to connect the dots
[Figure: the drug-gene-disease graph annotated with the Semantic layer, Relation layer, and Instance layer, and with the labels Information, Knowledge, Data]
Graph nodes: PTGS2 (Gene), TP53 (Gene), DHFR (Gene), Acetaminophen (Tylenol), Ibuprofen (Advil), Methotrexate, Warfarin, Rheumatoid Arthritis, Arthritis, Osteosarcoma (Bone Cancer), Relief Fever, Embolism (Blood Clot)
Disease types: Immune System, Autoimmune, Joint Diseases, Sarcoma, Neoplasms
Sources named on the slide: DrugBank (Bioinformatics & Cheminformatics Resource), CTD (Comparative Toxicogenomics Database), UniProt (Universal Protein Resource)

DrugName        DrugTargets (Genes)   Symptomatic Treatment
Ibuprofen       PTGS2                 Rheumatoid Arthritis
Acetaminophen   PTGS2                 Relief Fever
Methotrexate    DHFR                  Antineoplastic Anti-metabolite
Warfarin        TP53                  Embolism (Blood Clot)

Gene Interaction
PTGS2           TP53 (Gene)

Gene Function
TP53            Tumor Suppressor
DHFR            Limits Cell Growth

Gene Disease
TP53            Osteosarcoma

Warfarin has a narrow therapeutic range (fatal outcomes)
Dosage for Asian population: 3.4 mg
Dosage for White population: 5.1 mg
Dosage for African-American population: 6.1 mg
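To make "connecting the dots" concrete, here is a minimal sketch that joins the slide's tables by hand. The rows are copied from the slide; the function names and the traversal (drug, then target gene, then interacting gene, then disease) are assumptions made for illustration, not the platform's query model, and the output is not a clinical claim. No single table relates a drug to osteosarcoma, so the connection only appears once the tables are linked.

```python
# Rows copied from the slide's tables; the traversal below is an illustrative assumption.
DRUG_TARGETS = {
    "Ibuprofen": "PTGS2",
    "Acetaminophen": "PTGS2",
    "Methotrexate": "DHFR",
    "Warfarin": "TP53",
}
GENE_INTERACTIONS = [("PTGS2", "TP53")]
GENE_FUNCTION = {"TP53": "Tumor Suppressor", "DHFR": "Limits Cell Growth"}
GENE_DISEASE = {"TP53": ["Osteosarcoma"]}


def interacting_genes(gene):
    """Genes that interact with `gene`, treating interactions as symmetric."""
    partners = set()
    for a, b in GENE_INTERACTIONS:
        if a == gene:
            partners.add(b)
        elif b == gene:
            partners.add(a)
    return sorted(partners)


def connect_the_dots():
    """List (drug, target, partner gene, partner function, disease) paths."""
    paths = []
    for drug, target in sorted(DRUG_TARGETS.items()):
        for partner in interacting_genes(target):
            for disease in GENE_DISEASE.get(partner, []):
                paths.append((drug, target, partner,
                              GENE_FUNCTION.get(partner), disease))
    return paths


for path in connect_the_dots():
    print(" -> ".join(path))
```

Each printed path is a candidate for further investigation rather than an answer: it says only that the sources, once linked, place a drug two hops away from a disease through a gene with a known function.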
Context-aware Query Model
Ranking stages: Query Representation, Query Refinement, Data Source Discovery, Query Composition, Query Answers, Answer Evidence, Answer Representation
“Is 5.0 mg an effective dosage of Warfarin for preventing blood clots?”
Yes/No
“Is Warfarin sensitive to ethnic background?”
“Does Warfarin have a narrow therapeutic range?”
“What are the disjoint classes of population with respect to Warfarin?”
“What are the adverse reactions of Warfarin?”
“What is an effective dosage of Warfarin for preventing blood clots?”
“What is an effective dosage of Warfarin for preventing blood clots?”
“Does Warfarin have a narrow therapeutic range?”
Dosage for African-American population: 6.1 mg
Dosage for White population: 5.1 mg
Dosage for Asian population: 3.4 mg
Querying different sources returns 6.1 mg, 5.1 mg, and 3.4 mg, so is the data inconsistent? (revisiting the consistent answers formalism and possible-world semantics)
Given the known narrow therapeutic range, is 5.1 mg close enough to 5.0 mg? (fuzzy answers formalism in the presence of enriched data)
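The two questions above can be illustrated with a small sketch. It is a toy under stated assumptions, not the consistent-answers or fuzzy-answers formalism itself: `DosageFact`, `conflicting_contexts`, `is_effective`, and the `tolerance_mg` parameter are hypothetical names, the three dosage values come from the slide, and the tolerance of 0.1 mg is an arbitrary number chosen for the example, not clinical guidance. The first function shows that the three values stop looking inconsistent once the population is treated as context; the second shows how a yes/no answer depends on how much slack a narrow therapeutic range allows.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DosageFact:
    drug: str
    population: str
    dose_mg: float


# Values from the slide; one fact per population.
FACTS = [
    DosageFact("Warfarin", "African-American", 6.1),
    DosageFact("Warfarin", "White", 5.1),
    DosageFact("Warfarin", "Asian", 3.4),
]


def conflicting_contexts(facts):
    """(drug, population) pairs that report more than one dose."""
    doses = {}
    for f in facts:
        doses.setdefault((f.drug, f.population), set()).add(f.dose_mg)
    return [key for key, values in doses.items() if len(values) > 1]


def is_effective(drug, dose_mg, population=None, tolerance_mg=0.0):
    """Per-population yes/no answers; tolerance_mg is a hypothetical slack."""
    answers = {}
    for f in FACTS:
        if f.drug != drug:
            continue
        if population is not None and f.population != population:
            continue
        # A tiny epsilon absorbs floating-point rounding at the boundary.
        answers[f.population] = abs(f.dose_mg - dose_mg) <= tolerance_mg + 1e-9
    return answers


print(conflicting_contexts(FACTS))
print(is_effective("Warfarin", 5.0))
print(is_effective("Warfarin", 5.0, tolerance_mg=0.1))
```

With the population as part of the key, no context reports two doses, so the sources disagree only when the context is ignored. Whether 5.1 mg counts as a match for 5.0 mg then becomes an explicit parameter rather than a hidden assumption.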
Spark Architecture: Knowledge Oblivious
Applications: Personalized Medicine (Drug Discovery/Safety), Computational Finance, Compliance Informatics
APIs/Services (Access/Interfaces): Spark Streaming, SparkSQL, BlinkDB, GraphX, SparkR, MLlib
Processing Engine: Apache Spark (General Data Processing on Distributed Memory)
Data Model (Immutable Collection of Objects): Spark Data Model (Resilient Distributed Datasets, RDDs)
Storage: Distributed File Systems (e.g., HDFS, S3, Ceph), Distributed Memory (Tachyon), Compression (Succinct)
Resource Virtualization: Resource Abstractions (Apache Mesos), Resource Management (Hadoop YARN)
ExpoDB Architecture: From Data to Knowledge
Applications: Personalized Medicine (Drug Discovery/Safety), Computational Finance, Compliance Informatics
APIs/Services (Access/Interfaces): Spark Streaming, SparkSQL, BlinkDB, GraphX, SparkR, MLlib; Reasoning, Refinement, Curation, Fusion, Discovery
Processing Engine: Online Transactional Processing (OLTP) + Online Analytical Processing (OLAP)
Data Model (Enriching Raw Data Towards Knowledge):
Semantic Layer: Ontology, Rules, Stochastic Models, Tensor Embedding
Relation Layer: Intra- & Inter-domain Linkage (fine-grained & instance-level)
Instance Layer: Relational, Graph/RDF, Dense/Sparse Matrices, JSON
Spark Data Model (RDDs), Generic Data Model (Key-Value Store)
Storage: Distributed File Systems (e.g., HDFS, S3, Ceph), Distributed Memory (Tachyon), Compression (Succinct)
Resource Virtualization: Resource Abstractions (Apache Mesos), Resource Management (Hadoop YARN)
ExpoDB Architecture: Active Data Path
Adds Virtualized Hardware Acceleration (GPU & FPGA) to the ExpoDB stack; the other layers are unchanged.
The First Step!
L-Store (Real-time OLTP+OLAP)
FQP (Flexible Query Processor)
EmbedS (Ontology)
Phenomenological Features (Deep-Learning-as-Oracle)
PADRES (Event Processing)
IBM DB2 BLU (Column Store)
SPIDER (Declarative Data Cleansing)
Vraph (Vectorized Graph Processing)
Tiresias (Predicting Adverse Drug Reaction)
fpga-ToPSS (Algorithmic Trading)
Thank You
Q&A
Exploratory Systems Lab (ExpoLab)
website: https://msadoghi.github.io/
References:
Data/Knowledge Exploration:
• Mohammad Sadoghi, Kavitha Srinivas, Oktie Hassanzadeh, Yuan-Chi Chang, Mustafa Canim, Achille Fokoue, Yishai A. Feldman: Self-Curating Databases. EDBT 2016
• Amit Chandel, Oktie Hassanzadeh, Nick Koudas, Mohammad Sadoghi, Divesh Srivastava: Benchmarking declarative approximate selection predicates. SIGMOD Conference 2007: 353-364
• Oktie Hassanzadeh, Mohammad Sadoghi, Renée J. Miller: Accuracy of Approximate String Joins Using Grams. QDB 2007
Drug Safety:
• Achille Fokoue, Mohammad Sadoghi, Oktie Hassanzadeh, Ping Zhang: Predicting Drug-Drug Interactions Through Large-Scale Similarity-Based Link Prediction. ESWC 2016
• Achille Fokoue, Oktie Hassanzadeh, Mohammad Sadoghi, Ping Zhang: Predicting Drug-Drug Interactions Through Similarity-Based Link Prediction Over Web Data. WWW 2016
OLTP & OLAP:
• Mohammad Sadoghi, Souvik Bhattacherjee, Bishwaranjan Bhattacharjee, Mustafa Canim: L-Store: A Real-time OLTP and OLAP System. CoRR abs/1601.04084 (2016)
• Kaiwen Zhang, Mohammad Sadoghi, Hans-Arno Jacobsen: DL-Store: A Distributed Hybrid OLTP and OLAP Data Processing Engine. ICDCS 2016
• Mohammad Sadoghi, Kenneth A. Ross, Mustafa Canim, Bishwaranjan Bhattacharjee: Exploiting SSDs in operational multiversion databases. VLDB J. 25(5): 651-672 (2016)
• Mohammad Sadoghi, Mustafa Canim, Bishwaranjan Bhattacharjee, Fabian Nagel, Kenneth A. Ross: Reducing Database Locking Contention Through Multi-version Concurrency. PVLDB 7(13): 1331-1342 (2014)
• Prashanth Menon, Tilmann Rabl, Mohammad Sadoghi, Hans-Arno Jacobsen: CaSSanDra: An SSD boosted key-value store. ICDE 2014: 1162-1167
• Prashanth Menon, Tilmann Rabl, Mohammad Sadoghi, Hans-Arno Jacobsen: Optimizing key-value stores for hybrid storage architectures. CASCON 2014: 355-358
• Mohammad Sadoghi, Kenneth A. Ross, Mustafa Canim, Bishwaranjan Bhattacharjee: Making Updates Disk-I/O Friendly Using SSDs. PVLDB 6(11): 997-1008 (2013)
Hardware Acceleration:
• Rajesh R. Bordawekar, Mohammad Sadoghi: Accelerating database workloads by software-hardware-system co-design. ICDE 2016
• Mohammadreza Najafi, Mohammad Sadoghi, Hans-Arno Jacobsen: SplitJoin: A Scalable, Low-latency Stream Join Architecture with Adjustable Ordering Precision. USENIX Annual Technical Conference 2016
• Mohammadreza Najafi, Mohammad Sadoghi, Hans-Arno Jacobsen: The FQP Vision: Flexible Query Processing on a Reconfigurable Computing Fabric. SIGMOD Record 44(2): 5-10 (2015)
• Mohammadreza Najafi, Mohammad Sadoghi, Hans-Arno Jacobsen: Configurable hardware-based streaming architecture using Online Programmable-Blocks. ICDE 2015
• Mohammadreza Najafi, Mohammad Sadoghi, Hans-Arno Jacobsen: Flexible Query Processor on FPGAs. PVLDB 6(12): 1310-1313 (2013)
• Mohammad Sadoghi, Rija Javed, Naif Tarafdar, Harsh Singh, Rohan Palaniappan, Hans-Arno Jacobsen: Multi-query Stream Processing on FPGAs. ICDE 2012: 1229-1232
• Mohammad Sadoghi, Harsh Singh, Hans-Arno Jacobsen: Towards highly parallel event processing through reconfigurable hardware. DaMoN 2011: 27-32
• Mohammad Sadoghi, Harsh Singh, Hans-Arno Jacobsen: fpga-ToPSS: line-speed event processing on fpgas. DEBS 2011: 373-374
• Mohammad Sadoghi, Hans-Arno Jacobsen, Martin Labrecque, Warren Shum, Harsh Singh: Efficient Event Processing through Reconfigurable Hardware for Algorithmic Trading. PVLDB 3(2): 1525-1528 (2010)