![Page 1](https://reader034.fdocuments.in/reader034/viewer/2022051909/5ffd71ec4c80a0010165309c/html5/thumbnails/1.jpg)
Database-as-a-Service for Long Tail Science
Bill Howe
Garret Cole
Nodira Khoussainova
Luke Zettlemoyer
Shaminoo Kapoor
Patrick Michaud
![Page 2](https://reader034.fdocuments.in/reader034/viewer/2022051909/5ffd71ec4c80a0010165309c/html5/thumbnails/2.jpg)
All science is reducing to a database problem
Old model: “Query the world” (Data acquisition coupled to a specific hypothesis)
New model: “Download the world” (Data acquired en masse, in support of many hypotheses)
Astronomy: High-resolution, high-frequency sky surveys (SDSS, LSST, PanSTARRS)
Oceanography: high-resolution models, cheap sensors, satellites
Biology: lab automation, high-throughput sequencing,
![Page 3](https://reader034.fdocuments.in/reader034/viewer/2022051909/5ffd71ec4c80a0010165309c/html5/thumbnails/3.jpg)
The Long Tail
[Figure: data volume vs. rank. Labeled points: CERN (~15PB/year), LSST (~100PB), PanSTARRS (~40PB), SDSS (~100TB), CARMEN (~50TB), ocean modelers, seismologists, microbiologists, <spreadsheet users>]
“The future is already here; it’s just not very evenly distributed.” -- William Gibson
![Page 4](https://reader034.fdocuments.in/reader034/viewer/2022051909/5ffd71ec4c80a0010165309c/html5/thumbnails/4.jpg)
The other “Large Scale”
[Figure: # of bytes vs. # of types, # of apps. Domains: Astronomy, Oceanography, Biology. Labeled projects: LSST, SDSS, PanSTARRS, Galaxy, BioMart, GEO, IOOS, OOI, LANL HIV, Pathway Commons]
Client + Cloud Viz, SSDBM 2010
Science Dataspaces, CIDR 2007, IIMAS 2008
This talk
Mesh Algebra, VLDB 2004, VLDBJ 2005, ICDE 2005, eScience 2008
HaLoop, VLDB 2010
see also:
Skew handling, SOCC 2010
Clustering, SSDBM 2010
Science Mashups, SSDBM 2009
Cloud Viz, UltraScale Viz 2009, Visualization 2010
![Page 5](https://reader034.fdocuments.in/reader034/viewer/2022051909/5ffd71ec4c80a0010165309c/html5/thumbnails/5.jpg)
Ad Hoc Research Data
5/18/10 Garret Cole, eScience Institute
FASTA format
Spreadsheets
Delimited ASCII
![Page 6](https://reader034.fdocuments.in/reader034/viewer/2022051909/5ffd71ec4c80a0010165309c/html5/thumbnails/6.jpg)
Problem
How much time do you spend “handling
data” as opposed to “doing science”?
Mode answer: “90%”
![Page 7](https://reader034.fdocuments.in/reader034/viewer/2022051909/5ffd71ec4c80a0010165309c/html5/thumbnails/7.jpg)
Simple Example

| ###query | length | COG hit #1 | e-value #1 | identity #1 | score #1 | hit length #1 | description #1 |
|---|---|---|---|---|---|---|---|
| chr_4[480001-580000].287 | 4500 | | | | | | |
| chr_4[560001-660000].1 | 3556 | | | | | | |
| chr_9[400001-500000].503 | 4211 | COG4547 | 2.00E-04 | 19 | 44.6 | 620 | Cobalamin biosynthesis protein CobT (nicotinate-mononucleotide:5, 6-dimethylbenzimidazole phosphoribosyltransferase) |
| chr_9[320001-420000].548 | 2833 | COG5406 | 2.00E-04 | 38 | 43.9 | 1001 | Nucleosome binding factor SPN, SPT16 subunit |
| chr_27[320001-404298].20 | 3991 | COG4547 | 5.00E-05 | 18 | 46.2 | 620 | Cobalamin biosynthesis protein CobT (nicotinate-mononucleotide:5, 6-dimethylbenzimidazole phosphoribosyltransferase) |
| chr_26[320001-420000].378 | 3963 | COG5099 | 5.00E-05 | 17 | 46.2 | 777 | RNA-binding protein of the Puf family, translational repressor |
| chr_26[400001-441226].196 | 2949 | COG5099 | 2.00E-04 | 17 | 43.9 | 777 | RNA-binding protein of the Puf family, translational repressor |
| chr_24[160001-260000].65 | 3542 | | | | | | |
| chr_5[720001-820000].339 | 3141 | COG5099 | 4.00E-09 | 20 | 59.3 | 777 | RNA-binding protein of the Puf family, translational repressor |
| chr_9[160001-260000].243 | 3002 | COG5077 | 1.00E-25 | 26 | 114 | 1089 | Ubiquitin carboxyl-terminal hydrolase |
| chr_12[720001-820000].86 | 2895 | COG5032 | 2.00E-09 | 30 | 60.5 | 2105 | Phosphatidylinositol kinase and protein kinases of the PI-3 kinase family |
| chr_12[800001-900000].109 | 1463 | COG5032 | 1.00E-09 | 30 | 60.1 | 2105 | Phosphatidylinositol kinase and protein kinases of the PI-3 kinase family |
| chr_11[1-100000].70 | 2886 | | | | | | |
| chr_11[80001-180000].100 | 1523 | | | | | | |

ANNOTATIONSUMMARY-COMBINEDORFANNOTATION16_Phaeo_genome

| id | query | hit | e_value | identity_ | score | query_start | query_end | hit_start | hit_end | hit_length |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | FHJ7DRN01A0TND.1 | COG0414 | 1.00E-08 | 28 | 51 | 1 | 74 | 180 | 257 | 285 |
| 2 | FHJ7DRN01A1AD2.2 | COG0092 | 3.00E-20 | 47 | 89.9 | 6 | 85 | 41 | 120 | 233 |
| 3 | FHJ7DRN01A2HWZ.4 | COG3889 | 0.0006 | 26 | 35.8 | 9 | 94 | 758 | 845 | 872 |
| … | | | | | | | | | | |
| 2853 | FHJ7DRN02HXTBY.5 | COG5077 | 7.00E-09 | 37 | 52.3 | 3 | 77 | 313 | 388 | 1089 |
| 2854 | FHJ7DRN02HZO4J.2 | COG0444 | 2.00E-31 | 67 | 127 | 1 | 73 | 135 | 207 | 316 |
| … | | | | | | | | | | |
| 3566 | FHJ7DRN02FUJW3.1 | COG5032 | 1.00E-09 | 32 | 54.7 | 1 | 75 | 1965 | 2038 | 2105 |
| … | | | | | | | | | | |

COGAnnotation_coastal_sample.txt
![Page 8](https://reader034.fdocuments.in/reader034/viewer/2022051909/5ffd71ec4c80a0010165309c/html5/thumbnails/8.jpg)
| id | query | hit | e_value | query_start | query_end | hit_start | hit_end | hit_length |
|---|---|---|---|---|---|---|---|---|
| 6409 | FHJ7DRN01BYA61.1 | TIGR00149 | 2.20E-21 | 1 | 84 | 43 | 125 | 134 |
| 6410 | FHJ7DRN01BDTEA.1 | TIGR00149 | 3.40E-09 | 3 | 42 | 30 | 69 | 134 |
| 6411 | FHJ7DRN02HEUGQ.1 | TIGR00149 | 1.70E-05 | 4 | 46 | 1 | 46 | 134 |
| 6412 | FHJ7DRN01CA4BO.1 | TIGR00149 | 5.30E-05 | 4 | 45 | 1 | 45 | 134 |
| 6413 | FHJ7DRN01DM2FK.3 | TIGR01651 | 5.70E-64 | 1 | 76 | 511 | 586 | 606 |
| 6414 | FHJ7DRN01B8BPS.1 | TIGR01651 | 1.20E-36 | 1 | 52 | 500 | 551 | 606 |
| 6415 | FHJ7DRN02JM54P.1 | TIGR01651 | 2.20E-24 | 15 | 80 | 301 | 366 | 606 |
| 6416 | FHJ7DRN02FK6C5.2 | TIGR00039 | 2.70E-16 | 1 | 45 | 37 | 85 | 153 |
| 6417 | FHJ7DRN01D019A.1 | TIGR00039 | 8.90E-12 | 5 | 65 | 48 | 118 | 153 |
| 6418 | FHJ7DRN02FYAFO.1 | TIGR00039 | 1.60E-11 | 1 | 76 | 67 | 153 | 153 |

coastal sample
Complex Example
…
[H] COG4547 Cobalamin biosynthesis protein CobT (nicotinate-mononucleotide:5, 6-dimethylbenzimidazole phosphoribosyltransferase)
Ype: YPMT1.87
Atu: AGl2410
Sme: SMc00701
Bme: BMEI0050
Mlo: mll3561
Ccr: CC0672
…
[J] COG5099 RNA-binding protein of the Puf family, translational repressor
Sce: YGL014w YGL178w YJR091c YLL013c YPR042c
Spo: SPAC1687.22c SPAC4G8.03c SPAC4G9.05 SPAC6G9.14 SPBC56F2.08c SPBP35G2.14 SPCC1682.08c
Ecu: ECU11g1730
…
COG database
ANNOTATIONSUMMARY-COMBINEDORFANNOTATION16_Phaeo_genome (same table as in the Simple Example)
SwissProt web service
Browser Cross-Reference
TIGR01650 GO:0051116 contributes_to
TIGR01651 GO:0009236 NULL
TIGR01651 GO:0051116 NULL
TIGR01660 GO:0008940 NULL
TIGR01660 GO:0009061 NULL
TIGR01660 GO:0009325 NULL
TIGR01663 GO:0000012 NULL
TIGR01663 GO:0046403 NULL
TIGRFAM to GO Mapping
coastal sample (same table as above)
![Page 9](https://reader034.fdocuments.in/reader034/viewer/2022051909/5ffd71ec4c80a0010165309c/html5/thumbnails/9.jpg)
An observation about “handling data”
How many plasmids were bombarded in July and
have a rescue and expression?
SELECT count(*)
FROM [bombardment_log]
WHERE bomb_date BETWEEN '7/1/2010' AND '7/31/2010'
AND [rescue clone] IS NOT NULL
AND [expression?] = 'yes'
![Page 10](https://reader034.fdocuments.in/reader034/viewer/2022051909/5ffd71ec4c80a0010165309c/html5/thumbnails/10.jpg)
An observation about “handling data”
Which samples have not been cloned?
SELECT *
FROM plasmiddb
WHERE NOT (ISDATE(cloned) = 1 OR cloned = 'yes')
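
The "not cloned" question depends on a free-text column that mixes dates, 'yes', and blanks. The following is a minimal sketch of the same predicate using Python's built-in sqlite3 module. SQLite has no ISDATE, so a hypothetical `is_date` helper is registered as a stand-in; the sample rows, the `sample` column, and the M/D/YYYY date format are all assumptions made for illustration.

```python
import sqlite3
from datetime import datetime

def is_date(value):
    """Return 1 if value parses as M/D/YYYY, else 0 (stand-in for T-SQL ISDATE)."""
    if value is None:
        return 0
    try:
        datetime.strptime(value, "%m/%d/%Y")
        return 1
    except ValueError:
        return 0

conn = sqlite3.connect(":memory:")
conn.create_function("is_date", 1, is_date)
conn.execute("CREATE TABLE plasmiddb (sample TEXT, cloned TEXT)")
# Dirty data: the same column holds a date, a flag, free text, blanks and NULLs.
conn.executemany(
    "INSERT INTO plasmiddb VALUES (?, ?)",
    [("p1", "7/12/2010"), ("p2", "yes"), ("p3", "no"), ("p4", ""), ("p5", None)],
)

not_cloned = [
    row[0]
    for row in conn.execute(
        "SELECT sample FROM plasmiddb "
        "WHERE NOT (is_date(cloned) = 1 OR COALESCE(cloned, '') = 'yes') "
        "ORDER BY sample"
    )
]
print(not_cloned)
```

The COALESCE is a choice made in this sketch so that rows with a NULL `cloned` value count as not cloned; without it, the comparison against NULL is unknown and those rows drop out of the result.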
![Page 11](https://reader034.fdocuments.in/reader034/viewer/2022051909/5ffd71ec4c80a0010165309c/html5/thumbnails/11.jpg)
An observation about “handling data”
How often does each RNA hit appear inside the
annotated surface group?
SELECT hit, COUNT(*) as cnt
FROM tigrfamannotation_surface
GROUP BY hit
ORDER BY cnt DESC
![Page 12](https://reader034.fdocuments.in/reader034/viewer/2022051909/5ffd71ec4c80a0010165309c/html5/thumbnails/12.jpg)
An observation about “handling data”
For a given promoter (or protein fusion), how many expressing lines have been generated? (They would all have different strain designations.)
SELECT strain, count(distinct line)
FROM glycerol_stocks
GROUP BY strain
![Page 13](https://reader034.fdocuments.in/reader034/viewer/2022051909/5ffd71ec4c80a0010165309c/html5/thumbnails/13.jpg)
An observation about “handling data”
Find all TIGRFam ids (proteins) that are missing from at
least one of three samples (relations)
SELECT col0 FROM [refseq_hma_fasta_TGIRfam_refs]
UNION
SELECT col0 FROM [est_hma_fasta_TGIRfam_refs]
UNION
SELECT col0 FROM [combo_hma_fasta_TGIRfam_refs]
EXCEPT
SELECT col0 FROM [refseq_hma_fasta_TGIRfam_refs]
INTERSECT
SELECT col0 FROM [est_hma_fasta_TGIRfam_refs]
INTERSECT
SELECT col0 FROM [combo_hma_fasta_TGIRfam_refs]
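
The query above depends on operator precedence: SQL Server evaluates INTERSECT before UNION and EXCEPT, so it reads as (A ∪ B ∪ C) minus (A ∩ B ∩ C). Engines differ here; SQLite, for example, groups compound operators strictly left to right. The sketch below uses Python's sqlite3 and makes the grouping explicit with subqueries so the result does not depend on the engine. The short table names `refseq`, `est`, `combo` and the sample ids are stand-ins, not the lab's data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Three hypothetical samples, each a one-column relation of TIGRFam ids.
samples = {
    "refseq": ["TIGR00039", "TIGR00149", "TIGR01651"],
    "est": ["TIGR00149", "TIGR01651"],
    "combo": ["TIGR00149", "TIGR01660"],
}
for name, ids in samples.items():
    conn.execute(f"CREATE TABLE {name} (col0 TEXT)")
    conn.executemany(f"INSERT INTO {name} VALUES (?)", [(i,) for i in ids])

# ids present in at least one sample, minus ids present in all three
query = """
SELECT col0 FROM (
    SELECT col0 FROM refseq
    UNION SELECT col0 FROM est
    UNION SELECT col0 FROM combo
)
EXCEPT
SELECT col0 FROM (
    SELECT col0 FROM refseq
    INTERSECT SELECT col0 FROM est
    INTERSECT SELECT col0 FROM combo
)
ORDER BY col0
"""
missing_somewhere = [row[0] for row in conn.execute(query)]
print(missing_somewhere)
```

Only TIGR00149 occurs in all three sample tables, so it is the one id left out of the result.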
![Page 14](https://reader034.fdocuments.in/reader034/viewer/2022051909/5ffd71ec4c80a0010165309c/html5/thumbnails/14.jpg)
Long Tail Science DaaS Requirements
Schema-Later or Schema-Free
Schema represents a shared consensus on structure,
semantics, data model, usage modalities
By definition, no such consensus exists at the frontier of research
By definition, lots of schema churn
By definition, dirty data
Consistency?
Read mostly, appends, versioning/batch replace
Scale?
Relatively small (<100GB)
Dataspace abstraction attractive [Halevy, Maier, Franklin 2005]
anecdotally well-received
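
One way to picture "schema-later" is an ingest step that asks for no schema at all: every column is loaded as text under a generic name (the deck's own queries refer to `col0`), and typing is deferred to query time. The sketch below is an assumption about how such a step could look, not a description of the actual service. The function name `ingest_delimited` is hypothetical, and the sample rows are shaped like the TIGRFAM to GO mapping with the third field left off two of them to exercise ragged input.

```python
import csv
import io
import sqlite3

def ingest_delimited(conn, table, text, delimiter="\t"):
    """Load delimited text into `table` with generic TEXT columns col0..colN."""
    rows = [r for r in csv.reader(io.StringIO(text), delimiter=delimiter) if r]
    width = max(len(r) for r in rows)
    cols = ", ".join(f"col{i} TEXT" for i in range(width))
    conn.execute(f'CREATE TABLE "{table}" ({cols})')
    marks = ", ".join("?" for _ in range(width))
    # Pad short rows with NULLs so ragged files still load.
    padded = [r + [None] * (width - len(r)) for r in rows]
    conn.executemany(f'INSERT INTO "{table}" VALUES ({marks})', padded)
    return width

raw = (
    "TIGR01650\tGO:0051116\tcontributes_to\n"
    "TIGR01651\tGO:0009236\n"
    "TIGR01651\tGO:0051116\n"
)
conn = sqlite3.connect(":memory:")
width = ingest_delimited(conn, "tigrfam_to_go", raw)
counts = conn.execute(
    'SELECT col0, COUNT(*) FROM "tigrfam_to_go" GROUP BY col0 ORDER BY col0'
).fetchall()
print(width, counts)
```

Meaningful names and types can then be layered on later as views over the raw table, which fits the read-mostly, batch-replace usage listed above.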
![Page 15](https://reader034.fdocuments.in/reader034/viewer/2022051909/5ffd71ec4c80a0010165309c/html5/thumbnails/15.jpg)
Some Science DaaS Motivations
Chronic IT poverty + exploding data volumes
especially in the long tail
Data sharing is the whole point
mandated by funding agencies
in the cloud, sharing reduces to policy
Public reference databases
Globally accessible in the cloud
![Page 16](https://reader034.fdocuments.in/reader034/viewer/2022051909/5ffd71ec4c80a0010165309c/html5/thumbnails/16.jpg)
Chavi dataspace
![Page 17](https://reader034.fdocuments.in/reader034/viewer/2022051909/5ffd71ec4c80a0010165309c/html5/thumbnails/17.jpg)
More Examples
What is the location of the E. coli glycerol stock(s) for gene X promoter fusion?
What is the -80 freezer and liquid nitrogen location of worm strain for gene X promoter fusion and/or protein fusion?
Show me all worm strains currently in storage?
Show me all worm strains for gene X?
Show me all worm strains for gene X promoter fusion?
Show me all worm strains for gene X protein fusion?
Show me a table of all worm strains with early embryonic expression?
Show me the location of the imaging data for gene X?
What strains have been shipped to Yale, Stanford etc, and when were they shipped?
Show me a list of all primers with PCR failure?
What genes have midiprep stocks but no worm strains?
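
Most of these questions map onto a single short query. As one illustration, the last question ("midiprep stocks but no worm strains") is a set difference. The sketch below runs it with Python's sqlite3; the tables `midiprep_stocks` and `worm_strains`, their columns, and the rows are hypothetical, since the lab's real spreadsheets are not shown in the deck.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-ins for two uploaded lab spreadsheets.
conn.executescript("""
CREATE TABLE midiprep_stocks (gene TEXT, freezer_box TEXT);
CREATE TABLE worm_strains (gene TEXT, strain TEXT);
INSERT INTO midiprep_stocks VALUES ('geneA', 'box1'), ('geneB', 'box1'), ('geneC', 'box2');
INSERT INTO worm_strains VALUES ('geneA', 'strain1'), ('geneA', 'strain2'), ('geneC', 'strain3');
""")

# Genes that appear in the stock list but never in the strain list.
genes_without_strains = [
    row[0]
    for row in conn.execute(
        "SELECT gene FROM midiprep_stocks "
        "EXCEPT SELECT gene FROM worm_strains "
        "ORDER BY gene"
    )
]
print(genes_without_strains)
```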
![Page 18](https://reader034.fdocuments.in/reader034/viewer/2022051909/5ffd71ec4c80a0010165309c/html5/thumbnails/18.jpg)
Discovery: SQL Does not Terrify Scientists
![Page 19](https://reader034.fdocuments.in/reader034/viewer/2022051909/5ffd71ec4c80a0010165309c/html5/thumbnails/19.jpg)
What's the point?
Databases are underused in (long tail) science
Conventional wisdom says “Scientists won't write SQL”
This is utter horseshit
witness SDSS if you don't trust us
Instead, we implicate difficulty in
installation
configuration
schema design
performance tuning
data ingest
app-building (over-reliance on GUIs)
So we ask “What kind of platform can support ad hoc scientific Q&A?”
![Page 20](https://reader034.fdocuments.in/reader034/viewer/2022051909/5ffd71ec4c80a0010165309c/html5/thumbnails/20.jpg)
Example Workflow: Environmental Metagenomics
![Page 21](https://reader034.fdocuments.in/reader034/viewer/2022051909/5ffd71ec4c80a0010165309c/html5/thumbnails/21.jpg)
![Page 22](https://reader034.fdocuments.in/reader034/viewer/2022051909/5ffd71ec4c80a0010165309c/html5/thumbnails/22.jpg)
![Page 23](https://reader034.fdocuments.in/reader034/viewer/2022051909/5ffd71ec4c80a0010165309c/html5/thumbnails/23.jpg)
![Page 24](https://reader034.fdocuments.in/reader034/viewer/2022051909/5ffd71ec4c80a0010165309c/html5/thumbnails/24.jpg)
metadata
search results
sequence data
![Page 25](https://reader034.fdocuments.in/reader034/viewer/2022051909/5ffd71ec4c80a0010165309c/html5/thumbnails/25.jpg)
SQL
![Page 26](https://reader034.fdocuments.in/reader034/viewer/2022051909/5ffd71ec4c80a0010165309c/html5/thumbnails/26.jpg)
Old UI (1)
![Page 27](https://reader034.fdocuments.in/reader034/viewer/2022051909/5ffd71ec4c80a0010165309c/html5/thumbnails/27.jpg)
Old UI (2)
![Page 28](https://reader034.fdocuments.in/reader034/viewer/2022051909/5ffd71ec4c80a0010165309c/html5/thumbnails/28.jpg)
New UI (1)
![Page 29](https://reader034.fdocuments.in/reader034/viewer/2022051909/5ffd71ec4c80a0010165309c/html5/thumbnails/29.jpg)
Usage
about 5 months old
8 labs around UW campus
~200 tables
~400 views
![Page 30](https://reader034.fdocuments.in/reader034/viewer/2022051909/5ffd71ec4c80a0010165309c/html5/thumbnails/30.jpg)
Implementation
Windows Azure app serves GUI and RESTful API for uploading data, saving queries
SQL Azure Database
SQL Server on AWS to absorb spill-over beyond 50GB and to manage distributed queries
shared database, separate schemas per account
Accounts 1:1 with DB roles
![Page 31](https://reader034.fdocuments.in/reader034/viewer/2022051909/5ffd71ec4c80a0010165309c/html5/thumbnails/31.jpg)
View Semantics and Features
“Saved query” = View with attached metadata
Unify views and tables as “datasets”
table = “select * from [raw_table]”
Replacement semantics for name conflicts
old versions materialized and archived
Materialize downstream views
when dependencies deleted
when dependencies become incompatible
Permissions
public vs. private vs. ACLs vs. groups
Sharing, social querying, CQMS*
search, recent queries, friends’ queries, favorites, ratings
facilitate sharing and recommendations of not just whole queries, but common predicates, join patterns, etc.
Discover and expose implicit relationships between datasets
View synthesis [Garcia-Molina, Widom, ICDT 2010]
Proactively create views for potential joins, unions, filters* [Khoussainova, CIDR 2009]
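
The "datasets" idea and the materialization rule can be sketched in a few lines. In the example below (Python's sqlite3), an uploaded table is exposed as a trivial view, a saved query is a second view built on it, and deleting the first dataset snapshots the downstream view into a table beforehand so it keeps working. The names `raw_hits`, `hits`, `hit_counts` and the `delete_dataset` helper are hypothetical; the slides do not describe how the service implements this internally.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE raw_hits (query TEXT, hit TEXT);
INSERT INTO raw_hits VALUES ('q1', 'TIGR00149'), ('q2', 'TIGR00149'), ('q3', 'TIGR01651');
-- An uploaded table is exposed as a dataset, i.e. a trivial view.
CREATE VIEW hits AS SELECT * FROM raw_hits;
-- A saved query is another view that depends on the first dataset.
CREATE VIEW hit_counts AS SELECT hit, COUNT(*) AS cnt FROM hits GROUP BY hit;
""")

def delete_dataset(conn, name, raw_table, downstream):
    """Drop a dataset after snapshotting each downstream view as a table."""
    for view in downstream:
        # Materialize while the dependency still exists, then swap names.
        conn.execute(f'CREATE TABLE "{view}_snapshot" AS SELECT * FROM "{view}"')
        conn.execute(f'DROP VIEW "{view}"')
        conn.execute(f'ALTER TABLE "{view}_snapshot" RENAME TO "{view}"')
    conn.execute(f'DROP VIEW "{name}"')
    conn.execute(f'DROP TABLE "{raw_table}"')

delete_dataset(conn, "hits", "raw_hits", downstream=["hit_counts"])
survivors = conn.execute("SELECT hit, cnt FROM hit_counts ORDER BY hit").fetchall()
print(survivors)
```

After the call, `hit_counts` is an ordinary table holding the last result of the saved query, so anything that reads it continues to work even though its source is gone.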
![Page 32](https://reader034.fdocuments.in/reader034/viewer/2022051909/5ffd71ec4c80a0010165309c/html5/thumbnails/32.jpg)
SQLShare as a Research Platform
SQL Autocomplete (Nodira Khoussainova, YongChul Kwon, Magda Balazinska)
English to SQL (Bill Howe, Luke Zettlemoyer, Shaminoo Kapoor)
Automatic Mashups and Visualization (Bill Howe, Alicia Key)
Semi-Automatic Logical Design: Join, Union Recommendations (Bill Howe, Garret Cole)
View Synthesis: Find Q given result R and database D s.t. R = Q(D)
Crowdsourced SQL authoring
Information Extraction
Logs -> Snippets
English -> Snippets
Crowd -> Snippets
Schema, Data -> Snippets
Raw Data -> Snippets
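
To make the view synthesis item concrete (find Q given R and D such that R = Q(D)), here is a toy brute-force sketch in Python with sqlite3. It searches only queries of the form `SELECT * FROM t WHERE col = value` over one table and keeps those whose output equals R. This is an illustration of the problem statement under that narrow assumption, not the algorithm from the cited work; the function name `synthesize_filters`, the table, and the rows are made up.

```python
import sqlite3
from collections import Counter

def synthesize_filters(conn, table, target_rows):
    """Return equality-filter queries over `table` whose result equals target_rows."""
    columns = [info[1] for info in conn.execute(f'PRAGMA table_info("{table}")')]
    target = Counter(target_rows)  # compare as bags: row order is irrelevant
    matches = []
    for col in columns:
        values = conn.execute(f'SELECT DISTINCT "{col}" FROM "{table}"').fetchall()
        for (value,) in values:
            rows = conn.execute(
                f'SELECT * FROM "{table}" WHERE "{col}" = ?', (value,)
            ).fetchall()
            if Counter(rows) == target:
                matches.append(f'SELECT * FROM "{table}" WHERE "{col}" = {value!r}')
    return matches

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE annotations (query TEXT, hit TEXT, hit_length INTEGER)")
conn.executemany(
    "INSERT INTO annotations VALUES (?, ?, ?)",
    [("q1", "TIGR00149", 134), ("q2", "TIGR00149", 134), ("q3", "TIGR01651", 606)],
)
# R: the result a user already has and wants a query for.
result_r = [("q1", "TIGR00149", 134), ("q2", "TIGR00149", 134)]
candidates = synthesize_filters(conn, "annotations", result_r)
print(candidates)
```

On this data two candidates come back, one filtering on `hit` and one on `hit_length`, which shows that R = Q(D) rarely pins down a single Q; ranking or asking the user is still needed.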