Trio-One: Layering Uncertainty and Lineage on a Conventional DBMS
-
Upload
mckenzie-mclean -
Category
Documents
-
view
28 -
download
1
description
Transcript of Trio-One: Layering Uncertainty and Lineage on a Conventional DBMS
![Page 1: Trio-One: Layering Uncertainty and Lineage on a Conventional DBMS](https://reader035.fdocuments.in/reader035/viewer/2022062720/568134b1550346895d9bcc7b/html5/thumbnails/1.jpg)
Trio-One: Layering Uncertainty and Lineage on a Conventional DBMS
Martin Theobald
Jennifer Widom
Stanford University
![Page 2: Trio-One: Layering Uncertainty and Lineage on a Conventional DBMS](https://reader035.fdocuments.in/reader035/viewer/2022062720/568134b1550346895d9bcc7b/html5/thumbnails/2.jpg)
2
Outline
• Motivation and Applications
• Working Model for ULDBs
• Trio-One System Architecture
• Efficient Confidence Computations
![Page 3: Trio-One: Layering Uncertainty and Lineage on a Conventional DBMS](https://reader035.fdocuments.in/reader035/viewer/2022062720/568134b1550346895d9bcc7b/html5/thumbnails/3.jpg)
3
Motivation
• Many applications involve data that is uncertain
(approximate, probabilistic, inexact, incomplete, imprecise, fuzzy, inaccurate, ...)
• Many of the same applications need to track the lineage of their data
Neither uncertainty nor lineage are supported by conventional Database Management Systems (DBMSs)
Coincidence or Fate?Coincidence or Fate?
![Page 4: Trio-One: Layering Uncertainty and Lineage on a Conventional DBMS](https://reader035.fdocuments.in/reader035/viewer/2022062720/568134b1550346895d9bcc7b/html5/thumbnails/4.jpg)
4
Sample Applications
Deduplication / Entity Resolution• Uncertainty: Match and merge• Lineage: Source records
Information extraction• Uncertainty: Extracted labels and values• Lineage: Original context
Information integration• Uncertainty: Inconsistent information• Lineage: Original sources
![Page 5: Trio-One: Layering Uncertainty and Lineage on a Conventional DBMS](https://reader035.fdocuments.in/reader035/viewer/2022062720/568134b1550346895d9bcc7b/html5/thumbnails/5.jpg)
5
Sample Applications
Scientific experiments• Uncertainty: Captured (and derived) data
• Lineage: Layers of views
Sensor data• Uncertainty: Sensor values, aggregations,
missing readings
• Lineage: Original readings, views
![Page 6: Trio-One: Layering Uncertainty and Lineage on a Conventional DBMS](https://reader035.fdocuments.in/reader035/viewer/2022062720/568134b1550346895d9bcc7b/html5/thumbnails/6.jpg)
6
Claim
The connection between uncertainty and lineage goes deeper than just a shared need by several applications
Coincidence or Fate?Coincidence or Fate?
![Page 7: Trio-One: Layering Uncertainty and Lineage on a Conventional DBMS](https://reader035.fdocuments.in/reader035/viewer/2022062720/568134b1550346895d9bcc7b/html5/thumbnails/7.jpg)
7
Lineage and Uncertainty
Lineage...• Enables simple and consistent representation of
uncertain data
• Allows for intuitive and complete working models
• Correlates uncertainty in query results with uncertainty in the input data
• Can make confidence computations over uncertain data more efficient
Applications use lineage to reduce or resolve uncertainty
![Page 8: Trio-One: Layering Uncertainty and Lineage on a Conventional DBMS](https://reader035.fdocuments.in/reader035/viewer/2022062720/568134b1550346895d9bcc7b/html5/thumbnails/8.jpg)
8
Goal
A new kind of DBMS in which:
1. Data2. Uncertainty3. Lineage
are all first-class interrelated concepts
Trio Trio }
![Page 9: Trio-One: Layering Uncertainty and Lineage on a Conventional DBMS](https://reader035.fdocuments.in/reader035/viewer/2022062720/568134b1550346895d9bcc7b/html5/thumbnails/9.jpg)
9
The Trio Trio
1. Data Model Simplest extension to relational model that’s
sufficiently expressive
2. Query Language Simple extension to SQL with well-defined
semantics and intuitive behavior
3. System A complete open-source DBMS that people
want to use
![Page 10: Trio-One: Layering Uncertainty and Lineage on a Conventional DBMS](https://reader035.fdocuments.in/reader035/viewer/2022062720/568134b1550346895d9bcc7b/html5/thumbnails/10.jpg)
10
The Present
1. Data Model Uncertainty-Lineage Databases (ULDBs)
2. Query Language TriQL
3. System Trio-One: Layered prototype deployable
on top of any standard relational DBMS
![Page 11: Trio-One: Layering Uncertainty and Lineage on a Conventional DBMS](https://reader035.fdocuments.in/reader035/viewer/2022062720/568134b1550346895d9bcc7b/html5/thumbnails/11.jpg)
11
Running Example: Crime-Solving
Saw(witness,car) // may be uncertain
Drives(person,car) // may be uncertain
Suspects(person) = πperson(Saw ⋈car Drives)
![Page 12: Trio-One: Layering Uncertainty and Lineage on a Conventional DBMS](https://reader035.fdocuments.in/reader035/viewer/2022062720/568134b1550346895d9bcc7b/html5/thumbnails/12.jpg)
12
Data Model: Uncertainty
An uncertain database represents a set ofpossible instances of data values
• Amy saw either a Honda or a Toyota
• Jimmy drives a Toyota, a Mazda, or both
• Betty saw an Acura with confidence 0.5 or a Toyota with confidence 0.3
• Hank is a suspect with confidence 0.7
![Page 13: Trio-One: Layering Uncertainty and Lineage on a Conventional DBMS](https://reader035.fdocuments.in/reader035/viewer/2022062720/568134b1550346895d9bcc7b/html5/thumbnails/13.jpg)
13
Our Model for Uncertainty
1. Alternatives (mutually exclusive)
2. ‘?’ (Maybe) Annotations
3. Confidences
![Page 14: Trio-One: Layering Uncertainty and Lineage on a Conventional DBMS](https://reader035.fdocuments.in/reader035/viewer/2022062720/568134b1550346895d9bcc7b/html5/thumbnails/14.jpg)
14
Our Model for Uncertainty
1. Alternatives: uncertainty about data value(s)
2. ‘?’ (Maybe) Annotations
3. ConfidencesSaw (witness,car)
(Amy, Honda) ∥ (Amy, Toyota) ∥ (Amy, Mazda)
witness car
Amy { Honda, Toyota, Mazda }
=
Three possibleinstances
![Page 15: Trio-One: Layering Uncertainty and Lineage on a Conventional DBMS](https://reader035.fdocuments.in/reader035/viewer/2022062720/568134b1550346895d9bcc7b/html5/thumbnails/15.jpg)
15
Six possibleinstances
Our Model for Uncertainty
1. Alternatives
2. ‘?’ (Maybe): uncertainty about presence
3. Confidences
Saw (witness,car)
(Amy, Honda) ∥ (Amy, Toyota) ∥ (Amy, Mazda)
(Betty, Acura)?
![Page 16: Trio-One: Layering Uncertainty and Lineage on a Conventional DBMS](https://reader035.fdocuments.in/reader035/viewer/2022062720/568134b1550346895d9bcc7b/html5/thumbnails/16.jpg)
16
Our Model for Uncertainty
1. Alternatives
2. ‘?’ (Maybe) Annotations
3. Confidences: weighted uncertainty
Saw (witness,car)
(Amy, Honda): 0.5 ∥ (Amy,Toyota): 0.3 ∥ (Amy, Mazda): 0.2
(Betty, Acura): 0.6?
Six possible instances, each with a probability
![Page 17: Trio-One: Layering Uncertainty and Lineage on a Conventional DBMS](https://reader035.fdocuments.in/reader035/viewer/2022062720/568134b1550346895d9bcc7b/html5/thumbnails/17.jpg)
17
Models for Uncertainty
• Our model (so far) is not especially new
• We spent some time exploring the space of models for uncertainty
• Tension between understandability and expressiveness– Our model is understandable
– But it is not complete, or even closed under common operations
![Page 18: Trio-One: Layering Uncertainty and Lineage on a Conventional DBMS](https://reader035.fdocuments.in/reader035/viewer/2022062720/568134b1550346895d9bcc7b/html5/thumbnails/18.jpg)
18
Closure and Completeness
Completeness Can represent any finite set of possible
instances
Closure Can represent any result of relational
operators
Note: Completeness Closure
![Page 19: Trio-One: Layering Uncertainty and Lineage on a Conventional DBMS](https://reader035.fdocuments.in/reader035/viewer/2022062720/568134b1550346895d9bcc7b/html5/thumbnails/19.jpg)
19
Models for Uncertainty
• A-Tuples – Uncertainty on attribute values• (Amy, {Toyota, Mazda})• Not closed
• X-Tuples – Uncertainty on entire tuples• { (Amy, Toyota), (Amy, Mazda) }
• Still not closed
• C-Tables – Tuples + Boolean constraints• { (Amy, Toyota, X=1),
(Amy, Mazda, X<>1),
(Cathy, Honda, X<>1) }• Closed and complete!• Understandable?
![Page 20: Trio-One: Layering Uncertainty and Lineage on a Conventional DBMS](https://reader035.fdocuments.in/reader035/viewer/2022062720/568134b1550346895d9bcc7b/html5/thumbnails/20.jpg)
20
Our Model is Not Closed
Saw (witness,car)
(Cathy, Honda) ∥ (Cathy, Mazda)
Drives (person,car)
(Jimmy, Toyota) ∥ (Jimmy, Mazda)
(Billy, Honda) ∥ (Frank, Honda)
(Hank, Honda)
Suspects
Jimmy
Billy ∥ Frank
Hank
Suspects = πperson(Saw ⋈car Drives)
???
Does not correctlycapture possibleinstances in theresult
CANNOT
![Page 21: Trio-One: Layering Uncertainty and Lineage on a Conventional DBMS](https://reader035.fdocuments.in/reader035/viewer/2022062720/568134b1550346895d9bcc7b/html5/thumbnails/21.jpg)
21
to the Rescue
Lineage (provenance): “where data came from”• Internal lineage – pointers to base data
• External lineage – files, URLs, web portals, etc.
In Trio: A Boolean function λ from alternatives to other alternatives (or external sources)
Lineage
![Page 22: Trio-One: Layering Uncertainty and Lineage on a Conventional DBMS](https://reader035.fdocuments.in/reader035/viewer/2022062720/568134b1550346895d9bcc7b/html5/thumbnails/22.jpg)
22
Example with Lineage
ID Saw (witness,car)
11
(Cathy, Honda) ∥ (Cathy, Mazda)
ID Drives (person,car)
21
(Jimmy, Toyota) ∥ (Jimmy, Mazda)
22
(Billy, Honda) ∥ (Frank, Honda)
23
(Hank, Honda)
ID Suspects
31
Jimmy
32
Billy ∥ Frank
33
Hank
???
Suspects = πperson(Saw ⋈car Drives)
λ(31) = (11,2),(21,2)λ(32,1) = (11,1),(22,1); λ(32,2) = (11,1),(22,2)
λ(33) = (11,1), 23
Correctly captures possible instances inthe result
![Page 23: Trio-One: Layering Uncertainty and Lineage on a Conventional DBMS](https://reader035.fdocuments.in/reader035/viewer/2022062720/568134b1550346895d9bcc7b/html5/thumbnails/23.jpg)
23
Trio Data Model
1. Alternatives (mutually exclusive)
2. ‘?’ (Maybe) Annotations
3. Confidences
4. Lineage
ULDBs are closed and complete
Uncertainty-Lineage Databases (ULDBs)Uncertainty-Lineage Databases (ULDBs)
![Page 24: Trio-One: Layering Uncertainty and Lineage on a Conventional DBMS](https://reader035.fdocuments.in/reader035/viewer/2022062720/568134b1550346895d9bcc7b/html5/thumbnails/24.jpg)
24
ULDBs: Lineage
Conjunctive lineage sufficient for most operations
• Disjunctive lineage for duplicate-elimination
• Negative lineage for difference
![Page 25: Trio-One: Layering Uncertainty and Lineage on a Conventional DBMS](https://reader035.fdocuments.in/reader035/viewer/2022062720/568134b1550346895d9bcc7b/html5/thumbnails/25.jpg)
25
ULDBs: Minimality
A ULDB relation R represents a set of possible
instancesDoes every tuple in R appear in some
possible instance? (no extraneous tuples)
Does every maybe-tuple in R not appear in some possible instance? (no extraneous ‘?’s)
Also
Data-minimalityData-minimality
Lineage-minimalityLineage-minimality
![Page 26: Trio-One: Layering Uncertainty and Lineage on a Conventional DBMS](https://reader035.fdocuments.in/reader035/viewer/2022062720/568134b1550346895d9bcc7b/html5/thumbnails/26.jpg)
26
Data Minimality Examples
Extraneous ‘?’
. . .
10
(Billy, Honda) ∥ (Frank, Honda)
. . .
. . .
40
Billy ∥ Frank
. . .
?λ(40,1)=(10,1); λ(40,2)=(10,2)
extraneous
. . .
20
(Billy, Honda)
. . .
?. . .
30
(Frank, Honda)
. . .
?
![Page 27: Trio-One: Layering Uncertainty and Lineage on a Conventional DBMS](https://reader035.fdocuments.in/reader035/viewer/2022062720/568134b1550346895d9bcc7b/html5/thumbnails/27.jpg)
27
Data Minimality Examples
Extraneous tuple
(Diane, Mazda) ∥ (Diane, Acura)
Dianeextraneous
(Diane, Mazda)
(Diane, Acura)
?
??
![Page 28: Trio-One: Layering Uncertainty and Lineage on a Conventional DBMS](https://reader035.fdocuments.in/reader035/viewer/2022062720/568134b1550346895d9bcc7b/html5/thumbnails/28.jpg)
28
ULDBs: Membership Questions
Does a given tuple t appear in some (all) possible instance(s) of R ?
Is a given table T one of (all of) the possible instances of R ?
Polynomial algorithms based on data-minimizationPolynomial algorithms based on data-minimization
NP-HardNP-Hard
![Page 29: Trio-One: Layering Uncertainty and Lineage on a Conventional DBMS](https://reader035.fdocuments.in/reader035/viewer/2022062720/568134b1550346895d9bcc7b/html5/thumbnails/29.jpg)
29
Non-Theorists...
Wake Back Up!
![Page 30: Trio-One: Layering Uncertainty and Lineage on a Conventional DBMS](https://reader035.fdocuments.in/reader035/viewer/2022062720/568134b1550346895d9bcc7b/html5/thumbnails/30.jpg)
30
Querying ULDBs
• Simple extension to SQL
• Formal semantics, intuitive meaning
• Query uncertainty, confidences, and lineage
TriQLTriQL
![Page 31: Trio-One: Layering Uncertainty and Lineage on a Conventional DBMS](https://reader035.fdocuments.in/reader035/viewer/2022062720/568134b1550346895d9bcc7b/html5/thumbnails/31.jpg)
31
Initial TriQL Example
ID Saw (witness,car)
11
(Cathy, Honda) ∥ (Cathy, Mazda)
ID Drives (person,car)
21
(Jimmy, Toyota) ∥ (Jimmy, Mazda)
22
(Billy, Honda) ∥ (Frank, Honda)
23
(Hank, Honda)SELECT Drives.person INTO SuspectsFROM Saw, DrivesWHERE Saw.car = Drives.car
ID Suspects
31
Jimmy
32
Billy ∥ Frank
33
Hank
???
λ(31) = (11,2),(21,2)λ(32,1)=(11,1),(22,1); λ(32,2)=(11,1),(22,2)
λ(33) = (11,1), 23
![Page 32: Trio-One: Layering Uncertainty and Lineage on a Conventional DBMS](https://reader035.fdocuments.in/reader035/viewer/2022062720/568134b1550346895d9bcc7b/html5/thumbnails/32.jpg)
32
Formal Semantics
Query Q on ULDB D
DD
D1, D2, …, DnD1, D2, …, Dn
possibleinstances
Q on eachinstance
representationof instances
Q(D1), Q(D2), …, Q(Dn)Q(D1), Q(D2), …, Q(Dn)
D’D’implementation of Q
operational semanticsD + ResultD + Result
![Page 33: Trio-One: Layering Uncertainty and Lineage on a Conventional DBMS](https://reader035.fdocuments.in/reader035/viewer/2022062720/568134b1550346895d9bcc7b/html5/thumbnails/33.jpg)
33
Operational Semantics
Over conventional relational database:
For each tuple in cross-product of X1, X2, ..., Xn
1. Evaluate the predicate
2. If true, project attr-list to create result tuple
3. If INTO clause, insert into table
SELECT attr-list [ INTO table ]FROM X1, X2, ..., Xn
WHERE predicate
![Page 34: Trio-One: Layering Uncertainty and Lineage on a Conventional DBMS](https://reader035.fdocuments.in/reader035/viewer/2022062720/568134b1550346895d9bcc7b/html5/thumbnails/34.jpg)
34
Operational Semantics
Over ULDB:For each tuple in cross-product of X1, X2, ..., Xn
1. Create “super tuple” T from all combinations of alternatives
2. Evaluate predicate on each alternative in T ; keep only the true ones
3. Project attr-list on each alternative to create result tuple
4. Details: ‘?’, lineage, confidences
SELECT attr-list [ INTO table ]FROM X1, X2, ..., Xn
WHERE predicate
![Page 35: Trio-One: Layering Uncertainty and Lineage on a Conventional DBMS](https://reader035.fdocuments.in/reader035/viewer/2022062720/568134b1550346895d9bcc7b/html5/thumbnails/35.jpg)
35
Operational Semantics: Example
SELECT Drives.personFROM Saw, DrivesWHERE Saw.car = Drives.car
Saw (witness,car)
(Cathy, Honda) ∥ (Cathy, Mazda)
Drives (person,car)
(Jim, Mazda) ∥ (Bill, Mazda)
(Hank, Honda)
![Page 36: Trio-One: Layering Uncertainty and Lineage on a Conventional DBMS](https://reader035.fdocuments.in/reader035/viewer/2022062720/568134b1550346895d9bcc7b/html5/thumbnails/36.jpg)
36
Operational Semantics: Example
SELECT Drives.personFROM Saw, DrivesWHERE Saw.car = Drives.car
Saw (witness,car)
(Cathy, Honda) ∥ (Cathy, Mazda)
Drives (person,car)
(Jim, Mazda) ∥ (Bill, Mazda)
(Hank, Honda)
(Cathy,Honda,Jim,Mazda)∥(Cathy,Honda,Bill,Mazda)∥(Cathy,Mazda,Jim,Mazda)∥(Cathy,Mazda,Bill,Mazda)
![Page 37: Trio-One: Layering Uncertainty and Lineage on a Conventional DBMS](https://reader035.fdocuments.in/reader035/viewer/2022062720/568134b1550346895d9bcc7b/html5/thumbnails/37.jpg)
37
Operational Semantics: Example
SELECT Drives.personFROM Saw, DrivesWHERE Saw.car = Drives.car
Saw (witness,car)
(Cathy, Honda) ∥ (Cathy, Mazda)
Drives (person,car)
(Jim, Mazda) ∥ (Bill, Mazda)
(Hank, Honda)
(Cathy,Honda,Jim,Mazda)∥(Cathy,Honda,Bill,Mazda)∥(Cathy,Mazda,Jim,Mazda)∥(Cathy,Mazda,Bill,Mazda)
![Page 38: Trio-One: Layering Uncertainty and Lineage on a Conventional DBMS](https://reader035.fdocuments.in/reader035/viewer/2022062720/568134b1550346895d9bcc7b/html5/thumbnails/38.jpg)
38
Operational Semantics: Example
SELECT Drives.personFROM Saw, DrivesWHERE Saw.car = Drives.car
Saw (witness,car)
(Cathy, Honda) ∥ (Cathy, Mazda)
Drives (person,car)
(Jim, Mazda) ∥ (Bill, Mazda)
(Hank, Honda)
(Cathy,Honda,Jim,Mazda)∥(Cathy,Honda,Bill,Mazda)∥(Cathy,Mazda,Jim,Mazda)∥(Cathy,Mazda,Bill,Mazda)
![Page 39: Trio-One: Layering Uncertainty and Lineage on a Conventional DBMS](https://reader035.fdocuments.in/reader035/viewer/2022062720/568134b1550346895d9bcc7b/html5/thumbnails/39.jpg)
39
Operational Semantics: Example
(Cathy,Honda,Jim,Mazda)∥(Cathy,Honda,Bill,Mazda)∥(Cathy,Mazda,Jim,Mazda)∥(Cathy,Mazda,Bill,Mazda)
SELECT Drives.personFROM Saw, DrivesWHERE Saw.car = Drives.car
Saw (witness,car)
(Cathy, Honda) ∥ (Cathy, Mazda)
Drives (person,car)
(Jim, Mazda) ∥ (Bill, Mazda)
(Hank, Honda)
(Cathy,Honda,Hank,Honda) ∥ (Cathy,Mazda,Hank,Honda)
![Page 40: Trio-One: Layering Uncertainty and Lineage on a Conventional DBMS](https://reader035.fdocuments.in/reader035/viewer/2022062720/568134b1550346895d9bcc7b/html5/thumbnails/40.jpg)
40
Operational Semantics: Example
(Cathy,Honda,Jim,Mazda)∥(Cathy,Honda,Bill,Mazda)∥(Cathy,Mazda,Jim,Mazda)∥(Cathy,Mazda,Bill,Mazda)
SELECT Drives.personFROM Saw, DrivesWHERE Saw.car = Drives.car
Saw (witness,car)
(Cathy, Honda) ∥ (Cathy, Mazda)
Drives (person,car)
(Jim, Mazda) ∥ (Bill, Mazda)
(Hank, Honda)
(Cathy,Honda,Hank,Honda) ∥ (Cathy,Mazda,Hank,Honda)
![Page 41: Trio-One: Layering Uncertainty and Lineage on a Conventional DBMS](https://reader035.fdocuments.in/reader035/viewer/2022062720/568134b1550346895d9bcc7b/html5/thumbnails/41.jpg)
41
Operational Semantics: Example
(Cathy,Honda,Jim,Mazda)∥(Cathy,Honda,Bill,Mazda)∥(Cathy,Mazda,Jim,Mazda)∥(Cathy,Mazda,Bill,Mazda)
SELECT Drives.personFROM Saw, DrivesWHERE Saw.car = Drives.car
Saw (witness,car)
(Cathy, Honda) ∥ (Cathy, Mazda)
Drives (person,car)
(Jim, Mazda) ∥ (Bill, Mazda)
(Hank, Honda)
(Cathy,Honda,Hank,Honda) ∥ (Cathy,Mazda,Hank,Honda)
![Page 42: Trio-One: Layering Uncertainty and Lineage on a Conventional DBMS](https://reader035.fdocuments.in/reader035/viewer/2022062720/568134b1550346895d9bcc7b/html5/thumbnails/42.jpg)
42
Operational Semantics: Example
SELECT Drives.person INTO SuspectsFROM Saw, DrivesWHERE Saw.car = Drives.car
Saw (witness,car)
(Cathy, Honda) ∥ (Cathy, Mazda)
Drives (person,car)
(Jim, Mazda) ∥ (Bill, Mazda)
(Hank, Honda)
Suspects
Jim ∥ Bill
Hank
??
λ( ) = ...λ( ) = ...
![Page 43: Trio-One: Layering Uncertainty and Lineage on a Conventional DBMS](https://reader035.fdocuments.in/reader035/viewer/2022062720/568134b1550346895d9bcc7b/html5/thumbnails/43.jpg)
43
Confidences
Confidences supplied with base data
Trio computes confidences on query results• Default probabilistic interpretation• Can choose to plug in different arithmetic
Saw (witness,car)
(Cathy, Honda): 0.6 ∥ (Cathy, Mazda): 0.4
Drives (person,car)
(Jim, Mazda): 0.3 ∥ (Bill, Mazda): 0.6
(Hank, Honda) ): 1.0
Suspects
Jim: 0.12 ∥ Bill: 0.24
Hank: 0.6
?
?
?
0.3 0.4
0.6
ProbabilisticProbabilistic Min
![Page 44: Trio-One: Layering Uncertainty and Lineage on a Conventional DBMS](https://reader035.fdocuments.in/reader035/viewer/2022062720/568134b1550346895d9bcc7b/html5/thumbnails/44.jpg)
44
TriQL: Querying Confidences
Built-in function: Conf()
SELECT Drives.person INTO Suspects FROM Saw, Drives WHERE Saw.car = Drives.car AND Conf(Saw) > 0.5 AND Conf(Drives) >
0.8
SELECT Drives.person INTO Suspects FROM Saw, Drives WHERE Saw.car = Drives.car AND Conf(*) > 0.4
![Page 45: Trio-One: Layering Uncertainty and Lineage on a Conventional DBMS](https://reader035.fdocuments.in/reader035/viewer/2022062720/568134b1550346895d9bcc7b/html5/thumbnails/45.jpg)
45
TriQL: Querying Lineage
Built-in join predicate: Lineage()
SELECT Saw.witness INTO AccusesHank FROM Suspects, Saw WHERE Lineage(Suspects,Saw) AND Suspects.person = ‘Hank’
Also works for transitive lineage:
SELECT witness FROM AccusesHank WHERE Lineage(Saw,AccusesHank)
![Page 46: Trio-One: Layering Uncertainty and Lineage on a Conventional DBMS](https://reader035.fdocuments.in/reader035/viewer/2022062720/568134b1550346895d9bcc7b/html5/thumbnails/46.jpg)
46
Additional Query Constructs
• “Horizontal subqueries”Refer to tuple alternatives as a relation
• Unmerged (allow horizontal duplicates)
• Flatten, GroupAlts
• NoLineage, NoConf, NoMaybe
• Query-computed confidences
• SQL-like data modification statements
![Page 47: Trio-One: Layering Uncertainty and Lineage on a Conventional DBMS](https://reader035.fdocuments.in/reader035/viewer/2022062720/568134b1550346895d9bcc7b/html5/thumbnails/47.jpg)
47
Final Example Query
PrimeSuspect (crime#, accuser, suspect)
(1, Amy, Jimmy) ∥ (1, Betty, Billy) ∥ (1, Cathy, Hank)
(2, Cathy, Frank) ∥ (2, Betty, Freddy)
person score
Amy 10
Betty 15
Cathy 5
Suspects
Jimmy: 0.33 ∥ Billy: 0.5 ∥ Hank: 0.166
Frank: 0.25 ∥ Freddy: 0.75
Credibility
List suspects with conf values based on accuser credibility
![Page 48: Trio-One: Layering Uncertainty and Lineage on a Conventional DBMS](https://reader035.fdocuments.in/reader035/viewer/2022062720/568134b1550346895d9bcc7b/html5/thumbnails/48.jpg)
48
Final Example Query
PrimeSuspect (crime#, accuser, suspect)
(1, Amy, Jimmy) ∥ (1, Betty, Billy) ∥ (1, Cathy, Hank)
(2, Cathy, Frank) ∥ (2, Betty, Freddy)
person score
Amy 10
Betty 15
Cathy 5
Suspects
Jimmy: 0.33 ∥ Billy: 0.5 ∥ Hank: 0.166
Frank: 0.25 ∥ Freddy: 0.75
Credibility
SELECT suspect, score/[sum(score)] as confFROM (SELECT suspect, (SELECT score FROM Credibility C WHERE C.person = P.accuser) FROM PrimeSuspect P)
![Page 49: Trio-One: Layering Uncertainty and Lineage on a Conventional DBMS](https://reader035.fdocuments.in/reader035/viewer/2022062720/568134b1550346895d9bcc7b/html5/thumbnails/49.jpg)
49
Final Example Query
PrimeSuspect (crime#, accuser, suspect)
(1, Amy, Jimmy) ∥ (1, Betty, Billy) ∥ (1, Cathy, Hank)
(2, Cathy, Frank) ∥ (2, Betty, Freddy)
person score
Amy 10
Betty 15
Cathy 5
Suspects
Jimmy: 0.33 ∥ Billy: 0.5 ∥ Hank: 0.166
Frank: 0.25 ∥ Freddy: 0.75
Credibility
SELECT suspect, score/[sum(score)] as confFROM (SELECT suspect, (SELECT score FROM Credibility C WHERE C.person = P.accuser) FROM PrimeSuspect P)
![Page 50: Trio-One: Layering Uncertainty and Lineage on a Conventional DBMS](https://reader035.fdocuments.in/reader035/viewer/2022062720/568134b1550346895d9bcc7b/html5/thumbnails/50.jpg)
50
Final Example Query
PrimeSuspect (crime#, accuser, suspect)
(1, Amy, Jimmy) ∥ (1, Betty, Billy) ∥ (1, Cathy, Hank)
(2, Cathy, Frank) ∥ (2, Betty, Freddy)
person score
Amy 10
Betty 15
Cathy 5
Suspects
Jimmy: 0.33 ∥ Billy: 0.5 ∥ Hank: 0.166
Frank: 0.25 ∥ Freddy: 0.75
Credibility
SELECT suspect, score/[sum(score)] as confFROM (SELECT suspect, (SELECT score FROM Credibility C WHERE C.person = P.accuser) FROM PrimeSuspect P)
“Scaled as conf”
“Uniform as conf”
…
“Scaled as conf”
“Uniform as conf”
…
![Page 51: Trio-One: Layering Uncertainty and Lineage on a Conventional DBMS](https://reader035.fdocuments.in/reader035/viewer/2022062720/568134b1550346895d9bcc7b/html5/thumbnails/51.jpg)
51
The Trio System
Version 1Entirely on top of a conventional DBMS
Surprisingly easy and complete, reasonably efficient
![Page 52: Trio-One: Layering Uncertainty and Lineage on a Conventional DBMS](https://reader035.fdocuments.in/reader035/viewer/2022062720/568134b1550346895d9bcc7b/html5/thumbnails/52.jpg)
52
The Trio System: Version 1
Lin:RaidF
r tableaidT
o
R aid xid C
Relational DBMS
create trio table T(A,B)
select C into R ...
TrioMetadat
a
Trio APITrio API
SQL commands
• Result cursors• Traverse lineage
T aid xid A B
![Page 53: Trio-One: Layering Uncertainty and Lineage on a Conventional DBMS](https://reader035.fdocuments.in/reader035/viewer/2022062720/568134b1550346895d9bcc7b/html5/thumbnails/53.jpg)
53
Relation Encoding
aid xid
conf
person
car
21 1 2 Jim Mazda
22 1 2 Bill Mazda
23 2 1 Hank Honda
Suspects
Jim ∥ Bill
Hank
aid xid conf person
31 1 3 Jim
32 1 3 Bill
33 2 2 Hank
Saw (witness,car)
(Cathy, Honda) ∥ (Cathy, Mazda)
Drives (person,car)
(Jim, Mazda) ∥ (Bill, Mazda)
(Hank, Honda)
aid xid conf witness
car
11 1 2 Cathy Honda
12 1 2 Cathy Mazda
??
aidFr
table aidTo
31 Saw 12
31 Drives 21
32 Saw 12
32 Drives 22
33 Saw 11
33 Drives 23
SawDrives
SuspectsLin:Suspects
![Page 54: Trio-One: Layering Uncertainty and Lineage on a Conventional DBMS](https://reader035.fdocuments.in/reader035/viewer/2022062720/568134b1550346895d9bcc7b/html5/thumbnails/54.jpg)
54
Relation Encoding with Confidences
aid xid
conf
person car
21 1 0.3 Jim Mazda
22 1 0.6 Bill Mazda
23 2 1.0 Hank Honda
aid xid conf person
31 1 NULL Jim
32 1 NULL Bill
33 2 NULL Hank
Saw (witness,car)
(Cathy, Honda): 0.6 ∥ (Cathy, Mazda): 0.4
Drives (person,car)
(Jim, Mazda): 0.3 ∥ (Bill, Mazda): 0.6
(Hank, Honda)
aid xid conf witness
car
11 1 0.6 Cathy Honda
12 1 0.4 Cathy Mazda
SawDrives
Suspects
0.12
0.24
0.6
![Page 55: Trio-One: Layering Uncertainty and Lineage on a Conventional DBMS](https://reader035.fdocuments.in/reader035/viewer/2022062720/568134b1550346895d9bcc7b/html5/thumbnails/55.jpg)
55
Query Translation
Persistent results: Query Q into result table R
1. Run query Q’ to produce “super-result” R Q’ ≈ Q but adds aid/xid’s of source tuples, joins lineage tables for lineage() predicates
2. Group R into alternatives, generate new xid’s3. Materialze lineage data into lin:R
4. Compute confidences? (optional)5. Add/update metadata: schemas, confidence info, lineage structure
Transient results: stop at 2, return cursor over new xid’s
![Page 56: Trio-One: Layering Uncertainty and Lineage on a Conventional DBMS](https://reader035.fdocuments.in/reader035/viewer/2022062720/568134b1550346895d9bcc7b/html5/thumbnails/56.jpg)
56
The Trio System: Version 1
R aid xid C
Relational DBMS
create trio table T(A,B)
select C into R ...
TrioMetadat
a
Trio APITrio API
SQL commands
• Result cursors• Traverse lineage
T aid xid A B
Lin:RaidF
r tableaidT
o
![Page 57: Trio-One: Layering Uncertainty and Lineage on a Conventional DBMS](https://reader035.fdocuments.in/reader035/viewer/2022062720/568134b1550346895d9bcc7b/html5/thumbnails/57.jpg)
57
The Trio System: Version 1
Trio API and translator(TriQL parser in Python)
Trio API and translator(TriQL parser in Python)
TrioMetadat
a
TrioExplorer(GUI client)
TrioExplorer(GUI client)
Trio Stored
Procedures
EncodedData
TablesLineageTables
Standard SQL
• Data & system columns• A-IDs for alternatives• “Verticalized” through shared X-IDs• Confidences
• One lineage table per derived table• Connecting A-IDs
• Table types• Schema-level lineage structure
• conf()• lineage()
• TriQL queries• DDL commands• Schema browsing• Table browsing• Explore lineage• On-demand confidence computation
Command-lineclient
Command-lineclient
Relational DBMS
SPI-Plugin for
PostgreSQLReplacing
“Fetch-All”
SPI-Plugin for
PostgreSQLReplacing
“Fetch-All”
![Page 58: Trio-One: Layering Uncertainty and Lineage on a Conventional DBMS](https://reader035.fdocuments.in/reader035/viewer/2022062720/568134b1550346895d9bcc7b/html5/thumbnails/58.jpg)
58
The Trio System: Version 2
Trio API and translatorTrio API and translator
TrioExplorer(GUI client)
TrioExplorer(GUI client)Command-line
client
Command-lineclient
Trio DBMS
SpecializedTrio Data Structures
SpecializedTrio Processing & Optimizing
Standard SQL Direct function calls
![Page 59: Trio-One: Layering Uncertainty and Lineage on a Conventional DBMS](https://reader035.fdocuments.in/reader035/viewer/2022062720/568134b1550346895d9bcc7b/html5/thumbnails/59.jpg)
59
Efficient Confidence Computation
Previous approach (probabilistic databases)• Each operator computes confidences during
query execution• Only certain (“safe”) query plans allowed
In ULDBs Confidence of alternative A is function of
confidences in λ(A)
Our approach• Decoupled data and confidence computation
• Allow any query plan for data computation
• Compute confidences on-demand based on lineage
![Page 60: Trio-One: Layering Uncertainty and Lineage on a Conventional DBMS](https://reader035.fdocuments.in/reader035/viewer/2022062720/568134b1550346895d9bcc7b/html5/thumbnails/60.jpg)
60
CIDs & Confidence Memoization
Identify set of Closest Independent Descendants (CIDs) for each relation in the schema-level lineage DAG
relations with no shared ancestors
Batched confidence computations in SQL• Select unions of distinct base alternatives from
all root-to-leaf lineage paths• With CIDs and memoization enabled
– if confidences are memoized then stop at CIDs
– else traverse down to the leafs (base data) and memoize at CIDs
(ongoing work)
![Page 61: Trio-One: Layering Uncertainty and Lineage on a Conventional DBMS](https://reader035.fdocuments.in/reader035/viewer/2022062720/568134b1550346895d9bcc7b/html5/thumbnails/61.jpg)
61
“Probabilistic TPC-H” Setup
• Horizontal split of TPC-H tables
• Uniform-random number of alternatives [1-10]
• Uniform confidences
PartsuppsPartsuppsLineitemsLineitemsOrdersOrders
O2O2O1
O1
OO LL PP
OLOL LPLP
OLPOLP1.5M
1.0M 0.1M
1.0M 0.5M 0.7M
O3O3 L2
L2L1L1 L3
L3 L4L4 P2
P2P1P1 P3
P3
OO LL PP
CID(OLP) = {O, L, P}
0.8M6M1.5M
![Page 62: Trio-One: Layering Uncertainty and Lineage on a Conventional DBMS](https://reader035.fdocuments.in/reader035/viewer/2022062720/568134b1550346895d9bcc7b/html5/thumbnails/62.jpg)
62
Confidence Computation Times
0
500
1,000
1,500
2,000
2,500
3,000
3,500
4,000
4,500
5,000
Exe
cuti
on
Tim
e (s
eco
nd
s)
O L P OL LP OLP Total
No CIDs
CIDs
![Page 63: Trio-One: Layering Uncertainty and Lineage on a Conventional DBMS](https://reader035.fdocuments.in/reader035/viewer/2022062720/568134b1550346895d9bcc7b/html5/thumbnails/63.jpg)
63
Per-Tuple vs. Batching (no CIDs)
![Page 64: Trio-One: Layering Uncertainty and Lineage on a Conventional DBMS](https://reader035.fdocuments.in/reader035/viewer/2022062720/568134b1550346895d9bcc7b/html5/thumbnails/64.jpg)
64
Memoization & Join Multiplicity
![Page 65: Trio-One: Layering Uncertainty and Lineage on a Conventional DBMS](https://reader035.fdocuments.in/reader035/viewer/2022062720/568134b1550346895d9bcc7b/html5/thumbnails/65.jpg)
65
Current Topics
System• Full query language• Nice interface• Performance experiments• Demo applications
Algorithms: confidence & lineage computation,
extraneous data, membership questions• Minimize lineage traversal
• Confidence memoization
• Efficient batch computations
![Page 66: Trio-One: Layering Uncertainty and Lineage on a Conventional DBMS](https://reader035.fdocuments.in/reader035/viewer/2022062720/568134b1550346895d9bcc7b/html5/thumbnails/66.jpg)
66
Future Directions
Theory, Model, Algorithms• Unlimited opportunities
System• Storage, indexing, partitioning• Statistics and query optimization
More features• Top-k by confidence • Incomplete relations; continuous uncertainty;
correlated uncertainty• External lineage; update lineage; versioning
![Page 67: Trio-One: Layering Uncertainty and Lineage on a Conventional DBMS](https://reader035.fdocuments.in/reader035/viewer/2022062720/568134b1550346895d9bcc7b/html5/thumbnails/67.jpg)
Search “stanford trio”
Current Trio team:Parag Agrawal, Anish Das Sarma, Raghotham
Murthy, Michi Mutsuzaki, Shubha Nabar, Tomoe Sugihara,
Martin Theobald, Jennifer Widom
Special thanks to:Ashok Chandra, Alon Halevy, Jeff Ullman,
Omar Benjelloun
but don’t forgetthe lineage…