Representing and Querying Correlated Tuples in Probabilistic Databases
![Page 1: Representing and Querying Correlated Tuples in Probabilistic Databases](https://reader036.fdocuments.in/reader036/viewer/2022062310/5681641a550346895dd5d3e8/html5/thumbnails/1.jpg)
REPRESENTING AND QUERYING CORRELATED TUPLES IN PROBABILISTIC DATABASES
PRITHVIRAJ SEN
AMOL DESHPANDE
![Page 2: Representing and Querying Correlated Tuples in Probabilistic Databases](https://reader036.fdocuments.in/reader036/viewer/2022062310/5681641a550346895dd5d3e8/html5/thumbnails/2.jpg)
OUTLINE
o General Info
o Introduction
o Independent tuples model
o Tuple correlations
o Representing Dependencies
o Query evaluation
o Experiments
o Conclusions & Work to be done
![Page 3: Representing and Querying Correlated Tuples in Probabilistic Databases](https://reader036.fdocuments.in/reader036/viewer/2022062310/5681641a550346895dd5d3e8/html5/thumbnails/3.jpg)
GENERAL INFO
High demand for storing uncertain data.
Issues with the use of probabilistic databases:
1) Existing probabilistic databases make simplistic assumptions about the data that make them difficult to use in applications that naturally produce correlated data.
2) Most probabilistic databases can only answer a restricted subset of the queries that can be expressed using traditional query languages.
To tackle these limitations: a framework that can represent not only probabilistic tuples but also the correlations among them.
![Page 4: Representing and Querying Correlated Tuples in Probabilistic Databases](https://reader036.fdocuments.in/reader036/viewer/2022062310/5681641a550346895dd5d3e8/html5/thumbnails/4.jpg)
![Page 5: Representing and Querying Correlated Tuples in Probabilistic Databases](https://reader036.fdocuments.in/reader036/viewer/2022062310/5681641a550346895dd5d3e8/html5/thumbnails/5.jpg)
INTRODUCTION (1/2)
Database research has primarily concentrated on how to store and query exact data
Many real-world applications produce large amounts of uncertain data
Databases need to do more than simply store and retrieve; they have to help the user sift through the uncertainty and find the results most likely to be the answer.
![Page 6: Representing and Querying Correlated Tuples in Probabilistic Databases](https://reader036.fdocuments.in/reader036/viewer/2022062310/5681641a550346895dd5d3e8/html5/thumbnails/6.jpg)
INTRODUCTION (2/2)
Numerous approaches (models) have been proposed to handle uncertainty.
However, most models make assumptions about data uncertainty that restrict their applicability: they cannot easily model or handle dependencies and correlations among tuples.
![Page 7: Representing and Querying Correlated Tuples in Probabilistic Databases](https://reader036.fdocuments.in/reader036/viewer/2022062310/5681641a550346895dd5d3e8/html5/thumbnails/7.jpg)
![Page 8: Representing and Querying Correlated Tuples in Probabilistic Databases](https://reader036.fdocuments.in/reader036/viewer/2022062310/5681641a550346895dd5d3e8/html5/thumbnails/8.jpg)
INDEPENDENT TUPLES MODEL (1/2)
One of the most commonly used tuple-level uncertainty models, the independent tuples model associates existence probabilities with individual tuples and assumes that the tuples are independent of each other.
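A minimal Python sketch of the possible-worlds semantics of this model. The tuple names s1, s2 and t1 come from the example slides, but the existence probabilities are hypothetical (the actual values appear only in the slide images). Each world is a subset of the tuples; its probability is the product of p for every present tuple and 1 − p for every absent one.

```python
from fractions import Fraction
from itertools import product

# Hypothetical existence probabilities (not taken from the slides).
tuple_probs = {"s1": Fraction(3, 5), "s2": Fraction(1, 2), "t1": Fraction(2, 5)}

def possible_worlds(probs):
    """Yield (world, probability) pairs under the independent tuples model.

    A world is the frozenset of tuples present in it; its probability is the
    product of p(t) for present tuples and 1 - p(t) for absent tuples.
    """
    names = sorted(probs)
    for bits in product((False, True), repeat=len(names)):
        world = frozenset(n for n, present in zip(names, bits) if present)
        prob = Fraction(1)
        for n, present in zip(names, bits):
            prob *= probs[n] if present else 1 - probs[n]
        yield world, prob

worlds = list(possible_worlds(tuple_probs))
print(len(worlds))                              # 8, i.e. 2**3 worlds
print(sum(p for _, p in worlds))                # 1
print(dict(worlds)[frozenset({"s1", "t1"})])    # 3/5 * 1/2 * 2/5 = 3/25
```

The loop over all 2^n worlds is exactly what makes direct possible-worlds evaluation intractable for realistic numbers of tuples.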
![Page 9: Representing and Querying Correlated Tuples in Probabilistic Databases](https://reader036.fdocuments.in/reader036/viewer/2022062310/5681641a550346895dd5d3e8/html5/thumbnails/9.jpg)
INDEPENDENT TUPLES MODEL (2/2)
Evaluating a query via the set of possible worlds is clearly intractable, as the number of possible worlds grows exponentially with the number of tuples (2^n worlds for n tuples).
Intensional semantics guarantee results in accordance with the possible worlds semantics but are computationally expensive. Extensional semantics are computationally cheaper but do not guarantee results in accordance with the possible worlds semantics.
o Even when the base tuples are independent of each other, the intermediate tuples generated during query evaluation are typically correlated.
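The sketch below illustrates the last point, assuming a hypothetical query that joins s1 and s2 with t1 (producing intermediate tuples i1 and i2) and then projects both onto a single result tuple r1. The base tuples are independent with made-up probabilities, yet i1 and i2 are correlated because both depend on t1. An extensional-style projection, which combines i1 and i2 as though they were independent, then disagrees with the exact possible-worlds answer.

```python
from fractions import Fraction
from itertools import product

# Hypothetical, independent base-tuple probabilities (illustrative only).
p = {"s1": Fraction(3, 5), "s2": Fraction(1, 2), "t1": Fraction(2, 5)}

def prob(event):
    """Exact probability of `event`: sum the weights of all worlds where it holds."""
    names = sorted(p)
    total = Fraction(0)
    for bits in product((False, True), repeat=len(names)):
        world = dict(zip(names, bits))
        weight = Fraction(1)
        for n in names:
            weight *= p[n] if world[n] else 1 - p[n]
        if event(world):
            total += weight
    return total

def i1(w):  # intermediate join tuple built from s1 and t1
    return w["s1"] and w["t1"]

def i2(w):  # intermediate join tuple built from s2 and t1
    return w["s2"] and w["t1"]

p_i1, p_i2 = prob(i1), prob(i2)
p_both = prob(lambda w: i1(w) and i2(w))
print(p_i1 * p_i2, p_both)  # 6/125 3/25 -> i1 and i2 are correlated through t1

# Result tuple r1 exists iff i1 or i2 exists (the projection merges them).
exact = prob(lambda w: i1(w) or i2(w))
extensional = 1 - (1 - p_i1) * (1 - p_i2)  # wrongly treats i1, i2 as independent
print(exact, extensional)  # 8/25 49/125
```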
![Page 10: Representing and Querying Correlated Tuples in Probabilistic Databases](https://reader036.fdocuments.in/reader036/viewer/2022062310/5681641a550346895dd5d3e8/html5/thumbnails/10.jpg)
![Page 11: Representing and Querying Correlated Tuples in Probabilistic Databases](https://reader036.fdocuments.in/reader036/viewer/2022062310/5681641a550346895dd5d3e8/html5/thumbnails/11.jpg)
TUPLE CORRELATIONS (1/2)
![Page 12: Representing and Querying Correlated Tuples in Probabilistic Databases](https://reader036.fdocuments.in/reader036/viewer/2022062310/5681641a550346895dd5d3e8/html5/thumbnails/12.jpg)
TUPLE CORRELATIONS (2/2)
Although the tuple probabilities associated with s1, s2 and t1 are identical across these four databases, the query results are drastically different.
Since both intensional and extensional semantics assume that base tuples are independent, neither can be used directly for query evaluation in such cases.
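To see why marginal tuple probabilities alone are not enough, the following sketch builds three hypothetical joint distributions over the presence of s1 and t1. All three give each tuple probability 1/2, but a query that needs both tuples (for example, a join of s1 with t1) gets a different answer in each. The numbers, and the choice of perfect correlation for the "positively correlated" case, are illustrative assumptions rather than the databases on the slide.

```python
from fractions import Fraction

half, quarter = Fraction(1, 2), Fraction(1, 4)

# Hypothetical joint distributions over (X_s1, X_t1): each maps a world,
# written as (s1 present?, t1 present?), to its probability.
joints = {
    "independent": {(0, 0): quarter, (0, 1): quarter, (1, 0): quarter, (1, 1): quarter},
    "mutually exclusive": {(0, 1): half, (1, 0): half},
    "positively correlated": {(0, 0): half, (1, 1): half},  # perfect correlation
}

def marginal(joint, index):
    """Probability that the tuple at position `index` is present."""
    return sum(p for world, p in joint.items() if world[index] == 1)

join_prob = {}
for name, joint in joints.items():
    # Same marginals in every database: P(s1) = P(t1) = 1/2.
    assert marginal(joint, 0) == marginal(joint, 1) == half
    # A join of s1 with t1 returns a result only in worlds containing both.
    join_prob[name] = joint.get((1, 1), Fraction(0))
    print(name, join_prob[name])
# prints: independent 1/4, mutually exclusive 0, positively correlated 1/2
```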
![Page 13: Representing and Querying Correlated Tuples in Probabilistic Databases](https://reader036.fdocuments.in/reader036/viewer/2022062310/5681641a550346895dd5d3e8/html5/thumbnails/13.jpg)
![Page 14: Representing and Querying Correlated Tuples in Probabilistic Databases](https://reader036.fdocuments.in/reader036/viewer/2022062310/5681641a550346895dd5d3e8/html5/thumbnails/14.jpg)
REPRESENTING CORRELATIONS (1/3)
1) Associate every tuple t with a Boolean-valued random variable Xt.
2) A factor f(X) is a function of a (small) set of random variables X, where 0 ≤ f(X) ≤ 1.
3) Associate such a random variable with each tuple in the probabilistic database.
4) Define factors on (sub)sets of tuple-based random variables to encode correlations.
5) The probability of an instantiation of the database is given by the product of all the factors.
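Written as a formula, with x_f denoting the values that an instantiation x assigns to the variables in factor f's scope, step 5 reads Pr(X = x) = ∏_f f(x_f). A minimal Python sketch of this factored representation follows; the dict-based factor encoding and the probabilities are hypothetical choices for illustration. With one unary factor per tuple it reduces to the independent tuples model; correlations are encoded by adding factors whose scope spans several tuple variables.

```python
from fractions import Fraction
from itertools import product

# A factor is (scope, table): scope is a tuple of variable names and table
# maps each 0/1 assignment of those variables (as a tuple) to a value in [0, 1].
# Variable names and values here are hypothetical.
factors = [
    (("Xs1",), {(0,): Fraction(2, 5), (1,): Fraction(3, 5)}),
    (("Xt1",), {(0,): Fraction(3, 5), (1,): Fraction(2, 5)}),
]

def world_probability(assignment, factors):
    """Probability of a full 0/1 assignment: the product of all factors."""
    result = Fraction(1)
    for scope, table in factors:
        result *= table[tuple(assignment[v] for v in scope)]
    return result

variables = sorted({v for scope, _ in factors for v in scope})
total = sum(
    world_probability(dict(zip(variables, bits)), factors)
    for bits in product((0, 1), repeat=len(variables))
)
print(total)                                              # 1
print(world_probability({"Xs1": 1, "Xt1": 1}, factors))   # 3/5 * 2/5 = 6/25
```

The check that `total` equals 1 confirms that these factors define a proper distribution without any extra normalization step.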
![Page 15: Representing and Querying Correlated Tuples in Probabilistic Databases](https://reader036.fdocuments.in/reader036/viewer/2022062310/5681641a550346895dd5d3e8/html5/thumbnails/15.jpg)
REPRESENTING CORRELATIONS (2/3)
Suppose we want to represent mutual exclusivity between tuples s1 and t1. In particular, let us try to represent the possible worlds shown in the slide above, in which s1 and t1 never appear together.
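One way to encode this with factors, sketched under assumed numbers (P(s1) = 1/2 and P(t1) = 2/5; any pair with P(s1) + P(t1) ≤ 1 works): a unary factor on Xt1 and a pairwise factor f(Xt1, Xs1) that acts as a conditional probability. The pairwise factor forbids s1 whenever t1 is present and otherwise lets s1 exist with probability P(s1) / (1 − P(t1)), which preserves the marginal of s1. This particular factor choice is an illustration, not necessarily the one on the slide.

```python
from fractions import Fraction
from itertools import product

p_s1, p_t1 = Fraction(1, 2), Fraction(2, 5)  # hypothetical marginals

# f_t1(X_t1): prior on t1, keyed by (x_t1,).
f_t1 = {(0,): 1 - p_t1, (1,): p_t1}
# f_t1_s1(X_t1, X_s1) = P(X_s1 | X_t1), keyed by (x_t1, x_s1): s1 cannot exist
# when t1 does; otherwise it exists with probability q = p_s1 / (1 - p_t1).
q = p_s1 / (1 - p_t1)
f_t1_s1 = {(1, 1): Fraction(0), (1, 0): Fraction(1), (0, 1): q, (0, 0): 1 - q}

# Joint over (x_s1, x_t1) as the product of the two factors.
joint = {(xs1, xt1): f_t1[(xt1,)] * f_t1_s1[(xt1, xs1)]
         for xs1, xt1 in product((0, 1), repeat=2)}

print(joint[(1, 1)])                   # 0: s1 and t1 never co-occur
print(joint[(1, 0)] + joint[(1, 1)])   # 1/2: marginal of s1
print(joint[(0, 1)] + joint[(1, 1)])   # 2/5: marginal of t1
```

Tuple s2 is not involved in this correlation and would simply keep its own unary factor.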
![Page 16: Representing and Querying Correlated Tuples in Probabilistic Databases](https://reader036.fdocuments.in/reader036/viewer/2022062310/5681641a550346895dd5d3e8/html5/thumbnails/16.jpg)
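REPRESENTING CORRELATIONS (3/3)
Suppose we want to represent positive correlation between t1 and s1, i.e., the presence of one tuple makes the presence of the other more likely.
In particular, let us try to represent the possible worlds shown in the slide above.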
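Positive correlation can be encoded with a unary factor on Xt1 and a pairwise factor f(Xt1, Xs1) that acts as the conditional probability of s1 given t1. In this sketch the values are hypothetical: s1 is present with probability 4/5 when t1 is present and 1/5 otherwise, so both tuples keep marginal probability 1/2 while the probability of seeing them together rises above the independent value of 1/4.

```python
from fractions import Fraction
from itertools import product

# Hypothetical factors. f_t1 is keyed by (x_t1,); f_t1_s1 = P(X_s1 | X_t1)
# is keyed by (x_t1, x_s1).
f_t1 = {(0,): Fraction(1, 2), (1,): Fraction(1, 2)}
f_t1_s1 = {(1, 1): Fraction(4, 5), (1, 0): Fraction(1, 5),
           (0, 1): Fraction(1, 5), (0, 0): Fraction(4, 5)}

# Joint over (x_s1, x_t1) as the product of the two factors.
joint = {(xs1, xt1): f_t1[(xt1,)] * f_t1_s1[(xt1, xs1)]
         for xs1, xt1 in product((0, 1), repeat=2)}

p_s1 = joint[(1, 0)] + joint[(1, 1)]
p_t1 = joint[(0, 1)] + joint[(1, 1)]
print(p_s1, p_t1)                   # 1/2 1/2
print(joint[(1, 1)], p_s1 * p_t1)   # 2/5 1/4 -> positively correlated
```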
![Page 17: Representing and Querying Correlated Tuples in Probabilistic Databases](https://reader036.fdocuments.in/reader036/viewer/2022062310/5681641a550346895dd5d3e8/html5/thumbnails/17.jpg)
PROBABILISTIC GRAPHICAL MODEL REPRESENTATION
A probabilistic graphical model is a graph whose nodes represent random variables and whose edges represent correlations between them.
[Figure: graphical models over Xs1, Xs2 and Xt1 for three cases: complete independence, mutual exclusivity, and positive correlation]
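The graph can be read off the factors: under the common undirected (Markov network) view, two variables are joined by an edge when some factor mentions both. The sketch below assumes, hypothetically, that each correlated case uses a single pairwise factor over Xt1 and Xs1, while s2 keeps only its own unary factor. Under that assumption the graphs for mutual exclusivity and positive correlation come out identical; what distinguishes the two is the content of the factor table, not the graph structure.

```python
from itertools import combinations

def factor_graph_edges(scopes):
    """Connect every pair of variables that appear together in some factor."""
    edges = set()
    for scope in scopes:
        for a, b in combinations(sorted(scope), 2):
            edges.add((a, b))
    return edges

# Hypothetical factor scopes for the three cases in the figure.
cases = {
    "complete independence": [("Xs1",), ("Xs2",), ("Xt1",)],
    "mutual exclusivity": [("Xs2",), ("Xt1",), ("Xt1", "Xs1")],
    "positive correlation": [("Xs2",), ("Xt1",), ("Xt1", "Xs1")],
}
for name, scopes in cases.items():
    # [] for independence, [('Xs1', 'Xt1')] for both correlated cases
    print(name, sorted(factor_graph_edges(scopes)))
```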
![Page 18: Representing and Querying Correlated Tuples in Probabilistic Databases](https://reader036.fdocuments.in/reader036/viewer/2022062310/5681641a550346895dd5d3e8/html5/thumbnails/18.jpg)
PROBABILISTIC GRAPHICAL MODEL REPRESENTATION
[Figure: graphical model over random variables X1, X2 and X3]
![Page 19: Representing and Querying Correlated Tuples in Probabilistic Databases](https://reader036.fdocuments.in/reader036/viewer/2022062310/5681641a550346895dd5d3e8/html5/thumbnails/19.jpg)
![Page 20: Representing and Querying Correlated Tuples in Probabilistic Databases](https://reader036.fdocuments.in/reader036/viewer/2022062310/5681641a550346895dd5d3e8/html5/thumbnails/20.jpg)
QUERY EVALUATION: BASIC IDEA
Treat intermediate tuples as regular tuples.
Carefully represent correlations between intermediate tuples, base tuples and result tuples to construct a probabilistic graphical model.
Cast the probability computations resulting from query evaluation to inference in probabilistic graphical models.
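A brute-force sketch of this idea, under stated assumptions: the query (join s1 and s2 with t1, giving intermediate tuples i1 and i2, then project both onto result r1) and the base probabilities are hypothetical. Each relational operator becomes a deterministic factor that equals 1 exactly when the output tuple's variable agrees with its inputs (AND for a join, OR for a projection that merges duplicates). The result probability is then a marginal obtained by summing the product of all factors over every assignment; this matches the possible-worlds answer but still takes exponential time.

```python
from fractions import Fraction
from itertools import product

# Hypothetical base-tuple probabilities (independent here for simplicity;
# correlated base tuples would just add pairwise factors).
priors = {"Xs1": Fraction(3, 5), "Xs2": Fraction(1, 2), "Xt1": Fraction(2, 5)}

# Each factor is (scope, function of the scope's 0/1 values).
factors = [((v,), lambda x, p=p: p if x else 1 - p) for v, p in priors.items()]
factors += [
    (("Xi1", "Xs1", "Xt1"), lambda i, a, b: int(i == (a and b))),  # join
    (("Xi2", "Xs2", "Xt1"), lambda i, a, b: int(i == (a and b))),  # join
    (("Xr1", "Xi1", "Xi2"), lambda r, a, b: int(r == (a or b))),   # projection
]

variables = sorted({v for scope, _ in factors for v in scope})

def marginal(var):
    """P(var = 1): sum the product of all factors over assignments with var = 1."""
    total = Fraction(0)
    for bits in product((0, 1), repeat=len(variables)):
        x = dict(zip(variables, bits))
        if not x[var]:
            continue
        weight = Fraction(1)
        for scope, f in factors:
            weight *= f(*(x[v] for v in scope))
        total += weight
    return total

print(marginal("Xr1"))  # 8/25, the possible-worlds answer for r1
```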
![Page 21: Representing and Querying Correlated Tuples in Probabilistic Databases](https://reader036.fdocuments.in/reader036/viewer/2022062310/5681641a550346895dd5d3e8/html5/thumbnails/21.jpg)
QUERY EVALUATION: EXAMPLE
![Page 22: Representing and Querying Correlated Tuples in Probabilistic Databases](https://reader036.fdocuments.in/reader036/viewer/2022062310/5681641a550346895dd5d3e8/html5/thumbnails/22.jpg)
![Page 23: Representing and Querying Correlated Tuples in Probabilistic Databases](https://reader036.fdocuments.in/reader036/viewer/2022062310/5681641a550346895dd5d3e8/html5/thumbnails/23.jpg)
QUERY EVALUATION: EXAMPLE PROBABILISTIC GRAPHICAL MODEL
[Figure: probabilistic graphical model over Xs1, Xs2, Xt1, Xi1, Xi2 and Xr1]
Query evaluation problem in probabilistic databases: compute the probability of the result tuple by summing over all possible worlds of the database.
Equivalent problem in probabilistic graphical models: marginal probability computation.
Hence, standard inference algorithms can be used.
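Variable elimination is one standard exact inference algorithm for such marginals; the slide does not say which algorithm is used, so the choice here is ours. This sketch eliminates the non-query variables one at a time, multiplying only the factors that mention each variable before summing it out, instead of enumerating every world. The query structure (join of s1 and s2 with t1, projected onto r1), the probabilities, the elimination order, and all helper names are hypothetical.

```python
from fractions import Fraction
from itertools import product

# A factor is (scope, table): scope is a tuple of variable names, table maps
# each 0/1 assignment of the scope (as a tuple) to a non-negative weight.

def make_factor(scope, fn):
    """Tabulate fn over every 0/1 assignment of `scope`."""
    return scope, {bits: Fraction(fn(*bits)) for bits in product((0, 1), repeat=len(scope))}

def multiply(f, g):
    """Pointwise product of two factors over the union of their scopes."""
    (fs, ft), (gs, gt) = f, g
    scope = tuple(dict.fromkeys(fs + gs))  # order-preserving union
    table = {}
    for bits in product((0, 1), repeat=len(scope)):
        x = dict(zip(scope, bits))
        table[bits] = ft[tuple(x[v] for v in fs)] * gt[tuple(x[v] for v in gs)]
    return scope, table

def sum_out(f, var):
    """Marginalize `var` out of factor f."""
    scope, table = f
    i = scope.index(var)
    out = {}
    for bits, w in table.items():
        key = bits[:i] + bits[i + 1:]
        out[key] = out.get(key, 0) + w
    return scope[:i] + scope[i + 1:], out

def variable_elimination(factors, order):
    """Sum out the variables in `order`; return the product of what remains."""
    factors = list(factors)
    for var in order:
        touching = [f for f in factors if var in f[0]]
        factors = [f for f in factors if var not in f[0]]
        if not touching:
            continue
        combined = touching[0]
        for f in touching[1:]:
            combined = multiply(combined, f)
        factors.append(sum_out(combined, var))
    result = factors[0]
    for f in factors[1:]:
        result = multiply(result, f)
    return result

# Hypothetical model: independent base tuples, two joins, one projection.
priors = {"Xs1": Fraction(3, 5), "Xs2": Fraction(1, 2), "Xt1": Fraction(2, 5)}
factors = [make_factor((v,), lambda x, p=p: p if x else 1 - p) for v, p in priors.items()]
factors += [
    make_factor(("Xi1", "Xs1", "Xt1"), lambda i, a, b: int(i == (a and b))),  # join
    make_factor(("Xi2", "Xs2", "Xt1"), lambda i, a, b: int(i == (a and b))),  # join
    make_factor(("Xr1", "Xi1", "Xi2"), lambda r, a, b: int(r == (a or b))),   # projection
]

scope, marg = variable_elimination(factors, ["Xs1", "Xs2", "Xt1", "Xi1", "Xi2"])
print(scope, marg[(1,)])  # ('Xr1',) 8/25
```

Any elimination order gives the same answer; a good order keeps the intermediate factors small, which is where the savings over full enumeration come from.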
![Page 24: Representing and Querying Correlated Tuples in Probabilistic Databases](https://reader036.fdocuments.in/reader036/viewer/2022062310/5681641a550346895dd5d3e8/html5/thumbnails/24.jpg)
[Figure: graphical model over Xs2, Xt1, Xi1, Xi2 and Xr1]
![Page 25: Representing and Querying Correlated Tuples in Probabilistic Databases](https://reader036.fdocuments.in/reader036/viewer/2022062310/5681641a550346895dd5d3e8/html5/thumbnails/25.jpg)
REPRESENTING PROBABILISTIC RELATIONS
![Page 26: Representing and Querying Correlated Tuples in Probabilistic Databases](https://reader036.fdocuments.in/reader036/viewer/2022062310/5681641a550346895dd5d3e8/html5/thumbnails/26.jpg)
![Page 27: Representing and Querying Correlated Tuples in Probabilistic Databases](https://reader036.fdocuments.in/reader036/viewer/2022062310/5681641a550346895dd5d3e8/html5/thumbnails/27.jpg)
EXPERIMENTS (1/3)
The database contains 860 publications from CiteSeer [GBL98]. We searched for publications by a given (misspelt) author name; this naturally involves mutual exclusivity correlations.
![Page 28: Representing and Querying Correlated Tuples in Probabilistic Databases](https://reader036.fdocuments.in/reader036/viewer/2022062310/5681641a550346895dd5d3e8/html5/thumbnails/28.jpg)
EXPERIMENTS (2/3)
We ran experiments on a randomly generated TPC-H dataset of size 10 MB. For each query, the first bar indicates the time it took to run the full query, including all the database operations and the probabilistic computations. The second bar indicates the time it took to run only the database operations using our Java implementation.
![Page 29: Representing and Querying Correlated Tuples in Probabilistic Databases](https://reader036.fdocuments.in/reader036/viewer/2022062310/5681641a550346895dd5d3e8/html5/thumbnails/29.jpg)
EXPERIMENTS (3/3)
The result of running an average query over a synthetically generated dataset containing tuples
![Page 30: Representing and Querying Correlated Tuples in Probabilistic Databases](https://reader036.fdocuments.in/reader036/viewer/2022062310/5681641a550346895dd5d3e8/html5/thumbnails/30.jpg)
![Page 31: Representing and Querying Correlated Tuples in Probabilistic Databases](https://reader036.fdocuments.in/reader036/viewer/2022062310/5681641a550346895dd5d3e8/html5/thumbnails/31.jpg)
CONCLUSIONS
There is an increasing need for database solutions for efficiently managing and querying uncertain data exhibiting complex correlation patterns.
A simple and intuitive framework, based on probabilistic graphical models, is presented for explicitly modeling correlations among tuples in a probabilistic database.
![Page 32: Representing and Querying Correlated Tuples in Probabilistic Databases](https://reader036.fdocuments.in/reader036/viewer/2022062310/5681641a550346895dd5d3e8/html5/thumbnails/32.jpg)
WORK TO BE DONE
Problem: Although the approach presented conceptually allows for capturing arbitrary tuple correlations, exact query evaluation over large datasets exhibiting complex correlations may not always be feasible.
Future considerations:
o Develop approximate query evaluation techniques that can be used in such cases.
o Develop disk-based query evaluation algorithms so that their techniques can scale to very large datasets.