Provenance for the Cloud (USENIX Conference on File and Storage Technologies(FAST `10))
Provenance for the Cloud
USENIX Conference on File and Storage Technologies (FAST '10)
Kiran-Kumar Muniswamy-Reddy, Peter Macko, and Margo Seltzer
Harvard School of Engineering and Applied Sciences
Outline
  Introduction
  Background
  Provenance System Property
  Architecture & Protocol
  Evaluation
  Conclusion & Comment
Introduction
  Problem to Solve
    Implement a provenance-aware storage system on top of current cloud stores (here, Amazon's services)
Background(1/3)
  Provenance
    Data has two critical components
      What it is (contents)
      Where it came from (ancestry)
    Provenance is the description of how an object was derived
    It is the metadata that describes the history of an object
  Why use provenance?
    Use case: Sloan Digital Sky Survey (SDSS)
    Debug experimental results
    Detect and avoid faulty data propagation
    Improve text search results
    Security
Background(2/3)
  Provenance can be defined abstractly as a directed acyclic graph (DAG)
    Nodes
      Objects: files, processes, tuples, data sets, etc.
      Have attributes
        Command-line arguments
        Name and version number
    Edges
      Indicate a dependency between objects
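The DAG definition above can be sketched in a few lines of Python. This is only an illustration: the `Node` and `ProvenanceGraph` classes and the file names are hypothetical and are not taken from the paper or from PASS.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A provenance node: a file, process, tuple, data set, etc."""
    name: str
    version: int = 1
    attributes: dict = field(default_factory=dict)  # e.g. command-line arguments


class ProvenanceGraph:
    def __init__(self):
        self.nodes = {}       # name -> Node
        self.depends_on = {}  # name -> set of names it was derived from

    def add_node(self, node):
        self.nodes[node.name] = node
        self.depends_on.setdefault(node.name, set())

    def add_edge(self, child, parent):
        """Record that `child` depends on `parent`; refuse edges that would form a cycle."""
        if child == parent or child in self.ancestors(parent):
            raise ValueError("edge would create a cycle")
        self.depends_on[child].add(parent)

    def ancestors(self, name):
        """Return every node reachable by following dependency edges from `name`."""
        seen, stack = set(), list(self.depends_on.get(name, ()))
        while stack:
            current = stack.pop()
            if current not in seen:
                seen.add(current)
                stack.extend(self.depends_on.get(current, ()))
        return seen


graph = ProvenanceGraph()
graph.add_node(Node("input.txt"))
graph.add_node(Node("sort", attributes={"argv": ["sort", "input.txt"]}))
graph.add_node(Node("output.txt"))
graph.add_edge("sort", "input.txt")    # the process read the input file
graph.add_edge("output.txt", "sort")   # the output file was written by the process
print(sorted(graph.ancestors("output.txt")))  # ['input.txt', 'sort']
```

Asking for the ancestors of an object is the basic "where did it come from" question that the rest of the talk builds on.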
[Figure: example provenance graph. Nodes: Data Collection Request (I1), Blood Test Request (I2), Patient Brain Death Notification (I3), Donor Data Request (I4), Donor Data (I5), Blood Test Request (I6), Blood Test Result (I7), Decision Request (I8), Donation Decision (I9), Justification Report. Edge labels: is justified by, is response to, is caused by, is based on.]
Background(3/3)
  Eventual Consistency
    A weaker form of data consistency
    If no new updates are sent for a sufficiently long period of time, all replicas in the system can be expected to become consistent
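A toy simulation can make the definition concrete. The sketch below assumes a simple anti-entropy scheme in which each replica copies a newer state from a random peer. That scheme is an illustrative assumption, not a description of how Amazon's services replicate data.

```python
import random


def anti_entropy_round(replicas, rng):
    """Each replica contacts one random peer and adopts its state if that state is newer."""
    for replica in replicas:
        peer = rng.choice(replicas)
        if peer["version"] > replica["version"]:
            replica.update(peer)


rng = random.Random(0)
replicas = [{"version": 0, "value": "old"} for _ in range(5)]
replicas[0].update(version=1, value="new")  # the final update reaches one replica

# Until the rounds below finish, a read may return either "old" or "new".
rounds = 0
while len({r["version"] for r in replicas}) > 1 and rounds < 1000:
    anti_entropy_round(replicas, rng)
    rounds += 1

print(rounds, sorted({r["value"] for r in replicas}))
```

Once updates stop, the replicas converge on the last written value; nothing bounds how stale a read can be before that point.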
Provenance System Property(1/2)
  Provenance Data Coupling
    An object and its provenance must match
    The provenance must accurately and completely describe the data
  Multi-object Causal Ordering
    Concerns the causal relationship among objects
    A system must ensure that an object's ancestors and their provenance are persistent before making the object itself persistent
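One way to satisfy multi-object causal ordering is to persist objects in topological order of their dependencies. The sketch below uses Python's standard `graphlib` module; the `depends_on` mapping and the `persist` callback are hypothetical stand-ins for a real store.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: object -> the objects it was derived from.
depends_on = {
    "output.txt": {"sort"},
    "sort": {"input.txt"},
    "input.txt": set(),
}

stored = []


def persist(name):
    """Stand-in for a durable write: provenance first, then the data itself."""
    stored.append(("provenance", name))
    stored.append(("data", name))


def persist_in_causal_order(depends_on, persist):
    """Persist every ancestor (and its provenance) before any object derived from it."""
    order = list(TopologicalSorter(depends_on).static_order())
    for name in order:
        persist(name)
    return order


order = persist_in_causal_order(depends_on, persist)
print(order)  # ['input.txt', 'sort', 'output.txt']
```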
[Figure: the example provenance graph shown earlier, repeated]
Provenance System Property(2/2)
  Data-Independent Persistence
    A system must retain an object's provenance even if the object is removed
  Efficient Query
    Provenance must be accessible to users who want to access or verify provenance properties of their data
Architecture(1)
Architecture(2) - S3
  Simple Storage Service (S3)
    Amazon's storage service
    An object store where object sizes can range from 1 byte to 5 GB
    With each object, clients can store up to 2 KB of metadata
    Accessed through a SOAP or REST API
      PUT, GET, HEAD, COPY, DELETE
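The limits on this slide determine what can live in an object's metadata. The sketch below checks them with the figures quoted above. The way metadata size is counted (UTF-8 bytes of keys plus values) is an assumption made for illustration, and the function names are hypothetical.

```python
MIN_OBJECT_BYTES = 1
MAX_OBJECT_BYTES = 5 * 1024**3  # 5 GB, as quoted on the slide
MAX_METADATA_BYTES = 2 * 1024   # 2 KB, as quoted on the slide


def metadata_size(metadata):
    """Assumed accounting: UTF-8 bytes of every key plus every value."""
    return sum(len(k.encode("utf-8")) + len(v.encode("utf-8")) for k, v in metadata.items())


def fits_in_one_object(data_size, metadata):
    """True if both the object body and its metadata respect the quoted limits."""
    return (MIN_OBJECT_BYTES <= data_size <= MAX_OBJECT_BYTES
            and metadata_size(metadata) <= MAX_METADATA_BYTES)


small = {"version": "3", "uuid": "0f8fad5b-d9cb-469f-a165-70867728950e"}
large = {"provenance": "x" * 4096}
print(fits_in_one_object(1024, small))  # True
print(fits_in_one_object(1024, large))  # False
```

A version number and a UUID fit comfortably in 2 KB, while a full provenance record may not.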
Architecture(3) - SimpleDB
  SimpleDB
    An Amazon service that provides indexing and querying of data
    The data model consists of items that are described by <attribute,value> pairs
    Each item can have up to 256 <attribute,value> pairs
    Each attribute name and value can be as large as 1 KB
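The item limits can be expressed as a small validation step. In this sketch an item is modelled as a list of (attribute, value) tuples, and `check_item` is a hypothetical helper rather than part of any SimpleDB client library.

```python
MAX_PAIRS_PER_ITEM = 256        # limit quoted on the slide
MAX_NAME_OR_VALUE_BYTES = 1024  # 1 KB, limit quoted on the slide


def check_item(pairs):
    """Return a list of human-readable problems; an empty list means the item fits."""
    problems = []
    if len(pairs) > MAX_PAIRS_PER_ITEM:
        problems.append(f"{len(pairs)} pairs exceed the limit of {MAX_PAIRS_PER_ITEM}")
    for attribute, value in pairs:
        if len(attribute.encode("utf-8")) > MAX_NAME_OR_VALUE_BYTES:
            problems.append(f"attribute name {attribute[:20]!r} exceeds 1 KB")
        if len(value.encode("utf-8")) > MAX_NAME_OR_VALUE_BYTES:
            problems.append(f"value of {attribute[:20]!r} exceeds 1 KB")
    return problems


ok_item = [("type", "file"), ("input", "raw.csv")]
bad_item = [("env", "x" * 2048)]
print(check_item(ok_item))        # []
print(len(check_item(bad_item)))  # 1
```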
Architecture(4) - SQS
  Simple Queue Service (SQS)
    A distributed messaging system that lets users exchange messages between the distributed components of their systems
    Messages are limited to 8 KB
    In this paper, SQS is used as a write-ahead log (WAL)
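Using a message queue as a write-ahead log means every log record has to fit in one message. A minimal sketch, assuming a JSON envelope with hypothetical `txn`, `seq`, and `total` fields so that a reader can tell when it has seen a whole transaction:

```python
import json

MAX_MESSAGE_BYTES = 8 * 1024  # 8 KB message limit quoted on the slide


def to_wal_messages(records, txn_id):
    """Wrap each provenance record in an envelope that identifies its transaction."""
    messages = []
    for seq, record in enumerate(records):
        body = json.dumps({"txn": txn_id, "seq": seq, "total": len(records), "record": record})
        if len(body.encode("utf-8")) > MAX_MESSAGE_BYTES:
            raise ValueError(f"record {seq} does not fit in one message")
        messages.append(body)
    return messages


messages = to_wal_messages([["input", "raw.csv"], ["tool", "sort"]], txn_id="txn-1")
print(len(messages))                   # 2
print(json.loads(messages[1])["seq"])  # 1
```

A record that does not fit has to be stored elsewhere, with only a pointer to it placed in the log.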
Architecture(5) - PASS
  Provenance-Aware Storage System (PASS)
    A storage system that automatically collects, stores, manages, and provides search for provenance
    Monitors system calls
    Generates provenance and sends both provenance and data to PA-S3fs
Architecture(6) - PA-S3fs
  Provenance-Aware S3 File System (PA-S3fs)
    Caches data and provenance on the client to reduce traffic to S3
    Sends data and provenance to the cloud
Protocol(1)
Protocol(2)
  Protocol 1 (P1)
    Standalone cloud store
    Map each file to an S3 object and store the provenance as a separate S3 object
    Provenance object
      Named with a UUID
      Contains the name of the primary object
    Primary object metadata
      Version number and UUID
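Protocol(3)
  P1 does not support data coupling
    But it can detect decoupling
  Query is inefficient
    Needs to retrieve all provenance
  [Figure: P1 message sequence between Client and S3]
    Client -> S3: PUT provenance; S3 -> Client: OK
    Client -> S3: PUT data; S3 -> Client: OK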
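The P1 write path and its decoupling check can be sketched against an in-memory stand-in for S3. `FakeS3`, `p1_write`, and `is_coupled` are hypothetical names. Keeping a version number inside the provenance object and comparing it with the version in the primary object's metadata is one plausible reading of how decoupling is detected, not a confirmed detail of the paper.

```python
import uuid


class FakeS3:
    """In-memory stand-in for an object store; this is not the real S3 API."""

    def __init__(self):
        self.objects = {}

    def put(self, key, body, metadata=None):
        self.objects[key] = {"body": body, "metadata": dict(metadata or {})}

    def get(self, key):
        return self.objects[key]


def p1_write(s3, name, data, records, crash_before_data=False):
    """PUT the provenance object first, then the data object that points to it."""
    existing = s3.objects.get(name)
    if existing:
        prov_key = existing["metadata"]["uuid"]
        version = int(existing["metadata"]["version"]) + 1
    else:
        prov_key, version = str(uuid.uuid4()), 1
    s3.put(prov_key, {"primary": name, "version": version, "records": records})
    if crash_before_data:
        return  # simulated client crash between the two PUTs
    s3.put(name, data, {"uuid": prov_key, "version": str(version)})


def is_coupled(s3, name):
    """Data and provenance match only if both describe the same version."""
    metadata = s3.get(name)["metadata"]
    provenance = s3.objects.get(metadata["uuid"])
    return (provenance is not None
            and provenance["body"]["primary"] == name
            and str(provenance["body"]["version"]) == metadata["version"])


s3 = FakeS3()
p1_write(s3, "results.csv", b"v1", ["input: raw.csv"])
print(is_coupled(s3, "results.csv"))  # True
p1_write(s3, "results.csv", b"v2", ["input: raw2.csv"], crash_before_data=True)
print(is_coupled(s3, "results.csv"))  # False
```

The two PUTs are not atomic, so a crash between them leaves provenance that describes data which was never written. The check notices the mismatch but cannot prevent it.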
Protocol(4)
Protocol(5)
  Protocol 2 (P2)
    Cloud store with a cloud database
    Store provenance as one SimpleDB item
    If a provenance value is larger than the 1 KB SimpleDB limit
      Store that provenance as an S3 object
      Save a pointer to it in the attribute-value pair
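Protocol(6)
  Provides efficient provenance queries
  Does not support data coupling
  [Figure: P2 message sequence between Client, S3, and SimpleDB]
    Client -> S3: PUT provenance larger than 1 KB; OK
    Client -> SimpleDB: BatchPutAttributes with provenance; OK
    Client -> S3: PUT data; OK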
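A sketch of the P2 write path, using plain dictionaries as stand-ins for S3 and SimpleDB. The item naming scheme (`name_version`), the `s3-pointer:` prefix, and the helper names are assumptions made for illustration.

```python
import uuid

VALUE_LIMIT_BYTES = 1024  # 1 KB SimpleDB value limit quoted on the slides


def p2_write(s3, simpledb, name, version, data, provenance):
    """Spill oversized values to S3, index the provenance, then store the data."""
    item = []
    for attribute, value in provenance:
        if len(value.encode("utf-8")) > VALUE_LIMIT_BYTES:
            key = f"prov/{uuid.uuid4()}"
            s3[key] = value                   # PUT: provenance value larger than 1 KB
            value = f"s3-pointer:{key}"       # keep only a pointer in the item
        item.append((attribute, value))
    simpledb[f"{name}_{version}"] = item      # BatchPutAttributes: provenance
    s3[name] = data                           # PUT: data


def find_items(simpledb, attribute, value):
    """Answer a query from the index alone, without fetching any S3 object."""
    return sorted(key for key, pairs in simpledb.items() if (attribute, value) in pairs)


s3, simpledb = {}, {}
provenance = [("type", "file"), ("input", "raw.csv"), ("env", "PATH=" + "x" * 2000)]
p2_write(s3, simpledb, "results.csv", 1, b"data", provenance)
print(find_items(simpledb, "input", "raw.csv"))  # ['results.csv_1']
```

Queries become cheap because they touch only the index, but the writes to the two services are still not atomic, which is why data coupling is not guaranteed.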
Protocol(7)
  Protocol 3 (P3)
    Cloud store with a cloud database and a messaging service
    Use SQS as a write-ahead log (WAL)
      8 KB message limit
      Store large objects as temporary S3 objects and record a pointer to them in the WAL
    Commit daemon
      Reads the log records
      Assembles all the records belonging to a transaction
      Ignores the records if the client crashed
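[Figure: P3 message sequence between Client, SQS, commit daemon (Commitd), S3, and SimpleDB]
  Client -> S3: PUT temporary data copy; OK
  Client -> SQS: SendMessage with provenance; OK
  Commitd -> SQS: ReceiveMessage
  Commitd -> S3: PUT provenance larger than 1 KB; OK
  Commitd -> SimpleDB: BatchPutAttributes; OK
  Commitd -> S3: COPY data; OK
  Commitd -> S3: DELETE temporary copy
  Commitd -> SQS: DeleteMessage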
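The log and commit phases of P3 can be sketched with a `deque` standing in for SQS and dictionaries standing in for S3 and SimpleDB. The message envelope (`txn`, `seq`, `total`), the `tmp/` key prefix, and the function names are hypothetical, and the step that spills provenance larger than 1 KB to S3 is omitted to keep the example short.

```python
import json
import uuid
from collections import defaultdict, deque


def p3_log(s3, queue, name, data, records, crash_after=None):
    """Log phase: stage the data, then append one WAL message per record."""
    txn = str(uuid.uuid4())
    temp_key = f"tmp/{txn}"
    s3[temp_key] = data  # PUT: temporary data copy
    messages = [{"temp": temp_key, "name": name}] + [{"record": r} for r in records]
    for seq, message in enumerate(messages):
        if crash_after is not None and seq >= crash_after:
            return txn  # simulated client crash: the rest is never logged
        message.update(txn=txn, seq=seq, total=len(messages))
        queue.append(json.dumps(message))  # SendMessage
    return txn


def commit_daemon(s3, simpledb, queue):
    """Commit phase: apply complete transactions and ignore incomplete ones."""
    pending = defaultdict(dict)
    while queue:
        message = json.loads(queue.popleft())  # ReceiveMessage (and DeleteMessage)
        pending[message["txn"]][message["seq"]] = message
    committed = []
    for txn, messages in pending.items():
        total = next(iter(messages.values()))["total"]
        if len(messages) != total:
            continue  # the client crashed before logging the whole transaction
        header = messages[0]
        simpledb[header["name"]] = [messages[i]["record"] for i in range(1, total)]  # BatchPutAttributes
        s3[header["name"]] = s3.pop(header["temp"])  # COPY to the final name, DELETE the temp copy
        committed.append(txn)
    return committed


s3, simpledb, queue = {}, {}, deque()
ok = p3_log(s3, queue, "a.out", b"A", [["input", "a.c"], ["tool", "cc"]])
p3_log(s3, queue, "b.out", b"B", [["input", "b.c"]], crash_after=1)
print(commit_daemon(s3, simpledb, queue) == [ok])  # True
print(sorted(simpledb))                            # ['a.out']
```

Neither the data nor the provenance of an object becomes visible under its final name until the whole transaction is in the log. In this sketch the temporary object of an abandoned transaction is simply left behind.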
Protocol(9)
Evaluation(1)
  Workload
    CVSROOT nightly backup
      I/O intensive
      240 operations
    Blast
      Mix of compute and I/O operations
      Provenance tree has a depth of 5
      10773 operations
    Challenge
      Mix of compute and I/O operations
      Provenance tree has a depth of 11
      6179 operations
Evaluation(2)
  [Figure panels: EC2 instance, Local machine]
Evaluation(3)
  Query performance
    Q1: Retrieve all the provenance ever recorded
    Q2: Retrieve the provenance of all versions of one object
    Q3: Find all files that were directly output by Blast
    Q4: Find all the descendants of files derived from Blast
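To show what the four queries ask for, the sketch below runs them over a handful of in-memory provenance records. The record layout, the file names, and the process names are invented for illustration and say nothing about how the evaluated system stores or executes queries.

```python
# Hypothetical provenance records: one per object version.
records = [
    {"object": "db.fasta", "version": 1, "produced_by": None, "inputs": []},
    {"object": "hits.out", "version": 1, "produced_by": "blast", "inputs": ["db.fasta"]},
    {"object": "hits.out", "version": 2, "produced_by": "blast", "inputs": ["db.fasta"]},
    {"object": "report.txt", "version": 1, "produced_by": "summarize", "inputs": ["hits.out"]},
    {"object": "plot.png", "version": 1, "produced_by": "plot", "inputs": ["report.txt"]},
]


def q1(records):
    """Q1: all the provenance ever recorded."""
    return list(records)


def q2(records, name):
    """Q2: the provenance of all versions of one object."""
    return [r for r in records if r["object"] == name]


def q3(records, process="blast"):
    """Q3: all files that were directly output by the given process."""
    return sorted({r["object"] for r in records if r["produced_by"] == process})


def q4(records, process="blast"):
    """Q4: all descendants of the files output by the given process."""
    frontier, descendants = set(q3(records, process)), set()
    while frontier:
        children = {r["object"] for r in records if frontier & set(r["inputs"])} - descendants
        descendants |= children
        frontier = children
    return sorted(descendants)


print(len(q1(records)))                             # 5
print([r["version"] for r in q2(records, "hits.out")])  # [1, 2]
print(q3(records))                                  # ['hits.out']
print(q4(records))                                  # ['plot.png', 'report.txt']
```

Q4 is the only query that needs repeated lookups, one per level of the provenance tree.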
Evaluation(4)
Conclusion
  Definition of properties that provenance systems must exhibit
  Design and implementation of three protocols for storing provenance and data on the cloud
  All three protocols have reasonable overhead in time and minimal financial overhead
Comment
  Economy
    Provenance cannot increase profit directly
    Customer loyalty
  Security
    Provenance can ensure correctness of files
    But it may contain sensitive information
THE END