M. Franklin BNCOD 2009 7 July 2009
Dataspaces: Progress and Prospects
Michael J. Franklin UC Berkeley & Truviso
BNCOD July 7, 2009
Dataspace: The Final Frontier?
Outline
• Dataspaces – some history
• Dataspaces – what are they, really?
• Some emerging examples
• Example technologies
• What’s missing?
• What’s next?
The SIGMOD Credo
Codd made relations, all else is the work of man.
Leopold Kronecker (paraphrased by Raghu Ramakrishnan?)
The Politics of Dataspaces
• Roots: CIDR 2005 Conference
– “Gloom and Doom” panel
– David DeWitt’s call for a unifying goal
– Juxtaposed with lots of great work across the web, new devices, scalable computing, …
An Aside: The cycle of DB Angst
Did we “miss the boat” on something cool?
Are we polishing a “round ball”?
Dataspaces: Timeline
• CIDR 2005 (January)
• A small group started looking for commonality and a “grand challenge”
• We put a name on it.
• Ran an early draft by an impromptu group of advisors at SIGMOD 2005 (June 05).
• Wrote it up for SIGMOD Record (Dec 05) [Franklin, Halevy, Maier]
• Kept working on pretty much what we were already doing!
What’s in a name?
Dataspaces – what are they?
Dataspaces
Inclusive
– Deal with all the data of interest – in whatever form
Co-existence not Integration
– No integrated schema, no single warehouse, no ownership required
Pay-as-you-go
– Keyword search is bare minimum.
– More function and increased consistency as you add work.
Compare to Data Integration
A quintessential schema-first approach.
[Figure: a mediated schema connected by semantic mappings to five source wrappers. Courtesy of Alon Halevy]
Structured Data Management
A “Modern” View of Data Management
The Structure Spectrum
• Structured (schema-first): Relational Database, Formatted Messages
• Semi-Structured (schema-later): XML, Tagged Text/Media
• Unstructured (schema-never): Plain Text, Media
Whither Structured Data?
• Conventional wisdom: only 20% of data is structured.
• Decreasing due to:
– Consumer applications
– Enterprise search
– Media applications
But Structure Matters!
[Figure: Functionality versus Time (and cost) for three approaches: Structured (schema-first), Unstructured (schema-less), Dataspaces (pay-as-you-go)]
Structure enables computers to help users manipulate and maintain the data.
An Alternative View
[Figure: systems placed on two axes, Semantic Integration and Administrative Control, each running from Strong to Weak. Systems shown: DBMS, Federated DBMS, Desktop Search, Web Search, Virtual Organization]
Some Interesting Points on the Structure Spectrum
Web-scale Structured Data
For years, Microsoft Corporation CEO Bill Gates was against open source. But today he appears to have changed his mind. "We can be open source. We love the concept of shared source," said Bill Veghte, a Microsoft VP. "That's a super-important shift for us in terms of code access."
Richard Stallman, founder of the Free Software Foundation, countered saying…
Name               Title     Organization
Bill Gates         CEO       Microsoft
Bill Veghte        VP        Microsoft
Richard Stallman   Founder   Free Soft..
HTML Tables extracted from the Web
Relations generated by information extraction from web pages
Database Views in the Deep Web accessed through HTML Forms on the Web
The Future of Analytics
• Analytics traditionally a key DB use case
– Need to understand data to manipulate it
• “Barbarians at the Gate”
– Procedural cloud-based approaches gaining interest
– Scalability for massive data sets
– But, we’ve seen this movie before!
The View From the Clouds
• “Pig Latin” [Olston et al. SIGMOD 08]
– Why have a schema?
   1) Transactional (referential?) Consistency
   2) Fast point look ups through indexes
   3) Curation for future (other) users
– Flexible, optional, nested data model
– Data remains in files (no admin)
• “Column Family” models of BigTable, HBase, Cassandra, CouchDB, …
• “Schema on Read”? == Errors on Read?
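The “Schema on Read == Errors on Read?” question can be made concrete with a small sketch. This is plain Python, not Pig Latin or any of the systems named above; the record layout, the `read_with_schema` helper, and the schema dictionary are all assumptions made for illustration. Data stays in raw lines with no declared schema, and type or missing-field problems only surface when a reader imposes one.

```python
import json

# Raw "file" contents: nothing checked these lines when they were written.
RAW_LINES = [
    '{"user": "alice", "clicks": 3}',
    '{"user": "bob", "clicks": "many"}',   # wrong type, only noticed at read time
    '{"user": "carol"}',                   # missing field, only noticed at read time
]

def read_with_schema(lines, schema):
    """Apply a schema (field name -> cast function) while reading.

    Returns the rows that fit the schema and a list of (line number, error)
    pairs for the lines that did not. Hypothetical helper for illustration.
    """
    rows, errors = [], []
    for line_no, line in enumerate(lines, start=1):
        record = json.loads(line)
        try:
            rows.append({field: cast(record[field]) for field, cast in schema.items()})
        except (KeyError, ValueError, TypeError) as exc:
            errors.append((line_no, repr(exc)))
    return rows, errors

rows, errors = read_with_schema(RAW_LINES, {"user": str, "clicks": int})
print(rows)
print(errors)
```

The writer paid nothing up front, but every reader now has to decide what to do with lines 2 and 3, which is the cost the slide is pointing at.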
Other Examples
Personal Information Management (iMemex), Question answering, Scientific Collaboration
Outline
• Dataspaces – some history
• Dataspaces – what are they, really?
• Some emerging examples
• Example technologies
• What’s missing?
• What’s next?
DataSpace Technology
• Probabilistic Databases
• Schema Matching
• Judicious use of User Input
• Approx. Query Answering
• Uncertainty Management
• Data Model Learning
• Provenance and Annotation
• Structured + Unstructured Search
Roomba: Soliciting User Feedback*
• A “web 2.0” spin on Reference Reconciliation.
– Inspired by “ESP Game” for image labeling by von Ahn & Dabbish; “MOBS” architecture by Doan et al.
• Use automated techniques to generate candidate matches.
• Ask users to confirm.
• Problem: which matches are most important?
* “Soliciting User Feedback in a Dataspace System”, Shawn Jeffery, Michael Franklin, Alon Halevy; SIGMOD 2008.
Roomba Overview
• Based on Value of Perfect Information (VPI) (see Russell and Norvig)
• Choose matches that provide largest increase in dataspace utility.
• Must consider: Query Workload, # Records per Term, and Confidence of Matches.
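A minimal sketch of the ordering idea, under a deliberately simple decision model that is an assumption here and not the formulation in the Roomba paper: each candidate match is either accepted or rejected, a correct decision earns utility proportional to how much of the workload and how many records it touches, and a wrong one earns nothing. The value of perfect information is then the gap between the utility with a confirmed answer and the expected utility of the best guess. The class, field, and function names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class CandidateMatch:
    name: str
    confidence: float    # estimated probability that the match is correct
    query_weight: float  # share of the query workload touching the matched terms
    num_records: int     # number of records affected by the match

def vpi(match):
    """Value of perfect information under the simple model described above."""
    impact = match.query_weight * match.num_records
    # Without feedback, the system guesses the more likely outcome.
    expected_without_feedback = max(match.confidence, 1 - match.confidence) * impact
    # With user confirmation, the decision is always right.
    expected_with_feedback = impact
    return expected_with_feedback - expected_without_feedback

def order_for_feedback(matches):
    """Ask users about the matches with the largest expected gain first."""
    return sorted(matches, key=vpi, reverse=True)

candidates = [
    CandidateMatch("a", confidence=0.50, query_weight=0.1, num_records=100),
    CandidateMatch("b", confidence=0.95, query_weight=0.5, num_records=100),
    CandidateMatch("c", confidence=0.60, query_weight=0.3, num_records=50),
]
ordered = order_for_feedback(candidates)
print([m.name for m in ordered])
```

Match "b" has the largest impact but is already near-certain, so confirming it buys little; uncertain matches on moderately used terms rise to the top.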
Roomba: Sample Result
[Figure: sample result; series shown: Perfect Knowledge, VPI-Based Ordering]
Data Integration at Web-scale
• A typical data integration solution is impractical for web-scale data
– Too many domains of interest (Web data is about everything)
– Huge number of sources for each domain
– Designing a mediated schema is infeasible
– Data sources are dirty, incomplete, and lack metadata
• Solution: a data integration solution that is
– Automated
– Best effort
– Pay-as-you-go
“Functional Dependency Generation and Applications in Pay-as-you-go Data Integration Systems”, WebDB 2009; Wang, Dong, Das Sarma, Franklin, Halevy
Probabilistic Functional Dependencies (pFDs)
• Idea: use probabilistic functional dependencies to guide automated approaches
– Normalize mediated schemas
– Identify low-quality data sources
• Definition of a probabilistic FD (pFD): $X \rightarrow_p A$, where p is the likelihood that the FD holds in general
• “Learn” pFDs by counting data and schema instances
– Note: this will get you a bad grade in your database course.
• Related work
– TANE, CORDS
– Conditional Functional Dependencies
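One way to read “learn pFDs by counting” is sketched below for a single table. The estimator is an assumption chosen for illustration and is not claimed to be the one in the WebDB 2009 paper: group rows by their X value, and count the fraction of rows that agree with the most common A value in their group. The function name and the sample rows are hypothetical.

```python
from collections import Counter, defaultdict

def estimate_pfd(rows, x_attrs, a_attr):
    """Estimate p for the pFD X -> A from one table by counting.

    p is the fraction of rows whose A value equals the majority A value
    among rows sharing the same X value. A strict FD gives p == 1.0.
    """
    groups = defaultdict(Counter)
    for row in rows:
        key = tuple(row[x] for x in x_attrs)
        groups[key][row[a_attr]] += 1
    total = sum(sum(counts.values()) for counts in groups.values())
    if total == 0:
        return 0.0
    agreeing = sum(counts.most_common(1)[0][1] for counts in groups.values())
    return agreeing / total

rows = [
    {"zip": "94720", "city": "Berkeley"},
    {"zip": "94720", "city": "Berkeley"},
    {"zip": "94720", "city": "Berkely"},   # dirty value breaks the strict FD
    {"zip": "10001", "city": "New York"},
]
p = estimate_pfd(rows, ["zip"], "city")
print(p)
```

A classical FD check would reject zip → city outright because of the one misspelled row; the counting estimate instead reports that the dependency mostly holds, which is what makes it usable on dirty web sources.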
Results for pFD Generation Algorithms on “Web Tables”
[Figure: fidelity of generated FDs with confidence 0.8 against “gold standard” FDs]
Normalizing a Mediated Schema
• Generating the minimal pFD-set
– Prune low-probability pFDs
– Prune pFDs that can be generated by transitivity
• Avoid over-splitting
[Figure: pFD graph over mediated-schema attribute clusters (title; author / authors / author(s); journal title / journal; issn; subject / subjects; conference / meeting / colloquium; zip; address; city), with edge probabilities between 0.9 and 1.0]
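The two pruning steps can be sketched as follows, assuming single-attribute determinants and a fixed probability threshold; both are simplifications, and the edge probabilities in the example are made up rather than read off the slide's figure. A pFD is treated as implied by transitivity when its right-hand side is still reachable from its left-hand side through the other retained pFDs.

```python
def reachable(src, dst, edges):
    """Return True if dst can be reached from src by following the given edges."""
    adjacency = {}
    for x, a in edges:
        adjacency.setdefault(x, []).append(a)
    stack, seen = [src], {src}
    while stack:
        node = stack.pop()
        for nxt in adjacency.get(node, []):
            if nxt == dst:
                return True
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False

def minimal_pfd_set(pfds, min_prob=0.9):
    """pfds maps (X, A) to a probability; returns the pruned mapping."""
    # Step 1: prune low-probability pFDs.
    kept = {edge: p for edge, p in pfds.items() if p >= min_prob}
    # Step 2: prune pFDs implied by transitivity, weakest candidates first.
    for x, a in sorted(kept, key=kept.get):
        others = [edge for edge in kept if edge != (x, a)]
        if reachable(x, a, others):
            del kept[(x, a)]
    return kept

# Hypothetical probabilities, for illustration only.
pfds = {
    ("issn", "journal"): 0.97,
    ("journal", "subject"): 0.95,
    ("issn", "subject"): 0.92,   # implied by issn -> journal -> subject
    ("zip", "city"): 0.95,
    ("address", "zip"): 0.60,    # below the threshold
}
minimal = minimal_pfd_set(pfds)
print(sorted(minimal))
```

Visiting the weakest pFDs first is a design choice: when a dependency can be derived two ways, the sketch prefers to keep the higher-probability edges.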
Results for Schema Normalization
PayGo Quality Metrics
• Measuring quality of data sources
• Measuring and improving quality of an integration (e.g. mediated schema, schema mapping, etc.)
• FD-based quality measuring framework is an example:
– Identify dirty data sources
– Improve the mediated schema
What’s Missing?
• Metrics!!!!
– Key idea: you pay more to get better data. Must define “better”!
– Application-, user-, context-dependent
– Relation to Data Quality work
• Benchmarks
– Key to progress
• Support for collaboration/data-sharing/visualization
– Particularly with uncertainty in base data and inferences
• More data/media types
• Focus on “serious” analytics workloads
• …Your ideas here…
Metcalfe’s (not Moore’s) Law will drive future DBMS innovation
[Figure labels: Data Center, EDGE, Data Warehouse, Inventory, PoS, ERP]
• More connectivity means more data to integrate.
• Dataspace-style techniques will play an ever-larger role.
Conclusions
• More connectivity means more data.
• Many would simply throw away the benefits of structure due to “schema-first” problems.
• Dataspaces provide a framework for intelligent use of structural information.
• Could also meet the goal of a “grand challenge” for the DB Community.
As an inherently unsolvable problem…
Dataspace may, in fact, be the final frontier.