TOWARDS A BIG DATA COMMUNITY CHALLENGE
Tilmann Rabl, Florian Stegmaier, Michael Granitzer and Hans-Arno Jacobsen
3rd Workshop on Big Data Benchmarking, July 16-17
Xi'an, China
BIG DATA – WHY COMMUNITY CHALLENGES MATTER
• Big Data is a major buzzword in the scientific world
– Conferences, workshops, tutorials, panels
– Component benchmarks, end-to-end systems, etc.
• Variety leads to incomparability of results
• Research communities run challenges to …
– … enable comparability of results
– … foster evolution of a research field
“Kites rise highest against the wind, not with it.” (W. Churchill)
WHAT SHOULD BE THE FOCUS?
DATA!
“[...] other communities, like information retrieval, natural language processing, or Web research, have a much richer and agile culture in creating, disseminating, and re-using interesting new data resources for scientific experimentation [...]” – G. Weikum, SIGMOD Blog
HOW SHOULD IT BE?
INTERESTING!
HOW ARE “THE OTHERS” DOING?
• Information retrieval community:
– TREC, TRECVid (task-based, measurable scientific impact)
– CLEF Initiative (task-based, benchmarking initiatives)
• Multimedia community:
– Multimedia Grand Challenge (tasks defined by “global players”, e.g., Yahoo! and Microsoft)
– Open Source Software Competition (foster community activities)
• Semantic Web guys:
– Linked Data Cup (data generation)
– Semantic Web in-Use (mashup creation)
SUCCESSFUL COMMUNITY CHALLENGES: TAKE-HOME MESSAGE
• Challenges are not a single event
• On-going process, running through different stages:
– Data generation
– Solving restricted, high-impact issues
– Fostering open source frameworks
– Assembling mashups
• Accepted by the community
BRAINSTORMING AREA: STRUCTURE OF THE CHALLENGE
• Challenge needs to be focused on specific tasks:
– Tasks assemble a “Big Data pipeline”
– Specified by academia and industry
• Hybrid approach to engage participants:
– Utilize benchmark activities
– Computing tasks on “Open Data”
TIME TO BREAKOUT!
• Discussions should focus on:
– Where to find large-scale, interesting “open” data sets?
– Which tasks could form a sophisticated Big Data pipeline ensuring a broad range of implementations?
BREAKOUT HOW-TO:
• Breakout and student groups as yesterday
• Prepare one slide for each question