An Experiment: How to Plan it, Run it, and Get it Published
Gerhard Weikum
Thoughts about the Experimental Culture in Our Community
Performance Experiments (1)
throughput, response time, #IOs, CPU, wallclock, "DB time", hit rates, space-time integrals, etc.
[Figure: speed (RT, CPU, etc.) plotted against load (MPL, arrival rate, etc.), with one curve for "theirs" and one for "ours"]
There are lies, damn lies, and workload assumptions
Variations:
- instr./message = 10
- instr./DB call = 106
- latency = 0
- uniform access pattern
- uncorrelated access ...
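To make the point about workload assumptions concrete, here is a minimal sketch that replays two synthetic workloads against the same LRU cache and compares hit rates. Everything in it is an assumption for illustration: the function names, the cache and page counts, and the Zipf-like skew are hypothetical and do not come from the slides.

```python
import random
from collections import OrderedDict


def lru_hit_rate(accesses, cache_size):
    """Replay a list of page ids against an LRU cache and return the hit rate."""
    cache = OrderedDict()
    hits = 0
    for page in accesses:
        if page in cache:
            hits += 1
            cache.move_to_end(page)        # mark as most recently used
        else:
            cache[page] = True
            if len(cache) > cache_size:
                cache.popitem(last=False)  # evict the least recently used page
    return hits / len(accesses) if accesses else 0.0


def uniform_workload(num_pages, length, rng):
    """Hypothetical workload: every page is equally likely."""
    return [rng.randrange(num_pages) for _ in range(length)]


def skewed_workload(num_pages, length, rng, skew=1.0):
    """Hypothetical workload: Zipf-like popularity, weight 1 / rank**skew."""
    weights = [1.0 / (rank + 1) ** skew for rank in range(num_pages)]
    return rng.choices(range(num_pages), weights=weights, k=length)


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so the comparison can be repeated
    uniform = uniform_workload(1000, 20000, rng)
    skewed = skewed_workload(1000, 20000, rng)
    print("uniform hit rate:", lru_hit_rate(uniform, 100))
    print("skewed hit rate: ", lru_hit_rate(skewed, 100))
```

The same cache looks very different depending on which access pattern is assumed, which is why the assumption belongs in the experiment description next to the result.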
Performance Experiments (2)
[Figure: two plots comparing "ours" and "theirs"]
If you can't reproduce it, run it only once and smooth it
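The opposite of running an experiment once is to repeat it and report the spread. The following is a minimal sketch of that idea, assuming wall-clock time as the metric and a normal approximation for the confidence interval (for very small sample counts a t-distribution would be the more careful choice). The names `measure` and `summarize` are hypothetical helpers, not part of any benchmark suite.

```python
import statistics
import time


def measure(run_once, repetitions=10):
    """Call run_once() repeatedly and return the wall-clock times in seconds."""
    samples = []
    for _ in range(repetitions):
        start = time.perf_counter()
        run_once()
        samples.append(time.perf_counter() - start)
    return samples


def summarize(samples, confidence=0.95):
    """Return mean, sample standard deviation, and an approximate confidence interval."""
    if len(samples) < 2:
        raise ValueError("need at least two samples to estimate the spread")
    mean = statistics.mean(samples)
    stdev = statistics.stdev(samples)
    # Normal approximation: z is about 1.96 for a 95% interval.
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * stdev / len(samples) ** 0.5
    return {
        "n": len(samples),
        "mean": mean,
        "stdev": stdev,
        "ci_low": mean - half_width,
        "ci_high": mean + half_width,
    }


if __name__ == "__main__":
    # Placeholder workload; substitute the operation under test.
    timings = measure(lambda: sum(range(100_000)), repetitions=20)
    print(summarize(timings))
```

Reporting the interval alongside the mean shows whether a gap between two curves is larger than the run-to-run noise.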
Performance Experiments (3)
[Figure: two plots, the first showing only "ours", the second comparing "ours" with "strawman"]
Lonesome winner: If you can't beat them, cheat them
90% of all algorithms are among the best 10%
93.274% of all statistics are made up
Result Quality Evaluation (1)
precision, recall, accuracy, F1, P/R breakeven points, uninterpolated micro-averaged precision, etc.
TREC* Web topic distillation 2003: 1.5 million pages (.gov domain), 50 topics like "juvenile delinquency", "legalization marijuana", etc.
winning strategy:
• weeks of corpus analysis, parameter calibration for given queries, ...
• recipe for overfitting, not for insight
• no consideration of DB performance (TPUT, RT) at all
* by and large systematic, but also anomalies
Political correctness: don't worry, be happy
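For readers less familiar with the quality metrics named on this slide, the sketch below computes set-based precision, recall, and F1 for one topic, and contrasts micro-averaging (pool the counts over all topics) with macro-averaging (average the per-topic scores). This is a simplified illustration and not the TREC evaluation procedure: it ignores ranking, so it is not the uninterpolated average precision the slide mentions. The function names and the input format are assumptions.

```python
def precision_recall_f1(retrieved, relevant):
    """Set-based precision, recall, and F1 for a single topic."""
    retrieved, relevant = set(retrieved), set(relevant)
    true_positives = len(retrieved & relevant)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def micro_averaged(results):
    """Pool counts over all topics; results is a list of (retrieved, relevant) pairs."""
    true_positives = sum(len(set(ret) & set(rel)) for ret, rel in results)
    num_retrieved = sum(len(set(ret)) for ret, _ in results)
    num_relevant = sum(len(set(rel)) for _, rel in results)
    precision = true_positives / num_retrieved if num_retrieved else 0.0
    recall = true_positives / num_relevant if num_relevant else 0.0
    return precision, recall


def macro_averaged(results):
    """Average the per-topic precision and recall, weighting every topic equally."""
    if not results:
        return 0.0, 0.0
    scores = [precision_recall_f1(ret, rel) for ret, rel in results]
    precision = sum(s[0] for s in scores) / len(scores)
    recall = sum(s[1] for s in scores) / len(scores)
    return precision, recall


if __name__ == "__main__":
    # Hypothetical document ids for two topics.
    results = [
        (["a", "b", "c", "d"], ["a", "b", "e"]),
        (["x"], ["x", "y", "z", "w"]),
    ]
    print("micro:", micro_averaged(results))
    print("macro:", macro_averaged(results))
```

The two averages can differ noticeably on the same runs, so a paper should state which one it reports.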
Result Quality Evaluation (2)
IR on non-schematic XML
INEX benchmark: 12,000 IEEE-CS papers (ex-SGML) with >50 tags like <sect1>, <sect2>, <sect3>, <par>, <caption>, etc.
vs.
ad hoc experiment on Wikipedia encyclopedia (in XML): 200,000 short but high-quality docs with >1000 tags like <person>, <event>, <location>, <history>, <physics>, <high energy physics>, <Boson>, etc.
If there is no standard benchmark, is there no place at all for off-the-beaten-path approaches?
There are benchmarks, ad-hoc experiments, and rejected papers
Experimental Utopia
partial role models: TPC, TREC, Sigmetrics?, KDD Cup?, HCI, psychology, ...?
Every experimental result is:
• fully documented (e.g., data, SW public or @ notary)
• reproducible by other parties (with reasonable effort)
• insightful in capturing systematic or app behavior
• given (extra) credit when reconfirmed
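One small step toward "fully documented" and "reproducible" is to store, next to every result, the parameters, random seed, and environment that produced it. The sketch below is only an illustration under stated assumptions: `run_experiment` is a placeholder workload, and the manifest fields are a hypothetical minimum, not a community standard.

```python
import json
import platform
import random
import sys


def run_experiment(params, seed):
    """Placeholder experiment: all randomness comes from the given seed."""
    rng = random.Random(seed)
    samples = [rng.random() for _ in range(params["n"])]
    return {"mean": sum(samples) / len(samples)}


def build_manifest(params, seed, result):
    """Collect what another party would need to rerun and compare the result."""
    return {
        "params": params,
        "seed": seed,
        "python_version": sys.version.split()[0],
        "platform": platform.platform(),
        "result": result,
    }


if __name__ == "__main__":
    params = {"n": 1000}  # hypothetical experiment parameter
    seed = 2024
    result = run_experiment(params, seed)
    manifest = build_manifest(params, seed, result)
    print(json.dumps(manifest, indent=2, sort_keys=True))
```

Rerunning with the same parameters and seed should give the same result, and a mismatch on another machine points to an environment difference worth reporting.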
Proposed Action
Critically need experimental evaluation methodology of performance/quality tradeoffs in research on semistructured search, data integration, data quality, Deep Web, PIM, entity recognition, entity resolution, P2P, sensor networks, UIs, etc.
• raise awareness (e.g., through panels)
• educate community (e.g., curriculum)
• establish workshop(s), CIDR track?