Data Mining Disasters
![Page 1: Data Mining Disasters](https://reader035.fdocuments.in/reader035/viewer/2022062410/56815a7d550346895dc7e5c8/html5/thumbnails/1.jpg)
Data Mining Disasters
A Report
Mary McGlohon
SIGBOVIK Commission for Workplace Safety
![Page 2: Data Mining Disasters](https://reader035.fdocuments.in/reader035/viewer/2022062410/56815a7d550346895dc7e5c8/html5/thumbnails/2.jpg)
Data Mining Safety
•Data mining disasters are a hazard to the progress of scientific research.
•We will review some common mining disasters and make recommendations for prevention.
![Page 3: Data Mining Disasters](https://reader035.fdocuments.in/reader035/viewer/2022062410/56815a7d550346895dc7e5c8/html5/thumbnails/3.jpg)
Numeric Overflow
“In 2007, numeric floods were responsible for over $600 million in property damages.”
-Department of Made-Up Statistics
![Page 4: Data Mining Disasters](https://reader035.fdocuments.in/reader035/viewer/2022062410/56815a7d550346895dc7e5c8/html5/thumbnails/4.jpg)
Numeric Overflow
ERROR::NUMERICOVERFLOW
Nobody expected the breach of the levees.
![Page 5: Data Mining Disasters](https://reader035.fdocuments.in/reader035/viewer/2022062410/56815a7d550346895dc7e5c8/html5/thumbnails/5.jpg)
Numeric Overflow
•Also caused loss of several hundred nerd-hours.
•1 nerd-hour = 1 grad-student-hour = 0.25 faculty-hours = 6 undergrad-hours
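The exchange rate above can be worked through in a few lines. This is only a sketch: the names `PER_NERD_HOUR` and `convert_hours` are hypothetical and do not come from the report, and the example amounts are illustrative.

```python
# Units per one nerd-hour, copied from the exchange rate on the slide.
PER_NERD_HOUR = {
    "nerd": 1.0,
    "grad-student": 1.0,
    "faculty": 0.25,
    "undergrad": 6.0,
}


def convert_hours(amount, from_unit, to_unit):
    """Convert hours between the slide's units (hypothetical helper)."""
    nerd_hours = amount / PER_NERD_HOUR[from_unit]   # normalise to nerd-hours
    return nerd_hours * PER_NERD_HOUR[to_unit]


print(convert_hours(300, "nerd", "faculty"))     # 75.0
print(convert_hours(1, "faculty", "undergrad"))  # 24.0
```

Under this rate, one lost faculty-hour costs as much as 24 undergrad-hours.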
![Page 6: Data Mining Disasters](https://reader035.fdocuments.in/reader035/viewer/2022062410/56815a7d550346895dc7e5c8/html5/thumbnails/6.jpg)
Numeric Overflow
•Recommendation: A drowning researcher’s best bet is to grab onto a floating log.
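One common reading of the floating log is to do the arithmetic in log space. The sketch below assumes that reading; it shows the standard log-sum-exp trick, and the helper name `log_sum_exp` and the sample values are illustrative rather than taken from the report.

```python
import math


def log_sum_exp(log_values):
    """Return log(sum(exp(v) for v in log_values)) without overflowing."""
    m = max(log_values)
    if math.isinf(m):
        return m  # every term is -inf, or some term is +inf
    # Subtracting the maximum keeps every exponent at or below zero.
    return m + math.log(sum(math.exp(v - m) for v in log_values))


log_terms = [1000.0, 1000.0]

try:
    naive = math.log(sum(math.exp(v) for v in log_terms))
except OverflowError:
    naive = None  # math.exp(1000.0) is too large for a float: the levee breaks

safe = log_sum_exp(log_terms)  # equals 1000 + log(2)
print(naive, safe)
```

The naive version floods at the first `math.exp` call, while the log-space version stays afloat.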
![Page 7: Data Mining Disasters](https://reader035.fdocuments.in/reader035/viewer/2022062410/56815a7d550346895dc7e5c8/html5/thumbnails/7.jpg)
Power Law Failures
•Occurs when confusing heavy-tailed distributions such as:
• Power Law (incl. Pareto, Zipf)
• Lognormal
• Weibull
• Burr
• Log-gamma
• Log-Log-Log-Log-Mushroom-Mushroom
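One way to check which of two candidates fits better is to compare their maximised log-likelihoods on the same sample. The sketch below does this for a Pareto and a lognormal fit only. It rests on assumptions that are not in the report: the Pareto lower cutoff `xmin` is known to be 1, the lognormal is left untruncated, and the data are synthetic draws from `random.paretovariate`. The function names are hypothetical.

```python
import math
import random


def pareto_loglik(xs, xmin=1.0):
    """Maximised log-likelihood of a Pareto (power-law) fit with known xmin."""
    n = len(xs)
    log_ratio_sum = sum(math.log(x / xmin) for x in xs)
    alpha = n / log_ratio_sum  # maximum-likelihood tail exponent
    return n * math.log(alpha / xmin) - (alpha + 1) * log_ratio_sum


def lognormal_loglik(xs):
    """Maximised log-likelihood of an (untruncated) lognormal fit."""
    n = len(xs)
    logs = [math.log(x) for x in xs]
    mu = sum(logs) / n
    var = sum((v - mu) ** 2 for v in logs) / n  # maximum-likelihood variance
    return -sum(logs) - 0.5 * n * (math.log(2 * math.pi * var) + 1)


rng = random.Random(0)  # fixed seed so the run is repeatable
sample = [rng.paretovariate(2.5) for _ in range(5000)]  # synthetic data, xmin = 1

ll_pareto = pareto_loglik(sample)
ll_lognormal = lognormal_loglik(sample)
print("Pareto fits better:", ll_pareto > ll_lognormal)
```

A raw likelihood comparison like this is only a first check. It does not say whether either model is a good fit, and it does not settle any religious arguments.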
![Page 8: Data Mining Disasters](https://reader035.fdocuments.in/reader035/viewer/2022062410/56815a7d550346895dc7e5c8/html5/thumbnails/8.jpg)
Power Law Failures
•Many natural phenomena have heavy tails.
• Magnitude of earthquakes
• Size of human settlements
• Degree distribution of “real” graphs
• Time-to-response in CS professors' email
• Your mom
•However, confusing heavy-tailed distributions results in...
![Page 9: Data Mining Disasters](https://reader035.fdocuments.in/reader035/viewer/2022062410/56815a7d550346895dc7e5c8/html5/thumbnails/9.jpg)
![Page 10: Data Mining Disasters](https://reader035.fdocuments.in/reader035/viewer/2022062410/56815a7d550346895dc7e5c8/html5/thumbnails/10.jpg)
Power Law Failures
•Related danger: Statisticians, computer scientists, and physicists wasting valuable nerd-hours in religious arguments over which heavy-tailed distribution is being followed.
![Page 11: Data Mining Disasters](https://reader035.fdocuments.in/reader035/viewer/2022062410/56815a7d550346895dc7e5c8/html5/thumbnails/11.jpg)
Power Law Failures
•Statisticians get mean when they get religious. (SIGBOVIK07)
•Recommendation: Calm the hell down.
![Page 12: Data Mining Disasters](https://reader035.fdocuments.in/reader035/viewer/2022062410/56815a7d550346895dc7e5c8/html5/thumbnails/12.jpg)
Decision Tree Forest Fires
•Pruning is used to prevent overfitting.
•When overpruning occurs, trees are burned to stumps.
•This spreads, torching entire forests.
(Aww...)
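The effect of overpruning can be sketched on a toy problem. The code below grows a full tree on a one-dimensional dataset, then collapses it to a stump. Everything in it is an assumption made for illustration: the dataset, the Gini split criterion, and the crude depth-based `prune` helper are not from the report, and real pruning methods are more careful.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def build(points):
    """Grow a full tree on (x, label) pairs using 1-D threshold splits."""
    labels = [y for _, y in points]
    node = {"majority": Counter(labels).most_common(1)[0][0]}
    if len(set(labels)) == 1:
        return node  # pure leaf
    best = None
    for t in sorted({x for x, _ in points})[1:]:  # candidate thresholds
        left = [p for p in points if p[0] < t]
        right = [p for p in points if p[0] >= t]
        score = (len(left) * gini([y for _, y in left])
                 + len(right) * gini([y for _, y in right]))
        if best is None or score < best[0]:
            best = (score, t, left, right)
    if best is None:
        return node  # all x values identical, nothing to split on
    _, t, left, right = best
    node.update(threshold=t, left=build(left), right=build(right))
    return node


def prune(node, depth):
    """Collapse everything below `depth` into majority-label leaves."""
    if "threshold" not in node:
        return node
    if depth == 0:
        return {"majority": node["majority"]}
    return {**node,
            "left": prune(node["left"], depth - 1),
            "right": prune(node["right"], depth - 1)}


def predict(node, x):
    while "threshold" in node:
        node = node["left"] if x < node["threshold"] else node["right"]
    return node["majority"]


def accuracy(node, points):
    return sum(predict(node, x) == y for x, y in points) / len(points)


# Toy data: label 1 only in the middle band, so two splits are needed.
data = [(x, int(10 <= x < 20)) for x in range(30)]
full = build(data)
stump = prune(full, 1)  # overpruned: a single split survives
print(accuracy(full, data), accuracy(stump, data))
```

On this toy data the full tree classifies every point correctly, while the stump gets only two thirds of them right.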
![Page 13: Data Mining Disasters](https://reader035.fdocuments.in/reader035/viewer/2022062410/56815a7d550346895dc7e5c8/html5/thumbnails/13.jpg)
Decision Tree Forest Fires
•Recommendation: Researchers should obtain a burning permit before pruning with fire.
•Smoking while researching is not recommended. If you choose to do so, make sure your “butts are out”.
![Page 14: Data Mining Disasters](https://reader035.fdocuments.in/reader035/viewer/2022062410/56815a7d550346895dc7e5c8/html5/thumbnails/14.jpg)
Voting Fraud by One-Armed Bandits
•Cascading failures from other fields may cause disasters in data mining.
•Fatal mistake: combining the related subfields of voting mechanisms and one-armed bandit problems.
![Page 15: Data Mining Disasters](https://reader035.fdocuments.in/reader035/viewer/2022062410/56815a7d550346895dc7e5c8/html5/thumbnails/15.jpg)
Voting Fraud by One-Armed Bandits
•One-armed bandits commit voting fraud by:
• Impersonating real voting machines.
• Cramming cake into voting machines.
• (The cake is a lie.)
![Page 16: Data Mining Disasters](https://reader035.fdocuments.in/reader035/viewer/2022062410/56815a7d550346895dc7e5c8/html5/thumbnails/16.jpg)
Other safety measures
•Cool mining helmets
![Page 17: Data Mining Disasters](https://reader035.fdocuments.in/reader035/viewer/2022062410/56815a7d550346895dc7e5c8/html5/thumbnails/17.jpg)
Conclusion
•The Commission for Workplace Safety hopes this has raised awareness of potential data mining disasters.
•When faced with data mining disasters:
• Remain Calm.
• Blame it on off-by-one errors, lack of rigor in proofs of correctness, or whatever government agency is funding the project.