Data Science for Tackling the Challenges of Big Data
Dr. Brand Niemann
Director and Senior Data Scientist/Data Journalist
Semantic Community
http://semanticommunity.info/
http://www.meetup.com/Federal-Big-Data-Working-Group/
http://semanticommunity.info/Data_Science/Federal_Big_Data_Working_Group_Meetup
November 14, 2014
Overview
• Six Week MIT Online Course:
  – Started November 4th and Completed November 12th.
• Mined this MIT Online Course for Data Sets and Ideas:
  – Found subset of the slides that contained data sets and ideas and were interesting and useful visualizations in themselves.
• Professor Karger's Lecture Slides on Visualization User Interfaces Were All About My Heroes:
  – Tukey, Tufte, Shneiderman, and Spotfire. (In fact it was everything leading up to Spotfire, but not Spotfire itself!)
• Preserve My Work & Present Tutorial to the Federal Big Data Working Group Meetup:
  – MindTouch Knowledge Base, Excel Spreadsheet Index, and Spotfire Interactive Visualizations.
MITProfessionalX 6.BDx Tackling the Challenges of Big Data: Course Assessment
Web Site (private)
MITProfessionalX 6.BDx Tackling the Challenges of Big Data: Course Progress
https://mitprofessionalx.edx.org/courses/MITProfessionalX/6.BDX/2T2014/progress
MITProfessionalX 6.BDx Tackling the Challenges of Big Data: Big Data Storage
Web Site (private)
MITProfessionalX 6.BDx Tackling the Challenges of Big Data: Modern Databases
Web Site (private) and Script (Public)
Script
Courseware: Big Data Storage
• I was especially interested in the following since both Professors Stonebraker and Madden presented to our Federal Big Data Working Group Meetup:
  – This module begins with an overview of a number of these technologies by renowned database professor Mike Stonebraker. In his unique and ardent fashion, Mike expresses his skepticism about many new technologies, particularly Hadoop/MapReduce and NoSQL, and voices support for many new relational technologies, including column stores and main memory databases.
  – After that, Professors Matei Zaharia and Samuel Madden provide a more nuanced view of the tradeoffs between the various approaches, discussing Hadoop and its derivatives, as well as NoSQL and its tradeoffs, in more detail.
  – Professor Stonebraker expresses a number of strong opinions in this module. Which of them do you agree with? Which do you disagree with? Why?
3.0 Introduction to Big Data Storage and Discussion 3
Selected Slides: Professor Sam Madden
What Is This Course Going to Cover? Other Techniques We'll Cover
Selected Slides: Professor David Karger
Overview Interaction Strategy
Selected Slides: Professor Daniela Rus
Case Study: Transportation in Singapore
1.1 Case Study: Transportation - PDF of Presentation slides (Rus)
Google Search: Singapore Taxi Data
Think Business: Why can’t I find a taxi when I really need one?
http://thinkbusiness.nus.edu/smart-finance/item/131-why-can%E2%80%99t-i-find-a-taxi-when-i-really-need-one?
Based on: Labor Supply Decisions of Singaporean Cab Drivers, May 8, 2013
Newer Paper: Labor Supply Decisions of Singaporean Cab Drivers, September 2014
Labor Supply Decisions of Singaporean Cab Drivers: Table 1: Summary Statistics by Days
http://www.ushakrisna.com/Cabdrivers.pdf
MIT Big Data Knowledge Base: Table 1 Spreadsheet
Spreadsheet
My Note: The PDF is an image, so I had to build the table by hand!
Singapore Land Transport Authority: Traffic Info Service Providers
http://www.lta.gov.sg/content/ltaweb/en/industry-matters/traffic-info-service-providers.html
Singapore Land Transport Authority: MyTransport.sg
http://www.mytransport.sg/content/mytransport/home/dataMall.html#All_Datasets
Screen Scrape
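A screen scrape of this kind, turning a dataset listing on a web page into spreadsheet rows, might look roughly like the sketch below. It is only an illustration using Python's standard library: the `TableScraper` class, the `html_table_to_csv` helper, and the `SAMPLE_HTML` snippet with its dataset names are all hypothetical, and the real page layout may well differ from a simple HTML table.

```python
import csv
import io
from html.parser import HTMLParser


class TableScraper(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row currently being read
        self._cell = None   # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join fragments and collapse internal whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None
            self._cell = None


def html_table_to_csv(html: str) -> str:
    """Return the table cells found in `html` as CSV text."""
    scraper = TableScraper()
    scraper.feed(html)
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(scraper.rows)
    return out.getvalue()


# Hypothetical stand-in for the scraped page; not the actual dataset list.
SAMPLE_HTML = """
<table>
  <tr><th>Dataset</th><th>Format</th></tr>
  <tr><td>Example Dataset A</td><td>CSV</td></tr>
  <tr><td>Example Dataset B</td><td>XML</td></tr>
</table>
"""

if __name__ == "__main__":
    print(html_table_to_csv(SAMPLE_HTML))
```

The CSV text can be saved to a file and opened directly in Excel or loaded into a visualization tool.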
Singapore Land Transport Authority: All Datasets Spreadsheet
Spreadsheet
MIT Big Data Knowledge Base: MindTouch
Data Science for Tackling the Challenges of Big Data
Labor Supply Decisions of Singaporean Cab Drivers, September 2014, as a Data Science Data Publication
MIT Big Data: Knowledge Base Spreadsheet
Spreadsheet
MIT Big Data: Course Participant Spreadsheet
Spreadsheet
My Note: This was mapped in Spotfire after data curation (cleaning of the country names). Spotfire has built-in data curation functions.
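For readers without Spotfire, the country-name cleaning step could be approximated along the following lines. This is a minimal sketch, not the method used for the slides: the `ALIASES` table, the helper names, and the sample values are all assumptions made for illustration, and a real participant list would need a much fuller alias table.

```python
import unicodedata
from collections import Counter

# Hypothetical alias table: maps normalized variants to one canonical name.
ALIASES = {
    "usa": "United States",
    "us": "United States",
    "united states": "United States",
    "united states of america": "United States",
    "uk": "United Kingdom",
    "great britain": "United Kingdom",
    "united kingdom": "United Kingdom",
}


def normalize_key(raw: str) -> str:
    """Lowercase, strip accents, drop periods, and collapse whitespace."""
    text = unicodedata.normalize("NFKD", raw)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = text.replace(".", "")  # so "U.S.A." becomes "USA"
    # Any other punctuation becomes a space.
    text = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in text)
    return " ".join(text.lower().split())


def clean_country(raw: str) -> str:
    """Return the canonical name if known, else a title-cased cleaned value."""
    key = normalize_key(raw)
    if key in ALIASES:
        return ALIASES[key]
    return key.title()


def count_by_country(raw_names):
    """Count participants per cleaned country name, ready for mapping."""
    return Counter(clean_country(name) for name in raw_names)


if __name__ == "__main__":
    # Hypothetical raw entries, not taken from the course participant data.
    sample = ["USA", "U.S.A.", "united states", "Singapore", " singapore"]
    print(count_by_country(sample))
```

Unrecognized names fall through to a title-cased form rather than being dropped, so they remain visible for manual review before mapping.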
MIT Big Data: Spotfire Cover Page
Web Player
MIT Big Data: Student Enrollment
Web Player
MIT Big Data: Singaporean Cab Drivers
Web Player
New York City Open Data: Socrata
https://nycopendata.socrata.com/
New York City Open Data: Search Results
Web Site
My Note: Could Only Find Taxi Drivers Data.
New York City Open Data: Data Table
Web Site and Medallion_Drivers_-_Active.xlsx
Download: XLSX
Visualizing NYC’s Open Data: Socrata Beta
https://nycopendata.socrata.com/viz
MIT Big Data Assessment: Questions and Answers
• Big Data Collection
  – 2) Data science requires:
    • Knowledge of statistics
    • Knowledge of data management
    • Knowledge of curation
    • All of the above - correct
• Big Data Systems
  – 13) For which of the following tasks is interactive visualization most useful? (choose all that apply)
    • Developing a hypothesis about data - correct
    • Formally confirming a hypothesis
    • Communicating a conclusion about data - correct
    • All of the above
• Big Data Analytics:
  – 13) Big Data means that there's no shortage of useful data.
    • True
    • False - correct
Story