Data scientist enablement dse 400 week 4 roadmap
Uploaded by dr-mohan-k-bavirisetty
Data Scientist Enablement
DSE 400 - Fast Track to Data Science
Week 4 Roadmap
Advanced Center of Excellence
Modern Renaissance Corporation
In Collaboration with SONO team and others
Content of this document is under Creative Commons Licence CC BY 4.0
Agenda
You can always find the latest version of this document at http://bit.ly/1g8tMKM
Week 4 Overview, Discussions, Learning Path, Activities, Assignment, Submission, Looking ahead, References, Citation
DSE 400 - Week 4 at a glance
Discussions: Big Data - top blog posts from 2013. Evolving Darwin Genetic Algorithm. Optional Q&A.
Learning plan: Read R for Machine Learning by Allison Chang, Introduction to Machine Learning, etc.
Activities: Try visualization through spreadsheets. Implement functions in R. Build a personal roadmap.
Assignment 4: Survey Paper - How Big Data is being used in your industry.
Social Engagement on SONO - Week 4
http://getsokno.com/redvinef/controllers/cell.php?user_knocell=1004
Discussion 1: Read Top 8 Big Data Posts from December 2013. Pick the post that interests you most. Comment on what you like most about it and how its insights can be applied.
Discussion 2: Watch the video Evolving Darwin - Genetic Algorithm and comment on it. Does it sound like a valid machine learning approach? What are its strengths and weaknesses, if any? How would you improve it?
These discussions are required. If you already have access to SONO > DSE 400, you will be required to participate in them. There will also be an Optional Q&A.
Please do not create additional threads in weekly KCs.
Recommended Learning Plan
Read R for Machine Learning by Allison Chang.
Read Introduction to Machine Learning by Lars Marius Garshol.
<Optional> Watch The Learning Problem by Prof. Abu-Mostafa from the Caltech ML video series.
<Optional> Watch Machine Learning: The Basics by Ron Bekkerman.
<Optional> Watch Introduction to R for Data Mining by Joseph Rickert.
<Optional> Read Top 10 Algorithms in Data Mining by Wu et al.
Activities
<Practice> Write a user-defined function in R that takes an integer N and outputs the sum of the first N odd numbers. Using this function, verify that the sum of the first N odd integers is given by the formula N^2 (i.e., N*N or N-squared).
<Practice> Gather the data on 2010 Winter Olympic medals. Visualize this data using a spreadsheet, showing the geographic distribution pattern of these medals. If you use Google Spreadsheet, this pattern may look like the adjacent picture. Later on you can repeat this exercise for the 2014 Winter Olympics.
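The odd-number practice item above can be sketched as follows. The exercise asks for R; this sketch uses Python only to illustrate the logic, and the function names `sum_first_odd` and `verify_square_formula` are hypothetical, chosen for this example. It is one possible approach, not a reference solution.

```python
def sum_first_odd(n):
    """Return the sum of the first n odd numbers: 1 + 3 + ... + (2n - 1)."""
    if n < 0:
        raise ValueError("n must be a non-negative integer")
    total = 0
    for k in range(1, n + 1):
        total += 2 * k - 1  # the k-th odd number is 2k - 1
    return total


def verify_square_formula(max_n):
    """Check that sum_first_odd(n) == n * n for every n from 0 to max_n."""
    return all(sum_first_odd(n) == n * n for n in range(max_n + 1))


if __name__ == "__main__":
    print(sum_first_odd(5))             # 1 + 3 + 5 + 7 + 9
    print(verify_square_formula(1000))  # True if the formula holds up to 1000
```

The loop is deliberately explicit so that it maps directly onto an R `for` loop. The formula holds because going from (N-1)^2 to N^2 adds exactly 2N - 1, which is the N-th odd number.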
<Practice> The Sieve of Eratosthenes is an algorithm that generates all prime numbers between 1 and a given number N by eliminating the multiples of each prime. Write an R function that implements the Sieve of Eratosthenes.
<Practice> Build a personal Career Advancement Roadmap. Focus on your career over a 5-10 year horizon. Take an inventory of your current strengths and capabilities. Reflect on your career ambitions and add them to this roadmap. Use the DSE Roadmap to enhance your capabilities and move towards your desired goals. What other skills and competencies do you need to advance yourself? Use open knowledge repositories like ocw.mit.edu to examine the additional capabilities you can assimilate.
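The sieve practice item above can be sketched as follows. The exercise asks for R; this sketch uses Python only to show the elimination steps, and the function name `sieve_of_eratosthenes` is hypothetical. It is a minimal version under those assumptions, not a reference solution.

```python
def sieve_of_eratosthenes(n):
    """Return a list of all primes p with 2 <= p <= n."""
    if n < 2:
        return []
    # is_prime[i] stays True until i is crossed out as a multiple of a prime.
    is_prime = [True] * (n + 1)
    is_prime[0] = is_prime[1] = False
    p = 2
    while p * p <= n:
        if is_prime[p]:
            # Smaller multiples of p were already crossed out by smaller primes,
            # so elimination can start at p * p.
            for multiple in range(p * p, n + 1, p):
                is_prime[multiple] = False
        p += 1
    return [i for i, flag in enumerate(is_prime) if flag]


if __name__ == "__main__":
    print(sieve_of_eratosthenes(30))
```

In R the same idea is usually expressed with a logical vector. Note that R vectors are 1-indexed, so the index arithmetic shifts by one relative to this sketch.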
Assignment 4 - Submission Required
Prepare a short survey (i.e., overview) paper (2-5 pages) on Big Data and its impact on your industry or area of focus. If you do not have a preferred industry or area of focus, choose either the Retail or the Telecom sector. Use pictures and infographics in your paper to make it readable. As an example, you may refer to The 'big data' revolution in healthcare - McKinsey & Company report. Your assignment doesn’t have to be this exhaustive. It is enough to give an overview and make it readable for any audience. You can use blogs, newspaper articles, webinars, LinkedIn forums, etc. to gather material for your survey.
If you do not have access to a commercial word processing package, you can use Google Docs, OpenOffice.org, or a similar free or open-source package.
Submissions
Deadline: Saturday, 11:59 PM your local time.
Mail Assignment 4 to <[email protected]>.
Submit a single PDF document showing your Big Data Survey.
Use this naming convention for your document: DSE 400 - Assignment 4 - Your Full Name.
No document links should be sent. Just one single PDF document, please.
Please add DSE 400 > Assignment 4 in the subject line.
DSE 400 - Weeks 5-8 ahead
Week 5: Visualizations. Submit your research: Data Visualization Tools - A Comparative Study.
Weeks 6-7: Processing large data sets. Hadoop Ecosystem. Stream Computing etc.
Week 8: Ethics, Privacy and Building Data Products.
References, Resources and Additional Reading
[MIT OCW] R for Machine Learning by Allison Chang
An Introduction to Machine Learning. Hilary Mason, O’Reilly Media Inc., 2011
Machine Learning. Tom Mitchell, McGraw-Hill Publishers, 1997
Advanced Machine Learning. Hilary Mason, O’Reilly Media Inc., 2012
Scaling Up Machine Learning. Bekkerman, Bilenko, and Langford, Cambridge University Press, 2011
[MIT OCW] Prediction: Machine Learning and Statistics
Stanford University Machine Learning Video Collection
Caltech Machine Learning Video Collection
Citation
R for Machine Learning by Allison Chang is recommended by the MIT course Prediction: Machine Learning and Statistics from the Sloan School of Management. It is adopted in DSE 400 as per OCW guidelines.
Only content that appears as is in this document is under Creative Commons License CC BY 4.0. This license does not necessarily apply to other material referenced in this document.
For More Information
Week 4 discussions take place during this week on SONO DSE 400 Week 4.
<Help On Demand> You may reach out to Ms. Rachel Fleming <[email protected]> if you have any difficulties with the assignments or are looking for more activities.
If you have any questions or suggestions on SONO, please reach out to Mr. Eric Kmeic <>
We welcome questions, thoughts and suggestions. Post these on SONO in the right forum/discussion or write to us at <[email protected]>.
Fun@Work
In 1859, Charles Darwin published On the Origin of Species, which is regarded as one of the monumental works in human history. In this work, he explained that life on earth adapts to a constantly changing environment by means of natural selection.
Thank You