Decision making in the era of cloud computing and big data
![Page 1: Decision making in the era of cloud computing and big data](https://reader034.fdocuments.in/reader034/viewer/2022052307/54c631bc4a7959991a8b4621/html5/thumbnails/1.jpg)
AN INTRODUCTION TO BIG DATA ANALYTICS AND CLOUD COMPUTING
a talk on Decision Making in
Big Data and Cloud Computing era
May 10, 2014 (1400-1600 Hrs) in
Room no. 511, Fifth floor, Department of Management Studies,
Vishwakarma Bhawan, IIT Delhi
![Page 2: Decision making in the era of cloud computing and big data](https://reader034.fdocuments.in/reader034/viewer/2022052307/54c631bc4a7959991a8b4621/html5/thumbnails/2.jpg)
Your speaker
Ajay Ohri
R for Business Analytics http://www.springer.com/statistics/book/978-1-4614-4342-1
![Page 3: Decision making in the era of cloud computing and big data](https://reader034.fdocuments.in/reader034/viewer/2022052307/54c631bc4a7959991a8b4621/html5/thumbnails/3.jpg)
My requirements
What are the key challenges to Data Analytics?
How can unstructured data be captured and then processed so that it can be used for analysis?
Which methodology is more efficient for handling Big Data?
What are the key technologies that can help to process Big Data?
What skill sets are required to become a Data Scientist?
What are the possible key areas for research in Big Data Analytics?
What level of programming skills is required to work in this area?
Which packages/algorithms are useful?
Does R have built-in functionality to make efficient use of multi-core processors?
How can R be used for data mining in social network data?
Can it help HR persons do analytics to hire the right set of people (HR Analytics)?
How can R be used to perform Regression, Classification, Clustering, Structural Equation Modeling and Data Envelopment Analysis? Illustrate with a real-life example.
How can Cloud computing help in processing or analyzing data efficiently?
What are the risks associated with using a Cloud-based model?
![Page 4: Decision making in the era of cloud computing and big data](https://reader034.fdocuments.in/reader034/viewer/2022052307/54c631bc4a7959991a8b4621/html5/thumbnails/4.jpg)
My requirements - let's break this down
![Page 5: Decision making in the era of cloud computing and big data](https://reader034.fdocuments.in/reader034/viewer/2022052307/54c631bc4a7959991a8b4621/html5/thumbnails/5.jpg)
My requirements - let's sort this out
What are the key challenges to Data Analytics?
How can unstructured data be captured and then processed so that it can be used for analysis?
How can Cloud computing help in processing or analyzing data efficiently?
What are the risks associated with using a Cloud-based model?
Which methodology is more efficient for handling Big Data?
What are the key technologies that can help to process Big Data?
What skill sets are required to become a Data Scientist?
What are the possible key areas for research in Big Data Analytics?
What level of programming skills is required to work in this area?
Can it help HR persons do analytics to hire the right set of people (HR Analytics)?
How can R be used for data mining in social network data?
How can R be used to perform Regression, Classification, Clustering, Structural Equation Modeling and Data Envelopment Analysis? Illustrate with a real-life example.
Which packages/algorithms are useful?
Does R have built-in functionality to make efficient use of multi-core processors?
![Page 6: Decision making in the era of cloud computing and big data](https://reader034.fdocuments.in/reader034/viewer/2022052307/54c631bc4a7959991a8b4621/html5/thumbnails/6.jpg)
My requirements - let's tag this down
Data Analytics and Cloud Computing
Big Data
R
R (Data Science Careers)
![Page 7: Decision making in the era of cloud computing and big data](https://reader034.fdocuments.in/reader034/viewer/2022052307/54c631bc4a7959991a8b4621/html5/thumbnails/7.jpg)
My requirements - let's check this again
Data Analytics and Cloud Computing
Big Data
R
R (Data Science Careers) Incorrect Classification?
![Page 8: Decision making in the era of cloud computing and big data](https://reader034.fdocuments.in/reader034/viewer/2022052307/54c631bc4a7959991a8b4621/html5/thumbnails/8.jpg)
Topics to be covered
Business Analytics
Data Science
Big Data
Cloud Computing
R
![Page 9: Decision making in the era of cloud computing and big data](https://reader034.fdocuments.in/reader034/viewer/2022052307/54c631bc4a7959991a8b4621/html5/thumbnails/9.jpg)
Sub-topics to be covered
Business Analytics - methodologies, challenges, structured/unstructured data
Data Science
Big Data
Cloud Computing
R
![Page 10: Decision making in the era of cloud computing and big data](https://reader034.fdocuments.in/reader034/viewer/2022052307/54c631bc4a7959991a8b4621/html5/thumbnails/10.jpg)
Sub-topics to be covered
Business Analytics - methodologies, challenges, structured/unstructured data, HR analytics
Data Science - careers, skills
Big Data - technology, skills
Cloud Computing
R
![Page 11: Decision making in the era of cloud computing and big data](https://reader034.fdocuments.in/reader034/viewer/2022052307/54c631bc4a7959991a8b4621/html5/thumbnails/11.jpg)
Sub-topics to be covered
Business Analytics - methodologies, challenges, structured/unstructured data, HR analytics
Data Science - careers, skills
Big Data - technology, skills
Cloud Computing - technology, risks
R -
![Page 12: Decision making in the era of cloud computing and big data](https://reader034.fdocuments.in/reader034/viewer/2022052307/54c631bc4a7959991a8b4621/html5/thumbnails/12.jpg)
Sub-topics to be covered
Business Analytics - methodologies, challenges, structured/unstructured data, HR analytics
Data Science - careers, skills
Big Data - technology, skills
Cloud Computing - technology, risks
R - ???
![Page 13: Decision making in the era of cloud computing and big data](https://reader034.fdocuments.in/reader034/viewer/2022052307/54c631bc4a7959991a8b4621/html5/thumbnails/13.jpg)
Sub-topics that won't be covered
R - Data Envelopment Analysis
http://professorjf.webs.com/DEA%202013.pdf
http://www.uri.edu/artsci/ecn/burkett/DEAnotes.pdf
Structural Equation Modeling
http://socserv.socsci.mcmaster.ca/jfox/Misc/sem/SEM-paper.pdf
http://cran.r-project.org/doc/contrib/Fox-Companion/appendix-sems.pdf
and if time permits
HR Analytics
http://www.slideshare.net/ajayohri/hr-analytics-34080636
![Page 14: Decision making in the era of cloud computing and big data](https://reader034.fdocuments.in/reader034/viewer/2022052307/54c631bc4a7959991a8b4621/html5/thumbnails/14.jpg)
Business Analytics
Definition
Business analytics (BA) refers to the field of exploration and investigation of data generated by businesses.
Business Intelligence (BI) is the seamless dissemination of information through the organization, which primarily involves business metrics both past and current for the use of decision support in businesses.
Data Mining (DM) is the process of discovering new patterns from large data using algorithms and statistical methods.
To differentiate between the three, BI is mostly current reports, BA is models to predict and strategize and DM matches patterns in big data.
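The distinction between BI, BA and DM can be sketched in a few lines of base R. This is only an illustration under assumptions: it uses the built-in `mtcars` data set as a stand-in for business data, and the choice of `aggregate()`, `lm()` and `kmeans()` as representatives of the three ideas is mine, not the speaker's.

```r
# BI: a report of what already happened (average mpg per cylinder count)
bi_report <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)

# BA: a model used to predict and strategize (mpg predicted from weight)
ba_model <- lm(mpg ~ wt, data = mtcars)
predicted_mpg <- predict(ba_model, newdata = data.frame(wt = 3))

# DM: pattern discovery without a target variable (3 clusters of cars)
set.seed(42)  # kmeans starts from random centers
dm_clusters <- kmeans(scale(mtcars[, c("mpg", "wt", "hp")]),
                      centers = 3, nstart = 10)

print(bi_report)
print(predicted_mpg)
print(table(dm_clusters$cluster))
```

The same data set feeds all three steps; what changes is the question being asked of it.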
![Page 15: Decision making in the era of cloud computing and big data](https://reader034.fdocuments.in/reader034/viewer/2022052307/54c631bc4a7959991a8b4621/html5/thumbnails/15.jpg)
Business Analytics
Definition
- Information Ladder
- CRISP DM
- KDD
- SEMMA
![Page 16: Decision making in the era of cloud computing and big data](https://reader034.fdocuments.in/reader034/viewer/2022052307/54c631bc4a7959991a8b4621/html5/thumbnails/16.jpg)
Business Analytics
Information Ladder: Data → Information → Knowledge → Understanding → Insight → Wisdom
Whereas the first two steps can be defined with scientific precision, the upper steps belong to the domain of psychology and philosophy.
See also: DIKW (Data, Information, Knowledge, Wisdom)
![Page 17: Decision making in the era of cloud computing and big data](https://reader034.fdocuments.in/reader034/viewer/2022052307/54c631bc4a7959991a8b4621/html5/thumbnails/17.jpg)
CRISP-DM (Cross-Industry Standard Process for Data Mining)
![Page 18: Decision making in the era of cloud computing and big data](https://reader034.fdocuments.in/reader034/viewer/2022052307/54c631bc4a7959991a8b4621/html5/thumbnails/18.jpg)
KDD (Knowledge Discovery in Databases)
![Page 19: Decision making in the era of cloud computing and big data](https://reader034.fdocuments.in/reader034/viewer/2022052307/54c631bc4a7959991a8b4621/html5/thumbnails/19.jpg)
SEMMA (Sample, Explore, Modify, Model, Assess)
![Page 21: Decision making in the era of cloud computing and big data](https://reader034.fdocuments.in/reader034/viewer/2022052307/54c631bc4a7959991a8b4621/html5/thumbnails/21.jpg)
Data Science
https://s3.amazonaws.com/aws.drewconway.com/viz/venn_diagram/data_science.html
![Page 22: Decision making in the era of cloud computing and big data](https://reader034.fdocuments.in/reader034/viewer/2022052307/54c631bc4a7959991a8b4621/html5/thumbnails/22.jpg)
What is a Data Scientist
A data scientist is simply a person who can
write code
understand statistics
derive insights from data
![Page 23: Decision making in the era of cloud computing and big data](https://reader034.fdocuments.in/reader034/viewer/2022052307/54c631bc4a7959991a8b4621/html5/thumbnails/23.jpg)
Oh really, is this a Data Scientist?
A data scientist is simply a person who can
write code
= in R, Python, Java, SQL, Hadoop (Pig, HQL, MR) etc.
= for data storage, querying, summarization, visualization
= how efficiently, and in time (fast results?)
= where: on databases, on cloud, on servers
and understand enough statistics
to derive insights from data so business can make decisions
![Page 24: Decision making in the era of cloud computing and big data](https://reader034.fdocuments.in/reader034/viewer/2022052307/54c631bc4a7959991a8b4621/html5/thumbnails/24.jpg)
Some tools
Linux+Java /Python/Pig+R+SQL
![Page 25: Decision making in the era of cloud computing and big data](https://reader034.fdocuments.in/reader034/viewer/2022052307/54c631bc4a7959991a8b4621/html5/thumbnails/25.jpg)
Cheat Sheets for Data Scientists
http://www.slideshare.net/ajayohri/cheat-sheets-for-data-scientists
![Page 26: Decision making in the era of cloud computing and big data](https://reader034.fdocuments.in/reader034/viewer/2022052307/54c631bc4a7959991a8b4621/html5/thumbnails/26.jpg)
Data Scientist Programming Skills
Java http://www.learnjavaonline.org/
Python http://www.codecademy.com/tracks/python
SQL http://www.w3schools.com/sql/
R http://bigdatauniversity.com/bdu-wp/bdu-course/introduction-to-data-analysis-using-r/
http://www.statmethods.net/
Hadoop http://hortonworks.com/hadoop-training/
Linux https://github.com/WilliamHackmore/linuxgems/blob/master/cheat_sheet.org.sh
![Page 27: Decision making in the era of cloud computing and big data](https://reader034.fdocuments.in/reader034/viewer/2022052307/54c631bc4a7959991a8b4621/html5/thumbnails/27.jpg)
Other places to learn
MOOCs
1 https://www.edx.org/
2 https://www.coursera.org/
3 https://www.udacity.com/
4 https://www.udemy.com/
Books
Courses
Workshops
![Page 28: Decision making in the era of cloud computing and big data](https://reader034.fdocuments.in/reader034/viewer/2022052307/54c631bc4a7959991a8b4621/html5/thumbnails/28.jpg)
Big Data
![Page 29: Decision making in the era of cloud computing and big data](https://reader034.fdocuments.in/reader034/viewer/2022052307/54c631bc4a7959991a8b4621/html5/thumbnails/29.jpg)
Statistics on Facebook https://newsroom.fb.com/company-info/
● 802 million daily active users on average in March 2014
● 609 million mobile daily active users on average in March 2014
● 1.28 billion monthly active users as of March 31, 2014
● 1.01 billion mobile monthly active users as of March 31, 2014
![Page 30: Decision making in the era of cloud computing and big data](https://reader034.fdocuments.in/reader034/viewer/2022052307/54c631bc4a7959991a8b4621/html5/thumbnails/30.jpg)
Statistics on Twitter https://about.twitter.com/company
● 255 million monthly active users
● 500 million Tweets are sent per day
● 78% of Twitter active users are on mobile
● 77% of accounts are outside the U.S.
![Page 31: Decision making in the era of cloud computing and big data](https://reader034.fdocuments.in/reader034/viewer/2022052307/54c631bc4a7959991a8b4621/html5/thumbnails/31.jpg)
Big Data
is changing marketing
is changing marketing models
is much more quantitative compared to earlier marketing
![Page 32: Decision making in the era of cloud computing and big data](https://reader034.fdocuments.in/reader034/viewer/2022052307/54c631bc4a7959991a8b4621/html5/thumbnails/32.jpg)
Hadoop - infrastructure for Big Data http://hadoop.apache.org/
![Page 33: Decision making in the era of cloud computing and big data](https://reader034.fdocuments.in/reader034/viewer/2022052307/54c631bc4a7959991a8b4621/html5/thumbnails/33.jpg)
Hadoop- evolving ecosystem
![Page 34: Decision making in the era of cloud computing and big data](https://reader034.fdocuments.in/reader034/viewer/2022052307/54c631bc4a7959991a8b4621/html5/thumbnails/34.jpg)
Hadoop- evolving ecosystem
![Page 35: Decision making in the era of cloud computing and big data](https://reader034.fdocuments.in/reader034/viewer/2022052307/54c631bc4a7959991a8b4621/html5/thumbnails/35.jpg)
Hadoop- evolving ecosystem
![Page 36: Decision making in the era of cloud computing and big data](https://reader034.fdocuments.in/reader034/viewer/2022052307/54c631bc4a7959991a8b4621/html5/thumbnails/36.jpg)
Hadoop- evolving ecosystem
![Page 37: Decision making in the era of cloud computing and big data](https://reader034.fdocuments.in/reader034/viewer/2022052307/54c631bc4a7959991a8b4621/html5/thumbnails/37.jpg)
Cloud Computing - HW to the SW
http://csrc.nist.gov/publications/nistpubs/800-145/SP800-145.pdf
http://www.silverlighthack.com/post/2011/02/27/iaas-paas-and-saas-terms-explained-and-defined.aspx
![Page 38: Decision making in the era of cloud computing and big data](https://reader034.fdocuments.in/reader034/viewer/2022052307/54c631bc4a7959991a8b4621/html5/thumbnails/38.jpg)
Cloud Computing
http://www.silverlighthack.com/post/2011/02/27/iaas-paas-and-saas-terms-explained-and-defined.aspx
![Page 39: Decision making in the era of cloud computing and big data](https://reader034.fdocuments.in/reader034/viewer/2022052307/54c631bc4a7959991a8b4621/html5/thumbnails/39.jpg)
Cloud Computing-Google https://cloud.google.com/products/
Compute Engine is Google’s Infrastructure-as-a-Service (IaaS).
App Engine is Google’s Platform-as-a-Service (PaaS).
Storage
Cloud SQL - a fully managed, relational MySQL database
Cloud Storage - a simple API that allows you to manage your data programmatically
Cloud Datastore - a managed, NoSQL, schemaless database for storing non-relational data
Big Data
BigQuery - run fast, SQL-like queries against multi-terabyte datasets in seconds
https://github.com/GoogleCloudPlatform
![Page 40: Decision making in the era of cloud computing and big data](https://reader034.fdocuments.in/reader034/viewer/2022052307/54c631bc4a7959991a8b4621/html5/thumbnails/40.jpg)
Cloud Computing-Google
![Page 41: Decision making in the era of cloud computing and big data](https://reader034.fdocuments.in/reader034/viewer/2022052307/54c631bc4a7959991a8b4621/html5/thumbnails/41.jpg)
Cloud Computing-Amazon http://aws.amazon.com/products/
![Page 42: Decision making in the era of cloud computing and big data](https://reader034.fdocuments.in/reader034/viewer/2022052307/54c631bc4a7959991a8b4621/html5/thumbnails/42.jpg)
More on Cloud Computing
Challenges and Opportunities for India (from http://chennai.vit.ac.in/isbcc/)
http://www.slideshare.net/ajayohri/data-analytics-using-the-cloud-challenges-and-opportunities-for-india
Big Data Big Analytics (Workshop on Statistical Machine Learning and Game Theory Approaches for Large Scale Data Analysis, http://krishnarajpm.com/bigdata/abstract.pdf)
http://www.slideshare.net/ajayohri/big-data-big-analytics
![Page 43: Decision making in the era of cloud computing and big data](https://reader034.fdocuments.in/reader034/viewer/2022052307/54c631bc4a7959991a8b4621/html5/thumbnails/43.jpg)
R http://www.r-project.org/
Open Source
Free
5000+ Packages
Growing Faster
>2 million users
RAM constraints??
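The "RAM constraints" point comes from the fact that standard R objects live entirely in memory. A rough way to get a feel for it is to measure a vector with `object.size()` and extrapolate. The helper `estimate_gb()` below is a hypothetical back-of-envelope function for all-numeric tables, not part of any package.

```r
# One million double-precision numbers, held entirely in RAM
x <- numeric(1e6)
size_bytes <- as.numeric(object.size(x))
cat(sprintf("1e6 doubles use about %.1f MB of RAM\n", size_bytes / 1024^2))

# Hypothetical helper: rows x columns x 8 bytes per double, in gigabytes
estimate_gb <- function(n_rows, n_cols) n_rows * n_cols * 8 / 1024^3

cat(sprintf("100 million rows x 10 numeric columns: about %.1f GB\n",
            estimate_gb(1e8, 10)))
```

Estimates like this ignore copies made during computation, so real memory needs are usually higher.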
![Page 44: Decision making in the era of cloud computing and big data](https://reader034.fdocuments.in/reader034/viewer/2022052307/54c631bc4a7959991a8b4621/html5/thumbnails/44.jpg)
R http://www.r-project.org/
Object Oriented
has GUI and IDE
has Commercial offerings
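As one illustration of the "Object Oriented" point, base R's S3 system dispatches a generic function on the class of its argument. The class `employee` and the generic `describe()` below are made-up names for this sketch; R also offers S4 and Reference Classes, which are not shown.

```r
# Constructor for a hypothetical S3 class "employee"
new_employee <- function(name, salary) {
  structure(list(name = name, salary = salary), class = "employee")
}

# A generic function, a fallback method, and a method for the class
describe <- function(x, ...) UseMethod("describe")
describe.default <- function(x, ...) "an unknown object"
describe.employee <- function(x, ...) {
  sprintf("%s earns %d", x$name, x$salary)
}

e <- new_employee("Asha", 50000L)
print(describe(e))   # dispatches to describe.employee
print(describe(42))  # falls back to describe.default
```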
![Page 45: Decision making in the era of cloud computing and big data](https://reader034.fdocuments.in/reader034/viewer/2022052307/54c631bc4a7959991a8b4621/html5/thumbnails/45.jpg)
![Page 46: Decision making in the era of cloud computing and big data](https://reader034.fdocuments.in/reader034/viewer/2022052307/54c631bc4a7959991a8b4621/html5/thumbnails/46.jpg)
R - Rattle - Data Mining GUI
http://www.r-project.org/
![Page 47: Decision making in the era of cloud computing and big data](https://reader034.fdocuments.in/reader034/viewer/2022052307/54c631bc4a7959991a8b4621/html5/thumbnails/47.jpg)
R - R Commander
http://www.r-project.org/
![Page 48: Decision making in the era of cloud computing and big data](https://reader034.fdocuments.in/reader034/viewer/2022052307/54c631bc4a7959991a8b4621/html5/thumbnails/48.jpg)
R - RStudio
![Page 49: Decision making in the era of cloud computing and big data](https://reader034.fdocuments.in/reader034/viewer/2022052307/54c631bc4a7959991a8b4621/html5/thumbnails/49.jpg)
R - Revolution Analytics
Free for academics worldwide!
RevoScaleR package for Big Data
Recommended install - http://info.revolutionanalytics.com/free-academic.html
![Page 50: Decision making in the era of cloud computing and big data](https://reader034.fdocuments.in/reader034/viewer/2022052307/54c631bc4a7959991a8b4621/html5/thumbnails/50.jpg)
![Page 51: Decision making in the era of cloud computing and big data](https://reader034.fdocuments.in/reader034/viewer/2022052307/54c631bc4a7959991a8b4621/html5/thumbnails/51.jpg)
My favorite places to learn R
http://www.swirlstats.com/
![Page 52: Decision making in the era of cloud computing and big data](https://reader034.fdocuments.in/reader034/viewer/2022052307/54c631bc4a7959991a8b4621/html5/thumbnails/52.jpg)
My favorite places to learn R
http://www.twotorials.com/
![Page 53: Decision making in the era of cloud computing and big data](https://reader034.fdocuments.in/reader034/viewer/2022052307/54c631bc4a7959991a8b4621/html5/thumbnails/53.jpg)
My favorite places to learn R
http://tryr.codeschool.com/
![Page 54: Decision making in the era of cloud computing and big data](https://reader034.fdocuments.in/reader034/viewer/2022052307/54c631bc4a7959991a8b4621/html5/thumbnails/54.jpg)
My favorite places to learn R
https://www.coursera.org/course/rprog
also see http://blog.datacamp.com/complete-list-of-coursera-courses-using-r-ranked-by-popularity/
![Page 55: Decision making in the era of cloud computing and big data](https://reader034.fdocuments.in/reader034/viewer/2022052307/54c631bc4a7959991a8b4621/html5/thumbnails/55.jpg)
R Case Study
Who are my Facebook friends?
Step 1
http://thinktostart.wordpress.com/2013/11/19/analyzing-facebook-with-r/
Step 2
https://gist.github.com/decisionstats/f18126aea544be324169
![Page 56: Decision making in the era of cloud computing and big data](https://reader034.fdocuments.in/reader034/viewer/2022052307/54c631bc4a7959991a8b4621/html5/thumbnails/56.jpg)
![Page 57: Decision making in the era of cloud computing and big data](https://reader034.fdocuments.in/reader034/viewer/2022052307/54c631bc4a7959991a8b4621/html5/thumbnails/57.jpg)
Twitter Analysis
http://www.slideshare.net/ajayohri/twitter-analysis-by-kaify-rais
http://www.rdatamining.com/examples/social-network-analysis
![Page 58: Decision making in the era of cloud computing and big data](https://reader034.fdocuments.in/reader034/viewer/2022052307/54c631bc4a7959991a8b4621/html5/thumbnails/58.jpg)
Big Data Social Network Analysis
Analyzing a big social network using R and distributed graph engines
http://thinkaurelius.com/2012/02/05/graph-degree-distributions-using-r-over-hadoop/
![Page 59: Decision making in the era of cloud computing and big data](https://reader034.fdocuments.in/reader034/viewer/2022052307/54c631bc4a7959991a8b4621/html5/thumbnails/59.jpg)
Big Data Social Media Analysis
Can be used for customers (and also for latent influencers) - http://www.r-bloggers.com/an-example-of-social-network-analysis-with-r-using-package-igraph/
![Page 60: Decision making in the era of cloud computing and big data](https://reader034.fdocuments.in/reader034/viewer/2022052307/54c631bc4a7959991a8b4621/html5/thumbnails/60.jpg)
Big Data Social Media Analysis
The R package twitteR (http://cran.r-project.org/web/packages/twitteR/index.html) can be used for prototyping, but Twitter's API is rate limited to 1500 per hour(?)/day, so we can use the DataSift API instead (http://datasift.com/pricing#costs).
![Page 61: Decision making in the era of cloud computing and big data](https://reader034.fdocuments.in/reader034/viewer/2022052307/54c631bc4a7959991a8b4621/html5/thumbnails/61.jpg)
Big Data Social Media Analysis
How does information propagate through a social network?
http://www.r-bloggers.com/information-transmission-in-a-social-network-dissecting-the-spread-of-a-quora-post/
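One simple way to reason about propagation is to count how many reshares it takes for a post to reach each person. The sketch below does this with a breadth-first search in base R. The `followers` graph and the function `spread_hops()` are invented for illustration and are not taken from the linked post.

```r
# Hypothetical follower graph: each entry lists who sees that person's posts
followers <- list(
  A = c("B", "C"),
  B = c("D"),
  C = c("D", "E"),
  D = c("F"),
  E = character(0),
  F = character(0)
)

# Breadth-first search: number of hops needed to reach each person
spread_hops <- function(graph, source) {
  hops <- setNames(rep(NA_integer_, length(graph)), names(graph))
  hops[source] <- 0L
  queue <- source
  while (length(queue) > 0) {
    current <- queue[1]
    queue <- queue[-1]
    for (nb in graph[[current]]) {
      if (is.na(hops[nb])) {           # not reached yet
        hops[nb] <- hops[current] + 1L
        queue <- c(queue, nb)
      }
    }
  }
  hops
}

hops <- spread_hops(followers, "A")
print(hops)
```

People left at `NA` would be those the post never reaches from the chosen source.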
![Page 62: Decision making in the era of cloud computing and big data](https://reader034.fdocuments.in/reader034/viewer/2022052307/54c631bc4a7959991a8b4621/html5/thumbnails/62.jpg)
Big Data Social Network Analysis
Can be used for terrorists (and also for potential protestors) - Drew Conway http://riskecon.com/wp-content/uploads/2012/02/Conway-Socio_Terrorism.pdf
Primary focus is on three aspects of network analysis
1. Identifying leadership and key actors
2. Revealing underlying structure and intra-network community structure
3. Evolution and decay of social networks
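The first aspect, identifying key actors, can be approximated with degree centrality: the actor with the most direct connections. The following is a minimal base-R sketch on a made-up contact list. It is not Conway's method, and packages such as igraph offer richer centrality measures.

```r
# Hypothetical undirected contact list (who communicates with whom)
edges <- data.frame(
  from = c("A", "A", "A", "B", "D"),
  to   = c("B", "C", "D", "C", "E"),
  stringsAsFactors = FALSE
)

# Build a symmetric adjacency matrix indexed by actor name
actors <- sort(unique(c(edges$from, edges$to)))
adj <- matrix(0L, length(actors), length(actors),
              dimnames = list(actors, actors))
adj[cbind(edges$from, edges$to)] <- 1L
adj[cbind(edges$to, edges$from)] <- 1L

# Degree centrality: number of direct contacts per actor
degree <- rowSums(adj)
key_actor <- names(which.max(degree))
degree_distribution <- table(degree)

print(degree)
print(key_actor)
print(degree_distribution)
```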
![Page 63: Decision making in the era of cloud computing and big data](https://reader034.fdocuments.in/reader034/viewer/2022052307/54c631bc4a7959991a8b4621/html5/thumbnails/63.jpg)
R - Big Data Packages http://cran.r-project.org/web/views/HighPerformanceComputing.html
● The RHIPE package, started by Saptarshi Guha and now developed by a core team via GitHub, provides an interface between R and Hadoop for analysis of large complex data wholly from within R using the Divide and Recombine approach to big data.
● The rmr package by Revolution Analytics also provides an interface between R and Hadoop for a Map/Reduce programming framework.
● A related package, segue by Long, permits easy execution of embarrassingly parallel tasks on Elastic Map Reduce (EMR) at Amazon.
● The RProtoBuf package provides an interface to Google's language-neutral, platform-neutral, extensible mechanism for serializing structured data. This package can be used in R code to read data streams from other systems in a distributed MapReduce setting where data is serialized and passed back and forth between tasks.
● The HistogramTools package provides a number of routines useful for the construction, aggregation, manipulation, and plotting of large numbers of histograms such as those created by Mappers in a MapReduce application.
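On a single machine, an embarrassingly parallel task can also be spread over cores with the `parallel` package that ships with base R, which speaks to the earlier question about built-in multi-core support. The sketch below is an assumption-laden example: `boot_mean()` is a made-up task, two cores are assumed to be available, and forking via `mclapply()` is not available on Windows, so the code falls back to one core there.

```r
library(parallel)  # ships with base R

# A deliberately independent task: the mean of one bootstrap resample
boot_mean <- function(i, data) {
  set.seed(i)  # per-task seed so every run gives the same resample
  mean(sample(data, replace = TRUE))
}

data <- mtcars$mpg

# Forking is not available on Windows, so use a single core there
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L

boot_means <- unlist(mclapply(1:200, boot_mean, data = data,
                              mc.cores = n_cores))

# Same computation without parallelism, for comparison
serial_means <- sapply(1:200, boot_mean, data = data)

print(summary(boot_means))
```

Because each task sets its own seed, the parallel and serial runs produce the same numbers, which makes the parallel version easy to check.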
![Page 64: Decision making in the era of cloud computing and big data](https://reader034.fdocuments.in/reader034/viewer/2022052307/54c631bc4a7959991a8b4621/html5/thumbnails/64.jpg)
R -Hadoop Packages https://github.com/RevolutionAnalytics/RHadoop/wiki
● plyrmr - higher level plyr-like data processing for structured data, powered by rmr
● rmr - functions providing Hadoop MapReduce functionality in R
● rhdfs - functions providing file management of the HDFS from within R
● rhbase - functions providing database management for the HBase distributed database from within R
SparkR http://amplab-extras.github.io/SparkR-pkg/
SparkR is an R package that provides a light-weight frontend to use Apache Spark from R.
RHive https://github.com/nexr/RHive
RHive is an R extension facilitating distributed computing via Hive queries. RHive allows easy usage of HQL (Hive SQL) in R, and allows easy usage of R objects and R functions in Hive.
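The Map/Reduce idea behind these packages can be sketched in plain R without Hadoop. The example below is a single-machine word count with explicit map, shuffle and reduce steps. It does not use the rmr API, and the names `docs` and `map_fn` are invented for illustration.

```r
docs <- c("big data needs big tools", "r handles data", "big r")

# Map: each document emits (word, 1) pairs
map_fn <- function(doc) {
  words <- strsplit(doc, " ", fixed = TRUE)[[1]]
  data.frame(key = words, value = 1L, stringsAsFactors = FALSE)
}
mapped <- do.call(rbind, lapply(docs, map_fn))

# Shuffle: group the emitted values by key
grouped <- split(mapped$value, mapped$key)

# Reduce: sum the values for each key
word_counts <- sapply(grouped, function(v) Reduce(`+`, v))

print(word_counts)
```

On a cluster, the map and reduce functions stay conceptually the same. The framework takes over distributing the documents and shuffling the pairs between machines.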
![Page 65: Decision making in the era of cloud computing and big data](https://reader034.fdocuments.in/reader034/viewer/2022052307/54c631bc4a7959991a8b4621/html5/thumbnails/65.jpg)
R - Cloud Computing
http://cran.r-project.org/web/views/WebTechnologies.html
![Page 66: Decision making in the era of cloud computing and big data](https://reader034.fdocuments.in/reader034/viewer/2022052307/54c631bc4a7959991a8b4621/html5/thumbnails/66.jpg)
R - Big Data Packages http://cran.r-project.org/web/views/HighPerformanceComputing.html
Large memory and out-of-memory data
● The biglm package by Lumley uses incremental computations to offer lm() and glm() functionality to data sets stored outside of R's main memory.
● The ff package by Adler et al. offers file-based access to data sets that are too large to be loaded into memory, along with a number of higher-level functions.
● The bigmemory package by Kane and Emerson permits storing large objects such as matrices in memory (as well as via files) and uses external pointer objects to refer to them.
● A large number of database packages, and database-alike packages (such as sqldf by Grothendieck and data.table)
● The HadoopStreaming package provides a framework for writing map/reduce scripts for use in Hadoop Streaming; it also facilitates operating on data in a streaming fashion which does not require Hadoop.
● The speedglm package fits (generalised) linear models to large data.
● The biglars package by Seligman et al. can use ff to support larger-than-memory datasets for least-angle regression, lasso and stepwise regression.
● The bigrf package provides a Random Forests implementation with support for parallel execution and large memory.
● The MonetDB.R package allows R to access the MonetDB column-oriented, open source database system as a backend.
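To give a feel for what "incremental computations" can mean for regression, the sketch below fits a linear model by accumulating the cross-products X'X and X'y one chunk at a time, so only one chunk needs to be in memory at once. This is a simplified stand-in and not necessarily how biglm works internally. The function `chunked_lm()` is hypothetical, and `mtcars` merely plays the role of a large file read in pieces.

```r
# Accumulate X'X and X'y chunk by chunk, then solve the normal equations
chunked_lm <- function(data, formula, chunk_size) {
  xtx <- NULL
  xty <- NULL
  for (s in seq(1, nrow(data), by = chunk_size)) {
    rows <- s:min(s + chunk_size - 1, nrow(data))
    mf <- model.frame(formula, data[rows, , drop = FALSE])
    X <- model.matrix(formula, mf)
    y <- model.response(mf)
    if (is.null(xtx)) {
      xtx <- crossprod(X)       # X'X for the first chunk
      xty <- crossprod(X, y)    # X'y for the first chunk
    } else {
      xtx <- xtx + crossprod(X)
      xty <- xty + crossprod(X, y)
    }
  }
  drop(solve(xtx, xty))
}

coef_chunked <- chunked_lm(mtcars, mpg ~ wt + hp, chunk_size = 10)
coef_full <- coef(lm(mpg ~ wt + hp, data = mtcars))

print(coef_chunked)
print(coef_full)
```

The two coefficient vectors should agree up to rounding error. Solving the normal equations is numerically less robust than QR-based fitting, so this is for intuition only.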
![Page 69: Decision making in the era of cloud computing and big data](https://reader034.fdocuments.in/reader034/viewer/2022052307/54c631bc4a7959991a8b4621/html5/thumbnails/69.jpg)
C’est la vie
IN INDUSTRY - an R expert is one who knows which package to use
IN RESEARCH - an R expert is one who creates a new, popular and improved package
![Page 70: Decision making in the era of cloud computing and big data](https://reader034.fdocuments.in/reader034/viewer/2022052307/54c631bc4a7959991a8b4621/html5/thumbnails/70.jpg)
CRAN Views help experts
http://cran.r-project.org/web/views/
![Page 71: Decision making in the era of cloud computing and big data](https://reader034.fdocuments.in/reader034/viewer/2022052307/54c631bc4a7959991a8b4621/html5/thumbnails/71.jpg)
SAP with R
Departure of Aeroplanes - SAP Hana 200m http://allthingsr.blogspot.in/#!/2012/04/big-data-r-and-hana-analyze-200-million.html
R using SAP Hana
http://www.decisionstats.com/interview-blag-sap-labs-montreal-using-sap-hana-with-rstats/
![Page 72: Decision making in the era of cloud computing and big data](https://reader034.fdocuments.in/reader034/viewer/2022052307/54c631bc4a7959991a8b4621/html5/thumbnails/72.jpg)
SAP Hana DB uses R
http://scn.sap.com/community/in-memory-business-data-management/blog/2011/11/28/dealing-with-r-and-hana
![Page 73: Decision making in the era of cloud computing and big data](https://reader034.fdocuments.in/reader034/viewer/2022052307/54c631bc4a7959991a8b4621/html5/thumbnails/73.jpg)
Oracle R Enterprise
Case Studies and Examples
http://www.oracle.com/technetwork/database/options/advanced-analytics/r-enterprise/index.html
![Page 74: Decision making in the era of cloud computing and big data](https://reader034.fdocuments.in/reader034/viewer/2022052307/54c631bc4a7959991a8b4621/html5/thumbnails/74.jpg)
![Page 75: Decision making in the era of cloud computing and big data](https://reader034.fdocuments.in/reader034/viewer/2022052307/54c631bc4a7959991a8b4621/html5/thumbnails/75.jpg)
Additional
http://www.slideshare.net/ajayohri/open-source-analytics
Open Source in Analytics (OSSCamp 2014) http://osscamp.in/
http://osscamp.in/events/6/open-source-analytics-overview-r-python-and-others
![Page 76: Decision making in the era of cloud computing and big data](https://reader034.fdocuments.in/reader034/viewer/2022052307/54c631bc4a7959991a8b4621/html5/thumbnails/76.jpg)
How does this affect decision making
Lots of Data
IT is not a support function
Analytical Organizations with cross functional domains
and
Employees as first line of analysis
is education and research keeping up?
![Page 77: Decision making in the era of cloud computing and big data](https://reader034.fdocuments.in/reader034/viewer/2022052307/54c631bc4a7959991a8b4621/html5/thumbnails/77.jpg)
Let's do a revision
Requirements and People
# Requirements: how many were met vs. unmet
a <- data.frame(req = c("Met", "Unmet"), counts = c(50, 50))
a
pie(a$counts, labels = a$req)
# People: how the audience felt (needs the RColorBrewer package for colors)
library(RColorBrewer)
p <- data.frame(req = c("Satisfied", "Unsatisfied", "Busy Sleeping"),
                counts = c(50, 40, 10))
pie(p$counts, labels = p$req, col = brewer.pal(3, "Set1"))
![Page 78: Decision making in the era of cloud computing and big data](https://reader034.fdocuments.in/reader034/viewer/2022052307/54c631bc4a7959991a8b4621/html5/thumbnails/78.jpg)
Thanks for listening
Contact - [email protected]
LinkedIn - http://linkedin.com/in/ajayohri
Questions please?
![Page 79: Decision making in the era of cloud computing and big data](https://reader034.fdocuments.in/reader034/viewer/2022052307/54c631bc4a7959991a8b4621/html5/thumbnails/79.jpg)
One more thing
a movie on a murdered IIM batchmate of mine fighting against corruption just released yesterday
http://www.imdb.com/title/tt3056632/
![Page 80: Decision making in the era of cloud computing and big data](https://reader034.fdocuments.in/reader034/viewer/2022052307/54c631bc4a7959991a8b4621/html5/thumbnails/80.jpg)
Dedicated to