Marko Grobelnik Marko.Grobelnik@ijs · Big-data’ is similar to ‘Small-data’, but bigger...
Marko Grobelnik
Jožef Stefan Institute, Slovenia
Brdo, Nov 10th 2015
Big Data Tutorial: http://www.slideshare.net/markogrobelnik/big-datatutorial-grobelnikfortunamladenicsydneyiswc2013
‘Big-data’ is similar to ‘Small-data’, but bigger
◦ A recently popular expression: “Midsize data”
…but bigger data requires somewhat different approaches:
◦ techniques, tools, architectures
…with an aim to solve new problems
◦ …or old problems in a better way.
Volume – challenging to load and process (how to index, retrieve)
Variety – different data types and degree of structure (how to query semi-structured data)
Velocity – real-time processing influenced by rate of data arrival
From “Understanding Big Data” by IBM
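As one way to picture the “Variety” question of how to query semi-structured data, here is a minimal Python sketch. The records, their field names, and the `get_path` helper are all hypothetical and only illustrate the idea that a query must tolerate missing or differently shaped fields.

```python
from typing import Any, Dict, List

# Hypothetical records from one collection: each has a different shape.
records: List[Dict[str, Any]] = [
    {"id": 1, "user": {"name": "ana", "city": "Ljubljana"}, "tags": ["a", "b"]},
    {"id": 2, "user": {"name": "bor"}},
    {"id": 3, "text": "no user field at all"},
]


def get_path(record: Dict[str, Any], path: str, default: Any = None) -> Any:
    """Follow a dotted path such as "user.city"; return default if any step is missing."""
    current: Any = record
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


# Project one field across all records, tolerating absent fields.
cities = [get_path(r, "user.city") for r in records]

# Select the ids of records that actually carry the field.
ids_with_city = [r["id"] for r in records if get_path(r, "user.city") is not None]
```

With a fixed relational schema the missing fields would have to be declared up front; here the query itself decides how to treat them.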
1. Volume (lots of data = “Tonnabytes”)
2. Variety (complexity, curse of dimensionality)
3. Velocity (rate of data and information flow)
4. Veracity (verifying inference-based models from comprehensive data collections)
5. Venue (location)
6. Vocabulary (semantics)
7., 8., 9. …: V…, V…, V…
Comparing volume of “big data” and “data mining” queries
http://www.google.com/trends/explore#q=big%20data%2C%20data%20mining
…adding “web 2.0” to the “big data” and “data mining” query volumes
http://www.google.com/trends/explore#q=big%20data%2C%20data%20mining%2C%20web%202.0
[Figures: three figures labeled “Big-Data”; the third is also labeled “Data-Science”]
Source: WikiBon report on “Big Data Vendor Revenue and Market Forecast 2012-2017”, 2013
Where is processing hosted?
◦ Distributed Servers / Cloud (e.g. Amazon EC2)
Where is data stored?
◦ Distributed Storage (e.g. Amazon S3)
What is the programming model?
◦ Distributed Processing (e.g. MapReduce)
How is data stored & indexed?
◦ High-performance schema-free databases (e.g. MongoDB)
What operations are performed on data?
◦ Analytic / Semantic Processing / Visualization
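To make the MapReduce programming model mentioned above more concrete, here is a minimal single-process sketch in Python of a word count. It only imitates the map, shuffle, and reduce steps in memory; the function names and sample documents are assumptions for illustration, and a real framework would run the map and reduce steps on many machines.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(document: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine all values of one key into a single result."""
    return key, sum(values)


def word_count(documents: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence over the input documents."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = word_count(["big data is big", "small data is small"])
```

Because each map call sees one document and each reduce call sees one key, both steps can be spread over many workers; only the shuffle step needs to move data between them.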
An excellent overview of “Big Data” algorithms is the book “Mining of Massive Datasets” by Leskovec, Rajaraman, and Ullman
◦ Downloadable from: http://www.mmds.org/
◦ Associated MOOC (from Oct 2014): https://www.coursera.org/course/mmds
Big-Data is everywhere; we are just not used to dealing with it
The “Big-Data” hype is very recent
◦ …growth seems to be going up
◦ …evident lack of experts to build Big-Data apps
Can we do “Big-Data” without big investment?
◦ …yes – many open source tools, and computing machinery is cheap (to buy or to rent)
◦ …the key is knowledge of how to deal with data
◦ …data is either free (e.g. Wikipedia) or available to buy (e.g. Twitter)