Presentation: Big Data Demystified
Uploaded by altbrasov
![Page 1: Prezentare: Big Data demistificat](https://reader033.fdocuments.in/reader033/viewer/2022060201/559b17d51a28ab94308b47cf/html5/thumbnails/1.jpg)
![Page 2: Prezentare: Big Data demistificat](https://reader033.fdocuments.in/reader033/viewer/2022060201/559b17d51a28ab94308b47cf/html5/thumbnails/2.jpg)
Nov 2014 – Big Data, 2 of 53
What is Big Data?
* Data so large and complex that it becomes difficult to process with traditional systems
* Term first coined in 1997, in a NASA report
* Petabytes and Exabytes of data
![Page 3: Prezentare: Big Data demistificat](https://reader033.fdocuments.in/reader033/viewer/2022060201/559b17d51a28ab94308b47cf/html5/thumbnails/3.jpg)
Big data is everywhere
* Every 2 days we create as much information as we did from the beginning of time until 2003
* Google processes over 40 thousand search queries per second, making it over 3.5 billion in a single day.
* Around 100 hours of video are uploaded to YouTube every minute and it would take you around 15 years to watch every video uploaded by users in one day
* Every minute we send 204 million emails, generate 1.8 million Facebook likes, send 278 thousand Tweets, and upload 200,000 photos to Facebook
* Trillions of sensors monitor, track, and communicate with each other, populating the IoT with real-time data
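The per-second and per-minute figures above can be cross-checked with simple arithmetic. The sketch below uses only the rounded numbers quoted on the slide; the variable names are illustrative. Since the slide says "over 40 thousand" queries per second, the computed daily total is a lower bound.

```python
# Back-of-the-envelope check of two figures quoted on the slide.
SECONDS_PER_DAY = 24 * 60 * 60   # 86,400
MINUTES_PER_DAY = 24 * 60        # 1,440

# "over 40 thousand search queries per second" (rounded slide figure)
queries_per_second = 40_000
queries_per_day = queries_per_second * SECONDS_PER_DAY

# "around 100 hours of video uploaded every minute" (rounded slide figure)
hours_uploaded_per_minute = 100
hours_uploaded_per_day = hours_uploaded_per_minute * MINUTES_PER_DAY
years_to_watch = hours_uploaded_per_day / 24 / 365

print(f"{queries_per_day:,} queries per day")              # 3,456,000,000
print(f"{years_to_watch:.1f} years to watch one day's uploads")  # 16.4
```

Both results land close to the slide's "over 3.5 billion" and "around 15 years", which is what you would expect from rounded inputs.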
![Page 4: Prezentare: Big Data demistificat](https://reader033.fdocuments.in/reader033/viewer/2022060201/559b17d51a28ab94308b47cf/html5/thumbnails/4.jpg)
Big data is not new
![Page 5: Prezentare: Big Data demistificat](https://reader033.fdocuments.in/reader033/viewer/2022060201/559b17d51a28ab94308b47cf/html5/thumbnails/5.jpg)
Characteristics
![Page 6: Prezentare: Big Data demistificat](https://reader033.fdocuments.in/reader033/viewer/2022060201/559b17d51a28ab94308b47cf/html5/thumbnails/6.jpg)
Volume
* More data == better model
* Scalable storage and a distributed approach to querying
![Page 7: Prezentare: Big Data demistificat](https://reader033.fdocuments.in/reader033/viewer/2022060201/559b17d51a28ab94308b47cf/html5/thumbnails/7.jpg)
Variety
* Big data includes all data
* Data no longer fits into neatly structured tables
![Page 8: Prezentare: Big Data demistificat](https://reader033.fdocuments.in/reader033/viewer/2022060201/559b17d51a28ab94308b47cf/html5/thumbnails/8.jpg)
Velocity
* Frequency at which data is generated, captured, stored and processed
* Need for real-time processing
![Page 9: Prezentare: Big Data demistificat](https://reader033.fdocuments.in/reader033/viewer/2022060201/559b17d51a28ab94308b47cf/html5/thumbnails/9.jpg)
Data sources
![Page 10: Prezentare: Big Data demistificat](https://reader033.fdocuments.in/reader033/viewer/2022060201/559b17d51a28ab94308b47cf/html5/thumbnails/10.jpg)
Importance of Big Data
* Media
* Retailing
* Public service
* Health
* Industry
![Page 11: Prezentare: Big Data demistificat](https://reader033.fdocuments.in/reader033/viewer/2022060201/559b17d51a28ab94308b47cf/html5/thumbnails/11.jpg)
Importance of Big Data
* Gaining a more complete understanding of the business
  * customers
  * products
  * competitors
* Which can lead to efficiency improvements
  * increased sales
  * lower costs
  * better customer service
  * improved products
![Page 12: Prezentare: Big Data demistificat](https://reader033.fdocuments.in/reader033/viewer/2022060201/559b17d51a28ab94308b47cf/html5/thumbnails/12.jpg)
The problem
* Overall information available
  * 10% structured data: used in decision making
  * 90% unstructured data: wasted, not captured or analyzed
* Valuable information vs. data that is best left ignored
* 37.5% of large organizations said that analyzing big data is their biggest challenge
* More than 90% said that Big Data is a top ten priority
![Page 13: Prezentare: Big Data demistificat](https://reader033.fdocuments.in/reader033/viewer/2022060201/559b17d51a28ab94308b47cf/html5/thumbnails/13.jpg)
It’s not only the size
* Collect -> Analyze -> Understand -> Generate Value
* Find meaning
* Find interconnections
* Find hidden data
![Page 14: Prezentare: Big Data demistificat](https://reader033.fdocuments.in/reader033/viewer/2022060201/559b17d51a28ab94308b47cf/html5/thumbnails/14.jpg)
Purpose
* Take more precise actions that bring value and reduce costs
* Make the right decision within the right amount of time
![Page 15: Prezentare: Big Data demistificat](https://reader033.fdocuments.in/reader033/viewer/2022060201/559b17d51a28ab94308b47cf/html5/thumbnails/15.jpg)
How big will big data get?
* From 3.2 zettabytes today to 40 zettabytes in only six years
* More than 30 billion devices will be wirelessly connected by 2020
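To put the projection in perspective, the implied compound annual growth rate can be worked out from the two endpoints. This is a small sketch that assumes smooth year-over-year growth, which the slide does not claim; the variable names are illustrative.

```python
# Implied compound annual growth from the slide's two data points.
start_zb = 3.2   # zettabytes "today" (2014, per the slide)
end_zb = 40.0    # zettabytes six years later
years = 6

growth_factor = end_zb / start_zb                 # overall multiplier
annual_rate = growth_factor ** (1 / years) - 1    # assumes steady compounding

print(f"total growth: {growth_factor:.1f}x")
print(f"implied annual growth: {annual_rate:.0%}")  # roughly 52% per year
```

In other words, the projection corresponds to the global data volume growing by about half every year.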
![Page 16: Prezentare: Big Data demistificat](https://reader033.fdocuments.in/reader033/viewer/2022060201/559b17d51a28ab94308b47cf/html5/thumbnails/16.jpg)
Challenges
* Storing data
* Analysis
* Search
* Sharing
* Transfer
* Visualization
![Page 17: Prezentare: Big Data demistificat](https://reader033.fdocuments.in/reader033/viewer/2022060201/559b17d51a28ab94308b47cf/html5/thumbnails/17.jpg)
NoSQL and Big Data Analytics
* Storing data
* Distribution
* Processing
![Page 18: Prezentare: Big Data demistificat](https://reader033.fdocuments.in/reader033/viewer/2022060201/559b17d51a28ab94308b47cf/html5/thumbnails/18.jpg)
NoSQL
* Scalability / cluster friendly
* Availability / fault tolerance
* Schema-less
* Low latency
* High performance
* Open-source
![Page 19: Prezentare: Big Data demistificat](https://reader033.fdocuments.in/reader033/viewer/2022060201/559b17d51a28ab94308b47cf/html5/thumbnails/19.jpg)
Dynamic scaling
* adding/removing nodes dynamically
→ storage/performance capacity can grow or shrink as needed
![Page 20: Prezentare: Big Data demistificat](https://reader033.fdocuments.in/reader033/viewer/2022060201/559b17d51a28ab94308b47cf/html5/thumbnails/20.jpg)
Auto-sharding
* Natively and automatically spread data across servers
* Data and query load automatically balanced across servers
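One common way to spread keys across servers so that nodes can be added or removed cheaply is consistent hashing. The sketch below is a minimal in-memory illustration of that idea, not the implementation of any particular NoSQL product; the `HashRing` class, the server names, and the virtual-node count are all assumptions made for the example.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    """Map a string to a large integer position on the ring."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Toy consistent-hash ring (hypothetical helper, for illustration only)."""

    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes   # virtual nodes per server, smooths the balance
        self._ring = []        # sorted list of (position, server name)
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        for i in range(self.vnodes):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def remove_node(self, node: str) -> None:
        self._ring = [(pos, name) for pos, name in self._ring if name != node]

    def node_for(self, key: str) -> str:
        """Return the first server clockwise from the key's position."""
        idx = bisect.bisect(self._ring, (_hash(key), ""))
        return self._ring[idx % len(self._ring)][1]


ring = HashRing(["server-1", "server-2", "server-3"])
keys = [f"user:{i}" for i in range(10_000)]

before = {k: ring.node_for(k) for k in keys}
ring.add_node("server-4")                      # scale out by one node
after = {k: ring.node_for(k) for k in keys}

moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved)} of {len(keys)} keys moved, all to the new server")
```

Only the keys that now belong to the new server change owner; everything else stays where it was, which is what makes growing or shrinking the cluster practical.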
![Page 21: Prezentare: Big Data demistificat](https://reader033.fdocuments.in/reader033/viewer/2022060201/559b17d51a28ab94308b47cf/html5/thumbnails/21.jpg)
Replication
* Support automatic replication
→ high availability
→ disaster recovery
→ no need for separate applications to manage these tasks
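How replication leads to availability can be shown with a toy store that writes each key to several nodes and reads from whichever replica is still up. This is a simplified sketch under stated assumptions: the `ReplicatedStore` class, the node names, and the replication factor of 3 are hypothetical, and real systems add consistency levels, repair, and failure detection on top.

```python
import hashlib


class ReplicatedStore:
    """Toy key-value store that keeps each key on several nodes (illustrative)."""

    def __init__(self, node_names, replication_factor=3):
        self.order = sorted(node_names)
        self.nodes = {name: {} for name in self.order}   # per-node storage
        self.down = set()                                # simulated failures
        self.rf = min(replication_factor, len(self.order))

    def replicas_for(self, key):
        """Pick `rf` consecutive nodes, starting from a hash of the key."""
        start = int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16) % len(self.order)
        return [self.order[(start + i) % len(self.order)] for i in range(self.rf)]

    def put(self, key, value):
        for name in self.replicas_for(key):
            if name not in self.down:
                self.nodes[name][key] = value

    def get(self, key):
        for name in self.replicas_for(key):
            if name not in self.down and key in self.nodes[name]:
                return self.nodes[name][key]
        raise KeyError(key)

    def fail(self, name):
        self.down.add(name)


store = ReplicatedStore(["n1", "n2", "n3", "n4"])
store.put("user:1", "Ana")

replicas = store.replicas_for("user:1")
store.fail(replicas[0])            # lose the first replica
store.fail(replicas[1])            # and the second
print(store.get("user:1"))         # still served by the third replica
```

The application code never has to know which node failed; the store handles the fallback itself.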
![Page 22: Prezentare: Big Data demistificat](https://reader033.fdocuments.in/reader033/viewer/2022060201/559b17d51a28ab94308b47cf/html5/thumbnails/22.jpg)
Schemaless
* No predefined schema
* Insertion of aggregates → puts together data that is commonly accessed together
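As an illustration of an aggregate, the sketch below stores a whole order (customer details plus order lines) as a single document, and a second document carries an extra field without any schema change. The collection is a plain dictionary standing in for a document store; the field names and values are made up for the example.

```python
# A plain dict stands in for a document collection (assumption for illustration).
orders = {}

# One aggregate: everything needed to display the order lives in one document.
orders["order-1001"] = {
    "customer": {"name": "Ana", "city": "Brasov"},
    "lines": [
        {"sku": "A1", "qty": 2, "price": 9.5},
        {"sku": "B7", "qty": 1, "price": 20.0},
    ],
}

# A later document adds a field; no predefined schema has to be migrated.
orders["order-1002"] = {
    "customer": {"name": "Bogdan", "city": "Cluj"},
    "lines": [{"sku": "A1", "qty": 1, "price": 9.5}],
    "gift_note": "La multi ani!",
}


def order_total(order: dict) -> float:
    """Sum quantity * price over the order lines stored inside the aggregate."""
    return sum(line["qty"] * line["price"] for line in order["lines"])


print(order_total(orders["order-1001"]))   # 39.0
```

Reading one order takes a single lookup, with no joins across separate customer and order-line tables.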
![Page 23: Prezentare: Big Data demistificat](https://reader033.fdocuments.in/reader033/viewer/2022060201/559b17d51a28ab94308b47cf/html5/thumbnails/23.jpg)
NoSQL flavors
![Page 24: Prezentare: Big Data demistificat](https://reader033.fdocuments.in/reader033/viewer/2022060201/559b17d51a28ab94308b47cf/html5/thumbnails/24.jpg)
NoSQL flavors
* Key-value store
→ Amazon DynamoDB, Redis
→ Content caching (focus on scaling to huge amounts of data, designed to handle massive load), logging, etc.
* Document store
→ CouchDB, MongoDB
→ Web applications
* Column family store
→ Cassandra, HBase
→ Distributed file systems
* Graph store
→ Neo4j, InfoGrid, InfiniteGraph
→ Social networking, recommendations (focus on modeling the structure of data: interconnectivity)
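The four data models differ mainly in the shape of what is stored. The sketch below mimics each shape with plain Python structures, purely for illustration: it does not use the API of any of the products listed, and all keys, names, and values are invented.

```python
# Key-value: an opaque value looked up by its key (e.g. a cached page).
key_value = {"page:/home": b"<html>...cached content...</html>"}

# Document: a nested record whose fields can be inspected and queried.
document = {"_id": "user:1", "name": "Ana", "interests": ["hadoop", "nosql"]}

# Column family: row key -> column family -> column -> value.
column_family = {
    "user:1": {
        "profile": {"name": "Ana", "city": "Brasov"},
        "stats": {"logins": 17},
    }
}

# Graph: nodes and the edges between them ("who follows whom").
follows = {
    "ana": {"bogdan"},
    "bogdan": {"ana", "carmen"},
    "carmen": set(),
}


def suggest(user: str) -> set:
    """Recommend accounts followed by the people `user` already follows."""
    direct = follows[user]
    candidates = set()
    for friend in direct:
        candidates |= follows[friend]
    return candidates - direct - {user}


print(suggest("ana"))   # {'carmen'}
```

The recommendation query is a walk over relationships, which is the kind of access pattern graph stores are built around.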
![Page 25: Prezentare: Big Data demistificat](https://reader033.fdocuments.in/reader033/viewer/2022060201/559b17d51a28ab94308b47cf/html5/thumbnails/25.jpg)
Reasons for choosing NoSQL
* Working with large amounts of data
* Scaling out with ease
* Need for:
→ high availability
→ low-latency systems with eventual consistency
* Model fits aggregates:
→ as a natural choice
→ structure is changing with time
![Page 26: Prezentare: Big Data demistificat](https://reader033.fdocuments.in/reader033/viewer/2022060201/559b17d51a28ab94308b47cf/html5/thumbnails/26.jpg)
… and associates
![Page 27: Prezentare: Big Data demistificat](https://reader033.fdocuments.in/reader033/viewer/2022060201/559b17d51a28ab94308b47cf/html5/thumbnails/27.jpg)
What is Hadoop?
● Distributed file system
● Distributed processing system
● Batch / offline oriented
● Open source
![Page 28: Prezentare: Big Data demistificat](https://reader033.fdocuments.in/reader033/viewer/2022060201/559b17d51a28ab94308b47cf/html5/thumbnails/28.jpg)
In the beginning...
● Created by Doug Cutting and Mike Cafarella
● Intended as a distribution support for
● Built based on Google's MapReduce and Google File System papers
![Page 29: Prezentare: Big Data demistificat](https://reader033.fdocuments.in/reader033/viewer/2022060201/559b17d51a28ab94308b47cf/html5/thumbnails/29.jpg)
Who uses Hadoop?
Most notable users are …
+ many others
![Page 30: Prezentare: Big Data demistificat](https://reader033.fdocuments.in/reader033/viewer/2022060201/559b17d51a28ab94308b47cf/html5/thumbnails/30.jpg)
Hadoop in the real world
● Recommendation systems
● Data warehousing
● Financial analysis
● Market research/forecasting
● Log analysis
● Threat analysis
● Image processing
● Social networking
● Advertising
![Page 31: Prezentare: Big Data demistificat](https://reader033.fdocuments.in/reader033/viewer/2022060201/559b17d51a28ab94308b47cf/html5/thumbnails/31.jpg)
Why Hadoop?
● Scalable
● Cost effective
● Flexible
● Efficient
● Resilient to failure
● Schema on read
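"Schema on read" means raw data is stored as-is and a structure is applied only when it is read. The sketch below illustrates the idea on a few in-memory log lines; the line format, field names, and the `read_events` helper are assumptions for the example and not part of Hadoop.

```python
# Raw lines are stored untouched; nothing validates them at write time.
raw_lines = [
    "2014-11-03,login,ana",
    "2014-11-03,purchase,bogdan,19.99",
    "this line does not match the expected layout",
]


def read_events(lines):
    """Apply a schema while reading: name the fields, skip lines that don't fit."""
    for line in lines:
        parts = line.split(",")
        if len(parts) < 3:
            continue   # unparseable under this schema, ignored at read time
        yield {
            "date": parts[0],
            "event": parts[1],
            "user": parts[2],
            "amount": float(parts[3]) if len(parts) > 3 else None,
        }


events = list(read_events(raw_lines))
for event in events:
    print(event)
```

A different job could read the same raw lines with a different schema, which is where the flexibility comes from.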
![Page 32: Prezentare: Big Data demistificat](https://reader033.fdocuments.in/reader033/viewer/2022060201/559b17d51a28ab94308b47cf/html5/thumbnails/32.jpg)
Why not Hadoop?
● Inefficient when used at small scale
● Not good for real-time systems
![Page 33: Prezentare: Big Data demistificat](https://reader033.fdocuments.in/reader033/viewer/2022060201/559b17d51a28ab94308b47cf/html5/thumbnails/33.jpg)
Hadoop major components
● Hadoop Common
● YARN
● HDFS
● MapReduce
![Page 34: Prezentare: Big Data demistificat](https://reader033.fdocuments.in/reader033/viewer/2022060201/559b17d51a28ab94308b47cf/html5/thumbnails/34.jpg)
Architecture
![Page 35: Prezentare: Big Data demistificat](https://reader033.fdocuments.in/reader033/viewer/2022060201/559b17d51a28ab94308b47cf/html5/thumbnails/35.jpg)
Architecture
![Page 36: Prezentare: Big Data demistificat](https://reader033.fdocuments.in/reader033/viewer/2022060201/559b17d51a28ab94308b47cf/html5/thumbnails/36.jpg)
Architecture
![Page 37: Prezentare: Big Data demistificat](https://reader033.fdocuments.in/reader033/viewer/2022060201/559b17d51a28ab94308b47cf/html5/thumbnails/37.jpg)
Architecture
![Page 38: Prezentare: Big Data demistificat](https://reader033.fdocuments.in/reader033/viewer/2022060201/559b17d51a28ab94308b47cf/html5/thumbnails/38.jpg)
Architecture
![Page 39: Prezentare: Big Data demistificat](https://reader033.fdocuments.in/reader033/viewer/2022060201/559b17d51a28ab94308b47cf/html5/thumbnails/39.jpg)
MapReduce
● Split input files
● Operate on key/value pairs
● Mappers filter & transform input data
● Reducers aggregate the mappers' output
● Move code to data
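The steps above can be sketched with the classic word-count example. This is a single-process simulation of the map, shuffle, and reduce phases in plain Python, not the Hadoop API; the function names and the sample text are illustrative.

```python
from collections import defaultdict
from itertools import chain


def split_input(text: str, n_splits: int):
    """Divide the input lines into independent splits, one per mapper."""
    lines = text.splitlines()
    return [lines[i::n_splits] for i in range(n_splits)]


def mapper(lines):
    """Transform each line into (word, 1) key/value pairs."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle(pairs):
    """Group all values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reducer(key, values):
    """Aggregate the values emitted for one key."""
    return key, sum(values)


text = "big data is big\nhadoop processes big data\n"

splits = split_input(text, n_splits=2)
mapped = chain.from_iterable(mapper(split) for split in splits)
counts = dict(reducer(key, values) for key, values in shuffle(mapped).items())

print(counts["big"], counts["data"])   # 3 2
```

On a real cluster each split would be processed by a mapper running on the node that already holds that block of data, which is what "move code to data" refers to.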
![Page 40: Prezentare: Big Data demistificat](https://reader033.fdocuments.in/reader033/viewer/2022060201/559b17d51a28ab94308b47cf/html5/thumbnails/40.jpg)
![Page 41: Prezentare: Big Data demistificat](https://reader033.fdocuments.in/reader033/viewer/2022060201/559b17d51a28ab94308b47cf/html5/thumbnails/41.jpg)
… and associates
![Page 42: Prezentare: Big Data demistificat](https://reader033.fdocuments.in/reader033/viewer/2022060201/559b17d51a28ab94308b47cf/html5/thumbnails/42.jpg)
Apache Ambari
The project is aimed at making Hadoop management simpler by developing software for provisioning, managing, and monitoring Apache Hadoop clusters.
![Page 43: Prezentare: Big Data demistificat](https://reader033.fdocuments.in/reader033/viewer/2022060201/559b17d51a28ab94308b47cf/html5/thumbnails/43.jpg)
Apache Pig
Apache Pig is a platform for analyzing large data sets that consists of a high-level language for expressing data analysis programs, coupled with infrastructure for evaluating these programs.
![Page 44: Prezentare: Big Data demistificat](https://reader033.fdocuments.in/reader033/viewer/2022060201/559b17d51a28ab94308b47cf/html5/thumbnails/44.jpg)
Apache Hive
The Apache Hive™ data warehouse software facilitates querying and managing large datasets residing in distributed storage. Hive provides a mechanism to project structure onto this data and query the data using a SQL-like language called HiveQL.
![Page 45: Prezentare: Big Data demistificat](https://reader033.fdocuments.in/reader033/viewer/2022060201/559b17d51a28ab94308b47cf/html5/thumbnails/45.jpg)
Apache Chukwa
Apache Chukwa is a data collection system for monitoring large distributed systems. Chukwa comes with a flexible and powerful toolkit for displaying, monitoring and analyzing results to make the best use of the collected data.
![Page 46: Prezentare: Big Data demistificat](https://reader033.fdocuments.in/reader033/viewer/2022060201/559b17d51a28ab94308b47cf/html5/thumbnails/46.jpg)
Apache Avro
A remote procedure call and data serialization framework
![Page 47: Prezentare: Big Data demistificat](https://reader033.fdocuments.in/reader033/viewer/2022060201/559b17d51a28ab94308b47cf/html5/thumbnails/47.jpg)
Apache HBase
Apache HBase offers random, real-time read/write access to your Big Data. This project's goal is the hosting of very large tables (billions of rows by millions of columns) atop clusters of commodity hardware.
![Page 48: Prezentare: Big Data demistificat](https://reader033.fdocuments.in/reader033/viewer/2022060201/559b17d51a28ab94308b47cf/html5/thumbnails/48.jpg)
Apache Mahout
The Apache Mahout™ project's goal is to build a scalable machine learning library
![Page 49: Prezentare: Big Data demistificat](https://reader033.fdocuments.in/reader033/viewer/2022060201/559b17d51a28ab94308b47cf/html5/thumbnails/49.jpg)
Apache Spark
Apache Spark™ is a fast and general engine for large-scale data processing
![Page 50: Prezentare: Big Data demistificat](https://reader033.fdocuments.in/reader033/viewer/2022060201/559b17d51a28ab94308b47cf/html5/thumbnails/50.jpg)
Apache ZooKeeper
ZooKeeper is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services.
![Page 51: Prezentare: Big Data demistificat](https://reader033.fdocuments.in/reader033/viewer/2022060201/559b17d51a28ab94308b47cf/html5/thumbnails/51.jpg)
Big data – in the future
● 87% of enterprises believe Big Data analytics will redefine the competitive landscape of their industries within the next three years
● 89% believe that companies that do not adopt a Big Data analytics strategy in the next year risk losing market share and momentum.
![Page 52: Prezentare: Big Data demistificat](https://reader033.fdocuments.in/reader033/viewer/2022060201/559b17d51a28ab94308b47cf/html5/thumbnails/52.jpg)
Big data – in the future
![Page 53: Prezentare: Big Data demistificat](https://reader033.fdocuments.in/reader033/viewer/2022060201/559b17d51a28ab94308b47cf/html5/thumbnails/53.jpg)
Thank you!