big data overview ppt
![Page 1: big data overview ppt](https://reader030.fdocuments.in/reader030/viewer/2022012402/55a932ed1a28ab30368b4760/html5/thumbnails/1.jpg)
A SEMINAR ON
BIG DATA
PRESENTED BY: VIKAS KATARE, M.TECH (I.T.)
EMail: [email protected] no.+917031120786
![Page 2: big data overview ppt](https://reader030.fdocuments.in/reader030/viewer/2022012402/55a932ed1a28ab30368b4760/html5/thumbnails/2.jpg)
WHAT IS DATA
• Data is a binary sequence with a weighting factor.
• Information about anything can be considered data.
• Data are distinct pieces of information, usually formatted in a special way.
![Page 3: big data overview ppt](https://reader030.fdocuments.in/reader030/viewer/2022012402/55a932ed1a28ab30368b4760/html5/thumbnails/3.jpg)
Big Data Definition
• No single standard definition…
“Big Data” is data whose scale, diversity, and
complexity require new architecture, techniques,
algorithms, and analytics to manage it and extract
value and hidden knowledge from it…
![Page 4: big data overview ppt](https://reader030.fdocuments.in/reader030/viewer/2022012402/55a932ed1a28ab30368b4760/html5/thumbnails/4.jpg)
3 V’S OF BIG DATA
• Volume, Velocity, Variety
![Page 5: big data overview ppt](https://reader030.fdocuments.in/reader030/viewer/2022012402/55a932ed1a28ab30368b4760/html5/thumbnails/5.jpg)
Lots of Data
• 2.5 quintillion bytes of data are generated every day!
– A quintillion is 10^18
• Data come from many quarters.
– Social media sites
– Sensors
– Digital photos
– Business transactions
– Location-based data
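To put the daily figure in more familiar units, the short calculation below converts it to exabytes per day and terabytes per second. It is a rough sketch that assumes decimal (SI) units and an even rate across the day; the variable names are illustrative only.

```python
# Assumption: decimal (SI) units, i.e. 1 exabyte = 10**18 bytes.
BYTES_PER_DAY = 25 * 10**17          # 2.5 quintillion bytes, from the slide
SECONDS_PER_DAY = 24 * 60 * 60       # 86,400 seconds

# 2.5 quintillion bytes is 2.5 exabytes.
exabytes_per_day = BYTES_PER_DAY / 10**18

# Averaged over a day (an assumption: real traffic is not uniform).
terabytes_per_second = BYTES_PER_DAY / SECONDS_PER_DAY / 10**12

print(f"{exabytes_per_day} EB per day")
print(f"about {terabytes_per_second:.1f} TB per second")
```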
![Page 6: big data overview ppt](https://reader030.fdocuments.in/reader030/viewer/2022012402/55a932ed1a28ab30368b4760/html5/thumbnails/6.jpg)
Who’s Generating Big Data
Social media and networks (all of us are generating data)
Scientific instruments (collecting all sorts of data)
Mobile devices (tracking all objects all the time)
Sensor technology and networks (measuring all kinds of data)
![Page 7: big data overview ppt](https://reader030.fdocuments.in/reader030/viewer/2022012402/55a932ed1a28ab30368b4760/html5/thumbnails/7.jpg)
Challenges
How to transfer Big Data?
![Page 8: big data overview ppt](https://reader030.fdocuments.in/reader030/viewer/2022012402/55a932ed1a28ab30368b4760/html5/thumbnails/8.jpg)
• Storage and transport issues
• Data management issues
• Processing issues
• Privacy and security
• Data access and information sharing
• Fault tolerance
![Page 9: big data overview ppt](https://reader030.fdocuments.in/reader030/viewer/2022012402/55a932ed1a28ab30368b4760/html5/thumbnails/9.jpg)
Past Big Data Solutions
• Data Sharding
– A “shared nothing” partitioning scheme that spreads a large database across a number of servers, increasing the scalability and performance of traditional relational database systems. Essentially, you break your database down into smaller chunks called “shards” and spread them across a number of distributed servers. The advantages of sharding are as follows:
• Easier to manage
• Faster
• Reduced costs
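The routing step behind sharding can be sketched as below. This is only an illustration, assuming hash-based partitioning (the slide does not say which partitioning scheme is meant); the shard names and the helper functions `shard_for` and `partition` are hypothetical.

```python
import hashlib

# Hypothetical shard servers; in practice each would be a separate database.
SHARDS = ["shard-0", "shard-1", "shard-2"]


def shard_for(key, shards=SHARDS):
    """Pick the shard for a key using a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]


def partition(records, shards=SHARDS):
    """Group (key, value) records by the shard each key maps to."""
    placement = {name: [] for name in shards}
    for key, value in records:
        placement[shard_for(key, shards)].append((key, value))
    return placement


# Each record lands on exactly one shard, so shards share nothing.
customers = [(f"customer-{i}", {"id": i}) for i in range(10)]
for shard, rows in partition(customers).items():
    print(shard, len(rows))
```

A plain modulo scheme like this one moves most keys when the number of shards changes, which is one reason real deployments often use range-based or consistent-hash partitioning instead.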
![Page 10: big data overview ppt](https://reader030.fdocuments.in/reader030/viewer/2022012402/55a932ed1a28ab30368b4760/html5/thumbnails/10.jpg)
BIG DATA ANALYTICS
• Examining large amounts of data
• Extracting appropriate information
• Identification of hidden patterns and unknown correlations
• Competitive advantages
![Page 11: big data overview ppt](https://reader030.fdocuments.in/reader030/viewer/2022012402/55a932ed1a28ab30368b4760/html5/thumbnails/11.jpg)
Types of Tools Typically Used in Big Data Scenario
• Where is the processing hosted?
– Distributed servers/cloud
• Where is the data stored?
– Distributed storage (e.g., Amazon S3)
• What is the programming model?
– Distributed processing (MapReduce)
• What operations are performed on the data?
– Analytic/semantic processing (e.g., RDF)
![Page 12: big data overview ppt](https://reader030.fdocuments.in/reader030/viewer/2022012402/55a932ed1a28ab30368b4760/html5/thumbnails/12.jpg)
Big Data Solutions
• SANs
– SANs (Storage Area Networks) are essentially dedicated, high-performance storage networks that transfer data between servers and storage devices, separate from the Local Area Network (usually over Fibre Channel).
– ADVANTAGES
• Ability to move large blocks of data
• High level of performance and availability
• Dynamically balances loads across the network.
– DISADVANTAGES
• Complex to manage a wide scope of devices
• Lack of Standardization
• SANs are very expensive
![Page 13: big data overview ppt](https://reader030.fdocuments.in/reader030/viewer/2022012402/55a932ed1a28ab30368b4760/html5/thumbnails/13.jpg)
RDF
• (RESOURCE DESCRIPTION FRAMEWORK)
• Why is RDF uniquely suited to expressing data and data relationships?
• More flexible – data relationships can be explored from all angles
• More efficient – at large scale, data can be read more quickly
– not linear like a traditional database
– not hierarchical like XML
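RDF expresses data as subject–predicate–object triples, which is what lets relationships be explored from any angle. The sketch below models triples as plain Python tuples rather than using an RDF library; the `ex:` identifiers and the `match` helper are made up for illustration.

```python
# Each fact is one (subject, predicate, object) triple.
# The "ex:" names are hypothetical example identifiers.
TRIPLES = {
    ("ex:alice", "ex:worksFor", "ex:acme"),
    ("ex:bob", "ex:worksFor", "ex:acme"),
    ("ex:alice", "ex:knows", "ex:bob"),
}


def match(triples, subject=None, predicate=None, obj=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return sorted(
        t for t in triples
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    )


# The same data answers questions from different directions.
print(match(TRIPLES, subject="ex:alice"))   # everything about alice
print(match(TRIPLES, obj="ex:acme"))        # everything pointing at acme
print(match(TRIPLES, predicate="ex:knows")) # every "knows" relationship
```

No position in a triple is privileged, so a query can start from the subject, the predicate, or the object, unlike a fixed row layout or an XML tree with a single root.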
![Page 14: big data overview ppt](https://reader030.fdocuments.in/reader030/viewer/2022012402/55a932ed1a28ab30368b4760/html5/thumbnails/14.jpg)
HADOOP
Software platform that lets one easily write and run applications that process vast amounts of data. It includes:
– MapReduce – offline computing engine
– HDFS – Hadoop Distributed File System
– HBase (pre-alpha) – online data access
Its main properties:
– Scalable: It can reliably store and process petabytes.
– Economical: It distributes the data and processing across clusters of commonly available computers (numbering in the thousands).
– Efficient: By distributing the data, it can process it in parallel on the nodes where the data is located.
– Reliable: It automatically maintains multiple copies of data and automatically redeploys computing tasks when failures occur.
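The reliability point can be illustrated with a toy replica-placement sketch. It assumes a replication factor of 3 (the HDFS default) and a simple round-robin placement; real HDFS placement is rack-aware and more involved, and the function and node names here are hypothetical.

```python
def place_replicas(block_ids, nodes, replication=3):
    """Assign each block to `replication` distinct nodes, round-robin.

    Assumption: round-robin is used only for illustration; it is not
    the placement policy HDFS actually applies.
    """
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    placement = {}
    for i, block in enumerate(block_ids):
        placement[block] = [
            nodes[(i + r) % len(nodes)] for r in range(replication)
        ]
    return placement


def readable_blocks(placement, failed_nodes):
    """Blocks that still have at least one replica on a healthy node."""
    return {
        block for block, holders in placement.items()
        if any(node not in failed_nodes for node in holders)
    }


nodes = ["node-a", "node-b", "node-c", "node-d"]
placement = place_replicas(["blk-0", "blk-1", "blk-2", "blk-3", "blk-4"], nodes)

# With three copies of each block, losing any two nodes loses no data.
survivors = readable_blocks(placement, failed_nodes={"node-a", "node-b"})
print(len(survivors) == len(placement))
```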
![Page 15: big data overview ppt](https://reader030.fdocuments.in/reader030/viewer/2022012402/55a932ed1a28ab30368b4760/html5/thumbnails/15.jpg)
MAP REDUCE
• Parallel programming model meant for large clusters
– User implements Map() and Reduce()
• Parallel computing framework
– Libraries take care of EVERYTHING else
• Parallelization
• Fault Tolerance
• Data Distribution
• Load Balancing
• Useful model for many practical tasks (large data)
![Page 16: big data overview ppt](https://reader030.fdocuments.in/reader030/viewer/2022012402/55a932ed1a28ab30368b4760/html5/thumbnails/16.jpg)
Map+Reduce
• Map:
– Accepts input key/value pair
– Emits intermediate key/value pair
• Reduce:
– Accepts intermediate key/value* pair
– Emits output key/value pair
[Diagram: Very big data → MAP → Partitioning Function → REDUCE → Result]
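The Map, partitioning, and Reduce stages above can be sketched in a single process with the classic word-count example. This is a minimal illustration, not Hadoop's API: the function names, the byte-sum partitioning function, and the sample input are all assumptions, and a real framework would run the stages in parallel across a cluster.

```python
from collections import defaultdict


def map_fn(_key, line):
    """Map: take an input (key, value) pair, emit intermediate pairs."""
    for word in line.lower().split():
        yield word, 1


def reduce_fn(word, counts):
    """Reduce: take a key and all of its values, emit an output pair."""
    yield word, sum(counts)


def partition_fn(key, num_reducers):
    """Decide which reducer receives a given intermediate key."""
    return sum(key.encode("utf-8")) % num_reducers


def run_mapreduce(inputs, num_reducers=2):
    # Map phase: group intermediate values by key within each partition.
    partitions = [defaultdict(list) for _ in range(num_reducers)]
    for key, value in inputs:
        for k, v in map_fn(key, value):
            partitions[partition_fn(k, num_reducers)][k].append(v)

    # Reduce phase: each partition is processed independently.
    results = {}
    for part in partitions:
        for k in sorted(part):
            for out_key, out_value in reduce_fn(k, part[k]):
                results[out_key] = out_value
    return results


documents = [(0, "big data is big"), (1, "data is data")]
word_counts = run_mapreduce(documents)
print(word_counts)
```

The user writes only `map_fn` and `reduce_fn`; everything inside `run_mapreduce` stands in for what the framework's libraries would handle.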
![Page 17: big data overview ppt](https://reader030.fdocuments.in/reader030/viewer/2022012402/55a932ed1a28ab30368b4760/html5/thumbnails/17.jpg)
![Page 18: big data overview ppt](https://reader030.fdocuments.in/reader030/viewer/2022012402/55a932ed1a28ab30368b4760/html5/thumbnails/18.jpg)
Finally….
‘Big Data’ is similar to ‘small data’, but bigger
… but because the data is bigger, it requires different approaches:
techniques, tools, architecture
… with the aim of solving new problems
or old problems in a better way
![Page 19: big data overview ppt](https://reader030.fdocuments.in/reader030/viewer/2022012402/55a932ed1a28ab30368b4760/html5/thumbnails/19.jpg)
THANKING YOU