Ets train ppt_big_data_basics_v2.0
Big Data Basics
AUTHOR: MITHUN BANERJEE
DATE: 05-OCTOBER-2016
COPYRIGHT PROTECTED BY ECLIPSE TECHNO CONSULTING GLOBAL (P) LTD.
What is Big data?
Big data is the term for a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications. -- Wikipedia
Is the above definition fully comprehensive?
Let's go deeper in the next slides.
Data units to measure exponential growth of data over the years
VOLUME of DATA
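As a rough illustration of the data units used to measure volume, the Python sketch below converts a raw byte count into the largest fitting unit. The function name `human_readable` and the choice of decimal (power-of-1000) units are assumptions made for this example, not something taken from the slides.

```python
# Decimal (SI) data units, each 1000 times larger than the previous one.
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]

def human_readable(num_bytes: float) -> str:
    """Express a byte count in the largest unit that keeps the value below 1000."""
    value = float(num_bytes)
    for unit in UNITS:
        # Stop at the first unit that fits, or at the last unit we know.
        if value < 1000 or unit == UNITS[-1]:
            return f"{value:.1f} {unit}"
        value /= 1000

print(human_readable(1_500))          # 1.5 KB
print(human_readable(2 * 1000 ** 5))  # 2.0 PB
```

Each step up the list is a factor of 1000, which is why growth in data volume is usually described as exponential.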
Type of data
• Relational Data (Tables/Transaction/Legacy Data)
• Text Data (Web)
• Semi-structured Data (XML)
• Graph Data: Social Network, Semantic Web (RDF), …
• Streaming Data: you can only scan the data once
• A single application can be generating/collecting many types of data
• Big Public Data (online, weather, finance, etc)
Variety (complexities) of data
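The "scan the data once" constraint on streaming data can be sketched in Python as follows. This is a minimal example, assuming the goal is a running mean and variance; it uses Welford's single-pass method, and the class name `RunningStats` and the sample readings are hypothetical.

```python
class RunningStats:
    """Running mean and variance computed in a single pass (Welford's method)."""

    def __init__(self) -> None:
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # sum of squared deviations from the current mean

    def update(self, x: float) -> None:
        """Fold one new value into the statistics without storing it."""
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        """Sample variance; 0.0 until at least two values have been seen."""
        return self._m2 / (self.count - 1) if self.count > 1 else 0.0

stats = RunningStats()
for reading in (2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0):  # stand-in for a stream
    stats.update(reading)
print(stats.count, stats.mean, stats.variance)
```

Each value is seen exactly once and then discarded, so memory use stays constant no matter how long the stream runs.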
Velocity of data
Late decisions mean missed opportunities
Example: Healthcare monitoring: sensors monitor your activities and body; any abnormal measurement requires an immediate reaction
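The healthcare example can be sketched in Python as a filter that reacts to each reading as it arrives instead of waiting for a batch. The safe range, the `(timestamp, bpm)` record shape, and the function name `abnormal_readings` are all assumptions made for illustration.

```python
from typing import Iterable, Iterator, Tuple

# Hypothetical safe range for a heart-rate sensor, in beats per minute.
LOW_BPM, HIGH_BPM = 40, 140

def abnormal_readings(stream: Iterable[Tuple[int, int]]) -> Iterator[Tuple[int, int]]:
    """Yield (timestamp, bpm) pairs outside the safe range as soon as they arrive."""
    for timestamp, bpm in stream:
        if bpm < LOW_BPM or bpm > HIGH_BPM:
            yield timestamp, bpm

# A short list stands in for a live sensor feed.
readings = [(0, 72), (1, 75), (2, 180), (3, 74), (4, 35)]
alerts = list(abnormal_readings(readings))
for timestamp, bpm in alerts:
    print(f"t={timestamp}s: abnormal heart rate {bpm} bpm")
```

Because the function is a generator, an alert is produced the moment an abnormal reading is consumed, which is the point of the velocity requirement.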
Velocity of data
Social media and networks (all of us are generating data)
Scientific instruments (collecting all sorts of data)
Sensor technology and networks (measuring all kinds of data)
REAL TIME / FAST DATA
3Vs
4Vs
Generation and Consumption of Data
In the past
In the present
OLTP: ONLINE TRANSACTION PROCESSING (DBMS)
OLAP: ONLINE ANALYTICAL PROCESSING (DATA WAREHOUSING)
RTAP: REAL-TIME ANALYTICS PROCESSING (BIG DATA ARCHITECTURE & TECHNOLOGY)
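To make the OLTP/OLAP distinction concrete, here is a minimal Python sketch using the standard-library `sqlite3` module. The `orders` table and its columns are hypothetical, and a single in-memory database stands in for both the transactional system and the warehouse; RTAP is not covered by this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount REAL)")

# OLTP-style work: many small writes, each committed as its own transaction.
new_orders = [("east", 20.0), ("west", 35.5), ("east", 14.5), ("west", 10.0)]
for region, amount in new_orders:
    with conn:  # commits on success, rolls back on error
        conn.execute("INSERT INTO orders (region, amount) VALUES (?, ?)", (region, amount))

# OLAP-style work: one read-only query that aggregates over many rows.
totals = dict(conn.execute(
    "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
).fetchall())
print(totals)
conn.close()
```

In practice the two workloads usually run on separate systems, because row-at-a-time writes and full-table aggregations favour different storage layouts.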
Driver of Data
- Optimizations and predictive analytics
- Complex statistical analysis
- All types of data, and many sources
- Very large datasets
- More real-time

- Ad-hoc querying and reporting
- Data mining techniques
- Structured data, typical sources
- Small to mid-size datasets
The Evolution of Business Intelligence
Timeline: 1990's, 2000's, 2010's
• BI Reporting, OLAP & Data Warehouse: Business Objects, SAS, Informatica, Cognos, other SQL Reporting Tools
• Interactive Business Intelligence & In-memory RDBMS: QlikView, Tableau, HANA
• Big Data: Real Time & Single View: Graph Databases
• Big Data: Batch Processing & Distributed Data Store: Hadoop/Spark; HBase/Cassandra
Axes: Speed, Scale
Topic 1: Data Analytics & Data Mining
• EXPLORATORY DATA ANALYSIS
• LINEAR CLASSIFICATION (PERCEPTRON & LOGISTIC REGRESSION)
• LINEAR REGRESSION
• C4.5 DECISION TREE
• APRIORI
• K-MEANS CLUSTERING
• EM ALGORITHM
• PAGERANK & HITS
• COLLABORATIVE FILTERING
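As one worked instance from the list above, k-means clustering can be sketched in pure Python. This is a minimal version of Lloyd's algorithm under stated assumptions: 2-D points, caller-supplied initial centroids (so the run is deterministic), and hypothetical names `kmeans` and `squared_distance`.

```python
from typing import List, Tuple

Point = Tuple[float, float]

def squared_distance(a: Point, b: Point) -> float:
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

def kmeans(points: List[Point], centroids: List[Point], max_iter: int = 100):
    """Lloyd's algorithm: alternate assignment and centroid update until stable."""
    assignment: List[int] = []
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        new_assignment = [
            min(range(len(centroids)), key=lambda k: squared_distance(p, centroids[k]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Update step: move each centroid to the mean of its points.
        updated = []
        for k, old in enumerate(centroids):
            members = [p for p, a in zip(points, assignment) if a == k]
            if members:
                updated.append((sum(x for x, _ in members) / len(members),
                                sum(y for _, y in members) / len(members)))
            else:
                updated.append(old)  # keep an empty cluster's centroid in place
        centroids = updated
    return centroids, assignment

points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centroids, labels = kmeans(points, centroids=[(1.0, 1.0), (8.0, 8.0)])
print(centroids, labels)
```

The result depends on the initial centroids; production libraries typically choose them randomly or with a seeding heuristic and run several restarts.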
Topic 2: Hadoop/MapReduce Programming & Data Processing
• ARCHITECTURE OF HADOOP, HDFS, AND YARN
• PROGRAMMING ON HADOOP
• BASIC DATA PROCESSING: SORT AND JOIN
• INFORMATION RETRIEVAL USING HADOOP
• DATA MINING USING HADOOP (KMEANS+HISTOGRAMS)
• MACHINE LEARNING ON HADOOP (EM)
• HIVE/PIG
• HBASE AND CASSANDRA
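The MapReduce programming model behind these topics can be sketched with a word count. The Python below is a single-process simulation of the map, shuffle, and reduce phases; it does not use the Hadoop API, and the function names `map_phase`, `shuffle`, and `reduce_phase` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Mapper: emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1

def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all values by key, as the framework does between phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reducer: sum the counts collected for one word."""
    return key, sum(values)

lines = ["big data is big", "data is everywhere"]
mapped = (pair for line in lines for pair in map_phase(line))
counts = dict(reduce_phase(k, v) for k, v in sorted(shuffle(mapped).items()))
print(counts)
```

On a real cluster the mappers and reducers run on different machines and the shuffle moves data across the network; the per-key logic stays the same.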
Topic 3: Graph Database and Graph Analytics
GRAPH DATABASE (http://en.wikipedia.org/wiki/Graph_database)
Native Graph Database (Neo4j)
Pregel/Giraph (Distributed Graph Processing Engine)
NEO4J/TITAN/GRAPHLAB/GRAPHSQL
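The vertex-centric style used by Pregel-like engines can be sketched in plain Python. The example below finds connected components by passing the smallest known vertex id between neighbours in synchronous supersteps. It is a single-process illustration, not the Giraph or Neo4j API, and the function name `connected_components` and the sample graph are hypothetical.

```python
from typing import Dict, List

def connected_components(graph: Dict[str, List[str]]) -> Dict[str, str]:
    """Label each vertex with the smallest vertex id reachable from it.

    Assumes an undirected graph: every edge appears in both adjacency lists.
    """
    label = {v: v for v in graph}  # each vertex starts with its own id
    # Superstep 0: every vertex sends its label to all neighbours.
    inbox: Dict[str, List[str]] = {v: [] for v in graph}
    for v, neighbours in graph.items():
        for n in neighbours:
            inbox[n].append(label[v])

    while any(inbox.values()):  # stop when no messages are in flight
        outbox: Dict[str, List[str]] = {v: [] for v in graph}
        for v, messages in inbox.items():
            if not messages:
                continue  # vertex stays inactive this superstep
            smallest = min(messages)
            if smallest < label[v]:
                label[v] = smallest
                for n in graph[v]:  # notify neighbours only on change
                    outbox[n].append(smallest)
        inbox = outbox
    return label

graph = {
    "a": ["b"], "b": ["a", "c"], "c": ["b"],
    "d": ["e"], "e": ["d"],
}
components = connected_components(graph)
print(components)
```

Each vertex only reads its own messages and writes to its neighbours, which is what lets a distributed engine partition the graph across machines.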
References for in-depth homework reading
• Hadoop: The Definitive Guide, Tom White, O’Reilly
• Data Mining: Concepts and Techniques, Third Edition, by Jiawei Han et al.
• https://www.mongodb.com/collateral/big-data-examples-and-guidelines-enterprise-decision-maker
• http://www.aptude.com/blog/entry/hadoop-vs-mongodb-which-platform-is-better-for-handling-big-data
• http://www.slideshare.net/wlaforest/an-introduction-to-big-data-nosql-and-mongodb
• http://www.infoworld.com/article/2608460/application-development/the-10-worst-big-data-practices.html
THANK YOU ETS