Big data - A Really Big Enchilada?
Big Data: a really ‘Big’ deal or just another hammer looking for a nail?
- Keshav Deshpande, Software Developer
A little bit of theory - The V’s of Big Data
• Volume: scaled at terabyte/petabyte levels
• Variety: structured, unstructured, hybrid data formats
• Velocity: data generated at internet speeds (in the tera to exa range)
Often, Veracity is added to this list: reliability of the data
Implications on IT Solutions Architectures
• Current computing paradigm: Data layer/Middleware/UI layer (n-tier architectures)
  • fetch data from Data Layer
  • ship data to Middleware for processing (or to UI layer for display)
  • ship data back to Data Layer for storage
At ‘Big Data’ scale, this approach simply does not perform/scale!
What is so ‘big’ about Big Data
If you can’t go to the mountain, let the mountain come to you!
Proposed ‘solution’
Ship processing to where the data is located, instead of shipping data to where the processing is located
Process smaller chunks of data, in parallel, then combine the results
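The "process chunks in parallel, then combine" idea can be sketched in a few lines. This is only an illustration under simplifying assumptions: the function names (`split_into_chunks`, `count_words`, `combine`, `parallel_word_count`) are hypothetical, and local threads stand in for the separate machines that would each hold a chunk of the data in a real deployment.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(records, chunk_size):
    """Split a list of records into consecutive chunks of at most chunk_size."""
    return [records[i:i + chunk_size] for i in range(0, len(records), chunk_size)]


def count_words(chunk):
    """Per-chunk step: count words using only the data in this chunk."""
    counts = Counter()
    for line in chunk:
        counts.update(line.lower().split())
    return counts


def combine(partials):
    """Combine step: merge the per-chunk counts into one result."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


def parallel_word_count(records, chunk_size=2, workers=4):
    chunks = split_into_chunks(records, chunk_size)
    # Threads are a stand-in (assumption) for nodes that each own a chunk.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(count_words, chunks))
    return combine(partials)


if __name__ == "__main__":
    lines = ["big data", "big deal", "data in motion"]
    print(parallel_word_count(lines))
```

The per-chunk step never looks outside its own chunk, which is what allows it to run wherever that chunk is stored; only the small partial results travel to the combine step.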
OK, so with this scheme, we are assured of ‘scale’ and even ‘performance’ – so what do I do with it?
Remember the hammer and nail? It seems we have ourselves a hammer,
So let’s look for the ‘nails’...
• Besides storing/retrieving/processing data at scale
• parallel and distributed nature - necessitated by the 3 (or 4) V’s
• high level of concurrency - storing, retrieving or processing
• high level of asynchrony
• non-blocking, fire-and-forget
• call and then notify when “answer” is ready
• However, data is still ‘raw’
• Needs to be retrieved (mined) and processed (analyzed) to get at ‘Information’ or ‘Actionable Intelligence’
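The two asynchronous styles in the list above (fire-and-forget, and call-then-notify) might look like this in a single process. It is a minimal sketch, not a distributed implementation: `store_record`, `analyze`, and `run_demo` are hypothetical names, and a thread pool stands in for remote workers.

```python
from concurrent.futures import ThreadPoolExecutor


def store_record(store, record):
    """Hypothetical write operation; the caller never waits on it."""
    store.append(record)


def analyze(values):
    """Hypothetical long-running computation."""
    return sum(values) / len(values)


def run_demo():
    store = []
    notifications = []

    def on_ready(future):
        # Called once the answer is available.
        notifications.append(future.result())

    with ThreadPoolExecutor(max_workers=2) as pool:
        # Fire-and-forget: submit the write and move on without waiting.
        pool.submit(store_record, store, {"id": 1})
        # Call, then notify: register a callback instead of blocking.
        future = pool.submit(analyze, [10, 20, 30])
        future.add_done_callback(on_ready)
    # Leaving the 'with' block waits for outstanding work to finish.
    return store, notifications


if __name__ == "__main__":
    print(run_demo())
```

In both cases the caller is free to do other work immediately after submitting, which is what keeps the overall system non-blocking.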
Big Data Characteristics
Information is –
• not just confined to relationships between data entities (like in vanilla RDBMS)
  • both data and associated meta-data are information
• increasingly expressed as graphs (sparse or dense): entity relations are still important, but they are now multi-dimensional
• very rich: data (and metadata) include
  • data entities (vertices)
  • inter-relationships (links and edges)
  • degrees of separation between vertices, links and edges
• RDBMS-like design approaches fall short, under-perform, and do not scale
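To make the graph view concrete, here is a small sketch that stores entities as vertices and relationships as undirected edges, then computes the degrees of separation between two vertices with a breadth-first search. The function names and the sample data are illustrative assumptions, and an in-memory dictionary is used purely for brevity.

```python
from collections import deque


def build_graph(edges):
    """Build an undirected adjacency map from (vertex, vertex) pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def degrees_of_separation(graph, start, goal):
    """Return the fewest edges between start and goal, or None if unconnected."""
    if start not in graph or goal not in graph:
        return None
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        vertex, distance = queue.popleft()
        for neighbor in graph[vertex]:
            if neighbor == goal:
                return distance + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, distance + 1))
    return None


if __name__ == "__main__":
    g = build_graph([("ann", "bob"), ("bob", "cy"), ("cy", "dee")])
    print(degrees_of_separation(g, "ann", "dee"))
```

Answering the same question in a relational schema typically means one self-join per hop, which is one way to see why RDBMS-like designs struggle with this kind of data.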
The real Big Data challenges, then, are -
What is involved?
• Retrieving data from large, distributed data stores: mining of data for nuggets of information
• Analysis of data, but at internet scale, to provide actionable intelligence
  • Analytics processing required to wring intelligence out of raw data
• Information Visualization: present analysis to the user
  • Dashboards/UI Composites
All of the above, but in real-time (or near real-time)
Big Data Processing
An emerging trend – data in constant motion
• Conventionally, data is at rest. Implication: data is stale instantly
  • any analysis on data at rest is after-the-fact or post-mortem, if you will...
• Data in motion implies as-it-happens, event-based, very loosely-coupled, asynchronous, non-blocking
• Analytics and BI at the point of streaming: real-time, complex event processing
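As a rough illustration of analysis on data in motion, the sketch below inspects each event as it arrives and keeps only a small window of recent values, rather than storing everything first and analyzing it later. The names `rolling_average` and `detect_spikes`, and the window and threshold values, are assumptions chosen for the example.

```python
from collections import deque


def rolling_average(events, window=3):
    """Yield the average of the most recent `window` values after each event."""
    recent = deque(maxlen=window)
    for value in events:
        recent.append(value)
        yield sum(recent) / len(recent)


def detect_spikes(events, window=3, factor=2.0):
    """Yield (index, value) when a value exceeds `factor` times the
    average of the `window` values that came immediately before it."""
    recent = deque(maxlen=window)
    for index, value in enumerate(events):
        if len(recent) == window and value > factor * (sum(recent) / window):
            yield index, value
        recent.append(value)


if __name__ == "__main__":
    stream = iter([10, 10, 10, 50, 10, 10])  # stands in for a live feed
    for index, value in detect_spikes(stream):
        print("spike at event", index, "value", value)
```

Because both functions are generators, they accept any iterable, including one that never ends, and they emit results while the stream is still flowing.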
Big Data Processing
By no means an exhaustive listing –
• Business Intelligence: derive insights, leading to better decision-making
• Insights: a crystal ball into possible future states
  • predictive and prescriptive analytics
• Automating development of such insight, developing algorithms
  • machine learning
Outcomes
•Predictive Analytics from both historical, and real time data
•Automated (and perpetual) Machine Learning
Applications of Big Data
Please stay in touch at - [email protected]