Post on 10-May-2015
© Copyright SELA software & Education Labs Ltd. | 14-18 Baruch Hirsch St Bnei Brak, 51202 Israel | www.selagroup.com
SELA DEVELOPER PRACTICE
December 15-19, 2013
Manu Cohen-Yashar
The Cloud, Big Data and NoSQL
Agenda
What is the cloud
Data boom
No SQL
Big Data
Cloud Distributions
What’s next
Make sense of: Cloud, Big Data and No SQL
How they fit together
Make money!!!
What is the cloud
Cloud Computing is an Idea …
Infrastructure is provisioned by a cloud provider.
Automatic Scale.
Elasticity.
Pay as you use.
Availability.
Simple, Automatic, Economic.
Type of Clouds
IaaS
PaaS
SaaS
and more…
Identity As A Service
Connectivity As A Service
Storage As A Service
Lots of Data
Data doubles every 18 months
Pictures
Web sites
Emails
Sensors
Geo Information
Financial Information
Science
Art
. . . (Infinite list)
No Limits
With the cloud it is now possible to mount a cluster of any size and run any computation at any scale.
The one who makes sense of all the available data will rule the world.
The conclusion: Use the cloud to analyze data at large scale.
Let's talk about data
When we think of data we think of …
Data has many forms
Yet data comes in many forms and shapes
Graphs
Documents
Time Series
Blobs
Geo
Sensors
Unstructured
Structured
Web
Not Relational
Not all types of data fit well into the relational world.
Not all data use cases fit well into the ACID convention.
The relational model does not scale very well.
Difficult to distribute
Difficult to replicate
The CAP Theorem
RDBMS
Replicated NoSQL
Sharded NoSQL
During a network partition, a distributed system must choose either Consistency or Availability.
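The choice can be sketched with a toy two-replica store. This is only an illustration, not a real database: `Replica`, `TwoNodeStore`, and the `"CP"` / `"AP"` mode labels are hypothetical names invented for the example.

```python
class Replica:
    """One in-memory copy of the data (hypothetical)."""
    def __init__(self):
        self.data = {}


class TwoNodeStore:
    """Toy store with two replicas; mode is "CP" or "AP" (illustrative labels)."""
    def __init__(self, mode):
        self.mode = mode
        self.a, self.b = Replica(), Replica()
        self.partitioned = False  # True when the replicas cannot reach each other

    def write(self, key, value):
        if not self.partitioned:
            # Healthy network: both replicas receive the write.
            self.a.data[key] = value
            self.b.data[key] = value
            return True
        if self.mode == "CP":
            # Consistency first: refuse the write so the replicas never diverge.
            return False
        # Availability first: accept on one replica; the other is now stale.
        self.a.data[key] = value
        return True


cp = TwoNodeStore("CP")
ap = TwoNodeStore("AP")
for store in (cp, ap):
    store.write("x", 1)
    store.partitioned = True   # simulate the network partition

cp_accepted = cp.write("x", 2)  # rejected: consistent but unavailable
ap_accepted = ap.write("x", 2)  # accepted: available but inconsistent
```

After the partition, the CP store still holds `x = 1` on both replicas, while the AP store holds `x = 2` on one replica and `x = 1` on the other.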
NO SQL
Large family of databases
No Schema
No relations enforced
Designed for high scale and distribution
Types of NO SQL DB
Key Value
Wide Columns
Documents
Graph
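To make the four types concrete, here is a rough sketch of how the same small fact might be shaped in each model, using plain Python structures. The user names, field names, and the `followed_by` helper are made up for illustration; real products each have their own storage format and API.

```python
import json

# Key-value: an opaque value looked up by a single key.
key_value = {"user:alice": json.dumps({"name": "Alice", "city": "Haifa"})}

# Wide column: row key -> column family -> columns.
# Rows are free to carry different columns.
wide_column = {
    "alice": {"profile": {"name": "Alice", "city": "Haifa"}},
    "bob": {"profile": {"name": "Bob"}},  # no "city" column, and that is fine
}

# Document: a nested, self-describing record.
document = {
    "_id": "alice",
    "name": "Alice",
    "address": {"city": "Haifa"},
    "follows": ["bob"],
}

# Graph: nodes plus explicit, typed edges.
nodes = {"alice", "bob"}
edges = [("alice", "FOLLOWS", "bob")]


def followed_by(user):
    """Return everyone `user` follows by walking the edge list."""
    return [dst for src, rel, dst in edges if src == user and rel == "FOLLOWS"]
```

None of the four shapes declares a schema up front, and the relationship between Alice and Bob is enforced only in the graph form, where it is the data itself.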
Motivation for NO SQL
Large Scale and Distribution
Simplicity
Low cost
Good fit with the data model
Volume, Velocity and Variety
There is no one NO SQL solution for all use cases
Important
There are more than 150 possible offerings…
The Cloud and NO SQL
All major cloud providers have NO SQL solutions
Azure Tables
Google Bigtable
Amazon DynamoDB
NO SQL Databases are deployed on a cluster
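One common way a cluster spreads keys over its machines is hash-based sharding. The sketch below assumes three hypothetical nodes (`node-0` to `node-2`) and keeps each shard as an in-memory dictionary; `node_for`, `put`, and `get` are illustrative names, not any product's API.

```python
import hashlib

NODES = ["node-0", "node-1", "node-2"]  # hypothetical cluster members


def node_for(key, nodes=NODES):
    """Pick a node from a stable hash of the key."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return nodes[int(digest, 16) % len(nodes)]


# Each node holds only its own share of the keys.
cluster = {name: {} for name in NODES}


def put(key, value):
    cluster[node_for(key)][key] = value


def get(key):
    return cluster[node_for(key)].get(key)


for i in range(100):
    put(f"user:{i}", {"id": i})
```

Simple modulo placement like this moves most keys whenever a node is added or removed, which is why many clustered stores use consistent hashing instead.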
There is a large number of cloud hosting offerings for NO SQL clusters
MongoHQ (MongoDB)
Cassandra on Google Compute Engine
Many more
Example – Mongo in Azure
Big Data
What is Big?
“Big” cannot fit on a single machine.
Conclusion: Big data has to be distributed.
Types of Big Data Processing
Query
General Analysis
Classification
Recommendation
Clustering
Auditing and monitoring
More…
Challenges
Develop a parallel algorithm
Reduce the network traffic -> bring compute to data
Monitor and manage a large number of parallel tasks
Survive failures
Performance
Linear scale
Batch Processing VS Operational Intelligence
Batch Processing
Work on existing data
Provide results within minutes
Operational Intelligence
Work on a stream of data
Provide real-time results
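The difference between the two styles can be sketched with one computation done both ways. The readings below are made-up sensor values, and both function names are hypothetical.

```python
readings = [3, 5, 4, 10, 8]  # made-up sensor values


def batch_average(stored):
    """Batch: all the data already exists; compute once over the whole set."""
    return sum(stored) / len(stored)


def streaming_average(stream):
    """Operational: emit an updated result after every incoming event."""
    count, total = 0, 0
    for value in stream:
        count += 1
        total += value
        yield total / count


batch_result = batch_average(readings)                  # 6.0
live_results = list(streaming_average(iter(readings)))  # [3.0, 4.0, 4.0, 5.5, 6.0]
```

The batch job returns one answer after seeing everything. The streaming version keeps only a running count and total, so it can report a current answer at any moment without storing the stream.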
Distributed File System
No one server can store Big Data files
Distribute files across the cluster
Failure is part of the game
Similar API to traditional File Systems
Examples:
HDFS
GFS
Cassandra FS
Mongo FS
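The core idea behind these file systems, splitting a file into blocks and keeping several copies of each block on different machines, can be sketched in a few lines. Everything here is an assumption made for the demo: the tiny block size, the replication factor of 2, the three node names, and the round-robin placement. Real systems use much larger blocks and smarter placement.

```python
BLOCK_SIZE = 4       # bytes; tiny for the demo
REPLICATION = 2      # copies kept of every block (assumed value)
NODES = ["n0", "n1", "n2"]

storage = {n: {} for n in NODES}  # node -> {block_id: bytes}


def write_file(name, data):
    """Split data into blocks and store each block on REPLICATION nodes."""
    block_ids = []
    for offset in range(0, len(data), BLOCK_SIZE):
        index = offset // BLOCK_SIZE
        block_id = f"{name}#{index}"
        for r in range(REPLICATION):
            node = NODES[(index + r) % len(NODES)]  # round-robin placement
            storage[node][block_id] = data[offset:offset + BLOCK_SIZE]
        block_ids.append(block_id)
    return block_ids


def read_file(block_ids, dead=()):
    """Reassemble the file from any surviving replica of each block."""
    parts = []
    for block_id in block_ids:
        for node in NODES:
            if node not in dead and block_id in storage[node]:
                parts.append(storage[node][block_id])
                break
        else:
            raise IOError(f"all replicas of {block_id} are lost")
    return b"".join(parts)


blocks = write_file("log.txt", b"big data is distributed")
```

With two copies of each block on different nodes, `read_file(blocks, dead={"n1"})` still returns the whole file: losing any single node is survivable, which is what "failure is part of the game" asks for.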
Hadoop
Big Data Analysis Platform
Batch Processing
Brings Compute tasks to data nodes
Parallel Processing using Map-Reduce
Open Source
Huge ecosystem
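The Map-Reduce pattern itself is small enough to show in a single process. The sketch below is a word count in plain Python, not the Hadoop API: the three function names and the input lines are made up, and on a real cluster the map and reduce calls would run in parallel on the nodes that hold the data.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all values that share the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values collected for one key."""
    return key, sum(values)


lines = ["big data", "Big cloud", "data data"]
mapped = [pair for line in lines for pair in map_phase(line)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
# counts == {"big": 2, "data": 3, "cloud": 1}
```

Because each `map_phase` call sees only one line and each `reduce_phase` call sees only one key, the work can be split across machines without the tasks needing to talk to each other.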
Hadoop Eco System
Writing a valuable Map-Reduce job for Hadoop is not simple
Many open source projects provide abstractions
Pig
Hive
HBase
Sqoop
Mahout
ZooKeeper
More
Hadoop on the Cloud
Hadoop runs on a cluster
You can use a cluster as a service on major cloud offerings
Storm
Real-Time big data analytics
Process streams of data
Can be used with any programming language
Wide integration with data sources
Check your schema
Be open to using NO SQL data stores
Identify your use case and find the right database for you
Create a simple POC
Look for Big Data
Ask yourself: What can I gain from big data?
How can the new data or analysis scope enhance your existing set of capabilities?
What additional opportunities for intervention or process optimisation does it present?
Identify your use case and find the right product and data model.
Look for web distributions and create a simple POC
Questions