Choosing which big data, nosql or database technology to use
![Page 1: Choosing which big data, nosql or database technology to use](https://reader038.fdocuments.in/reader038/viewer/2022103111/54c6829b4a795997128b4581/html5/thumbnails/1.jpg)
One Size Doesn’t Fit All
Choosing which big data, NoSQL or database technology to use
March 14, 2012
Mark R. Madsen
http://ThirdNature.net
![Page 2: Choosing which big data, nosql or database technology to use](https://reader038.fdocuments.in/reader038/viewer/2022103111/54c6829b4a795997128b4581/html5/thumbnails/2.jpg)
The problem of “big” is three problems of volume
Number of users!
Computations!
Amount of data!
![Page 3: Choosing which big data, nosql or database technology to use](https://reader038.fdocuments.in/reader038/viewer/2022103111/54c6829b4a795997128b4581/html5/thumbnails/3.jpg)
Unstructured data isn’t really unstructured.
The problem is that this data is unmodeled.
The real challenge is complexity.
Big data?
![Page 4: Choosing which big data, nosql or database technology to use](https://reader038.fdocuments.in/reader038/viewer/2022103111/54c6829b4a795997128b4581/html5/thumbnails/4.jpg)
The holy grail of databases under current market hype
A key problem is that when we talk about “big data” and analytics, we’re mostly talking about computation over data, a potential mismatch for both relational and NoSQL databases.
![Page 5: Choosing which big data, nosql or database technology to use](https://reader038.fdocuments.in/reader038/viewer/2022103111/54c6829b4a795997128b4581/html5/thumbnails/5.jpg)
Solving the Problem Depends on the Diagnosis
![Page 6: Choosing which big data, nosql or database technology to use](https://reader038.fdocuments.in/reader038/viewer/2022103111/54c6829b4a795997128b4581/html5/thumbnails/6.jpg)
You must understand your workload ‐ throughput and response time requirements aren’t enough.
▪ 100 simple queries accessing month‐to‐date data
▪ 90 simple queries accessing month‐to‐date data plus 10 complex queries using two years of history
▪ Hazard calculation for the entire customer master
▪ Performance problems are rarely due to a single factor.
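To see why throughput alone is not enough, the first two example workloads can be compared by the data they touch rather than by query count. This is only a rough sketch: the `Query` class, the rows-per-day figure, and the 15-day month-to-date window are illustrative assumptions, not numbers from the presentation.

```python
from dataclasses import dataclass

# Assumed for illustration only: rows added to the queried table per day.
ROWS_PER_DAY = 1_000_000


@dataclass(frozen=True)
class Query:
    """A hypothetical query described only by how many days of data it reads."""
    days_scanned: int

    def rows_scanned(self) -> int:
        return self.days_scanned * ROWS_PER_DAY


def total_rows_scanned(queries: list[Query]) -> int:
    """Sum the rows read by every query in a workload."""
    return sum(q.rows_scanned() for q in queries)


# Workload A: 100 simple queries over month-to-date data (assume 15 days).
workload_a = [Query(days_scanned=15)] * 100

# Workload B: 90 simple queries plus 10 complex ones over two years of history.
workload_b = [Query(days_scanned=15)] * 90 + [Query(days_scanned=730)] * 10

rows_a = total_rows_scanned(workload_a)
rows_b = total_rows_scanned(workload_b)
print(f"A: {len(workload_a)} queries, {rows_a:,} rows scanned")
print(f"B: {len(workload_b)} queries, {rows_b:,} rows scanned")
print(f"B scans {rows_b / rows_a:.1f}x the data of A")
```

Both workloads issue 100 queries, yet under these assumptions the second one reads several times more data, which is the kind of difference a plain queries-per-second figure hides.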
![Page 7: Choosing which big data, nosql or database technology to use](https://reader038.fdocuments.in/reader038/viewer/2022103111/54c6829b4a795997128b4581/html5/thumbnails/7.jpg)
Workload: One big query or many small queries?
Retrieval: small return set or large?
Selectivity: large volume of data scanned or small?
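One possible way to make the retrieval and selectivity questions concrete is to record, per query, how many rows were scanned and how many were returned. The `QueryProfile` class, its field names, and the sample numbers below are hypothetical and serve only as a sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueryProfile:
    """Hypothetical per-query measurements; field names are illustrative."""
    rows_scanned: int   # rows the engine had to read
    rows_returned: int  # rows sent back to the caller

    def return_ratio(self) -> float:
        """Fraction of scanned rows that end up in the result set."""
        if self.rows_scanned == 0:
            return 0.0
        return self.rows_returned / self.rows_scanned


# A key lookup: reads little, returns little.
lookup = QueryProfile(rows_scanned=10, rows_returned=1)

# A monthly rollup: reads a lot, returns one row per day.
rollup = QueryProfile(rows_scanned=30_000_000, rows_returned=30)

for name, profile in [("lookup", lookup), ("rollup", rollup)]:
    print(f"{name}: scanned={profile.rows_scanned:,} "
          f"returned={profile.rows_returned:,} "
          f"ratio={profile.return_ratio():.6f}")
```

Both queries return a small result set, but they differ by orders of magnitude in the volume scanned, so the two questions need to be answered separately.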
![Page 12: Choosing which big data, nosql or database technology to use](https://reader038.fdocuments.in/reader038/viewer/2022103111/54c6829b4a795997128b4581/html5/thumbnails/12.jpg)
Important workload parameters to know
• Read‐intensive vs. write‐intensive
• Mutable vs. immutable data
• Immediate vs. eventual consistency
• Short vs. long access latency
• Predictable vs. unpredictable data access patterns
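The five parameters above can be written down as a small record per workload so that candidates are compared against the same checklist. This is a minimal sketch; the `WorkloadProfile` class and its field names are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkloadProfile:
    """One flag per workload parameter from the list above (names are illustrative)."""
    read_intensive: bool               # False means write-intensive
    mutable_data: bool                 # False means immutable data
    needs_immediate_consistency: bool  # False means eventual consistency is acceptable
    needs_short_latency: bool          # False means long access latency is acceptable
    predictable_access: bool           # False means unpredictable access patterns

    def summary(self) -> str:
        """Render the profile as a readable, comma-separated description."""
        parts = [
            "read-intensive" if self.read_intensive else "write-intensive",
            "mutable data" if self.mutable_data else "immutable data",
            "immediate consistency" if self.needs_immediate_consistency
            else "eventual consistency",
            "short latency" if self.needs_short_latency else "long latency",
            "predictable access" if self.predictable_access
            else "unpredictable access",
        ]
        return ", ".join(parts)


# Example: an append-only event log queried ad hoc by analysts.
event_log = WorkloadProfile(
    read_intensive=True,
    mutable_data=False,
    needs_immediate_consistency=False,
    needs_short_latency=False,
    predictable_access=False,
)
print(event_log.summary())
```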
![Page 13: Choosing which big data, nosql or database technology to use](https://reader038.fdocuments.in/reader038/viewer/2022103111/54c6829b4a795997128b4581/html5/thumbnails/13.jpg)
Types of workloads
Write‐biased:
▪ OLTP
▪ OLTP, batch
▪ OLTP, lite
▪ Object persistence
▪ Data ingest, batch
▪ Data ingest, real‐time
Read‐biased:
▪ Query
▪ Query, simple retrieval
▪ Query, complex
▪ Query‐hierarchical / object / network
▪ Analytic
Mixed?
▪ Inline analytic execution, operational BI
![Page 14: Choosing which big data, nosql or database technology to use](https://reader038.fdocuments.in/reader038/viewer/2022103111/54c6829b4a795997128b4581/html5/thumbnails/14.jpg)
Matching to parameters, at assumption of data scale
Workload parameters:
▪ Write‐biased
▪ Read‐biased
▪ Updateable data
▪ Eventual consistency OK
▪ Unpredictable query path
▪ Compute‐intensive
Technologies:
▪ Standard RDBMS
▪ Parallel RDBMS
▪ NoSQL (kv, dht, obj)
▪ Hadoop*
▪ Streaming database
You see the problem: it’s an intersection of multiple parameters, and this chart only includes the first tier of parameters. Plus, workload factors can completely invert these general rules of thumb.
![Page 15: Choosing which big data, nosql or database technology to use](https://reader038.fdocuments.in/reader038/viewer/2022103111/54c6829b4a795997128b4581/html5/thumbnails/15.jpg)
Matching to parameters, at assumption of data scale
Workload parameters:
▪ Complex queries
▪ Selective queries
▪ Low latency queries
▪ High concurrency
▪ High ingest rate
Technologies:
▪ Standard RDBMS
▪ Parallel RDBMS
▪ NoSQL (kv, dht, obj)
▪ Hadoop
▪ Streaming database
You have to look at the combination of workload factors: data scale, concurrency, latency & response time, then chart the parameters.
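Charting the parameters can be as simple as recording, for each candidate, which workload parameters it handled well in your own evaluation and then keeping the candidates that cover everything the workload requires. The sketch below is hypothetical: `shortlist` is an illustrative helper, and the candidate names and ratings are placeholders rather than assessments of any real product.

```python
def shortlist(ratings: dict[str, set[str]], required: set[str]) -> list[str]:
    """Return candidates whose supported parameters cover every required one."""
    return sorted(
        name for name, supported in ratings.items() if required <= supported
    )


# Placeholder ratings: replace these with findings from your own evaluation.
ratings = {
    "candidate_a": {"complex queries", "selective queries"},
    "candidate_b": {"high ingest rate", "high concurrency", "low latency queries"},
    "candidate_c": {"complex queries", "high ingest rate"},
}

# The combination this particular (made-up) workload needs.
required = {"complex queries", "high ingest rate"}

print(shortlist(ratings, required))
```

Requiring the whole combination, rather than ranking on one parameter at a time, reflects the point that the choice depends on how the factors intersect.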
![Page 16: Choosing which big data, nosql or database technology to use](https://reader038.fdocuments.in/reader038/viewer/2022103111/54c6829b4a795997128b4581/html5/thumbnails/16.jpg)
Always build a proof of concept!
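A proof of concept usually starts with measuring a representative query against each candidate. The harness below is a minimal sketch under stated assumptions: `measure_latency` and `stand_in_query` are hypothetical names, and the stand-in function would be replaced by a real call to the system under test.

```python
import math
import statistics
import time
from typing import Callable


def measure_latency(run_query: Callable[[], object],
                    repetitions: int = 50) -> dict[str, float]:
    """Time repeated calls and report median, 95th-percentile and max latency."""
    if repetitions < 1:
        raise ValueError("repetitions must be at least 1")
    samples = []
    for _ in range(repetitions):
        start = time.perf_counter()
        run_query()
        samples.append(time.perf_counter() - start)
    samples.sort()
    # Nearest-rank 95th percentile over the sorted samples.
    p95_index = math.ceil(0.95 * len(samples)) - 1
    return {
        "median_s": statistics.median(samples),
        "p95_s": samples[p95_index],
        "max_s": samples[-1],
    }


def stand_in_query() -> int:
    """Placeholder for a real query against the candidate system."""
    return sum(range(100_000))


result = measure_latency(stand_in_query, repetitions=20)
print(result)
```

Reporting a tail percentile alongside the median matters because a candidate can look fine on average and still miss the response-time requirement for a share of requests.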
![Page 17: Choosing which big data, nosql or database technology to use](https://reader038.fdocuments.in/reader038/viewer/2022103111/54c6829b4a795997128b4581/html5/thumbnails/17.jpg)
Image Attributions
Thanks to the people who supplied the images used in this presentation:
Holy Grail – © Monty Python Ltd.
Cupcakes – <lost attribution on Flickr>
rock‐fall‐roadblock.jpg ‐ http://www.flickr.com/photos/wsdot/4679360979/
roadblock‐sheep.jpg ‐ http://www.flickr.com/photos/brizo_the_scot/4013939756/
![Page 18: Choosing which big data, nosql or database technology to use](https://reader038.fdocuments.in/reader038/viewer/2022103111/54c6829b4a795997128b4581/html5/thumbnails/18.jpg)
About the Presenter
Mark Madsen is president of Third Nature, a technology research and consulting firm focused on business intelligence, analytics and information management. Mark is an award-winning author, architect and former CTO whose work has been featured in numerous industry publications. During his career Mark received awards from the American Productivity & Quality Center, TDWI, Computerworld and the Smithsonian Institute. He is an international speaker, contributing editor at Intelligent Enterprise, and manages the open source channel at the Business Intelligence Network. For more information or to contact Mark, visit http://ThirdNature.net.
![Page 19: Choosing which big data, nosql or database technology to use](https://reader038.fdocuments.in/reader038/viewer/2022103111/54c6829b4a795997128b4581/html5/thumbnails/19.jpg)
About Third Nature
Third Nature is a research and consulting firm focused on new and emerging technology and practices in analytics, business intelligence, and performance management. If your question is related to data, analytics, information strategy and technology infrastructure, then you’re in the right place.
Our goal is to help companies take advantage of information-driven management practices and applications. We offer education, consulting and research services to support business and IT organizations as well as technology vendors.
We fill the gap between what the industry analyst firms cover and what IT needs. We specialize in product and technology analysis, so we look at emerging technologies and markets, evaluating technology and how it is applied rather than vendor market positions.