NewSQL - Deliverance from BASE and back to SQL and ACID
![Page 1: NewSQL - Deliverance from BASE and back to SQL and ACID](https://reader034.fdocuments.in/reader034/viewer/2022052623/559c601d1a28abdc3d8b474d/html5/thumbnails/1.jpg)
NewSQL - Deliverance from BASE and back to SQL and ACID
There are a number of NewSQL products now on the market, such as VoltDB and Postgres-XL. These promise NoSQL performance and scalability, but with ACID and relational concepts implemented with ANSI SQL.
This session will cover why NoSQL came about, why it has had its day, and why NewSQL will become the backbone of the Enterprise for OLTP and Analytics.
Tony Rogerson, SQL Server MVP
@tonyrogerson
http://dataidol.com/tonyrogerson
![Page 2: NewSQL - Deliverance from BASE and back to SQL and ACID](https://reader034.fdocuments.in/reader034/viewer/2022052623/559c601d1a28abdc3d8b474d/html5/thumbnails/2.jpg)
Who am I?
Freelance SQL Server professional and Data Specialist
Fellow BCS, MSc in BI, PGCert in Data Science
28 years of development and database experience, 22 of them with SQL Server – starting out in 1986 with VSAM, System W, Application System, DB2 and Oracle, then crossing over to Client/Server and SQL Server from version 4.21a in 1993
Awarded SQL Server MVP yearly since ’97
Founded UK SQL Server User Group back in ’99, founder member of DDD, SQL Bits, SQL Relay, SQL Santa
Interested in commodity-based distributed processing of Data (naturally!)
![Page 3: NewSQL - Deliverance from BASE and back to SQL and ACID](https://reader034.fdocuments.in/reader034/viewer/2022052623/559c601d1a28abdc3d8b474d/html5/thumbnails/3.jpg)
Agenda
NoSQL
◦ Why the need?
◦ What products are available?
Transactions
◦ BASE
◦ ACID
SQL
◦ What is today’s SQL capable of?
◦ SQL Server performance – NoSQL required?
NewSQL
◦ SQL -> NoSQL -> NewSQL (distributed form of where we started)
◦ Distributed Data and ACID
Discussion
![Page 4: NewSQL - Deliverance from BASE and back to SQL and ACID](https://reader034.fdocuments.in/reader034/viewer/2022052623/559c601d1a28abdc3d8b474d/html5/thumbnails/4.jpg)
Not Only SQL (NoSQL)
WHY THE NEED?
![Page 5: NewSQL - Deliverance from BASE and back to SQL and ACID](https://reader034.fdocuments.in/reader034/viewer/2022052623/559c601d1a28abdc3d8b474d/html5/thumbnails/5.jpg)
Why the Need?
The year is 2001 and
◦ It’s that Big Data thing….
◦ Mainstream Relational Databases (that use SQL) are scale up
◦ More grunt required – buy a bigger box
◦ SAN based storage is ridiculously expensive and complicated, heavy TCO
Y2K + 1
◦ Developers twiddling their thumbs ;)
Web adoption accelerates
◦ Google, Yahoo, Amazon and the like are born
◦ MySQL does not scale – too inflexible
◦ Up front costs of kit for projects/business that may fail – need elasticity
http://www.tomshardware.co.uk/15-years-of-hard-drive-history-uk,review-1908-7.html
![Page 6: NewSQL - Deliverance from BASE and back to SQL and ACID](https://reader034.fdocuments.in/reader034/viewer/2022052623/559c601d1a28abdc3d8b474d/html5/thumbnails/6.jpg)
Products Available
Varied – type of NoSQL database
◦ Graph
◦ Key-Value
◦ Column store/Column Family
◦ Document Store
◦ Object
◦ Relational but without SQL
You name it and there is a product to do it
![Page 7: NewSQL - Deliverance from BASE and back to SQL and ACID](https://reader034.fdocuments.in/reader034/viewer/2022052623/559c601d1a28abdc3d8b474d/html5/thumbnails/7.jpg)
Performance Today [commodity]
[Chart: 64KiB, 100% read – 100% sequential vs 100% random]
![Page 8: NewSQL - Deliverance from BASE and back to SQL and ACID](https://reader034.fdocuments.in/reader034/viewer/2022052623/559c601d1a28abdc3d8b474d/html5/thumbnails/8.jpg)
ACID
Atomicity
◦ The bounds of the transaction – everything within those bounds is a single unit of work
◦ All or nothing
Consistency
◦ Data must reside in the correct Domain of values
◦ Deferrable to the end of the unit of work
Isolation
◦ Changes are isolated from other users
◦ Other connections cannot update what you have updated or are updating
◦ Multi-Version Concurrency Control (MVCC) – snapshots
◦ Locking
Durability
◦ In a system failure your changes are still maintained – nothing is lost
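A minimal sketch of "all or nothing", using Python's built-in sqlite3 module rather than any product named in this deck. The `account` table, its CHECK constraint (standing in for a domain rule) and the `transfer` helper are hypothetical names chosen for illustration.

```python
import sqlite3

# In-memory database for illustration only; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    "id INTEGER PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"  # domain rule: no negative balances
)
conn.execute("INSERT INTO account VALUES (1, 100), (2, 0)")
conn.commit()


def transfer(conn, src, dst, amount):
    """Move money between two accounts as a single unit of work."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        # The debit broke the CHECK constraint, so the credit is undone as well.
        return False


def balances(conn):
    """Return {account id: balance} for every account."""
    return dict(conn.execute("SELECT id, balance FROM account ORDER BY id"))


print(transfer(conn, 1, 2, 60), balances(conn))  # succeeds
print(transfer(conn, 1, 2, 60), balances(conn))  # fails, nothing changes
```

The second transfer credits account 2 first and only then fails on the debit, so a non-atomic system would have been left with money created from nowhere; the rollback removes the credit too.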
![Page 9: NewSQL - Deliverance from BASE and back to SQL and ACID](https://reader034.fdocuments.in/reader034/viewer/2022052623/559c601d1a28abdc3d8b474d/html5/thumbnails/9.jpg)
BASE (Basically Available, Soft-state, Eventually Consistent)
BASE is a transactional model of sorts (applied at the global level, rather than to individual transactions)
Specific to the distributed database model
Basically Available – all or some of the system is available
[Diagram: Node 1, Node 2, Node 3]
![Page 10: NewSQL - Deliverance from BASE and back to SQL and ACID](https://reader034.fdocuments.in/reader034/viewer/2022052623/559c601d1a28abdc3d8b474d/html5/thumbnails/10.jpg)
BASE (Basically Available, Soft-state, Eventually Consistent)
Soft-state
Eventually Consistent
System may change over time [as replicas become up-to-date (consistent)]
[Diagram: Insert value ‘A’ across Node 1, Node 2, Node 3]
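A toy simulation of the insert above, assuming a three-node cluster where a write lands on one node and reaches the others later. The `Node` and `Cluster` classes and the in-memory queue are hypothetical and do not model any particular product's replication.

```python
class Node:
    """One replica: holds a local copy that may lag behind the others."""

    def __init__(self, name):
        self.name = name
        self.data = {}


class Cluster:
    def __init__(self, names):
        self.nodes = [Node(n) for n in names]
        self.pending = []  # replication messages not yet delivered

    def write(self, node_index, key, value):
        """Apply the write locally and queue it for every other replica."""
        self.nodes[node_index].data[key] = value
        for i in range(len(self.nodes)):
            if i != node_index:
                self.pending.append((i, key, value))

    def replicate(self):
        """Deliver all queued messages; afterwards the replicas agree."""
        while self.pending:
            i, key, value = self.pending.pop(0)
            self.nodes[i].data[key] = value

    def read_all(self, key):
        """Read the key from every node, None where it has not arrived yet."""
        return [n.data.get(key) for n in self.nodes]


cluster = Cluster(["Node 1", "Node 2", "Node 3"])
cluster.write(0, "k", "A")
stale = cluster.read_all("k")      # ['A', None, None]: soft state
cluster.replicate()
converged = cluster.read_all("k")  # ['A', 'A', 'A']: eventually consistent
print(stale, converged)
```

Between the write and the call to `replicate`, the answer a reader gets depends on which node they ask.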
![Page 11: NewSQL - Deliverance from BASE and back to SQL and ACID](https://reader034.fdocuments.in/reader034/viewer/2022052623/559c601d1a28abdc3d8b474d/html5/thumbnails/11.jpg)
Eventual Consistency in SQL Server
Asynchronous Availability Groups/Database Mirroring
Replication
Eventual / Causal Consistency
◦ Eventual: no good for order-specific [and important] transactions
◦ Like Merge replication
◦ Causal: deliver messages in correct order [e.g. service broker]
◦ Like Transactional Replication
![Page 12: NewSQL - Deliverance from BASE and back to SQL and ACID](https://reader034.fdocuments.in/reader034/viewer/2022052623/559c601d1a28abdc3d8b474d/html5/thumbnails/12.jpg)
ACID - Distributed
2PC is clunky and doesn’t scale across many nodes
Paxos – consensus protocol – scales better
Remove the need for distributed ACID altogether
[Diagram: 2PC transaction – a Coordinator sends an INSERT to three Subordinates; all or nothing]
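A bare-bones sketch of the coordinator/subordinate exchange in two-phase commit, written in Python for illustration. The `Subordinate` class and `two_phase_commit` function are hypothetical; a real implementation also needs durable logging, timeouts and recovery, all omitted here.

```python
class Subordinate:
    """A participant that stages work in phase 1 and applies it in phase 2."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit  # simulates a node that must vote "no"
        self.staged = None
        self.rows = []

    def prepare(self, row):
        """Phase 1: stage the row and vote yes or no."""
        if not self.can_commit:
            return False
        self.staged = row
        return True

    def commit(self):
        """Phase 2, everyone voted yes: make the staged row permanent."""
        self.rows.append(self.staged)
        self.staged = None

    def rollback(self):
        """Phase 2, someone voted no: discard the staged row."""
        self.staged = None


def two_phase_commit(subordinates, row):
    """Coordinator: collect votes from everyone, then commit all or roll back all."""
    votes = [s.prepare(row) for s in subordinates]
    if all(votes):
        for s in subordinates:
            s.commit()
        return True
    for s in subordinates:
        s.rollback()
    return False


healthy = [Subordinate("S1"), Subordinate("S2"), Subordinate("S3")]
print(two_phase_commit(healthy, ("Tony", "Rogerson")))

one_down = [Subordinate("S1"), Subordinate("S2", can_commit=False), Subordinate("S3")]
print(two_phase_commit(one_down, ("Tony", "Rogerson")))
```

Every transaction costs two round trips to every subordinate, and a single "no" vote undoes the work everywhere, which is one way to read "clunky" above.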
![Page 13: NewSQL - Deliverance from BASE and back to SQL and ACID](https://reader034.fdocuments.in/reader034/viewer/2022052623/559c601d1a28abdc3d8b474d/html5/thumbnails/13.jpg)
Mixing BASE and ACID
ACID applied on the local data node
BASE applied to remote nodes
![Page 14: NewSQL - Deliverance from BASE and back to SQL and ACID](https://reader034.fdocuments.in/reader034/viewer/2022052623/559c601d1a28abdc3d8b474d/html5/thumbnails/14.jpg)
Relational
Sets
Tables with Rows x Columns
Relational Theory dictates the row/column intersection is an Atomic value i.e. contains only a single value from the domain modelled for that column
Chris Date:
◦ Atomicity cannot really be defined as absolute in Normal Form
◦ a column can contain “relational values” i.e. another table
Normal Form – the process used to define the schema around the data being modelled
![Page 15: NewSQL - Deliverance from BASE and back to SQL and ACID](https://reader034.fdocuments.in/reader034/viewer/2022052623/559c601d1a28abdc3d8b474d/html5/thumbnails/15.jpg)
OldSQL roots
Built for disk storage
Built for single machine, scale-up
Mature SQL language (decades of research) over the Relational Model
SQL extensions to deal with unstructured data (freetext)
![Page 16: NewSQL - Deliverance from BASE and back to SQL and ACID](https://reader034.fdocuments.in/reader034/viewer/2022052623/559c601d1a28abdc3d8b474d/html5/thumbnails/16.jpg)
OldSQL today
ACI [no Durability]
In-Memory
Modified design to work with Flash
Still scale-up
![Page 17: NewSQL - Deliverance from BASE and back to SQL and ACID](https://reader034.fdocuments.in/reader034/viewer/2022052623/559c601d1a28abdc3d8b474d/html5/thumbnails/17.jpg)
SQL Server
Delayed / No-Durability in SQL Server 2014
In-Memory extensions
Entity Attribute Value design combined with ColumnStore
Sparse Columns / Column sets
DEMOS
![Page 18: NewSQL - Deliverance from BASE and back to SQL and ACID](https://reader034.fdocuments.in/reader034/viewer/2022052623/559c601d1a28abdc3d8b474d/html5/thumbnails/18.jpg)
NewSQL
OLDSQL -> SQL -> NEWSQL
![Page 19: NewSQL - Deliverance from BASE and back to SQL and ACID](https://reader034.fdocuments.in/reader034/viewer/2022052623/559c601d1a28abdc3d8b474d/html5/thumbnails/19.jpg)
Describe NewSQL
NewSQL = OldSQL + Transparent_Data_Distribution + ACID
Also – add in the bells and whistles for new tech
◦ Flash
◦ RAM
◦ Processor cache improvements
◦ Better parallelisation across local processor cores
Basically -> Scale out with ACID
![Page 20: NewSQL - Deliverance from BASE and back to SQL and ACID](https://reader034.fdocuments.in/reader034/viewer/2022052623/559c601d1a28abdc3d8b474d/html5/thumbnails/20.jpg)
Latency in a Distributed environment
[Diagram: six servers connected through a switch over 1Gbit ethernet; SQL Server holds a table with columns FirstName, Surname, DOB]
Query returns 20,000 rows, 558 KiBytes of data
Data travel labels: Fastest / Slower / Slowest
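A back-of-envelope calculation for the figures on this slide: how long 558 KiBytes takes to cross a 1Gbit link. It assumes decimal gigabits and an otherwise idle link, and ignores protocol overhead, switch hops and query execution time, so treat it as a lower bound rather than a measurement.

```python
PAYLOAD_KIB = 558             # result set size from the slide
LINK_BITS_PER_SECOND = 1e9    # 1Gbit ethernet, assumed to mean 10**9 bits/s


def transfer_ms(payload_kib, bits_per_second):
    """Ideal wire time in milliseconds for a payload given in KiB."""
    bits = payload_kib * 1024 * 8
    return bits / bits_per_second * 1000


wire_time_ms = transfer_ms(PAYLOAD_KIB, LINK_BITS_PER_SECOND)
print(f"{wire_time_ms:.2f} ms")  # about 4.57 ms
```

A few milliseconds per query is small in isolation, but it is paid on every hop a result set has to make.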
![Page 21: NewSQL - Deliverance from BASE and back to SQL and ACID](https://reader034.fdocuments.in/reader034/viewer/2022052623/559c601d1a28abdc3d8b474d/html5/thumbnails/21.jpg)
Reduce Latency – Data Locality
[Diagram: servers connected through a switch over 1Gbit ethernet, with SQL Server placed on three of the servers]
![Page 22: NewSQL - Deliverance from BASE and back to SQL and ACID](https://reader034.fdocuments.in/reader034/viewer/2022052623/559c601d1a28abdc3d8b474d/html5/thumbnails/22.jpg)
Distributed SQL with ACID
[Diagram: SQL Server instances connected through a switch over 1Gbit ethernet]
BEGIN DISTRIBUTED TRAN
INSERT Server3.pres_NEWSQL.dbo.people( ….. )
INSERT Server2.pres_NEWSQL.dbo.people( ….. )
INSERT Server1.pres_NEWSQL.dbo.people( ….. )
COMMIT TRAN
• 2 Phase Commit using DTC
• High Latency
• All or nothing
![Page 23: NewSQL - Deliverance from BASE and back to SQL and ACID](https://reader034.fdocuments.in/reader034/viewer/2022052623/559c601d1a28abdc3d8b474d/html5/thumbnails/23.jpg)
Querying a Distributed Environment
Financial Trading – Global position of the book
TOP 10 customers
Not easy (at speed) in an OLTP setting
[Diagram: nodes N1, N2, N3, N4 connected through a network switch]
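One way to picture why a global TOP N is awkward: when a customer's trades are spread across nodes, no single node knows that customer's total. The sketch below assumes a simple scatter-gather approach in which each node sends back per-customer partial totals and a coordinator merges them; the function names and sample trades are hypothetical.

```python
import heapq
from collections import Counter


def node_partial_totals(trades):
    """Runs on each node: sum the traded amount per customer for local rows only."""
    totals = Counter()
    for customer, amount in trades:
        totals[customer] += amount
    return totals


def global_top(nodes_trades, n):
    """Runs on the coordinator: merge every node's partial totals, then take the top n."""
    merged = Counter()
    for trades in nodes_trades:
        merged.update(node_partial_totals(trades))  # update() adds the amounts
    return heapq.nlargest(n, merged.items(), key=lambda kv: kv[1])


# Customer "a" trades on two nodes; neither node alone ranks "a" first.
n1 = [("a", 5), ("b", 9)]
n2 = [("a", 6), ("c", 8)]
top2 = global_top([n1, n2], 2)
print(top2)  # [('a', 11), ('b', 9)]
```

Taking each node's local top N and merging those would miss customer "a" for N = 1, so the partial totals for every customer have to travel over the network.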
![Page 24: NewSQL - Deliverance from BASE and back to SQL and ACID](https://reader034.fdocuments.in/reader034/viewer/2022052623/559c601d1a28abdc3d8b474d/html5/thumbnails/24.jpg)
Couple {Data, Processing} with {Machine-n}
![Page 25: NewSQL - Deliverance from BASE and back to SQL and ACID](https://reader034.fdocuments.in/reader034/viewer/2022052623/559c601d1a28abdc3d8b474d/html5/thumbnails/25.jpg)
Partitioning
Chop big table up into “horizontal partitions”
Partition key required (Hash, Modulo, Key range)
Each partition is self-contained binding rows by the partitioning key
Access all data through logical view over all partitions (local database)
Table by table basis
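The three partition-key schemes can be sketched in a few lines of Python. The partition count, range boundaries and function names below are made up for illustration, and the list comprehension at the end stands in for the logical view over all partitions.

```python
import bisect
import zlib

PARTITIONS = 4  # hypothetical number of horizontal partitions


def by_modulo(customer_id):
    """Modulo: integer key wrapped around the partition count."""
    return customer_id % PARTITIONS


def by_hash(email):
    """Hash: crc32 is stable across runs, unlike Python's built-in hash() for strings."""
    return zlib.crc32(email.encode("utf-8")) % PARTITIONS


# Key range: partition 0 holds ids < 1000, 1 holds < 2000, 2 holds < 3000, 3 the rest.
RANGE_UPPER_BOUNDS = [1000, 2000, 3000]


def by_key_range(customer_id):
    """Key range: find the first boundary that is greater than the key."""
    return bisect.bisect_right(RANGE_UPPER_BOUNDS, customer_id)


partitions = [[] for _ in range(PARTITIONS)]


def insert(row):
    """Route a row to its partition using the partitioning key."""
    partitions[by_key_range(row["customer_id"])].append(row)


def all_rows():
    """Logical view: every partition presented as one table."""
    return [row for part in partitions for row in part]


insert({"customer_id": 42, "name": "Ann"})
insert({"customer_id": 2500, "name": "Bob"})
print(partitions, all_rows())
```

Key ranges keep neighbouring keys together, which suits range scans; hash and modulo spread rows more evenly at the cost of scattering neighbours.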
![Page 26: NewSQL - Deliverance from BASE and back to SQL and ACID](https://reader034.fdocuments.in/reader034/viewer/2022052623/559c601d1a28abdc3d8b474d/html5/thumbnails/26.jpg)
Shared Nothing
Partitioning+
Each Shard is self-contained and has all the procs, meta-data and of course your partition of data
Shard Key common to multiple tables, for example CustomerID, Email Address.
Greater autonomy across the distributed database
Seeing the entire database as a logical unit is more difficult – joining is a nightmare
Node 1
Node 2
Node 3
![Page 27: NewSQL - Deliverance from BASE and back to SQL and ACID](https://reader034.fdocuments.in/reader034/viewer/2022052623/559c601d1a28abdc3d8b474d/html5/thumbnails/27.jpg)
Data Distribution using Hashing
Distributed Database Cluster has fixed number of data nodes
Your data is spread across the database cluster
◦ 10 node cluster; each data item may reside on 3 nodes
◦ Which 3 nodes?
Data key is Hashed to a number – hashing algorithm is deterministic
data-node = f( data-key )
◦ print ( checksum( 'All hale to the ale' ) * 1.) % 10
◦ print ( checksum( 'And a glass of wine for the ladies' ) * 1.) % 10
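The same idea as the `checksum(...) % 10` lines, sketched in Python. It uses MD5 from hashlib purely as a deterministic hash, so the node numbers will not match what T-SQL's `checksum` returns. Placing the two extra copies on the next nodes in ring order is an assumption made for this sketch; the slide leaves "which 3 nodes?" open.

```python
import hashlib

NODES = 10     # cluster size from the slide
REPLICAS = 3   # copies of each data item from the slide


def data_nodes(data_key, nodes=NODES, replicas=REPLICAS):
    """Map a key to the nodes that hold it: data-node = f(data-key)."""
    digest = hashlib.md5(data_key.encode("utf-8")).digest()
    first = int.from_bytes(digest[:8], "big") % nodes
    # Assumption: the remaining copies sit on the next nodes, wrapping around.
    return [(first + i) % nodes for i in range(replicas)]


for key in ("All hale to the ale", "And a glass of wine for the ladies"):
    print(key, "->", data_nodes(key))
```

Because the function is deterministic, any client can work out where a key lives without asking a directory service, as long as the number of nodes stays fixed.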
![Page 28: NewSQL - Deliverance from BASE and back to SQL and ACID](https://reader034.fdocuments.in/reader034/viewer/2022052623/559c601d1a28abdc3d8b474d/html5/thumbnails/28.jpg)
Sharding Sync
[Diagram: Apps pick a node of the LOGICAL DATABASE (Node 1, Node 2, Node 3); nodes hold a full copy or a subset of the data, kept in step by Replication]
![Page 29: NewSQL - Deliverance from BASE and back to SQL and ACID](https://reader034.fdocuments.in/reader034/viewer/2022052623/559c601d1a28abdc3d8b474d/html5/thumbnails/29.jpg)
Postgres-XC
Coordinators (plans, 2pc trans, knows about data distribution)
Applications (issue SQL to coordinators)
Data Nodes
GTM (Global Transaction Manager)
http://de.slideshare.net/PavanDeolasee/postgresxc-28475161
![Page 30: NewSQL - Deliverance from BASE and back to SQL and ACID](https://reader034.fdocuments.in/reader034/viewer/2022052623/559c601d1a28abdc3d8b474d/html5/thumbnails/30.jpg)
Combine Sharding + Replication
Shard your big tables based on a hash (or something) around your business key e.g. Customer, EmailAddress etc.
Replicate static tables.
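A small sketch of why the combination helps: if the big table is sharded and the static lookup table is copied to every node, a join between them can be answered by a single node. The node count, the `COUNTRIES` lookup and the helper names are hypothetical, and plain modulo stands in for the hash.

```python
NODE_COUNT = 3  # hypothetical cluster size

# Static lookup table: small and rarely changed, so every node gets a full copy.
COUNTRIES = {"GB": "United Kingdom", "DE": "Germany"}

nodes = [{"customers": {}, "countries": dict(COUNTRIES)} for _ in range(NODE_COUNT)]


def node_for(customer_id):
    """Shard key to node; modulo is used here in place of a real hash function."""
    return customer_id % NODE_COUNT


def add_customer(customer_id, name, country_code):
    """Store the customer row on exactly one node, chosen by the shard key."""
    nodes[node_for(customer_id)]["customers"][customer_id] = (name, country_code)


def customer_with_country(customer_id):
    """Join answered by one node: the sharded row plus the local lookup copy."""
    node = nodes[node_for(customer_id)]
    name, code = node["customers"][customer_id]
    return name, node["countries"][code]


add_customer(7, "Ann", "GB")
add_customer(9, "Bernd", "DE")
print(customer_with_country(7), customer_with_country(9))
```

No network hop is needed for the join, at the price of pushing every change to the lookup table out to all nodes.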
![Page 31: NewSQL - Deliverance from BASE and back to SQL and ACID](https://reader034.fdocuments.in/reader034/viewer/2022052623/559c601d1a28abdc3d8b474d/html5/thumbnails/31.jpg)
Discussion
@tonyrogerson
http://dataidol.com/tonyrogerson