IMC Summit 2016 Breakout - Nikita Ivanov - Apache Ignite 2.0 Towards a Converged Data Platform
Apache®, Apache Ignite, Ignite®, and the Apache Ignite logo are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries.
Apache Ignite 2.0 - Towards Converged Data Platform
Fast Data Meets Open Source
NIKITA IVANOV
GridGain Founder & CTO
Apache Ignite PMC
http://ignite.apache.org @apacheignite
See all the presentations from the In-Memory Computing Summit at http://imcsummit.org
Agenda
• Fast Data vs Big Data
  – In-Memory Databases
  – In-Memory Data Grids
  – Hadoop & Spark
• Converged Data Platform
  – Big Data + Fast Data
• What is Apache Ignite
  – Big Bank Use Case
  – In-Memory Data Fabric
  – Shared Memory Layer
    • Share Spark RDDs
    • In-Memory File System
• Q & A
Apache Ignite: Join Us!
• Very Active Community
• Great Way to Learn Distributed Computing
• How To Contribute:
  – https://ignite.apache.org/community/contribute.html#contribute
  – https://cwiki.apache.org/confluence/display/IGNITE/How+to+Contribute
Fast Data vs Big Data
• Big Data
  – OLAP mostly
  – Larger Historical Data Set
  – Read-Mostly
  – Throughput Not Important
  – Low Query Latencies
  – Good-enough for interactive analytics
• Fast Data
  – OLTP mostly
  – Smaller Operational Data Set
  – High Throughput (ops/sec)
  – Low Latencies
  – Consistent and Transactional
Fast Data vs Big Data
• Big Data
  – Hadoop
    • MapReduce
    • HDFS
    • HBase
  – Spark
    • Machine Learning
    • Graph Processing
    • SQL
  – Warehouse/DB Vendors
• Fast Data
  – Streaming
    • Flink
    • Kafka
    • Apex
  – In-Memory Data Grid
    • Ignite
    • Geode (incubating)
  – In-Memory Database
    • MemSQL
    • VoltDB
  – NoSQL
    • MongoDB
    • Cassandra
Fast Data: In-Memory Databases
• In-Memory Databases
  – MemSQL
    • Closed Source
    • Free Limited Community Edition
  – VoltDB
    • Open Source Community Edition (AGPL)
    • Closed Source Enterprise Edition
• Main Features
  – High-Throughput
  – Low Latencies
  – Full SQL Support
    • However, SQL is the only API
  – Disk Persistence
    • Disk is just a copy of memory
  – Complete replacement of existing databases
Fast Data: In-Memory Data Grids
• In-Memory Data Grids
  – Apache Ignite: In-Memory Data Fabric
  – Apache Geode (incubating)
  – Hazelcast
• Main Features
  – High throughput
  – Low latencies
  – Key-value store
  – Transactions
  – Extensive data querying capability
  – Disk persistence
    • Read & write-through to databases
    • Keep your existing database
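The "read & write-through to databases" point on this slide can be illustrated with a minimal sketch. It is plain Python, not Ignite code: the `WriteThroughCache` class and the `database` dict standing in for the existing database are hypothetical names introduced only for illustration.

```python
class WriteThroughCache:
    """In-memory key-value cache sitting in front of a slower backing store."""

    def __init__(self, backing_store):
        self._store = backing_store   # stands in for the existing database
        self._memory = {}             # in-memory copy of entries seen so far
        self.store_reads = 0          # how many reads had to go to the store

    def get(self, key):
        # Read-through: on a miss, load from the store and keep the value in memory.
        if key not in self._memory:
            self.store_reads += 1
            value = self._store.get(key)
            if value is None:
                return None
            self._memory[key] = value
        return self._memory[key]

    def put(self, key, value):
        # Write-through: update the store first, then the in-memory copy.
        self._store[key] = value
        self._memory[key] = value


database = {"acct-1": 100}          # hypothetical existing database
cache = WriteThroughCache(database)

first = cache.get("acct-1")         # miss: loaded from the database
second = cache.get("acct-1")        # hit: served from memory
cache.put("acct-2", 250)            # written to both the database and memory
```

The design point is that the database stays the system of record while reads of hot keys stop reaching it, which is what "keep your existing database" suggests.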
Big Data Ecosystem
• Apache Hadoop & Apache Spark
  – Big Family of Products
  – Batch Processing
  – In-Memory Processing (Spark)
• Main Features
  – Disk-based storage
  – Interactive Analytics
  – No Transactions
  – Read-Only Data Sets
  – Strong Querying Capabilities
  – Relatively Low Latencies
    • Good enough for human eye
How To Bridge The Gap?
• Big Data
  + Add Shared Memory Store
  + Add Transactions
• Fast Data
  + Add Disk-First Data Sets
  + Add Disk-First Processing
What is a Converged Data Platform?
• Fast Data + Big Data
• Distributed and Scalable
• Real Time Data-To-Action
• Hybrid Transactional and Analytical Processing (HTAP)
  – Fast Data in Memory
  – Big Data on Disk
  – Combine RAM, NAND, HDD
  – No ETL
  – Query historical and analytical data
  – Transactions on historical and analytical data
Apache Ignite™ In-Memory Data Fabric: Strategic Approach to IMC
• Supports Applications of various types and languages
• Open Source – Apache 2.0
• Simple Java APIs
• 1 JAR Dependency
• High Performance & Scale
• Automatic Fault Tolerance
• Management/Monitoring
• Runs on Commodity Hardware
• Supports existing & new data sources
• No need to rip & replace
Apache Ignite In-Memory Data Fabric
© 2014 GridGain Systems, Inc.
Use Case: Largest bank in Russia and Eastern Europe, and the third largest in Europe
• Sberbank Requirements
  – Migrate to data grid architecture
  – Minimize dependency on Oracle
  – Move to open source
• Why Apache Ignite
  – More than a Data Grid
  – Best performance
    • 10+ competitors evaluated
  – Demonstrated best
    • Fault tolerance & scalability
    • ANSI-99 SQL Support
    • Transactional consistency
• Jointly Developing
  – Disk-Only Data Sets
  – Query Disk & Memory Together
[Diagram: 130 Million Customers; Deposit, Withdrawal, and Statement operations; GridGain Security; 1000+ Servers; Disk Stores]
Apache Ignite Data Grid
• Based on JCache (JSR 107)
  – In-Memory Key-Value Store
  – Basic Cache Operations
  – ConcurrentMap APIs
  – Collocated Processing (EntryProcessor)
  – Events and Metrics
  – Pluggable Persistence
• Ignite Data Grid
  – ACID Transactions
  – SQL Queries (ANSI 99)
  – In-Memory Indexes
  – On-Heap & Off-Heap Memory
  – Automatic RDBMS Integration
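The idea behind collocated processing (the EntryProcessor item above) is to send a small function to the node that owns an entry instead of pulling the value to the caller, changing it, and sending it back. The sketch below is a conceptual illustration in plain Python, not the JCache or Ignite API: `TinyGrid`, `Node`, `invoke`, and `deposit` are hypothetical names, and the byte-sum key-to-node mapping is a deliberately naive assumption.

```python
class Node:
    """One grid member holding a slice of the entries."""

    def __init__(self, name):
        self.name = name
        self.entries = {}


class TinyGrid:
    """Toy grid that routes every key to exactly one owning node."""

    def __init__(self, node_names):
        self.nodes = [Node(n) for n in node_names]

    def _owner(self, key):
        # Naive deterministic mapping (assumption): byte sum modulo node count.
        return self.nodes[sum(key.encode("utf-8")) % len(self.nodes)]

    def put(self, key, value):
        self._owner(key).entries[key] = value

    def get(self, key):
        return self._owner(key).entries.get(key)

    def invoke(self, key, processor, *args):
        # The processor runs against the owner's local copy, so only the
        # arguments and the result would cross the network in a real grid.
        owner = self._owner(key)
        new_value, result = processor(owner.entries.get(key), *args)
        owner.entries[key] = new_value
        return result


def deposit(balance, amount):
    """Processor: returns (new stored value, result handed back to the caller)."""
    new_balance = (balance or 0) + amount
    return new_balance, new_balance


grid = TinyGrid(["node-a", "node-b", "node-c"])
grid.put("acct-1", 100)
balance_after = grid.invoke("acct-1", deposit, 50)
```

In a real grid the read-modify-write inside `invoke` would also need to be atomic per entry; this single-threaded sketch does not model that.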
Data Grid: Distributed Caching
[Diagram panels: Partitioned Cache, Replicated Cache]
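The difference between the two caching modes named on this slide comes down to how many nodes hold a copy of each key. The following sketch assumes a four-node cluster and a simple CRC32-modulo placement with one backup; these are illustrative assumptions, not Ignite's actual affinity function, and the function names are hypothetical.

```python
import zlib

NODES = ["node-0", "node-1", "node-2", "node-3"]  # hypothetical cluster


def partitioned_owners(key, nodes, backups=1):
    """Partitioned mode: one primary node plus `backups` backup nodes per key."""
    primary = zlib.crc32(key.encode("utf-8")) % len(nodes)
    return [nodes[(primary + i) % len(nodes)] for i in range(backups + 1)]


def replicated_owners(key, nodes):
    """Replicated mode: every node holds a copy of every key."""
    return list(nodes)


keys = [f"key-{i}" for i in range(1000)]

# Total number of stored copies across the cluster in each mode.
partitioned_copies = sum(len(partitioned_owners(k, NODES)) for k in keys)
replicated_copies = sum(len(replicated_owners(k, NODES)) for k in keys)
```

With these assumptions the partitioned layout stores two copies per key regardless of cluster size, so capacity grows as nodes are added, while the replicated layout stores one copy per node, which favors read-heavy data that fits on every node.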
Data Grid: Ad-Hoc SQL (ANSI 99)
• ANSI-99 SQL
• Always Consistent
• Fault Tolerant
• In-Memory Indexes (On-Heap and Off-Heap)
• Automatic Group By, Aggregations, Sorting
• Cross-Cache Joins, Unions, etc.
• Ad-Hoc SQL Support
SQL Cross-Cache GROUP BY Example
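The code shown on this slide did not survive in the transcript. As a stand-in, the sketch below shows the general shape of a join with GROUP BY across two data sets, using Python's built-in `sqlite3` module rather than Ignite. The `Person` and `Organization` tables, their columns, and the sample rows are assumptions made up for illustration; in Ignite each table would correspond to a separate cache.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Two tables stand in for two caches (hypothetical schema and data).
conn.executescript("""
    CREATE TABLE Organization (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Person (id INTEGER PRIMARY KEY, orgId INTEGER, salary REAL);
    INSERT INTO Organization VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO Person VALUES (1, 1, 1000), (2, 1, 3000), (3, 2, 2000);
""")

# Join the two data sets and aggregate per organization.
rows = conn.execute("""
    SELECT o.name, COUNT(*) AS headcount, AVG(p.salary) AS avg_salary
    FROM Person p
    JOIN Organization o ON p.orgId = o.id
    GROUP BY o.name
    ORDER BY o.name
""").fetchall()

conn.close()
```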
Share RDDs Across Spark Jobs
• IgniteRDD Deployment Modes
  – Share RDD across tasks on the host
  – Share RDD across tasks in the application
  – Share RDD globally
  – Embedded vs External Deployments
• Faster SQL
  – In-Memory Indexes
  – SQL on top of Shared RDD
Ignite In-Memory File System
• Ignite In-Memory File System (IGFS)
  – Hadoop-compliant
  – Easy to Install
  – On-Heap and Off-Heap
  – Caching Layer for HDFS
  – Write-through and Read-through HDFS
  – Performance Boost
Ignite In-Memory Map Reduce
• In-Memory Native Performance
• Zero Code Change
• Use existing MR code
• Use existing Hive queries
• No Name Node
• No Network Noise
• In-Process Data Colocation
• Eager Push Scheduling
Proposed Apache Ignite 2.0 Roadmap
Converged Data Platform
• More SQL
  – Non-collocated Joins
  – 100% Data Manipulation Language (DML)
  – 100% Data Definition Language (DDL)
• More Disk
  – ATMM - Advanced Tiered-Memory Model:
    • Disk-first data sets
    • Any DRAM/NAND/HDD mix
  – Seamless querying across ATMM
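The slide does not describe how the tiered-memory model would be implemented. One way to picture a "disk-first data set" is a store where disk always holds the full data and memory is only a bounded cache of recently used entries, with reads looking the same whichever tier answers. The sketch below assumes that reading; `TieredStore`, its JSON-file layout, and the LRU eviction policy are hypothetical choices for illustration only.

```python
import json
import os
import tempfile
from collections import OrderedDict


class TieredStore:
    """Disk holds every entry; memory keeps only the most recently used ones."""

    def __init__(self, directory, memory_capacity):
        self._dir = directory
        self._capacity = memory_capacity
        self._memory = OrderedDict()   # least recently used entry first

    def _path(self, key):
        # Assumes keys are safe to use as file names.
        return os.path.join(self._dir, f"{key}.json")

    def put(self, key, value):
        # Disk-first: the write lands on disk before the memory tier sees it.
        with open(self._path(key), "w", encoding="utf-8") as f:
            json.dump(value, f)
        self._remember(key, value)

    def get(self, key):
        # Callers use one lookup path regardless of which tier holds the entry.
        if key in self._memory:
            self._memory.move_to_end(key)
            return self._memory[key]
        path = self._path(key)
        if not os.path.exists(path):
            return None
        with open(path, encoding="utf-8") as f:
            value = json.load(f)
        self._remember(key, value)
        return value

    def in_memory(self, key):
        return key in self._memory

    def _remember(self, key, value):
        self._memory[key] = value
        self._memory.move_to_end(key)
        while len(self._memory) > self._capacity:
            self._memory.popitem(last=False)   # evict from memory only


with tempfile.TemporaryDirectory() as tmp:
    store = TieredStore(tmp, memory_capacity=2)
    for i in range(5):
        store.put(f"k{i}", {"n": i})
    cold_value = store.get("k0")       # evicted from memory, read back from disk
    now_hot = store.in_memory("k0")    # the read promoted it to the memory tier
```

Eviction here never loses data because the disk copy is authoritative, which is the property that distinguishes a disk-first data set from a memory-first cache with optional persistence.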
ANY QUESTIONS?
Thank you for joining us. Follow the conversation.
http://www.ignite.apache.org
@apacheignite