![Page 1: Apache conbigdata2015 christiantzolov-federated sql on hadoop and beyond- leveraging apache geode to build a poor mans sap hana](https://reader030.fdocuments.in/reader030/viewer/2022020108/5889bf901a28abca448b4be7/html5/thumbnails/1.jpg)
Federated SQL on Hadoop and Beyond: Leveraging Apache Geode to Build a Poor Man's SAP HANA
by Christian Tzolov @christzolov
![Page 2: Apache conbigdata2015 christiantzolov-federated sql on hadoop and beyond- leveraging apache geode to build a poor mans sap hana](https://reader030.fdocuments.in/reader030/viewer/2022020108/5889bf901a28abca448b4be7/html5/thumbnails/2.jpg)
Whoami
Christian Tzolov Technical Architect at Pivotal, BigData, Hadoop, SpringXD, Apache Committer, Crunch PMC member
[email protected] blog.tzolov.net @christzolov
![Page 3: Apache conbigdata2015 christiantzolov-federated sql on hadoop and beyond- leveraging apache geode to build a poor mans sap hana](https://reader030.fdocuments.in/reader030/viewer/2022020108/5889bf901a28abca448b4be7/html5/thumbnails/3.jpg)
Contents
• Data Systems - Principles
• Use Case: OLTP and OLAP Data Systems Integration
• Passive Data Synchronization (Demo)
• Federated Queries With HAWQ
• HAWQ Web Tables
• HAWQ PXF Architecture
• Geode PXF (Demo)
![Page 4: Apache conbigdata2015 christiantzolov-federated sql on hadoop and beyond- leveraging apache geode to build a poor mans sap hana](https://reader030.fdocuments.in/reader030/viewer/2022020108/5889bf901a28abca448b4be7/html5/thumbnails/4.jpg)
Data Systems
![Page 5: Apache conbigdata2015 christiantzolov-federated sql on hadoop and beyond- leveraging apache geode to build a poor mans sap hana](https://reader030.fdocuments.in/reader030/viewer/2022020108/5889bf901a28abca448b4be7/html5/thumbnails/5.jpg)
Compute Arbitrary Functions on Arbitrary Data
![Page 6: Apache conbigdata2015 christiantzolov-federated sql on hadoop and beyond- leveraging apache geode to build a poor mans sap hana](https://reader030.fdocuments.in/reader030/viewer/2022020108/5889bf901a28abca448b4be7/html5/thumbnails/6.jpg)
Architectural Patterns
• Data Lake
• Lambda
• Kappa
• Tachyon
• …
![Page 7: Apache conbigdata2015 christiantzolov-federated sql on hadoop and beyond- leveraging apache geode to build a poor mans sap hana](https://reader030.fdocuments.in/reader030/viewer/2022020108/5889bf901a28abca448b4be7/html5/thumbnails/7.jpg)
Integration Stack
• Apache HDFS: Data Lake - PHD or HDP Hadoop
• Apache HAWQ: SQL on Hadoop (OLAP)
• Apache Geode: In-memory data grid (OLTP)
• Spring XD: Integration and Streaming Runtime
• Apache Ambari: Manages All Clusters
• Apache Zeppelin: Web UI for interaction with Data Systems
![Page 8: Apache conbigdata2015 christiantzolov-federated sql on hadoop and beyond- leveraging apache geode to build a poor mans sap hana](https://reader030.fdocuments.in/reader030/viewer/2022020108/5889bf901a28abca448b4be7/html5/thumbnails/8.jpg)
Apache Geode (OLTP)
• Cache - Performance / Consistency / Resiliency
• Region - Highly available, redundant, distributed Map
China Railway Corporation
• 5,700 train stations
• 4.5 million tickets per day
• 20 million daily users
• 1.4 billion page views per day
• 40,000 visits per second
Indian Railways
• 7,000 stations
• 72,000 miles of track
• 23 million passengers daily
• 120,000 concurrent users
• 10,000 transactions per minute
![Page 9: Apache conbigdata2015 christiantzolov-federated sql on hadoop and beyond- leveraging apache geode to build a poor mans sap hana](https://reader030.fdocuments.in/reader030/viewer/2022020108/5889bf901a28abca448b4be7/html5/thumbnails/9.jpg)
Apache HAWQ (OLAP)
• Built around a Greenplum MPP DB (C and C++)
• Hadoop Native: Parquet, HDFS and YARN
• 100% ANSI SQL compliant: SQL-92/99/2003…
• Extensible - Web Tables, PXF
• Connectivity: ODBC and JDBC
• Access internal store: HAWQ(Parquet)InputFormat
![Page 10: Apache conbigdata2015 christiantzolov-federated sql on hadoop and beyond- leveraging apache geode to build a poor mans sap hana](https://reader030.fdocuments.in/reader030/viewer/2022020108/5889bf901a28abca448b4be7/html5/thumbnails/10.jpg)
HAWQ - TPC-DS
• Outperforms Impala by 454% overall
• 344% performance improvement over Hive/Tez
• Runs 100% of the TPC-DS queries, unlike Impala or Hive
• References: http://bit.ly/1NUDcLl, https://github.com/dbbaskette/pivbench
![Page 11: Apache conbigdata2015 christiantzolov-federated sql on hadoop and beyond- leveraging apache geode to build a poor mans sap hana](https://reader030.fdocuments.in/reader030/viewer/2022020108/5889bf901a28abca448b4be7/html5/thumbnails/11.jpg)
Spring XD
Orchestrates and automates all steps across multiple data stream pipelines
• Sources: HTTP, Tail, File, Mail, Twitter, Gemfire, Syslog, TCP, UDP, JMS, RabbitMQ, MQTT, Kafka, Reactor TCP/UDP
• Processors: Filter, Transformer, Object-to-JSON, JSON-to-Tuple, Splitter, Aggregator, HTTP Client, Groovy Scripts, Java Code, JPMML Evaluator, Spark Streaming
• Sinks: File, HDFS, JDBC, TCP, Log, Mail, RabbitMQ, Gemfire, Splunk, MQTT, Kafka, Dynamic Router, Counters
![Page 12: Apache conbigdata2015 christiantzolov-federated sql on hadoop and beyond- leveraging apache geode to build a poor mans sap hana](https://reader030.fdocuments.in/reader030/viewer/2022020108/5889bf901a28abca448b4be7/html5/thumbnails/12.jpg)
Ambari Management
![Page 13: Apache conbigdata2015 christiantzolov-federated sql on hadoop and beyond- leveraging apache geode to build a poor mans sap hana](https://reader030.fdocuments.in/reader030/viewer/2022020108/5889bf901a28abca448b4be7/html5/thumbnails/13.jpg)
Use Case: Join OLTP and OLAP Data Systems
![Page 14: Apache conbigdata2015 christiantzolov-federated sql on hadoop and beyond- leveraging apache geode to build a poor mans sap hana](https://reader030.fdocuments.in/reader030/viewer/2022020108/5889bf901a28abca448b4be7/html5/thumbnails/14.jpg)
Use Case
• Integrate Geode with HAWQ
• Unified data view
• Slowly Changing Dimensions (SCDs)
• Keep the Operational and Historical data in Sync
![Page 15: Apache conbigdata2015 christiantzolov-federated sql on hadoop and beyond- leveraging apache geode to build a poor mans sap hana](https://reader030.fdocuments.in/reader030/viewer/2022020108/5889bf901a28abca448b4be7/html5/thumbnails/15.jpg)
Passive Data Synchronization
![Page 16: Apache conbigdata2015 christiantzolov-federated sql on hadoop and beyond- leveraging apache geode to build a poor mans sap hana](https://reader030.fdocuments.in/reader030/viewer/2022020108/5889bf901a28abca448b4be7/html5/thumbnails/16.jpg)
Passive Sync Architecture
![Page 18: Apache conbigdata2015 christiantzolov-federated sql on hadoop and beyond- leveraging apache geode to build a poor mans sap hana](https://reader030.fdocuments.in/reader030/viewer/2022020108/5889bf901a28abca448b4be7/html5/thumbnails/18.jpg)
Passive Sync Improved (gpfdist)
![Page 20: Apache conbigdata2015 christiantzolov-federated sql on hadoop and beyond- leveraging apache geode to build a poor mans sap hana](https://reader030.fdocuments.in/reader030/viewer/2022020108/5889bf901a28abca448b4be7/html5/thumbnails/20.jpg)
Federated Queries With HAWQ
![Page 21: Apache conbigdata2015 christiantzolov-federated sql on hadoop and beyond- leveraging apache geode to build a poor mans sap hana](https://reader030.fdocuments.in/reader030/viewer/2022020108/5889bf901a28abca448b4be7/html5/thumbnails/21.jpg)
HAWQ Web Tables
• HAWQ Web Table - access dynamic data sources on a web server or by executing OS scripts
• Leverage Geode REST API and OQL
• SpringBoot Controller to convert JSON into TSV
CREATE EXTERNAL WEB TABLE EMPLOYEE_WEB_TABLE (...)
EXECUTE E'curl http://<hostname>/gemfire-api/v1/queries/adhoc?q=<URLencoded OQL statement>'
ON MASTER
FORMAT 'text' (delimiter '|' null 'null' escape E'\\');
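The web table above needs two small pieces of glue: a URL-encoded OQL statement for the Geode REST ad-hoc query endpoint, and a converter that flattens the JSON response into delimited text matching the FORMAT clause. The following is a minimal Python sketch of both, not the SpringBoot controller from the talk. The host name, the sample OQL, the column names, and the assumption that the response is a JSON array of flat objects are all hypothetical.

```python
import json
from urllib.parse import quote


def adhoc_query_url(hostname, oql):
    """Build the ad-hoc query URL used in the EXECUTE clause above."""
    # safe="" also encodes "/" so region paths such as /Employee survive.
    return "http://%s/gemfire-api/v1/queries/adhoc?q=%s" % (hostname, quote(oql, safe=""))


def json_to_delimited(json_text, columns, delimiter="|", null="null"):
    """Flatten a JSON array of flat objects into delimited text rows.

    Assumption: the REST response is a JSON array of objects; the real
    response shape may differ.
    """
    rows = []
    for record in json.loads(json_text):
        values = []
        for column in columns:
            value = record.get(column)
            if value is None:
                values.append(null)
            else:
                # Escape backslashes first, then the delimiter itself.
                text = str(value).replace("\\", "\\\\")
                values.append(text.replace(delimiter, "\\" + delimiter))
        rows.append(delimiter.join(values))
    return "\n".join(rows)
```

The defaults mirror the delimiter, null marker, and escape character declared in the web table definition, so the output can be read by FORMAT 'text' without further options.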
![Page 22: Apache conbigdata2015 christiantzolov-federated sql on hadoop and beyond- leveraging apache geode to build a poor mans sap hana](https://reader030.fdocuments.in/reader030/viewer/2022020108/5889bf901a28abca448b4be7/html5/thumbnails/22.jpg)
HAWQ Web Tables Architecture
Access dynamic data sources on a web server or by executing OS scripts.
![Page 23: Apache conbigdata2015 christiantzolov-federated sql on hadoop and beyond- leveraging apache geode to build a poor mans sap hana](https://reader030.fdocuments.in/reader030/viewer/2022020108/5889bf901a28abca448b4be7/html5/thumbnails/23.jpg)
HAWQ Web Tables Limitations
• Not Scalable
• No Push Down Predicates
• Static
• No Compression
• Requires Additional Components
![Page 24: Apache conbigdata2015 christiantzolov-federated sql on hadoop and beyond- leveraging apache geode to build a poor mans sap hana](https://reader030.fdocuments.in/reader030/viewer/2022020108/5889bf901a28abca448b4be7/html5/thumbnails/24.jpg)
P(ivotal) Extension Framework (PXF)
• Java-Based
• Parallel, High Throughput Data Access
• ANSI-compliant SQL On Any Dataset
• Wide variety of PXF plugins
![Page 25: Apache conbigdata2015 christiantzolov-federated sql on hadoop and beyond- leveraging apache geode to build a poor mans sap hana](https://reader030.fdocuments.in/reader030/viewer/2022020108/5889bf901a28abca448b4be7/html5/thumbnails/25.jpg)
PXF Architecture
![Page 26: Apache conbigdata2015 christiantzolov-federated sql on hadoop and beyond- leveraging apache geode to build a poor mans sap hana](https://reader030.fdocuments.in/reader030/viewer/2022020108/5889bf901a28abca448b4be7/html5/thumbnails/26.jpg)
PXF Data Model
• Data Source is modeled as a collection of one or more Fragments.
• Each Fragment consists of many Rows that in turn are split into typed Fields.
• Analyzer (optional) provides statistical data to the HAWQ query optimizer.
• Metadata about the data source: locations, access attributes, table schemas, formats, SQL query filters, etc.
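The containment hierarchy described here (Data Source, Fragments, Rows, typed Fields) can be sketched with plain Python dataclasses. This is an illustration of the model only, not PXF's Java classes; every class and attribute name below is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class Field:
    """A single typed value inside a row (hypothetical names)."""
    type_name: str
    value: Any


@dataclass
class Row:
    """A row is an ordered list of typed fields."""
    fields: List[Field] = field(default_factory=list)


@dataclass
class Fragment:
    """A fragment is an independently readable slice of the data source."""
    index: int
    rows: List[Row] = field(default_factory=list)


@dataclass
class DataSource:
    """A data source is a collection of one or more fragments."""
    name: str
    fragments: List[Fragment] = field(default_factory=list)

    def row_count(self):
        # Total rows across all fragments.
        return sum(len(fragment.rows) for fragment in self.fragments)
```

Because each fragment carries its own rows, separate workers can read different fragments without coordinating with each other.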
![Page 27: Apache conbigdata2015 christiantzolov-federated sql on hadoop and beyond- leveraging apache geode to build a poor mans sap hana](https://reader030.fdocuments.in/reader030/viewer/2022020108/5889bf901a28abca448b4be7/html5/thumbnails/27.jpg)
PXF Processors
• Plugin, InputData
• Fragmenter: getFragments()
• Analyzer: getEstimatedStat()
• ReadAccessor: openForRead(), readNextObject(), closeForRead()
• WriteAccessor: openForWrite(), writeNextObject(), closeForWrite()
• ReadResolver: getFields(OneRow)
• WriteResolver: getFields(OneRow)
• Custom implementations: CustomFragmenter, CustomAccessor, CustomResolver, CustomAnalyzer
• Legend: Extend Class, Implement Interface
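The read path implied by these processors is: the fragmenter lists fragments, an accessor opens one fragment and reads row objects until exhausted, and a resolver turns each row object into fields. The Python sketch below mimics that call sequence over an in-memory list. It is an analogy to the Java interfaces named on the slide, not PXF code; the class names, the snake_case method names, and the dict-based row format are hypothetical.

```python
class ListFragmenter:
    """Splits an in-memory list of records into fixed-size fragments."""

    def __init__(self, records, fragment_size):
        self.records = records
        self.fragment_size = fragment_size

    def get_fragments(self):
        size = self.fragment_size
        return [self.records[i:i + size] for i in range(0, len(self.records), size)]


class ListAccessor:
    """Reads the raw row objects of one fragment, one at a time."""

    def __init__(self, fragment):
        self.fragment = fragment
        self._iterator = None

    def open_for_read(self):
        self._iterator = iter(self.fragment)
        return True

    def read_next_object(self):
        # Returns None when the fragment is exhausted.
        return next(self._iterator, None)

    def close_for_read(self):
        self._iterator = None


class DictResolver:
    """Turns one raw row object (a dict here) into an ordered field list."""

    def __init__(self, columns):
        self.columns = columns

    def get_fields(self, one_row):
        return [one_row.get(column) for column in self.columns]


def scan(records, fragment_size, columns):
    """Run fragmenter, accessor and resolver in sequence over all fragments."""
    resolver = DictResolver(columns)
    rows = []
    for fragment in ListFragmenter(records, fragment_size).get_fragments():
        accessor = ListAccessor(fragment)
        accessor.open_for_read()
        try:
            while True:
                one_row = accessor.read_next_object()
                if one_row is None:
                    break
                rows.append(resolver.get_fields(one_row))
        finally:
            accessor.close_for_read()
    return rows
```

Here the fragments are processed in a simple loop; in a distributed setting each fragment could be handed to a different worker, since the accessor and resolver only ever see one fragment.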
![Page 28: Apache conbigdata2015 christiantzolov-federated sql on hadoop and beyond- leveraging apache geode to build a poor mans sap hana](https://reader030.fdocuments.in/reader030/viewer/2022020108/5889bf901a28abca448b4be7/html5/thumbnails/28.jpg)
PXF Deployment Model
• HAWQ Master (Query Dispatcher): SQL query in, Result out
• NameNode (PXF Service): Metadata request, Fragment list
• Query Dispatcher and Query Executors: Scan plan, Result
• Data Node X (PXF Service, Query Executor): data request for Fragment X, pxfwritable records
• Data Node Z (PXF Service, Query Executor): data request for Fragment Z, pxfwritable records
• External (Distributed) Data System
• Parallel execution
![Page 29: Apache conbigdata2015 christiantzolov-federated sql on hadoop and beyond- leveraging apache geode to build a poor mans sap hana](https://reader030.fdocuments.in/reader030/viewer/2022020108/5889bf901a28abca448b4be7/html5/thumbnails/29.jpg)
PXF External Tables
CREATE EXTERNAL TABLE ext_table_name <Attribute list, …>
LOCATION('pxf://<host>:<port>/path/to/data?
    FRAGMENTER=package.name.FragmenterForX&
    ACCESSOR=package.name.AccessorForX&
    RESOLVER=package.name.ResolverForX&
    <Other custom user options>=<Value>')
FORMAT 'custom' (formatter='pxfwritable_import');
![Page 30: Apache conbigdata2015 christiantzolov-federated sql on hadoop and beyond- leveraging apache geode to build a poor mans sap hana](https://reader030.fdocuments.in/reader030/viewer/2022020108/5889bf901a28abca448b4be7/html5/thumbnails/30.jpg)
PXF Gallery
• HdfsTextSimple
• HdfsTextMulti
• Hive
• HiveRC
• HiveText
• HBase
• Avro
• Accumulo
• Cassandra
• JSON
• Redis
• Geode/Gemfire
• JDBC
![Page 31: Apache conbigdata2015 christiantzolov-federated sql on hadoop and beyond- leveraging apache geode to build a poor mans sap hana](https://reader030.fdocuments.in/reader030/viewer/2022020108/5889bf901a28abca448b4be7/html5/thumbnails/31.jpg)
HAWQ PXF/Geode
![Page 32: Apache conbigdata2015 christiantzolov-federated sql on hadoop and beyond- leveraging apache geode to build a poor mans sap hana](https://reader030.fdocuments.in/reader030/viewer/2022020108/5889bf901a28abca448b4be7/html5/thumbnails/32.jpg)
Federated Queries with PXF/Geode - Architecture
![Page 33: Apache conbigdata2015 christiantzolov-federated sql on hadoop and beyond- leveraging apache geode to build a poor mans sap hana](https://reader030.fdocuments.in/reader030/viewer/2022020108/5889bf901a28abca448b4be7/html5/thumbnails/33.jpg)
PXF/Geode Table
CREATE EXTERNAL TABLE <GEMFIRE_TABLE_NAME> (...)
LOCATION('pxf://<namenode>/<path>?
    PROFILE=GEMFIRE&
    LOCATORS=<gemfire-server:port>&
    REGION=<region-name>')
FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import');
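When several Geode regions need external tables, the statement above can be generated from a few parameters. The helper below is a minimal Python sketch that fills in the template from the slide. The function name, the table name, the column list, and the host and port values in the usage example are hypothetical placeholders.

```python
def pxf_geode_table_ddl(table, columns, namenode, path, locators, region):
    """Render the PXF/Geode external table statement for one region.

    columns is a list of (name, sql_type) pairs; all values are supplied
    by the caller and are not validated here.
    """
    column_list = ", ".join("%s %s" % (name, sql_type) for name, sql_type in columns)
    location = "pxf://%s/%s?PROFILE=GEMFIRE&LOCATORS=%s&REGION=%s" % (
        namenode, path.lstrip("/"), locators, region)
    return (
        "CREATE EXTERNAL TABLE %s (%s)\n"
        "LOCATION('%s')\n"
        "FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import');"
    ) % (table, column_list, location)


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    print(pxf_geode_table_ddl(
        "EMPLOYEE_GEODE",
        [("id", "int"), ("name", "text")],
        "namenode:51200", "/geode", "geode-locator:10334", "Employee"))
```

The generated LOCATION string is kept on one line, which avoids stray whitespace inside the URI.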
![Page 34: Apache conbigdata2015 christiantzolov-federated sql on hadoop and beyond- leveraging apache geode to build a poor mans sap hana](https://reader030.fdocuments.in/reader030/viewer/2022020108/5889bf901a28abca448b4be7/html5/thumbnails/34.jpg)
Geode Profile
<profile>
    <name>GEMFIRE</name>
    <description>A profile for reading Gemfire data</description>
    <plugins>
        <fragmenter>io.pivotal.pxf.plugins.gemfire.GemfireFragmenter</fragmenter>
        <accessor>io.pivotal.pxf.plugins.gemfire.GemfireAccessor</accessor>
        <resolver>io.pivotal.pxf.plugins.gemfire.GemfireResolver</resolver>
    </plugins>
</profile>
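A profile is a named bundle of the three plugin classes, so PROFILE=GEMFIRE in a table definition stands in for separate FRAGMENTER, ACCESSOR and RESOLVER options. As a quick check of what a profile resolves to, the Python sketch below parses the XML shown above with the standard library. The function name and the returned dictionary layout are hypothetical, and this is not how PXF itself loads profiles.

```python
import xml.etree.ElementTree as ET

# The profile definition from the slide, reproduced for a self-contained example.
PROFILE_XML = """
<profile>
    <name>GEMFIRE</name>
    <description>A profile for reading Gemfire data</description>
    <plugins>
        <fragmenter>io.pivotal.pxf.plugins.gemfire.GemfireFragmenter</fragmenter>
        <accessor>io.pivotal.pxf.plugins.gemfire.GemfireAccessor</accessor>
        <resolver>io.pivotal.pxf.plugins.gemfire.GemfireResolver</resolver>
    </plugins>
</profile>
"""


def read_profile(xml_text):
    """Return the profile name and a mapping of plugin role to class name."""
    root = ET.fromstring(xml_text.strip())
    plugins = root.find("plugins")
    return {
        "name": root.findtext("name"),
        "plugins": {child.tag: (child.text or "").strip() for child in plugins},
    }


if __name__ == "__main__":
    profile = read_profile(PROFILE_XML)
    for role, class_name in profile["plugins"].items():
        print(role, "->", class_name)
```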
![Page 35: Apache conbigdata2015 christiantzolov-federated sql on hadoop and beyond- leveraging apache geode to build a poor mans sap hana](https://reader030.fdocuments.in/reader030/viewer/2022020108/5889bf901a28abca448b4be7/html5/thumbnails/35.jpg)
Federated Queries With PXF/Geode - Demo
![Page 36: Apache conbigdata2015 christiantzolov-federated sql on hadoop and beyond- leveraging apache geode to build a poor mans sap hana](https://reader030.fdocuments.in/reader030/viewer/2022020108/5889bf901a28abca448b4be7/html5/thumbnails/36.jpg)
Stay Connected
• PXF Maven Repository: https://bintray.com/big-data/maven/pxf/view
• PXF Community Plugins: https://bintray.com/big-data/maven/pxf-plugins/view
• Apache HAWQ: https://github.com/apache/incubator-hawq
• Apache Geode: https://github.com/apache/incubator-geode
• Apache Zeppelin: https://zeppelin.incubator.apache.org
• Spring XD: http://projects.spring.io/spring-xd/