Impala - University of...
Impala: A Modern, Open-Source SQL Engine for Hadoop
Yogesh Chockalingam
Agenda
• Introduction
• Architecture
• Front-End
• Back-End
• Evaluation
• Comparison with Spark SQL
Introduction
Why not use Hive or HBase?
• HBase is a NoSQL database that runs on top of HDFS and provides real-time read/write access.
• Hive is a data warehousing tool built on top of Hadoop; it uses Hive Query Language (HQL) to query data stored in a Hadoop cluster.
• Hive automatically translates HQL queries into MapReduce jobs.
• Hive doesn’t support transactions.
Impala
• General-purpose SQL query engine:
  • Works across analytical and transactional workloads
• High performance:
  • Execution engine written in C++
  • Runs directly within Hadoop
  • Does not use MapReduce
• MPP database support:
  • Multi-user workloads
Creating tables
CREATE TABLE T (...) PARTITIONED BY (day int, month int)
LOCATION '<hdfs-path>' STORED AS PARQUET;
For a partitioned table, data is placed in subdirectories whose paths reflect the partition columns' values. For example, for day 17, month 2 of table T, all data files would be located in
<root>/day=17/month=2/
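The directory convention above can be sketched in a few lines of Python. This is only an illustration of the naming scheme; the helper name `partition_dir` is hypothetical and not part of Impala.

```python
def partition_dir(root, partition_values):
    """Build the directory for one partition.

    partition_values is a list of (column, value) pairs in the same
    order as the PARTITIONED BY clause. Hypothetical helper, shown
    only to illustrate the naming scheme described on the slide.
    """
    parts = [f"{column}={value}" for column, value in partition_values]
    return "/".join([root.rstrip("/")] + parts) + "/"


# Day 17, month 2 of table T, using the slide's placeholder root.
print(partition_dir("<root>", [("day", 17), ("month", 2)]))
```

Because the partition values are encoded in the path, a query that filters on day or month only needs to read the matching subdirectories.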
Metadata
• Table metadata (the table definition, column names, data types, schema, etc.) is stored in HCatalog.
INSERT / UPDATE / DELETE
• The user can add data to a table simply by copying or moving data files into its directory!
• Does NOT support UPDATE and DELETE.
  • This is a limitation of HDFS, which does not support in-place updates.
  • Instead, recompute the values and replace the data in the affected partitions.
• Run COMPUTE STATS <table> after inserts.
  • Those statistics will subsequently be used during query optimization.
Architecture
I: Impala Daemon
The Impala daemon service is dually responsible for:
1. Accepting queries from client processes and orchestrating their execution across the cluster. In this role it’s called the query coordinator.
2. Executing individual query fragments on behalf of other Impala daemons.
• The Impala daemons are in constant communication with the statestore, to confirm which nodes are healthy and can accept new work.
• They also receive broadcast messages from the catalog daemon via the statestore, to keep track of metadata changes.
[Diagram: Catalog, Statestore, and Impala Daemons]
II: Statestore Daemon
• Handles cluster membership information.
• Periodically sends two kinds of messages to Impala daemons:
  • Topic update: the new changes made since the last topic update message
  • Keepalive: a heartbeat mechanism
• If an Impala daemon goes offline, the statestore informs all the other Impala daemons so that future queries can avoid making requests to the unreachable node.
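As a rough illustration of the two message kinds, the toy model below tracks membership with keepalives and hands out only the changes since the previous topic update. It is a sketch, not Impala's implementation: the class name, the missed-keepalive threshold, and the daemon ids are all assumptions made for the example.

```python
class StatestoreSketch:
    """Toy model of statestore membership tracking (illustrative only)."""

    def __init__(self, max_missed_keepalives=3):
        # Hypothetical threshold; the slides do not say how failure is decided.
        self.max_missed = max_missed_keepalives
        self.missed = {}    # daemon id -> consecutive missed keepalives
        self.pending = []   # changes made since the last topic update

    def register(self, daemon_id):
        self.missed[daemon_id] = 0
        self.pending.append(("joined", daemon_id))

    def keepalive(self, daemon_id, responded):
        """Record the outcome of one heartbeat sent to a daemon."""
        if daemon_id not in self.missed:
            return
        if responded:
            self.missed[daemon_id] = 0
            return
        self.missed[daemon_id] += 1
        if self.missed[daemon_id] >= self.max_missed:
            # Treat the daemon as offline and queue the change for broadcast.
            del self.missed[daemon_id]
            self.pending.append(("left", daemon_id))

    def topic_update(self):
        """Return only the changes made since the previous topic update."""
        delta, self.pending = self.pending, []
        return delta

    def live_daemons(self):
        return sorted(self.missed)


store = StatestoreSketch(max_missed_keepalives=2)
for daemon in ("impalad-1", "impalad-2", "impalad-3"):
    store.register(daemon)
store.topic_update()                   # first update carries the three joins
store.keepalive("impalad-2", responded=False)
store.keepalive("impalad-2", responded=False)
print(store.topic_update())            # only the change since the last update
print(store.live_daemons())
```

Sending deltas rather than the full membership list keeps the periodic messages small, which is the point of the "changes since the last topic update" wording.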
III: Catalog Daemon
• Impala's catalog service serves catalog metadata to Impala daemons via the statestore broadcast mechanism, and executes DDL operations on behalf of Impala daemons.
• The catalog service pulls information from the Hive Metastore and aggregates that information into an Impala-compatible catalog structure.
• This structure is then passed on to the statestore daemon, which communicates with the Impala daemons.
1. Request arrives from client via Thrift API
[Diagram: a SQL App sends a SQL request over ODBC to one of several Impala Daemons; Hive Metastore, HDFS NN, and Statestore are shown alongside]
2. Planner turns request into collections of plan fragments. Coordinator initiates execution on remote Impala daemons.
3. Intermediate results are streamed between Impala daemons. Query results are streamed back to client.
[Diagram: SQL App, ODBC, Query Planner, Query Coordinator, Query Executor, HDFS DN, HBase, Query Results, Hive Metastore, HDFS NN, Statestore]
Front-End
Query Plans
• The Impala frontend is responsible for compiling SQL text into query plans executable by the Impala backends.
• The query compilation process proceeds as follows:
  • Query parsing
  • Semantic analysis
  • Query planning/optimization
• Query planning:
  1. Single-node planning
  2. Plan parallelization and fragmentation
Query Planning: Single Node
• In the first phase, the parse tree is translated into a non-executable single-node plan tree.
E.g. a query joining two HDFS tables (t1, t2) and one HBase table (t3), followed by an aggregation and an ORDER BY with LIMIT (top-n).
[Figure: single-node plan tree with nodes Agg, HashJoin, HashJoin, Scan: t1, Scan: t2, Scan: t3]
SELECT t1.custid, SUM(t2.revenue) AS revenue
FROM LargeHdfsTable t1
JOIN LargeHdfsTable t2 ON (t1.id1 = t2.id)
JOIN SmallHbaseTable t3 ON (t1.id2 = t3.id)
WHERE t3.category = 'Online'
GROUP BY t1.custid
ORDER BY revenue DESC LIMIT 10;
Query Planning: Distributed Nodes
• The second planning phase takes the single-node plan as input and produces a distributed execution plan. Goals:
  • Minimize data movement
  • Maximize scan locality, as remote reads are considerably slower than local ones
• Cost-based decision based on column stats and the estimated cost of data transfers
• Decide the parallel join strategy:
  • Broadcast join: the join is collocated with the left-hand side input; the right-hand side table is broadcast to each node executing the join. Preferred for small right-hand side input.
  • Partitioned join: both tables are hash-partitioned on the join columns. Preferred for large joins.
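The choice between the two join strategies can be sketched as a comparison of estimated network transfer. The cost formulas below are a simplified assumption made for illustration (broadcast ships the right-hand side to every node, partitioning ships each row of both inputs once); the function names are hypothetical and the real planner's cost model is more detailed.

```python
import zlib


def choose_join_strategy(lhs_bytes, rhs_bytes, num_nodes):
    """Pick the strategy with the smaller estimated data transfer.

    Simplified, assumed cost model:
      broadcast   -> the right-hand side is copied to every join node
      partitioned -> both inputs are reshuffled once across the network
    """
    broadcast_cost = rhs_bytes * num_nodes
    partitioned_cost = lhs_bytes + rhs_bytes
    return "broadcast" if broadcast_cost <= partitioned_cost else "partitioned"


def target_node(join_key, num_nodes):
    """Hash-partitioning: rows with equal join keys land on the same node."""
    return zlib.crc32(str(join_key).encode("utf-8")) % num_nodes


# A small right-hand side favours broadcast; two large inputs favour partitioning.
print(choose_join_strategy(lhs_bytes=10**12, rhs_bytes=10**6, num_nodes=10))
print(choose_join_strategy(lhs_bytes=10**12, rhs_bytes=10**12, num_nodes=10))
```

In a partitioned join, `target_node` would be applied to the join column of both tables, so matching rows from each side meet on the same node.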
Back-End
Executing the Query
• Impala's backend receives query fragments from the front-end and is responsible for their execution.
• High performance:
  • Written in C++ for minimal execution overhead
  • Internal in-memory tuple format puts fixed-width data at fixed offsets
  • Uses intrinsic/special CPU instructions for tasks like text parsing and CRC computation
• Runtime code generation for “big loops”
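To make the "fixed-width data at fixed offsets" idea concrete, here is a small Python sketch using the standard `struct` module. The three-column schema and the offsets are assumptions chosen for the example; this is not Impala's actual tuple layout, which is implemented in C++.

```python
import struct

# Hypothetical schema: custid INT32, revenue FLOAT64, active BOOL.
# "<" means little-endian with no padding, so every field has a known offset:
# custid at 0, revenue at 4, active at 12; each row is 13 bytes.
ROW = struct.Struct("<id?")
REVENUE_OFFSET = 4


def pack_rows(rows):
    """Write rows back to back into one contiguous buffer."""
    buf = bytearray(ROW.size * len(rows))
    for i, row in enumerate(rows):
        ROW.pack_into(buf, i * ROW.size, *row)
    return buf


def read_revenue(buf, row_index):
    """Read one column directly by offset, without decoding the whole row."""
    offset = row_index * ROW.size + REVENUE_OFFSET
    return struct.unpack_from("<d", buf, offset)[0]


buf = pack_rows([(1, 10.5, True), (2, 99.25, False)])
print(read_revenue(buf, 1))
```

Because every field sits at a known offset, reading a column is plain address arithmetic with no per-row parsing, which is what makes this kind of layout cheap to scan.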
Runtime Code Generation
Impala uses runtime code generation to produce query-specific versions of functions that are critical to performance.
• For example, to convert every record to Impala’s in-memory tuple format:
  • Known at query compile time: number of tuples in a batch, tuple layout, column types, etc.
  • Generated at compile time: an unrolled loop that inlines all function calls, eliminates dead code, and minimizes branches.
• Code is generated using LLVM
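The idea can be illustrated with a loose Python analogy. Impala emits machine code through LLVM; the sketch below instead generates Python source with `exec`, purely to show how knowing the column types up front lets the per-column loop and type dispatch disappear. All names here are hypothetical.

```python
CONVERTER_NAMES = {"int": "int", "float": "float", "string": "str"}
CONVERTERS = {"int": int, "float": float, "string": str}


def generic_parse(fields, column_types):
    """Interpreted path: loops and looks up a converter for every column."""
    return tuple(CONVERTERS[t](raw) for raw, t in zip(fields, column_types))


def generate_parser(column_types):
    """Emit and compile a parser specialized for one fixed column layout.

    The loop over columns is unrolled and the type dispatch is resolved
    once, when the parser is generated, rather than once per record.
    """
    exprs = [f"{CONVERTER_NAMES[t]}(fields[{i}])" for i, t in enumerate(column_types)]
    source = "def parse(fields):\n    return (" + ", ".join(exprs) + ",)\n"
    namespace = {}
    exec(source, namespace)
    return namespace["parse"], source


parse, source = generate_parser(["int", "float", "string"])
print(source)
print(parse(["7", "3.5", "Online"]))
```

The generated function contains no loop and no dictionary lookup, only the three conversions this particular layout needs, which mirrors the "unrolled loop, fewer branches" goal described above.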
Evaluation
Comparison of query response times on single-user runs.
Comparison of query response times and throughput on multi-user runs.
Comparison of the performance of Impala and a commercial analytic RDBMS.
https://github.com/cloudera/impala-tpcds-kit
Comparison with Spark SQL
• Impala is faster than Spark SQL because it is an engine designed specifically for interactive SQL over HDFS, and its architecture includes concepts that help it achieve that.
• For example, Impala's ‘always-on’ daemons are up and waiting for queries 24/7, something that is not part of Spark SQL.
Thank you!