Impala - University of...

Impala: A Modern, Open-Source SQL Engine for Hadoop

Yogesh Chockalingam

Agenda

• Introduction
• Architecture
• Front-End
• Back-End
• Evaluation
• Comparison with Spark SQL

Introduction

Why not use Hive or HBase?

• HBase is a NoSQL database that runs on top of HDFS and provides real-time read/write access.

• Hive is a data warehousing tool built on top of Hadoop that uses the Hive Query Language (HQL) to query data stored in a Hadoop cluster.

• Hive automatically translates HQL queries into MapReduce jobs.

• Hive doesn’t support transactions.

Impala

• General-purpose SQL query engine:
  • Works across analytical and transactional workloads

• High performance:
  • Execution engine written in C++
  • Runs directly within Hadoop
  • Does not use MapReduce

• MPP database support:
  • Multi-user workloads

Creating tables

CREATE TABLE T (...)
PARTITIONED BY (day int, month int)
LOCATION '<hdfs-path>'
STORED AS PARQUET;

For a partitioned table, data is placed in subdirectories whose paths reflect the partition columns' values. For example, for day 17, month 2 of table T, all data files would be located in

<root>/day=17/month=2/
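The directory convention above can be sketched in a few lines of Python. This is only an illustration of the naming scheme: the helper `partition_dir` and the table root `/warehouse/T` are hypothetical names, since the slide leaves the real root as `<root>`.

```python
from pathlib import PurePosixPath


def partition_dir(table_root, partition_values):
    """Build the subdirectory for one partition: <root>/col1=v1/col2=v2/."""
    path = PurePosixPath(table_root)
    # dicts preserve insertion order, so columns appear in the order given
    for column, value in partition_values.items():
        path = path / f"{column}={value}"
    return str(path) + "/"


# Hypothetical table root; the partition columns follow the slide's example.
print(partition_dir("/warehouse/T", {"day": 17, "month": 2}))
```

Because the partition values are encoded in the path itself, a query that filters on day and month only needs to read the matching subdirectories.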

Metadata

• Table metadata, including the table definition, column names, data types, and schema, is stored in HCatalog.

INSERT / UPDATE / DELETE

• The user can add data to a table simply by copying or moving data files into the table's directory!

• Does NOT support UPDATE and DELETE.
  • This is a limitation of HDFS, which does not support in-place updates.
  • Workaround: recompute the values and replace the data in the affected partitions.

• Run COMPUTE STATS <table> after inserts.
  • The statistics are subsequently used during query optimization.
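The "recompute and replace" workaround can be pictured with a small in-memory model. This is a sketch under stated assumptions, not Impala code: partitions are modelled as a plain dict keyed by (day, month), and `rewrite_partition` is a hypothetical helper. In Impala SQL the same idea is usually expressed by overwriting a partition with recomputed rows.

```python
def rewrite_partition(partitions, key, transform):
    """Emulate UPDATE/DELETE by recomputing one partition and swapping it in whole.

    transform(row) returns the new row, or None to drop the row.
    The input mapping is left untouched, mirroring immutable data files.
    """
    recomputed = []
    for row in partitions[key]:
        new_row = transform(row)
        if new_row is not None:
            recomputed.append(new_row)
    updated = dict(partitions)
    updated[key] = recomputed  # the old partition contents are replaced entirely
    return updated


# Hypothetical data for partition day=17, month=2.
table = {(17, 2): [{"id": 1, "revenue": 10}, {"id": 2, "revenue": 20}]}


def double_and_drop(row):
    if row["id"] == 2:      # "DELETE" row 2
        return None
    return {**row, "revenue": row["revenue"] * 2}  # "UPDATE" the rest


new_table = rewrite_partition(table, (17, 2), double_and_drop)
print(new_table[(17, 2)])
```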

Architecture

I: Impala Daemon

The Impala daemon service is dually responsible for:

1. Accepting queries from client processes and orchestrating their execution across the cluster. In this role it’s called the query coordinator.

2. Executing individual query fragments on behalf of other Impala daemons.

• The Impala daemons are in constant communication with the statestore to confirm which nodes are healthy and can accept new work.

• They also receive broadcast messages from the catalog daemon via the statestore to keep track of metadata changes.

[Figure: the Catalog and Statestore services alongside the Impala daemons]

II: Statestore Daemon

• Handles cluster membership information.
• Periodically sends two kinds of messages to Impala daemons:
  • Topic update: the changes made since the last topic update message
  • Keepalive: a heartbeat mechanism

• If an Impala daemon goes offline, the statestore informs all the other Impala daemons so that future queries can avoid making requests to the unreachable node.
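The two message kinds can be illustrated with a toy membership tracker. This is a simplified sketch, not Impala's actual protocol: the class and method names, the explicit `now` timestamps, and the timeout value are all assumptions made for the example.

```python
class Statestore:
    """Toy model of membership tracking with keepalives and delta updates."""

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s
        self.last_seen = {}   # daemon -> time of its last answered keepalive
        self.delivered = {}   # daemon -> number of changes already sent to it
        self.changes = []     # ordered log of membership changes

    def register(self, daemon, now):
        self.last_seen[daemon] = now
        self.delivered[daemon] = 0
        self.changes.append(("joined", daemon))

    def keepalive_ack(self, daemon, now):
        # Record that the daemon answered a keepalive (heartbeat).
        if daemon in self.last_seen:
            self.last_seen[daemon] = now

    def expire(self, now):
        # Daemons that missed keepalives for too long are declared offline.
        for daemon, seen in list(self.last_seen.items()):
            if now - seen > self.timeout_s:
                del self.last_seen[daemon]
                del self.delivered[daemon]
                self.changes.append(("left", daemon))

    def topic_update(self, daemon):
        # Send only the changes made since this daemon's last topic update.
        start = self.delivered[daemon]
        self.delivered[daemon] = len(self.changes)
        return self.changes[start:]


store = Statestore(timeout_s=10.0)
store.register("impalad-a", now=0.0)
store.register("impalad-b", now=0.0)
first = store.topic_update("impalad-a")
store.keepalive_ack("impalad-a", now=8.0)
store.expire(now=12.0)            # impalad-b never answered a keepalive
second = store.topic_update("impalad-a")
print(first, second)
```

After the second topic update, the surviving daemon knows the other node has left and can stop routing query fragments to it.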

III: Catalog Daemon

• Impala's catalog service serves catalog metadata to Impala daemons via the statestore broadcast mechanism, and executes DDL operations on behalf of Impala daemons.

• The catalog service pulls information from the Hive Metastore and aggregates that information into an Impala-compatible catalog structure.

• This structure is then passed on to the statestore daemon, which communicates with the Impala daemons.

1. Request arrives from client via Thrift API.

[Figure: SQL apps send a SQL request over ODBC to the Impala daemons; Hive Metastore, HDFS NN, and Statestore are also shown]

2. Planner turns request into collections of plan fragments. Coordinator initiates execution on remote Impala daemons.

3. Intermediate results are streamed between Impala daemons. Query results are streamed back to client.

[Figure: SQL app and ODBC client; Impala daemon components Query Planner, Query Coordinator, and Query Executor, with HDFS DN and HBase; query results returned to the client]

Front-End

Query Plans

• The Impala front-end is responsible for compiling SQL text into query plans executable by the Impala back-ends.

• The query compilation process proceeds as follows:
  • Query parsing
  • Semantic analysis
  • Query planning/optimization

• Query planning:
  1. Single-node planning
  2. Plan parallelization and fragmentation

Query Planning: Single Node

• In the first phase, the parse tree is translated into a non-executable single-node plan tree.

E.g., a query joining two HDFS tables (t1, t2) and one HBase table (t3), followed by an aggregation and an ORDER BY with LIMIT (top-n).

[Figure: single-node plan tree with nodes Agg, HashJoin, HashJoin, Scan: t1, Scan: t2, Scan: t3]

SELECT t1.custid, SUM(t2.revenue) AS revenue
FROM LargeHdfsTable t1
JOIN LargeHdfsTable t2 ON (t1.id1 = t2.id)
JOIN SmallHbaseTable t3 ON (t1.id2 = t3.id)
WHERE t3.category = 'Online'
GROUP BY t1.custid
ORDER BY revenue DESC LIMIT 10;
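A single-node plan for this query can be written down as a small tree. The sketch below is an assumption-laden illustration: `PlanNode` is a hypothetical class, and the nesting (top-n over aggregation over two hash joins) is read off the query text rather than taken from the planner, which may order the joins differently.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PlanNode:
    label: str
    children: List["PlanNode"] = field(default_factory=list)

    def render(self, depth=0):
        """Return the tree as indented lines, parents before children."""
        lines = ["  " * depth + self.label]
        for child in self.children:
            lines.extend(child.render(depth + 1))
        return lines


# Assumed shape: scans at the leaves, joins above them, then Agg, then top-n.
plan = PlanNode("TopN: ORDER BY revenue DESC LIMIT 10", [
    PlanNode("Agg: SUM(t2.revenue) GROUP BY t1.custid", [
        PlanNode("HashJoin: t1.id2 = t3.id", [
            PlanNode("HashJoin: t1.id1 = t2.id", [
                PlanNode("Scan: t1"),
                PlanNode("Scan: t2"),
            ]),
            PlanNode("Scan: t3 (category = 'Online')"),
        ]),
    ]),
])

print("\n".join(plan.render()))
```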

Query Planning: Distributed Nodes

• The second planning phase takes the single-node plan as input and produces a distributed execution plan. Goals:
  • Minimize data movement
  • Maximize scan locality, as remote reads are considerably slower than local ones

• Cost-based decisions based on column stats and the estimated cost of data transfers

• Decide the parallel join strategy:
  • Broadcast join: the join is collocated with the left-hand side input; the right-hand side table is broadcast to each node executing the join. Preferred for a small right-hand side input.
  • Partitioned join: both tables are hash-partitioned on the join columns. Preferred for large joins.
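The choice between the two join strategies can be sketched as a comparison of estimated network transfer. The cost model here is a deliberate simplification and an assumption of this example (broadcast ships the right-hand side to every node; partitioning reshuffles both inputs once); `choose_join_strategy` is a hypothetical helper, not an Impala function.

```python
def choose_join_strategy(lhs_bytes, rhs_bytes, num_nodes):
    """Pick the join strategy with the lower estimated data transfer."""
    # Broadcast: the right-hand side is copied to every node running the join.
    broadcast_cost = rhs_bytes * num_nodes
    # Partitioned: both inputs are hash-partitioned and sent across the network.
    partitioned_cost = lhs_bytes + rhs_bytes
    return "broadcast" if broadcast_cost <= partitioned_cost else "partitioned"


GB = 1024 ** 3
MB = 1024 ** 2

# Small right-hand side: broadcasting 10 MB to 10 nodes is cheap.
print(choose_join_strategy(100 * GB, 10 * MB, num_nodes=10))
# Two large inputs: broadcasting 50 GB to 10 nodes costs more than one shuffle.
print(choose_join_strategy(100 * GB, 50 * GB, num_nodes=10))
```

The estimated input sizes would come from table and column statistics, which is one reason the slides recommend running COMPUTE STATS after inserts.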

Back-End

Executing the Query

• Impala's back-end receives query fragments from the front-end and is responsible for their execution.

• High performance:
  • Written in C++ for minimal execution overhead
  • Internal in-memory tuple format puts fixed-width data at fixed offsets
  • Uses intrinsic/special CPU instructions for tasks like text parsing and CRC computation
  • Runtime code generation for “big loops”
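Why fixed offsets help can be shown with Python's `struct` module. This is an illustration of the idea only, not Impala's real tuple format: the three-column layout (int32 `custid`, float64 `revenue`, one-byte flag) is a hypothetical example.

```python
import struct

# Hypothetical tuple layout with no padding: int32, float64, 1-byte bool.
TUPLE = struct.Struct("<id?")
REVENUE_OFFSET = 4  # revenue starts right after the 4-byte custid


def pack_batch(rows):
    """Lay out rows back to back, each occupying TUPLE.size bytes."""
    return b"".join(TUPLE.pack(*row) for row in rows)


def read_revenue(batch, row_index):
    """Read one column directly by offset arithmetic, without parsing the row."""
    offset = row_index * TUPLE.size + REVENUE_OFFSET
    return struct.unpack_from("<d", batch, offset)[0]


batch = pack_batch([(1, 10.5, True), (2, 20.25, False)])
print(TUPLE.size, read_revenue(batch, 1))
```

Since every fixed-width column sits at a known byte offset, a column value is found with one multiplication and one addition rather than by scanning the record.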

Runtime Code Generation

Impala uses runtime code generation to produce query-specific versions of functions that are critical to performance.

• For example, to convert every record to Impala’s in-memory tuple format:
  • Known at query compile time: number of tuples in a batch, tuple layout, column types, etc.
  • Generated at query compile time: an unrolled loop that inlines all function calls, eliminates dead code, and minimizes branches.

• Code is generated using LLVM.
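The idea can be imitated in Python, with the caveat that this is an analogy: Impala generates native code through LLVM, whereas the sketch below builds Python source text and compiles it with `exec`. The function names and the three supported column types are assumptions of the example.

```python
def parse_generic(fields, column_types):
    """Interpreted version: branches on the column type for every field."""
    out = []
    for text, col_type in zip(fields, column_types):
        if col_type == "int":
            out.append(int(text))
        elif col_type == "float":
            out.append(float(text))
        else:
            out.append(text)
    return tuple(out)


def generate_parser(column_types):
    """Emit a parser specialized to one schema: loop unrolled, no type branches."""
    exprs = []
    for i, col_type in enumerate(column_types):
        if col_type == "int":
            exprs.append(f"int(fields[{i}])")
        elif col_type == "float":
            exprs.append(f"float(fields[{i}])")
        else:
            exprs.append(f"fields[{i}]")
    body = "".join(f"{expr}, " for expr in exprs)
    source = f"def parse(fields):\n    return ({body})\n"
    namespace = {}
    exec(source, namespace)  # compile the query-specific function
    return namespace["parse"], source


schema = ["int", "string", "float"]  # known once the query is compiled
parse, source = generate_parser(schema)
print(source)
print(parse(["7", "abc", "2.5"]))
```

The generated function does the same work as `parse_generic` but decides the per-column conversion once, when the schema is known, instead of once per field per row.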

Evaluation

Comparison of query response times on single-user runs.

Comparison of query response times and throughput on multi-user runs.

Comparison of the performance of Impala and a commercial analytic RDBMS.

https://github.com/cloudera/impala-tpcds-kit

Comparison with Spark SQL

• Impala is faster than Spark SQL for interactive queries because it is an engine designed specifically for interactive SQL over HDFS, and its architecture reflects that goal.

• For example, Impala's “always-on” daemons are up and waiting for queries 24/7, which is not part of Spark SQL.

Thank you!
