Query Processing


Query Processing

Select avg(R.a), R.b
From R, S
Where R.c = S.a.m() and R.c > 5
Group by R.b;

Compile-time pipeline:

Parser → Parse tree → Semantic Checker → Query block graph → Type checker → Query block graph → Query Rewrite → Query block graph → Optimizer → Query block plan graph

Query Processing

Select avg(R.a), R.b
From R, View
Where R.c = View.a.m() and R.c > 5
Group by R.b;

Run-time: Execution plan → Execution Engine → result records

45, “t1”
76, “t2”
10, “t3”

Parse Tree

RelViewClass: from, selection, projection, group by, having, order by

Query Block Graph

Before type checking:

• GBY block
  – Proj: Avg(R.a), R.b
  – Gby: R.b
  – Agg: Avg(R.a)
• SPJ block
  – Proj: R.a, R.b
  – Sel: R.c = View.a.m() and R.c > 5
  – Access to R
• SPJ block (View)
  – Proj: S.a
  – Access to S

Query Block Graph

After type checking:

• GBY block
  – Proj: Avg(0(1)), 0(2)
  – Gby: 0(2)
  – Agg: Avg(0(1))
• SPJ block
  – Proj: 0(1), 0(2)
  – Sel: 0(3) = 1(1).m() and 0(3) > 5
  – Access to R
• SPJ block (View)
  – Proj: 0(1)
  – Access to S

Query Block Graph

After query rewrite:

• GBY block
  – Proj: Avg(0(1)), 0(2)
  – Gby: 0(2)
  – Agg: Avg(0(1))
• SPJ block
  – Proj: 0(1), 0(2)
  – Sel: 0(3) = 1(1).m() and 0(3) > 5
  – Access to R
  – Access to S

Execution Plan

• Executor
• GbyPlanOp
  – Proj: Avg(0(1)), 0(2)
  – Gby: 0(2)
  – Agg: Avg(0(1))
• SMJoinPlanOp
  – Proj: 0(1), 0(2)
  – Sel: 0(3) = 1(1).m() and 0(3) > 5
  – Access to R (indexed): Stored Relation R
  – Access to S (sequential): Stored Relation S

Query Processing modules

• Types: values, records, schema
• Expressions
  – Types and Expressions are independent of relational query processing
• Relation
• Storage manager
• Catalog Manager
• Indexes
• Parser
• Optimizer
• Execution Engine

Types and Records

• Date (Type)
• Specific Date Type
• Date Value
• M/D/Cye (Meta-Info)
• ValuesArray: […, Date Value, …]
• Record offset
• ADT array
• RecordSchema: […, Att i, …]

Abstract Data Types (fields)

• Identifier: index in ADT array
• Type Name
• MethArray: array of scalar methods
• AggrMethod: array of aggregate methods
• MaintainsCatalog: is there meta-information to be stored in the catalog?
• StoreInField: is the value stored in place in a record?

Abstract Data Types (some methods)

• MaxObjectSize()
• TypeCopy()
• Equals()
• ReadText(), WriteText()
• GetMetaInfo()
• CreateStatsInfo()
• FuncTypeCheck()
• FuncOptimize()
• FuncReorganize()
• GetMethByIndex()
• CastCheck()
• CreateEnv()

ADT Values

• Data & Behavior
  – Using programming language objects
    • Values are instances of type classes
    • Need for serialization mechanism to translate from in-memory to on-disk data representation
  – Using a specific mechanism (Predator approach)
    • Predator ADT Values are not instances of ADT classes
    • Data representation is similar in-memory and on-disk
    • Type information is more than behavior and storage management:
      – Optimization
      – Catalog Management

ADT Values

• Header:
  – 4 bytes of flags: is null, little-endian/big-endian, …
• Value
• Padding:
  – For alignment purposes (value length must be a multiple of 8 bytes)

Layout: header | value | padding
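
The header, value, and padding layout can be sketched as below. This is a minimal illustration, not Predator's actual code: the names `PaddedLength`, `PackValue`, and `ReadFlags`, the flag bit assignments, and the reading that header plus value together are rounded up to a multiple of 8 bytes are all assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical flag bits for the 4-byte header (assumed, not from the slides).
constexpr std::uint32_t kFlagIsNull = 1u << 0;
constexpr std::uint32_t kFlagBigEndian = 1u << 1;

constexpr std::size_t kHeaderBytes = 4;
constexpr std::size_t kAlignment = 8;

// Total stored length: header + value, rounded up to a multiple of 8 (assumed rule).
constexpr std::size_t PaddedLength(std::size_t valueBytes) {
    return (kHeaderBytes + valueBytes + kAlignment - 1) / kAlignment * kAlignment;
}

// Packs flags and payload into one buffer laid out as header | value | padding.
std::vector<unsigned char> PackValue(std::uint32_t flags, const void* value,
                                     std::size_t valueBytes) {
    std::vector<unsigned char> buf(PaddedLength(valueBytes), 0);  // padding stays zero
    std::memcpy(buf.data(), &flags, kHeaderBytes);
    if (valueBytes > 0) {
        std::memcpy(buf.data() + kHeaderBytes, value, valueBytes);
    }
    return buf;
}

// Reads the header flags back from a packed buffer.
std::uint32_t ReadFlags(const std::vector<unsigned char>& buf) {
    assert(buf.size() >= kHeaderBytes);
    std::uint32_t flags = 0;
    std::memcpy(&flags, buf.data(), kHeaderBytes);
    return flags;
}
```

Because the same bytes serve in memory and on disk, a buffer built this way could be written out without a separate serialization step, which is the point the slides make about the Predator approach.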

Methods

• To register a function with an ADT
  – XxxFuncMethodInfo
    • Fields: ArgInfo, ArgTypes
    • Methods: Constructor, Matches
  – XxxAggrMethodInfo
• To represent a function in the parse tree:
  – XxxFuncParseInfo
• To represent a function in the execution plan:
  – XxxFuncPlanInfo
    • Evaluate(XxxValueEnv *Env, XxxFuncMethodInfo *ThisMethInfo, XxxADTValue *ReturnValue)

Record and Record Schema

• Record
  – GetField(int position, RecordSchema* Schema, char*& Field)
• Record Schema
  – GetAttribute
    • Name
    • Type
    • Meta-information
  – GetOffset
    • In record structure
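
A minimal sketch of how a record and its schema might cooperate: the schema knows each attribute's offset, and `GetField` uses it to return a pointer into the record's bytes. Only `GetField`, `GetAttribute`, and `GetOffset` come from the slide; `Attribute`, `AddAttribute`, `RecordSize`, and the fixed-size attribute layout are assumptions for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// One attribute of a schema: name, type name, and byte offset in the record.
struct Attribute {
    std::string name;
    std::string typeName;
    std::size_t offset;
};

class RecordSchema {
public:
    // Hypothetical helper: appends a fixed-size attribute after the previous one.
    void AddAttribute(const std::string& name, const std::string& typeName,
                      std::size_t sizeBytes) {
        attrs_.push_back({name, typeName, recordSize_});
        recordSize_ += sizeBytes;
    }
    const Attribute& GetAttribute(int position) const { return attrs_.at(position); }
    std::size_t GetOffset(int position) const { return attrs_.at(position).offset; }
    std::size_t RecordSize() const { return recordSize_; }

private:
    std::vector<Attribute> attrs_;
    std::size_t recordSize_ = 0;
};

class Record {
public:
    explicit Record(const RecordSchema& schema) : bytes_(schema.RecordSize(), 0) {}

    // Points Field at the bytes of the attribute at the given position.
    void GetField(int position, const RecordSchema* schema, char*& field) {
        field = bytes_.data() + schema->GetOffset(position);
    }

private:
    std::vector<char> bytes_;
};
```

The record itself carries no type information in this sketch; everything needed to interpret the bytes lives in the schema, which matches the separation the slide draws between Record and Record Schema.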

Expressions

Unknown Value (fields)

• Name of the attribute

• Index
  – Source: child block
  – Attribute Index: position in child block

• Correlation Height

Unknown Value (some methods)

• Resolve Variables / Update Unknowns
  – Manipulate index for source and attribute
• Get Dependencies / Redirect Dependencies
  – Initialize dependency bitmap structure
• Match
  – Checks whether function expression matches a given expression; matching information is returned
• Evaluate
  – Extracts the ADT Value from position AttributeIndex in SourceIndex child record
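
The Evaluate step can be illustrated with a small sketch. It assumes that a reference written `0(3)` in the query block graphs means source index 0, attribute index 3, with attribute positions counted from 1; that reading, the `Value` stand-in type, and the `IsResolved` helper are assumptions, not part of the slides.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Stand-in for an ADT value; the real system stores typed binary values.
using Value = std::string;
using ChildRecord = std::vector<Value>;

// An unknown value: a named attribute reference resolved to a child block.
struct UnknownValue {
    std::string name;       // attribute name, e.g. "R.c"
    int sourceIndex;        // which child block supplies the value
    int attributeIndex;     // 1-based position inside that child's record (assumed)
    int correlationHeight;  // kept for completeness; unused in this sketch

    // Hypothetical helper: true once both indexes have been filled in.
    bool IsResolved() const { return sourceIndex >= 0 && attributeIndex >= 1; }

    // Extracts the value at attributeIndex from the sourceIndex child record.
    const Value& Evaluate(const std::vector<ChildRecord>& children) const {
        assert(IsResolved());
        return children.at(sourceIndex).at(attributeIndex - 1);
    }
};
```

With one current record per child block, a reference such as `0(3)` reads the third field of the first child's record and `1(1)` reads the first field of the second child's record.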

Expression & Plan

• Function Expression
  – FuncParseInfo
  – Owner, Arg
  – Update Unknowns, GetDependencies, Match
  – Optimize
• Function Plan
  – FuncPlanInfo
  – OwnerPlan, ArgPlan
  – evalOwner, evalArgs
  – Evaluate

Relational Query Processing

• Relational ADT
  – No ADT methods defined
• Data Engine
  – Relation
  – Storage Manager
  – Catalog Manager
• Query Processing Engine
  – Parse
  – Optimize
  – Execute

Relation

• Relation Type
  – Stored: Shore File Relation
    • matchIndex (Expression, MatchedInfo)
    • chooseAccessPath
    • Add to / delete from index
  – Derived
• Access
  – Indexed access
  – Sequential access:
    • Init cursor, next item, close cursor
    • Delete record, insert record, update record

Relation

• Relation subclassed as ShoreFileRelation, DerivedRelation, …

• RelImplInfo
  – IndexImplInfo
  – IndexList
  – Stored / derived
• RelStatsInfo
  – Cardinality, average tuple size
• RelCatalogInfo
  – Relation name
  – Record schema

Storage Manager

• Storage Structures
  – Create, mount, delete device
  – Create, delete file
  – Insert, update, delete object
  – Iterator, get, pin object
• Transaction support
  – Begin, commit, abort
• Indexes
  – Btree, Rtree
  – Clustered / Unclustered
• Sorting
  – Sort File
    • Problem: how to pass expression used for sorting
• Threads
  – Thread model
    • Preemptive scheduling
    • Non-preemptive scheduling
  – Synchronization primitives

Catalog Manager

• Catalog relations
  – _STABLES(tablename, arity)
  – _SATTRS(tablename, attname, attindex, atttype, key?, attmetainfo)
  – _SINDXS(tablename, indexname, indexexpression)
  – _SSTATS
  – _SATSTATS
• One catalog per storage device
  – GetCatalogRel: Bootstrapping problem

Indexes

• Index Class (superclass of ShoreBTree and ShoreRTree)
  – Type Id
  – Index range: get range (given an Expression), merge ranges
  – Create / delete index
  – Insert, delete, get entry
  – Match (specific to Index class)
• IndexImplInfo associated with a relation
  – Index Type
  – Index expression
  – Match

SQL Parser

• Flex (tokenizer) / Bison (grammar)
• Interaction with:
  – Expressions
  – Types
  – Data Engine
    • Insert, Update, Delete Record into Relation
    • Create, Delete Index
    • Create, Delete Relation
  – Query Engine
    • Store View
    • Exec Query
• Generates Parse Tree

Semantic Checker

• Creates Query Graph
  – SPJ block
  – One block per relation in the From clause
  – Views are expanded
  – Aggregate block is added
  – If needed, an SPJ block is added at the root
• Verifies conditions on SQL input
  – Targets with similar names, aggregates in where clause, grouping without aggregates, same expression in aggregate and to group on, …

Type Checker

• Traverses the Query Block graph
  – Bottom-up then top-down
• Relies on
  – Query graph structure
  – Methods defined for Unknown Variables
    • Resolve Variables

Query Rewrite Rules

• Rule Engine
  – Vector list of rules
  – Execute rules on one downward and one upward pass
• Rules
  – Manipulation across query blocks
    • Pushing projections, selections
    • Merging query blocks
    • Eliminating distinct clauses
  – Each rule is a class that implements the following method
    • ApplyRule(RelQueryNode *In, XxxBool &Success, RelQueryNode *Out)
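
A rule engine of this shape can be sketched as follows. The sketch simplifies the slide's `ApplyRule` signature to an in-place rewrite with a success flag, and the node fields (`kind`, `distinct`, `duplicateFree`) and the toy `EliminateDistinct` rule are hypothetical stand-ins, not Predator's real classes.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified query block node.
struct RelQueryNode {
    std::string kind;            // e.g. "SPJ", "GBY"
    bool distinct = false;       // block has a DISTINCT clause
    bool duplicateFree = false;  // output is already known to have no duplicates
    std::vector<std::unique_ptr<RelQueryNode>> children;
};

class RewriteRule {
public:
    virtual ~RewriteRule() = default;
    // Simplified from the slide: rewrites the node in place, reports success.
    virtual void ApplyRule(RelQueryNode* node, bool& success) = 0;
};

// Toy rule: DISTINCT is redundant when the output is already duplicate-free.
class EliminateDistinct : public RewriteRule {
public:
    void ApplyRule(RelQueryNode* node, bool& success) override {
        success = node->distinct && node->duplicateFree;
        if (success) node->distinct = false;
    }
};

class RuleEngine {
public:
    void AddRule(std::unique_ptr<RewriteRule> rule) { rules_.push_back(std::move(rule)); }

    // One downward pass, then one upward pass; returns successful applications.
    int Run(RelQueryNode* root) {
        int applied = 0;
        Downward(root, applied);
        Upward(root, applied);
        return applied;
    }

private:
    void ApplyAll(RelQueryNode* node, int& applied) {
        for (auto& rule : rules_) {
            bool success = false;
            rule->ApplyRule(node, success);
            if (success) ++applied;
        }
    }
    void Downward(RelQueryNode* node, int& applied) {
        ApplyAll(node, applied);  // parent first
        for (auto& child : node->children) Downward(child.get(), applied);
    }
    void Upward(RelQueryNode* node, int& applied) {
        for (auto& child : node->children) Upward(child.get(), applied);
        ApplyAll(node, applied);  // children first
    }
    std::vector<std::unique_ptr<RewriteRule>> rules_;
};
```

Keeping each rule in its own class means a new rewrite can be added by registering one more object with the engine, without touching the traversal code.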

Optimizer

• Predicates
  – Array of predicate plans, predicate dependencies and predicate selectivities
  – Init, Selection, Join, Residual bitmaps
• Query Block Plan
  – Redirect dependencies
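
The slides do not say how the Init, Selection, Join, and Residual bitmaps are filled. One plausible reading, sketched below purely as an assumption, classifies each predicate by how many child blocks its dependency bitmap touches: none, one, two, or more. The type names and size limits are hypothetical.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t kMaxChildren = 32;    // assumed limit on child blocks
constexpr std::size_t kMaxPredicates = 64;  // assumed limit on predicates

// Bit i set: the predicate reads attributes of child block i.
using DependencyBitmap = std::bitset<kMaxChildren>;
// Bit p set: predicate p belongs to this class.
using PredicateBitmap = std::bitset<kMaxPredicates>;

struct PredicateClasses {
    PredicateBitmap init;       // depends on no child (assumed meaning)
    PredicateBitmap selection;  // depends on exactly one child
    PredicateBitmap join;       // depends on exactly two children
    PredicateBitmap residual;   // depends on three or more children
};

// Classifies each predicate by the number of children it depends on.
PredicateClasses Classify(const std::vector<DependencyBitmap>& deps) {
    assert(deps.size() <= kMaxPredicates);
    PredicateClasses out;
    for (std::size_t p = 0; p < deps.size(); ++p) {
        switch (deps[p].count()) {
            case 0: out.init.set(p); break;
            case 1: out.selection.set(p); break;
            case 2: out.join.set(p); break;
            default: out.residual.set(p); break;
        }
    }
    return out;
}
```

For the running query, `0(3) = 1(1).m()` touches children 0 and 1 and would land in the join bitmap, while `0(3) > 5` touches only child 0 and would land in the selection bitmap.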

Optimizers

• Simple
  – Naive
    • Join order fixed by order in the from clause. Generates a single N-way SPJ node.
  – Greedy
    • Join order based on cardinality of intermediate relations. Generates a left-deep pipeline of two-way joins.
• Cost based
  – Simplified KBZ
    • Tries each relation as outermost relation and compares cost. Generates a left-deep pipeline of two-way joins.
  – Dynamic Programming
    • System R like enumeration of join space and pruning. Generates a left-deep pipeline of two-way joins.
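
The greedy strategy can be sketched as below. The cardinality estimate used here (product of input cardinalities times pairwise selectivities), the choice to start from the smallest relation, and the names `JoinInput` and `GreedyJoinOrder` are assumptions for illustration; the slides only state that the order is driven by the cardinality of intermediate relations.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct JoinInput {
    double cardinality;  // estimated number of records
};

// sel[i][j]: assumed selectivity of the join predicate between inputs i and j
// (1.0 when there is no predicate, i.e. a cross product).
// Returns input indexes in left-deep join order.
std::vector<std::size_t> GreedyJoinOrder(const std::vector<JoinInput>& rels,
                                         const std::vector<std::vector<double>>& sel) {
    const std::size_t n = rels.size();
    std::vector<std::size_t> order;
    if (n == 0) return order;
    std::vector<bool> used(n, false);

    // Start from the smallest input (an assumption of this sketch).
    std::size_t first = 0;
    for (std::size_t i = 1; i < n; ++i) {
        if (rels[i].cardinality < rels[first].cardinality) first = i;
    }
    order.push_back(first);
    used[first] = true;
    double current = rels[first].cardinality;

    // Repeatedly add the input that yields the smallest intermediate result.
    while (order.size() < n) {
        std::size_t best = n;
        double bestCard = 0.0;
        for (std::size_t c = 0; c < n; ++c) {
            if (used[c]) continue;
            double card = current * rels[c].cardinality;
            for (std::size_t j : order) card *= sel[j][c];
            if (best == n || card < bestCard) {
                best = c;
                bestCard = card;
            }
        }
        order.push_back(best);
        used[best] = true;
        current = bestCard;
    }
    return order;
}
```

Each step joins the running intermediate result with one more input, so the output order maps directly onto a left-deep pipeline of two-way joins.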

SPJ Naive Optimization phases

• Step 1:
  – Generate plan for children blocks
• Step 2:
  – Create the predicate bitmap for the selections and joins
• Step 3:
  – Construct a remapping of unknown variables depending on schema of children
• Step 4:
  – Modify all expressions based on remapping
• Step 5:
  – Generate plan operator for SPJ
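
Steps 3 and 4 can be pictured with a small sketch. It assumes the N-way SPJ operator exposes the concatenation of its children's output records, so that each (source, attribute) reference is rewritten to a position in that concatenated record. The layout, the 1-based attribute positions, and the names `UnknownRef`, `BuildRemapping`, and `ApplyRemapping` are assumptions, not taken from the slides.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct UnknownRef {
    int source;     // child block index
    int attribute;  // 1-based position in that child's output (assumed)
};

// Step 3 (sketch): for each child, the number of attributes that precede it
// in the concatenated record.
std::vector<int> BuildRemapping(const std::vector<int>& childArities) {
    std::vector<int> base(childArities.size(), 0);
    for (std::size_t i = 1; i < childArities.size(); ++i) {
        base[i] = base[i - 1] + childArities[i - 1];
    }
    return base;
}

// Step 4 (sketch): rewrite every reference to point into the concatenated
// record, which is treated as the single source 0.
void ApplyRemapping(const std::vector<int>& base, std::vector<UnknownRef>& refs) {
    for (UnknownRef& r : refs) {
        assert(r.source >= 0 && static_cast<std::size_t>(r.source) < base.size());
        r.attribute = base[r.source] + r.attribute;
        r.source = 0;
    }
}
```

Under these assumptions, with R contributing three attributes and the view one, the reference `1(1)` becomes position 4 of the combined record while `0(3)` stays at position 3.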

Relational Operators

• Iterator interface
• Shared data structure (handles) for passing arguments
  – State information: e.g., end-of-stream
  – Operator specific information: cursor position (nested loop)
• Single records flowing across operators
• Access Method is chosen dynamically for each access
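
The iterator interface with a shared handle can be sketched as follows. The class names `PlanOp`, `ScanOp`, and `SelectOp`, the `Init`/`Next`/`Close` method names, and the integer `Row` type are hypothetical; the sketch only illustrates single records flowing between operators with end-of-stream and cursor state kept in a handle.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

using Row = std::vector<int>;  // stand-in for a record

// Handle shared between caller and operator.
struct Handle {
    bool endOfStream = false;  // state information
    std::size_t cursor = 0;    // operator-specific: position in the input
};

class PlanOp {
public:
    virtual ~PlanOp() = default;
    virtual void Init(Handle& h) = 0;
    // Produces one record per call; returns false and sets endOfStream when done.
    virtual bool Next(Handle& h, Row& out) = 0;
    virtual void Close(Handle& h) = 0;
};

// Sequential scan over an in-memory table.
class ScanOp : public PlanOp {
public:
    explicit ScanOp(std::vector<Row> rows) : rows_(std::move(rows)) {}
    void Init(Handle& h) override { h.endOfStream = false; h.cursor = 0; }
    bool Next(Handle& h, Row& out) override {
        if (h.cursor >= rows_.size()) { h.endOfStream = true; return false; }
        out = rows_[h.cursor++];
        return true;
    }
    void Close(Handle& h) override { h.endOfStream = true; }

private:
    std::vector<Row> rows_;
};

// Selection: pulls records from its child one at a time and filters them.
class SelectOp : public PlanOp {
public:
    SelectOp(PlanOp& child, std::function<bool(const Row&)> pred)
        : child_(child), pred_(std::move(pred)) {}
    void Init(Handle& h) override { h.endOfStream = false; child_.Init(childHandle_); }
    bool Next(Handle& h, Row& out) override {
        while (child_.Next(childHandle_, out)) {
            if (pred_(out)) return true;
        }
        h.endOfStream = true;
        return false;
    }
    void Close(Handle& h) override { child_.Close(childHandle_); h.endOfStream = true; }

private:
    PlanOp& child_;
    std::function<bool(const Row&)> pred_;
    Handle childHandle_;  // separate handle for the child's state
};
```

A caller would invoke `Init` once, call `Next` until it returns false, and then `Close`, which is the same loop the Executor slide describes one level up.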

Execution

• Executor: wrapper on top of execution plan
  – Creates a derived relation
  – Initializes derived relation (recursively initializes execution plan)
  – Iterates over records
  – Processes resulting records
    • Write to client
  – Closes iterator
  – Cleans up

Server Architecture

Figure: Client, Console, InitThread, MonitorThread, ServerThread, and several RequestThreads.

• Request Thread: client interaction
  – Relies on Protocol: text, binary

Summary

• Predator achieves extensibility by isolating the modules that are independent of the rest of the system
  – Types and Expressions are used throughout the system and are prone to change
• Predator reuses the clean internal data structures defined in Starburst