Tecnologie DB2 LUW per distribuzione dati - final - DISCo


Page 1: Tecnologie DB2 LUW per distribuzione dati - final - DISCo

© 2011 IBM Corporation

DB2 LUW Technologies for Data Distribution

An overview

Francesco Airoldi, Executive Architect, eTS Team - IBM Italia

Michele Benedetti, Senior IT Specialist, Software Group - IBM Italia

Mariangela Fumagalli, Senior IT Specialist, Software Group - IBM Italia

Page 2: Tecnologie DB2 LUW per distribuzione dati - final - DISCo

Agenda

Introduction

Distributed Access

Data Federation

Data Replication

Page 4: Tecnologie DB2 LUW per distribuzione dati - final - DISCo

Distributed data: summary

[Diagram: six models shown side by side, each drawn as Appl / DBMS / DB boxes joined by "connect" arrows, along an axis labelled "DATA MOVE: NO" and "DATA MOVE: YES"]

• Basic (single db)

• Distributed Access

• Federation (Fed Srv)

• Replication (ReplSrv)

• Event Publishing (EPSrv)

• Extract Transform & Load (ETL srv, DW)

Page 5: Tecnologie DB2 LUW per distribuzione dati - final - DISCo

Basic (single db)

[Diagram: Appl connects to a DBMS that manages one DB]

DBMS engine based – SQL is used

• Insert (Appl → DB)

• Select (DB → Appl)

• Logging yes, concurrency control yes

Utilities – DBMS engine bypassed – SQL not used

[Diagram: Load and Unload utilities working directly on the DB]

• Load (external data → DB)

• Unload (DB → external data)

• For large and very large data volumes

• Logging may be disabled

• DB objects (e.g. tables) may be locked while the utility runs

• Various external data formats supported

• Special techniques to handle anomalous data (e.g. duplicate keys)

Page 6: Tecnologie DB2 LUW per distribuzione dati - final - DISCo

Distributed Access

[Diagram: Appl connects to two DBMS, each managing its own DB]

Based on the DRDA standard

DRDA = Distributed Relational Database Architecture

• Proposed by IBM, now adopted as a database interoperability standard by The Open Group

• Implemented in all IBM products belonging to the DB2 Family, and by several non-IBM products

• SQL based

Key points:

• How many SQL statements per Unit of Work (UOW)?

• How many databases per UOW?

• How many databases referenced in a single SQL statement?

• Read-only access or not?

• DBMS belonging to the same family (homogeneous) or not (heterogeneous scenarios)?

Page 7: Tecnologie DB2 LUW per distribuzione dati - final - DISCo

Federation

[Diagram: Appl connects to a Fed Srv, which sits in front of two DBMS, each managing its own DB]

Extension of the Distributed Access model to heterogeneous environments

Applications connect to a single “virtual database”

Can be extended to allow access to non-relational data sources

Key points:

• Query optimization

• Differences among DBMS (SQL dialects, data types, semantics…)

• Two-phase commit required?

• How to handle the non-relational data sources?

• Performance

Page 8: Tecnologie DB2 LUW per distribuzione dati - final - DISCo

Replication

[Diagram: a ReplSrv copies data between two DBMS, each with its own Appl and DB]

Data physically copied from a source system to a target system

Often between heterogeneous DBMS

Many topological variants:

• One-to-one

• One-to-many

• Many-to-one

• Fan-out

One-way or bidirectional

Key points:

• Total replica (bulk) or delta replicas (change capture)

• Batch (e.g. once a day) or real-time (continuous replication)

• Table-based or transaction-based

• Change capture: triggers or log-based

• Performance impact on running applications

• Conflict detection and resolution (bidirectional, many-to-one)

• Data replicated as-is or transformed “in flight”

• Transport mechanism between source and target systems
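The "Change capture: triggers or log-based" point can be illustrated with a minimal sketch of the trigger-based variant. It uses Python's built-in SQLite rather than DB2; the `customer` and `customer_changes` tables and the trigger names are hypothetical and are not part of any IBM product.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);

-- Hypothetical change table: one row per captured change, in order.
CREATE TABLE customer_changes (
    seq  INTEGER PRIMARY KEY AUTOINCREMENT,
    op   TEXT,
    id   INTEGER,
    name TEXT
);

-- Each trigger runs inside the application's own transaction, which is
-- why trigger-based capture has a direct cost for running applications.
CREATE TRIGGER cap_ins AFTER INSERT ON customer BEGIN
    INSERT INTO customer_changes (op, id, name) VALUES ('I', NEW.id, NEW.name);
END;
CREATE TRIGGER cap_upd AFTER UPDATE ON customer BEGIN
    INSERT INTO customer_changes (op, id, name) VALUES ('U', NEW.id, NEW.name);
END;
CREATE TRIGGER cap_del AFTER DELETE ON customer BEGIN
    INSERT INTO customer_changes (op, id, name) VALUES ('D', OLD.id, OLD.name);
END;
""")

conn.execute("INSERT INTO customer VALUES (1, 'Rossi')")
conn.execute("UPDATE customer SET name = 'Bianchi' WHERE id = 1")
conn.execute("DELETE FROM customer WHERE id = 1")

# The delta that a replication process would ship to the target.
captured = conn.execute(
    "SELECT op, id, name FROM customer_changes ORDER BY seq"
).fetchall()
print(captured)
```

Log-based capture avoids the extra write inside the application transaction by reading the database log asynchronously instead.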

Page 9: Tecnologie DB2 LUW per distribuzione dati - final - DISCo

Event Publishing

[Diagram: an EPSrv sends changes from a DBMS (with its Appl and DB) to another Appl]

Event Publishing is a variant of Replication

Data that change in a DBMS when certain “events” occur are sent to external applications

• New rows inserted

• Existing rows deleted

• Existing rows updated

• …

Key points:

• Data format (e.g. XML)

• Events published in real time or not

• May be a component of EAI (Enterprise Application Integration) scenarios (e.g. when the “target application” is an ESB system)

Page 10: Tecnologie DB2 LUW per distribuzione dati - final - DISCo

Extract Transform & Load (ETL)

[Diagram: an ETL srv reads from two operational DBMS, each with its own Appl and DB, and feeds a DW DBMS]

Theoretically, ETL is an extension of the Replication model

In practice, ETL is the key technology for feeding data into a Data Warehouse system:

• Extract data from operational data sources

• Transform data into a format suitable for the DW environment

• Load data into the DW DBMS

Occasionally, some variants may be used:

• ELT (Extract – Load – Transform)

• TEL (Transform – Extract – Load)

Key points:

• Availability of connectors for different data sources (including non-relational ones)

• Data volumes to be handled per unit of time (performance)

• Batch or near real-time

• Complexity of the data transformations required

• Further data transformations within the DW are possible

• Data quality

• System management and overall governance
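The three ETL steps above can be sketched as three small functions. This is only an illustration using Python's built-in SQLite: the `orders` and `sales_by_nation` tables, their columns and the sample rows are assumptions made for the example, not anything taken from the slides or from an IBM tool.

```python
import sqlite3

# Hypothetical operational source and data warehouse, both in memory.
source = sqlite3.connect(":memory:")
dw = sqlite3.connect(":memory:")
source.executescript("""
CREATE TABLE orders (order_id INTEGER, nation TEXT, amount_cents INTEGER);
INSERT INTO orders VALUES (1, 'it', 1050), (2, 'IT', 250), (3, 'fr', 900);
""")
dw.execute("CREATE TABLE sales_by_nation (nation TEXT PRIMARY KEY, total REAL)")


def extract(conn):
    """Extract: read raw rows from the operational source."""
    return conn.execute("SELECT nation, amount_cents FROM orders").fetchall()


def transform(rows):
    """Transform: normalise nation codes, aggregate, convert cents to units."""
    totals = {}
    for nation, cents in rows:
        key = nation.upper()
        totals[key] = totals.get(key, 0) + cents
    return [(nation, cents / 100) for nation, cents in sorted(totals.items())]


def load(conn, rows):
    """Load: write the transformed rows into the warehouse in one transaction."""
    with conn:
        conn.executemany("INSERT INTO sales_by_nation VALUES (?, ?)", rows)


load(dw, transform(extract(source)))
result = dw.execute(
    "SELECT nation, total FROM sales_by_nation ORDER BY nation"
).fetchall()
print(result)
```

In the ELT variant the raw rows would be loaded first and the aggregation would run inside the DW DBMS.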

Page 12: Tecnologie DB2 LUW per distribuzione dati - final - DISCo

DRDA

Distributed Relational Database Architecture (DRDA) is a database interoperability standard from The Open Group. DRDA describes the architecture for distributed data. It defines the rules for accessing the distributed data, but it does not provide the actual application programming interfaces (APIs) to perform the access. It was first used in DB2 2.3.

http://en.wikipedia.org/wiki/DRDA

High level architecture

[Diagram: Appl issues SQL to the AR; the AR connects to the AS through the Application Support Protocol; the AS reaches two DS through the Database Support Protocol]

Application Requester (AR)

The AR accepts SQL requests from an application and sends them to the appropriate application servers for processing. Using this function, application programs can access remote data.

Application Server (AS)

The AS receives requests from application requesters and processes them. The AS acts upon the portions that can be processed and forwards the remainder to database servers for subsequent processing. The AR and the AS communicate through a protocol called the Application Support Protocol, which handles data representation conversion.

Database Server (DS)

The DS receives requests from an AS or from other DS servers. The DS supports distributed requests and will forward parts of a request to collaborating DS servers in order to fulfill it. The AS and the DS servers communicate among themselves through a protocol called the Database Support Protocol.

Page 13: Tecnologie DB2 LUW per distribuzione dati - final - DISCo

DRDA… Implementation levels and capabilities

DRDA Level 0: Remote Request

[Diagram: Appl issues SQL to the AR, which connects to one AS]

• One DBMS

• One Unit Of Work (UOW)

• One SQL request

SQL example

connect to REM_DB
insert into REM_TAB1...
commit

Implementation example

[Diagram: Appl on a DB2 client connects to a DB2 LUW server hosting REM_DB]

DRDA Level 1: Remote Unit Of Work (RUOW)

[Diagram: Appl issues SQL to the AR, which connects to one AS]

• One DBMS

• One Unit Of Work (UOW)

• Multiple SQL requests

Switching from one DBMS to another is possible, but you must close the UOW and disconnect from the first DBMS, then connect to the second DBMS and open another UOW

SQL example

connect to REM_DB
insert into REM_TAB1...
update REM_TAB2…
select … from REM_TAB3…
…
commit

Page 14: Tecnologie DB2 LUW per distribuzione dati - final - DISCo

DRDA… Implementation levels and capabilities…

DRDA Level 2: Distributed Unit Of Work (DUOW)

[Diagram: Appl issues SQL to the AR, which connects to two AS]

• Multiple DBMS

• One Unit Of Work (UOW)

• Two-Phase commit (2PC) required

• Transaction Manager required

• Multiple SQL requests

• Each SQL request limited to a single DBMS

SQL example

connect to REM1_DB
connect to REM2_DB
select …. from REM1_TAB1
insert into REM2_TAB2
delete from REM1_TAB1
…..
commit

Implementation example

[Diagram: Appl on a DB2 client connects to a DB2 LUW server hosting REM1_DB and to a DB2 z/OS server hosting REM2_DB; one component is labelled "Acts as Trx Mgr"]

• The 2PC functionality is compliant with the Open Group XA specification for distributed transaction processing, where the two basic roles are

• Transaction Manager (TM)

• Resource Manager (RM)

• The Transaction Manager role may be fulfilled by a DRDA AR, a DRDA AS, or even by an external component.

• The Resource Manager role may be fulfilled by a DRDA AS or a DRDA DS

http://en.wikipedia.org/wiki/X/Open_XA

Page 15: Tecnologie DB2 LUW per distribuzione dati - final - DISCo

DRDA… Implementation levels and capabilities…

DRDA Level 3: Distributed Request (DR)

[Diagram: Appl issues SQL to the AR, which connects to an AS; the AS reaches a DS]

• Multiple DBMS

• One Unit Of Work (UOW)

• Two-Phase commit (2PC) required

• Transaction Manager required

• Multiple SQL requests

• One SQL request may refer to objects managed by different DBMS (e.g. distributed join)

SQL example

connect to FED_DB
select .... from NICK_LOC, NICK_REM...
update NICK_REM.....
insert into NICK_LOC….
commit

Implementation example

[Diagram: Appl on a DB2 client connects to a DB2 LUW server hosting LOC_DB, which acts as Federation Server and Trx Mgr and reaches a second DB2 LUW server hosting REM_DB]

• The AR connects to an AS that owns local data and forwards part of the SQL request to a DS

• There is no explicit connection from the AR to the DS: the latter is connected “under the hood” by the AS

• The AR “sees” only the AS, which acts as “federator” over its local data and the data managed by the second DBMS (the DS)

• Database objects are referenced through “nicknames”

• A natural evolution of this capability is the Data Federation scenario, where the AS does not own local data and the underlying DBMS may be heterogeneous

Page 16: Tecnologie DB2 LUW per distribuzione dati - final - DISCo

DRDA…

Mapping DRDA roles on some IBM software products

[Diagram: Appl, AR, AS and two DS]

Product                        DRDA Appl Req   DRDA Appl Server   DRDA Database Server   XA Trx Mgr
DB2 Client                     Y               N                  N                      N
DB2 Connect                    Y               N                  N                      Y
DB2 LUW                        Y               Y                  Y                      Y
InfoSphere Federation Server   Y               Y                  (N)                    Y
DB2 z/OS                       Y               Y                  Y                      Y

Page 18: Tecnologie DB2 LUW per distribuzione dati - final - DISCo

Different Integration Techniques Meet Different Requirements

[Diagram: four integration techniques, each with an example scenario]

• Federation: Analytical & Reporting Tools and Web Applications; Product Performance, Real-time Inventory Level

• Consolidation: Region 1 Product Performance and Region 2 Product Performance; Data Warehouse

• Replication: Headquarters and Stores; Primary Data Center and Backup Data Center

• Event Publishing: Database, Capture & Publish; EAI, Repl, ETL, RYO

Page 19: Tecnologie DB2 LUW per distribuzione dati - final - DISCo

Federation - How does it work?

[Diagram: the same integration-techniques overview shown on page 18 (Federation, Consolidation, Replication, Event Publishing)]

Page 20: Tecnologie DB2 LUW per distribuzione dati - final - DISCo

Federation Data Sources

[Diagram: SQL requests go to InfoSphere Federation Server, which accesses the data sources listed below; the sources are marked "Read-Write" or "Read only"]

Relational databases

• DB2 for iSeries

• DB2 for z/OS

• DB2 for LUW

• Informix

• Oracle

• Sybase

• Teradata

• Microsoft SQL Server

• ODBC

• JDBC

Web / Other

• MQ UDF

• Excel

• Table-structured files

• Web services

• OLE DB

• Scripts

• Custom-built

Page 21: Tecnologie DB2 LUW per distribuzione dati - final - DISCo

FS Basic Concepts

[Diagram: a Federated Server with its DB2 Catalog; several Wrapper / Server / Nickname stacks reach remote data (e.g. Orders, Customers) through a Data Source Client]

Federated server: a DB2 database enabled for federation.

Wrapper: a library allowing access to a particular class of data sources or protocols (Net8, DRDA, CTLIB...). Contains information about data source characteristics.

Server: represents a specific data source.

Nickname: a local alias to data on a remote server (mapped to rows and columns); appears as a DB2 table.

The DB2 Catalog of the Federated Server stores information about:

• Wrappers, servers, nicknames

• Server attributes

• Nickname attributes

• Remote functions

Page 22: Tecnologie DB2 LUW per distribuzione dati - final - DISCo

Query Optimizer Flow for Federated Queries

• Push Down Analysis (PDA) is a component of query compilation

• PDA determines whether or not an operation can be pushed down to the data source

• Just because processing can be pushed down does not mean it will be

• If an operation can be pushed down, the optimizer still has the final say on whether or not the operation is pushed down.

Page 25: Tecnologie DB2 LUW per distribuzione dati - final - DISCo

Many Topologies Possible

[Diagram: replication topologies drawn as SOURCE, STAGING and TARGET boxes]

• Data Distribution (1:many): one source, several targets

• Data Consolidation (many:1): several sources, one target

• Multi-Tier Staging: a source feeds staging tables, which feed the targets

• Peer-to-Peer

• Bi-directional: primary and secondary

• Conflict Detection/Resolution

Page 26: Tecnologie DB2 LUW per distribuzione dati - final - DISCo

Changed Data Replication

• Applications make changes to a database (the source)

• Changes are then:

• Read, ‘captured’, from the database log

• Copied to other systems (the targets)

• ‘Applied’ to tables

• Diagram shows one-way, or unidirectional, replication

[Diagram: at the Source, Capture reads the Log of SOURCE1 and SOURCE2; at the Target, Apply writes to TARGET 1, TARGET 2, … TARGET N, with Subsets, Transformations and History Tables]

Page 27: Tecnologie DB2 LUW per distribuzione dati - final - DISCo

Q Capture Process Flow

[Diagram: CAPTURE reads the DB2 Log for SOURCE1 and SOURCE2, guided by the Q-SUBS and Q-PUBS definitions, rebuilds transactions in memory and writes to the Restart Queue and the Send Queue]

DB2 Log records shown:

• TX1: INSERT S1

• TX2: INSERT S2

• TX3: ROLLBACK

• TX1: COMMIT

• TX1: UPDATE S1

• TX3: DELETE S1

In-Memory-Transactions:

• TX1 (INSERT S1, UPDATE S1, COMMIT): MQ Put to the Send Queue when the Commit record is found

• TX2 (INSERT S2): transaction is still “in-flight”. Nothing inserted yet.

• TX3 (DELETE S1, ROLLBACK): “zapped” at Abort. Never makes it to the send queue.
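The in-memory transaction handling described on this slide can be sketched as follows. This is a simplified model, not the actual Q Capture implementation: the ordering of the log records and the `capture` function are assumptions made for illustration, and the send queue is a plain Python list rather than a WebSphere MQ queue.

```python
from collections import defaultdict

# Log records in one plausible read order (the slide does not state the order).
log = [
    ("TX1", "INSERT", "S1"),
    ("TX2", "INSERT", "S2"),
    ("TX1", "UPDATE", "S1"),
    ("TX3", "DELETE", "S1"),
    ("TX1", "COMMIT", None),
    ("TX3", "ROLLBACK", None),
]


def capture(records):
    """Rebuild transactions in memory; publish only when a commit is seen."""
    in_memory = defaultdict(list)   # transaction id -> changes seen so far
    send_queue = []                 # stands in for the MQ send queue
    for tx, op, table in records:
        if op == "COMMIT":
            # Commit record found: the whole transaction is put on the queue.
            send_queue.append((tx, in_memory.pop(tx, [])))
        elif op == "ROLLBACK":
            # Aborted transaction: dropped, never reaches the send queue.
            in_memory.pop(tx, None)
        else:
            in_memory[tx].append((op, table))
    return send_queue, dict(in_memory)


send_queue, in_flight = capture(log)
print(send_queue)   # committed work only
print(in_flight)    # transactions still waiting for a commit or rollback
```

Publishing whole transactions at commit time is what makes the replication transaction-based rather than table-based.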

Page 28: Tecnologie DB2 LUW per distribuzione dati - final - DISCo

Q Replication - The BIG Picture

• Applications make changes to a database source

• Changes are then:

• Read, ‘captured’, from the database log

• Copied to other systems (the targets)

• ‘Applied’ to tables

[Diagram: on the SOURCE side, Q Capture reads the DB2 Log for SOURCE1 and SOURCE2 using its METADATA; on the TARGET side, Q Apply uses a Browser and several Apply Agents, with its own METADATA, to update TGT1, TGT2 and TGT3, with Subsets, Transformations and History Tables; ADMINISTRATION is provided by the Replication Center and the Replication Monitor]

Page 29: Tecnologie DB2 LUW per distribuzione dati - final - DISCo

Data Event Publishing

• Conceptually, data replication without the apply

• Change data is made available to consuming applications

• Examples: InfoSphere DataStage or a message broker

• One common delivery mechanism is WebSphere MQ

[Diagram: at the Source, Capture reads the DB Log for SOURCE1 and SOURCE2; at the Target, the consumers are InfoSphere DataStage, an SOA/User Application, a User Application and WBI Event Broker, each feeding a TARGET]

Page 30: Tecnologie DB2 LUW per distribuzione dati - final - DISCo

Essential links

Distributed Access

• DB2 for Linux Unix and Windows home page: http://www-01.ibm.com/software/data/db2/linux-unix-windows/

• DB2 for LUW and DB2 Connect 9.7 InfoCenter (search for DRDA): http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/index.jsp

Data Federation and Data Replication

• InfoSphere Federation Server home page: http://www-01.ibm.com/software/data/infosphere/federation-server/

• InfoSphere Replication Server home page: http://www-01.ibm.com/software/data/infosphere/replication-server/

• InfoSphere Event Publisher home page: http://www-01.ibm.com/software/data/infosphere/data-event-publisher/

• InfoSphere Information Server V8.7 InfoCenter (Overview > Introduction to InfoSphere Information Server > Companion products): http://publib.boulder.ibm.com/infocenter/iisinfsv/v8r7/index.jsp

Page 31: Tecnologie DB2 LUW per distribuzione dati - final - DISCo

Questions ?

Page 32: Tecnologie DB2 LUW per distribuzione dati - final - DISCo

Thank you

Page 33: Tecnologie DB2 LUW per distribuzione dati - final - DISCo

Additional slides on Data Federation

InfoSphere Federation Server

• Customer Requirements and Scenarios

• How it works

• Performance considerations

Page 34: Tecnologie DB2 LUW per distribuzione dati - final - DISCo

Problems Delivering Data to the End User

• Multiple sources for the same entity

• Heterogeneous data sources: DB2, Oracle, Microsoft SQL Server, XML files, spreadsheets, etc.

• Employees spend a significant amount of their time (70%) searching for information

• Lack of an integrated view of information

• Time-consuming and costly aggregation

Page 35: Tecnologie DB2 LUW per distribuzione dati - final - DISCo

InfoSphere Federation Server

Access and integrate heterogeneous information across multiple sources as if they were a single source

Extend value of existing analytical applications by providing real-time access to integrated information

Transparent

• Appears to be one source

• Independent of how and where data is stored

• Applications continue to work despite any change in how data is stored

Heterogeneous

• Accesses data from diverse sources: relational, structured, messages…

Extensible

• Bring together almost any data source

• Wrapper Development Toolkit

High Function

• Full query support against all data

• Capabilities of sources as well

Autonomous

• Non-disruptive to data sources, existing applications, systems

High Performance

• Optimization of distributed queries

[Diagram: InfoSphere Federation Server in front of Web Services, Excel, SQL Server, Oracle and other sources]

Page 36: Tecnologie DB2 LUW per distribuzione dati - final - DISCo

Customer-Data Integration

Customer Challenge:

• Providing a holistic view of information for customer-facing or customer supporting applications

• High development and maintenance costs to access diverse data sources

• Maximizing value of customer data for customer satisfaction/retention and increasing sales

Customer value:

• Reduce coding and skills requirements when integrating two or more sources

• Reduce redundant data by consolidating only frequently accessed data

• Reduce application maintenance costs

• Extend customer data with document and other content data

Development effort to handle:

• Unique interfaces for each data type

• Joining data from varied sources

• Transformations

• Correlating data

[Diagram: the Application Developer's Application reaches RDBMS, non-relational data and non-traditional data through InfoSphere Federation Server]

Page 37: Tecnologie DB2 LUW per distribuzione dati - final - DISCo

Access to Regionally Distributed Data

Requirements

• Several regional databases with similar logical data models, but unique data

• Application needs to see the data as one large database with a single schema

• Impractical to physically consolidate data

Solution

• Access relevant remote tables via FS nicknames

• Connect matching nicknames from different sources via a UNION ALL view

• Can optionally cache common data at the FS or create local aggregates

[Diagram: a Client connects to InfoSphere Federation Server, which reaches regional databases in Seattle, Phoenix and San Jose: Oracle on Linux and SQL Server on Windows]
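The UNION ALL view over matching nicknames can be sketched with Python's built-in SQLite, where attached in-memory databases stand in for the remote regional sources. This is an analogy only: in a federated DB2 database the three branches would be nicknames, and the `orders` table, its columns and the sample rows are hypothetical.

```python
import sqlite3

fs = sqlite3.connect(":memory:")   # plays the role of the federated database

# One attached database per region, each with the same logical table.
for region in ("seattle", "phoenix", "san_jose"):
    fs.execute(f"ATTACH DATABASE ':memory:' AS {region}")
    fs.execute(f"CREATE TABLE {region}.orders (order_id INTEGER, total REAL)")

fs.execute("INSERT INTO seattle.orders VALUES (1, 10.0)")
fs.execute("INSERT INTO phoenix.orders VALUES (2, 20.0)")
fs.execute("INSERT INTO san_jose.orders VALUES (3, 30.0)")

# The application sees one table; the view hides where each row lives.
# (SQLite requires a TEMP view to reference several attached databases.)
fs.execute("""
CREATE TEMP VIEW all_orders AS
    SELECT 'Seattle' AS region, order_id, total FROM seattle.orders
    UNION ALL
    SELECT 'Phoenix', order_id, total FROM phoenix.orders
    UNION ALL
    SELECT 'San Jose', order_id, total FROM san_jose.orders
""")

rows = fs.execute(
    "SELECT region, order_id, total FROM all_orders ORDER BY order_id"
).fetchall()
print(rows)
```

UNION ALL rather than UNION fits the requirement that each region holds unique data, so no duplicate elimination is needed.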

Page 38: Tecnologie DB2 LUW per distribuzione dati - final - DISCo

Speeding Portal Application Development

Customer Challenge:

• Integrating multiple data sources in a single application is complex and costly

• Accessing non-traditional sources is too impractical to leverage their benefit

• Time pressure to deploy new applications

• Scarcity of people with the skills to work with legacy, non-traditional data sources

• Extending built-in search to new domains

Customer value:

• Reduce amount of integration coding by 40-65%

• Use existing SQL tools to access all data

• Give applications access to all the relevant data sources

• Reduce application maintenance costs

• Deploy existing skills over wider range of integration projects

Development effort to handle:

• Multiple portlets, one for each source

• Unique interfaces for each data type

• Joining data from varied sources

• Transformations

• Correlating data

[Diagram: the Application Developer's Portal Application reaches legacy data, non-relational data and non-traditional data through InfoSphere Federation Server]

[Diagram: a Federated Application has one connection to the Federated server (Federation Server with a local DB2 for "scratch" temp tables, in front of Oracle, Excel/ODBC and DB2); a Non-Federated Application needs a connection to every individual data source]

Page 39: Tecnologie DB2 LUW per distribuzione dati - final - DISCo

Example: SELECT

What the application sees:

SELECT cust_nation, sum(o_totalprice)
FROM Customer, Orders
WHERE c_custkey = o_custkey
and o_orderstatus = 'OPEN'
and c_mktsegment = 'BUILDING'
GROUP BY cust_nation

What FS does:

Sent to the Orders data source:

SELECT o_custkey, o_totalprice
FROM Orders
WHERE o_orderstatus = 'OPEN'

Sent to the Customer data source:

SELECT c_custkey, cust_nation
FROM Customer
WHERE c_mktsegment = 'BUILDING'

Done at FS:

1. Join rows from both sources

2. Sort them by cust_nation

3. Sum up total order price for each nation

4. Return result to application
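The four steps performed at FS can be sketched in plain Python. The sample rows are invented, and the sketch assumes the Orders fragment returns `o_totalprice` alongside `o_custkey`, since the sum cannot be computed otherwise; `federated_sum` is a hypothetical helper, not an FS function.

```python
from collections import defaultdict

# Hypothetical rows returned by the two data sources, after each source
# has already applied its own WHERE predicate remotely.
customer_rows = [(1, "ITALY"), (2, "FRANCE"), (3, "ITALY")]   # c_custkey, cust_nation
order_rows = [(1, 100.0), (1, 50.0), (2, 70.0), (3, 30.0), (4, 999.0)]  # o_custkey, o_totalprice


def federated_sum(customers, orders):
    """Join the two fragments, then group and sum by nation."""
    nation_of = dict(customers)                      # build side of a hash join
    totals = defaultdict(float)
    for custkey, price in orders:
        if custkey in nation_of:                     # 1. join on the customer key
            totals[nation_of[custkey]] += price      # 3. sum per nation
    return sorted(totals.items())                    # 2. order by nation, 4. return


result = federated_sum(customer_rows, order_rows)
print(result)
```

Order 4 has no matching BUILDING customer in the sample data, so it drops out at the join, exactly as an inner join would behave.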

Page 40: Tecnologie DB2 LUW per distribuzione dati - final - DISCo

Federated Cost-Based Query Optimization

[Diagram: inputs considered by the federated cost-based optimizer]

SERVER

Physical Properties: Federated system configuration

Query Properties: Optimization class, data distribution, operators used, query type, cost models, FIRST N ROWS?

Statistics:

• Table statistics

• Column statistics

• Index statistics

Non-Relational Wrapper: Wrapper Plans, Cost Models

• Characteristics

• Cpu/io ratio

• Commrate

• Capabilities

• Type/version

Page 41: Tecnologie DB2 LUW per distribuzione dati - final - DISCo

Federated Database System (Global) Catalog

• Contains information both about local objects and remote objects

• Global because it contains information about all the objects in the federated database

• Table information is found in the following SYSCAT tables

• SYSCAT.TABLES

• SYSCAT.NICKNAMES

• SYSCAT.TABOPTIONS

• SYSCAT.INDEXES

• SYSCAT.INDEXOPTIONS

• SYSCAT.COLUMNS

• SYSCAT.COLOPTIONS

• Global catalog also contains other information about remote sources, including connection, authorizations, etc.

Page 42: Tecnologie DB2 LUW per distribuzione dati - final - DISCo

Global Catalog information

• Data type mappings describe the relationship between the data source data type and the FS data type

• Can override defaults by altering local nickname column types if appropriate

• Function mappings tell FS that a remote function is semantically equivalent to a local function (need: compatible arguments + types)

• Increases the opportunity for pushing down the function to the data source

• Without a valid mapping, data has to be retrieved and the function applied at FS

• Can also tell FS about remote functions that have no local equivalent using function templates

• Statistics are used by DBMSs to describe the logical and physical structure of the data

• Helps the optimizer generate optimal access strategies

• FS retrieves statistics from the remote-source catalog and populates the DB2 catalog at CREATE NICKNAME time

• There are no actual "local indexes" on nicknames. Information on remote indexes is kept in the FS catalog

• Normally, information about remote indexes is picked up during nickname creation (including index specification and statistics)

Page 43: Tecnologie DB2 LUW per distribuzione dati - final - DISCo

Actual pushdown is cost-based

• Just because processing can be pushed down doesn't mean it will be.

• Decision influenced by estimates of rows processed/returned.

• Consider a join of two nicknames ORA.T1 and ORA.T2 on a single remote source that is "nearly" a Cartesian product.

• It may be better to do the join at the InfoSphere Federation Server to avoid retrieving many rows.

• Retrieving (10,000 + 25) = 10,025 rows to do a local join is probably faster than retrieving the (10,000 * 25) = 250,000-row remote join result

SELECT .... from ORA.T1, ORA.T2 where T1.a = T2.b

[Diagram: a single remote Oracle source holding ORA.T1 (25 rows) and ORA.T2 (10,000 rows)]
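The row-count arithmetic on this slide can be written out as a small sketch. The function name and the `join_selectivity` parameter are illustrative assumptions; a real optimizer weighs many more factors (CPU, I/O, communication rate) than rows transferred.

```python
def rows_transferred(rows_t1, rows_t2, join_selectivity):
    """Rows that must cross the network for each strategy.

    join_selectivity is the fraction of the Cartesian product that survives
    the join predicate (1.0 means a full Cartesian product).
    """
    return {
        # Join pushed down: only the join result is shipped.
        "remote_join": int(rows_t1 * rows_t2 * join_selectivity),
        # Join done at the federation server: both tables are shipped.
        "local_join": rows_t1 + rows_t2,
    }


estimate = rows_transferred(25, 10_000, 1.0)
cheaper = min(estimate, key=estimate.get)
print(estimate, cheaper)
```

With a selective join the comparison flips: at a selectivity of 0.0001 the remote join would return about 25 rows, and pushdown becomes the cheaper choice.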

Page 44: Tecnologie DB2 LUW per distribuzione dati - final - DISCo

'Pushdown' of Query Operations

• FS decides whether some or all parts of a query can be "pushed-down", i.e. processed at the remote data source(s). Pushdown-ability depends on

• Availability of needed functionality at remote source

• Server options (example: is collating sequence at FS and remote source the same?)

• Typically faster than processing the query at FS because of less data movement from the data source to FS

• Example: A remote source that can handle an equality predicate, but not count(*)....

[Diagram: the Application sends SELECT count(*) FROM t1 WHERE col = 27 to the Federation Server; the Federation Server sends SELECT '1' FROM t1 to the non-DB2 data source and performs SELECT count(*) FROM... itself as Compensation]
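Compensation can be sketched as follows, with an in-memory SQLite table standing in for the non-DB2 data source. The sketch assumes the equality predicate is pushed down together with the `SELECT '1'` fragment, since the slide says the source can handle it; `remote_query` and `federated_count` are hypothetical helpers.

```python
import sqlite3

remote = sqlite3.connect(":memory:")
remote.execute("CREATE TABLE t1 (col INTEGER)")
remote.executemany("INSERT INTO t1 VALUES (?)", [(27,), (5,), (27,), (27,)])


def remote_query(sql, params=()):
    """Stands in for a data source that can filter rows but cannot aggregate."""
    return remote.execute(sql, params).fetchall()


def federated_count(value):
    # Pushed down: the equality predicate, which the source supports.
    rows = remote_query("SELECT '1' FROM t1 WHERE col = ?", (value,))
    # Compensation: count(*) is evaluated at the federation server instead.
    return len(rows)


count = federated_count(27)
print(count)
```

Only a constant per qualifying row crosses the network, which keeps the data movement small even though the aggregate itself is not pushed down.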

Page 45: Tecnologie DB2 LUW per distribuzione dati - final - DISCo

Sort Order

• Varies in some cases for different collating sequences

• Data consists of combinations of letters and numeric characters

• Data contains both uppercase and lowercase letters

• Data contains special characters, e.g. #

• Affects how data is sorted in a query with an ORDER BY

• Affects how character comparisons are made

• E.g., SELECT … WHERE Column3 > ‘Aa3@’

• Two data sources of the same type (wrapper) can use different collating sequences

• E.g., in DB2 the collating sequence is specified when the database is created

• Different databases can use different collating sequences

• The collating sequence of the data source can be specified when the Server is defined

Page 46: Tecnologie DB2 LUW per distribuzione dati - final - DISCo

Server options: Collating Sequence Differences

• EBCDIC Sequence

... ab yz ... AB YZ ... 0 9 ...

• ASCII Sequence

... 0 9 ... AB YZ ... ab yz ...

• LEXICAL Sequence

... 0 9 ... AaBb YyZz ...

Page 47: Tecnologie DB2 LUW per distribuzione dati - final - DISCo

Server options: Collating Sequence Differences

• ORDER BY COLM2 – Different order

EBCDIC COLM2: V1G, Y2W, 7AB

ASCII/LEXICAL COLM2: 7AB, V1G, Y2W

• WHERE COLM2 > ‘TT3’ – Different results

EBCDIC COLM2: TW4, X72, 39G

ASCII/LEXICAL COLM2: TW4, X72
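Both differences on this slide can be reproduced by comparing strings on their encoded bytes. The sketch uses Python's `cp037` codec as one representative EBCDIC code page; the choice of code page is an assumption, and real databases may use other EBCDIC variants or locale-aware collations.

```python
def ebcdic_key(s):
    """Sort key under an EBCDIC code page (letters sort before digits)."""
    return s.encode("cp037")


def ascii_key(s):
    """Sort key under ASCII (digits sort before letters)."""
    return s.encode("ascii")


# ORDER BY COLM2: same rows, different order.
rows = ["V1G", "Y2W", "7AB"]
ebcdic_order = sorted(rows, key=ebcdic_key)
ascii_order = sorted(rows, key=ascii_key)

# WHERE COLM2 > 'TT3': same rows, different result set.
values = ["TW4", "X72", "39G"]
ebcdic_hits = [v for v in values if ebcdic_key(v) > ebcdic_key("TT3")]
ascii_hits = [v for v in values if ascii_key(v) > ascii_key("TT3")]

print(ebcdic_order, ascii_order)
print(ebcdic_hits, ascii_hits)
```

The value 39G qualifies only under EBCDIC, because there the digit 3 sorts after the letter T.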

Page 48: Tecnologie DB2 LUW per distribuzione dati - final - DISCo

Server Options: Case Insensitive Collating Sequences

...WHERE NAME = ‘MARIANGELA’

Assume that the data source column contains: ‘MariAngela’

• TRUE: databases using the Insensitive Collate option (an optional parameter for MS SQL Server, Sybase and Informix)

• FALSE: databases not using the Insensitive Collate option

Page 49: Tecnologie DB2 LUW per distribuzione dati - final - DISCo

Server Options: COLLATING_SEQUENCE option

You must inform Federation about Data Source collating

COLLATING_SEQUENCE = 'Y'

• indicates that FS and the remote data source sort the same

• all char sort and comparison operations can be pushed down

COLLATING_SEQUENCE = 'N' (e.g. DB2/390)

• indicates that FS and the remote data source sort differently

• char sort and most char comparison operations cannot be pushed down

• only char = comparisons can be pushed down

COLLATING_SEQUENCE = 'I' (e.g. SQL Server)

• indicates that the remote data source uses an insensitive collating sequence

• no char sort or char comparison operations can be pushed down

Set COLLATING_SEQUENCE as a Server Option, or on a nickname as the NUMERIC_STRINGS Column Option
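The three settings can be summarised as a small decision function. This sketch encodes only the rules stated on this slide; the function name and the operation labels are made up for the example, and the actual product behaviour may include further cases.

```python
def can_push_down(collating_sequence, operation):
    """Whether a character operation may be pushed down to the data source.

    collating_sequence: 'Y', 'N' or 'I', as set on the server definition.
    operation: 'sort', 'equality' or 'range' on a character column.
    """
    if operation not in ("sort", "equality", "range"):
        raise ValueError(f"unknown operation: {operation}")
    if collating_sequence == "Y":
        return True                       # same collation: everything qualifies
    if collating_sequence == "N":
        return operation == "equality"    # only char = comparisons qualify
    if collating_sequence == "I":
        return False                      # insensitive source: nothing qualifies
    raise ValueError(f"unknown setting: {collating_sequence}")


matrix = {
    setting: [op for op in ("sort", "equality", "range") if can_push_down(setting, op)]
    for setting in ("Y", "N", "I")
}
print(matrix)
```

Anything that cannot be pushed down has to be compensated at the federation server, so a pessimistic setting costs performance while an optimistic one risks wrong results.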

Page 50: Tecnologie DB2 LUW per distribuzione dati - final - DISCo

Server Options: VARCHAR comparison semantics

• Column FIRSTNAME is VARCHAR(25)

• Actual contents are ‘MARYb’ (where b stands for a trailing blank)

DB2 (and all other major RDBMSs)

SELECT * ...
WHERE FIRSTNAME = ‘MARY’

TESTS TRUE

Oracle

SELECT * ...
WHERE FIRSTNAME = ‘MARY’

TESTS FALSE

• Forces COL1 = ‘MARY’ to be pushed down as RTRIM(COL1) = ‘MARY’, resulting in a relational scan by Oracle

• Mitigated with VARCHAR_NO_TRAILING_BLANKS = 'Y'

--------> but know the data!
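The two comparison semantics can be mimicked in a few lines. The helper names are hypothetical and the sketch models only the trailing-blank behaviour described on this slide, not either product's full comparison rules.

```python
def equals_blank_padded(stored, literal):
    """Blank-padded semantics: trailing blanks are not significant."""
    return stored.rstrip(" ") == literal.rstrip(" ")


def equals_non_padded(stored, literal):
    """Non-padded semantics: trailing blanks are significant."""
    return stored == literal


stored = "MARY "   # column contents with one trailing blank

padded_result = equals_blank_padded(stored, "MARY")
non_padded_result = equals_non_padded(stored, "MARY")

# What the RTRIM rewrite achieves on a non-padded source: the same answer as
# the blank-padded comparison, at the price of wrapping the column in a function.
rtrim_result = equals_non_padded(stored.rstrip(" "), "MARY")
print(padded_result, non_padded_result, rtrim_result)
```

Wrapping the column in a function generally prevents an ordinary index on it from being used, which is why the option that skips the rewrite is only safe when the data is known to contain no trailing blanks.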

Page 51: Tecnologie DB2 LUW per distribuzione dati - final - DISCo

More Federation Server Features

• Ability to define informational constraints over nicknames

• Ability to refer to and execute remote stored procedures for DB2, Oracle, Sybase, and MSSQL data sources

• Error Tolerant Nested Table Expression

[Diagram: a UNION ALL over Remote 1, Remote 2 and Remote 3; Remote 2 has a connection error, so the result is Remote 1 + Remote 3]
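The error-tolerant behaviour in the diagram can be sketched as a union that skips unreachable branches. The fetch functions and the `error_tolerant_union_all` helper are hypothetical stand-ins; the sketch shows the idea only and does not reproduce the Federation Server syntax for error-tolerant nested table expressions.

```python
def fetch_remote1():
    return [("remote1", 1)]


def fetch_remote2():
    # Simulates the connection error shown in the diagram.
    raise ConnectionError("Remote 2 unreachable")


def fetch_remote3():
    return [("remote3", 3)]


def error_tolerant_union_all(branches):
    """Concatenate the rows of every reachable branch; report the skipped ones."""
    rows, skipped = [], []
    for name, fetch in branches:
        try:
            rows.extend(fetch())
        except ConnectionError:
            skipped.append(name)   # tolerate the failure instead of failing the query
    return rows, skipped


rows, skipped = error_tolerant_union_all([
    ("Remote 1", fetch_remote1),
    ("Remote 2", fetch_remote2),
    ("Remote 3", fetch_remote3),
])
print(rows, skipped)
```

The trade-off is a partial result: the caller should know which branches were skipped before treating the answer as complete.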

Page 52: Tecnologie DB2 LUW per distribuzione dati - final - DISCo

WebSphere Federation Server w/ F2PC Update

Money Transfer Example

[Diagram: the Client works directly with SQL Server and Oracle, which hold CHECKING_ACCOUNT and SAVING_ACCOUNT, in two separate units of work]

1) Connect

2) Withdraw

3) Commit

4) Connect

5) Deposit

6) Commit

Page 53: Tecnologie DB2 LUW per distribuzione dati - final - DISCo

WebSphere Federation Server w/ F2PC Update

Money Transfer Example

[Diagram: the Client connects only to InfoSphere Federation Server ("I am the TM_DATABASE"), which coordinates SQL Server and Oracle, holding CHECKING_ACCOUNT and SAVING_ACCOUNT]

1) Connect

2) Withdraw

3) Deposit

4) PREPARE (sent to both data sources)

5) COMMIT (sent to both data sources)
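The PREPARE / COMMIT sequence in this example is the two-phase commit protocol, which can be sketched in a few lines. The `ResourceManager` class and `transfer` function are illustrative assumptions: a real coordinator also logs its decisions and handles participant failures between the two phases, which this sketch leaves out.

```python
class ResourceManager:
    """A toy participant holding one account balance."""

    def __init__(self, name, balance):
        self.name = name
        self.balance = balance
        self.pending = 0

    def update(self, delta):
        self.pending += delta           # work done inside the open transaction

    def prepare(self):
        # Phase 1 vote: yes only if the pending work can be committed.
        return self.balance + self.pending >= 0

    def commit(self):
        self.balance += self.pending
        self.pending = 0

    def rollback(self):
        self.pending = 0


def transfer(source, target, amount):
    """Coordinator role, played by the federation server in the slide."""
    source.update(-amount)                            # 2) Withdraw
    target.update(amount)                             # 3) Deposit
    participants = [source, target]
    votes = [rm.prepare() for rm in participants]     # 4) PREPARE on both
    if all(votes):
        for rm in participants:
            rm.commit()                               # 5) COMMIT on both
        return True
    for rm in participants:
        rm.rollback()                                 # any "no" vote aborts both
    return False


checking = ResourceManager("CHECKING_ACCOUNT", 100)
saving = ResourceManager("SAVING_ACCOUNT", 0)
first = transfer(checking, saving, 40)
second = transfer(checking, saving, 1000)   # overdraft: the prepare vote fails
print(first, second, checking.balance, saving.balance)
```

Compared with the two separate commits on the previous page, the money can no longer be withdrawn from one account without being deposited into the other.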