© 2012 IBM Corporation

Streams – DataStage Integration

InfoSphere Streams Version 3.0

Mike Koranda, Release Architect


Agenda

– What is InfoSphere Information Server and DataStage?
– Integration use cases
– Architecture of the integration solution
– Tooling


Information Integration Vision

– Transform Enterprise Business Processes & Applications with Trusted Information
– Deliver Trusted Information for Data Warehousing and Business Analytics
– Build and Manage a Single View
– Integrate & Govern Big Data
– Make Enterprise Applications more Efficient
– Consolidate and Retire Applications
– Secure Enterprise Data & Ensure Compliance

Address information integration in the context of a broad and changing environment.
Simplify & accelerate: design once and leverage anywhere.


IBM Comprehensive Vision

Traditional Approach – structured, analytical, logical
– Traditional Sources (structured, repeatable, linear): Internal App Data, Transaction Data, ERP Data, Mainframe Data, OLTP System Data
– Landed in the Data Warehouse

New Approach – creative, holistic thought, intuition
– New Sources (unstructured, exploratory, iterative): Web Logs, Social Data, Text & Images, Sensor Data, RFID
– Processed in Hadoop / Streams

Information Integration & Governance spans both worlds: traditional sources feeding the Data Warehouse, and new sources feeding Hadoop and Streams.


IBM InfoSphere DataStage

Industry-Leading Data Integration for the Enterprise – simple to design, powerful to deploy

Rich capabilities spanning six critical dimensions:

– Developer Productivity: rich user interface features that simplify the design process and metadata management requirements
– Transformation Components: an extensive set of pre-built objects that act on data to satisfy both simple and complex data integration tasks
– Connectivity Objects: native access to common industry databases and applications, exploiting the key features of each
– Runtime Scalability & Flexibility: a performant engine providing scalability through all objects and tasks in both batch and real time
– Operational Management: simple management of the operational environment, lending analytics for understanding and investigation
– Enterprise-Class Administration: intuitive and robust features for installation, maintenance, and configuration


Use Cases - Parallel real-time analytics


Use Cases - Streams feeding DataStage


Use Cases – Data Enrichment


Runtime Integration High Level View

DataStage side: Job → Streams Connector
Streams side: DSSource / DSSink Operator → Job

The two sides communicate over TCP/IP. The DSSource and DSSink operators are composite operators that wrap the existing TCPSource/TCPSink operators.
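As a rough illustration of "composite operators that wrap TCPSource/TCPSink", a DSSink-like composite might look as follows. This is a hedged sketch, not the shipped implementation: the composite name, parameter list, and the choice of `role : server` are assumptions; only `TCPSink` and its `role`/`name`/`format` parameters come from the SPL standard toolkit.

```
use spl.adapter::TCPSink;

// Sketch only: a DSSink-style composite wrapping the standard TCPSink.
// The real toolkit operator's parameters are not shown on the slides.
public composite MyDSSinkSketch(input In) {
    param
        expression<rstring> $name;  // connection name looked up by DataStage
    graph
        () as Sink = TCPSink(In) {
            param
                role   : server;    // DataStage connector connects in
                name   : $name;     // registered with the name server
                format : bin;       // binary wire format
        }
}
```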


Streams Application (SPL)

use com.ibm.streams.etl.datastage.adapters::*;

composite SendStrings {
    type
        RecordSchema = rstring a, ustring b;
    graph
        stream<RecordSchema> Data = Beacon() {
            param
                iterations : 100u;
                initDelay  : 1.0;
            output
                Data : a = "This is single byte chars"r, b = "This is unicode"u;
        }
        () as Sink = DSSink(Data) {
            param
                name : "SendStrings";
        }
    config
        applicationScope : "MyDataStage";
}

• When the job starts, the DSSink/DSSource operator registers its name with the SWS name server
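For the receiving direction, a DSSource application would be symmetrical. The sketch below assumes DSSource takes the same `name` parameter as DSSink and produces a stream of the declared schema; the stream and composite names are illustrative.

```
use com.ibm.streams.etl.datastage.adapters::*;
use spl.utility::Custom;

composite ReceiveStrings {
    type
        RecordSchema = rstring a, ustring b;
    graph
        // Assumed usage: DSSource registers "ReceiveStrings" with the
        // name server and emits tuples sent by a DataStage Streams
        // Connector link with matching columns.
        stream<RecordSchema> Data = DSSource() {
            param
                name : "ReceiveStrings";
        }
        () as Print = Custom(Data) {
            logic
                onTuple Data : printStringLn(Data.a);
        }
    config
        applicationScope : "MyDataStage";
}
```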


DataStage Job

User adds a Streams Connector and configures properties and columns


DataStage Streams Runtime Connector

– Uses name-server lookup to establish the connection ("name" + "application scope") via HTTPS/REST
– Uses the TCPSource/TCPSink binary format
– Performs initial handshaking to verify the metadata
– Supports runtime column propagation (RCP)
– Connection retry (both initial and in-process)
– Supports all Streams types
– Collection types (list, set, map) are represented as a single XML column
– Nested tuples are flattened
– Schema reconciliation options (unmatched columns, RCP, etc.)
– Wave-to-punctuation mapping on input and output
– Null value mapping
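To make the type-mapping rules concrete, consider a Streams-side schema like the one below. The comments describe how the connector would present each attribute to DataStage per the rules above; the flattened column names are an assumption for illustration, not documented behavior.

```
// Hypothetical Streams-side schema, annotated with the connector's mapping
type
    AddressT  = rstring city, rstring zip;
    CustomerT = rstring name,        // scalar: one DataStage column
                AddressT addr,       // nested tuple: flattened into scalar
                                     //   columns (e.g. addr.city, addr.zip)
                list<int32> scores;  // collection: one XML column on the link
```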


Tooling Scenarios

– User creates both the DataStage job and the Streams application from scratch
  – Create the DataStage job in IBM InfoSphere DataStage and QualityStage Designer
  – Create the Streams application in Streams Studio
– User wishes to add Streams analysis to existing DataStage jobs
  – From Streams Studio, create a Streams application from DataStage metadata
– User wishes to add DataStage processing to an existing Streams application
  – From Streams Studio, create an Endpoint Definition File and import it into DataStage


Streams to DataStage Import

1. On the Streams side, the user runs the 'generate-ds-endpoint-defs' command to generate an 'Endpoint Definition File' (EDF) from one or more ADL files
2. The user transfers the file to the DataStage domain or client machine (e.g. via FTP)
3. The user runs the new Streams importer in IMAM to import the EDF into the StreamsEndPoint model
4. The job designer selects the endpoint metadata from the stage; the connection name and columns are populated accordingly

Flow: ADL files → Streams command line or Studio menu → EDF → FTP → IMAM → Xmeta


Stage Editor


DataStage to Streams Import

1. On the Streams side, the user runs the 'generate-ds-spl-code' command to generate a template application from a DataStage job definition
2. The command uses a Java API that queries the DataStage jobs in the repository over REST (HTTP)
3. The tool provides commands to identify jobs that use the Streams Connector, and to extract the connection name and column information
4. The template job includes a DSSink or DSSource stage with tuples defined according to the DataStage link definition

Flow: Xmeta → REST API (HTTP) → Java API → Streams command line or Studio menu → SPL files


Availability

– The Streams Connector is available in InfoSphere Information Server 9.1
– The Streams components are available in InfoSphere Streams Version 3.0, in the IBM InfoSphere DataStage Integration Toolkit