ETL Benchmarks - Marc Russel's Blog · MANAPPS V 1.1 2008/10/20 ETL Benchmarks Pg 1 ETL Benchmarks...


MANAPPS

V 1.1 2008/10/20 ETL Benchmarks

Pg 1

ETL Benchmarks

Comparing

• DATASTAGE SERVER 7.5

• DATASTAGE PX 7.5

• TALEND OPEN STUDIO 2.4.1

• INFORMATICA 8.1.1

• PENTAHO DATA INTEGRATION 3.0.0

[email protected]

This document is published under the Creative Commons license:

http://creativecommons.org/licenses/by/3.0/us/

You are free:

to Share — to copy, distribute, display, and perform the work

to Remix — to make derivative works

Under the following conditions:

Attribution. You must attribute the work in the manner specified by the author or licensor (but not in any way that suggests that they endorse you or your use of the work).

• For any reuse or distribution, you must make clear to others the license terms of this work. The best way to do this is with a link to this web page.

• Any of the above conditions can be waived if you get permission from the copyright holder.

• Apart from the remix rights granted under this license, nothing in this license impairs or restricts the author's moral rights.

Table of Contents

General comments
Hardware Configuration
Test 1: File Input Delimited > File Output Delimited
    Scenario
    Test results
Test 2: File Input Delimited > Table MySQL Output
    Scenario
    Test results
Test 3: Table Oracle Input > File Output Delimited
    Scenario
    Test results
Test 4: File Input Delimited > Table Output Oracle BULK
    Scenario
    Test results
Test 5: File Input Delimited > Transform > File Output Delimited
    Scenario
    Test results
Test 6: Table Input Oracle > Aggregation > Table Output Oracle (ELT)
    Scenario
    Test results
Test 7: Tables Input Oracle > Transformation > Tables Output Oracle (ELT)
    Scenario
    Test results
Test 8: File Input Delimited > Sort > File Output Delimited
    Scenario
    Test results
Test 9: File Input Delimited > Aggregate > File Output Delimited
    Scenario
    Test results
Test 10: File Input Delimited > Lookup > File Output Delimited
    Scenario
    Test results
Test 11: File Input Delimited > Lookup > File Output Delimited && rejects
    Scenario
    Test results

General comments

• For the tests with DataStage PX, we used 2 nodes to take advantage of the dual-core CPU and of the tool's parallelization feature.

• In terms of intuitiveness and ease of use, Talend Open Studio and DataStage Server are ahead of the pack. DataStage PX comes third, Informatica fourth, and Pentaho Data Integration is the least intuitive, mainly because of the many parameters that need to be learned. However, we think that with enough time invested it could become a powerful tool.

• Open Source ETL & Parallelization: Pentaho Data Integration claims the first position here; it is easier to parallelize with PDI. We did, however, find some issues: the tool lets you parallelize all the components, but some results are inconsistent.

• ELT: Informatica has an ELT mode named Pushdown Optimization, but we could not figure out how to use it. Thus, ELT processes were implemented as ETL with Informatica. Only Talend Open Studio makes the ELT mode easy to use.

Hardware Configuration

• OS: Windows XP Pro SP2

• CPU: Intel Core2 Duo 2 GHz

• JVM 1.6.0_87

• RAM: 4 GB

Test 1: File Input Delimited > File Output Delimited

Scenario:

Read X lines from a delimited input file and write them to a delimited output file.

Extract of the delimited input file:

TALEND OPEN STUDIO

Job name: file_input_delimited__file_output_delimited

Job

Schema of file_input_delimited

PENTAHO DATA INTEGRATION

Job name: file_input_delimited__file_output_delimited

Job

Schema of file_input_delimited

DATASTAGE SERVER

Job name: file_input_delimited__file_output_delimited

Job

Schema of file_input_delimited

DATASTAGE PX

Job name: PX_file_input_delimited__file_output_delimited

Job

Schema of file_input_delimited

INFORMATICA

Job name: file_input_delimited__file_output_delimited

Job

Schema of file_input_delimited

Test results:

Statistics (ratio compared with TOS 2.4.1):

Number of lines   PDI 3.0.0   DataStage 7.5   DataStage PX 7.5   Informatica 8.1.1
100 000           2           2               3.4                40.67
1 000 000         1.99        0.51            1.54               5.77
5 000 000         2.14        0.32            1.02               1.39
20 000 000        2.58        0.41            0.93               0.75

Test 2: File Input Delimited > Table MySQL Output

Scenario:

Read X lines from a delimited input file and write them into a MySQL output table.

Comments:

DataStage 7.5, DataStage PX 7.5 and Informatica 8.1.1 were not tested for this use case. The test was first run with default parameters. To optimize performance, the commit parameter was then adjusted. Finally, the job was parallelized. To parallelize with TOS 2.4.1, we just have to split the delimited input file (with the header and limit parameters) and run two sub-jobs in parallel. With PDI 3.0.0, we just have to increase the number of copies.
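The file-splitting approach described above can be sketched roughly as follows. This is only an illustration, not the Talend job itself: `load_slice` and `parallel_load` are hypothetical names, the in-memory `SAMPLE` string stands in for the delimited input file, threads stand in for the two sub-jobs, and each sub-job simply returns its rows instead of inserting them into MySQL.

```python
import csv
import io
from concurrent.futures import ThreadPoolExecutor
from itertools import islice

# Stand-in for the delimited input file (assumption: ';' as delimiter).
SAMPLE = "1;Ann\n2;Bob\n3;Cid\n4;Dee\n"


def load_slice(text, header, limit):
    """One sub-job: skip `header` lines, then read at most `limit` rows."""
    reader = csv.reader(io.StringIO(text), delimiter=";")
    return list(islice(reader, header, header + limit))


def parallel_load(text, total_rows, workers=2):
    """Cut the input into `workers` slices and read them concurrently."""
    chunk = -(-total_rows // workers)  # ceiling division
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(load_slice, text, i * chunk, chunk)
            for i in range(workers)
        ]
        return [f.result() for f in futures]


parts = parallel_load(SAMPLE, total_rows=4)
```

Each sub-job re-reads the file from the top and skips the lines that belong to the other slice, which is what a header/limit pair amounts to.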

TOS 2.4.1 supports extended insert, a MySQL feature that limits the number of database accesses and improves performance. With this feature, TOS 2.4.1 is 6 times faster.
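To illustrate why extended insert reduces database accesses, here is a minimal sketch that groups rows into multi-row INSERT statements. It is not what TOS generates internally: `extended_inserts`, the table `person` and its columns are hypothetical, and the `%s` placeholder style is an assumption borrowed from common Python MySQL drivers.

```python
def extended_inserts(table, columns, rows, batch_size):
    """Yield (sql, params) pairs, one multi-row INSERT per batch of rows."""
    row_placeholder = "(" + ", ".join(["%s"] * len(columns)) + ")"
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        sql = "INSERT INTO {} ({}) VALUES {}".format(
            table,
            ", ".join(columns),
            ", ".join([row_placeholder] * len(batch)),
        )
        # Flatten the batch so the values line up with the placeholders.
        params = [value for row in batch for value in row]
        yield sql, params


# Hypothetical data: three rows sent in two statements instead of three.
statements = list(
    extended_inserts("person", ["id", "name"], [(1, "a"), (2, "b"), (3, "c")], 2)
)
```

With a batch size of 2, the three rows need two round trips instead of three; larger batches reduce the number of statements further.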

TALEND OPEN STUDIO

Job name: file_input_delimited__table_output_mysql

Job (Multi-Thread Execution checked on Job Settings)

Schema of file_input_delimited

PENTAHO DATA INTEGRATION

Job name: file_input_delimited__table_output_mysql

Job

Schema of file_input_delimited

Test results:

Statistics (ratio compared with TOS 2.4.1):

Number of lines   PDI 3.0.0   TOS 2.4.1 Extended Insert
100 000           0.98        0.18
1 000 000         1.05        0.17
5 000 000         1.15        0.18
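One way to read these ratios, assuming each one is a tool's run time divided by the run time of plain TOS 2.4.1 (the document does not state the definition explicitly), is to take the reciprocal as a speed-up factor. Under that assumption, the extended-insert ratios of about 0.18 correspond to a speed-up of roughly 5.5, in line with the "6 times faster" comment. The `speedup` helper below is a hypothetical name used only for this arithmetic.

```python
def speedup(ratio):
    """Speed-up relative to the baseline, assuming ratio = time / baseline time."""
    return 1.0 / ratio


# Extended-insert ratios taken from the Test 2 statistics.
extended_insert_ratios = {100_000: 0.18, 1_000_000: 0.17, 5_000_000: 0.18}

speedups = {
    lines: round(speedup(ratio), 2)
    for lines, ratio in extended_insert_ratios.items()
}
```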

Test 3: Table Oracle Input > File Output Delimited

Scenario:

Read X lines from an Oracle input table and write them to a delimited output file.

TALEND OPEN STUDIO

Job name: table_input_oracle__file_output_delimited

Job

Schema of table_input_oracle

PENTAHO DATA INTEGRATION

Job name: table_input_oracle__file_output_delimited

Job

SCHEMA VIEWER NOT POSSIBLE

Schema of table_input_oracle

DATASTAGE SERVER

Job name: table_input_oracle__file_output_delimited

Job

Schema of table_input_oracle

DATASTAGE PX

Job name: PX_table_input_oracle__file_output_delimited

Job

Schema of table_input_oracle

INFORMATICA

Job name: table_input_oracle__file_output_delimited

Job

Schema of table_input_oracle

Test results:

Statistics (ratio compared with TOS 2.4.1):

Number of lines   PDI 3.0.0   DataStage 7.5   DataStage PX 7.5   Informatica 8.1.1
100 000           2.12        1.78            1.78               19.26
500 000           3.39        1.76            1.28               7.67
1 000 000         2.62        1.33            1.05               3.56

Test 4: File Input Delimited > Table Output Oracle BULK

Scenario:

Read X lines from a delimited input file and bulk-load them into an Oracle output table.

TALEND OPEN STUDIO

Job name: file_input_delimited__table_output_oracle_bulk

Job

PENTAHO DATA INTEGRATION

Job name: file_input_delimited__table_output_oracle_bulk

Job

Schema of file_input_delimited

DATASTAGE SERVER

Job name: file_input_delimited__table_output_oracle_bulk

Job

Schema of file_input_delimited

DATASTAGE PX

Job name: PX_file_input_delimited__table_output_oracle_bulk

Job

Schema of file_input_delimited

INFORMATICA

Job name: file_input_delimited__table_output_oracle_bulk

Job

Schema of file_input_delimited

Test results:

Statistics (ratio compared with TOS 2.4.1):

Number of lines   PDI 3.0.0   DataStage 7.5   DataStage PX 7.5   Informatica 8.1.1
100 000           0.6         0.69            1.38               11.93
1 000 000         1.38        0.81            1.22               2.71
2 000 000         1.46        0.8             1.11               1.61

Test 5: File Input Delimited > Transform > File Output Delimited

Scenario:

Read X lines from a delimited input file and write them to a delimited output file after some changes.

Changes list:

• The content of the field `rate` is multiplied by 100.

• The new field `name` is a concatenation (`firstname` + " " + `lastname`).

• The content of the field `address` is converted to uppercase.

Comments:

Pentaho Data Integration has no graphical component to transform data, so we have to use a custom code component; the language used is JavaScript. The four other ETL tools have a transformer component for this. Talend Open Studio also has custom code components, named tJavaRow and tPerlRow.
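The three changes listed for this test can be sketched as a per-row function of the kind one might write in a custom code component. This is only an illustration in Python (the benchmark itself used JavaScript in PDI and graphical transformers in the other tools); `transform` is a hypothetical name, and keeping `firstname` and `lastname` in the output row is an assumption, since the output schema is not reproduced here.

```python
def transform(row):
    """Apply the three changes of this test to one input row (a dict)."""
    out = dict(row)  # copy, so the input row is left untouched
    out["rate"] = row["rate"] * 100
    out["name"] = row["firstname"] + " " + row["lastname"]
    out["address"] = row["address"].upper()
    return out


# Hypothetical sample row.
sample = {
    "firstname": "Ada",
    "lastname": "Lovelace",
    "rate": 0.5,
    "address": "12 main st",
}
result = transform(sample)
```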

TALEND OPEN STUDIO

Job name: file_input_delimited__transformation__file_output_delimited

Job

Schema of file_input_delimited

Schema of file_output_delimited

tMap

PENTAHO DATA INTEGRATION

Job name: file_input_delimited__transformation__file_output_delimited

Job

Schema of file_input_delimited

Schema of file_output_delimited

JavaScript Custom Code

Select Values

Select Values

DATASTAGE SERVER

Job name: file_input_delimited__transformation__file_output_delimited

Job

Schema of file_input_delimited

Schema of file_output_delimited

Transformer

DATASTAGE PX

Job name: PX_file_input_delimited__transformation__file_output_delimited

Job

Schema of file_input_delimited

Schema of file_output_delimited

Transformer

INFORMATICA

Job name: file_input_delimited__transformation__file_output_delimited

Job

Schema of file_input_delimited

Schema of file_output_delimited

Mapping

Test results:

Statistics (ratio compared with TOS 2.4.1):

Number of lines   PDI 3.0.0   DataStage 7.5   DataStage PX 7.5   Informatica 8.1.1
100 000           4.07        1.54            3.65               31.15
1 000 000         6           1.18            1.33               5.06
5 000 000         6.02        1.3             0.95               1.01
20 000 000        6.16        0.97            0.84               0.97

Test 6: Table Input Oracle > Aggregation > Table Output Oracle (ELT)

Scenario:

Read X lines from an Oracle input table, aggregate them, and write the result into another Oracle output table (ELT mode).

Comments:

Only Talend Open Studio offers an ELT mode. Informatica has Pushdown Optimization, but I did not find this feature in the tool.
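The idea behind the ELT mode is that the aggregation runs inside the database as a single SQL statement, instead of rows being pulled into the ETL engine and written back. The sketch below illustrates this under loud assumptions: an in-memory SQLite database stands in for Oracle, and the table names `person` and `age_count` and the column `nb` are hypothetical (only the "group by age, count" aggregation comes from the job names in this test).

```python
import sqlite3

# SQLite in memory as a stand-in for the Oracle instance (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (firstname TEXT, age INTEGER)")
conn.execute("CREATE TABLE age_count (age INTEGER, nb INTEGER)")
conn.executemany(
    "INSERT INTO person VALUES (?, ?)",
    [("Ann", 30), ("Bob", 30), ("Cid", 41)],
)

# ELT: one statement, executed entirely by the database engine.
conn.execute(
    "INSERT INTO age_count (age, nb) "
    "SELECT age, COUNT(*) FROM person GROUP BY age"
)

result = conn.execute("SELECT age, nb FROM age_count ORDER BY age").fetchall()
conn.close()
```

No row leaves the database in this variant, which is consistent with the large ratios reported for the tools that ran the same test as classic ETL.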

TALEND OPEN STUDIO

Job name: ELT__table_input_oracle__aggregate_group_by_age_count__table_output_oracle

Job (ELT)

Schema of table_input_oracle

PENTAHO DATA INTEGRATION

Job name: table_input_oracle__aggregate_group_by_age_count__table_output_oracle

Job

SCHEMA VIEWER NOT POSSIBLE

Schema of table_input_oracle

DATASTAGE SERVER

Job name: table_input_oracle__aggregate_group_by_age_count__table_output_oracle

Job

Schema of table_input_oracle

DATASTAGE PX

Job name: PX_table_input_oracle__aggregate_group_by_age_count__table_output_oracle

Job

Schema of table_input_oracle

INFORMATICA

Job name: table_input_oracle__aggregate_group_by_age_count__table_output_oracle

Job

Schema of table_input_oracle

Test results:

Statistics (ratio compared with TOS 2.4.1):

Number of lines   PDI 3.0.0   DataStage 7.5   DataStage PX 7.5   Informatica 8.1.1
100 000           3.44        1.94            6.45               39.52
500 000           15.9        5.71            8.57               36.43
1 000 000         28.28       8.09            10.36              30.77

Test 7: Tables Input Oracle > Transformation > Tables Output Oracle (ELT)

Scenario:

Read X lines from Oracle input tables and write them into other Oracle output tables (ELT mode) after some changes.

TALEND OPEN STUDIO

Job name: table_input_oracle__elt__table_output_oracle

Job (ELT)

Schema of table_lookup_oracle

Schema of table_input_oracle

PENTAHO DATA INTEGRATION

Job name: table_input_oracle__elt__table_output_oracle

Job

SCHEMA VIEWER NOT POSSIBLE

Schema of table_lookup_oracle

SCHEMA VIEWER NOT POSSIBLE

Schema of table_input_oracle

DATASTAGE SERVER

Job name: table_input_oracle__elt__table_output_oracle

Job

Schema of table_lookup_oracle

Schema of table_input_oracle

DATASTAGE PX

Job name: PX_table_input_oracle__elt__table_output_oracle

Job

Schema of table_lookup_oracle

Schema of table_input_oracle

INFORMATICA

Job name: table_input_oracle__elt__table_output_oracle

Job

Schema of table_lookup_oracle

Schema of table_input_oracle

Test results:

Statistics (ratio compared with TOS 2.4.1):

Number of lines   PDI 3.0.0   DataStage 7.5   DataStage PX 7.5   Informatica 8.1.1
100 000           6.4         2.12            2.5                8.93
500 000           8.67        2.79            1.31               3.05
1 000 000         7.26        2.2             0.9                1.9

Test 8: File Input Delimited > Sort > File Output Delimited

Scenario:

Reading X lines from a delimited input file, sorting them, and writing the sorted lines to a delimited output file.

Sorts list:

• Order by the integer field `age` ASC.

• Order by the string field `firstname` ASC.

• Order by the fields `age` and `firstname` ASC.
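The three sorts above can be sketched in a few lines of Python. This is only an illustration of the scenario, not the job built in any of the tested tools. The `;` delimiter, the header row and the `id_client` column of the sample are assumptions; `age` and `firstname` come from the sorts list.

```python
import csv
import io

def sort_delimited(src, dst, keys, delimiter=";"):
    """Read delimited rows (with a header) from src, sort them ascending on keys, write to dst."""
    reader = csv.DictReader(src, delimiter=delimiter)
    rows = list(reader)  # in-memory sort: the whole input must fit in RAM

    def sort_key(row):
        # `age` is compared as an integer, every other field as a string.
        return tuple(int(row[k]) if k == "age" else row[k] for k in keys)

    rows.sort(key=sort_key)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames,
                            delimiter=delimiter, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)

# Hypothetical sample data; the real benchmark files are not reproduced here.
sample = "id_client;firstname;age\n1;Zoe;30\n2;Adam;30\n3;Marc;25\n"
for keys in (["age"], ["firstname"], ["age", "firstname"]):
    out = io.StringIO()
    sort_delimited(io.StringIO(sample), out, keys)
    print(keys, "->", out.getvalue().splitlines()[1:])
```

Because the rows are loaded into a list before sorting, this sketch only suits volumes that fit in memory.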

Comments:

With the version used, I could not sort in memory with Pentaho Data Integrator, but the feature is present in the latest version.

On Talend Open Studio, with a large volume (5 000 000 and 20 000 000 lines), we have to use the tExternalSortRow component, which uses GNU sort, an external sort program.
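For volumes that do not fit in memory, an external sort writes sorted chunks to disk and then merges them. The sketch below illustrates that general technique only; it is not the code of the Talend component or of GNU sort. The `chunk_size` value and the `id_client;firstname;age` column layout are assumptions made for the example.

```python
import heapq
import itertools
import tempfile

def external_sort(lines, key, chunk_size=100_000):
    """Yield lines in sorted order while holding at most chunk_size lines in memory."""
    runs = []
    it = iter(lines)
    while True:
        # Sort one chunk in memory and spill it to a temporary file (a "run").
        chunk = sorted(itertools.islice(it, chunk_size), key=key)
        if not chunk:
            break
        run = tempfile.TemporaryFile(mode="w+", encoding="utf-8")
        run.writelines(line if line.endswith("\n") else line + "\n" for line in chunk)
        run.seek(0)
        runs.append(run)
    try:
        # heapq.merge streams the sorted runs, keeping one line per run in memory.
        yield from heapq.merge(*runs, key=key)
    finally:
        for run in runs:
            run.close()

def by_age(line):
    # Hypothetical layout: id_client;firstname;age
    return int(line.rstrip("\n").split(";")[2])

data = ["1;Zoe;30\n", "2;Adam;18\n", "3;Marc;25\n", "4;Lea;21\n"]
print([line.strip() for line in external_sort(data, by_age, chunk_size=2)])
```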

TALEND OPEN STUDIO

Job names:

• file_input_delimited__sort_on_age__file_output_delimited

• file_input_delimited__sort_on_firstname__file_output_delimited

• file_input_delimited__sort_on_firstname_and_age__file_output_delimited

Job

Schema of file_input_delimited

PENTAHO DATA INTEGRATION

Job names:

• file_input_delimited__sort_on_age__file_output_delimited

• file_input_delimited__sort_on_firstname__file_output_delimited

• file_input_delimited__sort_on_firstname_and_age__file_output_delimited

Job

Schema of file_input_delimited

DATASTAGE SERVER

Job names:

• file_input_delimited__sort_on_age__file_output_delimited

• file_input_delimited__sort_on_firstname__file_output_delimited

• file_input_delimited__sort_on_firstname_and_age__file_output_delimited

Job

Schema of file_input_delimited

DATASTAGE PX

Job names:

• PX_file_input_delimited__sort_on_age__file_output_delimited

• PX_file_input_delimited__sort_on_firstname__file_output_delimited

• PX_file_input_delimited__sort_on_firstname_and_age__file_output_delimited

Job

Schema of file_input_delimited

INFORMATICA

Job names:

• file_input_delimited__sort_on_age__file_output_delimited

• file_input_delimited__sort_on_firstname__file_output_delimited

• file_input_delimited__sort_on_firstname_and_age__file_output_delimited

Job

Schema of file_input_delimited

Test results:

Statistics:

Number of lines TOS 2.4.1 PDI 3.0.0 DataStage 7.5 DataStage PX 7.5 Informatica 8.1.1

ratio compared with TOS 2.4.1

100 000 2,51 2,92 2,78 28,82

1 000 000 2,09 3,86 1,03 3,93

5 000 000 0,83 1,42 0,34 1,12

20 000 000 0,66 +++ 0,48 0,64

Statistics:

Number of lines TOS 2.4.1 PDI 3.0.0 DataStage 7.5 DataStage PX 7.5 Informatica 8.1.1

ratio compared with TOS 2.4.1

100 000 2,01 3,55 2,37 24,9

1 000 000 1,73 3,21 0,89 3,45

5 000 000 0,93 2,53 0,34 1,26

20 000 000 0,69 +++ 0,58 0,77

Statistics:

Number of lines TOS 2.4.1 PDI 3.0.0 DataStage 7.5 DataStage PX 7.5 Informatica 8.1.1

ratio compared with TOS 2.4.1

100 000 2,42 5,51 3,38 31,58

1 000 000 1,68 3,45 0,94 3,52

5 000 000 0,71 1,6 0,26 0,95

20 000 000 0,84 +++ 0,58 0,76

Test 9: File Input Delimited > Aggregate > File Output Delimited

Scenario:

Reading X lines from a delimited input file, performing an aggregation, and writing the result of the operations to a delimited output file.

1 – Group by the field `age`; Operation: COUNT.

2 – Group by the field `age`; Operations: COUNT, SUM(rate), AVG(rate), MIN(rate), MAX(rate).

3 – Group by the field `firstname`; Operations: COUNT.
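As a rough illustration of the three aggregations above, the following Python sketch groups rows with a hash table and keeps running totals per group. It is not taken from any of the benchmarked tools. The `;` delimiter, the header row and the sample values are assumptions; the field names `age`, `firstname` and `rate` come from the scenario.

```python
import csv
import io

def aggregate(src, group_field, value_field=None, delimiter=";"):
    """Return COUNT per group and, when value_field is given, SUM, AVG, MIN and MAX."""
    groups = {}
    for row in csv.DictReader(src, delimiter=delimiter):
        acc = groups.setdefault(row[group_field], {"count": 0})
        acc["count"] += 1
        if value_field is not None:
            value = float(row[value_field])
            acc["sum"] = acc.get("sum", 0.0) + value
            acc["min"] = min(acc.get("min", value), value)
            acc["max"] = max(acc.get("max", value), value)
    if value_field is not None:
        for acc in groups.values():
            acc["avg"] = acc["sum"] / acc["count"]
    return groups

# Hypothetical sample data.
sample = "firstname;age;rate\nZoe;30;10.0\nAdam;30;20.0\nMarc;25;5.0\nZoe;41;7.5\n"
print(aggregate(io.StringIO(sample), "age"))            # case 1
print(aggregate(io.StringIO(sample), "age", "rate"))    # case 2
print(aggregate(io.StringIO(sample), "firstname"))      # case 3
```

Memory use grows with the number of distinct groups, so grouping by `firstname` on a large file is the most demanding of the three cases in this sketch.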

Comments:

When the output flow is too big (here, the aggregation by firstname on a large volume), we have to use the tSortedAggregateRow component on Talend Open Studio. This component sorts rows before the aggregation. In this case, Pentaho Data Integrator failed.
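The advantage of aggregating sorted rows can be shown with a short Python sketch: once the input is ordered on the grouping key, each group is contiguous, so only one group needs to be tracked at a time. This illustrates the general idea only, not Talend's implementation; the tuple layout and sample values are assumptions.

```python
from itertools import groupby
from operator import itemgetter

def count_sorted(rows, key_index):
    """Yield (key, count) for rows that are already sorted on column key_index."""
    for key, group in groupby(rows, key=itemgetter(key_index)):
        # Only the current group is consumed here; earlier groups are already emitted.
        yield key, sum(1 for _ in group)

# Hypothetical rows (firstname, age), sorted on firstname beforehand.
rows = [("Adam", 18), ("Adam", 30), ("Marc", 25), ("Zoe", 30), ("Zoe", 41)]
print(list(count_sorted(rows, 0)))
```

The sort itself can be done externally, so memory use in this sketch no longer depends on how many distinct first names the file contains.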

TALEND OPEN STUDIO

Job names:

• file_input_delimited__aggregate_group_by_age_count__file_output_delimited

• file_input_delimited__aggregate_group_by_age_count_sum_avg_min_max__file_output_delimited

• file_input_delimited__aggregate_group_by_firstname_count__file_output_delimited

Job

Job using the tExternalSortRow component

Schema of file_input_delimited

Schema of file_output_delimited

file_input_delimited__aggregate_group_by_age_count__file_output_delimited

PENTAHO DATA INTEGRATION

Job names:

• file_input_delimited__aggregate_group_by_age_count__file_output_delimited

• file_input_delimited__aggregate_group_by_age_count_sum_avg_min_max__file_output_delimited

• file_input_delimited__aggregate_group_by_firstname_count__file_output_delimited

Job

Schema of file_input_delimited

Schema of file_output_delimited

file_input_delimited__aggregate_group_by_age_count__file_output_delimited

DATASTAGE SERVER

Job names:

• file_input_delimited__aggregate_group_by_age_count__file_output_delimited

• file_input_delimited__aggregate_group_by_age_count_sum_avg_min_max__file_output_delimited

• file_input_delimited__aggregate_group_by_firstname_count__file_output_delimited

Job

Schema of file_input_delimited

Schema of file_output_delimited

file_input_delimited__aggregate_group_by_age_count__file_output_delimited

DATASTAGE PX

Job names:

• PX_file_input_delimited__aggregate_group_by_age_count__file_output_delimited

• PX_file_input_delimited__aggregate_group_by_age_count_sum_avg_min_max__file_output_delimited

• PX_file_input_delimited__aggregate_group_by_firstname_count__file_output_delimited

Job

Schema of file_input_delimited

Schema of file_output_delimited

file_input_delimited__aggregate_group_by_age_count__file_output_delimited

INFORMATICA

Job names:

• file_input_delimited__aggregate_group_by_age_count__file_output_delimited

• file_input_delimited__aggregate_group_by_age_count_sum_avg_min_max__file_output_delimited

• file_input_delimited__aggregate_group_by_firstname_count__file_output_delimited

Job

Schema of file_input_delimited

Schema of file_output_delimited

file_input_delimited__aggregate_group_by_age_count__file_output_delimited

Test results:

Statistics:

Number of lines TOS 2.4.1 PDI 3.0.0 DataStage 7.5 DataStage PX 7.5 Informatica 8.1.1

ratio compared with TOS 2.4.1

100 000 4,35 3,23 6,45 63,71

1 000 000 3,8 0,86 0,93 5,77

5 000 000 4,47 0,7 0,71 1,49

20 000 000 3,76 1,03 0,63 0,56

Statistics:

Number of lines TOS 2.4.1 PDI 3.0.0 DataStage 7.5 DataStage PX 7.5 Informatica 8.1.1

ratio compared with TOS 2.4.1

100 000 3,1 2,38 13,39 47,23

1 000 000 3,39 1,48 2,06 5,6

5 000 000 3,68 1,33 0,89 1,3

20 000 000 3,06 1,32 1,91 0,65

Statistics:

Number of lines TOS 2.4.1 PDI 3.0.0 DataStage 7.5 DataStage PX 7.5 Informatica 8.1.1

ratio compared with TOS 2.4.1

100 000 3,14 2,33 5,23 47,67

1 000 000 3,76 1,77 1,39 12,13

5 000 000 0,82 0,34 0,2 +++

20 000 000 0,59 0,46 0,54 +++

Test 10: File Input Delimited > Lookup > File Output Delimited

Scenario:

Reading X lines from a delimited input file, looking up 4 fields in another delimited input file using the id_client column, and writing the join result to a delimited output file.
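A lookup of this kind is commonly done by loading the lookup file into a hash table keyed on the join column and probing it for each main row. The Python sketch below shows that approach under stated assumptions: the `;` delimiter, the header rows and the four lookup field names (`address`, `city`, `zip`, `country`) are hypothetical, and unmatched rows are dropped (inner join), which the scenario does not specify. Only `id_client` comes from the text.

```python
import csv
import io

def lookup_join(main_src, lookup_src, key="id_client", delimiter=";"):
    """Yield main rows enriched with the lookup fields that share the same key."""
    # Load the lookup file into a hash table keyed on the join column.
    table = {row[key]: row for row in csv.DictReader(lookup_src, delimiter=delimiter)}
    for row in csv.DictReader(main_src, delimiter=delimiter):
        match = table.get(row[key])
        if match is not None:  # assumption: inner join, unmatched rows are dropped
            yield {**row, **{k: v for k, v in match.items() if k != key}}

# Hypothetical sample files.
main_file = "id_client;firstname;age\n1;Zoe;30\n2;Adam;17\n9;Marc;25\n"
lookup_file = ("id_client;address;city;zip;country\n"
               "1;1 Main St;Paris;75001;FR\n"
               "2;2 Oak Ave;Lyon;69001;FR\n")
for joined in lookup_join(io.StringIO(main_file), io.StringIO(lookup_file)):
    print(joined)
```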

TALEND OPEN STUDIO

Job name: file_input_delimited__file_lookup_delimited__file_output_delimited

Job

Schema of file_input_delimited

Schema of file_lookup_delimited

Schema file_output_delimited

tMap Component

PENTAHO DATA INTEGRATION

Job name: file_input_delimited__file_lookup_delimited__file_output_delimited

Job

Schema of file_input_delimited

Schema of file_lookup_delimited

Schema of file_output_delimited

Mapping Component

DATASTAGE SERVER

Job name: file_input_delimited__file_lookup_delimited__file_output_delimited

Job

Schema of file_input_delimited

Schema of file_lookup_delimited

Schema file_output_delimited

Transformer Component

DATASTAGE PX

Job name: PX_file_input_delimited__file_lookup_delimited__file_output_delimited

Job

Schema of file_input_delimited

Schema of file_lookup_delimited

Schema file_output_delimited

Transformer Component

INFORMATICA

Job name: file_input_delimited__file_lookup_delimited__file_output_delimited

Job

Schema of file_input_delimited

Schema of file_lookup_delimited

Schema file_output_delimited

Transformer Component

Test results:

Statistics:

Number of lines TOS 2.4.1 PDI 3.0.0 DataStage 7.5 DataStage PX 7.5 Informatica 8.1.1

ratio compared with TOS 2.4.1

100 000 2,86 3,45 3,45 30,34

1 000 000 3,35 1,66 1,91 8,87

5 000 000 3,05 1,15 1,39 4,13

20 000 000 2,67 1,28 1,13 3,18

Statistics:

Number of lines TOS 2.4.1 PDI 3.0.0 DataStage 7.5 DataStage PX 7.5 Informatica 8.1.1

ratio compared with TOS 2.4.1

100 000 2,03 7,18 1,79 13,72

1 000 000 2,76 3,71 1,46 7,93

5 000 000 3,01 1,73 1,24 4,7

20 000 000 2,52 1,69 1,05 4,28

Statistics:

Number of lines TOS 2.4.1 PDI 3.0.0 DataStage 7.5 DataStage PX 7.5 Informatica 8.1.1

ratio compared with TOS 2.4.1

100 000 1,47 6,93 0,94 5,48

1 000 000 2,26 5,61 1,05 5,22

5 000 000 3,02 2,64 1,04 4,49

20 000 000 4,01 1,67 1,01 4,67

Statistics:

Number of lines TOS 2.4.1 PDI 3.0.0 DataStage 7.5 DataStage PX 7.5 Informatica 8.1.1

ratio compared with TOS 2.4.1

100 000 Failed 6,53 0,42 1,71

1 000 000 Failed 5,89 0,43 1,58

5 000 000 Failed 2,49 0,28 1,18

20 000 000 Failed 1,75 0,24 1,13

Test 11: File Input Delimited > Lookup > File Output Delimited && rejects

Scenario:

Reading X lines from a delimited input file and looking up 4 fields in another delimited input file using the id_client column. The join result is written to a delimited output file, and the rejects are written to other delimited output files.

1 – Filter rejects: `age` content < 18

2 – Filter rejects: `age` content < 18 and inner join reject
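The two reject cases above amount to routing each row to one of several outputs. The Python sketch below shows one possible routing, not the logic of any tested tool. Applying the `age` filter before the join check is an assumption, as are the sample rows and the `city` lookup field; `age` and `id_client` come from the scenario.

```python
def route(main_rows, lookup, key="id_client"):
    """Split rows into accepted rows, filter rejects (age < 18) and inner-join rejects."""
    accepted, filter_rejects, join_rejects = [], [], []
    for row in main_rows:
        if int(row["age"]) < 18:
            filter_rejects.append(row)       # case 1: rejected by the filter expression
        elif row[key] not in lookup:
            join_rejects.append(row)         # case 2: no matching lookup row
        else:
            accepted.append({**row, **lookup[row[key]]})
    return accepted, filter_rejects, join_rejects

# Hypothetical sample data.
lookup = {"1": {"city": "Paris"}, "2": {"city": "Lyon"}}
main_rows = [
    {"id_client": "1", "firstname": "Zoe", "age": "30"},
    {"id_client": "2", "firstname": "Adam", "age": "17"},
    {"id_client": "9", "firstname": "Marc", "age": "25"},
]
accepted, filter_rejects, join_rejects = route(main_rows, lookup)
print(accepted, filter_rejects, join_rejects, sep="\n")
```

For the first case, only the accepted and filter-reject outputs would be written; the second case adds the inner-join reject output.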

Comments:

Talend Open Studio and DataStage Server are the most ergonomic tools for managing expression filter rejects and inner join rejects, since both are handled in the Transformer component (tMap on Talend Open Studio). For DataStage PX, Pentaho Data Integrator and Informatica, we have to use filter components.

TALEND OPEN STUDIO

Job name:

file_input_delimited__file_lookup_delimited__file_output_delimited__rejects_file_output_delimited

Job

Schema of file_input_delimited

Schema of file_lookup_delimited

Schema of file_output_delimited (age>=18)

Schema of file_output_delimited (age<18) = Schema of file_output_delimited

tMap Component

PENTAHO DATA INTEGRATION

Job name:

file_input_delimited__file_lookup_delimited__file_output_delimited__rejects_file_output_delimited

Job

Schema of file_input_delimited

Schema of file_lookup_delimited

Schema of file_output_delimited

Schema of file_output_delimited (age<18) = Schema of file_output_delimited

Mapping Component

DATASTAGE SERVER

Job name:

file_input_delimited__file_lookup_delimited__file_output_delimited__rejects_file_output_delimited

Job

Schema of file_input_delimited

Schema file_lookup_delimited

Schema of file_output_delimited

Schema of file_output_delimited (age<18) = Schema of file_output_delimited

Transformer Component

DATASTAGE PX

Job name:

PX_file_input_delimited__file_lookup_delimited__file_output_delimited__rejects_file_output_delimited

Job

Schema of file_input_delimited

Schema file_lookup_delimited

Schema of file_output_delimited

Schema of file_output_delimited (age<18) = Schema of file_output_delimited

Transformer Component

INFORMATICA

Job name:

file_input_delimited__file_lookup_delimited__file_output_delimited__rejects_file_output_delimited

Job

Schema of file_input_delimited

Schema file_lookup_delimited

Schema of file_output_delimited

Schema of file_output_delimited (age<18) = Schema of file_output_delimited

Transformer Component

Test results:

Statistics:

Number of lines TOS 2.4.1 PDI 3.0.0 DataStage 7.5 DataStage PX 7.5 Informatica 8.1.1

ratio compared with TOS 2.4.1

100 000 2,19 3,97 4,64 29,8

1 000 000 2,54 1,56 2,08 8,61

5 000 000 2,65 1,22 1,39 4,37

20 000 000 3 1,42 1,35 3,71

Statistics:

Number of lines TOS 2.4.1 PDI 3.0.0 DataStage 7.5 DataStage PX 7.5 Informatica 8.1.1

ratio compared with TOS 2.4.1

100 000 1,83 6,71 1,76 11,03

1 000 000 2,21 3,66 1,54 7,54

5 000 000 2,51 1,76 1,38 5,23

20 000 000 2,77 1,54 1,39 4,58

Statistics:

Number of lines TOS 2.4.1 PDI 3.0.0 DataStage 7.5 DataStage PX 7.5 Informatica 8.1.1

ratio compared with TOS 2.4.1

100 000 1,38 6,47 0,88 5,78

1 000 000 2,13 4,47 1,18 5,45

5 000 000 2,91 1,7 1,33 4,92

20 000 000 2,52 1,74 1,21 4,75

TALEND OPEN STUDIO

Job name:

file_input_delimited__file_lookup_delimited__file_output_delimited__rejects_and_innerjoin_rejects_file_output_delimited

Job

Schema of file_input_delimited

Schema of file_lookup_delimited

Schema of file_output_delimited (age>=18)

Schema of file_output_delimited (age<18) = Schema of file_output_delimited

Schema of file_output_delimited (inner join rejects) = Schema of file_output_delimited

tMap Component

PENTAHO DATA INTEGRATION

Job name:

file_input_delimited__file_lookup_delimited__file_output_delimited__rejects_and_innerjoin_rejects_file_output_delimited

Job

Schema of file_input_delimited

Schema of file_lookup_delimited

Schema of file_output_delimited

Schema of file_output_delimited (age<18) = Schema of file_output_delimited

Schema of file_output_delimited (inner join rejects) = Schema of file_output_delimited

Mapping Component

DATASTAGE SERVER

Job name:

file_input_delimited__file_lookup_delimited__file_output_delimited__rejects_and_innerjoin_rejects_file_output_delimited

Job

Schema of file_input_delimited

Schema of file_lookup_delimited

Schema file_output_delimited

Schema of file_output_delimited (age<18) = Schema of file_output_delimited

Schema of file_output_delimited (inner join rejects) = Schema of file_output_delimited

Transformer Component

DATASTAGE PX

Job name:

PX_file_input_delimited__file_lookup_delimited__file_output_delimited__rejects_and_innerjoin_rejects_file_output_delimited

Job

Schema of file_input_delimited

Schema of file_lookup_delimited

Schema file_output_delimited

Schema of file_output_delimited (age<18) = Schema of file_output_delimited

Schema of file_output_delimited (inner join rejects) = Schema of file_output_delimited

Transformer Component

INFORMATICA

Job name:

file_input_delimited__file_lookup_delimited__file_output_delimited__rejects_and_innerjoin_rejects_file_output_delimited

Job

Schema of file_input_delimited

Schema of file_lookup_delimited

Schema file_output_delimited

Schema of file_output_delimited (age<18) = Schema of file_output_delimited

Schema of file_output_delimited (inner join rejects) = Schema of file_output_delimited

Transformer Component

Statistics:

Number of lines TOS 2.4.1 PDI 3.0.0 DataStage 7.5 DataStage PX 7.5 Informatica 8.1.1

ratio compared with TOS 2.4.1

100 000 1,83 4,22 6,34 39,15

1 000 000 2,3 1,77 2,7 12,65

5 000 000 2,43 1,22 1,92 5,1

20 000 000 3,07 1,28 1,37 3,46

Statistics:

Number of lines TOS 2.4.1 PDI 3.0.0 DataStage 7.5 DataStage PX 7.5 Informatica 8.1.1

ratio compared with TOS 2.4.1

100 000 1,75 6,73 6,73 15,5

1 000 000 2,21 4,06 1,83 9,08

5 000 000 2,38 2,08 1,45 5,78

20 000 000 2,65 1,57 1,24 4,25

Statistics:

Number of lines TOS 2.4.1 PDI 3.0.0 DataStage 7.5 DataStage PX 7.5 Informatica 8.1.1

ratio compared with TOS 2.4.1

100 000 1,21 3,51 1,18 5,8

1 000 000 1,8 5,96 1,25 5,5

5 000 000 2,05 2,81 1,27 4,78

20 000 000 3,27 1,83 1,06 4,47