HPCC Systems - ECL for Programmers - Big Data - Data Scientist

Post on 10-Jul-2015


By Fujio Turner

HPCC Systems - ECL Intro Big Data Querying Made EZ

Enterprise Control Language explained for Programmers

@FujioTurner

Who is LexisNexis?

LexisNexis is a provider of legal, tax, regulatory, news, business information, and analysis to legal, corporate, government, accounting and academic markets.

LexisNexis has been in business since 1977 with over 30,000 employees worldwide.

What is HPCC Systems?

LexisNexis Risk is the division of LexisNexis that focuses on data, Big Data processing, linking and vertical expertise. It supports HPCC Systems as an open source project under the Apache 2.0 License.

Comparison

Java-based platform vs. C++ based platform (Thor, Roxie)

Java vs. C++

Petabytes vs. Exabytes

1-80,000 Jobs/day vs. Non-Indexed: 4X-13X, Indexed: 2K-3K Jobs/sec

Since 2005 vs. Since 2000

Block Based vs. File Based

What Is ECL?

ECL (Enterprise Control Language) is a C++ based query language for use with the HPCC Systems Big Data platform. ECL's syntax and format are very simple and easy to learn.

Note: ECL is very similar to Hadoop's Pig, but more expressive and feature rich.

Comparing ECL to General Programming

In this presentation you will see how loading and querying data in ECL is just like reading and finding data in a plain text file.

Each example shows general programming (general common logic) next to the equivalent ECL.

Example Text File

Customer Data May 2010

Name   State  Age
Kevin  CA     45
Mark   MI     27
Sara   FL     64

~/cdata_2010.txt = example file name

~hpcc::cdata_2010 = ECL example file distributed in HPCC cluster

Opening File: general programming vs ECL

General:

d = fopen('~/cdata_2010.txt')

ECL:

d := DATASET('~hpcc::cdata_2010', cs, THOR);

In both versions the quoted string is the file location, and fopen or DATASET is the open file function.
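The "general" side of the opening step can be made runnable. The sketch below uses Python as a stand-in for the deck's generic pseudocode. The helper names write_sample and open_file are hypothetical, and the sample file is written to a temporary directory because ~/cdata_2010.txt may not exist on your machine.

```python
import os
import tempfile

# Assumption: rows copied from the "Customer Data May 2010" example,
# separated by \r\n as in the deck's split(d, "\r\n") call.
SAMPLE = "Kevin CA 45\r\nMark MI 27\r\nSara FL 64"


def write_sample(directory):
    """Write the example customer file and return its path (hypothetical helper)."""
    path = os.path.join(directory, "cdata_2010.txt")
    # newline="" keeps the \r\n row separators exactly as written.
    with open(path, "w", newline="") as f:
        f.write(SAMPLE)
    return path


def open_file(path):
    """Counterpart of d = fopen(...): return the raw file contents."""
    with open(path, newline="") as f:
        return f.read()


with tempfile.TemporaryDirectory() as tmp:
    d = open_file(write_sample(tmp))
    print(len(d), "characters read")
```

At this point d is just one long string, which is why the next step has to split it into rows.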

Organizing: general programming vs ECL

Kevin CA 45
Mark MI 27
Sara FL 64

General:

d = fopen('~/cdata_2010.txt')
new_d = split(d, "\r\n")

The split call breaks the data (d) into rows.

ECL:

cs := RECORD
    STRING20 Name;
    STRING2 State;
    INTEGER3 Age;
END;
d := DATASET('~hpcc::cdata_2010', cs, THOR);

The schema cs is applied to the file to give structure to the data.
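To make the idea of a schema concrete on the general side, here is a minimal Python sketch. CustomerRecord is a hypothetical counterpart of the cs RECORD layout, and to_records is an illustrative helper, not part of any HPCC Systems API. It splits the raw text by row and then gives each row named, typed fields.

```python
from typing import NamedTuple


class CustomerRecord(NamedTuple):
    """Hypothetical counterpart of the cs RECORD layout."""
    name: str
    state: str
    age: int


def to_records(d):
    """Split raw text into rows, then give each row named, typed fields."""
    records = []
    for row in d.split("\r\n"):
        if not row:
            continue  # skip an empty row, e.g. after a trailing separator
        name, state, age = row.split(" ")
        records.append(CustomerRecord(name, state, int(age)))
    return records


d = "Kevin CA 45\r\nMark MI 27\r\nSara FL 64"
records = to_records(d)
for record in records:
    print(record.name, record.state, record.age)
```

With the structure in place, later steps can refer to fields by name (record.age) rather than by column position.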

Find "Sara": general programming vs ECL

0: Kevin CA 45
1: Mark MI 27
2: Sara FL 64

General:

d = fopen('~/cdata_2010.txt')
new_d = split(d, "\r\n")
for (x = 0; x < 3; x++) {
    row = new_d[x]
    new_row = split(row, " ")
    if (new_row[0] == 'Sara') {
        print "Found Sara"
    }
}

The loop splits each row by column, filters the data by name, and outputs the match.

ECL:

cs := RECORD
    STRING20 Name;
    STRING2 State;
    INTEGER3 Age;
END;
d := DATASET('~hpcc::cdata_2010', cs, THOR);
sara := d(Name = 'Sara');
OUTPUT(sara);

The filter d(Name = 'Sara') keeps the matching rows, and OUTPUT returns them.
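The ECL filter states what to keep rather than how to loop. A rough Python analogue, offered only as a sketch, is a list comprehension over already-structured rows. The function name filter_by_name is hypothetical, and the rows are hard-coded here so the block runs on its own.

```python
# Assumption: rows already split into (name, state, age) columns.
records = [
    ("Kevin", "CA", 45),
    ("Mark", "MI", 27),
    ("Sara", "FL", 64),
]


def filter_by_name(rows, name):
    """Keep only the rows whose first column equals name."""
    return [row for row in rows if row[0] == name]


sara = filter_by_name(records, "Sara")
for row in sara:
    print("Found", row[0], row)
```

Unlike the explicit for loop, nothing here depends on knowing in advance that the file has exactly 3 rows.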

Find "Sara" & Older than 50: general programming vs ECL

0: Kevin CA 45
1: Mark MI 27
2: Sara FL 64

General:

d = fopen('~/cdata_2010.txt')
new_d = split(d, "\r\n")
for (x = 0; x < 3; x++) {
    row = new_d[x]
    new_row = split(row, " ")
    if (new_row[0] == 'Sara' and new_row[2] > 50) {
        print "Found Sara"
    }
}

ECL:

cs := RECORD
    STRING20 Name;
    STRING2 State;
    INTEGER3 Age;
END;
d := DATASET('~hpcc::cdata_2010', cs, THOR);
sara := d(Name = 'Sara' AND Age > 50);
OUTPUT(sara);
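One detail the general pseudocode glosses over is that splitting text yields strings, so the age column has to be converted before it can be compared with 50. The ECL schema handles this by declaring Age as an integer. The Python sketch below shows the conversion explicitly; the function name find is hypothetical.

```python
rows = ["Kevin CA 45", "Mark MI 27", "Sara FL 64"]


def find(rows, name, min_age):
    """Return the split rows matching name whose age is greater than min_age."""
    matches = []
    for row in rows:
        fields = row.split(" ")
        # fields[2] is text such as "64"; convert it before the numeric
        # comparison, since comparing str with int raises TypeError in Python 3.
        if fields[0] == name and int(fields[2]) > min_age:
            matches.append(fields)
    return matches


sara = find(rows, "Sara", 50)
print(sara)
```

Both conditions must hold for a row to be kept, which mirrors the AND in the ECL filter.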

ECL is EZ

• Make your own functions & libraries in ECL.
• Modularize your code with "Import": reuse old code.

Machine Learning Built-in

http://hpccsystems.com/ml

ECL Plugin for Eclipse IDE

http://hpccsystems.com/products-and-services/products/plugins/eclipse-ide

ECL + Other Languages

ECL is C++ based, so your C/C++ code can be used in ECL. You can also query with other languages and methods.

ECL GUIDE

http://hpccsystems.com/download/docs/ecl-language-reference

JOIN, MERGE, LENGTH, REGEX, ROUND, SUM, COUNT, TRIM, WHEN, AVE, ABS, CASE, DEDUP, NORMALIZE, DENORMALIZE, IF, SORT, GROUP, more ...

Query with Plain SQL

http://www.slideshare.net/FujioTurner/meet-up-sqldemopp

For More HPCC “How To’s” Go to

http://www.slideshare.net/hpccsystems/jdbc-hpcc

or SQL TO ECL

http://www.youtube.com/watch?v=8SV43DCUqJg

Watch how to install HPCC Systems in 5 Minutes

Download HPCC Systems Open Source Community Edition

http://hpccsystems.com/download/

or

Source Code

https://github.com/hpcc-systems