Posted on 10-Jul-2015
By Fujio Turner
HPCC Systems - ECL Intro Big Data Querying Made EZ
Enterprise Control Language explained for Programmers
@FujioTurner
Who is LexisNexis?
LexisNexis is a provider of legal, tax, regulatory, news, and business information and analysis to the legal, corporate, government, accounting, and academic markets.
LexisNexis has been in business since 1977 and has over 30,000 employees worldwide.
What is HPCC Systems?
LexisNexis Risk is the division of LexisNexis that focuses on data, Big Data processing, linking, and vertical expertise. It supports HPCC Systems as an open source project under the Apache 2.0 License.
Comparison
Java vs. C++
Petabytes; 1-80,000 Jobs/day; Since 2005
Exabytes; Non-Indexed 4X-13X; Indexed: 2K-3K Jobs/sec; Since 2000
Thor, Roxie
Block Based vs. File Based
What Is ECL?
ECL (Enterprise Control Language) is a C++ based query language for use with the HPCC Systems Big Data platform. ECL's syntax and format are very simple and easy to learn.
Note: ECL is very similar to Hadoop's Pig, but more expressive and feature-rich.
Comparing ECL to General Programming
In this presentation you will see how loading and querying data in ECL is just like reading and finding data in a plain text file.
General programming (general common logic) vs. ECL
Each comparison shows general code alongside the equivalent ECL code.
Example Text File
Customer Data May 2010
Name State Age
Kevin CA 45
Mark MI 27
Sara FL 64
~/cdata_2010.txt = example file name
~hpcc::cdata_2010 = ECL example file distributed in the HPCC cluster
Opening File: general programming vs ECL
General:
    d = fopen('~/cdata_2010.txt')
ECL:
    d := DATASET('~hpcc::cdata_2010', cs, THOR);
In both, the function (fopen / DATASET) opens the file, and its first argument is the file location.
Organizing: general programming vs ECL
Kevin CA 45
Mark MI 27
Sara FL 64
General:
    d = fopen('~/cdata_2010.txt')
    new_d = split(d, "\r\n")    // split data (d) by row
ECL:
    cs := RECORD
        STRING20 Name;
        STRING2 State;
        INTEGER3 Age;
    END;
    d := DATASET('~hpcc::cdata_2010', cs, THOR);
Use this schema (cs) on the file to give structure to the data.
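To make the "schema gives structure to data" idea concrete on the general-programming side, here is a minimal Python sketch. It assumes the example rows are space-separated text; the `Customer` class, the `parse_rows` helper, and the `SAMPLE` string are hypothetical names introduced for illustration and are not part of the slides or of any HPCC API. The class simply mirrors the three fields of the `cs` record.

```python
from dataclasses import dataclass

# Hypothetical stand-in for the contents of ~/cdata_2010.txt (rows from the slide).
SAMPLE = "Kevin CA 45\r\nMark MI 27\r\nSara FL 64"


@dataclass
class Customer:
    # Mirrors the fields of the ECL record `cs`: Name, State, Age.
    name: str
    state: str
    age: int


def parse_rows(text):
    """Split the text by row, then by column, and apply the schema."""
    customers = []
    for row in text.splitlines():      # split data by row
        if not row.strip():
            continue                   # skip blank lines
        name, state, age = row.split() # split data by column
        customers.append(Customer(name, state, int(age)))
    return customers


customers = parse_rows(SAMPLE)
```

In ECL the record definition does this work declaratively: the field names and types are stated once and every later expression can refer to `Name`, `State`, and `Age` directly.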
Find "Sara": general programming vs ECL
Kevin CA 45
Mark MI 27
Sara FL 64
Column indexes: 0 1 2
General:
    d = fopen('~/cdata_2010.txt')
    new_d = split(d, "\r\n")
    for (x = 0; x < 3; x++) {
        row = new_d[x]
        new_row = split(row, " ")        // split data by column
        if (new_row[0] == 'Sara') {      // filter data
            print "Found Sara"           // output
        }
    }
ECL:
    cs := RECORD
        STRING20 Name;
        STRING2 State;
        INT3 Age;
    END
    d := DATASET('~hpcc::cdata_2010', cs, THOR);
    sara := d(Name = 'Sara');            // filter data
    OUTPUT(sara);                        // output
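The general-programming side of this comparison is pseudocode. A runnable version might look like the following Python sketch. It assumes space-separated rows and writes the sample data to a temporary file that stands in for `~/cdata_2010.txt`; the `find_by_name` helper is a hypothetical name, not something from the slides.

```python
import os
import tempfile


def find_by_name(path, name):
    """Return the rows (as lists of columns) whose first column equals `name`."""
    with open(path) as f:               # open file
        rows = f.read().splitlines()    # split data by row
    matches = []
    for row in rows:
        cols = row.split()              # split data by column
        if cols and cols[0] == name:    # filter data
            matches.append(cols)
    return matches


# Temporary file standing in for ~/cdata_2010.txt (an assumption for this demo).
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("Kevin CA 45\nMark MI 27\nSara FL 64\n")

found = find_by_name(tmp.name, "Sara")
os.remove(tmp.name)
print(found)  # prints [['Sara', 'FL', '64']]
```

Every step here (open, split rows, split columns, compare, collect) is written by hand, which is the contrast the slide draws with the one-line ECL filter.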
Find "Sara" & Older than 50: general programming vs ECL
Kevin CA 45
Mark MI 27
Sara FL 64
Column indexes: 0 1 2
General:
    d = fopen('~/cdata_2010.txt')
    new_d = split(d, "\r\n")
    for (x = 0; x < 3; x++) {
        row = new_d[x]
        new_row = split(row, " ")
        if (new_row[0] == 'Sara' and new_row[2] > 50) {
            print "Found Sara"
        }
    }
ECL:
    cs := RECORD
        STRING20 Name;
        STRING2 State;
        INT3 Age;
    END
    d := DATASET('~hpcc::cdata_2010', cs, THOR);
    sara := d(Name = 'Sara' AND Age > 50);
    OUTPUT(sara);
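As a rough analogue of the compound ECL filter, the Python sketch below applies both conditions in a single expression. It assumes the rows have already been split into columns and that age has been converted to a number; the `rows` list is a hypothetical in-memory copy of the slide data. The conversion matters: after a plain text split the age column is still text, so it has to become an integer before a comparison such as "> 50" is meaningful.

```python
# Rows from the slide, already split into (name, state, age) columns,
# with age converted from text to an integer.
rows = [("Kevin", "CA", 45), ("Mark", "MI", 27), ("Sara", "FL", 64)]

# Rough Python analogue of: sara := d(Name = 'Sara' AND Age > 50);
sara = [r for r in rows if r[0] == "Sara" and r[2] > 50]

# Rough analogue of: OUTPUT(sara);
print(sara)  # prints [('Sara', 'FL', 64)]
```

In the ECL version the record definition already declares Age as a numeric field, so no explicit conversion step appears in the query.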
ECL is EZ
• Make your own functions & libraries in ECL.
• Modularize your code with "Import": reuse old code.
Machine Learning Built-in
http://hpccsystems.com/ml
ECL Plugin for Eclipse IDE
http://hpccsystems.com/products-and-services/products/plugins/eclipse-ide
ECL + Other Languages
ECL is C++ based, so all your C/C++ code can be used in ECL.
You can also use other languages and methods to query.
ECL GUIDE
http://hpccsystems.com/download/docs/ecl-language-reference
JOIN, MERGE, LENGTH, REGEX, ROUND, SUM, COUNT, TRIM, WHEN, AVE, ABS, CASE, DEDUP, NORMALIZE, DENORMALIZE, IF, SORT, GROUP, more ...
Query with Plain SQL
SQL TO ECL
http://www.slideshare.net/FujioTurner/meet-up-sqldemopp
or
http://www.slideshare.net/hpccsystems/jdbc-hpcc
For More HPCC "How To's" Go to
http://www.youtube.com/watch?v=8SV43DCUqJg
Watch how to install HPCC Systems in 5 Minutes
Download HPCC Systems Open Source Community Edition
http://hpccsystems.com/download/
or Source Code
https://github.com/hpcc-systems