Big Data Analytics without Hadoop?
by Dr. Bernhard Sünder, Managing Director, AMS GmbH
20.06.2017 Big Data Analytics without Hadoop? Dr. B. Sünder, AMS GmbH 2
AMS GmbH
• located in Chemnitz (Saxony)
• founded 1993 by Dr. B. Sünder
Since 1998, our vision has been:
using Internet technologies for distributed workflows in measurement data post-processing
Why are we here?
1. We are working with measurement data.
2. The amount of data is growing extremely fast:
   1. The content of data files grows:
      1. >20,000 channels per file
      2. >1 GB per channel, i.e. about 20 TB per file
   2. The number of files grows (EvoBus: 100,000 files per month).
3. Old Microsoft desktop technologies are no solution:
   1. the Windows file system as a database
   2. Windows desktop tools for analysis and reporting
Are Big-Data Technologies a Solution?
Many Big-Data technologies from Silicon Valley are flooding the market:
1. Hadoop MapReduce
2. HDFS: Hadoop Distributed File System
3. Lucene / Elasticsearch: database with indexing technology
4. Parquet: data file format
5. Tableau: analysis and visualization
Plus a lot of derived software
Big Data vs Big Test Data
• Big Data is used mainly for business/office data, e.g.
  – Google data
  – Amazon data
• But is it useful for measurement data?
• What are the differences between the two?
Big Test Data for Measurement Data
• Millions of files, which are naturally sliced: no further Hadoop slicing necessary
• Quantities versus numbers
• Analysis functions that need up to 100% overlap
• Metadata definitions describe the use case
• File formats with 20 years of expertise
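The overlap point can be made concrete with a sliding-window function. The sketch below assumes a plain moving average as the analysis function (an illustrative choice; the slide does not name one), and all function names are hypothetical. When a channel is cut into independent slices, every slice must also carry the last window - 1 samples of its predecessor, or the windows that straddle a cut are lost. A function whose window spans the whole signal would need 100% overlap, i.e. every slice would need all of the data.

```python
def moving_average(samples, window):
    """Moving average over every full window of `window` samples."""
    return [sum(samples[i:i + window]) / window
            for i in range(len(samples) - window + 1)]


def chunked_moving_average(samples, window, chunk_size, with_overlap=True):
    """Compute the moving average slice by slice (hypothetical helper).

    With overlap, each slice is extended backwards by window - 1 samples
    taken from the previous slice, so windows crossing a cut are kept.
    Without overlap, those windows are silently lost.
    """
    overlap = window - 1 if with_overlap else 0
    result = []
    for start in range(0, len(samples), chunk_size):
        lo = max(0, start - overlap)  # borrow samples from the previous slice
        result.extend(moving_average(samples[lo:start + chunk_size], window))
    return result


if __name__ == "__main__":
    signal = [float(i % 7) for i in range(50)]
    full = moving_average(signal, 5)
    print(len(full), len(chunked_moving_average(signal, 5, 8)),
          len(chunked_moving_average(signal, 5, 8, with_overlap=False)))
```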
Hadoop: MapReduce
• Slicing files into independent parts: not needed, we have a file for each test
• Processing each part independently: parallelism is the only way to get performance
• Aggregating individual results into a common result: several aggregation methods are needed
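A minimal sketch of this per-file pattern, assuming each test file has already been read into a non-empty list of samples (an in-memory dict stands in for the files; all names are hypothetical). Each file is analyzed on its own, the "map" step, and the per-file results are then combined with several different aggregation methods, the "reduce" step: min and max aggregate directly, while the mean must be weighted by each file's sample count. Threads keep the sketch self-contained; a cluster would distribute the same per-file calls across machines.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class FileResult:
    """Partial result computed independently for one test file."""
    name: str
    count: int
    total: float
    minimum: float
    maximum: float


def analyze_file(name, samples):
    # "Map": each file is a natural slice, so it is processed on its own.
    return FileResult(name, len(samples), sum(samples), min(samples), max(samples))


def aggregate(results):
    # "Reduce": different quantities need different aggregation methods.
    count = sum(r.count for r in results)
    return {
        "files": len(results),
        "min": min(r.minimum for r in results),
        "max": max(r.maximum for r in results),
        "mean": sum(r.total for r in results) / count,  # weighted by sample count
        "worst_file": max(results, key=lambda r: r.maximum).name,
    }


def run(files, workers=4):
    """Analyze all files in parallel, then aggregate the per-file results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda item: analyze_file(*item), files.items()))
    return aggregate(results)


if __name__ == "__main__":
    print(run({"test_001.mf4": [1.0, 2.0, 3.0], "test_002.mf4": [4.0, 10.0]}))
```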
HDFS: Hadoop Distributed File System
• 2016: thanks to replication, only cheap storage is needed
  – comparison of IT-managed storage with Saturn HD
• 2017: IT-managed HDFS storage at the same price
  – statements change very quickly
• Replication of data (factor of 3)
  – needs three times the storage capacity
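As a worked example, take the 20 TB figure from the introduction as the payload and HDFS's default replication factor of 3 (the `dfs.replication` setting). The raw capacity C_raw needed for a payload C_data with replication factor r is then:

```latex
C_{\text{raw}} = r \cdot C_{\text{data}} = 3 \times 20\,\text{TB} = 60\,\text{TB},
\qquad
C_{\text{raw}} - C_{\text{data}} = (r - 1)\,C_{\text{data}} = 40\,\text{TB}
```

So two thirds of the raw capacity hold additional copies rather than new data.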
Indexing Database: Lucene or Elasticsearch
• Lucene: a big step, but limited index space
  – great advantages compared with relational databases (RDBs)
• Elasticsearch: distributed Lucene, no limit
  – 95% technology
• Full-text search: intuitive fuzzy search
• Faceted search: select from a given list
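To illustrate the two search styles, the sketch below builds Elasticsearch request bodies as Python dicts; they would be sent to an index's `_search` endpoint. The `match` query with `fuzziness`, the `terms` aggregation and the `bool`/`filter`/`term` combination are standard Elasticsearch Query DSL, but the field names (`description`, `vehicle`) and the example values are hypothetical and do not describe MaDaM's actual schema.

```python
import json

# Hypothetical field names for a test-metadata index (not MaDaM's schema):
#   "description" - analyzed text field used for full-text search
#   "vehicle"     - keyword field used as a facet


def fulltext_query(text):
    """Full-text search; fuzziness "AUTO" tolerates small typos."""
    return {"query": {"match": {"description": {"query": text, "fuzziness": "AUTO"}}}}


def facet_values_query(field, max_values=20):
    """Faceted search, step 1: fetch the list of values (with counts) to select from."""
    return {
        "size": 0,  # no hits needed, only the aggregation buckets
        "aggs": {"facet": {"terms": {"field": field, "size": max_values}}},
    }


def facet_filter_query(field, value):
    """Faceted search, step 2: return tests whose field exactly matches the selection."""
    return {"query": {"bool": {"filter": [{"term": {field: value}}]}}}


if __name__ == "__main__":
    for body in (fulltext_query("brake tset"),
                 facet_values_query("vehicle"),
                 facet_filter_query("vehicle", "bus_A")):
        print(json.dumps(body))
```

The fuzzy query finds "brake test" despite the typo, while the faceted pair first offers the existing values and then filters exactly on the one the user picked.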
Parquet: Data File Format
• The Parquet introduction on the Apache website: “We invented the column based storage”
• Alternative facts!
• In the measurement data world, column-based storage has existed for 20 years (ATF(X), DAT, …)
• We see no relevance in such a new file format
• Just use the existing data file formats, even if you use HDFS
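Why column-based storage suits measurement data can be shown with a toy binary layout. The sketch below is a generic illustration, not the ATFX or DAT format: it stores float64 samples either channel after channel (columnar) or record after record (row-based), with hypothetical channel names. Reading one channel from the columnar layout is a single contiguous slice, whereas the row-based layout must be scanned record by record.

```python
import array


def write_columnar(channels, names):
    """Store all samples of one channel contiguously, channel after channel (float64)."""
    buf = array.array("d")
    for name in names:
        buf.extend(channels[name])
    return buf.tobytes()


def write_rowwise(channels, names):
    """Store one record (one sample of every channel) after the other."""
    buf = array.array("d")
    for i in range(len(channels[names[0]])):
        buf.extend(channels[name][i] for name in names)
    return buf.tobytes()


def read_channel_columnar(blob, index, n_samples):
    """One contiguous slice: bytes of other channels are never touched."""
    item = array.array("d").itemsize
    start = index * n_samples * item
    out = array.array("d")
    out.frombytes(blob[start:start + n_samples * item])
    return out.tolist()


def read_channel_rowwise(blob, index, n_channels):
    """Every record must be visited to pick out one value per record."""
    values = array.array("d")
    values.frombytes(blob)
    return values[index::n_channels].tolist()


if __name__ == "__main__":
    names = ["speed", "rpm", "torque"]
    data = {"speed": [10.0, 11.0, 12.0, 13.0],
            "rpm": [800.0, 850.0, 900.0, 950.0],
            "torque": [50.0, 52.0, 54.0, 56.0]}
    print(read_channel_columnar(write_columnar(data, names), 1, 4))
```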
Tableau: Analysis and Visualization
• Tableau has a fantastically nice user interface
• There is only one feature: pivot analysis
• Ideal for row-based sales data (see Excel)
• But for measurement data, it is only one out of a hundred calculations
• A good analysis tool has pivot analysis available, too
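Pivot analysis on row-based data boils down to grouping records by two keys and aggregating one value per cell. A minimal pure-Python sketch, assuming records are dicts; the function name and the sales-style field names are hypothetical, and `sum` is just the default aggregation.

```python
from collections import defaultdict


def pivot(records, row_key, col_key, value_key, agg=sum):
    """Pivot row-based records: one output row per row_key value,
    one column per col_key value, cells aggregated with `agg`.
    Cells without any record are None."""
    cells = defaultdict(list)
    for rec in records:
        cells[(rec[row_key], rec[col_key])].append(rec[value_key])
    rows = sorted({r for r, _ in cells})
    cols = sorted({c for _, c in cells})
    return {
        r: {c: agg(cells[(r, c)]) if (r, c) in cells else None for c in cols}
        for r in rows
    }


if __name__ == "__main__":
    sales = [
        {"region": "North", "quarter": "Q1", "revenue": 100},
        {"region": "North", "quarter": "Q1", "revenue": 50},
        {"region": "South", "quarter": "Q2", "revenue": 70},
    ]
    for region, row in pivot(sales, "region", "quarter", "revenue").items():
        print(region, row)
```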
Conclusion on Big-Data Technologies

Technology                      | Big-Data | Big-Test-Data
Hadoop MapReduce                | ++       | -
Hadoop Distributed File System  | ++       | + (?)
Parquet                         | +        | -
Tableau                         | +        | -
Elasticsearch / Lucene          | ++       | ++
Measurement Data Management
[Figure: Measurement data management architecture. Components: test and simulation data (ATFX, MDF4, …); MaDaM Importer; MaDaM server with Elasticsearch database; jBEAM server web service in pure HTML-5 with optimized traffic, for search & find and standard reports; jBEAM desktop client and iBEAM client (iPad) for interactive analysis; client platforms Windows, Linux, Mac, iOS, Android.]
MaDaM and jBEAM: the partners for Big-Data technologies
• All the Hadoop technologies are available as Java libraries
• MaDaM and jBEAM are both Java tools
• So using Big-Data technologies is easy for us
• Today we can show you how to read MDF files from HDFS
Parallelization
[Figure: Three levels of parallelization. Level 3: multiple data lakes with multiple MDMs (Shanghai, Stuttgart, Detroit). Level 2: one data lake with multiple jBEAMs (user, jBEAM client, jBEAM server, N x jBEAM cluster). Level 1: multiple threads within one jBEAM calculation: a data channel is split across calculation threads (CalcT a, b, c), and their results are aggregated into one result.]
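Level 1 can be sketched as a split/aggregate over one channel. The sketch assumes the calculation is a mean and RMS, which can be aggregated exactly from per-chunk counts, sums and sums of squares; the function names are hypothetical, and the example input reuses the nine channel values shown in the slide's Level 1 illustration (6.37 … 16.60), since the slide does not state which calculation produced its partial results. In CPython, pure-Python threads show the structure but do not speed up this arithmetic because of the GIL; native code or separate processes would be needed for that.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def split(channel, n_parts):
    """Split a channel into at most n_parts contiguous chunks."""
    if not channel:
        raise ValueError("channel must not be empty")
    size = math.ceil(len(channel) / n_parts)
    return [channel[i:i + size] for i in range(0, len(channel), size)]


def partial_sums(chunk):
    """Per-thread work: count, sum and sum of squares of one chunk."""
    return len(chunk), math.fsum(chunk), math.fsum(x * x for x in chunk)


def mean_and_rms(channel, n_threads=3):
    """Split the channel, compute partial sums in threads, aggregate the result."""
    chunks = split(channel, n_threads)
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        partials = list(pool.map(partial_sums, chunks))
    n = sum(p[0] for p in partials)
    total = sum(p[1] for p in partials)
    squares = sum(p[2] for p in partials)
    return total / n, math.sqrt(squares / n)


if __name__ == "__main__":
    channel = [6.37, 7.22, 8.23, 11.67, 12.54, 13.83, 14.89, 15.41, 16.60]
    print(mean_and_rms(channel))
```

Only three numbers per chunk travel back to the aggregation step, which is why this kind of statistic splits cleanly; a median, by contrast, could not be aggregated from such partial sums.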
jBEAM-Cluster – Parallel Processing
Aggregated reports: jBEAM-generated PDF files
N jBEAMs run in a cluster and analyze the data file by file. The node results are aggregated into a common result.
Multi File Operation Mode Analysis (I)
Multi File Operation Mode Analysis (II)
Multiple-MaDaM Solution
[Figure: MaDaM sites in Germany, the USA and China, connected over long distances. Each site has a MaDaM server with a Lucene database, a jBEAM server (HTML-5), a MaDaM Importer, a file system, and web-browser and jBEAM clients.]
Relative IP traffic:
• 1e-6 = 0.0001%: only meta data exchange
• 1e-3 = 0.1%: EnCom-minimized traffic (marked *) in the figure: optimized traffic)
• 1e0 = 100%: complete file upload
• Search for tests & preview: modern interactive web interface, accessible from any browser
• Standardized reports: PDF files generated by the server-side jBEAM, viewable in any PDF reader
• Interactive analysis: jBEAM with Java Web Start, running on the client desktop
• Import of tests: MaDaM Importer with Java Web Start, running on the client desktop
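To put the relative traffic levels of the Multiple-MaDaM solution into absolute numbers, the sketch below converts them for a single file. The 1 GB file size is an assumed example, the level names are taken from the slide's legend, and the ratios are the slide's order-of-magnitude figures rather than measured values. Divisors are used instead of 1e-6 and 1e-3 so that the integer arithmetic stays exact.

```python
# Relative traffic levels from the Multiple-MaDaM slide, expressed as divisors
# (1e-6 -> 1/1,000,000, 1e-3 -> 1/1,000, 1e0 -> 1/1).
TRAFFIC_DIVISORS = {
    "meta data only": 1_000_000,   # 1e-6 = 0.0001 %
    "EnCom-minimized": 1_000,      # 1e-3 = 0.1 %
    "complete file upload": 1,     # 1e0  = 100 %
}


def traffic_bytes(file_size_bytes):
    """Approximate bytes sent over the long-distance link for one file, per level."""
    return {level: file_size_bytes // d for level, d in TRAFFIC_DIVISORS.items()}


if __name__ == "__main__":
    for level, size in traffic_bytes(1_000_000_000).items():  # assumed 1 GB file
        print(f"{level:>22}: {size:,} bytes")
```

For a 1 GB file this means roughly 1 kB of metadata, 1 MB of EnCom-minimized traffic, or the full 1 GB for a complete upload.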
Conclusion
• Use the right technologies from Big Data
• Combine them with sophisticated technologies from the measurement world
• And you get the right solution for Big Test Data
Come to our booth #1822
• The brand new MaDaM2
– Elasticsearch
– Easy & new user-interface
• Hadoop (HDFS)
– ASAM-MDF file import
• jBEAM-Cluster
– 6 cheap PCs working in parallel
… and if you are lucky
Win a local flight around Stuttgart with our business plane.
We will take off directly after the exhibition closes, from Stuttgart Airport.
Gesellschaft für angewandte Mess- und Systemtechnik mbH
Bahnhofstraße 6
09111 Chemnitz
Germany
Tel.: +49 (371) 918 668-0
Fax: +49 (371) 918 668-99
Web: www.AMSonline.de

North America Inc.
1760 Opdyke Court
Auburn Hills, MI 48326
USA
Tel.: +1 (248) 270-7779
Fax: +1 (248) 393-0340
Web: www.AMSonline.eu

Liaison Office Shanghai
German Centre, Unit 719A
88 Keyuan Road, Pudong
Shanghai 201203 / PR China
Tel.: +86 (21) 289 866 19
Fax: +86 (21) 289 865 11
Web: www.AMSonline.cn