node-crate: NodeJS and big data
![Page 1: node-crate: NodeJS and big data](https://reader035.fdocuments.in/reader035/viewer/2022081413/547e4434b4af9fef158b55e2/html5/thumbnails/1.jpg)
node-crate: NodeJS & big data
![Page 2: node-crate: NodeJS and big data](https://reader035.fdocuments.in/reader035/viewer/2022081413/547e4434b4af9fef158b55e2/html5/thumbnails/2.jpg)
The path is the goal
![Page 3: node-crate: NodeJS and big data](https://reader035.fdocuments.in/reader035/viewer/2022081413/547e4434b4af9fef158b55e2/html5/thumbnails/3.jpg)
about me
• Stefan Thies
• Consulting, product evaluations & outsourcing with a partner company in Saarbrücken, Germany.
• follow me on twitter: seti321
• megastef (github, npmjs.org)
![Page 4: node-crate: NodeJS and big data](https://reader035.fdocuments.in/reader035/viewer/2022081413/547e4434b4af9fef158b55e2/html5/thumbnails/4.jpg)
how to get here?
• 2012/2013 Systems with Elasticsearch &
• Mobile Apps (Geo) with Appcelerator Titanium
• Data enrichment & Webcrawlers (whois, geo, appstores)
• Distributed Regex-Processing for Cyber Security with 0MQ
• Security Layer around Elasticsearch (sails.js)
• … we do almost everything in NodeJS
![Page 5: node-crate: NodeJS and big data](https://reader035.fdocuments.in/reader035/viewer/2022081413/547e4434b4af9fef158b55e2/html5/thumbnails/5.jpg)
Product evaluations
[Bar chart, scale 0 to 90: document-oriented data stores (MarkLogic, MongoDB, Elasticsearch, CouchDB, CRATE). Points for the product evaluation criteria of the specific project (RT, scalability, replication, features and commercial)]
![Page 6: node-crate: NodeJS and big data](https://reader035.fdocuments.in/reader035/viewer/2022081413/547e4434b4af9fef158b55e2/html5/thumbnails/6.jpg)
Design criteria
• Scalable & lean architecture
• Operations: NO zoo of 3rd-party components
• We chose Elasticsearch
• Automatic installation -> Docker containers
• One Language: JavaScript / NodeJS
[Architecture diagram labels: Search, Load Balancer, Master Nodes, Data Nodes, Collector (index query), Web UI (search query); arrows labelled indexing and search]
![Page 7: node-crate: NodeJS and big data](https://reader035.fdocuments.in/reader035/viewer/2022081413/547e4434b4af9fef158b55e2/html5/thumbnails/7.jpg)
Security, Admin, UI (AngularJS)
• Policies, Users, Roles
• REST API
• Websockets / RT
![Page 8: node-crate: NodeJS and big data](https://reader035.fdocuments.in/reader035/viewer/2022081413/547e4434b4af9fef158b55e2/html5/thumbnails/8.jpg)
data enrichment
• Hey, we got Elasticsearch - lookup queries for 'static' data sources will be fast!
• Distributed processing based on 0MQ (pull/push) - high throughput, parallel processing, distributed worker processes
[Diagram labels: collection; information extraction and processing; data lookups; Elasticsearch]
![Page 9: node-crate: NodeJS and big data](https://reader035.fdocuments.in/reader035/viewer/2022081413/547e4434b4af9fef158b55e2/html5/thumbnails/9.jpg)
any problem?
[Diagram labels: collect mass data; Elasticsearch; Analyze & Visualize; other data sources (Geo, Company data, Open Source Information); massive updates!; processing queue / workers; Reporting (PDF); Accurate Counts (Facets) -> Aggregation]
![Page 10: node-crate: NodeJS and big data](https://reader035.fdocuments.in/reader035/viewer/2022081413/547e4434b4af9fef158b55e2/html5/thumbnails/10.jpg)
Operational issues
alternative: 'any' DB (for updates) + ES
• Compatibility, maintenance and monitoring of all components become a big mess: each box can be multiple machines, the River might not be updated for the latest DB or ES version, and a bug might force you to upgrade one of the components, which is where the dependency trouble starts …
• Reporting: custom programming of DSL queries and rendering HTML to PDF with PhantomJS is painful if you know the standard report generators from the SQL world. How do you tell the customer to adapt a report to their needs? Using a 'standard' DB (SQL or NoSQL) supported by the reporting tools would solve it.
[Diagram labels: DB V x.x; Data-Processing Services; DB-River V y.y; Elasticsearch V z.z; Search & Analytics V b.b]
![Page 11: node-crate: NodeJS and big data](https://reader035.fdocuments.in/reader035/viewer/2022081413/547e4434b4af9fef158b55e2/html5/thumbnails/11.jpg)
Don’t panic
![Page 12: node-crate: NodeJS and big data](https://reader035.fdocuments.in/reader035/viewer/2022081413/547e4434b4af9fef158b55e2/html5/thumbnails/12.jpg)
A match at Slideshare!
• An early presentation of CRATE from Jodok got my attention
![Page 13: node-crate: NodeJS and big data](https://reader035.fdocuments.in/reader035/viewer/2022081413/547e4434b4af9fef158b55e2/html5/thumbnails/13.jpg)
• The Mountain Hackathon
birthday of node-crate
![Page 14: node-crate: NodeJS and big data](https://reader035.fdocuments.in/reader035/viewer/2022081413/547e4434b4af9fef158b55e2/html5/thumbnails/14.jpg)
Package status
• Igor Likhamanov
• Stefan Thies
• Martin Heidegger joined recently and made high-quality, professional improvements!
![Page 15: node-crate: NodeJS and big data](https://reader035.fdocuments.in/reader035/viewer/2022081413/547e4434b4af9fef158b55e2/html5/thumbnails/15.jpg)
DevOps: Stack-Shrinking
• From 3 down to 1 storage service:
• Before: DB V x.x + DB-River V y.y + Elasticsearch V z.z, used by Data-Processing and Search & Analytics V b.b
• After: Crate V a.a, used by Data-Processing and Search & Analytics V b.b
![Page 16: node-crate: NodeJS and big data](https://reader035.fdocuments.in/reader035/viewer/2022081413/547e4434b4af9fef158b55e2/html5/thumbnails/16.jpg)
Data Enrichment performance
• Elasticsearch has no "update by query"
• If we need to update e.g. 50,000 records, it means running a query to identify the relevant records and then sending 50,000 HTTP requests for the updates, or building a large bulk update with 50,000 instructions
• In Crate
• update something set somefield = 'newvalue' where somethingelse = 'othervalue'
• One command, not 50,000 roundtrips …
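The single-statement update can be sketched against Crate's HTTP SQL endpoint (`/_sql`), which takes a JSON body with `stmt` and `args`. This is a minimal sketch, not the node-crate API: the helper names `buildSqlRequest` and `executeSql`, the host and port, and the `country` and `host` columns are assumptions made for illustration.

```javascript
const http = require('http');

// Build the JSON body that Crate's /_sql endpoint expects.
function buildSqlRequest(stmt, args) {
  return JSON.stringify({ stmt: stmt, args: args || [] });
}

// Send one statement to a Crate node (host and port are assumptions).
function executeSql(stmt, args, callback) {
  const body = buildSqlRequest(stmt, args);
  const req = http.request({
    host: 'localhost',
    port: 4200,
    path: '/_sql',
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'Content-Length': Buffer.byteLength(body)
    }
  }, function (res) {
    let data = '';
    res.on('data', function (chunk) { data += chunk; });
    res.on('end', function () {
      try {
        callback(null, JSON.parse(data));
      } catch (err) {
        callback(err);
      }
    });
  });
  req.on('error', callback);
  req.end(body);
}

// One statement enriches every matching record (hypothetical columns).
const enrichStmt = 'UPDATE web_log SET country = ? WHERE host = ?';
const enrichBody = buildSqlRequest(enrichStmt, ['DE', 'example.org']);
```

Calling `executeSql(enrichStmt, ['DE', 'example.org'], callback)` would send that one request to a running node; the matching records are then updated inside the cluster instead of through one HTTP roundtrip per record.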
![Page 17: node-crate: NodeJS and big data](https://reader035.fdocuments.in/reader035/viewer/2022081413/547e4434b4af9fef158b55e2/html5/thumbnails/17.jpg)
Data Enrichment - performance
[Diagram labels: collect mass data; CRATE data store; Analyze & Visualize; other data sources (Geo, …, Open Source Information); massive updates, no issue :); processing queue / workers; Reporting (PDF) using CRATE JDBC]
![Page 18: node-crate: NodeJS and big data](https://reader035.fdocuments.in/reader035/viewer/2022081413/547e4434b4af9fef158b55e2/html5/thumbnails/18.jpg)
BLOBs (Images, videos, packet data, …)
• Traditionally
• Meta-data in the DB + files in some filesystem / separate object storage
• The two behave differently when scaling
• Crate stores BLOBs like other shards, including replicas
• More nodes mean more capacity, more replicas etc.
• BLOB storage scales with the data store
• Would be perfect for a 'dropbox'-like service :)
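As a sketch of how BLOB storage is addressed: Crate keeps blobs in dedicated blob tables and identifies each blob by the SHA-1 digest of its content, uploaded with PUT and fetched with GET under `/_blobs/<table>/<digest>`. The table name `myblobs`, the shard and replica counts, and the helper names below are assumptions for illustration.

```javascript
const crypto = require('crypto');

// Crate addresses each blob by the SHA-1 digest of its content.
function blobDigest(buffer) {
  return crypto.createHash('sha1').update(buffer).digest('hex');
}

// Path used to upload (PUT) or fetch (GET) a blob over HTTP.
function blobPath(table, buffer) {
  return '/_blobs/' + table + '/' + blobDigest(buffer);
}

// Blob tables are sharded and replicated like regular tables
// (hypothetical table name and settings).
const createBlobTable =
  'CREATE BLOB TABLE myblobs CLUSTERED INTO 3 SHARDS WITH (number_of_replicas = 1)';

const examplePath = blobPath('myblobs', Buffer.from('abc'));
```

Because the digest is derived from the content, uploading the same file twice resolves to the same path.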
![Page 19: node-crate: NodeJS and big data](https://reader035.fdocuments.in/reader035/viewer/2022081413/547e4434b4af9fef158b55e2/html5/thumbnails/19.jpg)
Demo: Installation, usage, examples walk through …
• https://www.youtube.com/watch?v=ZaDFrd4ZwQk (demo crate.io)
• https://github.com/megastef/node-crate (node-crate on github)
• http://techblog.bigdata-analyst.de (sample applications)
• https://crate.io/docs/stable/ (documentation of crate)
![Page 20: node-crate: NodeJS and big data](https://reader035.fdocuments.in/reader035/viewer/2022081413/547e4434b4af9fef158b55e2/html5/thumbnails/20.jpg)
Simple Example
![Page 21: node-crate: NodeJS and big data](https://reader035.fdocuments.in/reader035/viewer/2022081413/547e4434b4af9fef158b55e2/html5/thumbnails/21.jpg)
Import Data (bulk insert)
COPY web_log FROM '/var/logs/web_log.json' WITH (bulk_size=15000, concurrency=2)
![Page 22: node-crate: NodeJS and big data](https://reader035.fdocuments.in/reader035/viewer/2022081413/547e4434b4af9fef158b55e2/html5/thumbnails/22.jpg)
create table web_log (ts timestamp, host string, …);
Special data types for
• IP
• Geo Shapes
• Objects (dynamic)
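A table definition that uses the special data types might be assembled as below. Only `ts` and `host` come from the slide; the columns `client_ip`, `area` and `details`, and the helper `buildCreateTable`, are hypothetical, and the type names (`ip`, `geo_shape`, `object(dynamic)`) assume a Crate version that supports them.

```javascript
// Hypothetical column set; only ts and host appear in the original statement.
const webLogColumns = {
  ts: 'timestamp',
  host: 'string',
  client_ip: 'ip',
  area: 'geo_shape',
  details: 'object(dynamic)'
};

// Assemble a CREATE TABLE statement from a column-name -> type map.
function buildCreateTable(table, columns) {
  const defs = Object.keys(columns).map(function (name) {
    return name + ' ' + columns[name];
  });
  return 'CREATE TABLE ' + table + ' (' + defs.join(', ') + ')';
}

const createWebLog = buildCreateTable('web_log', webLogColumns);
```

A dynamic object column accepts new keys at insert time, which suits enrichment data whose fields are not known in advance.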
![Page 23: node-crate: NodeJS and big data](https://reader035.fdocuments.in/reader035/viewer/2022081413/547e4434b4af9fef158b55e2/html5/thumbnails/23.jpg)
insert into web_log (ts, useragent, ..) values (132323, 'Safari', …)
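When many rows arrive from application code rather than from a file, the insert can be parameterized and sent as one bulk request. The sketch below builds such a request body, assuming a Crate version whose `/_sql` endpoint accepts `bulk_args`; the helper `buildBulkInsert` and the second sample row are made up for illustration.

```javascript
// Turn an array of row objects into one bulk request body for /_sql.
function buildBulkInsert(table, columns, rows) {
  const placeholders = columns.map(function () { return '?'; }).join(', ');
  const stmt = 'INSERT INTO ' + table + ' (' + columns.join(', ') +
    ') VALUES (' + placeholders + ')';
  // One argument list per row; missing values become null.
  const bulkArgs = rows.map(function (row) {
    return columns.map(function (col) {
      return row[col] === undefined ? null : row[col];
    });
  });
  return { stmt: stmt, bulk_args: bulkArgs };
}

const bulkRequest = buildBulkInsert('web_log', ['ts', 'useragent'], [
  { ts: 132323, useragent: 'Safari' },
  { ts: 132324, useragent: 'Firefox' }
]);
```

Placeholders keep values out of the SQL string, so user agents containing quotes do not break the statement.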
![Page 24: node-crate: NodeJS and big data](https://reader035.fdocuments.in/reader035/viewer/2022081413/547e4434b4af9fef158b55e2/html5/thumbnails/24.jpg)
select
update
![Page 25: node-crate: NodeJS and big data](https://reader035.fdocuments.in/reader035/viewer/2022081413/547e4434b4af9fef158b55e2/html5/thumbnails/25.jpg)
Anything missing?
• "Kibana"
• see my blog for how to add it ('officially' not supported)
• "Marvel" as a detailed performance monitoring solution for Elasticsearch
• SPM from
![Page 26: node-crate: NodeJS and big data](https://reader035.fdocuments.in/reader035/viewer/2022081413/547e4434b4af9fef158b55e2/html5/thumbnails/26.jpg)
Using Kibana with Crate
![Page 27: node-crate: NodeJS and big data](https://reader035.fdocuments.in/reader035/viewer/2022081413/547e4434b4af9fef158b55e2/html5/thumbnails/27.jpg)
Monitoring - Sematext SPM supported Applications
* proof of concept, official release coming soon from Sematext
![Page 28: node-crate: NodeJS and big data](https://reader035.fdocuments.in/reader035/viewer/2022081413/547e4434b4af9fef158b55e2/html5/thumbnails/28.jpg)
SPM Monitoring
![Page 29: node-crate: NodeJS and big data](https://reader035.fdocuments.in/reader035/viewer/2022081413/547e4434b4af9fef158b55e2/html5/thumbnails/29.jpg)
Our NodeJS Modules
• node-crate - DB driver for Crate for NodeJS - help with a 'Waterline/sails.js' ORM adapter appreciated! We are open to other suggestions; we like the sails.js Websocket capability and security features (policies) and would get those 'for free'
• winston-crate - logger transport layer for Crate using node-crate
• bro-ids - an interface to the BRO intrusion detection system (IP Monitoring)
• node-spm - Performance monitoring and Metrics API for http://www.sematext.com adapted for NodeJS - custom metrics, events, logging
• SPM - performance metrics, events, SOLR, Elasticsearch, Hadoop, Nginx, …
• using it with Crate might be one of my next blog posts.
• Logsene - centralized logs in the cloud or on premise - we might provide winston-logsene
![Page 30: node-crate: NodeJS and big data](https://reader035.fdocuments.in/reader035/viewer/2022081413/547e4434b4af9fef158b55e2/html5/thumbnails/30.jpg)
Thank you for your attention.