Alter Way Big Data Seminar - Elasticsearch - October 2014

Post on 01-Dec-2014




Agenda & Speakers

Introduction

Alter Way in 2 slides


Elasticsearch in 1 slide

• More than 11 million downloads

• 650,000 New Downloads per Month

• 1000s of Mission Critical Implementations

• Top Investors: Benchmark Capital, Index Ventures

• Seasoned Executive Team

– Founded by Creator of Elasticsearch

– Seasoned Executives from SpringSource

The challenges of search in the Big Data era

Big Data in Today's Business and Technology Environment: some significant figures

• 2.7 zettabytes of data exist in the digital universe today (1 zettabyte = 1 billion terabytes).

• 235 terabytes of data had been collected by the U.S. Library of Congress as of April 2011.

• Facebook stores, accesses, and analyzes 30+ petabytes of user-generated data.

• Akamai analyzes 75 million events per day to better target advertisements.

• Walmart handles more than 1 million customer transactions every hour, which are imported into databases estimated to contain more than 2.5 petabytes of data.

• The largest AT&T database holds two titles: the largest volume of data in a single database (312 terabytes) and the second-largest number of rows in a single database (1.9 trillion). It comprises AT&T's extensive calling records.

• Hadoop:

– 94% of Hadoop users perform analytics on large volumes of data that were not possible before;

– 88% analyze data in greater detail;

– 82% can now retain more of their data.

The Rapid Growth of Unstructured Data

• YouTube users upload 48 hours of new video every minute of the day.

• 500+ new websites are created every minute of the day.

• Brands and organizations on Facebook receive 34,722 Likes every minute of the day.

• 100 terabytes of data are uploaded daily to Facebook.

• According to Twitter's own research in early 2012, it sees roughly 175 million tweets every day and has more than 465 million accounts.

• 30 Billion pieces of content shared on Facebook every month.

Data production will be 44 times greater in 2020 than it was in 2009.

Big Data & Real Business Issues

• More than 25% of decision-makers surveyed predict that data volumes in their companies will rise by more than 60% by the end of 2014, with the average of all respondents anticipating growth of no less than 42%.

• 40% projected growth in global data generated per year vs. 5% growth in global IT spending.

• According to estimates, the volume of business data worldwide, across all companies, doubles every 1.2 years.

– Poor data can cost businesses 20%–35% of their operating revenue.

– Bad data or poor data quality costs US businesses $600 billion annually.

• More than 75% of decision-makers surveyed anticipate significant impacts on storage systems as a result of the “Big Data” phenomenon.

• We anticipate a new challenge: being able to search and analyze all that data … in real time!

Elasticsearch

A solution already in production with significant French implementations

Revolutionizing Data Search and Analytics

Richard Maurer, SEMEA Territory Manager

Purpose of Elasticsearch

• Organize data and make it easily accessible

– Through powerful search and analytics

– Easily consumable (even for non-data scientists)

– Elegantly handles extremely large data volumes

– Delivers results in real time

• Technology stack agnostic

• Used across all market verticals

Features of Elasticsearch

• Structured & unstructured search

• Advanced analytics capabilities

• Unmatched performance

• Real-time results

• Highly scalable

• User friendly installation and maintenance

Elasticsearch 1.4: a production-ready solution

• Real-time data indexing

• Distributed

• High Availability

• Schema Free

• Real Time Data Analytics

• Multi Tenancy

• Much more….

Unprecedented Uptake

Elasticsearch has more than 11 million downloads

… and 650,000 more each month

[Chart: cumulative downloads]

French Users

French Use Cases

Bouygues Telecom:

Uses Elasticsearch in their Big Data platform. Cut their web resolution time by a factor of 10.

Dailymotion:

Indexing their 20 million videos on Elasticsearch. In production for over 2 years.

Voyages SNCF:

They recently announced that ES is live on their “Usine Logicielle” (software factory).

Fotolia:

Search engine built on Elasticsearch to access 24 million images; moved over to ES.

Orange:

With over 1.2 billion docs, looking for a better solution and cost reduction.

Product Offerings: Support Throughout Your Project

1. Core Elasticsearch Training (2 days)

2. ELK Workshop (1 day)

3. Development and Production Support

4. Marvel: monitoring of your ES clusters

2: Support

Resources

• www.elasticsearch.com

• www.elasticsearch.org

• User Groups:

http://www.elasticsearch.org/community/forum/

• Contact:

Richard Maurer

Territory Manager

Richard.maurer@elasticsearch.com

MAKE SENSE OF YOUR (BIG) DATA!

David Pilato, Technical Advocate, elasticsearch. @dadoonet

StartUp

data ?


BIG data ?


Source: http://www.csc.com/insights/flxwd/78931-big_data_just_beginning_to_explode

35,000,000,000,000,000 MB


Source: http://www.domo.com/learn/data-never-sleeps-2


search = like % ?

SELECT doc.*, country.*
FROM doc, country
WHERE doc.country_code = country.code
  AND doc.date_doc > to_date('2011-12', 'yyyy-mm')
  AND doc.date_doc < to_date('2012-01', 'yyyy-mm')
  AND lower(country.name) = 'france'
  AND lower(doc.comment) LIKE '%product%'
  AND lower(doc.comment) LIKE '%david%';


Search engine ?
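A minimal Python sketch of the idea behind a search engine, for contrast with the LIKE query above: instead of scanning every comment for a substring, each term is looked up in an inverted index, the kind of structure Lucene is built on. The function names and sample comments are hypothetical, and the whitespace tokenizer is a deliberate simplification. Unlike LIKE '%product%', this sketch matches whole words only.

```python
from collections import defaultdict


def build_index(docs):
    """Map each lowercased term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(index, *terms):
    """Return the ids of documents that contain every given term."""
    postings = [index.get(term.lower(), set()) for term in terms]
    return set.intersection(*postings) if postings else set()


# Hypothetical comments standing in for the doc.comment column.
comments = {
    1: "Great product says David",
    2: "David left no comment",
    3: "Product recall in France",
}

index = build_index(comments)
hits = search(index, "product", "david")  # only document 1 has both terms
```

The index is built once at write time, so each query costs a few set lookups rather than a pass over every row.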


elasticsearch ?

plug & play

REST/JSON

scalable

Apache 2 license

Lucene

elasticsearch

Start…

$ wget https://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-1.1.1.tar.gz
$ tar -xf elasticsearch-1.1.1.tar.gz
$ ./elasticsearch-1.1.1/bin/elasticsearch
[INFO ][node ][Ghost Maker] {1.1.1}[5645]: initializing

… and play

$ curl -XPUT localhost:9200/sessions/session/1 -d '{
  "title" : "Elasticsearch",
  "subtitle" : "Make sense of your (BIG) data !",
  "date" : "2014-05-20T10:30:00",
  "tags" : [ "elasticsearch", "alterway", "bigdata" ],
  "speakers" : [{
    "first_name" : "David",
    "last_name" : "Pilato"
  }]
}'

Search

$ curl http://localhost:9200/sessions/session/_search -d '{
  "query": {
    "multi_match": {
      "query": "elasticsearch alterway david",
      "fields": [ "title^3", "tags^2", "speakers.first_name" ]
    }
  },
  "post_filter": {
    "range": {
      "date": { "from": "2014-05-01", "to": "2014-06" }
    }
  }
}'
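To illustrate what the ^3 and ^2 suffixes in the "fields" list mean, here is a toy Python sketch in which a match in a boosted field counts for more. It is not how Elasticsearch or Lucene actually computes relevance: it simply sums boost-weighted term matches, and the helper names (parse_field, values_at, toy_score) are hypothetical.

```python
def parse_field(spec):
    """Split 'title^3' into ('title', 3.0); a bare field name gets boost 1.0."""
    name, _, boost = spec.partition("^")
    return (name, float(boost) if boost else 1.0)


def values_at(doc, path):
    """Collect the values found at a dotted path, descending into lists."""
    nodes = [doc]
    for key in path.split("."):
        found = []
        for node in nodes:
            candidates = node if isinstance(node, list) else [node]
            for item in candidates:
                if isinstance(item, dict) and key in item:
                    found.append(item[key])
        nodes = found
    values = []
    for node in nodes:
        values.extend(node if isinstance(node, list) else [node])
    return values


def toy_score(doc, query, fields):
    """Sum, over all fields, boost * number of query terms present in the field."""
    terms = query.lower().split()
    score = 0.0
    for spec in fields:
        name, boost = parse_field(spec)
        tokens = [tok for v in values_at(doc, name) for tok in str(v).lower().split()]
        score += boost * sum(1 for term in terms if term in tokens)
    return score


# The session document indexed earlier in the deck.
session = {
    "title": "Elasticsearch",
    "tags": ["elasticsearch", "alterway", "bigdata"],
    "speakers": [{"first_name": "David", "last_name": "Pilato"}],
}
fields = ["title^3", "tags^2", "speakers.first_name"]
score = toy_score(session, "elasticsearch alterway david", fields)
# title: 1 match * 3, tags: 2 matches * 2, first_name: 1 match * 1
```

With these weights, a document whose title matches the query outranks one that only matches in its tags.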


Compute?

$ curl http://localhost:9200/sessions/session/_search -d '{
  "query": { ... },
  "aggs": {
    "by_date": {
      "date_histogram": {
        "field": "date",
        "interval": "day",
        "format": "dd/MM/yyyy"
      }
    }
  }
}'

"by_date": [
  { "key_as_string": "03/04/2014", "doc_count": 1 },
  { "key_as_string": "12/04/2014", "doc_count": 2 },
  { "key_as_string": "16/04/2014", "doc_count": 3 }
]

Compute!
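As a rough illustration of what the date histogram computes, the following Python sketch groups documents by calendar day in memory and counts them. The sample documents are hypothetical, chosen so that the buckets match the response shown on the slide. Like that response, the sketch lists only days that contain at least one document.

```python
from collections import Counter
from datetime import datetime


def date_histogram(docs, field="date", fmt="%d/%m/%Y"):
    """Count documents per calendar day, returning buckets oldest first."""
    per_day = Counter(datetime.fromisoformat(doc[field]).date() for doc in docs)
    return [
        {"key_as_string": day.strftime(fmt), "doc_count": count}
        for day, count in sorted(per_day.items())
    ]


# Hypothetical session documents; only the "date" field matters here.
sessions = [
    {"date": "2014-04-03T09:00:00"},
    {"date": "2014-04-12T10:30:00"},
    {"date": "2014-04-12T14:00:00"},
    {"date": "2014-04-16T09:00:00"},
    {"date": "2014-04-16T11:00:00"},
    {"date": "2014-04-16T16:45:00"},
]

buckets = date_histogram(sessions)
```

Elasticsearch performs this grouping on the server, across shards, so the client receives only the buckets and never the raw documents.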

#mstechdays #elasticsearch StartUp

Let’s make sense of …

• logs

• twitter

• github

• marketing data

• ...

• your data

• your big data


Let’s make sense of …

{
  "name": "Pilato David",
  "dateOfBirth": "1971-12-26",
  "gender": "male",
  "children": 3,
  "marketing": {
    "fashion": 334,
    "music": 3363,
    "hifi": 2351
  },
  "address": {
    "country": "France",
    "city": "Paris",
    "location": [2.332395, 48.861871]
  }
}

Demo

MAKE SENSE OF YOUR (BIG) DATA!

let’s inject some marketing documents…

elasticsearch

kibana

logstash

Marvel

@dadoonet

thanks

How to integrate Elasticsearch into your information system and get the most out of it

Elasticsearch to do what?

STORE

SEARCH

ANALYZE

Are you ready to use Elasticsearch in your IT?

What you need to run it

• Java 8 update 20 or later, or Java 7 update 55 or later

• Only Oracle’s Java and the OpenJDK are supported.

GitHub projects

• Many projects

• High activity

• Many languages

6 months!

Clients

Scripting Plugins Language

Why it's easy

• One to many

• ~ Zero conf

• Cloud oriented

• Scalability DNA

• Replication

• Sharding

• Distributed

• Resilience

• Snapshot

• Restore

Start Small Grow Big


Where / How can you use Elasticsearch?

VIA

Centralized Log Storage 1/2

Centralized Log Storage 2/2

CMS Search Engine

• Faceting

• Fuzzy Search

• Speed

• Auto Completion

• Geo Search

• Log Analysis

Ecommerce Enhanced Search Engine

• REST based

• Memory and I/O efficient

• Adaptive I/O

• Map/Reduce API support

• Pig support

• Hive support

elasticsearch-hadoop

Combining Hadoop & Elasticsearch

What Else ?

It’s up to you to decide what to build with ES

Analysis / Dashboards

Some Examples

Kibana examples : IRC Activity

Kibana examples : pfSense Monitoring

Kibana examples : Windows Events

Kibana examples : Inventory

Kibana examples : Syslog

Kibana examples : Web Activity

ES = No Limits

Conclusion


• It is time to revolutionize the way you get value from your data: give your applications Elasticsearch!

• The ELK stack (Elasticsearch, Logstash, Kibana) is already widely used in production!

• Get expert guidance to benefit from best practices and support at every stage of your project: design, development, production

Questions & Answers