Log everything!

Log everything! Dr. Stefan Schadwinkel and Mike Lohmann

Who we are.

Mike Lohmann, Architecture

Author (PHPMagazin, IX, heise.de)

Dr. Stefan Schadwinkel, Analytics

Author (heise.de, Cereb.Cortex, EJN, J.Neurophysiol.)

Agenda.

§  What we do. What we need to do. What we are doing.

§  Requirement: Log everything!

§  Infrastructure and technologies.

§  We want happy business users.

ICANS GmbH

PokerStrategy.com in numbers

6,000,000 registered users

PokerStrategy.com: education since 2005

19 languages

2,800,000 page impressions/day

700,000 posts/day

7,600,000 requests/day

Topics of this talk

- How to use existing technologies and standards.

- Scalability and simplicity of the solution.

- „Good enough“ for now!

- Showing the way from requirement to solution.

- Open-source Sf2 bundles for logging.

- Live demo.

- Out-of-the-box solution.

- Ready-to-use scripts.

What we do.

§  We teach Poker.

§  We create web applications.

§  We serve millions of users in different countries, respecting a multitude of market rules.

§  We make business decisions driven by complex data analytics.

What we need to do.

§  We need to try out other teaching topics, fast.

§  We need to gather data from all of these „try-outs“, accumulate it, and base business decisions on its analysis.

§  We need a bigger infrastructure to gather more data.

§  We need to hire more (good) people! :)

What we are doing.

§  We build ECF (Education Community Framework).

§  We (can) log everything!

§  We (now) use Amazon S3 and Amazon EMR to have a scaling storage and MapReduce solution.

§  We hire (good) people! :)

Requirement: Log everything.

§  „Are you mad?!“

§  „Be more specific, please!“

§  „But what about the user's data?!“

Logging Tools / Technologies

15.10.12

§  Producer: Symfony2 application servers and databases

§  Transport: now RabbitMQ with Erlang consumers; was Flume

§  Storage: now S3 storage, with Hadoop via Amazon EMR; was virtualized in-house Hadoop

§  Analytics: MapReduce, Hive, BI via QlikView

Logging Infrastructure

[Diagram: logging infrastructure across the four stages Producer, Transport, Storage and Analytics. Components shown: reverse proxy, app servers 1-x, databases, RabbitMQ, consumer, Hadoop cluster, QlikView, Graylog, Zabbix.]

Producer

[Diagram: the page controller dispatches a PageHit event; a listener passes it to Logger::log(); the Monolog logger runs processor, handler and formatter; the resulting LogMessage (JSON) goes to the local RabbitMQ and on through the shovel.]

Producer

§  LoggingComponent: Provides interfaces, filters and handlers

§  LoggingBundle: Glues all together with Symfony2

https://github.com/ICANS/IcansLoggingComponent

https://github.com/ICANS/IcansLoggingBundle
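
The producer flow (event, listener, processor, formatter, handler) can be sketched in a few lines. This is only a Python illustration of the idea, not the PHP code of the ICANS bundles: all names (`PageHitEvent`, `add_request_context`, `format_json`, `QueueHandler`) and field values are hypothetical, and an in-memory list stands in for the local RabbitMQ.

```python
import json
from dataclasses import dataclass, field


@dataclass
class PageHitEvent:
    """Hypothetical event dispatched by a page controller."""
    url: str
    user_id: int


def add_request_context(record: dict) -> dict:
    """Processor: enrich the record with extra fields (made-up values)."""
    enriched = dict(record)
    enriched["extra"] = {"host": "app-1"}
    return enriched


def format_json(record: dict) -> str:
    """Formatter: serialize the record as one JSON log message."""
    return json.dumps(record, sort_keys=True)


@dataclass
class QueueHandler:
    """Handler: stand-in for publishing to the local RabbitMQ."""
    published: list = field(default_factory=list)

    def handle(self, message: str) -> None:
        self.published.append(message)


@dataclass
class Logger:
    handler: QueueHandler

    def log(self, channel: str, context: dict) -> None:
        record = {"channel": channel, "context": context}
        self.handler.handle(format_json(add_request_context(record)))


def on_page_hit(event: PageHitEvent, logger: Logger) -> None:
    """Listener: turn the event into a log call."""
    logger.log("pagehit", {"url": event.url, "user_id": event.user_id})


handler = QueueHandler()
on_page_hit(PageHitEvent(url="/strategy", user_id=42), Logger(handler))
```

The controller only dispatches the event; everything that knows about log formats or transport sits behind the listener, so the application code stays free of logging details.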

Transport – First Try

§  Hey, if we use Hadoop, why not use Flume?

-  Part of the Ecosystem

-  Central config

-  Extensible via Plugins

-  Flexible Flow Configuration

-  How? : Flume Nodes → Flume Sinks

Transport – First Try

§  But, .. wait!

-  Ecosystem? Just like Hadoop version numbers…

-  Admins say: Central config woes!

-  Issues: multi-master, logical vs. physical nodes, Java heap space, etc.

-  Will my plugin run with flume-ng?

-  Ever tried to keep your complex flow and switch reliability levels?

Read: Our admins still hate me …

Transport – Second Try

§  RabbitMQ vs. Flume Nodes

-  Each app server has its own local RabbitMQ

-  The local RabbitMQ shovels its data to a central RabbitMQ cluster

-  Similar to the Flume Node concept

-  Decentralized config: Producers and consumers simply connect
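
The local-broker-plus-shovel idea can be illustrated with a small in-memory simulation. This is a sketch, not RabbitMQ code: `Broker` and `shovel` are hypothetical stand-ins, and the behaviour when the central cluster is unreachable (messages simply stay in the local queue) is an assumption made for the illustration.

```python
import queue


class Broker:
    """In-memory stand-in for a RabbitMQ broker with a single queue."""

    def __init__(self) -> None:
        self.messages = queue.Queue()

    def publish(self, message: str) -> None:
        self.messages.put(message)


def shovel(source: Broker, destination: Broker, central_up: bool) -> int:
    """Move all buffered messages from source to destination.

    If the central broker is unreachable, nothing is moved and the
    messages stay buffered locally. Returns the number of moved messages.
    """
    if not central_up:
        return 0
    moved = 0
    while True:
        try:
            message = source.messages.get_nowait()
        except queue.Empty:
            return moved
        destination.publish(message)
        moved += 1


local, central = Broker(), Broker()
local.publish('{"channel": "pagehit"}')
local.publish('{"channel": "login"}')

moved_while_down = shovel(local, central, central_up=False)
moved_after_recovery = shovel(local, central, central_up=True)
```

The application only ever talks to the broker on its own host, which is what keeps the producer configuration decentralized.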

Transport – Second Try

§  But, .. wait! We still need Sinks.

-  Custom crafted RabbitMQ consumers

-  We could write them in PHP, but ..

-  Erlang, teh awesome!

- Battle-hardened OTP framework.

-  „Let it crash!“ .. and recover.

- Hot code change. If you want.

Read: Runs forever.
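
The „Let it crash!“ idea can be approximated outside Erlang as well. The following Python sketch is not OTP and not the authors' consumer; `supervise` and `make_worker` are hypothetical names. It assumes that a crashed worker is replaced by a fresh one and that the message in flight is retried, much as an unacknowledged message would be redelivered.

```python
from typing import Callable, Iterable, List


def supervise(make_worker: Callable[[], Callable[[str], None]],
              messages: Iterable[str],
              max_restarts: int = 3) -> int:
    """Run a worker over messages, restarting it whenever it crashes.

    Returns the number of restarts that were needed.
    """
    pending: List[str] = list(messages)
    worker = make_worker()
    restarts = 0
    while pending:
        try:
            worker(pending[0])
        except Exception:
            restarts += 1
            if restarts > max_restarts:
                raise  # give up after too many restarts
            worker = make_worker()  # fresh worker with clean state
            continue  # retry the same message
        pending.pop(0)
    return restarts


stored: List[str] = []
failures = {"remaining": 1}  # simulate one transient storage failure


def make_worker() -> Callable[[str], None]:
    def worker(message: str) -> None:
        if failures["remaining"] > 0:
            failures["remaining"] -= 1
            raise ConnectionError("storage temporarily unavailable")
        stored.append(message)
    return worker


restart_count = supervise(make_worker, ["m1", "m2", "m3"])
```

The worker itself contains no error handling at all; recovery is the supervisor's job, which is the point of the pattern.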

Storage – First Try

§  Use out-of-the-box Hadoop (Cloudera)

§  But:

-  Virtualized Infrastructure

-  Unknown usage patterns

-  Must be cost effective

-  Major Hadoop version upgrades

Hadoop

Storage – Second Try

§  Use Amazon Web Services

§  Provides flexible virtualized infrastructure

§  Cost-effective storage: S3

§  Hadoop on demand: EMR

Amazon S3

Storage – Amazon S3

§  The Erlang RabbitMQ consumer simply copies the incoming data to S3

- Easy: replace the „hadoop“ command with „s3cmd“

Storage – Amazon S3

§  S3 bucket receives many small, compressed log file chunks

§  Amazon provides S3DistCp, which does a distributed data copy:

-  Aggregate many small files into partitioned large chunks

-  Change compression
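
What such a compaction step does can be shown on a toy scale. The sketch below is not S3DistCp and runs on in-memory bytes only. The key layout (`incoming/<website>/<date>/<chunk>.gz`), the output naming and the choice of gzip and bz2 are assumptions made for the illustration.

```python
import bz2
import gzip
import re
from collections import defaultdict
from typing import Dict

# Hypothetical key layout: incoming/<website>/<date>/<chunk>.gz
GROUP_PATTERN = re.compile(r"incoming/([^/]+)/([^/]+)/[^/]+\.gz$")


def compact(chunks: Dict[str, bytes]) -> Dict[str, bytes]:
    """Merge small gzip chunks into one bz2 file per (website, date)."""
    grouped = defaultdict(list)
    for key in sorted(chunks):
        match = GROUP_PATTERN.match(key)
        if match is None:
            continue  # ignore keys outside the assumed layout
        grouped[match.groups()].append(gzip.decompress(chunks[key]))
    return {
        f"compacted/{website}/{date}/part-00000.bz2": bz2.compress(b"".join(parts))
        for (website, date), parts in grouped.items()
    }


small_chunks = {
    "incoming/site-a/2012-10-01/0001.gz": gzip.compress(b'{"n": 1}\n'),
    "incoming/site-a/2012-10-01/0002.gz": gzip.compress(b'{"n": 2}\n'),
    "incoming/site-b/2012-10-01/0001.gz": gzip.compress(b'{"n": 3}\n'),
}
compacted = compact(small_chunks)
```

Fewer, larger files matter because MapReduce jobs pay a fixed overhead per input file, so thousands of tiny chunks are much slower to process than a handful of big ones.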

Analytics

§  We want happy business users.

§  We want to answer questions.

- People want answers to questions they have. Now.

- No, they couldn't tell you that question yesterday. If they had known, they would have already asked for the answer. Yesterday.

§  We also want data-driven applications.

-  Production system analysis.

-  Fraud prevention.

-  Recommendations.

-  Social metrics for our users.

Analytics

§  Remember MapReduce.

-  Custom jobs.

   -  Streaming: use your favorite language.

   -  Java API: Cascading. Use your favorite: Java, Groovy, Clojure, Scala.

-  Data queries.

   -  Hive: similar to SQL.

   -  Pig: data flow.

   -  Cascalog: Datalog-like query language using Clojure and Cascading.
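
To make the „Streaming“ option concrete: a Hadoop Streaming job reads lines on stdin and writes tab-separated key/value lines on stdout, so the mapper and the reducer can be plain functions over lines. The sketch below counts log messages per channel and simulates the shuffle locally with `sorted`. The `channel` field and the sample records are hypothetical.

```python
import json
from itertools import groupby
from typing import Iterable, Iterator


def mapper(lines: Iterable[str]) -> Iterator[str]:
    """Emit 'channel<TAB>1' for every JSON log line."""
    for line in lines:
        record = json.loads(line)
        yield f"{record['channel']}\t1"


def reducer(sorted_lines: Iterable[str]) -> Iterator[str]:
    """Sum the counts per key; input must be sorted by key."""
    pairs = (line.split("\t") for line in sorted_lines)
    for key, group in groupby(pairs, key=lambda pair: pair[0]):
        yield f"{key}\t{sum(int(value) for _, value in group)}"


log_lines = [
    '{"channel": "pagehit"}',
    '{"channel": "login"}',
    '{"channel": "pagehit"}',
]
# Locally, sorted() plays the role of the shuffle phase between map and reduce.
counts = list(reducer(sorted(mapper(log_lines))))
```

In a real streaming job the two functions would live in separate scripts that read `sys.stdin` and print their output lines.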

Analytics

§  Cascalog is Clojure, Clojure is Lisp

(?<- (stdout) [?person] (age ?person ?age) … (< ?age 30))

-  ?<- is the query operator.

-  (stdout) is the Cascading output tap.

-  [?person] lists the columns of the dataset generated by the query.

-  (age ?person ?age) is a „generator“, (< ?age 30) is a „predicate“.

§  as many as you want

§  both can be any Clojure function

§  Clojure can call anything that is available within a JVM
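
For readers who do not speak Lisp, the same query can be paraphrased in Python. This is only an analogy for the generator and predicate roles, ignoring the clauses elided in the query; the sample data and the name `people_younger_than` are made up.

```python
from typing import Iterable, List, Tuple

# Hypothetical sample data standing in for the `age` generator.
AGE: List[Tuple[str, int]] = [("alice", 28), ("bob", 33), ("carol", 25)]


def people_younger_than(age_rows: Iterable[Tuple[str, int]], limit: int) -> List[str]:
    """Generator: iterate over (person, age) rows. Predicate: age < limit."""
    return [person for person, age in age_rows if age < limit]


young = people_younger_than(AGE, 30)
```

The difference is that Cascalog compiles its query into MapReduce jobs via Cascading, while the list comprehension runs in a single process.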

Analytics

§  We use Cascalog to preprocess and organize that incoming flow of log messages:

Analytics

§  Let's run the Cascalog processing on Amazon EMR:

./elastic-mapreduce --create --name "Log Message Compaction" \
  --bootstrap-action s3://[BUCKET]/mapreduce/configure-daemons \
  --num-instances $NUM \
  --slave-instance-type m1.large \
  --master-instance-type m1.large \
  --jar s3://[BUCKET]/mapreduce/compaction/icans-cascalog.jar \
  --step-action TERMINATE_JOB_FLOW \
  --step-name "Cascalog" \
  --main-class icans.cascalogjobs.processing.compaction \
  --args "s3://[BUCKET]/incoming/*/*/*/","s3://[BUCKET]/icanslog","s3://[BUCKET]/icanslog-error"

Analytics

§  After the Cascalog Query we have:

s3://[BUCKET]/icanslog/[WEBSITE]/icans.content/year=2012/month=10/day=01/part-00000.lzo

Hive Partitioning!
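
The `key=value` directory names are what Hive reads as partition columns. A small sketch of how such a path can be built from a date and read back again, using only the layout shown in the example path; the function names are hypothetical, and `[BUCKET]` and `[WEBSITE]` stay placeholders.

```python
import re
from datetime import date
from typing import Dict

PARTITION_RE = re.compile(r"/(\w+)=(\d+)(?=/)")


def partition_path(bucket: str, website: str, day: date, part: int = 0) -> str:
    """Build a path in the layout shown above, one directory per partition key."""
    return (
        f"s3://{bucket}/icanslog/{website}/icans.content/"
        f"year={day.year:04d}/month={day.month:02d}/day={day.day:02d}/"
        f"part-{part:05d}.lzo"
    )


def partition_values(path: str) -> Dict[str, int]:
    """Extract the key=value partition components from a path."""
    return {key: int(value) for key, value in PARTITION_RE.findall(path)}


path = partition_path("[BUCKET]", "[WEBSITE]", date(2012, 10, 1))
values = partition_values(path)
```

Because the date is encoded in the directory structure, a query restricted to one day only has to read the files below that day's directory.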

Analytics

§  Now we can access the log data within Hive:

Analytics

§  Now we can run Hive queries on the [WEBSITE]_icanslog_content table!

§  But we also want to store the result to S3.

Analytics

§  Now, get the stats:

Analytics

§  We can now simply copy the data from S3 and import in any local analytical tool, like:

-  Excel (It must really make business people happy…)

-  QlikView (Anyone can be happy with it…)

-  R (If I want an answer…)

Merci.

Questions

Contacts.

Dr. Stefan Schadwinkel

stefan.schadwinkel@icans-gmbh.com

ICANS_StScha

Mike Lohmann

mike.lohmann@icans-gmbh.com

mikelohmann

ICANS GmbH Valentinskamp 18 20354 Hamburg Germany Phone: +49 40 22 63 82 9-0 Fax: +49 40 38 67 15 92 Web: www.icans-gmbh.com
