Solr for Indexing and Searching Logs (Lucene Revolution 2013)

Using Solr to Search and Analyze Logs Radu Gheorghe @radu0gheorghe @sematext

  • Using Solr to Search and Analyze Logs

    Radu Gheorghe

    @radu0gheorghe @sematext

  • Elasticsearch API

    syslogreceiver

    Logsene

    Kibana

    syslogd

    Logstash

  • What about Solr?

  • defining and handling logs in general

    4 sets of tools to send logs to Solr

    Performance tuning and SolrCloud

  • syslog

    Defining and Handling Logs (story time!)

    syslog

    syslog

    syslog

    ?

  • Requirements

    1) What's wrong?

    http://eddysuaib.com/wp-content/uploads/2012/12/Keyword-icon.png

    (for debugging)

  • Problem

    looooots of messages coming in

    http://www.sciencesurvivalblog.com/getting-published/unfinished-manuscripts_2346

  • Solved with no indexing

    BUT

  • Elasticsearch

  • Requirements

    1) What's wrong?

    2) What will go wrong? (stats)

  • Parsing Raw Logs

    BUT

    mickey mouse 10 (user, item, time)

    still slow

    format changes

  • Parsing Raw Logs

    BUT

    mickey mouse 0 10 (add error code)

    still slow

    format changes
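
The "format changes" problem can be sketched as follows. This is only an illustration, assuming the slide's columns are user, item and time, and that the extra `0` in the second example is a newly added error code; the function name `parse_positional` is hypothetical.

```python
def parse_positional(line):
    """Split a whitespace-delimited log line into user, item and time by position."""
    user, item, time = line.split()
    return {"user": user, "item": item, "time": int(time)}


old_format = "mickey mouse 10"
new_format = "mickey mouse 0 10"  # assumed: an error code was inserted before the time

parsed_old = parse_positional(old_format)

try:
    parsed_new = parse_positional(new_format)
except ValueError:
    # Four tokens no longer unpack into three names, so the parser
    # has to be rewritten every time the log format changes.
    parsed_new = None
```

A positional parser like this works until a field is added, which is the motivation for logging structured JSON instead.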

  • Facets. Logging in JSON

    2013-11-06 mickey mouse

    { "date": "2013-11-06", "message": "mickey mouse"}

  • Facets. Logging in JSON

    2013-11-06 @cee:{"user": "mickey"}

    { "date": "2013-11-06", "user": "mickey"}

    2013-11-06 mickey mouse

    { "date": "2013-11-06", "message": "mickey mouse"}
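
A minimal sketch of the two cases on this slide, assuming each line is a date followed by a space and a payload. The helper name `to_document` is hypothetical; in the talk this work is done by the log shipper, not by hand-written Python.

```python
import json

CEE_COOKIE = "@cee:"


def to_document(line):
    """Turn '<date> <payload>' into a dict; JSON after the @cee: cookie becomes fields."""
    date, _, payload = line.partition(" ")
    doc = {"date": date}
    if payload.startswith(CEE_COOKIE):
        # Structured log: merge the JSON fields into the document.
        doc.update(json.loads(payload[len(CEE_COOKIE):]))
    else:
        # Unstructured log: keep the whole payload as a single message field.
        doc["message"] = payload
    return doc


structured = to_document('2013-11-06 @cee:{"user": "mickey"}')
plain = to_document("2013-11-06 mickey mouse")
```

Here `structured` carries a `user` field that can be faceted on, while `plain` only has a free-text `message`.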

  • Requirements

    1) What's wrong?

    2) What will go wrong?

    3) Handle logs like production data

  • Requirements

    1) What's wrong?

    2) What will go wrong?

    3) Handle logs like production data

    What is a log?

    How to handle logs?

  • 4 Ways of Sending Logs to Solr

    logger

    Logstash

    files

  • Schemaless

    % cd solr-4.5.1/example/

    % mv solr solr.bak

    % cp -R example-schemaless/solr/ .

  • Automatic ID generation

    solrconfig.xml

    ..

    id

    http://solr.pl/en/2013/07/08/automatically-generate-document-identifiers-solr-4-x/
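
The slide has Solr fill in the `id` field on its own. As a rough client-side analogue (not the solrconfig.xml approach from the slide), a sender could attach a random UUID to any document that lacks one. The helper name `ensure_id` is hypothetical.

```python
import uuid


def ensure_id(doc, field="id"):
    """Return a copy of doc with a random UUID in `field` when it is missing."""
    if field in doc:
        return dict(doc)
    return {**doc, field: str(uuid.uuid4())}


with_id = ensure_id({"hello": "world"})
unchanged = ensure_id({"id": "abc", "hello": "world"})
```

Generating the ID on the server keeps senders simpler, which is presumably why the talk configures it in Solr.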

  • logger

    /dev/log

    mmjsonparse

    omprog + script

  • /dev/log -> parse -> format -> send to Solr

    % logger '@cee: {"hello": "world"}'

    rsyslog.conf

    module(load="imuxsock") # version 7+

  • /dev/log -> parse -> format -> send to Solr

    ...
    module(load="mmjsonparse")
    action(type="mmjsonparse")

  • /dev/log -> parse -> format -> send to Solr

    ...
    template(name="CEE" type="list") {
      property(name="$!all-json")
      constant(value="\n")
    }

  • /dev/log -> parse -> format -> send to Solr

    ...
    action(type="mmjsonparse")
    template(name="CEE" ...
    module(load="omprog")
    if $parsesuccess == "OK" then action(type="omprog" binary="/opt/json-to-solr.py" template="CEE")

  • /dev/log -> parse -> format -> send to Solr

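    import json, pysolr, sys

    # Solr URL as on the slide; adjust to your setup
    solr = pysolr.Solr('http://localhost:8983/solr/')

    while True:
        line = sys.stdin.readline()
        if not line:  # EOF: rsyslog closed the pipe
            break
        doc = json.loads(line)
        solr.add([doc])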

  • Avro

    MorphlineSolrSink

  • Avro -> buffer -> parse -> send to Solr

    https://github.com/mpercy/flume-log4j-example

    flume.conf

    agent.sources = avroSrc
    agent.sources.avroSrc.type = avro
    agent.sources.avroSrc.bind = 0.0.0.0
    agent.sources.avroSrc.port = 41414

  • Avro -> buffer -> parse -> send to Solr

    flume.conf

    agent.channels = solrMemoryChannel

    agent.channels.solrMemoryChannel.type = memory

    agent.sources.avroSrc.channels = solrMemoryChannel

  • Avro -> buffer -> parse -> send to Solr

    flume.conf

    agent.sinks = solrSink

    agent.sinks.solrSink.type = org.apache.flume.sink.solr.morphline.MorphlineSolrSink

    agent.sinks.solrSink.morphlineFile = conf/morphline.conf

    agent.sinks.solrSink.channel = solrMemoryChannel

  • Avro -> buffer -> parse -> send to Solr

    morphline.conf

    ...
    commands : [
      { readLine { charset : UTF-8 }}
      { grok {
          dictionaryFiles : [conf/grok-patterns]
          expressions : {
            message : """%{INT:pid} %{DATA:message}"""
    ...

    https://github.com/cloudera/search/tree/master/samples/solr-nrt/grok-dictionaries

  • Avro -> buffer -> parse -> send to Solr

    morphline.conf

    SOLR_LOCATOR : {
      collection : collection1
      #zkHost : "127.0.0.1:2181"
      solrUrl : "http://localhost:8983/solr/"
    }
    ...
    commands : [
    ...
      { loadSolr { solrLocator : ${SOLR_LOCATOR}
    ...

  • fluent-logger fluent-plugin-solr

  • fluent-logger -> fluentd -> fluent-plugin-solr

    % pip install fluent-logger

    from fluent import sender,event

    sender.setup('solr.test')

    event.Event('forward', {'hello': 'world'})

  • fluent-logger -> fluentd -> fluent-plugin-solr

    type forward

    type solr
    host localhost
    port 8983
    core collection1

  • fluent-logger -> fluentd -> fluent-plugin-solr

    % gem install fluent-plugin-solr

    doc = Solr::Document.new(:hello => record["hello"])

    https://github.com/btigit/fluent-plugin-solr

    out_solr.rb

  • file input solr_http output

    Logstash

    file

    grok filter

  • logstash.conf:

    input { file { path => "/tmp/testlog" }}

    file input -> grok filter -> solr_http output

    % echo '2 world' >> /tmp/testlog

  • logstash.conf:

    filter { grok { match => ["message", "%{NUMBER:pid} %{GREEDYDATA:hello}"] }}

    file input -> grok filter -> solr_http output

    {"pid": "2", "hello":"world"}
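
What the grok filter does with the test line can be approximated with a plain regular expression. This is a sketch only: the named groups below are simplified stand-ins for `%{NUMBER:pid}` and `%{GREEDYDATA:hello}`, not Logstash's actual pattern definitions, and `grok_like` is a hypothetical name.

```python
import re

# Simplified approximations of the grok patterns NUMBER and GREEDYDATA.
PATTERN = re.compile(r"(?P<pid>[+-]?\d+(?:\.\d+)?) (?P<hello>.*)")


def grok_like(message):
    """Return the named captures as a dict, or None when the line does not match."""
    match = PATTERN.match(message)
    return match.groupdict() if match else None


parsed = grok_like("2 world")
unmatched = grok_like("no pid here")
```

As on the slide, `pid` stays a string after extraction; converting it to a number would be a separate step.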

  • logstash.conf:

    output {
      solr_http { # master or v1.2.3+
        solr_url => "http://localhost:8983/solr"
      }
    }

    file input -> grok filter -> solr_http output

  • Fast and Cloud

  • It Depends

    http://www.bigskytech.com/wp-content/uploads/2011/02/guage.png

    load test

    monitor: SPM

    20% off: LR2013SPM20

  • |>>>>|Single Core: # of docs/update

    http://static.memrise.com.s3.amazonaws.com/uploads/blog-pictures/Simpsons_Updates.bmp
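
Sending several documents per update request, rather than one at a time, might look like the sketch below. The batch size of 1000 is an arbitrary assumption to be tuned by load testing, and `batches` and `index_in_batches` are hypothetical helper names; `send` stands for whatever actually posts a list of documents to Solr.

```python
from itertools import islice


def batches(docs, size):
    """Yield lists of at most `size` documents from any iterable."""
    it = iter(docs)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def index_in_batches(docs, send, size=1000):
    """Call send(list_of_docs) once per batch and return the number of calls."""
    calls = 0
    for chunk in batches(docs, size):
        send(chunk)
        calls += 1
    return calls


# Stand-in for a real sender: collect the batches in a list.
sent = []
calls = index_in_batches(({"n": i} for i in range(2500)), sent.append, size=1000)
```

With pysolr, as used earlier in the talk, `send` could be `solr.add`, so each batch becomes a single update request.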

  • |>>>>|Single Core: Commits

    http://cache.desktopnexus.com/thumbnails/1306-bigthumbnail.jpg

    http://www.musicfestivaljunkies.com/wp-content/uploads/2012/01/HardLogo.png

    ...

    false ???

    ???

  • |>>>>|Single Core: Size and Merges

    http://sweetclipart.com/multisite/sweetclipart/files/scissors_blue_silver.png

    http://mergewords.com/gfx/logo-big.png

    omitNorms="true" omitTermFreqAndPositions="true" ??

  • |>>>>|Single Core: Caches

    http://vector-magz.com/wp-content/uploads/2013/06/diamond-clip-art4.png

    http://www.clker.com/cliparts/1/f/6/3/11971228961330048838SaraSara_Ice_cube_2.svg.med.png

    http://clipartist.info/RSS/openclipart.org/2011/May/02-Monday/migrating_penguin_penguinmigrating-555px.png

  • SolrCloud: ZooKeeper

    bin/zkServer.sh start

    OR

    java -DzkRun -jar start.jar

    http://www.clker.com/cliparts/c/a/8/d/1331060720387485902Roaring%20Tiger.svg.hi.png

    http://fc03.deviantart.net/fs71/f/2012/196/6/a/piggy_back_rides_are_the_best_rides__by_yipped-d57b3sh.png

  • SolrCloud: ZooKeeper

    zkcli.sh -cmd upconfig \
      -zkhost SERVER:2181 \
      -confdir solr/collection1/conf/ \
      -confname start

    -Dbootstrap_confdir=solr/collection1/conf -Dcollection.configName=start

    http://www.clker.com/cliparts/c/a/8/d/1331060720387485902Roaring%20Tiger.svg.hi.png

    http://fc03.deviantart.net/fs71/f/2012/196/6/a/piggy_back_rides_are_the_best_rides__by_yipped-d57b3sh.png

  • SolrCloud: Start Nodes

    java -DzkHost=SERVER:2181 -jar start.jar

  • Timed Collections

    04Nov

    05Nov

    06Nov

    07Nov

    search latest

    search all

    index

    optimize
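
One way to picture the timed-collections scheme is as a function from dates to collection names. The sketch below assumes one collection per day, named like the slide's labels (`07Nov`), and a retention window chosen by the operator; the function names and the four-day window are illustrative assumptions.

```python
from datetime import date, timedelta

# Fixed month abbreviations so the names do not depend on the system locale.
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def collection_name(day):
    """Format a date the way the slide labels collections, e.g. 07Nov."""
    return f"{day.day:02d}{MONTHS[day.month - 1]}"


def collections_for(today, keep_days):
    """Return daily collection names from oldest to newest, ending with today."""
    return [collection_name(today - timedelta(days=n))
            for n in range(keep_days - 1, -1, -1)]


today = date(2013, 11, 7)
all_collections = collections_for(today, keep_days=4)  # "search all"
index_target = all_collections[-1]                     # "index" and "search latest"
to_optimize = all_collections[:-1]                     # older days no longer change
```

Only the newest collection receives writes, so the older ones are the natural candidates for optimizing.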

  • Collections API

    05Nov

    06Nov

    07Nov

    08Nov

    action=CREATE&name=08Nov&numShards=4

    action=DELETE&name=05Nov
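
The two requests on this slide can be built programmatically, for example from a nightly job. This sketch only constructs the URLs and does not send them; the base URL is an assumption taken from the `localhost:8983` address used elsewhere in the talk, and the function names are hypothetical.

```python
from urllib.parse import urlencode

# Assumed base URL; adjust host and port to the actual Solr node.
COLLECTIONS_API = "http://localhost:8983/solr/admin/collections"


def create_url(name, num_shards):
    """URL that creates a new collection with the given number of shards."""
    return COLLECTIONS_API + "?" + urlencode(
        {"action": "CREATE", "name": name, "numShards": num_shards})


def delete_url(name):
    """URL that deletes a collection that has aged out."""
    return COLLECTIONS_API + "?" + urlencode({"action": "DELETE", "name": name})


create = create_url("08Nov", 4)
delete = delete_url("05Nov")
```

Dropping a whole collection is much cheaper than deleting old documents one query at a time, which is the point of splitting logs by day.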

  • Aliases. Optimize

    05Nov

    06Nov

    07Nov

    08Nov

    action=CREATEALIAS&name=ALL&collections=06Nov,07Nov,08Nov

    action=CREATEALIAS&name=LATEST&collections=08Nov

    07Nov/update?optimize=true
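
The alias and optimize requests can be generated the same way. This is a sketch that only builds URLs: it uses the `collections` parameter name documented for CREATEALIAS in Solr's Collections API, the base URL is assumed from the `localhost:8983` address used elsewhere in the talk, and the function names are hypothetical.

```python
from urllib.parse import urlencode

# Assumed Solr address; adjust to the actual node.
SOLR = "http://localhost:8983/solr"


def alias_url(alias, collections):
    """Build a CREATEALIAS request pointing `alias` at one or more collections."""
    params = {"action": "CREATEALIAS", "name": alias,
              "collections": ",".join(collections)}
    # safe="," keeps the comma-separated list readable instead of %2C.
    return SOLR + "/admin/collections?" + urlencode(params, safe=",")


def optimize_url(collection):
    """Build the update request that optimizes a collection no longer written to."""
    return f"{SOLR}/{collection}/update?optimize=true"


all_alias = alias_url("ALL", ["06Nov", "07Nov", "08Nov"])
latest_alias = alias_url("LATEST", ["08Nov"])
optimize = optimize_url("07Nov")
```

Applications then query `ALL` or `LATEST` and never need to know which daily collections currently sit behind the alias.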

  • logs = production data

  • logs = production data

    Logstash

  • logs = production data

    Logstash

    docs/update

    commits

    mergeFactor

    omit*

    docValues

    caches

  • logs = production data

    Logstash

    docs/update

    commits

    mergeFactor

    omit*

    docValues

    caches

    time

    Collections API

    aliases

    optimize

  • We're hiring!

    sematext.com/about/jobs

  • Thank you!

    [email protected] @radu0gheorghe @sematext

    And @ our booth :)