Solr for Indexing and Searching Logs, Lucene Revolution 2013
-
Using Solr to Search and Analyze Logs
Radu Gheorghe
@radu0gheorghe @sematext
-
Elasticsearch API
syslog receiver
Logsene
Kibana
syslogd
Logstash
-
What about Solr?
-
defining and handling logs in general
4 sets of tools to send logs to Solr
Performance tuning and SolrCloud
-
Defining and Handling Logs (story time!)
[Diagram: four syslog sources and a question mark]
-
Requirements
1) What's wrong?
(for debugging)
-
Problem
looooots of messages coming in
-
Solved with no indexing
BUT
-
Elasticsearch
-
Requirements
1) What's wrong?
2) What will go wrong? (stats)
-
Parsing Raw Logs
mickey mouse 10 -> user, item, time
BUT
still slow
format changes
-
Parsing Raw Logs
mickey mouse 0 10 -> add error code
BUT
still slow
format changes
-
Facets. Logging in JSON
2013-11-06 mickey mouse
{ "date": "2013-11-06", "message": "mickey mouse"}
-
Facets. Logging in JSON
2013-11-06 @cee:{"user": "mickey"}
{ "date": "2013-11-06", "user": "mickey"}
2013-11-06 mickey mouse
{ "date": "2013-11-06", "message": "mickey mouse"}
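A minimal Python sketch of the difference between the two lines above, assuming a log line arrives as a date plus a text body: text that starts with the `@cee:` cookie is parsed as JSON, so every key becomes its own field (and can be faceted on), while anything else stays in a single `message` field. The function name `to_document` is hypothetical.

```python
import json

CEE_COOKIE = "@cee:"


def to_document(date, text):
    """Turn one log line into a Solr-ready dict (illustrative sketch).

    Text starting with the @cee: cookie is parsed as JSON so each key
    becomes a separate field; anything else is kept as free text.
    """
    stripped = text.strip()
    if stripped.startswith(CEE_COOKIE):
        try:
            fields = json.loads(stripped[len(CEE_COOKIE):])
        except ValueError:
            fields = None  # malformed JSON: fall back to free text below
        if isinstance(fields, dict):
            doc = {"date": date}
            doc.update(fields)
            return doc
    return {"date": date, "message": text}


print(to_document("2013-11-06", '@cee:{"user": "mickey"}'))
# {'date': '2013-11-06', 'user': 'mickey'}
print(to_document("2013-11-06", "mickey mouse"))
# {'date': '2013-11-06', 'message': 'mickey mouse'}
```

Falling back to `message` on malformed JSON means a bad line is still indexed and searchable instead of being dropped.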
-
Requirements
1) What's wrong?
2) What will go wrong?
3) Handle logs like production data
-
Requirements
1) What's wrong?
2) What will go wrong?
3) Handle logs like production data
What is a log?
How to handle logs?
-
4 Ways of Sending Logs to Solr
logger -> rsyslog
Avro -> Flume
fluent-logger -> fluentd
files -> Logstash
-
Schemaless
% cd solr-4.5.1/example/
% mv solr solr.bak
% cp -R example-schemaless/solr/ .
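With the schemaless example running, JSON documents can be posted as-is, and fields not yet in the schema are added on the fly. Below is a stdlib-only Python sketch. It assumes the Solr 4.x example server on `localhost:8983` with its default `collection1` core, and that the collection still uses `id` as its unique key, so the sample document carries one. `build_update_request` and `send` are hypothetical helper names.

```python
import json
import urllib.request

# Assumption: Solr 4.x example server, default core "collection1".
SOLR_URL = "http://localhost:8983/solr/collection1"


def build_update_request(docs, solr_url=SOLR_URL):
    """Build (but do not send) a POST of a JSON array of documents to /update."""
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        solr_url + "/update",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request):
    """Send the request and return Solr's raw response body."""
    with urllib.request.urlopen(request) as response:
        return response.read()


# "user" and "hello" do not need to be defined in the schema beforehand.
request = build_update_request([{"id": "1", "user": "mickey", "hello": "world"}])
```

Calling `send(request)` against a running server indexes the document. It becomes searchable after the next commit.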
-
Automatic ID generation
solrconfig.xml
..
id
http://solr.pl/en/2013/07/08/automatically-generate-document-identifiers-solr-4-x/
-
logger -> /dev/log -> mmjsonparse -> omprog + script
-
/dev/log -> parse -> format -> send to Solr
% logger '@cee: {"hello": "world"}'
rsyslog.conf
module(load="imuxsock") # version 7+
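From Python, the `logger` call above can be approximated with the standard library's `SysLogHandler` writing to the same `/dev/log` socket that `imuxsock` listens on. This sketch rests on two assumptions. The host is Linux with a syslog socket at `/dev/log`. A hypothetical tag `myapp` is prepended through the handler's `ident` attribute, so rsyslog should see a tag followed by the `@cee:` message, roughly as `logger` sends it.

```python
import json
import logging
import logging.handlers

CEE_COOKIE = "@cee: "


def cee(fields):
    """Format a dict like the `logger '@cee: {...}'` example does."""
    return CEE_COOKIE + json.dumps(fields)


def make_cee_logger(address="/dev/log", tag="myapp"):
    """Return a logger that writes to the local syslog socket.

    `tag` is a hypothetical program name written before the message so
    rsyslog parses it as the syslog tag and keeps '@cee: ...' as the body.
    Creating the handler raises an error if `address` is not a socket.
    """
    handler = logging.handlers.SysLogHandler(address=address)
    handler.ident = tag + ": "
    logger = logging.getLogger("cee." + tag)
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    return logger
```

Usage, on a host that has `/dev/log`: `make_cee_logger().info(cee({"hello": "world"}))`.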
-
/dev/log -> parse -> format -> send to Solr
...
module(load="mmjsonparse")
action(type="mmjsonparse")
-
/dev/log -> parse -> format -> send to Solr
...
template(name="CEE" type="list") {
  property(name="$!all-json")
  constant(value="\n")
}
-
/dev/log -> parse -> format -> send to Solr
...
action(type="mmjsonparse")
template(name="CEE" ...)
module(load="omprog")
if $parsesuccess == "OK" then action(type="omprog" binary="/opt/json-to-solr.py" template="CEE")
-
/dev/log -> parse -> format -> send to Solr
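#!/usr/bin/env python
import json, sys
import pysolr
solr = pysolr.Solr('http://localhost:8983/solr/')
while True:
    line = sys.stdin.readline()
    if not line:  # EOF: rsyslog stopped the program
        break
    solr.add([json.loads(line)])  # one JSON document per line, one per request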
-
Avro -> Flume -> MorphlineSolrSink
-
Avro -> buffer -> parse -> send to Solr
https://github.com/mpercy/flume-log4j-example
flume.conf
agent.sources = avroSrc
agent.sources.avroSrc.type = avro
agent.sources.avroSrc.bind = 0.0.0.0
agent.sources.avroSrc.port = 41414
-
Avro -> buffer -> parse -> send to Solr
flume.conf
agent.channels = solrMemoryChannel
agent.channels.solrMemoryChannel.type = memory
agent.sources.avroSrc.channels = solrMemoryChannel
-
Avro -> buffer -> parse -> send to Solr
flume.conf
agent.sinks = solrSink
agent.sinks.solrSink.type = org.apache.flume.sink.solr.morphline.MorphlineSolrSink
agent.sinks.solrSink.morphlineFile = conf/morphline.conf
agent.sinks.solrSink.channel = solrMemoryChannel
-
Avro -> buffer -> parse -> send to Solr
morphline.conf
...
commands : [
  { readLine { charset : UTF-8 } }
  { grok {
      dictionaryFiles : [conf/grok-patterns]
      expressions : { message : """%{INT:pid} %{DATA:message}"""...
https://github.com/cloudera/search/tree/master/samples/solr-nrt/grok-dictionaries
-
Avro -> buffer -> parse -> send to Solr
morphline.conf
SOLR_LOCATOR : {
  collection : collection1
  #zkHost : "127.0.0.1:2181"
  solrUrl : "http://localhost:8983/solr/"
}
...
commands : [
  ...
  { loadSolr { solrLocator : ${SOLR_LOCATOR} ...
-
fluent-logger -> fluentd -> fluent-plugin-solr
-
fluent-logger -> fluentd -> fluent-plugin-solr
% pip install fluent-logger
from fluent import sender,event
sender.setup('solr.test')
event.Event('forward', {'hello': 'world'})
-
fluent-logger -> fluentd -> fluent-plugin-solr
type forward
type solr
host localhost
port 8983
core collection1
-
fluent-logger -> fluentd -> fluent-plugin-solr
% gem install fluent-plugin-solr
doc = Solr::Document.new(:hello => record["hello"])
https://github.com/btigit/fluent-plugin-solr
out_solr.rb
-
Logstash: file input -> grok filter -> solr_http output
-
logstash.conf:
input { file { path => "/tmp/testlog" }}
file input -> grok filter -> solr_http output
% echo '2 world' >> /tmp/testlog
-
logstash.conf:
filter { grok { match => ["message", "%{NUMBER:pid} %{GREEDYDATA:hello}"] }}
file input -> grok filter -> solr_http output
{"pid": "2", "hello":"world"}
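A grok expression is a regular expression in which each `%{PATTERN:field}` reference expands to a named capture group. Below is a Python sketch of that expansion. The definitions of `NUMBER` and `GREEDYDATA` here are simplified stand-ins (an assumption; Logstash's own pattern files define `NUMBER` more strictly). Like the grok filter, it searches for a match anywhere in the line rather than requiring the whole line to match.

```python
import re

# Simplified stand-ins for the grok patterns used in the filter above.
PATTERNS = {
    "NUMBER": r"[+-]?(?:\d+(?:\.\d+)?|\.\d+)",
    "GREEDYDATA": r".*",
}

GROK_REF = re.compile(r"%\{(\w+):(\w+)\}")


def grok_to_regex(expression):
    """Expand %{PATTERN:field} references into named regex groups."""
    return GROK_REF.sub(
        lambda m: "(?P<%s>%s)" % (m.group(2), PATTERNS[m.group(1)]),
        expression,
    )


def grok(expression, line):
    """Return the named captures as a dict, or None if nothing matches."""
    match = re.search(grok_to_regex(expression), line)
    return match.groupdict() if match else None


print(grok("%{NUMBER:pid} %{GREEDYDATA:hello}", "2 world"))
# {'pid': '2', 'hello': 'world'}
```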
-
logstash.conf:
output {
  solr_http {
    # master or v1.2.3+
    solr_url => "http://localhost:8983/solr"
  }
}
file input -> grok filter -> solr_http output
-
Fast and Cloud
-
It Depends
load test, monitor: SPM
20% off: LR2013SPM20
-
Single Core: # of docs/update
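The number of documents sent in each update request is one of the first settings worth load-testing. Below is a Python sketch of batching. It assumes one JSON document per input line, which is how the rsyslog `omprog` script receives them. The helper names are hypothetical, and `send` stands in for a real client call.

```python
import json
from itertools import islice


def batches(iterable, size):
    """Yield consecutive lists of at most `size` items."""
    iterator = iter(iterable)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def json_docs(lines):
    """Parse one JSON document per line, skipping blank lines."""
    for line in lines:
        if line.strip():
            yield json.loads(line)


def index_stream(lines, send, size=1000):
    """Send documents `size` at a time through the `send(list)` callback.

    `send` is a placeholder for a client call such as pysolr's Solr.add;
    the default of 1000 is an arbitrary starting point, not a tuned value.
    Returns the number of documents sent.
    """
    sent = 0
    for batch in batches(json_docs(lines), size):
        send(batch)
        sent += len(batch)
    return sent
```

For example, `index_stream(sys.stdin, solr.add, size=500)` would replace a loop that sends one document per request. A batch is only sent when it is full or the input ends, so a real shipper would also flush on a timer to keep latency bounded.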
-
Single Core: Commits
...
false ???
???
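The exact commit settings on this slide did not survive the transcript. One general way to commit less often than you send: Solr's update handler accepts a `commitWithin` parameter (in milliseconds), so several update requests can share one commit instead of each request committing. The sketch below only builds the URL. The core URL and the helper name `update_url` are assumptions.

```python
from urllib.parse import urlencode

# Assumption: a local Solr 4.x core named collection1.
SOLR_URL = "http://localhost:8983/solr/collection1"


def update_url(solr_url=SOLR_URL, commit_within_ms=None, commit=False):
    """Build an /update URL with explicit commit behaviour.

    commit=True makes this one request trigger a commit (expensive if
    done on every request); commit_within_ms instead asks Solr to make
    the documents visible within roughly that many milliseconds.
    """
    params = {"wt": "json"}
    if commit_within_ms is not None:
        params["commitWithin"] = commit_within_ms
    if commit:
        params["commit"] = "true"
    return solr_url + "/update?" + urlencode(params)


print(update_url(commit_within_ms=10000))
```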
-
Single Core: Size and Merges
omitNorms="true" omitTermFreqAndPositions="true" ??
-
Single Core: Caches
-
SolrCloud: ZooKeeper
bin/zkServer.sh start
OR
java -DzkRun -jar start.jar
-
SolrCloud: ZooKeeper
zkcli.sh -cmd upconfig \
  -zkhost SERVER:2181 \
  -confdir solr/collection1/conf/ \
  -confname start
-Dbootstrap_confdir=solr/collection1/conf -Dcollection.configName=start
-
SolrCloud: Start Nodes
java -DzkHost=SERVER:2181 -jar start.jar
-
Timed Collections
04Nov | 05Nov | 06Nov | 07Nov
search latest
search all
index
optimize
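Below is a Python sketch of the daily rotation these slides describe. Each day gets its own collection: today's collection receives writes and serves "search latest", several recent days serve "search all", yesterday's collection is optimized, and the oldest is dropped. `keep_days` and the helper names are hypothetical. With `keep_days=3` on 8 November, the plan matches the collections on the next two slides.

```python
from datetime import date, timedelta

# Fixed English abbreviations, so names don't depend on the system locale.
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def collection_name(day):
    """Name a daily collection like the slides do: 2013-11-08 -> 08Nov."""
    return "%02d%s" % (day.day, MONTHS[day.month - 1])


def plan(today, keep_days=3):
    """Which daily collection to index into, search, optimize and delete.

    keep_days is how many days stay searchable through "search all";
    the day that just dropped out of that window is the one to delete.
    """
    window = [today - timedelta(days=i) for i in range(keep_days - 1, -1, -1)]
    return {
        "index": collection_name(today),
        "search_latest": collection_name(today),
        "search_all": [collection_name(d) for d in window],
        "optimize": collection_name(today - timedelta(days=1)),
        "delete": collection_name(today - timedelta(days=keep_days)),
    }


print(plan(date(2013, 11, 8)))
```

Names without a year repeat every year. Add the year if logs are kept longer than that.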
-
Collections API
05Nov | 06Nov | 07Nov | 08Nov
action=CREATE&name=08Nov&numShards=4
action=DELETE&name=05Nov
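The two requests above go to Solr's Collections API endpoint (`/solr/admin/collections`) as plain HTTP GETs. Below is a Python sketch that builds them. It assumes a SolrCloud node on `localhost:8983`, and `collections_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode

# Assumption: any node of the SolrCloud cluster, on the default port.
COLLECTIONS_API = "http://localhost:8983/solr/admin/collections"


def collections_url(action, **params):
    """Build a Collections API request URL for the given action."""
    query = {"action": action}
    query.update(params)
    return COLLECTIONS_API + "?" + urlencode(query)


# Create the newest daily collection and drop the oldest one, as on the slide.
create = collections_url("CREATE", name="08Nov", numShards=4)
delete = collections_url("DELETE", name="05Nov")
print(create)
print(delete)
```

If ZooKeeper holds more than one configuration set, CREATE also takes `collection.configName`. Because that name contains a dot, pass it as `**{"collection.configName": "start"}`.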
-
Aliases. Optimize
05Nov | 06Nov | 07Nov | 08Nov
action=CREATEALIAS&name=ALL&collections=06Nov,07Nov,08Nov
action=CREATEALIAS&name=LATEST&collections=08Nov
07Nov/update?optimize=true
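Solr's CREATEALIAS action takes its member list in a `collections` parameter. Calling it again with an existing alias name re-points that alias, so rotating `ALL` and `LATEST` each day only takes two more requests. Below is a Python sketch that builds those URLs plus the optimize request. It assumes a SolrCloud node on `localhost:8983`, and the helper names are hypothetical.

```python
from urllib.parse import urlencode

SOLR = "http://localhost:8983/solr"  # assumption: any SolrCloud node


def create_alias_url(alias, collections):
    """CREATEALIAS creates an alias, or re-points one that already exists."""
    query = {"action": "CREATEALIAS", "name": alias,
             "collections": ",".join(collections)}
    # safe="," keeps the comma-separated list readable in the URL.
    return SOLR + "/admin/collections?" + urlencode(query, safe=",")


def optimize_url(collection):
    """Optimize a collection that no longer receives new documents."""
    return "%s/%s/update?optimize=true" % (SOLR, collection)


print(create_alias_url("ALL", ["06Nov", "07Nov", "08Nov"]))
print(create_alias_url("LATEST", ["08Nov"]))
print(optimize_url("07Nov"))
```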
-
logs = production data
-
logs = production data
Logstash
-
logs = production data
Logstash
docs/update, commits
mergeFactor
omit*, docValues
caches
-
logs = production data
Logstash
docs/update, commits
mergeFactor
omit*, docValues
caches
time
Collections API, aliases
optimize
-
We're hiring!
sematext.com/about/jobs
-
Thank you!
@radu0gheorghe @sematext
And @ our booth :)