Be Lazy & Scale
Uploaded by logmaticio (Engineering)
Full-Text Tagging Billions of Messages
reverse mapping checking getaddrinfo for xxxxx [xxx.xxx.xxx.xxx] failed - POSSIBLE BREAK-IN ATTEMPT!
pam_unix(sshd:session): session opened for user xxxxxx by (uid=0)
Bad protocol version identification 'root' from xxx.xx.xxx.xx port xxxxx
Percolator
"Traditionally you design documents based on your data, store them into an index, and then define queries via the search API in order to retrieve these documents. The percolator works in the opposite direction. First you store queries into an index and then, via the percolate API, you define documents in order to retrieve these queries."
(https://www.elastic.co/guide/en/elasticsearch/reference/current/search-percolate.html)
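The reverse search can be modelled as a toy in Python, using the sshd tag queries from this deck. This is not the Elasticsearch percolate API: the regex analyzer, the phrase-only queries and the tag names (`break-in`, `bad-protocol`, `session`) are assumptions made for this sketch.

```python
import re

def analyze(text):
    # Assumed analyzer: lowercase, then keep runs of ASCII letters and digits.
    return re.findall(r"[a-z0-9]+", text.lower())

def contains_phrase(tokens, phrase):
    # True if `phrase` occurs as a contiguous run inside `tokens`.
    n = len(phrase)
    return any(tokens[i:i + n] == phrase for i in range(len(tokens) - n + 1))

# Step 1: register queries (hypothetical tag names) instead of documents.
REGISTERED = {
    "break-in": analyze("possible break-in attempt!"),
    "bad-protocol": analyze("bad protocol version identification"),
    "session": analyze("session opened"),
}

# Step 2: "percolate" a document to find which stored queries match it.
def percolate(message):
    tokens = analyze(message)
    return [tag for tag, phrase in REGISTERED.items() if contains_phrase(tokens, phrase)]
```

For example, percolating the `pam_unix(sshd:session)` line returns only the `session` tag.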
reverse mapping checking getaddrinfo for xxxxx [xxx.xxx.xxx.xxx] failed - POSSIBLE BREAK-IN ATTEMPT!
Registered tag queries:
"possible break-in attempt!"
"bad protocol version identification"
"session opened"
$$$
Bad protocol version identification ...
"bad protocol" -> Phrase Query
version -> Term Query
ident* -> Prefix Query
combined with a Boolean Query (AND, OR, NOT)
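One way to picture these four query types is a small evaluator over the analyzed tokens of a message. The tuple encoding (`("phrase", [...])`, `("and", [...])`, ...) and the regex analyzer are assumptions for this sketch, not Elasticsearch's query DSL.

```python
import re

def analyze(text):
    # Assumed analyzer: lowercase, keep runs of ASCII letters and digits.
    return re.findall(r"[a-z0-9]+", text.lower())

def evaluate(query, tokens):
    kind, arg = query
    if kind == "phrase":   # contiguous sequence of terms
        n = len(arg)
        return any(tokens[i:i + n] == arg for i in range(len(tokens) - n + 1))
    if kind == "term":     # exact term
        return arg in tokens
    if kind == "prefix":   # any term starting with arg
        return any(t.startswith(arg) for t in tokens)
    if kind == "and":
        return all(evaluate(q, tokens) for q in arg)
    if kind == "or":
        return any(evaluate(q, tokens) for q in arg)
    if kind == "not":
        return not evaluate(arg, tokens)
    raise ValueError(f"unknown query kind: {kind}")

# "bad protocol" AND version AND ident*
QUERY = ("and", [
    ("phrase", ["bad", "protocol"]),
    ("term", "version"),
    ("prefix", "ident"),
])
```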
Benchmark setup
Tags (real life): 160
Runs (based on real messages): 500000
Matches: ~33%
Baseline: 105s
1 Big OR: 109s (+3.8%)
Using a single-char message ('a'): 96s (-8.5%)
Tags (real life): 160
Terminal clauses: ~295
Runs (based on real messages): 500000
Matches: ~33%
Baseline: 105s
Trivial 1 term clause / tag: 28.6s (-72.8%)
Keep only 1 clause / tag: 62.7s (-41%)
Percolator request flow
Register Queries -> Perco. Queries Index
Perco. Req. ("Bad protocol ...") -> In-Memory Index (holding the document "Bad protocol ...") -> Execute Each Query -> Perco. Resp.
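The cost of this flow sits in the last step: every registered query is run against every message. A rough Python model follows; the dict-based single-document index only plays a role conceptually similar to Lucene's `MemoryIndex`, and the phrase-only queries, names and analyzer are assumptions.

```python
import re
from collections import defaultdict

def analyze(text):
    # Assumed analyzer: lowercase, keep runs of ASCII letters and digits.
    return re.findall(r"[a-z0-9]+", text.lower())

def build_memory_index(message):
    # Single-document inverted index: term -> positions where it occurs.
    index = defaultdict(list)
    for pos, term in enumerate(analyze(message)):
        index[term].append(pos)
    return index

def phrase_matches(index, phrase):
    # The phrase matches if, from some start p, phrase[k] sits at p + k.
    return any(
        all(p + k in index.get(term, ()) for k, term in enumerate(phrase))
        for p in index.get(phrase[0], ())
    )

def percolate(message, registered):
    index = build_memory_index(message)  # rebuilt for every request
    # Execute each query: work grows linearly with the number of queries.
    return [name for name, phrase in registered.items() if phrase_matches(index, phrase)]
```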
Query Clauses -> Rewritten Clauses
"POSSIBLE BREAK-IN ATTEMPT!" -> [0, 1, 2, 3]
connect* -> connect*
version -> 4
Query Term Index
possible --> 0
break --> 1
in --> 2
attempt --> 3
version --> 4
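Building the query term index and rewriting the clauses could look like the following. The clause encoding and the function name `build_term_index` are hypothetical; ids are assigned in first-seen order, which reproduces the numbering above.

```python
def build_term_index(clauses):
    """Assign an integer id to every distinct query term and rewrite the clauses.

    Phrase and term clauses are rewritten to ids; prefix clauses are kept as is.
    """
    term_ids = {}

    def term_id(term):
        # Reuse the id of a known term, otherwise allocate the next free one.
        return term_ids.setdefault(term, len(term_ids))

    rewritten = []
    for kind, arg in clauses:
        if kind == "phrase":
            rewritten.append(("phrase", [term_id(t) for t in arg]))
        elif kind == "term":
            rewritten.append(("term", term_id(arg)))
        elif kind == "prefix":
            rewritten.append(("prefix", arg))  # not in the term index
        else:
            raise ValueError(f"unknown clause kind: {kind}")
    return term_ids, rewritten

CLAUSES = [
    ("phrase", ["possible", "break", "in", "attempt"]),
    ("prefix", "connect"),
    ("term", "version"),
]
```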
Raw Message:
reverse mapping checking getaddrinfo for xxxxx [xxx.xxx.xxx.xxx] failed - POSSIBLE BREAK-IN ATTEMPT!
Analyzed Message:
[reverse, mapping, checking, getaddrinfo, for, xxxxx, xxx, xxx, xxx, xxx, failed, possible, break, in, attempt]
Message Rewritten in Query Space:
[-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, 0, 1, 2, 3]
Query Term Presence Bitset:
[true, true, true, true, false]
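Rewriting a message into query space is one dictionary lookup per token, and the bitset records which query terms occur at all. A sketch, assuming a simple regex analyzer (an approximation that happens to reproduce the analyzed message above) and a plain list of booleans where a real implementation would likely use a packed bitset; function names are hypothetical.

```python
import re

# Query term index from the slides.
TERM_INDEX = {"possible": 0, "break": 1, "in": 2, "attempt": 3, "version": 4}

def analyze(text):
    # Assumed analyzer: lowercase, keep runs of ASCII letters and digits.
    return re.findall(r"[a-z0-9]+", text.lower())

def rewrite(tokens, term_index):
    # Map every token to its query-term id, or -1 if no query uses it.
    return [term_index.get(t, -1) for t in tokens]

def presence_bitset(rewritten, n_terms):
    # bits[i] is True when query term i occurs somewhere in the message.
    bits = [False] * n_terms
    for term_id in rewritten:
        if term_id >= 0:
            bits[term_id] = True
    return bits
```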
Phrase clause "POSSIBLE BREAK-IN ATTEMPT!", rewritten as [0, 1, 2, 3]:
Quick Check / Early Termination: bits 0, 1, 2 and 3 must all be true in the presence bitset; otherwise the clause fails without scanning the message.
Actual Check: ~ contains, i.e. [0, 1, 2, 3] must appear in order in the rewritten message.
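The two-stage phrase check can be sketched as follows (hypothetical name `phrase_clause_matches`). A clause whose terms are not all present is rejected from the bitset alone; only clauses that pass the quick check pay for the scan of the rewritten message.

```python
# Rewritten message and bitset taken from the example above.
REWRITTEN = [-1] * 11 + [0, 1, 2, 3]
BITSET = [True, True, True, True, False]

def phrase_clause_matches(ids, rewritten, bitset):
    # Quick check / early termination: every phrase term must be present.
    if not all(bitset[i] for i in ids):
        return False
    # Actual check (~ contains): ids must appear contiguously, in order.
    n = len(ids)
    return any(rewritten[i:i + n] == ids for i in range(len(rewritten) - n + 1))
```

Here `[3, 4]` stops at the quick check because bit 4 is false, while `[1, 0]` passes it but fails the actual check because the order is wrong.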
Prefix clause connect*, left as is:
Brute Force / startsWith (FAST!): test each analyzed token with startsWith("connect"); no token in this message matches.
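Prefix clauses stay outside the query term index, so they are checked directly against the analyzed tokens. A sketch with a hypothetical function name; that brute force stays cheap because a log message has only a handful of tokens is our reading of the "(FAST!)" note, not a measured claim.

```python
# Analyzed message from the example above.
TOKENS = ["reverse", "mapping", "checking", "getaddrinfo", "for", "xxxxx",
          "xxx", "xxx", "xxx", "xxx", "failed", "possible", "break", "in", "attempt"]

def prefix_clause_matches(prefix, tokens):
    # Brute force: test every token with startswith, stopping at the first hit.
    return any(token.startswith(prefix) for token in tokens)
```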
Term clause version, rewritten as 4:
Simple Lookup: read bit 4 of the presence bitset (false here, so no match).
The per-clause results are then combined with AND/OR/NOT.
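Putting the pieces together: analyze and rewrite each message once, then evaluate every tag's rewritten clause tree against the shared tokens, rewritten message and bitset. A minimal sketch; the clause encoding, the analyzer and the three tag definitions are hypothetical examples built on the query term index above.

```python
import re

TERM_INDEX = {"possible": 0, "break": 1, "in": 2, "attempt": 3, "version": 4}

# Hypothetical tags, already rewritten into query space.
TAGS = {
    "break-in": ("phrase", [0, 1, 2, 3]),
    "break-in-without-version": ("and", [("phrase", [0, 1, 2, 3]), ("not", ("term", 4))]),
    "connect-or-version": ("or", [("prefix", "connect"), ("term", 4)]),
}

def analyze(text):
    # Assumed analyzer: lowercase, keep runs of ASCII letters and digits.
    return re.findall(r"[a-z0-9]+", text.lower())

def evaluate(clause, tokens, rewritten, bitset):
    kind, arg = clause
    if kind == "term":     # simple lookup in the bitset
        return bitset[arg]
    if kind == "phrase":   # quick bitset check, then ~ contains
        if not all(bitset[i] for i in arg):
            return False
        n = len(arg)
        return any(rewritten[i:i + n] == arg for i in range(len(rewritten) - n + 1))
    if kind == "prefix":   # brute-force startswith over the tokens
        return any(t.startswith(arg) for t in tokens)
    if kind == "and":
        return all(evaluate(c, tokens, rewritten, bitset) for c in arg)
    if kind == "or":
        return any(evaluate(c, tokens, rewritten, bitset) for c in arg)
    if kind == "not":
        return not evaluate(arg, tokens, rewritten, bitset)
    raise ValueError(f"unknown clause kind: {kind}")

def tag_message(message):
    # Per-message work is done once and shared by all tags.
    tokens = analyze(message)
    rewritten = [TERM_INDEX.get(t, -1) for t in tokens]
    bitset = [False] * len(TERM_INDEX)
    for term_id in rewritten:
        if term_id >= 0:
            bitset[term_id] = True
    return [name for name, clause in TAGS.items()
            if evaluate(clause, tokens, rewritten, bitset)]
```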
Results (500000 runs, ~33% matches)
160 tags: 105s -> 7.3s (x14.4 faster)
320 tags: 195s -> 8.8s (x22.2 faster)
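The speedups follow directly from the timings, and comparing the two rows shows how each side scales when the number of tags doubles. The arithmetic below assumes the first figure in each row is the baseline percolator run and the second is the rewritten approach.

```python
# Timings from the results slide: tags -> (first figure, second figure) in seconds.
TIMINGS = {160: (105.0, 7.3), 320: (195.0, 8.8)}

def speedup(before, after):
    return before / after

def growth_when_doubling(column):
    # How much a timing grows when going from 160 to 320 tags.
    return TIMINGS[320][column] / TIMINGS[160][column]

for tags, (before, after) in TIMINGS.items():
    print(f"{tags} tags: x{speedup(before, after):.1f} faster")
print(f"160 -> 320 tags: x{growth_when_doubling(0):.2f} vs x{growth_when_doubling(1):.2f}")
```

It prints x14.4 and x22.2, matching the slide; going from 160 to 320 tags multiplies the first timing by about 1.86 but the second by only about 1.21.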