ast-rspamd

19
Expressions speed up … when you are too lazy to calculate everything

Transcript of ast-rspamd

Page 1: ast-rspamd

Expressions speed up… when you are too lazy to calculate everything

Page 2: ast-rspamd

What is optimised

• General logic expression:

• Complex logic:

• Arithmetic logic:

• All above:

A & B & C & D

(A & !B) | C & D

A + B + C + D > 2

F & (A + B + C + D > 2)

Page 3: ast-rspamd

SA approach

• Evaluate all rules (terribly slow)

• Use only some of them

• Use regexps for everything

Page 4: ast-rspamd

Old rspamd• Reverse Polish Notation

A & B & C -> A B C & &

• Evaluated rules, applying basic optimisations:A & B & C & D

0 & 1 & 1 & 0

A & B & C & DIf A equal to 0 there is no need to evaluate other components

Page 5: ast-rspamd

Can we do better?

• We want to organise evaluations to execute faster ops before expensive ops

• We want to have a generic evaluation of the arguments to decide when to stop and return

Page 6: ast-rspamd

Solution: AST

• Abstract Syntax Tree - a tree of expressions

• Optimize branches in the tree by execution time and frequency

• Apply greedy algorithm to minimise calculations

• Be as lazy as possible (laziness is good!)

Page 7: ast-rspamd

AST building&

| C

! B

A

Page 8: ast-rspamd

AST eval (naive)&

| C

! B

A

A = 0, B = 1, C = 0

0

11 0

1

0

Page 9: ast-rspamd

AST branches cut&

| C

! B

A A = 0, B = 1, C = 00

11 0

1

0

Eval order

• 1 branch skipped

Page 10: ast-rspamd

Can we do better?

• In the previous slide we cut merely a single branch

• Not good, still have to evaluate too many unnecessary stuff

Page 11: ast-rspamd

AST branches reorder&

|C

! B

A

A = 0, B = 1, C = 0

0

110

1

0

Eval order

• 4 branches skipped

Page 12: ast-rspamd

AST branches reorder

• Prioritise branches with fewer operations in the underneath levels

• Skip unnecessary evaluations

• Reduce the total running time of the expression

Page 13: ast-rspamd

N-ary operations>

+ 2

! B

A

Eval order

C D E

Page 14: ast-rspamd

N-ary optimizations>

+ 2

! B

A

Eval order

C D E

What do we compare?

Here is our limit

Page 15: ast-rspamd

N-ary optimizations>

+ 2

! B

A

Eval order

C D E

What do we compare?

Here is our limit

Stop here

Page 16: ast-rspamd

Results

• Rspamd with RPN: 200ms on a normal message, 1.6 seconds on stupid large text message (10 Mb of text)

• Rspamd with AST: 40ms on a normal message, 400ms on stupid large text message

• SA: ??? (timeout?)

Page 17: ast-rspamd

Further steps

• Greedy algorithm to optimize execution time:

• calculate frequency and average time of a component

• minimize expression by applying greedy formula: min(freq / avg_time) for each component

Page 18: ast-rspamd

Learn dynamically

• We need to re-evaluate order of AST in the real time

• Solution: periodically evaluate atoms weights and resort tree using the same greedy algorithm

• Average time and cost is already evaluated

Page 19: ast-rspamd

Laziness is the source of the progress