1 / 25 FOSDEM 2018 | Michael Meeks
Michael Meeks
General Manager at Collabora Productivity
[email protected] - mmeeks,
G+ - [email protected]
Calc: The challenges of scalable arithmetic
How threading can be challenging
“Stand at the crossroads and look; ask for the ancient paths, ask where the good way is, and walk in it, and
you will find rest for your souls...” - Jeremiah 6:16
www.collaboraoffice.com
Calc threading - Overview
● LibreOffice 6.0 Calc
● Existing structure & parallelism
● Why thread ?
● The initial solution & problems
  ● mis-factored code
  ● dependency issues
● The group calculation piece
● Profiling & optimizing
● Future work & expansion …
Disclaimer & Thanks: Almost all of this work was done by Tor Lillqvist & Dennis Francis – who can’t be here today. Some great code reading & improvement.
LibreOffice 6.0 Calc ...
● A 30+ year old code-base
● Primary Data structures hugely improved recently
  ● Still some scope for improvement: FormulaGroup vs. FormulaCell, per-cell dependency records etc.
● Calculation Engine in need of love
  ● Some insights into how it works
  ● Some problems wrt. threading.
Core structures since 4.3 (mdds::multi_type_vector)
ScDocument
ScTable
svl::SharedString block
double block
EditTextObject block
ScFormulaCell block
ScColumn
Broadcasters
Text widths
Script types
Cell values
Cell notes
This bit:
FormulaCellGroups
ScFormulaCell
ScTokenArray
ScFormulaCell
ScFormulaCell
ScFormulaCell
ScFormulaCell
ScFormulaCell
ScFormulaCell
ScFormulaCellGroup
… Tokens
… RPN
Sample Token types (StackVar)
● svSingleRef → A1
● svDoubleRef → A1:C3
● svExternalSingleRef etc.
● svDouble → 42.0
● svString → “hello world”
● svByte → ocDiv, ocMacro ...
Normal Formula interpreting
double ScFormulaCell::GetValue()
{
    MaybeInterpret();
    return GetRawValue();
}
void ScFormulaCell::Interpret()
{
    … amazing recursion flattening …
    InterpretTail() // ie. ...
    {
        … new ScInterpreter( this, pDocument, rContext, aPos, *pCode /* those tokens */);
            ->Interpret()
StackVar ScInterpreter::Interpret()
{
    … execute reverse-polish stack …
    … execute functions …
    … get cell values from references …
Recursion++
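To make "execute reverse-polish stack" concrete, here is a minimal sketch of an RPN evaluator. It is not the ScInterpreter code: `TokenKind`, `Token`, `evaluateRpn` and `rpnExample` are hypothetical names, and only plain doubles and four binary operators are handled (no references, strings or functions).

```cpp
#include <cassert>
#include <vector>

// Hypothetical, heavily simplified token kinds (the real StackVar set is larger).
enum class TokenKind { Double, Add, Sub, Mul, Div };

struct Token
{
    TokenKind kind;
    double value; // only meaningful when kind == TokenKind::Double
};

// Evaluate a token sequence that is already in reverse-Polish order.
double evaluateRpn(const std::vector<Token>& rpn)
{
    std::vector<double> stack; // operand stack
    for (const Token& tok : rpn)
    {
        if (tok.kind == TokenKind::Double)
        {
            stack.push_back(tok.value);
            continue;
        }
        assert(stack.size() >= 2 && "operator needs two operands");
        const double rhs = stack.back();
        stack.pop_back();
        const double lhs = stack.back();
        stack.pop_back();
        switch (tok.kind)
        {
            case TokenKind::Add: stack.push_back(lhs + rhs); break;
            case TokenKind::Sub: stack.push_back(lhs - rhs); break;
            case TokenKind::Mul: stack.push_back(lhs * rhs); break;
            case TokenKind::Div: stack.push_back(lhs / rhs); break;
            case TokenKind::Double: break; // already handled above
        }
    }
    assert(stack.size() == 1 && "well-formed RPN leaves exactly one result");
    return stack.back();
}

// Usage: (3 + 4) * 2 in reverse-Polish order is 3 4 + 2 *
void rpnExample()
{
    const std::vector<Token> rpn = {
        {TokenKind::Double, 3.0}, {TokenKind::Double, 4.0}, {TokenKind::Add, 0.0},
        {TokenKind::Double, 2.0}, {TokenKind::Mul, 0.0}};
    assert(evaluateRpn(rpn) == 14.0);
}
```

In the real interpreter a reference token would trigger a cell lookup at the point where this sketch pushes a literal, which is where the recursion noted on the slide comes from.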
InterpretFormulaGroup
ScTokenArray
ScFormulaCellGroup
… Tokens
… RPN
[Diagram: sample cell values → getValues → collected to matrix → Interpret: OpenCL or Software]
Examine for safe cases
Even non-threaded software case: faster
Shares function input collection work.
Aggregated / linearized doubles / strings in the matrix
Why Thread ?
CPUs get wider not faster
● Sometimes CPUs get slower …
● Process clocks stymied at 3-4 GHz
● IPC improvements ~stalled
● Real IPC wins:
  ● Laptops → minimum 4 threads
    – Mid-range → 8 threads.
  ● PC / Workstation
    – 8 → 16 threads: the new normal.
● Affordable too ...
● Many thanks to AMD for sponsoring this work.
2017 Crash reporting stats
● Frustratingly ‘cores’ not threads.
[Chart: Crash report % by CPU core count over time. X-axis: 2017-01-01 to 2018-01-01, monthly; Y-axis: 0% to 100%; series by core count: 1, 2, 4, 6, 8, 10, 12, 16, 24, 32, 36, 48.]
[Chart: Reports from large core count machines. Y-axis: 0 to 2000; series by core count: 10, 12, 16, 24, 32, 36, 40, 48.]
Initial Solution ...
Thread InterpretFormulaGroup
● Attempt re-use of existing formula core
  ● Try to avoid special / sub-setting code-paths for existing formula-group conversion: a more generic solution.
● Concept:
  ● Pre-calculate dependent cells to control recursion outside of threads.
  ● Protect invariants with assertions
  ● Black-list problematic functions ...
  ● Parallelise using existing interpreter.
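The concept above can be sketched in a few lines: calculate every dependency single-threaded first, then let worker threads only read values, with an assertion guarding against any interpretation during the threaded phase. This is an illustration under stated assumptions, not the LibreOffice implementation: `Cell`, `interpret`, `getValue`, `calculateGroup` and `g_threadedCalcInProgress` are hypothetical, and the "formulas" are trivial arithmetic.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for a formula cell with a cached result.
struct Cell
{
    double input; // raw data the hypothetical formula works on
    double value; // cached result
    bool dirty;   // true until 'value' has been calculated
};

// Hypothetical flag in the spirit of mbThreadedGroupCalcInProgress.
std::atomic<bool> g_threadedCalcInProgress{false};

// (Re)calculate one cell; must never run while worker threads are active.
void interpret(Cell& cell)
{
    assert(!g_threadedCalcInProgress.load());
    cell.value = cell.input * 2.0; // placeholder formula
    cell.dirty = false;
}

double getValue(Cell& cell)
{
    if (cell.dirty)
        interpret(cell); // would recurse in a real dependency chain
    return cell.value;
}

// Calculate one result per dependency: results[i] = value(deps[i]) + 1.
std::vector<double> calculateGroup(std::vector<Cell>& deps, unsigned threadCount)
{
    assert(threadCount > 0);
    // Phase 1, single-threaded: pre-calculate everything the group reads.
    for (Cell& cell : deps)
        getValue(cell);

    // Phase 2, threaded: workers only read clean cells, so no recursion occurs.
    std::vector<double> results(deps.size());
    g_threadedCalcInProgress.store(true);
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < threadCount; ++t)
    {
        workers.emplace_back([&deps, &results, t, threadCount] {
            for (std::size_t i = t; i < deps.size(); i += threadCount)
                results[i] = getValue(deps[i]) + 1.0;
        });
    }
    for (std::thread& worker : workers)
        worker.join();
    g_threadedCalcInProgress.store(false);
    return results;
}

// Usage: three dirty inputs, two worker threads.
void groupCalcExample()
{
    std::vector<Cell> deps = {{1.0, 0.0, true}, {2.0, 0.0, true}, {3.0, 0.0, true}};
    const std::vector<double> results = calculateGroup(deps, 2);
    assert(results[0] == 3.0 && results[1] == 5.0 && results[2] == 7.0);
}
```

If phase 1 missed a dependency, a worker would reach `interpret` and trip the assertion, which is the point of protecting the invariant rather than silently racing.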
Parallelize existing interpreter
double ScFormulaCell::GetValue()
{
    MaybeInterpret();
    return GetRawValue();
}
void ScFormulaCell::Interpret()
{
    … amazing recursion flattening …
    InterpretTail() // ie. ...
    {
        … new ScInterpreter( this, pDocument, rContext, aPos, *pCode /* those tokens */);
            ->Interpret()
StackVar ScInterpreter::Interpret()
{
    … execute reverse-polish stack …
    … execute functions …
    … get cell values from references …
Pre-fetch all dependent values – and lock that down:
void ScFormulaCell::MaybeInterpret() ...
    assert(!pDocument->mbThreadedGroupCalcInProgress);
Pre-calculated → No recursion
ScInterpreter: calcs formulae
ScDocument
ScTable
ScFormulaCell block
Broadcasters
ScBroadcastAreaSlotMachine
ScColumn
Dependencies
ScInterpreter
ScTokenArray
ScFormulaCellGroup
… Tokens
… RPN
Mutates: INDEX, OFFSET etc.
Cloud
Web fn’s
Macros
Ext’ns
Mutates!
VlookupCache
Number format, Link mgmt etc.
ScInterpreter: some fixes
● Basic iteration - broken:
  ● class FormulaTokenArray
    – sal_uInt16 nIndex; // Current step index
    – FormulaToken* FirstRPN() { nIndex = 0; return NextRPN(); }
  ● Now has an external iterator
    – a man-week+ to un-wind this, and debug the last pieces that relied on this.
● Added mutation guards:
  ● ScMutationGuard aGuard(this, ScMutationGuardFlags::CORE);
    – In all likely-looking places: where core state is changed.
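The iteration problem is that a cursor stored inside the shared token array is mutated by every reader, so two threads walking the same array corrupt each other's position. A minimal sketch of the external-iterator shape follows, assuming hypothetical names (`Token`, `TokenArray`, `RpnIterator`, `sumOpcodes`); the real iterator class in LibreOffice may look quite different.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

struct Token
{
    int opcode; // hypothetical payload
};

// Hypothetical, cut-down token array: it no longer stores an iteration index.
class TokenArray
{
public:
    explicit TokenArray(std::vector<Token> rpn) : maRpn(std::move(rpn)) {}
    const std::vector<Token>& rpn() const { return maRpn; }

private:
    std::vector<Token> maRpn;
};

// External iterator: the position lives in the iterator, owned by the caller,
// so several threads can walk the same const TokenArray independently.
class RpnIterator
{
public:
    explicit RpnIterator(const TokenArray& array) : mrArray(array), mnIndex(0) {}

    // Returns the next token, or nullptr at the end.
    const Token* next()
    {
        const std::vector<Token>& rpn = mrArray.rpn();
        assert(mnIndex <= rpn.size());
        return mnIndex < rpn.size() ? &rpn[mnIndex++] : nullptr;
    }

private:
    const TokenArray& mrArray;
    std::size_t mnIndex;
};

int sumOpcodes(const TokenArray& array)
{
    int sum = 0;
    RpnIterator it(array);
    while (const Token* tok = it.next())
        sum += tok->opcode;
    return sum;
}

// Usage: two iterators over one array do not disturb each other.
void externalIteratorExample()
{
    const TokenArray array({{1}, {2}, {3}});
    RpnIterator first(array);
    RpnIterator second(array);
    assert(first.next()->opcode == 1);
    assert(first.next()->opcode == 2);
    assert(second.next()->opcode == 1); // unaffected by 'first'
    assert(sumOpcodes(array) == 6);
}
```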
Disabling nasties:
● Dependency graph manipulation
  ● During calculation:
    – Indirect, Offset, Match, Cell, ocTableOp
● Other stuff
  ● Macros – disabled for now.
    – Could detect ‘pure’ ie. non-mutating functions
    – Also parallelize the basic/ interpreter (?)
  ● Info → grab-bag of bits.
  ● ocExternal → UNO extensions:
    – currently in: but can do ~un-controlled mutation (?)
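A black-list of this kind can be sketched as a scan over a formula's opcodes before deciding whether the group may be calculated in threads. The `OpCode` enum below is a hypothetical subset named after the functions on the slide, and `isThreadingBlackListed` / `canCalculateThreaded` are illustrative names, not the actual LibreOffice API.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical opcode subset; the real opcode list is far longer.
enum class OpCode { Add, Sum, Indirect, Offset, Match, Cell, TableOp, Macro, Info };

// Functions that may touch the dependency graph or other shared state
// during calculation are refused for threaded calculation.
bool isThreadingBlackListed(OpCode op)
{
    switch (op)
    {
        case OpCode::Indirect:
        case OpCode::Offset:
        case OpCode::Match:
        case OpCode::Cell:
        case OpCode::TableOp:
        case OpCode::Macro:
        case OpCode::Info:
            return true;
        default:
            return false;
    }
}

// A formula is eligible only if none of its opcodes is black-listed.
bool canCalculateThreaded(const std::vector<OpCode>& rpn)
{
    return std::none_of(rpn.begin(), rpn.end(), isThreadingBlackListed);
}

// Usage: plain arithmetic is fine, anything containing INDIRECT is not.
void blackListExample()
{
    assert(canCalculateThreaded({OpCode::Add, OpCode::Sum}));
    assert(!canCalculateThreaded({OpCode::Sum, OpCode::Indirect}));
}
```

A formula that fails the check simply falls back to the existing single-threaded path, so the black-list costs coverage rather than correctness.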
More nasties ...
● Several global variables
  ● No-where obvious to hang them
  ● Now some thread_local variables
    – Calculation stack
    – Current-document being calculated
    – Matrix positions – nC,nR
  ● Somewhat horrific: fix obsolete Mac toolchain.
● ScInterpreterContext
  ● Added – passed through all functions.
    – Impacts eg. ‘GetValue’ though ...
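The two techniques on this slide can be illustrated side by side: a former global becomes `thread_local`, and remaining per-calculation state travels in a context object passed explicitly. In this sketch `t_calcStack`, `InterpreterContext`, `sumValues` and `threadLocalExample` are hypothetical names chosen for illustration; the fields of the real ScInterpreterContext are not shown in the slides.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Formerly a plain global: each thread now gets its own calculation stack.
thread_local std::vector<double> t_calcStack;

// Hypothetical context object, passed through every call instead of
// reaching for shared globals.
struct InterpreterContext
{
    std::size_t valuesProcessed = 0;
};

// Push all values on this thread's stack, then pop and sum them.
double sumValues(InterpreterContext& ctx, const std::vector<double>& values)
{
    t_calcStack.clear();
    for (double v : values)
        t_calcStack.push_back(v);

    double total = 0.0;
    while (!t_calcStack.empty())
    {
        total += t_calcStack.back();
        t_calcStack.pop_back();
        ++ctx.valuesProcessed;
    }
    return total;
}

// Usage: two threads run concurrently without sharing stack or context.
void threadLocalExample()
{
    InterpreterContext ctxA;
    InterpreterContext ctxB;
    double a = 0.0;
    double b = 0.0;
    std::thread ta([&ctxA, &a] { a = sumValues(ctxA, {1.0, 2.0, 3.0}); });
    std::thread tb([&ctxB, &b] { b = sumValues(ctxB, {10.0, 20.0}); });
    ta.join();
    tb.join();
    assert(a == 6.0 && b == 30.0);
    assert(ctxA.valuesProcessed == 3 && ctxB.valuesProcessed == 2);
}
```

The trade-off the slide hints at is visible here: adding the context parameter changes the signature of every function on the call path, which is why it "impacts eg. GetValue".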
[Chart: re-calculating 100k formulae on 1m doubles. X-axis: Thread count (single, 1, 2, 4, 8, 16); Y-axis: Seconds to calculate (0.00 to 9.00); series: Meeks/Linux, Ryzen/Win10.]
How did that look: initially ...
● Faster
  ● Getting some nice speedups – ignoring the hyper-threaded-ness:
  ● 8.5s → 2.5s with 4 threads → 3.4x
  ● 4.7s → 0.86s - ~5.5x with 8 threads
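The quoted factors are simply baseline time divided by threaded time; dividing again by the thread count gives the parallel efficiency, which shows how far each result is from ideal scaling. The sketch below assumes the first number in each pair is the baseline for that machine; `speedup`, `efficiency` and `speedupExample` are illustrative helper names.

```cpp
#include <cassert>
#include <cmath>

// Speedup = time before / time after.
double speedup(double baselineSeconds, double threadedSeconds)
{
    assert(threadedSeconds > 0.0);
    return baselineSeconds / threadedSeconds;
}

// Parallel efficiency = speedup / number of threads (1.0 would be ideal).
double efficiency(double speedupFactor, unsigned threads)
{
    assert(threads > 0);
    return speedupFactor / threads;
}

// Usage with the figures from the slide.
void speedupExample()
{
    const double s4 = speedup(8.5, 2.5);  // roughly 3.4x with 4 threads
    const double s8 = speedup(4.7, 0.86); // roughly 5.5x with 8 threads
    assert(std::fabs(s4 - 3.4) < 0.01);
    assert(std::fabs(s8 - 5.47) < 0.01);
    assert(std::fabs(efficiency(s4, 4) - 0.85) < 0.01); // about 85% of ideal
    assert(std::fabs(efficiency(s8, 8) - 0.68) < 0.01); // about 68% of ideal
}
```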
Up to this point:
● Plain Old calculation – single threaded (POC)
● Group calculation
A) Single Threaded Software Group calc (STSG)
B) OpenCL: GPU parallelism after conversion
C) New threaded calculation (NTC)
● Then: C) slower than A) in some cases …
  – Collecting data from sheets, branching, type handling, etc. again and again for each formula cell …
  ● Expensive – threading doesn’t help.
  – A) collects once – and has some SSE2 goodness …
● So → add a ‘threaded A)’ - simple & better …
● Weighting decision: POC vs. ... based on complexity.
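The idea behind a "threaded A)" is to pay the collection cost once, producing a linear array of doubles, and then split only the arithmetic across threads. A minimal sketch, assuming each formula in the group is a sliding SUM over `window` consecutive inputs; `threadedGroupSum` and `threadedGroupExample` are hypothetical names and the real software group interpreter handles far more than sums.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// 'collected' is the input data, gathered from the sheet once, up front.
// Row i of the group computes the sum of collected[i .. i + window).
std::vector<double> threadedGroupSum(const std::vector<double>& collected,
                                     std::size_t window, unsigned threadCount)
{
    assert(threadCount > 0 && window > 0 && window <= collected.size());
    const std::size_t rows = collected.size() - window + 1;
    std::vector<double> results(rows);

    // Give each thread one contiguous chunk of rows.
    const std::size_t chunk = (rows + threadCount - 1) / threadCount;
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < threadCount; ++t)
    {
        const std::size_t begin = t * chunk;
        const std::size_t end = std::min(rows, begin + chunk);
        if (begin >= end)
            break; // more threads than rows
        workers.emplace_back([&collected, &results, window, begin, end] {
            for (std::size_t row = begin; row < end; ++row)
                results[row] = std::accumulate(collected.begin() + row,
                                               collected.begin() + row + window, 0.0);
        });
    }
    for (std::thread& worker : workers)
        worker.join();
    return results;
}

// Usage: five inputs, window of two, two threads.
void threadedGroupExample()
{
    const std::vector<double> collected = {1.0, 2.0, 3.0, 4.0, 5.0};
    const std::vector<double> results = threadedGroupSum(collected, 2, 2);
    const std::vector<double> expected = {3.0, 5.0, 7.0, 9.0};
    assert(results == expected);
}
```

Because the workers only read the shared array and write disjoint result slots, no locking is needed, which is what makes this variant "simple".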
Improving performance ...
● Why don’t we get an 8x for 8 threads ?
  ● Terrible profiling tools on Windows.
  ● Linux – used ‘perf’ looking for threading issues:
    – sudo perf record --call-graph dwarf \
        --switch-events -c 1 # etc.
  ● Looking for false-sharing
    – And other horrors.
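False sharing happens when threads write to different variables that sit on the same cache line, so the line bounces between cores even though no data is logically shared. The usual remedy is to pad or align per-thread data to a cache line, sketched below. The 64-byte line size is an assumption (typical for current x86 CPUs), and `PaddedCounter`, `kMaxThreads` and `countInParallel` are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Assumed cache-line size; 64 bytes is common but not universal.
constexpr std::size_t kCacheLine = 64;
constexpr std::size_t kMaxThreads = 8;

// Each counter occupies its own cache line, so threads incrementing
// neighbouring counters do not invalidate each other's lines.
struct alignas(kCacheLine) PaddedCounter
{
    std::uint64_t value = 0;
};
static_assert(sizeof(PaddedCounter) == kCacheLine, "one counter per cache line");

std::uint64_t countInParallel(unsigned threadCount, std::uint64_t incrementsPerThread)
{
    assert(threadCount > 0 && threadCount <= kMaxThreads);
    std::array<PaddedCounter, kMaxThreads> counters{};

    std::vector<std::thread> workers;
    for (unsigned t = 0; t < threadCount; ++t)
    {
        workers.emplace_back([&counters, t, incrementsPerThread] {
            for (std::uint64_t i = 0; i < incrementsPerThread; ++i)
                ++counters[t].value; // each thread touches only its own line
        });
    }
    for (std::thread& worker : workers)
        worker.join();

    std::uint64_t total = 0;
    for (unsigned t = 0; t < threadCount; ++t)
        total += counters[t].value;
    return total;
}
```

Removing the `alignas` packs eight counters into one line; the result stays correct, but on real hardware the run time can rise noticeably, which is the kind of effect a profiler run is meant to expose.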
Horror: rampant heap thrash
● RPN calculation – stack based:
  ● Tons of stack operations: pushing values etc.
  ● Do memory allocation & frees.
    – Using the ancient / internal allocator – never intended for heavy parallel use.
    → drop the custom allocator → hugely faster
    → Re-use tokens where possible too.
● std::stack → deque → lists …
  ● Horrible: std::vector instead → far better.
● Re-using ScInterpreterContext ...
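Two of these fixes are easy to show in miniature: backing the stack with `std::vector` instead of the default `std::deque`, and keeping a small per-thread cache of tokens so that most allocations are recycled rather than hitting the heap. The sketch below is illustrative only; `DoubleToken`, `TokenStack`, `TokenCache` and `t_tokenCache` are hypothetical names, and the capacity of 16 is taken from the optimization steps listed later in the talk.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <stack>
#include <utility>
#include <vector>

struct DoubleToken
{
    double value;
};

// std::stack uses std::deque by default; a vector-backed stack keeps
// its elements contiguous.
using TokenStack = std::stack<double, std::vector<double>>;

// Hypothetical small cache of released tokens, re-used before allocating.
class TokenCache
{
public:
    static constexpr std::size_t kCapacity = 16;

    std::unique_ptr<DoubleToken> acquire(double value)
    {
        if (!maFree.empty())
        {
            std::unique_ptr<DoubleToken> tok = std::move(maFree.back());
            maFree.pop_back();
            tok->value = value; // recycle instead of allocating
            return tok;
        }
        return std::unique_ptr<DoubleToken>(new DoubleToken{value});
    }

    void release(std::unique_ptr<DoubleToken> tok)
    {
        if (maFree.size() < kCapacity)
            maFree.push_back(std::move(tok));
        // otherwise 'tok' is freed when it goes out of scope
    }

    std::size_t cached() const { return maFree.size(); }

private:
    std::vector<std::unique_ptr<DoubleToken>> maFree;
};

// One cache per thread, so no locking is needed.
thread_local TokenCache t_tokenCache;

// Usage: a released token is handed back on the next acquire.
void tokenCacheExample()
{
    std::unique_ptr<DoubleToken> a = t_tokenCache.acquire(1.0);
    DoubleToken* raw = a.get();
    t_tokenCache.release(std::move(a));
    assert(t_tokenCache.cached() == 1);

    std::unique_ptr<DoubleToken> b = t_tokenCache.acquire(2.0);
    assert(b.get() == raw && b->value == 2.0);

    TokenStack stack;
    stack.push(b->value);
    assert(stack.top() == 2.0);
}
```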
Other issues ...
● Where ‘GetDouble’ meets SfxItemSet ...
  ● fixed SvNumberFormatter thread safety.
Threading & optimizing story:
[Chart: Benchmarking some of our sample sheets ... Y-axis: 0 to 1800; one series per optimization step:]
● Baseline from recent master
● Group Interpreter work by Tor
● Thread Software Interpreter
● Avoid TokenArray thrash
● halve the number of threads if HT is active
● disable custom allocator
● use a cache for FormulaDoubleToken allocation
● make token cache thread local
● up the token cache size to 16
● halve threadcount if HT active for group interpreter too
● Use C++ threads
Note: this perf. regression from threading for some workloads came from avoiding the SoftwareGroupInterpreter
Future work
● Stop the Crash-testing from asserting ...
  ● Implicit intersection: killing us (again)
    – Move RPN to have precise ranges
● Extend threaded unit tests further …
● Move more global variables to ScInterpreterContext
● Make FormulaCell a 1x item group
  ● Make POC calculation a forced-single-threaded calc
    – Always thread SoftwareGroup Interpreter
● De-bong the format-type use
  ● =J20 – should not change format type if J20 changes format.
    – A sheet-creation-time optimization …
    – Intersects with ‘units’ work too.
Conclusions
● Calculation can be threaded
● Significant speedups are possible
● Profiling & optimizing works
  ● “it is slow” == “not enough invested yet”
– All problems are just economics
● Many thanks to AMD for their support.
Oh, that my words were recorded, that they were written on a scroll, that they were inscribed with an iron tool on lead, or engraved in rock for ever! I know that my Redeemer lives, and that in the end he will stand upon the earth. And though this body has been destroyed yet in my flesh I will see God, I myself will see him, with my own eyes - I and not another. How my heart yearns within me. - Job 19: 23-27