Error-Free Inversion of Ill-Conditioned Matrices
in Distributed Computing System of RESTful-Services of Computer Algebra
Vladimir Voloshinov¹, Sergey Smirnov²
¹Institute for System Analysis RAS, Moscow, www.isa.ru
²Moscow Institute of Physics and Technology, www.mipt.ru
Center of Grid-technologies and Distributed Computing, ISA RAS, http://dcs.isa.ru, http://www.mathcloud.org
Dubna, JINR, GRID'2010
Supported by RFBR, grant #08-07-00430-a and "SKIF-Grid" project
# 2
Subjects of our research related to the report (MathCloud)
Multi-core desktop processors in a LAN (a specific “Desktop” Grid)
RESTful = REST + HTTP + JSON (Representational State Transfer, JavaScript Object Notation)
[Diagram: Specialized services; High-level computing applications ...; CAS Services; jLite; BNB-Grid; P2P]
# 3
Error-free symbolic computing in CAS (Computer Algebra Systems)
An inherent feature of fixed-precision arithmetic is the accumulation of rounding errors in intermediate operations.
Symbolic computing in a CAS represents a rational number x as a pair {numerator, denominator}, {p, q}:
$x = \frac{p}{q}$
All arithmetic operations are performed without loss of accuracy. In principle, unlimited accuracy can be expected, bounded only by memory (and elapsed time!).
If necessary, only the final result is rounded to a floating-point representation.
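A minimal illustration of the {p, q} representation, using Python's standard `fractions` module rather than Maxima (a substitution made only for this sketch): adding 1/10 ten times drifts in double precision but stays exact with rationals.

```python
from fractions import Fraction

# Add 1/10 ten times, once in double precision and once exactly.
float_sum = 0.0
exact_sum = Fraction(0)
for _ in range(10):
    float_sum += 0.1
    exact_sum += Fraction(1, 10)

print(float_sum)                                      # 0.9999999999999999
# A Fraction keeps the reduced pair {p, q}
print(exact_sum.numerator, exact_sum.denominator)     # 1 1

# Round only the final result, if a float is needed at all
print(float(Fraction(1, 3) + Fraction(1, 6)))         # 0.5
```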
# 4
Brief state of the art in error-free computing
• GMP (GNU Multiple Precision library), C++, MPI. M.I. Germanenko. Error-free Rational Calculations Software and Application for Solution of Linear Systems // Vestnik of Lobachevsky State University of Nizhni Novgorod, 2009, No. 4, pp. 172-180 (in Russian); see the references therein.
• SymGrid: symbolic computation systems (Maple, GAP, Kant, MuPAD, ...) as Grid services; Glasgow parallel Haskell, MPI, Globus Toolkit.
We propose a simpler approach (possibly with lower performance):
✔ GNU Maxima as the CAS;
✔ RESTful services as middleware, JSON as the data representation;
✔ MathCloud Workflow Editor and execution environment.
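A sketch of how exact rationals might travel as JSON between RESTful services, assuming a hypothetical layout (not MathCloud's actual wire format) in which every matrix entry is a [p, q] pair of integers:

```python
import json
from fractions import Fraction

def matrix_to_json(rows):
    """Encode a matrix of Fractions; each entry becomes a [p, q] pair.
    The {"rows", "cols", "data"} layout is a hypothetical example."""
    return json.dumps({
        "rows": len(rows),
        "cols": len(rows[0]) if rows else 0,
        "data": [[[x.numerator, x.denominator] for x in row] for row in rows],
    })

def matrix_from_json(text):
    """Decode the layout produced by matrix_to_json back into exact Fractions."""
    obj = json.loads(text)
    return [[Fraction(p, q) for p, q in row] for row in obj["data"]]

# Round trip of a 2x2 Hilbert matrix; Python ints are exact, so p and q survive unchanged
h2 = [[Fraction(1), Fraction(1, 2)], [Fraction(1, 2), Fraction(1, 3)]]
encoded = matrix_to_json(h2)
print(encoded)
print(matrix_from_json(encoded) == h2)    # True
```

One design caveat: JavaScript parses JSON numbers as IEEE doubles, so a browser-side consumer would silently round long numerators and denominators; encoding p and q as decimal strings avoids that.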
# 5
CAS (Computer Algebra System) Maxima (1)
Descends from Macsyma, developed at the Massachusetts Institute of Technology; maintained by Prof. William Schelter and distributed under the GNU General Public License since 1998.
Runs on GCL (GNU Common Lisp) and other Common Lisp implementations; open source, for Windows and Linux.
Almost the same symbolic computing capabilities as other CASes: differentiation and integration, series, ODE solving, matrices and linear algebra, polynomials, sets, lists, tensors...
http://maxima.sourceforge.net/ (GPL), http://maxima.sourceforge.net/ru/
Single-threaded Lisp interpreter
# 6
Well-known ill-conditioned Hilbert matrices
Matrices $H_N$ of the form $H_N = \{h_{m,n}\}_{m=1,\,n=1}^{N,\,N}$, where $h_{m,n} = \frac{1}{m+n-1} = \int_0^1 t^{m-1} \cdot t^{n-1}\,dt$,
i.e. the Gram matrix of the monomial basis in $L_2[0,1]$.
The condition number of $H_N$ grows exponentially w.r.t. N:
$\mathrm{cond}(H_N) = \|H_N\| \cdot \|H_N^{-1}\| \sim e^{3.5 N}$, where $\|A_{N\times N}\| = \Big(\sum_{m=1,\,n=1}^{N,\,N} A_{m,n}^2\Big)^{1/2}$.
Values of cond($H_N$) for some N (calculated exactly in Maxima):

| N | cond($H_N$) |
|---|---|
| 10 | $1.6\cdot10^{13}$ |
| 50 | $1.5\cdot10^{74}$ |
| 70 | $5.5\cdot10^{104}$ |
| 100 | $4.1\cdot10^{150}$ |
| 150 | $1.2\cdot10^{227}$ |

For double-precision arithmetic, «well-conditioned» matrices should have cond less than 1000.
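These definitions can be checked with a short exact-arithmetic sketch. It uses Python's `fractions` in place of Maxima, a plain Gauss-Jordan elimination (chosen for brevity, not taken from the slides) and the Frobenius norm defined above; only the final logarithm is rounded, which avoids float overflow for large N.

```python
import math
from fractions import Fraction

def hilbert(n):
    """Exact n x n Hilbert matrix: entry (m, k), 0-based, is 1/(m + k + 1)."""
    return [[Fraction(1, m + k + 1) for k in range(n)] for m in range(n)]

def inverse(a):
    """Exact inverse by Gauss-Jordan elimination on [A | E]; no rounding occurs.
    Any nonzero pivot is acceptable because exact arithmetic has no round-off."""
    n = len(a)
    aug = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            f = aug[r][col]
            if r != col and f != 0:
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def frobenius_sq(a):
    """Squared Frobenius norm (the norm used on the slide), kept exact."""
    return sum(x * x for row in a for x in row)

def log10_cond(a):
    """log10 of cond(A) = ||A|| * ||A^-1||: the squared product is an exact
    rational and only its logarithm is computed in floating point."""
    sq = frobenius_sq(a) * frobenius_sq(inverse(a))
    return (math.log10(sq.numerator) - math.log10(sq.denominator)) / 2

for n in (3, 10, 20):
    print(n, f"cond ~ 10^{log10_cond(hilbert(n)):.1f}")
```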
# 7
Hilbert matrices inversion complexity
Maxima's Lisp is single-threaded.
The larger the condition number, the more digits the exact rational representation of the entries needs, and the more time it takes to process them all. Size of the Lisp-format (textual) representation: size($H_{300}^{-1}$) ≈ 34 MB, size($H_{500}^{-1}$) ≈ 140 MB.
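The digit growth can be seen without any elimination, via the classical closed form for the integer entries of the inverse: $(H_N^{-1})_{ij} = (-1)^{i+j}(i+j-1)\binom{N+i-1}{N-j}\binom{N+j-1}{N-i}\binom{i+j-2}{i-1}^2$. The sketch counts decimal characters only as a rough proxy; it does not reproduce the sizes of Maxima's Lisp format quoted above.

```python
from math import comb

def hilbert_inverse_entry(n, i, j):
    """Entry (i, j), 1-based, of the exact inverse of the n x n Hilbert matrix
    (classical closed form; every entry is an integer)."""
    sign = -1 if (i + j) % 2 else 1
    return (sign * (i + j - 1) * comb(n + i - 1, n - j) * comb(n + j - 1, n - i)
            * comb(i + j - 2, i - 1) ** 2)

def decimal_size(n):
    """Total decimal characters of all entries of inv(H_n): a rough proxy for the
    size of a textual representation (not Maxima's Lisp format)."""
    return sum(len(str(hilbert_inverse_entry(n, i, j)))
               for i in range(1, n + 1) for j in range(1, n + 1))

for n in (10, 50, 100):
    digits = max(len(str(abs(hilbert_inverse_entry(n, i, j))))
                 for i in range(1, n + 1) for j in range(1, n + 1))
    print(f"N={n}: longest entry {digits} digits, total {decimal_size(n)} characters")
```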
# 8
Matrix inversion by Schur complement (1)
A well-known inversion approach based on «block decomposition» and the Schur complement (Cormen, Leiserson, Rivest, “Introduction to Algorithms”).
Block processing is more flexible for a subsequent parallel and recursive algorithm, because the calculation of $A^{-1}$, $S^{-1}$ and the matrix multiplications can be parallelized as well.
The speedup is evaluated on the next slide.
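A minimal recursive sketch of Schur-complement inversion with exact rationals, assuming the partition $M = \begin{pmatrix} A & U \\ V & B \end{pmatrix}$ with $S = B - VA^{-1}U$ (the naming used in the workflow scripts). It assumes every leading block and every Schur complement is nonsingular, which holds for symmetric positive definite matrices such as $H_N$; general matrices would need pivoting.

```python
from fractions import Fraction

def matmul(x, y):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]

def matsub(x, y):
    return [[a - b for a, b in zip(rx, ry)] for rx, ry in zip(x, y)]

def neg(x):
    return [[-a for a in row] for row in x]

def schur_inverse(m):
    """Exact inverse of m = [[A, U], [V, B]] via the Schur complement S = B - V A^-1 U,
    recursing on A and S. Assumes every leading block and every Schur complement
    is nonsingular (true for symmetric positive definite matrices such as H_N)."""
    n = len(m)
    if n == 1:
        return [[1 / Fraction(m[0][0])]]
    h = n // 2
    A = [row[:h] for row in m[:h]]
    U = [row[h:] for row in m[:h]]
    V = [row[:h] for row in m[h:]]
    B = [row[h:] for row in m[h:]]
    IA = schur_inverse(A)                            # A^-1
    VxIA = matmul(V, IA)                             # V A^-1  } independent:
    IAxU = matmul(IA, U)                             # A^-1 U  } the '||' step
    IS = schur_inverse(matsub(B, matmul(VxIA, U)))   # S^-1, S = B - V A^-1 U
    top_right = neg(matmul(IAxU, IS))                # -A^-1 U S^-1
    bottom_left = neg(matmul(IS, VxIA))              # -S^-1 V A^-1
    top_left = matsub(IA, matmul(top_right, VxIA))   # A^-1 + A^-1 U S^-1 V A^-1
    return ([left + right for left, right in zip(top_left, top_right)]
            + [left + right for left, right in zip(bottom_left, IS)])

H = [[Fraction(1, i + j + 1) for j in range(6)] for i in range(6)]
Hinv = schur_inverse(H)
print(matmul(H, Hinv) == [[int(i == j) for j in range(6)] for i in range(6)])   # True
```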
# 9
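Matrix inversion by Schur complement (2)
Let M[N×N] be divided into four [N/2×N/2] blocks, $M = \begin{pmatrix} A & U \\ V & B \end{pmatrix}$, with the Schur complement $S = B - VA^{-1}U$. Then
$M^{-1} = \begin{pmatrix} A^{-1} + A^{-1}US^{-1}VA^{-1} & -A^{-1}US^{-1} \\ -S^{-1}VA^{-1} & S^{-1} \end{pmatrix}$,
where products are grouped to reuse intermediate results: $(VA^{-1})U = V(A^{-1}U)$ and $(A^{-1}US^{-1})(VA^{-1}) = (A^{-1}U)(S^{-1}VA^{-1})$.
The cost of the “parallel” calculation of the inverse matrix blocks (the symbol «||» marks steps run in parallel) may be estimated as follows:
$A^{-1} \Rightarrow \sim O((N/2)^3)$
$VA^{-1} \;||\; A^{-1}U \Rightarrow \sim O((N/2)^3)$
$VA^{-1}U \Rightarrow \sim O((N/2)^3)$
$B - VA^{-1}U \Rightarrow \sim O((N/2)^2)$
$S^{-1} \Rightarrow \sim O((N/2)^3)$
$S^{-1}(VA^{-1}) \;||\; (A^{-1}U)S^{-1} \Rightarrow \sim O((N/2)^3)$
$A^{-1}US^{-1}VA^{-1} \Rightarrow \sim O((N/2)^3)$
$A^{-1} + A^{-1}US^{-1}VA^{-1} \Rightarrow \sim O((N/2)^2)$
Speedup estimate (for fixed-precision arithmetic): $\frac{4N^3}{3N^3 + 2N^2} \approx \frac{4}{3}$ (about 130% for large N).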
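The operation-count arithmetic behind this estimate can be reproduced exactly: the sequential cost is taken as the eight cubic block operations, $8(N/2)^3 = N^3$, and the parallel schedule keeps six cubic and two quadratic steps. This models fixed-precision operation counts only; it ignores communication and the digit growth of exact rationals.

```python
from fractions import Fraction

def schur_speedup(n):
    """Operation-count speedup of the four-block scheme (block size h = N/2).
    Sequential: the eight cubic block operations, 8 h^3 = N^3.
    Parallel: six cubic steps (each '||' pair counts once) plus two quadratic steps."""
    h = Fraction(n, 2)
    sequential = 8 * h ** 3
    parallel = 6 * h ** 3 + 2 * h ** 2
    return sequential / parallel          # equals 4N^3 / (3N^3 + 2N^2)

for n in (10, 100, 1000):
    print(n, float(schur_speedup(n)))    # approaches 4/3 as N grows
```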
# 10
MathCloud WorkFlow Editor and Execution Usability
http://www.mathcloud.org, REST + HTTP (RESTful) + JSON (JavaScript Object Notation)
Usability test case (for a "novice"):
• Workflow programming in a REST environment (functional style)
• Maxima RESTful-service capability and reliability
• MathCloud "executor" capabilities for rather complex structured workflows
As a result (less science, more exercises ;o): debugging, feature requests, and experience gained.
# 11
WF Sample
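Typical part of a workflow with the Maxima service:
```
/* ----- Schur4Wf.mac ----- IA (= A^-1), U, V, B, outputFilePath are bound by earlier steps */
computeIAxU() := block(IAxU : IA.U, save(outputFilePath, IAxU), "OK!");   /* A^-1 U */
computeVxIA() := block(VxIA : V.IA, save(outputFilePath, VxIA), "OK!");   /* V A^-1 */
/* S^-1 for S = B - V A^-1 U; reuses VxIA if that step has already bound it */
computeIS() := block(VxIAxU : if (member('VxIA, values)) then VxIA.U else V.IAxU,
  S : B - VxIAxU, IS : Inv(S), save(outputFilePath, IS), "OK!");
```
Each part corresponds to some “elementary” Maxima-script function (a block(...)). The inversion function Inv is defined elsewhere and not shown here.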
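The dependency structure of these workflow steps can be mimicked in-process. In this hypothetical Python sketch the function names mirror the Maxima script and `inv2` stands in for `Inv` on 2×2 blocks; $A^{-1}U$ and $VA^{-1}$ are submitted concurrently, and `compute_is` runs only once $VA^{-1}$ is available.

```python
from concurrent.futures import ThreadPoolExecutor
from fractions import Fraction

def matmul(x, y):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]

def inv2(m):
    """Exact inverse of a 2x2 block (hypothetical stand-in for the script's Inv)."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def compute_iaxu(IA, U):       # mirrors computeIAxU: A^-1 U
    return matmul(IA, U)

def compute_vxia(V, IA):       # mirrors computeVxIA: V A^-1
    return matmul(V, IA)

def compute_is(B, VxIA, U):    # mirrors computeIS: S^-1 with S = B - (V A^-1) U
    VxIAxU = matmul(VxIA, U)
    S = [[b - x for b, x in zip(rb, rx)] for rb, rx in zip(B, VxIAxU)]
    return inv2(S)

# 4x4 Hilbert matrix, split into 2x2 blocks [[A, U], [V, B]]
H = [[Fraction(1, i + j + 1) for j in range(4)] for i in range(4)]
A = [row[:2] for row in H[:2]]
U = [row[2:] for row in H[:2]]
V = [row[:2] for row in H[2:]]
B = [row[2:] for row in H[2:]]

IA = inv2(A)
with ThreadPoolExecutor(max_workers=2) as pool:
    # A^-1 U and V A^-1 do not depend on each other, so they run concurrently
    f_iaxu = pool.submit(compute_iaxu, IA, U)
    f_vxia = pool.submit(compute_vxia, V, IA)
    IAxU, VxIA = f_iaxu.result(), f_vxia.result()
IS = compute_is(B, VxIA, U)    # needs V A^-1, so it waits for the step above
print([[str(x) for x in row] for row in IS])
```

Because of CPython's global interpreter lock, the threads here only illustrate the ordering; an actual speedup requires running each step in a separate process or service.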
# 12
Matrices multiplications by ribbon decomposition (WF)
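Ribbon decomposition splits one factor into contiguous stripes so that each stripe's product is an independent task. A sketch assuming the ribbons are row stripes of the left factor (the slides do not say which factor is split); all names are hypothetical:

```python
from concurrent.futures import ThreadPoolExecutor
from fractions import Fraction

def matmul(x, y):
    """Plain (sequential) product of list-of-rows matrices."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]

def ribbons(n_rows, k):
    """Split rows 0..n_rows-1 into k contiguous ribbons of near-equal height."""
    bounds = [n_rows * i // k for i in range(k + 1)]
    return [(bounds[i], bounds[i + 1]) for i in range(k)]

def ribbon_matmul(x, y, k):
    """X Y computed as k independent tasks X[r0:r1] . Y, one per row ribbon;
    the partial results are stacked back in ribbon order."""
    with ThreadPoolExecutor(max_workers=k) as pool:
        parts = pool.map(lambda rb: matmul(x[rb[0]:rb[1]], y), ribbons(len(x), k))
        return [row for part in parts for row in part]

X = [[Fraction(1, i + j + 1) for j in range(4)] for i in range(6)]
Y = [[Fraction(i - j, 3) for j in range(5)] for i in range(4)]
print(ribbon_matmul(X, Y, 3) == matmul(X, Y))   # True
```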
# 13
Compact view (without all inputs and outputs)
# 14
Matrices multiplications by ribbon decomposition (running)
Block coloring: green = passed, grey = skipped, light blue = running, dark blue = to do
# 15
Full view (with all inputs and outputs)
# 16
Scenario running (1)
# 17
Scenario running (2)
# 18
"Production" version (the end user can set any input matrix): initBlocksOfMatrix(H)
The previously shown WF is used as a composite block.
Tic-toc JavaScript blocks measure the elapsed time.
# 19
Speedup of "four-block" inversion (210% to 230%)
[Figure: elapsed time, sec, in experiments 1 to 10, labeled {N, dim(A)}; series "N" and "NinvSchur"]
Preliminary results of the Schur complement approach (standalone «simulation»).
The speedup depends appreciably on the block sizes.
# 20
Performance results and conclusions
Hilbert matrices, $H_N$, inversion:
• Good enough for really hard cases (where computation time exceeds the MathCloud and data-exchange overheads)
• Usability of the MathCloud WF Editor should be improved to handle complex structured scenarios
• The MathCloud execution system is rather stable for 24x7 intensive symbolic computing
# 21
Thank you!
# 22
Possible speedup reasoning for «LU-inversion» of $H_N$ (1)
Let M be an [N×N] matrix with LU-factorization {L, U, P}: L is lower-triangular, U is upper-triangular, P is a permutation matrix, and $L\cdot U = P\cdot M$. Let $E_N$ be the identity matrix. To obtain $M^{-1}$, solve (by forward and backward substitution) $L\cdot U\cdot X = P\cdot E_N \;\Rightarrow\; X = M^{-1}$.
Maxima has the functions lu_factor(M) (Gaussian elimination) and lu_backsub(LU, B) (where LU is the value returned by lu_factor(M)), which solves $L\cdot U\cdot X = P\cdot B$ for any rectangular matrix B with N rows:
invert_by_lu(M) := lu_backsub(lu_factor(M), E_N)
If $E_N$ is sliced vertically into K submatrices,
$E_N = \left(E_N^{1:n_1} \mid E_N^{n_1+1:n_2} \mid E_N^{n_2+1:n_3} \mid \cdots \mid E_N^{n_{K-1}+1:N}\right), \quad 0 = n_0 < n_1 < n_2 < n_3 < \cdots < n_{K-1} < n_K = N,$
then parallel calls of lu_backsub(LU, $E_N^{n_{k-1}+1:n_k}$) give K sets of inverse columns $M^{-1}[n_{k-1}+1:n_k]$.
K = N means parallel calculation of the individual columns of $M^{-1}$.
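The column-slicing idea can be sketched with exact rationals. `lu_factor_exact` and `lu_backsub_exact` are hypothetical Python stand-ins that mirror the roles of Maxima's `lu_factor` and `lu_backsub`, not their implementations; the K slices of $E_N$ are solved by concurrent tasks that share one factorization (threads again only illustrate the structure).

```python
from concurrent.futures import ThreadPoolExecutor
from fractions import Fraction

def lu_factor_exact(m):
    """Exact LU factorization with row pivoting, P M = L U (Doolittle form).
    L (unit diagonal, strictly below) and U (on and above) are packed in `lu`;
    perm[i] is the row of M that ends up in position i."""
    n = len(m)
    lu = [[Fraction(x) for x in row] for row in m]
    perm = list(range(n))
    for k in range(n):
        p = next(r for r in range(k, n) if lu[r][k] != 0)   # exact: any nonzero pivot
        lu[k], lu[p] = lu[p], lu[k]
        perm[k], perm[p] = perm[p], perm[k]
        for r in range(k + 1, n):
            lu[r][k] /= lu[k][k]                           # multiplier l_rk
            for c in range(k + 1, n):
                lu[r][c] -= lu[r][k] * lu[k][c]
    return lu, perm

def lu_backsub_exact(factor, columns):
    """Solve L U x = P b for each right-hand side b in `columns`."""
    lu, perm = factor
    n = len(lu)
    solutions = []
    for b in columns:
        y = [Fraction(b[perm[i]]) for i in range(n)]       # y = P b
        for i in range(n):                                  # forward:  L y = P b
            y[i] -= sum(lu[i][j] * y[j] for j in range(i))
        for i in reversed(range(n)):                        # backward: U x = y
            y[i] = (y[i] - sum(lu[i][j] * y[j] for j in range(i + 1, n))) / lu[i][i]
        solutions.append(y)
    return solutions

def identity_columns(n, lo, hi):
    """Columns lo..hi-1 (0-based) of E_N, i.e. E_N^{lo+1:hi} in 1-based terms."""
    return [[int(i == k) for i in range(n)] for k in range(lo, hi)]

def invert_by_column_slices(m, k):
    """M^-1 from one LU factorization plus k concurrent back-substitution tasks."""
    n = len(m)
    factor = lu_factor_exact(m)
    bounds = [n * s // k for s in range(k + 1)]             # 0 = n_0 <= ... <= n_K = N
    with ThreadPoolExecutor(max_workers=k) as pool:
        parts = pool.map(
            lambda s: lu_backsub_exact(factor, identity_columns(n, bounds[s], bounds[s + 1])),
            range(k))
        columns = [col for part in parts for col in part]
    return [list(row) for row in zip(*columns)]             # columns -> rows

H = [[Fraction(1, i + j + 1) for j in range(6)] for i in range(6)]
Hinv = invert_by_column_slices(H, 3)
print([str(x) for x in Hinv[0]])   # ['36', '-630', '3360', '-7560', '7560', '-2772']
```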
# 23
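Possible speedup reasoning for «LU-inversion» of $H_N$ (2)
Durations of the two phases of inversion: LU-factorization (Gaussian elimination) and calculation of the inverse matrix's columns.

| N | invert_by_lu($H_N$), $T_{inv}$, sec | LUP = lu_factor($H_N$), $T_{LU}$, sec | lu_backsub(LUP, $E_N$), $T_{bs}$, sec |
|---|---|---|---|
| 100 | 27 | 9 | 17 |
| 150 | 122 | 43 | 76 |
| 200 | 368 | 128 | 229 |
| 250 | 826 | 306 | 513 |
| 300 | 1684 | 631 | 1043 |

The «columns phase» ($T_{bs}$) takes almost twice as long as LU-factorization. With an unlimited number of resources (and ignoring communication) the columns phase could be spread over all columns, so the speedup is bounded by $T_{inv}/T_{LU}$, which is about 2.7 to 3 in this table, i.e. up to ~300%.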
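This bound can be made explicit with an Amdahl-style model, assuming the columns phase splits evenly over k workers while LU-factorization stays sequential and communication is free (both simplifications). The timings are the ones in the table on this slide:

```python
# (N, T_inv, T_LU, T_bs) in seconds, copied from the table on this slide
TIMINGS = [(100, 27, 9, 17), (150, 122, 43, 76), (200, 368, 128, 229),
           (250, 826, 306, 513), (300, 1684, 631, 1043)]

def predicted_speedup(t_inv, t_lu, t_bs, k):
    """T_inv over the time with LU-factorization sequential and the columns
    phase split evenly over k workers (communication ignored)."""
    return t_inv / (t_lu + t_bs / k)

for n, t_inv, t_lu, t_bs in TIMINGS:
    by_k = [round(predicted_speedup(t_inv, t_lu, t_bs, k), 2) for k in (2, 4, 8)]
    print(f"N={n}: k=2,4,8 -> {by_k}, limit T_inv/T_LU = {t_inv / t_lu:.2f}")
```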
# 24
Phase 0: start nodes, «dispatching» of scripts (putFile(*.mac))
[Diagram: a client application (Java) on a laptop with a scenario control dialog; a master Maxima Factory and several Maxima Factories on a LAN; calls register(), getSlaves(), putFile(*.mac)]
Single-processor «invert_by_lu» scenario.
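A toy, in-process sketch of the registration and dispatch pattern suggested by the diagram. The `Master` class, its `register`/`get_slaves` methods (modelled on the register() and getSlaves() calls shown) and the host URLs are all hypothetical; in the real system the factories communicate over the LAN.

```python
class Master:
    """Hypothetical in-process stand-in for the master Maxima Factory."""

    def __init__(self):
        self._slaves = []

    def register(self, url):
        """Called by a slave factory at start-up; repeated calls are ignored."""
        if url not in self._slaves:
            self._slaves.append(url)

    def get_slaves(self):
        """Called by the client to learn which factories are available."""
        return list(self._slaves)

def dispatch(n_columns, slaves, chunk):
    """Round-robin plan: column ranges [lo, hi) of size `chunk`, per slave."""
    plan = {s: [] for s in slaves}
    for idx, lo in enumerate(range(0, n_columns, chunk)):
        plan[slaves[idx % len(slaves)]].append((lo, min(lo + chunk, n_columns)))
    return plan

master = Master()
for url in ("http://host0:8080", "http://host1:8080", "http://host2:8080"):  # hypothetical
    master.register(url)
plan = dispatch(300, master.get_slaves(), chunk=25)
for url, ranges in plan.items():
    print(url, ranges)
```

Round-robin ignores host speed, which is exactly the load-balancing issue raised on the next slide.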
# 25
Load balancing problem in «LU» scenario

| Host id | Processor | Cores involved | Memory | OS | Perform. (relative) |
|---|---|---|---|---|---|
| 0 | Intel Core Quad Q6600 2.4 GHz | 3 | 3.8 GB | Linux | 1.1 |
| 1 | Intel Core Quad Q6600 2.4 GHz | 3 | 2 GB | WinXP | 1.0 |
| 2 | Intel Centrino Dual-Core 1.466 GHz | 2 | 1 GB | WinXP | 0.6 |
| 3 | Pentium 4, 3 GHz | 2 (HyperThreading) | 1 GB | WinXP | 0.6 |
| 4 | Pentium 4, 2.8 GHz | 2 (HyperThreading) | 512 MB | WinXP | 0.5 |
[Figure, two panels: duration (sec) of the k-th column calculation vs. k, for inv($H_{300}$) (k = 0 to 300) and for inv($H_{200}$) (k = 0 to 200)]
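One common heuristic for this problem, offered as an illustration rather than the authors' method, is greedy list scheduling, largest estimated cost first. The sketch assumes each involved core is a worker slot running at its host's relative performance from the table, and uses a hypothetical uniform cost of one unit per column; measured per-column durations could be plugged in instead.

```python
# (host id, cores involved, relative performance) taken from the host table
HOSTS = [(0, 3, 1.1), (1, 3, 1.0), (2, 2, 0.6), (3, 2, 0.6), (4, 2, 0.5)]

def balance(costs, hosts):
    """Greedy list scheduling, largest estimated cost first: each column goes to the
    worker slot (one per involved core, at the host's relative speed) that would
    finish it earliest. Returns ({column: host id}, makespan)."""
    slot_host = [h for h, cores, _ in hosts for _ in range(cores)]
    slot_speed = [p for _, cores, p in hosts for _ in range(cores)]
    finish = [0.0] * len(slot_host)
    assignment = {}
    for col in sorted(range(len(costs)), key=lambda c: -costs[c]):
        s = min(range(len(finish)), key=lambda i: finish[i] + costs[col] / slot_speed[i])
        finish[s] += costs[col] / slot_speed[s]
        assignment[col] = slot_host[s]
    return assignment, max(finish)

# Hypothetical uniform cost estimate: one unit per column, 300 columns
costs = [1.0] * 300
assignment, makespan = balance(costs, HOSTS)
per_host = {h: sum(1 for v in assignment.values() if v == h) for h, _, _ in HOSTS}
equal_split = (300 / 12) / 0.5   # 25 columns per slot; the slowest slot has speed 0.5
print(per_host, round(makespan, 1), equal_split)
```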