
Error-Free Inversion of Ill-Conditioned Matrices in a Distributed Computing System of RESTful Services of Computer Algebra

Vladimir Voloshinov (1), Sergey Smirnov (2)

(1) Institute for System Analysis RAS, Moscow, www.isa.ru
(2) Moscow Institute of Physics and Technology, www.mipt.ru

Center of Grid-technologies and Distributed Computing, ISA RAS, http://dcs.isa.ru, http://www.mathcloud.org

Dubna, JINR, GRID'2010

Supported by RFBR, grant #08-07-00430-a and "SKIF-Grid" project


# 2

Subjects of our research related to this report

Multi-core desktop computers in a LAN (a specific “Desktop” Grid)

RESTful = REST + HTTP + JSON (Representational State Transfer, JavaScript Object Notation)

[Diagram labels: Specialized services; High-level computing applications ...; CAS Services; jLite; BNB-Grid; P2P; MathCloud]

# 3

Error-free symbolic computing in a CAS (Computer Algebra System)

An inherent feature of fixed-precision arithmetic is the accumulation of rounding errors in intermediate operations.

Symbolic computing in a CAS handles a rational number x as a pair {numerator, denominator}, {p, q}:

$$x = \frac{p}{q}$$

All arithmetic operations are performed without loss of accuracy. In theory, given unlimited memory (and elapsed time!), unlimited accuracy may be expected.

If necessary, only the final result is rounded to a floating-point representation.
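The difference between the two kinds of arithmetic can be seen in a few lines. The sketch below uses Python's standard `fractions.Fraction` as a stand-in for the {p, q} pairs of a CAS; it is an illustration only, not the Maxima code used in this work, and the function names are made up for the example.

```python
from fractions import Fraction


def sum_tenths_float(count):
    """Add 0.1 to itself `count` times in double precision."""
    total = 0.0
    for _ in range(count):
        total += 0.1  # each addition may round
    return total


def sum_tenths_exact(count):
    """Add 1/10 to itself `count` times as an exact {p, q} pair."""
    total = Fraction(0)
    for _ in range(count):
        total += Fraction(1, 10)  # no rounding at any step
    return total


float_total = sum_tenths_float(10)
exact_total = sum_tenths_exact(10)

print(float_total == 1.0)   # False: rounding errors have accumulated
print(exact_total == 1)     # True: the rational result is exactly 1
print(float(exact_total))   # only the final result is rounded to float
```

Only the last line converts to floating point, which mirrors the rule that rounding, if needed at all, is applied to the final result.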

# 4

Brief state of the art in error-free computing

• GMP (GNU Multiple Precision library), C++, MPI: M.I. Germanenko. Error-free Rational Calculations Software and Application for Solution of Linear Systems // Vestnik of Lobachevsky State University of Nizhni Novgorod, 2009, N 4, pp. 172-180 (in Russian), see references...

• SymGrid: symbolic computation systems as Grid services (Maple, GAP, Kant, MuPad, ...), Glasgow parallel Haskell, MPI, Globus Toolkit

We propose a simpler approach (possibly with lower performance):
✔ GNU Maxima as the CAS
✔ RESTful services as middleware, JSON as the data representation
✔ MathCloud Workflow Editor and execution environment

# 5

CAS (Computer Algebra System) Maxima (1)

Descends from Macsyma, which was developed at the Massachusetts Institute of Technology; Prof. William Schelter maintained it and released it under the GNU General Public License in 1998.

Based on GCL (GNU Common Lisp) and other Lisp dialects; open source, available for Windows and Linux.

Almost the same symbolic computing capabilities as other CASes: differentiation and integration, series, ODE solving, matrices and linear algebra, polynomials, sets, lists, tensors...

Single-threaded Lisp interpreter

http://maxima.sourceforge.net/ (GPL), http://maxima.sourceforge.net/ru/

# 6

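Well-known ill-conditioned Hilbert matrices

Matrices $H_N$ of the form $H_N = \{h_{m,n}\}_{m=1,\,n=1}^{N,\,N}$, where

$$h_{m,n} = \frac{1}{m+n-1} = \int_0^1 t^{m-1} \cdot t^{n-1}\, dt$$

(the Gram matrix of the monomial basis in $L_2[0,1]$).

The condition number of $H_N$ grows exponentially with respect to $N$:

$$\operatorname{cond}(H_N) = \|H_N\| \cdot \|H_N^{-1}\| \sim e^{3.5 \cdot N}, \qquad \|A_{N \times N}\| = \Big( \sum_{m=1,\,n=1}^{N,\,N} A_{m,n}^2 \Big)^{1/2}$$

Values of cond($H_N$) for some N (calculated exactly in Maxima):

| N | cond($H_N$) |
|---|---|
| 10 | $1.6 \cdot 10^{13}$ |
| 50 | $1.5 \cdot 10^{74}$ |
| 70 | $5.5 \cdot 10^{104}$ |
| 100 | $4.1 \cdot 10^{150}$ |
| 150 | $1.2 \cdot 10^{227}$ |

For double-precision arithmetic, «well-conditioned» matrices should have cond less than 1000.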
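A small case can be reproduced with exact rational arithmetic. The sketch below is Python with `fractions.Fraction`, not the Maxima session behind the table; the helper names (`hilbert`, `invert_exact`, `frobenius_norm`) are chosen for this example. It inverts $H_N$ by Gauss-Jordan elimination and evaluates the condition number with the Frobenius norm defined on this slide.

```python
import math
from fractions import Fraction


def hilbert(n):
    """Hilbert matrix with exact entries h[m][k] = 1/(m + k - 1), 1-based."""
    return [[Fraction(1, m + k - 1) for k in range(1, n + 1)]
            for m in range(1, n + 1)]


def invert_exact(matrix):
    """Gauss-Jordan inversion over the rationals (no rounding anywhere)."""
    n = len(matrix)
    # Augmented matrix [M | I].
    aug = [list(row) + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        # In exact arithmetic any non-zero pivot will do
        # (StopIteration here means the matrix is singular).
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def frobenius_norm(matrix):
    """Square root of the sum of squared entries; rounded only at the end."""
    return math.sqrt(sum(v * v for row in matrix for v in row))


def cond(n):
    h = hilbert(n)
    return frobenius_norm(h) * frobenius_norm(invert_exact(h))


print(invert_exact(hilbert(3)))   # all entries are integers
print(f"{cond(10):.2e}")          # compare with the N = 10 row of the table
```

The inverse itself is computed without error; the only floating-point step is the final square root in the norm.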

# 7

Hilbert matrices inversion complexity

Maxima Lisp is single-threaded.

The larger the condition number, the more digits there are in the exact rational representation of the values, and the more time it takes to process them all. Size of the Lisp-format (textual) representation: size($H_{300}^{-1}$) ~ 34 Mb, size($H_{500}^{-1}$) ~ 140 Mb
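The growth in digit count can be illustrated without running an inversion at all. The sketch below does not follow the method of these slides: it uses the known closed-form expression for the integer entries of the inverse Hilbert matrix in terms of binomial coefficients, and simply counts the decimal digits of the largest entry. Function names are made up for the example.

```python
from math import comb


def inverse_hilbert_entry(n, i, j):
    """Entry (i, j), 1-based, of the inverse of the n x n Hilbert matrix.

    Closed form via binomial coefficients; every entry is an integer.
    """
    sign = -1 if (i + j) % 2 else 1
    return (sign * (i + j - 1)
            * comb(n + i - 1, n - j)
            * comb(n + j - 1, n - i)
            * comb(i + j - 2, i - 1) ** 2)


def max_digits(n):
    """Number of decimal digits in the largest entry of the inverse."""
    return max(len(str(abs(inverse_hilbert_entry(n, i, j))))
               for i in range(1, n + 1)
               for j in range(1, n + 1))


for n in (10, 50, 100):
    print(n, max_digits(n))
```

The digit count of a single entry grows roughly in proportion to N, and the number of entries grows as N squared, which is consistent with the textual sizes quoted above growing much faster than N.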

# 8

Matrix inversion by Schur complement (1)

A well-known inversion approach based on «block decomposition» and the Schur complement (Cormen, Leiserson, Rivest, “Introduction to Algorithms”).

Block processing gives more flexibility for a subsequent parallel and recursive algorithm, because the calculation of $A^{-1}$, $S^{-1}$ and the matrix multiplications may be parallelized as well.

The speedup evaluation is on the next slide.
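For reference, the block formula behind this approach can be written out as follows. The block names A, U, V, B and S are the ones used in the cost list on the next slide; the placement of the blocks inside M is the standard one implied by $S = B - VA^{-1}U$ and is stated here as an assumption, since the original formula image is not available.

```latex
M = \begin{pmatrix} A & U \\ V & B \end{pmatrix},
\qquad
S = B - V A^{-1} U,
\qquad
M^{-1} = \begin{pmatrix}
  A^{-1} + A^{-1} U S^{-1} V A^{-1} & -A^{-1} U S^{-1} \\
  -S^{-1} V A^{-1} & S^{-1}
\end{pmatrix}
```

Both A and S must be invertible for the formula to apply; for Hilbert matrices, which are symmetric positive definite, this holds for any block split.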

# 9

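Matrix inversion by Schur complement (2)

Let M[N×N] be divided into four [N/2×N/2] blocks. The cost of calculating the blocks of the inverse matrix, with some steps run in parallel (symbol «||»), may be evaluated as follows.

$A^{-1}$ => ~$O((N/2)^3)$

$VA^{-1}$ || $A^{-1}U$ => ~$O((N/2)^3)$

$VA^{-1}U$ => ~$O((N/2)^3)$

$B - VA^{-1}U$ => ~$O((N/2)^2)$

$S^{-1}$ => ~$O((N/2)^3)$

$S^{-1}(VA^{-1})$ || $(A^{-1}U)S^{-1}$ => ~$O((N/2)^3)$

$A^{-1}US^{-1}VA^{-1}$ => ~$O((N/2)^3)$

$A^{-1} + A^{-1}US^{-1}VA^{-1}$ => ~$O((N/2)^2)$

Speedup evaluation (for fixed-precision arithmetic): six steps of cost $(N/2)^3$ and two steps of cost $(N/2)^2$, against $N^3$ for direct inversion, give

$$\frac{N^3}{6\,(N/2)^3 + 2\,(N/2)^2} = \frac{4 N^3}{3 N^3 + 2 N^2} \approx \frac{4}{3}$$

(130% for large N)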

[Workflow diagram nodes: $A^{-1}$; $A^{-1}U$ and $VA^{-1}$; $(VA^{-1})U = V(A^{-1}U) = VA^{-1}U$; $S = B - VA^{-1}U$; $S^{-1}$; $S^{-1}(VA^{-1})$ and $(A^{-1}U)S^{-1}$; $(A^{-1}US^{-1})(VA^{-1}) = (A^{-1}U)(S^{-1}VA^{-1}) = A^{-1}US^{-1}VA^{-1}$; $A^{-1} + A^{-1}US^{-1}VA^{-1}$]
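The sequence of steps listed on this slide can be sketched end to end in exact arithmetic. The code below is Python with `fractions.Fraction` and runs the steps sequentially; it is not the MathCloud workflow or the Maxima scripts of this work. The block layout (A top-left, U top-right, V bottom-left, B bottom-right) and all function names are assumptions made for the example.

```python
from fractions import Fraction


def matmul(x, y):
    cols = list(zip(*y))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in x]


def add(x, y, sign=1):
    """Entry-wise x + sign * y."""
    return [[a + sign * b for a, b in zip(rx, ry)] for rx, ry in zip(x, y)]


def invert(matrix):
    """Plain Gauss-Jordan inversion over the rationals."""
    n = len(matrix)
    aug = [list(row) + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def invert_by_schur(m, k):
    """Invert m via four blocks, where A is the leading k x k block."""
    a = [row[:k] for row in m[:k]]
    u = [row[k:] for row in m[:k]]
    v = [row[:k] for row in m[k:]]
    b = [row[k:] for row in m[k:]]

    ia = invert(a)                           # A^-1
    ia_u = matmul(ia, u)                     # A^-1 U   (independent of V A^-1)
    v_ia = matmul(v, ia)                     # V A^-1
    s = add(b, matmul(v_ia, u), sign=-1)     # S = B - V A^-1 U
    i_s = invert(s)                          # S^-1
    is_via = matmul(i_s, v_ia)               # S^-1 (V A^-1)
    iau_is = matmul(ia_u, i_s)               # (A^-1 U) S^-1
    top_left = add(ia, matmul(iau_is, v_ia)) # A^-1 + A^-1 U S^-1 V A^-1

    top = [r1 + [-x for x in r2] for r1, r2 in zip(top_left, iau_is)]
    bottom = [[-x for x in r1] + r2 for r1, r2 in zip(is_via, i_s)]
    return top + bottom


def hilbert(n):
    return [[Fraction(1, m + k - 1) for k in range(1, n + 1)]
            for m in range(1, n + 1)]


h6 = hilbert(6)
print(invert_by_schur(h6, 3) == invert(h6))
```

The pairs marked «||» in the cost list correspond to `ia_u`/`v_ia` and `is_via`/`iau_is`: neither member of a pair depends on the other, so each pair could be dispatched to two workers.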

# 10

MathCloud Workflow Editor and Execution Usability

http://www.mathcloud.org, REST + HTTP (RESTful) + JSON (JavaScript Object Notation)

Usability test case (for a "novice"):
• Workflow programming in a REST environment (functional style)
• Capability and reliability of the Maxima RESTful service
• Capabilities of the MathCloud "executor" for rather complex structured workflows

As a result (less science, more exercises ;o): debugging, feature requests, and gaining experience

# 11

WF Sample

A typical part of the workflow with the Maxima service. Each part corresponds to some “elementary” (block(...)) Maxima-script function.

    /* ----- Schur4Wf.mac ----------- */

    /* IAxU = IA . U, where IA is (by its name) the inverse of block A */
    computeIAxU() := block(
        IAxU : IA.U,
        save(outputFilePath, IAxU),
        "OK!");

    /* VxIA = V . IA */
    computeVxIA() := block(
        VxIA : V.IA,
        save(outputFilePath, VxIA),
        "OK!");

    /* S = B - V.IA.U, built from whichever product is already loaded */
    computeIS() := block(
        VxIAxU : if (member('VxIA, values)) then VxIA.U else V.IAxU,
        S : B-VxIAxU,
        IS : Inv(S),
        save(outputFilePath, IS),
        "OK!");

# 12

Matrix multiplication by ribbon decomposition (WF)
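The idea of ribbon decomposition can be sketched as follows, assuming that a "ribbon" means a horizontal strip of rows of the left factor; the workflow in the figure may slice the matrices differently. The code is Python with exact rationals, and the function names are made up for the example.

```python
from fractions import Fraction


def multiply(x, y):
    """Ordinary matrix product of two lists of rows."""
    cols = list(zip(*y))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in x]


def split_into_ribbons(matrix, ribbon_count):
    """Cut a matrix into `ribbon_count` horizontal strips of near-equal height."""
    n = len(matrix)
    bounds = [n * k // ribbon_count for k in range(ribbon_count + 1)]
    return [matrix[bounds[k]:bounds[k + 1]] for k in range(ribbon_count)]


def multiply_by_ribbons(x, y, ribbon_count):
    """Multiply each ribbon of x by y separately, then stack the results."""
    product = []
    for ribbon in split_into_ribbons(x, ribbon_count):
        # Each ribbon product is independent of the others,
        # so every iteration could be sent to a different worker.
        product.extend(multiply(ribbon, y))
    return product


x = [[Fraction(1, r + c + 1) for c in range(5)] for r in range(5)]
y = [[Fraction(r * c + 1, 7) for c in range(5)] for r in range(5)]
print(multiply_by_ribbons(x, y, 3) == multiply(x, y))
```

Because the ribbons share only the read-only right factor, the number of ribbons can be matched to the number of available services.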

# 13

Compact view (without all inputs and outputs)

# 14

Matrix multiplication by ribbon decomposition (running)

Block coloring: green - passed, grey - skipped, light blue - running, dark blue - to do

# 15

Full view (with all inputs and outputs)

# 16

Scenario running (1)


# 17

Scenario running (2)


# 18

"Production" version (the end user can set any input matrix)

initBlocksOfMatrix(H)

Previously shown WF as composition

Tic-toc JavaScript blocks to measure elapsed time

# 19

Speedup of "four-block" inversion (210% - 230%)

[Figure: elapsed time, sec (axis from 0 to 2500), for experiments 1 to 10, each labeled by {N, dim(A)}; series: N, NinvSchur]

Preliminary results of the Schur complement approach (standalone «simulation»)

Speedup depends appreciably on block sizes.

# 20

Performance results and conclusions

Hilbert matrices, $H_N$, inversion

• Good enough for really hard cases (where computations exceed MathCloud and data-exchange overheads)

• Usability of the MathCloud WF Editor should be improved to handle complex structured scenarios

• The MathCloud execution system is rather stable under 24x7 intensive symbolic computing

# 21

Thank you!

# 22

Possible speedup reasoning for «LU-inversion» of HN (1)

Let M be an [N×N] matrix. LU-factorization {L, U, P}: L is lower-triangular, U is upper-triangular, P is a permutation matrix, and $L \cdot U = P \cdot M$. Let $E_N$ be the identity matrix. To obtain $M^{-1}$, solve (by forward and backward substitution): $L \cdot U \cdot X = P \cdot E_N \Rightarrow X = M^{-1}$.

Maxima has the functions lu_factor(M) (Gaussian elimination) and lu_backsub(LU, B) (LU is the value returned by lu_factor(M)) to solve $L \cdot U \cdot X = P \cdot B$ for any rectangular matrix B with N rows.

    invert_by_lu(M) := lu_backsub(lu_factor(M), EN)

If $E_N$ is sliced vertically into K submatrices,

$$E_N = \big[\, E_N^{1:n_1},\ E_N^{n_1+1:n_2},\ E_N^{n_2+1:n_3},\ \ldots,\ E_N^{n_{K-1}+1:N} \,\big], \qquad 0 = n_0 < n_1 < n_2 < n_3 < \ldots < n_{K-1} < n_K = N,$$

then parallel calls of lu_backsub(LU, $E_N^{n_{k-1}+1:n_k}$) give K sets of inverse columns $M^{-1}[n_{k-1}+1 : n_k]$.

K = N means parallel calculation of the individual columns of $M^{-1}$.
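The two-phase scheme can be sketched as follows: factor once, then solve for slices of identity columns independently. This is Python with exact rationals, not Maxima; the functions `lu_factor` and `solve_column` below are written for this example and only mirror the roles of Maxima's lu_factor and lu_backsub. A thread pool is used purely to show that the slice computations are independent; Python threads will not give a real speedup for this CPU-bound work.

```python
from concurrent.futures import ThreadPoolExecutor
from fractions import Fraction


def lu_factor(matrix):
    """Return (lu, perm): L (unit diagonal) and U packed together, and row order."""
    n = len(matrix)
    lu = [[Fraction(v) for v in row] for row in matrix]
    perm = list(range(n))
    for col in range(n):
        # Exact arithmetic: any non-zero pivot is acceptable.
        pivot = next(r for r in range(col, n) if lu[r][col] != 0)
        lu[col], lu[pivot] = lu[pivot], lu[col]
        perm[col], perm[pivot] = perm[pivot], perm[col]
        for r in range(col + 1, n):
            lu[r][col] /= lu[col][col]
            factor = lu[r][col]
            for c in range(col + 1, n):
                lu[r][c] -= factor * lu[col][c]
    return lu, perm


def solve_column(lu, perm, j):
    """Solve L U x = P e_j, i.e. compute column j of the inverse."""
    n = len(lu)
    y = [Fraction(int(perm[i] == j)) for i in range(n)]   # P e_j
    for i in range(n):                                    # forward substitution
        y[i] -= sum(lu[i][k] * y[k] for k in range(i))
    x = [Fraction(0)] * n
    for i in reversed(range(n)):                          # backward substitution
        x[i] = (y[i] - sum(lu[i][k] * x[k] for k in range(i + 1, n))) / lu[i][i]
    return x


def invert_by_lu(matrix, slice_count):
    """Factor once, then compute K slices of inverse columns independently."""
    n = len(matrix)
    lu, perm = lu_factor(matrix)
    bounds = [n * k // slice_count for k in range(slice_count + 1)]

    def solve_slice(k):
        return [solve_column(lu, perm, j) for j in range(bounds[k], bounds[k + 1])]

    with ThreadPoolExecutor() as pool:
        chunks = list(pool.map(solve_slice, range(slice_count)))
    columns = [col for chunk in chunks for col in chunk]
    return [list(row) for row in zip(*columns)]           # columns -> rows


def hilbert(n):
    return [[Fraction(1, m + k - 1) for k in range(1, n + 1)]
            for m in range(1, n + 1)]


print(invert_by_lu(hilbert(3), 2))
```

The factorization is shared read-only by all slices, so in a distributed setting it would have to be computed once and shipped to every worker before the column phase starts.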

# 23

Possible speedup reasoning for «LU-inversion» of HN (2)

Durations of the two phases of inversion: LU-factorization (Gaussian elimination) and calculation of the inverse matrix's columns.

| N | invert_by_lu(HN), Tinv, sec | LUP=lu_factor(HN), TLU, sec | lu_backsub(LUP, EN), Tbs, sec |
|---|---|---|---|
| 100 | 27 | 9 | 17 |
| 150 | 122 | 43 | 76 |
| 200 | 368 | 128 | 229 |
| 250 | 826 | 306 | 513 |
| 300 | 1684 | 631 | 1043 |

The «columns phase» (Tbs) takes almost twice as long as the LU-factorization. With an unlimited number of resources, a speedup of ~300% may be expected.
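The reasoning behind the ~300% figure can be checked with a few lines of arithmetic. The sketch below, in Python, takes the measured TLU and Tbs from the table and assumes (as a simplification that is not claimed in the slides) that the LU phase stays sequential while the columns phase divides evenly among K workers with no communication overhead.

```python
# Measured durations from the table, in seconds: N -> (T_LU, T_bs).
DURATIONS = {
    100: (9, 17),
    150: (43, 76),
    200: (128, 229),
    250: (306, 513),
    300: (631, 1043),
}


def estimated_speedup(n, workers):
    """Speedup if only the columns phase is spread evenly over `workers`."""
    t_lu, t_bs = DURATIONS[n]
    sequential = t_lu + t_bs
    parallel = t_lu + t_bs / workers   # LU phase is not parallelized
    return sequential / parallel


for n in sorted(DURATIONS):
    # K = N workers: one inverse column per worker.
    print(n, round(estimated_speedup(n, n), 2))
```

Under these assumptions the speedup is bounded by (TLU + Tbs) / TLU however many workers are added, which is where a value close to 3 comes from when Tbs is almost twice TLU.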

# 24

Single processor «invert_by_lu» scenario.

0 phase: start nodes, scripts «dispatching»

[Diagram: a client application (Java) with a scenario control dialog runs on a laptop and communicates over the LAN with a Maxima Master Factory and several Maxima Factory nodes. Labeled calls: putFile(*.mac), register(), getSlaves().]

# 25

Load balancing problem in «LU» scenario

| Host id | Processor | Cores involved | Memory | OS | Perform. |
|---|---|---|---|---|---|
| 0 | Intel Core Quad Q6600 2.4 GHz | 3 | 3.8 Gb | Linux | 1.1 |
| 1 | Intel Core Quad Q6600 2.4 GHz | 3 | 2 Gb | WinXP | 1.0 |
| 2 | Intel Centrino Dual-Core 1.466 GHz | 2 | 1 Gb | WinXP | 0.6 |
| 3 | Pentium 4, 3 GHz | 2 (HyperThreading) | 1 Gb | WinXP | 0.6 |
| 4 | Pentium 4 2.8 GHz | 2 (HyperThreading) | 512 Mb | WinXP | 0.5 |

[Figure: Duration of k-th column calculation, inv(H300); x axis: k, from 0 to 300; y axis: sec, from 0 to 7]

[Figure: Duration of k-th column calculation, inv(H200); x axis: k, from 0 to 200; y axis: sec, from 0 to 2]
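One simple way to approach the balancing problem is to give each host a share of columns proportional to its capacity. The sketch below is a hypothetical illustration in Python, not the scheduler used in this work. It assumes that the "Perform." value is a per-core relative speed, so that a host's weight is cores × performance, and it treats all columns as equally expensive, which ignores the per-column durations plotted on this slide.

```python
def split_columns(column_count, host_weights):
    """Split `column_count` columns in proportion to the host weights.

    Uses the largest-remainder rule so that the counts sum exactly
    to `column_count`.
    """
    total = sum(host_weights)
    shares = [column_count * w / total for w in host_weights]
    counts = [int(s) for s in shares]
    leftovers = column_count - sum(counts)
    # Hand the remaining columns to the hosts with the largest fractional part.
    by_remainder = sorted(range(len(shares)),
                          key=lambda i: shares[i] - counts[i],
                          reverse=True)
    for i in by_remainder[:leftovers]:
        counts[i] += 1
    return counts


# (cores involved, relative performance) for hosts 0..4, from the table.
hosts = [(3, 1.1), (3, 1.0), (2, 0.6), (2, 0.6), (2, 0.5)]
weights = [cores * perf for cores, perf in hosts]

print(split_columns(300, weights))
```

A static split like this is only a starting point: if the cost of a column depends on k, a dynamic scheme in which idle workers pull the next column from a shared queue would balance the load better.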