8/12/2019 Models and Algorithms for Parallel Computing
FACULTATEA DE AUTOMATICA, CALCULATOARE SI ELECTRONICA, CRAIOVA
Models and Algorithms for Parallel Computing
Project: Matrix Multiplication

Student: Spatarelu Andrei
Coordinator: Cosmin Poteras
1/22/2014
Problem statement
It is required to parallelize the matrix multiplication algorithm using the following methods:
- multithreading
- message passing interface (MPI)
Generalities: Matrix multiplication algorithm and applications
A matrix is a rectangular array of numbers or other mathematical objects, for which operations such as addition and multiplication are defined. The numbers, symbols or expressions in the matrix are called its entries or its elements. The horizontal and vertical lines of entries in a matrix are called rows and columns.
The size of a matrix is defined by the number of rows and columns that it contains. A matrix with m rows and n columns is called an m × n matrix or m-by-n matrix, while m and n are called its dimensions.
Multiplication of two matrices is defined only if the number of columns of the left matrix is the same as the number of rows of the right matrix. If A is an m-by-n matrix and B is an n-by-p matrix, then their matrix product AB is the m-by-p matrix whose entries are given by the dot product of the corresponding row of matrix A and the corresponding column of matrix B.
The matrix multiplication general formula:

    (AB)_ij = a_i1 * b_1j + a_i2 * b_2j + ... + a_in * b_nj

that is, entry (i, j) of the product is the dot product of row i of A and column j of B.

General algorithm (also known as the "ijk" algorithm):

If we consider a matrix A with m rows and n columns and a matrix B with n rows and p columns, then their product algorithm will be:
    for i = 1 to M
        for j = 1 to N
            c(i, j) = 0
            for k = 1 to P
                c(i, j) = c(i, j) + a(i, k) * b(k, j)
            end
        end
    end
Observations:
- For square matrices, M = N = P.
- Notice that A is accessed row by row, but B is accessed column by column.
- The column index for C varies faster than the row index, but both are constant with respect to the inner loop, so this is much less significant.
- We have an efficient memory-access pattern for A but not for B.
Applications

There are numerous applications of matrices, both in mathematics and in other sciences. Some of them merely take advantage of the compact representation of a set of numbers in a matrix. For example, in game theory and economics, the payoff matrix encodes the payoff for two players, depending on which out of a given (finite) set of alternatives the players choose.
Graph theory

The adjacency matrix of a finite graph is a basic notion of graph theory. It records which vertices of the graph are connected by an edge. Matrices containing just two different values (0 and 1, meaning for example "no" and "yes", respectively) are called logical matrices. The distance (or cost) matrix contains information about the distances of the edges.
Symmetries and transformations in physics

Linear transformations and the associated symmetries play a key role in modern physics. For example, elementary particles in quantum field theory are classified as representations of the Lorentz group of special relativity and, more specifically, by their behavior under the spin group. Such matrix computations from physics are often encountered in game graphics.
Electronics

Traditional mesh analysis in electronics leads to a system of linear equations that can be described with a matrix. The behaviour of many electronic components can be described using matrices. To calculate a circuit, the problem is reduced to multiplying two matrices.
Other domains:
- Analysis and geometry
- Probability theory and statistics
Parallel implementation using multithreading

The implementation was done using C++ threads on the Windows OS. The parallel version of the matrix multiplication consists of splitting the rows of the first matrix among the defined number of threads; if the remainder of the division is bigger than 0, the leftover rows are split again. The matrices, the start index for every thread and the dimensions are stored in a structure called info:
    typedef struct info
    {
        double **a;
        double **b;
        int d1, d2, d3;
        int index;
    } info;
For each thread we create an info structure with the data needed by the respective thread to compute its share of the matrix multiplication:

    for (int i = 0; i < NR_THREADS; ++i)
    {
        info *params = new info();
        params->a = a;
        params->b = b;
        params->d1 = m;
        params->d2 = n;
        params->d3 = p;
        params->index = i;
        threads[i] = CreateThread(NULL, 0, MultiplyFunction, params, 0, NULL);
    }
MultiplyFunction effectively computes the matrix multiplication. Each thread traverses only a part of the rows of the first matrix and multiplies each of them with the second matrix. The results are accumulated in the third matrix (C). The specific rows of the first matrix are traversed individually by each thread in the following way: each thread starts with the row number equal to its rank and repeatedly adds the number of threads to reach its next row. The results are stored in the partialResults matrix, whose elements are initialized with 0 before the computed products are added to them.
The main thread will wait until each thread has finished its work and will then display the results.
    DWORD WINAPI MultiplyFunction(LPVOID param)
    {
        info *data = (info *)param;
        int index = data->index;
        for (int i = index; i < data->d1; i += NR_THREADS)
        {
            for (int j = 0; j < data->d3; ++j)
            {
                partialResults[i][j] = 0;
                for (int k = 0; k < data->d2; ++k)
                {
                    partialResults[i][j] += data->a[i][k] * data->b[k][j];
                }
            }
        }
        return 0;
    }

Parallel implementation using MPI
MPI is a language-independent communications protocol used to program parallel computers. Its interface is meant to provide essential virtual topology, synchronization, and communication functionality between a set of processes (that have been mapped to nodes/servers/computer instances) in a language-independent way.

MPI programs always work with processes, but programmers commonly refer to the processes as processors. Typically, for maximum performance, each CPU is assigned just a single process. This assignment happens at runtime through the agent that starts the MPI program.

In the MPI implementation the matrix computation is done by processes 1..n, while process 0 splits the matrix into submatrices (each containing the rows of the first matrix assigned to one worker).

The splitting code below is executed n-1 times, once per worker process. After a submatrix is created, it is assigned to the MultiplyPart entity, which also contains the two dimensions. The MultiplyPart entity is serialized and sent to the corresponding process through the MPI Send function, which takes as parameters the instance of the class and the rank of the process to which the information is sent.
    for (int np = 1; np < comm.Size; np++)
    {
        int partialNo = 0;
        for (int j = np - 1; j < m; j += comm.Size - 1)
        {
            partialNo++;
        }

        double[][] pMatrix = new double[partialNo][];
        for (int s = 0; s < partialNo; s++)
        {
            pMatrix[s] = new double[n];
        }

        int index = 0;
        for (int i = np - 1; i < m; i += comm.Size - 1)
        {
            for (int k = 0; k < n; k++)
            {
                pMatrix[index][k] = a[i][k];
            }
            index++;
        }

        MultiplyPart P = new MultiplyPart();
        P.a = pMatrix;
        P.d1 = partialNo;
        if (P.a.Length > 0)
            comm.Send<MultiplyPart>(P, np, 0);
    }
The second matrix is assigned to another entity, matrixStruct, which is serialized and broadcast to all the processes through the MPI broadcast function:
    Multiply matrixStruct = new Multiply();
    matrixStruct.b = b;
    matrixStruct.d2 = n;
    matrixStruct.d3 = p;
    comm.Broadcast(ref matrixStruct, 0);
Then each process will receive its part and will compute the multiplication with the second matrix, the result being a submatrix of the final result. Each partial result is sent back to process 0, where each row is placed at the corresponding position in the final result matrix.

Receive each submatrix:
    Multiply M = new Multiply();
    MultiplyPart P = new MultiplyPart();
    comm.Broadcast<Multiply>(ref M, 0);
    comm.Receive<MultiplyPart>(0, 0, out P);

Compute the matrix multiplication:

    for (int i = 0; i < P.d1; i++)
    {
        for (int j = 0; j < M.d3; j++)
        {
            partialResults[i][j] = 0;
            for (int k = 0; k < M.d2; k++)
            {
                partialResults[i][j] += P.a[i][k] * M.b[k][j];
            }
        }
    }

Send the partial results to process 0:

    finalResults fR = new finalResults();
    fR.c = partialResults;
    fR.d1 = P.d1;
    fR.d3 = M.d3;
    comm.Send(fR, 0, comm.Rank);
Create the matrix with the final results:

    for (int s = 1; s < comm.Size; s++)
    {
        finalResults finalR = comm.Receive<finalResults>(Communicator.anySource, s);
        int nr = 0;
        for (int i = s - 1; i < m; i += comm.Size - 1)
        {
            for (int k = 0; k < p; k++)
            {
                pResults[i][k] += finalR.c[nr][k];
            }
            nr++;
        }
    }
Tests

The three implementations were compared on three datasets, measuring execution time for the procedural implementation, for the multithreaded implementation with 2, 4, 8 and 10 threads, and for the MPI implementation with 2, 4, 8 and 10 processes. The table columns were: Matrix A dimension, Matrix B dimension, Procedural impl., No. of threads, Multithreading impl., No. of proc., MPI impl.

[Timing tables for Dataset 1, Dataset 2 and Dataset 3: the measured values are not recoverable from the source.]
The multithreaded solution was implemented in C++, while the procedural solution and the MPI one were implemented in C# (which may be one reason for the difference in timings). As you can see, the timings are much better in the multithreaded implementation than in the procedural one and, in most of the cases, better than in the MPI implementation.

It is clear that as the number of threads grows, the efficiency of the algorithm decreases.

The values of the matrices are randomly generated between 0 and 10000.
System Specifications

The tests were made on a machine with an Intel Core i(?) processor and (?) GB of RAM.