BSP on the Origin2000
![Page 1: BSP on the Origin2000](https://reader036.fdocuments.in/reader036/viewer/2022062304/56813fe8550346895daad910/html5/thumbnails/1.jpg)
Lab for the course: Seminar in Scientific Computing with BSP
Dr. Anne Weill, [email protected], phone: 4997
![Page 2: BSP on the Origin2000](https://reader036.fdocuments.in/reader036/viewer/2022062304/56813fe8550346895daad910/html5/thumbnails/2.jpg)
Origin2000 (SGI)
32 processors
![Page 3: BSP on the Origin2000](https://reader036.fdocuments.in/reader036/viewer/2022062304/56813fe8550346895daad910/html5/thumbnails/3.jpg)
Origin2000/3000 architecture features
Important hardware and software components:
* node board: processors + memory
* node interconnect topology and configurations
* scalability of the architecture
* directory-based cache coherency
* single system image components
![Page 4: BSP on the Origin2000](https://reader036.fdocuments.in/reader036/viewer/2022062304/56813fe8550346895daad910/html5/thumbnails/4.jpg)
Origin2000 node board
![Page 5: BSP on the Origin2000](https://reader036.fdocuments.in/reader036/viewer/2022062304/56813fe8550346895daad910/html5/thumbnails/5.jpg)
Origin2000 – two nodes
![Page 6: BSP on the Origin2000](https://reader036.fdocuments.in/reader036/viewer/2022062304/56813fe8550346895daad910/html5/thumbnails/6.jpg)
Origin2000 interconnect
![Page 7: BSP on the Origin2000](https://reader036.fdocuments.in/reader036/viewer/2022062304/56813fe8550346895daad910/html5/thumbnails/7.jpg)
Origin2000 interconnect
32 processors
64 processors
![Page 8: BSP on the Origin2000](https://reader036.fdocuments.in/reader036/viewer/2022062304/56813fe8550346895daad910/html5/thumbnails/8.jpg)
Origin router interconnect
- The router chip has 6 CrayLink interfaces: 2 for connections to nodes (HUBs) and 4 for connections to other routers in the network
  * 4-dimensional interconnect
- Router links are point-to-point connections, 17+7 wires @ 400 MHz (that is, wire speed 800 MB/s)
- Wormhole routing with a static routing table loaded at boot
- Router delay is 50 ns in one direction
- The interconnect topology is determined by the size of the computer (number of nodes):
  * direct (back-to-back) connection for 2 nodes (4 cpu)
  * strongly connected cube up to 32 cpu
  * hypercube for up to 64 cpu
  * hypercube of hypercubes for up to 256 cpu
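The figures above fix the router count: 2 CPUs per node and 2 nodes per router give 4 CPUs per router, so 64 CPUs need 16 routers, which is a 4-dimensional hypercube. The sketch below works through that arithmetic and counts the router-to-router links between two routers. It assumes an idealised numbering in which directly linked routers differ in exactly one bit of their id; the function names are hypothetical, and the real routing tables loaded at boot may be laid out differently.

```c
/* Routers needed for ncpu processors, using the slide's figures:
   2 CPUs per node and 2 nodes per router, i.e. 4 CPUs per router. */
static unsigned routers_needed(unsigned ncpu) {
    return (ncpu + 3u) / 4u;   /* round up to whole routers */
}

/* Minimum number of router-to-router links between routers a and b
   in an idealised binary hypercube (assumption: linked routers differ
   in exactly one id bit). This is the number of bits set in a XOR b. */
static unsigned hypercube_hops(unsigned a, unsigned b) {
    unsigned diff = a ^ b;
    unsigned hops = 0;
    while (diff != 0u) {
        hops += diff & 1u;
        diff >>= 1;
    }
    return hops;
}
```

In this model the two most distant routers of the 16-router configuration, ids 0 and 15, are 4 links apart.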
![Page 9: BSP on the Origin2000](https://reader036.fdocuments.in/reader036/viewer/2022062304/56813fe8550346895daad910/html5/thumbnails/9.jpg)
Origin address space
- Physically the memory is distributed and not contiguous
- Node id is assigned at boot time
- Logically memory is a shared single contiguous address space; the virtual address space is 44 bits (16 TB)
- A program (compiler) uses the virtual address space
- The CPU translates from virtual to physical address space
- Physical address layout: bits 39-32 hold the node id (8 bits), bits 31-0 hold the node offset (32 bits, 4 GB)
- TLB: Translation Look-aside Buffer
[Figure: virtual pages mapped through the TLB to physical memory on node ids 0, 1, 2, 3, ...; each node slot either has memory present or is empty]
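The physical address layout on this slide (an 8-bit node id in bits 39-32 and a 32-bit node offset in bits 31-0) can be illustrated with a few shifts and masks. This is only a sketch of the bit layout; the type and function names are hypothetical, and it is not how the hardware or the operating system exposes addresses.

```c
#include <stdint.h>

/* Field widths from the slide: bits 39..32 = node id, bits 31..0 = offset. */
#define NODE_OFFSET_BITS 32u
#define NODE_ID_MASK     0xFFu

typedef struct {
    unsigned node_id;      /* node board that holds the memory */
    uint32_t node_offset;  /* byte offset inside that node's 4 GB range */
} origin_paddr;            /* hypothetical name, for illustration only */

/* Split a 40-bit physical address into node id and node offset. */
static origin_paddr split_paddr(uint64_t paddr) {
    origin_paddr a;
    a.node_id = (unsigned)((paddr >> NODE_OFFSET_BITS) & NODE_ID_MASK);
    a.node_offset = (uint32_t)(paddr & 0xFFFFFFFFu);
    return a;
}

/* Rebuild the physical address from its two fields. */
static uint64_t join_paddr(unsigned node_id, uint32_t node_offset) {
    return ((uint64_t)(node_id & NODE_ID_MASK) << NODE_OFFSET_BITS) | node_offset;
}
```

For example, the address 0x0300001000 splits into node id 3 and offset 0x1000, and joining those two fields gives the same address back.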
![Page 10: BSP on the Origin2000](https://reader036.fdocuments.in/reader036/viewer/2022062304/56813fe8550346895daad910/html5/thumbnails/10.jpg)
Login to carmel
1. Open an ssh window to: carmel.technion.ac.il
2. Username: course01-course20
Password: bsp2006
Contact: Dr. Anne Weill, [email protected], phone: 4997
![Page 11: BSP on the Origin2000](https://reader036.fdocuments.in/reader036/viewer/2022062304/56813fe8550346895daad910/html5/thumbnails/11.jpg)
Compiling and running codes
1. Setting path
% set path=($path /u/tcc/anne/BSP/bin)
2. Compiling
% bspcc prog1.c -o prog1
% bspcc -flibrary-level 1 prog1.c -o prog1 (for non-dedicated machine)
3. Running
% bsprun -npes 4 prog1
![Page 12: BSP on the Origin2000](https://reader036.fdocuments.in/reader036/viewer/2022062304/56813fe8550346895daad910/html5/thumbnails/12.jpg)
Running on carmel
1. Interactive mode:
% ./prog.exe <parameters>
2. NQE queues:
% qsub -q qcourse script.bat
![Page 13: BSP on the Origin2000](https://reader036.fdocuments.in/reader036/viewer/2022062304/56813fe8550346895daad910/html5/thumbnails/13.jpg)
BSP functions
bsp_begin(maxpr) Start of program with at most maxpr processes
bsp_end() End of program
bsp_nprocs() Number of processes currently running
bsp_pid() Returns process id
bsp_time() Returns elapsed wallclock time
![Page 14: BSP on the Origin2000](https://reader036.fdocuments.in/reader036/viewer/2022062304/56813fe8550346895daad910/html5/thumbnails/14.jpg)
Sample program
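A minimal sketch of a first program built from the functions listed on the previous slide. It assumes the BSPlib header is named bsp.h and is found automatically by bspcc; the greeting text is illustrative and is not necessarily the listing shown in the lecture.

```c
#include <stdio.h>
#include <bsp.h>   /* BSPlib header (assumed name), supplied by bspcc */

int main(void) {
    /* Start the parallel part on as many processes as are available;
       bsprun -npes N decides how many that is. */
    bsp_begin(bsp_nprocs());

    /* Every process runs this same line with its own pid. */
    printf("Hello from process %d of %d\n", bsp_pid(), bsp_nprocs());

    bsp_end();   /* end of the parallel part */
    return 0;
}
```

Compiled with bspcc and started with bsprun -npes 4, each of the four processes prints one line. The order of the lines is not fixed and can change from run to run.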
![Page 15: BSP on the Origin2000](https://reader036.fdocuments.in/reader036/viewer/2022062304/56813fe8550346895daad910/html5/thumbnails/15.jpg)
Output of hello program
![Page 16: BSP on the Origin2000](https://reader036.fdocuments.in/reader036/viewer/2022062304/56813fe8550346895daad910/html5/thumbnails/16.jpg)
How it works
[Figure: bsprun starts one copy of Prog.exe on each of the processes P0, P1, P2 and P3]
![Page 17: BSP on the Origin2000](https://reader036.fdocuments.in/reader036/viewer/2022062304/56813fe8550346895daad910/html5/thumbnails/17.jpg)
SPMD – single program multiple data
• Each processor views only its local memory.
• Contents of variable X are different in different processors.
• Transfer of data can occur in principle through one-sided or two-sided communication.
![Page 18: BSP on the Origin2000](https://reader036.fdocuments.in/reader036/viewer/2022062304/56813fe8550346895daad910/html5/thumbnails/18.jpg)
DRMA – direct remote memory access
• All processors must register the memory area that remote “read” and “write” operations will access
• Calls to bsp_put
• Calls to bsp_get
• Call to bsp_sync: all processors synchronize, and all communication is complete after the call
![Page 19: BSP on the Origin2000](https://reader036.fdocuments.in/reader036/viewer/2022062304/56813fe8550346895daad910/html5/thumbnails/19.jpg)
BSP functions for communication
bsp_push_reg(var,nbytes) Registration of variable
bsp_put(pid,source,dest,offset,nbytes)
pid is the destination processor
bsp_get(pid,source,offset,dest,nbytes)
pid is the source processor
bsp_pop_reg(var) Deregistration of variable
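The calls above can be combined as in the following sketch, in which every process puts one integer into a registered variable on its right-hand neighbour. It assumes the BSPlib header is named bsp.h; the variable names and the value 100 + pid are illustrative only.

```c
#include <stdio.h>
#include <bsp.h>   /* BSPlib header (assumed name), supplied by bspcc */

int main(void) {
    int p, me, right, mine;
    int from_left = -1;   /* destination of the remote write */

    bsp_begin(bsp_nprocs());
    p = bsp_nprocs();
    me = bsp_pid();
    right = (me + 1) % p;   /* neighbour that receives our value */
    mine = 100 + me;

    /* Every process registers the destination variable; the
       registration becomes usable after the next bsp_sync. */
    bsp_push_reg(&from_left, sizeof from_left);
    bsp_sync();

    /* Request the write of 'mine' into 'from_left' on process 'right'. */
    bsp_put(right, &mine, &from_left, 0, sizeof mine);

    /* Under the BSP model the data has not arrived yet at this point. */
    printf("P%d before sync: from_left = %d\n", me, from_left);

    bsp_sync();   /* all communication is completed here */

    /* Now from_left holds 100 + the pid of the left-hand neighbour. */
    printf("P%d after sync: from_left = %d\n", me, from_left);

    bsp_pop_reg(&from_left);
    bsp_end();
    return 0;
}
```

With bsp_get the roles are reversed: the registered variable is the one read on the remote process, and the result is stored in a local variable once the next bsp_sync has completed.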
![Page 21: BSP on the Origin2000](https://reader036.fdocuments.in/reader036/viewer/2022062304/56813fe8550346895daad910/html5/thumbnails/21.jpg)
Script file for batch
![Page 22: BSP on the Origin2000](https://reader036.fdocuments.in/reader036/viewer/2022062304/56813fe8550346895daad910/html5/thumbnails/22.jpg)
Output of command: “qstat -a”
![Page 23: BSP on the Origin2000](https://reader036.fdocuments.in/reader036/viewer/2022062304/56813fe8550346895daad910/html5/thumbnails/23.jpg)
Another example
• What does the following program do?
• What will the program print?
![Page 24: BSP on the Origin2000](https://reader036.fdocuments.in/reader036/viewer/2022062304/56813fe8550346895daad910/html5/thumbnails/24.jpg)
![Page 25: BSP on the Origin2000](https://reader036.fdocuments.in/reader036/viewer/2022062304/56813fe8550346895daad910/html5/thumbnails/25.jpg)
Output of program
![Page 26: BSP on the Origin2000](https://reader036.fdocuments.in/reader036/viewer/2022062304/56813fe8550346895daad910/html5/thumbnails/26.jpg)
Another example
• Is there a problem with the following example?
• What will the program print?
![Page 27: BSP on the Origin2000](https://reader036.fdocuments.in/reader036/viewer/2022062304/56813fe8550346895daad910/html5/thumbnails/27.jpg)
![Page 28: BSP on the Origin2000](https://reader036.fdocuments.in/reader036/viewer/2022062304/56813fe8550346895daad910/html5/thumbnails/28.jpg)
Answer
• As written, the program will not print any output: the data is actually transferred only after the bsp_sync statement
• Additional question: what will the program print if bsp_sync is placed right after the put statement?
• NB: the programs are in the directory /u/tcc/anne/BSPcourse, as prog2.c and prog2wrong.c. Try them.
![Page 29: BSP on the Origin2000](https://reader036.fdocuments.in/reader036/viewer/2022062304/56813fe8550346895daad910/html5/thumbnails/29.jpg)
Exercise 1 (due Nov. 26th 2006)
1. Copy the directory /u/tcc/anne/BSPcourse to your own directory. Take a look at the bspedupack.h file.
2. Write a C program in which each processor writes its pid into an array PIDS(0:p-1) on p0, so that PIDS(i)=i.
3. Run the program for p=1,2,4,8,16 processors and print PIDS. You can run it interactively.
4. Do the same using a get instruction.