Alexey Pakhunov /XCG, Microsoft Research/ alexeypa@microsoft.com March 30th, 2011.
SCC Development Experiences
Alexey Pakhunov /XCG, Microsoft Research/ alexeypa@microsoft.com
March 30th, 2011
Overview
- Black Cloud OS
  - A fork of Singularity OS
  - Our playground for experimenting with message passing in a non-cache-coherent environment
- This presentation covers only our development experiences on the SCC
  - Submission of the paper is on its way
What is Singularity?
- A quote from the Singularity home page: “A research operating system prototype, extending programming languages, and developing new techniques and tools for specifying and verifying program behavior”
- Written in managed code
  - Some assembler and C++ in the boot loader and kernel
- IPC and inter-component communications are based on passing messages
Our setup
[Diagram: the SCC chip with 24 tiles, each attached to a router (R), four DDR3 memory controllers (MC), the VRC and the system interface. The system interface connects over PCI-E to the Management Console (Linux) running sccTcpServer/mceGui, which connects over TCP/IP to a Desktop PC (Windows) running RcLoader.Net, KdProxy, WinDbg, etc.]
RcLoader.Net
- Configuration
  - Generates the system memory map
  - Configures the SCC registers
  - Uploads the boot loader and OS images
  - Supports manual editing of the SCC configuration
- Debugging
  - Allows inspecting the memory and configuration registers
The memory map
| Region | Address range |
| --- | --- |
| Shared memory (OS image, the initial jmp) | 0xFC000000 - 0xFFFFFFFF |
| Unused | |
| Shared memory buffers (256KB per core) | 0xC0000000 - 0xC3FFFFFF |
| Configuration space | 0xA0000000 - 0xB7FFFFFF |
| MPB (16KB per tile) | 0x80000000 - 0x97FFFFFF |
| Unused | |
| Private Memory (336 MB - 1360 MB) | 0x00000000 - up to 0x54FFFFFF |
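The sizes of the address ranges listed on this slide can be checked with a few lines of arithmetic. The sketch below is only an illustration: the helper name `size_mb` is made up for this example, and the bounds are treated as inclusive, which is an assumption about how the slide writes them.

```python
# Address ranges as listed on the slide, treated as inclusive bounds (assumption).
RANGES = [
    (0x00000000, 0x54FFFFFF),
    (0x80000000, 0x97FFFFFF),
    (0xA0000000, 0xB7FFFFFF),
    (0xC0000000, 0xC3FFFFFF),
    (0xFC000000, 0xFFFFFFFF),
]

MB = 1024 * 1024


def size_mb(start: int, end: int) -> int:
    """Size of an inclusive address range in whole megabytes (hypothetical helper)."""
    return (end - start + 1) // MB


for start, end in RANGES:
    print(f"0x{start:08X} - 0x{end:08X}: {size_mb(start, end)} MB")
```

The first range comes out at 1360 MB, which matches the upper bound given for private memory on the slide.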
Debugging challenges
- No serial port or console
  - Memory at 0xb8000 is the console buffer
- I/O redirection doesn’t work as expected
  - Execution of an IN or OUT instruction effectively halts the core and sccTcpServer
- Serial KD transport is emulated
  - A couple of ring buffers on the SCC side
  - KdProxy.exe exposes a named pipe interface for the debugger
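The slide does not show how the ring buffers are laid out, so the following is only a minimal sketch of the general idea: one single-producer, single-consumer byte ring per direction, standing in for the two halves of a serial line. The class name, method names and capacities are all hypothetical, and the real buffers live in SCC memory rather than in a Python object.

```python
class RingBuffer:
    """Single-producer, single-consumer byte ring buffer (illustrative only)."""

    def __init__(self, capacity: int) -> None:
        # One slot is always left empty so that head == tail means "empty".
        self._buf = bytearray(capacity)
        self._head = 0  # next index to read
        self._tail = 0  # next index to write

    def write(self, data: bytes) -> int:
        """Copy as many bytes as fit and return how many were written."""
        written = 0
        for b in data:
            nxt = (self._tail + 1) % len(self._buf)
            if nxt == self._head:  # buffer is full
                break
            self._buf[self._tail] = b
            self._tail = nxt
            written += 1
        return written

    def read(self, max_bytes: int) -> bytes:
        """Remove and return up to max_bytes bytes."""
        out = bytearray()
        while self._head != self._tail and len(out) < max_bytes:
            out.append(self._buf[self._head])
            self._head = (self._head + 1) % len(self._buf)
        return bytes(out)


# One buffer per direction, mimicking a full-duplex serial line.
to_debugger = RingBuffer(16)
from_debugger = RingBuffer(16)
to_debugger.write(b"hello")
print(to_debugger.read(16))
```

In this model the core would be the only writer of `to_debugger` and the host side the only reader, with the roles swapped for `from_debugger`, so neither index is ever updated by both sides.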
Porting challenges
- No BIOS
  - The system memory map is patched directly in the boot loader
- No standard devices
  - Local APIC is used instead of the i8254 timer and PIC
  - No RTC clock
- No modern instructions supported
  - Context handling code was updated due to lack of MMX
    - The 32-bit flavor of Singularity uses only x87 for floating point calculations
  - Bartok compiler was patched due to lack of CMOV instructions
Experimental hardware
- Turning on the MPB bypass bit causes a race that leads to memory corruption
  - Minus three days of debugging :-)
  - We couldn’t take advantage of fast MPB access
- Large pages cannot be used together with MPB
  - Singularity uses large pages to create the identity mapping spanning 4GB
Interface
- A telnet connection to each core
  - The same serial transport emulation via KdProxy.exe was used
Cache coherency matters
- A read-only OS image is shared among all cores
- Message passing code uses MPB-mapped buffers and CL1FLUSH-aware memcpy()
- Large shared memory storage is accessible via dynamically remapped LUTs
  - R/W access is possible with proper cache flushing and/or caching settings in PTEs
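The remapping idea can be illustrated with a deliberately simplified model. It assumes a lookup table with 256 entries, each covering 16 MB of the 32-bit core address space, and it represents an entry as a bare system page number; real LUT entries carry more than that, and the class and method names here are hypothetical.

```python
PAGE_BITS = 24                  # assumption: each LUT entry covers 16 MB
LUT_ENTRIES = 1 << (32 - PAGE_BITS)  # 256 entries for a 32-bit core address


def split(addr: int) -> tuple[int, int]:
    """Split a core address into (LUT slot, offset within the 16 MB page)."""
    return addr >> PAGE_BITS, addr & ((1 << PAGE_BITS) - 1)


class Lut:
    """Toy lookup table: slot -> system page number (simplified model)."""

    def __init__(self) -> None:
        self.entries: dict[int, int] = {}

    def map(self, slot: int, system_page: int) -> None:
        if not 0 <= slot < LUT_ENTRIES:
            raise ValueError("slot out of range")
        self.entries[slot] = system_page

    def translate(self, addr: int) -> int:
        slot, offset = split(addr)
        return (self.entries[slot] << PAGE_BITS) | offset


lut = Lut()
lut.map(0x80, 0x100)
print(hex(lut.translate(0x80001234)))
lut.map(0x80, 0x101)  # remap the same window onto the next 16 MB page
print(hex(lut.translate(0x80001234)))
```

Remapping one slot lets the same core address reach different parts of a storage area larger than the window itself, which is the sense in which the sketch reads "dynamically remapped".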
Performance
- Core’s memory interface bandwidth is limited
  - One outstanding memory operation

| Frequency (MHz) | Cache line write latency (cycles) | Maximum bandwidth (MB/s) |
| --- | --- | --- |
| 533 | 39 (cached) | 417 |
| 533 | 131 (uncached) | 124.2 |
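With only one outstanding memory operation, the bandwidth ceiling follows directly from the write latency of a single cache line. The sketch below reproduces the slide's figures under two assumptions that the slide does not state: a 32-byte cache line and MB meaning 2^20 bytes. The function name is made up for this example.

```python
CACHE_LINE_BYTES = 32   # assumption: 32-byte cache line
MB = 1024 * 1024        # assumption: MB is taken as 2**20 bytes


def max_bandwidth_mb_s(freq_mhz: float, latency_cycles: int) -> float:
    """Upper bound on bandwidth when one cache line is written per latency period."""
    lines_per_second = freq_mhz * 1e6 / latency_cycles
    return lines_per_second * CACHE_LINE_BYTES / MB


print(round(max_bandwidth_mb_s(533, 39)))      # cached   -> 417
print(round(max_bandwidth_mb_s(533, 131), 1))  # uncached -> 124.2
```

Because the core cannot overlap writes, raising the clock or lowering the latency are the only ways to move this bound in the model.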
Performance
- Memory controller bandwidth is limited

[Chart: write and read bandwidth, cached and uncached, versus number of cores (1 to 12). Y-axis labeled "Bandwidth in MB/s", with ticks from 0 to 700000000.]
Conclusions
- The SCC is an experimental platform tailored for message passing
  - Lack of cache coherency makes us think hard about message passing
  - The chip has enough cores to play with scalability
- Compare apples to apples
  - The cache and memory subsystems are significantly different
  - The SCC is super parallel, not super fast
Q&A