Alvaro Cassinelli*, Makoto Naruse* , ** and Masatoshi Ishikawa*
Alvaro Cassinelli*, Makoto Naruse*,** and Masatoshi Ishikawa*
* Ishikawa-Hashimoto lab., University of Tokyo; ** PRESTO, JST
Quad-tree image compression using reconfigurable free-space optical interconnections
and pipelined parallel processors
PRESTO = Precursory Research for Embryonic Science and Technology
JST= Japan Science and Technology
Plan of the presentation
I. OCULAR architectures for computing
- Reconfigurable Single Stage (OCULAR-I)
- Reconfigurable Multi-stage (OCULAR-II)
II. OCULAR-II demonstration: Quad-tree compression.
- Quad-tree compression algorithm
- Set-up and Demonstration
- Discussion
III. Conclusion and further work
I. OCULAR architectures for computing
I.1 Reconfigurable Single Stage (OCULAR-I)
2D array of data
Photo Detector Array
Processing Element Array
VCSEL array
Optical Interconnections
Optical feed-back
I.2 Reconfigurable Multi-stage (OCULAR-II)
OCULAR = Optoelectronic Computer Using Laser Arrays with Reconfiguration
2D array of data
Output
Photo Detector
Processing Element Array
VCSEL
Optical Interconnections
…
network-based parallel computers
Optical technology offers enhanced parallel communication primitives
Static: …switches inside processors (local control)
Dynamic: reconfigurable interconnection (X, Y or Z) …switches outside processors (local or global/external control possible)
I.1 Single-stage paradigm for parallel computing
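[Figure: processors P1, P2, …, Pn with fixed interconnection (X, Y, and Z); each processor contains mux, ALU, Mem and control. Dynamic version: processors P1, P2, …, Pn with interconnections X, Y, Z set by a controller]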
…of great benefit for distributed memory / shared memory
…anyway, static networks can be redesigned as single-stage dynamic networks…
I.1 Dynamic architecture vs. static
In an n-degree static topology, each processor has n distinct
optoelectronic I/O ports…
Technologically challenging
Non-reusable architecture
Bad scalability
…processors, switches and interconnections located in
distinct modules
Optimal use of electronics, optoelectronics and optics
Scalability, hardware reusability in other topologies
possible introduction of multiple stages…
[Figure: processors P1, P2, …, Pn, switches and interconnections as distinct modules, closed by a feed-back loop]
[slide not shown in main presentation]
I.1 OCULAR-I system architecture
Switches and interconnections : reconfigurable diffractive optics module
dynamic single stage… Elementary Processor Array
VCSEL array
Photo-detector array
Optical interconnection
module
Optical feed-back
[ Modular architecture ]
2D optoelectronic processing layer (PD-PE-VCSEL) + optical architecture
[ SIMD Processor array ]
Processing Module
Electronic mesh for rapid short range communication between PEs.
Si photo-detectors with
Integrated amplifier / threshold
8x8 PEs (on FPGA)
[Figure: PE with ALU, registers (A, B), local memory (24 bits) and mapped I/O (4 neighbors, VCSEL, PD)]
[ Photo-detector array ] [VCSEL array ]
850 nm VCSELs
Modulation > 1 GHz (possible 10-50 GHz)
Each array attached to a PCB
10 MHz operation demonstrated
Folded 4-f system: 14 x 25 x 6.2 cm
Laser diode
FT lens
Reconfigurable interconnection module
The CGH is generated by an optically addressable SLM, using a laser diode and a liquid crystal display coupled through a fiber-optic plate.
Space-invariant interconnections – good/bad?
Free-space – alignment issues?
Multi-level CGH – good diffraction efficiency
Reconfiguration (“switch”) freq. – 100 Hz…
The module generates the interconnection pattern…
…it is therefore responsible for interconnection and switching
alvaro: In this optical interconnection module, we require adjustable components to adapt the diffraction positions to the LD and PD. We have designed a zooming Fourier transform lens as the adjustable component. The focal length is adjustable from 360 mm to 440 mm by moving one of the lenses, as illustrated in the figure. This function is important for matching interconnection parameters such as the pixel pitches of the VCSEL array, the PD array and the CGH, and for compensating for wavelength variation of the VCSEL array.
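A rough sketch of why the zoom range matters, assuming the standard paraxial relation for a grating in front of a Fourier transform lens (first order lands at x = λ f / Λ). The function name and the 100 µm grating period are hypothetical illustration values, not parameters from the presentation; only the 850 nm wavelength and the 360 to 440 mm focal range come from the slides.

```python
def first_order_offset_mm(wavelength_nm, focal_mm, grating_period_um):
    """Paraxial lateral offset (mm) of the first diffraction order
    in the focal plane of a Fourier transform lens: x = lambda * f / period."""
    # nm * mm / um = 1e-9 * 1e-3 / 1e-6 m = 1e-6 m = 1e-3 mm
    return wavelength_nm * focal_mm / grating_period_um * 1e-3


if __name__ == "__main__":
    # Hypothetical CGH period of 100 um, 850 nm VCSEL wavelength (from the slides).
    for focal in (360, 400, 440):
        offset = first_order_offset_mm(850, focal, 100)
        print(f"f = {focal} mm -> first order at {offset:.3f} mm")
```

Under these assumptions the spot position scales linearly with the focal length, so zooming rescales the whole interconnection pattern to match the detector pitch.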
I.2 Multi-stage paradigm for parallel computing
A single-stage architecture can be “spanned” into multiple stages.
[Figure: processors P1, P2, …, Pn followed by switch-and-interconnection blocks S & I - 1, S & I - 2, …, S & I - m]
The cost of multiplying the processors is paid back as…
Simplicity & Speed – S & I does not need to be complex (shuffle-exchange networks).
Scalability / Reconfigurability – for different topologies.
Pipelining – possible.
Theoretical background – Multi-stage architectures have been studied for decades in networking applications…
Topologies [computing] [computing & networking]: Hypercube, Mesh, Cube Cycle, Shuffle/exchange, Delta, Benes, De Bruijn, Tree, Pyramid, Omega, Clos, Banyan
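To illustrate how simple a switch-and-interconnection stage can be, here is a minimal sketch of the textbook shuffle-exchange primitives on n-bit processor addresses. This is the generic definition (perfect shuffle = cyclic left rotation of the address bits, exchange = flip of the lowest bit), not code from the OCULAR system; the function names are illustrative.

```python
def perfect_shuffle(index, n_bits):
    """Perfect shuffle: rotate the n-bit address left by one position."""
    msb = (index >> (n_bits - 1)) & 1
    return ((index << 1) & ((1 << n_bits) - 1)) | msb


def exchange(index):
    """Exchange: connect each processor to the one differing in the lowest bit."""
    return index ^ 1


if __name__ == "__main__":
    n_bits = 3
    for p in range(1 << n_bits):
        print(p, "-> shuffle", perfect_shuffle(p, n_bits), ", exchange", exchange(p))
```

Applying the shuffle n times returns every address to its starting point, which is why n identical stages are enough to reach any bit of the address.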
[Figure: Stage 1, Stage 2, …, Stage m; each stage holds processors P1, P2, …, Pn followed by a Switch & Interconnection block]
I.2 OCULAR-II system architecture
Two layer module
Optoelectronic processing module: Elementary Processor Array, VCSEL array, Photo-detector array
Optical interconnection module … Optical interconnection module … Optical interconnection module
II. Quad-tree compression on OCULAR-II
II.1 Quad-tree compression algorithm
II.2 Set-up and Demonstration
II.3 Discussion
Interconnection module (SLM)
VCSELs
Photo Detectors
PE array
PE array
Receiver array
Sender array
Electrical feed-back through host computer
II.1 Principle of the quad-tree compression algorithm
Quadrant labels: A B / D C
This group of pixels is a level 2 leaf of address B
…level 1 leaf of address DB
…this pixel is a level 0 leaf of address CDA
…this pixel is NOT a leaf
[Figure: image and corresponding tree, levels 3 to 0, with leaves B, DB and CDA]
Image as a tree = ( 2 , B ) + ( 1 , DB ) + ( 0 , CDA )
Leaf = ( level , address )
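The (level, address) encoding can be sketched with a short sequential routine. This is a plain top-down recursion, not the parallel bottom-up procedure run on OCULAR-II, and it rests on two assumptions that the slides do not state explicitly: quadrants are laid out as A B / D C (A top-left, B top-right, C bottom-right, D bottom-left), and the first letter of an address names the coarsest quadrant. The function name is illustrative.

```python
def quadtree_leaves(img):
    """Return the (level, address) leaves of a square binary image of side 2^N.

    A leaf is a maximal all-ON block; level 0 is a single pixel.
    Assumed layout: A = top-left, B = top-right, C = bottom-right, D = bottom-left.
    """
    size = len(img)
    leaves = []

    def visit(row, col, side, level, address):
        block = [img[r][c] for r in range(row, row + side)
                 for c in range(col, col + side)]
        if all(block):
            leaves.append((level, address))
        elif any(block):
            # A mixed block always has side > 1, so it can be split further.
            half = side // 2
            visit(row, col, half, level - 1, address + "A")
            visit(row, col + half, half, level - 1, address + "B")
            visit(row + half, col + half, half, level - 1, address + "C")
            visit(row + half, col, half, level - 1, address + "D")

    visit(0, 0, size, size.bit_length() - 1, "")
    return leaves


if __name__ == "__main__":
    # 8x8 image built to match the slide example under the assumptions above.
    img = [[0] * 8 for _ in range(8)]
    for r in range(0, 4):
        for c in range(4, 8):
            img[r][c] = 1          # quadrant B, level 2
    for r in range(4, 6):
        for c in range(2, 4):
            img[r][c] = 1          # block DB, level 1
    img[6][4] = 1                  # pixel CDA, level 0
    print(quadtree_leaves(img))
```

With this image the routine yields the three leaves of the slide, (2, B), (1, DB) and (0, CDA), though in traversal order rather than by level.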
II.1 Quad-tree compression on OCULAR-II architecture
• initialization
- Load 2^N x 2^N image. ON pixels are set as lowest-level leaves in local PE memories.
• from stage to stage
• detect upper leaves
- sequentially broadcast leaf values to the corresponding upper PE.
- compare on receiver side.
- update the leaf level of the upper-level PE if its corners turn out to be lower-level “false” leaves.
• cutting branches
- parallel broadcast signal for resetting false low-level leaves.
• End on last stage:
- Download data from last array.
- Save data (level, address) from PEs which are still leaves.
[Figure: arrays n, n+1 and n+2; quadrant corners numbered 1 to 4]
Rem: data from the receiver side to the sender side is electronically fed back through the host computer…
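The stage-to-stage procedure can be mimicked in software. The sketch below is a sequential simulation under simplifying assumptions (one Python list per processor array, block coordinates instead of letter addresses, no modelling of the optical broadcast or the host feed-back); the function name is illustrative and this is not the PE microcode.

```python
def bottom_up_quadtree(img):
    """Simulate detect-upper-leaves / cutting-branches from level 0 upward.

    Returns {level: set of (block_row, block_col)} for blocks that are
    still leaves at the end. img is a square binary image of side 2^N.
    """
    size = len(img)
    level = 0
    current = [[bool(p) for p in row] for row in img]  # leaf bits on array n
    result = {}
    while size > 1:
        half = size // 2
        # Detect upper leaves: a PE on array n+1 is a leaf only if its
        # four corner PEs on array n all have their leaf bit ON.
        upper = [[current[2 * i][2 * j] and current[2 * i][2 * j + 1]
                  and current[2 * i + 1][2 * j] and current[2 * i + 1][2 * j + 1]
                  for j in range(half)] for i in range(half)]
        # Cutting branches: reset the "false" lower leaves under a new leaf.
        for i in range(size):
            for j in range(size):
                if upper[i // 2][j // 2]:
                    current[i][j] = False
        result[level] = {(i, j) for i in range(size)
                         for j in range(size) if current[i][j]}
        current, size, level = upper, half, level + 1
    result[level] = {(0, 0)} if current[0][0] else set()
    return result
```

Each pass of the loop corresponds to one stage transition, so an image of side 2^N needs N passes before the surviving leaves are read out.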
Example : interconnection for processing of level 1
1) Detecting leaves
…Is A a level one leaf?
2) Conditional broadcast
…If so, A must update its leaf level and cut lower branches.
[Figure: quadrant corners A B / C D; legend: computing PE on array n+1, broadcasting PE on array n]
[Figure: CCD image of PD plane, showing A (zero order) and D (first order)]
[slide not shown in main presentation]
II.2 OCULAR-II demonstrator setup
• Demonstration is carried out on a two-layer OCULAR-II prototype.
Multiple-layer processing is simulated thanks to electronic feed-back between the first and second processor arrays.
• Interconnections for each level are time-multiplexed on the SLM module.
[Figure: CGH and diffraction pattern for Level 0, Level 1 and Level 2]
[Figure: PE array 1, VCSEL array, optical interconnection module, PD array, PE array 2]
• Two-level CGHs are used (sufficient diffraction efficiency)
…quad-tree algorithm and hypercube network
Image 2^(n/2) x 2^(n/2) pixels large
2^n elementary processors arranged in an n-dimensional hypercube topology
[Figure: hypercube link directions X, Y, Z, W]
Quad-tree on OCULAR-II: pairs of (6-dimensional) hypercube links are generated and multiplexed in time thanks to the SLM-based interconnection module…
…on level 1: X, Z …on level 2: Y, W
…
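A minimal sketch of how a pair of hypercube links per level could arise, assuming a row-major numbering of the 8x8 PE array (index = row * 8 + col) so that the 64 PEs carry 6-bit addresses. Flipping one address bit is the standard definition of a hypercube link; pairing bit k of the column with bit k of the row for level k is an assumption made here for illustration, and the mapping to the slide's X, Y, Z, W labels is not known.

```python
def hypercube_neighbor(index, dim):
    """Neighbor of a node along one hypercube dimension: flip one address bit."""
    return index ^ (1 << dim)


def level_links(row, col, level, side_bits=3):
    """Return the (row, col) of the two PEs reached from (row, col) by the
    horizontal and vertical hypercube links assumed for a quad-tree level."""
    index = (row << side_bits) | col
    horizontal = hypercube_neighbor(index, level)              # bit `level` of col
    vertical = hypercube_neighbor(index, side_bits + level)    # bit `level` of row
    side = 1 << side_bits
    return divmod(horizontal, side), divmod(vertical, side)


if __name__ == "__main__":
    for level in range(3):
        print("level", level, "links from (0, 0):", level_links(0, 0, level))
```

Under this numbering the links of level k span 2^k pixels, which is the spacing between the corners of a quadrant at that level.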
Interconnection module
“sender” array (SIMD + VCSELs)
“receiver” array (SIMD + PD)
Monitor CCD
CGH monitor
Control and results on host computer …
II.2 Quad-Tree Compression Demonstration Setup
Example : holograms required during level 1 processing.
1) Broadcast hologram (quadrant comparison)
2) Re-Broadcast hologram (cutting branches)
[Figure: quadrant corners A B / C D; legend: computing PE, broadcasting PE; potential leaf on level one; A (zero order), D (first order)]
[slide not shown in main presentation]
Level 0. Detecting upper leaves.
[Figure: Level 0 quadrants (corners A B / D C); level 0 leaves marked true or false]
…symbolic representation of the initial tree, containing 28 level 0 (most of them false) leaves
Detail of level 0 broadcasting
= “D” corners with leaf bit ON
= “D” corners with leaf bit OFF.
photo-detector chip surface as seen through the alignment CCD camera
receiver array
sender array
[slide not shown in main presentation]
In this demonstration we used two-level phase CGHs computed by simulated annealing (SA).
Only the 1st order of diffraction is used as the interconnection pattern.
Level 0. Cutting branches.
[Figure: quadrant corners A B / D C; newly created leaf on level 1]
Level 1. Detecting upper leaves.
Level 1 quadrants
Level 1. Cutting branches.
[Figure: quadrant corners A B / D C; newly created leaf on level 2]
Level 2. Detecting leaves and cutting branches.
[Figure: quadrant corners A B / D C]
…symbolic representation of the encoded image as a minimal tree with seven leaves.
Level 2 quadrants
II.3 Discussion
28 pixels ON = 28 initial leaves. …only seven final leaves
Compression of a 2^N x 2^N pixel image takes O(5N) clock cycles...
8x8 image (N=3): 15 iterations…
SIMD array, VCSEL and photo-detectors can run at more than 100 MHz…
two million 1024x1024 images compressed per second!
However, SLM reconfiguration limits operation to a maximum of a hundred hertz....
Also, one has to remember that our chips are only 8x8 pixels large.
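The figures quoted in this discussion follow from simple arithmetic, sketched below. The 5 cycles per level and the 100 MHz clock are taken from the slides; the function names are illustrative, and the estimate ignores the SLM reconfiguration time.

```python
def compression_cycles(n_levels, cycles_per_level=5):
    """Clock cycles needed for an image of side 2^N, at 5 cycles per level."""
    return cycles_per_level * n_levels


def images_per_second(image_side, clock_hz):
    """Throughput if the clock is the only limit (SLM switching ignored)."""
    n_levels = image_side.bit_length() - 1   # N such that image_side == 2^N
    return clock_hz / compression_cycles(n_levels)


if __name__ == "__main__":
    print(compression_cycles(3))             # 8x8 image, N = 3
    print(images_per_second(1024, 100e6))    # 1024x1024 image at 100 MHz
```

For N = 3 this gives the 15 iterations of the demonstration, and for N = 10 at 100 MHz it gives 50 cycles per image, hence two million images per second.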
III. Conclusion and further work
III.1 Summary
III.2 Research underway and further work
III.1 Summary
We have successfully tested the OCULAR-II multistage architecture with reconfigurable optical interconnections by implementing quad-tree compression on binary images (= example of embedded hypercube)
However…
The optically addressed SLM-based interconnection module accounts for the strongest bandwidth limitation (hundred hertz)
Electronic feed-back through the host computer generates parasitic signals and synchronization problems!
Alignment is not difficult, but may become a critical issue in “true” multistage architectures...
III.2 Further work: OCULAR-III
Alignment issues (between 2D arrays)
[ Research underway ]
- dynamic alignment using actuators and control theory.
- pre-aligned connectors using fiber-bundles.
Design of an integrated (VLSI) optoelectronic layer (with switching…)
Fiber bundle
[ Future research directions ]
- Test of these “modular” architectures for building computing and networking MINs.
- Design of all-optical networks using the above paradigm.
network interconnection modules
Processor arrays
http://www.k2.t.u-tokyo.ac.jp/index-e.html
Concurrent multistage paradigm using fixed interconnections: design of fixed, guided-wave-based pre-aligned interconnection modules (the processor array is in charge of the switching function) => OCULAR-III