Implementation of a streaming database management system
on a Blue Gene architecture for measurement data processing
Erik Zeitler, Uppsala data base lab
www.it.uu.se/research/group/udbl
Use many large radio telescopes?
• Augment the measurements using signal processing: they act together as a HUGE telescope
• But they look in one direction only
• Expensive…
Solution: use a huge number of small antennas
• Broad band
• Multi-direction receivers
This enables new scientific applications (and challenges)
Scientific applications
• Re-ionization epoch: the first 10^5 years, hydrogen forming
• Deep extragalactic surveys: to boldly go…
• Transient sources: all-sky surveys of
  – gamma bursts
  – flare stars
  – supernovae
• Ultra high energy cosmic rays
• Pulsars
Antennas, antennas, antennas…
• Broad-band radio receiver: 80…300 MHz, 3 dimensions
• Produces 0.9 Gbps raw data
• Central site + 20 outstations, located within a circular area 350 km in diameter
• 13103 antennas
System overview
• Antennas: basic beam forming
• FPGAs
• Network: GbE, 10GbE
• Central processing facility: Linux clusters, IBM Blue Gene/L
• Off-line analysis: PCs, workstations, Blue Gene
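Basic beam forming is what lets many small, fixed antennas act as one large telescope that can be "pointed" electronically. A minimal sketch is narrowband delay-and-sum: undo the phase delay a plane wave from the look direction causes at each antenna, then add. This assumes a 1-D row of antennas, a single frequency and a noiseless plane wave; the function names, the 16 antennas, the 0.9 m spacing and the 150 MHz frequency are illustrative choices, not the LOFAR station design.

```python
import cmath
import math

SPEED_OF_LIGHT = 299_792_458.0  # m/s


def steering_vector(positions_m, angle_rad, freq_hz):
    """Per-antenna phase factors of a plane wave arriving from angle_rad
    (measured from broadside) at antennas placed along one axis."""
    wavelength = SPEED_OF_LIGHT / freq_hz
    return [cmath.exp(2j * math.pi * x * math.sin(angle_rad) / wavelength)
            for x in positions_m]


def beam_power(samples, positions_m, angle_rad, freq_hz):
    """Delay-and-sum: remove each antenna's phase for the look direction,
    add the aligned samples and return normalized power (1.0 = perfect match)."""
    weights = steering_vector(positions_m, angle_rad, freq_hz)
    total = sum(s * w.conjugate() for s, w in zip(samples, weights))
    return abs(total) ** 2 / len(samples) ** 2


if __name__ == "__main__":
    positions = [0.9 * k for k in range(16)]             # 16 antennas, 0.9 m apart
    freq = 150e6                                         # within the 80-300 MHz band
    source = math.radians(20.0)
    snapshot = steering_vector(positions, source, freq)  # noiseless wavefront
    scan = {deg: beam_power(snapshot, positions, math.radians(deg), freq)
            for deg in range(-60, 61)}
    best = max(scan, key=scan.get)
    print(f"strongest response at {best} degrees")
```

Scanning the look direction and keeping the strongest response recovers the source direction; with many look directions computed in parallel, the same data serve a "multi-direction receiver".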
Central processing tasks
• FFT
• Signal correlation
• Calibration
  – RFI mitigation (noise from human activities)
  – Stratosphere plasma
  – Subtracting known objects
• Transient analysis: peak detection
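FFT and signal correlation are often combined as an "FX" correlator: Fourier-transform each antenna stream block by block (F), then multiply one spectrum by the conjugate of the other and average (X). The slides do not say how the correlator here is built; the sketch below only illustrates these two steps for a pair of real-valued streams, assuming power-of-two blocks. The names `fft` and `cross_spectrum`, the 64-sample block and the test tone are hypothetical, and a real system would use an optimized FFT library rather than this textbook recursion.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def cross_spectrum(stream_a, stream_b, block=64):
    """FX-style correlation of two antenna streams: FFT each block ("F"),
    multiply by the conjugate of the other spectrum ("X"), average over blocks."""
    n_blocks = min(len(stream_a), len(stream_b)) // block
    acc = [0j] * block
    for i in range(n_blocks):
        fa = fft(stream_a[i * block:(i + 1) * block])
        fb = fft(stream_b[i * block:(i + 1) * block])
        acc = [v + a * b.conjugate() for v, a, b in zip(acc, fa, fb)]
    return [v / n_blocks for v in acc]


if __name__ == "__main__":
    # Same tone seen by two antennas, the second lagging by 0.7 rad.
    a = [math.cos(2 * math.pi * 5 * t / 64) for t in range(256)]
    b = [math.cos(2 * math.pi * 5 * t / 64 - 0.7) for t in range(256)]
    spectrum = cross_spectrum(a, b, block=64)
    peak = max(range(32), key=lambda k: abs(spectrum[k]))  # positive frequencies
    print(peak, round(cmath.phase(spectrum[peak]), 3))     # 5 0.7
```

The magnitude of the averaged cross-spectrum shows where the two antennas see correlated power, and its phase encodes their relative delay at that frequency, which is what later calibration and imaging steps work from.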
Computing challenges
• Multiple incoming data streams: 20 Tbps
• Multiple experiments: complex computations
• Demand for rapid reconfiguration of computing systems
  – Use case: on-line transient analysis
Central processing facilities
• On-line processing
  – Linux cluster (buffering)
  – Lightweight BG/L (beam): 6 racks, 6144 compute nodes + 96 I/O nodes
• Off-line processing: Linux clusters, SAN, GRID, …
Blue Gene: dataflow supercomputer
• LLNL installation: 64 racks (65536 compute nodes)
• 70 TFLOPS in the floor space of a tennis court
BG/L architecture
• I/O node:
  – 2x PPC440 @ 700 MHz
  – Linux
  – Each I/O node coordinates 64 compute nodes
  – 512 MB RAM
• Compute node:
  – 2x PPC440 @ 700 MHz
  – Single-threaded lightweight OS
  – Typically: 1 CPU for computation, 1 CPU for communication
  – 512 MB RAM
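The node counts on these slides are consistent: with each I/O node coordinating 64 compute nodes, the 6144 compute nodes of the planned 6-rack installation need exactly 6144 / 64 = 96 I/O nodes. A tiny sketch of that hierarchy, assuming compute nodes are numbered consecutively within each group of 64 (an illustrative assumption; the helper names are hypothetical and the real partitioning is decided by the BG/L system software):

```python
COMPUTE_NODES_PER_IO_NODE = 64  # from the slide: each I/O node coordinates 64 compute nodes


def io_node_for(compute_node: int) -> int:
    """Index of the I/O node coordinating a compute node, assuming
    consecutive numbering within each 64-node group (hypothetical)."""
    return compute_node // COMPUTE_NODES_PER_IO_NODE


def io_nodes_needed(compute_nodes: int) -> int:
    """Number of I/O nodes needed to cover compute_nodes (ceiling division)."""
    return -(-compute_nodes // COMPUTE_NODES_PER_IO_NODE)


if __name__ == "__main__":
    print(io_nodes_needed(6144))                             # 96
    print(io_node_for(0), io_node_for(63), io_node_for(64))  # 0 0 1
```

Because every external byte enters or leaves through an I/O node, a stream query plan has to decide not only which compute nodes run an operator but also which I/O node feeds them.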
[Figure: incoming measurement data streams enter the BG/L dataflow computer; scientist users, through a user agent, pose continuous queries and receive query result streams]
UDBL project
• Implement a very high performance stream database manager
  – based on the AmosII DB kernel (http://user.it.uu.se/~udbl/amos/)
• Utilize the BG/L computing environment for
  – scalable data stream queries
  – involving user-defined computations
• Implement specialized query optimization:
  – planning the BG/L node configuration for given stream queries
  – re-configuration when interesting phenomena occur
So far (after 4 months)
• Implementing primitives for data
  – computation
  – aggregation
  – communication
  – fusion
• Proof-of-concept cases
  – signal processing
  – peak detection
  – stream join
• Benchmark
  – based on real LOFAR/LOIS data
  – performance analysis for stream databases
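Stream join is listed as a proof-of-concept case without further detail. As one possible reading, the sketch below is a symmetric windowed equi-join: tuples from two time-ordered streams are paired when their keys are equal and their timestamps lie within a window. The event layout, `window_join` and its parameters are hypothetical and are not taken from the AmosII code.

```python
from collections import defaultdict, deque


def window_join(events, window):
    """Symmetric windowed equi-join of two interleaved streams "a" and "b".

    events: (timestamp, side, key, value) tuples in non-decreasing timestamp
    order, with side in {"a", "b"}. Yields (key, value_a, value_b) for every
    pair with equal key whose timestamps differ by at most `window`.
    """
    buffers = {"a": defaultdict(deque), "b": defaultdict(deque)}
    for ts, side, key, value in events:
        other = "b" if side == "a" else "a"
        candidates = buffers[other][key]
        # Input is time-ordered, so tuples older than ts - window can never
        # match again and are evicted. (This sketch only trims a buffer when
        # the opposite stream sees the same key; a real engine would also
        # evict periodically to bound memory.)
        while candidates and ts - candidates[0][0] > window:
            candidates.popleft()
        for _, other_value in candidates:
            yield (key, value, other_value) if side == "a" else (key, other_value, value)
        buffers[side][key].append((ts, value))


if __name__ == "__main__":
    events = [
        (0, "a", "x", 1),
        (1, "b", "x", 10),
        (2, "a", "y", 7),    # no "b" tuple with key "y": never joins
        (5, "b", "x", 11),
        (6, "a", "x", 2),
        (20, "b", "x", 12),  # too late for every buffered "a" tuple
    ]
    for row in window_join(events, window=5):
        print(row)
```

Keeping a buffer per side makes the join symmetric: whichever stream delivers a tuple first, the match is found when its partner arrives, and the window bounds how much state each side must hold.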
A simple example:
gnuplot(peakdetect(vector_elements(winagg(vector_elements(readlofarvectorfile("temp.DAT")),256,256))));
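The query above composes stream functions from the inside out: read vectors from a file, flatten them into samples, group the samples into windows, and detect peaks. A rough Python analogue with generators is sketched below. It mirrors the general shape of the query rather than each AmosII function. Assumptions: `winagg(s, 256, 256)` means windows of 256 samples advancing by 256 (non-overlapping), `peakdetect` flags samples above mean + k·stddev of their window (the real criterion is not given), synthetic vectors replace `readlofarvectorfile("temp.DAT")` because the file format is not described, and plotting is omitted.

```python
import math
from collections import deque
from statistics import mean, pstdev


def vector_elements(vectors):
    """Flatten a stream of vectors into a stream of scalar samples."""
    for vec in vectors:
        yield from vec


def winagg(samples, size, stride):
    """Count-based windows of `size` samples that advance by `stride`
    (assumes 0 < stride <= size; size == stride gives tumbling windows)."""
    window = deque(maxlen=size)
    since_emit = 0
    for x in samples:
        window.append(x)
        since_emit += 1
        if len(window) == size and since_emit >= stride:
            yield list(window)
            since_emit = 0


def peakdetect(windows, stride, k=4.0):
    """Yield (sample_index, value) for samples above mean + k * stddev of
    their own window (hypothetical criterion). Window i starts at i * stride."""
    for i, win in enumerate(windows):
        mu, sigma = mean(win), pstdev(win)
        for j, x in enumerate(win):
            if sigma > 0 and x > mu + k * sigma:
                yield (i * stride + j, x)


if __name__ == "__main__":
    # Synthetic stand-in for the measurement file: eight 128-sample vectors
    # of a smooth signal with two injected spikes.
    samples = [math.sin(0.37 * i) for i in range(1024)]
    samples[300] = samples[700] = 50.0
    vectors = [samples[i:i + 128] for i in range(0, 1024, 128)]

    peaks = list(peakdetect(winagg(vector_elements(vectors), 256, 256), stride=256))
    print(peaks)  # [(300, 50.0), (700, 50.0)]
```

Because every stage is a generator, samples flow through the pipeline one at a time and only one window is held in memory, which is the property a continuous query over an unbounded stream needs.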
Other application areas
• Other space physics research areas: projects at IRFU
• Network traffic analysis
• Financial (stock market) information
• Content analysis of streaming media