Discovery Net
Yike Guo, John Darlington (Dept. of Computing), John Hassard (Depts. of Physics and Bioengineering),
Bob Spence (Dept. of Electrical Engineering), Tony Cass (Department of Biochemistry), Sevket Durucan (T. H. Huxley School of Environment)
Imperial College London
AIM
To design, develop and implement an infrastructure to support real-time processing, interaction, integration, visualisation and mining of massive amounts of time-critical data generated by high-throughput devices.
The Consortium
Industry Connection: 4 spin-off companies + related companies (AstraZeneca, Pfizer, GSK, Cisco, IBM, HP, Fujitsu, Gene Logic, Applera, Evotec, International Power, Hydro Quebec, BP, British Energy, …)
Industrial Contribution
Hardware : sensors (photodiode arrays, hybrid photodiodes, PMTs), systems (optics, mechanical systems, DSPs, FPGAs)
Software (analysis packages, algorithms, data warehousing and mining systems)
Intellectual Property: access to IP portfolio suite at no cost
Data: raw and processed data from biotechnology, pharmacogenomic, remote sensing (GUSTO installations, satellite data from geo-hazard programmes) and renewable energy data (from our own remote tidal power systems)
High Throughput Sensing
Characteristics
Different devices but same computational characteristics
• Data intensive & data dispersive
• Large scale
• Heterogeneous
• Distributed data
• Real-time data manipulation
Need to
• calibrate
• integrate
• analyse
GRID issues: wide area, high volume, scalability (data, users), collaboration
Data issues: different measurements for same object: Data registration, normalisation, calibration & quality control
Information issues: annotations, semantics, reference, integrated view of data
Discovery issues: Distributed Knowledge Discovery & Management; Incremental, Interactive & Collaborative Discovery
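The data issue named above (different measurements for the same object) can be illustrated with a minimal sketch. It assumes two hypothetical devices that observe the same object with different gain and offset, and it uses z-score normalisation as one possible way to make the readings comparable; the slides do not say which normalisation method Discovery Net uses.

```python
from statistics import mean, pstdev


def zscore(values):
    """Rescale readings to zero mean and unit variance."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant signal has no variation to rescale.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical readings of the same object from two devices.
# device_b is device_a with a different gain and offset (b = 2 * a + 10).
device_a = [1.0, 2.0, 3.0, 4.0]
device_b = [12.0, 14.0, 16.0, 18.0]

norm_a = zscore(device_a)
norm_b = zscore(device_b)
```

After normalisation the two series coincide, because a z-score is unchanged by a positive gain and an offset. Real calibration would also need reference standards and quality control, which this sketch leaves out.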
Distributed Devices
Distributed warehousing
Distributed Reference DBs
Distributed Users
Collaborative applications
High Throughput Computing Services
Distributed Data Engineering: Data Registration, Data Normalisation, Data Quality
Information Structuring: Information Integration & Composition, Semantics & Domain-based Ontologies, Sharing
Grid-based Knowledge Discovery: Grid-based Data Mining, Collaborative Visualisation
DNet Architecture
High Throughput Sensing (HTS) Applications
• Large-scale Dynamic Real-time Decision Support
• Large-scale Dynamic System Knowledge Discovery
Grid Basic Infrastructure: Globus/Condor/SRB
Utilising Grid Infrastructure for HT Computing
Based on Kensington Discovery Platform
Based on Globus & ORB Infrastructure
Testbed Applications
HTS Applications
• Large-scale Dynamic Real-time Decision Support
• Large-scale Dynamic System Knowledge Discovery
Bio Chip Applications
Protein-folding chips: SNP chips, Diff. Gene chips using LFII
Protein-based fluorescent micro arrays
Renewable energy Applications
Tidal Energy
Connections to other renewable initiatives (solar, biomass, fuel cells), & to CHP and baseload stations
Remote Sensing Applications
Air Sensing, GUSTO
Geological, geohazard analysis
Throughput (GB/s) | Size (petabytes) | Node Number | Operations
1-100 | 10-100 | >50000 | Image Registration, Visualisation, Predictive Modelling, RT decisions
1-1000 | 10-1000 | >10000 | Data Quality, Visualisation, Structuring, Clustering, Distributed Dynamic Knowledge Management
1-10 | 1-10 | >20000 | Structuring, Mining, Optimisation, RT decisions
Large-scale urban air sensing applications
Each GUSTO air pollution system produces 1 kbit per second, or about 10^10 bits per year. We expect to increase the number of systems from the present 2 to over 20,000 over the next 3 years, to reach a total of 0.6 petabytes of data within the 3-year ramp-up.
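The data-rate figures above can be checked with a short calculation. This is a sketch under stated assumptions: every system runs continuously at the quoted 1 kbit per second, a year is 365 days, and all 20,000 systems are counted for one full year (the slide does not give the ramp-up schedule). The function name is hypothetical.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # 31,536,000 s, ignoring leap years


def bits_per_year(rate_bits_per_s, n_systems=1):
    """Total bits produced in one year by n_systems at a constant rate."""
    return rate_bits_per_s * SECONDS_PER_YEAR * n_systems


single = bits_per_year(1_000)         # one GUSTO system at 1 kbit/s
fleet = bits_per_year(1_000, 20_000)  # assumed full deployment of 20,000 systems

print(f"one system: {single:.2e} bits/year")
print(f"fleet: {fleet / 1e15:.2f} petabits/year")
print(f"fleet: {fleet / 8 / 1e15:.3f} petabytes/year")
```

One system comes to roughly 3 x 10^10 bits per year, which matches the order of magnitude quoted on the slide. Under these assumptions the full deployment produces about 0.63 petabits (about 0.08 petabytes) per year, so the 0.6 petabyte total on the slide may rest on a different unit or on extra data streams that the transcript does not describe.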
[Figure: GUSTO, NO simulant, 6.7.2001]
The useful information comes from time-resolved correlations among remote stations, and with other environmental data sets.
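One simple way to look for a time-resolved correlation between two remote stations is to correlate their readings at a range of time lags and keep the lag with the strongest match. The sketch below assumes two hypothetical, evenly sampled series and uses a plain Pearson correlation; the station data and function names are illustrative and do not come from the GUSTO systems.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a constant series has no defined correlation
    return cov / (vx * vy) ** 0.5


def best_lag(first, second, max_lag):
    """Return (lag, r) maximising the correlation of first[t] with second[t + lag]."""
    best = (0, -2.0)  # r is always >= -1, so any real value replaces this
    for lag in range(max_lag + 1):
        n = len(first) - lag
        if n < 2:
            break
        r = pearson(first[:n], second[lag:lag + n])
        if r > best[1]:
            best = (lag, r)
    return best


# Hypothetical pollutant pulse seen at station A, then two samples later at station B.
station_a = [0, 1, 4, 9, 4, 1, 0, 0, 0, 0]
station_b = [0, 0, 0, 1, 4, 9, 4, 1, 0, 0]

lag, r = best_lag(station_a, station_b, max_lag=4)
```

Here the strongest correlation is found at a lag of two samples. With thousands of stations the same comparison would have to run across many pairs and against other environmental data sets, which is the scale the testbed is aimed at.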
Electrical grid
There is large potential in embedded generation renewable sources; they will dominate in new build (nuclear, hydro and carbon) power stations. Decentralised power is the new paradigm.
Renewables are characterised by
• large number of small units
• often in remote areas
• wireless connectivity
• fluctuating, unpredictable loading
As their total share exceeds 12%, grid control becomes very difficult without a real-time (RT) e-grid.
• active management
• RT monitoring
• RT control
• minute to minute security
• pan network optimisation
• This requires very high bandwidth
• RT remote station data acquisition
• warehousing and analysis
The IC Advantage
The IC infrastructure: microgrid for the testbed
ICPC Resource
+20 TB of disk storage
+25 TB of tape storage
3 Clusters
(> 1 Tera Flops)
Network upgrade
More than 12000 end devices
10 Mb/s – 1Gb/s to end devices
1 Gb/s between floors
10 Gb/s to backbone
10 Gb/s between backbone router matrix and wireless capability
2x1Gb/s to LMAN II
(10Gb/s scheduled 2004)
Access to disparate off-campus sites: IC hospitals, Wye College etc.
[Network diagram: London MAN / JANET, proposed firewall, core router switches, building router switches, floor switches, end devices (wired and wireless); central computing facilities (workstation cluster, storage, SMP); links: core fibre, core to building fibre, building riser fibre, Cat 5 floor wiring]
£3m SRIF funding
150 Gflops Processing
>100 GB Memory
5 TB of disk storage
Particle Physics and Astronomy Research Council (PPARC)
ASTROGRID (http://www.astrogrid.ac.uk/)
a ~£5M project aimed at building a data-grid for UK astronomy, which will form the UK contribution to a global Virtual Observatory
Particle Physics and Astronomy Research Council (PPARC)
GridPP (http://www.gridpp.ac.uk/)
to develop the Grid technologies required to meet the LHC computing challenge
collaboration with international grid developments in Europe and the US
EPSRC Testbeds (1)
MyGrid : Personalised extensible environments for data-intensive in silico experiments in biology
Distributed Aircraft Maintenance Environment
RealityGrid : closely couple high performance computing, high throughput experiment and visualization
EPSRC Testbeds (2)
GEODISE : Grid Enabled Optimisation and DesIgn Search for Engineering
CombiChem : Combinatorial Chemistry Structure-Property Mapping
Discovery Net : High Throughput Sensing