Grid Computing
Jorge Gomes
Laboratório de Instrumentação e Física Experimental de Partículas
R-ECFA Workshop 2008, Lisbon, 28 March 2008
LIP and grid computing
• LIP participates in the ATLAS and CMS experiments
• LHC data management and processing require a novel computing approach:
  – Highly distributed community and resources
    • both geographically and administratively
  – Integration of all computing and storage resources from the collaborating institutes
  – Common interface to access the resources
  – Transparent access to the resources
  – Single sign-on
  – e-Infrastructure …
• Grid computing was adopted to implement the LHC computing infrastructure
• Consequently, LIP became involved in grid computing
• Grid computing is now also being used at LIP by non-LHC experiments
LIP in international grid computing projects
[Timeline figure, 2001 to 2008: DataGrid, CrossGrid, LCG, EGEE-I, EELA, EGEE-II, Int.Eu.Grid]
LIP in international grid computing projects
• DataGrid, EGEE-I and EGEE-II: EU-funded projects coordinated by CERN; middleware and production infrastructures for multidisciplinary, data-intensive grid computing
• CrossGrid and Int.Eu.Grid: EU-funded projects; middleware and production grids for data-intensive, parallel and interactive applications
• EELA: EU-funded project to build a pilot grid in Latin America
• LCG: CERN long-term project to support LHC computing
• All projects are based on the same middleware: EDG/LCG/gLite
[Diagram: DataGrid, EGEE-I, EGEE-II, CrossGrid, Int.Eu.Grid, EELA and the LHC Computing Grid (LCG), shown against the middleware line EDG, LCG, gLite]
Strategy
1. Learn about grid middleware (DataGrid, CrossGrid, …)
  – Gain know-how and experience
  – Follow the evolution
2. Contribute to the technology and build a team
  – Engineers at CERN working for LCG (AdI coordinated by LIP)
  – Participate in international grid projects (EGEE etc…)
    • Infrastructure operations
      – Core services
      – User support / Helpdesk
      – Deployment coordination
    • Middleware test / validation
    • Middleware integration
    • Resource provider
    • Support services
      – Certification Authority for Portugal
      – Grid training
3. Deploy and operate an LCG Tier-2 for Portugal
Focus on the same areas in all projects:
• take advantage of knowledge and synergies
• maximize resources
• work on areas related to IT operations
People
Team              Working
Jorge Gomes       Coordination
Mário David       Grid Computing
Gonçalo Borges    Grid Computing
Manuel Montecelo  Grid Computing
João Martins      Systems administration
Nuno Dias         Systems administration, LIP CA manager
Miguel Oliveira   Grid Computing (Coimbra site)
Hugo Gomes        Web
José Aparício     Technical support
9 FTEs in the computing team including grid
Projects and Grid Infrastructures
The LCG depends on two major grid computing infrastructures ...
The biggest grid computing infrastructure worldwide
LCG Infrastructure
• 240 sites
• 45 countries
• 41,000 CPUs
• 5 PetaBytes
• >10,000 users
• >150 VOs
• >100,000 jobs/day
Application areas: Archeology, Astronomy, Astrophysics, Civil Protection, Comp. Chemistry, Earth Sciences, Finance, Fusion, Geophysics, High Energy Physics, Life Sciences, Multimedia, Material Sciences, …
• 91 partners in 32 countries
• 25 collaborating projects
EGEE in Portugal
• Portugal and Spain constitute the EGEE Southwest federation
• LIP coordinates EGEE in Portugal
• EGEE Resource Centres:
  – LIP
    • Lisboa (core services, production and pre-prod)
    • Coimbra
  – Univ Lusiada
    • Famalicão
  – Univ Porto
    • Porto (3 clusters)
  – Univ Minho
    • Braga
  – CFP-IST
    • Lisbon
  – IEETA
    • Aveiro (pre-prod)
Int.EU.Grid infrastructure
• Grid infrastructure focused on:
  – Parallel processing
  – Interactivity
  – Extending the gLite middleware
• 12 centers
• 7 countries
• Virtual Organizations:
  – ifusion
  – ienvmod
  – iusct
  – ibrain
  – ihep
  – iplanck
  – iwien2k
  – icompchem
  – ...
Grid Operations Management
LIP coordinates the infrastructure operation
EELA
• E-Infrastructure shared between Europe and Latin America
• EU-funded project coordinated by CIEMAT
• January 2006 – December 2007
• 25 partners in Europe and Latin America
  – Mexico, Brazil, Cuba, Chile, Venezuela, Argentina, Portugal, Italy, Spain, CERN, CLARA
• EGEE extension to Latin America
• Pilot grid infrastructure
• Dissemination and training
• LIP responsible for the authentication and VO management task
• Coordinate the deployment of grid certification authorities
  – Brazil, Argentina, Chile, Mexico, catch-all
• Virtual organizations management and authorization
• File catalogue core services
At Work
Processing and storage capacity at LIP
• Lisbon
  – Farm (EGEE/LCG, Int.Eu.Grid, EELA)
    • 88 CPU cores, AMD Opteron and Intel Xeon
    • 2 to 4 CPU cores per machine
    • 1 to 2 GB RAM per CPU core
    • Gigabit Ethernet
    • LRMS: Sun Grid Engine (SGE)
    • dCache storage
  – Pre-production (EGEE)
    • Small farm (Pentium III and Pentium IV)
    • Some storage
• Coimbra
  – Farm (EGEE/LCG)
    • 84 workstations, Pentium IV CPUs, 2.2 to 3.0 GHz
    • 1 CPU per machine
    • 1 to 2 GB RAM per CPU
    • Fast Ethernet
    • LRMS: Torque/MAUI
    • DPM storage
LIP grid core services
• LIP operates grid core services for several infrastructures
  – EGEE
  – Int.Eu.Grid
  – EELA
• These services include:
  – Resource Brokers / CrossBrokers
  – Information Indexes
  – RAS servers
  – LFC servers
  – VOMS servers
  – MyProxy server
• Hosted at:
  – FCCN datacenter
  – LIP-Lisbon datacenter
LIP Certification Authority
• The LIP CA is the grid certification authority for Portugal
  – Accredited by the International Grid Trust Federation (IGTF)
  – Registered at the TERENA TACAR trust anchor repository
• The LIP CA is a member of the EUgridPMA
• Registration authorities at:
  – Centro de Física de Plasmas (IST)
  – Centro de Sistemas Inteligentes (UALG)
  – FCT/UNL
  – Instituto de Engenharia Electrónica e Telemática de Aveiro (UA)
  – LIP Lisboa
  – LIP Coimbra (UC)
  – Universidade Lusíada (Famalicão)
  – Universidade Autónoma de Lisboa
  – Universidade do Minho
  – Universidade do Porto
http://ca.lip.pt
Portugal in EGEE
EGEE production infrastructure usage including LCG
Grid computing resources:
• LIP-Lisbon: 28 - 75 CPU cores
• LIP-Coimbra: 84 CPU cores
CPU usage in the EGEE SWE federation
Atlas + CMS + Compass + Auger
The Portuguese National Grid Initiative
National grid initiative
• Officially launched in April 2006 by the Portuguese Ministry of Science
  – Support the development of grid infrastructures for complex problem solving
  – Development of competences
  – Integrate Portugal in major international grid infrastructures
• LIP participates in the coordination of the initiative
• Activities of the initiative
  – Funding: 15 pilot projects (1,500,000 €)
  – Networks for grid computing
  – Infrastructures:
    • INGRID+ (national grid infrastructure based on EGEE)
    • Main node for grid computing
International network connectivity
• International connectivity is being substantially changed, with new links being deployed:
  – North: Minho - Galicia
  – South: through Spanish Extremadura
• Objectives
  – Better GÉANT connectivity (higher bandwidth)
  – Redundancy between both countries
  – Grid computing support
  – Have dedicated fibres
• Grid computing connectivity
  – Dedicated bandwidth for grid computing
  – Separate commodity and grid network traffic
Main node for grid computing
• Deployment of the Portuguese main node for grid computing
  – First step towards INGRID+
  – The project began in the summer of 2007
  – Expected to become operational in September 2008
  – Consortium: LIP, FCCN and LNEC
• Datacenter dedicated to grid computing
  – Host the core grid services for Portugal
  – Host major computing and storage resources
  – Host resources from other organizations
• Users
  – National grid initiative projects
  – National Tier-2 for the LCG
  – Projects in the context of IBERGRID
  – Portuguese researchers with demanding computing requirements
The Portuguese Tier-2 deployment
LHC Computing Grid
• The Portuguese government has signed the LCG MoU
• A Portuguese Tier-2 for LCG is being deployed:
  – Resource centre providing processing and storage capacity integrated into the LHC Computing Grid
  – Supporting the ATLAS and CMS VOs
• The Portuguese Tier-2 will have 3 centres:
  – LIP-Lisbon
    • Hosted at the LIP datacenter in Lisbon
    • Operational
  – LIP-Coimbra
    • Hosted in partnership with CFC
    • Operational since mid 2007
  – Main node for grid computing
    • National grid initiative project
    • Being built at the LNEC campus in Lisbon
LCG MoU
• The LIP federated Tier-2 is delivering 13% of the CPU capacity foreseen for 2008
• The LIP Tier-2 storage capacity initially foreseen in the MoU is not sufficient
• The LHC is about to start, so we need to ramp up
• The available network bandwidth is already above what the MoU foresees
• The LCG Tier-1 for Portugal is PIC in Barcelona
Ongoing: processing and storage upgrade
• Lisbon and Coimbra
  – ≥360 - 384 CPU cores
    • 2 x quad-core CPUs
    • 24 GB RAM
    • 750 GB SATA disks
    • Gigabit Ethernet
    • IPMI mgmt
  – >300 TB storage
    • ~11 - 15 servers
    • Storage based on SATA disks
    • RAID 5 and 6
    • Quad gigabit Ethernet
    • IPMI mgmt
    • Attachment
      – Fibre Channel
      – SAS or SATA
• Currently benchmarking and testing hardware from several vendors
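As a rough sizing check on the CPU figures above, the short Python sketch below converts the core target into a machine count. It assumes that the "two quad-core CPUs" and "24 GB RAM" items describe a single machine, which the slide does not state explicitly; the function names are illustrative only.

```python
# Sizing sketch for the planned worker nodes.
# Assumption (not stated on the slide): the CPU and RAM figures are per machine.

CORES_PER_CPU = 4        # quad-core
CPUS_PER_MACHINE = 2     # "two x quad core CPUs"
RAM_PER_MACHINE_GB = 24  # assumed to be per machine


def cores_per_machine() -> int:
    """CPU cores in one machine under the assumptions above."""
    return CPUS_PER_MACHINE * CORES_PER_CPU


def machines_needed(total_cores: int) -> int:
    """Machines required to reach total_cores, rounded up."""
    per_machine = cores_per_machine()
    return -(-total_cores // per_machine)  # ceiling division


def ram_per_core_gb() -> float:
    """RAM available per core if memory is shared evenly."""
    return RAM_PER_MACHINE_GB / cores_per_machine()


if __name__ == "__main__":
    for cores in (360, 384):
        print(f"{cores} cores -> {machines_needed(cores)} machines")
    print(f"RAM per core: {ram_per_core_gb():.1f} GB")
```

Under these assumptions the upgrade corresponds to 45 to 48 machines with 3 GB of RAM per core, compared with the 1 to 2 GB per core quoted for the existing farms.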
Internet connectivity
• Lisbon:
  – New fibre link LIP<->FCCN
    • Fibre rented from PT
    • Max bandwidth 10 Gbps
  – Agreed capacity with FCCN
    • 1 Gigabit/s full duplex
    • GBICs Gigabit Ethernet LX
    • 995 Mbps academic (Geant tagging) + 5 Mbps commercial
• Coimbra:
  – The LIP farm is hosted at CFC and shares the network connectivity with CFC
    • CFC shares the University of Coimbra Internet link, now being upgraded
    • The LIP cluster will have a dedicated link
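To give a feel for what the agreed Lisbon capacity means for data movement, the following Python sketch adds up the academic and commercial shares and estimates an idealised transfer time. It is only an illustration: it assumes decimal units (1 TB = 10^12 bytes), no protocol overhead and an otherwise idle link, none of which is stated on the slide, and the function names are hypothetical.

```python
# Idealised transfer-time estimate over the agreed Lisbon capacity.
# Assumptions: decimal units (1 TB = 10**12 bytes), no protocol overhead,
# and a link that is not shared with other traffic.

ACADEMIC_MBPS = 995    # academic traffic (Geant tagging)
COMMERCIAL_MBPS = 5    # commercial traffic


def total_capacity_mbps() -> int:
    """Sum of the academic and commercial shares."""
    return ACADEMIC_MBPS + COMMERCIAL_MBPS


def transfer_hours(terabytes: float, rate_mbps: float) -> float:
    """Hours needed to move the given volume at a constant rate."""
    bits = terabytes * 10**12 * 8
    seconds = bits / (rate_mbps * 10**6)
    return seconds / 3600


if __name__ == "__main__":
    print(f"Total agreed capacity: {total_capacity_mbps()} Mbps")
    hours = transfer_hours(1, ACADEMIC_MBPS)
    print(f"1 TB over the academic share: {hours:.2f} h")
```

The two shares add up to the 1 Gigabit/s agreed with FCCN; real transfers would take longer than this lower bound.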
Ongoing: network
• FCCN will provide network connectivity for grid computing:
  – Possibly enabling direct connectivity between Lisbon and Coimbra clusters
  – Possibly enabling more bandwidth at both sites
  – The FCCN infrastructure already has the hardware capabilities to provide the service
  – FCCN workshop on 15 April
• Upgrade the datacentre LANs in Lisbon and Coimbra
  – New core switches
  – Modular switches, wire speed, non-blocking
  – High density gigabit Ethernet
  – Some 10 gigabit Ethernet ports
  – Redundant configuration
  – Ready for L2/L3 WAN connectivity
Main node for grid computing
• GRID datacenter
  – Being built at LNEC
    • Civil construction ongoing
    • Ready in September 2008
  – Located near the FCCN NOC
    • Direct connectivity to the FCCN backbone and Geant point of presence
  – Adequate power and cooling infrastructure for grid computing
  – Budget > 3,200,000 €
• Host:
  – Core GRID services for INGRID+ and other projects
  – GRID computing cluster
    • > 600 CPU cores and ~600 TB by the end of 2008
    • Up to ~2000 CPU cores and ~1000 TB of storage
    • Resources managed by LIP
  – FCCN nearline storage
    • Tape robot for data repositories
  – Host grid resources from other organizations
    • LNEC GRID cluster
Integrated in the Tier-2
LCG will be the main user
Future
IBERGRID
• IBERGRID
  – A common Iberian grid infrastructure is being prepared:
    • In the context of agreements between the Portuguese and Spanish governments
    • Sharing of resources between Portugal and Spain
  – Main areas:
    • Networking, grid, supercomputing and applications
  – Based on and profiting from the current collaboration in the framework of European projects:
    • EGEE, int.eu.grid, EELA
  – LIP is deeply involved:
    • Coordination of the initiative
    • Infrastructure coordination for Portugal
  – Conferences:
    • 1st conference took place in Santiago de Compostela in May 2007
    • 2nd conference will take place at the University of Porto
EGEE-III
• EGEE-III project approved and being signed by the partners
  – Expand the EGEE infrastructure
    • More resources
    • More users and communities
  – Prepare the migration towards a future European Infrastructure (EGI) based on the national grid initiatives (NGIs)
• Will start after EGEE-II (two years duration)
• Total budget 68,900,000 €
  – EU contribution 36,250,000 €
  – EU contribution for LIP 300,000 €
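The budget figures above are easier to compare as shares. The small Python sketch below computes them from the numbers on the slide; the constant and function names are illustrative, and the percentages are plain arithmetic rather than figures quoted in the talk.

```python
# Budget shares for EGEE-III, using the figures on the slide (in euros).

TOTAL_BUDGET = 68_900_000
EU_CONTRIBUTION = 36_250_000
EU_CONTRIBUTION_LIP = 300_000


def share(part: float, whole: float) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


if __name__ == "__main__":
    eu_share = share(EU_CONTRIBUTION, TOTAL_BUDGET)
    lip_share = share(EU_CONTRIBUTION_LIP, EU_CONTRIBUTION)
    print(f"EU contribution as share of total budget: {eu_share:.1f}%")
    print(f"LIP share of the EU contribution: {lip_share:.2f}%")
```

The EU contribution covers a little over half of the total budget, and LIP receives somewhat less than one percent of that contribution.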
Concerns
• Deployment and operation of the main node for grid computing
  – a big challenge due to its size and characteristics
  – serving new communities
  – core services for the national grid initiative and IBERGRID
• The LHC is starting
  – LCG usage will increase, and so will the problems
• Need more human resources
  – user support and Tier-2 operation
  – but attracting people has been very difficult …
• Costs of infrastructure operation
  – electricity, maintenance etc…
  – these costs will increase substantially with the new deployments and upgrades
Thank you ...