
Page 1: An Introduction to Campus Grids

Keith Chadwick & Steve Timm
19-Apr-2010

Page 2: Outline

• Definition of a Campus Grid
• Why do Campus Grids
• Drawbacks of Campus Grids
• Examples of Campus Grids:
– GLOW
– Purdue
– University of California
– Nebraska
• FermiGrid:
– Pre-Grid Situation
– Architecture
– Metrics
• Evolution & Other Considerations
• Cloud Computing
• Additional Resources
• Conclusions

Page 3: Definition

• A Campus Grid is a distributed collection of [compute and storage] resources, provisioned by one or more stakeholders, that can be seamlessly accessed through one or more [Grid] portals.

Page 4: Why Do Campus Grids?

• Improve utilization of (existing) resources – don't purchase resources when they are not needed.
– Cost savings.
• Provide a common administrative framework and user experience.
– Cost savings.
• Buy resources (clusters) in "bulk" at lower costs.
– Cost savings.
• Lower maintenance costs.
– Cost savings.
• A unified user interface reduces the amount of user training required to make effective use of the resources.
– Cost savings.

Page 5: What are the drawbacks?

• Additional centralized infrastructure to provision and support.
– Additional costs.
– Can be provisioned incrementally to manage buy-in costs.
– Virtual machines can be used to lower buy-in costs.
• Can make problem diagnosis somewhat more complicated.
– Correlation of multiple logs across administrative boundaries.
– A central log repository is one mechanism to manage this.
• Not appropriate for all workloads.
– You don't want campus financials running on the same resources as research.
• Have to learn (and teach the user community) how to route jobs to the appropriate resources (see the sketch after this list).
– Trivially parallel jobs require different resources than MPI jobs.
– I/O-intensive jobs require different resources than compute-intensive jobs.
• Limited stakeholder buy-in may lead to a campus grid that is less interoperable than you might like.
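
To make the job-routing point concrete, here is a minimal sketch of one common mechanism: a Condor job requirement matched against a site-defined machine attribute. The attribute (FAST_IO) and file names are hypothetical, not any particular site's configuration.

  # Per-node Condor configuration on I/O-optimized workers
  # (condor_config.local); FAST_IO is a made-up attribute name.
  #   FAST_IO = True
  #   STARTD_ATTRS = $(STARTD_ATTRS) FAST_IO

  # Submit file: route an I/O-intensive job to those nodes only.
  universe     = vanilla
  executable   = analyze_data.sh
  requirements = (FAST_IO =?= True)
  output       = job.out
  error        = job.err
  log          = job.log
  queue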

Page 6: GLOW

• Single Globus Gatekeeper (GLOW).
• Large central cluster funded by grant.
• Multiple department-based clusters, all running Condor.
• Departments have priority [preemptive] access to their clusters.
• Clusters interchange workloads using Condor "flocking" (see the configuration sketch below).
• Approximately 1/3 of jobs are opportunistic.
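
For reference, Condor flocking is driven by a pair of configuration knobs. A minimal sketch, with hypothetical host names rather than GLOW's actual pools:

  # On the submitting pool's schedd: overflow to these pools
  # when local resources are busy.
  FLOCK_TO = condor.physics.example.edu, condor.cs.example.edu

  # On each receiving pool: accept flocked jobs from this schedd.
  FLOCK_FROM = submit01.example.edu

  # The receiving pool must also authorize the remote schedd.
  ALLOW_WRITE = $(ALLOW_WRITE), submit01.example.edu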

Page 7: Purdue

• Single Gatekeeper (Purdue-Steele).
• Centrally managed "Steele" cluster.
• ??? Nodes, ??? Slots
• Departments purchase "slots" on the cluster.
• Primary batch scheduler is PBS for purchased slots.
• Secondary batch scheduler is Condor for opportunistic computing.
• Condor is configured to only run jobs when PBS is not running a job on the node (see the sketch below).
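
One way to implement that last point, sketched below with a hypothetical probe script and attribute name (not necessarily Purdue's exact configuration): a startd cron job periodically advertises whether PBS is busy on the node, and Condor's START expression gates on it.

  # Run a probe every 60s that prints "PBS_RUNNING = True"
  # when the local pbs_mom has active jobs (script is hypothetical).
  STARTD_CRON_JOBLIST = PBSPROBE
  STARTD_CRON_PBSPROBE_EXECUTABLE = /opt/condor/libexec/pbs_probe.sh
  STARTD_CRON_PBSPROBE_PERIOD = 60s

  # Start Condor jobs only while PBS is idle on this node, and
  # preempt them as soon as a PBS job arrives.
  START   = (PBS_RUNNING =!= True)
  PREEMPT = (PBS_RUNNING =?= True)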

Page 8: University of California

• Multiple campuses.
• Each campus has a local campus Grid portal.
• An overall Grid portal in addition.
• Access is Web-portal based.

Page 9: Nebraska

• 3 campuses across Nebraska.
• Being commissioned now.

Page 10: Fermilab – Pre-Grid

• Multiple "siloed" clusters, each dedicated to a particular stakeholder:
– CDF – 2 clusters, ~2,000 slots
– D0 – 2 clusters, ~2,000 slots
– CMS – 1 cluster, ~4,000 slots
– GP – 1 cluster, ~500 slots
• Difficult to share: when a stakeholder needed more resources, or did not need all of their currently allocated resources, it was extremely difficult to move jobs or resources to match the demand.
• Multiple interfaces and worker-node configurations:
– CDF – Kerberos + Condor
– D0 – Kerberos + PBS
– CMS – Grid + Condor
– GP – Kerberos + FBSNG

Page 11: FermiGrid - Today

• Site-wide Globus Gatekeeper (FNAL_FERMIGRID).
• Centrally managed services (VOMS, GUMS, SAZ, MySQL, MyProxy, Squid, Accounting, etc.).
• Compute resources are "owned" by various stakeholders:

  Compute Resources   # Clusters   # Gatekeepers   Batch System   # Batch Slots
  CDF                 3            5               Condor         5685
  D0                  2            2               PBS            5305
  CMS                 1            4               Condor         6904
  GP                  1            3               Condor         1901
  Total               7            15              n/a            ~19,000
  Sleeper Pool        1            2               Condor         ~14,200

Page 12: FermiGrid - Architecture

[Architecture diagram. The site-wide gateway sits between the exterior network and the interior cluster network. Centrally managed services (VOMRS, VOMS, GUMS, and SAZ servers, Squid, the FERMIGRID SE (dCache SRM), BlueArc storage, and Gratia accounting) support the gateway. The stakeholder clusters (CDFOSG0-4, D0CAB1-2, CMSWC1-4, GPGrid) send ClassAds via CEMon to the site-wide gateway, and VOMRS and VOMS are kept consistent by periodic synchronization. The job flow through the diagram:]

Step 1 – User registers with the VO (VOMRS server).
Step 2 – User issues voms-proxy-init and receives VOMS-signed credentials.
Step 3 – User submits their grid job via globus-job-run, globus-job-submit, or condor-g.
Step 4 – Gateway checks the credentials against the Site AuthoriZation (SAZ) service.
Step 5 – Gateway requests a GUMS mapping based on VO & role.
Step 6 – Grid job is forwarded to the target cluster.
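
To make steps 2 and 3 concrete, the command-line equivalents look roughly like this; the VO name is taken from later slides, while the gateway host name is hypothetical:

  # Step 2 - obtain VOMS-signed proxy credentials
  voms-proxy-init -voms fermilab

  # Step 3 - submit a grid job through the site-wide gateway
  globus-job-run fermigrid1.example.gov/jobmanager-condor /bin/hostname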

Page 13: FermiGrid HA Services - 1

[Diagram. Clients reach each service through an active LVS director, with a standby LVS director ready to take over via heartbeat. Behind LVS, the VOMS, GUMS, and SAZ services each run as active/active pairs, backed by an active/active MySQL pair with replication between the two database servers.]
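
For flavor, the LVS piece of such a configuration can be expressed with ipvsadm; a sketch with made-up addresses, port, and scheduler, not FermiGrid's actual values:

  # On the active LVS director: define a virtual service for GUMS.
  ipvsadm -A -t 192.0.2.10:8443 -s wlc

  # Register the two active GUMS real servers (direct routing).
  ipvsadm -a -t 192.0.2.10:8443 -r gums1.example.gov:8443 -g
  ipvsadm -a -t 192.0.2.10:8443 -r gums2.example.gov:8443 -g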

Page 14: FermiGrid HA Services - 2

• fermigrid5 (Active, Xen Domain 0):
– fg5x0 / Xen VM 0: LVS (Active)
– fg5x1 / Xen VM 1: VOMS (Active)
– fg5x2 / Xen VM 2: GUMS (Active)
– fg5x3 / Xen VM 3: SAZ (Active)
– fg5x4 / Xen VM 4: MySQL (Active)
• fermigrid6 (Active, Xen Domain 0):
– fg6x0 / Xen VM 0: LVS (Standby)
– fg6x1 / Xen VM 1: VOMS (Active)
– fg6x2 / Xen VM 2: GUMS (Active)
– fg6x3 / Xen VM 3: SAZ (Active)
– fg6x4 / Xen VM 4: MySQL (Active)

Page 15: (Simplified) FermiGrid Network

[Network diagram. The SAN and BlueArc heads (ba head 1, ba head 2), the FermiGrid core systems (fermigrid0, fgtest, fgitb-gk), the stakeholder gatekeepers (fcdf1x1, fcdf2x1, fcdfosg3, fcdfosg4, d0osg1x1, d0osg2x1, fnpc3x1, fnpc4x1, fnpc5x2), and the CDF, D0, and GP worker-node switches (cdfwn, d0wn, gpwn) hang off a hierarchy of FCC and WH switches (s-s-hub-fcc, s-s-hub-wh, s-s-fcc1-server, s-s-fcc2-server, s-cdf-fcc1, s-cdf-cas-fcc2e, s-d0-fcc1w-cas, s-f-grid-fcc1, etc.).]

Page 16: FermiGrid Utilization

[Chart: FermiGrid utilization.]

Page 17: GUMS calls

[Chart: GUMS calls.]

Page 18: VOMS-PROXY-INIT calls

[Chart: VOMS-PROXY-INIT calls.]

Page 19: Evolution

• You don't have to start with a massive project that transitions everything to a Grid infrastructure overnight.
• FermiGrid was commissioned over roughly an 18-month interval:
– Ongoing discussions with stakeholders,
– Establishing an initial set of central services based on these discussions [VOMS, GUMS],
– Working with each stakeholder to transition their cluster(s) to use the Grid infrastructure,
– Periodically reviewing the set of central services and adding additional services as necessary/appropriate [SAZ, MyProxy, Squid, etc.].

Page 20: Other Considerations

• You will likely want to tie your (centrally managed?) administration/staff/faculty/student computer account data into your Campus Grid resources.
– FermiGrid has implemented automated population of the "fermilab" virtual organization (VO) from our Central Name and Address Service (CNAS); see the sketch after this list.
– We can help with the architecture of your equivalent service if you decide to implement such a VO.
• If you have centrally provided services to multiple independent clusters [e.g. GUMS, SAZ], you will eventually need to implement some sort of high-availability service configuration.
– You don't have to do this right off the bat, but it is useful to keep in mind when designing and implementing services.
– FermiGrid has implemented highly available Grid services, and we are willing to share our designs and configurations.
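
For a flavor of automated VO population, the registration step might wrap the voms-admin client in a periodic job, along these lines; the DN, CA, name, and e-mail are placeholders, and the exact client options should be checked against your VOMS Admin version:

  # Hypothetical nightly sync: register a user harvested from the
  # campus account database into the fermilab VO.
  voms-admin --vo fermilab --nousercert create-user \
      "/DC=org/DC=example/OU=People/CN=Jane Doe" \
      "/DC=org/DC=example/CN=Example CA" \
      "Jane Doe" "jdoe@example.edu"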

Page 21: What About Cloud Computing?

• Cloud Computing can be integrated into a Campus Grid infrastructure.

Page 22: Additional Resources

• FermiGrid:
– http://fermigrid.fnal.gov
– http://cd-docdb.fnal.gov
• OSG Campus Grids Activity:
– https://twiki.grid.iu.edu/bin/view/CampusGrids/WebHome
• OSG Campus Grids Workshop:
– https://twiki.grid.iu.edu/bin/view/CampusGrids/WorkingMeetingFermilab
• iSGTW Article on Campus Grids:
– http://www.isgtw.org/?pid=1002447

Page 23: Conclusions

• Campus Grids offer significant cost savings.
• Campus Grids do require a bit more infrastructure to establish and support.
– This can be added incrementally.
• Many large higher-education and research organizations have already deployed and are making effective use of Campus Grids.
• Campus Grids can be easily integrated into larger Grid organizations (such as the Open Science Grid or TeraGrid) to give your community access to larger or specialized resources.
– Of course, it's nice if you are also willing to make your unused resources available for opportunistic access.