San Diego Supercomputer Center
www.irods.org
iRODS DGMS
Towards Data Grid Standard Implementations
Arun Jagatheesan
San Diego Supercomputer Center
Open Grid Forum 19 (OGF19), Jan 31, 2007, Session II
Outline
• Community introduction: OGF-GFS
• User perspective
• Developer/vendor perspective
• Need for a standard community implementation
• Community implementation process
• GFS-WG community architecture sketch
• Follow-up actions
Motivation
• Global namespace for unstructured data storage
• Collaboration among multiple partners/teams
• Long-term management of unstructured data
  • Files, collection-based digital entities
NIH BIRN Data Grid
World Wide Datagrid
Used or Required by
• Large-scale academic projects
• Federal agencies (NARA, LoC, …)
• Fortune 500, Forbes Global 2000, …
DGMS, Concept-wise
• A large-scale logical file system:
  File System + Database System + Grid Computing
  = Data Grid Management System (DGMS)
• Core concepts
  • Logical shared collections
  • Logical shared resources
  • Collaborative communities
Problems Solved / Requirements – 1
• Collaborative logical namespace
  • Global collaborations of multiple teams
  • Collaborations of multiple organizations
  • Avoid multiple mount points, as they restrict the scalability of the collaboration
• Coordinated data sharing at any level of granularity (data, metadata, annotations, …)
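A minimal sketch of the collaborative logical namespace, assuming a single shared catalog that maps each logical path to a physical location at some site. Teams in different organizations then address data by one logical path instead of per-site mount points. The class names, site names, and paths below are hypothetical and are not part of any GFS-WG or iRODS interface.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class PhysicalLocation:
    site: str  # hypothetical storage site identifier
    path: str  # physical path on that site's storage


@dataclass
class LogicalNamespace:
    """One shared catalog: logical path -> physical location (sketch)."""

    entries: dict = field(default_factory=dict)

    def register(self, logical_path: str, location: PhysicalLocation) -> None:
        # A logical name is unique across every participating organization.
        if logical_path in self.entries:
            raise ValueError(f"{logical_path} is already registered")
        self.entries[logical_path] = location

    def resolve(self, logical_path: str) -> PhysicalLocation:
        # Only the catalog needs to know which site holds the bytes.
        return self.entries[logical_path]

    def list_collection(self, collection: str) -> list:
        # A collection is a logical prefix, independent of where data is stored.
        prefix = collection.rstrip("/") + "/"
        return sorted(p for p in self.entries if p.startswith(prefix))


ns = LogicalNamespace()
ns.register("/birn/mri/scan001.dat", PhysicalLocation("site-a", "/vault1/0001"))
ns.register("/birn/mri/scan002.dat", PhysicalLocation("site-b", "/disk7/scan002"))
print(ns.list_collection("/birn/mri"))
print(ns.resolve("/birn/mri/scan002.dat").site)
```

Both files appear in one collection even though they live at different sites; no user-visible mount point is involved.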
Problems Solved / Requirements – 2
• Data distribution
  • Multi-site replicas reduce access times
  • Replicas have the same logical name everywhere in the enterprise (a big plus for users)
  • Concepts of replica, copy, and cache
  • Replicas controlled by the user, the admin, or the system (automated or policy-based)
  • Reduce WAN latency (chattiness)
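The replica requirements can be sketched as a catalog that records every replica of one logical name, tops up the replica count according to a simple policy, and serves each client from the copy with the lowest estimated latency. The latency table, the `min_replicas` policy parameter, and all resource and site names are illustrative assumptions; a real DGMS would measure or configure these values itself.

```python
from collections import defaultdict


class ReplicaCatalog:
    """One logical name -> many physical replicas (hypothetical sketch)."""

    def __init__(self, latency_ms):
        # latency_ms[(client_site, resource)] -> assumed round-trip time in ms
        self.latency_ms = latency_ms
        self.replicas = defaultdict(set)  # logical name -> set of resources

    def add_replica(self, logical_name, resource):
        self.replicas[logical_name].add(resource)

    def enforce_policy(self, logical_name, candidates, min_replicas):
        # Policy-based (automated) replication: top up to min_replicas copies.
        # Candidates are tried in order; adding a resource that already holds
        # a copy is a no-op because replicas are kept in a set.
        for resource in candidates:
            if len(self.replicas[logical_name]) >= min_replicas:
                break
            self.replicas[logical_name].add(resource)
        return sorted(self.replicas[logical_name])

    def select_replica(self, logical_name, client_site):
        # Same logical name everywhere; the catalog picks the closest copy.
        holders = self.replicas.get(logical_name)
        if not holders:
            raise KeyError(logical_name)
        return min(holders, key=lambda r: self.latency_ms.get((client_site, r), float("inf")))


latency = {
    ("tokyo", "us-west"): 120, ("tokyo", "asia-east"): 5,
    ("boston", "us-west"): 70, ("boston", "asia-east"): 180,
}
catalog = ReplicaCatalog(latency)
catalog.add_replica("/world/data/run42", "us-west")
print(catalog.enforce_policy("/world/data/run42", ["us-west", "asia-east"], min_replicas=2))
print(catalog.select_replica("/world/data/run42", "tokyo"))
print(catalog.select_replica("/world/data/run42", "boston"))
```

Clients in both cities use the same logical name `/world/data/run42`; only the catalog's choice of physical copy differs.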
Problems Solved / Requirements – 3
• Data classification and discovery
  • Major advantage for Global 2000 companies
  • Tag data with any arbitrary metadata schema
  • Each team can organize its data based on user-defined attributes
  • Multiple teams can have different metadata attributes on the same data
  • Query, discover, and access data without knowing the path or protocol to be used
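The metadata requirements can be illustrated with attribute-value-unit (AVU) triples, similar in spirit to the user-defined metadata in iRODS. Each team attaches its own attributes to the same logical object, and a query returns logical names without exposing physical paths or protocols. The in-memory store, the team names, and the attribute names are hypothetical.

```python
from collections import defaultdict


class MetadataCatalog:
    """Per-team, user-defined metadata on logical objects (hypothetical sketch)."""

    def __init__(self):
        # logical name -> list of (team, attribute, value, unit)
        self.avus = defaultdict(list)

    def tag(self, logical_name, team, attribute, value, unit=""):
        self.avus[logical_name].append((team, attribute, value, unit))

    def attributes(self, logical_name, team):
        # Each team sees its own schema on the same object.
        return {a: (v, u) for t, a, v, u in self.avus[logical_name] if t == team}

    def query(self, team, attribute, predicate):
        # Discover data by metadata alone; no path or protocol is needed.
        return sorted(
            name
            for name, rows in self.avus.items()
            if any(t == team and a == attribute and predicate(v) for t, a, v, _ in rows)
        )


md = MetadataCatalog()
md.tag("/proj/img/a.tif", "imaging", "resolution", 300, "dpi")
md.tag("/proj/img/b.tif", "imaging", "resolution", 72, "dpi")
md.tag("/proj/img/a.tif", "legal", "retention", "10y")
print(md.query("imaging", "resolution", lambda v: v >= 100))
print(md.attributes("/proj/img/a.tif", "legal"))
```

The same object `/proj/img/a.tif` carries an imaging schema and a legal schema side by side, which is the "multiple teams, different attributes, same data" case from the slide.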
User Perspective
• Designed to work off the shelf
  • Don't want to assemble it (or DIY)
  • But able to customize the solution
• One point of contact or responsibility
  • If it does not work, I have one mailing list or number to call
Vendor/developer perspective
• “OGF-GFS compatible”
  • OGF-GFS data grid applications
  • OGF-GFS data grid appliance
• Ease of standard evolution
  • Avoid unnecessary dependencies on multiple interfaces for operations at the same level of granularity
• Ability to collaborate, learn, and compete
  • An end-to-end solution with a common interface
  • Additional capabilities that add value to the solution
Lessons Learnt
• Software vs. specification
  • Use a software implementation to engage and collaborate as we define standards (unless everyone wants to invest in software development from the start)
• Make both the user and the vendor/developer happy
  • Users are confident enough to share requirements and to demand the standards from vendors/developers
  • Vendors/developers know it is a real thing that can be implemented around their existing products or software
The scope (from the GFS architecture)
• A single interface
• Protocols
  • A hybrid of XML and a byte-level protocol
  • XML: command channel for operations
  • Byte-level: data movement
• Possible functionalities
  • File namespace and file operations (read, write, …)
  • Metadata operations (user-defined metadata, search)
  • Data Grid Language for policies, rules, etc.
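To make the hybrid protocol concrete, here is a minimal sketch of one `put` request: an XML command on the command channel and the raw bytes on a separate data channel, each framed with a length prefix. The element names (`dgmsRequest`, `operation`, `logicalPath`, `size`), the 8-byte big-endian length prefix, and the use of in-memory streams in place of sockets are all assumptions for illustration, not the GFS-WG wire format.

```python
import io
import struct
import xml.etree.ElementTree as ET


def build_command(operation: str, logical_path: str, size: int) -> bytes:
    # XML command channel: describes *what* to do (hypothetical schema).
    root = ET.Element("dgmsRequest")
    ET.SubElement(root, "operation").text = operation
    ET.SubElement(root, "logicalPath").text = logical_path
    ET.SubElement(root, "size").text = str(size)
    return ET.tostring(root, encoding="utf-8")


def write_frame(stream, payload: bytes) -> None:
    # Assumed framing: 8-byte big-endian length, then the payload bytes.
    stream.write(struct.pack(">Q", len(payload)))
    stream.write(payload)


def read_frame(stream) -> bytes:
    (length,) = struct.unpack(">Q", stream.read(8))
    return stream.read(length)


data = b"\x00\x01binary payload"
command = build_command("put", "/zone/home/alice/file.bin", len(data))

control = io.BytesIO()  # stands in for the command-channel socket
data_channel = io.BytesIO()  # stands in for the byte-level data socket
write_frame(control, command)
write_frame(data_channel, data)
control.seek(0)
data_channel.seek(0)

parsed = ET.fromstring(read_frame(control))
body = read_frame(data_channel)
print(parsed.findtext("operation"), parsed.findtext("logicalPath"))
```

Keeping bulk bytes out of the XML avoids encoding binary data as text, while the XML side stays easy to extend with new operations.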
What could be the right high-level picture?
[Figure: the DGMS accessed through an XML-command protocol and a byte-level data protocol (object transfer); facilitates SOA]
What could be the right high-level picture?
[Figure: several DGMS servers interconnected through the XML-command protocol and the byte-level data protocol]
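One possible reading of the multi-server picture, sketched under stated assumptions: each DGMS server owns one zone of the logical namespace (the first path component here), and a client may contact any server, which either serves the request locally or forwards it to the owning peer. Zone-based ownership, forwarding instead of redirecting the client, and the server and zone names are illustrative choices that the slides do not specify.

```python
class DGMSServer:
    """Hypothetical DGMS server that owns one zone of the logical namespace."""

    def __init__(self, name, zone):
        self.name = name
        self.zone = zone
        self.peers = {}  # zone -> DGMSServer
        self.store = {}  # logical path -> bytes held locally

    def connect(self, other):
        # Federating servers learn which peer owns which zone.
        self.peers[other.zone] = other
        other.peers[self.zone] = self

    def owner_of(self, logical_path):
        zone = logical_path.strip("/").split("/", 1)[0]
        if zone == self.zone:
            return self
        if zone not in self.peers:
            raise LookupError(f"no server owns zone {zone!r}")
        return self.peers[zone]

    def put(self, logical_path, data):
        owner = self.owner_of(logical_path)
        if owner is not self:
            return owner.put(logical_path, data)  # forward to the owning server
        self.store[logical_path] = data
        return self.name

    def get(self, logical_path):
        owner = self.owner_of(logical_path)
        if owner is not self:
            return owner.get(logical_path)  # forward to the owning server
        return self.store[logical_path]


server_a = DGMSServer("server-a", "alpha")
server_b = DGMSServer("server-b", "beta")
server_a.connect(server_b)
print(server_a.put("/beta/results/run1.csv", b"a,b\n1,2\n"))
print(server_b.get("/beta/results/run1.csv"))
```

The client talked only to `server-a`, yet the data landed on `server-b`, which owns the `beta` zone; a redirect-based design would instead return the owner's address to the client.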
User perspective
[Figure: logical resources; multiple replicas; users from different organizations; user-defined metadata for data discovery; "Secret Recipe"]
So what will we be doing (products?)
• Definition
  • Concept (data grid namespace, resource namespace…)
  • Initial functionalities (DGMS operations to be targeted)
  • Namespaces (files, metadata, resources, policy rules)
• XML protocol
  • XML handshake and message transfer between the DGMS client and the DGMS server
• Most importantly…
  • Software as a common framework for the evolution, adoption, and growth of the standard and the DGMS concepts
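The XML handshake between a DGMS client and a DGMS server could look like the following sketch: the client proposes the protocol versions it supports, and the server accepts the highest version both sides share or rejects the connection. The message names (`hello`, `accept`, `reject`), the version strings, and the negotiation rule are all hypothetical; the slide lists the handshake as work still to be defined.

```python
import xml.etree.ElementTree as ET

SERVER_VERSIONS = {"0.9", "1.0"}  # hypothetical versions this server speaks


def client_hello(user: str, versions: list) -> str:
    root = ET.Element("hello", user=user)
    for v in versions:
        ET.SubElement(root, "version").text = v
    return ET.tostring(root, encoding="unicode")


def server_reply(hello_xml: str) -> str:
    hello = ET.fromstring(hello_xml)
    offered = {v.text for v in hello.findall("version")}
    common = offered & SERVER_VERSIONS
    if not common:
        return ET.tostring(ET.Element("reject", reason="no common version"), encoding="unicode")
    # Pick the highest shared version, comparing numerically rather than as strings.
    chosen = max(common, key=lambda v: tuple(int(p) for p in v.split(".")))
    return ET.tostring(ET.Element("accept", version=chosen, user=hello.get("user")), encoding="unicode")


reply = ET.fromstring(server_reply(client_hello("alice", ["0.9", "1.0", "1.1"])))
print(reply.tag, reply.get("version"))
```

After a successful handshake, both sides would exchange command messages in the agreed version; a client that shares no version with the server receives a `reject` message instead.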
So how will we do it? (process)
• Community-based open design (OPEN FORUM)
  • Design discussions as a community
  • Code through multiple parties to keep both the vendor/developer community and the user community engaged
• Community-based open standard (OPEN STDS)
  • Specs written using a wiki and other mechanisms
  • Community-based spec for OGF
  • Interoperability workshops, and workshops together with other relevant organizations such as SNIA or DMTF
How can you get started?
• Initial requirements
  • Can you delete email? (Sign up for our mailing list)
  • Got bandwidth and a browser? (Visit our group page)
  • Can you scream, shout, or smile? (Join our WG sessions)
• Are you a user, consumer, or researcher?
  • Tell us what is needed
  • What would need to be there for you to put this open-source software/standard into production?
• Are you a vendor/developer?
  • Have your engineers or developers talk to us (we will turn them into DGMS developers or DGMS gurus)
  • We are developing an open standard: take advantage of it and develop a value-added solution around it
When do we get started?
• Right now (hmm… we actually started a long time back)
• Conference calls every other week
  • Mostly Wednesdays
  • Attend by phone, Skype, or Polycom video conference (anything you like)
  • Discussions influence the design requirements
• Face-to-face meetings
  • Once every quarter (planned), at OGF sessions
Suggestions, comments, critiques
• TO DO
  • Standard operations based on policies/rules
  • Take advantage of OGF standards where possible
  • Other commercial or "magic" tools could be used below the standard
• NOT TO DO
Conclusions
• Data grids
  • Data Grid Management Systems (DGMS)
  • Strong user need in both academic and non-academic settings
  • Need for standards framed by the Grid File System WG
• Software-included spec strategy