Rule-Based Data Management Systems
![Page 1: Rule-Based Data Management Systems](https://reader036.fdocuments.in/reader036/viewer/2022062804/5681492e550346895db66ac3/html5/thumbnails/1.jpg)
Rule-Based Data Management Systems
Reagan W. Moore
Wayne Schroeder
Mike Wan
Arcot Rajasekar
{moore, schroede, mwan, sekar}@sdsc.edu
http://www.sdsc.edu/srb
http://irods.sdsc.edu/
![Page 2: Rule-Based Data Management Systems](https://reader036.fdocuments.in/reader036/viewer/2022062804/5681492e550346895db66ac3/html5/thumbnails/2.jpg)
Topics
• Managing distributed shared collections
  • Data grids
• Control of name spaces - SRB
  • Production system
  • Data and trust virtualization
  • Infrastructure independence
• Control of management policies - iRODS
  • Next generation technology
  • Management virtualization
  • Rules controlling remote operations
  • Constraints on the rules and remote operations
![Page 3: Rule-Based Data Management Systems](https://reader036.fdocuments.in/reader036/viewer/2022062804/5681492e550346895db66ac3/html5/thumbnails/3.jpg)
Data Management Applications
• Data grids
  • Share data
• Digital libraries
  • Publish data
• Persistent archives
  • Preserve data
• Real-time sensor streams
  • Data federation
• Data analysis
  • Automate access to distributed data
![Page 4: Rule-Based Data Management Systems](https://reader036.fdocuments.in/reader036/viewer/2022062804/5681492e550346895db66ac3/html5/thumbnails/4.jpg)
Concepts
• Distributed Data Management Concepts
  • Data virtualization
    • Manage the properties of a shared collection independently of the storage systems
  • Trust virtualization
    • Administrative domain independence
  • Federation
    • Managing interactions between data grids
• Rule-based Data Management
  • Policy virtualization
    • Automating execution of management policies
    • Applying management policies to remote operations
![Page 5: Rule-Based Data Management Systems](https://reader036.fdocuments.in/reader036/viewer/2022062804/5681492e550346895db66ac3/html5/thumbnails/5.jpg)
Data Grid
Using a Data Grid - in Abstract
[Diagram: "Ask for data" arrow from the user to the data grid; "Data delivered" arrow back to the user]
• User asks for data from the data grid
• The data is found and returned
• Where & how details are hidden
![Page 6: Rule-Based Data Management Systems](https://reader036.fdocuments.in/reader036/viewer/2022062804/5681492e550346895db66ac3/html5/thumbnails/6.jpg)
Using a Data Grid - Details
[Diagram labels: Storage Resource Broker Server (two instances), Metadata Catalog, DB]
• User asks for data
• Data request goes to SRB Server
• Server looks up information in catalog
• Catalog tells which SRB server has data
• 1st server asks 2nd for data
• The data is found and returned
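The request flow on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `MetadataCatalog`, `BrokerServer`, the server names, and the logical path are hypothetical and are not part of the SRB API. The sketch shows the first server consulting the catalog and then asking the second server for the data on the user's behalf.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataCatalog:
    """Hypothetical catalog: logical file name -> name of the server holding it."""
    locations: dict[str, str] = field(default_factory=dict)

    def locate(self, logical_name: str) -> str:
        return self.locations[logical_name]


@dataclass
class BrokerServer:
    """Hypothetical stand-in for an SRB server (not the real SRB API)."""
    name: str
    catalog: MetadataCatalog
    storage: dict[str, bytes] = field(default_factory=dict)
    peers: dict[str, "BrokerServer"] = field(default_factory=dict)

    def get(self, logical_name: str) -> bytes:
        # The server looks up which server holds the data.
        holder = self.catalog.locate(logical_name)
        if holder == self.name:
            return self.storage[logical_name]
        # The first server asks the holding server; the user never sees this hop.
        return self.peers[holder].get(logical_name)


catalog = MetadataCatalog({"/grid/run42.dat": "server-b"})
first = BrokerServer("server-a", catalog)
second = BrokerServer("server-b", catalog, storage={"/grid/run42.dat": b"event data"})
first.peers["server-b"] = second

# The user only ever talks to the first server.
data = first.get("/grid/run42.dat")
```

The user-facing call is the same whether the data is local or remote, which is the "where & how details are hidden" point from the abstract view.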
![Page 7: Rule-Based Data Management Systems](https://reader036.fdocuments.in/reader036/viewer/2022062804/5681492e550346895db66ac3/html5/thumbnails/7.jpg)
Data Virtualization
• Manage properties of each digital entity independently of the remote storage systems
  • Infrastructure independence
• Properties of the shared collection
  • Name spaces
  • Persistent state information (location, size,…)
• Manage standard operations
  • Map from client requests to standard operations
  • Map from standard operations to remote storage system protocol
![Page 8: Rule-Based Data Management Systems](https://reader036.fdocuments.in/reader036/viewer/2022062804/5681492e550346895db66ac3/html5/thumbnails/8.jpg)
Data Virtualization
Storage Repository
• Storage location
• User name
• File name
• File context (creation date,…)
• Access controls
Data Grid
• Logical resource name space
• Logical user name space
• Logical file name space
• Logical context (metadata)
• Access constraints
Data Collection
Data Access Methods (C library, Unix, Web Browser)
Data is organized as a shared collection
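One way to picture the split between storage-repository names and data-grid names is a small mapping table. The sketch below is an assumption-laden illustration: `PhysicalReplica`, `resolve`, `relocate`, and every host name and path are hypothetical. It shows that a logical file name stays fixed while the storage location, physical file name, and local account behind it change.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhysicalReplica:
    storage_location: str  # storage system holding the bytes
    physical_path: str     # file name as the storage system knows it
    owner_account: str     # local user name on that storage system


LOGICAL_NAME = "/home/collection/survey/image001.fits"  # made-up example name

# Hypothetical catalog: logical names are stable, physical details may change.
logical_to_physical = {
    LOGICAL_NAME: PhysicalReplica("archive1.example.org", "/tape/vol7/0001.dat", "srbadmin"),
}


def resolve(logical_name: str) -> PhysicalReplica:
    """Translate a logical file name into its current physical description."""
    return logical_to_physical[logical_name]


def relocate(logical_name: str, new_replica: PhysicalReplica) -> None:
    """Move the data; only the mapping changes, never the logical name."""
    logical_to_physical[logical_name] = new_replica


before = resolve(LOGICAL_NAME)
relocate(LOGICAL_NAME, PhysicalReplica("disk2.example.org", "/cache/0001.dat", "gridsvc"))
after = resolve(LOGICAL_NAME)
```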
![Page 9: Rule-Based Data Management Systems](https://reader036.fdocuments.in/reader036/viewer/2022062804/5681492e550346895db66ac3/html5/thumbnails/9.jpg)
Data Virtualization
[Diagram labels: Access Interface, Standard Access Actions, Data Grid, Standard Micro-services, Storage Protocol, Storage System]
Map from the actions requested by the access method to a standard set of micro-services used to interact with the storage system
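The mapping described here can be sketched as a lookup table from (access method, action) pairs to a small set of standard micro-services. Everything in this sketch is hypothetical: the `InMemoryStorage` class, the micro-service names `ms_read` and `ms_write`, and the action names are illustrative stand-ins, not SRB or iRODS identifiers.

```python
from typing import Optional


class InMemoryStorage:
    """Hypothetical storage system; its 'protocol' is just a dict here."""

    def __init__(self) -> None:
        self.blobs: dict[str, bytes] = {}


# Standard micro-services: the only functions that touch the storage protocol.
def ms_read(storage: InMemoryStorage, path: str, data: Optional[bytes] = None) -> bytes:
    return storage.blobs[path]


def ms_write(storage: InMemoryStorage, path: str, data: Optional[bytes] = None) -> bytes:
    storage.blobs[path] = data or b""
    return storage.blobs[path]


# Different access methods request different actions,
# but they all map onto the same standard micro-services.
ACTION_TO_MICROSERVICE = {
    ("unix", "cat"): ms_read,
    ("web", "GET"): ms_read,
    ("unix", "put"): ms_write,
    ("web", "PUT"): ms_write,
}


def handle(storage: InMemoryStorage, method: str, action: str,
           path: str, data: Optional[bytes] = None) -> bytes:
    micro_service = ACTION_TO_MICROSERVICE[(method, action)]
    return micro_service(storage, path, data)


storage = InMemoryStorage()
handle(storage, "web", "PUT", "/a.dat", b"xyz")
result = handle(storage, "unix", "cat", "/a.dat")
```

Supporting a new storage system then means reimplementing only the micro-services, while supporting a new access method means adding rows to the table.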
![Page 10: Rule-Based Data Management Systems](https://reader036.fdocuments.in/reader036/viewer/2022062804/5681492e550346895db66ac3/html5/thumbnails/10.jpg)
Standard Operations
• File manipulation
  • Posix I/O calls - open, close, read, write, seek, …
  • Register, replicate, checksum, synchronize
• Bulk operations
  • Bulk data transport, metadata load
  • Parallel I/O streams
• Remote procedures
  • Data filtering, subsetting, metadata extraction
  • Remote library execution (HDFv5, DataCutter)
![Page 11: Rule-Based Data Management Systems](https://reader036.fdocuments.in/reader036/viewer/2022062804/5681492e550346895db66ac3/html5/thumbnails/11.jpg)
BaBar High-Energy Physics
• Stanford Linear Accelerator
• IN2P3
• Lyon, France
• Rome, Italy
• San Diego
• RAL, UK
• A functioning international Data Grid for high-energy physics
Manchester-SDSC mirror
Moved over 300 TBs of data
Increasing to 5 TBs per day
![Page 12: Rule-Based Data Management Systems](https://reader036.fdocuments.in/reader036/viewer/2022062804/5681492e550346895db66ac3/html5/thumbnails/12.jpg)
Next Generation Technology
• Every fault that occurs in the distributed environment is the responsibility of the data grid
  • Network outage / system crash / operator error
  • Minimize risk through checksums, replicas, synchronization, federation
• Management of large collections is labor intensive
  • Initiation of recovery operations after remote system failure
• Need to automate execution of management policies
![Page 13: Rule-Based Data Management Systems](https://reader036.fdocuments.in/reader036/viewer/2022062804/5681492e550346895db66ac3/html5/thumbnails/13.jpg)
Controlling Remote Operations

| Data Management Environment | Conserved Properties | Control Mechanisms | Remote Operations |
|---|---|---|---|
| Management Functions | Assessment Criteria | Management Policies | Capabilities |
| Data Management Infrastructure | Persistent State | Rules | Micro-services |
| Physical Infrastructure | Database | Rule Engine | Storage System |

iRODS - integrated Rule-Oriented Data System
Support unique organizational / social management policies for each collection
![Page 14: Rule-Based Data Management Systems](https://reader036.fdocuments.in/reader036/viewer/2022062804/5681492e550346895db66ac3/html5/thumbnails/14.jpg)
Rule-based Data Management
• Express assessment criteria through sets of required persistent state information
• Express management policies as sets of rules controlling the execution of micro-services
• Express capabilities as sets of micro-services
• Manage persistent state information resulting from the application of rules controlling execution of remote micro-services
![Page 15: Rule-Based Data Management Systems](https://reader036.fdocuments.in/reader036/viewer/2022062804/5681492e550346895db66ac3/html5/thumbnails/15.jpg)
Management Virtualization
• Examples of management policies
  • Integrity
    • Validation of checksums
    • Synchronization of replicas
    • Data distribution
    • Data retention
    • Access controls
  • Authenticity
    • Chain of custody - audit trails
    • Track required preservation metadata - templates
    • Generation of Archival Information Packages
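Two of the integrity policies listed here, validation of checksums and synchronization of replicas, can be illustrated together. The sketch below is a minimal example under assumptions: SHA-256 is chosen only for illustration (the slides do not name a checksum algorithm), and `validate_and_repair` and the replica names are hypothetical.

```python
import hashlib


def checksum(data: bytes) -> str:
    # SHA-256 is an assumption made for this sketch.
    return hashlib.sha256(data).hexdigest()


def validate_and_repair(replicas: dict[str, bytes], recorded: str) -> list[str]:
    """Validate each replica against the recorded checksum and overwrite
    any bad replica from a good one. Returns the names of repaired replicas."""
    good = [name for name, data in replicas.items() if checksum(data) == recorded]
    if not good:
        raise ValueError("no replica matches the recorded checksum")
    source = replicas[good[0]]
    repaired = []
    for name, data in list(replicas.items()):
        if checksum(data) != recorded:
            replicas[name] = source  # synchronize from a verified copy
            repaired.append(name)
    return repaired


replicas = {"site-a": b"abc", "site-b": b"abX"}  # site-b is corrupted
recorded = checksum(b"abc")                       # checksum stored at ingestion
repaired = validate_and_repair(replicas, recorded)
```

Run periodically, a check of this kind turns the integrity policy into something the system enforces without an operator initiating the recovery.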
![Page 16: Rule-Based Data Management Systems](https://reader036.fdocuments.in/reader036/viewer/2022062804/5681492e550346895db66ac3/html5/thumbnails/16.jpg)
Rule-based Data Management
• Rules required for standard operations
  • Posix I/O control
  • Standard SRB operations
• Administrator controlled rules to implement management policies
  • Administrative - adding / deleting users, resources
  • Data ingestion - pre-processing, post-processing
  • Data transport / deletion - parallel I/O streams, disposition
• User-defined rules, create your own server-side workflow
  • Rule set for a particular collection, particular user group, particular storage system, particular micro-service
![Page 17: Rule-Based Data Management Systems](https://reader036.fdocuments.in/reader036/viewer/2022062804/5681492e550346895db66ac3/html5/thumbnails/17.jpg)
iRODS Rule
• Each rule defines
  • Event
  • Condition
  • Action sets (micro-services and rules)
  • Recovery sets
• Rule types
  • Atomic, applied immediately
  • Deferred, support deferred consistency constraints
  • Periodic, typically used to validate assertions
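The four parts of a rule (event, condition, action set, recovery set) can be sketched as a small data structure plus an apply function. This is not iRODS rule syntax or the iRODS rule engine. The `Rule` class, `apply_rule`, and the pairing of each action with an undo step that runs in reverse order on failure are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable

Step = Callable[[dict], None]


@dataclass
class Rule:
    event: str                          # what triggers the rule
    condition: Callable[[dict], bool]   # whether it applies to this state
    actions: list[Step]                 # micro-services to run in order
    recovery: list[Step] = field(default_factory=list)  # recovery[i] undoes actions[i]


def apply_rule(rule: Rule, event: str, state: dict) -> bool:
    """Run the action set; on failure, undo the completed actions in reverse."""
    if event != rule.event or not rule.condition(state):
        return False
    completed = 0
    try:
        for action in rule.actions:
            action(state)
            completed += 1
    except Exception:
        for undo in reversed(rule.recovery[:completed]):
            undo(state)
        return False
    return True


def add_replica(state: dict) -> None:
    state["replicas"] += 1


def drop_replica(state: dict) -> None:
    state["replicas"] -= 1


def failing_checksum(state: dict) -> None:
    raise RuntimeError("checksum mismatch")


def no_op(state: dict) -> None:
    pass


rule = Rule(event="put", condition=lambda s: s["size"] > 0,
            actions=[add_replica, failing_checksum],
            recovery=[drop_replica, no_op])
state = {"size": 10, "replicas": 1}
ok = apply_rule(rule, "put", state)  # second action fails, first is undone
```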
![Page 18: Rule-Based Data Management Systems](https://reader036.fdocuments.in/reader036/viewer/2022062804/5681492e550346895db66ac3/html5/thumbnails/18.jpg)
Rule-based Access
• Associate security policies with each digital entity
  • Redaction, access controls on structures within a file
  • Time-dependent access controls (how long to hold data proprietary)
• Associate access controls with each rule
  • Restrict ability to modify, apply rules
• Associate access controls with each micro-service
  • Explicit control of operation execution within a given collection
  • Much finer control than provided by Unix r:w:x
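Access controls attached to micro-services within a collection, and time-dependent controls on proprietary data, might look like the following sketch. The permission table, group names, collection path, and dates are all made up for illustration and do not reflect an actual iRODS configuration.

```python
from datetime import date

# Hypothetical table: (collection, micro-service) -> groups allowed to execute it.
PERMISSIONS = {
    ("/grid/medical", "read"): {"clinicians", "auditors"},
    ("/grid/medical", "redact"): {"curators"},
}


def may_execute(user_groups: set[str], collection: str, micro_service: str) -> bool:
    """Allow the operation only if one of the user's groups is listed
    for this micro-service in this collection."""
    allowed = PERMISSIONS.get((collection, micro_service), set())
    return bool(allowed & user_groups)


def may_read(user: str, owner: str, release_date: date, today: date) -> bool:
    """Time-dependent control: data stays proprietary to its owner
    until the release date, then opens to everyone."""
    return user == owner or today >= release_date


clinician_can_read = may_execute({"clinicians"}, "/grid/medical", "read")
clinician_can_redact = may_execute({"clinicians"}, "/grid/medical", "redact")
```

The key in the table is the operation itself, not a read/write/execute bit, which is what makes the control finer grained than Unix file permissions.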
![Page 19: Rule-Based Data Management Systems](https://reader036.fdocuments.in/reader036/viewer/2022062804/5681492e550346895db66ac3/html5/thumbnails/19.jpg)
Federation Between Data Grids
Data Grid
• Logical resource name space
• Logical user name space
• Logical file name space
• Logical rule name space
• Logical micro-service name
• Logical persistent state
Data Collection B
Data Access Methods (Web Browser, DSpace, OAI-PMH)
Data Grid
• Logical resource name space
• Logical user name space
• Logical file name space
• Logical rule name space
• Logical micro-service name
• Logical persistent state
Data Collection A
![Page 20: Rule-Based Data Management Systems](https://reader036.fdocuments.in/reader036/viewer/2022062804/5681492e550346895db66ac3/html5/thumbnails/20.jpg)
Rule-based Federation
• When registering a digital entity into another data grid, register required management rules along with the digital entity
  • Move management policies with data
• Expectation that each operation on each digital entity can be controlled across federated data grids
  • Example is end-to-end encryption
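The idea of moving management policies with the data can be sketched as a registration step that copies an entity's rules along with the entity. The `DataGrid` class, `federate` function, rule names, and logical path below are hypothetical placeholders, not an actual federation API.

```python
from dataclasses import dataclass, field


@dataclass
class DataGrid:
    """Hypothetical data grid holding entities and the rules that govern them."""
    name: str
    entities: dict[str, bytes] = field(default_factory=dict)
    rules: dict[str, list[str]] = field(default_factory=dict)

    def register(self, logical_name: str, data: bytes, rules: list[str]) -> None:
        self.entities[logical_name] = data
        self.rules[logical_name] = list(rules)  # copy, so grids do not share state


def federate(source: DataGrid, target: DataGrid, logical_name: str) -> None:
    """Register an entity from one grid into another, rules included."""
    target.register(logical_name,
                    source.entities[logical_name],
                    source.rules[logical_name])


grid_a = DataGrid("grid-a")
grid_b = DataGrid("grid-b")
grid_a.register("/a/report.pdf", b"contents", ["encrypt-in-transit", "keep-two-replicas"])
federate(grid_a, grid_b, "/a/report.pdf")
```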
![Page 21: Rule-Based Data Management Systems](https://reader036.fdocuments.in/reader036/viewer/2022062804/5681492e550346895db66ac3/html5/thumbnails/21.jpg)
Evolution of Rule-based Systems
• Logical name spaces enable dynamic addition of new rules, micro-services, and state information
  • Apply new rules on one collection while applying old rule sets on a legacy collection
  • Can run old and new rule sets in parallel
• Can build a system that manages its evolution
  • Can create rules that track the evolution of the rule-based system
  • Can create rules that govern migration to new rule sets
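Running old and new rule sets side by side, and tracking migration between them, can be sketched with two lookup tables and a log. The rule-set names, rule names, and collection paths are invented for this illustration.

```python
# Hypothetical logical rule name space: rule-set name -> list of rule names.
rule_sets = {
    "v1": ["checksum-on-ingest"],
    "v2": ["checksum-on-ingest", "replicate-twice"],
}

# Each collection is bound to a rule set by name, so old and new sets coexist.
collection_rule_set = {
    "/grid/legacy": "v1",
    "/grid/new": "v2",
}

migration_log: list[tuple[str, str, str]] = []


def rules_for(collection: str) -> list[str]:
    return rule_sets[collection_rule_set[collection]]


def migrate_rule_set(collection: str, new_set: str) -> None:
    """Rebind a collection to another rule set and record the change,
    so the evolution of the system is itself tracked."""
    old_set = collection_rule_set[collection]
    migration_log.append((collection, old_set, new_set))
    collection_rule_set[collection] = new_set


legacy_before = rules_for("/grid/legacy")
migrate_rule_set("/grid/legacy", "v2")
legacy_after = rules_for("/grid/legacy")
```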
![Page 22: Rule-Based Data Management Systems](https://reader036.fdocuments.in/reader036/viewer/2022062804/5681492e550346895db66ac3/html5/thumbnails/22.jpg)
Assessment Rules
• Can build a system that monitors its own state information
  • Parse audit trails to verify accesses by authorized persons
  • Parse persistent state information for compliance with management rules
  • Test micro-services for compliance with rules
  • Audit all accesses to a collection
  • Compare system properties to desired outcomes
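Parsing an audit trail to verify that every access came from an authorized person can be sketched as follows. The comma-separated record layout (timestamp, user, operation, logical path) and the sample records are assumptions made up for this example. The slides do not specify an audit-trail format.

```python
import csv
import io

# Made-up sample audit trail: timestamp, user, operation, logical path.
AUDIT_TRAIL = """\
2007-03-01T12:00:00,moore,read,/grid/collection/a.dat
2007-03-01T12:05:00,guest,write,/grid/collection/a.dat
2007-03-02T09:30:00,mwan,read,/grid/collection/b.dat
"""


def unauthorized_accesses(trail_text: str, authorized: set[str]) -> list[list[str]]:
    """Return every audit record whose user is not in the authorized set."""
    reader = csv.reader(io.StringIO(trail_text))
    return [row for row in reader if row and row[1] not in authorized]


violations = unauthorized_accesses(AUDIT_TRAIL, {"moore", "mwan"})
```

An empty result is the assertion the assessment rule is checking. A periodic rule could run this check and flag any non-empty result.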
![Page 23: Rule-Based Data Management Systems](https://reader036.fdocuments.in/reader036/viewer/2022062804/5681492e550346895db66ac3/html5/thumbnails/23.jpg)
For More Information
Reagan W. Moore
San Diego Supercomputer Center
SRB: http://www.sdsc.edu/srb/
iRODS: http://irods.sdsc.edu/