Site Recovery Manager Overview

Transcript of Site Recovery Manager Overview

  • STORAGE ARCHITECTURE / GETTING STARTED: SAN SCHOOL 101
    Marc Farley, President of Building Storage, Inc.
    Author, "Building Storage Networks"

  • Agenda
    Lesson 1: Basics of SANs
    Lesson 2: The I/O path
    Lesson 3: Storage subsystems
    Lesson 4: RAID, volume management and virtualization
    Lesson 5: SAN network technology
    Lesson 6: File systems

  • Basics of storage networking Lesson #1

  • Connecting
    Networking or bus technology
    Cables + connectors
    System adapters + network device drivers
    Network devices such as hubs, switches, routers
    Virtual networking
    Flow control
    Network security

  • Storing
    Device (target) command and control
    Drives, subsystems, device emulation
    Block storage address space manipulation (partition management)
    Mirroring, RAID, striping, virtualization, concatenation

  • Filing
    Namespace presents data to end users and applications as files and directories (folders)
    Manages use of storage address spaces
    Metadata for identifying data: file name, owner, dates

  • Connecting, storing and filing as a complete storage system

  • NAS and SAN analysis
    NAS is filing over a network
    SAN is storing over a network
    NAS and SAN are independent technologies:
    They can be implemented independently
    They can co-exist in the same environment
    They can both operate and provide services to the same users/applications

  • Protocol analysis for NAS and SAN
    (Diagram: NAS and SAN protocol stacks layered as network, filing, connecting and storing)

  • Integrated SAN/NAS environment
    (Diagram: a NAS head is a NAS server plus a SAN initiator, combining filing, connecting and storing layers)

  • Common wiring with NAS and SAN
    (Diagram: a NAS head sharing the same wiring as SAN storing traffic)

  • The I/O path Lesson #2

  • Host hardware path components
    Processor
    Memory bus
    System I/O bus
    Storage adapter (HBA)
    Memory

  • Host software path components
    Application
    Operating system
    Filing system
    Volume manager
    Device driver
    Multi-pathing
    Cache manager

  • Network hardware path components
    Cabling: fiber optic, copper
    Switches, hubs, routers, bridges, gateways
    Port buffers, processors
    Backplane, bus, crossbar, mesh, memory

  • Network software path components
    Routing
    Fabric services
    Virtual networking
    Access and security
    Flow control

  • Subsystem path components
    Network ports
    Access and security
    Internal bus or network
    Cache
    Resource manager

  • Device and media path components
    Disk drives
    Tape drives
    Solid state devices
    Tape media

  • The end-to-end I/O path picture
    (Diagram: the full path from the host — processor, memory bus, system I/O bus, HBA, application, operating system, filing system, volume manager, device driver, multi-pathing, cache manager — through the network — cabling, routing, fabric services, virtual networking, access and security, flow control — to the subsystem — network ports, access and security, internal bus or network, cache, resource manager — and its disk and tape drives)

  • Storage subsystems Lesson #3

  • Generic storage subsystem model
    Network ports
    Cache memory
    Controller (logic + processors)
    Access control
    Resource manager
    Internal bus or network
    Power
    Storage resources

  • Redundancy for high availability
    Multiple hot-swappable power supplies
    Hot-swappable cooling fans
    Data redundancy via RAID
    Multi-path support
    Network ports to storage resources

  • Physical and virtual storage
    (Diagram: subsystem controller and resource manager — RAID, mirroring, etc. — with hot spare devices)

  • SCSI communications architectures determine SAN operations
    SCSI communications are independent of connectivity
    SCSI initiators (HBAs) generate I/O activity; they communicate with targets
    Targets have communications addresses
    Targets can have many storage resources
    Each resource is a single SCSI logical unit (LU) with a universally unique ID (UUID), sometimes referred to as a serial number
    An LU can be represented by multiple logical unit numbers (LUNs)
    Provisioning associates LUNs with LUs & subsystem ports
    A storage resource is not a LUN, it's an LU
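
    The LU/LUN distinction above can be sketched in a few lines of Python. This is an illustrative model only — the LU names, port names and LUN numbers are invented, not taken from any real subsystem:

    ```python
    # Each LU is a real storage resource with a universally unique ID.
    lus = {
        "uuid-a1": "RAID-5 volume",
        "uuid-b2": "mirrored volume",
    }

    # Provisioning maps (subsystem port, LUN) pairs onto LUs; the same LU
    # can appear behind different ports under different LUN numbers.
    provisioning = {
        ("Port S1", 0): "uuid-a1",
        ("Port S2", 1): "uuid-a1",   # same LU, different port and LUN
        ("Port S1", 1): "uuid-b2",
    }

    def resolve(port, lun):
        """Return the LU a host reaches through a given port/LUN pair."""
        return provisioning.get((port, lun))

    # Two hosts using different LUNs on different ports reach one LU:
    assert resolve("Port S1", 0) == resolve("Port S2", 1)
    ```

    The point of the sketch: the LUN is only an address in the provisioning table; the LU (keyed by UUID) is the storage resource itself.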

  • Provisioning storage
    (Diagram: controller functions mapping LUNs 0-3 onto ports S1-S4)

  • Multipathing
    (Diagram: multipathing software presenting LUN X, reached over two paths, as a single device)

  • Caching
    Read caches: 1. Recently used  2. Read-ahead
    Write caches: 1. Write-through (to disk)  2. Write-back (from cache)

  • Tape subsystems
    (Diagram: tape subsystem controller with tape drives, tape slots and a robot)

  • Subsystem management
    Out-of-band management port: management station with browser-based or network management software over Ethernet/TCP/IP
    In-band management, now with SMI-S
    (Diagram: exported storage resource on a storage subsystem, managed both in-band and out-of-band)

  • Data redundancy
    Duplication: 2n
    Parity: n+1
    Difference: d(x) = f(x) − f(x−1)

  • Duplication redundancy with mirroring
    A mirroring operator on the I/O path terminates I/Os and regenerates new I/Os down paths A and B
    Error recovery/notification
    Host-based or within a subsystem

  • Duplication redundancy with remote copy
    (Diagram: host writes propagated uni-directionally — writes only — from subsystem A to remote copies)

  • Point-in-time snapshot
    (Diagram: host I/O continues to a subsystem while a snapshot preserves a point-in-time copy)

  • Lesson #4: RAID, volume management and virtualization

  • RAID = parity redundancy
    Duplication: 2n
    Parity: n+1
    Difference: d(x) = f(x) − f(x−1)

  • History of RAID
    Late 1980s R&D project at UC Berkeley: David Patterson, Garth Gibson (independent)
    Redundant array of inexpensive disks
    Striping without redundancy was not defined (RAID 0)
    Original goals were to reduce the cost and increase the capacity of large disk storage

  • Benefits of RAID
    Capacity scaling: combine multiple address spaces as a single virtual address
    Performance through parallelism: spread I/Os over multiple disk spindles
    Reliability/availability with redundancy: disk mirroring (striping to 2 disks), parity RAID (striping to more than 2 disks)

  • Capacity scaling
    (Diagram: RAID controller (resource manager) combining storage extents 1-12 into one address space)
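
    The address-space combination in the diagram can be sketched as a simple round-robin striping map. This is an illustrative model with an assumed extent count; real controllers also use configurable stripe depths:

    ```python
    EXTENTS = 4  # number of storage extents in the array (assumed)

    def stripe_map(logical_block):
        """Round-robin striping: logical block -> (extent number, block offset).

        Consecutive logical blocks land on consecutive extents, which is how
        striping spreads I/Os over multiple spindles for parallelism.
        """
        return logical_block % EXTENTS, logical_block // EXTENTS

    # Blocks 0..3 hit extents 0..3; block 4 wraps back to extent 0.
    assert stripe_map(0) == (0, 0)
    assert stripe_map(5) == (1, 1)
    ```

    The combined capacity is the sum of the extents, and a sequential read of N blocks keeps up to EXTENTS spindles busy at once.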

  • Performance
    RAID controller (microsecond performance) in front of disk drives (millisecond performance, from rotational latency and seek time)

  • Parity redundancy
    RAID arrays use XOR for calculating parity

    Operand 1   Operand 2   XOR Result
    False       False       False
    False       True        True
    True        False       True
    True        True        False

    XOR is the inverse of itself:
    Apply XOR in the table above from right to left
    Apply XOR to any two columns to get the third

  • Reduced mode operations
    When a member is missing, data that is accessed must be reconstructed with XOR {M1, M2, M3, P}
    An array that is reconstructing data is said to be operating in reduced mode
    System performance during reduced mode operations can be significantly reduced

  • Parity rebuild
    The process of recreating data on a replacement member is called a parity rebuild: XOR {M1, M2, M3, P}
    Parity rebuilds are often scheduled for non-production hours because performance disruptions can be so severe
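
    The "XOR is the inverse of itself" property behind both reduced-mode reads and parity rebuilds can be demonstrated directly. A minimal sketch with three members and one parity block:

    ```python
    from functools import reduce

    def xor_bytes(*members):
        """XOR corresponding bytes of each member; used for parity and rebuild."""
        return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*members))

    m1, m2, m3 = b"\x0f\x10", b"\xf0\x01", b"\x33\x44"
    parity = xor_bytes(m1, m2, m3)          # parity written alongside the data

    # Reduced mode / rebuild: a missing member (say m2) is recovered as the
    # XOR of the surviving members and the parity, because XOR is its own inverse.
    rebuilt_m2 = xor_bytes(m1, m3, parity)
    assert rebuilt_m2 == m2
    ```

    The same computation serves both cases: per-read reconstruction while running in reduced mode, and the full-drive parity rebuild onto a replacement member.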

  • Hybrid RAID: 0+1 (RAID 0+1, 10)
    (Diagram: RAID controller striping blocks 1-5 across mirrored pairs of striped members)

  • Volume management and virtualization
    Storing-level functions
    Provide RAID-like functionality in host systems and SAN network systems
    Aggregation of storage resources for: scalability, availability, cost/efficiency, manageability

  • Volume management
    RAID & partition management
    Device driver layer between the kernel and storage I/O drivers
    (Stack: OS kernel → file system → volume manager → HBA drivers → HBAs)

  • Volume managers can use all available connections and resources and can span multiple SANs as well as SCSI and SAN resources
    (Diagram: a server system with HBA drivers, a SCSI HBA on a SCSI bus to a SCSI disk resource, and a SAN HBA through a SAN switch to SAN disk resources, all presented as virtual storage)

  • SAN storage virtualization
    RAID and partition management in SAN systems
    Two architectures: in-band virtualization (synchronous), out-of-band virtualization (asynchronous)

  • In-band virtualization
    (Diagram: system(s), switch or router in the I/O path exporting virtual storage from disk subsystems)

  • Out-of-band virtualization
    Distributed volume management: virtualization agents are managed from a central system in the SAN
    (Diagram: virtualization agents in hosts in front of disk subsystems)

  • Lesson #5: SAN networks

  • Fibre Channel
    The first major SAN networking technology
    Very low latency, high reliability
    Fiber optic cables, copper cables
    Extended distance
    1, 2 or 4 Gb transmission speeds
    Strongly typed

  • Fibre Channel
    A Fibre Channel fabric presents a consistent interface and set of services across all switches in a network
    Hosts and subsystems all 'see' the same resources

  • Fibre Channel port definitions
    FC ports are defined by their network role:
    N-ports: end node ports connecting to fabrics
    L-ports: end node ports connecting to loops
    NL-ports: end node ports connecting to fabrics or loops
    F-ports: switch ports connecting to N-ports
    FL-ports: switch ports connecting to N-ports or NL-ports in a loop
    E-ports: switch ports connecting to other switch ports
    G-ports: generic switch ports that can be F, FL or E ports

  • Ethernet / TCP/IP SAN technologies
    Leveraging the installed base of Ethernet and TCP/IP networks
    iSCSI: native SAN over IP
    FC/IP: FC SAN extensions over IP

  • iSCSI
    Native storage I/O over TCP/IP; new industry standard
    Locally over Gigabit Ethernet; remotely over ATM, SONET, 10Gb Ethernet
    (Stack: iSCSI → TCP → IP → MAC → PHY)

  • iSCSI equipment
    Storage NICs (HBAs), SCSI drivers
    Cables: copper and fiber
    Network systems: switches/routers, firewalls

  • FC/IP
    Extending FC SANs over TCP/IP networks
    FCIP gateways operate as virtual E-port connections
    FCIP creates a single fabric where all resources appear to be local
    (Diagram: two fabrics joined E-port to E-port by FCIP gateways across a TCP/IP LAN, MAN or WAN)

  • SAN switching & fabrics
    High-end SAN switches have latencies of 1-3 microseconds
    Transaction processing requires lowest latency; most other applications do not
    Transaction processing requires non-blocking switches: no internal delays preventing data transfers

  • Switches and directors
    Switches: 8-48 ports, redundant power supplies, single system supervisor
    Directors: 64+ ports, HA redundancy, dual system supervisors, live SW upgrades

  • SAN topologies
    Star: simplest, single hop
    Dual star: simple network + redundancy, single hop, independent or integrated fabric(s)

  • SAN topologies
    N-wide star: scalable, single hop, independent or integrated fabric(s)
    Core-edge: scalable, 1-3 hops, integrated fabric

  • SAN topologies
    Ring: scalable, integrated fabric, 1 to N/2 hops
    Ring + star: scalable, integrated fabric, 1 to 3 hops

  • Lesson #6: File systems

  • File system functions

  • (Diagram: the filing layer above the storing layer)

  • Think of the storage address space as a sequence of storage locations (a flat address space)

  • Superblocks
    Superblocks are known addresses used to find file system roots (and mount the file system)
    (Diagram: superblocks at known locations in the storage address space)

  • Filing and scaling
    File systems must have a known and dependable address space
    The fine print in scalability: how does the filing function know about the new storing address space?
    (Diagram: a 5 x 5 grid of block addresses, 1-25, expanded to a 6 x 7 grid, 1-42, after the storing address space grows)

  • SCSI's role in storage networking
    Legacy open systems server storage: physical parallel bus, independent master/slave protocol
    Storing in SANs: compatibility requirements with system software force the use of the SCSI protocol
    Storing and wiring in NAS: SCSI and ATA (IDE) used with NAS

  • Parallel SCSI bus technologies
    8-bit and 16-bit (narrow and wide)
    Single-ended, differential, low voltage differential (LVD) electronics
    5MB, 10MB, 20MB, 40MB, 80MB, 160MB, 320MB/sec; Ultra SCSI 3 is 320 MB/sec
    Distances vary from 3 to 25 meters; current LVD SCSI is 12 meters
    A bus, with address lines and data lines

  • SCSI command protocol
    Master/slave relationships: host = master, device = slave
    Independent of physical connectivity
    CDBs = Command Descriptor Blocks: command format used for both device operations and data transfers
    Serial SCSI standard created and implemented as: Fibre Channel Protocol (FCP), iSCSI

  • SCSI addressing model
    16 bus addresses with LUN sub-addressing
    (Diagram: host system addressing LUNs on a target storage subsystem)

  • SCSI daisy chain connectivity
    (Diagram: host system connected through in/out storage interfaces on a chain of target devices or subsystems)

  • SCSI arbitration
    Host system at ID 7; target IDs 6, 5, 4, 3, 2, 1, 0
    The highest-numbered address 'wins' arbitration to access the bus next
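
    The arbitration rule above reduces to a one-liner; this tiny sketch (illustrative only) shows why the host, conventionally placed at ID 7, always wins against its targets:

    ```python
    def arbitrate(requesting_ids):
        """Parallel SCSI arbitration: the highest bus ID wins the bus."""
        return max(requesting_ids)

    # Host at ID 7 contends with targets at IDs 3 and 0 and wins:
    assert arbitrate([7, 3, 0]) == 7
    ```

    This fixed priority is also why busy low-ID devices can be starved on a saturated parallel SCSI bus.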

  • SCSI resource discovery
    SCSI inquiry CDB: "tell me your resources"
    There is no domain server concept in SCSI
    (Diagram: host system querying a target storage subsystem's LUNs)

  • SCSI performance capabilities
    Overlapped I/O
    Tagged command queuing (reshuffled I/Os): write, read, status

  • Parallel SCSI bus shortcomings
    Bus length: servers and storage tied together
    Single initiator: access to data depends on the server
    A standard full of variations: change is the only constant

  • Disk drives
    Disk drive components
    Areal density
    Rotational latency
    Seek time
    Buffer memory
    Dual porting

  • Disk drives
    Complex electro-mechanical devices:
    Media
    Motor and speed control logic
    Bearings and suspension
    Actuator (arm)
    Read/write heads
    Read/write channels
    I/O controller (external interface + internal operations)
    Buffer memory
    Power

  • Disk drive areal density
    Amount of signal per unit area of media
    Keeps pace with Moore's law: areal density doubles approximately every 18 months
    Increasingly smaller magnetic particles
    Continued refinement of head technology
    Electro-magnetic physics research

  • Rotational latency
    Time for data on media to rotate underneath heads: faster rotational speed = lower rotational latency
    2 to 10 milliseconds are common
    Application-level I/O operations can generate multiple disk accesses, each impacted by rotational latency
    (Scale: memory ~nanoseconds, 10^-9; SAN switch ~microseconds, 10^-6; disk drive ~milliseconds, 10^-3)

  • Rotational latency & filing systems
    Filing systems (file systems and databases) determine contiguous data lengths
    Block size definitions: 512, 2K, 4K, 16K, 512K, 2M

  • Seek time
    Time needed to position the actuator over the track
    Equivalent to rotational latency in time

  • Disk drive buffer memory
    FIFO memory for data transfers, not cache
    Overcome mechanical latencies with faster memory storage
    Enables overlapped I/Os to multiple drives
    Performance metrics: burst transfer rate = transfer in/out of buffer memory; sustained transfer rate = transfer with track changes

  • Dual-ported disk drives
    Redundant connectivity interfaces (only FC to date)
    (Diagram: controller A and controller B each attached to the drive)

  • Forms of data redundancy
    Duplication: 2n
    Parity: n+1
    Difference: d(x) = f(x) − f(x−1)

  • Business continuity
    24 x 7 data access is the goal: 5 nines through planning and luck
    There are many potential threats: people, power, natural disasters, fires
    Redundancy is the key; multiple techniques cover different threats

  • Backup and recovery
    Removable media, usually tape: removable redundancy
    Backup systems, backup operations
    Media rotation
    Backup metadata
    Backup challenges

  • Forms of data redundancy in backup
    Duplication: 2n
    Parity: n+1
    Difference: d(x) = f(x) − f(x−1)

  • Backup and recovery tape media
    Magnetic 'ribbon': multiple layers of backing, adhesive, magnetic particles and lube/coating
    Corrodes and cracks; requires near-perfect conditions for long-term storage
    Sequential access: slow load and seek times, reasonable transfer rates
    Can hold multiple versions of files

  • Tape drives
    Two primary geometries: longitudinal tracking, helical tracking
    Highly differentiated: speeds (3MB/s to 30MB/s), capacities (20GB to 160GB), physical formats (layouts)
    Compatibility is a constant issue
    Mostly parallel SCSI

  • Tape drive formats
    Two primary geometries: longitudinal tracking, helical tracking
    Highly differentiated: speeds (3MB/s to 30MB/s), capacities (20GB to 160GB), physical formats (layouts)
    4mm, 8mm, ½-inch, DLT, LTO, 19mm
    Cartridge construction, tape lengths
    Compatibility is a constant issue
    Mostly parallel SCSI

  • Longitudinal tracking
    Parallel data tracks written lengthwise on tape by a 'stack' of heads
    Technologies: DLT, SDLT, LTO, QIC

  • Helical tracking
    Single data tracks written diagonally across tape by a rotating cylindrical head assembly
    Technologies: 4mm, 8mm, 19mm

  • Tape libraries & autoloaders
    (Diagram: tape subsystem controller with tape drives, tapes, slots and a robot)

  • Generic backup system components
    I/O bus/network subsystem
    Work scheduler & manager
    Data mover
    Metadata (database or catalog)
    Media manager (rotation scheduler)
    File system and database backup agents

  • Generic network backup system
    (Diagram: file, web, DB and APP servers with backup agents on an Ethernet network; a backup server running the work scheduler, data mover, metadata system and media manager, attached over a SCSI bus to tape drive(s) or a tape subsystem)

  • Backup operations
    Full (all data): longest backup operations, usually done over/on weekends, easiest recovery with 1 tape set
    Incremental (changed data): shortest backup operation, often done on days of the week, most involved recovery
    Differential (accumulated changed data): compromise for easier backups and recovery, max 2 tape set restore

  • Backup operations and data redundancy
    Full = duplication redundancy: one backup for complete redundancy
    Incremental = difference redundancy: multiple backups for complete redundancy
    Differential = difference redundancy: two backups for complete redundancy
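
    The restore-cost trade-off between the three strategies can be sketched as a toy model. The function and its simple day-count model are invented for illustration, not taken from any backup product:

    ```python
    def tape_sets_for_restore(strategy, days_since_full):
        """How many tape sets a complete restore needs (illustrative model)."""
        if strategy == "full":
            return 1                            # the last full backup alone
        if strategy == "incremental":
            return 1 + days_since_full          # full + every incremental since
        if strategy == "differential":
            return 2 if days_since_full else 1  # full + the latest differential
        raise ValueError(strategy)

    # Four days after the weekend full backup:
    assert tape_sets_for_restore("incremental", 4) == 5   # most involved recovery
    assert tape_sets_for_restore("differential", 4) == 2  # max 2 tape set restore
    ```

    This is exactly the compromise the slide describes: incrementals are the fastest to run but the most involved to restore, while a differential restore never needs more than two sets.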

  • Media rotations
    Change of tapes with common names and purposes: tape sets, not individual tapes
    Backup job schedules anticipate certain tapes:
    Monday, Tuesday, Wednesday, etc.
    Even days, odd days
    1st Friday, 2nd Friday, etc.
    January, February, March, etc.
    1st Qtr, 2nd Qtr, etc.

  • Media rotation problems
    What happens when the wrong tapes are used by mistake? Say you use last Friday's tape on the next Tuesday?
    Data you might need to restore sometime can be overwritten!
    Backup system logic may have to choose between:
    Not completing the backup (restore will fail)
    Deleting older backup files (restore might fail)

  • Backup metadata
    A database for locating data on tape:
    Version: create/modify date & size
    Date/time of backup job
    Tape names & backup job ID on tape
    Owner
    Delete records (don't restore deleted data!)
    Transaction processing during backup: many small files create heavy processor loads
    This is where backup fails to scale
    Backup databases need to be pruned: performance and capacity problems

  • Traditional backup challenges
    Completing backups within the backup window
    Backup window = time allotted for daily backups: starts after daily processing finishes, ends before the next day's processing begins
    Media management and administration: thousands of tapes to manage
    Audit requirements are increasing
    On/offsite movement for disaster protection
    Balancing backup time against restore complexity

  • LAN-free backup in SANs
    (Diagram: file, web, DB and APP servers with backup software on the Ethernet client LAN, each attached through a SAN switch to shared tape drives or a tape subsystem)

  • Advantages of LAN-free backup
    Consolidated resources (especially media)
    Centralized administration
    Performance: offloads LAN traffic
    Platform optimization

  • Path management
    Dual pathing
    Zoning
    LUN masking
    Reserve / release
    Routing
    Virtual networking

  • Dual pathing
    System software for redundant paths
    Path management is a super-driver process that redirects I/O traffic over a different path to the same storage resource
    Typically invoked after SCSI timeout errors
    Active/active or active/passive
    Static load balancing only

  • Zoning 1
    I/O segregation: a switch function that restricts forwarding
    Zone membership is based on port (port zoning) or address (address zoning)
    (Diagram: Zone 1 = Addr 1, Addr 2; Zone 2 = Addr 3, Addr 4; Zone 3 = Addr 5, Addr 6)

  • Zoning 2
    Address zoning allows nodes to belong to more than one zone; for example, tape subsystems can belong to all zones
    Zone 1: Addr 1 (server A), Addr 2 (disk subsystem port target address A), Addr 7 (tape subsystem port target address A)
    Zone 2: Addr 3 (server B), Addr 4 (disk subsystem port target address B), Addr 7 (tape subsystem port target address A)
    Zone 3: Addr 5 (server C), Addr 6 (disk subsystem port target address C), Addr 7 (tape subsystem port target address A)
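
    The shared-tape arrangement above boils down to a membership test: two addresses may communicate only if they share at least one zone. A minimal sketch using the zone/address names from the slide:

    ```python
    # Address zoning table from the slide: Addr 7 (the tape subsystem)
    # belongs to every zone, so all three servers can reach it.
    zones = {
        "Zone 1": {"Addr 1", "Addr 2", "Addr 7"},
        "Zone 2": {"Addr 3", "Addr 4", "Addr 7"},
        "Zone 3": {"Addr 5", "Addr 6", "Addr 7"},
    }

    def can_forward(src, dst):
        """A switch forwards only between addresses sharing a zone."""
        return any(src in members and dst in members
                   for members in zones.values())

    assert can_forward("Addr 1", "Addr 7")        # server A reaches the tape
    assert not can_forward("Addr 1", "Addr 3")    # servers A and B stay segregated
    ```

    This is why address zoning suits shared backup devices: one overlapping member gives every zone access to the tape while the servers and their disk ports remain isolated from each other.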

  • Zoning 3: changing zones
    Zones (or zone memberships) can be 'swapped' to reflect different operating environments

  • LUN masking
    Restricts subsystem access to defined servers
    Target- or LUN-level masking
    Non-response to SCSI inquiry CDB
    Can be used with zoning for multi-level control

  • Reserve / release
    SCSI function, typically implemented in SCSI/SAN storage routers
    Used to reserve tape resources during backups: tape drives, robotics
    (Diagram: first access reserves the device through the storage router; a second access is blocked)

  • Routing
    Path decisions made by switches
    Large TCP/IP networks require routing in switches instead of in end nodes
    Looping is avoided by spanning tree algorithms that ensure a single path
    FSPF (based on OSPF) is the path selection technology for Fibre Channel
    Routing is not HA failover technology

  • Name space
    The name space is the representation of data to end users and applications
    Identification and searching
    Organizational structure: directories or folders in file systems, rows and columns in databases
    Associations of data: database indexing, file system linking

  • Metadata and access control (security)
    Metadata is the description of data: intrinsic information and accounting information
    Access control determines how (or if) a user or application can use the data, for example read-only
    Access control is often incorporated with metadata but can be a separate function

  • Data has attributes that describe it
    Storage is managed based on data attributes: activity info, owner info, capacity info, whatever info
    Data can have security associated with it
    Data can be erased, copied, renamed, etc.

  • Locking
    Managing multiple users or applications with concurrent access to data
    Locking has been done in multi-user systems for decades
    Locking in NAS has been a central issue: NFS advisory locks provide no guarantees; CIFS oplocks are enforced
    Lock persistence

  • File systems organize data in blocks
    Blocks are SCSI's address abstraction layer
    Filing functions use block addresses to communicate with storing-level entities
    Filing systems manage the utilization of block address spaces (space management)
    Block address structures typically are uniform
    Block address boundaries are static for efficient and error-free space management

  • Journaling
    File system structure has to be verified when mounting (fsck); fsck can take hours on large file systems
    Journaling file systems keep a log of file system updates
    Like a database log file, journal updates can be checked against actual structures
    Incomplete updates can be rolled forward or backward to maintain system integrity
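
    The roll-forward/roll-back behavior can be sketched in miniature. The journal-entry format and field names here are invented for illustration; real journaling file systems log structured transactions:

    ```python
    def replay_journal(journal, structures):
        """Journal replay sketch: committed entries roll forward,
        uncommitted (incomplete) entries are discarded (rolled back)."""
        for entry in journal:
            if entry.get("committed"):
                structures[entry["block"]] = entry["data"]  # roll forward
            # an entry without a commit record never reaches the structures,
            # so a crash mid-update cannot leave them inconsistent
        return structures

    fs = {"inode-1": "old"}
    log = [
        {"block": "inode-1", "data": "new", "committed": True},
        {"block": "inode-2", "data": "half-written", "committed": False},
    ]
    assert replay_journal(log, fs) == {"inode-1": "new"}
    ```

    Because mount-time recovery only replays the (short) journal instead of scanning every structure, it replaces the hours-long fsck pass described above.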

  • V/VM and filing
    Filing is a filing function; virtualization & volume management (V/VM) is a storing function
    V/VM manipulates block addresses and creates real and virtual address spaces
    Filing manages the placement of data in the address spaces exported by virtualization