MS CLOUD DB - AZURE SQL DB Fault Tolerance by Subha Vasudevan Christina Burnett.
MS CLOUD DB - AZURE SQL DB Fault Tolerance
by Subha Vasudevan and Christina Burnett
Windows AZURE Cloud Services
AZURE Storage Services
● Blob
● Table
● Queue
● File Storage
Azure SQL Database
Database as a Service
● Predictable performance
● Scalability
● Business continuity
● Data protection
● Zero administration
Azure DB
Fault Tolerance and Failure
Why is it so important?
● Supports concurrency control
● Provides transactional guarantees (ACID)
Why does it fail?
● Inevitable software/hardware failures
● Human errors
Fault Tolerant SQL Database
● Redundant computers rather than redundant components.
● Fault tolerance at the highest level of the stack: a fault-tolerant DB rather than fault-tolerant DB servers.
● Database replication across fault zones.
● Failure Detection and Failover.
Fault Zones/Domains
Each fault zone is a fully independent physical sub-system with its own server racks and network routers.
Assigning Storage to a Fault Domain
Proximity vs. Isolation
● Proximity of replicas affects network latency.
● Isolation helps ensure availability of replicas in the event of a failure.
Selection of replica location
● MDS (maximum distance separable) codes
● (N, K) coding
(Banerjee, Das, Mazumder, Derakhshandeh, & Sen, 2014)
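The (N, K) coding idea can be illustrated with the simplest MDS code. An (N, K) MDS code splits data into K fragments and adds N - K redundant ones so that any K of the N fragments rebuild the original, tolerating the loss of N - K fault domains. The sketch assumes N = 3, K = 2 and a single XOR parity fragment, one fragment per fault domain; it illustrates the property only and is not the scheme of Banerjee et al. or of Azure. `encode`, `decode` and `xor_bytes` are hypothetical names.

```python
from typing import List, Optional


def xor_bytes(x: bytes, y: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(x, y))


def encode(data: bytes) -> List[bytes]:
    """Split data into K = 2 data fragments plus one XOR parity fragment (N = 3)."""
    if len(data) % 2:
        data += b"\x00"  # pad to an even length; decode() trims it off again
    half = len(data) // 2
    a, b = data[:half], data[half:]
    return [a, b, xor_bytes(a, b)]  # each fragment goes to a different fault domain


def decode(fragments: List[Optional[bytes]], length: int) -> bytes:
    """Rebuild the original bytes from any 2 of the 3 fragments (None = lost domain)."""
    if sum(f is None for f in fragments) > 1:
        raise ValueError("more than N - K = 1 fragment lost; data is unrecoverable")
    a, b, parity = fragments
    if a is None:
        a = xor_bytes(b, parity)
    if b is None:
        b = xor_bytes(a, parity)
    return (a + b)[:length]


data = b"replica"
fragments = encode(data)
fragments[0] = None  # the fault domain holding the first fragment fails
print(decode(fragments, len(data)))  # b'replica'
```

Each fragment is half the size of the data, so the code stores N/K = 1.5 times the data while surviving one lost domain, compared with 3 times for full triple replication. That storage trade-off is what the cited paper examines.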
Database Replication
There are three copies of each DB: a primary and two secondary replicas. The primary database performs the transactions and sends the updates and DDL to the replicas.
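A minimal sketch of the replication stream, assuming each change (data update or DDL) is tagged with an increasing log sequence number (LSN) and that every secondary applies changes in exactly that order, so all three copies converge to the same state. `Replica`, `Primary` and `execute` are hypothetical names, and commit and acknowledgement are deliberately left out.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Replica:
    name: str
    log: List[Tuple[int, str]] = field(default_factory=list)  # (LSN, operation)

    def apply(self, lsn: int, op: str) -> None:
        # Changes must arrive in LSN order so every copy ends up in the same state.
        expected = len(self.log) + 1
        if lsn != expected:
            raise ValueError(f"{self.name}: expected LSN {expected}, got {lsn}")
        self.log.append((lsn, op))


@dataclass
class Primary(Replica):
    secondaries: List[Replica] = field(default_factory=list)

    def execute(self, op: str) -> int:
        """Run a data update or DDL statement locally, then stream it to each secondary."""
        lsn = len(self.log) + 1
        self.apply(lsn, op)
        for secondary in self.secondaries:
            secondary.apply(lsn, op)
        return lsn


s1, s2 = Replica("s1"), Replica("s2")
primary = Primary("p", secondaries=[s1, s2])
primary.execute("CREATE TABLE orders (id INT)")  # DDL
primary.execute("INSERT INTO orders VALUES (1)")  # data update
assert primary.log == s1.log == s2.log
```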
Database Replication
Each replica is stored in a different fault zone.
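One way to honor this rule when a database is created, sketched under the assumption that the cluster is described as a map from fault zone to available nodes and that any node in a zone is acceptable. `place_replicas` is a hypothetical helper, not an Azure API.

```python
from typing import Dict, List, Tuple


def place_replicas(nodes_by_zone: Dict[str, List[str]],
                   copies: int = 3) -> List[Tuple[str, str]]:
    """Return (fault zone, node) pairs, one per replica, never reusing a zone."""
    zones = [zone for zone in sorted(nodes_by_zone) if nodes_by_zone[zone]]
    if len(zones) < copies:
        raise ValueError(f"need {copies} fault zones with free nodes, found {len(zones)}")
    # Simplification: take the first listed node in each chosen zone.
    return [(zone, nodes_by_zone[zone][0]) for zone in zones[:copies]]


cluster = {"zone-a": ["a1", "a2"], "zone-b": ["b1"], "zone-c": [], "zone-d": ["d1"]}
print(place_replicas(cluster))  # [('zone-a', 'a1'), ('zone-b', 'b1'), ('zone-d', 'd1')]
```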
Quorum-Based Commit
● At least two copies are required.
● Data must be written to the primary and at least one secondary before it is considered committed.
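With three replicas, "two copies" is a majority: the smallest majority of n replicas is n // 2 + 1, which is 2 for n = 3. Any two majorities of the same replica set share at least one replica, so a committed write survives the loss of any single copy. The sketch assumes that general majority rule and, as the slide states, that the primary must always be one of the copies; the function names are hypothetical.

```python
def quorum_size(replica_count: int) -> int:
    """Smallest majority of a replica set: floor(n / 2) + 1."""
    return replica_count // 2 + 1


def is_committed(primary_hardened: bool, secondary_acks: int, replica_count: int = 3) -> bool:
    """A write commits only once the primary and enough secondaries hold it."""
    if not primary_hardened:
        return False  # the primary must always be one of the copies
    return 1 + secondary_acks >= quorum_size(replica_count)


assert quorum_size(3) == 2
assert is_committed(True, 1)       # primary + one secondary: committed
assert not is_committed(True, 0)   # primary alone: not yet committed
```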
Dynamic Quorum
PRIMARY FAILS
When the server containing the primary database fails, one of the secondary replicas is promoted to primary.
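The slide does not say which secondary is promoted. A reasonable sketch, assuming each secondary reports the highest LSN it has hardened, promotes the most up-to-date one. Because every committed write reached the primary and at least one secondary, and logs are applied in order, the most up-to-date surviving secondary holds every committed transaction when only the primary is lost. `promote_new_primary` is a hypothetical helper.

```python
from typing import Dict


def promote_new_primary(secondary_lsns: Dict[str, int]) -> str:
    """Pick the surviving secondary that has hardened the highest LSN."""
    if not secondary_lsns:
        raise RuntimeError("no surviving secondary to promote")
    # Ties on LSN are broken by node name so the choice is deterministic.
    return max(secondary_lsns, key=lambda node: (secondary_lsns[node], node))


print(promote_new_primary({"s1": 41, "s2": 42}))  # s2
```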
Dynamic Quorum
SECONDARY FAILS
When a server containing secondary replicas fails, new replicas are created.
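A sketch of the repair step, assuming replicas are tracked as a node-to-fault-zone map and that the replacement must land in a fault zone not already used by a surviving replica. `replace_failed_secondary` is hypothetical; seeding the new copy from the primary is only indicated by a comment.

```python
from typing import Dict, List


def replace_failed_secondary(replicas: Dict[str, str], failed: str,
                             spare_nodes: Dict[str, List[str]]) -> Dict[str, str]:
    """replicas maps node -> fault zone; returns the repaired map with a fresh secondary."""
    survivors = {node: zone for node, zone in replicas.items() if node != failed}
    used_zones = set(survivors.values())
    for zone in sorted(spare_nodes):
        if zone not in used_zones and spare_nodes[zone]:
            new_node = spare_nodes[zone][0]
            # A real system would now seed new_node with a copy of the primary's data.
            return {**survivors, new_node: zone}
    raise RuntimeError("no spare node in an unused fault zone")


replicas = {"n1": "zone-a", "n2": "zone-b", "n3": "zone-c"}  # n1 is the primary
spares = {"zone-a": ["n4"], "zone-c": ["n5"], "zone-d": ["n6"]}
print(replace_failed_secondary(replicas, "n3", spares))
# {'n1': 'zone-a', 'n2': 'zone-b', 'n5': 'zone-c'}
```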
Transactional Consistency
● Updates are persisted in log
● Primary DB streams updates to secondaries
● Secondaries are asked to commit first
● Secondaries return acknowledgement
● Primary commits after quorum
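The five steps above can be traced in a short sketch. It assumes synchronous, one-at-a-time streaming and a quorum of two (the primary plus one secondary); a real engine streams to secondaries concurrently and must also handle a write that never reaches quorum, which this sketch only reports as `False`. `Secondary` and `commit` are hypothetical names.

```python
from typing import List


class Secondary:
    def __init__(self, name: str, healthy: bool = True) -> None:
        self.name = name
        self.healthy = healthy
        self.log: List[str] = []

    def harden(self, record: str) -> bool:
        """Persist a streamed log record and acknowledge it (False = no ack)."""
        if not self.healthy:
            return False
        self.log.append(record)
        return True


def commit(record: str, primary_log: List[str],
           secondaries: List[Secondary], quorum: int = 2) -> bool:
    primary_log.append(record)                         # 1. update persisted in the primary's log
    acks = sum(s.harden(record) for s in secondaries)  # 2-4. stream, harden on secondaries, acknowledge
    return 1 + acks >= quorum                          # 5. primary commits only after quorum


log: List[str] = []
print(commit("T1", log, [Secondary("s1"), Secondary("s2", healthy=False)]))  # True
```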
Recovering Transactions
If a secondary fails, on restart it checks with the primary for any transactions it may have missed.
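A sketch of that catch-up exchange, assuming both logs are lists of (LSN, change) pairs with strictly increasing LSNs: the restarted secondary reports its last LSN and the primary returns everything after it. `records_missed` and `catch_up` are hypothetical names.

```python
from typing import List, Tuple

LogRecord = Tuple[int, str]  # (LSN, change)


def records_missed(primary_log: List[LogRecord], last_lsn: int) -> List[LogRecord]:
    """What the primary sends back when a restarted secondary reports its last LSN."""
    return [rec for rec in primary_log if rec[0] > last_lsn]


def catch_up(secondary_log: List[LogRecord], primary_log: List[LogRecord]) -> None:
    last_lsn = secondary_log[-1][0] if secondary_log else 0
    secondary_log.extend(records_missed(primary_log, last_lsn))


primary_log = [(1, "INSERT a"), (2, "INSERT b"), (3, "UPDATE a")]
secondary_log = [(1, "INSERT a")]  # secondary was down while LSNs 2 and 3 committed
catch_up(secondary_log, primary_log)
assert secondary_log == primary_log
```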
Failure Detection
● The database is paired with the SQL Engine to detect failures in the neighborhood.
● Distributed failure detection: every node is monitored by several neighbors.
● Efficient, localized, and fast.
● Prevents ping storms and avoids delayed failure detection.
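A minimal sketch of neighbor-based detection, assuming nodes sit on a logical ring, each node is watched by the k nodes that follow it, and a node is declared failed only when a majority of its watchers miss its heartbeat. The ring layout and the majority vote are illustrative assumptions, not Azure's documented algorithm; the function names are hypothetical.

```python
from typing import Dict, List, Set


def monitors_of(nodes: List[str], k: int) -> Dict[str, List[str]]:
    """Assign each node the k nodes that follow it on a ring as its watchers."""
    n = len(nodes)
    if not 0 < k < n:
        raise ValueError("k must be between 1 and len(nodes) - 1")
    return {nodes[i]: [nodes[(i + j) % n] for j in range(1, k + 1)] for i in range(n)}


def detect_failures(nodes: List[str], k: int, missed: Dict[str, Set[str]]) -> List[str]:
    """missed[w] is the set of nodes whose heartbeat watcher w did not receive in time.
    A node is declared failed when a majority of its k watchers report it."""
    failed = []
    for node, watchers in monitors_of(nodes, k).items():
        votes = sum(node in missed.get(w, set()) for w in watchers)
        if votes > k // 2:
            failed.append(node)
    return failed


nodes = ["n0", "n1", "n2", "n3", "n4"]
print(monitors_of(nodes, 3)["n2"])          # ['n3', 'n4', 'n0']
reports = {"n3": {"n2"}, "n4": {"n2"}}      # two of n2's three watchers missed its heartbeat
print(detect_failures(nodes, 3, reports))   # ['n2']
```

Each node sends heartbeats to only k watchers, so n nodes exchange about n·k heartbeats per round instead of n·(n - 1) for all-to-all pinging, which is one way to avoid ping storms. Requiring a majority keeps a single slow watcher from triggering a false failover.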
Failover
● If the primary node fails unexpectedly, a standby backup node automatically assumes the role of primary.
● Managed by the GPM (Global Partition Manager).
● The distributed fabric maintains a global map.
● The GPM maintains the health, state, and location of every DB.
● The fabric informs the GPM of any node failure.
● The GPM reconfigures the assignment of primary and secondary DBs on the failed node.
[Figure: Client → Gateway Processes → nodes hosting a mix of primary (p) and secondary (s) replicas]
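A toy model of the GPM's bookkeeping, assuming the global map is a dictionary from database to an ordered replica list whose first entry is the primary. When the fabric reports a failed node, the map drops that node and, if it held a primary, the next listed secondary takes over; a gateway process looks up the current primary before routing a client. The class and method names are hypothetical, and a real manager would choose the most up-to-date secondary and schedule replacement replicas.

```python
from typing import Dict, List


class GlobalPartitionManager:
    """Toy stand-in for the GPM: tracks where every DB's replicas live."""

    def __init__(self) -> None:
        # db -> ordered list of nodes; index 0 is the primary, the rest are secondaries.
        self.placement: Dict[str, List[str]] = {}

    def add_database(self, db: str, nodes: List[str]) -> None:
        self.placement[db] = list(nodes)

    def primary_of(self, db: str) -> str:
        """What a gateway process would ask before routing a client request."""
        return self.placement[db][0]

    def on_node_failure(self, node: str) -> None:
        """Called when the fabric reports a failed node: drop its replicas everywhere."""
        for db, nodes in self.placement.items():
            if node in nodes:
                nodes.remove(node)  # if it was the primary, the next secondary moves to index 0
                if not nodes:
                    raise RuntimeError(f"{db} has lost every replica")


gpm = GlobalPartitionManager()
gpm.add_database("db1", ["n1", "n2", "n3"])  # n1 hosts the primary of db1
gpm.add_database("db2", ["n2", "n3", "n1"])  # n2 hosts the primary of db2
gpm.on_node_failure("n1")
print(gpm.primary_of("db1"), gpm.primary_of("db2"))  # n2 n2
```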
Fault Tolerance in Application Design
Data Failure
● application-specific
● catastrophic consequences
● not addressed by Azure
Computational Failure
● addressed by Azure
● controlled by the application
Monitoring and Logging
● diagnosis
● debugging (Li et al., 2010)
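On the application side, a common way to control computational failures is to retry transient errors (such as a connection dropped during failover) with exponential backoff while logging each failure for later diagnosis. The sketch assumes a hypothetical `TransientError` exception and a hypothetical `with_retries` helper; real client libraries report transient faults through their own error types and codes.

```python
import logging
import time
from typing import Callable, TypeVar

T = TypeVar("T")
log = logging.getLogger("app.retry")


class TransientError(Exception):
    """Hypothetical marker for errors worth retrying (e.g. a dropped connection)."""


def with_retries(op: Callable[[], T], attempts: int = 5, base_delay: float = 0.5) -> T:
    for attempt in range(1, attempts + 1):
        try:
            return op()
        except TransientError as exc:
            log.warning("attempt %d/%d failed: %s", attempt, attempts, exc)  # kept for diagnosis
            if attempt == attempts:
                raise  # give up and surface the last error to the caller
            time.sleep(base_delay * 2 ** (attempt - 1))  # exponential backoff: 0.5 s, 1 s, 2 s, ...
    raise AssertionError("unreachable")
```

Usage: `with_retries(lambda: run_query("SELECT 1"))`, where `run_query` stands in for whatever data-access call the application makes.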
References
Fault-tolerance in Windows Azure SQL Database. (2012, July 30). Retrieved from http://azure.microsoft.com/blog/2012/07/30/fault-tolerance-in-windows-azure-sql-database/
Banerjee, S., Das, A., Mazumder, A., Derakhshandeh, Z., & Sen, A. (2014). On the impact of coding parameters on storage requirement of region-based fault tolerant distributed file system design. Paper presented at the Computing, Networking and Communications (ICNC), 2014 International Conference On, 78-82. doi:10.1109/ICCNC.2014.6785309
Li, J., Humphrey, M., Cheah, Y.-W., Ryu, Y., Agarwal, D., Jackson, K., & van Ingen, C. (2010). Fault tolerance and scaling in e-science cloud applications: Observations from the continuing development of MODIS Azure. Paper presented at the E-Science (E-Science), 2010 IEEE Sixth International Conference On, 246-253. doi:10.1109/eScience.2010.47
Rajan, D., Canino, A., Izaguirre, J. A., & Thain, D. (2011). Converting a high performance application to an elastic cloud application. Paper presented at the Cloud Computing Technology and Science (CloudCom), 2011 IEEE Third International Conference On, 383-390. doi:10.1109/CloudCom.2011.58
QUESTIONS?