The Sisyphus Database Retrieval Software Performance Antipattern

Robert F. Dugan Jr.
Dept. of Computer Science
Stonehill College
Easton, MA 02357 USA
bdugan@stonehill.edu

Ali Shokoufandeh
Dept. of Math/Computer Science
Drexel University
Philadelphia, PA 19104 USA
ashokouf@mcs.drexel.edu

Ephraim P. Glinert
Dept. of Computer Science
Rensselaer Polytechnic Inst.
Troy, NY 12180 USA
glinert@cs.rpi.edu

Third International Workshop on Software and Performance
July 24-26, 2002
Rome, Italy
Overview
  Software Performance Antipatterns
  Sisyphus Database Retrieval Antipattern
  Solutions
  Experiments
  Real World Challenges
  Future Work
Software Performance Antipatterns
Software Design Patterns: effective solution to a common software design problem
  singleton, proxy, iterator, observer/listener [Gamma et al. 1995]
Software Design Antipatterns: “A commonly occurring solution to a problem that generates decidedly negative consequences.” [Brown et al. 1998]
  “god” class, dead code, class proliferation
Software Performance Antipatterns
“Software Performance Antipatterns”, Smith and Williams, WOSP 2000
  “God” Class
  Circuitous Treasure Hunt
  Excessive Dynamic Allocation
  One Lane Bridge
A commonly occurring solution to a software design problem that generates decidedly negative performance consequences
Sisyphus Database Retrieval Antipattern
1) Issue request to display list subset
2) Issue database query to retrieve entire list
3) Return query results
4) Determine number of items displayed
5) Iterate through result set discarding all items until first item to display is reached
6) Continue through result set rendering items for display until last item to display is reached
7) Discard remaining result set
8) Display subset
Examples: email, address book, search results
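
The eight steps above can be sketched in a few lines. This is only an illustration under stated assumptions: Python's sqlite3 module stands in for the database server, and the names make_contacts and sisyphus_page, along with the generated rows, are hypothetical and do not come from the slides.

```python
import sqlite3

def make_contacts(n_rows, userid=45):
    """Build an in-memory contacts table (hypothetical data) for one user."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table contacts "
        "(userid integer, lname text, fname text, phone text, address text)"
    )
    conn.executemany(
        "insert into contacts values (?, ?, ?, ?, ?)",
        [(userid, f"Last{i:05d}", f"First{i}", f"555-{i:04d}", f"{i} Main St")
         for i in range(n_rows)],
    )
    return conn

def sisyphus_page(conn, userid, start, size):
    """Antipattern: query the entire list, then walk it to find the subset."""
    cursor = conn.execute(
        "select lname, fname, phone, address from contacts "
        "where userid = ? order by lname",
        (userid,),
    )
    page, discarded = [], 0
    for position, row in enumerate(cursor):
        if position < start:
            discarded += 1        # step 5: discard tuples before the first item
        elif position < start + size:
            page.append(row)      # step 6: keep tuples that will be rendered
        else:
            break                 # step 7: abandon the rest of the result set
    return page, discarded

conn = make_contacts(200)
page, discarded = sisyphus_page(conn, userid=45, start=80, size=20)
```

Every call repeats the full query and the discard loop, so the work grows with the start position of the subset even though only 20 rows are ever displayed.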
Sisyphus Database Retrieval Antipattern
Key to this antipattern is that the processing necessary to retrieve the entire list, from which a subset is extracted, must be repeated.
Recalls the Greek myth of Sisyphus, damned for all eternity to push a stone up a hill only to watch it roll back down again.
[Image: Sisyphus by Franz von Stuck]
Sisyphus Database Retrieval Antipattern
[Sequence diagram: Browser, Web Server, Database Server]
  view list subset
  process http request
  determine list subset to return to browser
  query to retrieve entire list
  process query
  loop: return tuple; discard tuple until first subset item
  loop: return tuple; render tuple until last subset item
  render html
  html response
  discard result set
Three tier system selected.
SPE techniques used to model and analyze antipattern.

Resource Analysis
Disk
  Database Server: number of disk I/O operations rises linearly with the size of the total list. I/O reduction is possible with database caching, but at the cost of memory resource contention as the system scales to more users
CPU
  Browser: linear dependence on list subset
  Web Server: linear dependence on start position of subset within result set; linear dependence on list subset
  Database Server: log-linear in the size of the total list; linear dependence on start position of subset within result set; linear dependence on list subset
Network
  Browser-Web Server: linear dependence on list subset
  Web-Database Server: linear dependence on start position of subset within result set
Solutions: Index and Rownum
Multi-attribute index and rownum

  select lname, fname, phone, address
  from contacts
  where userid=45 and rownum <= 50

Advantages:
  processing beyond subset eliminated
  sorting result set eliminated
Disadvantages:
  linear dependence on subset start position
  multi-attribute index prevents dynamic sorting
  no total list size
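
A minimal sketch of this solution, assuming sqlite3 as a stand-in database. SQLite has no rownum pseudocolumn, so `limit` is used here to play the same role; the index name contacts_user_lname, the function rownum_page, and the generated rows are hypothetical.

```python
import sqlite3

# In-memory stand-in for the slide's contacts table (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table contacts "
    "(userid integer, lname text, fname text, phone text, address text)"
)
# Multi-attribute index: rows for one user can be read in lname order.
conn.execute("create index contacts_user_lname on contacts (userid, lname)")
conn.executemany(
    "insert into contacts values (?, ?, ?, ?, ?)",
    [(45, f"Last{i:05d}", f"First{i}", f"555-{i:04d}", f"{i} Main St")
     for i in range(200)],
)

def rownum_page(conn, userid, start, size):
    """Fetch only the first start + size rows, then drop the first `start`."""
    rows = conn.execute(
        "select lname, fname, phone, address from contacts "
        "where userid = ? order by lname limit ?",
        (userid, start + size),
    ).fetchall()
    # Rows before the subset are still transferred and discarded, which is
    # the remaining linear dependence on the subset start position.
    return rows[start:]

page = rownum_page(conn, 45, 30, 20)
```

Nothing past the end of the subset is retrieved, but the slice at the end shows why the start position still matters.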
Solutions: Upper/Lower Bound
Multi-attribute index, lower bound attribute value, rownum

  select lname, fname, phone, address
  from contacts
  where userid=45 and rownum <= SUBSETSIZE
  and lname > ENDSUBSETLASTNAME

Advantages:
  linear dependence on list subset size
Disadvantages:
  lower bound attribute must be unique
  multi-attribute index prevents dynamic sorting
  no total list size
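
The lower-bound idea can be sketched as follows, again assuming sqlite3 with `limit` standing in for rownum. The function next_page and the sample rows are hypothetical; the caller is assumed to remember the last lname of the previous subset.

```python
import sqlite3

# In-memory stand-in for the slide's contacts table (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table contacts "
    "(userid integer, lname text, fname text, phone text, address text)"
)
conn.execute("create index contacts_user_lname on contacts (userid, lname)")
conn.executemany(
    "insert into contacts values (?, ?, ?, ?, ?)",
    [(45, f"Last{i:05d}", f"First{i}", f"555-{i:04d}", f"{i} Main St")
     for i in range(200)],
)

def next_page(conn, userid, last_lname, size):
    """Return the `size` rows that follow last_lname in lname order."""
    return conn.execute(
        "select lname, fname, phone, address from contacts "
        "where userid = ? and lname > ? order by lname limit ?",
        (userid, last_lname, size),
    ).fetchall()

# The empty string sorts before every name, so it requests the first subset.
first = next_page(conn, 45, "", 20)
# The last name of one subset becomes the lower bound for the next one.
second = next_page(conn, 45, first[-1][0], 20)
```

Because the bound is a strict comparison on lname, two contacts sharing a last name across a subset boundary would be skipped, which is why the lower bound attribute must be unique.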
Solutions: Sequence Numbers
Each list element assigned unique sequence number
Combination of user and sequence number is unique

  select lname, fname, phone, address
  from contacts
  where userid=45 and lnameSeq >= subListStart
  and lnameSeq <= subListEnd

Advantages:
  linear dependence on list subset size
  no restriction on duplicate list elements
  trivial to compute list size
  multiple sorting criteria possible
Disadvantages:
  cost of maintaining sequence number
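
A sketch of the sequence number approach, assuming sqlite3 and zero-based sequence numbers. The helpers seq_page, list_size, and insert_contact are hypothetical; in particular, the renumbering done on insert is only one possible way to maintain lnameSeq and is not taken from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table contacts (userid integer, lnameSeq integer, "
    "lname text, fname text, phone text, address text)"
)
conn.execute("create index contacts_user_seq on contacts (userid, lnameSeq)")
conn.executemany(
    "insert into contacts values (?, ?, ?, ?, ?, ?)",
    [(45, i, f"Last{i:05d}", f"First{i}", f"555-{i:04d}", f"{i} Main St")
     for i in range(100)],
)

def seq_page(conn, userid, sub_list_start, sub_list_end):
    """Range query on the sequence number: touches only the subset."""
    return conn.execute(
        "select lname, fname, phone, address from contacts "
        "where userid = ? and lnameSeq >= ? and lnameSeq <= ? "
        "order by lnameSeq",
        (userid, sub_list_start, sub_list_end),
    ).fetchall()

def list_size(conn, userid):
    """List size falls out of the largest sequence number."""
    return conn.execute(
        "select coalesce(max(lnameSeq) + 1, 0) from contacts where userid = ?",
        (userid,),
    ).fetchone()[0]

def insert_contact(conn, userid, lname, fname, phone, address):
    """Maintenance cost: every later element must be renumbered."""
    (seq,) = conn.execute(
        "select count(*) from contacts where userid = ? and lname < ?",
        (userid, lname),
    ).fetchone()
    conn.execute(
        "update contacts set lnameSeq = lnameSeq + 1 "
        "where userid = ? and lnameSeq >= ?",
        (userid, seq),
    )
    conn.execute(
        "insert into contacts values (?, ?, ?, ?, ?, ?)",
        (userid, seq, lname, fname, phone, address),
    )

page = seq_page(conn, 45, 40, 59)
insert_contact(conn, 45, "Last00010a", "New", "555-9999", "1 Elm St")
```

Reads depend only on the subset size, while the update inside insert_contact shows where the maintenance cost lands.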
Solutions: Caching
Amortize cost of full list retrieval across subset views
List resides outside database after first subset retrieval
Advantages:
  useful when listSize/subSetViews <= subListSize, e.g. list shared across multiple users
  resources eliminated completely after first retrieval
  linear dependence on list subset size
  compute total list size once
Disadvantages:
  potentially significant response time for first retrieval
  cache state maintained between requests, complicating scaling
  cache consistency
  tier memory required for cache
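
A minimal sketch of the caching idea, assuming the cache lives in the web tier as a per-user dictionary. The class ListCache, its db_queries counter, and the helper caching_pays_off are hypothetical names; the helper simply encodes the listSize/subSetViews <= subListSize rule of thumb from the slide.

```python
import sqlite3

class ListCache:
    """Keeps each user's full list outside the database after the first view."""

    def __init__(self, conn):
        self.conn = conn
        self.lists = {}        # userid -> full, ordered list
        self.db_queries = 0    # how many times the database was actually hit

    def page(self, userid, start, size):
        if userid not in self.lists:
            # First retrieval pays the full-list cost once.
            self.db_queries += 1
            self.lists[userid] = self.conn.execute(
                "select lname, fname, phone, address from contacts "
                "where userid = ? order by lname",
                (userid,),
            ).fetchall()
        return self.lists[userid][start:start + size]

    def total(self, userid):
        """Total list size, available once the list has been cached."""
        return len(self.lists.get(userid, []))

    def invalidate(self, userid):
        """Must be called when the list changes (cache consistency)."""
        self.lists.pop(userid, None)

def caching_pays_off(list_size, subset_views, sublist_size):
    """Rule of thumb from the slide: listSize/subSetViews <= subListSize."""
    return list_size / subset_views <= sublist_size

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table contacts "
    "(userid integer, lname text, fname text, phone text, address text)"
)
conn.executemany(
    "insert into contacts values (?, ?, ?, ?, ?)",
    [(45, f"Last{i:05d}", f"First{i}", f"555-{i:04d}", f"{i} Main St")
     for i in range(200)],
)
cache = ListCache(conn)
first = cache.page(45, 0, 20)
second = cache.page(45, 20, 20)
```

The dictionary is state held between requests, which is the scaling and consistency burden listed under the disadvantages.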
Experiments

Subset   Antipattern   Antipattern   Seq. Number   Seq. Number
Start    get/discard   get/render    get/discard   get/render
         tuple (ms)    tuple (ms)    tuple (ms)    tuple (ms)
0        0.26          35.22         ---           20.93
20       31.75         19.99         ---           21.05
40       48.32         22.06         ---           21.43
80       81.02         20.11         ---           22.01
160      144.39        19.96         ---           23.14
320      278.29        21.1          ---           25.33
640      548.97        20.78         ---           30.13
1280     1057.9        20.29         ---           39.33
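
One way to read the table is to fit a straight line to each timing column against the subset start. The sketch below does an ordinary least-squares fit on the values copied from the table; the function name least_squares_slope is hypothetical, and the fit is a rough reading aid rather than part of the original analysis.

```python
# Values copied from the Experiments table (times in ms).
subset_start = [0, 20, 40, 80, 160, 320, 640, 1280]
antipattern_discard_ms = [0.26, 31.75, 48.32, 81.02, 144.39, 278.29, 548.97, 1057.9]
seq_render_ms = [20.93, 21.05, 21.43, 22.01, 23.14, 25.33, 30.13, 39.33]

def least_squares_slope(xs, ys):
    """Slope of the ordinary least-squares line through (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

# Extra milliseconds per unit increase in the subset start position.
antipattern_slope = least_squares_slope(subset_start, antipattern_discard_ms)
seq_slope = least_squares_slope(subset_start, seq_render_ms)
print(f"antipattern get/discard: {antipattern_slope:.3f} ms per start position")
print(f"seq. number get/render:  {seq_slope:.3f} ms per start position")
```

The antipattern's get/discard time grows by roughly 0.8 ms for each position the subset start moves down the list, while the sequence number solution's time grows far more slowly over the same range.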
Real World Challenges
  eCal provides a web based calendar/address book system
  Antipattern uncovered by performance engineering
  Resistance to design change from database and application development teams because of schedules
  Experimental evidence reinforced antipattern as problem for lists above 100 elements
  Debate over average list size per user
  List subset handling logic encapsulated in stored procedures, isolating application logic
  Monitor average list sizes in production; when average exceeds 100, sequence number solution used
Future Work
Software Performance Antipattern Workshop
  Great opportunity for veteran performance engineers from industry to contribute
  Compendium of Antipatterns much like Addison-Wesley’s Design Patterns book
  Coming soon (WOSP 2003?, SIGMETRICS?)
Caching Techniques