The Sisyphus Database Retrieval Software Performance Antipattern Robert F. Dugan Jr. Dept. of...


Page 1: The Sisyphus Database Retrieval Software Performance Antipattern Robert F. Dugan Jr. Dept. of Computer Science Stonehill College Easton, MA 02357 USA bdugan@stonehill.edu.

The Sisyphus Database Retrieval Software Performance Antipattern

Robert F. Dugan Jr.
Dept. of Computer Science
Stonehill College
Easton, MA 02357 USA
bdugan@stonehill.edu

Ali Shokoufandeh
Dept. of Math/Computer Science
Drexel University
Philadelphia, PA 19104 USA
[email protected]@mcs.drexel.edu

Ephraim P. Glinert
Dept. of Computer Science
Rensselaer Polytechnic Inst.
Troy, NY 12180 USA
[email protected]@cs.rpi.edu

Third International Workshop on Software and Performance
July 24-26, 2002
Rome, Italy

Page 2: Overview

Software Performance Antipatterns
Sisyphus Database Retrieval Antipattern
Solutions
Experiments
Real World Challenges
Future Work

Page 3: Software Performance Antipatterns

Software Design Patterns: Effective solution to a common software design problem
  singleton, proxy, iterator, observer/listener [Gamma et al. 1995]

Software Design Antipatterns: “A commonly occurring solution to a problem that generates decidedly negative consequences.” [Brown et al. 1998]
  “god” class, dead code, class proliferation

Page 4: Software Performance Antipatterns

“Software Performance Antipatterns”, Smith and Williams, WOSP 2000
  “God” Class
  Circuitous Treasure Hunt
  Excessive Dynamic Allocation
  One Lane Bridge

A commonly occurring solution to a software design problem that generates decidedly negative performance consequences

Page 5: Sisyphus Database Retrieval Antipattern

1) Issue request to display list subset
2) Issue database query to retrieve entire list
3) Return query results
4) Determine number of items displayed
5) Iterate through result set discarding all items until first item to display is reached
6) Continue through result set rendering items for display until last item to display is reached
7) Discard remaining result set
8) Display subset

examples: email, address book, search results
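
The eight steps above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes a `contacts` table shaped like the one in the later solution slides, uses Python's built-in `sqlite3` in place of a separate database tier, and the helper names `make_contacts_db` and `sisyphus_page` are hypothetical.

```python
import sqlite3


def make_contacts_db(rows_per_user=200):
    """Build a small in-memory contacts table (hypothetical schema and data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table contacts (userid integer, lname text, fname text, "
        "phone text, address text)"
    )
    conn.executemany(
        "insert into contacts values (?, ?, ?, ?, ?)",
        [(45, f"Last{i:04d}", f"First{i}", f"555-{i:04d}", f"{i} Main St")
         for i in range(rows_per_user)],
    )
    return conn


def sisyphus_page(conn, userid, start, size):
    """Antipattern: query the entire list, then throw most of it away."""
    # Step 2: the query retrieves (and sorts) the whole list for the user.
    cursor = conn.execute(
        "select lname, fname, phone, address from contacts "
        "where userid = ? order by lname",
        (userid,),
    )
    discarded = 0
    page = []
    for row in cursor:
        if discarded < start:
            discarded += 1      # step 5: discard until the first item to display
            continue
        if len(page) >= size:
            break               # last item to display has been reached
        page.append(row)        # step 6: keep (render) items in the subset
    cursor.close()              # step 7: discard the remaining result set
    return page, discarded


conn = make_contacts_db()
page, discarded = sisyphus_page(conn, userid=45, start=100, size=20)
print(len(page), discarded, page[0][0])
```

Every call repeats the full query and the discard loop, so viewing the next subset costs more than viewing the previous one.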

Page 6: Sisyphus Database Retrieval Antipattern

Key to this antipattern is that the processing necessary to retrieve the entire list, from which a subset is extracted, must be repeated for every subset view.

Recalls the Greek myth of Sisyphus, damned for all eternity to push a stone up a hill only to watch it roll back down again.

Sisyphus by Franz von Stuck

Page 7: Sisyphus Database Retrieval Antipattern

[Sequence diagram: Browser, Web Server, Database Server. Messages: view list subset; process http request; determine list subset to return to browser; query to retrieve entire list; process query; loop (return tuple, discard tuple until first subset item); loop (return tuple, render tuple until last subset item); discard result set; render html; html response]

Three tier system selected.
SPE techniques used to model and analyze antipattern.

Resource Analysis

Disk
  Database Server: Number of disk I/O operations rises linearly with the size of the total list. I/O reduction possible with database caching, but memory resource contention as system scales to more users

CPU
  Browser: linear dependence on list subset
  Web Server: linear dependence on start position of subset within result set; linear dependence on list subset
  Database Server: log linearly with the size of the total list; linear dependence on start position of subset within result set; linear dependence on list subset

Network
  Browser-Web Server: linear dependence on list subset
  Web-Database Server: linear dependence on start position of subset within result set
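
The resource analysis can be restated compactly in asymptotic notation. The symbols are introduced here for illustration and do not appear on the slides: N is the size of the total list, s the start position of the subset within the result set, and k the size of the list subset. Reading "log linearly" as N log N (the usual cost of sorting the full list) is an assumption.

```latex
\begin{align*}
\text{Disk}_{\text{database}}            &= O(N) \\
\text{CPU}_{\text{browser}}              &= O(k) \\
\text{CPU}_{\text{web server}}           &= O(s + k) \\
\text{CPU}_{\text{database}}             &= O(N \log N + s + k) \\
\text{Network}_{\text{browser-web}}      &= O(k) \\
\text{Network}_{\text{web-database}}     &= O(s)
\end{align*}
```

Only the browser-side terms depend solely on k, the part the user actually sees. Every other term grows with N or s.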

Page 8: Solutions: Index and Rownum

multi-attribute index and rownum

  select lname, fname, phone, address
  from contacts
  where userid=45 and rownum <= 50

Advantages:
  processing beyond subset eliminated
  sorting result set eliminated

Disadvantages
  linear dependence on subset start position
  multi-attribute index prevents dynamic sorting
  no total list size
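
A rough sketch of this solution follows, with several assumptions: SQLite has no `rownum`, so `limit` stands in for it; the `order by` clause is added so the result is deterministic in SQLite (the multi-attribute index should let the database return rows already in `lname` order); and the index name and the `index_rownum_page` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table contacts (userid integer, lname text, fname text, "
    "phone text, address text)"
)
# Multi-attribute index: rows for one user can be read in lname order
# without sorting the result set.
conn.execute("create index contacts_user_lname on contacts (userid, lname)")
conn.executemany(
    "insert into contacts values (?, ?, ?, ?, ?)",
    [(45, f"Last{i:04d}", f"First{i}", f"555-{i:04d}", f"{i} Main St")
     for i in range(200)],
)


def index_rownum_page(conn, userid, start, size):
    """Fetch only the first start+size rows, then drop the first `start`."""
    rows = conn.execute(
        "select lname, fname, phone, address from contacts "
        "where userid = ? order by lname limit ?",  # limit stands in for rownum
        (userid, start + size),
    ).fetchall()
    # Rows before the subset are still retrieved and discarded, which is the
    # linear dependence on subset start position listed as a disadvantage.
    return rows[start:]


page = index_rownum_page(conn, userid=45, start=100, size=20)
print(len(page), page[0][0], page[-1][0])
```

Nothing beyond the end of the subset is processed, but the query cannot report the total list size.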

Page 9: Solutions: Upper/Lower Bound

multi-attribute index, lower bound attribute value, rownum

  select lname, fname, phone, address
  from contacts
  where userid=45 and rownum <= SUBSETSIZE
  and lname > ENDSUBSETLASTNAME

Linear dependence on list subset size

Disadvantages:
  lower bound attribute must be unique
  multi-attribute index prevents dynamic sorting
  no total list size
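
The bound-based query can be sketched as below. As before, SQLite's `limit` stands in for `rownum`, the sample data and the `bound_page` helper are hypothetical, and the empty string is used as the lower bound for the first subset on the assumption that every last name is non-empty.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table contacts (userid integer, lname text, fname text, "
    "phone text, address text)"
)
conn.execute("create index contacts_user_lname on contacts (userid, lname)")
conn.executemany(
    "insert into contacts values (?, ?, ?, ?, ?)",
    [(45, f"Last{i:04d}", f"First{i}", f"555-{i:04d}", f"{i} Main St")
     for i in range(200)],
)


def bound_page(conn, userid, end_subset_last_name, subset_size):
    """Return the subset that starts just after the given last name."""
    return conn.execute(
        "select lname, fname, phone, address from contacts "
        "where userid = ? and lname > ? order by lname limit ?",
        (userid, end_subset_last_name, subset_size),
    ).fetchall()


# First subset: an empty lower bound sorts before every non-empty name.
first = bound_page(conn, 45, "", 20)
# Next subset: the lower bound is the last name shown in the previous subset.
second = bound_page(conn, 45, first[-1][0], 20)
print(first[0][0], second[0][0], len(second))
```

Because the lower bound is a strict `>` on `lname`, two contacts with the same last name that straddle a subset boundary would lose the second one, which is why the bound attribute must be unique.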

Page 10: Solutions: Sequence Numbers

Each list element assigned unique sequence number
Combination of user and sequence number is unique

  select lname, fname, phone, address
  from contacts
  where userid=45 and lnameSeq >= subListStart
  and lnameSeq <= subListEnd

Advantages
  Linear dependence on list subset size
  No restriction on duplicate list elements
  Trivial to compute list size
  Multiple sorting criteria possible

Cost of maintaining sequence number
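
A minimal sketch of the sequence number approach, under stated assumptions: sequence numbers start at 1 and are dense, they are assigned once at load time in last-name order, and the helpers `sequence_page` and `list_size` are hypothetical. The renumbering needed when contacts are inserted or deleted, which is the maintenance cost noted above, is not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table contacts (userid integer, lnameSeq integer, lname text, "
    "fname text, phone text, address text, "
    "unique (userid, lnameSeq))"  # user + sequence number is unique
)
# Duplicate last names are fine: each row still gets its own sequence number.
names = sorted(["Smith", "Adams", "Smith", "Baker", "Young", "Clark"])
conn.executemany(
    "insert into contacts values (?, ?, ?, ?, ?, ?)",
    [(45, seq, lname, f"First{seq}", f"555-{seq:04d}", f"{seq} Main St")
     for seq, lname in enumerate(names, start=1)],
)


def sequence_page(conn, userid, sub_list_start, sub_list_end):
    """Range query on the sequence number; no rows outside the subset are read."""
    return conn.execute(
        "select lname, fname, phone, address from contacts "
        "where userid = ? and lnameSeq >= ? and lnameSeq <= ? "
        "order by lnameSeq",
        (userid, sub_list_start, sub_list_end),
    ).fetchall()


def list_size(conn, userid):
    """With dense numbering from 1, the largest sequence number is the list size."""
    (size,) = conn.execute(
        "select coalesce(max(lnameSeq), 0) from contacts where userid = ?",
        (userid,),
    ).fetchone()
    return size


page = sequence_page(conn, 45, 3, 5)
print([row[0] for row in page], list_size(conn, 45))
```

A second sequence column per sort order (for example one for first name) would be one way to support multiple sorting criteria, at the price of more numbers to maintain.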

Page 11: Solutions: Caching

Amortize cost of full list retrieval across subset views
List resides outside database after first subset retrieval

Advantages:
  Useful when listSize/subSetViews <= subListSize, e.g. list shared across multiple users
  Resources eliminated completely after first retrieval
  Linear dependence on list subset size
  Compute total list size once

Disadvantages
  Potentially significant response time for first retrieval
  Cache state maintained between requests complicating scaling
  Cache consistency
  Tier memory required for cache
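
The caching idea can be sketched as a small in-memory cache held outside the database. This is an illustration only: the `ListCache` class, its `db_queries` counter, and the per-user keying are hypothetical, and a real deployment would also need eviction and a consistency strategy.

```python
import sqlite3


class ListCache:
    """Hypothetical tier-side cache: one full retrieval, then cheap slices."""

    def __init__(self, conn):
        self.conn = conn
        self.lists = {}       # userid -> full sorted list (tier memory cost)
        self.db_queries = 0   # counts how often the database is actually hit

    def page(self, userid, start, size):
        if userid not in self.lists:
            # First retrieval pays the full cost of fetching the entire list.
            self.db_queries += 1
            self.lists[userid] = self.conn.execute(
                "select lname, fname, phone, address from contacts "
                "where userid = ? order by lname",
                (userid,),
            ).fetchall()
        full = self.lists[userid]
        # Later views only slice the cached list; total size is known for free.
        return full[start:start + size], len(full)

    def invalidate(self, userid):
        """Drop a cached list after an update to keep the cache consistent."""
        self.lists.pop(userid, None)


conn = sqlite3.connect(":memory:")
conn.execute(
    "create table contacts (userid integer, lname text, fname text, "
    "phone text, address text)"
)
conn.executemany(
    "insert into contacts values (?, ?, ?, ?, ?)",
    [(45, f"Last{i:04d}", f"First{i}", f"555-{i:04d}", f"{i} Main St")
     for i in range(200)],
)

cache = ListCache(conn)
first, total = cache.page(45, 0, 20)
second, _ = cache.page(45, 100, 20)
print(total, second[0][0], cache.db_queries)
```

The cache lives in process memory between requests, so a second web server would need its own copy or a shared store, which is the scaling complication listed above.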

Page 12: Experiments

Subset   Antipattern   Antipattern   Seq. Number   Seq. Number
Start    get/discard   get/render    get/discard   get/render
         tuple (ms)    tuple (ms)    tuple (ms)    tuple (ms)
0        0.26          35.22         ---           20.93
20       31.75         19.99         ---           21.05
40       48.32         22.06         ---           21.43
80       81.02         20.11         ---           22.01
160      144.39        19.96         ---           23.14
320      278.29        21.1          ---           25.33
640      548.97        20.78         ---           30.13
1280     1057.9        20.29         ---           39.33
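
As a worked check on the table, a least-squares slope shows how quickly each measured time grows with the subset start position. The numbers are copied from the table. The `least_squares_slope` helper is a hypothetical convenience, and fitting a straight line is an assumption motivated by the linear dependence predicted in the resource analysis.

```python
# (subset start, antipattern get/discard ms) from the experiments table
antipattern_discard = [
    (0, 0.26), (20, 31.75), (40, 48.32), (80, 81.02),
    (160, 144.39), (320, 278.29), (640, 548.97), (1280, 1057.9),
]
# (subset start, sequence number get/render ms) from the same table
sequence_render = [
    (0, 20.93), (20, 21.05), (40, 21.43), (80, 22.01),
    (160, 23.14), (320, 25.33), (640, 30.13), (1280, 39.33),
]


def least_squares_slope(points):
    """Slope of the best-fit line through (x, y) points, in y units per x unit."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    return sxy / sxx


discard_slope = least_squares_slope(antipattern_discard)
render_slope = least_squares_slope(sequence_render)
print(f"antipattern discard: {discard_slope:.3f} ms per start position")
print(f"seq. number render:  {render_slope:.3f} ms per start position")
```

On these measurements the antipattern's discard time grows by roughly 0.8 ms for each additional position of subset start, while the sequence number column grows far more slowly.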

Page 13: Real World Challenges

eCal provides a web based calendar/address book system
Antipattern uncovered by performance engineering
Resistance to design change from database and application development teams because of schedules
Experimental evidence reinforced antipattern as problem for lists above 100 elements
Debate over average list size per user
List subset handling logic encapsulated in stored procedures isolating application logic
Monitor average list sizes in production, when average exceeds 100, then sequence number solution used

Page 14: Future Work

Software Performance Antipattern Workshop
  Great opportunity for veteran performance engineers from industry to contribute
  Compendium of Antipatterns much like Addison-Wesley’s Design Patterns book
  Coming soon (WOSP 2003?, SIGMETRICS?)

Caching Techniques