GemStone Tutorial - James Foster
© 2010 VMware Inc. All rights reserved
GemStone/S Tutorial
James Foster, Sr. QA Engineer
Abstract
This tutorial provides an introduction to GemStone/S, a multi-user
Smalltalk with a built-in database. We then examine certain key
activities and the statistics used to monitor those events.
Presenter
Software Background
• As a junior-high student in 1971, I discovered the local university’s computer
center and a life-long obsession with computers began.
• I was introduced to Smalltalk/V for the Mac in the mid-90s, and became a
Smalltalk bigot.
• I am on the Smalltalk Engineering team at VMware, and am a passionate
advocate for GemStone and Seaside.
Past Careers
• Commercial Pilot
• Lawyer
Other interests
• Economics
• Politics
• Religion
Agenda
Architecture
Investigating a commit record backlog
Commit processing
Garbage collection theory
Investigating an MFC cycle
Architecture: What is Different about GemStone?
GemStone Enhancements over Typical Smalltalk
Large object space
• Object space is (in practice) limited only by disk (not by RAM)
Transactions
• Related updates can be grouped in an “all-or-nothing” transaction
Persistence
• Transactions are immediately recorded to a log file
Multi-user
• Thousands of virtual machines can interact with a single object space
Multi-machine
• Virtual machines can be on hundreds of hosts
Complexities
Large object space
• Disk is slow, so cache recently-used objects in RAM
Transactions
• Group changes to support roll-back (abort)
Persistence
• Recover recent changes in event of crash
Multi-user
• Isolate each user’s view (repeatable read)
• Manage concurrency conflicts (avoid simultaneous updates to same object)
• Manage object and class versions (when are updates visible to others?)
Multi-machine
• Coordinate object updates between machines
Programming Issues
Garbage collection (GC)
• Temporary objects local to virtual machine
• Persistent objects in shared object space
Large collections
• Iterating can be slow
• Use indexes to improve performance
Transactions
• Maintaining obsolete views can be expensive
• Object versions (different views of same object can see different values)
• Class versions (schema updates are not immediately applied to objects)
• Avoiding unnecessary concurrency conflicts
Architecture: How it is Done
GemStone Architecture
Repository
• Disk-based “image” holds objects and other data
• Made up of extents (files or raw partitions)
• Objects are on 16 KB pages
Gem Process(es)
• Single Smalltalk virtual machine
Stone Process
• Manages concurrency
Shared Page Cache
• Fast cache of pages from repository
• Managed by SPC Monitor process
Other Processes
• GC Gems, Symbol Gem, AIO Page Server, Free Frame Server, etc.
[Diagram: Gem, Stone, Shared Page Cache (SPC), SPC Monitor, and Repository]
Repository and Extent(s)
Holds persistent GemStone objects (i.e., the “image”)
Made up of 1 to 255 extents of up to 32 terabytes each
• On-demand grow
• Pre-grow to maximum size
Each extent is composed of 16 KB pages
• Root Pages
• Object Table Pages
• Data Pages
• Commit Record Pages
• Free OID List
• Free Page List
Page ID designates extent and offset
Statistics: FreePages (Gem vs. Stone)
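As a worked check of the limits above, the following sketch computes how many 16 KB pages fit in one maximum-size extent and the resulting upper bound on repository size. It assumes binary units (1 KB = 2**10 bytes, 1 terabyte = 2**40 bytes); the slide does not say which convention is meant, and the constant names are chosen here for illustration.

```python
# Assumption: binary units; the slide does not state the convention.
KB = 2**10
TB = 2**40

PAGE_SIZE = 16 * KB         # page size from the slide
MAX_EXTENT_SIZE = 32 * TB   # maximum size of one extent
MAX_EXTENTS = 255           # maximum number of extents

pages_per_extent = MAX_EXTENT_SIZE // PAGE_SIZE
max_repository_tb = MAX_EXTENTS * MAX_EXTENT_SIZE // TB

print(pages_per_extent)     # 2147483648, i.e. 2**31 pages per extent
print(max_repository_tb)    # 8160 terabytes across 255 extents
```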
System Startup
‘startstone’ command
• Command line arguments for database name and other configurations
• Finds and opens all extents specified in required config file
• Finds and opens transaction log specified in required config file
• Starts Shared Page Cache (SPC) Monitor process (which allocates SPC)
• Starts other processes
• AIO Page Server(s)
• Free Frame Server(s)
• Symbol Gem
• GC Admin Gem
• Reclaim Gem(s)
• Restores missing transactions if last shutdown was not clean (i.e., crash)
• Waits for requests from Gems (login, lock, commit, etc.)
Shared Page Cache
Typical database challenge: disk is slow
In-RAM cache of pages from repository
Gem(s) may “attach” (or lock) in-use Frames
Frame may contain a “dirty” page
Async IO Page Server(s) write to repository
Reuse frame only if it is unattached and clean
Free Frame List
• Maintained by SPC Monitor
• Might be incomplete
• Gem might be forced to scan cache
Free Frame Server(s) scan for unattached & clean
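The frame-reuse rule above can be sketched as a toy model. The Frame class, its field names, and take_free_frame are hypothetical names for illustration, not GemStone APIs. The sketch only mirrors two points from the slide: a frame is reusable when it is unattached and clean, and a Gem falls back to scanning the cache when the Free Frame List has nothing to offer.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    number: int
    attached: int = 0    # number of Gems currently attached to this frame
    dirty: bool = False  # page changed but not yet written to the repository

    def reusable(self) -> bool:
        # Rule from the slide: reuse only if unattached and clean.
        return self.attached == 0 and not self.dirty


def take_free_frame(cache, free_list):
    """Return (frame, source); source is 'free_list', 'scan', or None."""
    if free_list:                 # cheap path: list kept by the SPC Monitor
        return free_list.pop(), "free_list"
    for frame in cache:           # list was empty: scan the cache
        if frame.reusable():
            return frame, "scan"
    return None, None


cache = [Frame(5, attached=1), Frame(7, dirty=True), Frame(8)]
frame, source = take_free_frame(cache, free_list=[])
print(frame.number, source)       # 8 scan
```

In this model the two return paths presumably line up with what the FramesFromFreeList and FramesFromFindFree statistics count, though that mapping is an assumption of the sketch.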
[Diagram: Repository and SPC with 16 KB frames (Frame #5, Frame #7, Frame #8) and the Free Frame List]
Shared Page Cache Statistics
Shrpc
• FreeFrameCount
• GlobalDirtyPageCount
• LocalDirtyPageCount
• TargetFreeFrameCount
• TotalAttached
Pgsvr
• FramesAddedToFreeList
Gem, Stn
• AttachDelta
• AttachedCount
• FramesFromFindFree
• FramesFromFreeList
• LocalPageCacheHits
• LocalPageCacheMisses
• NonSharedAttached
• TimeInFramesFromFindFree
Gem Types
Linked Gem
• Application loads GemStone C Interface (GCI) library into its process space
• GCI library contains Gem code and runs in Application’s OS process space
Remote Procedure Call (RPC) Gem
• Application loads GCI library into its process space
• GCI library asks NetLDI process to start Gem process
• Gem process can be on same or different host as application
• Additional communications overhead
• Reduced risk of application corrupting Gem
[Diagram: Linked Gem, with Application & GCI Library and Gem in one OS process; RPC Gem, with Application & GCI Library in OS Process 1 connected over TCP/IP to Gem in OS Process 2]
One-Machine Process Locations (Linked Gem)
[Diagram: Stone Host containing Application & GCI Library with linked Gem, Stone, SPC, and Repository]
One-Machine Process Locations (RPC Gem)
[Diagram: Stone Host containing Application & GCI Library, Gem, NetLDI, Stone, SPC, and Repository]
Two-Machine Process Locations (Gem on Stone Host)
[Diagram: Client Host with Application & GCI Library, connected over the network to the Stone Host with Gem, NetLDI, Stone, SPC, and Repository]
Two-Machine Process Locations (Gem Remote from Stone)
[Diagram: Gem Host with Application & GCI Library, Gem, Remote SPC, Page Server, and NetLDI, connected over the network to the Stone Host with Stone, SPC, Repository, Page Server, and NetLDI]
Three-Machine Process Locations
[Diagram: Client Host with Application & GCI Library; Gem Host with Gem, Remote SPC, Page Server, and NetLDI; Stone Host with Stone, SPC, Repository, Page Server, and NetLDI; hosts connected over the network]
Gem Startup
Gem process started (if RPC)
• Linked Gem started with Application
• RPC Gem started by NetLDI based on request from Application (via GCI)
Application requests login
• Application provides Stone host and name to Gem through GCI library
• Gem contacts Stone and is assigned a session ID and a database view
• Gem connects to SPC on local machine
• Stone asks NetLDI on Gem host to start SPC Monitor if needed
• Application provides user ID and password to Gem through GCI library and
Gem validates user based on lookup in database
Login complete
• Gem now waits for requests from GCI
• Application submits requests to GCI for Smalltalk execution or objects
Architecture: Database View and Commit Records
Database View and Commit Record
On login, Gem has a database view
Object Table
• Object ID (OID) == Object Oriented Pointer (OOP)
• Map to Page (offset in an Extent)
• Each view is based on a single Object Table
Each commit creates a Commit Record
• Reference to unique Object Table
• List of modified objects (Write Set)
• List of Shadowed Pages
[Diagram: Commit Record holding an Object Table Reference, a Write Set, and Shadowed Pages; the Object Table maps Object ID to Page ID]
Commit Records
There is always at least one database view, or Commit Record (CR)
On login, a Gem is given the most recently created Commit Record
Other Gems can share the same Commit Record (login or abort)
Each (non-empty) commit creates a new Commit Record
An abort moves a Gem to the latest Commit Record
Oldest CR may be disposed of if it is not referenced
Another commit creates another Commit Record
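The rules above can be traced with a small toy model. CommitRecordModel and its method names are hypothetical, chosen for this sketch; Commit Records are plain integers and disposal is simplified to "drop the oldest CR while no Gem references it".

```python
class CommitRecordModel:
    """Toy model: Commit Records (CRs) are integers, oldest first."""

    def __init__(self):
        self.records = [1]   # there is always at least one CR
        self.views = {}      # Gem name -> CR it references

    def login(self, gem):
        # On login a Gem is given the most recently created CR.
        self.views[gem] = self.records[-1]

    def commit(self, gem):
        # Each (non-empty) commit creates a new CR and moves the Gem to it.
        self.records.append(self.records[-1] + 1)
        self.views[gem] = self.records[-1]
        self._dispose()

    def abort(self, gem):
        # An abort moves the Gem to the latest CR.
        self.views[gem] = self.records[-1]
        self._dispose()

    def _dispose(self):
        # Only the oldest CR may go, and only while nothing references it.
        referenced = set(self.views.values())
        while len(self.records) > 1 and self.records[0] not in referenced:
            self.records.pop(0)


model = CommitRecordModel()
model.login("Gem1")
model.login("Gem2")
model.commit("Gem2")
model.commit("Gem2")
print(model.records)   # [1, 2, 3]: Gem1 still holds CR1, so CR2 is stuck too
model.abort("Gem1")
print(model.records)   # [3]
```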
[Diagram: timeline of Gem1 and Gem2 logins, commits, and an abort against CR1, CR2, and CR3]
Commit Record Backlog
Here we have two Gems and two Commit Records
Additional commits create more Commit Records (maybe many!)
Intermediate CRs cannot be disposed of if older CR is referenced
• This can be a major performance issue — a large CR Backlog is bad!
Problems with excess Commit Records
• They take space in SharedPageCache and/or Repository
• They slow down new commit processing
• They delay garbage collection
[Diagram: Gem1, Gem2, and Commit Records CR2, CR3, …, CRn as commits continue]
SigAbort
Important to avoid excessive CR Backlog
Signal requesting an abort (SigAbort) sent to a Gem if and only if:
1. Gem is referencing the oldest Commit Record
2. Gem is not in transaction
3. CR Backlog is above configured value
If Gem responds quickly to SigAbort, good!
Stone can dispose of oldest unreferenced CR(s)
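The "if and only if" rule above reads naturally as a three-part predicate. The function name, its parameters, and the limit of 100 in the usage lines are assumptions made for this sketch, not GemStone identifiers or defaults.

```python
def should_send_sigabort(gem_cr, oldest_cr, in_transaction, backlog, limit):
    """True only when all three conditions from the slide hold."""
    return (
        gem_cr == oldest_cr      # 1. Gem references the oldest Commit Record
        and not in_transaction   # 2. Gem is not in a transaction
        and backlog > limit      # 3. CR backlog is above the configured value
    )


print(should_send_sigabort(2, 2, False, backlog=1082, limit=100))  # True
print(should_send_sigabort(2, 2, True, backlog=1082, limit=100))   # False
```

The second call shows why a Gem that stays in a transaction never receives SigAbort in this model, however large the backlog grows.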
[Diagram: Gem1, Gem2, and Commit Records CR2, CR3, …, CRn; a SigAbort is followed by an abort]
LostOtRoot
If a SigAbort was sent to a Gem and it was ignored for X minutes
• X is configurable, with default of one (1)
Stone will revoke the Gem’s database view (Commit Record)
Stone will send Gem a signal: LostOtRoot (Lost Object Table Root)
• Any object access will give an error
Stone can dispose of oldest unreferenced CR(s)
Gem must abort to get a new Commit Record (or logout)
[Diagram: Gem1, Gem2, and Commit Records CR2, CR3, …, CRn; LostOtRoot is followed by an abort]
Transaction State vs. Mode
Transaction state
• In – commit attempt is allowed (might succeed or fail)
• Out – commit attempt is not allowed and will always give an error
Transaction mode
• #autoBegin – always in a transaction with a stable view
• #manualBegin – can be in or out, but always a stable view
• #transactionless – can be in or out, stable view only when in transaction
[Table: rows “Always in”, “Stable view in”, “Stable view out”, “Can get SigAbort”, “Can get SigFinish”, and “Safe for GBS”, compared across #autoBegin, #manualBegin, and #transactionless; cell values not preserved]
Transaction Control
System abortTransaction
• Abort, losing any existing changes, and obtain the most recent Commit Record
• New transaction state:
  • ‘In transaction’ if transaction mode is #autoBegin
  • ‘Out of transaction’ if transaction mode is #manualBegin or #transactionless
System beginTransaction
• Abort, losing any existing changes, and obtain the most recent Commit Record
• Enter the ‘in transaction’ state (for all transaction modes)
System commitTransaction
• If commit succeeds, new transaction state:
  • ‘In transaction’ if transaction mode is #autoBegin
  • ‘Out of transaction’ if transaction mode is #manualBegin or #transactionless
• If commit fails, see next slide …
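The state rules for successful operations can be written as a small lookup. This is a Python sketch of the rules as stated on the slide, not GemStone code; state_after and the string labels are names invented for illustration.

```python
MODES = ("autoBegin", "manualBegin", "transactionless")


def state_after(operation, mode):
    """Return 'in' or 'out' (of transaction) after a successful operation."""
    if mode not in MODES:
        raise ValueError(f"unknown transaction mode: {mode}")
    if operation == "beginTransaction":
        return "in"                       # for all transaction modes
    if operation in ("abortTransaction", "commitTransaction"):
        return "in" if mode == "autoBegin" else "out"
    raise ValueError(f"unknown operation: {operation}")


print(state_after("commitTransaction", "autoBegin"))       # in
print(state_after("abortTransaction", "manualBegin"))      # out
print(state_after("beginTransaction", "transactionless"))  # in
```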
Failed Commit
Reasons for a commit failure
• Another Gem has a lock (read or write) on an object we modified
• An object we modified was modified in a Commit Record after we got our view
Impact of commit failure
• Update to most recent Commit Record
• No longer on prior Commit Record, so database view is updated; but …
• Still have all locally modified objects
• New database view does not change local modifications of persistent objects
• Still in transaction
• But any further commit attempt will fail
• Gem will need to abort before any subsequent commit can succeed
• Abort will lose all local modifications of persistent objects
• May wish to copy modifications into other objects before abort
• Could reapply changes after abort and attempt another commit
Investigating a Commit Record Backlog
Example 1: Find Offending Process
Stone process, CommitRecordCount
• Identifies when it began and magnitude
• December 1, 2010: goes from 133 at 00:31 to 1082 at 00:41
Stone process, OldestCrSession
• Identifies which session is holding the oldest commit record
• Session 4 from 00:35 to 01:21
Find offending process
• Note that sessionId can be reused, so look at process start time
• Session 4, pid 18577, logged in at start of statmonitor file and stayed in all day
Example 1: What is Process Doing?
Pid 18577, TransactionLevel
• 0 -> manual begin mode, not in a transaction
• Gem process began a transaction at 00:31, ended at 01:23
CommitCount
• 31 commits at 00:31, still 31 at 01:23 (32 at 01:25)
• Statmonitor file starts at 00:05, process seems to be new; started at 00:00?
PageReads, ScavengeCount
• Gives a sense of gem activity
• Both show continuous activity from 00:31 to 01:23
Conclusion
• Appears to be a batch job that is busy
• Code should be rewritten to do intermediate commits or aborts
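The first step of this investigation (which session held the oldest commit record while the backlog was high) can be sketched as below. The (time, CommitRecordCount, OldestCrSession) tuple layout is an assumed export format, not statmonitor's actual file format. The sample values and the threshold of 500 are made up for illustration; only the counts 133 and 1082 echo the slide.

```python
from collections import Counter


def backlog_holder(samples, threshold):
    """Session most often holding the oldest CR while the backlog is high."""
    holders = Counter(
        session for _, cr_count, session in samples if cr_count > threshold
    )
    return holders.most_common(1)[0][0] if holders else None


# Hypothetical samples: (time, CommitRecordCount, OldestCrSession)
samples = [
    ("00:31", 133, 7),
    ("00:35", 600, 4),
    ("00:41", 1082, 4),
    ("00:43", 1100, 4),
]
print(backlog_holder(samples, threshold=500))   # 4
```

Because a sessionId can be reused, the result would still need to be matched to a process by start time, as the slide notes.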
Example 2: Idle User Session
Stone process, OldestCrSession
• SessionId 129 from 12:39 to 14:11
Process started at 12:25, pid 10824
• 54 samples at 2 minutes per sample indicates 108 minutes connected
TransactionLevel
• Value of 1 entire time indicates always in a transaction
CommitCount, AbortCount
• Flat during entire sample period
PageReads
• Many immediately after login (at 12:27), but completely flat thereafter
Conclusion
• UI process with login, no activity, logout
Example 3: Read-only Batch Job
OldestCrSession
• SessionId 98 at 17:27, continues to end of file
Pid 12412 started at 15:51
• 247 samples at 2 minutes per sample takes us to the end of the file
TransactionLevel
• Sometimes 0 (manualBegin mode and out of a transaction)
• Began a transaction at 17:05
CommitCount, AbortCount
• No commits; abort count is flat after 17:05, so no fresh database view
PageReads
• Grows continually
Conclusion: need batch job to be rewritten to abort as needed
Commit Process Steps
Prepare to Commit
Assign Object ID (OOP) to each new object
• Unused object ID list kept by Stone and distributed to Gems as needed
Obtain empty pages to hold new and modified objects
• Unused page ID list kept by Stone and distributed to Gems as needed
Obtain empty frames in shared page cache to hold new pages
• Free Frame List kept by SPC Monitor and used if available; else search
Copy new and modified objects to empty pages in SPC
Create transaction log record(s) describing changes attempted
• Records could be written to the transaction log before commit
• Will be followed by final succeed/fail record
Request Commit Token
Once the Gem has done everything that can be done in parallel, it
requests the commit token from the Stone
Stn: CommitQueueSize
• Number of Gems waiting for the commit token
Stn: CommitTokenSession
• Session ID of Gem holding the commit token
Gem: TimeWaitingForCommit
• Total milliseconds spent waiting for commit token
Commit Validation
Search for write-write conflicts
• Each commit creates a commit record that includes new and modified objects (the Write Set)
• Merging all the commit records created since our transaction began gives a Write Set Union: the objects modified since our view was the most recent
• The Gem searches the Write Set Union for objects modified in this transaction
• If any objects we want to modify have been modified by other Gems, then our commit attempt fails (first to modify wins)
Search for modifications to locked objects
• With the commit token, the Gem gets access to the Lock Set
• The Gem searches the Lock Set for objects modified in this transaction
• If any objects we want to modify are read-locked or write-locked by other
Gems, then our commit attempt will fail
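Both validation checks reduce to set intersections, which the following sketch illustrates. Objects are represented by integer IDs, and validate_commit with its parameter names is hypothetical; the real checks run inside the Gem while it holds the commit token.

```python
def validate_commit(my_writes, write_sets_since_view, locked_by_others):
    """Return 'ok' or the reason the commit attempt fails."""
    # Merge the Write Set of every commit record created since our view.
    write_set_union = set().union(*write_sets_since_view)
    if my_writes & write_set_union:
        return "write-write conflict"
    # Objects read-locked or write-locked by other Gems also block us.
    if my_writes & locked_by_others:
        return "lock conflict"
    return "ok"


print(validate_commit({10, 11}, [{1, 2}, {11}], set()))  # write-write conflict
print(validate_commit({10}, [{1}], {10}))                # lock conflict
print(validate_commit({10}, [], set()))                  # ok
```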
If Commit Validation Fails
Return commit token to Stone
• Stone writes record indicating that prior transaction records should be ignored
Gem performs “partial” abort
• Gem obtains the most recent commit record (latest database view)
• Gem keeps modified objects in local memory (application could copy them)
• Subsequent commit attempt will fail immediately (no validation required)
Gem statistics
• AbortCount
• FailedCommitCount
Stone statistics
• TotalAborts
• LogRecordsIoCount
• LogRecordsWritten
If Commit Validation Succeeds - 1
Gem builds a new commit record
• New database view (object table) merging local changes and additions with
most recent view
Gem returns commit token to Stone
• Stone writes record indicating that prior transaction records should be applied
Stone notifies Gem that the final log record was written to disk
• Gem waits till it is notified to proceed
Stone notifies requesting Gems of selected modified objects
• Gems can register an interest in being notified if an object is modified
If Commit Validation Succeeds - 2
Gem statistics
• BytesCommittedCount
• CommitCount
• NewObjectsCommitted
• ObjectsCommitted
• TimeStoneCommit
Stone statistics
• LogRecordsIoCount
• LogRecordsWritten
• TotalCommits
• TotalNewObjectsCommitted
Commit Record Disposal
A commit or abort may release a Gem’s reference to a view
• A commit always creates a new commit record and releases the old one
• An abort moves to the most recent, which could still be the current one
When a commit record is not referenced, it may be disposed of
• Multiple Gems may reference the same commit record, so one release does not mean immediate disposal
• A commit record references various resources that are no longer needed (e.g., the object table associated with its view)
Shadow objects
• Modified objects may have their old values referenced from another view
• An old copy of an object remains on its page and is called a shadow object
• Once no more views exist referencing the old values, the space taken up by the shadow object may be reclaimed
• GcGems perform this reclaim activity as directed by the Stone
Repository Garbage Collection
Types of Repository Garbage Collection
Page reclamation
• Recovering space taken by shadow objects
• Compacting space by copying objects from partially filled pages
Full markForCollection
• Scan the entire repository for any references to every object
Epoch GC
• Scan the objects modified during a time period (epoch) for new references to new objects
Off-line GC
• Scan the entire repository for any references to objects found to be unreferenced by an off-line scan
Garbage Collection Vocabulary
AllUsers
• The instance of UserProfileSet that acts as a root for the object graph
Live object
• An object referenced directly or indirectly from AllUsers
Dead object
• An object defined in the object table and present on a page, but not live
• A dead object may be referenced by a dead object (but not a live object)
• A dead object may reference both live and dead objects (but it doesn’t matter)
• The object ID (OOP) and the space of a dead object may be reclaimed
Shadow object
• When an existing object is modified, it is placed on a new page
• The old page is preserved until no more views reference the old object
• The space can be reclaimed (through page compaction), but not the object ID
Summary of Repository-Wide Garbage Collection
MFC Gem builds possible dead set
• Mark live objects
• Object table sweep
• Record possible dead
Voting to remove from possible dead set (managed by Stone)
• Current gems vote based on current references at next commit or abort
• GcGem votes on behalf of all commit records since start of MFC
Cleanup
• Finalizing for selected objects
• Page reclamation
• Return of pages and object IDs to free pool
Mark Live Objects
Find connected objects
• Start with AllUsers as the root of the object graph (the original ‘live’ object)
• Perform a ‘transitive closure’ visiting each object referenced from a live object
• Add each live object to a live object set
Gem: ProgressCount
• Number of live objects found so far
• When this statistic drops back to zero, this step is done
Configuration
• Set mfcGcPageBufSize
Process is very I/O and CPU intensive
• Read object table page and data page for every live object
• Same page might be read multiple times
Object Table Sweep
Subtract live objects from all objects to get possible dead set
Gem: ProgressCount
• Begins at zero (clearing from previous step)
• Count of possible dead objects
• When this statistic drops back to zero, this step is done
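The mark and sweep steps can be sketched on a tiny in-memory graph. Here the object table is just a dictionary from object ID to the IDs it references; mark_live and possible_dead are hypothetical names, and the sketch ignores pages, I/O, and the ProgressCount statistic entirely.

```python
def mark_live(root, references):
    """Transitive closure: every object reachable from the root is live."""
    live, stack = set(), [root]
    while stack:
        oid = stack.pop()
        if oid in live:
            continue
        live.add(oid)
        stack.extend(references.get(oid, ()))
    return live


def possible_dead(all_oids, live):
    """Object table sweep: all objects minus live objects."""
    return set(all_oids) - live


# Hypothetical object graph: object ID -> IDs of objects it references.
references = {
    "AllUsers": ["a"],
    "a": ["b"],
    "b": [],
    "c": ["a", "d"],   # unreachable, but references a live and a dead object
    "d": [],
}
live = mark_live("AllUsers", references)
print(sorted(live))                              # ['AllUsers', 'a', 'b']
print(sorted(possible_dead(references, live)))   # ['c', 'd']
```

Object "c" shows that a reference from a dead object to a live one does not change the result.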
Record Possible Dead
Pass possible dead set to Stone
MFC Gem’s task is now done
Stn: PossibleDeadSize
• Rough approximation of possible dead set size
Voting by Existing Gems
As each logged-in Gem does an abort or commit
• Stone passes list of possible dead to Gem for voting
• Gem scans its private memory for references to the objects
• Referenced objects are voted ‘not dead’
Gem Statistic
• VoteNotDead
Stn Statistics
• GcPossibleDeadSize
• GcVoteUnderway
• SessionNotVoted
Garbage collection can stall here
• Voting happens only at the next abort or commit
• A quiet Gem that does not abort or commit will not vote
Voting by GcGem (‘Finalize Voting’)
Original live object set is based on view at beginning of MFC
Any commit since MFC began could have created a reference
A ‘write set union’ of all commit records since MFC began is kept
GcGem takes possible dead set and searches for new references
Stn statistics:
• GcPossibleDeadWsUnionSize
• GcSweepCount
• GcPossibleDeadSize
• DeadNotReclaimedSize
Gem statistic:
• ProgressCount
At end, we have a definitive dead object set
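Both voting rounds can be viewed as removing objects from the possible dead set. In the sketch below, vote_not_dead and its parameters are hypothetical names. It also rescues anything reachable from a rescued object; that transitive step is an assumption of the sketch, made so the result stays consistent, and the slides do not describe how GemStone handles it.

```python
def vote_not_dead(possible_dead, references, gem_refs, write_set_union_refs):
    """Return the definitive dead set after both voting rounds."""
    rescued = set()
    # Round 1: references held in each logged-in Gem's private memory.
    stack = [oid for refs in gem_refs.values() for oid in refs]
    # Round 2: references created by commits since the MFC began.
    stack.extend(write_set_union_refs)
    while stack:
        oid = stack.pop()
        if oid in possible_dead and oid not in rescued:
            rescued.add(oid)
            # Assumption: objects reachable from a rescued object survive too.
            stack.extend(references.get(oid, ()))
    return set(possible_dead) - rescued


dead = vote_not_dead(
    possible_dead={"c", "d", "e"},
    references={"c": ["d"]},
    gem_refs={"Gem1": {"c"}},
    write_set_union_refs=set(),
)
print(sorted(dead))   # ['e']
```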
Cleanup: Finalizing
GcGem reads each object in the dead set
Certain dead objects require special cleanup
• Collections with indexes
• Compiled methods
Gem: ProgressCount
Stn: GcPossibleDeadSize
Stn: DeadNotReclaimedSize
Cleanup: Reclamation
GcGem activity
For each page containing a dead object
• In a transaction, copy all live objects on that page to a new page
• This leaves only shadow objects (the current version is on a new page), dead
objects, and free space on the old page
Note that the old page might still be referenced from a view
• Shadow objects need to be kept around as long as they are part of a view
Stn statistics
• GcReclaimState
• GcReclaimNewDataPagesCount
• DeadObjsCount
• FreePages
• GcPagesNeedReclaimSize
• DeadNotReclaimedSize
Cleanup: Return to Free Pool
When the commit record for the reclaim activity is no longer referenced,
the page IDs and object IDs associated with that reclaim are added to
the free list
Investigating an MFC Cycle
Mark and Object Table Sweep
Topaz1938
• Start: Fri 03 Dec 20:31 Pid 17589 Session 21 – ProgressCount
• End: Wed 08 Dec
• Mark phase ends at 00:31 with ProgressCount of 920,224,401
• Object table sweep ends at 01:45 with ProgressCount of 24,544,409
• Record possible dead passes list to Stone and then Gem logs out
Current Gems Voting on Current Objects
Stn: PossibleDeadSize
• 36 million at 01:47
Stn: GcVoteUnderway
• 1 at 01:47 (current Gems voting)
• 3 at 02:11 (Possible Dead Write-Set Union Sweep in progress)
Stn: SessionNotVoted
• 2 at 01:47
• 5 at 02:09
Stn: GcPossibleDeadSize
• 39,535,498 at 02:11
Gems: VoteNotDead
• none found
Not much time spent waiting for Gems to vote!
Possible Dead Write-Set Union Sweep
Stn: GcVoteUnderway
• 3 at 02:11 (Possible Dead Write-Set Union Sweep in progress)
• 0 at 05:17 (done)
Stn: GcReclaimState
• 4 at 02:11 (Write-set union sweep)
• 2 at 05:19 (Reclaiming shadowed pages)
Stn: GcPossibleDeadSize
• 39,535,498 at 02:11
Stn: GcPossibleDeadWsUnionSize
• 180,647 at 02:11
GcGem: ProgressCount
• 0 at 02:11
• Grows consistently to 39,517,338 at 05:15
• 0 at 05:17
Cleanup Dead Objects
Special processing for instances of selected classes
• Indexed collections
• Compiled methods
GcGem: ProgressCount
• No evidence of this activity
• It could have happened between statmonitor samples
• A two-minute sample interval is a bit coarse for this sort of activity
Interlude for Epoch GC
Possible Dead Write-Set Union Sweep (PDWSUS) finished at 05:17
• Stn: GcVoteUnderway returns to 0 at 05:17
• GcGem: ProgressCount returns to 0 at 05:17
Cleanup dead objects too quick to be measured
Next step would be ‘promote to dead’ to begin reclamation activity
• Stn: DeadNotReclaimedSize does not jump till 05:49
• Why?
Epoch GC
• Stn: EpochGcCount increased at 05:25
• Stn: GcReclaimState of 5 from 05:25 through 05:47
• GcGem: ProgressCount starts rising at 05:25 and returns to zero at 05:49
Voting for Possible Dead Set from Epoch
Stn: GcVoteUnderway
• 1 -> 'gems are voting on the possible dead' from 05:49 through 07:55
Stn: SessionNotVoted
• Session 2 (login at 05:21) from 05:49 through 07:03
Several processes show VoteNotDead increase
• From 05:49 through 06:03
• Gem10320, Gem10322, Gem10323, Gem10324, Gem10341, Gem10342
• Gem10343, Gem10404, Gem11094, Gem11241, Gem12264
Stn: GcReclaimState
• 2 -> 'reclaiming shadowed pages' from 05:49 through 07:55
Dead Object Reclamation
Stn: DeadNotReclaimedSize
• Jumps from zero to 39,526,649 at 05:49
Stn: GcReclaimNewDataPagesCount
• Shows activity starting at 05:49
Stn: GcPagesNeedReclaimSize
• Flat till 05:47; starts growing at 05:49
Resource intensive
• Shrpc: FreeFrameCount
• GcGem: PageReads
Return to Free Pool
Stn: FreePages
• ~6 million at 05:49
• Drops to ~1 million in two hours
• Climbs up to ~7.5 million in 1-1/2 hours
Stn: FreeOopCount
• ~45 million at 08:00
• ~125 million at 18:00
Ready to start over in a couple days!