Building Undo for Operators: An Undoable E-mail Store Aaron Brown ROC Research Group University of...

34
Building Undo for Operators: An Undoable E-mail Store Aaron Brown ROC Research Group University of California, Berkeley Winter 2003 ROC Research Retreat 14 January 2003

Transcript of Building Undo for Operators: An Undoable E-mail Store Aaron Brown ROC Research Group University of...

Page 1: Building Undo for Operators: An Undoable E-mail Store Aaron Brown ROC Research Group University of California, Berkeley Winter 2003 ROC Research Retreat.

Building Undo for Operators:An Undoable E-mail Store

Aaron Brown

ROC Research GroupUniversity of California, Berkeley

Winter 2003 ROC Research Retreat14 January 2003

Page 2: Building Undo for Operators: An Undoable E-mail Store Aaron Brown ROC Research Group University of California, Berkeley Winter 2003 ROC Research Retreat.

Slide 2

Outline

• Recap of “Three-R’s” Undo model

• Implementing Undo for e-mail

• Analysis of e-mail undo prototype

• Conclusions and future directions

Page 3: Building Undo for Operators: An Undoable E-mail Store Aaron Brown ROC Research Group University of California, Berkeley Winter 2003 ROC Research Retreat.

Slide 3

Recap: Undo for Operators

• Dependability is achieved at the interface of the system and its operator

• Undo goal: support the operator– create a forgiving operations environment

» tolerate human error, allow trial-and-error experiments

– provide last-resort recovery from unknown damage» retroactive repair of software bugs, broken upgrades,

unknown operator changes, external attacks

– be easily comprehensible by operator and developer– preserve user data while refreshing system state

Page 4: Building Undo for Operators: An Undoable E-mail Store Aaron Brown ROC Research Group University of California, Berkeley Winter 2003 ROC Research Retreat.

Slide 4

Recap: The Three-R’s Undo Model• Provide Undo as virtual time travel

– Rewind: roll back all system state, physically– Repair: make changes to past timeline to avert

original disaster– Replay: roll state forward logically, merging

original timeline with effects of repairs

• Key properties of 3R’s Undo– recovery from problems at any system layer– recovery from unanticipated problems– no assumptions about correct application

behavior

Page 5: Building Undo for Operators: An Undoable E-mail Store Aaron Brown ROC Research Group University of California, Berkeley Winter 2003 ROC Research Retreat.

Slide 5

Outline

• Recap of “Three-R’s” Undo model

• Implementing Undo for e-mail

• Analysis of e-mail undo prototype

• Conclusions and future directions

Page 6: Building Undo for Operators: An Undoable E-mail Store Aaron Brown ROC Research Group University of California, Berkeley Winter 2003 ROC Research Retreat.

Slide 6

Building an E-mail Undo Prototype• Target: an undoable e-mail store service

– a leaf node in the Internet e-mail network– accessed via IMAP and SMTP

• Built around existing e-mail store service– e.g., sendmail+imapd, iPlanet, Exchange

• Extensible to other apps– architected around a reusable undo core with e-

mail-specific extensions

• Written in Java

Page 7: Building Undo for Operators: An Undoable E-mail Store Aaron Brown ROC Research Group University of California, Berkeley Winter 2003 ROC Research Retreat.

Slide 7

Undo System Architecture

E-mail Service

Time-travelStorage

UndoManager

Verbs

control

TimelineLog

ControlUIUser

E-mail Proxy

IMAP, SMTP

IMAP, SMTP

Includes: - user state - application - OS

Page 8: Building Undo for Operators: An Undoable E-mail Store Aaron Brown ROC Research Group University of California, Berkeley Winter 2003 ROC Research Retreat.

Slide 8

Architecture Design Points

• Proxy-based design targeted at services

• Application-neutral core– undo manager, timeline log, time-travel storage, UI– contains all undo-cycle logic

• E-mail-specific semantics encapsulated– proxy: encodes e-mail interactions into verbs– verbs: define e-mail semantics

» model of preserved state» model of acceptable external (observed) consistency

Page 9: Building Undo for Operators: An Undoable E-mail Store Aaron Brown ROC Research Group University of California, Berkeley Winter 2003 ROC Research Retreat.

Slide 9

Verbs

• Key construct in undo architecture– represent end-user state changes, state

exposure– building block of timeline and history– used to detect and manage inconsistencies

• Essential for reasoning about undo– verbs define exactly what state is preserved

during an undo cycle– provide framework for defining application’s

model of acceptable external inconsistency

Page 10: Building Undo for Operators: An Undoable E-mail Store Aaron Brown ROC Research Group University of California, Berkeley Winter 2003 ROC Research Retreat.

Slide 10

Verbs and the Undo Cycle

• Verb flow

E-mail Service

Includes: - user state - application - OS

Time-travelStorage

UndoManager

Verbs

control Timeline

Log

ControlUIUser

E-mail Proxy

Replay

Normal operation

Page 11: Building Undo for Operators: An Undoable E-mail Store Aaron Brown ROC Research Group University of California, Berkeley Winter 2003 ROC Research Retreat.

Slide 11

Comparison to Optimistic Replication• Insight: undo can be recast as

replication—not in space, but in time– two virtual replicas diverge at the Rewind point– verb history defines an update log to one replica– repairs define updates to the other replica– Replay involves reconciling the two update

streams, detecting and handling inconsistencies

• Verb-based design is heavily influenced by replica systems– verbs map to replica system “actions” or

“updates”

Page 12: Building Undo for Operators: An Undoable E-mail Store Aaron Brown ROC Research Group University of California, Berkeley Winter 2003 ROC Research Retreat.

Slide 12

Comparison to Optimistic Replication (2)• But Undo has key differences:

– reconciliation between a logical log (verb history) and a physical state (repaired system)» so log-merging schemes (e.g., IceCube) don’t apply

– not all inconsistencies are bad» no need to check for inconsistency until externalization» no complex, error-prone guards on verb execution

– inconsistencies can be handled after-the-fact» post-conditions on externalized data are sufficient» fewer assumptions of application correctness» if necessary, can rewind undesired inconsistency-

producing verb

Page 13: Building Undo for Operators: An Undoable E-mail Store Aaron Brown ROC Research Group University of California, Berkeley Winter 2003 ROC Research Retreat.

Slide 13

E-mail example: original timeline

Systemboundary

Systemstate

Verbs

Historylog

Time

Inbox

Folder1olleH

Copy

Copy

Externalizes:—

olleH

Fetch

Fet

ch

m

Externalizes:m

+ Signature(m)=“olleH”

Hello

olleH

Deliver

Deliver

m

Externalizes:—

+ input “Hello”

Page 14: Building Undo for Operators: An Undoable E-mail Store Aaron Brown ROC Research Group University of California, Berkeley Winter 2003 ROC Research Retreat.

Slide 14

olleH

Copy

Copy

Externalizes:—

olleH

Fetch

Fet

ch

m

Externalizes:m

+ Signature(m)=“olleH”

Hello

olleH

Deliver

Deliver

m

Externalizes:—

+ input “Hello”

XHello

Deliver

Externalizes:—

+ input “Hello”

Hello

Deliver

m

E-mail example: replay timeline

Systemboundary

Systemstate

Verbs

Historylog

Time

Inbox

Folder1 Hello

Copy

Copy

Externalizes:—

Hello

Fetch

Fet

ch

m

Externalizes:m

+ Signature(m)=“olleH”

mismatch! => inconsistency

Page 15: Building Undo for Operators: An Undoable E-mail Store Aaron Brown ROC Research Group University of California, Berkeley Winter 2003 ROC Research Retreat.

Slide 15

Verbs and External Inconsistency• Detecting inconsistency

– verbs that externalize state record a signature of externalized data and define a comparison function» comparison fn. defines the external consistency

model

• Handling inconsistency– verbs define compensation routines, invoked

when inconsistencies are found– later non-commuting verbs are squashed to

prevent data loss

Page 16: Building Undo for Operators: An Undoable E-mail Store Aaron Brown ROC Research Group University of California, Berkeley Winter 2003 ROC Research Retreat.

Slide 16

Verbs for E-mail

• SMTP & IMAP protocols mapped into 13 verbs– SMTP: Deliver– IMAP: Append, Fetch, Store, Copy, List, Status,

Select, Expunge, Close, Create, Rename, Delete– set could easily be extended with user and

subscription management functionality

• Each verb is a Java class implementing a generic, app-neutral Verb interface

Page 17: Building Undo for Operators: An Undoable E-mail Store Aaron Brown ROC Research Group University of California, Berkeley Winter 2003 ROC Research Retreat.

Slide 17

Example Verb: STORE (set msg flags)

• Tag– input: target folder, message IDs, flag value– externalized output: resultant flags of messages

• Sequencing tests– commutes with & independent of SMTP verbs, all

IMAP verbs except Fetch/Store/Close on same folder

• Consistency check– new flags must match original flags for all messages

in common; no original messages should be missing

• Inconsistency handler– leave user a message explaining and listing changes

Page 18: Building Undo for Operators: An Undoable E-mail Store Aaron Brown ROC Research Group University of California, Berkeley Winter 2003 ROC Research Retreat.

Slide 18

An External Consistency Model for E-mail• Retrieval (IMAP)

– tracked state includes message bodies, key message headers (to/from/subject/cc), flags, folder lists, and execution status of state-altering commands

– inconsistency if objects are missing or altered on replay or commands fail; order & new objects ignored

– compensation via explanatory messages and creation of “lost-and-found” containers

• Delivery (SMTP)– only possible inconsistency is in execution status– one tricky case: originally-failed delivery succeeds

» delay bounce to provide window for undo-repair

Page 19: Building Undo for Operators: An Undoable E-mail Store Aaron Brown ROC Research Group University of California, Berkeley Winter 2003 ROC Research Retreat.

Slide 19

Undo System Architecture

• Next up:– proxy– undo manager– time-travel

storage layer

E-mail Service

Time-travelStorage

UndoManager

Verbs

control

TimelineLog

ControlUIUser

E-mail Proxy

IMAP, SMTP

IMAP, SMTP

Page 20: Building Undo for Operators: An Undoable E-mail Store Aaron Brown ROC Research Group University of California, Berkeley Winter 2003 ROC Research Retreat.

Slide 20

Proxying E-mail

• IMAP proxy loop is straightforward– accept client connection and open server connection– parse commands and instantiate corresponding verbs– invoke undo manager to schedule, execute, log verb

» passing a handle to the client and server connections

• SMTP is more complicated– failed SMTP deliveries must be logged for later retry– proxy poses as a server that always accepts deliveries

» deliver verb is created after client has been ACK’d» some finesse to avoid being an open-relay

– if verb fails, proxy sends a bounce after a delay

Page 21: Building Undo for Operators: An Undoable E-mail Store Aaron Brown ROC Research Group University of California, Berkeley Winter 2003 ROC Research Retreat.

Slide 21

Undo Manager Implementation• Timeline log

– BerkeleyDB recno database storing serialized verbs– also stores control records to make undo manager

state persistent and recoverable

• Verb scheduling– all verb execution passes through undo manager– scoreboard-like structure sequences verbs according

to independence, commutativity, ordering properties

• External inconsistency management– undo manager coordinates verb APIs to ensure that

external inconsistencies are detected and handled

Page 22: Building Undo for Operators: An Undoable E-mail Store Aaron Brown ROC Research Group University of California, Berkeley Winter 2003 ROC Research Retreat.

Slide 22

A Time-Travel Storage Layer

• Base: Network Appliance Filer with snapshots

• Java wrapper for Filer’s management CLI– provides direct control of snapshot create/restore– periodically takes snapshots, aging out old ones

» hierarchical scheduling allows the 31 snapshots to span one month, with density inversely proportional to age

• Rewind restores the closest prior snapshot, then replays up to the exact rewind point

• Challenge: making snapshot restore undoable– solution: implement rollback by copying old snapshot

forward to present (expensive, but necessary)

Page 23: Building Undo for Operators: An Undoable E-mail Store Aaron Brown ROC Research Group University of California, Berkeley Winter 2003 ROC Research Retreat.

Slide 23

Outline

• Recap of “Three-R’s” Undo model

• Implementing Undo for e-mail

• Analysis of e-mail undo prototype

• Conclusions and future directions

Page 24: Building Undo for Operators: An Undoable E-mail Store Aaron Brown ROC Research Group University of California, Berkeley Winter 2003 ROC Research Retreat.

Slide 24

Code complexity

• Entire system is about 23K lines of Java– split evenly between app-neutral and

e-mail-specific code– bulk of e-mail code is in verbs

• Implementing and debugging the e-mail verbs and proxy took roughly two man-months

Page 25: Building Undo for Operators: An Undoable E-mail Store Aaron Brown ROC Research Group University of California, Berkeley Winter 2003 ROC Research Retreat.

Slide 25

Measuring Overhead for Undo• Workload

– modified SPECmail2000 e-mail benchmark» simulates traffic seen by an ISP mail server» modified to use IMAP instead of POP, all mail local

– configured with 5000 users» about 56 SMTP cxns/minute, 149 IMAP cxns/minute

• Setup– mail server: Linux, sendmail + UW imapd, 5000

users– undo proxy: Win2k, Sun 1.4.1 JDK– workload generator: Win2k, Sun 1.4.1 JDK– storage: all mailboxes and logs on NetApp Filer

» mailboxes accessed via NFS, logs via CIFS

Page 26: Building Undo for Operators: An Undoable E-mail Store Aaron Brown ROC Research Group University of California, Berkeley Winter 2003 ROC Research Retreat.

Slide 26

Session Length (seconds)0.01 0.1 1 10 100

Per

cen

tag

e o

f S

essi

on

s

0

20

40

60

80

100

IMAP, no undoIMAP, undoSMTP, no undoSMTP, undo

Overhead Results: Space and Time• Space overhead

– 5GB/day for timeline log= 1GB/day per 1000 users

– 325KB per 1000 mail folder name translations

– per 120GB disk:» 7 weeks of timeline for

1000 ISP users» or 350 million folder

name translations

• Time overhead– cumulative distribution

plot of IMAP and SMTP session lengths

Page 27: Building Undo for Operators: An Undoable E-mail Store Aaron Brown ROC Research Group University of California, Berkeley Winter 2003 ROC Research Retreat.

Slide 27

Outline

• Recap of “Three-R’s” Undo model

• Implementing Undo for e-mail

• Analysis of e-mail undo prototype

• Conclusions and future directions

Page 28: Building Undo for Operators: An Undoable E-mail Store Aaron Brown ROC Research Group University of California, Berkeley Winter 2003 ROC Research Retreat.

Slide 28

Next Steps

• Perform more analysis of prototype– speed of rewind, replay, entire undo cycle

• Measure efficacy of undo with human expts.– compare recovery time from pre-defined

scenarios with and without undo» based on survey of e-mail administrators

• Examine other applications to understand generality of undo architecture– auctions, e-commerce, others?

Page 29: Building Undo for Operators: An Undoable E-mail Store Aaron Brown ROC Research Group University of California, Berkeley Winter 2003 ROC Research Retreat.

Slide 29

Conclusions

• We can build a real Undo implementation– for a real application– with tolerable overhead– with a reusable, application-independent core– based on a verb model that enables reasoning

about state and undo behavior

• Look forward to final analysis at next retreat!

Page 30: Building Undo for Operators: An Undoable E-mail Store Aaron Brown ROC Research Group University of California, Berkeley Winter 2003 ROC Research Retreat.

Building Undo for Operators:An Undoable E-mail Store

Aaron BrownROC Research Group

University of California, Berkeley

[email protected]://roc.cs.berkeley.edu/

Page 31: Building Undo for Operators: An Undoable E-mail Store Aaron Brown ROC Research Group University of California, Berkeley Winter 2003 ROC Research Retreat.

Slide 31

Backup Slides

Page 32: Building Undo for Operators: An Undoable E-mail Store Aaron Brown ROC Research Group University of California, Berkeley Winter 2003 ROC Research Retreat.

Slide 32

Summary: What’s in a Verb?

• Encapsulation of a user-service interaction– action specification– procedure to execute verb via proxy– tag: input parameters, signature of externalized data,

verb execution status

• Commutativity/independence test functions

• Consistency management functions– routine to check two tags for external inconsistency– inconsistency handler– squash routine

Page 33: Building Undo for Operators: An Undoable E-mail Store Aaron Brown ROC Research Group University of California, Berkeley Winter 2003 ROC Research Retreat.

Slide 33

Verbs and the Undo Timeline

• Recorded history of verbs must be consistent with actual verb execution– but logging at proxy may not match execution order

• Solution: partial serialization of verbs– undo manager tracks arriving & executing verbs in

scoreboard– arriving verbs that commute are executed in parallel– non-commuting verbs stall until dependencies clear

• Guarantees consistent recorded timeline without requiring hooks into application

Page 34: Building Undo for Operators: An Undoable E-mail Store Aaron Brown ROC Research Group University of California, Berkeley Winter 2003 ROC Research Retreat.

Slide 34

Verb Issue: Naming State

• Verbs need to specify names of target state– folders, messages, users

• Names must be time-invariant and unaffected by changes to execution timeline– IMAP protocol names can’t guarantee this

• Solution: allocate “UndoIDs” to objects– UndoID never changes and is restored on replay– undo system maintains translations to IMAP names

» messages: embed UndoID in header field» folders: track UndoIDs in parallel database

Aaron Brown
Move to backup slides?