THE MAGAZINE OF USENIX & SAGE
August 2002, Volume 27, Number 5
The Advanced Computing Systems Association & The System Administrators Guild
inside: CONFERENCE REPORTS, USENIX 2002
62 Vol. 27, No. 5 ;login:
conference reports

KEYNOTE ADDRESS
THE INTERNET'S COMING SILENT SPRING
Lawrence Lessig, Stanford University
Summarized by David E. Ott
In a talk that received a standing ovation, Lawrence Lessig pointed out the recent legal crisis that is stifling innovation by extending notions of private ownership of technology beyond reasonable limits.
Several lessons from history are instructive: (1) Edwin Armstrong, the creator of FM radio technology, became an enemy to RCA, which launched a legal campaign to suppress the technology; (2) packet-switching networks proposed by Paul Baran were seen by AT&T as a new competing technology that had to be suppressed; (3) Disney took Grimm tales, by then in the public domain, and retold them in magically innovative ways. Should building upon the past in this way be considered an offense?
"The truth is, architectures can allow" – that is, freedom for innovation can be built into architectures. Consider the Internet: a simple core allows for an unlimited number of smart end-to-end applications.
Compare an AT&T proprietary core network to the Internet's end-to-end model. The number of innovators for the former is one company, while for the latter it's potentially the number of people connected to it. Innovations for AT&T are designed to benefit the owner, while the Internet is a wide-open board that allows all kinds of benefits to all kinds of groups. The diversity of contributors in the Internet arena is staggering.
The auction of spectrum by the FCC is another case in point. The spectrum is usually seen as a fixed resource characterized by scarcity. The courts have seen spectrum as something to be owned as property. Technologists, however, have shown that capacity can be a function of
OUR THANKS TO THE SUMMARIZERS
For the USENIX Annual Technical Conference:
Josh Simon, who organized the collecting of the summaries in his usual flawless fashion
Steve Bauer
Florian Buchholz
Matt Butner
Pradipta De
Xiaobo Fan
Hai Huang
Scott Kilroy
Teri Lampoudi
Josh Lothian
Bosko Milekic
Juan Navarro
David E Ott
Amit Purohit
Brennan Reynolds
Matt Selsky
JD Welch
Li Xiao
Praveen Yalagandula
Haijin Yan
Gary Zacheiss
2002 USENIX ANNUAL TECHNICAL CONFERENCE
MONTEREY, CALIFORNIA, USA
JUNE 10-15, 2002

ANNOUNCEMENTS
Summarized by Josh Simon
The 2002 USENIX Annual Technical Conference was very exciting. The general track had 105 papers submitted (up 28% from the 82 submitted in 2001) and accepted 25 (19 from students); the FREENIX track had 53 submitted (up from 52 in 2001) and accepted 26 (7 from students).
The two annual USENIX-given awards were presented by outgoing USENIX Board President Dan Geer. The USENIX Lifetime Achievement Award (also known as the "Flame," because of the shape of the award) went to James Gosling for his contributions, including the Pascal compiler for Multics, Emacs, an early SMP UNIX, work on X11 and Sun's windowing system, the first enscript, and Java. The Software Tools User Group (STUG) Award was presented to the Apache Foundation and accepted by Rasmus Lerdorf. In addition to the well-known Web server, Apache produces Jakarta, mod_perl, mod_tcl, and an XML parser, with over 80 members in at least 15 countries.
architecture. As David Reed argues, capacity can scale with the number of users – assuming an effective technology and an open architecture.
Developments over the last three years are disturbing and can be summarized as two layers of corruption: intellectual-property rights constraining technical innovation, and proprietary hardware platforms constraining software innovation.
Issues have been framed by the courts largely in two mistaken ways: (1) it's their property – a new technology shouldn't interfere with the rights of current technology owners; (2) it's just theft – copyright laws should be upheld by suppressing certain new technologies.
In fact, we must reframe the debate from "it's their property" to a "highways" metaphor that acts as a neutral platform for innovation without discrimination. "Theft" should be reframed as "Walt Disney," who built upon works from the past in richly creative ways that demonstrate the utility of allowing work to reach the public domain.
In the end, creativity depends upon the balance between property and access: public and private, controlled and common access. Free code builds a free culture, and open architectures are what give rise to the freedom to innovate. All of us need to become involved in reframing the debate on this issue; as Rachel Carson's Silent Spring points out, an entire ecology can be undermined by small changes from within.
INVITED TALKS
THE IETF, OR WHERE DO ALL THOSE RFCS COME FROM, ANYWAY?
Steve Bellovin, AT&T Labs – Research
Summarized by Josh Simon
The Internet Engineering Task Force (IETF) is a standards body, but not a legal entity, consisting of individuals (not organizations) and driven by a consensus-based decision model. Anyone who "shows up" – be it at the thrice-annual in-person meetings or on the email lists for the various groups – can join and be a member. The IETF is concerned with Internet protocols and open standards, not LAN-specific protocols (such as AppleTalk) or layer-1 and -2 questions (like copper versus fiber).
The organizational structure is loose. There are many working groups, each with a specific focus, within several areas. Each area has an area director, and the area directors collectively form the Internet Engineering Steering Group (IESG). The six permanent areas are Internet (with working groups for IPv6, DNS, and ICMP), Transport (TCP, QoS, VoIP, and SCTP), Applications (mail, some Web, LDAP), Routing (OSPF, BGP), Operations and Management (SNMP), and Security (IPSec, TLS, S/MIME). There are also two other areas: SubIP is a temporary area for things underneath the IP protocol stack (such as MPLS, IP over wireless, and traffic engineering), and there's a General area for miscellaneous and process-based working groups.
Internet Requests for Comments (RFCs) fall into three tracks: Standard, Informational, and Experimental. Note that this means that not all RFCs are standards. The RFCs in the Informational track are generally for proprietary protocols or April 1st jokes; those in the Experimental track are results, ideas, or theories.
The RFCs in the Standard track come from working groups in the various areas through a time-consuming, complex process. Working groups are created with an agenda, a problem statement, an email list, some draft RFCs, and a chair; they typically start out as a BoF session. The working group and the IESG make a charter to define the scope, milestones, and deadlines, and the Internet Architecture Board (IAB) ensures that the working group proposals are architecturally sound. Working groups are narrowly focused and are supposed to die off once the problem is solved and all milestones achieved. Working groups meet and work mainly through the email list, though there are three in-person, high-bandwidth meetings per year. However, decisions reached in person must be ratified by the mailing list, since not everybody can get to three meetings per year. They produce RFCs, which go through the Standard track; these need to go through the entire working group before being submitted for comment to the entire IETF and then to the IESG. Most RFCs wind up going back to the working group at least once from the area director or IESG level.
The format of an RFC is well-defined and requires that it be published in plain 7-bit ASCII. RFCs are freely redistributable, and the IETF reserves the right of change control on all Standard-track RFCs.
The big problems the IETF is currently facing are security, internationalization, and congestion control. Security has to be designed into protocols from the start. Internationalization has shown us that 7-bit-only ASCII is bad and doesn't work, especially for those character sets that require more than 7 bits (like Kanji); UTF-8 is a reasonable compromise. But what about domain names? While the specifications do not require 7-bit ASCII, most DNS applications assume a 7-bit character set in the namespace. This is a hard problem. Finally, congestion control is another hard problem, since the Internet is not the same as a really big LAN.
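The 7-bit problem is easy to see concretely. A small sketch (my own illustration, not from the talk) of why Kanji breaks ASCII-only assumptions and how UTF-8 accommodates it while leaving existing ASCII text untouched:

```python
# Kanji code points are far above 127, so they cannot fit in 7 bits.
text = "日本語"  # "Japanese language"

assert not all(ord(ch) < 128 for ch in text)

try:
    text.encode("ascii")
except UnicodeEncodeError as e:
    print("ASCII cannot encode it:", e.reason)

# UTF-8 spends multiple bytes per character instead.
utf8 = text.encode("utf-8")
print(len(text), "characters ->", len(utf8), "UTF-8 bytes")  # 3 -> 9

# The compromise: pure ASCII text is byte-for-byte unchanged under UTF-8,
# so old protocols and tools keep working.
assert "hello".encode("utf-8") == b"hello"
```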
INTRODUCTION TO AIR TRAFFIC MANAGEMENT SYSTEMS
Ron Reisman, NASA Ames Research Center; Rob Savoye, Seneca Software
Summarized by Josh Simon
Air traffic control is organized into four domains: surface, which runs out of the airport control tower and controls the aircraft on the ground (e.g., taxi and takeoff); terminal area, which covers aircraft at 11,000 feet and below, handled by the Terminal Radar Approach Control (TRACON) facilities; en route, which covers between 11,000 and 40,000 feet, including climb, descent, and at-altitude flight, and runs from the 20 Air Route Traffic Control Centers (ARTCC, pronounced "artsy"); and traffic flow management, which is the strategic arm. Each area has sectors for low, high, and very-high flight. Each sector has a controller team, including one person on the microphone, and handles between 12 and 16 aircraft at a time. Since the number of sectors and areas is limited and fixed, there's limited system capacity. The events of September 11, 2001, gave us a respite in terms of system usage, but based on path growth patterns the air traffic system will be oversubscribed within two to three years. How do we handle this oversubscription?
Air Traffic Management (ATM) Decision Support Tools (DSTs) use physics, aeronautics, heuristics (expert systems), fuzzy logic, and neural nets to help the (human) aircraft controllers route aircraft around. The rest of the talk focused on capacity issues, but the DSTs also handle safety and security issues. The software follows open standards (ISO, POSIX, and ANSI). The team at NASA Ames made the Center-TRACON Automation System (CTAS), software for each of the ARTCCs, portable from Solaris to HP-UX and Linux as well. Unlike just about every other major software project, this one really is standard and portable; co-presenter Rob Savoye has experience in maintaining gcc on multiple platforms and is the project lead on the portability and standards issues for the code. CTAS allows the ARTCCs to upgrade and enhance individual aspects or parts of the system; it isn't a monolithic, all-or-nothing entity like the old ATM systems.
Some future areas of research include a head-mounted augmented-reality device for tower operators, to improve their situational awareness by automating human factors, and new differential global positioning system (DGPS) technologies, which are accurate within inches instead of feet.
Questions centered around advanced avionics (e.g., getting rid of ground control), cooperation between the US and Europe on software development (we're working together on software development, but the various European countries' controllers don't talk well to each other), and privatization.
ADVENTURES IN DNS
Bill Manning, ISI
Summarized by Brennan Reynolds
Manning began by posing a simple question: is DNS really a critical infrastructure? The answer is not simple. Perhaps seven years ago, when the Internet was just starting to become popular, the answer was a definite no. But today, with IPv6 being implemented and so many transactions being conducted over the Internet, the question does not have a clear-cut answer. The Internet Engineering Task Force (IETF) has several new modifications to the DNS service that may be used to protect and extend its usability.
The first extension Manning discussed was Internationalized Domain Names (IDN). To date, all DNS records are based on the ASCII character set, but many addresses in the Internet's global network cannot be easily written in ASCII characters. The goal of IDN is to provide an encoding for hostnames that is fair, efficient, and allows for a smooth transition from the current scheme. The work has resulted in two encoding schemes, ACE and UTF-8. Each encoding is independent of the other, but they can be used together in various combinations. Manning expressed his opinion that while neither is an ideal solution, ACE appears to be the lesser of two evils. A major hindrance to getting IDN rolled out into the Internet's DNS root structure is the increase in zone-file complexity.
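A hedged sketch of the ACE idea (my example, not Manning's): an ASCII-compatible encoding wraps a non-ASCII hostname in a plain-ASCII form that legacy DNS software can carry. The particular ACE shown here is Punycode with the xn-- prefix, the variant that was eventually standardized and that Python's built-in "idna" codec implements; bücher.example is a made-up name.

```python
# A hostname with a non-ASCII label.
name = "bücher.example"

# ASCII-compatible encoding: safe to store in ASCII-only DNS servers.
ace = name.encode("idna")
print(ace)  # b'xn--bcher-kva.example'

# The encoding round-trips, so IDN-aware clients recover the original
# name while the wire format stays pure ASCII.
assert ace.decode("idna") == "bücher.example"
assert all(b < 128 for b in ace)
```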
Manning's next topic was the inclusion of IPv6 records. In an IPv4 world it is possible for administrators to remember the numeric representation of an address; IPv6 makes the addresses too long and complex to be easily remembered. This means that DNS will play a vital role in getting IPv6 deployed. Several new resource records (RRs) have been proposed to handle the translation, including AAAA, A6, DNAME, and BITSTRING. Manning commented on the difference between IPv4 and v6 as a transport protocol, noting that systems tuned for v4 traffic will suffer a performance hit when using v6. This is largely attributed to the increase in data transmitted per DNS request.
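The size difference is easy to quantify (my illustration, using the standard documentation addresses rather than anything from the talk): a fully written IPv6 address is 39 characters against at most 15 for IPv4, and the binary payload carried per record is four times larger.

```python
import ipaddress

v4 = ipaddress.ip_address("192.0.2.1")                 # fits in human memory
v6 = ipaddress.ip_address("2001:db8::8a2e:370:7334")   # it does not

# The abbreviated form hides most of the address; the full form is 39 chars.
print(v6.exploded)  # 2001:0db8:0000:0000:0000:8a2e:0370:7334
assert len(v6.exploded) == 39 and len(str(v4)) <= 15

# An A record carries a 4-byte address; an AAAA record carries 16 bytes,
# part of why each DNS response grows.
print(len(v4.packed), "bytes (A) vs", len(v6.packed), "bytes (AAAA)")
```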
The final extension Manning discussed was DNSSec. He introduced this extension as a mechanism that protects the system from itself: DNSSec protects against data spoofing and provides authentication between servers. It includes a cryptographic signature of the RR set to ensure authenticity and integrity. By signing the entire set, the amount of computation is kept to a minimum. The information itself is stored in a new RR within each zone file on the DNS server.
Manning briefly commented on the use of DNS to provide a PKI infrastructure, stating that that was not the purpose of DNS and that it therefore should not be used in that fashion. The signing of the RR sets can be done hierarchically, resulting in the use of a single trusted key at the root of the DNS tree to sign all sets down to the leaves. However, the job of key replacement and rollover is extremely difficult for a system that is distributed across the globe.
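The per-set signature idea can be sketched in a few lines. This is my simplified illustration, not real DNSSEC: actual DNSSEC uses public-key signatures carried in RRSIG/DNSKEY records, whereas this toy uses an HMAC with an invented key, purely to show why signing the whole set means one signature computation covers every record in it.

```python
import hashlib
import hmac

zone_key = b"example-zone-signing-key"  # hypothetical key material

def sign_rrset(records):
    # Canonical ordering matters: signer and verifier must hash the
    # set identically, regardless of the order records arrive in.
    canonical = b"\n".join(sorted(r.encode() for r in records))
    return hmac.new(zone_key, canonical, hashlib.sha256).hexdigest()

rrset = ["www.example. 3600 IN A 192.0.2.10",
         "www.example. 3600 IN A 192.0.2.11"]
sig = sign_rrset(rrset)  # in DNSSEC this lives in its own RR in the zone

# A verifier recomputes the signature to detect spoofed data.
assert sign_rrset(rrset) == sig
assert sign_rrset(["www.example. 3600 IN A 203.0.113.7"]) != sig
```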
Manning stated that in an operational test bed with all of these extensions enabled, the packet size for a single query response grew from 518 bytes to larger than 18,000. This results in a large increase in bandwidth usage for high-volume name servers. Therefore, in Manning's opinion, not all of these features will be deployed in the near future. For those looking for more information, Bill's Web site can be found at http://www.isi.edu/otdr.
THE JOY OF BREAKING THINGS
Pat Parseghian, Transmeta
Summarized by Juan Navarro
Pat Parseghian described her experience in testing the Crusoe microprocessor at the Transmeta Lab for Compatibility (TLC), where the motto is "You make it, we break it" and the goal is to make engineers miserable.
She first gave an overview of Crusoe, whose most distinctive feature is a code-morphing layer that translates x86 instructions into native VLIW instructions. This peculiarity makes the testing process particularly challenging, because it involves testing two processors (the "external" x86 and the "internal" VLIW), and also because of variations in how the code-morphing layer works (it may interpret code the first few times it sees an instruction sequence, then translate it and save the result in a translation cache afterwards). There are also reproducibility issues, since the VLIW processor may run at different speeds to save energy.
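The interpret-first, translate-when-hot behavior described above can be sketched as follows. This is my own toy model, not Transmeta's code: the threshold, the block representation, and compile_block are all invented for illustration, with "translation" stood in by pre-folding a block's effect into a cached callable.

```python
HOT_THRESHOLD = 3       # invented: interpret this many times, then translate

seen_count = {}         # how often each block has been interpreted
translation_cache = {}  # block -> cached "translation" (a callable here)

def interpret(block, env):
    # Slow path: walk the block operation by operation.
    for reg, arg in block:
        env[reg] = env.get(reg, 0) + arg
    return env

def compile_block(block):
    # Stand-in for real binary translation: pre-fold the block's net effect.
    totals = {}
    for reg, arg in block:
        totals[reg] = totals.get(reg, 0) + arg
    def translated(env):
        for reg, delta in totals.items():
            env[reg] = env.get(reg, 0) + delta
        return env
    return translated

def execute(block, env):
    if block in translation_cache:              # fast path: reuse translation
        return translation_cache[block](env)
    seen_count[block] = seen_count.get(block, 0) + 1
    if seen_count[block] >= HOT_THRESHOLD:      # hot: translate and cache
        translation_cache[block] = compile_block(block)
        return translation_cache[block](env)
    return interpret(block, env)                # cold: interpret slowly

block = (("r1", 2), ("r1", 3))  # a made-up instruction sequence
env = {}
for _ in range(5):
    execute(block, env)
print(env, "translated after", seen_count[block], "interpretations")
```

Note how this structure creates exactly the reproducibility headache the summary mentions: the same program takes different internal paths depending on how often each block has been seen.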
Pat then described the tests that they subject the processor to (hardware compatibility tests, common applications, operating systems, envelope-pushing games, and hard-to-acquire legacy applications) and the issues that must be considered when defining testing policies. She gave some testing tips, including organizational issues like tracking resources and results.
To illustrate the testing process, Pat gave a list of possible causes of a system crash that must be investigated. If it is the silicon, then it might be because it is damaged, or because of a manufacturing problem or a design flaw. If the problem is in the code-morphing layer, is the fault with the interpreter or with the translator? The fault could also be external to the processor: it could be the homemade system BIOS, a faulty mainboard, or an operator error. Or Transmeta might not be at fault at all; the crash might be due to a bug in the OS or the application. The key to pinpointing the cause of a problem is to isolate it by identifying the conditions that reproduce the problem and repeating those conditions on other test platforms and non-Crusoe systems. Then relevant factors must be identified based on product knowledge, past experience, and common sense.
To conclude, Pat suggested that some of TLC's testing lessons can be applied to products that the audience was involved in, and assured us that breaking things is fun.
TECHNOLOGY, LIBERTY, FREEDOM, AND WASHINGTON
Alan Davidson, Center for Democracy and Technology
Summarized by Steve Bauer
"Experience should teach us to be most on our guard to protect liberty when the Government's purposes are beneficent. Men born to freedom are naturally alert to repel invasion of their liberty by evil-minded rulers. The greatest dangers to liberty lurk in insidious encroachment by men of zeal, well-meaning but without understanding." – Louis Brandeis, Olmstead v. US
Alan Davidson, an associate director at the CDT (http://www.cdt.org) since 1996, concluded his talk with this quote. In many ways it aptly characterizes the importance to the USENIX community of the topics he covered. The major themes of the talk were the impact of law on individual liberty and on system architectures, and appropriate responses by the technical community.
The first part of the talk provided the audience with an overview of legislation either already introduced or likely to be introduced in the US Congress. This included various proposals to protect children, such as relegating all sexual content to a .xxx domain or providing kid-safe zones such as .kids.us. Other similar laws discussed were the Child Online Protection Act and legislation dealing with virtual child pornography.
Other, lower-profile pieces covered were laws prohibiting false contact information in emails and domain registration databases. Similar proposals exist that would prohibit misleading subject lines in emails and require messages to clearly identify if they are advertisements. Briefly mentioned were laws impacting online gambling.
Bills and laws affecting the architectural design of systems and networks, and various efforts to establish boundaries and borders in cyberspace, were then discussed. These included the Consumer Broadband and Digital Television Promotion Act and the French cases against Yahoo and its CEO for allowing the sale of Nazi materials on the Yahoo auction site.
A major part of the talk focused on the technological aspects of the USA-Patriot Act, passed in the wake of the September 11 attacks. Topics included "pen registers" for the Internet, expanded government power to conduct secret searches, roving wiretaps, nationwide service of warrants, sharing of grand jury and intelligence information, and the establishment of an identification system for visitors to the US.
The technological community needs to be aware of and care about the impact of law on individual liberty and on the systems the community builds. Specific suggestions included designing privacy and anonymity into architectures and limiting information collection, particularly any personally identifiable data, to only what is essential. Finally, Alan emphasized that many people simply do not understand the implications of law; thus the technology community has an important role in helping make these implications clear.
TAKING AN OPEN SOURCE PROJECT TO MARKET
Eric Allman, Sendmail, Inc.
Summarized by Scott Kilroy
Eric started the talk with the upsides and downsides of open source software. The upsides include rapid feedback, high-quality work (mostly), lots of hands, and an intensely engineering-driven approach. The downsides include no support structure, limited project marketing expertise, and volunteers having limited time.
In 1998, Sendmail was becoming a success disaster. A success disaster includes two key elements: (1) volunteers start spending all their time supporting current functionality instead of enhancing it (this heavy burden can lead to project stagnation and, eventually, death); (2) competition from better-funded sources can lead to FUD (fear, uncertainty, and doubt) about the project.
Eric then led listeners down the path he took to turn Sendmail into a business. Eric envisioned Sendmail, Inc. as a smallish and close-knit company but soon realized that the family atmosphere he desired could not last as the company grew. He observed that maybe only the first 10 people will buy into your vision, after which each individual is primarily interested in something else (e.g., power, wealth, or status). He warned that there will be people working for you whom you do not like. Eric particularly did not care for salespeople, but he emphasized how important sales, marketing, finance, and legal people are to companies.
The nature of companies in infancy can be summed up simply: you need money, and you need investors in order to get money. Investors want a return on investment, so money management becomes critical, and companies must therefore optimize money functions. Eric's experiences with investors were lessons all in themselves. More than giving money, good investors can provide connections and insight, so you don't want investors who don't believe in you.
A company must have bug history, change management, and people who understand the product and have a sense of the target market. Eric now sees the importance of marketing target research. A company can never forget the simple equation that drives business: profit = revenue - expense.
Finally, the concept of value should be based on what is valuable to the customer. Eric observed that his customers needed documentation, extra value in the product that they couldn't get elsewhere, and technical support.
Eric learned hard lessons along the way. If you want to do interesting open source, it might be best not to be too successful; you don't want to let the larger dragons (companies) notice you. If you want a commercial user base, you have to manage process, watch corporate culture, provide value to customers, watch bottom lines, and develop a thick skin.
Starting a company is easy, but perpetuating it is extremely difficult.
INFORMATION VISUALIZATION FOR SYSTEMS PEOPLE
Tamara Munzner, University of British Columbia
Summarized by JD Welch
Munzner presented interesting evidence that information visualization, through interactive representations of data, can help people perform tasks more effectively and reduce the load on working memory.
A popular technique in visualizing data is abstracting it through node-link representation – for example, a list of cross-references in a book. Representing the relationships between references in a graphical (node-link) manner "offloads cognition to the human perceptual system." These graphs can be produced manually, but automated drawing allows for increased complexity and quicker production times.
The way in which people interpret visual data is important in designing effective visualization systems, and attention to pre-attentive visual cues is key to success. Picking out a red dot among a field of identically shaped blue dots is significantly faster than finding the same data represented in a mixture of hue and shape variations. There are many pre-attentive visual cues, including size, orientation, intersection, and intensity. The accuracy of these cues can be ranked, with position triggering high accuracy and color or density on the low end.
Visual cues combined with different data types – quantitative (e.g., 10 inches), ordered (small, medium, large), and categorical (apples, oranges) – can also be ranked, with spatial position being best for all types. Beyond this commonality, however, accuracy varied widely, with length ranked second for quantitative data, eighth for ordinal, and ninth for categorical data types.
These guidelines about visual perception can be applied to information visualization, which, unlike scientific visualization, focuses on abstract data and choice of spatialization. Several techniques are used to clarify abstract data, including multi-part glyphs, where changes in individual parts are incorporated into an easier-to-understand gestalt; interactivity; and motion and animation. In large data sets, techniques like "focus + context," where a zoomed portion of the graph is shown along with a thumbnail view of the entire graph, are used to minimize user disorientation.
Future problems include dealing with huge databases, such as the Human Genome; reckoning with dynamic data, like the changing structure of the Web or real-time network monitoring; and transforming "pixel-bound" displays into large "digital wallpaper"-type systems.
FIXING NETWORK SECURITY BY HACKING THE BUSINESS CLIMATE
Bruce Schneier, Counterpane Internet Security
Summarized by Florian Buchholz
Bruce Schneier identified security as one of the fundamental building blocks of the Internet. A certain degree of security is needed for all things on the Net, and the limits of security will become limits of the Net itself. However, companies are hesitant to employ security measures. Science is doing well, but one cannot see the benefits in the real world. An increasing number of users means more problems affecting more people. Old problems such as buffer overflows haven't gone away, and new problems show up. Furthermore, the amount of expertise needed to launch attacks is decreasing.
Schneier argues that complexity is the enemy of security, and that while security is getting better, the growth in complexity is outpacing it. As security is fundamentally a people problem, one shouldn't focus on technologies but rather should look at businesses, business motivations, and business costs. Traditionally, one can distinguish between two security models: threat avoidance, where security is absolute, and risk management, where security is relative and one has to mitigate the risk with technologies and procedures. In the latter model, one has to find a point with reasonable security at an acceptable cost.
After a brief discussion of how security is handled in businesses today, in which he concluded that businesses "talk big about it but do as little as possible," Schneier identified four necessary steps to fix the problems.
First, enforce liabilities. Today no real consequences need be feared from security incidents. Holding people accountable will increase the transparency of security and give an incentive to make processes public. As possible options to achieve this, he listed industry-defined standards, federal regulation, and lawsuits. The problems with enforcement, however, lie in difficulties associated with the international nature of the problem and the fact that complexity makes assigning liabilities difficult. Furthermore, fear of liability could have a chilling effect on free software development and could stifle new companies.
As a second step, Schneier identified the need to allow partners to transfer liability. Insurance would spread liability risk among a group and would be a CEO's primary risk-analysis tool. There is a need for standardized risk models and protection profiles, and for more security as opposed to empty press releases. Schneier predicts that insurance will create a marketplace for security, where customers and vendors will have the ability to accept liabilities from each other. Computer-security insurance should soon be as common as household or fire insurance, and from that development insurance will become the driving factor of the security business.
The next step is to provide mechanisms to reduce risks both before and after software is released; techniques and processes to improve software quality, as well as an evolution of security management, are therefore needed. Currently, security products try to rebuild the walls – such as physical badges and entry doorways – that were lost when getting connected. Schneier believes this "fortress metaphor" is bad; one should think of the problem more in terms of a city. Since most businesses cannot afford proper security, outsourcing is the only way to make security scalable. With outsourcing, the concept of best practices becomes important, and insurance companies can be tied to them; outsourcing will level the playing field.
As a final step, Schneier predicts that rational prosecution and education will lead to deterrence. He claims that people feel safe because they live in a lawful society, whereas the Internet is classified as lawless, very much like the American West in the 1800s or a society ruled by warlords. This is because it is difficult to prove who an attacker is, and prosecution is hampered by complicated evidence gathering and irrational prosecution. Schneier believes, however, that education will play a major role in turning the Internet into a lawful society. Specifically, he pointed out that we need laws that can be explained.
Risks won't go away; the best we can do is manage them. A company able to manage risk better will be more profitable, and we need to give CEOs the necessary risk-management tools.
There were numerous questions. Listed below are the more complicated ones, in a paraphrased Q&A format.
Q: Will liability be effective? Will insurance companies be willing to accept the risks? A: The government might have to step in; it remains to be seen how it plays out.
Q: Is there an analogy to the real world, in that in a lawless society only the rich can afford security, and security solutions differentiate between classes? A: Schneier disagreed with the statement, giving an analogy to front-door locks, but conceded that there might be special cases.
Q: Doesn't homogeneity hurt security? A: Homogeneity is oversold. Diverse types can be more survivable, but given the limited number of options, the difference will be negligible.
Q: Regulations on airbag protection have led to deaths in some cases. How can we keep the pendulum from swinging the other way? A: Lobbying will not be prevented. An imperfect solution is probable; there might be reversals of requirements, such as the airbag one.
Q: What about personal liability? A: This will be analogous to auto insurance; liability comes with computer/Net access.
Q: If the rule of law is to become reality, there must be a law-enforcement function that applies to a physical space. You cannot do that with any existing government agency for the whole world. Should an organization like the UN assume that role? A: Schneier was not convinced global enforcement is possible.
Q: What advice should we take away from this talk? A: Liability is coming. Since the network is important to our infrastructure, eventually the problems will be solved in a legal environment. You need to start thinking about how to solve the problems and how the solutions will affect us.
The entire speech (including slides) is available at http://www.counterpane.com/presentation4.pdf.
LIFE IN AN OPEN SOURCE STARTUP
Daryll Strauss, Consultant
Summarized by Teri Lampoudi
This talk was packed with morsels of insight and tidbits of information about what life in an open source startup is like (though "was like" might be more appropriate), what the issues are in starting, maintaining, and commercializing an open source project, and the way hardware vendors treat open source developers. Strauss began by tracing the timeline of his involvement with developing 3-D graphics support for Linux: from the 3dfx Voodoo1 driver for Linux in 1997, to the establishment of Precision Insight in 1998, to its buyout by VA Linux, to the dismantling of his group by VA, to what the future may hold.
Obviously there are benefits from doing open source development, both for the developer and for the end user. An important point was that open source develops entire technologies, not just products. Often misapprehended is the inherent difficulty of managing a project that accepts code from a large number of developers, some volunteer and some paid; the notion that code can just be thrown over the wall into the world is a Big Lie.
How does one start and keep afloat anopen source development company Inthe case of Precision Insight the subjectmatter ndash developing graphics andOpenGL for Linux ndash required a consid-erable amount of expertise intimateknowledge of the hardware the librariesX and the Linux kernel as well as theend applications Expertise is mar-ketable What helps even further is hav-ing a visible virtual team of expertspeople who have an established trackrecord of contributions to open source
in the particular area of expertise. Support from the hardware and software vendors came next. While everyone was willing to contract PI to write the portions of code that were specific to their hardware, no one wanted to pay the bill for developing the underlying foundation that was necessary for the drivers to be useful. In the PI model, the infrastructure cost was shared by several customers at once, and the technology kept moving. Once a driver is written for a vendor, the vendor is perfectly capable of figuring out how to write the next driver it needs, thereby obviating the need to contract the job. This, along with questions of protecting intellectual property – hardware design in this case – is deeply problematic for the open source mode of development. This is not to say that there is no way to get past these problems, but they are likely to arise more often than not.
Management issues seem equally important. In the PI setting, contracts were flexible – both a good and a bad thing – and developers were overworked. Further, development "in a fishbowl," that is, under public scrutiny, is not easy. Strauss stressed the value of good communication and the use of various asynchronous communication methods like IRC and mailing lists.
Finally, Strauss discussed what portions of software can be free and what can be proprietary. His suggestion was that horizontal markets want to be free, whereas vertically one can develop proprietary solutions. The talk closed with a small stroll through the problems that project politics brings up, from things like merging branches to mailing-list and source-tree readability.
GENERAL TRACK SESSIONS
FILE SYSTEMS
Summarized by Haijin Yan
STRUCTURE AND PERFORMANCE OF THE
DIRECT ACCESS FILE SYSTEM
Kostas Magoutis, Salimah Addetia, Alexandra Fedorova, and Margo I. Seltzer, Harvard University; Jeffrey Chase, Andrew Gallatin, Richard Kisley, and Rajiv Wickremesinghe, Duke University; and Eran Gabber, Lucent Technologies
This paper won the Best Paper award.
The Direct Access File System (DAFS) is a new standard for network-attached storage over direct-access transport networks. DAFS takes advantage of user-level network interface standards that enable client-side functionality in which remote data access resides in a library rather than in the kernel. This reduces both the memory-copy overhead of data movement and the protocol overhead. Remote Direct Memory Access (RDMA) is a direct-access transport network mechanism that allows the network adapter to reduce copy overhead by accessing application buffers directly.
This paper explores the fundamental structural and performance characteristics of network file access using a user-level file system structure on a direct-access transport network with RDMA. It describes DAFS-based client and server reference implementations for FreeBSD and reports experimental results comparing DAFS to a zero-copy NFS implementation. It illustrates the benefits and trade-offs of these techniques to provide a basis for informed choices about deployment of DAFS-based systems and similar extensions to other network file protocols, such as NFS. Experiments show that DAFS gives applications direct control over an I/O system and increases the client CPU's usage while the client is doing I/O. Future work includes how to address longstanding problems related to the integration of the application and file system for high-performance applications.
CONQUEST: BETTER PERFORMANCE THROUGH A DISK/PERSISTENT-RAM HYBRID FILE SYSTEM
An-I A. Wang, Peter Reiher, and Gerald J. Popek, UCLA; Geoffrey M. Kuenning, Harvey Mudd College
Motivated by the declining cost of persistent RAM, the authors propose the Conquest file system, which stores all small files, metadata, executables, and shared libraries in persistent RAM; disks hold only the data content of the remaining large files. Compared to alternatives such as caching and RAM file systems, Conquest has the advantages of efficiency, consistency, and reliability at a reduced cost. Using popular benchmarks, experiments show that Conquest incurs little overhead while achieving faster performance. Future work includes designing mechanisms for adjusting the file-size threshold dynamically and finding a better disk layout for large data blocks.
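The placement rule the summary describes can be sketched as follows; the 1MB cutoff and the store names are illustrative assumptions, not values from the paper (whose threshold is tunable):

```python
# Sketch of Conquest-style placement: metadata, executables, shared
# libraries, and small files live in persistent RAM; only the data
# content of large files goes to disk. The 1MB cutoff is a
# hypothetical stand-in for the tunable threshold.
SMALL_FILE_THRESHOLD = 1 << 20  # bytes; illustrative value

def place(file_size, is_metadata=False, is_executable=False, is_shared_lib=False):
    """Return which store holds the data content of this object."""
    if is_metadata or is_executable or is_shared_lib:
        return "persistent-ram"
    if file_size < SMALL_FILE_THRESHOLD:
        return "persistent-ram"
    return "disk"  # large-file data; its metadata would still live in RAM
```

Note that even for large files only the data blocks land on disk, which is why a disk layout tuned purely for large sequential data becomes attractive.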
EXPLOITING GRAY-BOX KNOWLEDGE OF
BUFFER-CACHE MANAGEMENT
Nathan C. Burnett, John Bent, Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau, University of Wisconsin, Madison
Knowing what algorithm is used to manage the buffer cache is very important for improving application performance, yet there is currently no interface for discovering it. This paper introduces Dust, a simple fingerprinting tool that is able to identify the buffer-cache replacement policy. Dust automatically identifies the cache size and replacement policy based on how the cache treats the attributes of access order: recency, frequency, and long-term history. Through simulation, Dust was able to distinguish between a variety of replacement policies found in the literature: FIFO, LRU, LFU, Clock, Segmented FIFO, 2Q, and LRU-K. Further experiments fingerprinting real operating systems such as NetBSD, Linux, and Solaris show that Dust is able to identify the hidden cache replacement algorithm.
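The fingerprinting idea can be illustrated with a toy in-memory model: drive the cache with a probe pattern whose hit/miss outcome differs per policy. (Dust itself probes a real OS buffer cache and infers hits indirectly; this sketch assumes direct hit observation and only two candidate policies.)

```python
from collections import OrderedDict

class FIFOCache:
    def __init__(self, size):
        self.size, self.q = size, OrderedDict()
    def access(self, block):
        hit = block in self.q
        if not hit:
            if len(self.q) >= self.size:
                self.q.popitem(last=False)    # evict oldest insertion
            self.q[block] = True
        return hit                             # re-access does not reorder

class LRUCache(FIFOCache):
    def access(self, block):
        hit = block in self.q
        if hit:
            self.q.move_to_end(block)          # refresh recency
        else:
            if len(self.q) >= self.size:
                self.q.popitem(last=False)     # evict least recently used
            self.q[block] = True
        return hit

def fingerprint(cache):
    # Fill a size-4 cache, re-touch block 0, then force one eviction.
    for b in range(4):
        cache.access(b)
    cache.access(0)     # LRU moves 0 to the MRU end; FIFO ignores this
    cache.access(99)    # evicts 0 under FIFO, but 1 under LRU
    return "LRU" if cache.access(0) else "FIFO"
```

Distinguishing policies that also track frequency or longer history (LFU, 2Q, LRU-K) requires richer probe patterns, which is exactly what Dust's configurable access orders provide.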
By knowing the underlying cache replacement policy, a cache-aware Web server can reschedule requests under a cached-request-first policy to obtain a performance improvement. Experiments show that cache-aware scheduling improves average response time and system throughput.
OPERATING SYSTEMS
(AND DANCING BEARS)
Summarized by Matt Butner
THE JX OPERATING SYSTEM
Michael Golm, Meik Felser, Christian Wawersich, and Jürgen Kleinöder, University of Erlangen-Nürnberg
The talk opened by emphasizing the need for, and the practicality of, a functional Java OS. The JX OS follows the recent trend toward using highly abstracted languages in application development, aiming to create an OS as functional and powerful as one written in a lower-level language.
JX is a microkernel solution that uses separate JVMs for each entity of the kernel and, in some cases, for each application. The separated domains do not share objects and have no thread migration, and each domain implements its own code. The interesting part of the presentation was the discussion of system-level programming with Java; key areas discussed were memory management and interrupt handlers. The authors concluded by noting that their type-safe and modular approach resulted in a robust system with great configuration flexibility and acceptable performance.
DESIGN EVOLUTION OF THE EROS
SINGLE-LEVEL STORE
Jonathan S. Shapiro, Johns Hopkins University; Jonathan Adams, University of Pennsylvania
This presentation outlined current characteristics of file systems, including some of their least desirable traits. It was based on the revival of the EROS single-level-store approach in current attempts to capitalize on system consistency and efficiency. Their solution did not relate directly to EROS but rather to the design and use of the constructs and notions on which EROS is based. By extending the mapping of the memory system to include the disk, systems are further able to ensure global persistence without regard for a disk structure. In explaining the need for absolute system consistency and an environment which, by definition, is not allowed to crash, the goal of having such an exhaustive design becomes clear. The cool part of this work is the availability of its design and architecture to the public.
THINK: A SOFTWARE FRAMEWORK FOR
COMPONENT-BASED OPERATING SYSTEM
KERNELS
Jean-Philippe Fassino, France Télécom R&D; Jean-Bernard Stefani, INRIA; Julia Lawall, DIKU; Gilles Muller, INRIA
This presentation discussed the need for component-based operating systems and corresponding structures to ensure flexibility for arbitrarily sized systems. Think provides a binding model that gives OS developers and architects uniform components to follow, ensuring consistent implementations for an arbitrary system size. However, the goal is not to force developers into a predefined kernel but to promote the use of certain components in varied ways.
Think's concentration is primarily on embedded systems, where short development time is necessary but is constrained by rigorous needs and limited resources. Building flexible systems in such an environment can be costly and implementation-specific, but Think creates an environment supported by the ability to dynamically load type-safe components. This allows for more flexible systems that retain functionality because of Think's ability to bind fine-grained components. Benchmarks revealed that dedicated microkernels can show performance improvements in comparison to monolithic kernels. Notable future work includes building components for low-end appliances and the development of a real-time OS component library.
BUILDING SERVICES
Summarized by Pradipta De
NINJA: A FRAMEWORK FOR NETWORK
SERVICES
J. Robert von Behren, Eric A. Brewer, Nikita Borisov, Michael Chen, Matt Welsh, Josh MacDonald, Jeremy Lau, and David Culler, University of California, Berkeley
Robert von Behren presented Ninja, an ongoing project that aims to provide a framework for building robust and scalable Internet-based network services like Web-hosting, instant-messaging, email, and file-sharing applications. Robert drew attention to the difficulties of writing cluster applications: one has to take care of data consistency and issues of concurrency, and for robust applications there are problems related to fault tolerance. Ninja works as a wrapper to relieve the user of these problems. The goal of the project is building network services that are scalable and highly available, maintaining persistent data, providing graceful degradation, and supporting online evolution.
The use of clusters distinguishes this setup from generic distributed systems in terms of reliability and security, as well as providing a partition-free network. The next important feature of this project is the new programming model, which is more restrictive than a general programming model but still expressive enough to write most applications. This model, described as "single program, multiple connection," uses intertask parallelism instead of multithreaded concurrency and is characterized by all nodes running the same program, with connections delegated to the nodes by a centralized connection manager (CM). The CM takes care of hiding the details of mapping an external connection to an internal node. Ninja also uses a design called "staged event-driven architecture," in which each service is broken down into a set of stages connected by event queues. This architecture is suitable for a modular design and helps in graceful degradation by adaptive load shedding from the event queues.
The presentation concluded with examples of the implementation and evaluation of a Web server and an email system called NinjaMail, to show the ease of authoring and the efficacy of using the Ninja framework for service development.
USING COHORT-SCHEDULING TO ENHANCE
SERVER PERFORMANCE
James R. Larus and Michael Parkes, Microsoft Research
James Larus presented a new scheduling policy for increasing the throughput of server applications. Cohort scheduling batches the execution of similar operations arising in different server requests. The usual programming paradigm in handling server requests is to spawn multiple concurrent threads and switch from one thread to another whenever a thread gets blocked for I/O. Since the threads in a server mostly execute unrelated pieces of code, the locality of reference is reduced, and with it the effectiveness of the various caching mechanisms. One way to solve this problem is to throw in more hardware, but Larus presented a complementary view in which program behavior is investigated and used to improve performance.
The problem in this scheme is to identify pieces of code that can be batched together for processing. One simple way is to look at the next program counter values and use them to club threads together. However, this talk presented a new programming abstraction, "staged computation," which replaces the thread model with "stages." A stage is an abstraction for operations with similar behavior and locality. The StagedServer library can be used for programming in this model; it is a collection of C++ classes that implement staged computation and cohort scheduling on either a uniprocessor or a multiprocessor, and it can be used to define stages.
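The scheduling discipline can be sketched with a toy model: each request is a list of (stage, payload) steps, and rather than running one request to completion at a time, all pending operations of the same stage execute back to back as a cohort. The stage names and scheduler loop are illustrative; the real StagedServer is a C++ library.

```python
from collections import defaultdict

# Toy cohort scheduler: batch same-stage operations across requests
# so a cohort's shared code and data stay warm in the caches.
def cohort_run(requests, handlers):
    pending = defaultdict(list)          # stage -> [(payload, remaining steps)]
    for req in requests:
        stage, payload = req[0]
        pending[stage].append((payload, req[1:]))
    schedule = []                        # (stage, cohort size) execution order
    while pending:
        stage, cohort = pending.popitem()
        schedule.append((stage, len(cohort)))
        for payload, rest in cohort:
            handlers[stage](payload)     # run the whole cohort back to back
            if rest:                     # hand off to the next stage's queue
                next_stage, next_payload = rest[0]
                pending[next_stage].append((next_payload, rest[1:]))
    return schedule
```

With two requests that each parse and then reply, a thread-per-request server would interleave parse/reply/parse/reply; the cohort scheduler instead runs both parses, then both replies.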
The presentation showed an experimental evaluation of cohort scheduling against the thread-based model by implementing two servers: a Web server, which is mainly I/O bound, and a publish-subscribe server, which is mainly compute bound. The SURGE benchmark was used for the first experiment and the Fabret workload for the second. The results showed that the cohort-scheduling-based implementation gave better throughput than the thread-based implementation at high loads.
NETWORK PERFORMANCE
Summarized by Xiaobo Fan
ETE: PASSIVE END-TO-END INTERNET SERVICE
PERFORMANCE MONITORING
Yun Fu and Amin Vahdat, Duke University; Ludmila Cherkasova and Wenting Tang, HP Labs
This paper won the Best Student Paper award.
Ludmila Cherkasova began by listing several questions most Web service providers want answered in order to improve service quality. She reviewed the difficulties of making accurate and efficient end-to-end Web service measurements and the shortcomings of currently available approaches. The authors propose a passive trace-based architecture called EtE to monitor Web server performance on behalf of end users.
The first step is to collect network packets passively. The second module reconstructs all TCP connections and extracts HTTP transactions. To reconstruct Web page accesses, they first build a knowledge base indexed by client IP and URL and then group objects to the related Web pages they are embedded in. Statistical analysis is used to handle non-matched objects. EtE Monitor can generate three groups of metrics to measure Web service performance: response time, Web caching, and page abortion.
To demonstrate the benefits of EtE Monitor, Cherkasova talked about two case studies and, based on the calculated metrics, gave some insightful explanations of what is happening behind the variations in Web performance. Through validation, they claim their approach provides a very close approximation of the real scenario.
THE PERFORMANCE OF REMOTE DISPLAY
MECHANISMS FOR THIN-CLIENT COMPUTING
S. Jae Yang, Jason Nieh, Matt Selsky, and Nikhil Tiwari, Columbia University
Noting the trend toward thin-client computing, the authors compared different techniques and design choices in measuring the performance of six popular thin-client platforms – Citrix MetaFrame, Microsoft Terminal Services, Sun Ray, Tarantella, VNC, and X. After pointing out several challenges in benchmarking thin clients, Yang proposed slow-motion benchmarking to achieve non-invasive packet monitoring and consistent visual quality. Basically, they insert delays between separate visual events in the benchmark applications on the server side so that the client's display updates can catch up with the server's processing speed. The experiments were conducted on an emulated network over a range of network bandwidths.
Their results show that thin clients can provide good performance for Web applications in LAN environments, but only some platforms performed well on the video benchmark. Pixel-based encoding may achieve better performance and bandwidth efficiency than high-level graphics. Display caching and compression should be used with care.
A MECHANISM FOR TCP-FRIENDLY
TRANSPORT-LEVEL PROTOCOL
COORDINATION
David E. Ott and Ketan Mayer-Patel, University of North Carolina, Chapel Hill
A revised transport-level protocol optimized for cluster-to-cluster (C-to-C) applications is what David Ott tried to explain in his talk. A C-to-C application class is identified as one set of processes communicating with another set of processes across a common Internet path. The fundamental problem with C-to-C applications is how to coordinate all C-to-C communication flows so that they share a consistent view of the common C-to-C network, adapt to changing network conditions, and cooperate to meet specific requirements. Aggregation points (APs) are placed at the first and last hop of the common data path to probe network conditions (latency, bandwidth, loss rate, etc.). To carry and transfer this information, a new protocol – the Coordination Protocol (CP) – is inserted between the network layer (IP) and the transport layer (TCP, UDP, etc.). Ott illustrated how the CP header is updated and used as packets originate from the source and traverse the local and remote APs to arrive at their destination, and how an AP maintains a per-cluster state table and detects network conditions. Through simulation results, this coordination mechanism appears effective in sharing common network resources among C-to-C communication flows.
STORAGE SYSTEMS
Summarized by Praveen Yalagandula
MY CACHE OR YOURS? MAKING STORAGE
MORE EXCLUSIVE
Theodore M. Wong, Carnegie Mellon University; John Wilkes, HP Labs
Theodore Wong explained the inefficiency of current "inclusive" caching schemes in storage area networks: when a client accesses a block, the block is read from the disk and is cached both in the disk array cache and in the client's cache. He then presented the concept of "exclusive" caching, in which a block accessed by a client is cached only in that client's cache; on eviction from the client's cache, a new DEMOTE operation moves the data block to the tail of the array cache.
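The read/demote interplay can be sketched with two small LRU caches; the cache sizes and the simple LRU discipline are illustrative assumptions, not details of the paper's arrays:

```python
from collections import OrderedDict

class LRU:
    def __init__(self, size):
        self.size, self.d = size, OrderedDict()
    def get(self, b):
        if b in self.d:
            self.d.move_to_end(b)     # refresh recency on a hit
            return True
        return False
    def put(self, b):
        self.d[b] = True
        self.d.move_to_end(b)         # insert at the tail (MRU end)
        if len(self.d) > self.size:
            return self.d.popitem(last=False)[0]   # evicted block
        return None

def client_read(block, client, array):
    """Exclusive caching: a block lives in the client OR array cache."""
    if client.get(block):
        return "client-hit"
    outcome = "array-hit" if array.get(block) else "disk-read"
    if outcome == "array-hit":
        del array.d[block]            # exclusive: drop the array copy
    victim = client.put(block)
    if victim is not None:
        array.put(victim)             # DEMOTE the victim to the array tail
    return outcome
```

The point of DEMOTE is the last line: under an inclusive scheme the victim would simply be dropped, and the array cache would mostly duplicate what clients already hold.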
The new exclusive caching schemes were evaluated on both single-client and multiple-client systems. The single-client results are presented for two different types of synthetic workloads: Random (transaction-processing-type workloads) and Zipf (typical Web workloads). The exclusive policy was quite effective in achieving higher hit rates and lower read latencies. They also showed a 2.2-times hit-rate improvement over inclusive techniques for a real-life workload, httpd, in a single-client setting.
The DEMOTE scheme also performed well on multiple-client systems when the data accessed by the clients is disjoint; for shared-data workloads, it performed worse than the inclusive schemes. Theodore then presented an adaptive exclusive caching scheme: a block accessed by a client is placed at the tail of the client's cache and also in the array cache at a position determined by the popularity of the block. The popularity of a block is measured by maintaining a ghost cache that accumulates the number of times each block is accessed. This new adaptive scheme achieved a maximum of 52% speedup in mean latency in the experiments with real-life workloads.
For more information, visit http://www.cs.cmu.edu/~tmwong/research and http://www.hpl.hp.com/research/itc/csl/ssp.
BRIDGING THE INFORMATION GAP IN
STORAGE PROTOCOL STACKS
Timothy E. Denehy, Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau, University of Wisconsin, Madison
Currently there is a huge information gap between storage systems and file systems. The interface exposed by storage systems to file systems, based on blocks and providing only read/write operations, is very narrow. This leads to poor performance as a whole, because of duplicated functionality in both systems and reduced functionality resulting from storage systems' lack of file information.
The speaker presented two enhancements: ExRAID, an exposed RAID, and ILFS, an informed log-structured file system. ExRAID is an enhancement to the block-based RAID storage system that exposes three types of information to the file system: (1) regions – contiguous portions of the address space comprising one or multiple disks; (2) performance information about queue lengths and throughput of the regions, revealing disk heterogeneity to the file system; and (3) failure information – dynamically updated information conveying the number of tolerable failures in each region.
ILFS allows online incremental expansion of the storage space, performs dynamic parallelism using ExRAID's performance information to perform well on heterogeneous storage systems, and provides a range of mechanisms with different granularities for file redundancy using ExRAID's region failure characteristics. These new features were added to LFS with only a 19% increase in code size.
More information: http://www.cs.wisc.edu/wind.
MAXIMIZING THROUGHPUT IN REPLICATED
DISK STRIPING OF VARIABLE
BIT-RATE STREAMS
Stergios V. Anastasiadis, Duke University; Kenneth C. Sevcik and Michael Stumm, University of Toronto
There is an increasing demand for the continuous real-time streaming of media files. Disk striping is a common technique used for supporting many connections on media servers. But even with 1.2 million hours of mean time between failures, there will be more than one disk failure per week in a system with 7,000 disks. Fault tolerance can be achieved by using data replication and reserving extra bandwidth during normal operation. This work focused on supporting the most common variable bit-rate stream formats (e.g., MPEG).
The authors used the prototype media server EXEDRA to implement and evaluate different replication and bandwidth reservation schemes. The system supports a variable bit-rate scheme, does stride-based disk allocation, and is capable of supporting variable-grain disk striping. Two replication policies are presented: deterministic – data blocks are replicated in a round-robin fashion on the secondary replicas; and random – data blocks are replicated on randomly chosen secondary replicas. Two bandwidth reservation techniques are also presented: mirroring reservation – disk bandwidth is reserved for both the primary and the replicas of the media file during playback; and minimum reservation – a more efficient scheme in which bandwidth is reserved only for the sum of the primary data access time and the maximum of the backup data access times required in each round.
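The two placement policies can be sketched minimally as follows, assuming disks numbered 0..n-1; the numbering and the seeded RNG are illustrative, and the real system's placement also interacts with striping decisions not modeled here:

```python
import random

def deterministic_replicas(primary, n_disks, r):
    # round-robin: replica i lands i disks past the primary, wrapping
    return [(primary + i) % n_disks for i in range(1, r + 1)]

def random_replicas(primary, n_disks, r, rng=random):
    # random: r distinct randomly chosen disks, none equal to the primary
    return rng.sample([d for d in range(n_disks) if d != primary], r)
```

Deterministic placement keeps the failover load pattern predictable, which is one intuition for why it fared better on small disk arrays in the evaluation.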
Experimental results showed that deterministic replica placement is better than random placement for small disks, that minimum disk-bandwidth reservation achieves twice the throughput of mirroring, and that fault tolerance can be achieved with minimal impact on throughput.
TOOLS
Summarized by Li Xiao
SIMPLE AND GENERAL STATISTICAL PROFILING
WITH PCT
Charles Blake and Steven Bauer, MIT
This talk introduced the Profile Collection Toolkit (PCT) – a sampling-based CPU-profiling facility. A novel aspect of PCT is that it allows sampling of semantically rich data such as function call stacks, function parameters, local or global variables, CPU registers, or other execution contexts. The design objectives of PCT were driven by user needs and the inadequacies or inaccessibility of prior systems.
The rich data collection capability is achieved via a debugger-controller program, dbctl. The talk then introduced the profiling collector and data-collection activation. PCT file-format options provide flexibility and various ways to store exact states. PCT also has analysis options; two examples were given: "Simple" and "Mixed User and Linux Code."
The talk then went into how general sampling helps in the following areas: a debugger-controller state machine, example script fragments, a simple example program, and general value reports in terms of call-path histograms, numeric value histograms, and data generality. Related work such as gprof and Expect was then summarized. The contribution of this work is to provide a portable, general value-sampling tool. Its limitations are that PCT does not work well with stripped binaries and that count inaccuracies arise from its statistical nature; for this reason, compiling with -g is preferred over stripping.
Possible future directions include supporting more debuggers, such as dbx and xdb; script generators for other kinds of traces; more canned reports for general values; and a libgdb-based library sampler. PCT is available at http://pdos.lcs.mit.edu/pct and runs on almost any UNIX-like system.
ENGINEERING A DIFFERENCING AND
COMPRESSION DATA FORMAT
David G. Korn and Kiem-Phong Vo, AT&T Laboratories – Research
This talk began with an equation: "Differencing + Compression = Delta Compression." Compression removes information redundancy within a data set; examples are gzip, bzip, compress, and pack. Differencing encodes the differences between two data sets; examples are diff -e, fdelta, bdiff, and xdelta. Delta compression compresses one data set given another, combining differencing and compression, and reduces to pure compression when there's no commonality.
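The equation can be demonstrated with stdlib stand-ins for the differencer and compressor – a unified diff fed to zlib here, not the actual Vcdiff encoding: when the target shares most of its content with a reference, compressing the difference is far smaller than compressing the target outright.

```python
import difflib
import zlib

def compressed_size(data: str) -> int:
    # plain compression of the target by itself
    return len(zlib.compress(data.encode()))

def delta_compressed_size(reference: str, target: str) -> int:
    # differencing (unified diff vs. the reference), then compression
    diff = "".join(difflib.unified_diff(
        reference.splitlines(keepends=True),
        target.splitlines(keepends=True), n=0))
    return len(zlib.compress(diff.encode()))
```

With no commonality the diff is as large as the target and the scheme degenerates to pure compression, matching the talk's framing.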
After presenting an overall scheme for a delta compressor and showing delta compression performance, the talk discussed the encoding format of the newly designed Vcdiff for delta compression. A Vcdiff instruction code table consists of 256 entries, each coding up to a pair of instructions, and encodes 1-byte indices and any additional data.
The talk then showed Vcdiff's performance on Web data collected from CNN, compared with the W3C standard Gdiff, Gdiff+gzip, and diff+gzip, where Gdiff was computed using Vcdiff delta instructions. The results of two experiments, "First" and "Successive," were presented: in "First," each file is compressed against the first file collected; in "Successive," each file is compressed against the one from the previous hour. diff+gzip did not work well because diff is line-oriented; Vcdiff performed favorably compared with the other formats.
Vcdiff is part of the Vcodex package. Vcodex is a platform for all common data transformations: delta compression, plain compression, encryption, and transcoding (e.g., uuencode, base64). It is structured in three layers for maximum usability; the base library uses Discipline and Method interfaces.
The code for Vcdiff can be found at http://www.research.att.com/sw/tools. The authors are moving Vcdiff toward IETF standardization as a comprehensive platform for transforming data; see http://www.ietf.org/internet-drafts/draft-korn-vcdiff-06.txt.
WHERE IN THE NET
Summarized by Amit Purohit
A PRECISE AND EFFICIENT EVALUATION OF
THE PROXIMITY BETWEEN WEB CLIENTS AND
THEIR LOCAL DNS SERVERS
Zhuoqing Morley Mao, University of California, Berkeley; Charles D. Cranor, Fred Douglis, Michael Rabinovich, Oliver Spatscheck, and Jia Wang, AT&T Labs – Research
Content Distribution Networks (CDNs) attempt to improve Web performance by delivering Web content to end users from servers located at the edge of the network. When a Web client requests content, the CDN dynamically chooses a server to route the request to, usually the one nearest the client. As CDNs have access only to the IP address of the client's local DNS server (LDNS), the CDN's authoritative DNS server maps the client's LDNS to a geographic region within a particular network and combines that with network and server load information to perform CDN server selection.
This method has two limitations. First, it is based on the implicit assumption that clients are close to their LDNS, which may not always be valid. Second, a single request from an LDNS can represent a differing number of Web clients – called the hidden load factor. Knowledge of the hidden load factor can be used to achieve better load distribution.
The talk dealt with the extent of the first limitation and its impact on CDN server selection. To determine the associations between clients and their LDNS, a simple, non-intrusive, and efficient mapping technique was developed. The data collected was used to study the impact of proximity on DNS-based server selection using four different proximity metrics. (1) Autonomous system (AS) clustering – observing whether a client is in the same AS as its LDNS – concluded that LDNS is very good for coarse-grained server selection, as 64% of the associations belong to the same AS. (2) Network clustering – observing whether a client is in the same network-aware cluster (NAC) as its LDNS – implied that DNS is less useful for finer-grained server selection, since only 16% of clients and their LDNS are in the same NAC. (3) Traceroute divergence – the length of the divergent paths to the client and its LDNS from a probe point, measured using traceroute – implies that most clients are topologically close to their LDNS as viewed from a randomly chosen probe site. (4) Round-trip time (RTT) correlation (some CDNs select servers based on the RTT between the CDN server and the client's LDNS) examines the correlation between the message RTTs from a probe point to the client and to its local DNS; the results imply this is a reasonable metric to use to avoid really distant servers.
A study of the impact that client-LDNS associations have on DNS-based server selection concludes that knowing the client's IP address would allow more accurate server selection in a large number of cases. The optimality of the server selection also depends on the server density, placement, and selection algorithms.
Further information can be found at http://www.eecs.berkeley.edu/~zmao/myresearch.html or by contacting zmao@eecs.berkeley.edu.
GEOGRAPHIC PROPERTIES OF INTERNET
ROUTING
Lakshminarayanan Subramanian and Venkata N. Padmanabhan, Microsoft Research; Randy H. Katz, University of California at Berkeley
Geographic information can provide insights into the structure and functioning of the Internet, including interactions between different autonomous systems, by analyzing certain properties of Internet routing. It can be used to measure and quantify routing properties such as circuitous routing, hot-potato routing, and geographic fault tolerance.
Traceroute was used to gather the required data, and the GeoTrack tool was used to determine the location of the nodes along each network path. This enables the computation of "linearized distances": the sum of the geographic distances between successive pairs of routers along the path.
In order to measure the circuitousness of a path, a metric called the "distance ratio" has been defined as the ratio of the linearized distance of a path to the geographic distance between the source and destination of the path. From the data, it has been observed that the circuitousness of a route depends on both the geographic and network locations of the end hosts. A large value of the distance ratio enables us to flag paths that are highly circuitous, possibly (though not necessarily) because of routing anomalies. It is also shown that the minimum delay between end hosts is strongly correlated with the linearized distance of the path.
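The metric can be sketched directly from its definition; the haversine great-circle formula is standard, and the coordinates below are illustrative (the study derived router locations with GeoTrack):

```python
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_KM = 6371.0

def great_circle_km(a, b):
    # haversine distance between two (lat, lon) points in degrees
    lat1, lon1, lat2, lon2 = map(radians, (*a, *b))
    h = sin((lat2 - lat1) / 2) ** 2 + \
        cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))

def distance_ratio(path):
    """path: [(lat, lon), ...] from source through routers to destination."""
    linearized = sum(great_circle_km(path[i], path[i + 1])
                     for i in range(len(path) - 1))
    return linearized / great_circle_km(path[0], path[-1])
```

A geographically direct path yields a ratio near 1.0; a path that detours far from the source-destination great circle yields a ratio well above it.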
Geographic information can be used to study various aspects of wide-area Internet paths that traverse multiple ISPs. It was found that end-to-end Internet paths tend to be more circuitous than intra-ISP paths, the cause being the peering relationships between ISPs. Also, paths traversing substantial distances within two or more ISPs tend to be more circuitous than paths largely traversing only a single ISP. Another finding is that ISPs generally employ hot-potato routing.
Geographic information can also be used to capture the fact that two seemingly unrelated routers can be susceptible to correlated failures. Using the geographic information of routers, we can construct a geographic topology of an ISP and, from it, determine the tolerance of the ISP's network to a total network failure in a geographic region.
For further information, contact lakme@cs.berkeley.edu.
PROVIDING PROCESS ORIGIN INFORMATION
TO AID IN NETWORK TRACEBACK
Florian P. Buchholz, Purdue University; Clay Shields, Georgetown University
Network traceback is currently limited because host audit systems do not maintain enough information to match incoming network traffic to outgoing network traffic. The talk presented an alternative: assigning origin information to every process and logging it during interactive login creation.
The current implementation concentrates mainly on interactive sessions, in which an event is logged when a new connection is established using SSH or Telnet. The method is effective and could successfully determine stepping stones and reliably detect the source of a DDoS attack. The speaker then talked about the related work done in this area, which led to a discussion of the latest developments in the area of "host causality."
The speaker ended with the limitations and future directions of the framework. This framework could be extended and could find many applications in the area of security. The current implementation doesn't take care of startup scripts and cron jobs, but incorporating the origin information in the file system could solve this problem. In the current implementation, logging is implemented just as a proof of concept; it could be made safe in many ways, and this could be another important aspect of future work.
PROGRAMMING
Summarized by Amit Purohit
CYCLONE A SAFE DIALECT OF C
Trevor Jim, AT&T Labs – Research; Greg Morrisett, Dan Grossman, Michael Hicks, James Cheney, Yanling Wang, Cornell University
Cyclone is designed to provide protection against attacks such as buffer overflows, format string attacks, and memory management errors. The current structure of C allows programmers to write vulnerable programs. Cyclone extends C so that it has the safety guarantees of Java while keeping C's syntax, types, and semantics untouched.
The Cyclone compiler performs static analysis of the source code and inserts runtime checks into the compiled output at places where the analysis cannot determine that the code execution will not violate safety constraints. Cyclone imposes many restrictions to preserve safety, such as NULL checks. These checks do not exist in normal C.
The speaker then talked about some sample code written in Cyclone and how it tackles safety problems. Porting an existing C application to Cyclone is pretty easy, with fewer than 10% of the code requiring changes. The current implementation concentrates more on safety than on performance, hence the performance penalty is substantial. Cyclone was able to find many lingering bugs. The final step of the project will be to write a compiler to convert a normal C program to an equivalent Cyclone program.
COOPERATIVE TASK MANAGEMENT WITHOUT
MANUAL STACK MANAGEMENT
Atul Adya, Jon Howell, Marvin Theimer, William J. Bolosky, John R. Douceur, Microsoft Research
The speaker described the definitions and motivations behind the distinct concepts of task management, stack management, I/O management, conflict management, and data partitioning. Conventional concurrent programming uses preemptive task management and exploits the automatic stack management of a standard language. In the second approach, cooperative tasks are organized as event handlers that yield control by returning control to the event scheduler, manually unrolling their stacks. In this project they have adopted a hybrid approach that makes it possible for both stack management styles to coexist. Thus programmers can code assuming one or the other of these stack management styles will operate, depending upon the application. The speaker also gave a detailed example of how to use their mechanism to switch between the two styles.
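The event-driven style with manual stack unrolling can be illustrated with generators (a hypothetical sketch of the general style, not the authors' mechanism): each task yields control back to a scheduler at every would-block point, so its "stack" lives in the task object rather than on a kernel thread:

```python
def task(name, steps, log):
    # A cooperative task written as a generator: its local state (the
    # loop counter) lives in the generator frame, and each `yield`
    # hands control back to the scheduler -- manual stack unrolling.
    for i in range(steps):
        log.append((name, i))
        yield  # a point where the task would otherwise block

def run(tasks):
    # A minimal round-robin event scheduler.
    while tasks:
        t = tasks.pop(0)
        try:
            next(t)
            tasks.append(t)  # task yielded; schedule it again
        except StopIteration:
            pass             # task finished

log = []
run([task("a", 2, log), task("b", 1, log)])
# the two tasks' steps interleave in the log
```

With preemptive task management and automatic stacks, the same two tasks would simply be two threads; the hybrid approach described in the talk lets both styles coexist in one program.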
The talk then continued with the implementation. They were able to preempt many subtle concurrency problems by using cooperative task management. Paying a cost up front to reduce a subtle race condition proved to be a good investment.
Though the choice of task management is fundamental, the choice of stack management can be left to individual taste. This project enables the use of any type of stack management in conjunction with cooperative task management.
IMPROVING WAIT-FREE ALGORITHMS FOR
INTERPROCESS COMMUNICATION IN
EMBEDDED REAL-TIME SYSTEMS
Hai Huang, Padmanabhan Pillai, Kang G. Shin, University of Michigan
The main characteristic of a real-time, time-sensitive system is its predictable response time. But concurrency management creates hurdles to achieving this goal because of the use of locks to maintain consistency. To solve this problem, many wait-free algorithms have been developed, but these are typically high-cost solutions. By taking advantage of the temporal characteristics of the system, however, the time and space overhead can be reduced.
The speaker presented an algorithm for temporal concurrency control and applied this technique to improve three wait-free algorithms. A single-writer multiple-reader wait-free algorithm and a double-buffer algorithm were proposed.
Using their transformation mechanism, they achieved an improvement of 17–66% in ACET and a 14–70% reduction in memory requirements for IPC algorithms. This mechanism is extensible and can be applied to other non-blocking IPC algorithms as well. Future work involves reducing the synchronization overheads in more general IPC algorithms with multiple-writer semantics.
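The double-buffer idea can be sketched generically (an illustration of the general technique, not the authors' transformed algorithm): the single writer always fills the inactive slot and then commits by flipping an index, so a reader never observes a partially written value and neither side blocks on a lock:

```python
class DoubleBuffer:
    # Generic single-writer double-buffer sketch.
    def __init__(self, initial):
        self.slots = [initial, initial]
        self.active = 0  # index readers currently use

    def write(self, value):
        inactive = 1 - self.active
        self.slots[inactive] = value  # write happens off to the side
        self.active = inactive        # single word-sized "commit"

    def read(self):
        return self.slots[self.active]

buf = DoubleBuffer(0)
buf.write(42)
```

In a real-time setting the attraction is that both operations complete in a bounded number of steps regardless of what the other side is doing.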
MOBILITY
Summarized by Praveen Yalagandula
ROBUST POSITIONING ALGORITHMS FOR
DISTRIBUTED AD-HOC WIRELESS SENSOR
NETWORKS
Chris Savarese, Jan Rabaey, University of California, Berkeley; Koen Langendoen, Delft University of Technology
The Pico Radio Network comprises more than a hundred sensors, monitors, and actuators equipped with wireless connectivity, with the following properties: (1) no infrastructure; (2) computation in a distributed fashion instead of centralized computation; (3) dynamic topology; (4) limited radio range; (5) nodes that can act as repeaters; (6) devices of low cost and with low power; and (7) sparse anchor nodes – nodes with GPS capability. A positioning algorithm determines the geographical position of each node in the network.
In two-dimensional space, each node needs three reference positions to estimate its geographical position. There are two problems that make positioning difficult in the Pico Radio Network setting: (1) a sparse anchor node problem and (2) a range error problem.
The two-phase approach that the authors have taken in solving the problem consists of (1) a hop-terrain algorithm – in this first phase, each node roughly guesses its location by the distance calculated using multi-hops to the anchor points – and (2) a refinement algorithm – in this second phase, each node uses its neighbors' positions to refine its own position estimate. To guarantee convergence, this approach uses confidence metrics, assigning a value of 1.0 for anchor nodes and 0.1 to start for other nodes, and increasing these with each iteration.
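A minimal sketch of the refinement phase's bookkeeping (illustrative only; the paper's actual update uses range measurements rather than the plain weighted centroid used here): anchors keep confidence 1.0 and never move, while other nodes update from their neighbors' estimates and grow their own confidence each iteration:

```python
def refine(positions, confidences, neighbors, iterations=5):
    # Hypothetical refinement loop, not the paper's exact update rule:
    # each non-anchor node moves to the confidence-weighted centroid of
    # its neighbors' current estimates and bumps its own confidence.
    for _ in range(iterations):
        for node, nbrs in neighbors.items():
            if confidences[node] >= 1.0:
                continue  # anchors (confidence 1.0) never move
            wsum = sum(confidences[m] for m in nbrs)
            positions[node] = tuple(
                sum(confidences[m] * positions[m][k] for m in nbrs) / wsum
                for k in (0, 1))
            confidences[node] = min(1.0, confidences[node] + 0.1)
    return positions, confidences

# Three anchors and one unknown node that hears all of them.
positions = {"a": (0.0, 0.0), "b": (10.0, 0.0), "c": (0.0, 10.0),
             "x": (9.0, 9.0)}
confidences = {"a": 1.0, "b": 1.0, "c": 1.0, "x": 0.1}
positions, confidences = refine(positions, confidences,
                                {"x": ["a", "b", "c"]})
```

The confidence weighting is what keeps the iteration stable: poorly placed nodes contribute little to their neighbors' estimates until their own estimates have firmed up.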
A simulation tool, OMNeT++, was used for both phases and in various scenarios. The results show that they achieved position errors of less than 33% in a scenario with 5% anchor nodes, an average connectivity of 7, and 5% range measurement error.
Guidelines for anchor node deployment: high connectivity (>10), a reasonable fraction of anchor nodes (>5%), and, for anchor placement, covered edges.
APPLICATION-SPECIFIC NETWORK
MANAGEMENT FOR ENERGY-AWARE
STREAMING OF POPULAR MULTIMEDIA
FORMATS
Surendar Chandra, University of Georgia, and Amin Vahdat, Duke University
The main hindrance in supporting the increasing demand for mobile multimedia on PDAs is the battery capacity of the small devices. The idle-time power consumption is almost the same as the receive power consumption on typical wireless network cards (1319mW vs. 1425mW), while the sleep state consumes far less power (177mW). To let the system consume energy proportional to the stream quality, the network card should transition to the sleep state aggressively between each packet.
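Back-of-the-envelope arithmetic with the quoted power figures shows why aggressive sleeping pays off (a sketch that ignores state-transition costs):

```python
# Power figures quoted in the talk for a typical WLAN card, in milliwatts.
IDLE_MW, RECV_MW, SLEEP_MW = 1319, 1425, 177

def stream_energy_mj(stream_s, recv_s, sleep_between_packets):
    # Millijoules consumed over a stream_s-second stream in which the
    # card actually receives for recv_s seconds and either idles or
    # sleeps the rest of the time (transition costs ignored).
    gap_mw = SLEEP_MW if sleep_between_packets else IDLE_MW
    return RECV_MW * recv_s + gap_mw * (stream_s - recv_s)

# A 10-second stream with 1 second of actual receive time:
awake = stream_energy_mj(10, 1, False)
asleep = stream_energy_mj(10, 1, True)
savings = 1 - asleep / awake  # roughly three quarters of the energy
```

The gap time dominates the budget, which is why the savings figure is so sensitive to whether the card idles or sleeps between packets.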
The speaker presented previous work where different multimedia streams were studied for PDAs: MS Media, Real Media, and QuickTime. It was found that if the inter-packet gap is predictable, then huge savings are possible, for example, about 80% savings in MS Media streams with few packet losses. The limitation of the IEEE 802.11b power-save mode comes into play when there are two or more nodes, producing a delay between beacon time and when the node receives/sends packets. This badly affects the streams, since higher energy is consumed while waiting.
The authors propose traffic shaping for energy conservation, where this is done by proxy such that packets arrive at regular intervals. This is achieved using a local proxy in the access point and a client-side proxy in the mobile host. Simulations show that traffic shaping reduces energy consumption and also reveal that the higher-bandwidth streams have a lower energy metric (mJ/kB).
More information is at http://greenhouse.cs.uga.edu.
CHARACTERIZING ALERT AND BROWSE SERVICES OF MOBILE CLIENTS
Atul Adya, Paramvir Bahl, and Lili Qiu, Microsoft Research
Even though there is a dramatic increase in Internet access from wireless devices, there are not many studies characterizing this traffic. In this paper the authors characterize the traffic observed on the MSN Web site with both notification and browse traces.
Around 33 million browsing accesses and about 325 million notification entries are present in the traces. Three types of analyses are done for each of these two services: content analysis, concerning the most popular content categories and their distribution; popularity analysis, or the popularity distribution of documents; and user behavior analysis.
The analysis of the notification logs shows that document access rates follow a Zipf-like distribution, with most of the accesses concentrated on a small number of messages, and that the accesses exhibit geographical locality – users from the same locality tend to receive similar notification content. The browser log analysis shows that a smaller set of URLs is accessed most of the time, though the access pattern does not fit a Zipf curve, and the highly accessed URLs remain stable. A correlation study between the notification and browsing services shows that wireless users have a moderate correlation of 0.12.
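A Zipf-like distribution can be checked by fitting a line to the rank-frequency plot in log-log space; a slope near -1 is the classic signature. A small sketch (my illustration, not the authors' methodology):

```python
from math import log

def rank_frequency_slope(counts):
    # Least-squares slope of log(frequency) vs. log(rank); a Zipf-like
    # distribution gives a slope close to -1.
    pts = [(log(rank), log(c))
           for rank, c in enumerate(sorted(counts, reverse=True), start=1)]
    n = len(pts)
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    num = sum((x - mx) * (y - my) for x, y in pts)
    den = sum((x - mx) ** 2 for x, _ in pts)
    return num / den

# Synthetic access counts following frequency ~ 1/rank:
counts = [1000 // r for r in range(1, 51)]
```

A browse trace whose slope is far from -1 (as the paper reports for the browser logs) would fail this fit, even if the most popular URLs still dominate.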
FREENIX TRACK SESSIONS
BUILDING APPLICATIONS
Summarized by Matt Butner
INTERACTIVE 3-D GRAPHICS APPLICATIONS
FOR TCL
Oliver Kersting, Jürgen Döllner, Hasso Plattner Institute for Software Systems Engineering, University of Potsdam
The integration of 3-D image rendering functionality into a scripting language permits interactive and animated 3-D development and application without the formalities and precision demanded by low-level C/C++ graphics and visualization libraries. The large and complex C++ API of the Virtual Rendering System (VRS) can be combined with the conveniences of the Tcl scripting language. The mapping of class interfaces is done via an automated process that generates respective wrapper classes, all of which ensures complete API accessibility and functionality without any significant performance penalty. The general C++-to-Tcl mapper SWIG grants the necessary object and hierarchical abilities without the use of object-oriented Tcl extensions. The final API mapping techniques address C++ features such as classes, overloaded methods, enumerations, and inheritance relations. All are implemented in a proof of concept that maps the complete C++ API of the VRS to Tcl and is showcased in a complete interactive 3-D map system.
THE AGFL GRAMMAR WORK LAB
Cornelis H.A. Coster, Erik Verbruggen, University of Nijmegen (KUN)
The growth and implementation of Natural Language Processing (NLP) is the cornerstone of the continued evolution and implementation of truly intelligent search machines and services. In part due to the growing collections of computer-stored, human-readable documents in the public domain, the implementation of linguistic analysis will become necessary for desirable precision and resolution. Subsequently, such tools and linguistic resources must be released into the public domain, and they have done so with the release of the AGFL Grammar Work Lab under the GNU Public License as a tool for linguistic research and the development of NLP-based applications.
The AGFL (Affix Grammars over a Finite Lattice) Grammar Work Lab meshes context-free grammars with finite set-valued features that are acceptable to a range of languages. In computer science terms, "Syntax rules are procedures with parameters and a non-deterministic execution." The English Phrases for Information Retrieval (EP4IR), released with the AGFL-GWL as a robust grammar of English, is an AGFL-GWL-generated English parser that outputs "Head/Modified" frames. The sentences "CompanyX sponsored this conference" and "This conference was sponsored by CompanyX" both generate [CompanyX [sponsored conference]]. The hope is that public availability of such tools will encourage further development of grammar and lexical software systems.
SWILL: A SIMPLE EMBEDDED WEB SERVER
LIBRARY
Sotiria Lampoudi, David M. Beazley, University of Chicago
This paper won the FREENIX Best Student Paper award.
SWILL (Simple Web Interface Link Library) is a simple Web server in the form of a C library whose development was motivated by a wish to give cool applications an interface to the Web. The SWILL library provides a simple interface that can be efficiently implemented for tasks that vary from flexible Web-based monitoring to software debugging and diagnostics. Though originally designed to be integrated with high-performance scientific simulation software, the interface is generic enough to allow for unbounded uses. SWILL is a single-threaded server relying upon non-blocking I/O through the creation of a temporary server which services I/O requests. SWILL does not provide SSL or cryptographic authentication but does have HTTP authentication abilities.
A fantastic feature of SWILL is its support for SPMD-style parallel applications which utilize MPI, proving valuable for Beowulf clusters and large parallel machines. Another practical application was the implementation of SWILL in a modified Yalnix emulator used by University of Chicago operating systems courses, which utilized the added Yalnix functionality for OS development and debugging. SWILL requires minimal memory overhead and relies upon the HTTP/1.0 protocol.
NETWORK PERFORMANCE
Summarized by Florian Buchholz
LINUX NFS CLIENT WRITE PERFORMANCE
Chuck Lever, Network Appliance; Peter Honeyman, CITI, University of Michigan
Lever introduced a benchmark to measure NFS client write performance. Client performance is difficult to measure due to hindrances such as poor hardware or bandwidth limitations. Furthermore, measuring application performance does not identify weaknesses specifically at the client side. Thus, a benchmark was developed trying to exercise only data transfers in one direction, between server and application. For this purpose the benchmark was based on the block sequential write portion of the Bonnie file system benchmark. Once a benchmark for NFS clients is established, it can be used to improve client performance.
The performance measurements were performed with an SMP Linux client and both a Linux NFS server and a Network Appliance F85 filer. During testing, a periodic jump in write latency was discovered. This was due to a rather large number of pending write operations that were scheduled to be written after certain threshold values were exceeded. By introducing a separate daemon that flushes the cached write requests, the spikes could be eliminated, but as a result the average latency grows over time. The problem could be traced to a function that scans a linked list of write requests. After a hashtable was added to improve lookup performance, the latency improved considerably.
The improved client was then used to measure throughput against the two servers. A discrepancy between the Linux server and the filer tests was noticed, and the reason for it was traced back to a global kernel lock that was unnecessarily held when accessing the network stack. After correcting this, performance improved further. However, the measurements showed that a client may run slower when paired with fast servers on fast networks. This is due to heavy client interrupt loads, more network processing on the client side, and more global kernel lock contention.
The source code of the project is available at http://www.citi.umich.edu/projects/nfs-perf/patches/.
A STUDY OF THE RELATIVE COSTS OF
NETWORK SECURITY PROTOCOLS
Stefan Miltchev and Sotiris Ioannidis, University of Pennsylvania; Angelos Keromytis, Columbia University
With the increasing need for security and integrity of remote network services, it becomes important to quantify the communication overhead of IPSec and compare it to alternatives such as SSH, SCP, and HTTPS.
For this purpose, the authors set up three testing networks: a direct link; two hosts separated by two gateways; and three hosts connecting through one gateway. Protocols were compared in each setup, and manual keying was used to eliminate connection setup costs. For IPSec, the different encryption algorithms AES, DES, 3DES, hardware DES, and hardware 3DES were used. In detail, FTP was compared to SFTP, SCP, and FTP over IPSec; HTTP to HTTPS and HTTP over IPSec; and NFS to NFS over IPSec and local disk performance.
The result of the measurements was that IPSec outperforms other popular encryption schemes. Overall, unencrypted communication was fastest, but in some cases, like FTP, the overhead can be small. The use of crypto hardware can significantly improve performance. For future work, the inclusion of setup costs, hardware-accelerated SSL, SFTP, and SSH was mentioned.
CONGESTION CONTROL IN LINUX TCP
Pasi Sarolahti, University of Helsinki; Alexey Kuznetsov, Institute for Nuclear Research at Moscow
Having attended the talk and read the paper, I am still unclear about whether the authors are merely describing the design decisions of TCP congestion control or whether they are actually the creators of that part of the Linux code. My guess leans toward the former.
In the talk the speaker compared TCP congestion control measures according to IETF and RFC specifications with the actual Linux implementation, which conforms to the basic principles but nevertheless has differences. A specific emphasis was placed on retransmission mechanisms and the congestion window. Also, several TCP enhancements – the NewReno algorithm, Selective ACKs (SACK), Forward ACKs (FACK) – were discussed and compared.
In some instances Linux does not conform to the IETF specifications. The fast recovery does not fully follow RFC 2582, since the threshold for triggering retransmit is adjusted dynamically and the congestion window's size is not changed. Also, the round-trip-time estimator and the RTO calculation differ from RFC 2988, since Linux uses more conservative RTT estimates and a minimum RTO of 200ms. The performance measures showed that with the additional Linux-specific features enabled, slightly higher throughput, more steady data flow, and fewer unnecessary retransmissions can be achieved.
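The RFC 2988 estimator the comparison refers to can be written down directly. The sketch below applies the standard smoothing constants (alpha = 1/8, beta = 1/4) but substitutes Linux's 200ms floor where the RFC mandates a 1-second minimum; it does not model Linux's more conservative RTT sampling:

```python
def linux_rto(rtt_samples, min_rto=0.2):
    # RFC 2988 smoothing: SRTT and RTTVAR with alpha=1/8, beta=1/4,
    # RTO = SRTT + 4*RTTVAR.  The RFC clamps RTO to at least 1 second;
    # Linux instead uses a 200ms floor.
    srtt = rttvar = None
    for r in rtt_samples:
        if srtt is None:
            srtt, rttvar = r, r / 2          # first measurement
        else:
            rttvar = 0.75 * rttvar + 0.25 * abs(srtt - r)
            srtt = 0.875 * srtt + 0.125 * r
    return max(srtt + 4 * rttvar, min_rto)
```

On a LAN with 10ms round trips, the 200ms floor is what actually determines the timeout; the smoothed estimate only matters on slower paths.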
XTREME XCITEMENT
Summarized by Steve Bauer
THE FUTURE IS COMING: WHERE THE X
WINDOW SHOULD GO
Jim Gettys, Compaq Computer Corp.
Jim Gettys, one of the principal authors of the X Window System, outlined the near-term objectives for the system, primarily focusing on the changes and infrastructure required to enable replication and migration of X applications. Providing better support for this functionality would enable users to retrieve or duplicate X applications between their servers at home and work.
One interesting example of an application that currently is capable of migration and replication is Emacs. To create a new frame on DISPLAY, try "M-x make-frame-on-display <Return> DISPLAY <Return>".
However, technical challenges make replication and migration difficult in general. These include the "major headaches" of server-side fonts, the nonuniformity of X servers and screen sizes, and the need to appropriately retrofit toolkits. Authentication and authorization issues are obviously also important. The rest of the talk delved into some of the details of these interesting technical challenges.
HACKING IN THE KERNEL
Summarized by Hai Huang
AN IMPLEMENTATION OF SCHEDULER
ACTIVATIONS ON THE NETBSD OPERATING
SYSTEM
Nathan J. Williams, Wasabi Systems
Scheduler activation is an old idea. Basically, there are benefits and drawbacks to using solely kernel-level or user-level threading. Scheduler activation is able to combine the two layers of control to provide more concurrency in the system.
In his talk, Nathan gave a fairly detailed description of the implementation of scheduler activation in the NetBSD kernel. One important change in the implementation is to differentiate the thread context from the process context. This is done by defining a separate data structure for these threads and relocating some of the information that was embedded in the process context to these thread contexts. The stack was especially a concern, due to the upcall: special handling must be done to make sure that the upcall doesn't mess up the stack, so that the preempted user-level thread can continue afterwards. Lastly, Nathan explained that signals are handled by upcalls.
AUTHORIZATION AND CHARGING IN PUBLIC
WLANS USING FREEBSD AND 802.1X
Pekka Nikander, Ericsson Research NomadicLab
802.1x standards are well known in the wireless community as link-layer authentication protocols. In this talk Pekka explained some novel ways of using the 802.1x protocols that might be of interest to people on the move. It is possible to set up a public WLAN that would support various charging schemes via virtual tokens, which people can purchase or earn and later use.
This is implemented on FreeBSD using the netgraph utility. It is basically a filter in the link layer that differentiates traffic based on the MAC address of the client node, which is either authenticated, denied, or let through. The overhead for this service is fairly minimal.
ACPI IMPLEMENTATION ON FREEBSD
Takanori Watanabe, Kobe University
ACPI (Advanced Configuration and Power Interface) was proposed as a joint effort by Intel, Toshiba, and Microsoft to provide a standard, finer-grained method of managing the power states of individual devices within a system. Such low-level power management is especially important for mobile and embedded systems that are powered by fixed-capacity batteries.
Takanori explained that the ACPI specification is composed of three parts: tables, BIOS, and registers. He was able to implement some functionality of ACPI in the FreeBSD kernel. The ACPI Component Architecture was implemented by Intel, and it provides a high-level ACPI API to the operating system. Takanori's ACPI implementation is built upon this underlying layer of APIs.
ACCESS CONTROL
Summarized by Florian Buchholz
DESIGN AND PERFORMANCE OF THE
OPENBSD STATEFUL PACKET FILTER (PF)
Daniel Hartmeier, Systor AG
Daniel Hartmeier described the new stateful packet filter (pf) that replaced IPFilter in the OpenBSD 3.0 release. IPFilter could no longer be included due to licensing issues, and thus there was a need to write a new filter, making use of optimized data structures.
The filter rules are implemented as a linked list which is traversed from start to end. Two actions may be taken according to the rules, "pass" or "block": a "pass" action forwards the packet, and a "block" action will drop it. Where more than one rule matches a packet, the last rule wins. Rules that are marked as "final" will immediately terminate any further rule evaluation. An optimization called "skip-steps" was also implemented, whereby blocks of similar rules are skipped if they cannot match the current packet; these skip-steps are calculated when the rule set is loaded. Furthermore, a state table keeps track of TCP connections, and only packets that match the sequence numbers are allowed. UDP and ICMP queries and replies are also considered in the state table, where initial packets create an entry for a pseudo-connection with a low initial timeout value. The state table is implemented using a balanced binary search tree. NAT mappings are also stored in the state table, whereas application proxies reside in user space. The packet filter also is able to perform fragment reassembly and to modulate TCP sequence numbers to protect hosts behind the firewall.
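The last-match-wins evaluation with "final" rules can be sketched in a few lines (an illustration of the semantics described, not pf's C implementation, and without skip-steps or the state table):

```python
def evaluate(rules, packet):
    # pf semantics as described: traverse the whole list, remember the
    # last matching rule's action; a rule marked "final" stops the
    # traversal immediately.
    action = "pass"  # default for packets no rule matches
    for rule in rules:
        if rule["match"](packet):
            action = rule["action"]
            if rule.get("final"):
                break
    return action

rules = [
    {"match": lambda p: True,            "action": "block"},  # default deny
    {"match": lambda p: p["port"] == 22, "action": "pass"},   # allow SSH
    {"match": lambda p: p["port"] == 23, "action": "block", "final": True},
]
```

Because every non-final rule must still be visited, full-list traversal is what makes skip-steps and cheap state-table lookups worthwhile optimizations.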
Pf was compared against IPFilter as well as Linux's iptables by measuring throughput and latency with increasing traffic rates and different packet sizes. In a test with a fixed rule-set size of 100, iptables outperformed the other two filters, whose results were close to each other. In a second test, where the rule-set size was continually increased, iptables consistently had about twice the throughput of the other two (which evaluate the rule set on both the incoming and outgoing interfaces). A third test compared only pf and IPFilter, using a single rule that created state in the state table, with a fixed state-entry size. Pf reached an overloaded state much later than IPFilter. The experiment was repeated with a variable state-entry size, and pf performed much better than IPFilter for a small number of states.
In general, rule-set evaluation is expensive, and benchmarks only reflect extreme cases, whereas in real life other behavior should be observed. Furthermore, the benchmarks show that stateful filtering can actually improve performance, due to cheap state-table lookups as compared to rule evaluation.
ENHANCING NFS CROSS-ADMINISTRATIVE
DOMAIN ACCESS
Joseph Spadavecchia and Erez Zadok, Stony Brook University
The speaker presented modifications to an NFS server that allow improved NFS access between administrative domains. The problem lies in the fact that NFS assumes a shared UID/GID space, which makes it unsafe to export files outside the administrative domain of the server. Design goals were to leave the protocol and existing clients unchanged, a minimum amount of server changes, flexibility, and increased performance.
To solve the problem, two techniques are utilized: "range-mapping" and "file-cloaking." Range-mapping maps IDs between client and server. The mapping is performed on a per-export basis and has to be manually set up in an export file; the mappings can be 1-1, N-N, or N-1. In file-cloaking, the server restricts file access based on UID/GID and special cloaking-mask bits. Here users can only access their own file permissions. The policy on whether or not a file is visible to others is set by the cloaking mask, which is logically ANDed with the file's protection bits. File-cloaking only works, however, if the client doesn't hold cached copies of directory contents and file attributes. Because of this, the clients are forced to re-read directories, by incrementing the mtime value of the directory each time it is listed.
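The two techniques have simple semantics that can be sketched as follows (my reading of the description, with hypothetical helper names, not the authors' server code):

```python
def map_uid(uid, export_ranges):
    # Range-mapping: translate a client UID into the server's ID space.
    # export_ranges: (client_start, server_start, length) triples set up
    # per export.
    for cstart, sstart, length in export_ranges:
        if cstart <= uid < cstart + length:
            return sstart + (uid - cstart)
    return None  # unmapped IDs get no translation

def cloaked(mode_bits, cloak_mask, is_owner):
    # File-cloaking: a non-owner sees the file only if ANDing the
    # cloaking mask with the file's protection bits leaves bits set.
    return not is_owner and (mode_bits & cloak_mask) == 0

ranges = [(1000, 5000, 100)]  # client UIDs 1000-1099 -> server 5000-5099
```

With a cloaking mask of 0o007, for example, a mode-0640 file (no "other" bits) is hidden from non-owners, while a mode-0644 file remains visible.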
To test the performance of the modified server, five different NFS configurations were evaluated: an unmodified NFS server was compared against one server with the modified code included but not used, one with only range-mapping enabled, one with only file-cloaking enabled, and one version with all modifications enabled. For each setup, different file system benchmarks were run. The results show only a small overhead when the modifications are used, generally an increase of below 5%. Another experiment tested the performance of the system with an increasing number of mapped or cloaked entries. The results show that an increase from 10 to 1000 entries resulted in a maximum of about a 14% cost in performance.
One member of the audience pointed out that if clients choose to ignore the changed mtimes from the server and thus still hold caches of the directory entries, the file-cloaking mechanism could be defeated. After a rather lengthy debate, the speaker had to concede that the model doesn't add any extra security. Another question was asked about the scalability of the setup of range-mapping. The speaker referred to application-level tools that could be developed for that purpose.
The software is available at ftp://ftp.fsl.cs.sunysb.edu/pub/enf/.
ENGINEERING OPEN SOURCE
SOFTWARE
Summarized by Teri Lampoudi
NINGAUI: A LINUX CLUSTER FOR BUSINESS
Andrew Hume, AT&T Labs – Research; Scott Daniels, Electronic Data Systems Corp.
Ningaui is a general-purpose, highly available, resilient architecture built from commodity software and hardware. Emphatically, however, Ningaui is not a Beowulf. Hume calls the cluster design the "Swiss canton model," in which there are a number of loosely affiliated, independent nodes with data replicated among them. Jobs are assigned by bidding and leases, and cluster services done as session-based servers are sited via generic job assignment. The emphasis is on keeping the architecture end-to-end, checking all work via checksums, and logging everything. The resilience and high availability required by their goal of 8-to-5 maintenance – vs. the typical 24/7 model, where people get paged whenever the slightest thing goes wrong, regardless of the time of day – are achieved by job restartability. Finally, all computation is performed on local data, without the use of NFS or network-attached storage.
Hume's message is a hopeful one: despite the many problems encountered – things like kernel and service limits, auto-installation problems, TCP storms, and the like – the existence of source code and the paranoid practice of logging and checksumming everything has helped. The final product performs reasonably well, and it appears that the resilient techniques employed do make a difference. One drawback, however, is that the software mentioned in the paper is not yet available for download.
CPCMS: A CONFIGURATION MANAGEMENT
SYSTEM BASED ON CRYPTOGRAPHIC NAMES
Jonathan S. Shapiro and John Vanderburgh, Johns Hopkins University
This paper won the FREENIX Best Paper award.
The basic notion behind the project is the fact that everyone has a pet complaint about CVS, and yet it is currently the configuration manager in most widespread use. Shapiro has unleashed an alternative. Interestingly, he did not begin but ended with the usual host of reasons why CVS is bad. The talk instead began abruptly by characterizing the job of a software configuration manager, continued by stating the namespaces it must handle, and wrapped up with the challenges faced.
X MEETS Z: VERIFYING CORRECTNESS IN THE
PRESENCE OF POSIX THREADS
Bart Massey, Portland State University; Robert T. Bauer, Rational Software Corp.
Massey delivered a humorous talk on the insight gained from applying the Z formal specification notation to system software design, rather than the more informal analysis and design process normally used.
The story is told with respect to writing XCB, which replaces the Xlib protocol layer and is supposed to be thread-friendly. But where threads are concerned, deadlock avoidance becomes a hard problem that cannot be solved in an ad hoc manner, and full model checking is also too hard. In this case Massey resorted to Z specification to model the XCB lower layer, abstract away locking and data transfers, and locate fundamental issues. Essentially, the difficulties of searching the literature and locating information relevant to the problem at hand were overcome. As Massey put it, "the formal method saved the day."
FILE SYSTEMS
Summarized by Bosko Milekic
PLANNED EXTENSIONS TO THE LINUX
EXT2EXT3 FILESYSTEM
Theodore Y. Ts'o, IBM; Stephen Tweedie, Red Hat
The speaker presented improvements to the Linux Ext2 file system, with the goal of allowing for various expansions while striving to maintain compatibility with older code. Improvements have been facilitated by a few extra superblock fields that were added to Ext2 not long before the Linux 2.0 kernel was released. The fields allow for file system features to be added without compromising the existing setup; this is done by providing bits indicating the impact of the added features with respect to compatibility. Namely, a file system with the "incompat" bit marked is not allowed to be mounted. Similarly, a "read-only" marking would only allow the file system to be mounted read-only.
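The compatibility-bit scheme can be sketched as follows (illustrative flag values, not the real Ext2 constants):

```python
# Hypothetical feature-flag values -- the real Ext2 constants differ.
FEATURE_A = 0x1
FEATURE_B = 0x2

def mount_mode(supported, ro_compat, incompat):
    # An unknown "incompat" feature forbids mounting entirely; an
    # unknown read-only-compatible feature allows a read-only mount;
    # anything else is fully compatible.
    if incompat & ~supported:
        return "refuse"
    if ro_compat & ~supported:
        return "read-write" if False else "read-only"  # read-only only
    return "read-write"
```

The point of the design is that an old kernel never has to understand a new feature; it only has to know which of the three bitmaps the feature's bit lives in.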
Directory indexing replaces linear directory searches with a faster search using a fixed-depth tree and hashed keys. File system size can be dynamically increased, and the expanded inode, doubled from 128 to 256 bytes, allows for more extensions.
Other potential improvements were discussed as well, in particular preallocation for contiguous files, which allows for better performance in certain setups by preallocating contiguous blocks. Security-related modifications, extended attributes, and ACLs were mentioned. An implementation of these features already exists but has not yet been merged into the mainline Ext2/3 code.
RECENT FILESYSTEM OPTIMISATIONS ON
FREEBSD
Ian Dowse, Corvil Networks; David Malone, CNRI, Dublin Institute of Technology
David Malone presented four important file system optimizations for the FreeBSD OS: soft updates, dirpref, vmiodir, and dirhash. It turns out that certain combinations of the optimizations (beautifully illustrated in the paper) may yield performance improvements of anywhere between 2 and 10 orders of magnitude for real-world applications.
All four of the optimizations deal with file system metadata. Soft updates allow for asynchronous metadata updates. Dirpref changes the way directories are organized, attempting to place child directories closer to their parents, thereby increasing locality of reference and reducing disk-seek times. Vmiodir trades some extra memory for better directory caching. Finally, dirhash, which was implemented by Ian Dowse, changes the way in which entries are searched in a directory. Specifically, whereas FFS uses a linear search to find an entry by name, dirhash builds a hash table of directory entries on the fly. For a directory of size n with a working set of m files, a search that in certain cases could have been O(n·m) has been reduced, due to dirhash, to effectively O(n + m).
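A toy model of the difference, with hypothetical file names, might look like this:

```python
def linear_lookup(entries, name):
    """FFS-style scan: O(n) string comparisons per lookup.
    Returns (inode-like index, comparisons performed)."""
    comparisons = 0
    for ino, ent in enumerate(entries):
        comparisons += 1
        if ent == name:
            return ino, comparisons
    return None, comparisons

def build_dirhash(entries):
    """dirhash-style: one O(n) pass to hash every entry, after which each
    lookup is expected O(1) -- so m lookups cost O(n + m) in total."""
    return {ent: ino for ino, ent in enumerate(entries)}
```

For a 1000-entry directory, finding the last entry costs 1000 comparisons with the scan but a single probe once the hash table exists.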
FILESYSTEM PERFORMANCE AND SCALABILITY
IN LINUX 2.4.17
Ray Bryant, SGI; Ruth Forester, IBM LTC; John Hawkes, SGI
This talk focused on performance evaluation of a number of file systems available and commonly deployed on Linux machines. Comparisons under various configurations of Ext2, Ext3, ReiserFS, XFS, and JFS were presented.
The benchmarks chosen for the data gathering were pgmeter, filemark, and the AIM Benchmark Suite VII. Pgmeter measures the rate of data transfer of reads/writes of a file under a synthetic workload. Filemark is similar to postmark in that it is an operation-intensive benchmark, although filemark is threaded and offers various other features that postmark lacks. AIM VII measures performance for various file-system-related functionalities; it offers various metrics under an imposed workload, thus stressing the performance of the file system not only under I/O load but also under significant CPU load.
Tests were run on three different setups: a small, a medium, and a large configuration. ReiserFS and Ext2 appear at the top of the pile for small and medium setups. Notably, XFS and JFS perform worse for smaller system configurations than the others, although XFS clearly appears to generate better numbers than JFS. It should be noted that XFS seems to scale well under a higher load; this was most evident in the large-system results, where XFS appears to offer the best overall results.
THINGS TO THINK ABOUT
Summarized by Bosko Milekic
SPEEDING UP KERNEL SCHEDULER BY REDUCING CACHE MISSES
Shuji Yamamura, Akira Hirai, Mitsuru Sato, Masao Yamamoto, Akira Naruse, Kouichi Kumon, Fujitsu Labs
This was an interesting talk pertaining to the effects of cache coloring for task structures in the Linux kernel scheduler (Linux kernel 2.4.x). The speaker first presented some interesting benchmark numbers for the Linux scheduler, showing that as the number of processes on the task queue increased, performance decreased. The authors used some really nifty hardware to measure the number of bus transactions throughout their tests and were thus able to reasonably quantify the impact that cache misses had on the Linux scheduler.
Their experiments led them to implement a cache-coloring scheme for task structures, which were previously aligned on 8KB boundaries and were therefore eventually mapped to the same cache lines. This unfortunate placement of task structures in memory induced a significant number of cache misses as the number of tasks in the scheduler grew.
The implemented solution consisted of aligning task structures so as to more evenly distribute cached entries across the L2 cache. The result was inevitably fewer cache misses in the scheduler. Some negative effects were observed in certain situations, primarily due to more cache slots being used by task-structure data in the scheduler, thus forcing out data previously cached there.
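The underlying arithmetic can be sketched with an illustrative cache geometry (the set count and line size below are made up, not Fujitsu's measured hardware):

```python
def cache_set(addr, line_size=64, num_sets=128):
    """Which set an address maps to in a direct-mapped view of a cache
    with 128 sets of 64-byte lines (illustrative geometry)."""
    return (addr // line_size) % num_sets

# Task structures on 8KB boundaries: 8192 = 128 sets * 64 bytes, so
# every structure lands in the very same cache set.
aligned = [i * 8192 for i in range(16)]
sets_without = {cache_set(a) for a in aligned}

# Cache coloring: stagger each task by a per-task offset of a few
# cache lines, spreading the structures across sets.
colored = [i * 8192 + (i % 8) * 64 for i in range(16)]
sets_with = {cache_set(a) for a in colored}
```

With strict 8KB alignment all sixteen tasks collide in one set; with eight colors they spread across eight sets, which is the effect the authors measured as fewer bus transactions.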
WORK-IN-PROGRESS REPORTS
Summarized by Brennan Reynolds
RESOURCE VIRTUALIZATION TECHNIQUES FOR
WIDE-AREA OVERLAY NETWORKS
Kartik Gopalan, Stony Brook University
This work addressed the issue of provisioning a maximum number of virtual overlay networks (VONs) with diverse quality of service (QoS) requirements on a single physical network. Each of the VONs is logically isolated from the others to ensure its QoS requirements. Gopalan mentioned several critical research issues with this problem that are currently being investigated. Dealing with how to provision the network at various levels (link, route, or path) and then enforce the provisioning at run-time is one of the toughest challenges. Currently, Gopalan has developed several algorithms to handle admission control, end-to-end QoS, route selection, scheduling, and fault tolerance in the network.
For more information, visit http://www.ecsl.cs.sunysb.edu/.
VISUALIZING SOFTWARE INSTABILITY
Jennifer Bevan, University of California, Santa Cruz
Detection of instability in software has typically been an afterthought. The point in the development cycle when the software is reviewed for instability is usually after it has become difficult and costly to go back and perform major modifications to the code base. Bevan has developed a technique for visualizing unstable regions of code that can be used much earlier in the development cycle. Her technique creates a time series of dependency graphs that include clusters and lines called fault lines. From the graphs, a developer is able to easily determine where the unstable sections of code are and proactively restructure them. She is currently working on a prototype implementation.
For more information, visit http://www.cse.ucsc.edu/~jbevan/evo_viz/.
RELIABLE AND SCALABLE PEER-TO-PEER WEB
DOCUMENT SHARING
Li Xiao, College of William and Mary
The idea presented by Xiao would allow end users to share the contents of their Web browser caches with neighboring Internet users. The rationale for this is that today's end users are increasingly connected to the Internet over high-speed links, and browser caches are becoming large storage systems. Therefore, if an individual accesses a page that does not exist in their local cache, Xiao suggests that they first query other end users for the content before trying to access the machine hosting the original. This strategy does have some serious problems associated with it that still need to be addressed, including ensuring the integrity of the content and protecting the identity and privacy of the end users.
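The talk left the integrity problem open; one plausible approach (an assumption on my part, not Xiao's design) is to fetch only a small cryptographic digest from the origin server and verify the bulk content served by the peer against it:

```python
import hashlib

def digest(content: bytes) -> str:
    """Digest of a page's bytes; SHA-256 here, though any strong hash works."""
    return hashlib.sha256(content).hexdigest()

def accept_from_peer(peer_content: bytes, trusted_digest: str) -> bool:
    """Accept a page from a neighbor's cache only if it matches a digest
    obtained cheaply from the origin server -- one way to address the
    integrity problem noted in the talk."""
    return digest(peer_content) == trusted_digest
```

The origin still sees a tiny request per page, but the heavy transfer moves to the peer.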
SEREL FAST BOOTING FOR UNIX
Leni Mayo, Fast Boot Software
Serel is a tool that generates a visual representation of a UNIX machine's boot-up sequence. It can be used to identify the critical path and can show whether a particular service or process blocks for an extended period of time. This information could be used to determine where the largest performance gains could be realized by tuning the order of execution at boot-up. Serel creates a dependency graph, expressed in XML, during boot-up. This graph is then used to create the visual representation. Currently the tool only works on POSIX-compliant systems, but Mayo stated that he would be porting it to other platforms. Other extensions that were mentioned included having the metadata cover the use of shared libraries and monitoring the suspend/resume sequence of portable machines.
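The critical-path computation over such a graph is a longest path through a DAG. The XML schema below is hypothetical (Serel's real format is not shown in the summary), but the idea can be sketched as:

```python
import xml.etree.ElementTree as ET

# Hypothetical XML in the spirit of Serel's dependency graph.
BOOT_XML = """
<boot>
  <service name="kernel"  secs="1"/>
  <service name="network" secs="4" after="kernel"/>
  <service name="syslog"  secs="1" after="kernel"/>
  <service name="nfs"     secs="3" after="network syslog"/>
</boot>
"""

def critical_path(xml_text):
    """Return (last service on the critical path, total boot seconds)."""
    svcs = {}
    for s in ET.fromstring(xml_text):
        svcs[s.get("name")] = (int(s.get("secs")),
                               (s.get("after") or "").split())
    memo = {}
    def finish(name):  # longest time until this service completes
        if name not in memo:
            secs, deps = svcs[name]
            memo[name] = secs + max((finish(d) for d in deps), default=0)
        return memo[name]
    totals = {n: finish(n) for n in svcs}
    slowest = max(totals, key=totals.get)
    return slowest, totals[slowest]
```

Here the chain kernel → network → nfs dominates, so reordering syslog would gain nothing, which is exactly the kind of conclusion the visualization is meant to make obvious.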
For more information, visit http://www.fastboot.org/.
BERKELEY DB XML
John Merrells, Sleepycat Software
Merrells gave a quick introduction and overview of the new XML library for Berkeley DB. The library specializes in storage and retrieval of XML content through XPath queries. It allows for multiple containers per document and stores everything natively as XML. The user is also given a wide range of elements to create indices with, including edges, elements, text strings, or presence. The XPath component consists of a query parser, generator, optimizer, and execution engine. To conclude his presentation, Merrells gave a live demonstration of the software.
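Berkeley DB XML's own API is not shown in the summary, so as a stand-in, here is the flavor of query its engine evaluates, run here against an in-memory tree with Python's built-in ElementTree (which supports a limited XPath subset):

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<catalog>"
    "<book id='1'><title>AFS Guide</title></book>"
    "<book id='2'><title>BSD Internals</title></book>"
    "</catalog>")

# Path query: every book title in document order.
titles = [t.text for t in doc.findall(".//book/title")]

# Predicate query: the title of the book whose id attribute is '2' --
# the sort of lookup Berkeley DB XML would answer from an index
# rather than by walking the tree.
second = doc.find(".//book[@id='2']/title").text
```

The difference in a database-backed engine is that the optimizer can satisfy the attribute predicate from a prebuilt index instead of scanning every node.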
For more information, visit http://www.sleepycat.com/xml/.
CLUSTER-ON-DEMAND (COD)
Justin Moore, Duke University
Modern clusters are growing at a rapid rate. Many have pushed beyond the 5000-machine mark, and deploying them results in large expenses as well as management and provisioning issues. Furthermore, if the cluster is "rented" out to various users, it is very time-consuming to configure it to a user's specs, regardless of how long they need to use it. The COD work presented creates dynamic virtual clusters within a given physical cluster. The goal was to have a provisioning tool that would automatically select a chunk of available nodes and install the operating system and middleware specified by the customer in a short period of time. This would allow greater use of resources, since multiple virtual clusters can exist at once. By using a virtual cluster, the size can be changed dynamically. Moore stated that they have created a working prototype and are currently testing and benchmarking its performance.
For more information, visit http://www.cs.duke.edu/~justin/cod/.
CATACOMB
Elias Sinderson, University of California, Santa Cruz
Catacomb is a project to develop a database-backed DASL module for the Apache Web server. It was designed as a replacement for the WebDAV module. The initial release of the module only contains support for a MySQL database but could be extended to others. Sinderson briefly touched on the performance of the module: it was comparable to the mod_dav Apache module for all query types but search. The presentation was concluded with remarks about adding support for the LOCK method and including ACL specifications in the future.
For more information, visit http://ocean.cse.ucsc.edu/catacomb/.
SELF-ORGANIZING STORAGE
Dan Ellard, Harvard University
Ellard's presentation introduced a storage system that tunes itself based on the workload of the system, without requiring the intervention of the user. The intelligence is implemented as a virtual, self-organizing disk that resides below any file system. The virtual disk observes the access patterns exhibited by the system and then attempts to predict what information will be accessed next. An experiment was done using an NFS trace taken at a large ISP on one of their email servers. Ellard's self-organizing storage system worked well, which he attributes to the fact that most of the files being requested were large email boxes. Areas of future work include exploration of the length and detail of the data-collection stage, as well as the CPU impact of running the virtual disk layer.
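The summary doesn't describe Ellard's actual prediction mechanism, but the observe-then-predict idea can be illustrated with a toy first-order model: remember which block most often follows each block, and prefetch that one.

```python
from collections import Counter, defaultdict

class NextBlockPredictor:
    """Toy stand-in for the virtual disk's intelligence: watch the
    access stream, then predict the most frequent successor of the
    current block (a simple first-order Markov model)."""
    def __init__(self):
        self.successors = defaultdict(Counter)
        self.prev = None

    def observe(self, block):
        if self.prev is not None:
            self.successors[self.prev][block] += 1
        self.prev = block

    def predict(self, block):
        seen = self.successors.get(block)
        return seen.most_common(1)[0][0] if seen else None
```

A real system would operate on disk blocks below the file system and fold the prediction into layout decisions, but the observe/predict split is the same.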
VISUAL DEBUGGING
John Costigan, Virginia Tech
Costigan feels that the state of current debugging facilities in the UNIX world is not as good as it should be. He proposes the addition of several elements to programs like ddd and gdb. The first addition is including separate heap and stack components. Another is to display only certain fields of a data structure and have the ability to zoom in and out as needed. Finally, the debugger should provide the ability to visually present complex data structures, including linked lists, trees, etc. He said that a beta version of a debugger with these abilities is currently available.
For more information, visit http://infovis.cs.vt.edu/datastruct/.
IMPROVING APPLICATION PERFORMANCE
THROUGH SYSTEM-CALL COMPOSITION
Amit Purohit, Stony Brook University
Web servers perform a huge number of context switches and internal data copies during normal operation. These two elements can drastically limit the performance of an application, regardless of the hardware platform it is run on. The Compound System Call (CoSy) framework is an attempt to reduce the performance penalty of context switches and data copies. It includes a user-level library and several kernel facilities that can be used via system calls. The library provides the programmer with a complete set of memory-management functions. Performance tests using the CoSy framework showed a large saving in context switches and data copies. The only area where the savings between conventional libraries and CoSy were negligible was for very large file copies.
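The source of the saving can be illustrated with a simple cost model (the numbers are illustrative; CoSy's real accounting is in the paper). A conventional copy loop pays two user/kernel crossings per chunk, while a composed operation that runs the whole loop inside the kernel pays for one:

```python
def copy_with_syscalls(size, chunk=4096):
    """Conventional copy loop: one read() plus one write() crossing
    per chunk of data moved."""
    crossings = 0
    copied = 0
    while copied < size:
        crossings += 2          # read() then write()
        copied += min(chunk, size - copied)
    return crossings

def copy_with_compound_call(size):
    """A CoSy-style composed operation: the loop runs in the kernel,
    so user space pays a single crossing regardless of size."""
    return 1
```

For a 1MB file the loop makes 512 crossings versus 1, which is why the win shrinks only when per-byte copy cost, not crossing cost, dominates (the very-large-file case noted above).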
For more information, visit http://www.cs.sunysb.edu/~purohit/.
ELASTIC QUOTAS
John Oscar, Columbia University
Oscar began by stating that most resources are flexible but that, to date, disks have not been. While approximately 80 percent of files are short-lived, disk quotas are hard limits imposed by administrators. The idea behind elastic quotas is to have a non-permanent storage area that each user can use temporarily. Oscar suggested the creation of an ehome directory structure to be used in deploying elastic quotas. Currently he has implemented elastic quotas as a stackable file system.

There were also several trends apparent in the data. Many users would ssh to their remote host (good) but then launch an email reader on the local machine, which would connect via POP to the same host and send the password in clear text (bad). This situation can easily be remedied by tunneling protocols like POP through SSH, but it appeared that many people were not aware this could be done. While most of his comments on the use of the wireless network were negative, the list of passwords he had collected showed that people were indeed using strong passwords. His recommendations were to educate and encourage people to use protocols like IPSec, SSH, and SSL when conducting work over a wireless network, because you never know who else is listening.
SYSTRACE
Niels Provos, University of Michigan
How can one be sure that the applications one is using actually do exactly what their developers said they do? Short answer: you can't. People today are using a large number of complex applications, which means it is impossible to check each application thoroughly for security vulnerabilities. There are tools out there that can help, though. Provos has developed a tool called systrace that allows a user to generate a policy of acceptable system calls a particular application may make. If the application attempts to make a call that is not defined in the policy, the user is notified and allowed to choose an action. Systrace includes an auto-policy generation mechanism, which uses a training phase to record the actions of all programs the user executes. If the user chooses not to be bothered by applications breaking policy, systrace allows default enforcement actions to be set. Currently systrace is implemented for FreeBSD and NetBSD, with a Linux version coming out shortly.
For more information, visit http://www.citi.umich.edu/u/provos/systrace/.
VERIFIABLE SECRET REDISTRIBUTION FOR
SURVIVABLE STORAGE SYSTEMS
Ted Wong, Carnegie Mellon University
Wong presented a protocol that can be used to re-create a file distributed over a set of servers, even if one of the servers is damaged. In his scheme, the user must choose how many shares the file is split into and the number of servers it will be stored across. The goal of this work is to provide persistent, secure storage of information, even if it comes under attack. Wong stated that his design of the protocol is complete, and he is currently building a prototype implementation of it.
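The standard building block behind this kind of survivable storage is a threshold secret-sharing scheme, in which any k of n shares reconstruct the data but fewer reveal nothing. The sketch below is Shamir's classic scheme, shown for flavor; it is not necessarily Wong's exact protocol, which adds verifiable redistribution on top.

```python
import random

P = 2**127 - 1  # a Mersenne prime; all arithmetic is mod P

def make_shares(secret, k, n):
    """Split `secret` into n shares, any k of which reconstruct it:
    evaluate a random degree-(k-1) polynomial with constant term
    `secret` at points x = 1..n."""
    coeffs = [secret] + [random.randrange(P) for _ in range(k - 1)]
    def poly(x):
        return sum(c * pow(x, i, P) for i, c in enumerate(coeffs)) % P
    return [(x, poly(x)) for x in range(1, n + 1)]

def recover(shares):
    """Lagrange interpolation at x = 0 over GF(P)."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        secret = (secret + yi * num * pow(den, P - 2, P)) % P
    return secret
```

With k = 3 and n = 5, losing any two servers (damage included) still leaves enough shares to rebuild the file.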
For more information, visit http://www.cs.cmu.edu/~tmwong/research/.
THE GURU IS IN SESSIONS
LINUX ON LAPTOP/PDA
Bdale Garbee, HP Linux Systems Operation
Summarized by Brennan Reynolds
This was a roundtable-like discussion with Garbee acting as moderator. He is currently in charge of the Debian distribution of Linux and is involved in porting Linux to platforms other than i386. He has successfully helped port the kernel to the Alpha, SPARC, and ARM, and is currently working on a version to run on VAX machines.
A discussion was held on which file system is best for battery life. The ReiserFS and Ext3 file systems were discussed; people commented that when using Ext3 on their laptops the disk would never spin down and thus was always consuming a large amount of power. One suggestion was to use a RAM-based file system for any volume that requires a large number of writes, and to write the file system to disk only at infrequent intervals or when the machine is shut down or suspended.
A question was directed to Garbee about his knowledge of how the HP-Compaq merger would affect Linux. Garbee thought that the merger was good for Linux, both within the company and for the open source community as a whole. He stated that the new company would be the number-one shipper of systems with Linux as the base operating system. He also fielded a question about the support of older HP servers and their ability to run Linux, saying that indeed Linux has been ported to them, and he personally had several machines in his basement running it.
A question which sparked a large number of responses concerned problems people had with running Linux on mobile platforms. Most people actually did not have many problems at all. There were a few cases of a particular machine model not working, but there were only two widespread issues: docking stations and the new ACPI power-management scheme. Docking stations still appear to be somewhat of a headache, but most people had developed ad hoc solutions for getting them to work. The ACPI power-management scheme developed by Intel does not appear to have a quick solution. Ted Ts'o, one of the head kernel developers, stated that there are fundamental architectural problems with the 2.4 series kernel that do not easily allow ACPI to be added. However, he also stated that ACPI is supported in the newest development kernel, 2.5, and will be included in 2.6.
The final topic was the possibility of purchasing a laptop without paying any charge or tax for Microsoft Windows. The consensus was that currently this is not possible. Even if the machine comes with Linux or without any operating system, the vendors have contracts in place which require them to pay Microsoft for each machine they ship if that machine is capable of running a Microsoft operating system. The only way this will change is if the demand for other operating systems increases to the point that vendors renegotiate their contracts with Microsoft, and no one saw this happening in the near future.
LARGE CLUSTERS
Andrew Hume, AT&T Labs – Research
Summarized by Teri Lampoudi
This was a session on clusters, big data, and resilient computing. Given that the audience was primarily interested in clusters, and not necessarily big data or beating nodes to a pulp, the conversation revolved mainly around what it takes to make a heterogeneous cluster resilient. Hume also presented a FREENIX paper on his current cluster project, which explained in more detail much of what was abstractly claimed in the guru session.
Hume's use of the word "cluster" referred not to what I had assumed would be a Beowulf-type system, but to a loosely coupled farm of machines in which concurrency was much less of an issue than it would be in a Beowulf. In fact, the architecture Hume described was designed to deal with large amounts of independent transaction processing, essentially the process of billing calls, which requires no interprocess message passing or anything of the sort a parallel scientific application might.
Where does the large data come in? Since the transactions in question consist of large numbers of flat files, a mechanism for getting the files onto nodes and off-loading the results is necessary. In this particular cluster, named Ningaui, this task is handled by the "replication manager," which generated a large amount of interest in the audience. All management functions in the cluster, as well as job allocation, constitute services that are bid for and leased out to nodes. Furthermore, failures are handled by simply repeating the failed task until it is completed properly, if that is possible, and if it is not, then by looking through detailed logs to discover what went wrong. Logging and testing for the successful completion of each task are what make this model resilient. Every management operation is logged at some appropriate level of detail, generating 120MB/day, and every single file transfer is MD5-checksummed. This was characterized by Hume as a "patient" style of management, where no one controls the cluster: scheduling behavior is emergent rather than stringently planned, and nodes are free to drop in and out of the cluster. A downside to this model is an increase in latency, offset by the fact that the cluster continues to function unattended outside of 8-to-5 support hours.
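The checksum-and-log discipline is simple to state in code; the sketch below is illustrative of the idea, not Ningaui's actual replication manager:

```python
import hashlib
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("replication-manager")

def transfer_ok(payload: bytes, expected_md5: str) -> bool:
    """Ningaui-style discipline: verify every file transfer against an
    MD5 checksum and log the outcome as a management operation."""
    actual = hashlib.md5(payload).hexdigest()
    ok = actual == expected_md5
    log.info("transfer %s md5=%s", "ok" if ok else "FAILED", actual)
    return ok
```

A failed check simply causes the task to be rescheduled, which is exactly the "repeat until it completes properly" behavior described above.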
In response to questions about the projected scalability of such a scheme, Hume said the system would presumably scale from the present 60 nodes to about 100 without modifications, but that scaling higher would be a matter for further design. The guru session ended on a somewhat unrelated note regarding the problems of getting data off mainframes and onto Linux machines: issues of fixed- vs. variable-length encoding of data blocks resulting in useful information being stripped by FTP, and problems in converting COBOL copybooks to something useful on a C-based architecture. To reiterate Hume's claim, there are homemade solutions for some of these problems, and the reader should feel free to contact Mr. Hume for them.
WRITING PORTABLE APPLICATIONS
Nick Stoughton, MSB Consultants
Summarized by Josh Lothian
Nick Stoughton addressed a concern that developers are facing more and more: writing portable applications. He proposed that there is no such thing as a "portable application," only those that have been ported. Developers are still in search of the Holy Grail of applications programming: a write-once, compile-anywhere program. Languages such as Java are leading up to this, but virtual machines still have to be ported, paths will still be different, etc. Web-based applications are another area that could be promising for portable applications.
Nick went on to talk about the future of portability. The recently released POSIX 2001 standards are expected to help the situation. The 2001 revision expands the original POSIX spec into seven volumes. Even though POSIX 2001 is more specific than the previous release, Nick pointed out that even this standard allows for small differences where vendors may define their own behaviors. This can only hurt developers in the long run. Along these lines, it was pointed out that there may be room for automated tools that have the capability to check application code and determine the level of portability and standards compliance of the source.
NETWORK MANAGEMENT SYSTEM
PERFORMANCE TUNING
Jeff Allen, Tellme Networks, Inc.
Summarized by Matt Selsky
Jeff Allen, author of Cricket, discussed network tuning. The basic idea of tuning is: measure, twiddle, and measure again. Cricket can be used for large installations, but it's overkill for smaller installations, for which MRTG is better suited. You don't need Cricket to do measurement; doing something like a shell script that dumps data to Excel is fine, but you need to get some sort of measurement. Cricket was not designed for billing; it was meant to help answer questions.
Useful tools and techniques covered included looking for the difference between machines in order to identify the causes of the differences; measuring, but also thinking about what you're measuring and why; and remembering to use strace and tcpdump to identify problems. Problems repeat themselves; performance tuning requires an intricate understanding of all the layers involved to solve the problem and simplify the automation. (Having a computer science background helps but is not essential.) If you can reproduce the problem, you then should investigate each piece to find the bottleneck. If you can't reproduce the problem, then measure. You need to understand all the system interactions. Measurement can help determine whether things are actually slow or if the user is imagining it.
Troubleshooting begins with a hunch, but scientific processes are essential. You should be able to determine the event stream, or when each event occurs; having observation logs can help. Some hints provided included checking cron for periodic anomalies, slowing down /bin/rm to avoid I/O overload, and looking for unusual kernel CPU usage.
Also, try to use lightweight monitoring to reduce overhead. You don't need to monitor every resource on every system, but you should monitor those resources on those systems that are essential. Don't check things too often, since you can introduce overhead that way. Lots of information can be gathered from the /proc file system interface to the kernel.
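As an example of such a lightweight probe, a /proc/stat "cpu" line can be parsed with a few string operations and no external monitoring agent at all (the sample line below is made-up data in the real format):

```python
def cpu_busy_fraction(stat_line: str) -> float:
    """Parse a /proc/stat 'cpu' line (user nice system idle iowait ...)
    and return the fraction of ticks spent non-idle -- the kind of
    cheap, low-overhead check the talk recommends."""
    fields = stat_line.split()
    assert fields[0] == "cpu"
    ticks = [int(f) for f in fields[1:]]
    idle = ticks[3]                 # the 4th counter is idle time
    return 1 - idle / sum(ticks)

# Sample line in /proc/stat's format (values are illustrative).
sample = "cpu 4705 150 1120 16250 520 30 45 0"
busy = cpu_busy_fraction(sample)
```

On a live Linux system the line would come from `open("/proc/stat").readline()`, sampled at whatever interval keeps the monitoring itself cheap.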
BSD BOF
Host: Kirk McKusick, Author and Consultant
Summarized by Bosko Milekic
The BSD Birds of a Feather session at USENIX started off with five quick and informative talks and ended with an exciting discussion of licensing issues.
Christos Zoulas, for the NetBSD Project, went over the primary goals of the NetBSD Project (namely portability, a clean architecture, and security) and then proceeded to briefly discuss some of the challenges that the project has encountered. The successes of, and areas for improvement in, 1.6 were then examined. NetBSD has recently seen various improvements in its cross-build system, packaging system, and third-party tool support. As for the kernel, a zero-copy TCP/UDP implementation has been integrated, and new pmap code for arm32 has been introduced, as have some other rather major changes to various subsystems. More information is available in Christos's slides, which are now available at http://www.netbsd.org/gallery/events/usenix2002/.
Theo de Raadt of the OpenBSD Project presented a rundown of various new things that the OpenBSD and OpenSSH teams have been looking at. Theo's talk included an amusing description (and pictures) of some of the events that transpired during OpenBSD's hack-a-thon, which occurred the week before USENIX '02. Finally, Theo mentioned that, following complications with the IPFilter license, the OpenBSD team had done a fairly extensive license audit.
Robert Watson of the FreeBSD Project brought up various examples of FreeBSD being used in the real world. He went on to describe a wide variety of changes that will surface with the upcoming FreeBSD release 5.0. Notably, 5.0 will introduce a re-architecting of the kernel aimed at providing much more scalable support for SMP machines. 5.0 will also include an early implementation of KSE, FreeBSD's version of scheduler activations, large improvements to pccard, and significant framework bits from the TrustedBSD project.
Mike Karels from Wind River Systems presented an overview of how BSD/OS has been evolving following Wind River Systems' acquisition of BSDi's software assets. Ernie Prabhakar from Apple discussed Apple's OS X and its success on the desktop. He explained how OS X aims to bring BSD to the desktop while striving to maintain a strong and stable core, one that is heavily based on BSD technology.
The BoF finished with a discussion on licensing; specifically, some folks questioned the impact of Apple's licensing for code from their core OS (the Darwin Project) on the BSD community as a whole.
CLOSING SESSION
HOW FLIES FLY
Michael H. Dickinson, University of California, Berkeley
Summarized by JD Welch
In this lively and offbeat talk, Dickinson discussed his research into the mechanisms of flight in the fruit fly (Drosophila melanogaster) and related the autonomic behavior to that of a technological system. Insects are the most successful organisms on earth, due in no small part to their early adoption of flight. Flies travel through space on straight trajectories interrupted by saccades, jumpy turning motions somewhat analogous to the fast, jumpy movements of the human eye. Using a variety of monitoring techniques, including high-speed video in a "virtual reality flight arena," Dickinson and his colleagues have observed that flies respond to visual cues to decide when to saccade during flight. For example, more visual texture makes flies saccade earlier (i.e., farther away from the arena wall), described by Dickinson as a "collision-control algorithm."
Through a combination of sensory inputs, flies can make decisions about their flight path. Flies possess a visual "expansion detector" which, at a certain threshold, causes the animal to turn in a certain direction. However, expansion in front of the fly sometimes causes it to land. How does the fly decide? Using the virtual reality device to replicate various visual scenarios, Dickinson observed that the flies fixate on vertical objects that move back and forth across the field. Expansion on the left side of the animal causes it to turn right, and vice versa, while expansion directly in front of the animal triggers the legs to flare out in preparation for landing.
If the eyes detect these changes, how are the responses implemented? The flies' wings have "power muscles," controlled by mechanical resonance (as opposed to the nervous system), which drive the wings, combined with neurally activated "steering muscles," which change the configuration of the wing joints. Subtle variations in the timing of impulses correspond to changes in wing movement, controlled by sensors in the "wing-pit." A small wing-like structure, the haltere, controls equilibrium by beating constantly.
Voluntary control is accomplished by the halteres, whose signals can interfere with the autonomic control of the stroke cycle. The halteres have "steering muscles" as well, and information derived from the visual system can turn off the haltere or trick it into a "virtual" problem requiring a response.
Dickinson has also studied the aerodynamics of insect flight using a device called the "Robo Fly," an oversized mechanical insect wing suspended in a large tank of mineral oil. Interestingly, Dickinson observed that the average lift generated by the flies' wings is under their body weight; the flies use three mechanisms to overcome this, including rotating the wings (rotational lift).
Insects are extraordinarily robust creatures, and because Dickinson analyzed the problem in a systems-oriented way, these observations and analyses are immediately applicable to technology: the response system can be used as an efficient search algorithm for control systems in autonomous vehicles, for example.
THE AFS WORKSHOP
Summarized by Garry Zacheiss, MIT
Love Hörnquist-Åstrand of the Arla development team presented the Arla status report. Arla 0.35.8 has been released. Scheduled for release soon are improved support for Tru64 UNIX, MacOS X, and FreeBSD; improved volume handling; and implementation of more of the vos/pts subcommands. It was stressed that MacOS X is considered an important platform, and that a GUI configuration manager for the cache manager, a GUI ACL manager for the Finder, and a graphical login that obtains Kerberos tickets and AFS tokens at login time were all under development.
Future goals planned for the underlying AFS protocols include GSSAPI/SPNEGO support for Rx, performance improvements to Rx, an enhanced disconnected mode, and IPv6 support for Rx; an experimental patch is already available for the latter. Future Arla-specific goals include improved performance, partial file reading, and increased stability on several platforms. Work is also in progress on the RXGSS protocol extensions (integrating GSSAPI into Rx). A partial implementation exists, and work continues as developers find time.
Derrick Brashear of the OpenAFS development team presented the OpenAFS status report. OpenAFS 1.2.5 was released immediately prior to the conference; it fixed a remotely exploitable denial-of-service attack on several OpenAFS platforms, most notably IRIX and AIX. Future work planned for OpenAFS includes better support for MacOS X, including working around Finder-interaction issues. Better support for the BSDs is also planned: FreeBSD has a partially implemented client, while NetBSD and OpenBSD have only server programs available right now. AIX 5, Tru64 5.1A, and MacOS X 10.2 (a.k.a. Jaguar) are all planned for the future. Other planned enhancements include nested groups in the ptserver (code donated by the University of Michigan awaits integration), disconnected AFS, and further work on a native Windows client. Derrick stated that the guiding principles of OpenAFS were to maintain compatibility with IBM AFS, support new platforms, ease the administrative burden, and add new functionality.
AFS performance was discussed. The openafs.org cell consists of two file servers, one in Stockholm, Sweden, and one in Pittsburgh, PA. AFS works reasonably well over trans-Atlantic links, although OpenAFS clients don't determine which file server to talk to very effectively. Arla clients use RTTs to the server to determine the optimal file server to fetch replicated data from. Modifications to OpenAFS to support this behavior in the future are desired.
Jimmy Engelbrecht and Harald Barth of KTH discussed their AFSCrawler script, written to determine how many AFS cells and clients were in the world, what implementations/versions they were (Arla vs. IBM AFS vs. OpenAFS), and how much data was in AFS. The script unfortunately triggered a bug in IBM AFS 3.6-derived code, causing some clients to panic while handling a specific RPC. This has since been fixed in OpenAFS 1.2.5 and the most recent IBM AFS patch level, and all AFS users are strongly encouraged to upgrade. No release of Arla is vulnerable to this particular denial-of-service attack. There was an extended discussion of the usefulness of this exploration; many sites believed this was useful information and such scanning should continue in the future, but only on an opt-in basis.
Many sites face the problem of managing Kerberos/AFS credentials for batch-scheduled jobs. Specifically, most batch processing software needs to be modified to forward tickets as part of the batch submission process, renew tickets and tokens while the job is in the queue and for the lifetime of the job, and properly destroy credentials when the job completes. Ken Hornstein of NRL was able to pay a commercial vendor to support Kerberos 4/5 credential management in their product, although they did not implement AFS token management. MIT has implemented some of the desired functionality in OpenPBS and might be able to make it available to other interested sites.
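The renewal loop those modifications implement can be sketched in miniature (hypothetical code: the kinit/aklog/kdestroy/unlog command names assume MIT Kerberos plus OpenAFS user-land tools, and a real batch system would drive these steps from its scheduler rather than a wrapper like this):

```python
import subprocess
import threading

def run_with_credentials(job_argv, interval=3600, run=subprocess.call):
    """Run a batch job while periodically renewing Kerberos/AFS credentials.

    Assumes the job's credential cache already holds a forwarded,
    renewable TGT. The `run` parameter exists so sites (and tests) can
    substitute their own commands.
    """
    stop = threading.Event()

    def renewer():
        # While the job runs, renew the TGT and refresh the AFS token.
        while not stop.wait(interval):
            run(["kinit", "-R"])   # renew the forwarded renewable TGT
            run(["aklog"])         # derive a fresh AFS token from it

    t = threading.Thread(target=renewer, daemon=True)
    t.start()
    try:
        return run(job_argv)       # the actual batch job
    finally:
        stop.set()                 # stop renewing...
        run(["kdestroy"])          # ...properly destroy the tickets...
        run(["unlog"])             # ...and discard the AFS token
```

This only illustrates the lifecycle the workshop attendees described; forwarding the ticket at submission time is a separate (and harder) integration problem.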
Tools to simplify AFS administration were discussed, including:
AFS Balancer: A tool written by CMU to automate the process of balancing disk usage across all servers in a cell. Available from ftp://ftp.andrew.cmu.edu/pub/AFS-Tools/balance-1.1b.tar.gz.
Themis: KTH's enhanced version of the AFS tool "package" for updating files on local disk from a central AFS image. KTH's enhancements include allowing the deletion of files, simplifying the process of adding a file, and allowing the merging of multiple rule sets for determining which files are updated. Themis is available from the Arla CVS repository.
Stanford was presented as an example of a large AFS site. Stanford's AFS usage consists of approximately 14TB of data in AFS, in the form of approximately 100,000 volumes; 33TB of storage is available in their primary cell, ir.stanford.edu. Their file servers consist entirely of Solaris machines running a combination of Transarc 3.6 patch-level 3 and OpenAFS 1.2.x, while their database servers run OpenAFS 1.2.2. Their cell consists of 25 file servers using a combination of EMC and Sun StorEdge hardware. Stanford continues to use the kaserver for their authentication infrastructure, with future plans to migrate entirely to an MIT Kerberos 5 KDC.
Stanford has approximately 3,400 clients on their campus, not including SLAC (Stanford Linear Accelerator); approximately 2,100 AFS clients from outside Stanford contact their cell every month. Their supported clients are almost entirely IBM AFS 3.6 clients, although they plan to release OpenAFS clients soon. Stanford currently supports only UNIX clients. There is some on-campus presence of Windows clients, but Stanford has never publicly released or supported it. They do intend to release and support the Mac OS X client in the near future.
All Stanford students, faculty, and staff are assigned AFS home directories with a default quota of 50MB, for a total of approximately 550GB of user home directories. Other significant uses of AFS
storage are data volumes for workstation software (400GB) and volumes for course Web sites and assignments (100GB).
AFS usage at Intel was also presented. Intel has been an AFS site since 1994. They had bad experiences with the IBM 3.5 Linux client; their experience with OpenAFS on Linux 2.4.x kernels has been much better. They use, and are satisfied with, the OpenAFS IA64 Linux port. Intel has hundreds of OpenAFS 1.2.3 and 1.2.4 clients in many production cells accessing data stored on IBM AFS file servers; they have not encountered any interoperability issues. Intel has some concerns about OpenAFS: they would like to purchase commercial support for OpenAFS, and to see OpenAFS support for HP-UX on both PA-RISC and Itanium hardware. HP-UX support is currently unavailable due to a specific HP-UX header file being unavailable from HP; this may be available soon. Intel has not yet committed to migrating their file servers to OpenAFS and is unsure if they will do so without commercial support.
Backups are a traditional topic of discussion at AFS workshops, and this time was no exception. Many users complain that the traditional AFS backup tools ("backup" and "butc") are complex and difficult to automate, requiring many home-grown scripts and much user intervention for error recovery. An additional complaint was that the traditional AFS tools do not support file- or directory-level backups and restores; data must be backed up and restored at the volume level.
Mitch Collinsworth of Cornell presented work to make AMANDA, the free backup software from the University of Maryland, suitable for AFS backups. Using AMANDA for AFS backups allows one to share AFS backup tapes with non-AFS backups, easily run multiple backups in parallel, automate error recovery, and provide a robust degraded mode that prevents tape errors from stopping backups altogether. Their implementation allows for full volume restores as well as individual directory and file restores. They have finished coding this work and are in the process of testing and documenting it.
Peter Honeyman of CITI at the University of Michigan spoke about work he has proposed to replace Rx with RPCSEC GSS in OpenAFS; this would allow AFS to use a TCP-based transport mechanism rather than the UDP-based Rx, and possibly gain better congestion control, dynamic adaptation, and fragmentation avoidance as a result. RPCSEC GSS uses the GSSAPI to authenticate Sun ONC RPC. RPCSEC GSS is transport-agnostic, provides strong security, is a developing Internet standard, and has multiple open source implementations. Backward compatibility with existing AFS servers and clients is an important goal of this project.
KEYNOTE ADDRESS
THE INTERNET'S COMING SILENT SPRING
Lawrence Lessig, Stanford University

Summarized by David E. Ott
In a talk that received a standing ovation, Lawrence Lessig pointed out the recent legal crisis that is stifling innovation by extending notions of private ownership of technology beyond reasonable limits.
Several lessons from history are instructive: (1) Edwin Armstrong, the creator of FM radio technology, became an enemy to RCA, which launched a legal campaign to suppress the technology; (2) packet-switching networks, proposed by Paul Baran, were seen by AT&T as a new, competing technology that had to be suppressed; (3) Disney took Grimm tales, by then in the public domain, and retold them in magically innovative ways. Should building upon the past in this way be considered an offense?
"The truth is, architectures can allow"; that is, freedom for innovation can be built into architectures. Consider the Internet: a simple core allows for an unlimited number of smart end-to-end applications.
Compare an AT&T proprietary core network to the Internet's end-to-end model. The number of innovators for the former is one company, while for the latter it's potentially the number of people connected to it. Innovations for AT&T are designed to benefit the owner, while the Internet is a wide-open board that allows all kinds of benefits to all kinds of groups. The diversity of contributors in the Internet arena is staggering.
The auction of spectrum by the FCC is another case in point. The spectrum is usually seen as a fixed resource characterized by scarcity, and the courts have seen spectrum as something to be owned as property. Technologists, however, have shown that capacity can be a function of architecture. As David Reed argues, capacity can scale with the number of users, assuming an effective technology and an open architecture.

Developments over the last three years are disturbing and can be summarized as two layers of corruption: intellectual-property rights constraining technical innovation, and proprietary hardware platforms constraining software innovation.

Issues have been framed by the courts largely in two mistaken ways: (1) it's their property, so a new technology shouldn't interfere with the rights of current technology owners; (2) it's just theft, so copyright laws should be upheld by suppressing certain new technologies.

In fact, we must reframe the debate from "it's their property" to a "highways" metaphor that acts as a neutral platform for innovation without discrimination. "Theft" should be reframed as "Walt Disney," who built upon works from the past in richly creative ways that demonstrate the utility of allowing work to reach the public domain.

In the end, creativity depends upon the balance between property and access: public and private, controlled and common access. Free code builds a free culture, and open architectures are what give rise to the freedom to innovate. All of us need to become involved in reframing the debate on this issue; as Rachel Carson's Silent Spring points out, an entire ecology can be undermined by small changes from within.

OUR THANKS TO THE SUMMARIZERS

For the USENIX Annual Technical Conference:
Josh Simon, who organized the collecting of the summaries in his usual flawless fashion
Steve Bauer
Florian Buchholz
Matt Butner
Pradipta De
Xiaobo Fan
Hai Huang
Scott Kilroy
Teri Lampoudi
Josh Lothian
Bosko Milekic
Juan Navarro
David E. Ott
Amit Purohit
Brennan Reynolds
Matt Selsky
JD Welch
Li Xiao
Praveen Yalagandula
Haijin Yan
Gary Zacheiss

2002 USENIX ANNUAL TECHNICAL CONFERENCE
MONTEREY, CALIFORNIA, USA, JUNE 10-15, 2002

ANNOUNCEMENTS

Summarized by Josh Simon

The 2002 USENIX Annual Technical Conference was very exciting. The general track had 105 papers submitted (up 28% from 82 in 2001) and accepted 25 (19 from students); the FREENIX track had 53 submitted (up from 52 in 2001) and accepted 26 (7 from students).

The two annual USENIX-given awards were presented by outgoing USENIX Board President Dan Geer. The USENIX Lifetime Achievement Award (also known as the "Flame" because of the shape of the award) went to James Gosling for his contributions, including the Pascal compiler for Multics, Emacs, an early SMP UNIX, work on X11 and Sun's windowing system, the first enscript, and Java. The Software Tools Users Group (STUG) Award was presented to the Apache Foundation and accepted by Rasmus Lerdorf; in addition to the well-known Web server, Apache produces Jakarta, mod_perl, mod_tcl, and an XML parser, with over 80 members in at least 15 countries.
INVITED TALKS
THE IETF, OR WHERE DO ALL THOSE RFCS COME FROM, ANYWAY?
Steve Bellovin, AT&T Labs – Research
Summarized by Josh Simon
The Internet Engineering Task Force (IETF) is a standards body, but not a legal entity, consisting of individuals (not organizations) and driven by a consensus-based decision model. Anyone who "shows up" – be it at the thrice-annual in-person meetings or on the email lists for the various groups – can join and be a member. The IETF is concerned with Internet protocols and open standards, not LAN-specific protocols (such as AppleTalk) or layer-1 or -2 questions (like copper versus fiber).
The organizational structure is loose. There are many working groups, each with a specific focus, within several areas. Each area has an area director, and the area directors collectively form the Internet Engineering Steering Group (IESG). The six permanent areas are Internet (with working groups for IPv6, DNS, and ICMP), Transport (TCP, QoS, VoIP, and SCTP), Applications (mail, some Web, LDAP), Routing (OSPF, BGP), Operations and Management (SNMP), and Security (IPSec, TLS, S/MIME). There are also two other areas: SubIP is a temporary area for things underneath the IP protocol stack (such as MPLS, IP over wireless, and traffic engineering), and there's a General area for miscellaneous and process-based working groups.
Internet Requests for Comments (RFCs) fall into three tracks: Standard, Informational, and Experimental. Note that this means that not all RFCs are standards. The RFCs in the Informational track are generally for proprietary protocols or April 1st jokes; those in the Experimental track are results, ideas, or theories.
The RFCs in the Standard track come from working groups in the various areas through a time-consuming, complex process. Working groups are created with an agenda, a problem statement, an email list, some draft RFCs, and a chair; they typically start out as a BoF session. The working group and the IESG make a charter to define the scope, milestones, and deadlines, and the Internet Architecture Board (IAB) ensures that the working group proposals are architecturally sound. Working groups are narrowly focused and are supposed to die off once the problem is solved and all milestones are achieved. Working groups meet and work mainly through the email list, though there are three in-person, high-bandwidth meetings per year. However, decisions reached in person must be ratified by the mailing list, since not everybody can get to three meetings per year. They produce RFCs, which go through the Standard track; these need to go through the entire working group before being submitted for comment to the entire IETF and then to the IESG. Most RFCs wind up going back to the working group at least once from the area director or IESG level.
The format of an RFC is well-defined and requires that it be published in plain 7-bit ASCII. RFCs are freely redistributable, and the IETF reserves the right of change control on all Standard-track RFCs.
The big problems the IETF is currently facing are security, internationalization, and congestion control. Security has to be designed into protocols from the start. Internationalization has shown us that 7-bit-only ASCII is bad and doesn't work, especially for those character sets that require more than 7 bits (like Kanji); UTF-8 is a reasonable compromise. But what about domain names? While not specified as requiring 7-bit ASCII in the specifications, most DNS applications assume a 7-bit character set in the namespace. This is a hard problem. Finally, congestion control is another hard problem, since the Internet is not the same as a really big LAN.
INTRODUCTION TO AIR TRAFFIC MANAGEMENT SYSTEMS
Ron Reisman, NASA Ames Research Center; Rob Savoye, Seneca Software
Summarized by Josh Simon
Air traffic control is organized into four domains: surface, which runs out of the airport control tower and controls the aircraft on the ground (e.g., taxi and takeoff); terminal area, which covers aircraft at 11,000 feet and below, handled by the Terminal Radar Approach Control (TRACON) facilities; en route, which covers between 11,000 and 40,000 feet, including climb, descent, and at-altitude flight, and runs from the 20 Air Route Traffic Control Centers (ARTCC, pronounced "artsy"); and traffic flow management, which is the strategic arm. Each area has sectors for low, high, and very-high flight. Each sector has a controller team, including one person on the microphone, and handles between 12 and 16 aircraft at a time. Since the number of sectors and areas is limited and fixed, there's limited system capacity. The events of September 11, 2001, gave us a respite in terms of system usage, but based on past growth patterns the air traffic system will be oversubscribed within two to three years. How do we handle this oversubscription?
Air Traffic Management (ATM) Decision Support Tools (DST) use physics, aeronautics, heuristics (expert systems), fuzzy logic, and neural nets to help the (human) aircraft controllers route aircraft around. The rest of the talk focused on capacity issues, but the DSTs also handle safety and security issues. The software follows open standards (ISO, POSIX, and ANSI). The team at NASA Ames made the Center-TRACON Automation System (CTAS), the software for each of the ARTCCs, portable from Solaris to HP-UX and Linux as well. Unlike just about every other major software project, this one really is standard and portable; co-presenter Rob Savoye has experience maintaining gcc on multiple platforms and is the project lead on the portability and standards issues for the code. CTAS allows the ARTCCs to upgrade and enhance individual aspects or parts of the system; it isn't a monolithic, all-or-nothing entity like the old ATM systems.
Some future areas of research include a head-mounted augmented reality device for tower operators, to improve their situational awareness by automating human factors, and new digital global positioning system (DGPS) technologies, which are accurate within inches instead of feet.
Questions centered around advanced avionics (e.g., getting rid of ground control), cooperation between the US and Europe on software development (we're working together on software development, but the various European countries' controllers don't talk well to each other), and privatization.
ADVENTURES IN DNS
Bill Manning, ISI
Summarized by Brennan Reynolds
Manning began by posing a simple question: is DNS really a critical infrastructure? The answer is not simple. Perhaps seven years ago, when the Internet was just starting to become popular, the answer was a definite no; but today, with IPv6 being implemented and so many transactions being conducted over the Internet, the question does not have a clear-cut answer. The Internet Engineering Task Force (IETF) has several new modifications to the DNS service that may be used to protect and extend its usability.
The first extension Manning discussed was Internationalized Domain Names (IDN). To date, all DNS records are based on the ASCII character set, but many addresses in the Internet's global network cannot be easily written in ASCII characters. The goal of IDN is to provide an encoding for hostnames that is fair, efficient, and allows for a smooth transition from the current scheme. The work has resulted in two encoding schemes, ACE and UTF-8. Each encoding is independent of the other, but they can be used together in various combinations. Manning expressed his opinion that, while neither is an ideal solution, ACE appears to be the lesser of two evils. A major hindrance to getting IDN rolled out into the Internet's DNS root structure is the increase in zone file complexity.
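The contrast between the two approaches can be sketched in Python, whose idna codec implements an ASCII-compatible encoding (ACE); the hostname label here is purely illustrative:

```python
# UTF-8 can encode any hostname label, but the result is not 7-bit safe;
# an ASCII-compatible encoding (ACE) squeezes the same label into the
# character set that legacy DNS software expects.
label = "bücher"                      # an illustrative German label

utf8 = label.encode("utf-8")
ascii_safe = all(b < 128 for b in utf8)
print(ascii_safe)                     # False: raw UTF-8 needs 8-bit-clean DNS

ace = label.encode("idna")            # Python's ACE implementation
print(ace)                            # b'xn--bcher-kva'
```

The "xn--" form is what ends up in zone files, which is exactly the added complexity Manning mentions: the human-readable name and the on-the-wire name are no longer the same string.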
Manning's next topic was the inclusion of IPv6 records. In an IPv4 world it is possible for administrators to remember the numeric representation of an address; IPv6 makes the addresses too long and complex to be easily remembered. This means that DNS will play a vital role in getting IPv6 deployed. Several new resource records (RRs) have been proposed to handle the translation, including AAAA, A6, DNAME, and BITSTRING. Manning commented on the differences between IPv4 and v6 as transport protocols, noting that systems tuned for v4 traffic will suffer a performance hit when using v6. This is largely attributed to the increase in data transmitted per DNS request.
The final extension Manning discussed was DNSSec. He introduced this extension as a mechanism that protects the system from itself: DNSSec protects against data spoofing and provides authentication between servers. It includes a cryptographic signature of the RR set to ensure authenticity and integrity; by signing the entire set, the amount of computation is kept to a minimum. The information itself is stored in a new RR within each zone file on the DNS server.
Manning briefly commented on the use of DNS to provide a PKI infrastructure, stating that that was not the purpose of DNS and therefore it should not be used in that fashion. The signing of the RR sets can be done hierarchically, resulting in the use of a single trusted key at the root of the DNS tree to sign all sets down to the leaves. However, the job of key replacement and rollover is extremely difficult for a system that is distributed across the globe.
Manning stated that in an operational test bed with all of these extensions enabled, the packet size for a single query response grew from 518 bytes to larger than 18,000. This results in a large increase in bandwidth usage for high-volume name servers. Therefore, in Manning's opinion, not all of these features will be deployed in the near future. For those looking for more information, Bill's Web site can be found at http://www.isi.edu/otdr.
THE JOY OF BREAKING THINGS
Pat Parseghian, Transmeta
Summarized by Juan Navarro
Pat Parseghian described her experience in testing the Crusoe microprocessor at the Transmeta Lab for Compatibility (TLC), where the motto is "You make it, we break it" and the goal is to make engineers miserable.
She first gave an overview of Crusoe, whose most distinctive feature is a code-morphing layer that translates x86 instructions into native VLIW instructions. This peculiarity makes the testing process particularly challenging, because it involves testing two processors (the "external" x86 and the "internal" VLIW) and also because of variations in how the code-morphing layer works (it may interpret code the first few times it sees an instruction sequence, then translate it and save the result in a translation cache afterwards). There are also reproducibility issues, since the VLIW processor may run at different speeds to save energy.
Pat then described the tests that they subject the processor to (hardware compatibility tests, common applications, operating systems, envelope-pushing games, and hard-to-acquire legacy applications) and the issues that must be considered when defining testing policies. She gave some testing tips, including organizational issues like tracking resources and results.
To illustrate the testing process, Pat gave a list of possible causes of a system crash that must be investigated. If it is the silicon, then it might be because it is damaged, or because of a manufacturing problem or a design flaw. If the problem is in the code-morphing layer, is the fault with the interpreter or with the translator? The fault could also be external to the processor: it could be the homemade system BIOS, a faulty mainboard, or an operator error. Or Transmeta might not be at fault at all; the crash might be due to a bug in the OS or the application. The key to pinpointing
the cause of a problem is to isolate it, by identifying the conditions that reproduce the problem and repeating those conditions on other test platforms and non-Crusoe systems. Then relevant factors must be identified, based on product knowledge, past experience, and common sense.
To conclude, Pat suggested that some of TLC's testing lessons can be applied to products that the audience was involved in, and assured us that breaking things is fun.
TECHNOLOGY, LIBERTY, FREEDOM, AND WASHINGTON
Alan Davidson, Center for Democracy and Technology
Summarized by Steve Bauer
"Experience should teach us to be most on our guard to protect liberty when the Government's purposes are beneficent. Men born to freedom are naturally alert to repel invasion of their liberty by evil-minded rulers. The greatest dangers to liberty lurk in insidious encroachment by men of zeal, well-meaning but without understanding." – Louis Brandeis, Olmstead v. US
Alan Davidson, an associate director at the CDT (http://www.cdt.org) since 1996, concluded his talk with this quote. In many ways it aptly characterizes the importance to the USENIX community of the topics he covered. The major themes of the talk were the impact of law on individual liberty, system architectures, and appropriate responses by the technical community.
The first part of the talk provided the audience with an overview of legislation either already introduced or likely to be introduced in the US Congress. This included various proposals to protect children, such as relegating all sexual content to a .xxx domain or providing kid-safe zones such as .kids.us. Other similar laws discussed were the Children's Online Protection Act and legislation dealing with virtual child pornography.
Other lower-profile pieces covered were laws prohibiting false contact information in emails and domain registration databases. Similar proposals exist that would prohibit misleading subject lines in emails and require messages to clearly identify whether they are advertisements. Briefly mentioned were laws impacting online gambling.
Bills and laws affecting the architectural design of systems and networks, and various efforts to establish boundaries and borders in cyberspace, were then discussed. These included the Consumer Broadband and Digital Television Promotion Act and the French cases against Yahoo and its CEO for allowing the sale of Nazi materials on the Yahoo auction site.
A major part of the talk focused on the technological aspects of the USA-PATRIOT Act, passed in the wake of the September 11 attacks. Topics included "pen registers" for the Internet, expanded government power to conduct secret searches, roving wiretaps, nationwide service of warrants, sharing of grand jury and intelligence information, and the establishment of an identification system for visitors to the US.
The technological community needs to be aware of, and care about, the impact of law on individual liberty and on the systems the community builds. Specific suggestions included designing privacy and anonymity into architectures and limiting information collection, particularly of any personally identifiable data, to only what is essential. Finally, Alan emphasized that many people simply do not understand the implications of law; thus the technology community has an important role in helping make these implications clear.
TAKING AN OPEN SOURCE PROJECT TO MARKET
Eric Allman, Sendmail, Inc.
Summarized by Scott Kilroy
Eric started the talk with the upsides and downsides of open source software. The upsides include rapid feedback, high-quality work (mostly), lots of hands, and an intensely engineering-driven approach. The downsides include no support structure, limited project marketing expertise, and volunteers having limited time.
In 1998, Sendmail was becoming a success disaster. A success disaster includes two key elements: (1) volunteers start spending all their time supporting current functionality instead of enhancing it (this heavy burden can lead to project stagnation and, eventually, death); (2) competition from better-funded sources can lead to FUD (fear, uncertainty, and doubt) about the project.
Eric then led listeners down the path he took to turn Sendmail into a business. Eric envisioned Sendmail, Inc. as a smallish and close-knit company, but soon realized that the family atmosphere he desired could not last as the company grew. He observed that maybe only the first 10 people will buy into your vision, after which each individual is primarily interested in something else (e.g., power, wealth, or status). He warns that there will be people working for you whom you do not like. Eric particularly did not care for sales people, but he emphasized how important sales, marketing, finance, and legal people are to companies.
The nature of companies in infancy can be summed up as: you need money, and you need investors in order to get money. Investors want a return on investment, so money management becomes critical; companies therefore must optimize money functions. Eric's experiences with investors were lessons all in themselves. More than giving money, good investors can provide connections and insight, so you don't want investors who don't believe in you.
A company must have bug history, change management, and people who understand the product and have a sense of the target market. Eric now sees the importance of marketing target research. A company can never forget
the simple equation that drives business: profit = revenue - expense.
Finally, the concept of value should be based on what is valuable to the customer. Eric observed that his customers needed documentation, extra value in the product that they couldn't get elsewhere, and technical support.
Eric learned hard lessons along the way. If you want to do interesting open source, it might be best not to be too successful: you don't want to let the larger dragons (companies) notice you. If you want a commercial user base, you have to manage process, watch corporate culture, provide value to customers, watch bottom lines, and develop a thick skin.
Starting a company is easy, but perpetuating it is extremely difficult.
INFORMATION VISUALIZATION FOR SYSTEMS PEOPLE
Tamara Munzner, University of British Columbia
Summarized by JD Welch
Munzner presented interesting evidence that information visualization, through interactive representations of data, can help people perform tasks more effectively and reduce the load on working memory.
A popular technique in visualizing data is abstracting it through node-link representation, for example, a list of cross-references in a book. Representing the relationships between references in a graphical (node-link) manner "offloads cognition to the human perceptual system." These graphs can be produced manually, but automated drawing allows for increased complexity and quicker production times.
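As a minimal sketch of that abstraction (the chapter names are invented), a table of cross-references becomes a node-link graph simply by treating each reference as a directed edge; a graph-drawing tool can then lay it out automatically:

```python
# Cross-references in a book, recorded as "chapter -> chapters it cites".
cross_refs = {
    "ch1": ["ch3"],
    "ch2": ["ch1", "ch3"],
    "ch3": [],
}

# The node-link representation: the node set plus one edge per reference.
nodes = sorted(cross_refs)
edges = [(src, dst) for src, dsts in cross_refs.items() for dst in dsts]
print(nodes)  # ['ch1', 'ch2', 'ch3']
print(edges)  # [('ch1', 'ch3'), ('ch2', 'ch1'), ('ch2', 'ch3')]
```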
The way in which people interpret visual data is important in designing effective visualization systems. Attention to pre-attentive visual cues is key to success: picking out a red dot among a field of identically shaped blue dots is significantly faster than finding the same data represented by a mixture of hue and shape variations. There are many pre-attentive visual cues, including size, orientation, intersection, and intensity. The accuracy of these cues can be ranked, with position triggering high accuracy and color or density on the low end.
Visual cues combined with different data types – quantitative (e.g., 10 inches), ordered (small, medium, large), and categorical (apples, oranges) – can also be ranked, with spatial position being best for all types. Beyond this commonality, however, accuracy varies widely, with length ranked second for quantitative data but eighth for ordinal and ninth for categorical data types.
These guidelines about visual perception can be applied to information visualization, which, unlike scientific visualization, focuses on abstract data and the choice of spatialization. Several techniques are used to clarify abstract data, including multi-part glyphs (where changes in individual parts are incorporated into an easier-to-understand gestalt), interactivity, motion, and animation. In large data sets, techniques like "focus + context," where a zoomed portion of the graph is shown along with a thumbnail view of the entire graph, are used to minimize user disorientation.
Future problems include dealing with huge databases such as the Human Genome, reckoning with dynamic data like the changing structure of the Web or real-time network monitoring, and transforming "pixel-bound" displays into large "digital wallpaper"-type systems.
FIXING NETWORK SECURITY BY HACKING THE BUSINESS CLIMATE
Bruce Schneier, Counterpane Internet Security
Summarized by Florian Buchholz
Bruce Schneier identified security as one of the fundamental building blocks of the Internet. A certain degree of security is needed for all things on the Net, and the limits of security will become limits of the Net itself. However, companies are hesitant to employ security measures. Science is doing well, but one cannot see the benefits in the real world. An increasing number of users means more problems affecting more people. Old problems, such as buffer overflows, haven't gone away, and new problems show up. Furthermore, the amount of expertise needed to launch attacks is decreasing.
Schneier argues that complexity is the enemy of security, and that while security is getting better, the growth in complexity is outpacing it. As security is fundamentally a people problem, one shouldn't focus on technologies but rather should look at businesses, business motivations, and business costs. Traditionally, one can distinguish between two security models: threat avoidance, where security is absolute, and risk management, where security is relative and one has to mitigate the risk with technologies and procedures. In the latter model, one has to find a point with reasonable security at an acceptable cost.
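The risk-management model reduces to simple expected-loss arithmetic; a sketch with invented numbers (annualized loss expectancy = incident rate x loss per incident, a standard risk-analysis formula rather than anything specific to Schneier's talk):

```python
# Hypothetical figures: is a countermeasure worth buying?
incidents_per_year = 4
loss_per_incident = 25_000                # dollars

ale_without = incidents_per_year * loss_per_incident        # 100000
countermeasure_cost = 30_000              # assumed to cut incidents to 1/year
ale_with = 1 * loss_per_incident + countermeasure_cost      # 55000

# "Reasonable security at an acceptable cost": mitigate only while the
# countermeasure saves more than it costs.
print(ale_with < ale_without)  # True
```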
After a brief discussion of how security is handled in businesses today, in which he concluded that businesses "talk big about it but do as little as possible," Schneier identified four necessary steps to fix the problems.
First, enforce liabilities. Today, no real consequences have to be feared from security incidents. Holding people accountable will increase the transparency of security and give an incentive to make processes public. As possible options to achieve this, he listed industry-defined standards, federal regulation, and lawsuits. The problems with enforcement, however, lie in the difficulties associated with the international nature of the problem and the fact that complexity makes assigning liabilities difficult. Furthermore, fear of liability could have a chilling effect on free software development and could stifle new companies.
As a second step, Schneier identified the need to allow partners to transfer liability. Insurance would spread liability risk among a group and would be a CEO's primary risk-analysis tool. There is a need for standardized risk models and protection profiles, and for more security as opposed to empty press releases. Schneier predicts that insurance will create a marketplace for security where customers and vendors will have the ability to accept liabilities from each other. Computer-security insurance should soon be as common as household or fire insurance, and from that development insurance will become the driving factor of the security business.
The next step is to provide mechanisms to reduce risks, both before and after software is released; techniques and processes to improve software quality, as well as an evolution of security management, are therefore needed. Currently, security products try to rebuild the walls (such as physical badges and entry doorways) that were lost when getting connected. Schneier believes this "fortress metaphor" is bad; one should think of the problem more in terms of a city. Since most businesses cannot afford proper security, outsourcing is the only way to make security scalable. With outsourcing, the concept of best practices becomes important, and insurance companies can be tied to them; outsourcing will level the playing field.
As a final step, Schneier predicts that rational prosecution and education will lead to deterrence. He claims that people feel safe because they live in a lawful society, whereas the Internet is classified as lawless, very much like the American West in the 1800s or a society ruled by warlords. This is because it is difficult to prove who an attacker is, and prosecution is hampered by complicated evidence gathering and irrational prosecution. Schneier believes, however, that education will play a major role in turning the Internet into a lawful society. Specifically, he pointed out that we need laws that can be explained.
Risks won't go away; the best we can do is manage them. A company able to manage risk better will be more profitable, and we need to give CEOs the necessary risk-management tools.
There were numerous questions. Listed below are the more complicated ones, in a paraphrased Q&A format.
Q: Will liability be effective? Will insurance companies be willing to accept the risks? A: The government might have to step in; it remains to be seen how it plays out.
Q: Is there an analogy to the real world in the fact that in a lawless society only the rich can afford security? Security solutions differentiate between classes. A: Schneier disagreed with the statement, giving an analogy to front-door locks, but conceded that there might be special cases.
Q: Doesn't homogeneity hurt security? A: Homogeneity is oversold. Diverse types can be more survivable, but given the limited number of options, the difference will be negligible.
Q: Regulations on airbag protection have led to deaths in some cases. How can we keep the pendulum from swinging the other way? A: Lobbying will not be prevented. An imperfect solution is probable; there might be reversals of requirements, such as the airbag one.
Q: What about personal liability? A: This will be analogous to auto insurance; liability comes with computer/Net access.
Q: If the rule of law is to become reality, there must be a law-enforcement function that applies to a physical space. You cannot do that with any existing government agency for the whole world. Should an organization like the UN assume that role? A: Schneier was not convinced global enforcement is possible.
Q: What advice should we take away from this talk? A: Liability is coming. Since the network is important to our infrastructure, eventually the problems will be solved in a legal environment. You need to start thinking about how to solve the problems and how the solutions will affect us.
The entire speech (including slides) is available at http://www.counterpane.com/presentation4.pdf.
LIFE IN AN OPEN SOURCE STARTUP
Daryll Strauss, Consultant
Summarized by Teri Lampoudi
This talk was packed with morsels of insight and tidbits of information about what life in an open source startup is like (though "was like" might be more appropriate), what the issues are in starting, maintaining, and commercializing an open source project, and the way hardware vendors treat open source developers. Strauss began by tracing the timeline of his involvement with developing 3-D graphics support for Linux: from the 3dfx Voodoo1 driver for Linux in 1997, to the establishment of Precision Insight in 1998, to its buyout by VA Linux, to the dismantling of his group by VA, to what the future may hold.
Obviously there are benefits from doing open source development, both for the developer and for the end user. An important point was that open source develops entire technologies, not just products. One misapprehension concerns the inherent difficulty of managing a project that accepts code from a large number of developers, some volunteer and some paid. The notion that code can just be thrown over the wall into the world is a Big Lie.
How does one start and keep afloat an open source development company? In the case of Precision Insight, the subject matter (developing graphics and OpenGL for Linux) required a considerable amount of expertise: intimate knowledge of the hardware, the libraries, X, and the Linux kernel, as well as the end applications. Expertise is marketable. What helps even further is having a visible virtual team of experts, people who have an established track record of contributions to open source in the particular area of expertise. Support from the hardware and software vendors came next. While everyone was willing to contract PI to write the portions of code that were specific to their hardware, no one wanted to pay the bill for developing the underlying foundation that was necessary for the drivers to be useful. In the PI model, the infrastructure cost was shared by several customers at once, and the technology kept moving. Once a driver is written for a vendor, the vendor is perfectly capable of figuring out how to write the next driver necessary, thereby obviating the need to contract the job. This, and questions of protecting intellectual property (hardware design, in this case), are deeply problematic with the open source mode of development. This is not to say that there is no way to get over them, but they are likely to arise more often than not.
Management issues seem equally important. In the PI setting, contracts were flexible (both a good and a bad thing) and developers overworked. Further, development "in a fishbowl," that is, under public scrutiny, is not easy. Strauss stressed the value of good communication and the use of various out-of-sync communication methods like IRC and mailing lists.
Finally, Strauss discussed what portions of software can be free and what can be proprietary. His suggestion was that horizontal markets want to be free, whereas vertically one can develop proprietary solutions. The talk closed with a small stroll through the problems that project politics brings up, from things like merging branches to mailing-list and source-tree readability.
GENERAL TRACK SESSIONS
FILE SYSTEMS
Summarized by Haijin Yan
STRUCTURE AND PERFORMANCE OF THE
DIRECT ACCESS FILE SYSTEM
Kostas Magoutis, Salimah Addetia, Alexandra Fedorova, and Margo I. Seltzer, Harvard University; Jeffrey Chase, Andrew Gallatin, Richard Kisley, and Rajiv Wickremesinghe, Duke University; and Eran Gabber, Lucent Technologies
This paper won the Best Paper award.
The Direct Access File System (DAFS) is a new standard for network-attached storage over direct-access transport networks. DAFS takes advantage of user-level network interface standards that enable client-side functionality in which remote data access resides in a library rather than in the kernel. This reduces the memory-copy overhead of data movement and the protocol overhead. Remote Direct Memory Access (RDMA) is a direct-access transport mechanism that allows the network adapter to reduce copy overhead by accessing application buffers directly.
This paper explores the fundamental structural and performance characteristics of network file access using a user-level file system structure on a direct-access transport network with RDMA. It describes DAFS-based client and server reference implementations for FreeBSD and reports experimental results comparing DAFS to a zero-copy NFS implementation. It illustrates the benefits and trade-offs of these techniques to provide a basis for informed choices about deployment of DAFS-based systems and similar extensions to other network file protocols, such as NFS. Experiments show that DAFS gives applications direct control over an I/O system and increases the client CPU's usage while the client is doing I/O. Future work includes addressing longstanding problems related to the integration of the application and file system for high-performance applications.
CONQUEST: BETTER PERFORMANCE THROUGH
A DISK/PERSISTENT-RAM HYBRID FILE
SYSTEM
An-I A. Wang, Peter Reiher, and Gerald J. Popek, UCLA; Geoffrey M. Kuenning, Harvey Mudd College
Motivated by the declining cost of persistent RAM, the authors propose the Conquest file system, which stores all small files, metadata, executables, and shared libraries in persistent RAM; disks hold only the data content of remaining large files. Compared to alternatives such as caching and RAM file systems, Conquest has the advantages of efficiency, consistency, and reliability at a reduced cost. Using popular benchmarks, experiments show that Conquest incurs little overhead while achieving faster performance. Future work includes designing mechanisms for adjusting the file-size threshold dynamically and finding a better disk layout for large data blocks.
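The placement rule at the heart of such a hybrid file system can be sketched as a simple decision function. The 1 MB threshold, the suffix test, and the store names below are illustrative assumptions, not Conquest's actual parameters:

```python
# Sketch of a Conquest-style placement policy: small files, metadata,
# executables, and shared libraries live in persistent RAM; only the
# data content of large files goes to disk. The 1 MB threshold and the
# suffix heuristic are illustrative assumptions.

SIZE_THRESHOLD = 1 << 20  # hypothetical file-size threshold (1 MB)

def choose_store(path: str, size: int) -> str:
    """Return which store should hold the file's data content."""
    if size < SIZE_THRESHOLD:
        return "pram"   # small files stay entirely in persistent RAM
    if path.endswith((".so", ".exe")):
        return "pram"   # executables/libraries benefit from RAM residency
    return "disk"       # only large-file data content hits the disk

def metadata_store(path: str) -> str:
    """Metadata always lives in persistent RAM, even for disk-resident files."""
    return "pram"
```

A dynamic version of `SIZE_THRESHOLD`, adjusted from observed workloads, is exactly the future work the authors mention.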
EXPLOITING GRAY-BOX KNOWLEDGE OF
BUFFER-CACHE MANAGEMENT
Nathan C. Burnett, John Bent, Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau, University of Wisconsin, Madison
Knowing what algorithm is used to manage the buffer cache is very important for improving application performance; however, there is currently no interface for discovering it. This paper introduces Dust, a simple fingerprinting tool that is able to identify the buffer-cache replacement policy. Dust automatically identifies the cache size and replacement policy based on the configuring attributes of access: order, recency, frequency, and long-term history. Through simulation, Dust was able to distinguish between a variety of replacement-algorithm policies found in the literature: FIFO, LRU, LFU, Clock, Segmented FIFO, 2Q, and LRU-K. Further experiments fingerprinting real operating systems such as NetBSD, Linux, and Solaris show that Dust is able to identify the hidden cache-replacement algorithm.
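The fingerprinting idea can be illustrated with a toy simulation: construct an access pattern whose eviction outcome differs between policies, then observe which blocks survive. Real Dust probes a live OS buffer cache it cannot inspect; this sketch inspects a simulated cache directly for clarity, and only distinguishes FIFO from LRU:

```python
# Toy version of the Dust idea: a probe pattern whose eviction outcome
# differs by replacement policy reveals which policy is in use.

def simulate(policy: str, size: int, accesses: list) -> set:
    cache = []                       # front = next eviction candidate
    for blk in accesses:
        if blk in cache:
            if policy == "LRU":      # LRU refreshes recency on a hit
                cache.remove(blk)
                cache.append(blk)
            # FIFO ignores hits: eviction order fixed by insertion time
        else:
            if len(cache) == size:
                cache.pop(0)
            cache.append(blk)
    return set(cache)

def fingerprint(size: int, probe) -> str:
    # Fill the cache, re-touch block 0, then force exactly one eviction.
    pattern = list(range(size)) + [0, size]
    survivors = probe(pattern)
    # LRU protects the re-touched block; FIFO evicts it anyway.
    return "LRU" if 0 in survivors else "FIFO"
```

Dust's real probes are richer (they also vary frequency and history to separate LFU, Clock, 2Q, and LRU-K), but the structure is the same: craft accesses, then read back what survived.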
By knowing the underlying cache-replacement policy, a cache-aware Web server can reschedule its requests with a cached-request-first policy to obtain a performance improvement. Experiments show that cache-aware scheduling improves average response time and system throughput.
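A cached-request-first scheduler can be sketched as a stable partition of the request queue. The residency set below is a plain Python set standing in for the knowledge that a Dust-style fingerprint makes it possible to infer:

```python
# Sketch of cached-request-first scheduling: requests whose objects are
# believed to be resident in the buffer cache are served before those
# that would require disk I/O. The `cached` oracle is an assumption;
# in the paper it is inferred from the fingerprinted replacement policy.

def cache_aware_order(requests: list, cached: set) -> list:
    """Stable partition: likely cache hits first, arrival order preserved."""
    hits = [r for r in requests if r in cached]
    misses = [r for r in requests if r not in cached]
    return hits + misses
```

Serving likely hits first shortens their response time without delaying the misses much, since the misses were going to wait on the disk anyway.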
OPERATING SYSTEMS
(AND DANCING BEARS)
Summarized by Matt Butner
THE JX OPERATING SYSTEM
Michael Golm, Meik Felser, Christian Wawersich, and Jürgen Kleinöder, University of Erlangen-Nürnberg
The talk opened by emphasizing the need for, and the practicality of, a functional Java OS. The JX OS attempts to mimic the recent trend toward using highly abstracted languages in application development in order to create an OS as functional and powerful as one written in a lower-level language.
JX is a micro-kernel solution that uses separate JVMs for each entity of the kernel and, in some cases, for each application. The separated domains do not share objects and have no thread migration, and each domain implements its own code. The interesting part of the presentation was the discussion of system-level programming with Java; key areas discussed were memory management and interrupt handlers. The authors concluded by noting that their type-safe and modular approach resulted in a robust system with great configuration flexibility and acceptable performance.
DESIGN EVOLUTION OF THE EROS
SINGLE-LEVEL STORE
Jonathan S. Shapiro, Johns Hopkins University; Jonathan Adams, University of Pennsylvania
This presentation outlined current characteristics of file systems and some of their least desirable characteristics. It was based on the revival of the EROS single-level-store approach in current attempts to capitalize on system consistency and efficiency. Their solution did not relate directly to EROS, but rather to the design and use of the constructs and notions on which EROS is based. By extending the mapping of the memory system to include the disk, systems are further able to ensure global persistence without regard for a disk structure. In explaining the need for absolute system consistency and an environment which, by definition, is not allowed to crash, the goal of having such an exhaustive design becomes clear. The cool part of this work is the availability of its design and architecture to the public.
THINK: A SOFTWARE FRAMEWORK FOR
COMPONENT-BASED OPERATING SYSTEM
KERNELS
Jean-Philippe Fassino, France Télécom R&D; Jean-Bernard Stefani, INRIA; Julia Lawall, DIKU; Gilles Muller, INRIA
This presentation discussed the need for component-based operating systems and respective structures to ensure flexibility for arbitrary-sized systems. Think provides a binding model that maps uniform components for OS developers and architects to follow, ensuring consistent implementations for an arbitrary system size. However, the goal is not to force developers into a predefined kernel but to promote the use of certain components in varied ways.
Think's concentration is primarily on embedded systems, where short development time is necessary but constrained by rigorous needs and limited resources. The need to build flexible systems in such an environment can be costly and implementation-specific, but Think creates an environment supported by the ability to dynamically load type-safe components. This allows for more flexible systems that retain functionality because of Think's ability to bind fine-grained components. Benchmarks revealed that dedicated micro-kernels can show performance improvements in comparison to monolithic kernels. Notable future work includes building components for low-end appliances and the development of a real-time OS component library.
BUILDING SERVICES
Summarized by Pradipta De
NINJA: A FRAMEWORK FOR NETWORK
SERVICES
J. Robert von Behren, Eric A. Brewer, Nikita Borisov, Michael Chen, Matt Welsh, Josh MacDonald, Jeremy Lau, and David Culler, University of California, Berkeley
Robert von Behren presented Ninja, an ongoing project that aims to provide a framework for building robust and scalable Internet-based network services like Web hosting, instant messaging, email, and file-sharing applications. Robert drew attention to the difficulties of writing cluster applications: one has to take care of data consistency and issues of concurrency, and for robust applications there are problems related to fault tolerance. Ninja works as a wrapper to relieve the user of these problems. The goals of the project are building network services that are scalable and highly available, maintaining persistent data, providing graceful degradation, and supporting online evolution.
The use of clusters distinguishes this setup from generic distributed systems in terms of reliability and security, as well as providing a partition-free network. The next important feature of this project is the new programming model, which is more restrictive than a general programming model but still expressive enough to write most applications. This model, described as "single program, multiple connection," uses intertask parallelism instead of multithreaded concurrency and is characterized by all nodes running the same program, with connections delegated to the nodes by a centralized connection manager (CM). The CM takes care of hiding the details of mapping an external connection to an internal node. Ninja also uses a design called "staged event-driven architecture," where each service is broken down into a set of stages connected by event queues. This architecture is suitable for a modular design and helps in graceful degradation by adaptive load shedding from the event queues.
The presentation concluded with examples of the implementation and evaluation of a Web server and an email system called NinjaMail, to show the ease of authoring and the efficacy of using the Ninja framework for service development.
USING COHORT-SCHEDULING TO ENHANCE
SERVER PERFORMANCE
James R. Larus and Michael Parkes, Microsoft Research
James Larus presented a new scheduling policy for increasing the throughput of server applications. Cohort scheduling batches the execution of similar operations arising in different server requests. The usual programming paradigm for handling server requests is to spawn multiple concurrent threads and switch from one thread to another whenever a thread blocks for I/O. Since the threads in a server mostly execute unrelated pieces of code, the locality of reference is reduced, and hence the effectiveness of different caching mechanisms. One way to solve this problem is to throw in more hardware, but Larus presented a complementary view in which program behavior is investigated and used to improve performance.
The problem in this scheme is to identify pieces of code that can be batched together for processing. One simple way is to look at the next program-counter values and use them to club threads together. However, this talk presented a new programming abstraction, "staged computation," which replaces the thread model with "stages." A stage is an abstraction for operations with similar behavior and locality. The StagedServer library, a collection of C++ classes that implement staged computation and cohort scheduling on either a uniprocessor or a multiprocessor, can be used for programming in this model and for defining stages.
The presentation showed an experimental evaluation of cohort scheduling against the thread-based model by implementing two servers: a Web server, which is mainly I/O bound, and a publish-subscribe server, which is mainly compute bound. The SURGE benchmark was used for the first experiment and the Fabret workload for the second. The results showed that the cohort-scheduling-based implementation gave better throughput than the thread-based implementation at high loads.
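The locality argument can be made concrete by comparing schedules: cohort scheduling runs each stage as a batch across requests, so execution switches between different pieces of code far less often than thread-per-request interleaving. A toy model, with made-up stage names:

```python
# Toy comparison of thread-per-request interleaving vs. cohort
# scheduling. Stage names are illustrative; the point is how often the
# schedule switches between different stages (a proxy for lost
# instruction/data locality).

STAGES = ("parse", "lookup", "reply")

def thread_order(requests):
    # Thread model: each request runs all of its stages before the next.
    return [(req, stage) for req in requests for stage in STAGES]

def cohort_order(requests):
    # Cohort scheduling: run each stage as a batch across all requests.
    return [(req, stage) for stage in STAGES for req in requests]

def stage_switches(schedule):
    # Count transitions between different stages in the schedule.
    return sum(1 for a, b in zip(schedule, schedule[1:]) if a[1] != b[1])
```

With three requests, the thread order switches stages at almost every step, while the cohort order switches only at the two batch boundaries; the batched code and its data stay hot in the caches.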
NETWORK PERFORMANCE
Summarized by Xiaobo Fan
ETE: PASSIVE END-TO-END INTERNET SERVICE
PERFORMANCE MONITORING
Yun Fu and Amin Vahdat, Duke University; Ludmila Cherkasova and Wenting Tang, HP Labs
This paper won the Best Student Paper award.
Ludmila Cherkasova began by listing several questions most Web service providers want answered in order to improve service quality. She reviewed the difficulties of making accurate and efficient end-to-end Web service measurements and the shortcomings of currently available approaches. The authors propose a passive trace-based architecture called EtE to monitor Web server performance on behalf of end users.
The first step is to collect network packets passively. The second module reconstructs all TCP connections and extracts HTTP transactions. To reconstruct Web-page accesses, they first build a knowledge base indexed by client IP and URL, and then group objects to the related Web pages in which they are embedded; statistical analysis is used to handle non-matched objects. EtE Monitor can generate three groups of metrics to measure Web service performance: response time, Web caching, and page abortion.
To demonstrate the benefits of EtE Monitor, Cherkasova presented two case studies and, based on the calculated metrics, gave some insightful explanations of what is happening behind the variations in Web performance. Through validation, they claim their approach provides a very close approximation of the real scenario.
THE PERFORMANCE OF REMOTE DISPLAY
MECHANISMS FOR THIN-CLIENT COMPUTING
S. Jae Yang, Jason Nieh, Matt Selsky, and Nikhil Tiwari, Columbia University
Noting the trend toward thin-client computing, the authors compared different techniques and design choices in measuring the performance of six popular thin-client platforms: Citrix MetaFrame, Microsoft Terminal Services, Sun Ray, Tarantella, VNC, and X. After pointing out several challenges in benchmarking thin clients, Yang proposed slow-motion benchmarking to achieve non-invasive packet monitoring and consistent visual quality. Basically, they insert delays between separate visual events in the benchmark applications on the server side so that the client's display updates can catch up with the server's processing speed. The experiments were conducted on an emulated network over a range of network bandwidths.
Their results show that thin clients can provide good performance for Web applications in LAN environments, but only some platforms performed well on the video benchmark. Pixel-based encoding may achieve better performance and bandwidth efficiency than high-level graphics. Display caching and compression should be used with care.
A MECHANISM FOR TCP-FRIENDLY
TRANSPORT-LEVEL PROTOCOL
COORDINATION
David E. Ott and Ketan Mayer-Patel, University of North Carolina, Chapel Hill
A revised transport-level protocol optimized for cluster-to-cluster (C-to-C) applications is what David Ott tried to explain in his talk. A C-to-C application class is identified as one set of processes communicating with another set of processes across a common Internet path. The fundamental problem with C-to-C applications is how to coordinate all C-to-C communication flows so that they share a consistent view of the common C-to-C network, adapt to changing network conditions, and cooperate to meet specific requirements. Aggregation points (APs) are placed at the first and last hops of the common data path to probe network conditions (latency, bandwidth, loss rate, etc.). To carry and transfer this information, a new protocol, the Coordination Protocol (CP), is inserted between the network layer (IP) and the transport layer (TCP, UDP, etc.). Ott illustrated how the CP header is updated and used as packets originate from the source and traverse the local and remote APs to arrive at their destination, and how an AP maintains a per-cluster state table and detects network conditions. Simulation results suggest this coordination mechanism is effective in sharing common network resources among C-to-C communication flows.
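A CP header carrying the probed conditions might be sketched as follows. The field set (latency, bandwidth, loss rate) follows the talk, but the widths, units, and layout below are assumptions for illustration, not the actual CP wire format:

```python
# Hypothetical sketch of a Coordination Protocol header sitting between
# IP and the transport layer. Field layout, widths, and units are
# assumptions; only the field set comes from the talk.
import struct

CP_FORMAT = "!IIf"   # network byte order: latency_us, bandwidth_kbps, loss_rate

def pack_cp(latency_us: int, bandwidth_kbps: int, loss_rate: float) -> bytes:
    return struct.pack(CP_FORMAT, latency_us, bandwidth_kbps, loss_rate)

def unpack_cp(header: bytes):
    return struct.unpack(CP_FORMAT, header)

def update_cp(header: bytes, latency_us=None, bandwidth_kbps=None,
              loss_rate=None) -> bytes:
    """An aggregation point overwrites fields with its latest probe results."""
    lat, bw, loss = unpack_cp(header)
    return pack_cp(latency_us if latency_us is not None else lat,
                   bandwidth_kbps if bandwidth_kbps is not None else bw,
                   loss_rate if loss_rate is not None else loss)
```

The shim-layer placement means transports above CP need not change: an AP rewrites only this header as packets pass through.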
STORAGE SYSTEMS
Summarized by Praveen Yalagandula
MY CACHE OR YOURS? MAKING STORAGE
MORE EXCLUSIVE
Theodore M. Wong, Carnegie Mellon University; John Wilkes, HP Labs
Theodore Wong explained the inefficiency of current "inclusive" caching schemes in storage area networks: when a client accesses a block, the block is read from the disk and is cached both in the disk-array cache and in the client's cache. He then presented the concept of "exclusive" caching, where a block accessed by a client is cached only in that client's cache. On eviction from the client's cache, a DEMOTE operation moves the data block to the tail of the array cache.
The new exclusive caching schemes were evaluated on both single-client and multiple-client systems. The single-client results are presented for two different types of synthetic workloads: Random (transaction-processing-type workloads) and Zipf (typical Web workloads). The exclusive policy was quite effective in achieving higher hit rates and lower read latencies. They also showed a 2.2-times hit-rate improvement over inclusive techniques for a real-life workload, httpd, in a single-client setting.
The DEMOTE scheme also performed well on multiple-client systems when the data accessed by the clients is disjoint. For shared-data workloads, the scheme performed worse than the inclusive schemes. Theodore then presented an adaptive exclusive caching scheme: a block accessed by a client is placed at the tail of the client's cache and also in the array cache, at a position determined by the popularity of the block. The popularity of a block is measured by maintaining a ghost cache that accumulates the number of times each block is accessed. This new adaptive scheme achieved a maximum 52% speedup in mean latency in experiments with real-life workloads.
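The interaction between the two caches can be sketched with a pair of LRU caches. This is a toy model of the exclusive policy with DEMOTE, not the paper's implementation:

```python
# Sketch of DEMOTE-style exclusive caching: a block evicted from the
# client's LRU cache is demoted to the tail (MRU end) of the array
# cache instead of being cached in both places.
from collections import OrderedDict

class LRUCache:
    def __init__(self, size):
        self.size, self.data = size, OrderedDict()

    def touch(self, blk):
        """Insert or refresh blk; return the evicted block, if any."""
        if blk in self.data:
            self.data.move_to_end(blk)
            return None
        self.data[blk] = True
        if len(self.data) > self.size:
            return self.data.popitem(last=False)[0]   # evict the LRU head
        return None

def client_read(blk, client, array):
    if blk in client.data:
        client.touch(blk)
        return "client-hit"
    # Exclusive: an array hit *removes* the block from the array cache.
    hit = blk in array.data
    if hit:
        del array.data[blk]
    victim = client.touch(blk)
    if victim is not None:
        array.touch(victim)     # DEMOTE: evicted block goes to the array tail
    return "array-hit" if hit else "miss"
```

Because a block lives in at most one cache, the two caches together hold roughly twice as many distinct blocks as an inclusive pair of the same sizes, which is where the hit-rate gains come from.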
For more information, visit http://www.cs.cmu.edu/~tmwong/research and http://www.hpl.hp.com/research/itc/csl/ssp.
BRIDGING THE INFORMATION GAP IN
STORAGE PROTOCOL STACKS
Timothy E. Denehy, Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau, University of Wisconsin, Madison
Currently, there is a huge information gap between storage systems and file systems. The interface storage systems expose to file systems, based on blocks and providing only read/write operations, is very narrow. This leads to poor overall performance, because of duplicated functionality in both systems and reduced functionality resulting from the storage system's lack of file information.
The speaker presented two enhancements: ExRAID, an exposed RAID, and ILFS, an Informed Log-Structured File System. ExRAID is an enhancement to the block-based RAID storage system that exposes three types of information to the file system: (1) regions, contiguous portions of the address space comprising one or multiple disks; (2) performance information about queue lengths and throughput of the regions, revealing disk heterogeneity to the file system; and (3) failure information, dynamically updated to convey the number of tolerable failures in each region.
ILFS allows online incremental expansion of the storage space, performs dynamic parallelism, using ExRAID's performance information to perform well on heterogeneous storage systems, and provides a range of different mechanisms, with different granularities, for file redundancy, using ExRAID's region-failure characteristics. These new features were added to LFS with only a 19% increase in code size.
More information: http://www.cs.wisc.edu/wind
MAXIMIZING THROUGHPUT IN REPLICATED
DISK STRIPING OF VARIABLE
BIT-RATE STREAMS
Stergios V. Anastasiadis, Duke University; Kenneth C. Sevcik and Michael Stumm, University of Toronto
There is an increasing demand for the continuous real-time streaming of media files. Disk striping is a common technique used for supporting many connections on media servers. But even with 1.2 million hours of mean time between failures, there will be more than one disk failure per week with 7,000 disks. Fault tolerance can be achieved by using data replication and reserving extra bandwidth during normal operation. This work focused on supporting the most common variable bit-rate stream formats (e.g., MPEG).
The authors used the prototype media server EXEDRA to implement and evaluate different replication and bandwidth-reservation schemes. The system supports a variable bit-rate scheme, does stride-based disk allocation, and is capable of supporting variable-grain disk striping. Two replication policies are presented: deterministic, in which data blocks are replicated in a round-robin fashion on the secondary replicas, and random, in which data blocks are replicated on randomly chosen secondary replicas. Two bandwidth-reservation techniques are also presented: mirroring reservation, in which disk bandwidth is reserved for both the primary and the replicas of the media file during playback, and minimum reservation, a more efficient scheme in which bandwidth is reserved only for the sum of the primary data-access time and the maximum of the backup data-access times required in each round.
Experimental results showed that deterministic replica placement is better than random placement for small disks, that minimum disk-bandwidth reservation achieves twice the throughput of mirroring, and that fault tolerance can be achieved with a minimal impact on throughput.
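The two placement policies can be sketched in a few lines. Disk counts, block numbering, and the exact round-robin rule below are illustrative assumptions:

```python
# Sketch of the two replica-placement policies: deterministic placement
# spreads each primary's blocks round-robin across the other disks;
# random placement picks any other disk. A toy model of the idea.
import random

def deterministic_placement(block: int, primary: int, ndisks: int) -> int:
    """Round-robin: replica of block i goes to the primary's successors in turn."""
    return (primary + 1 + block % (ndisks - 1)) % ndisks

def random_placement(block: int, primary: int, ndisks: int, rng=random) -> int:
    """Random: replica on any disk other than the primary."""
    choices = [d for d in range(ndisks) if d != primary]
    return rng.choice(choices)
```

The deterministic rule guarantees an even spread of backup load when a disk fails, which is one reason it can beat random placement on small disk arrays.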
TOOLS
Summarized by Li Xiao
SIMPLE AND GENERAL STATISTICAL PROFILING
WITH PCT
Charles Blake and Steven Bauer, MIT
This talk introduced the Profile Collection Toolkit (PCT), a sampling-based CPU profiling facility. A novel aspect of PCT is that it allows sampling of semantically rich data such as function call stacks, function parameters, local or global variables, CPU registers, or other execution contexts. The design objectives of PCT were driven by user needs and the inadequacies or inaccessibility of prior systems.
The rich data collection capability is achieved via a debugger-controller program, dbctl. The talk then introduced the profiling collector and data-collection activation. PCT file-format options provide flexibility and various ways to store exact states. PCT also has analysis options; two examples were given, "Simple" and "Mixed User and Linux Code."
The talk then went into how general sampling helps in the following areas: a debugger-controller state machine, example script fragments, a simple example program, and general value reports in terms of call-path histograms, numeric value histograms, and data generality. Related work such as gprof and Expect was then summarized. The contribution of this work is to provide a portable, general value-sampling tool. The limitations are that PCT does not cope well with stripped binaries, and count inaccuracies happen because of its statistical nature; given this limitation, -g is preferred over strip.
Possible future directions include supporting more debuggers, such as dbx and xdb; script generators for other kinds of traces; more canned reports for general values; and a libgdb-based library sampler. PCT is available at http://pdos.lcs.mit.edu/pct and runs on almost any UNIX-like system.
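The core of statistical profiling (periodically sampling call stacks and aggregating them into call-path histograms) can be sketched in Python. PCT itself drives a real debugger and can sample far richer state, such as arguments and registers; this sketch samples only Python frames of one thread:

```python
# Minimal stack-sampling profiler in the spirit of PCT: periodically
# capture a thread's call stack and aggregate call paths into a
# histogram. Uses CPython's sys._current_frames(); interval and
# duration values are illustrative.
import collections
import sys
import threading
import time

def sample_stacks(duration_s=0.2, interval_s=0.005, target_tid=None):
    """Sample the target thread's stack; return {call_path: count}."""
    hist = collections.Counter()
    end = time.monotonic() + duration_s
    while time.monotonic() < end:
        frame = sys._current_frames().get(target_tid)
        path = []
        while frame is not None:             # walk from leaf to root
            path.append(frame.f_code.co_name)
            frame = frame.f_back
        if path:
            hist[tuple(reversed(path))] += 1  # root-to-leaf call path
        time.sleep(interval_s)
    return hist

def profile(workload, duration_s=0.2):
    """Run workload in a thread and sample it from the outside."""
    t = threading.Thread(target=workload, daemon=True)
    t.start()
    return sample_stacks(duration_s, target_tid=t.ident)
```

The counts are statistical, so short-lived functions can be missed entirely: exactly the count-inaccuracy caveat mentioned in the talk.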
ENGINEERING A DIFFERENCING AND
COMPRESSION DATA FORMAT
David G. Korn and Kiem-Phong Vo, AT&T Laboratories - Research
This talk began with an equation: Differencing + Compression = Delta Compression. Compression removes information redundancy in a data set; examples are gzip, bzip, compress, and pack. Differencing encodes the differences between two data sets; examples are diff -e, fdelta, bdiff, and xdelta. Delta compression compresses a data set given another one, combining differencing and compression, and reduces to pure compression when there is no commonality.
After presenting an overall scheme for a delta compressor and showing delta-compression performance, the talk discussed the encoding format of Vcdiff, newly designed for delta compression. A Vcdiff instruction-code table consists of 256 entries, each coding up to a pair of instructions, and recodes 1-byte indices and any additional data.
The talk then showed Vcdiff's performance on Web data collected from CNN, compared with the W3C standard Gdiff, Gdiff+gzip, and diff+gzip, where Gdiff was computed using Vcdiff delta instructions. The results of two experiments, "First" and "Successive," were presented: in "First," each file is compressed against the first file collected; in "Successive," each file is compressed against the one from the previous hour. diff+gzip did not work well because diff is line-oriented. Vcdiff performed favorably compared with the other formats.
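The "differencing + compression" equation can be demonstrated in miniature by compressing a new version of some data against the previous version. This sketch uses zlib's preset-dictionary feature as a stand-in for Vcdiff's encoding, so it shows the idea rather than the Vcdiff format:

```python
# Delta compression in miniature: compress a new version against the old
# one by using the old version as a zlib preset dictionary. This is not
# the Vcdiff format; it only demonstrates that differencing + compression
# beats plain compression on successive versions of similar data.
import hashlib
import zlib

def delta_compress(data: bytes, reference: bytes) -> bytes:
    c = zlib.compressobj(9, zlib.DEFLATED, zlib.MAX_WBITS,
                         zlib.DEF_MEM_LEVEL, zlib.Z_DEFAULT_STRATEGY,
                         zdict=reference)
    return c.compress(data) + c.flush()

def delta_decompress(delta: bytes, reference: bytes) -> bytes:
    d = zlib.decompressobj(zlib.MAX_WBITS, zdict=reference)
    return d.decompress(delta) + d.flush()

# Two "versions" sharing almost all content: hex noise with a small edit.
old = b"\n".join(hashlib.sha256(bytes([i])).hexdigest().encode()
                 for i in range(64))
new = b"edited-line" + old[11:]
```

The delta decompresses back to the new version exactly, but is far smaller than plain compression of the new version, because nearly everything can be encoded as references into the old one.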
Vcdiff is part of the Vcodex package. Vcodex is a platform for all common data transformations: delta compression, plain compression, encryption, and transcoding (e.g., uuencode, base64). It is structured in three layers for maximum usability; the base library uses Disciplines and Methods interfaces.
The code for Vcdiff can be found at http://www.research.att.com/sw/tools. They are moving Vcdiff toward IETF standardization as a comprehensive platform for transforming data; please refer to http://www.ietf.org/internet-drafts/draft-korn-vcdiff-06.txt.
WHERE IN THE NET
Summarized by Amit Purohit
A PRECISE AND EFFICIENT EVALUATION OF
THE PROXIMITY BETWEEN WEB CLIENTS AND
THEIR LOCAL DNS SERVERS
Zhuoqing Morley Mao, University of California, Berkeley; Charles D. Cranor, Fred Douglis, Michael Rabinovich, Oliver Spatscheck, and Jia Wang, AT&T Labs – Research
Content Distribution Networks (CDNs) attempt to improve Web performance by delivering Web content to end users from servers located at the edge of the network. When a Web client requests content, the CDN dynamically chooses a server to route the request to, usually the one that is nearest the client. As CDNs have access only to the IP address of the local DNS server (LDNS) of the client, the CDN's authoritative DNS server maps the client's LDNS to a geographic region within a particular network and combines that with network and server load information to perform CDN server selection.
This method has two limitations. First, it is based on the implicit assumption that clients are close to their LDNS, which may not always be valid. Second, a single request from an LDNS can represent a differing number of Web clients – called the hidden load factor. Knowledge of the hidden load factor can be used to achieve better load distribution.
The talk dealt with the extent of the first limitation and its impact on CDN server selection. To determine the associations between clients and their LDNS, a simple, non-intrusive, and efficient mapping technique was developed. The data collected was used to study the impact of proximity on DNS-based server selection using four different proximity metrics: (1) autonomous system (AS) clustering – observing whether a client is in the same AS as its LDNS – concluded that LDNS is very good for coarse-grained server selection, as 64% of the associations belong to the same AS; (2) network clustering – observing whether a client is in the same network-aware cluster (NAC) – implied that DNS is less useful for finer-grained server selection, since only 16% of the clients and LDNS are in the same NAC; (3) traceroute divergence – the length of the divergent paths to the client and its LDNS from a probe point using traceroute – implies that most clients are topologically close to their LDNS as viewed from a randomly chosen probe site; (4) round-trip time (RTT) correlation (some CDNs select servers based on the RTT between the CDN server and the client's LDNS) examines the correlation between the message RTTs from a probe point to the client and its local DNS; the results imply this is a reasonable metric to use to avoid really distant servers.
A study of the impact that client-LDNS associations have on DNS-based server selection concludes that knowing the client's IP address would allow more accurate server selection in a large number of cases. The optimality of the server selection also depends on the server density, placement, and selection algorithms.
Further information can be found at http://www.eecs.berkeley.edu/~zmao/myresearch.html or by contacting zmao@eecs.berkeley.edu.
GEOGRAPHIC PROPERTIES OF INTERNET
ROUTING
Lakshminarayanan Subramanian, Venkata N. Padmanabhan, Microsoft Research; Randy H. Katz, University of California at Berkeley
Geographic information can provide insights into the structure and functioning of the Internet, including interactions between different autonomous systems, by analyzing certain properties of Internet routing. It can be used to measure and quantify routing properties such as circuitous routing, hot-potato routing, and geographic fault tolerance.
Traceroute was used to gather the required data, and the Geotrack tool was used to determine the location of the nodes along each network path. This enables the computation of the "linearized distance," which is the sum of the geographic distances between successive pairs of routers along the path.
In order to measure the circuitousness of a path, a metric, the "distance ratio," was defined as the ratio of the linearized distance of a path to the geographic distance between the source and destination of the path. From the data it was observed that the circuitousness of a route depends on both the geographic and network location of the end hosts. A large value of the distance ratio enables us to flag paths that are highly circuitous, possibly (though not necessarily) because of routing anomalies. It is also shown that the minimum delay between end hosts is strongly correlated with the linearized distance of the path.
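The linearized distance and distance ratio are straightforward to compute once router coordinates are known; the sketch below uses great-circle distances and a hypothetical Berkeley–Dallas–Seattle path (the coordinates and detour are illustrative, not data from the paper).

```python
import math

def great_circle_km(a, b):
    """Great-circle distance between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371 * math.asin(math.sqrt(h))   # Earth radius ~6371 km

def distance_ratio(router_locations):
    """Linearized distance of the path divided by the source-destination
    geographic distance; values near 1.0 mean a geographically direct route."""
    hops = zip(router_locations, router_locations[1:])
    linearized = sum(great_circle_km(a, b) for a, b in hops)
    direct = great_circle_km(router_locations[0], router_locations[-1])
    return linearized / direct

# Hypothetical path from Berkeley to Seattle that detours via Dallas:
path = [(37.87, -122.27), (32.78, -96.80), (47.61, -122.33)]
print(round(distance_ratio(path), 2))   # well above 1: a circuitous route
```

A threshold on this ratio is what lets such an analysis flag suspiciously circuitous paths automatically.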
Geographic information can be used to study various aspects of wide-area Internet paths that traverse multiple ISPs. It was found that end-to-end Internet paths tend to be more circuitous than intra-ISP paths, the cause for this being the peering relationships between ISPs. Also, paths traversing substantial distances within two or more ISPs tend to be more circuitous than paths largely traversing only a single ISP. Another finding is that ISPs generally employ hot-potato routing.
Geographic information can also be used to capture the fact that two seemingly unrelated routers can be susceptible to correlated failures. By using the geographic information of routers, we can construct a geographic topology of an ISP and use it to find the tolerance of an ISP's network to the total network failure in a geographic region.
For further information, contact lakme@cs.berkeley.edu.
PROVIDING PROCESS ORIGIN INFORMATION
TO AID IN NETWORK TRACEBACK
Florian P. Buchholz, Purdue University; Clay Shields, Georgetown University
Network traceback is currently limited because host audit systems do not maintain enough information to match incoming network traffic to outgoing network traffic. The talk presented an alternative: assigning origin information to every process and logging it during interactive login creation.
The current implementation concentrates mainly on interactive sessions, in which an event is logged when a new connection is established using SSH or Telnet. The method is effective and could successfully determine stepping stones and reliably detect the source of a DDoS attack. The speaker then talked about the related work done in this area, which led to a discussion of the latest development in the area of "host causality."

The speaker ended with the limitations and future directions of the framework. The framework could be extended and could find many applications in the area of security. The current implementation doesn't take care of startup scripts and cron jobs, but incorporating the origin information in the file system could solve this problem. In the current implementation, logging is implemented only as a proof of concept; it could be made safe in many ways, and this could be another important aspect of future work.
PROGRAMMING
Summarized by Amit Purohit
CYCLONE: A SAFE DIALECT OF C
Trevor Jim, AT&T Labs – Research; Greg Morrisett, Dan Grossman, Michael Hicks, James Cheney, Yanling Wang, Cornell University
Cyclone is designed to provide protection against attacks such as buffer overflows, format string attacks, and memory management errors. The current structure of C allows programmers to write vulnerable programs. Cyclone extends C so that it has the safety guarantees of Java while keeping C's syntax, types, and semantics untouched.
The Cyclone compiler performs static analysis of the source code and inserts runtime checks into the compiled output at places where the analysis cannot determine that code execution will not violate safety constraints. Cyclone imposes many restrictions to preserve safety, such as NULL checks; these checks do not exist in normal C.
The speaker then talked about some sample code written in Cyclone and how it tackles safety problems. Porting an existing C application to Cyclone is pretty easy, with fewer than 10% of lines requiring change. The current implementation concentrates more on safety than on performance – hence the performance penalty is substantial. Cyclone was able to find many lingering bugs. The final step of the project will be to write a compiler to convert a normal C program to an equivalent Cyclone program.
COOPERATIVE TASK MANAGEMENT WITHOUT
MANUAL STACK MANAGEMENT
Atul Adya, Jon Howell, Marvin Theimer, William J. Bolosky, John R. Douceur, Microsoft Research
The speaker described the definitions and motivations behind the distinct concepts of task management, stack management, I/O management, conflict management, and data partitioning. Conventional concurrent programming uses preemptive task management and exploits the automatic stack management of a standard language. In the second approach, cooperative tasks are organized as event handlers that yield control by returning control to the event scheduler, manually unrolling their stacks. In this project they have adopted a hybrid approach that makes it possible for both stack management styles to coexist: programmers can code assuming one or the other of these stack management styles will operate, depending upon the application. The speaker also gave a detailed example of how to use their mechanism to switch between the two styles.
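The contrast between the two styles can be sketched with Python generators standing in for cooperative tasks: local state stays in each task's own frame (automatic stack management), while a tiny event scheduler decides who runs next (cooperative task management). This illustrates the general idea only, not the authors' mechanism.

```python
from collections import deque

def cooperative_task(name, steps, results):
    """Automatic-stack style: local state (total, i) lives in the
    generator's frame, and each 'yield' hands control back to the
    scheduler -- no manual unrolling of state into an event object."""
    total = 0
    for i in range(steps):
        total += i
        yield                 # cooperative yield point
    results[name] = total

def scheduler(tasks):
    """Round-robin event scheduler: resume each task until its next yield."""
    ready = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            next(task)        # run the task to its next yield point
            ready.append(task)
        except StopIteration:
            pass              # task finished; drop it from the ready queue

results = {}
scheduler([cooperative_task("a", 3, results),
           cooperative_task("b", 5, results)])
print(results)                # both tasks finish, interleaved, on one thread
```

The event-handler style would instead store `total` and `i` in an explicit state object between callbacks; the hybrid approach in the talk lets both styles coexist under one scheduler.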
The talk then continued with the implementation. They were able to preempt many subtle concurrency problems by using cooperative task management. Paying a cost up front to reduce a subtle race condition proved to be a good investment.
Though the choice of task management is fundamental, the choice of stack management can be left to individual taste. This project enables the use of any type of stack management in conjunction with cooperative task management.
IMPROVING WAIT-FREE ALGORITHMS FOR
INTERPROCESS COMMUNICATION IN
EMBEDDED REAL-TIME SYSTEMS
Hai Huang, Padmanabhan Pillai, Kang G. Shin, University of Michigan
The main characteristic of a real-time, time-sensitive system is its predictable response time. But concurrency management creates hurdles to achieving this goal because of the use of locks to maintain consistency. To solve this problem, many wait-free algorithms have been developed, but these are typically high-cost solutions. By taking advantage of the temporal characteristics of the system, however, the time and space overhead can be reduced.
The speaker presented an algorithm for temporal concurrency control and applied this technique to improve three wait-free algorithms. A single-writer multiple-reader wait-free algorithm and a double-buffer algorithm were proposed.
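The double-buffer idea behind such single-writer/multiple-reader schemes can be sketched as follows: the writer always fills the buffer readers are not looking at, then publishes it with a single index flip, so no one ever blocks on a lock. This is illustrative only; genuine wait-freedom depends on atomic stores and memory ordering that Python hides, and the class below is not the authors' algorithm.

```python
class DoubleBuffer:
    """Single-writer/multiple-reader double-buffer sketch.

    The writer fills the inactive buffer (which may take arbitrarily
    long) and then flips `active` in one step, so a reader never sees
    a half-written record and neither side waits on a lock."""

    def __init__(self, initial):
        self.buffers = [initial, None]
        self.active = 0                       # index readers should use

    def write(self, record):
        inactive = 1 - self.active
        self.buffers[inactive] = record       # prepare out of readers' view
        self.active = inactive                # publish with one flip

    def read(self):
        return self.buffers[self.active]      # no lock acquired

buf = DoubleBuffer({"t": 0})
buf.write({"t": 1})
print(buf.read())
buf.write({"t": 2})
print(buf.read())
```

The temporal-control insight from the talk is that if readers and writers are real-time tasks with known periods, even the buffer count and copy work can be trimmed further.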
Using their transformation mechanism, they achieved an improvement of 17–66% in ACET and a 14–70% reduction in memory requirements for IPC algorithms. The mechanism is extensible and can be applied to other non-blocking IPC algorithms as well. Future work involves reducing the synchronization overheads in more general IPC algorithms with multiple-writer semantics.
MOBILITY
Summarized by Praveen Yalagandula
ROBUST POSITIONING ALGORITHMS FOR
DISTRIBUTED AD-HOC WIRELESS SENSOR
NETWORKS
Chris Savarese, Jan Rabaey, University of California, Berkeley; Koen Langendoen, Delft University of Technology
The Pico Radio Network comprises more than a hundred sensors, monitors, and actuators equipped with wireless connectivity, with the following properties: (1) no infrastructure; (2) computation in a distributed fashion instead of centralized computation; (3) dynamic topology; (4) limited radio range; (5) nodes that can act as repeaters; (6) devices of low cost and with low power; and (7) sparse anchor nodes – nodes with GPS capability. A positioning algorithm determines the geographical position of each node in the network.
In two-dimensional space, each node needs three reference positions to estimate its geographical position. Two problems make positioning difficult in the Pico Radio Network setting: (1) a sparse anchor node problem and (2) a range error problem.
The two-phase approach that the authors have taken in solving the problem consists of (1) a hop-terrain algorithm – in this first phase, each node roughly guesses its location from distances calculated using multi-hops to the anchor points – and (2) a refinement algorithm – in this second phase, each node uses its neighbors' positions to refine its own position estimate. To guarantee convergence, this approach uses confidence metrics, assigning a value of 1.0 for anchor nodes and 0.1 to start for other nodes, and increasing these with each iteration.
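The refinement phase can be sketched as an iteration that pulls each estimate toward the points implied by neighbors' range measurements, weighted by confidence. The update rule and the toy anchor layout below are my own illustration of the idea, not the authors' algorithm.

```python
import math

def refine(position, neighbors):
    """One refinement iteration: average, over all neighbors, the point
    at the measured range from that neighbor in our direction, weighting
    high-confidence neighbors more heavily."""
    x, y = position
    px = py = wsum = 0.0
    for (nx, ny), rng, conf in neighbors:
        dx, dy = x - nx, y - ny
        d = math.hypot(dx, dy) or 1e-9          # avoid division by zero
        tx, ty = nx + dx / d * rng, ny + dy / d * rng  # range-consistent point
        px += conf * tx
        py += conf * ty
        wsum += conf
    return (px / wsum, py / wsum)

# Hypothetical node whose true position is (1, 1), with three anchor
# neighbors (confidence 1.0) each reporting a range of sqrt(2):
anchors = [((0.0, 0.0), math.sqrt(2), 1.0),
           ((2.0, 0.0), math.sqrt(2), 1.0),
           ((0.0, 2.0), math.sqrt(2), 1.0)]
est = (0.3, 0.2)              # rough hop-terrain-style initial guess
for _ in range(50):
    est = refine(est, anchors)
print(est)                    # should approach the true position (1, 1)
```

In the real algorithm non-anchor neighbors also contribute, with their growing confidence values keeping early bad guesses from dominating.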
A simulation tool, OMNeT++, was used for both phases and in various scenarios. The results show that they achieved position errors of less than 33% in a scenario with 5% anchor nodes, an average connectivity of 7, and 5% range measurement error.
Guidelines for anchor node deployment are high connectivity (>10), a reasonable fraction of anchor nodes (>5%), and, for anchor placement, covered edges.
APPLICATION-SPECIFIC NETWORK
MANAGEMENT FOR ENERGY-AWARE
STREAMING OF POPULAR MULTIMEDIA
FORMATS
Surendar Chandra, University of Georgia, and Amin Vahdat, Duke University
The main hindrance in supporting the increasing demand for mobile multimedia on PDAs is the battery capacity of the small devices. The idle-time power consumption is almost the same as the receive power consumption on typical wireless network cards (1319mW vs. 1425mW), while the sleep state consumes far less power (177mW). To let the system consume energy proportional to the stream quality, the network card should transition to the sleep state aggressively between packets.
The speaker presented previous work in which different multimedia streams were studied for PDAs: MS Media, Real Media, and QuickTime. It was found that if the inter-packet gap is predictable, then huge savings are possible – for example, about 80% savings in MS Media streams with few packet losses. The limitation of the IEEE 802.11b power-save mode comes into play when there are two or more nodes, producing a delay between beacon time and when the node receives/sends packets. This badly affects the streams, since higher energy is consumed while waiting.
The authors propose traffic shaping for energy conservation, done by proxy such that packets arrive at regular intervals. This is achieved using a local proxy in the access point and a client-side proxy in the mobile host. Simulations show that traffic shaping reduces energy consumption; they also reveal that higher-bandwidth streams have a lower energy metric (mJ/kB).
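A back-of-envelope calculation with the power figures quoted above shows why sleeping between predictably spaced packets matters so much. The 10% receive duty cycle below is an assumed figure for illustration, not a number from the talk.

```python
# Card power states from the summary above (milliwatts):
IDLE_MW, RECV_MW, SLEEP_MW = 1319.0, 1425.0, 177.0

def stream_power_mw(receive_fraction, sleep_between_packets):
    """Average card power for a stream that actively receives for
    `receive_fraction` of the time and otherwise either idles (no
    power management) or sleeps (aggressive traffic-shaped schedule)."""
    gap = 1.0 - receive_fraction
    other = SLEEP_MW if sleep_between_packets else IDLE_MW
    return receive_fraction * RECV_MW + gap * other

# Hypothetical stream that is receiving 10% of the time:
always_on = stream_power_mw(0.10, sleep_between_packets=False)
shaped = stream_power_mw(0.10, sleep_between_packets=True)
savings = 1 - shaped / always_on
print(f"{always_on:.0f} mW -> {shaped:.0f} mW ({savings:.0%} saved)")
```

Because idle power is nearly as high as receive power, almost all of the gap energy is recoverable, which is consistent with the roughly 80% savings reported for predictable MS Media streams.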
More information is at http://greenhouse.cs.uga.edu.
CHARACTERIZING ALERT AND BROWSE
SERVICES OF MOBILE CLIENTS
Atul Adya, Paramvir Bahl, and Lili Qiu, Microsoft Research
Even though there is a dramatic increase in Internet access from wireless devices, there are not many studies characterizing this traffic. In this paper the authors characterize the traffic observed on the MSN Web site with both notification and browse traces.
Around 33 million browsing accesses and about 325 million notification entries are present in the traces. Three types of analyses were done for each of these two services: content analysis, concerning the most popular content categories and their distribution; popularity analysis, or the popularity distribution of documents; and user behavior analysis.
The analysis of the notification logs shows that document access rates follow a Zipf-like distribution, with most of the accesses concentrated on a small number of messages, and that the accesses exhibit geographical locality – users from the same locality tend to receive similar notification content. The browser log analysis shows that a smaller set of URLs is accessed most of the time, though the access pattern does not fit any Zipf curve, and the highly accessed URLs remain stable. A correlation study between the notification and browsing services shows that wireless users have a moderate correlation of 0.12.
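Whether an access pattern is Zipf-like can be checked by regressing log-frequency on log-rank: a Zipf distribution shows up as a roughly straight line whose slope is the (negated) Zipf exponent. The synthetic counts below are illustrative only, not the MSN traces.

```python
import math

def zipf_exponent(counts):
    """Least-squares slope of log(frequency) vs. log(rank); a slope
    near -1 indicates a classic Zipf curve."""
    counts = sorted(counts, reverse=True)
    xs = [math.log(r) for r in range(1, len(counts) + 1)]
    ys = [math.log(c) for c in counts]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

# Synthetic access counts drawn from an exact Zipf curve (exponent 1):
counts = [round(10000 / r) for r in range(1, 501)]
print(round(zipf_exponent(counts), 2))   # close to -1
```

A browse log like the one described, which does not fit a Zipf curve, would produce a visibly bent log-log plot and an unstable slope rather than a consistent exponent.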
FREENIX TRACK SESSIONS
BUILDING APPLICATIONS
Summarized by Matt Butner
INTERACTIVE 3-D GRAPHICS APPLICATIONS
FOR TCL
Oliver Kersting, Jürgen Döllner, Hasso Plattner Institute for Software Systems Engineering, University of Potsdam
The integration of 3-D image rendering functionality into a scripting language permits interactive and animated 3-D development and application without the formalities and precision demanded by low-level C/C++ graphics and visualization libraries. The large and complex C++ API of the Virtual Rendering System (VRS) can be combined with the conveniences of the Tcl scripting language. The mapping of class interfaces is done via an automated process and generates respective wrapper classes, all of which ensures complete API accessibility and functionality without any significant performance penalty. The general C++-to-Tcl mapper SWIG grants the necessary object and hierarchical abilities without the use of object-oriented Tcl extensions. The final API mapping techniques address C++ features such as classes, overloaded methods, enumerations, and inheritance relations. All are implemented in a proof of concept that maps the complete C++ API of the VRS to Tcl and are showcased in a complete interactive 3-D map system.
THE AGFL GRAMMAR WORK LAB
Cornelis H.A. Coster, Erik Verbruggen, University of Nijmegen (KUN)
The growth and implementation of Natural Language Processing (NLP) is the cornerstone of the continued evolution and implementation of truly intelligent search machines and services. In part due to the growing collections of computer-stored, human-readable documents in the public domain, the implementation of linguistic analysis will become necessary for desirable precision and resolution. Subsequently, such tools and linguistic resources must be released into the public domain, and they have done so with the release of the AGFL Grammar Work Lab under the GNU Public License as a tool for linguistic research and the development of NLP-based applications.
The AGFL (Affix Grammars over a Finite Lattice) Grammar Work Lab meshes context-free grammars with finite set-valued features that are applicable to a range of languages. In computer science terms, "syntax rules are procedures with parameters and a non-deterministic execution." The English Phrases for Information Retrieval (EP4IR), released with the AGFL-GWL as a robust grammar of English, is an AGFL-GWL-generated English parser that outputs "Head/Modified" frames. The sentences "CompanyX sponsored this conference" and "This conference was sponsored by CompanyX" both generate [CompanyX [sponsored conference]]. The hope is that public availability of such tools will encourage further development of grammar and lexical software systems.
SWILL: A SIMPLE EMBEDDED WEB SERVER
LIBRARY
Sotiria Lampoudi, David M. Beazley, University of Chicago
This paper won the FREENIX Best Student Paper award.
SWILL (Simple Web Interface Link Library) is a simple Web server in the form of a C library whose development was motivated by a wish to give cool applications an interface to the Web. The SWILL library provides a simple interface that can be efficiently implemented for tasks that vary from flexible Web-based monitoring to software debugging and diagnostics. Though originally designed to be integrated with high-performance scientific simulation software, the interface is generic enough to allow for unbounded uses. SWILL is a single-threaded server relying upon non-blocking I/O through the creation of a temporary server which services I/O requests. SWILL does not provide SSL or cryptographic authentication, but it does have HTTP authentication abilities.
A fantastic feature of SWILL is its support for SPMD-style parallel applications which utilize MPI, proving valuable for Beowulf clusters and large parallel machines. Another practical application was the implementation of SWILL in a modified Yalnix emulator by University of Chicago operating systems courses, which utilized the added Yalnix functionality for OS development and debugging. SWILL requires minimal memory overhead and relies upon the HTTP/1.0 protocol.
NETWORK PERFORMANCE
Summarized by Florian Buchholz
LINUX NFS CLIENT WRITE PERFORMANCE
Chuck Lever, Network Appliance; Peter Honeyman, CITI, University of Michigan
Lever introduced a benchmark to measure NFS client write performance. Client performance is difficult to measure due to hindrances such as poor hardware or bandwidth limitations. Furthermore, measuring application performance does not identify weaknesses specifically at the client side. Thus a benchmark was developed that tries to exercise only data transfers in one direction, between server and application. For this purpose the benchmark was based on the block-sequential-write portion of the Bonnie file system benchmark. Once a benchmark for NFS clients is established, it can be used to improve client performance.
The performance measurements were performed with an SMP Linux client and both a Linux NFS server and a Network Appliance F85 filer. During testing, a periodic jump in write latency was discovered. This was due to a rather large number of pending write operations that were scheduled to be written after certain threshold values were exceeded. By introducing a separate daemon that flushes the cached write requests, the spikes could be eliminated, but as a result the average latency grew over time. The problem was traced to a function that scans a linked list of write requests. After a hashtable was added to improve lookup performance, the latency improved considerably.
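The data-structure fix described above — replacing a linear scan of pending write requests with a hashed lookup — can be sketched as follows. The class and field names here are invented for illustration and are not the Linux NFS client's.

```python
import time

class RequestListIndex:
    """O(n) lookup: scan every pending request (the original behavior)."""
    def __init__(self):
        self.requests = []
    def add(self, offset, req):
        self.requests.append((offset, req))
    def find(self, offset):
        for off, req in self.requests:
            if off == offset:
                return req
        return None

class RequestHashIndex:
    """O(1) expected lookup: hash pending requests by file offset."""
    def __init__(self):
        self.requests = {}
    def add(self, offset, req):
        self.requests[offset] = req
    def find(self, offset):
        return self.requests.get(offset)

def time_lookups(index, n=2000):
    """Queue n requests, then look each one up, timing only the lookups."""
    for off in range(n):
        index.add(off, "req-%d" % off)
    start = time.perf_counter()
    for off in range(n):
        index.find(off)
    return time.perf_counter() - start

slow = time_lookups(RequestListIndex())
fast = time_lookups(RequestHashIndex())
print(slow > fast)   # the hashed index wins as pending writes pile up
```

The latency spikes in the talk came from exactly this effect: the list scan is cheap when few writes are pending but grows quadratically in aggregate as the queue fills.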
The improved client was then used to measure throughput against the two servers. A discrepancy between the Linux server and the filer test was noticed, and the reason for it was traced back to a global kernel lock that was unnecessarily held when accessing the network stack. After correcting this, performance improved further. However, the measurements showed that a client may run slower when paired with fast servers on fast networks. This is due to heavy client interrupt loads, more network processing on the client side, and more global kernel lock contention.
The source code of the project is available at http://www.citi.umich.edu/projects/nfs-perf/patches.
A STUDY OF THE RELATIVE COSTS OF
NETWORK SECURITY PROTOCOLS
Stefan Miltchev and Sotiris Ioannidis, University of Pennsylvania; Angelos Keromytis, Columbia University
With the increasing need for security and integrity of remote network services, it becomes important to quantify the communication overhead of IPSec and compare it to alternatives such as SSH, SCP, and HTTPS.
For this purpose, the authors set up three testing networks: a direct link, two hosts separated by two gateways, and three hosts connecting through one gateway. Protocols were compared in each setup, and manual keying was used to eliminate connection setup costs. For IPSec, the different encryption algorithms AES, DES, 3DES, hardware DES, and hardware 3DES were used. In detail, FTP was compared to SFTP, SCP, and FTP over IPSec; HTTP to HTTPS and HTTP over IPSec; and NFS to NFS over IPSec and local disk performance.
The result of the measurements was that IPSec outperforms other popular encryption schemes. Overall, unencrypted communication was fastest, but in some cases, such as FTP, the overhead can be small. The use of crypto hardware can significantly improve performance. For future work, the inclusion of setup costs, hardware-accelerated SSL, SFTP, and SSH was mentioned.
CONGESTION CONTROL IN LINUX TCP
Pasi Sarolahti, University of Helsinki; Alexey Kuznetsov, Institute for Nuclear Research at Moscow
Having attended the talk and read the paper, I am still unclear about whether the authors are merely describing the design decisions of TCP congestion control or whether they are actually the creators of that part of the Linux code. My guess leans toward the former.
In the talk, the speaker compared TCP congestion control measures according to IETF and RFC specifications with the actual Linux implementation, which conforms to the basic principles but nevertheless has differences. A specific emphasis was placed on retransmission mechanisms and the congestion window. Also, several TCP enhancements – the NewReno algorithm, Selective ACKs (SACK), and Forward ACKs (FACK) – were discussed and compared.
In some instances, Linux does not conform to the IETF specifications. The fast recovery does not fully follow RFC 2582, since the threshold for triggering retransmit is adjusted dynamically and the congestion window's size is not changed. Also, the round-trip-time estimator and the RTO calculation differ from RFC 2988, since Linux uses more conservative RTT estimates and a minimum RTO of 200ms. The performance measures showed that with the additional Linux-specific features enabled, slightly higher throughput, a steadier data flow, and fewer unnecessary retransmissions can be achieved.
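The RTO computation being compared can be sketched from RFC 2988's estimator (alpha = 1/8, beta = 1/4, RTO = SRTT + 4 × RTTVAR) with a configurable floor; RFC 2988 clamps the result to at least 1 second, while the 200ms floor below models the Linux behavior described in the talk. This is a sketch of the standard formulas, not the Linux kernel code.

```python
class RtoEstimator:
    """RFC 2988-style RTT estimation with a configurable minimum RTO."""

    def __init__(self, min_rto=0.2):
        self.srtt = None       # smoothed RTT (seconds)
        self.rttvar = None     # RTT variance estimate (seconds)
        self.min_rto = min_rto

    def sample(self, rtt):
        """Fold in one RTT measurement and return the resulting RTO."""
        if self.srtt is None:                  # first measurement (RFC 2988)
            self.srtt = rtt
            self.rttvar = rtt / 2
        else:                                  # beta = 1/4, alpha = 1/8
            self.rttvar = 0.75 * self.rttvar + 0.25 * abs(self.srtt - rtt)
            self.srtt = 0.875 * self.srtt + 0.125 * rtt
        return max(self.min_rto, self.srtt + 4 * self.rttvar)

est = RtoEstimator(min_rto=0.2)                # Linux-style 200ms floor
for rtt in [0.010, 0.012, 0.011, 0.013]:       # a quiet 10ms LAN path
    rto = est.sample(rtt)
print(rto)   # on low-RTT paths the floor dominates the computed value
```

The choice of floor matters: with RFC 2988's 1-second minimum, a lost packet on a LAN waits far longer than with the 200ms floor, which is one of the conservative-vs-responsive trade-offs the talk highlighted.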
XTREME XCITEMENT
Summarized by Steve Bauer
THE FUTURE IS COMING WHERE THE X
WINDOW SHOULD GO
Jim Gettys, Compaq Computer Corp.
Jim Gettys, one of the principal authors of the X Window System, outlined the near-term objectives for the system, primarily focusing on the changes and infrastructure required to enable replication and migration of X applications. Providing better support for this functionality would enable users to retrieve or duplicate X applications between their servers at home and work.
One interesting example of an application that currently is capable of migration and replication is Emacs. To create a new frame on DISPLAY, try "M-x make-frame-on-display <Return> DISPLAY <Return>".
However, technical challenges make replication and migration difficult in general. These include the "major headaches" of server-side fonts, the nonuniformity of X servers and screen sizes, and the need to appropriately retrofit toolkits. Authentication and authorization issues are obviously also important. The rest of the talk delved into some of the details of these interesting technical challenges.
HACKING IN THE KERNEL
Summarized by Hai Huang
AN IMPLEMENTATION OF SCHEDULER
ACTIVATIONS ON THE NETBSD OPERATING
SYSTEM
Nathan J. Williams, Wasabi Systems
Scheduler activation is an old idea. Basically, there are benefits and drawbacks to using solely kernel-level or user-level threading; scheduler activation is able to combine the two layers of control to provide more concurrency in the system.
In his talk, Nathan gave a fairly detailed description of the implementation of scheduler activations in the NetBSD kernel. One important change in the implementation is to differentiate the thread context from the process context. This is done by defining a separate data structure for these threads and relocating some of the information that was embedded in the process context to these thread contexts. The stack was especially a concern, due to the upcall; special handling must be done to make sure that the upcall doesn't mess up the stack, so that the preempted user-level thread can continue afterwards. Lastly, Nathan explained that signals are handled by upcalls.
AUTHORIZATION AND CHARGING IN PUBLIC
WLANS USING FREEBSD AND 802.1X
Pekka Nikander, Ericsson Research NomadicLab
The 802.1x standards are well known in the wireless community as link-layer authentication protocols. In this talk, Pekka explained some novel ways of using the 802.1x protocols that might be of interest to people on the move. It is possible to set up a public WLAN that would support various charging schemes via virtual tokens, which people can purchase or earn and later use.
This is implemented on FreeBSD using the netgraph utility. It is basically a filter in the link layer that differentiates traffic based on the MAC address of the client node, which is either authenticated, denied, or let through. The overhead for this service is fairly minimal.
ACPI IMPLEMENTATION ON FREEBSD
Takanori Watanabe, Kobe University
ACPI (Advanced Configuration and Power Interface) was proposed as a joint effort by Intel, Toshiba, and Microsoft to provide a standard, finer-grained method of managing the power states of individual devices within a system. Such low-level power management is especially important for mobile and embedded systems that are powered by fixed-capacity batteries.
Takanori explained that the ACPI specification is composed of three parts: tables, BIOS, and registers. He was able to implement some functionality of ACPI in a FreeBSD kernel. The ACPI Component Architecture was implemented by Intel, and it provides a high-level ACPI API to the operating system. Takanori's ACPI implementation is built upon this underlying layer of APIs.
ACCESS CONTROL
Summarized by Florian Buchholz
DESIGN AND PERFORMANCE OF THE
OPENBSD STATEFUL PACKET FILTER (PF)
Daniel Hartmeier, Systor AG
Daniel Hartmeier described the new stateful packet filter (pf) that replaced IPFilter in the OpenBSD 3.0 release. IPFilter could no longer be included due to licensing issues, and thus there was a need to write a new filter making use of optimized data structures.
The filter rules are implemented as a linked list which is traversed from start to end. Two actions may be taken according to the rules, "pass" or "block": a "pass" action forwards the packet, and a "block" action drops it. Where more than one rule matches a packet, the last rule wins; rules that are marked as "final" will immediately terminate any further rule evaluation. An optimization called "skip-steps" was also implemented, in which blocks of similar rules are skipped if they cannot match the current packet; these skip-steps are calculated when the rule set is loaded. Furthermore, a state table keeps track of TCP connections, and only packets that match the sequence numbers are allowed. UDP and ICMP queries and replies are considered in the state table, where initial packets create an entry for a pseudo-connection with a low initial timeout value. The state table is implemented using a balanced binary search tree. NAT mappings are also stored in the state table, whereas application proxies reside in user space. The packet filter is also able to perform fragment reassembly and to modulate TCP sequence numbers to protect hosts behind the firewall.
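The last-matching-rule-wins evaluation with early exit can be sketched as follows (pf actually spells the terminating keyword `quick`, and real rules match on far richer criteria; the rule layout and field names here are invented for illustration).

```python
def evaluate(rules, packet):
    """Last-matching-rule-wins evaluation with early exit on 'final'
    rules -- a sketch of the scheme, not pf's implementation."""
    decision = "pass"                         # default policy
    for rule in rules:
        if all(packet.get(k) == v for k, v in rule["match"].items()):
            decision = rule["action"]         # later matches override
            if rule.get("final"):             # stop evaluating further rules
                break
    return decision

rules = [
    {"match": {}, "action": "block"},                        # block everything
    {"match": {"proto": "tcp", "dport": 80}, "action": "pass"},  # ...except HTTP
    {"match": {"src": "10.0.0.66"}, "action": "block", "final": True},
]

print(evaluate(rules, {"proto": "tcp", "dport": 80}))   # pass
print(evaluate(rules, {"proto": "udp", "dport": 53}))   # block
print(evaluate(rules, {"src": "10.0.0.66", "proto": "tcp", "dport": 80}))  # block
```

Skip-steps speed up exactly this loop: when consecutive rules share a field that cannot match the packet, the evaluator can jump over the whole block instead of testing each rule.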
Pf was compared against IPFilter as well as Linux's iptables by measuring throughput and latency with increasing traffic rates and different packet sizes. In a test with a fixed rule set size of 100, iptables outperformed the other two filters, whose results were close to each other. In a second test, where the rule set size was continually increased, iptables consistently had about twice the throughput of the other two (which evaluate the rule set on both the incoming and outgoing interfaces). A third test compared only pf and IPFilter, using a single rule that created state in the state table with a fixed-size state entry; pf reached an overloaded state much later than IPFilter. The experiment was repeated with a variable-size state entry, and pf performed much better than IPFilter for a small number of states.
In general, rule set evaluation is expensive, and benchmarks reflect only extreme cases, whereas in real life other behavior should be observed. Furthermore, the benchmarks show that stateful filtering can actually improve performance, due to cheap state-table lookups as compared to rule evaluation.
ENHANCING NFS CROSS-ADMINISTRATIVE
DOMAIN ACCESS
Joseph Spadavecchia and Erez Zadok, Stony Brook University
The speaker presented modifications to an NFS server that allow improved NFS access between administrative domains. A problem lies in the fact that NFS assumes a shared UID/GID space, which makes it unsafe to export files outside the administrative domain of the server. Design goals were to leave the protocol and existing clients unchanged, a minimum amount of server changes, flexibility, and increased performance.
To solve the problem, two techniques are utilized: "range-mapping" and "file-cloaking." Range-mapping maps IDs between client and server. The mapping is performed on a per-export basis and has to be manually set up in an export file. The mappings can be 1-1, N-N, or N-1. In file-cloaking, the server restricts file access based on UID/GID and special cloaking-mask bits; users can only access their own file permissions. The policy on whether or not a file is visible to others is set by the cloaking mask, which is logically ANDed with the file's protection bits. File-cloaking only works, however, if the client doesn't hold cached copies of directory contents and file attributes. Because of this, the clients are forced to re-read directories by incrementing the mtime value of the directory each time it is listed.
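The two techniques can be sketched as small functions: a per-export UID range map and a cloaking-mask visibility test. The fallback to `nobody` (65534) for unmapped IDs and the exact mask semantics are my guesses from the summary, not the paper's precise rules.

```python
def make_range_map(client_base, server_base, count):
    """Per-export 1-1 range-mapping: a contiguous range of client UIDs
    maps onto server UIDs; IDs outside the range fall back to 'nobody'
    (an assumed policy, chosen here for the sketch)."""
    def map_uid(client_uid):
        if client_base <= client_uid < client_base + count:
            return server_base + (client_uid - client_base)
        return 65534                          # nobody
    return map_uid

def visible(file_mode, cloak_mask):
    """File-cloaking sketch: the cloaking mask is ANDed with the file's
    permission bits; a file with no surviving bits is hidden from others."""
    return (file_mode & cloak_mask) != 0

uid_map = make_range_map(client_base=1000, server_base=5000, count=100)
print(uid_map(1042))              # a UID inside the mapped range
print(uid_map(0))                 # root falls outside and is squashed
print(visible(0o640, 0o044))      # group-readable bit survives the mask
print(visible(0o600, 0o044))      # owner-only file is cloaked from others
```

An N-1 mapping would simply return one fixed server UID for the whole client range, which is how many clients can be squashed onto a single guest account.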
To test the performance of the modified server, five different NFS configurations were evaluated: an unmodified NFS server was compared against one with the modified code included but not used, one with only range-mapping enabled, one with only file-cloaking enabled, and one with all modifications enabled. For each setup, different file system benchmarks were run. The results show only a small overhead when the modifications are used, generally an increase of below 5%. Another experiment tested the performance of the system with an increasing number of mapped or cloaked entries. The results show that an increase from 10 to 1,000 entries resulted in a maximum performance cost of about 14%.
One member of the audience pointed out that if clients choose to ignore the changed mtimes from the server, and thus still hold caches of the directory entries, the file-cloaking mechanism could be defeated. After a rather lengthy debate, the speaker had to concede that the model doesn't add any extra security. Another question was asked about the scalability of the setup of range-mapping; the speaker referred to application-level tools that could be developed for that purpose.
The software is available at ftp://ftp.fsl.cs.sunysb.edu/pub/enf.
ENGINEERING OPEN SOURCE
SOFTWARE
Summarized by Teri Lampoudi
NINGAUI: A LINUX CLUSTER FOR BUSINESS
Andrew Hume, AT&T Labs – Research; Scott Daniels, Electronic Data Systems Corp.
Ningaui is a general-purpose, highly available, resilient architecture built from commodity software and hardware. Emphatically, however, Ningaui is not a Beowulf. Hume calls the cluster design the "Swiss canton model," in which there are a number of loosely affiliated, independent nodes with data replicated among them. Jobs are assigned by bidding and leases, and cluster services done as session-based servers are sited via generic job assignment. The emphasis is on keeping the architecture end-to-end, checking all work via checksums, and logging everything. The resilience and high availability required by their goal of 8-to-5 maintenance (vs. the typical 24/7 model, where people get paged whenever the slightest thing goes wrong, regardless of the time of day) is achieved by job restartability. Finally, all computation is performed on local data, without the use of NFS or network-attached storage.
Hume's message is a hopeful one: despite the many problems encountered (things like kernel and service limits, auto-installation problems, TCP storms, and the like) the existence of source code and the paranoid practice of logging and checksumming everything has helped. The final product performs reasonably well, and it appears that the resilient techniques employed do make a difference. One drawback, however, is that the software mentioned in the paper is not yet available for download.
CPCMS: A CONFIGURATION MANAGEMENT
SYSTEM BASED ON CRYPTOGRAPHIC NAMES
Jonathan S. Shapiro and John Vanderburgh, Johns Hopkins University
This paper won the FREENIX Best Paper award.
The basic notion behind the project is the fact that everyone has a pet complaint about CVS, and yet it is currently the configuration manager in most widespread use. Shapiro has unleashed an alternative. Interestingly, he did not begin, but ended, with the usual host of reasons why CVS is bad. The talk instead began abruptly by characterizing the job
of a software configuration manager, continued by stating the namespaces which it must handle, and wrapped up with the challenges faced.
X MEETS Z: VERIFYING CORRECTNESS IN THE
PRESENCE OF POSIX THREADS
Bart Massey, Portland State University; Robert T. Bauer, Rational Software Corp.
Massey delivered a humorous talk on the insight gained from applying Z formal specification notation to system software design, rather than the more informal analysis and design process normally used.
The story is told with respect to writing XCB, which replaces the Xlib protocol layer and is supposed to be thread-friendly. But where threads are concerned, deadlock avoidance becomes a hard problem that cannot be solved in an ad hoc manner. But full model checking is also too hard. In this case, Massey resorted to Z specification to model the XCB lower layer, abstract away locking and data transfers, and locate fundamental issues. Essentially, the difficulties of searching the literature and locating information relevant to the problem at hand were overcome. As Massey put it, "the formal method saved the day."
FILE SYSTEMS
Summarized by Bosko Milekic
PLANNED EXTENSIONS TO THE LINUX
EXT2/EXT3 FILESYSTEM
Theodore Y. Ts'o, IBM; Stephen Tweedie, Red Hat
The speaker presented improvements to the Linux Ext2 file system, with the goal of allowing for various expansions while striving to maintain compatibility with older code. Improvements have been facilitated by a few extra superblock fields that were added to Ext2 not long before the Linux 2.0 kernel was released. The fields allow for file system features to be added without compromising the existing setup; this is done by providing
bits indicating the impact of the added features with respect to compatibility. Namely, a file system with the "incompat" bit marked is not allowed to be mounted. Similarly, a "read-only" marking would only allow the file system to be mounted read-only.
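The compatibility-bit policy described above amounts to a small decision rule. The sketch below illustrates it with invented bit values; the real feature flags live in the Ext2 superblock and differ from these placeholders.

```python
# Hedged sketch of the Ext2 feature-bitmask policy: unknown "incompat" features
# block the mount entirely, unknown "ro_compat" features force read-only.
# The specific bit values below are our assumptions, purely for illustration.
EXT2_FEATURE_INCOMPAT_SUPPORTED = 0b0001   # features this driver understands
EXT2_FEATURE_RO_COMPAT_SUPPORTED = 0b0011

def mount_decision(incompat_bits, ro_compat_bits):
    if incompat_bits & ~EXT2_FEATURE_INCOMPAT_SUPPORTED:
        return "refuse"        # unknown incompat feature: do not mount at all
    if ro_compat_bits & ~EXT2_FEATURE_RO_COMPAT_SUPPORTED:
        return "read-only"     # unknown ro_compat feature: mount read-only
    return "read-write"

print(mount_decision(0b0001, 0b0000))  # read-write
print(mount_decision(0b0100, 0b0000))  # refuse
print(mount_decision(0b0001, 0b0100))  # read-only
```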
Directory indexing replaces linear directory searches with a faster search using a fixed-depth tree and hashed keys. File system size can be dynamically increased, and the expanded inode, doubled from 128 to 256 bytes, allows for more extensions.
Other potential improvements were discussed as well, in particular pre-allocation for contiguous files, which allows for better performance in certain setups by pre-allocating contiguous blocks. Security-related modifications, extended attributes, and ACLs were mentioned. An implementation of these features already exists but has not yet been merged into the mainline Ext2/3 code.
RECENT FILESYSTEM OPTIMISATIONS ON
FREEBSD
Ian Dowse, Corvil Networks; David Malone, CNRI, Dublin Institute of Technology
David Malone presented four important file system optimizations for the FreeBSD OS: soft updates, dirpref, vmiodir, and dirhash. It turns out that certain combinations of the optimizations (beautifully illustrated in the paper) may yield performance improvements of anywhere between 2 and 10 orders of magnitude for real-world applications.
All four of the optimizations deal with file system metadata. Soft updates allow for asynchronous metadata updates. Dirpref changes the way directories are organized, attempting to place child directories closer to their parents, thereby increasing locality of reference and reducing disk-seek times. Vmiodir trades some extra memory in order to achieve better directory caching. Finally, dirhash, which was implemented by Ian Dowse, changes the way in which entries are searched in a directory. Specifically, whereas FFS uses a linear search to find an entry by name, dirhash builds a hashtable of directory entries on the fly. For a directory of size n with a working set of m files, a search that in certain cases could have been O(nm) has been reduced, due to dirhash, to effectively O(n + m).
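The complexity argument behind dirhash can be sketched in a few lines of Python. This is an illustration of the idea only, not Dowse's kernel code.

```python
# A large directory: m linear lookups cost O(n*m); building a hash table once
# costs O(n), after which each lookup is O(1) expected, giving O(n + m) total.
entries = [f"file{i:04d}" for i in range(10000)]

def linear_lookup(name):
    """FFS-style linear scan: O(n) per lookup."""
    for i, e in enumerate(entries):
        if e == name:
            return i
    return -1

# dirhash-style: one pass builds the table on the fly.
dirhash = {name: i for i, name in enumerate(entries)}

assert linear_lookup("file9999") == dirhash["file9999"] == 9999
```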
FILESYSTEM PERFORMANCE AND SCALABILITY
IN LINUX 2.4.17
Ray Bryant, SGI; Ruth Forester, IBM LTC; John Hawkes, SGI
This talk focused on performance evaluation of a number of file systems available and commonly deployed on Linux machines. Comparisons under various configurations of Ext2, Ext3, ReiserFS, XFS, and JFS were presented.
The benchmarks chosen for the data gathering were pgmeter, filemark, and the AIM Benchmark Suite VII. Pgmeter measures the rate of data transfer of reads/writes of a file under a synthetic workload. Filemark is similar to postmark in that it is an operation-intensive benchmark, although filemark is threaded and offers various other features that postmark lacks. AIM VII measures performance for various file-system-related functionalities; it offers various metrics under an imposed workload, thus stressing the performance of the file system not only under I/O load but also under significant CPU load.
Tests were run on three different setups: a small, a medium, and a large configuration. ReiserFS and Ext2 appear at the top of the pile for small and medium setups. Notably, XFS and JFS perform worse for smaller system configurations than the others, although XFS clearly appears to generate better numbers than JFS. It should be noted that XFS seems to scale well under a higher load. This was most evident in the large-system results, where XFS appears to offer the best overall results.
THINGS TO THINK ABOUT
Summarized by Bosko Milekic
SPEEDING UP KERNEL SCHEDULER BY REDUCING CACHE MISSES
Shuji Yamamura, Akira Hirai, Mitsuru Sato, Masao Yamamoto, Akira Naruse, and Kouichi Kumon, Fujitsu Labs
This was an interesting talk pertaining to the effects of cache coloring for task structures in the Linux kernel scheduler (Linux kernel 2.4.x). The speaker first presented some interesting benchmark numbers for the Linux scheduler, showing that as the number of processes on the task queue increased, performance decreased. The authors used some really nifty hardware to measure the number of bus transactions throughout their tests, and were thus able to reasonably quantify the impact that cache misses had in the Linux scheduler.
Their experiments led them to implement a cache-coloring scheme for task structures, which were previously aligned on 8KB boundaries and therefore were eventually mapped to the same cache lines. This unfortunate placement of task structures in memory induced a significant number of cache misses as the number of tasks grew in the scheduler.
The implemented solution consisted of aligning task structures to more evenly distribute cached entries across the L2 cache. The result was, inevitably, fewer cache misses in the scheduler. Some negative effects were observed in certain situations, primarily due to more cache slots being used by task-structure data in the scheduler, thus forcing data previously cached there to be pushed out.
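The collision effect can be illustrated with a toy direct-mapped-cache model. The cache geometry and the coloring rule below are our assumptions for illustration, not the paper's actual parameters.

```python
# Toy model: a direct-mapped cache whose index bits cover exactly 8KB
# (128 lines of 64 bytes). These sizes are assumptions for the demo.
CACHE_LINE = 64
SETS = 128

def cache_set(addr):
    return (addr // CACHE_LINE) % SETS

# Without coloring: every task structure starts on an 8KB boundary, so the
# same field of every task maps to the same cache set and they all collide.
aligned = [task * 8192 for task in range(16)]

# With coloring: offset each task by one cache line per "color" (8 colors here),
# spreading the hot fields across different sets.
colored = [task * 8192 + (task % 8) * CACHE_LINE for task in range(16)]

print(len({cache_set(a) for a in aligned}))  # 1  -> all 16 tasks collide
print(len({cache_set(a) for a in colored}))  # 8  -> spread over 8 sets
```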
WORK-IN-PROGRESS REPORTS
Summarized by Brennan Reynolds
RESOURCE VIRTUALIZATION TECHNIQUES FOR
WIDE-AREA OVERLAY NETWORKS
Kartik Gopalan, Stony Brook University
This work addressed the issue of provisioning a maximum number of virtual overlay networks (VONs) with diverse quality-of-service (QoS) requirements on a single physical network. Each of the VONs is logically isolated from the others to ensure the QoS requirements. Gopalan mentioned several critical research issues with this problem that are currently being investigated. Dealing with how to provision the network at various levels (link, route, or path) and then enforce the provisioning at run-time is one of the toughest challenges. Currently, Gopalan has developed several algorithms to handle admission control, end-to-end QoS, route selection, scheduling, and fault tolerance in the network.
For more information visit http://www.ecsl.cs.sunysb.edu/.
VISUALIZING SOFTWARE INSTABILITY
Jennifer Bevan, University of California, Santa Cruz
Detection of instability in software has typically been an afterthought. The point in the development cycle when the software is reviewed for instability usually comes after it is difficult and costly to go back and perform major modifications to the code base. Bevan has developed a technique to allow the visualization of unstable regions of code that can be used much earlier in the development cycle. Her technique creates a time series of dependency graphs that include clusters and lines called fault lines. From the graphs, a developer is able to easily determine where the unstable sections of code are and proactively restructure them. She is currently working on a prototype implementation.
For more information visit http://www.cse.ucsc.edu/~jbevan/evo_viz/.
RELIABLE AND SCALABLE PEER-TO-PEER WEB
DOCUMENT SHARING
Li Xiao, College of William and Mary
The idea presented by Xiao would allow end users to share the content of their Web browser caches with neighboring Internet users. The rationale for this is that today's end users are increasingly connected to the Internet over high-speed links, and the browser caches are becoming large storage systems. Therefore, if an individual accesses a page which does not exist in their local cache, Xiao suggests that they first query other end users for the content before trying to access the machine hosting the original. This strategy does have some serious problems associated with it that still need to be addressed, including ensuring the integrity of the content and protecting the identity and privacy of the end users.
SEREL: FAST BOOTING FOR UNIX
Leni Mayo, Fast Boot Software
Serel is a tool that generates a visual representation of a UNIX machine's boot-up sequence. It can be used to identify the critical path, and can show whether a particular service or process blocks for an extended period of time. This information could be used to determine where the largest performance gains could be realized by tuning the order of execution at boot-up. Serel creates a dependency graph, expressed in XML, during boot-up. This graph is then used to create the visual representation. Currently the tool only works on POSIX-compliant systems, but Mayo stated that he would be porting it to other platforms. Other extensions that were mentioned included having the metadata include the use of shared libraries, and monitoring the suspend/resume sequence of portable machines.
For more information visit http://www.fastboot.org/.
BERKELEY DB XML
John Merrells, Sleepycat Software
Merrells gave a quick introduction and overview of the new XML library for Berkeley DB. The library specializes in storage and retrieval of XML content, queried via XPath. The library allows for multiple containers per document and stores everything natively as XML. The user is also given a wide range of elements to create indices with, including edges, elements, text strings, or presence. The XPath component consists of a query parser, generator, optimizer, and execution engine. To conclude his presentation, Merrells gave a live demonstration of the software.
For more information visit http://www.sleepycat.com/xml.
CLUSTER-ON-DEMAND (COD)
Justin Moore, Duke University
Modern clusters are growing at a rapid rate. Many have pushed beyond the 5,000-machine mark, and deploying them results in large expenses as well as management and provisioning issues. Furthermore, if the cluster is "rented" out to various users, it is very time-consuming to configure it to a user's specs, regardless of how long they need to use it. The COD work presented creates dynamic virtual clusters within a given physical cluster. The goal was to have a provisioning tool that would automatically select a chunk of available nodes and install the operating system and middleware specified by the customer in a short period of time. This would allow greater use of resources, since multiple virtual clusters can exist at once. By using a virtual cluster, the size can be changed dynamically. Moore stated that they have created a working prototype and are currently testing and benchmarking its performance.
For more information visit http://www.cs.duke.edu/~justin/cod/.
CATACOMB
Elias Sinderson, University of California, Santa Cruz
Catacomb is a project to develop a database-backed DASL module for the Apache Web server. It was designed as a replacement for the standard WebDAV module's storage. The initial release of the module only contains support for a MySQL database, but could be extended to others. Sinderson briefly touched on the performance of the module: it was comparable to the mod_dav Apache module for all query types but search. The presentation was concluded with remarks about adding support for the LOCK method and including ACL specifications in the future.
For more information visit http://ocean.cse.ucsc.edu/catacomb.
SELF-ORGANIZING STORAGE
Dan Ellard, Harvard University
Ellard's presentation introduced a storage system that tuned itself based on the workload of the system, without requiring the intervention of the user. The intelligence was implemented as a virtual, self-organizing disk that resides below any file system. The virtual disk observes the access patterns exhibited by the system and then attempts to predict what information will be accessed next. An experiment was done using an NFS trace taken at a large ISP on one of their email servers. Ellard's self-organizing storage system worked well, which he attributes to the fact that most of the files being requested were large email boxes. Areas of future work include exploration of the length and detail of the data-collection stage, as well as the CPU impact of running the virtual disk layer.
VISUAL DEBUGGING
John Costigan, Virginia Tech
Costigan feels that the state of current debugging facilities in the UNIX world is not as good as it should be. He proposes the addition of several elements to programs like ddd and gdb. The first addition is including separate heap and stack components. Another is to display only certain fields of a data structure, and have the ability to zoom in and out if needed. Finally, the debugger should provide the ability to visually present complex data structures, including linked lists, trees, etc. He said that a beta version of a debugger with these abilities is currently available.
For more information visit http://infovis.cs.vt.edu/datastruct.
IMPROVING APPLICATION PERFORMANCE
THROUGH SYSTEM-CALL COMPOSITION
Amit Purohit, Stony Brook University
Web servers perform a huge number of context switches and much internal data copying during normal operation. These two elements can drastically limit the performance of an application, regardless of the hardware platform it is run on. The Compound System Call (CoSy) framework is an attempt to reduce the performance penalty for context switches and data copies. It includes a user-level library and several kernel facilities that can be used via system calls. The library provides the programmer with a complete set of memory-management functions. Performance tests using the CoSy framework showed a large savings for context switches and data copies. The only area where the savings between conventional libraries and CoSy were negligible was for very large file copies.
For more information visit http://www.cs.sunysb.edu/~purohit.
ELASTIC QUOTAS
John Oscar, Columbia University
Oscar began by stating that most resources are flexible, but that, to date, disks have not been. While approximately 80 percent of files are short-lived, disk quotas are hard limits imposed by administrators. The idea behind elastic quotas is to have a non-permanent storage area that each user can temporarily use. Oscar suggested the creation of an ehome directory structure to be used in deploying elastic quotas. Currently he
has implemented elastic quotas as a stackable file system. There were also several trends apparent in the data: many users would ssh to their remote host (good), but then launch an email reader on the local machine which would connect via POP to the same host and send the password in clear text (bad). This situation can easily be remedied by tunneling protocols like POP through SSH, but it appeared that many people were not aware this could be done. While most of his comments on the use of the wireless network were negative, the list of passwords he had collected showed that people were indeed using strong passwords. His recommendations were to educate and encourage people to use protocols like IPsec, SSH, and SSL when conducting work over a wireless network, because you never know who else is listening.
SYSTRACE
Niels Provos, University of Michigan
How can one be sure the applications one is using actually do exactly what their developers said they do? Short answer: you can't. People today are using a large number of complex applications, which means it is impossible to check each application thoroughly for security vulnerabilities. There are tools out there that can help, though. Provos has developed a tool called systrace that allows a user to generate a policy of acceptable system calls a particular application can make. If the application attempts to make a call that is not defined in the policy, the user is notified and allowed to choose an action. Systrace includes an auto-policy generation mechanism, which uses a training phase to record the actions of all programs the user executes. If the user chooses not to be bothered by applications breaking policy, systrace allows default enforcement actions to be set. Currently systrace is implemented for FreeBSD and NetBSD, with a Linux version coming out shortly.
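The policy model just described can be sketched as a lookup with a default action. This is a toy illustration of the concept only; it is not systrace's real policy format, and the call names and actions below are our assumptions.

```python
# Toy per-application syscall policy: known calls map to an action, and any
# call the policy does not mention falls through to a default enforcement
# action (here "deny"), mirroring the behavior described in the talk.
policy = {
    "open":    "permit",
    "read":    "permit",
    "write":   "permit",
    "connect": "ask",    # notify the user and let them decide
}
DEFAULT_ACTION = "deny"

def decide(syscall):
    return policy.get(syscall, DEFAULT_ACTION)

print(decide("read"))    # permit
print(decide("execve"))  # deny
```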
For more information visit http://www.citi.umich.edu/u/provos/systrace/.
VERIFIABLE SECRET REDISTRIBUTION FOR
SURVIVABLE STORAGE SYSTEMS
Ted Wong, Carnegie Mellon University
Wong presented a protocol that can be used to re-create a file distributed over a set of servers, even if one of the servers is damaged. In his scheme, the user must choose how many shares the file is split into and the number of servers it will be stored across. The goal of this work is to provide persistent, secure storage of information, even if it comes under attack. Wong stated that his design of the protocol is complete, and he is currently building a prototype implementation of it.
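As background for the share-splitting that such a protocol builds on, here is a minimal sketch of (k, n) threshold secret sharing (Shamir's scheme over a prime field). Wong's contribution is the verifiable *redistribution* of shares, which is beyond this snippet; this only shows the underlying splitting idea.

```python
import random

P = 2**61 - 1  # a Mersenne prime; all arithmetic is mod P

def split(secret, k, n):
    """Split a secret into n shares, any k of which can reconstruct it."""
    coeffs = [secret] + [random.randrange(P) for _ in range(k - 1)]
    def poly(x):
        acc = 0
        for c in reversed(coeffs):  # Horner evaluation of the degree k-1 poly
            acc = (acc * x + c) % P
        return acc
    return [(x, poly(x)) for x in range(1, n + 1)]

def combine(shares):
    """Lagrange interpolation at x = 0 recovers the secret from k shares."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num = den = 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        secret = (secret + yi * num * pow(den, P - 2, P)) % P
    return secret

shares = split(424242, k=3, n=5)
assert combine(shares[:3]) == 424242  # any 3 of the 5 shares suffice
```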
For more information visit http://www.cs.cmu.edu/~tmwong/research.
THE GURU IS IN SESSIONS
LINUX ON LAPTOPPDA
Bdale Garbee, HP Linux Systems Operation
Summarized by Brennan Reynolds
This was a roundtable-like discussion, with Garbee acting as moderator. He is currently in charge of the Debian distribution of Linux and is involved in porting Linux to platforms other than i386. He has successfully helped port the kernel to the Alpha, SPARC, and ARM, and is currently working on a version to run on VAX machines.
A discussion was held on which file system is best for battery life. The ReiserFS and Ext3 file systems were discussed; people commented that when using Ext3 on their laptops the disk would never spin down and thus was always consuming a large amount of power. One suggestion was to use a RAM-based file system for any volume that requires a large number of writes, and to only have the file system written to disk at infrequent intervals or when the machine is shut down or suspended.
A question was directed to Garbee about his knowledge of how the HP-Compaq merger would affect Linux. Garbee thought that the merger was good for Linux, both within the company and for the open source community as a whole. He stated that the new company would be the number one shipper of systems with Linux as the base operating system. He also fielded a question about the support of older HP servers and their ability to run Linux, saying that indeed Linux has been ported to them, and he personally had several machines in his basement running it.
A question which sparked a large number of responses concerned problems people had with running Linux on mobile platforms. Most people actually did not have many problems at all. There were a few cases of a particular machine model not working, but there were only two widespread issues: docking stations and the new ACPI power management scheme. Docking stations still appear to be somewhat of a headache, but most people had developed ad hoc solutions for getting them to work. The ACPI power management scheme developed by Intel does not appear to have a quick solution. Ted Ts'o, one of the head kernel developers, stated that there are fundamental architectural problems with the 2.4 series kernel that do not easily allow ACPI to be added. However, he also stated that ACPI is supported in the newest development kernel, 2.5, and will be included in 2.6.
The final topic was the possibility of purchasing a laptop without paying any charge/tax for Microsoft Windows. The consensus was that currently this is not possible. Even if the machine comes with Linux or without any operating system, the vendors have contracts in place which require them to pay Microsoft for each machine they ship, if that machine is capable of running a Microsoft operating system. The only way this will change is if the demand for
other operating systems increases to the point that vendors renegotiate their contracts with Microsoft, and no one saw this happening in the near future.
LARGE CLUSTERS
Andrew Hume, AT&T Labs – Research
Summarized by Teri Lampoudi
This was a session on clusters, big data, and resilient computing. Given that the audience was primarily interested in clusters, and not necessarily big data or beating nodes to a pulp, the conversation revolved mainly around what it takes to make a heterogeneous cluster resilient. Hume also presented a FREENIX paper on his current cluster project, which explained in more detail much of what was abstractly claimed in the guru session.
Hume's use of the word "cluster" referred not to what I had assumed would be a Beowulf-type system, but to a loosely coupled farm of machines in which concurrency was much less of an issue than it would be in a Beowulf. In fact, the architecture Hume described was designed to deal with large amounts of independent transaction processing (essentially the process of billing calls) which requires no interprocess message passing or anything of the sort a parallel scientific application might.
Where does the large data come in? Since the transactions in question consist of large numbers of flat files, a mechanism for getting the files onto nodes and off-loading the results is necessary. In this particular cluster, named Ningaui, this task is handled by the "replication manager," which generated a large amount of interest in the audience. All management functions in the cluster, as well as job allocation, constitute services that are to be bid for and leased out to nodes. Furthermore, failures are handled by simply repeating the failed task until it is completed properly, if that is possible, and if it is not, then looking through detailed logs to discover
what went wrong. Logging and testing for the successful completion of each task are what make this model resilient. Every management operation is logged at some appropriate level of detail, generating 120MB/day, and every single file transfer is MD5 checksummed. This was characterized by Hume as a "patient" style of management, where no one controls the cluster, scheduling behavior is emergent rather than stringently planned, and nodes are free to drop in and out of the cluster; a downside to this model is the increase in latency, offset by the fact that the cluster continues to function unattended outside of 8-to-5 support hours.
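The "checksum every file transfer" discipline mentioned above can be sketched as follows. The function names are ours for illustration; Ningaui's actual tooling is not public.

```python
import hashlib
import shutil

def md5_of(path, bufsize=1 << 20):
    """Compute the MD5 digest of a file, reading in 1MB chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        while chunk := f.read(bufsize):
            h.update(chunk)
    return h.hexdigest()

def verified_copy(src, dst):
    """Copy src to dst and verify the destination's checksum matches."""
    expected = md5_of(src)
    shutil.copyfile(src, dst)
    if md5_of(dst) != expected:
        raise IOError(f"checksum mismatch copying {src} -> {dst}")
    return expected
```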
In response to questions about the projected scalability of such a scheme, Hume said the system would presumably scale from the present 60 nodes to about 100 without modifications, but that scaling higher would be a matter for further design. The guru session ended on a somewhat unrelated note regarding the problems of getting data off mainframes and onto Linux machines: issues of fixed- vs. variable-length encoding of data blocks resulting in useful information being stripped by FTP, and problems in converting COBOL copybooks to something useful on a C-based architecture. To reiterate Hume's claim, there are homemade solutions for some of these problems, and the reader should feel free to contact Mr. Hume for them.
WRITING PORTABLE APPLICATIONS
Nick Stoughton, MSB Consultants
Summarized by Josh Lothian
Nick Stoughton addressed a concern that developers are facing more and more: writing portable applications. He proposed that there is no such thing as a "portable application," only those that have been ported. Developers are still in search of the Holy Grail of applications programming: a write-once, compile-anywhere program. Languages such as Java are leading up to this, but virtual machines still have to be ported, paths will still be different, etc. Web-based applications are another area that could be promising for portable applications.
Nick went on to talk about the future of portability. The recently released POSIX 2001 standards are expected to help the situation. The 2001 revision expands the original POSIX spec into seven volumes. Even though POSIX 2001 is more specific than the previous release, Nick pointed out that even this standard allows for small differences where vendors may define their own behaviors. This can only hurt developers in the long run. Along these lines, it was pointed out that there may be room for automated tools that have the capability to check application code and determine the level of portability and standards compliance of the source.
NETWORK MANAGEMENT SYSTEM
PERFORMANCE TUNING
Jeff Allen, Tellme Networks, Inc.
Summarized by Matt Selsky
Jeff Allen, author of Cricket, discussed network tuning. The basic idea of tuning is: measure, twiddle, and measure again. Cricket can be used for large installations, but it's overkill for smaller installations, for which MRTG is better suited. You don't need Cricket to do measurement; doing something like a shell script that dumps data to Excel is good, but you need to get some sort of measurement. Cricket was not designed for billing; it was meant to help answer questions.
Useful tools and techniques covered included: looking for the difference between machines in order to identify the causes for the differences; measuring, but also thinking about what you're measuring and why; and remembering to use strace and tcpdump to identify problems. Problems repeat themselves; performance tuning requires an intricate understanding of all the layers involved to solve the problem and simplify the automation. (Having a computer science background helps, but is not essential.) If you can reproduce the problem, you then should investigate each piece to find the bottleneck. If you can't reproduce the problem, then measure. You need to understand all the system interactions. Measurement can help determine whether things are actually slow or if the user is imagining it.
Troubleshooting begins with a hunch, but scientific processes are essential. You should be able to determine the event stream, or when each event occurs; having observation logs can help. Some hints provided include: checking cron for periodic anomalies, slowing down /bin/rm to avoid I/O overload, and looking for unusual kernel CPU usage.
Also, try to use lightweight monitoring to reduce overhead. You don't need to monitor every resource on every system, but you should monitor those resources on those systems that are essential. Don't check things too often, since you can introduce overhead that way. Lots of information can be gathered from the /proc file system interface to the kernel.
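In the spirit of the lightweight-monitoring advice above, much of what you need can be read straight from /proc. The sketch below separates the parser from the read so it can be exercised on a captured sample; the field layout follows the documented /proc/loadavg format.

```python
def parse_loadavg(text):
    """Parse the first three fields of a /proc/loadavg line."""
    one, five, fifteen = text.split()[:3]
    return float(one), float(five), float(fifteen)

def read_loadavg(path="/proc/loadavg"):
    """Read the live load averages on a Linux system."""
    with open(path) as f:
        return parse_loadavg(f.read())

# A captured sample line: 1-, 5-, and 15-minute load averages come first.
sample = "0.42 0.35 0.30 1/123 4567\n"
print(parse_loadavg(sample))  # (0.42, 0.35, 0.3)
```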
BSD BOF
Host: Kirk McKusick, Author and Consultant
Summarized by Bosko Milekic
The BSD Birds of a Feather session at USENIX started off with five quick and informative talks, and ended with an exciting discussion of licensing issues.
Christos Zoulas, for the NetBSD Project, went over the primary goals of the NetBSD Project (namely portability, a clean architecture, and security) and then proceeded to briefly discuss some of the challenges that the project has encountered. The successes and areas for improvement of 1.6 were then examined. NetBSD has recently seen various improvements in its cross-build system, packaging system, and third-party tool support. For what concerns the kernel, a zero-copy TCP/UDP implementation has been integrated, and a new pmap
code for arm32 has been introduced, as have some other rather major changes to various subsystems. More information is available in Christos's slides, which are now available at http://www.netbsd.org/gallery/events/usenix2002.
Theo de Raadt of the OpenBSD Project presented a rundown of various new things that the OpenBSD and OpenSSH teams have been looking at. Theo's talk included an amusing description (and pictures) of some of the events that transpired during OpenBSD's hack-a-thon, which occurred the week before USENIX '02. Finally, Theo mentioned that, following complications with the IPFilter license, the OpenBSD team had done a fairly extensive license audit.
Robert Watson of the FreeBSD Project brought up various examples of FreeBSD being used in the real world. He went on to describe a wide variety of changes that will surface with the upcoming FreeBSD release 5.0. Notably, 5.0 will introduce a re-architecting of the kernel aimed at providing much more scalable support for SMP machines. 5.0 will also include an early implementation of KSE, FreeBSD's version of scheduler activations, large improvements to pccard, and significant framework bits from the TrustedBSD project.
Mike Karels from Wind River Systems presented an overview of how BSD/OS has been evolving following the Wind River Systems acquisition of BSDi's software assets. Ernie Prabhakar from Apple discussed Apple's OS X and its success on the desktop. He explained how OS X aims to bring BSD to the desktop while striving to maintain a strong and stable core, one that is heavily based on BSD technology.
The BoF finished with a discussion on licensing; specifically, some folks questioned the impact of Apple's licensing for code from their core OS (the Darwin Project) on the BSD community as a whole.
CLOSING SESSION
HOW FLIES FLY
Michael H. Dickinson, University of California, Berkeley
Summarized by JD Welch
In this lively and offbeat talk, Dickinson discussed his research into the mechanisms of flight in the fruit fly (Drosophila melanogaster) and related this autonomic behavior to that of a technological system. Insects are the most successful organisms on earth, due in no small part to their early adoption of flight. Flies travel through space on straight trajectories interrupted by saccades, jumpy turning motions somewhat analogous to the fast, jumpy movements of the human eye. Using a variety of monitoring techniques, including high-speed video in a "virtual reality flight arena," Dickinson and his colleagues have observed that flies respond to visual cues to decide when to saccade during flight. For example, more visual texture makes flies saccade earlier (i.e., further away from the arena wall), described by Dickinson as a "collision control algorithm."
Through a combination of sensory inputs, flies can make decisions about their flight path. Flies possess a visual "expansion detector," which, at a certain threshold, causes the animal to turn in a certain direction. However, expansion in front of the fly sometimes causes it to land. How does the fly decide? Using the virtual reality device to replicate various visual scenarios, Dickinson observed that the flies fixate on vertical objects that move back and forth across the field. Expansion on the left side of the animal causes it to turn right, and vice versa, while expansion directly in front of the animal triggers the legs to flare out in preparation for landing.
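The decision logic described above can be caricatured in a few lines. The function and threshold below are purely illustrative, not Dickinson's actual model:

```python
def steering_response(left_expansion, front_expansion, right_expansion,
                      landing_threshold=0.8):
    """Toy model (not Dickinson's) of the visually driven responses
    described in the talk: lateral expansion steers the fly away,
    while strong expansion dead ahead triggers the landing reflex."""
    if front_expansion >= landing_threshold:
        return "extend legs to land"
    if left_expansion > right_expansion:
        return "turn right"
    if right_expansion > left_expansion:
        return "turn left"
    return "fly straight"
```

The point of the caricature is the priority ordering: frontal expansion overrides the lateral turning response, just as the flies' landing reflex overrides steering.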
If the eyes detect these changes, how are the responses implemented? The flies' wings have "power muscles," controlled by mechanical resonance (as opposed to the nervous system), which drive the wings, combined with neurally activated "steering muscles," which change the configuration of the wing joints. Subtle variations in the timing of impulses correspond to changes in wing movement, controlled by sensors in the "wing-pit." A small wing-like structure, the haltere, controls equilibrium by beating constantly.
Voluntary control is accomplished by the halteres, whose signals can interfere with the autonomic control of the stroke cycle. The halteres have "steering muscles" as well, and information derived from the visual system can turn off the haltere or trick it into a "virtual" problem requiring a response.
Dickinson has also studied the aerodynamics of insect flight using a device called the "Robo Fly," an oversized mechanical insect wing suspended in a large tank of mineral oil. Interestingly, Dickinson observed that the average lift generated by the flies' wings is under their body weight; the flies use three mechanisms to overcome this, including rotating the wings (rotational lift).
Insects are extraordinarily robust creatures, and because Dickinson analyzed the problem in a systems-oriented way, these observations and analyses are immediately applicable to technology: the response system can be used as an efficient search algorithm for control systems in autonomous vehicles, for example.
THE AFS WORKSHOP
Summarized by Garry Zacheiss, MIT
Love Hornquist-Astrand of the Arla development team presented the Arla status report. Arla 0.35.8 has been released. Scheduled for release soon are improved support for Tru64 UNIX, MacOS X, and FreeBSD; improved volume handling; and implementation of more of the vos/pts subcommands. It was stressed that MacOS X is considered an important platform, and that a GUI configuration manager for the cache
manager, a GUI ACL manager for the Finder, and a graphical login that obtains Kerberos tickets and AFS tokens at login time were all under development.
Future goals planned for the underlying AFS protocols include GSSAPI/SPNEGO support for Rx, performance improvements to Rx, an enhanced disconnected mode, and IPv6 support for Rx; an experimental patch is already available for the latter. Future Arla-specific goals include improved performance, partial file reading, and increased stability on several platforms. Work is also in progress on the RXGSS protocol extensions (integrating GSSAPI into Rx). A partial implementation exists, and work continues as developers find time.
Derrick Brashear of the OpenAFS development team presented the OpenAFS status report. OpenAFS 1.2.5 was released immediately prior to the conference; it fixed a remotely exploitable denial-of-service attack on several OpenAFS platforms, most notably IRIX and AIX. Future work planned for OpenAFS includes better support for MacOS X, including working around Finder interaction issues. Better support for the BSDs is also planned: FreeBSD has a partially implemented client, while NetBSD and OpenBSD have only server programs available right now. AIX 5, Tru64 5.1A, and MacOS X 10.2 (aka Jaguar) are all planned for the future. Other planned enhancements include nested groups in the ptserver (code donated by the University of Michigan awaits integration), disconnected AFS, and further work on a native Windows client. Derrick stated that the guiding principles of OpenAFS were to maintain compatibility with IBM AFS, support new platforms, ease the administrative burden, and add new functionality.
AFS performance was discussed. The openafs.org cell consists of two file servers, one in Stockholm, Sweden, and one in Pittsburgh, PA. AFS works reasonably well over trans-Atlantic links,
although OpenAFS clients don't determine which file server to talk to very effectively. Arla clients use RTTs to the server to determine the optimal file server from which to fetch replicated data. Modifications to OpenAFS to support this behavior in the future are desired.
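The Arla heuristic amounts to choosing the replica with the smallest measured RTT. A minimal sketch (the server names and timings below are made up for illustration):

```python
def pick_replica_server(rtts_ms):
    """Given measured round-trip times in milliseconds for the servers
    holding a replicated volume, return the server with the lowest RTT."""
    if not rtts_ms:
        raise ValueError("no servers to choose from")
    return min(rtts_ms, key=rtts_ms.get)

# Hypothetical measurements against a two-server cell like openafs.org's:
servers = {"afs1.stockholm.example": 110.0, "afs2.pittsburgh.example": 35.0}
```

A client in North America would fetch from the Pittsburgh replica; the same logic run from Europe, with the RTTs reversed, would prefer Stockholm.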
Jimmy Engelbrecht and Harald Barth of KTH discussed their AFSCrawler script, written to determine how many AFS cells and clients were in the world, what implementations/versions they were (Arla vs. IBM AFS vs. OpenAFS), and how much data was in AFS. The script unfortunately triggered a bug in IBM AFS 3.6-derived code, causing some clients to panic while handling a specific RPC. This has since been fixed in OpenAFS 1.2.5 and the most recent IBM AFS patch level, and all AFS users are strongly encouraged to upgrade. No release of Arla is vulnerable to this particular denial-of-service attack. There was an extended discussion of the usefulness of this exploration. Many sites believed this was useful information and that such scanning should continue in the future, but only on an opt-in basis.
Many sites face the problem of managing Kerberos/AFS credentials for batch-scheduled jobs. Specifically, most batch processing software needs to be modified to forward tickets as part of the batch submission process, renew tickets and tokens while the job is in the queue and for the lifetime of the job, and properly destroy credentials when the job completes. Ken Hornstein of NRL was able to pay a commercial vendor to support Kerberos 4/5 credential management in their product, although they did not implement AFS token management. MIT has implemented some of the desired functionality in OpenPBS and might be able to make it available to other interested sites.
Tools to simplify AFS administration were discussed, including:
- AFS Balancer: A tool written by CMU to automate the process of balancing disk usage across all servers in a cell. Available from ftp://ftp.andrew.cmu.edu/pub/AFS-Tools/balance-1.1b.tar.gz.
- Themis: KTH's enhanced version of the AFS tool "package," for updating files on local disk from a central AFS image. KTH's enhancements include allowing the deletion of files, simplifying the process of adding a file, and allowing the merging of multiple rule sets for determining which files are updated. Themis is available from the Arla CVS repository.
Stanford was presented as an example of a large AFS site. Stanford's AFS usage consists of approximately 14TB of data in AFS, in the form of approximately 100,000 volumes; 33TB of storage is available in their primary cell, ir.stanford.edu. Their file servers consist entirely of Solaris machines running a combination of Transarc 3.6 patch-level 3 and OpenAFS 1.2.x, while their database servers run OpenAFS 1.2.2. Their cell consists of 25 file servers using a combination of EMC and Sun StorEdge hardware. Stanford continues to use the kaserver for their authentication infrastructure, with future plans to migrate entirely to an MIT Kerberos 5 KDC.
Stanford has approximately 3,400 clients on their campus, not including SLAC (Stanford Linear Accelerator); approximately 2,100 AFS clients from outside Stanford contact their cell every month. Their supported clients are almost entirely IBM AFS 3.6 clients, although they plan to release OpenAFS clients soon. Stanford currently supports only UNIX clients. There is some on-campus presence of Windows clients, but Stanford has never publicly released or supported them. They do intend to release and support the MacOS X client in the near future.
All Stanford students, faculty, and staff are assigned AFS home directories, with a default quota of 50MB, for a total of approximately 550GB of user home directories. Other significant uses of AFS
storage are data volumes for workstation software (400GB) and volumes for course Web sites and assignments (100GB).
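As a quick sanity check on the quota figures above, assuming decimal units (1GB = 1000MB), the stated totals imply on the order of 11,000 home directories:

```python
total_gb = 550   # approximate total of user home directories
quota_mb = 50    # default per-user quota
# With decimal units (1GB = 1000MB), the figures imply roughly:
implied_users = total_gb * 1000 // quota_mb
print(implied_users)  # 11000
```

The real number of accounts would be somewhat higher, since not every user fills their 50MB quota.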
AFS usage at Intel was also presented. Intel has been an AFS site since 1994. They had bad experiences with the IBM 3.5 Linux client; their experience with OpenAFS on Linux 2.4.x kernels has been much better. They use, and are satisfied with, the OpenAFS IA64 Linux port. Intel has hundreds of OpenAFS 1.2.3 and 1.2.4 clients in many production cells accessing data stored on IBM AFS file servers; they have not encountered any interoperability issues. Intel has some concerns about OpenAFS: they would like to purchase commercial support for OpenAFS, and to see OpenAFS support for HP-UX on both PA-RISC and Itanium hardware. HP-UX support is currently unavailable due to a specific HP-UX header file being unavailable from HP; this may be available soon. Intel has not yet committed to migrating their file servers to OpenAFS and is unsure if they will do so without commercial support.
Backups are a traditional topic of discussion at AFS workshops, and this time was no exception. Many users complain that the traditional AFS backup tools ("backup" and "butc") are complex and difficult to automate, requiring many home-grown scripts and much user intervention for error recovery. An additional complaint was that the traditional AFS tools do not support file- or directory-level backups and restores; data must be backed up and restored at the volume level.
Mitch Collinsworth of Cornell presented work to make AMANDA, the free backup software from the University of Maryland, suitable for AFS backups. Using AMANDA for AFS backups allows one to share AFS backup tapes with non-AFS backups, easily run multiple backups in parallel, automate error recovery, and provide a robust degraded mode that prevents tape errors from stopping backups altogether. Their implementation allows for full volume restores as well as individual directory and file restores. They have finished coding this work and are in the process of testing and documenting it.
Peter Honeyman of CITI at the University of Michigan spoke about work he has proposed to replace Rx with RPCSEC GSS in OpenAFS; this would allow AFS to use a TCP-based transport mechanism rather than the UDP-based Rx, and possibly gain better congestion control, dynamic adaptation, and fragmentation avoidance as a result. RPCSEC GSS uses the GSSAPI to authenticate SUN ONC RPC. RPCSEC GSS is transport-agnostic, provides strong security, is a developing Internet standard, and has multiple open source implementations. Backward compatibility with existing AFS servers and clients is an important goal of this project.
architecture. As David Reed argues, capacity can scale with the number of users – assuming an effective technology and an open architecture.
Developments over the last three years are disturbing and can be summarized as two layers of corruption: intellectual-property rights constraining technical innovation, and proprietary hardware platforms constraining software innovation.
Issues have been framed by the courts largely in two mistaken ways: (1) it's their property – a new technology shouldn't interfere with the rights of current technology owners; (2) it's just theft – copyright laws should be upheld by suppressing certain new technologies.
In fact, we must reframe the debate from "it's their property" to a "highways" metaphor, in which technology acts as a neutral platform for innovation without discrimination. "Theft" should be reframed as "Walt Disney," who built upon works from the past in richly creative ways that demonstrate the utility of allowing work to reach the public domain.
In the end, creativity depends upon the balance between property and access, public and private, controlled and common access. Free code builds a free culture, and open architectures are what give rise to the freedom to innovate. All of us need to become involved in reframing the debate on this issue; as Rachel Carson's Silent Spring points out, an entire ecology can be undermined by small changes from within.
INVITED TALKS
THE IETF, OR WHERE DO ALL THOSE RFCS COME FROM, ANYWAY?
Steve Bellovin, AT&T Labs – Research
Summarized by Josh Simon
The Internet Engineering Task Force (IETF) is a standards body but not a legal entity, consisting of individuals (not organizations) and driven by a consensus-based decision model. Anyone who "shows up" – be it at the thrice-annual in-person meetings or on the email lists for the various groups – can join and be a member. The IETF is concerned with Internet protocols and open standards, not LAN-specific protocols (such as AppleTalk) or layer-1 or -2 questions (like copper versus fiber).
The organizational structure is loose. There are many working groups, each with a specific focus, within several areas. Each area has an area director, and the area directors collectively form the Internet Engineering Steering Group (IESG). The six permanent areas are Internet (with working groups for IPv6, DNS, and ICMP), Transport (TCP, QoS, VoIP, and SCTP), Applications (mail, some Web, LDAP), Routing (OSPF, BGP), Operations and Management (SNMP), and Security (IPSec, TLS, S/MIME). There are also two other areas: SubIP is a temporary area for things underneath the IP protocol stack (such as MPLS, IP over wireless, and traffic engineering), and there's a General area for miscellaneous and process-based working groups.
Internet Requests for Comments (RFCs) fall into three tracks: Standard, Informational, and Experimental. Note that this means that not all RFCs are standards. The RFCs in the Informational track are generally for proprietary protocols or April first jokes; those in the Experimental track are results, ideas, or theories.
The RFCs in the Standard track come from working groups in the various areas, through a time-consuming, complex process. Working groups are created with an agenda, a problem statement, an email list, some draft RFCs, and a chair; they typically start out as a BoF session. The working group and the IESG make a charter to define the scope, milestones, and deadlines; the Internet Architecture Board (IAB) ensures that the working group proposals are architecturally sound. Working groups are narrowly focused and are supposed to die off once the problem is solved and all milestones are achieved. Working groups meet and work mainly through the email list,
though there are three in-person, high-bandwidth meetings per year. However, decisions reached in person must be ratified by the mailing list, since not everybody can get to three meetings per year. They produce RFCs, which go through the Standard track; these need to go through the entire working group before being submitted for comment to the entire IETF and then to the IESG. Most RFCs wind up going back to the working group at least once, from the area director or IESG level.
The format of an RFC is well-defined and requires that it be published in plain 7-bit ASCII. RFCs are freely redistributable, and the IETF reserves the right of change control on all Standard-track RFCs.
The big problems the IETF is currently facing are security, internationalization, and congestion control. Security has to be designed into protocols from the start. Internationalization has shown us that 7-bit-only ASCII is bad and doesn't work, especially for those character sets that require more than 7 bits (like Kanji); UTF-8 is a reasonable compromise. But what about domain names? While not specified as requiring 7-bit ASCII in the specifications, most DNS applications assume a 7-bit character set in the namespace. This is a hard problem. Finally, congestion control is another hard problem, since the Internet is not the same as a really big LAN.
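The 7-bit constraint and the UTF-8 compromise are easy to demonstrate; a small illustration (the example strings are my own, not from the talk):

```python
def is_7bit_ascii(text):
    """Check whether text satisfies the RFC publication constraint of
    plain 7-bit ASCII (every code point below 128)."""
    return all(ord(ch) < 128 for ch in text)

# ASCII text passes; Kanji needs more than 7 bits, and UTF-8
# represents each of these characters in 3 bytes.
assert is_7bit_ascii("Request for Comments")
assert not is_7bit_ascii("漢字")
assert len("漢".encode("utf-8")) == 3
```

UTF-8's appeal as a compromise is visible here: pure ASCII text is unchanged (one byte per character), while wider character sets simply spend more bytes.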
INTRODUCTION TO AIR TRAFFIC MANAGEMENT SYSTEMS
Ron Reisman, NASA Ames Research Center; Rob Savoye, Seneca Software
Summarized by Josh Simon
Air traffic control is organized into four domains: surface, which runs out of the airport control tower and controls the aircraft on the ground (e.g., taxi and takeoff); terminal area, which covers aircraft at 11,000 feet and below, handled by the Terminal Radar Approach Control (TRACON) facilities; en route, which covers between 11,000 and 40,000 feet, including climb, descent, and at-altitude flight, and runs from the 20 Air Route Traffic Control Centers (ARTCC, pronounced "artsy"); and traffic flow management, which is the strategic arm. Each area has sectors for low, high, and very-high flight. Each sector has a controller team, including one person on the microphone, and handles between 12 and 16 aircraft at a time. Since the number of sectors and areas is limited and fixed, there's limited system capacity. The events of September 11, 2001, gave us a respite in terms of system usage, but based on path growth patterns the air traffic system will be oversubscribed within two to three years. How do we handle this oversubscription?
Air Traffic Management (ATM) Decision Support Tools (DST) use physics, aeronautics, heuristics (expert systems), fuzzy logic, and neural nets to help the (human) aircraft controllers route aircraft around. The rest of the talk focused on capacity issues, but the DSTs also handle safety and security issues. The software follows open standards (ISO, POSIX, and ANSI). The team at NASA Ames made the Center-TRACON Automation System (CTAS), which is software for each of the ARTCCs, portable from Solaris to HP-UX and Linux as well. Unlike just about every other major software project, this one really is standard and portable; co-presenter Rob Savoye has experience in maintaining gcc on multiple platforms and is the project lead on the portability and standards issues for the code. CTAS allows the ARTCCs to upgrade and enhance individual aspects or parts of the system; it isn't a monolithic, all-or-nothing entity like the old ATM systems.
Some future areas of research include a head-mounted augmented reality device for tower operators, to improve their situational awareness by automating human factors, and new differential global positioning system (DGPS) technologies, which are accurate to within inches instead of feet.
Questions centered around advanced avionics (e.g., getting rid of ground control), cooperation between the US and Europe on software development (we're working together on software development, but the various European countries' controllers don't talk well to each other), and privatization.
ADVENTURES IN DNS
Bill Manning, ISI
Summarized by Brennan Reynolds
Manning began by posing a simple question: is DNS really a critical infrastructure? The answer is not simple. Perhaps seven years ago, when the Internet was just starting to become popular, the answer was a definite no; but today, with IPv6 being implemented and so many transactions being conducted over the Internet, the question does not have a clear-cut answer. The Internet Engineering Task Force (IETF) has several new modifications to the DNS service that may be used to protect and extend its usability.
The first extension Manning discussed was Internationalized Domain Names (IDN). To date, all DNS records are based on the ASCII character set, but many addresses in the Internet's global network cannot be easily written in ASCII characters. The goal of IDN is to provide an encoding for hostnames that is fair, efficient, and allows for a smooth transition from the current scheme. The work has resulted in two encoding schemes, ACE and UTF-8. Each encoding is independent of the other, but they can be used together in various combinations. Manning expressed his opinion that, while neither is an ideal solution, ACE appears to be the lesser of two evils. A major hindrance to getting IDN rolled out into the Internet's DNS root structure is the increase in zone-file complexity.
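The ACE approach was only standardized (as Punycode/IDNA) after this talk was given, but Python's later idna codec illustrates the idea of an ASCII-compatible encoding: a Unicode hostname maps to a plain-ASCII label that legacy DNS software can carry unchanged.

```python
# The idna codec implements the ACE scheme eventually standardized
# (as Punycode/IDNA) after this talk; "bücher" becomes an all-ASCII
# label with an xn-- prefix.
ace = "bücher.example".encode("idna")
print(ace)  # b'xn--bcher-kva.example'

# The encoding round-trips back to the Unicode hostname.
assert ace.decode("idna") == "bücher.example"
```

Note that every byte of the encoded form is 7-bit ASCII, which is precisely what makes the scheme deployable without touching existing resolvers.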
Manning's next topic was the inclusion of IPv6 records. In an IPv4 world, it is possible for administrators to remember the numeric representation of an address; IPv6 makes the addresses too
long and complex to be easily remembered. This means that DNS will play a vital role in getting IPv6 deployed. Several new resource records (RRs) have been proposed to handle the translation, including AAAA, A6, DNAME, and BITSTRING. Manning commented on the difference between IPv4 and v6 as transport protocols, noting that systems tuned for v4 traffic will suffer a performance hit when using v6. This is largely attributed to the increase in data transmitted per DNS request.
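Manning's point about address length is easy to see with the standard ipaddress module (the address below is from the IPv6 documentation prefix, not from the talk):

```python
import ipaddress

# Even a "short" IPv6 address expands to 39 characters in full form,
# which is why DNS naming matters more under v6 than under v4.
addr = ipaddress.ip_address("2001:db8::1")
print(addr.exploded)  # 2001:0db8:0000:0000:0000:0000:0000:0001
assert len(addr.exploded) == 39
```

Compare a dotted-quad IPv4 address at 15 characters maximum: the v6 form is more than twice as long even before considering scopes or prefixes.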
The final extension Manning discussed was DNSSec. He introduced this extension as a mechanism that protects the system from itself. DNSSec protects against data spoofing and provides authentication between servers. It includes a cryptographic signature of the RR set to ensure authenticity and integrity; by signing the entire set, the amount of computation is kept to a minimum. The information itself is stored in a new RR within each zone file on the DNS server.
Manning briefly commented on the use of DNS to provide a PKI infrastructure, stating that that was not the purpose of DNS and that it therefore should not be used in that fashion. The signing of the RR sets can be done hierarchically, resulting in the use of a single trusted key at the root of the DNS tree to sign all sets down to the leaves. However, the job of key replacement and rollover is extremely difficult for a system that is distributed across the globe.
Manning stated that in an operational test bed with all of these extensions enabled, the packet size for a single query response grew from 518 bytes to larger than 18,000. This results in a large increase in bandwidth usage for high-volume name servers. Therefore, in Manning's opinion, not all of these features will be deployed in the near future. For those looking for more information, Bill's Web site can be found at http://www.isi.edu/otdr.
THE JOY OF BREAKING THINGS
Pat Parseghian, Transmeta
Summarized by Juan Navarro
Pat Parseghian described her experience in testing the Crusoe microprocessor at the Transmeta Lab for Compatibility (TLC), where the motto is "You make it, we break it" and the goal is to make engineers miserable.
She first gave an overview of Crusoe, whose most distinctive feature is a code-morphing layer that translates x86 instructions into native VLIW instructions. This peculiarity makes the testing process particularly challenging, both because it involves testing two processors (the "external" x86 and the "internal" VLIW) and because of variations in how the code-morphing layer works (it may interpret code the first few times it sees an instruction sequence, then translate it and save the result in a translation cache afterwards). There are also reproducibility issues, since the VLIW processor may run at different speeds to save energy.
Pat then described the tests that they subject the processor to (hardware compatibility tests, common applications, operating systems, envelope-pushing games, and hard-to-acquire legacy applications) and the issues that must be considered when defining testing policies. She gave some testing tips, including organizational issues like tracking resources and results.
To illustrate the testing process, Pat gave a list of possible causes of a system crash that must be investigated. If it is the silicon, then it might be because it is damaged, or because of a manufacturing problem or a design flaw. If the problem is in the code-morphing layer, is the fault with the interpreter or with the translator? The fault could also be external to the processor: it could be the homemade system BIOS, a faulty mainboard, or an operator error. Or Transmeta might not be at fault at all; the crash might be due to a bug in the OS or the application. The key to pinpointing
the cause of a problem is to isolate it, by identifying the conditions that reproduce the problem and repeating those conditions on other test platforms and non-Crusoe systems. Then relevant factors must be identified, based on product knowledge, past experience, and common sense.
To conclude, Pat suggested that some of TLC's testing lessons can be applied to products that the audience was involved in, and assured us that breaking things is fun.
TECHNOLOGY, LIBERTY, FREEDOM, AND WASHINGTON
Alan Davidson, Center for Democracy and Technology
Summarized by Steve Bauer
"Experience should teach us to be most on our guard to protect liberty when the Government's purposes are beneficent. Men born to freedom are naturally alert to repel invasion of their liberty by evil-minded rulers. The greatest dangers to liberty lurk in insidious encroachment by men of zeal, well-meaning but without understanding." – Louis Brandeis, Olmstead v. US
Alan Davidson, an associate director at the CDT (http://www.cdt.org) since 1996, concluded his talk with this quote. In many ways it aptly characterizes the importance to the USENIX community of the topics he covered. The major themes of the talk were the impact of law on individual liberty, system architectures, and appropriate responses by the technical community.
The first part of the talk provided the audience with an overview of legislation either already introduced or likely to be introduced in the US Congress. This included various proposals to protect children, such as relegating all sexual content to a .xxx domain or providing kid-safe zones such as .kids.us. Other similar laws discussed were the Children's Online Protection Act and legislation dealing with virtual child pornography.
Other, lower-profile pieces covered were laws prohibiting false contact information in emails and domain registration databases. Similar proposals exist that would prohibit misleading subject lines in emails and require messages to clearly identify whether they are advertisements. Briefly mentioned were laws impacting online gambling.
Bills and laws affecting the architectural design of systems and networks, and various efforts to establish boundaries and borders in cyberspace, were then discussed. These included the Consumer Broadband and Digital Television Promotion Act and the French cases against Yahoo and its CEO for allowing the sale of Nazi materials on the Yahoo auction site.
A major part of the talk focused on the technological aspects of the USA-Patriot Act, passed in the wake of the September 11 attacks. Topics included "pen registers" for the Internet, expanded government power to conduct secret searches, roving wiretaps, nationwide service of warrants, sharing of grand jury and intelligence information, and the establishment of an identification system for visitors to the US.
The technological community needs to be aware of, and care about, the impact of law on individual liberty and on the systems the community builds. Specific suggestions included designing privacy and anonymity into architectures and limiting information collection, particularly of any personally identifiable data, to only what is essential. Finally, Alan emphasized that many people simply do not understand the implications of law; thus the technology community has an important role in helping make these implications clear.
TAKING AN OPEN SOURCE PROJECT TO MARKET
Eric Allman, Sendmail, Inc.
Summarized by Scott Kilroy
Eric started the talk with the upsides and downsides of open source software. The upsides include rapid feedback, high-quality work (mostly), lots of hands, and an intensely engineering-driven approach. The downsides include no support structure, limited project marketing expertise, and volunteers having limited time.
In 1998, Sendmail was becoming a "success disaster." A success disaster includes two key elements: (1) volunteers start spending all their time supporting current functionality instead of enhancing it (this heavy burden can lead to project stagnation and, eventually, death); (2) competition from better-funded sources can lead to FUD (fear, uncertainty, and doubt) about the project.
Eric then led listeners down the path he took to turn Sendmail into a business. Eric envisioned Sendmail, Inc. as a smallish and close-knit company, but soon realized that the family atmosphere he desired could not last as the company grew. He observed that maybe only the first 10 people will buy into your vision, after which each individual is primarily interested in something else (e.g., power, wealth, or status). He warned that there will be people working for you whom you do not like. Eric particularly did not care for salespeople, but he emphasized how important sales, marketing, finance, and legal people are to companies.
The nature of companies in infancy can be summed up as: you need money, and you need investors in order to get money. Investors want a return on investment, so money management becomes critical; companies therefore must optimize money functions. Eric's experiences with investors were lessons all in themselves. More than giving money, good investors can provide connections and insight, so you don't want investors who don't believe in you.
A company must have bug history, change management, and people who understand the product and have a sense of the target market. Eric now sees the importance of marketing target research. A company can never forget
the simple equation that drives business: profit = revenue - expense.
Finally, the concept of value should be based on what is valuable to the customer. Eric observed that his customers needed documentation, extra value in the product that they couldn't get elsewhere, and technical support.
Eric learned hard lessons along the way. If you want to do interesting open source work, it might be best not to be too successful: you don't want to let the larger dragons (companies) notice you. If you want a commercial user base, you have to manage process, watch corporate culture, provide value to customers, watch bottom lines, and develop a thick skin.
Starting a company is easy, but perpetuating it is extremely difficult.
INFORMATION VISUALIZATION FOR SYSTEMS PEOPLE
Tamara Munzner, University of British Columbia
Summarized by JD Welch
Munzner presented interesting evidence that information visualization, through interactive representations of data, can help people perform tasks more effectively and reduce the load on working memory.
A popular technique in visualizing data is abstracting it through node-link representation – for example, a list of cross-references in a book. Representing the relationships between references in a graphical (node-link) manner "offloads cognition to the human perceptual system." These graphs can be produced manually, but automated drawing allows for increased complexity and quicker production times.
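A node-link structure of the kind described is just an adjacency list; a minimal sketch, with hypothetical section names standing in for the book's cross-references:

```python
# Each entry maps a book section (hypothetical names) to the
# sections it cross-references.
cross_refs = {
    "intro": ["methods", "results"],
    "methods": ["appendix"],
    "results": ["methods"],
    "appendix": [],
}

def edges(graph):
    """Flatten the adjacency list into (source, target) link pairs,
    the form most automated graph-drawing tools consume."""
    return [(src, dst) for src, targets in graph.items() for dst in targets]

print(len(edges(cross_refs)))  # 4
```

An automated layout tool takes exactly this node-and-edge list and produces the drawing, which is what makes large graphs feasible where manual drawing is not.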
The way in which people interpret visual data is important in designing effective visualization systems. Attention to pre-attentive visual cues is key to success: picking out a red dot among a field of identically shaped blue dots is significantly faster than finding the same data represented in a mixture of hue and shape variations. There are many pre-attentive visual cues, including size, orientation, intersection, and intensity. The accuracy of these cues can be ranked, with position triggering high accuracy and color or density on the low end.
Visual cues combined with different data types – quantitative (e.g., 10 inches), ordered (small, medium, large), and categorical (apples, oranges) – can also be ranked, with spatial position being best for all types. Beyond this commonality, however, accuracy varied widely: length ranked second for quantitative data but eighth for ordinal and ninth for categorical data types.
These guidelines about visual perception can be applied to information visualization, which, unlike scientific visualization, focuses on abstract data and the choice of spatialization. Several techniques are used to clarify abstract data, including multi-part glyphs, where changes in individual parts are incorporated into an easier-to-understand gestalt; interactivity; motion; and animation. In large data sets, techniques like "focus + context," where a zoomed portion of the graph is shown along with a thumbnail view of the entire graph, are used to minimize user disorientation.
Future problems include dealing with huge databases such as the Human Genome, reckoning with dynamic data like the changing structure of the Web or real-time network monitoring, and transforming "pixel-bound" displays into large "digital wallpaper"-type systems.
FIXING NETWORK SECURITY BY HACKING THE BUSINESS CLIMATE
Bruce Schneier, Counterpane Internet Security
Summarized by Florian Buchholz
Bruce Schneier identified security as one of the fundamental building blocks of the Internet. A certain degree of security is needed for all things on the Net, and the limits of security will become limits of the Net itself. However, companies are hesitant to employ security measures. Science is doing well, but one cannot see the benefits in the real world. An increasing number of users means more problems affecting more people. Old problems such as buffer overflows haven't gone away, and new problems show up. Furthermore, the amount of expertise needed to launch attacks is decreasing.
Schneier argues that complexity is the enemy of security and that, while security is getting better, the growth in complexity is outpacing it. Since security is fundamentally a people problem, one shouldn't focus on technologies but rather should look at businesses, business motivations, and business costs. Traditionally, one can distinguish between two security models: threat avoidance, where security is absolute, and risk management, where security is relative and one has to mitigate the risk with technologies and procedures. In the latter model, one has to find a point with reasonable security at an acceptable cost.
After a brief discussion of how security is handled in businesses today, in which he concluded that businesses "talk big about it but do as little as possible," Schneier identified four necessary steps to fix the problems.
First, enforce liabilities. Today no real consequences have to be feared from security incidents. Holding people accountable will increase the transparency of security and give an incentive to make processes public. As possible options for achieving this, he listed industry-defined standards, federal regulation, and lawsuits. The problems with enforcement, however, lie in the difficulties associated with the international nature of the problem and in the fact that complexity makes assigning liabilities difficult. Furthermore, fear of liability could have a chilling effect on free software development and could stifle new companies.
As a second step, Schneier identified the need to allow partners to transfer liability. Insurance would spread liability risk among a group and would be a CEO's primary risk-analysis tool. There is a need for standardized risk models and protection profiles, and for more security as opposed to empty press releases. Schneier predicts that insurance will create a marketplace for security in which customers and vendors will have the ability to accept liabilities from each other. Computer-security insurance should soon be as common as household or fire insurance, and from that development insurance will become the driving factor of the security business.
The next step is to provide mechanisms to reduce risks both before and after software is released; techniques and processes to improve software quality, as well as an evolution of security management, are therefore needed. Currently, security products try to rebuild the walls – such as physical badges and entry doorways – that were lost when getting connected. Schneier believes this "fortress metaphor" is bad; one should think of the problem more in terms of a city. Since most businesses cannot afford proper security, outsourcing is the only way to make security scalable. With outsourcing, the concept of best practices becomes important, and insurance companies can be tied to them; outsourcing will level the playing field.
As a final step, Schneier predicts that rational prosecution and education will lead to deterrence. He claims that people feel safe because they live in a lawful society, whereas the Internet is classified as lawless, much like the American West in the 1800s or a society ruled by warlords: it is difficult to prove who an attacker is, and prosecution is hampered by complicated evidence gathering and by irrational prosecution. Schneier believes, however, that education will play a major role in turning the Internet into a lawful society. Specifically, he pointed out that we need laws that can be explained.
Risks won't go away; the best we can do is manage them. A company able to manage risk better will be more profitable, and we need to give CEOs the necessary risk-management tools.
There were numerous questions. Listed below are the more complicated ones, in a paraphrased Q&A format.
Q: Will liability be effective? Will insurance companies be willing to accept the risks? A: The government might have to step in; it remains to be seen how it plays out.

Q: Is there an analogy to the real world in that, in a lawless society, only the rich can afford security? Security solutions differentiate between classes. A: Schneier disagreed with the statement, giving an analogy to front-door locks, but conceded that there might be special cases.

Q: Doesn't homogeneity hurt security? A: Homogeneity is oversold. Diverse types can be more survivable, but given the limited number of options, the difference will be negligible.

Q: Regulations on airbag protection have led to deaths in some cases. How can we keep the pendulum from swinging the other way? A: Lobbying will not be prevented. An imperfect solution is probable; there might be reversals of requirements, such as the airbag one.

Q: What about personal liability? A: This will be analogous to auto insurance; liability comes with computer/Net access.
Q: If the rule of law is to become reality, there must be a law-enforcement function that applies to a physical space. You cannot do that with any existing government agency for the whole world. Should an organization like the UN assume that role? A: Schneier was not convinced global enforcement is possible.
Q: What advice should we take away from this talk? A: Liability is coming. Since the network is important to our infrastructure, eventually the problems will be solved in a legal environment. You need to start thinking about how to solve the problems and how the solutions will affect us.
The entire speech (including slides) is available at http://www.counterpane.com/presentation4.pdf.
LIFE IN AN OPEN SOURCE STARTUP
Daryll Strauss, Consultant
Summarized by Teri Lampoudi
This talk was packed with morsels of insight and tidbits of information about what life in an open source startup is like (though "was like" might be more appropriate), what the issues are in starting, maintaining, and commercializing an open source project, and the way hardware vendors treat open source developers. Strauss began by tracing the timeline of his involvement with developing 3-D graphics support for Linux: from the 3dfx Voodoo1 driver for Linux in 1997, to the establishment of Precision Insight in 1998, to its buyout by VA Linux, to the dismantling of his group by VA, to what the future may hold.
Obviously there are benefits from doing open source development, both for the developer and for the end user. An important point was that open source develops entire technologies, not just products. A common misapprehension concerns the inherent difficulty of managing a project that accepts code from a large number of developers, some volunteer and some paid: the notion that code can just be thrown over the wall into the world is a Big Lie.
How does one start, and keep afloat, an open source development company? In the case of Precision Insight, the subject matter – developing graphics and OpenGL for Linux – required a considerable amount of expertise: intimate knowledge of the hardware, the libraries, X, and the Linux kernel, as well as the end applications. Expertise is marketable. What helps even further is having a visible virtual team of experts, people who have an established track record of contributions to open source in the particular area of expertise. Support from the hardware and software vendors came next. While everyone was willing to contract PI to write the portions of code that were specific to their hardware, no one wanted to pay the bill for developing the underlying foundation that was necessary for the drivers to be useful. In the PI model, the infrastructure cost was shared by several customers at once, and the technology kept moving. Once a driver is written for a vendor, however, the vendor is perfectly capable of figuring out how to write the next driver it needs, thereby obviating the need to contract the job. This, along with questions of protecting intellectual property – hardware design in this case – is deeply problematic for the open source mode of development. This is not to say that there is no way to get over these problems, but they are likely to arise more often than not.
Management issues seem equally important. In the PI setting, contracts were flexible – both a good and a bad thing – and developers were overworked. Further, development "in a fishbowl," that is, under public scrutiny, is not easy. Strauss stressed the value of good communication and the use of various out-of-sync communication methods like IRC and mailing lists.
Finally, Strauss discussed which portions of software can be free and which can be proprietary. His suggestion was that horizontal markets want to be free, whereas vertically one can develop proprietary solutions. The talk closed with a small stroll through the problems that project politics brings up, from merging branches to mailing-list and source-tree readability.
GENERAL TRACK SESSIONS
FILE SYSTEMS
Summarized by Haijin Yan
STRUCTURE AND PERFORMANCE OF THE DIRECT ACCESS FILE SYSTEM
Kostas Magoutis, Salimah Addetia, Alexandra Fedorova, and Margo I. Seltzer, Harvard University; Jeffrey Chase, Andrew Gallatin, Richard Kisley, and Rajiv Wickremesinghe, Duke University; and Eran Gabber, Lucent Technologies
This paper won the Best Paper award.
The Direct Access File System (DAFS) is a new standard for network-attached storage over direct-access transport networks. DAFS takes advantage of user-level network interface standards that enable client-side functionality in which remote data access resides in a library rather than in the kernel. This reduces the memory-copy overhead of data movement as well as protocol overhead. Remote Direct Memory Access (RDMA) is a direct-access transport network that allows the network adapter to reduce copy overhead by accessing application buffers directly.
This paper explores the fundamental structural and performance characteristics of network file access using a user-level file system structure on a direct-access transport network with RDMA. It describes DAFS-based client and server reference implementations for FreeBSD and reports experimental results comparing DAFS to a zero-copy NFS implementation. It illustrates the benefits and trade-offs of these techniques to provide a basis for informed choices about deployment of DAFS-based systems and similar extensions to other network file protocols, such as NFS. Experiments show that DAFS gives applications direct control over an I/O system and increases client CPU usage while the client is doing I/O. Future work includes addressing longstanding problems related to the integration of the application and the file system for high-performance applications.
CONQUEST: BETTER PERFORMANCE THROUGH A DISK/PERSISTENT-RAM HYBRID FILE SYSTEM
An-I A. Wang, Peter Reiher, and Gerald J. Popek, UCLA; Geoffrey M. Kuenning, Harvey Mudd College
Motivated by the declining cost of persistent RAM, the authors propose the Conquest file system, which stores all small files, metadata, executables, and shared libraries in persistent RAM; disks hold only the data content of the remaining large files. Compared to alternatives such as caching and RAM file systems, Conquest has the advantages of efficiency, consistency, and reliability at a reduced cost. Using popular benchmarks, experiments show that Conquest incurs little overhead while achieving faster performance. Future work includes designing mechanisms for adjusting the file-size threshold dynamically and finding a better disk layout for large data blocks.
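The size-based split at Conquest's core can be sketched in a few lines. The threshold value, store names, and metadata format below are illustrative assumptions, not details from the paper:

```python
# Sketch of Conquest's routing policy: small files live entirely in
# persistent RAM; for large files only the data content goes to disk,
# while metadata always stays in RAM.

THRESHOLD = 1 << 20  # 1 MiB cutoff -- a hypothetical value

ram_store = {}   # path -> bytes (whole small files, plus all metadata)
disk_store = {}  # path -> bytes (data content of large files only)

def write_file(path, data):
    """Route a file to RAM or disk based on its size."""
    # Metadata is always kept in persistent RAM, regardless of file size.
    ram_store[path + ".meta"] = ("size=%d" % len(data)).encode()
    if len(data) < THRESHOLD:
        ram_store[path] = data
    else:
        disk_store[path] = data

def read_file(path):
    """Look in RAM first, then fall back to disk."""
    if path in ram_store:
        return ram_store[path]
    return disk_store.get(path)
```

A dynamic version of this policy would adjust `THRESHOLD` at runtime, which is exactly the future work the authors mention.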
EXPLOITING GRAY-BOX KNOWLEDGE OF BUFFER-CACHE MANAGEMENT
Nathan C. Burnett, John Bent, Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau, University of Wisconsin, Madison
Knowing what algorithm is used to manage the buffer cache is very important for improving application performance, but there is currently no interface for discovering it. This paper introduces Dust, a simple fingerprinting tool that is able to identify the buffer-cache replacement policy. Dust automatically identifies the cache size and replacement policy by configuring the attributes of its accesses: ordering, recency, frequency, and long-term history. Through simulation, Dust was able to distinguish between a variety of replacement policies found in the literature: FIFO, LRU, LFU, Clock, Segmented FIFO, 2Q, and LRU-K. Further experiments fingerprinting real operating systems such as NetBSD, Linux, and Solaris show that Dust is able to identify the hidden cache replacement algorithm.
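The fingerprinting idea can be illustrated with a toy simulation (an illustrative reconstruction of the approach, not Dust's actual probe sequences): craft a reference string whose hit/miss pattern differs under candidate policies, then match the observed pattern against each candidate.

```python
from collections import OrderedDict, deque

def simulate(policy, size, refs):
    """Return the hit/miss pattern (1 = hit) a cache with the given
    replacement policy produces on a reference string."""
    hits = []
    if policy == "LRU":
        cache = OrderedDict()
        for b in refs:
            if b in cache:
                cache.move_to_end(b)          # refresh recency
                hits.append(1)
            else:
                hits.append(0)
                cache[b] = True
                if len(cache) > size:
                    cache.popitem(last=False) # evict least recently used
    elif policy == "FIFO":
        q, present = deque(), set()
        for b in refs:
            if b in present:
                hits.append(1)                # hits do not change FIFO order
            else:
                hits.append(0)
                q.append(b)
                present.add(b)
                if len(q) > size:
                    present.discard(q.popleft())  # evict oldest insertion
    return hits

# Probe: fill a size-3 cache with 0,1,2; touch 0 again (raising its
# recency but not its insertion order); insert 3 to force an eviction;
# then re-access 0. LRU keeps 0, FIFO has evicted it.
probe = [0, 1, 2, 0, 3, 0]
lru = simulate("LRU", 3, probe)
fifo = simulate("FIFO", 3, probe)
```

The final access distinguishes the two policies; Dust generalizes this by also varying frequency and long-term history to separate policies like LFU and 2Q.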
By knowing the underlying cache replacement policy, a cache-aware Web server can reschedule requests under a cached-requests-first policy to obtain a performance improvement. Experiments show that cache-aware scheduling improves average response time and system throughput.
OPERATING SYSTEMS (AND DANCING BEARS)
Summarized by Matt Butner
THE JX OPERATING SYSTEM
Michael Golm, Meik Felser, Christian Wawersich, and Jürgen Kleinöder, University of Erlangen-Nürnberg
The talk opened by emphasizing the need for, and the practicality of, a functional Java OS. The JX OS implementation attempts to mimic the recent trend toward using highly abstracted languages in application development in order to create an OS as functional and powerful as one written in a lower-level language.
JX is a microkernel solution that uses separate JVMs for each entity of the kernel and, in some cases, for each application. The separated domains do not share objects and have no thread migration, and each domain implements its own code. The interesting part of the presentation was the discussion of system-level programming with Java; key areas discussed were memory management and interrupt handlers. The authors concluded by noting that their type-safe and modular approach resulted in a robust system with great configuration flexibility and acceptable performance.
DESIGN EVOLUTION OF THE EROS SINGLE-LEVEL STORE
Jonathan S. Shapiro, Johns Hopkins University; Jonathan Adams, University of Pennsylvania
This presentation outlined the current characteristics of file systems, including some of their least desirable ones, and was based on the revival of the EROS single-level-store approach in current attempts to capitalize on system consistency and efficiency. Their solution did not relate directly to EROS but rather to the design and use of the constructs and notions on which EROS is based. By extending the mapping of the memory system to include the disk, systems are further able to ensure global persistence without regard for disk structure. In explaining the need for absolute system consistency and an environment which, by definition, is not allowed to crash, the goal of having such an exhaustive design becomes clear. The cool part of this work is the availability of its design and architecture to the public.
THINK: A SOFTWARE FRAMEWORK FOR COMPONENT-BASED OPERATING SYSTEM KERNELS
Jean-Philippe Fassino, France Télécom R&D; Jean-Bernard Stefani, INRIA; Julia Lawall, DIKU; Gilles Muller, INRIA
This presentation discussed the need for component-based operating systems and corresponding structures to ensure flexibility for arbitrary-sized systems. Think provides a binding model of uniform components for OS developers and architects to follow, ensuring consistent implementations at any system size. However, the goal is not to force developers into a predefined kernel but to promote the use of certain components in varied ways.
Think's concentration is primarily on embedded systems, where short development time is necessary but is constrained by rigorous needs and limited resources. Building flexible systems in such an environment can be costly and implementation-specific, but Think creates an environment supported by the ability to dynamically load type-safe components. This allows for more flexible systems that retain functionality, thanks to Think's ability to bind fine-grained components. Benchmarks revealed that dedicated microkernels can show performance improvements over monolithic kernels. Notable future work includes building components for low-end appliances and developing a real-time OS component library.
BUILDING SERVICES
Summarized by Pradipta De
NINJA: A FRAMEWORK FOR NETWORK SERVICES
J. Robert von Behren, Eric A. Brewer, Nikita Borisov, Michael Chen, Matt Welsh, Josh MacDonald, Jeremy Lau, and David Culler, University of California, Berkeley
Robert von Behren presented Ninja, an ongoing project that aims to provide a framework for building robust and scalable Internet-based network services like Web-hosting, instant-messaging, email, and file-sharing applications. Robert drew attention to the difficulties of writing cluster applications: one has to take care of data consistency and issues of concurrency, and robust applications raise problems related to fault tolerance. Ninja works as a wrapper to relieve the user of these problems. The goal of the project is building network services that are scalable and highly available, maintain persistent data, provide graceful degradation, and support online evolution.
The use of clusters distinguishes this setup from generic distributed systems in terms of reliability and security, as well as providing a partition-free network. The next important feature of this project is the new programming model, which is more restrictive than a general programming model but still expressive enough to write most applications. This model, described as "single program, multiple connection," uses intertask parallelism instead of multithreaded concurrency and is characterized by all nodes running the same program, with connections delegated to the nodes by a centralized connection manager (CM). The CM takes care of hiding the details of mapping an external connection to an internal node. Ninja also uses a design called "staged event-driven architecture," where each service is broken down into a set of stages connected by event queues. This architecture is suitable for a modular design and helps in graceful degradation by adaptive load shedding from the event queues.
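The stage-and-queue structure with load shedding can be sketched as follows. The class names, the fixed queue limit, and the drop-on-overload rule are illustrative assumptions, not Ninja's actual API:

```python
from collections import deque

class Stage:
    """One stage of a staged event-driven service: an event queue plus a
    handler. When the queue grows past `limit`, new events are shed."""
    def __init__(self, handler, limit=4):
        self.queue = deque()
        self.handler = handler
        self.limit = limit
        self.shed = 0  # count of dropped events

    def enqueue(self, event):
        if len(self.queue) >= self.limit:
            self.shed += 1            # load shedding: drop under overload
        else:
            self.queue.append(event)

    def run(self, next_stage=None):
        """Drain the queue, forwarding handler output to the next stage."""
        while self.queue:
            out = self.handler(self.queue.popleft())
            if next_stage is not None:
                next_stage.enqueue(out)

# Two-stage pipeline: normalize the request, then build a reply.
results = []
parse = Stage(lambda req: req.upper())
reply = Stage(lambda req: results.append("OK:" + req))

for r in ["a", "b", "c", "d", "e", "f"]:  # 6 requests against a limit of 4
    parse.enqueue(r)
parse.run(reply)
reply.run()
```

Under overload, the excess requests are shed at the first queue rather than degrading every request in flight, which is the graceful-degradation behavior the summary describes.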
The presentation concluded with examples of the implementation and evaluation of a Web server and an email system called NinjaMail, to show the ease of authoring and the efficacy of using the Ninja framework for service development.
USING COHORT SCHEDULING TO ENHANCE SERVER PERFORMANCE
James R. Larus and Michael Parkes, Microsoft Research
James Larus presented a new scheduling policy for increasing the throughput of server applications. Cohort scheduling batches the execution of similar operations arising in different server requests. The usual programming paradigm for handling server requests is to spawn multiple concurrent threads and switch from one thread to another whenever a thread gets blocked for I/O. Since the threads in a server mostly execute unrelated pieces of code, the locality of reference is reduced, and with it the effectiveness of the various caching mechanisms. One way to solve this problem is to throw in more hardware, but Larus presented a complementary view in which program behavior is investigated and used to improve performance.
The problem in this scheme is to identify pieces of code that can be batched together for processing. One simple way is to look at the next program-counter values and use them to group threads together. However, this talk presented a new programming abstraction, "staged computation," which replaces the thread model with "stages." A stage is an abstraction for operations with similar behavior and locality. The StagedServer library, a collection of C++ classes that implement staged computation and cohort scheduling on either a uniprocessor or a multiprocessor, can be used to program in this model and to define stages.
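The scheduling idea itself is compact enough to sketch (in Python rather than the StagedServer C++ API; the stage names and request data are invented): instead of interleaving requests thread-style, all requests execute one stage as a cohort before any request moves to the next stage, so one stage's code and data stay hot in the cache.

```python
def cohort_schedule(requests, stages):
    """Run each stage over every request before moving to the next stage.

    requests: dict of request-id -> initial value
    stages:   list of (name, fn) pairs applied in order
    Returns the final values and an execution trace showing cohort order.
    """
    trace = []
    values = dict(requests)
    for name, fn in stages:          # one cohort per stage
        for rid in values:           # batch: same code runs back-to-back
            values[rid] = fn(values[rid])
            trace.append((name, rid))
    return values, trace

# Two requests flowing through a parse stage and a handle stage.
stages = [("parse", str.strip), ("handle", str.upper)]
values, trace = cohort_schedule({1: " hi ", 2: " bye "}, stages)
```

A thread-per-request server would interleave parse and handle for different requests; the trace here shows both parse operations completing before either handle runs, which is the locality benefit cohort scheduling targets.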
The presentation showed an experimental evaluation of cohort scheduling against the thread-based model by implementing two servers: a Web server, which is mainly I/O bound, and a publish-subscribe server, which is mainly compute bound. The SURGE benchmark was used for the first experiment and the Fabret workload for the second. The results showed that the cohort-scheduling-based implementation gave better throughput than the thread-based implementation at high loads.
NETWORK PERFORMANCE
Summarized by Xiaobo Fan
ETE: PASSIVE END-TO-END INTERNET SERVICE PERFORMANCE MONITORING
Yun Fu and Amin Vahdat, Duke University; Ludmila Cherkasova and Wenting Tang, HP Labs
This paper won the Best Student Paper award.
Ludmila Cherkasova began by listing several questions most Web service providers want answered in order to improve service quality. She reviewed the difficulties of making accurate and efficient end-to-end Web service measurements and the shortcomings of currently available approaches. The authors propose a passive trace-based architecture called EtE to monitor Web server performance on behalf of end users.
The first step is to collect network packets passively. The second module reconstructs all TCP connections and extracts HTTP transactions. To reconstruct Web page accesses, they first build a knowledge base indexed by client IP and URL and then group objects into the related Web pages in which they are embedded. Statistical analysis is used to handle non-matched objects. EtE Monitor can generate three groups of metrics to measure Web service performance: response time, Web caching, and page abortion.
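The reconstruction step can be sketched as follows. This is a deliberately simplified stand-in for EtE's knowledge base: here a page access starts at an HTML object, and embedded objects from the same client within a short time window are attached to it (the window length and grouping rule are assumptions for illustration):

```python
WINDOW = 2.0  # seconds between objects of the same page access (illustrative)

def group_pages(transactions):
    """Group per-object HTTP transactions into Web-page accesses.

    transactions: list of (timestamp, client_ip, url, content_type).
    Returns a list of page dicts, each with the client, root URL, and
    the objects attributed to that page access.
    """
    pages = []
    open_pages = {}  # client_ip -> currently open page access
    for ts, ip, url, ctype in sorted(transactions):
        page = open_pages.get(ip)
        # A new HTML object, an unknown client, or a long gap starts a new page.
        if ctype == "text/html" or page is None or ts - page["last"] > WINDOW:
            page = {"client": ip, "root": url, "objects": [], "last": ts}
            open_pages[ip] = page
            pages.append(page)
        page["objects"].append(url)
        page["last"] = ts
    return pages
```

Objects that cannot be attributed this way are the "non-matched" cases the summary mentions, which EtE handles with statistical analysis.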
To demonstrate the benefits of EtE Monitor, Cherkasova discussed two case studies and, based on the calculated metrics, gave some insightful explanations of what is happening behind the variations in Web performance. Through validation, they claim their approach provides a very close approximation of the real scenario.
THE PERFORMANCE OF REMOTE DISPLAY MECHANISMS FOR THIN-CLIENT COMPUTING
S. Jae Yang, Jason Nieh, Matt Selsky, and Nikhil Tiwari, Columbia University
Noting the trend toward thin-client computing, the authors compared different techniques and design choices by measuring the performance of six popular thin-client platforms: Citrix MetaFrame, Microsoft Terminal Services, Sun Ray, Tarantella, VNC, and X. After pointing out several challenges in benchmarking thin clients, Yang proposed slow-motion benchmarking to achieve non-invasive packet monitoring and consistent visual quality. Basically, they insert delays between separate visual events in the benchmark applications on the server side, so that the client's display update can catch up with the server's processing speed. The experiments were conducted on an emulated network over a range of network bandwidths.
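The pacing idea behind slow-motion benchmarking can be sketched in a few lines. The delay value and function names here are assumptions for illustration; real runs would use gaps long enough for each display update to finish on the client:

```python
import time

EVENT_DELAY = 0.01  # seconds between visual events; illustrative only

def run_slow_motion(events, render):
    """Trigger each visual event, then pause so the client-side display
    update completes before the next event starts. The pause makes the
    packet trace separable into per-event segments."""
    timestamps = []
    for ev in events:
        render(ev)                        # server-side visual event
        timestamps.append(time.monotonic())
        time.sleep(EVENT_DELAY)           # let the client catch up
    return timestamps

stamps = run_slow_motion(["page1", "page2", "page3"], lambda ev: None)
gaps = [b - a for a, b in zip(stamps, stamps[1:])]
```

Because each event's network traffic is isolated by the enforced gap, per-event latency and data volume can be read directly off a passive packet capture.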
Their results show that thin clients can provide good performance for Web applications in LAN environments, but only some platforms performed well on the video benchmark. Pixel-based encoding may achieve better performance and bandwidth efficiency than high-level graphics. Display caching and compression should be used with care.
A MECHANISM FOR TCP-FRIENDLY TRANSPORT-LEVEL PROTOCOL COORDINATION
David E. Ott and Ketan Mayer-Patel, University of North Carolina, Chapel Hill
A revised transport-level protocol optimized for cluster-to-cluster (C-to-C) applications is what David Ott tried to explain in his talk. A C-to-C application class is identified as one set of processes communicating with another set of processes across a common Internet path. The fundamental problem with C-to-C applications is how to coordinate all C-to-C communication flows so that they share a consistent view of the common C-to-C network, adapt to changing network conditions, and cooperate to meet specific requirements. Aggregation points (APs) are placed at the first and last hop of the common data path to probe network conditions (latency, bandwidth, loss rate, etc.). To carry and transfer this information, a new protocol – the Coordination Protocol (CP) – is inserted between the network layer (IP) and the transport layer (TCP, UDP, etc.). Ott illustrated how the CP header is updated and used as packets originate from the source and traverse the local and remote APs to arrive at their destination, and how an AP maintains a per-cluster state table and detects network conditions. Through simulation results, this coordination mechanism appears effective in sharing common network resources among C-to-C communication flows.
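The flow of information through an AP can be sketched as below. The field names and the state-table shape are invented for illustration; the actual CP header format is defined in the paper:

```python
from dataclasses import dataclass

@dataclass
class CPHeader:
    """A small header carried between the network and transport layers,
    stamped by aggregation points with observed path conditions so every
    C-to-C flow sees the same view of the shared path."""
    latency_ms: float = 0.0
    bandwidth_kbps: float = 0.0
    loss_rate: float = 0.0

class AggregationPoint:
    """First/last-hop AP: keeps per-cluster state from probing and
    stamps outgoing packets with the latest measured conditions."""
    def __init__(self):
        self.state = CPHeader()

    def observe(self, latency_ms, bandwidth_kbps, loss_rate):
        """Record freshly probed conditions for the common path."""
        self.state = CPHeader(latency_ms, bandwidth_kbps, loss_rate)

    def stamp(self):
        """Return a CP header reflecting the current path state."""
        return CPHeader(self.state.latency_ms,
                        self.state.bandwidth_kbps,
                        self.state.loss_rate)

ap = AggregationPoint()
ap.observe(40.0, 1500.0, 0.01)
hdr = ap.stamp()
```

Every flow crossing the AP receives the same stamped view, which is what lets the flows adapt consistently instead of each probing the path independently.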
STORAGE SYSTEMS
Summarized by Praveen Yalagandula
MY CACHE OR YOURS? MAKING STORAGE MORE EXCLUSIVE
Theodore M. Wong, Carnegie Mellon University; John Wilkes, HP Labs
Theodore Wong explained the inefficiency of current "inclusive" caching schemes in storage area networks: when a client accesses a block, the block is read from the disk and cached both in the disk array cache and in the client's cache. He then presented the concept of "exclusive" caching, where a block accessed by a client is cached only in that client's cache; on eviction from the client's cache, a new DEMOTE operation moves the data block to the tail of the array cache.
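A minimal sketch of the DEMOTE idea (the idea only, not the paper's exact protocol) looks like this: a block lives in the client cache alone, and the array cache receives blocks only when the client evicts them.

```python
from collections import OrderedDict

class LRUCache:
    """LRU list: index 0 is the eviction end, the last item is the tail."""
    def __init__(self, size):
        self.size = size
        self.blocks = OrderedDict()

    def access(self, b):
        hit = b in self.blocks
        if hit:
            self.blocks.move_to_end(b)  # refresh recency
        return hit

    def insert_tail(self, b):
        """Insert at the MRU tail; return an evicted block, if any."""
        self.blocks[b] = True
        self.blocks.move_to_end(b)
        if len(self.blocks) > self.size:
            victim, _ = self.blocks.popitem(last=False)
            return victim
        return None

def read(block, client, array):
    """Exclusive caching with DEMOTE: at most one cached copy exists."""
    if client.access(block):
        return "client hit"
    if array.access(block):
        del array.blocks[block]         # exclusivity: remove the array copy
        where = "array hit"
    else:
        where = "disk read"
    victim = client.insert_tail(block)
    if victim is not None:
        array.insert_tail(victim)       # DEMOTE the evicted block
    return where

client, array = LRUCache(2), LRUCache(2)
results = [read(b, client, array) for b in [1, 2, 3, 1]]
```

Reading block 3 evicts block 1 from the two-slot client cache and demotes it to the array, so the final re-read of block 1 hits in the array instead of going to disk: the aggregate cache behaves like one larger cache.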
The new exclusive caching schemes were evaluated on both single-client and multiple-client systems. The single-client results are presented for two different types of synthetic workloads: Random (transaction-processing-type workloads) and Zipf (typical Web workloads). The exclusive policy was quite effective in achieving higher hit rates and lower read latencies. They also showed a 2.2-times hit-rate improvement over inclusive techniques for a real-life workload, httpd, in a single-client setting.
The DEMOTE scheme also performed well on multiple-client systems when the data accessed by clients is disjoint; for shared-data workloads, it performed worse than the inclusive schemes. Theodore then presented an adaptive exclusive caching scheme: a block accessed by a client is placed at the tail of the client's cache and also in the array cache, at a position determined by the popularity of the block. Popularity is measured by maintaining a ghost cache that accumulates the number of times each block is accessed. This new adaptive scheme achieved a maximum 52% speedup in mean latency in experiments with real-life workloads.
For more information, visit http://www.cs.cmu.edu/~tmwong/research and http://www.hpl.hp.com/research/itc/csl/ssp.
BRIDGING THE INFORMATION GAP IN STORAGE PROTOCOL STACKS
Timothy E. Denehy, Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau, University of Wisconsin, Madison
Currently there is a huge information gap between storage systems and file systems. The interface storage systems expose to file systems, based on blocks and providing only read/write operations, is very narrow. This leads to poor performance as a whole, because of duplicated functionality in both systems and reduced functionality resulting from the storage system's lack of file information.
The speaker presented two enhancements: ExRAID, an exposed RAID, and ILFS, an informed log-structured file system. ExRAID is an enhancement to the block-based RAID storage system that exposes three types of information to the file system: (1) regions – contiguous portions of the address space comprising one or multiple disks; (2) performance information about queue lengths and throughput of the regions, revealing disk heterogeneity to the file system; and (3) failure information – dynamically updated information conveying the number of tolerable failures in each region.
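The kind of information ExRAID exposes, and one way a file system might use it, can be sketched as below. The field names, numbers, and placement rule are invented for illustration, not taken from the paper:

```python
from dataclasses import dataclass

@dataclass
class RegionInfo:
    """Per-region information an exposed RAID might surface:
    address range, performance signals, and failure tolerance."""
    start_block: int
    end_block: int
    queue_length: int        # current load on this region's disks
    throughput_mbps: float
    tolerable_failures: int  # 0 = no redundancy left

regions = [
    RegionInfo(0, 999_999, 2, 55.0, 1),             # fast, mirrored
    RegionInfo(1_000_000, 1_999_999, 9, 20.0, 0),   # slower, unprotected
]

def pick_region_for_write(regions):
    """An informed file system can place new data on the least-loaded
    region -- a simplified form of ILFS's dynamic parallelism."""
    return min(regions, key=lambda r: r.queue_length)

def pick_region_for_redundancy(regions, needed_failures):
    """Or place a file needing redundancy only where failures are tolerable."""
    return [r for r in regions if r.tolerable_failures >= needed_failures]
```

With the block interface alone, neither decision is possible, because queue lengths and redundancy are hidden below the read/write boundary.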
ILFS allows online incremental expansion of the storage space; performs dynamic parallelism, using ExRAID's performance information to perform well on heterogeneous storage systems; and provides a range of mechanisms with different granularities for file redundancy, using ExRAID's region failure characteristics. These new features are added to LFS with only a 19% increase in code size.
More information: http://www.cs.wisc.edu/wind.
MAXIMIZING THROUGHPUT IN REPLICATED DISK STRIPING OF VARIABLE BIT-RATE STREAMS
Stergios V. Anastasiadis, Duke University; Kenneth C. Sevcik and Michael Stumm, University of Toronto
There is an increasing demand for the continuous real-time streaming of media files. Disk striping is a common technique for supporting many connections on media servers, but even with 1.2 million hours of mean time between failures, there will be more than one disk failure per week in a system with 7,000 disks. Fault tolerance can be achieved by using data replication and reserving extra bandwidth during normal operation. This work focused on supporting the most common variable bit-rate stream formats (e.g., MPEG).
The authors used the prototype media server EXEDRA to implement and evaluate different replication and bandwidth-reservation schemes. The system supports a variable bit-rate scheme, does stride-based disk allocation, and is capable of supporting variable-grain disk striping. Two replication policies are presented: deterministic, in which data blocks are replicated in a round-robin fashion on the secondary replicas, and random, in which data blocks are replicated on randomly chosen secondary replicas. Two bandwidth-reservation techniques are also presented: mirroring reservation, in which disk bandwidth is reserved for both the primary and the replicas of the media file during playback, and minimum reservation, a more efficient scheme in which bandwidth is reserved only for the sum of the primary data access time and the maximum of the backup data access times required in each round.
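The difference between the two reservation policies is a one-line formula each; the per-round access times below are made-up numbers for one stream, used only to show why minimum reservation is cheaper:

```python
# Per-round disk access times (seconds) for one stream's primary data
# and for its backup replicas -- illustrative values only.
primary = [0.30, 0.25, 0.40]
backups = [[0.28, 0.10],
           [0.05, 0.20],
           [0.35, 0.15]]

# Mirroring reservation: reserve bandwidth for the primary AND every replica.
mirroring = [p + sum(b) for p, b in zip(primary, backups)]

# Minimum reservation: reserve for the primary plus only the worst-case
# backup access time in each round.
minimum = [p + max(b) for p, b in zip(primary, backups)]
```

Since `max(b) <= sum(b)` always holds, minimum reservation never reserves more than mirroring, which is where its throughput advantage comes from.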
Experimental results showed that deterministic replica placement is better than random placement for small disks, that minimum disk bandwidth reservation achieves twice the throughput of mirroring, and that fault tolerance can be achieved with minimal impact on throughput.
TOOLS
Summarized by Li Xiao
SIMPLE AND GENERAL STATISTICAL PROFILING WITH PCT
Charles Blake and Steven Bauer, MIT
This talk introduced the Profile Collection Toolkit (PCT), a sampling-based CPU profiling facility. A novel aspect of PCT is that it allows sampling of semantically rich data such as function call stacks, function parameters, local or global variables, CPU registers, and other execution contexts. The design objectives of PCT were driven by user needs and by the inadequacies or inaccessibility of prior systems.
The rich data-collection capability is achieved via a debugger-controller program, dbctl. The talk then introduced the profiling collector and data-collection activation. PCT's file-format options provide flexibility and various ways to store exact states. PCT also has analysis options; two examples given were "Simple" and "Mixed User and Linux Code."
The talk then went into how general sampling helps in the following areas: a debugger-controller state machine, example script fragments, a simple example program, and general value reports in terms of call-path histograms, numeric value histograms, and data generality. Related work, such as gprof and Expect, was then summarized. The contribution of this work is to provide a portable, general value-sampling tool. The limitations are that PCT does not cope well with stripped binaries, and that count inaccuracies occur because of its statistical nature. Given these limitations, compiling with -g is preferred over stripping.
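The core idea of sampling call paths into a histogram can be shown with a toy sketch. PCT itself drives a real debugger to capture such state from compiled programs; the Python below only illustrates the shape of the data it collects, and all names are invented.

```python
# Toy sketch of sampling-based call-path profiling: each "sample"
# records the current stack as a call path, and the histogram of
# paths approximates where time is spent.
import sys
from collections import Counter

paths = Counter()

def sample_stack():
    """Record the caller's call path, innermost frame first."""
    frame = sys._getframe(1)
    path = []
    while frame is not None:
        path.append(frame.f_code.co_name)
        frame = frame.f_back
    paths[" <- ".join(path)] += 1

def inner():
    sample_stack()  # in a real profiler a timer triggers this

def outer():
    for _ in range(3):
        inner()

outer()
```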
Possible future directions include supporting more debuggers, such as dbx and xdb; script generators for other kinds of traces; more canned reports for general values; and a libgdb-based library sampler. PCT runs on almost any UNIX-like system and is available at http://pdos.lcs.mit.edu/pct.
ENGINEERING A DIFFERENCING AND COMPRESSION DATA FORMAT
David G. Korn and Kiem-Phong Vo, AT&T Laboratories – Research
This talk began with an equation: “Differencing + Compression = Delta Compression.” Compression removes information redundancy in a data set; examples are gzip, bzip, compress, and pack. Differencing encodes the differences between two data sets; examples are diff -e, fdelta, bdiff, and xdelta. Delta compression compresses a data set given another one, combining differencing and compression, and reduces to pure compression when there is no commonality.
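The equation can be demonstrated with standard-library tools. This is an illustrative sketch only – Vcdiff's encoding is far more compact than a unified diff – but it shows why commonality between versions shrinks the encoding.

```python
# Differencing + compression = delta compression, in miniature:
# encode a new version against an old one, then compress the delta.
import difflib
import zlib

old = "".join(f"record {i}: value {i * i}\n" for i in range(200))
new = old.replace("record 10:", "record 10 (updated):")

# Differencing: a unified diff of the two versions.
delta = "".join(difflib.unified_diff(
    old.splitlines(True), new.splitlines(True))).encode()

plain = zlib.compress(new.encode())  # pure compression of the new file
delta_comp = zlib.compress(delta)    # delta compression against the old
```

With high commonality the compressed delta is much smaller than even the compressed new file; with no commonality the delta degenerates to roughly the new file itself, so the scheme "reduces to pure compression."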
After presenting an overall scheme for a delta compressor and showing delta compression performance, the talk discussed the encoding format of the newly designed Vcdiff for delta compression. A Vcdiff instruction code table consists of 256 entries, each coding up to a pair of instructions, and encodes 1-byte indices and any additional data.
The talk then showed Vcdiff’s performance on Web data collected from CNN, compared with the W3C standard Gdiff, Gdiff+gzip, and diff+gzip, where Gdiff was computed using Vcdiff delta instructions. The results of two experiments, “First” and “Successive,” were presented. In “First,” each file is compressed against the first file collected; in “Successive,” each file is compressed against the one from the previous hour. diff+gzip did not work well because diff is line-oriented; Vcdiff performed favorably compared with the other formats.
Vcdiff is part of the Vcodex package. Vcodex is a platform for all common data transformations: delta compression, plain compression, encryption, and transcoding (e.g., uuencode, base64). It is structured in three layers for maximum usability; the base library uses Discipline and Method interfaces.
The code for Vcdiff can be found at http://www.research.att.com/sw/tools. The authors are moving Vcdiff toward an IETF standard as a comprehensive platform for transforming data; see http://www.ietf.org/internet-drafts/draft-korn-vcdiff-06.txt.
WHERE IN THE NET
Summarized by Amit Purohit
A PRECISE AND EFFICIENT EVALUATION OF THE PROXIMITY BETWEEN WEB CLIENTS AND THEIR LOCAL DNS SERVERS
Zhuoqing Morley Mao, University of California, Berkeley; Charles D. Cranor, Fred Douglis, Michael Rabinovich, Oliver Spatscheck, and Jia Wang, AT&T Labs – Research
Content Distribution Networks (CDNs) attempt to improve Web performance by delivering Web content to end users from servers located at the edge of the network. When a Web client requests content, the CDN dynamically chooses a server to route the request to – usually the
one that is nearest the client. As CDNs have access only to the IP address of the client’s local DNS server (LDNS), the CDN’s authoritative DNS server maps the client’s LDNS to a geographic region within a particular network and combines that with network and server load information to perform CDN server selection.
This method has two limitations. First, it is based on the implicit assumption that clients are close to their LDNS, which may not always be valid. Second, a single request from an LDNS can represent a differing number of Web clients – called the hidden load factor. Knowledge of the hidden load factor can be used to achieve better load distribution.
The talk dealt with the extent of the first limitation and its impact on CDN server selection. To determine the associations between clients and their LDNS, a simple, non-intrusive, and efficient mapping technique was developed. The data collected was used to study the impact of proximity on DNS-based server selection using four different proximity metrics: (1) autonomous system (AS) clustering – observing whether a client is in the same AS as its LDNS – concluded that LDNS is very good for coarse-grained server selection, as 64% of the associations belong to the same AS; (2) network clustering – observing whether a client is in the same network-aware cluster (NAC) as its LDNS – implied that DNS is less useful for finer-grained server selection, since only 16% of client–LDNS pairs are in the same NAC; (3) traceroute divergence – the length of the divergent paths to the client and its LDNS from a probe point, using traceroute – implies that most clients are topologically close to their LDNS as viewed from a randomly chosen probe site; (4) round-trip time (RTT) correlation (some CDNs select servers based on RTT between the CDN server and the client’s LDNS) examines the correlation between the message RTTs from a probe point to the client and to its local DNS; the results imply this is a reasonable metric to use to avoid really distant servers.
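The AS-clustering metric reduces to a simple count over client/LDNS pairs. The sketch below uses a fabricated address-to-AS mapping purely for illustration; a real study would resolve addresses against BGP routing tables.

```python
# Sketch of the AS-clustering proximity metric: what fraction of
# client/LDNS pairs share an origin AS? (Mapping and pairs are
# made up; the paper reports roughly 64% for real traces.)
ip_to_as = {
    "10.0.0.1": 7018, "10.0.0.9": 7018,  # client and its LDNS: same AS
    "10.1.0.1": 701,  "10.1.0.9": 3356,  # different ASes
    "10.2.0.1": 174,  "10.2.0.9": 174,
}

pairs = [("10.0.0.1", "10.0.0.9"),
         ("10.1.0.1", "10.1.0.9"),
         ("10.2.0.1", "10.2.0.9")]

same_as = sum(1 for client, ldns in pairs
              if ip_to_as[client] == ip_to_as[ldns])
fraction = same_as / len(pairs)
```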
A study of the impact that client–LDNS associations have on DNS-based server selection concludes that knowing the client’s IP address would allow more accurate server selection in a large number of cases. The optimality of the server selection also depends on the server density, placement, and selection algorithms.
Further information can be found at http://www.eecs.berkeley.edu/~zmao/myresearch.html or by contacting zmao@eecs.berkeley.edu.
GEOGRAPHIC PROPERTIES OF INTERNET ROUTING
Lakshminarayanan Subramanian, Venkata N. Padmanabhan, Microsoft Research; Randy H. Katz, University of California at Berkeley
Geographic information can provide insights into the structure and functioning of the Internet, including interactions between different autonomous systems, by analyzing certain properties of Internet routing. It can be used to measure and quantify routing properties such as circuitous routing, hot-potato routing, and geographic fault tolerance.
Traceroute was used to gather the required data, and the GeoTrack tool was used to determine the location of the nodes along each network path. This enables the computation of “linearized distance,” which is the sum of the geographic distances between successive pairs of routers along the path.
In order to measure the circuitousness of a path, a metric called the “distance ratio” has been defined as the ratio of the linearized distance of a path to the geographic distance between the source and destination of the path. From the data, it has been observed that the circuitousness of a route depends on both the geographic and network locations of the end hosts. A large value of the distance ratio enables us to flag paths that are highly
circuitous, possibly (though not necessarily) because of routing anomalies. It is also shown that the minimum delay between end hosts is strongly correlated with the linearized distance of the path.
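The two distances behind the metric are straightforward to compute from router coordinates. The sketch below uses the haversine great-circle formula and invented coordinates for a path that detours through a third city.

```python
# Sketch of the "distance ratio": linearized distance of a path (sum
# of great-circle hops between successive routers) divided by the
# direct great-circle distance between its endpoints.
from math import radians, sin, cos, asin, sqrt

def great_circle_km(a, b):
    """Haversine distance between two (lat, lon) points, in km."""
    lat1, lon1, lat2, lon2 = map(radians, (*a, *b))
    h = sin((lat2 - lat1) / 2) ** 2 + \
        cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371 * asin(sqrt(h))

def distance_ratio(path):
    linearized = sum(great_circle_km(path[i], path[i + 1])
                     for i in range(len(path) - 1))
    direct = great_circle_km(path[0], path[-1])
    return linearized / direct

# An illustrative path from San Francisco to Boston via Dallas.
path = [(37.77, -122.42), (32.78, -96.80), (42.36, -71.06)]
ratio = distance_ratio(path)
```

A ratio near 1 means the path closely tracks the great circle; large values flag circuitous routes.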
Geographic information can be used to study various aspects of wide-area Internet paths that traverse multiple ISPs. It was found that end-to-end Internet paths tend to be more circuitous than intra-ISP paths, the cause being the peering relationships between ISPs. Also, paths traversing substantial distances within two or more ISPs tend to be more circuitous than paths largely traversing only a single ISP. Another finding is that ISPs generally employ hot-potato routing.
Geographic information can also be used to capture the fact that two seemingly unrelated routers can be susceptible to correlated failures. Using the geographic information of routers, we can construct a geographic topology of an ISP and use it to find the tolerance of an ISP’s network to the total network failure of a geographic region.
For further information, contact lakme@cs.berkeley.edu.
PROVIDING PROCESS ORIGIN INFORMATION TO AID IN NETWORK TRACEBACK
Florian P. Buchholz, Purdue University; Clay Shields, Georgetown University
Network traceback is currently limited because host audit systems do not maintain enough information to match incoming network traffic to outgoing network traffic. The talk presented an alternative: assigning origin information to every process and logging it during interactive login creation.
The current implementation concentrates mainly on interactive sessions, in which an event is logged when a new connection is established using SSH or Telnet. The method is effective: it could successfully determine stepping stones and reliably detect the source of a DDoS attack. The speaker then talked about the related work done in this area,
which led to a discussion of the latest developments in the area of “host causality.”
The speaker ended with the limitations and future directions of the framework. This framework could be extended and could find many applications in the area of security. The current implementation does not handle startup scripts and cron jobs, but incorporating the origin information into the file system could solve this problem. In the current implementation, logging is implemented only as a proof of concept; it could be hardened in several ways, and this could be another important aspect of future work.
PROGRAMMING
Summarized by Amit Purohit
CYCLONE: A SAFE DIALECT OF C
Trevor Jim, AT&T Labs – Research; Greg Morrisett, Dan Grossman, Michael Hicks, James Cheney, and Yanling Wang, Cornell University
Cyclone is designed to provide protection against attacks such as buffer overflows, format string attacks, and memory management errors. The current structure of C allows programmers to write vulnerable programs. Cyclone extends C so that it has the safety guarantees of Java while keeping C’s syntax, types, and semantics largely untouched.
The Cyclone compiler performs static analysis of the source code and inserts runtime checks into the compiled output at places where the analysis cannot determine that the code execution will not violate safety constraints. Cyclone imposes many restrictions to preserve safety, such as NULL checks that do not exist in normal C.
The speaker then discussed some sample code written in Cyclone and how it tackles safety problems. Porting an existing C application to Cyclone is fairly easy, with fewer than 10% of the lines requiring change. The current implementation concentrates more on safety than on performance, so the performance penalty is substantial. Cyclone was able to find many lingering bugs. The final step of the project will be to write a compiler that converts a normal C program into an equivalent Cyclone program.
COOPERATIVE TASK MANAGEMENT WITHOUT MANUAL STACK MANAGEMENT
Atul Adya, Jon Howell, Marvin Theimer, William J. Bolosky, and John R. Douceur, Microsoft Research
The speaker described the definitions and motivations behind the distinct concepts of task management, stack management, I/O management, conflict management, and data partitioning. Conventional concurrent programming uses preemptive task management and exploits the automatic stack management of a standard language. In the second approach, cooperative tasks are organized as event handlers that yield control by returning control to the event scheduler, manually unrolling their stacks. In this project, the authors adopted a hybrid approach that makes it possible for both stack management styles to coexist: programmers can code assuming one or the other style, depending upon the application. The speaker also gave a detailed example of how to use their mechanism to switch between the two styles.
The talk then continued with the implementation. The authors were able to preempt many subtle concurrency problems by using cooperative task management; paying a cost up front to reduce a subtle race condition proved to be a good investment.
Though the choice of task management is fundamental, the choice of stack management can be left to individual taste. This project enables the use of any type of stack management in conjunction with cooperative task management.
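The control-flow idea – cooperative tasks that keep automatic stack management rather than hand-unrolled event handlers – can be illustrated with generators. The paper's hybrid model targets C/C++ code; this Python sketch only shows how a task can retain its own stack and locals across voluntary yields while a trivial scheduler interleaves tasks.

```python
# Cooperative task management with automatic stack management, in
# miniature: each task is a generator that suspends at `yield`,
# keeping its local state, and a round-robin scheduler resumes it.
from collections import deque

def task(name, steps, log):
    for i in range(steps):
        log.append(f"{name}:{i}")  # do a unit of work...
        yield                      # ...then yield control voluntarily

def run(tasks):
    ready = deque(tasks)
    while ready:
        t = ready.popleft()
        try:
            next(t)          # resume the task where it left off
            ready.append(t)  # still has work: requeue it
        except StopIteration:
            pass             # task finished

log = []
run([task("A", 2, log), task("B", 2, log)])
```

Note there is no manual "stack ripping": the loop variable `i` survives each yield automatically, which is exactly what hand-written event handlers lose.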
IMPROVING WAIT-FREE ALGORITHMS FOR INTERPROCESS COMMUNICATION IN EMBEDDED REAL-TIME SYSTEMS
Hai Huang, Padmanabhan Pillai, and Kang G. Shin, University of Michigan
The main characteristic of a real-time, time-sensitive system is its predictable response time, but concurrency management creates hurdles to achieving this goal because of the use of locks to maintain consistency. To solve this problem, many wait-free algorithms have been developed, but these are typically high-cost solutions. By taking advantage of the temporal characteristics of the system, however, the time and space overhead can be reduced.
The speaker presented an algorithm for temporal concurrency control and applied this technique to improve three wait-free algorithms. A single-writer, multiple-reader wait-free algorithm and a double-buffer algorithm were proposed.
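The double-buffer idea behind single-writer, multiple-reader wait-free communication can be sketched briefly. This is only the control structure: a real implementation needs atomic stores and memory barriers, which Python does not expose.

```python
# Sketch of a double buffer for single-writer/multiple-reader IPC:
# the writer fills the inactive buffer, then publishes it with one
# atomic index flip, so readers never observe a partial update and
# neither side ever blocks on a lock.
class DoubleBuffer:
    def __init__(self, initial):
        self.bufs = [initial, None]
        self.active = 0  # index readers currently use

    def write(self, value):
        inactive = 1 - self.active
        self.bufs[inactive] = value  # fill the buffer readers ignore
        self.active = inactive       # single "atomic" flip publishes it

    def read(self):
        return self.bufs[self.active]

db = DoubleBuffer({"t": 0})
db.write({"t": 1})
db.write({"t": 2})
```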
Using their transformation mechanism, the authors achieved an improvement of 17–66% in ACET and a 14–70% reduction in memory requirements for IPC algorithms. This mechanism is extensible and can be applied to other non-blocking IPC algorithms as well. Future work involves reducing the synchronization overheads in more general IPC algorithms with multiple-writer semantics.
MOBILITY
Summarized by Praveen Yalagandula
ROBUST POSITIONING ALGORITHMS FOR DISTRIBUTED AD-HOC WIRELESS SENSOR NETWORKS
Chris Savarese and Jan Rabaey, University of California, Berkeley; Koen Langendoen, Delft University of Technology
The PicoRadio network comprises more than a hundred sensors, monitors, and actuators equipped with wireless connectivity, with the following properties: (1) no infrastructure, (2) computation in a distributed fashion instead of centralized computation, (3) dynamic topology, (4) limited radio range,
(5) nodes that can act as repeaters, (6) devices of low cost and low power, and (7) sparse anchor nodes – nodes with GPS capability. A positioning algorithm determines the geographical position of each node in the network.
In two-dimensional space, each node needs three reference positions to estimate its geographical position. Two problems make positioning difficult in the PicoRadio setting: (1) a sparse anchor node problem and (2) a range error problem.
The two-phase approach the authors have taken consists of (1) a hop-terrain algorithm – in this first phase, each node roughly guesses its location from distances calculated using multi-hop counts to the anchor points – and (2) a refinement algorithm – in this second phase, each node uses its neighbors’ positions to refine its own position estimate. To guarantee convergence, this approach uses confidence metrics, assigning a value of 1.0 to anchor nodes and a starting value of 0.1 to other nodes, and increasing these with each iteration.
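The two-phase structure can be shown in a toy 2-D sketch: a crude initial guess, then iterative refinement against measured ranges. The hop-terrain phase estimates distances via hop counts; here we simply start from the anchor centroid, which is similarly rough, and the coordinates and step size are invented for illustration.

```python
# Toy sketch of coarse-guess-then-refine positioning: start from the
# anchor centroid, then do gradient steps on the squared range
# residuals to the anchors (noise-free ranges for simplicity).
from math import dist

anchors = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
true_pos = (4.0, 3.0)
measured = [dist(true_pos, a) for a in anchors]  # ideal range measurements

# Phase 1: rough initial estimate -- the centroid of the anchors.
x = sum(a[0] for a in anchors) / len(anchors)
y = sum(a[1] for a in anchors) / len(anchors)

# Phase 2: refine by nudging the estimate along each range residual.
for _ in range(100):
    gx = gy = 0.0
    for (ax, ay), r in zip(anchors, measured):
        d = dist((x, y), (ax, ay)) or 1e-9
        err = d - r                 # positive if we are too far out
        gx += err * (x - ax) / d
        gy += err * (y - ay) / d
    x -= 0.1 * gx
    y -= 0.1 * gy

error = dist((x, y), true_pos)
```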
The simulation tool OMNeT++ was used for both phases and in various scenarios. The results show position errors of less than 33% in a scenario with 5% anchor nodes, an average connectivity of 7, and 5% range measurement error.
Guidelines for anchor node deployment are high connectivity (>10), a reasonable fraction of anchor nodes (>5%), and, for anchor placement, covered edges.
APPLICATION-SPECIFIC NETWORK MANAGEMENT FOR ENERGY-AWARE STREAMING OF POPULAR MULTIMEDIA FORMATS
Surendar Chandra, University of Georgia, and Amin Vahdat, Duke University
The main hindrance in supporting the increasing demand for mobile multimedia on PDAs is the battery capacity of these
small devices. The idle-time power consumption is almost the same as the receive power consumption on typical wireless network cards (1319mW vs. 1425mW), while the sleep state consumes far less power (177mW). To let the system consume energy proportional to the stream quality, the network card should transition to the sleep state aggressively between packets.
The speaker presented previous work in which different multimedia streams were studied for PDAs: MS Media, RealMedia, and QuickTime. It was found that if the inter-packet gap is predictable, then huge savings are possible – for example, about 80% savings in MS Media streams with few packet losses. The limitation of the IEEE 802.11b power-save mode comes into play when there are two or more nodes, producing a delay between beacon time and when the node receives or sends packets. This badly affects the streams, since higher energy is consumed while waiting.
The authors propose traffic shaping for energy conservation, performed by proxy so that packets arrive at regular intervals. This is achieved using a local proxy in the access point and a client-side proxy in the mobile host. Simulations show that traffic shaping reduces energy consumption; they also reveal that higher-bandwidth streams have a lower energy metric (mJ/kB).
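A back-of-the-envelope model using the power numbers quoted above shows why shaping pays off: idle listening costs nearly as much as receiving, so a card that can sleep between regularly spaced packets saves most of the energy. The 2ms/98ms interval split below is an invented example, not a figure from the paper.

```python
# Rough energy model per packet interval: receive for a short burst,
# then either idle-listen (unpredictable arrivals) or sleep (shaped,
# predictable arrivals) until the next packet.
RECV_MW, IDLE_MW, SLEEP_MW = 1425, 1319, 177  # card power states

def energy_mj(recv_ms, gap_ms, sleep_between):
    gap_power = SLEEP_MW if sleep_between else IDLE_MW
    return (RECV_MW * recv_ms + gap_power * gap_ms) / 1000.0

bursty = energy_mj(recv_ms=2, gap_ms=98, sleep_between=False)
shaped = energy_mj(recv_ms=2, gap_ms=98, sleep_between=True)
saving = 1 - shaped / bursty  # fraction of energy saved by sleeping
```

With these (assumed) numbers the saving comes out around 85%, in the same ballpark as the "about 80%" figure quoted for predictable MS Media streams.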
More information is at http://greenhouse.cs.uga.edu.
CHARACTERIZING ALERT AND BROWSE SERVICES OF MOBILE CLIENTS
Atul Adya, Paramvir Bahl, and Lili Qiu, Microsoft Research
Even though there has been a dramatic increase in Internet access from wireless devices, few studies have characterized this traffic. In this paper, the authors characterize the traffic observed on the MSN Web site using both notification and browse traces.
Around 33 million browsing accesses and about 325 million notification entries are present in the traces. Three types of analyses are done for each of the two services: content analysis, concerning the most popular content categories and their distribution; popularity analysis, or the popularity distribution of documents; and user behavior analysis.
The analysis of the notification logs shows that document access rates follow a Zipf-like distribution, with most of the accesses concentrated on a small number of messages, and that accesses exhibit geographic locality – users from the same locality tend to receive similar notification content. The browser log analysis shows that a smaller set of URLs is accessed most of the time, though the access pattern does not fit any Zipf curve, and that the highly accessed URLs remain stable. A correlation study between the notification and browsing services shows that wireless users have a moderate correlation of 0.12.
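A quick way to check for a Zipf-like distribution is to rank documents by access count: under Zipf, the count of the rank-r document is roughly proportional to 1/r^alpha, so log(count) versus log(rank) is near-linear. The data below is synthetic, generated to follow Zipf with alpha = 1, purely to illustrate the test.

```python
# Sketch of a Zipf check: generate synthetic rank/count data and
# estimate the exponent alpha from the endpoints of the log-log curve.
from math import log

counts = sorted((round(10000 / r) for r in range(1, 101)), reverse=True)

# Slope of log(count) vs. log(rank) between rank 1 and rank 100.
alpha = (log(counts[0]) - log(counts[-1])) / (log(100) - log(1))
```

On real traces one would fit the slope over all ranks (e.g., by least squares) rather than just the endpoints; a poor linear fit, as reported for the browse logs, indicates the pattern is not Zipf-like.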
FREENIX TRACK SESSIONS
BUILDING APPLICATIONS
Summarized by Matt Butner
INTERACTIVE 3-D GRAPHICS APPLICATIONS FOR TCL
Oliver Kersting and Jürgen Döllner, Hasso Plattner Institute for Software Systems Engineering, University of Potsdam
The integration of 3-D image-rendering functionality into a scripting language permits interactive and animated 3-D development and application without the formalities and precision demanded by low-level C/C++ graphics and visualization libraries. The large and complex C++ API of the Virtual Rendering System (VRS) can be combined with the conveniences of the Tcl scripting language. The mapping of class interfaces is done via an automated process that generates the respective wrapper classes, which ensures complete API accessibility and functionality without any significant performance penalty. The general C++-to-Tcl mapper SWIG grants the necessary object and hierarchical abilities without the use of object-oriented Tcl extensions. The final API mapping techniques address C++ features such as classes, overloaded methods, enumerations, and inheritance relations. All are implemented in a proof of concept that maps the complete C++ API of the VRS to Tcl and are showcased in a complete interactive 3-D map system.
THE AGFL GRAMMAR WORK LAB
Cornelis H.A. Koster and Erik Verbruggen, University of Nijmegen (KUN)
The growth and implementation of Natural Language Processing (NLP) is the cornerstone of the continued evolution and implementation of truly intelligent search machines and services. In part due to the growing collections of computer-stored, human-readable documents in the public domain, the implementation of linguistic analysis will become necessary for desirable precision and resolution. Subsequently, such tools and linguistic resources must be released into the public domain, and the authors have done so with the release of the AGFL Grammar Work Lab under the GNU Public License as a tool for linguistic research and the development of NLP-based applications.
The AGFL (Affix Grammars over a Finite Lattice) Grammar Work Lab meshes context-free grammars with finite set-valued features applicable to a range of languages. In computer science terms, “syntax rules are procedures with parameters and a non-deterministic execution.” The English Phrases for Information Retrieval (EP4IR) grammar, released with the AGFL-GWL as a robust grammar of English, yields an AGFL-GWL-generated English parser that outputs “Head/Modifier” frames. The sentences “CompanyX sponsored this conference” and “This conference was sponsored by CompanyX” both generate [CompanyX [sponsored conference]]. The hope is that public availability of such tools will encourage further development of grammar and lexical software systems.
SWILL: A SIMPLE EMBEDDED WEB SERVER LIBRARY
Sotiria Lampoudi and David M. Beazley, University of Chicago
This paper won the FREENIX Best Student Paper award.
SWILL (Simple Web Interface Link Library) is a simple Web server in the form of a C library whose development was motivated by a wish to give cool applications an interface to the Web. The SWILL library provides a simple interface that can be efficiently implemented for tasks that vary from flexible Web-based monitoring to software debugging and diagnostics. Though originally designed to be integrated with high-performance scientific simulation software, the interface is generic enough to allow for unbounded uses. SWILL is a single-threaded server relying upon non-blocking I/O through the creation of a temporary server that services I/O requests. SWILL does not provide SSL or cryptographic authentication, but it does have HTTP authentication abilities.
A fantastic feature of SWILL is its support for SPMD-style parallel applications that utilize MPI, proving valuable for Beowulf clusters and large parallel machines. Another practical application was the use of SWILL in a modified Yalnix emulator by University of Chicago operating systems courses, which used the added Yalnix functionality for OS development and debugging. SWILL requires minimal memory overhead and relies upon the HTTP/1.0 protocol.
NETWORK PERFORMANCE
Summarized by Florian Buchholz
LINUX NFS CLIENT WRITE PERFORMANCE
Chuck Lever, Network Appliance; Peter Honeyman, CITI, University of Michigan
Lever introduced a benchmark to measure NFS client write performance. Client performance is difficult to measure due to hindrances such as poor hardware or bandwidth limitations; furthermore, measuring application performance does not identify weaknesses specifically at the client side. Thus, a benchmark was developed that tries to exercise only data transfers in one direction, between server and application. For this purpose, the benchmark was based on the block-sequential-write portion of the Bonnie file system benchmark. Once a benchmark for NFS clients is established, it can be used to improve client performance.
The performance measurements were performed with an SMP Linux client against both a Linux NFS server and a Network Appliance F85 filer. During testing, a periodic jump in write latency was discovered. This was due to a rather large number of pending write operations that were scheduled to be written after certain threshold values were exceeded. By introducing a separate daemon that flushes the cached write requests, the spikes could be eliminated, but as a result the average latency grew over time. The problem was traced to a function that scans a linked list of write requests; after a hashtable was added to improve lookup performance, the latency improved considerably.
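The fix described above, in miniature: replace a linear scan of pending write requests with a hash-table lookup keyed by offset. This is illustrative Python, not the Linux NFS client's C code.

```python
# O(n) linked-list-style search vs. O(1) expected hash lookup for
# finding a pending write request by file offset.
class RequestList:
    """List of (offset, data) pairs: every lookup scans all entries."""
    def __init__(self):
        self.reqs = []
    def add(self, offset, data):
        self.reqs.append((offset, data))
    def find(self, offset):
        for off, data in self.reqs:  # linear scan
            if off == offset:
                return data
        return None

class RequestTable:
    """Requests hashed by offset: constant-time expected lookup."""
    def __init__(self):
        self.reqs = {}
    def add(self, offset, data):
        self.reqs[offset] = data
    def find(self, offset):
        return self.reqs.get(offset)

lst, tbl = RequestList(), RequestTable()
for i in range(1000):
    lst.add(i * 4096, b"x")
    tbl.add(i * 4096, b"x")
```

With thousands of pending writes, the per-lookup cost of the list version grows with queue depth, which is exactly what produced the growing average latency.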
The improved client was then used to measure throughput against the two servers. A discrepancy between the Linux server and the filer tests was noticed, and the reason was traced back to a global kernel lock that was unnecessarily held when accessing the network stack. After correcting this, performance improved further. However,
the measurements showed that a client may run slower when paired with fast servers on fast networks. This is due to heavy client interrupt loads, more network processing on the client side, and more global kernel lock contention.
The source code of the project is available at http://www.citi.umich.edu/projects/nfs-perf/patches.
A STUDY OF THE RELATIVE COSTS OF NETWORK SECURITY PROTOCOLS
Stefan Miltchev and Sotiris Ioannidis, University of Pennsylvania; Angelos Keromytis, Columbia University
With the increasing need for security and integrity of remote network services, it becomes important to quantify the communication overhead of IPsec and compare it to alternatives such as SSH, SCP, and HTTPS.
For this purpose, the authors set up three testing networks: a direct link, two hosts separated by two gateways, and three hosts connecting through one gateway. Protocols were compared in each setup, and manual keying was used to eliminate connection setup costs. For IPsec, the encryption algorithms AES, DES, 3DES, hardware DES, and hardware 3DES were used. In detail, FTP was compared to SFTP, SCP, and FTP over IPsec; HTTP to HTTPS and HTTP over IPsec; and NFS to NFS over IPsec and to local disk performance.
The result of the measurements was that IPsec outperforms the other popular encryption schemes. Overall, unencrypted communication was fastest, but in some cases, such as FTP, the overhead can be small. The use of crypto hardware can significantly improve performance. For future work, the inclusion of setup costs, hardware-accelerated SSL, SFTP, and SSH was mentioned.
CONGESTION CONTROL IN LINUX TCP
Pasi Sarolahti, University of Helsinki; Alexey Kuznetsov, Institute for Nuclear Research at Moscow
Having attended the talk and read the paper, I am still unclear whether the authors are merely describing the design decisions of TCP congestion control or whether they are actually the creators of that part of the Linux code; my guess leans toward the former.
In the talk, the speaker compared the TCP congestion control measures specified by the IETF in RFCs with the actual Linux implementation, which conforms to the basic principles but nevertheless has differences. Specific emphasis was placed on the retransmission mechanisms and the congestion window. Several TCP enhancements – the NewReno algorithm, Selective ACKs (SACK), and Forward ACKs (FACK) – were also discussed and compared.
In some instances, Linux does not conform to the IETF specifications. Fast recovery does not fully follow RFC 2582, since the threshold for triggering retransmit is adjusted dynamically and the congestion window’s size is not changed. Also, the round-trip-time estimator and the RTO calculation differ from RFC 2988, since Linux uses more conservative RTT estimates and a minimum RTO of 200ms. The performance measurements showed that with the additional Linux-specific features enabled, slightly higher throughput, a steadier data flow, and fewer unnecessary retransmissions can be achieved.
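The estimator in question is small enough to sketch. The update rules below are the RFC 2988 SRTT/RTTVAR formulas, with the 200ms lower bound the talk attributes to Linux substituted for the RFC's own 1-second minimum; the function names are invented.

```python
# RFC 2988-style RTO estimation with a 200ms floor (as in Linux,
# per the talk; RFC 2988 itself specifies a 1-second minimum).
def make_rto_estimator(min_rto=0.2, alpha=1 / 8, beta=1 / 4):
    state = {"srtt": None, "rttvar": None}

    def update(rtt):
        if state["srtt"] is None:                  # first measurement
            state["srtt"], state["rttvar"] = rtt, rtt / 2
        else:                                      # RTTVAR before SRTT
            state["rttvar"] = (1 - beta) * state["rttvar"] + \
                              beta * abs(state["srtt"] - rtt)
            state["srtt"] = (1 - alpha) * state["srtt"] + alpha * rtt
        return max(min_rto, state["srtt"] + 4 * state["rttvar"])

    return update

update = make_rto_estimator()
rto = update(0.01)       # 10ms RTT on a LAN: clamped to the 200ms floor
for _ in range(5):
    rto = update(0.01)
```

On a fast LAN the computed SRTT + 4*RTTVAR is far below the floor, so the minimum RTO dominates; a lower floor lets the sender detect losses sooner, at the risk of spurious retransmissions.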
XTREME XCITEMENT
Summarized by Steve Bauer
THE FUTURE IS COMING: WHERE THE X WINDOW SYSTEM SHOULD GO
Jim Gettys, Compaq Computer Corp.
Jim Gettys, one of the principal authors of the X Window System, outlined the near-term objectives for the system, primarily focusing on the changes and infrastructure required to enable replication and migration of X applications. Providing better support for this functionality would enable users to retrieve or duplicate X applications between their servers at home and work.
One interesting example of an application that is currently capable of migration and replication is Emacs. To create a new frame on DISPLAY, try “M-x make-frame-on-display <Return> DISPLAY <Return>”.
However, technical challenges make replication and migration difficult in general. These include the “major headaches” of server-side fonts, the nonuniformity of X servers and screen sizes, and the need to appropriately retrofit toolkits. Authentication and authorization issues are obviously also important. The rest of the talk delved into some of the details of these interesting technical challenges.
HACKING IN THE KERNEL
Summarized by Hai Huang
AN IMPLEMENTATION OF SCHEDULER ACTIVATIONS ON THE NETBSD OPERATING SYSTEM
Nathan J. Williams, Wasabi Systems
Scheduler activations are an old idea. Basically, there are benefits and drawbacks to using solely kernel-level or solely user-level threading; scheduler activations combine the two layers of control to provide more concurrency in the system.
In his talk, Nathan gave a fairly detailed description of the implementation of scheduler activations in the NetBSD kernel. One important change is to differentiate the thread context from the process context. This is done by defining a separate data structure for threads and relocating some of the information that was embedded in the process context into these thread contexts. The stack was a particular concern because of upcalls: special handling must ensure that an upcall doesn’t disturb the stack, so that the preempted user-level thread can continue afterwards. Lastly, Nathan explained that signals are handled by upcalls.
AUTHORIZATION AND CHARGING IN PUBLIC WLANS USING FREEBSD AND 802.1X
Pekka Nikander, Ericsson Research NomadicLab
The 802.1x standards are well known in the wireless community as link-layer authentication protocols. In this talk, Pekka explained some novel ways of using the 802.1x protocols that might be of interest to people on the move: it is possible to set up a public WLAN that supports various charging schemes via virtual tokens, which people can purchase or earn and later use.
This is implemented on FreeBSD using the netgraph facility. It is basically a filter in the link layer that differentiates traffic based on the MAC address of the client node, which is either authenticated, denied, or let through. The overhead for this service is fairly minimal.
ACPI IMPLEMENTATION ON FREEBSD
Takanori Watanabe, Kobe University
ACPI (Advanced Configuration and Power Interface) was proposed as a joint effort by Intel, Toshiba, and Microsoft to provide a standard, finer-grained method of managing the power states of individual devices within a system. Such low-level power management is especially important for mobile and embedded systems powered by fixed-capacity batteries.
Takanori explained that the ACPI specification is composed of three parts: tables, BIOS, and registers. He was able to implement some of the functionality of ACPI in the FreeBSD kernel. The ACPI Component Architecture, implemented by Intel, provides a high-level ACPI API to the operating system; Takanori’s ACPI implementation is built upon this underlying layer of APIs.
ACCESS CONTROL
Summarized by Florian Buchholz
DESIGN AND PERFORMANCE OF THE OPENBSD STATEFUL PACKET FILTER (PF)
Daniel Hartmeier, Systor AG
Daniel Hartmeier described the new stateful packet filter (pf) that replaced IPFilter in the OpenBSD 3.0 release. IPFilter could no longer be included due to licensing issues, and thus there was a need to write a new filter making use of optimized data structures.
The filter rules are implemented as a linked list, which is traversed from start to end. Two actions may be taken according to the rules: a “pass” action forwards the packet, and a “block” action drops it. Where more than one rule matches a packet, the last rule wins; rules that are marked as “final” immediately terminate any further rule evaluation. An optimization called “skip-steps” was also implemented, in which blocks of similar rules are skipped if they cannot match the current packet; these skip-steps are calculated when the rule set is loaded. Furthermore, a state table keeps track of TCP connections, and only packets that match the expected sequence numbers are allowed. UDP and ICMP queries and replies are also considered in the state table, where initial packets create an entry for a pseudo-connection with a low initial timeout value. The state table is implemented using a balanced binary search tree. NAT mappings are also stored in the state table, whereas application proxies reside in user space. The packet filter is also able to perform fragment reassembly and to modulate TCP sequence numbers to protect hosts behind the firewall.
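The rule-evaluation semantics described above can be sketched briefly: rules are scanned in order, the last matching rule wins, and a rule marked "final" stops evaluation immediately. The rules and packets below are invented, and real pf compiles skip-steps and matches on full headers rather than lambdas.

```python
# Sketch of last-match-wins rule evaluation with a "final" flag.
def evaluate(rules, packet):
    """Scan all rules; remember the last match; stop early on 'final'."""
    action = "pass"  # default if no rule matches
    for match, act, final in rules:
        if match(packet):
            action = act
            if final:
                break
    return action

rules = [
    (lambda p: True,               "block", False),  # default: block everything
    (lambda p: p["dport"] == 22,   "pass",  False),  # later rule wins: allow ssh
    (lambda p: p["src"] == "evil", "block", True),   # "final": stop evaluation here
]

allowed = evaluate(rules, {"src": "ok",   "dport": 22})  # pass (last match)
dropped = evaluate(rules, {"src": "evil", "dport": 22})  # block (final rule)
default = evaluate(rules, {"src": "ok",   "dport": 80})  # block (first rule only)
```

The state table short-circuits all of this: a packet belonging to an established connection is matched by a tree lookup and never touches the rule list, which is why stateful filtering can outperform pure rule evaluation.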
Pf was compared against IPFilter as well as Linux's iptables by measuring throughput and latency with increasing traffic rates and different packet sizes. In a test with a fixed rule set size of 100, iptables outperformed the other two filters, whose results were close to each other. In a second test, where the rule set size was continually increased, iptables consistently had about twice the throughput of the other two (which evaluate the rule set on both the incoming and outgoing interfaces). A third test compared only pf and IPFilter, using a single rule that created state in the state table, with a fixed-state entry size. Pf reached an overloaded state much later than IPFilter. The experiment was repeated with a variable-state entry size, and pf performed much better than IPFilter for a small number of states.
In general, rule set evaluation is expensive, and benchmarks only reflect extreme cases, whereas in real life other behavior should be observed. Furthermore, the benchmarks show that stateful filtering can actually improve performance, since cheap state-table lookups replace expensive rule evaluation.
ENHANCING NFS CROSS-ADMINISTRATIVE
DOMAIN ACCESS
Joseph Spadavecchia and Erez Zadok, Stony Brook University
The speaker presented modifications to an NFS server that allow improved NFS access between administrative domains. The problem lies in the fact that NFS assumes a shared UID/GID space, which makes it unsafe to export files outside the administrative domain of the server. The design goals were to leave the protocol and existing clients unchanged, require a minimum of server changes, and provide flexibility and increased performance.
To solve the problem, two techniques are utilized: "range-mapping" and "file-cloaking". Range-mapping maps IDs between client and server. The mapping is performed on a per-export basis and has to be manually set up in an export file; the mappings can be 1-1, N-N, or N-1. In file-cloaking, the server restricts file access based on UID/GID and special cloaking-mask bits, so users can only access their own files. The policy on whether or not a file is visible to others is set by the cloaking mask, which is logically ANDed with the
file's protection bits. File-cloaking only works, however, if the client doesn't hold cached copies of directory contents and file attributes. Because of this, the clients are forced to re-read directories: the server increments the mtime value of the directory each time it is listed.
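A toy sketch of the two techniques, with invented names and parameters (the real implementation operates inside the NFS server, and cloaking semantics are richer than this single AND):

```python
# Hypothetical sketches of per-export range-mapping and file-cloaking.
# All function names, the 'nobody' fallback, and the mask semantics are
# assumptions for illustration only.

def make_range_map(client_base, server_base, count):
    """1-1 range mapping: client IDs [client_base, client_base+count)
    map onto [server_base, server_base+count); anything outside the
    range falls back to an anonymous 'nobody' ID."""
    def map_uid(client_uid, nobody=65534):
        offset = client_uid - client_base
        return server_base + offset if 0 <= offset < count else nobody
    return map_uid

def visible(file_mode, file_uid, requester_uid, cloak_mask):
    """The owner always sees the file; a non-owner sees it only if some
    group/other permission bit survives the AND with the cloaking mask."""
    if requester_uid == file_uid:
        return True
    return (file_mode & cloak_mask & 0o077) != 0

map_uid = make_range_map(client_base=1000, server_base=20000, count=100)
print(map_uid(1042))                      # 20042
print(map_uid(500))                       # 65534 (unmapped -> nobody)
print(visible(0o644, 1000, 2000, 0o700))  # False: cloaked from non-owners
print(visible(0o644, 1000, 2000, 0o777))  # True: mask leaves group/other bits
```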
To test the performance of the modified server, five different NFS configurations were evaluated: an unmodified NFS server was compared against one with the modified code included but not used, one with only range-mapping enabled, one with only file-cloaking enabled, and one with all modifications enabled. For each setup, different file system benchmarks were run. The results show only a small overhead when the modifications are used, generally an increase of below 5 percent. Another experiment tested the performance of the system with an increasing number of mapped or cloaked entries; an increase from 10 to 1000 entries resulted in a maximum performance cost of about 14 percent.
One member of the audience pointed out that if clients choose to ignore the changed mtimes from the server, and thus still hold caches of the directory entries, the file-cloaking mechanism could be defeated. After a rather lengthy debate, the speaker had to concede that the model doesn't add any extra security. Another question was asked about the scalability of setting up range mappings; the speaker referred to application-level tools that could be developed for that purpose.
The software is available at ftp://ftp.fsl.cs.sunysb.edu/pub/enf.
ENGINEERING OPEN SOURCE
SOFTWARE
Summarized by Teri Lampoudi
NINGAUI: A LINUX CLUSTER FOR BUSINESS
Andrew Hume, AT&T Labs – Research; Scott Daniels, Electronic Data Systems Corp.
Ningaui is a general-purpose, highly available, resilient architecture built from commodity software and hardware. Emphatically, however, Ningaui is not a Beowulf. Hume calls the cluster design the "Swiss canton model", in which there are a number of loosely affiliated, independent nodes with data replicated among them. Jobs are assigned by bidding and leases, and cluster services done as session-based servers are sited via generic job assignment. The emphasis is on keeping the architecture end-to-end, checking all work via checksums, and logging everything. The resilience and high availability required by their goal of 8-to-5 maintenance – vs. the typical 24/7 model where people get paged whenever the slightest thing goes wrong, regardless of the time of day – is achieved by job restartability. Finally, all computation is performed on local data, without the use of NFS or network-attached storage.
Hume's message is a hopeful one: despite the many problems encountered – things like kernel and service limits, auto-installation problems, TCP storms, and the like – the existence of source code and the paranoid practice of logging and checksumming everything has helped. The final product performs reasonably well, and it appears that the resilient techniques employed do make a difference. One drawback, however, is that the software mentioned in the paper is not yet available for download.
CPCMS: A CONFIGURATION MANAGEMENT
SYSTEM BASED ON CRYPTOGRAPHIC NAMES
Jonathan S. Shapiro and John Vanderburgh, Johns Hopkins University
This paper won the FREENIX Best Paper award.
The basic notion behind the project is the fact that everyone has a pet complaint about CVS, and yet it is currently the configuration manager in most widespread use. Shapiro has unleashed an alternative. Interestingly, he did not begin but ended with the usual host of reasons why CVS is bad. The talk instead began abruptly by characterizing the job
of a software configuration manager, continued by stating the namespaces which it must handle, and wrapped up with the challenges faced.
X MEETS Z: VERIFYING CORRECTNESS IN THE
PRESENCE OF POSIX THREADS
Bart Massey, Portland State University; Robert T. Bauer, Rational Software Corp.
Massey delivered a humorous talk on the insight gained from applying Z formal specification notation to system software design, rather than the more informal analysis and design process normally used.
The story is told with respect to writing XCB, which replaces the Xlib protocol layer and is supposed to be thread-friendly. But where threads are concerned, deadlock avoidance becomes a hard problem that cannot be solved in an ad hoc manner, and full model checking is also too hard. In this case, Massey resorted to Z specification to model the XCB lower layer, abstract away locking and data transfers, and locate fundamental issues. The difficulties of searching the literature and finding information relevant to the problem at hand were thereby overcome. As Massey put it, "the formal method saved the day."
FILE SYSTEMS
Summarized by Bosko Milekic
PLANNED EXTENSIONS TO THE LINUX
EXT2/EXT3 FILESYSTEM
Theodore Y. Ts'o, IBM; Stephen Tweedie, Red Hat
The speaker presented improvements to the Linux Ext2 file system, with the goal of allowing for various expansions while striving to maintain compatibility with older code. Improvements have been facilitated by a few extra superblock fields that were added to Ext2 not long before the Linux 2.0 kernel was released. The fields allow for file system features to be added without compromising the existing setup; this is done by providing
bits indicating the impact of the added features with respect to compatibility. Namely, a file system with the "incompat" bit marked is not allowed to be mounted. Similarly, a "read-only" marking would only allow the file system to be mounted read-only.
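The compatibility-bit scheme can be illustrated with a small sketch. The flag names below are invented; the real flags are bitfields in the Ext2 superblock:

```python
# Sketch of the compat/incompat/ro-compat feature-bit scheme described
# above, using sets of invented feature names in place of bit masks.
SUPPORTED_COMPAT   = {"dir_prealloc"}   # unknown compat bits are harmless
SUPPORTED_INCOMPAT = {"filetype"}       # unknown incompat bits forbid mounting
SUPPORTED_RO       = {"sparse_super"}   # unknown ro-compat bits force read-only

def mount_decision(compat, incompat, ro_compat):
    """Decide how (or whether) an old kernel may mount a newer file system."""
    if incompat - SUPPORTED_INCOMPAT:
        return "refuse"        # feature changes on-disk format incompatibly
    if ro_compat - SUPPORTED_RO:
        return "read-only"     # safe to read, unsafe to write
    return "read-write"        # any unrecognized compat features are ignorable

print(mount_decision(set(), {"journal"}, set()))    # refuse
print(mount_decision({"xattr"}, set(), {"huge"}))   # read-only
print(mount_decision(set(), {"filetype"}, set()))   # read-write
```

This is why a 2.0-era kernel can still safely mount file systems carrying features it has never heard of, as long as those features were flagged in the right category.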
Directory indexing replaces linear directory searches with a faster search using a fixed-depth tree and hashed keys. File system size can be dynamically increased, and the expanded inode, doubled from 128 to 256 bytes, allows for more extensions.
Other potential improvements were discussed as well, in particular pre-allocation for contiguous files, which allows for better performance in certain setups by pre-allocating contiguous blocks. Security-related modifications, extended attributes, and ACLs were mentioned. An implementation of these features already exists but has not yet been merged into the mainline Ext2/Ext3 code.
RECENT FILESYSTEM OPTIMISATIONS ON
FREEBSD
Ian Dowse, Corvil Networks; David Malone, CNRI, Dublin Institute of Technology
David Malone presented four important file system optimizations for FreeBSD: soft updates, dirpref, vmiodir, and dirhash. It turns out that certain combinations of the optimizations (beautifully illustrated in the paper) may yield performance improvements of anywhere between 2 and 10 orders of magnitude for real-world applications.
All four of the optimizations deal with file system metadata. Soft updates allow for asynchronous metadata updates. Dirpref changes the way directories are organized, attempting to place child directories closer to their parents, thereby increasing locality of reference and reducing disk-seek times. Vmiodir trades some extra memory in order to achieve better directory caching. Finally, dirhash, which was implemented by Ian Dowse, changes the way in which entries
are searched in a directory. Specifically, whereas FFS uses a linear search to find an entry by name, dirhash builds a hashtable of directory entries on the fly. For a directory of size n with a working set of m files, a search that in certain cases could have been O(n·m) has been reduced, thanks to dirhash, to effectively O(n + m).
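The complexity argument can be made concrete with a rough sketch (the structure below is invented; the kernel version hashes fixed-size directory blocks, not Python strings):

```python
# Rough sketch of the dirhash idea: replace a linear scan of directory
# entries with a hash table built on first access.
class DirHash:
    def __init__(self, entries):
        # One O(n) pass builds the table; each later lookup is O(1) on
        # average, so m lookups cost O(n + m) instead of O(n * m).
        self.table = {}
        for offset, name in enumerate(entries):
            self.table[name] = offset

    def lookup(self, name):
        """Return the entry's offset in the directory, or -1 if absent."""
        return self.table.get(name, -1)

d = DirHash(["a.txt", "b.txt", "c.txt"])
print(d.lookup("b.txt"))  # 1
print(d.lookup("zz"))     # -1
```

Building the table lazily, on the first lookup in a large directory, keeps small or rarely searched directories from paying any memory cost.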
FILESYSTEM PERFORMANCE AND SCALABILITY
IN LINUX 2.4.17
Ray Bryant, SGI; Ruth Forester, IBM LTC; John Hawkes, SGI
This talk focused on performance evaluation of a number of file systems available and commonly deployed on Linux machines. Comparisons under various configurations of Ext2, Ext3, ReiserFS, XFS, and JFS were presented.
The benchmarks chosen for the data gathering were pgmeter, filemark, and the AIM Benchmark Suite VII. Pgmeter measures the rate of data transfer of reads/writes of a file under a synthetic workload. Filemark is similar to postmark in that it is an operation-intensive benchmark, although filemark is threaded and offers various other features that postmark lacks. AIM VII measures performance for various file-system-related functionalities; it offers various metrics under an imposed workload, thus stressing the performance of the file system not only under I/O load but also under significant CPU load.
Tests were run on three different setups: a small, a medium, and a large configuration. ReiserFS and Ext2 appear at the top of the pile for smaller and medium setups. Notably, XFS and JFS perform worse for smaller system configurations than the others, although XFS clearly appears to generate better numbers than JFS. It should be noted that XFS seems to scale well under a higher load; this was most evident in the large-system results, where XFS appears to offer the best overall results.
THINGS TO THINK ABOUT
Summarized by Bosko Milekic
SPEEDING UP THE KERNEL SCHEDULER BY REDUCING CACHE MISSES
Shuji Yamamura, Akira Hirai, Mitsuru Sato, Masao Yamamoto, Akira Naruse, and Kouichi Kumon, Fujitsu Labs
This was an interesting talk pertaining to the effects of cache coloring for task structures in the Linux kernel scheduler (Linux kernel 2.4.x). The speaker first presented some interesting benchmark numbers for the Linux scheduler, showing that as the number of processes on the task queue was increased, the performance decreased. The authors used some really nifty hardware to measure the number of bus transactions throughout their tests and were thus able to reasonably quantify the impact that cache misses had in the Linux scheduler.
Their experiments led them to implement a cache coloring scheme for task structures, which were previously aligned on 8KB boundaries and therefore were eventually being mapped to the same cache lines. This unfortunate placement of task structures in memory induced a significant number of cache misses as the number of tasks grew in the scheduler.
The implemented solution consisted of aligning task structures so as to more evenly distribute cached entries across the L2 cache. The result was inevitably fewer cache misses in the scheduler. Some negative effects were observed in certain situations; these were primarily due to more cache slots being used by task-structure data in the scheduler, thus forcing data previously cached there to be pushed out.
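A toy model shows why the uniform 8KB alignment hurt and how a per-task "color" offset helps. The cache geometry and offsets below are illustrative only, not the hardware the authors measured:

```python
# Toy model of cache coloring for task structures. With every structure
# on an 8KB boundary, addresses collapse onto a handful of cache sets;
# staggering each task by one cache line spreads them out.
CACHE_SETS = 512   # assumed number of sets in a direct-mapped L2 model
LINE_SIZE = 64     # assumed cache line size in bytes

def cache_set(addr):
    """Which cache set a physical address maps to in this toy model."""
    return (addr // LINE_SIZE) % CACHE_SETS

def task_addrs(n, stride=8192, color_step=0):
    # color_step=0 models the old 8KB-aligned layout; a nonzero step
    # staggers successive tasks by extra cache lines ("colors").
    return [i * stride + (i * color_step) % 4096 for i in range(n)]

aligned = {cache_set(a) for a in task_addrs(64)}
colored = {cache_set(a) for a in task_addrs(64, color_step=LINE_SIZE)}
print(len(aligned))  # 4  - 64 tasks crowd into only 4 of 512 sets
print(len(colored))  # 64 - every task lands in its own set
```

The downside noted in the talk also falls out of the model: the colored layout touches many more distinct cache sets, evicting whatever other data lived there.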
WORK-IN-PROGRESS REPORTS
Summarized by Brennan Reynolds
RESOURCE VIRTUALIZATION TECHNIQUES FOR
WIDE-AREA OVERLAY NETWORKS
Kartik Gopalan, Stony Brook University
This work addressed the issue of provisioning a maximum number of virtual overlay networks (VONs) with diverse quality-of-service (QoS) requirements on a single physical network. Each of the VONs is logically isolated from the others to ensure the QoS requirements. Gopalan mentioned several critical research issues with this problem that are currently being investigated. Dealing with how to provision the network at various levels (link, route, or path) and then enforce the provisioning at run-time is one of the toughest challenges. Currently, Gopalan has developed several algorithms to handle admission control, end-to-end QoS, route selection, scheduling, and fault tolerance in the network.
For more information visit http://www.ecsl.cs.sunysb.edu.
VISUALIZING SOFTWARE INSTABILITY
Jennifer Bevan, University of California, Santa Cruz
Detection of instability in software has typically been an afterthought. The point in the development cycle when the software is reviewed for instability usually comes after it is difficult and costly to go back and perform major modifications to the code base. Bevan has developed a technique to allow the visualization of unstable regions of code that can be used much earlier in the development cycle. Her technique creates a time series of dependence graphs that include clusters and lines called fault lines. From the graphs, a developer is able to easily determine where the unstable sections of code are and proactively restructure them. She is currently working on a prototype implementation.
For more information visit http://www.cse.ucsc.edu/~jbevan/evo_viz.
RELIABLE AND SCALABLE PEER-TO-PEER WEB
DOCUMENT SHARING
Li Xiao, College of William and Mary
The idea presented by Xiao would allow end users to share the contents of their Web browser caches with neighboring Internet users. The rationale for this is that today's end users are increasingly connected to the Internet over high-speed links, and browser caches are becoming large storage systems. Therefore, if an individual accesses a page which does not exist in their local cache, Xiao is suggesting that they first query other end users for the content before trying to access the machine hosting the original. This strategy does have some serious problems associated with it that still need to be addressed, including ensuring the integrity of the content and protecting the identity and privacy of the end users.
SEREL: FAST BOOTING FOR UNIX
Leni Mayo, Fast Boot Software
Serel is a tool that generates a visual representation of a UNIX machine's boot-up sequence. It can be used to identify the critical path and can show if a particular service or process blocks for an extended period of time. This information could be used to determine where the largest performance gains could be realized by tuning the order of execution at boot-up. Serel creates a dependency graph, expressed in XML, during boot-up; this graph is then used to create the visual representation. Currently the tool only works on POSIX-compliant systems, but Mayo stated that he would be porting it to other platforms. Other extensions that were mentioned included having the metadata capture the use of shared libraries and monitoring the suspend/resume sequence of portable machines.
For more information visit http://www.fastboot.org.
USENIX 2002
BERKELEY DB XML
John Merrells, Sleepycat Software
Merrells gave a quick introduction and overview of the new XML library for Berkeley DB. The library specializes in the storage and retrieval of XML content, queried via XPath. It allows for multiple containers per document and stores everything natively as XML. The user is also given a wide range of elements to create indices with, including edges, elements, text strings, and presence. The XPath component consists of a query parser, generator, optimizer, and execution engine. To conclude his presentation, Merrells gave a live demonstration of the software.
For more information visit http://www.sleepycat.com/xml.
CLUSTER-ON-DEMAND (COD)
Justin Moore, Duke University
Modern clusters are growing at a rapid rate; many have pushed beyond the 5,000-machine mark, and deploying them results in large expenses as well as management and provisioning issues. Furthermore, if the cluster is "rented" out to various users, it is very time-consuming to configure it to a user's specs, regardless of how long they need to use it. The COD work presented creates dynamic virtual clusters within a given physical cluster. The goal was to have a provisioning tool that would automatically select a chunk of available nodes and install the operating system and middleware specified by the customer in a short period of time. This would allow a greater use of resources, since multiple virtual clusters can exist at once, and the size of a virtual cluster can be changed dynamically. Moore stated that they have created a working prototype and are currently testing and benchmarking its performance.
For more information visit http://www.cs.duke.edu/~justin/cod.
CATACOMB
Elias Sinderson, University of California, Santa Cruz
Catacomb is a project to develop a database-backed DASL module for the Apache Web server, designed as a replacement for the WebDAV module. The initial release only contains support for a MySQL database but could be extended to others. Sinderson briefly touched on the performance of the module: it was comparable to the mod_dav Apache module for all query types but search. The presentation concluded with remarks about adding support for the LOCK method and including ACL specifications in the future.
For more information visit http://ocean.cse.ucsc.edu/catacomb.
SELF-ORGANIZING STORAGE
Dan Ellard, Harvard University
Ellard's presentation introduced a storage system that tunes itself based on the workload of the system, without requiring the intervention of the user. The intelligence is implemented as a virtual, self-organizing disk that resides below any file system. The virtual disk observes the access patterns exhibited by the system and then attempts to predict what information will be accessed next. An experiment was done using an NFS trace from one of the email servers at a large ISP. Ellard's self-organizing storage system worked well, which he attributes to the fact that most of the files being requested were large email boxes. Areas of future work include exploring the length and detail of the data-collection stage, as well as the CPU impact of running the virtual disk layer.
VISUAL DEBUGGING
John Costigan, Virginia Tech
Costigan feels that the state of current debugging facilities in the UNIX world is not as good as it should be. He proposes the addition of several elements to programs like ddd and gdb. The first addition is including separate heap and
stack components. Another is to display only certain fields of a data structure, with the ability to zoom in and out if needed. Finally, the debugger should provide the ability to visually present complex data structures, including linked lists, trees, etc. He said that a beta version of a debugger with these abilities is currently available.
For more information visit http://infovis.cs.vt.edu/datastruct.
IMPROVING APPLICATION PERFORMANCE
THROUGH SYSTEM-CALL COMPOSITION
Amit Purohit, Stony Brook University
Web servers perform a huge number of context switches and internal data copies during normal operation. These two elements can drastically limit the performance of an application, regardless of the hardware platform it is run on. The Compound System Call (CoSy) framework is an attempt to reduce the performance penalty of context switches and data copies. It includes a user-level library and several kernel facilities that can be used via system calls. The library provides the programmer with a complete set of memory-management functions. Performance tests using the CoSy framework showed a large saving in context switches and data copies; the only area where the savings over conventional libraries were negligible was for very large file copies.
For more information visit http://www.cs.sunysb.edu/~purohit.
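The idea of amortizing boundary crossings can be sketched abstractly. The API below is invented for illustration and is not the CoSy interface; it simply shows the batching pattern of paying one crossing for many operations:

```python
# Abstract sketch of the compound-call idea: instead of crossing the
# user/kernel boundary once per operation, queue operations and submit
# them as one batch, paying the crossing cost a single time.
class CompoundCall:
    def __init__(self):
        self.ops = []
        self.crossings = 0   # stand-in for the context-switch count

    def add(self, op, *args):
        """Queue an operation without executing it yet."""
        self.ops.append((op, args))

    def submit(self):
        """Execute every queued operation in one 'boundary crossing'."""
        self.crossings += 1
        return [op(*args) for op, args in self.ops]

batch = CompoundCall()
batch.add(len, "hello")
batch.add(max, 3, 7)
print(batch.submit())    # [5, 7]
print(batch.crossings)   # 1 crossing instead of 2
```

In a real kernel-level design the queued operations would be system calls and the results would land in shared buffers, which is also where the data-copy savings come from.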
ELASTIC QUOTAS
John Oscar, Columbia University
Oscar began by stating that most resources are flexible, but that to date disks have not been. While approximately 80 percent of files are short-lived, disk quotas are hard limits imposed by administrators. The idea behind elastic quotas is to have a non-permanent storage area that each user can temporarily use. Oscar suggested the creation of an ehome directory structure to be used in deploying elastic quotas. Currently he
has implemented elastic quotas as a stackable file system.

There were also several trends apparent in the data. Many users would ssh to their remote host (good) but then launch an email reader on the local machine, which would connect via POP to the same host and send the password in clear text (bad). This situation can easily be remedied by tunneling protocols like POP through SSH, but it appeared that many people were not aware this could be done. While most of his comments on the use of the wireless network were negative, the list of passwords he had collected showed that people were indeed using strong passwords. His recommendations were to educate and encourage people to use protocols like IPSec, SSH, and SSL when conducting work over a wireless network, because you never know who else is listening.
SYSTRACE
Niels Provos, University of Michigan
How can one be sure the applications one is using actually do exactly what their developers said they do? Short answer: you can't. People today are using a large number of complex applications, which means it is impossible to check each application thoroughly for security vulnerabilities. There are tools out there that can help, though. Provos has developed a tool called systrace that allows a user to generate a policy of acceptable system calls a particular application can make. If the application attempts to make a call that is not defined in the policy, the user is notified and allowed to choose an action. Systrace includes an auto-policy generation mechanism, which uses a training phase to record the actions of all programs the user executes. If the user chooses not to be bothered by applications breaking policy, systrace allows default enforcement actions to be set. Currently systrace is implemented for FreeBSD and NetBSD, with a Linux version coming out shortly.
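The core enforcement loop reduces to a simple membership check per intercepted call. The policy format and function below are invented for illustration; real systrace policies also match on call arguments, not just call names:

```python
# Minimal sketch of a systrace-style policy check: each application has
# a set of permitted system calls; anything outside the set triggers
# the configured default action.
def check_syscall(policy, app, syscall, default="deny"):
    """Return 'permit' if the call is in the app's policy, else the
    default action (in the real tool, the user may be prompted here)."""
    allowed = policy.get(app, set())
    return "permit" if syscall in allowed else default

policy = {"httpd": {"read", "write", "accept"}}
print(check_syscall(policy, "httpd", "read"))    # permit
print(check_syscall(policy, "httpd", "execve"))  # deny
```

The training phase described above amounts to running with a permissive default while recording every call an application makes, then freezing the recorded set as the policy.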
For more information visit http://www.citi.umich.edu/u/provos/systrace.
VERIFIABLE SECRET REDISTRIBUTION FOR
SURVIVABLE STORAGE SYSTEMS
Ted Wong, Carnegie Mellon University
Wong presented a protocol that can be used to re-create a file distributed over a set of servers, even if one of the servers is damaged. In his scheme, the user must choose how many shares the file is split into and the number of servers it will be stored across. The goal of this work is to provide persistent, secure storage of information, even if it comes under attack. Wong stated that his design of the protocol is complete, and he is currently building a prototype implementation of it.
For more information visit http://www.cs.cmu.edu/~tmwong/research.
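Wong's protocol supports thresholds (any k of n shares suffice) and verifiable redistribution; the much simpler n-of-n XOR splitting below conveys only the basic idea that no single share reveals anything about the file:

```python
# Simplified n-of-n XOR secret splitting: n-1 shares are random, and the
# last share is the data XORed with all of them. Every share is needed
# to reconstruct; this sketch has no threshold or verification.
import secrets

def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def split(data, n):
    """Split data into n shares, each individually indistinguishable
    from random bytes."""
    shares = [secrets.token_bytes(len(data)) for _ in range(n - 1)]
    last = data
    for s in shares:
        last = xor_bytes(last, s)
    return shares + [last]

def combine(shares):
    """XOR all shares back together to recover the data."""
    out = bytes(len(shares[0]))
    for s in shares:
        out = xor_bytes(out, s)
    return out

shares = split(b"secret", 5)
print(combine(shares))  # b'secret'
```

A threshold scheme such as Shamir's replaces the XOR with polynomial interpolation, which is what makes recovery possible after a server is lost.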
THE GURU IS IN SESSIONS
LINUX ON LAPTOPPDA
Bdale Garbee, HP Linux Systems Operation
Summarized by Brennan Reynolds
This was a roundtable-like discussion, with Garbee acting as moderator. He is currently in charge of the Debian distribution of Linux and is involved in porting Linux to platforms other than i386. He has successfully helped port the kernel to the Alpha, SPARC, and ARM, and is currently working on a version to run on VAX machines.
A discussion was held on which file system is best for battery life. The ReiserFS and Ext3 file systems were discussed; people commented that when using Ext3 on their laptops, the disk would never spin down and thus was always consuming a large amount of power. One suggestion was to use a RAM-based file system for any volume that requires a large number of writes, and to only have the file system written to disk at infrequent intervals or when the machine is shut down or suspended.
A question was directed to Garbee about his knowledge of how the HP-Compaq merger would affect Linux. Garbee thought that the merger was good for Linux, both within the company and for the open source community as a whole. He stated that the new company would be the number one shipper of systems with Linux as the base operating system. He also fielded a question about the support of older HP servers and their ability to run Linux, saying that indeed Linux has been ported to them, and that he personally had several machines in his basement running it.
A question which sparked a large number of responses concerned problems people had with running Linux on mobile platforms. Most people actually did not have many problems at all. There were a few cases of a particular machine model not working, but there were only two widespread issues: docking stations and the new ACPI power management scheme. Docking stations still appear to be somewhat of a headache, but most people had developed ad hoc solutions for getting them to work. The ACPI power management scheme does not appear to have a quick solution. Ted Ts'o, one of the head kernel developers, stated that there are fundamental architectural problems with the 2.4 series kernel that do not easily allow ACPI to be added. However, he also stated that ACPI is supported in the newest development kernel, 2.5, and will be included in 2.6.
The final topic was the possibility of purchasing a laptop without paying any charge or tax for Microsoft Windows. The consensus was that currently this is not possible. Even if the machine comes with Linux or without any operating system, the vendors have contracts in place which require them to pay Microsoft for each machine they ship if that machine is capable of running a Microsoft operating system. The only way this will change is if the demand for
other operating systems increases to the point that vendors renegotiate their contracts with Microsoft, and no one saw this happening in the near future.
LARGE CLUSTERS
Andrew Hume, AT&T Labs – Research
Summarized by Teri Lampoudi
This was a session on clusters, big data, and resilient computing. Given that the audience was primarily interested in clusters, and not necessarily big data or beating nodes to a pulp, the conversation revolved mainly around what it takes to make a heterogeneous cluster resilient. Hume also presented a FREENIX paper on his current cluster project, which explained in more detail much of what was abstractly claimed in the guru session.
Hume's use of the word "cluster" referred not to what I had assumed would be a Beowulf-type system, but to a loosely coupled farm of machines in which concurrency is much less of an issue than it would be in a Beowulf. In fact, the architecture Hume described was designed to deal with large amounts of independent transaction processing – essentially the process of billing calls – which requires no interprocess message passing or anything of the sort a parallel scientific application might.
Where does the large data come in? Since the transactions in question consist of large numbers of flat files, a mechanism for getting the files onto nodes and off-loading the results is necessary. In this particular cluster, named Ningaui, this task is handled by the "replication manager", which generated a large amount of interest in the audience. All management functions in the cluster, as well as job allocation, constitute services that are bid for and leased out to nodes. Furthermore, failures are handled by simply repeating the failed task until it is completed properly, if that is possible, and if it is not, then looking through detailed logs to discover
what went wrong. Logging and testing for the successful completion of each task are what make this model resilient. Every management operation is logged at some appropriate level of detail, generating 120MB/day, and every single file transfer is MD5-checksummed. This was characterized by Hume as a "patient" style of management, where no one controls the cluster: scheduling behavior is emergent rather than stringently planned, and nodes are free to drop in and out of the cluster. A downside to this model is the increase in latency, offset by the fact that the cluster continues to function unattended outside of 8-to-5 support hours.
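The checksum-everything discipline boils down to refusing to mark a transfer complete until the digests match, and simply repeating it otherwise. A minimal sketch (function names invented):

```python
# Sketch of checksummed file transfer: a transfer counts as done only
# when the MD5 digest of the received bytes matches the source digest;
# otherwise the task is repeated, per the resilience model above.
import hashlib

def md5_of(data: bytes) -> str:
    return hashlib.md5(data).hexdigest()

def transfer_ok(source_data: bytes, received_data: bytes) -> bool:
    """Verify a transfer by comparing digests end-to-end."""
    return md5_of(source_data) == md5_of(received_data)

print(transfer_ok(b"call-record-batch", b"call-record-batch"))  # True
print(transfer_ok(b"call-record-batch", b"call-record-batc"))   # False
```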
In response to questions about the projected scalability of such a scheme, Hume said the system would presumably scale from the present 60 nodes to about 100 without modifications, but that scaling higher would be a matter for further design. The guru session ended on a somewhat unrelated note regarding the problems of getting data off mainframes and onto Linux machines – issues of fixed- vs. variable-length encoding of data blocks resulting in useful information being stripped by FTP, and problems in converting COBOL copybooks to something useful on a C-based architecture. To reiterate Hume's claim, there are homemade solutions for some of these problems, and the reader should feel free to contact Mr. Hume for them.
WRITING PORTABLE APPLICATIONS
Nick Stoughton, MSB Consultants
Summarized by Josh Lothian
Nick Stoughton addressed a concern that developers are facing more and more: writing portable applications. He proposed that there is no such thing as a "portable application", only those that have been ported. Developers are still in search of the Holy Grail of applications programming: a write-once, compile-anywhere program. Languages such as Java are leading up to this, but virtual
machines still have to be ported, paths will still be different, etc. Web-based applications are another area that could be promising for portable applications.
Nick went on to talk about the future of portability. The recently released POSIX 2001 standards are expected to help the situation; the 2001 revision expands the original POSIX spec into seven volumes. Even though POSIX 2001 is more specific than the previous release, Nick pointed out that even this standard allows for small differences where vendors may define their own behaviors, which can only hurt developers in the long run. Along these lines, it was pointed out that there may be room for automated tools capable of checking application code and determining the level of portability and standards compliance of the source.
NETWORK MANAGEMENT SYSTEM
PERFORMANCE TUNING
Jeff Allen, Tellme Networks, Inc.
Summarized by Matt Selsky
Jeff Allen, author of Cricket, discussed network tuning. The basic idea of tuning is: measure, twiddle, and measure again. Cricket can be used for large installations, but it's overkill for smaller installations, for which MRTG is better suited. You don't need Cricket to do measurement; doing something like a shell script that dumps data to Excel is fine, but you need to get some sort of measurement. Cricket was not designed for billing; it was meant to help answer questions.
Useful tools and techniques covered included: looking for the differences between machines in order to identify their causes; measuring, but also thinking about what you're measuring and why; and remembering to use strace and tcpdump to identify problems. Problems repeat themselves; performance tuning requires an intricate understanding of all the layers involved in order to solve the problem and simplify the automation. (Having a computer science
background helps, but is not essential.) If you can reproduce the problem, you should then investigate each piece to find the bottleneck. If you can't reproduce the problem, then measure. You need to understand all the system interactions; measurement can help determine whether things are actually slow or the user is imagining it.
Troubleshooting begins with a hunch, but scientific processes are essential. You should be able to determine the event stream, or when each event occurs; having observation logs can help. Some hints provided included checking cron for periodic anomalies, slowing down /bin/rm to avoid I/O overload, and looking for unusual kernel CPU usage.
Also, try to use lightweight monitoring to reduce overhead. You don't need to monitor every resource on every system, but you should monitor those resources on those systems that are essential. Don't check things too often, since you can introduce overhead that way. Lots of information can be gathered from the /proc filesystem interface to the kernel.
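The advice above (cheap polling of /proc, output simple enough to pull into Excel or MRTG) can be sketched as a few lines of Python. This is an illustration, not a tool from the talk; the sampling interval and the choice of the rx/tx byte columns of /proc/net/dev are the author's assumptions.

```python
# A minimal sketch of lightweight monitoring: sample interface counters
# from /proc/net/dev cheaply and emit CSV rows of per-interval deltas.
# Interval and field choices are illustrative, not from the talk.
import time

def parse_net_dev(text):
    """Parse /proc/net/dev content into {iface: (rx_bytes, tx_bytes)}."""
    counters = {}
    for line in text.splitlines()[2:]:       # first two lines are headers
        name, data = line.split(":", 1)
        fields = data.split()
        counters[name.strip()] = (int(fields[0]), int(fields[8]))
    return counters

def sample_csv(samples=3, interval=1.0, path="/proc/net/dev"):
    """Poll the counters and return CSV rows of per-interface deltas."""
    rows = ["iface,rx_bytes,tx_bytes"]
    with open(path) as f:
        last = parse_net_dev(f.read())
    for _ in range(samples):
        time.sleep(interval)
        with open(path) as f:
            now = parse_net_dev(f.read())
        for iface, (rx, tx) in sorted(now.items()):
            lrx, ltx = last.get(iface, (rx, tx))
            rows.append("%s,%d,%d" % (iface, rx - lrx, tx - ltx))
        last = now
    return rows

if __name__ == "__main__":
    print("\n".join(sample_csv()))
```

A cron job running this and appending to a file is already enough "measurement" to answer many of the questions the talk raised, without the overhead of a full Cricket installation.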
BSD BOF
Host: Kirk McKusick, Author and Consultant
Summarized by Bosko Milekic
The BSD Birds of a Feather session at USENIX started off with five quick and informative talks and ended with an exciting discussion of licensing issues.
Christos Zoulas, for the NetBSD Project, went over the primary goals of the NetBSD Project (namely portability, a clean architecture, and security) and then proceeded to briefly discuss some of the challenges that the project has encountered. The successes and areas for improvement of 1.6 were then examined. NetBSD has recently seen various improvements in its cross-build system, packaging system, and third-party tool support. As for the kernel, a zero-copy TCP/UDP implementation has been integrated, and new pmap code for arm32 has been introduced, as have some other rather major changes to various subsystems. More information is available in Christos's slides, which are now available at http://www.netbsd.org/gallery/events/usenix2002.
Theo de Raadt of the OpenBSD Project presented a rundown of various new things that the OpenBSD and OpenSSH teams have been looking at. Theo's talk included an amusing description (and pictures) of some of the events that transpired during OpenBSD's hack-a-thon, which occurred the week before USENIX '02. Finally, Theo mentioned that, following complications with the IPFilter license, the OpenBSD team had done a fairly extensive license audit.
Robert Watson of the FreeBSD Project brought up various examples of FreeBSD being used in the real world. He went on to describe a wide variety of changes that will surface with the upcoming FreeBSD release 5.0. Notably, 5.0 will introduce a re-architecting of the kernel aimed at providing much more scalable support for SMP machines. 5.0 will also include an early implementation of KSE, FreeBSD's version of scheduler activations, large improvements to pccard, and significant framework bits from the TrustedBSD project.
Mike Karels from Wind River Systems presented an overview of how BSD/OS has been evolving following the Wind River Systems acquisition of BSDi's software assets. Ernie Prabhakar from Apple discussed Apple's OS X and its success on the desktop. He explained how OS X aims to bring BSD to the desktop while striving to maintain a strong and stable core, one that is heavily based on BSD technology.
The BoF finished with a discussion on licensing; specifically, some folks questioned the impact of Apple's licensing for code from their core OS (the Darwin Project) on the BSD community as a whole.
CLOSING SESSION
HOW FLIES FLY
Michael H. Dickinson, University of California, Berkeley
Summarized by JD Welch
In this lively and offbeat talk, Dickinson discussed his research into the mechanisms of flight in the fruit fly (Drosophila melanogaster) and related the autonomic behavior to that of a technological system. Insects are the most successful organisms on earth, due in no small part to their early adoption of flight. Flies travel through space on straight trajectories interrupted by saccades, jumpy turning motions somewhat analogous to the fast, jumpy movements of the human eye. Using a variety of monitoring techniques, including high-speed video in a "virtual reality flight arena," Dickinson and his colleagues have observed that flies respond to visual cues to decide when to saccade during flight. For example, more visual texture makes flies saccade earlier (i.e., further away from the arena wall), described by Dickinson as a "collision control algorithm."
Through a combination of sensory inputs, flies can make decisions about their flight path. Flies possess a visual "expansion detector" which, at a certain threshold, causes the animal to turn in a certain direction. However, expansion in front of the fly sometimes causes it to land. How does the fly decide? Using the virtual reality device to replicate various visual scenarios, Dickinson observed that the flies fixate on vertical objects that move back and forth across the field. Expansion on the left side of the animal causes it to turn right, and vice versa, while expansion directly in front of the animal triggers the legs to flare out in preparation for landing.
If the eyes detect these changes, how are the responses implemented? The flies' wings have "power muscles" controlled by mechanical resonance (as opposed to the nervous system), which drive the wings, combined with neurally activated "steering muscles," which change the configuration of the wing joints. Subtle variations in the timing of impulses correspond to changes in wing movement, controlled by sensors in the "wing-pit." A small wing-like structure, the haltere, controls equilibrium by beating constantly.
Voluntary control is accomplished by the halteres, whose signals can interfere with the autonomic control of the stroke cycle. The halteres have "steering muscles" as well, and information derived from the visual system can turn off the haltere or trick it into a "virtual" problem requiring a response.
Dickinson has also studied the aerodynamics of insect flight using a device called the "Robo Fly," an oversized mechanical insect wing suspended in a large tank of mineral oil. Interestingly, Dickinson observed that the average lift generated by the flies' wings is under their body weight; the flies use three mechanisms to overcome this, including rotating the wings (rotational lift).
Insects are extraordinarily robust creatures, and because Dickinson analyzed the problem in a systems-oriented way, these observations and analysis are immediately applicable to technology: the response system can be used as an efficient search algorithm for control systems in autonomous vehicles, for example.
THE AFS WORKSHOP
Summarized by Garry Zacheiss, MIT
Love Hornquist-Astrand of the Arla development team presented the Arla status report. Arla 0.35.8 has been released. Scheduled for release soon are improved support for Tru64 UNIX, MacOS X, and FreeBSD; improved volume handling; and implementation of more of the vos/pts subcommands. It was stressed that MacOS X is considered an important platform, and that a GUI configuration manager for the cache manager, a GUI ACL manager for the Finder, and a graphical login that obtains Kerberos tickets and AFS tokens at login time were all under development.
Future goals planned for the underlying AFS protocols include GSSAPI/SPNEGO support for Rx, performance improvements to Rx, an enhanced disconnected mode, and IPv6 support for Rx; an experimental patch is already available for the latter. Future Arla-specific goals include improved performance, partial file reading, and increased stability on several platforms. Work is also in progress for the RXGSS protocol extensions (integrating GSSAPI into Rx). A partial implementation exists, and work continues as developers find time.
Derrick Brashear of the OpenAFS development team presented the OpenAFS status report. OpenAFS 1.2.5 was released immediately prior to the conference; it fixed a remotely exploitable denial-of-service attack in several OpenAFS platforms, most notably IRIX and AIX. Future work planned for OpenAFS includes better support for MacOS X, including working around Finder interaction issues. Better support for the BSDs is also planned: FreeBSD has a partially implemented client; NetBSD and OpenBSD have only server programs available right now. AIX 5, Tru64 5.1A, and MacOS X 10.2 (a.k.a. Jaguar) are all planned for the future. Other planned enhancements include nested groups in the ptserver (code donated by the University of Michigan awaits integration), disconnected AFS, and further work on a native Windows client. Derrick stated that the guiding principles of OpenAFS were to maintain compatibility with IBM AFS, support new platforms, ease the administrative burden, and add new functionality.
AFS performance was discussed. The openafs.org cell consists of two file servers, one in Stockholm, Sweden, and one in Pittsburgh, PA. AFS works reasonably well over trans-Atlantic links,
although OpenAFS clients don't determine which file server to talk to very effectively. Arla clients use RTTs to the server to determine the optimal file server from which to fetch replicated data. Modifications to OpenAFS to support this behavior in the future are desired.
Jimmy Engelbrecht and Harald Barth of KTH discussed their AFSCrawler script, written to determine how many AFS cells and clients were in the world, what implementations/versions they were (Arla vs. IBM AFS vs. OpenAFS), and how much data was in AFS. The script unfortunately triggered a bug in IBM AFS 3.6-derived code, causing some clients to panic while handling a specific RPC. This has since been fixed in OpenAFS 1.2.5 and the most recent IBM AFS patch level, and all AFS users are strongly encouraged to upgrade. No release of Arla is vulnerable to this particular denial-of-service attack. There was an extended discussion of the usefulness of this exploration. Many sites believed this was useful information and such scanning should continue in the future, but only on an opt-in basis.
Many sites face the problem of managing Kerberos/AFS credentials for batch-scheduled jobs. Specifically, most batch processing software needs to be modified to forward tickets as part of the batch submission process, renew tickets and tokens while the job is in the queue and for the lifetime of the job, and properly destroy credentials when the job completes. Ken Hornstein of NRL was able to pay a commercial vendor to support Kerberos 4/5 credential management in their product, although they did not implement AFS token management. MIT has implemented some of the desired functionality in OpenPBS and might be able to make it available to other interested sites.
Tools to simplify AFS administration were discussed, including:
AFS Balancer: A tool written by CMU to automate the process of balancing disk usage across all servers in a cell. Available from ftp://ftp.andrew.cmu.edu/pub/AFS-Tools/balance-1.1b.tar.gz.
Themis: Themis is KTH's enhanced version of the AFS tool "package" for updating files on local disk from a central AFS image. KTH's enhancements include allowing the deletion of files, simplifying the process of adding a file, and allowing the merging of multiple rule sets for determining which files are updated. Themis is available from the Arla CVS repository.
Stanford was presented as an example of a large AFS site. Stanford's AFS usage consists of approximately 14TB of data in AFS, in the form of approximately 100,000 volumes; 33TB of storage is available in their primary cell, ir.stanford.edu. Their file servers consist entirely of Solaris machines, running a combination of Transarc 3.6 patch-level 3 and OpenAFS 1.2.x, while their database servers run OpenAFS 1.2.2. Their cell consists of 25 file servers, using a combination of EMC and Sun StorEdge hardware. Stanford continues to use the kaserver for their authentication infrastructure, with future plans to migrate entirely to an MIT Kerberos 5 KDC.
Stanford has approximately 3,400 clients on their campus, not including SLAC (Stanford Linear Accelerator); approximately 2,100 AFS clients from outside Stanford contact their cell every month. Their supported clients are almost entirely IBM AFS 3.6 clients, although they plan to release OpenAFS clients soon. Stanford currently supports only UNIX clients. There is some on-campus presence of Windows clients, but Stanford has never publicly released or supported it. They do intend to release and support the MacOS X client in the near future.
All Stanford students, faculty, and staff are assigned AFS home directories, with a default quota of 50MB, for a total of approximately 550GB of user home directories. Other significant uses of AFS storage are data volumes for workstation software (400GB) and volumes for course Web sites and assignments (100GB).
AFS usage at Intel was also presented. Intel has been an AFS site since 1994. They had bad experiences with the IBM 3.5 Linux client; their experience with OpenAFS on Linux 2.4.x kernels has been much better. They use and are satisfied with the OpenAFS IA64 Linux port. Intel has hundreds of OpenAFS 1.2.3 and 1.2.4 clients in many production cells, accessing data stored on IBM AFS file servers; they have not encountered any interoperability issues. Intel has some concerns about OpenAFS: they would like to purchase commercial support for OpenAFS, and to see OpenAFS support for HP-UX on both PA-RISC and Itanium hardware. HP-UX support is currently unavailable due to a specific HP-UX header file being unavailable from HP; this may be resolved soon. Intel has not yet committed to migrating their file servers to OpenAFS and is unsure if it will do so without commercial support.
Backups are a traditional topic of discussion at AFS workshops, and this time was no exception. Many users complain that the traditional AFS backup tools ("backup" and "butc") are complex and difficult to automate, requiring many home-grown scripts and much user intervention for error recovery. An additional complaint was that the traditional AFS tools do not support file- or directory-level backups and restores; data must be backed up and restored at the volume level.
Mitch Collinsworth of Cornell presented work to make AMANDA, the free backup software from the University of Maryland, suitable for AFS backups. Using AMANDA for AFS backups allows one to share AFS backup tapes with non-AFS backups, easily run multiple backups in parallel, automate error recovery, and provide a robust degraded mode that prevents tape errors from
stopping backups altogether. Their implementation allows for full volume restores as well as individual directory and file restores. They have finished coding this work and are in the process of testing and documenting it.
Peter Honeyman of CITI at the University of Michigan spoke about work he has proposed to replace Rx with RPCSEC GSS in OpenAFS; this would allow AFS to use a TCP-based transport mechanism rather than the UDP-based Rx, and possibly gain better congestion control, dynamic adaptation, and fragmentation avoidance as a result. RPCSEC GSS uses the GSSAPI to authenticate SUN ONC RPC. RPCSEC GSS is transport agnostic, provides strong security, is a developing Internet standard, and has multiple open source implementations. Backward compatibility with existing AFS servers and clients is an important goal of this project.
altitude flight runs from the 20 Air Route Traffic Control Centers (ARTCC, pronounced "artsy") and traffic flow management, which is the strategic arm. Each area has sectors for low, high, and very-high flight. Each sector has a controller team, including one person on the microphone, and handles between 12 and 16 aircraft at a time. Since the number of sectors and areas is limited and fixed, there's limited system capacity. The events of September 11, 2001, gave us a respite in terms of system usage, but based on path growth patterns the air traffic system will be oversubscribed within two to three years. How do we handle this oversubscription?
Air Traffic Management (ATM) Decision Support Tools (DST) use physics, aeronautics, heuristics (expert systems), fuzzy logic, and neural nets to help the (human) aircraft controllers route aircraft around. The rest of the talk focused on capacity issues, but the DST also handle safety and security issues. The software follows open standards (ISO, POSIX, and ANSI). The team at NASA Ames made the Center-TRACON Automation System (CTAS), which is software for each of the ARTCCs, portable from Solaris to HP-UX and Linux as well. Unlike just about every other major software project, this one really is standard and portable; co-presenter Rob Savoye has experience in maintaining gcc on multiple platforms and is the project lead on the portability and standards issues for the code. CTAS allows the ARTCCs to upgrade and enhance individual aspects or parts of the system; it isn't a monolithic, all-or-nothing entity like the old ATM systems.
Some future areas of research include a head-mounted augmented reality device for tower operators, to improve their situational awareness by automating human factors, and new digital global positioning system (DGPS) technologies, which are accurate within inches instead of feet.
Questions centered around advanced avionics (e.g., getting rid of ground control), cooperation between the US and Europe for software development (we're working together on software development, but the various European countries' controllers don't talk well to each other), and privatization.
ADVENTURES IN DNS
Bill Manning, ISI
Summarized by Brennan Reynolds
Manning began by posing a simple question: is DNS really a critical infrastructure? The answer is not simple. Perhaps seven years ago, when the Internet was just starting to become popular, the answer was a definite no. But today, with IPv6 being implemented and so many transactions being conducted over the Internet, the question does not have a clear-cut answer. The Internet Engineering Task Force (IETF) has several new modifications to the DNS service that may be used to protect and extend its usability.
The first extension Manning discussed was Internationalized Domain Names (IDN). To date, all DNS records are based on the ASCII character set, but many addresses in the Internet's global network cannot be easily written in ASCII characters. The goal of IDN is to provide encoding for hostnames that is fair, efficient, and allows for a smooth transition from the current scheme. The work has resulted in two encoding schemes, ACE and UTF-8. Each encoding is independent of the other, but they can be used together in various combinations. Manning expressed his opinion that, while neither is an ideal solution, ACE appears to be the lesser of two evils. A major hindrance to getting IDN rolled out into the Internet's DNS root structure is the increase in zone file complexity.
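The ACE idea can be seen in miniature with Python's built-in "idna" codec, which implements the scheme the IDN work eventually standardized (Punycode, with the "xn--" ACE prefix). The hostname label below is made up for illustration; the point is that a non-ASCII label is mapped to a plain-ASCII form that legacy DNS software can store and compare unchanged.

```python
# ACE in one round trip: a non-ASCII label becomes a plain-ASCII label
# that any ASCII-only resolver or zone file can carry, and decodes back
# losslessly. The label "bücher" is purely illustrative.
label = "bücher"                 # a German label legacy DNS can't hold
ace = label.encode("idna")       # ASCII-Compatible Encoding
print(ace)                       # b'xn--bcher-kva'
print(ace.decode("idna"))        # round-trips back to 'bücher'
```

The round trip is the whole appeal Manning described: nothing in the DNS protocol or existing servers has to change, at the cost of less readable zone files.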
Manning's next topic was the inclusion of IPv6 records. In an IPv4 world, it is possible for administrators to remember the numeric representation of an address; IPv6 makes the addresses too
long and complex to be easily remembered. This means that DNS will play a vital role in getting IPv6 deployed. Several new resource records (RR) have been proposed to handle the translation, including AAAA, A6, DNAME, and BITSTRING. Manning commented on the difference between IPv4 and v6 as transport protocols, and noted that systems tuned for v4 traffic will suffer a performance hit when using v6. This is largely attributed to the increase in data transmitted per DNS request.
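The size difference is easy to make concrete with the standard-library ipaddress module (a modern convenience, not something from the talk): compare the reverse-DNS names that a PTR lookup must carry for an IPv4 versus an IPv6 address. The addresses below are documentation prefixes, not real hosts.

```python
# How much more data an IPv6 reverse lookup carries: four dotted labels
# under in-addr.arpa vs. 32 single-nibble labels under ip6.arpa.
# 192.0.2.0/24 and 2001:db8::/32 are reserved documentation prefixes.
import ipaddress

v4 = ipaddress.ip_address("192.0.2.1")
v6 = ipaddress.ip_address("2001:db8::1")

print(v4.reverse_pointer)   # 1.2.0.192.in-addr.arpa
print(v6.reverse_pointer)   # 32 nibble labels ending in .ip6.arpa
print(len(v4.reverse_pointer), len(v6.reverse_pointer))
```

The forward direction grows similarly (AAAA data is four times the size of an A record), which is one source of the per-request overhead Manning mentioned.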
The final extension Manning discussed was DNSSec. He introduced this extension as a mechanism that protects the system from itself. DNSSec protects against data spoofing and provides authentication between servers. It includes a cryptographic signature of the RR set to ensure authenticity and integrity. By signing the entire set, the amount of computation is kept to a minimum. The information itself is stored in a new RR within each zone file on the DNS server.
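The point about signing the RR set as a whole, rather than each record, can be sketched conceptually. Real DNSSEC uses public-key signatures over a defined canonical wire form; the HMAC below is a stand-in so the sketch stays self-contained, and the key and records are invented.

```python
# Conceptual sketch: one signature over the canonically ordered RR set
# costs one signing operation and protects the *set* (no record can be
# added, dropped, or altered). HMAC stands in for DNSSEC's public-key
# signature; the zone key and records are made up.
import hashlib, hmac

zone_key = b"zone-signing-key"   # made-up key material

def sign_rrset(rrset):
    """Return a signature over the canonically sorted RR set."""
    canonical = b"\n".join(sorted(r.encode() for r in rrset))
    return hmac.new(zone_key, canonical, hashlib.sha256).hexdigest()

rrset = ["www.example.com. A 192.0.2.10", "www.example.com. A 192.0.2.11"]
sig = sign_rrset(rrset)

# Record order doesn't matter (canonical form), but tampering does:
assert sign_rrset(list(reversed(rrset))) == sig
assert sign_rrset(rrset + ["www.example.com. A 192.0.2.12"]) != sig
```

Canonicalizing before signing is what lets a resolver verify the set no matter which order the records arrive in.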
Manning briefly commented on the use of DNS to provide a PKI infrastructure, stating that that was not the purpose of DNS and therefore it should not be used in that fashion. The signing of the RR sets can be done hierarchically, resulting in the use of a single trusted key at the root of the DNS tree to sign all sets down to the leaves. However, the job of key replacement and rollover is extremely difficult for a system that is distributed across the globe.
Manning stated that in an operational test bed with all of these extensions enabled, the packet size for a single query response grew from 518 bytes to larger than 18,000. This results in a large increase in bandwidth usage for high-volume name servers. Therefore, in Manning's opinion, not all of these features will be deployed in the near future. For those looking for more information, Bill's Web site can be found at http://www.isi.edu/otdr.
THE JOY OF BREAKING THINGS
Pat Parseghian, Transmeta
Summarized by Juan Navarro
Pat Parseghian described her experience in testing the Crusoe microprocessor at the Transmeta Lab for Compatibility (TLC), where the motto is "You make it, we break it" and the goal is to make engineers miserable.
She first gave an overview of Crusoe, whose most distinctive feature is a code-morphing layer that translates x86 instructions into native VLIW instructions. Such a peculiarity makes the testing process particularly challenging, because it involves testing two processors (the "external" x86 and the "internal" VLIW), and also because of variations in how the code-morphing layer works (it may interpret code the first few times it sees an instruction sequence, and translate it and save it in a translation cache afterwards). There are also reproducibility issues, since the VLIW processor may run at different speeds to save energy.
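The interpret-then-translate behavior described above can be sketched as a toy cache. The threshold, names, and return values below are invented for illustration; the real Crusoe internals are not public, and this is only the shape of the behavior that makes testing hard (the same instruction sequence takes different paths on different runs).

```python
# Toy sketch of a code-morphing layer: a sequence is interpreted its
# first few times, then "translated" once and served from a cache.
# INTERPRET_THRESHOLD and all names are invented for illustration.
INTERPRET_THRESHOLD = 3

seen_counts = {}
translation_cache = {}

def execute(seq):
    """Return which path a (hypothetical) instruction sequence takes."""
    if seq in translation_cache:
        return "cached-translation"
    seen_counts[seq] = seen_counts.get(seq, 0) + 1
    if seen_counts[seq] < INTERPRET_THRESHOLD:
        return "interpreted"
    translation_cache[seq] = "native-vliw"   # translate once, save
    return "translated"

runs = [execute("mov eax, 1") for _ in range(5)]
print(runs)
# ['interpreted', 'interpreted', 'translated',
#  'cached-translation', 'cached-translation']
```

A crash that only appears on the third execution of a sequence, when the translator first kicks in, is exactly the kind of state-dependent failure TLC has to isolate.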
Pat then described the tests that they subject the processor to (hardware compatibility tests, common applications, operating systems, envelope-pushing games, and hard-to-acquire legacy applications) and the issues that must be considered when defining testing policies. She gave some testing tips, including organizational issues like tracking resources and results.
To illustrate the testing process, Pat gave a list of possible causes of a system crash that must be investigated. If it is the silicon, then it might be because it is damaged, or because of a manufacturing problem or a design flaw. If the problem is in the code-morphing layer, is the fault with the interpreter or with the translator? The fault could also be external to the processor: it could be the homemade system BIOS, a faulty mainboard, or an operator error. Or Transmeta might not be at fault at all; the crash might be due to a bug in the OS or the application. The key to pinpointing the cause of a problem is to isolate it by identifying the conditions that reproduce the problem and repeating those conditions on other test platforms and non-Crusoe systems. Then relevant factors must be identified, based on product knowledge, past experience, and common sense.
To conclude, Pat suggested that some of TLC's testing lessons can be applied to products that the audience was involved in, and assured us that breaking things is fun.
TECHNOLOGY, LIBERTY, FREEDOM, AND
WASHINGTON
Alan Davidson, Center for Democracy and Technology
Summarized by Steve Bauer
"Experience should teach us to be most on our guard to protect liberty when the Government's purposes are beneficent. Men born to freedom are naturally alert to repel invasion of their liberty by evil-minded rulers. The greatest dangers to liberty lurk in insidious encroachment by men of zeal, well-meaning but without understanding." – Louis Brandeis, Olmstead v. US
Alan Davidson, an associate director at the CDT (http://www.cdt.org) since 1996, concluded his talk with this quote. In many ways it aptly characterizes the importance to the USENIX community of the topics he covered. The major themes of the talk were the impact of law on individual liberty, system architectures, and appropriate responses by the technical community.
The first part of the talk provided the audience with an overview of legislation either already introduced or likely to be introduced in the US Congress. This included various proposals to protect children, such as relegating all sexual content to a .xxx domain or providing kid-safe zones such as .kids.us. Other similar laws discussed were the Children's Online Protection Act and legislation dealing with virtual child pornography.
Other lower-profile pieces covered were laws prohibiting false contact information in emails and domain registration databases. Similar proposals exist that would prohibit misleading subject lines in emails and require messages to clearly identify whether they are advertisements. Briefly mentioned were laws impacting online gambling.
Bills and laws affecting the architectural design of systems and networks, and various efforts to establish boundaries and borders in cyberspace, were then discussed. These included the Consumer Broadband and Digital Television Promotion Act and the French cases against Yahoo and its CEO for allowing the sale of Nazi materials on the Yahoo auction site.
A major part of the talk focused on the technological aspects of the USA-Patriot Act, passed in the wake of the September 11 attacks. Topics included "pen registers" for the Internet, expanded government power to conduct secret searches, roving wiretaps, nationwide service of warrants, sharing of grand jury and intelligence information, and the establishment of an identification system for visitors to the US.
The technological community needs to be aware of, and care about, the impact of law on individual liberty and on the systems the community builds. Specific suggestions included designing privacy and anonymity into architectures, and limiting information collection, particularly of any personally identifiable data, to only what is essential. Finally, Alan emphasized that many people simply do not understand the implications of law. Thus the technology community has an important role in helping make these implications clear.
TAKING AN OPEN SOURCE PROJECT TO
MARKET
Eric Allman, Sendmail, Inc.
Summarized by Scott Kilroy
Eric started the talk with the upsides and downsides of open source software. The upsides include rapid feedback, high-quality work (mostly), lots of hands, and an intensely engineering-driven approach. The downsides include no support structure, limited project marketing expertise, and volunteers having limited time.
In 1998, Sendmail was becoming a success disaster. A success disaster includes two key elements: (1) volunteers start spending all their time supporting current functionality instead of enhancing it (this heavy burden can lead to project stagnation and eventually death); (2) competition from better-funded sources can lead to FUD (fear, uncertainty, and doubt) about the project.
Eric then led listeners down the path he took to turn Sendmail into a business. Eric envisioned Sendmail, Inc. as a smallish and close-knit company, but soon realized that the family atmosphere he desired could not last as the company grew. He observed that maybe only the first 10 people will buy into your vision, after which each individual is primarily interested in something else (e.g., power, wealth, or status). He warns that there will be people working for you that you do not like. Eric particularly did not care for sales people, but he emphasized how important sales, marketing, finance, and legal people are to companies.
The nature of companies in infancy can be summed up as: you need money, and you need investors in order to get money. Investors want a return on investment, so money management becomes critical; companies therefore must optimize money functions. Eric's experiences with investors were lessons all in themselves. More than giving money, good investors can provide connections and insight, so you don't want investors who don't believe in you.
A company must have bug history, change management, and people who understand the product and have a sense of the target market. Eric now sees the importance of marketing target research. A company can never forget the simple equation that drives business: profit = revenue - expense.
Finally, the concept of value should be based on what is valuable to the customer. Eric observed that his customers needed documentation, extra value in the product that they couldn't get elsewhere, and technical support.
Eric learned hard lessons along the way. If you want to do interesting open source, it might be best not to be too successful; you don't want to let the larger dragons (companies) notice you. If you want a commercial user base, you have to manage process, watch corporate culture, provide value to customers, watch bottom lines, and develop a thick skin.
Starting a company is easy, but perpetuating it is extremely difficult.
INFORMATION VISUALIZATION FOR
SYSTEMS PEOPLE
Tamara Munzner, University of British Columbia
Summarized by JD Welch
Munzner presented interesting evidence that information visualization, through interactive representations of data, can help people perform tasks more effectively and reduce the load on working memory.
A popular technique in visualizing data is abstracting it through node-link representation, for example, a list of cross-references in a book. Representing the relationships between references in a graphical (node-link) manner "offloads cognition to the human perceptual system." These graphs can be produced manually, but automated drawing allows for increased complexity and quicker production times.
The way in which people interpret visual data is important in designing effective visualization systems. Attention to pre-attentive visual cues is key to success. Picking out a red dot among a field of identically shaped blue dots is significantly faster than finding the same data represented in a mixture of hue and shape variations. There are many pre-attentive visual cues, including size, orientation, intersection, and intensity. The accuracy of these cues can be ranked, with position triggering high accuracy and color or density on the low end.
Visual cues can also be combined with different data types: quantitative (e.g., 10 inches), ordered (small, medium, large), and categorical (apples, oranges). These combinations can likewise be ranked, with spatial position being best for all types. Beyond this commonality, however, accuracy varied widely, with length ranked second for quantitative data, eighth for ordinal, and ninth for categorical data types.
These guidelines about visual perception can be applied to information visualization, which, unlike scientific visualization, focuses on abstract data and the choice of spatialization. Several techniques are used to clarify abstract data, including multi-part glyphs, where changes in individual parts are incorporated into an easier-to-understand gestalt; interactivity; motion; and animation. In large data sets, techniques like "focus + context," where a zoomed portion of the graph is shown along with a thumbnail view of the entire graph, are used to minimize user disorientation.
Future problems include dealing with huge databases, such as the Human Genome; reckoning with dynamic data, like the changing structure of the Web or real-time network monitoring; and transforming "pixel-bound" displays into large "digital wallpaper"-type systems.
FIXING NETWORK SECURITY BY HACKING
THE BUSINESS CLIMATE
Bruce Schneier, Counterpane Internet Security
Summarized by Florian Buchholz
Bruce Schneier identified security as one of the fundamental building blocks of the Internet. A certain degree of security is needed for all things on the Net, and the limits of security will become limits of the Net itself. However, companies are hesitant to employ security measures. Science is doing well, but one cannot see the benefits in the real world. An increasing number of users means more problems affecting more people. Old problems such as buffer overflows haven't gone away, and new problems show up. Furthermore, the amount of expertise needed to launch attacks is decreasing.
Schneier argues that complexity is the enemy of security and that, while security is getting better, the growth in complexity is outpacing it. As security is fundamentally a people problem, one shouldn't focus on technologies but rather should look at businesses, business motivations, and business costs. Traditionally, one can distinguish between two security models: threat avoidance, where security is absolute; and risk management, where security is relative and one has to mitigate the risk with technologies and procedures. In the latter model, one has to find a point with reasonable security at an acceptable cost.
After a brief discussion of how security is handled in businesses today, in which he concluded that businesses "talk big about it but do as little as possible," Schneier identified four necessary steps to fix the problems.
First, enforce liabilities. Today, no real consequences have to be feared from security incidents. Holding people accountable will increase the transparency of security and give an incentive to make processes public. As possible options to achieve this, he listed industry-defined standards, federal regulation, and lawsuits. The problems with enforcement, however, lie in difficulties associated with the international nature of the problem and the fact that complexity makes assigning liabilities difficult. Furthermore, fear of liability could have a chilling effect on free software development and could stifle new companies.
67 October 2002 ;login:
As a second step, Schneier identified the need to allow partners to transfer liability. Insurance would spread liability risk among a group and would be a CEO's primary risk-analysis tool. There is a need for standardized risk models and protection profiles, and for more security as opposed to empty press releases. Schneier predicts that insurance will create a marketplace for security where customers and vendors will have the ability to accept liabilities from each other. Computer-security insurance should soon be as common as household or fire insurance, and from that development insurance will become the driving factor of the security business.
The next step is to provide mechanisms to reduce risks both before and after software is released; techniques and processes to improve software quality, as well as an evolution of security management, are therefore needed. Currently, security products try to rebuild the walls – such as physical badges and entry doorways – that were lost when getting connected. Schneier believes this "fortress metaphor" is bad; one should think of the problem more in terms of a city. Since most businesses cannot afford proper security, outsourcing is the only way to make security scalable. With outsourcing, the concept of best practices becomes important, and insurance companies can be tied to them; outsourcing will level the playing field.
As a final step, Schneier predicts that rational prosecution and education will lead to deterrence. He claims that people feel safe because they live in a lawful society, whereas the Internet is classified as lawless, very much like the American West in the 1800s or a society ruled by warlords. This is because it is difficult to prove who an attacker is, and prosecution is hampered by complicated evidence gathering and irrational prosecution. Schneier believes, however, that education will play a major role in turning the Internet into a lawful society. Specifically, he pointed out that we need laws that can be explained.
Risks won't go away; the best we can do is manage them. A company able to manage risk better will be more profitable, and we need to give CEOs the necessary risk-management tools.
There were numerous questions. Listed below are the more complicated ones, in a paraphrased Q&A format.
Q: Will liability be effective? Will insurance companies be willing to accept the risks? A: The government might have to step in; it remains to be seen how it plays out.
Q: Is there an analogy to the real world in the fact that in a lawless society only the rich can afford security? Security solutions differentiate between classes. A: Schneier disagreed with the statement, giving an analogy to front-door locks, but conceded that there might be special cases.
Q: Doesn't homogeneity hurt security? A: Homogeneity is oversold. Diverse types can be more survivable, but given the limited number of options, the difference will be negligible.
Q: Regulations in airbag protection have led to deaths in some cases. How can we keep the pendulum from swinging the other way? A: Lobbying will not be prevented. An imperfect solution is probable; there might be reversals of requirements such as the airbag one.
Q: What about personal liability? A: This will be analogous to auto insurance; liability comes with computer/Net access.
Q: If the rule of law is to become reality, there must be a law enforcement function that applies to a physical space. You cannot do that with any existing government agency for the whole world. Should an organization like the UN assume that role? A: Schneier was not convinced global enforcement is possible.
Q: What advice should we take away from this talk? A: Liability is coming. Since the network is important to our infrastructure, eventually the problems will be solved in a legal environment. You need to start thinking about how to solve the problems and how the solutions will affect us.
The entire speech (including slides) is available at http://www.counterpane.com/presentation4.pdf.
LIFE IN AN OPEN SOURCE STARTUP
Daryll Strauss, Consultant
Summarized by Teri Lampoudi
This talk was packed with morsels of insight and tidbits of information about what life in an open source startup is like (though "was like" might be more appropriate), what the issues are in starting, maintaining, and commercializing an open source project, and the way hardware vendors treat open source developers. Strauss began by tracing the timeline of his involvement with developing 3-D graphics support for Linux: from the 3dfx Voodoo1 driver for Linux in 1997, to the establishment of Precision Insight in 1998, to its buyout by VA Linux, to the dismantling of his group by VA, to what the future may hold.
Obviously, there are benefits from doing open source development, both for the developer and for the end user. An important point was that open source develops entire technologies, not just products. A common misapprehension concerns the inherent difficulty of managing a project that accepts code from a large number of developers, some volunteer and some paid. The notion that code can just be thrown over the wall into the world is a Big Lie.
How does one start and keep afloat an open source development company? In the case of Precision Insight, the subject matter – developing graphics and OpenGL for Linux – required a considerable amount of expertise: intimate knowledge of the hardware, the libraries, X, and the Linux kernel, as well as the end applications. Expertise is marketable. What helps even further is having a visible virtual team of experts, people who have an established track record of contributions to open source in the particular area of expertise. Support from the hardware and software vendors came next. While everyone was willing to contract PI to write the portions of code that were specific to their hardware, no one wanted to pay the bill for developing the underlying foundation that was necessary for the drivers to be useful. In the PI model, the infrastructure cost was shared by several customers at once, and the technology kept moving. Once a driver is written for a vendor, the vendor is perfectly capable of figuring out how to write the next driver necessary, thereby obviating the need to contract the job. This, and questions of protecting intellectual property – hardware design in this case – are deeply problematic with the open source mode of development. This is not to say that there is no way to get over them, but they are likely to arise more often than not.
Management issues seem equally important. In the PI setting, contracts were flexible – both a good and a bad thing – and developers overworked. Further, development "in a fishbowl," that is, under public scrutiny, is not easy. Strauss stressed the value of good communication and the use of various asynchronous communication methods like IRC and mailing lists.
Finally, Strauss discussed what portions of software can be free and what can be proprietary. His suggestion was that horizontal markets want to be free, whereas vertically one can develop proprietary solutions. The talk closed with a small stroll through the problems that project politics brings up, from things like merging branches to mailing list and source tree readability.
GENERAL TRACK SESSIONS
FILE SYSTEMS
Summarized by Haijin Yan
STRUCTURE AND PERFORMANCE OF THE
DIRECT ACCESS FILE SYSTEM
Kostas Magoutis, Salimah Addetia, Alexandra Fedorova, and Margo I. Seltzer, Harvard University; Jeffrey Chase, Andrew Gallatin, Richard Kisley, and Rajiv Wickremesinghe, Duke University; and Eran Gabber, Lucent Technologies
This paper won the Best Paper award.
The Direct Access File System (DAFS) is a new standard for network-attached storage over direct-access transport networks. DAFS takes advantage of user-level network interface standards that enable client-side functionality in which remote data access resides in a library rather than in the kernel. This reduces the memory-copy overhead of data movement as well as protocol overhead. Remote Direct Memory Access (RDMA) is a direct-access transport capability that allows the network adapter to reduce copy overhead by accessing application buffers directly.
This paper explores the fundamental structural and performance characteristics of network file access using a user-level file system structure on a direct-access transport network with RDMA. It describes DAFS-based client and server reference implementations for FreeBSD and reports experimental results comparing DAFS to a zero-copy NFS implementation. It illustrates the benefits and trade-offs of these techniques to provide a basis for informed choices about deployment of DAFS-based systems and similar extensions to other network file protocols, such as NFS. Experiments show that DAFS gives applications direct control over an I/O system and increases the client CPU's usage while the client is doing I/O. Future work includes addressing longstanding problems related to the integration of the application and file system for high-performance applications.
CONQUEST: BETTER PERFORMANCE THROUGH A DISK/PERSISTENT-RAM HYBRID FILE SYSTEM
An-I A. Wang, Peter Reiher, and Gerald J. Popek, UCLA; Geoffrey M. Kuenning, Harvey Mudd College
Motivated by the declining cost of persistent RAM, the authors propose the Conquest file system, which stores all small files, metadata, executables, and shared libraries in persistent RAM; disks hold only the data content of the remaining large files. Compared to alternatives such as caching and RAM file systems, Conquest has the advantages of efficiency, consistency, and reliability at a reduced cost. Using popular benchmarks, experiments show that Conquest incurs little overhead while achieving faster performance. Future work includes designing mechanisms for adjusting the file-size threshold dynamically and finding a better disk layout for large data blocks.
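Conquest's placement rule can be sketched in a few lines (the threshold value and the suffix test are assumptions for illustration, not taken from the paper): everything small or hot-by-type goes to persistent RAM, and only large-file data goes to disk.

```python
# Minimal sketch of a Conquest-style placement decision.
SIZE_THRESHOLD = 1 << 20  # assumed 1MB cutoff; the real system tunes this

def placement(name, size, is_metadata=False):
    # Metadata, small files, and shared libraries/executables stay in
    # persistent RAM; only large-file data content is sent to disk.
    if is_metadata or size <= SIZE_THRESHOLD or name.endswith((".so", ".exe")):
        return "persistent-ram"
    return "disk"

print(placement("notes.txt", 4096))       # persistent-ram (small file)
print(placement("movie.mpg", 700 << 20))  # disk (large data content)
print(placement("libc.so", 2 << 20))      # persistent-ram (shared library)
```

The "future work" item in the summary, adjusting the file-size threshold dynamically, corresponds to making `SIZE_THRESHOLD` adaptive rather than a constant.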
EXPLOITING GRAY-BOX KNOWLEDGE OF
BUFFER-CACHE MANAGEMENT
Nathan C. Burnett, John Bent, Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau, University of Wisconsin, Madison
Knowing what algorithm is used to manage the buffer cache is very important for improving application performance; however, there is currently no interface for discovering this algorithm. This paper introduces Dust, a simple fingerprinting tool that is able to identify the buffer-cache replacement policy. Dust automatically identifies the cache size and replacement policy based on how the policy weighs the attributes of access order, recency, frequency, and long-term history. Through simulation, Dust was able to distinguish between a variety of replacement algorithm policies found in the literature: FIFO, LRU, LFU, Clock, Segmented FIFO, 2Q, and LRU-K. Further experiments fingerprinting real operating systems such as NetBSD, Linux, and Solaris show that Dust is able to identify the hidden cache replacement algorithm.
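The fingerprinting idea can be illustrated with a toy probe (a simplification, not the actual Dust tool): craft an access pattern that re-references an old block before forcing an eviction; the resulting hit/miss outcome differs under FIFO and LRU, so the outcome reveals the policy.

```python
from collections import OrderedDict, deque

CACHE_SIZE = 3  # assumed; Dust also infers the size automatically

def run(policy, accesses):
    """Simulate a cache under the given policy; return the list of hits."""
    if policy == "LRU":
        cache, hits = OrderedDict(), []
        for b in accesses:
            if b in cache:
                cache.move_to_end(b)          # re-reference boosts recency
                hits.append(b)
            else:
                if len(cache) >= CACHE_SIZE:
                    cache.popitem(last=False)  # evict least recently used
                cache[b] = True
        return hits
    # FIFO: eviction order fixed by insertion, untouched by re-references
    q, members, hits = deque(), set(), []
    for b in accesses:
        if b in members:
            hits.append(b)
        else:
            if len(q) >= CACHE_SIZE:
                members.discard(q.popleft())
            q.append(b)
            members.add(b)
    return hits

# Fill the cache, re-touch block 0, force one eviction, then probe block 0.
pattern = [0, 1, 2, 0, 3, 0]
fifo_hits = run("FIFO", pattern)  # final probe misses: FIFO evicted block 0
lru_hits = run("LRU", pattern)    # final probe hits: LRU evicted block 1
print("LRU" if lru_hits.count(0) == 2 else "FIFO-like")
```

Dust generalizes this by also varying frequency and long-term history so that all seven policies listed above produce distinct fingerprints.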
By knowing the underlying cache replacement policy, a cache-aware Web server can reschedule requests using a cached-request-first policy to obtain a performance improvement. Experiments show that cache-aware scheduling improves average response time and system throughput.
OPERATING SYSTEMS
(AND DANCING BEARS)
Summarized by Matt Butner
THE JX OPERATING SYSTEM
Michael Golm, Meik Felser, Christian Wawersich, and Jürgen Kleinöder, University of Erlangen-Nürnberg
The talk opened by emphasizing the need for, and the practicality of, a functional Java OS. JX attempts to mimic the recent trend toward using highly abstracted languages in application development in order to create an OS as functional and powerful as one written in a lower-level language.
JX is a microkernel solution that uses separate JVMs for each entity of the kernel and, in some cases, for each application. The separated domains do not share objects and have no thread migration, and each domain implements its own code. The interesting part of the presentation was the discussion of system-level programming with Java; key areas discussed were memory management and interrupt handlers. The authors concluded by noting that their type-safe and modular system resulted in a robust system with great configuration flexibility and acceptable performance.
DESIGN EVOLUTION OF THE EROS
SINGLE-LEVEL STORE
Jonathan S. Shapiro, Johns Hopkins University; Jonathan Adams, University of Pennsylvania
This presentation outlined current characteristics of file systems and some of their least desirable characteristics. It was based on the revival of the EROS single-level-store approach in current attempts to capitalize on system consistency and efficiency. Their solution did not relate directly to EROS but rather to the design and use of the constructs and notions on which EROS is based. By extending the mapping of the memory system to include the disk, systems are further able to ensure global persistence without regard for a disk structure. In explaining the need for absolute system consistency and an environment which, by definition, is not allowed to crash, the goal of having such an exhaustive design becomes clear. The cool part of this work is the availability of its design and architecture to the public.
THINK: A SOFTWARE FRAMEWORK FOR COMPONENT-BASED OPERATING SYSTEM KERNELS
Jean-Philippe Fassino, France Télécom R&D; Jean-Bernard Stefani, INRIA; Julia Lawall, DIKU; Gilles Muller, INRIA
This presentation discussed the need for component-based operating systems and corresponding structures to ensure flexibility for arbitrarily sized systems. Think provides a binding model that maps uniform components for OS developers and architects to follow, ensuring consistent implementations for an arbitrary system size. However, the goal is not to force developers into a predefined kernel but to promote the use of certain components in varied ways.
Think's concentration is primarily on embedded systems, where short development time is necessary but constrained by rigorous needs and limited resources. The need to build flexible systems in such an environment can be costly and implementation-specific, but Think creates an environment supported by the ability to dynamically load type-safe components. This allows for more flexible systems that retain functionality because of Think's ability to bind fine-grained components. Benchmarks revealed that dedicated microkernels can show performance improvements in comparison to monolithic kernels. Notable future work includes building components for low-end appliances and the development of a real-time OS component library.
BUILDING SERVICES
Summarized by Pradipta De
NINJA: A FRAMEWORK FOR NETWORK SERVICES
J. Robert von Behren, Eric A. Brewer, Nikita Borisov, Michael Chen, Matt Welsh, Josh MacDonald, Jeremy Lau, and David Culler, University of California, Berkeley
Robert von Behren presented Ninja, an ongoing project that aims to provide a framework for building robust and scalable Internet-based network services like Web hosting, instant messaging, email, and file-sharing applications. Robert drew attention to the difficulties of writing cluster applications: one has to take care of data consistency and issues of concurrency, and for robust applications there are problems related to fault tolerance. Ninja works as a wrapper to relieve the user of these problems. The goal of the project is building network services that are scalable and highly available, maintaining persistent data, providing graceful degradation, and supporting online evolution.
The use of clusters distinguishes this setup from generic distributed systems in terms of reliability and security, as well as providing a partition-free network. The next important feature of this project is the new programming model, which is more restrictive than a general programming model but still expressive enough to write most applications. This model, described as "single program, multiple connection," uses intertask parallelism instead of multithreaded concurrency and is characterized by all nodes running the same program, with connections delegated to the nodes by a centralized connection manager (CM). The CM takes care of hiding the details of mapping an external connection to an internal node. Ninja also uses a design called "staged event-driven architecture," where each service is broken down into a set of stages connected by event queues. This architecture is suitable for a modular design and helps in graceful degradation by adaptive load shedding from the event queues.
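The staged design with load shedding can be sketched as follows (the stage names, queue bound, and handlers are invented; this is not Ninja's API): each stage drains an event queue, and a bounded queue sheds excess load instead of letting the service collapse.

```python
from collections import deque

class Stage:
    """A processing stage fed by a bounded event queue."""
    def __init__(self, name, handler, max_queue=4):
        self.name, self.handler = name, handler
        self.queue = deque()
        self.max_queue = max_queue
        self.dropped = 0

    def enqueue(self, event):
        if len(self.queue) >= self.max_queue:  # adaptive load shedding
            self.dropped += 1
            return False
        self.queue.append(event)
        return True

    def run(self, next_stage=None):
        while self.queue:
            out = self.handler(self.queue.popleft())
            if next_stage is not None:
                next_stage.enqueue(out)

parse = Stage("parse", lambda req: req.upper())
reply = Stage("reply", lambda req: "200 " + req)

for i in range(6):                 # 6 requests, but the queue holds only 4
    parse.enqueue("req%d" % i)
parse.run(next_stage=reply)
reply.run()
print(parse.dropped)               # 2: the overload was shed, not queued
```

Degradation is "graceful" in the sense that the two shed requests fail fast while the four accepted ones flow through both stages normally.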
The presentation concluded with examples of implementation and evaluation of a Web server and an email system called NinjaMail, to show the ease of authoring and the efficacy of using the Ninja framework for service development.
USING COHORT-SCHEDULING TO ENHANCE
SERVER PERFORMANCE
James R. Larus and Michael Parkes, Microsoft Research
James Larus presented a new scheduling policy for increasing the throughput of server applications. Cohort scheduling batches the execution of similar operations arising in different server requests. The usual programming paradigm in handling server requests is to spawn multiple concurrent threads and switch from one thread to another whenever a thread gets blocked for I/O. Since the threads in a server mostly execute unrelated pieces of code, the locality of reference is reduced, and hence the effectiveness of different caching mechanisms. One way to solve this problem is to throw in more hardware, but Larus presented a complementary view in which program behavior is investigated and used to improve performance.
The problem in this scheme is identifying pieces of code that can be batched together for processing. One simple way is to look at the next program-counter values and use them to group threads together. However, this talk presented a new programming abstraction, "staged computation," which replaces the thread model with "stages." A stage is an abstraction for operations with similar behavior and locality. The StagedServer library can be used for programming in this model. It is a collection of C++ classes that implement staged computation and cohort scheduling on either a uniprocessor or multiprocessor, and it can be used to define stages.
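The scheduling idea itself reduces to a reordering (the function and names below are invented for illustration, not the StagedServer API): instead of interleaving requests the way a thread-per-request server does, run the same operation from many requests back to back as a cohort, so the code and data that operation touches stay hot in the cache.

```python
def cohort_schedule(requests, stages):
    """Order work so each stage runs as one batch across all requests."""
    order = []
    for stage in stages:           # one pass per stage ...
        for req in requests:       # ... over every request: one cohort
            order.append((stage, req))
    return order

stages = ["parse", "lookup", "respond"]
requests = ["r1", "r2", "r3"]
batched = cohort_schedule(requests, stages)
print(batched[:3])  # [('parse', 'r1'), ('parse', 'r2'), ('parse', 'r3')]
```

A thread-based server would effectively emit `('parse','r1'), ('lookup','r1'), ...` interleaved across requests; the cohort order trades a little per-request latency for instruction- and data-cache locality.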
The presentation showed an experimental evaluation of cohort scheduling against the thread-based model by implementing two servers: a Web server, which is mainly I/O bound, and a publish-subscribe server, which is mainly compute bound. The SURGE benchmark was used for the first experiment and the Fabret workload for the second. The results showed that the cohort-scheduling-based implementation gave better throughput than the thread-based implementation at high loads.
NETWORK PERFORMANCE
Summarized by Xiaobo Fan
ETE: PASSIVE END-TO-END INTERNET SERVICE PERFORMANCE MONITORING
Yun Fu and Amin Vahdat, Duke University; Ludmila Cherkasova and Wenting Tang, HP Labs
This paper won the Best Student Paper award.
Ludmila Cherkasova began by listing several questions most Web service providers want answered in order to improve service quality. She reviewed the difficulties of making accurate and efficient end-to-end Web service measurements and the shortcomings of currently available approaches. The authors propose a passive trace-based architecture called EtE to monitor Web server performance on behalf of end users.
The first step is to collect network packets passively. The second module reconstructs all TCP connections and extracts HTTP transactions. To reconstruct Web page accesses, they first build a knowledge base indexed by client IP and URL and then group objects into the related Web pages they are embedded in. Statistical analysis is used to handle non-matched objects. EtE Monitor can generate three groups of metrics to measure Web service performance: response time, Web caching, and page abortion.
To demonstrate the benefits of EtE Monitor, Cherkasova talked about two case studies and, based on the calculated metrics, gave some insightful explanations about what's happening behind the variations in Web performance. Through validation, they claim their approach provides a very close approximation of the real scenario.
THE PERFORMANCE OF REMOTE DISPLAY
MECHANISMS FOR THIN-CLIENT COMPUTING
S. Jae Yang, Jason Nieh, Matt Selsky, and Nikhil Tiwari, Columbia University
Noting the trend toward thin-client computing, the authors compared different techniques and design choices by measuring the performance of six popular thin-client platforms: Citrix MetaFrame, Microsoft Terminal Services, Sun Ray, Tarantella, VNC, and X. After pointing out several challenges in benchmarking thin clients, Yang proposed slow-motion benchmarking to achieve non-invasive packet monitoring and consistent visual quality. Basically, they insert delays between separate visual events in the benchmark applications on the server side so that the client's display updates can catch up with the server's processing speed. The experiments were conducted on an emulated network over a range of network bandwidths.
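The delay-insertion trick can be sketched in a few lines (the delay value and event names are assumptions for illustration): space visual events on the server far enough apart that each display update completes before the next event fires, so a packet monitor captures every update at full visual quality.

```python
import time

def play_benchmark(events, delay_between_events):
    """Fire visual events with slow-motion gaps; record when each fired."""
    timestamps = []
    for ev in events:
        timestamps.append((ev, time.monotonic()))
        time.sleep(delay_between_events)  # the inserted slow-motion gap
    return timestamps

marks = play_benchmark(["page1", "page2", "page3"],
                       delay_between_events=0.01)
gaps = [b[1] - a[1] for a, b in zip(marks, marks[1:])]
print(len(gaps), all(g > 0 for g in gaps))  # 2 True
```

In the real methodology, per-event network traffic is then measured in isolation and the idle gaps are factored out, which is what makes the monitoring non-invasive yet fair across platforms with different update speeds.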
Their results show that thin clients can provide good performance for Web applications in LAN environments, but only some platforms performed well on the video benchmark. Pixel-based encoding may achieve better performance and bandwidth efficiency than high-level graphics. Display caching and compression should be used with care.
A MECHANISM FOR TCP-FRIENDLY
TRANSPORT-LEVEL PROTOCOL
COORDINATION
David E. Ott and Ketan Mayer-Patel, University of North Carolina, Chapel Hill
A revised transport-level protocol optimized for cluster-to-cluster (C-to-C) applications is what David Ott tried to explain in his talk. A C-to-C application class is identified as one set of processes communicating with another set of processes across a common Internet path. The fundamental problem with C-to-C applications is how to coordinate all C-to-C communication flows so that they share a consistent view of the common C-to-C network, adapt to changing network conditions, and cooperate to meet specific requirements. Aggregation points (APs) are placed at the first and last hops of the common data path to probe network conditions (latency, bandwidth, loss rate, etc.). To carry and transfer this information, a new protocol – the Coordination Protocol (CP) – is inserted between the network layer (IP) and the transport layer (TCP, UDP, etc.). Ott illustrated how the CP header is updated and used as packets originate from the source and traverse local and remote APs to arrive at their destination, and how an AP maintains a per-cluster state table and detects network conditions. Through simulation results, this coordination mechanism appears effective in sharing common network resources among C-to-C communication flows.
STORAGE SYSTEMS
Summarized by Praveen Yalagandula
MY CACHE OR YOURS? MAKING STORAGE MORE EXCLUSIVE
Theodore M. Wong, Carnegie Mellon University; John Wilkes, HP Labs
Theodore Wong explained the inefficiency of current "inclusive" caching schemes in storage area networks: when a client accesses a block, the block is read from the disk and is cached at both the disk array cache and the client's cache. He then presented the concept of "exclusive" caching, where a block accessed by a client is cached only in that client's cache. On eviction from the client's cache, a DEMOTE operation moves the data block to the tail of the array cache.
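A toy model makes the exclusivity visible (a simplification of the talk, with invented names; the real system demotes block contents over the SAN, not pointers): the client and the array each keep an LRU list, a block lives in only one of them, and a client eviction demotes the block to the MRU end of the array cache instead of discarding it.

```python
from collections import OrderedDict

class LRUCache:
    def __init__(self, size):
        self.size, self.d = size, OrderedDict()

    def insert_mru(self, block):
        """Insert at the MRU end; return the evicted LRU block, if any."""
        self.d[block] = True
        self.d.move_to_end(block)
        if len(self.d) > self.size:
            victim, _ = self.d.popitem(last=False)
            return victim
        return None

    def remove(self, block):
        return self.d.pop(block, None) is not None

client, array = LRUCache(2), LRUCache(2)

def read(block):
    if block in client.d:
        return "client hit"
    hit = array.remove(block)        # exclusive: the array gives it up
    demoted = client.insert_mru(block)
    if demoted is not None:
        array.insert_mru(demoted)    # DEMOTE on client eviction
    return "array hit" if hit else "disk read"

for b in ["a", "b", "c"]:
    read(b)                          # "a" is demoted when "c" arrives
print(read("a"))                     # array hit: "a" survived in the array
```

Under an inclusive scheme the array would hold copies of the client's blocks, wasting its capacity; here the two caches together hold four distinct blocks instead of two.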
The new exclusive caching schemes were evaluated on both single-client and multiple-client systems. The single-client results are presented for two different types of synthetic workloads: Random (transaction-processing-type workloads) and Zipf (typical Web workloads). The exclusive policy was quite effective in achieving higher hit rates and lower read latencies. They also showed a 2.2-times hit-rate improvement over inclusive techniques in the case of a real-life workload, httpd, in a single-client setting.
The DEMOTE scheme also performed well in the case of multiple-client systems when the data accessed by clients is disjoint. In the case of shared-data workloads, this scheme performed worse than the inclusive schemes. Theodore then presented an adaptive exclusive caching scheme: a block accessed by a client is placed at the tail of the client's cache and also in the array cache at an appropriate place determined by the popularity of the block. The popularity of the block is measured by maintaining a ghost cache that accumulates the number of times each block is accessed. This new adaptive scheme achieved a maximum 52% speedup in mean latency in experiments with real-life workloads.
For more information, visit http://www.cs.cmu.edu/~tmwong/research and http://www.hpl.hp.com/research/itc/csl/ssp.
BRIDGING THE INFORMATION GAP IN
STORAGE PROTOCOL STACKS
Timothy E. Denehy, Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau, University of Wisconsin, Madison
Currently, there is a huge information gap between storage systems and file systems. The interface exposed by storage systems to file systems, based on blocks and providing only read/write operations, is very narrow. This leads to poor performance as a whole, because of duplicated functionality in both systems and reduced functionality resulting from storage systems' lack of file information.
The speaker presented two enhancements: ExRAID, an exposed RAID, and ILFS, an informed log-structured file system. ExRAID is an enhancement to the block-based RAID storage system that exposes the following three types of information to the file system: (1) regions, contiguous portions of the address space comprising one or multiple disks; (2) performance information about queue lengths and throughput of the regions, revealing disk heterogeneity to the file system; and (3) failure information, dynamically updated information conveying the number of tolerable failures in each region.
ILFS allows online incremental expansion of the storage space, performs dynamic parallelism using ExRAID's performance information to perform well on heterogeneous storage systems, and provides a range of different mechanisms, with different granularities, for redundancy of files using ExRAID's region failure characteristics. These new features were added to LFS with only a 19% increase in code size.
More information: http://www.cs.wisc.edu/wind
MAXIMIZING THROUGHPUT IN REPLICATED
DISK STRIPING OF VARIABLE
BIT-RATE STREAMS
Stergios V. Anastasiadis, Duke University; Kenneth C. Sevcik and Michael Stumm, University of Toronto
There is an increasing demand for the continuous real-time streaming of media files. Disk striping is a common technique used for supporting many connections on media servers. But even with 1.2 million hours of mean time between failures, there will be more than one disk failure per week in a system with 7,000 disks. Fault tolerance can be achieved by using data replication and reserving extra bandwidth during normal operation. This work focused on supporting the most common variable bit-rate stream formats (e.g., MPEG).
The authors used the prototype media server EXEDRA to implement and evaluate different replication and bandwidth reservation schemes. The system supports variable bit-rate streams, does stride-based disk allocation, and is capable of supporting variable-grain disk striping. Two replication policies are presented: deterministic, in which data blocks are replicated in a round-robin fashion on the secondary replicas, and random, in which data blocks are replicated on randomly chosen secondary replicas. Two bandwidth reservation techniques are also presented: mirroring reservation, in which disk bandwidth is reserved for both the primary and the replicas of the media file during playback, and minimum reservation, a more efficient scheme in which bandwidth is reserved only for the sum of the primary data access time and the maximum of the backup data access times required in each round.
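The difference between the two reservation schemes comes down to a sum versus a maximum; a worked example (the millisecond figures are invented) makes the saving concrete:

```python
# Per-round disk-time reservation under each scheme (illustrative numbers).
primary_time = 10        # ms of disk time for the primary data in a round
backup_times = [4, 6]    # ms needed on each backup replica after a failure

# Mirroring reserves bandwidth for the primary AND every replica.
mirroring = primary_time + sum(backup_times)   # 20 ms reserved

# Minimum reservation covers the primary plus only the worst-case backup.
minimum = primary_time + max(backup_times)     # 16 ms reserved
print(mirroring, minimum)  # 20 16
```

The unreserved 4 ms per round is what the minimum scheme can hand to additional streams, which is where its throughput advantage over mirroring comes from.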
Experimental results showed that deterministic replica placement is better than random placement for small disks, minimum disk bandwidth reservation achieves twice the throughput of mirroring, and fault tolerance can be achieved with a minimal impact on throughput.
TOOLS
Summarized by Li Xiao
SIMPLE AND GENERAL STATISTICAL PROFILING
WITH PCT
Charles Blake and Steven Bauer, MIT
This talk introduced the Profile Collection Toolkit (PCT), a sampling-based CPU profiling facility. A novel aspect of PCT is that it allows sampling of semantically rich data such as function call stacks, function parameters, local or global variables, CPU registers, or other execution contexts. The design objectives of PCT were driven by user needs and the inadequacies or inaccessibility of prior systems.
The rich data collection capability is achieved via a debugger-controller program, dbctl. The talk then introduced the profiling collector and data collection activation. PCT file format options provide flexibility and various ways to store exact states. PCT also has analysis options; two examples were given, "Simple" and "Mixed User and Linux Code."
The talk then went into how general sampling helps, covering a debugger-controller state machine, example script fragments, a simple example program, and general value reports in terms of call-path histograms, numeric value histograms, and data generality. Related work, such as gprof and Expect, was then summarized. The contribution of this work is to provide a portable, general value-sampling tool. The limitations are that PCT does not work well with stripped binaries, and count inaccuracies happen because of its statistical nature. Given the former limitation, building with -g is preferred over stripping.
Possible future directions include supporting more debuggers, such as dbx and xdb, script generators for other kinds of traces, more canned reports for general values, and a libgdb-based library sampler. PCT is available at http://pdos.lcs.mit.edu/pct and runs on almost any UNIX-like system.
ENGINEERING A DIFFERENCING AND
COMPRESSION DATA FORMAT
David G. Korn and Kiem-Phong Vo, AT&T Laboratories – Research
This talk began with an equation: differencing + compression = delta compression. Compression removes information redundancy in a data set; examples are gzip, bzip, compress, and pack. Differencing encodes the differences between two data sets; examples are diff -e, fdelta, bdiff, and xdelta. Delta compression compresses one data set given another, combining differencing and compression, and it reduces to pure compression when there's no commonality.
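The equation can be demonstrated with a stock library (this uses zlib's preset-dictionary feature as a stand-in, not the Vcdiff format itself): compressing the new version with the old version as the dictionary lets shared content be encoded as back-references, and with no shared content it degenerates to plain compression.

```python
import hashlib
import zlib

def delta_compress(old: bytes, new: bytes) -> bytes:
    c = zlib.compressobj(level=9, zdict=old)
    return c.compress(new) + c.flush()

def delta_decompress(old: bytes, delta: bytes) -> bytes:
    d = zlib.decompressobj(zdict=old)
    return d.decompress(delta) + d.flush()

# 4KB of incompressible (hash-derived) data, then a small edit to it.
old = b"".join(hashlib.sha256(bytes([i])).digest() for i in range(128))
new = old[:2000] + b"PATCHED" + old[2000:]

delta = delta_compress(old, new)
plain = zlib.compress(new, 9)
assert delta_decompress(old, delta) == new  # lossless round trip
print(len(delta) < len(plain))              # True: shared bytes cost little
```

Plain compression cannot shrink the hash-derived data at all, while the delta encodes almost everything as references into `old`, which is the whole point of delta compression for successive versions of a file.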
After presenting an overall scheme for a delta compressor and showing delta compression performance, the talk discussed the encoding format of the newly designed Vcdiff for delta compression. A Vcdiff instruction code table consists of 256 entries, each coding up to a pair of instructions, and recodes 1-byte indices and any additional data.
The talk then showed Vcdiff's performance on Web data collected from CNN, compared with the W3C standard Gdiff, Gdiff+gzip, and diff+gzip, where Gdiff was computed using Vcdiff delta instructions. The results of two experiments, "First" and "Successive," were presented: in "First," each file is compressed against the first file collected; in "Successive," each file is compressed against the one from the previous hour. diff+gzip did not work well because diff is line-oriented; Vcdiff performed favorably compared with the other formats.
Vcdiff is part of the Vcodex package. Vcodex is a platform for all common data transformations: delta compression, plain compression, encryption, and transcoding (e.g., uuencode, base64). It is structured in three layers for maximum usability; the base library uses Disciplines and Methods interfaces.
The code for Vcdiff can be found at http://www.research.att.com/sw/tools. The authors are moving Vcdiff toward an IETF standard as a comprehensive platform for transforming data; please refer to http://www.ietf.org/internet-drafts/draft-korn-vcdiff-06.txt.
WHERE IN THE NET
Summarized by Amit Purohit
A PRECISE AND EFFICIENT EVALUATION OF
THE PROXIMITY BETWEEN WEB CLIENTS AND
THEIR LOCAL DNS SERVERS
Zhuoqing Morley Mao, University of California, Berkeley; Charles D. Cranor, Fred Douglis, Michael Rabinovich, Oliver Spatscheck, and Jia Wang, AT&T Labs – Research
Content Distribution Networks (CDNs) attempt to improve Web performance by delivering Web content to end users from servers located at the edge of the network. When a Web client requests content, the CDN dynamically chooses a server to route the request to, usually the one that is nearest the client. As CDNs have access only to the IP address of the local DNS server (LDNS) of the client, the CDN's authoritative DNS server maps the client's LDNS to a geographic region within a particular network and combines that with network and server load information to perform CDN server selection.
This method has two limitations. First, it rests on the implicit assumption that clients are close to their LDNS, which may not always be valid. Second, a single request from an LDNS can represent a differing number of Web clients – called the hidden load factor. Knowledge of the hidden load factor can be used to achieve better load distribution.
The talk dealt with the extent of the first limitation and its impact on CDN server selection. To determine the associations between clients and their LDNS, a simple, non-intrusive, and efficient mapping technique was developed. The data collected was used to study the impact of proximity on DNS-based server selection using four different proximity metrics: (1) autonomous system (AS) clustering – observing whether a client is in the same AS as its LDNS – concluded that LDNS is very good for coarse-grained server selection, as 64% of the associations belong to the same AS; (2) network clustering – observing whether a client is in the same network-aware cluster (NAC) as its LDNS – implied that DNS is less useful for finer-grained server selection, since only 16% of clients and their LDNS are in the same NAC; (3) traceroute divergence – the length of the divergent paths to the client and its LDNS from a probe point, measured using traceroute – implies that most clients are topologically close to their LDNS as viewed from a randomly chosen probe site; (4) round-trip time (RTT) correlation (some CDNs select servers based on the RTT between the CDN server and the client's LDNS) examines the correlation between the message RTTs from a probe point to the client and to its local DNS; the results imply this is a reasonable metric to use to avoid really distant servers.
A study of the impact that client-LDNS associations have on DNS-based server selection concludes that knowing the client's IP address would allow more accurate server selection in a large number of cases. The optimality of the server selection also depends on server density, placement, and selection algorithms.
Further information can be found at http://www.eecs.berkeley.edu/~zmao/myresearch.html or by contacting zmao@eecs.berkeley.edu.
GEOGRAPHIC PROPERTIES OF INTERNET
ROUTING
Lakshminarayanan Subramanian, Venkata N. Padmanabhan, Microsoft Research; Randy H. Katz, University of California at Berkeley
Geographic information can provide insights into the structure and functioning of the Internet, including interactions between different autonomous systems, by analyzing certain properties of Internet routing. It can be used to measure and quantify routing properties such as circuitous routing, hot-potato routing, and geographic fault tolerance.
Traceroute was used to gather the required data, and the Geotrack tool was used to determine the location of the nodes along each network path. This enables the computation of "linearized distance," the sum of the geographic distances between successive pairs of routers along the path.
To measure the circuitousness of a path, a metric called the "distance ratio" is defined as the ratio of the linearized distance of a path to the geographic distance between the source and destination of the path. From the data, it was observed that the circuitousness of a route depends on both the geographic and network locations of the end hosts. A large value of the distance ratio flags paths that are highly circuitous, possibly (though not necessarily) because of routing anomalies. It is also shown that the minimum delay between end hosts is strongly correlated with the linearized distance of the path.
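The two metrics are simple to state concretely. A minimal sketch follows (the paper's Geotrack tool is not reproduced; router coordinates are assumed to be given, and the city coordinates in the example are illustrative):

```python
from math import radians, sin, cos, asin, sqrt

def geo_distance_km(a, b):
    """Great-circle distance between two (lat, lon) points in km (haversine)."""
    lat1, lon1, lat2, lon2 = map(radians, (*a, *b))
    h = sin((lat2 - lat1) / 2) ** 2 + \
        cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371 * asin(sqrt(h))

def linearized_distance_km(path):
    """Sum of geographic distances between successive routers on the path."""
    return sum(geo_distance_km(p, q) for p, q in zip(path, path[1:]))

def distance_ratio(path):
    """Linearized distance over end-to-end geographic distance (always >= 1)."""
    return linearized_distance_km(path) / geo_distance_km(path[0], path[-1])

# A San Francisco -> New York path routed via Seattle is flagged as
# circuitous by its ratio being well above 1.
via_seattle = [(37.77, -122.42), (47.61, -122.33), (40.71, -74.01)]
print(round(distance_ratio(via_seattle), 2))
```

A direct two-hop path has a ratio of exactly 1; thresholds on this ratio are what let the authors flag anomalously routed paths.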
Geographic information can be used to study various aspects of wide-area Internet paths that traverse multiple ISPs. It was found that end-to-end Internet paths tend to be more circuitous than intra-ISP paths, the cause being the peering relationships between ISPs. Also, paths traversing substantial distances within two or more ISPs tend to be more circuitous than paths largely traversing only a single ISP. Another finding is that ISPs generally employ hot-potato routing.
Geographic information can also be used to capture the fact that two seemingly unrelated routers can be susceptible to correlated failures. Using the geographic information of routers, one can construct a geographic topology of an ISP and, from it, determine the tolerance of the ISP's network to total network failure in a geographic region.
For further information, contact lakme@cs.berkeley.edu.
PROVIDING PROCESS ORIGIN INFORMATION
TO AID IN NETWORK TRACEBACK
Florian P. Buchholz, Purdue University; Clay Shields, Georgetown University
Network traceback is currently limited because host audit systems do not maintain enough information to match incoming network traffic to outgoing network traffic. The talk presented an alternative: assigning origin information to every process and logging it during interactive login creation.
The current implementation concentrates mainly on interactive sessions, in which an event is logged when a new connection is established using SSH or Telnet. The method is effective: it could successfully determine stepping stones and reliably detect the source of a DDoS attack. The speaker then talked about the related work done in this area,
which led to a discussion of the latest developments in the area of "host causality."
The speaker ended with the limitations and future directions of the framework, which could be extended to many applications in the area of security. The current implementation doesn't handle startup scripts and cron jobs, but incorporating the origin information into the file system could solve this problem. Logging is currently implemented only as a proof of concept; making it safe could be another important aspect of future work.
PROGRAMMING
Summarized by Amit Purohit
CYCLONE: A SAFE DIALECT OF C
Trevor Jim, AT&T Labs – Research; Greg Morrisett, Dan Grossman, Michael Hicks, James Cheney, and Yanling Wang, Cornell University
Cyclone is designed to provide protection against attacks such as buffer overflows, format string attacks, and memory management errors. The current structure of C allows programmers to write vulnerable programs. Cyclone extends C so that it has the safety guarantees of Java while keeping C syntax, types, and semantics untouched.
The Cyclone compiler performs static analysis of the source code and inserts runtime checks into the compiled output at places where the analysis cannot determine that code execution will not violate safety constraints. Cyclone also imposes many restrictions to preserve safety, such as NULL checks that do not exist in normal C.
The speaker then discussed sample code written in Cyclone and how it tackles safety problems. Porting an existing C application to Cyclone is fairly easy, with fewer than 10% of lines requiring change. The current implementation concentrates more on safety than on performance, so the performance penalty is substantial. Cyclone was able to find many lingering bugs. The final step of the project will be to write a compiler to convert a normal C program to an equivalent Cyclone program.
COOPERATIVE TASK MANAGEMENT WITHOUT
MANUAL STACK MANAGEMENT
Atul Adya, Jon Howell, Marvin Theimer, William J. Bolosky, and John R. Douceur, Microsoft Research
The speaker described the definitions and motivations behind the distinct concepts of task management, stack management, I/O management, conflict management, and data partitioning. Conventional concurrent programming uses preemptive task management and exploits the automatic stack management of a standard language. In the second approach, cooperative tasks are organized as event handlers that yield control by returning control to the event scheduler, manually unrolling their stacks. In this project, the authors adopted a hybrid approach that makes it possible for both stack management styles to coexist: programmers can code assuming one or the other style, depending upon the application. The speaker also gave a detailed example of how to use their mechanism to switch between the two styles.
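The cooperative side of this trade-off can be illustrated in miniature (this is my own sketch, not Microsoft's mechanism) with Python generators: each generator-based task keeps its stack automatically across yields, while an event scheduler retains cooperative control of when tasks run.

```python
from collections import deque

def scheduler(tasks):
    """Cooperative round-robin: each task runs until it yields control."""
    ready = deque(tasks)
    trace = []
    while ready:
        task = ready.popleft()
        try:
            trace.append(next(task))   # resume task until its next yield
            ready.append(task)         # still alive: reschedule it
        except StopIteration:
            pass                       # task finished; drop it
    return trace

def worker(name, steps):
    # Automatic stack management: locals survive across yields, so the
    # task never has to manually unroll its stack between events.
    for i in range(steps):
        yield f"{name}:{i}"

print(scheduler([worker("a", 2), worker("b", 3)]))
# round-robin interleaving: ['a:0', 'b:0', 'a:1', 'b:1', 'b:2']
```

A hand-written event-handler version of `worker` would have to store `i` in an explicit continuation object between events, which is exactly the "manual stack management" burden the paper's hybrid design lets programmers avoid.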
The talk then continued with the implementation. The authors were able to preempt many subtle concurrency problems by using cooperative task management; paying a cost up-front to reduce a subtle race condition proved to be a good investment.
Though the choice of task management is fundamental, the choice of stack management can be left to individual taste. This project enables the use of any type of stack management in conjunction with cooperative task management.
IMPROVING WAIT-FREE ALGORITHMS FOR
INTERPROCESS COMMUNICATION IN
EMBEDDED REAL-TIME SYSTEMS
Hai Huang, Padmanabhan Pillai, and Kang G. Shin, University of Michigan
The main characteristic of a real-time, time-sensitive system is its predictable response time, but concurrency management creates hurdles to achieving this goal because of the use of locks to maintain consistency. To solve this problem, many wait-free algorithms have been developed, but these are typically high-cost solutions. By taking advantage of the temporal characteristics of the system, however, the time and space overhead can be reduced.
The speaker presented an algorithm for temporal concurrency control and applied this technique to improve three wait-free algorithms. A single-writer multiple-reader wait-free algorithm and a double-buffer algorithm were proposed.
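The double-buffer idea can be sketched as follows (a sketch of the general scheme only, not the paper's improved algorithms): the single writer fills the inactive slot and then publishes it with one atomic index flip, so neither writer nor readers ever block on a lock.

```python
class DoubleBuffer:
    """Single-writer, multiple-reader exchange without locks: the writer
    fills the inactive slot, then flips `active` in one atomic store."""

    def __init__(self, initial):
        self._slots = [initial, None]
        self._active = 0               # index readers are directed to

    def write(self, value):            # called only by the single writer
        inactive = 1 - self._active
        self._slots[inactive] = value  # build in the slot readers aren't using
        self._active = inactive        # publish: a single atomic flip

    def read(self):                    # any number of readers, never blocks
        return self._slots[self._active]

buf = DoubleBuffer({"t": 0})
buf.write({"t": 1})
assert buf.read() == {"t": 1}
```

With only two slots, a very slow reader could still observe a slot being rewritten by a later write; bounding that window using the tasks' known timing properties is precisely what the paper's temporal analysis contributes.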
Using their transformation mechanism, the authors achieved an improvement of 17–66% in ACET and a 14–70% reduction in memory requirements for IPC algorithms. This mechanism is extensible and can be applied to other non-blocking IPC algorithms as well. Future work involves reducing the synchronization overheads in more general IPC algorithms with multiple-writer semantics.
MOBILITY
Summarized by Praveen Yalagandula
ROBUST POSITIONING ALGORITHMS FOR
DISTRIBUTED AD-HOC WIRELESS SENSOR
NETWORKS
Chris Savarese and Jan Rabaey, University of California, Berkeley; Koen Langendoen, Delft University of Technology
The Pico Radio Network comprises more than a hundred sensors, monitors, and actuators equipped with wireless connectivity, with the following properties: (1) no infrastructure; (2) computation in a distributed fashion instead of centralized computation; (3) dynamic topology; (4) limited radio range; (5) nodes that can act as repeaters; (6) devices of low cost and with low power; and (7) sparse anchor nodes – nodes with GPS capability. A positioning algorithm determines the geographical position of each node in the network.
In two-dimensional space, each node needs three reference positions to estimate its geographical position. Two problems make positioning difficult in the Pico Radio Network setting: (1) a sparse anchor node problem and (2) a range error problem.
The authors take a two-phase approach to solving the problem: (1) a hop-terrain algorithm – in the first phase, each node roughly guesses its location from distances calculated using multi-hop counts to the anchor points; and (2) a refinement algorithm – in the second phase, each node uses its neighbors' positions to refine its own position estimate. To guarantee convergence, this approach uses confidence metrics, assigning a value of 1.0 to anchor nodes and 0.1 initially to other nodes, and increasing these values with each iteration.
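The refinement phase can be sketched as a small least-squares fit (my own illustration, omitting the confidence weighting and multi-hop first phase): a node nudges its estimate until it is consistent with the measured ranges to its reference nodes.

```python
from math import dist

def refine(position, references, step=0.05, iters=1000):
    """One node's refinement phase (sketch): gradient descent on squared
    range errors moves the position estimate toward consistency with the
    measured distances to reference nodes (anchors or located neighbors).
    references: list of ((x, y), measured_distance) pairs."""
    x, y = position
    for _ in range(iters):
        gx = gy = 0.0
        for (rx, ry), d in references:
            est = dist((x, y), (rx, ry)) or 1e-9  # avoid divide-by-zero
            err = est - d                         # >0: estimate is too far out
            gx += err * (x - rx) / est            # gradient of squared error
            gy += err * (y - ry) / est
        x -= step * gx
        y -= step * gy
    return x, y

# A node truly at (2, 3), three anchors with exact range measurements,
# and a deliberately bad initial guess at the origin.
true_pos = (2.0, 3.0)
anchors = [(p, dist(true_pos, p))
           for p in [(0.0, 0.0), (5.0, 0.0), (0.0, 5.0)]]
x, y = refine((0.0, 0.0), anchors)   # converges near (2, 3)
```

With noisy ranges (the paper's range error problem), the same iteration settles on a least-squares compromise rather than the exact position, which is why the confidence metrics matter for convergence in the full algorithm.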
The OMNeT++ simulation tool was used for both phases in various scenarios. The results show position errors of less than 33% in a scenario with 5% anchor nodes, an average connectivity of 7, and 5% range measurement error.
Guidelines for anchor node deployment are high connectivity (>10), a reasonable fraction of anchor nodes (>5%), and, for anchor placement, covered edges.
APPLICATION-SPECIFIC NETWORK
MANAGEMENT FOR ENERGY-AWARE
STREAMING OF POPULAR MULTIMEDIA
FORMATS
Surendar Chandra, University of Georgia; Amin Vahdat, Duke University
The main hindrance to supporting the increasing demand for mobile multimedia on PDAs is the battery capacity of these small devices. Idle-time power consumption is almost the same as receive power consumption on typical wireless network cards (1319mW vs. 1425mW), while the sleep state consumes far less power (177mW). To let the system consume energy proportional to the stream quality, the network card should transition to the sleep state aggressively between packets.
The speaker presented previous work in which different multimedia streams for PDAs were studied: MS Media, Real Media, and QuickTime. It was found that if the inter-packet gap is predictable, huge savings are possible – for example, about 80% savings in MS Media streams with few packet losses. The limitation of the IEEE 802.11b power-save mode comes into play when there are two or more nodes, producing a delay between beacon time and when the node receives or sends packets. This badly affects the streams, since higher energy is consumed while waiting.
The authors propose traffic shaping for energy conservation, performed by proxy so that packets arrive at regular intervals. This is achieved using a local proxy in the access point and a client-side proxy in the mobile host. Simulations show that traffic shaping reduces energy consumption and also reveal that higher-bandwidth streams have a lower energy metric (mJ/kB).
More information is at http://greenhouse.cs.uga.edu.
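The energy arithmetic behind these savings is straightforward. The power numbers below are the ones quoted in the talk; the packet timing (2ms of receive every 100ms) is a hypothetical stream I chose for illustration, and state-transition costs are ignored.

```python
# Power draws reported in the talk for a typical wireless NIC (mW).
P_IDLE, P_RECV, P_SLEEP = 1319.0, 1425.0, 177.0

def stream_energy_mj(n_packets, recv_ms, gap_ms, sleep_in_gaps):
    """Energy (mJ) to receive n_packets of recv_ms each, gap_ms apart.
    Without shaping the card idles through every gap; with predictable,
    proxy-shaped arrivals it can sleep through the gaps instead."""
    gap_power = P_SLEEP if sleep_in_gaps else P_IDLE
    return n_packets * (recv_ms * P_RECV + gap_ms * gap_power) / 1000.0

# A 10-packets-per-second stream: 2ms of receive, then a 98ms gap.
naive = stream_energy_mj(1000, 2, 98, sleep_in_gaps=False)
shaped = stream_energy_mj(1000, 2, 98, sleep_in_gaps=True)
print(f"saving: {1 - shaped / naive:.0%}")   # → saving: 85%
```

Because idle power is nearly as high as receive power, almost all the savings come from converting idle gap time into sleep time, consistent with the roughly 80% savings reported for predictable MS Media streams.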
CHARACTERIZING ALERT AND BROWSE SERVICES OF MOBILE CLIENTS
Atul Adya, Paramvir Bahl, and Lili Qiu, Microsoft Research
Even though there has been a dramatic increase in Internet access from wireless devices, few studies have characterized this traffic. In this paper, the authors characterize the traffic observed on the MSN Web site using both notification and browse traces.
Around 3.3 million browsing accesses and about 325 million notification entries are present in the traces. Three types of analyses were done for each of the two services: content analysis, concerning the most popular content categories and their distribution; popularity analysis, the popularity distribution of documents; and user behavior analysis.
The analysis of the notification logs shows that document access rates follow a Zipf-like distribution, with most accesses concentrated on a small number of messages, and that accesses exhibit geographical locality – users from the same locality tend to receive similar notification content. The browse log analysis shows that a smaller set of URLs is accessed most of the time, though the access pattern does not fit any Zipf curve, and the highly accessed URLs remain stable. A correlation study between the notification and browsing services shows that wireless users have a moderate correlation of 0.12.
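A Zipf-like fit of this kind is commonly checked by regressing log frequency against log rank; a sketch (my illustration, using synthetic counts rather than the MSN traces):

```python
import math

def zipf_slope(counts):
    """Least-squares slope of log(frequency) vs. log(rank). A Zipf-like
    distribution appears as a roughly straight line with slope -alpha."""
    counts = sorted(counts, reverse=True)
    xs = [math.log(r) for r in range(1, len(counts) + 1)]
    ys = [math.log(c) for c in counts]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
           sum((x - mx) ** 2 for x in xs)

# Synthetic Zipf(alpha = 1) accesses: frequency proportional to 1/rank.
counts = [round(10000 / r) for r in range(1, 101)]
alpha = -zipf_slope(counts)   # recovers alpha close to 1
```

The browse logs in the paper are an example of data that fails this check: the log-log plot bends away from a straight line even though a small set of URLs still dominates.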
FREENIX TRACK SESSIONS
BUILDING APPLICATIONS
Summarized by Matt Butner
INTERACTIVE 3-D GRAPHICS APPLICATIONS
FOR TCL
Oliver Kersting and Jürgen Döllner, Hasso Plattner Institute for Software Systems Engineering, University of Potsdam
The integration of 3-D image-rendering functionality into a scripting language permits interactive and animated 3-D development and application without the formalities and precision demanded by low-level C/C++ graphics and visualization libraries. The large and complex C++ API of the Virtual Rendering System (VRS) can be combined with the conveniences of the Tcl scripting language. The mapping of class interfaces is done via an automated process that generates the respective wrapper classes, ensuring complete API accessibility and functionality without any significant performance penalty. The general C++-to-Tcl mapper SWIG grants the necessary object and hierarchical abilities without the use of object-oriented Tcl extensions. The final API mapping techniques address C++ features such as classes, overloaded methods, enumerations, and inheritance relations. All are implemented in a proof of concept that maps the complete C++ API of the VRS to Tcl and is showcased in a complete interactive 3-D map system.
THE AGFL GRAMMAR WORK LAB
Cornelis H.A. Koster and Erik Verbruggen, University of Nijmegen (KUN)
The growth and implementation of Natural Language Processing (NLP) is the cornerstone of the continued evolution of truly intelligent search machines and services. In part due to the growing collections of computer-stored, human-readable documents in the public domain, the implementation of linguistic analysis will become necessary for desirable precision and resolution. Such tools and linguistic resources must consequently be released into the public domain, and the authors have done so with the release of the AGFL Grammar Work Lab under the GNU Public License, as a tool for linguistic research and the development of NLP-based applications.
The AGFL (Affix Grammars over a Finite Lattice) Grammar Work Lab meshes context-free grammars with finite set-valued features, applicable to a range of languages. In computer science terms, "syntax rules are procedures with parameters and a non-deterministic execution." The English Phrases for Information Retrieval (EP4IR) grammar, released with the AGFL-GWL as a robust grammar of English, yields an AGFL-GWL-generated English parser that outputs "Head/Modifier" frames: the sentences "CompanyX sponsored this conference" and "This conference was sponsored by CompanyX" both generate [CompanyX [sponsored conference]]. The hope is that public availability of such tools will encourage further development of grammar and lexical software systems.
SWILL: A SIMPLE EMBEDDED WEB SERVER LIBRARY
Sotiria Lampoudi and David M. Beazley, University of Chicago
This paper won the FREENIX Best Student Paper award.
SWILL (Simple Web Interface Link Library) is a simple Web server in the form of a C library, whose development was motivated by a wish to give cool applications an interface to the Web. The SWILL library provides a simple interface that can be efficiently implemented for tasks that vary from flexible Web-based monitoring to software debugging and diagnostics. Though originally designed to be integrated with high-performance scientific simulation software, the interface is generic enough to allow for unbounded uses. SWILL is a single-threaded server relying upon non-blocking I/O through the creation of a temporary server that services I/O requests. SWILL does not provide SSL or cryptographic authentication but does have HTTP authentication abilities.
A fantastic feature of SWILL is its support for SPMD-style parallel applications that utilize MPI, proving valuable for Beowulf clusters and large parallel machines. Another practical application was the use of SWILL in a modified Yalnix emulator by University of Chicago operating systems courses, which utilized the added Yalnix functionality for OS development and debugging. SWILL requires minimal memory overhead and relies upon the HTTP/1.0 protocol.
NETWORK PERFORMANCE
Summarized by Florian Buchholz
LINUX NFS CLIENT WRITE PERFORMANCE
Chuck Lever, Network Appliance; Peter Honeyman, CITI, University of Michigan
Lever introduced a benchmark to measure NFS client write performance. Client performance is difficult to measure due to hindrances such as poor hardware or bandwidth limitations; furthermore, measuring application performance does not identify weaknesses specifically at the client side. Thus a benchmark was developed that tries to exercise only data transfers in one direction, between server and application. For this purpose, the benchmark was based on the block-sequential-write portion of the Bonnie file system benchmark. Once a benchmark for NFS clients is established, it can be used to improve client performance.
The performance measurements were performed with an SMP Linux client and both a Linux NFS server and a Network Appliance F85 filer. During testing, a periodic jump in write latency was discovered. This was due to a rather large number of pending write operations that were scheduled to be written after certain threshold values were exceeded. By introducing a separate daemon that flushes the cached write requests, the spikes could be eliminated, but as a result the average latency grew over time. The problem was traced to a function that scans a linked list of write requests; after a hashtable was added to improve lookup performance, the latency improved considerably.
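The data-structure change at the heart of that fix, replacing a linear scan of pending requests with a keyed lookup, can be sketched as follows (an illustration of the idea only, not the Linux NFS client code; the class and field names are hypothetical):

```python
class WritebackQueue:
    """Pending write requests. A dict keyed by file offset replaces a
    linear scan of a linked list, making each lookup O(1) instead of O(n)."""

    def __init__(self):
        self._by_offset = {}              # offset -> pending data

    def add(self, offset, data):
        self._by_offset[offset] = data

    def find(self, offset):               # was: walk a linked list, O(n)
        return self._by_offset.get(offset)

    def flush(self):
        """Drain all pending requests, returned in offset order."""
        done, self._by_offset = self._by_offset, {}
        return sorted(done.items())

q = WritebackQueue()
q.add(4096, b"a")
q.add(0, b"b")
assert q.find(4096) == b"a"
assert q.flush() == [(0, b"b"), (4096, b"a")]
```

With thousands of pending writes queued before the flush threshold, the per-request scan cost is what produced the growing average latency the benchmark exposed.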
The improved client was then used to measure throughput against the two servers. A discrepancy between the Linux server and the filer tests was noticed, and the reason was traced back to a global kernel lock that was unnecessarily held when accessing the network stack. After correcting this, performance improved further. However, the measurements showed that a client may run slower when paired with fast servers on fast networks; this is due to heavy client interrupt loads, more network processing on the client side, and more global kernel lock contention.
The source code of the project is available at http://www.citi.umich.edu/projects/nfs-perf/patches/.
A STUDY OF THE RELATIVE COSTS OF
NETWORK SECURITY PROTOCOLS
Stefan Miltchev and Sotiris Ioannidis, University of Pennsylvania; Angelos Keromytis, Columbia University
With the increasing need for security and integrity of remote network services, it becomes important to quantify the communication overhead of IPSec and compare it to alternatives such as SSH, SCP, and HTTPS.
For this purpose, the authors set up three testing networks: a direct link, two hosts separated by two gateways, and three hosts connecting through one gateway. Protocols were compared in each setup, and manual keying was used to eliminate connection setup costs. For IPSec, different encryption algorithms were used: AES, DES, 3DES, hardware DES, and hardware 3DES. In detail, FTP was compared to SFTP, SCP, and FTP over IPSec; HTTP to HTTPS and HTTP over IPSec; and NFS to NFS over IPSec and local disk performance.
The result of the measurements was that IPSec outperforms the other popular encryption schemes. Overall, unencrypted communication was fastest, but in some cases, like FTP, the overhead can be small. The use of crypto hardware can significantly improve performance. For future work, the inclusion of setup costs and of hardware-accelerated SSL, SFTP, and SSH was mentioned.
CONGESTION CONTROL IN LINUX TCP
Pasi Sarolahti, University of Helsinki; Alexey Kuznetsov, Institute for Nuclear Research at Moscow
Having attended the talk and read the paper, I am still unclear about whether the authors are merely describing the design decisions of TCP congestion control or whether they are actually the creators of that part of the Linux code; my guess leans toward the former.
In the talk, the speaker compared the TCP congestion control measures specified by the IETF RFCs with the actual Linux implementation, which conforms to the basic principles but nevertheless has differences. A specific emphasis was placed on retransmission mechanisms and the congestion window. Several TCP enhancements – the NewReno algorithm, Selective ACKs (SACK), and Forward ACKs (FACK) – were also discussed and compared.
In some instances, Linux does not conform to the IETF specifications. Fast recovery does not fully follow RFC 2582, since the threshold for triggering retransmit is adjusted dynamically and the congestion window's size is not changed. Also, the round-trip-time estimator and the RTO calculation differ from RFC 2988, using more conservative RTT estimates and a minimum RTO of 200ms. The performance measurements showed that with the additional Linux-specific features enabled, slightly higher throughput, more steady data flow, and fewer unnecessary retransmissions can be achieved.
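The RTO difference is easy to make concrete. This sketch follows the RFC 2988 estimator (SRTT/RTTVAR exponentially weighted moving averages, RTO = SRTT + 4·RTTVAR) with Linux's 200ms floor swapped in for the RFC's 1-second minimum; it is an illustration, not the kernel's fixed-point code.

```python
def make_rto_estimator(min_rto=0.2):
    """RFC 2988-style retransmission timeout estimator (sketch).
    Linux clamps the RTO at a 200ms floor rather than the RFC's
    1 second; pass min_rto=1.0 for the RFC behaviour."""
    srtt = rttvar = None

    def update(rtt):
        nonlocal srtt, rttvar
        if srtt is None:                  # first measurement
            srtt, rttvar = rtt, rtt / 2
        else:                             # EWMA updates from RFC 2988
            rttvar = 0.75 * rttvar + 0.25 * abs(srtt - rtt)
            srtt = 0.875 * srtt + 0.125 * rtt
        return max(min_rto, srtt + 4 * rttvar)

    return update

rto = make_rto_estimator()
first = rto(0.1)      # SRTT=0.1, RTTVAR=0.05, so RTO = 0.1 + 4*0.05 = 0.3
for _ in range(50):
    floor = rto(0.01)  # on a fast LAN the 200ms floor quickly dominates
```

On low-RTT paths the floor, not the estimator, sets the timeout, which is one reason the Linux and RFC variants retransmit differently in practice.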
XTREME XCITEMENT
Summarized by Steve Bauer
THE FUTURE IS COMING: WHERE THE X WINDOW SHOULD GO
Jim Gettys, Compaq Computer Corp.
Jim Gettys, one of the principal authors of the X Window System, outlined the near-term objectives for the system, primarily focusing on the changes and infrastructure required to enable replication and migration of X applications. Providing better support for this functionality would enable users to retrieve or duplicate X applications between their servers at home and work.
One interesting example of an application that is currently capable of migration and replication is Emacs. To create a new frame on DISPLAY, try "M-x make-frame-on-display <Return> DISPLAY <Return>".
However, technical challenges make replication and migration difficult in general. These include the "major headaches" of server-side fonts, the nonuniformity of X servers and screen sizes, and the need to appropriately retrofit toolkits. Authentication and authorization issues are obviously also important. The rest of the talk delved into some of the details of these interesting technical challenges.
HACKING IN THE KERNEL
Summarized by Hai Huang
AN IMPLEMENTATION OF SCHEDULER
ACTIVATIONS ON THE NETBSD OPERATING
SYSTEM
Nathan J. Williams, Wasabi Systems
Scheduler activation is an old idea. Basically, there are benefits and drawbacks to using solely kernel-level or user-level threading; scheduler activations combine the two layers of control to provide more concurrency in the system.
In his talk, Nathan gave a fairly detailed description of the implementation of scheduler activations in the NetBSD kernel. One important change is differentiating the thread context from the process context, done by defining a separate data structure for threads and relocating some of the information that was embedded in the process context into these thread contexts. The stack was a particular concern because of upcalls: special handling must ensure that an upcall doesn't disturb the stack, so that the preempted user-level thread can continue afterwards. Lastly, Nathan explained that signals are handled by upcalls.
AUTHORIZATION AND CHARGING IN PUBLIC
WLANS USING FREEBSD AND 802.1X
Pekka Nikander, Ericsson Research NomadicLab
The 802.1x standards are well known in the wireless community as link-layer authentication protocols. In this talk, Pekka explained some novel ways of using the 802.1x protocols that might be of interest to people on the move: it is possible to set up a public WLAN that supports various charging schemes via virtual tokens, which people can purchase or earn and later use.
This is implemented on FreeBSD using the netgraph utility. It is basically a filter in the link layer that differentiates traffic based on the MAC address of the client node, which is either authenticated, denied, or let through. The overhead for this service is fairly minimal.
ACPI IMPLEMENTATION ON FREEBSD
Takanori Watanabe, Kobe University
ACPI (Advanced Configuration and Power Interface) was proposed as a joint effort by Intel, Toshiba, and Microsoft to provide a standard, finer-grained method of managing the power states of individual devices within a system. Such low-level power management is especially important for mobile and embedded systems powered by fixed-capacity batteries.
Takanori explained that the ACPI specification is composed of three parts: tables, BIOS, and registers. He was able to implement some ACPI functionality in the FreeBSD kernel. The ACPI Component Architecture, implemented by Intel, provides a high-level ACPI API to the operating system; Takanori's ACPI implementation is built upon this underlying layer of APIs.
ACCESS CONTROL
Summarized by Florian Buchholz
DESIGN AND PERFORMANCE OF THE
OPENBSD STATEFUL PACKET FILTER (PF)
Daniel Hartmeier, Systor AG
Daniel Hartmeier described the new stateful packet filter (pf) that replaced IPFilter in the OpenBSD 3.0 release. IPFilter could no longer be included due to licensing issues, and thus there was a need to write a new filter, making use of optimized data structures.
The filter rules are implemented as a linked list, which is traversed from start to end. Two actions may be taken according to the rules: "pass" forwards the packet, and "block" drops it. Where more than one rule matches a packet, the last rule wins; rules that are marked as "final" immediately terminate any further rule evaluation. An optimization called "skip-steps" was also implemented, in which blocks of similar rules are skipped if they cannot match the current packet; these skip-steps are calculated when the rule set is loaded. Furthermore, a state table keeps track of TCP connections, and only packets that match the expected sequence numbers are allowed. UDP and ICMP queries and replies are also considered in the state table, where initial packets create an entry for a pseudo-connection with a low initial timeout value. The state table is implemented using a balanced binary search tree. NAT mappings are also stored in the state table, whereas application proxies reside in user space. The packet filter is also able to perform fragment reassembly and to modulate TCP sequence numbers to protect hosts behind the firewall.
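The last-match-wins and "final" semantics can be sketched in a few lines (an illustration of the semantics only; real pf adds skip-steps, many more match fields, and the state table):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Rule:
    action: str                  # "pass" or "block"
    proto: Optional[str] = None  # None matches any protocol
    port: Optional[int] = None   # None matches any port
    final: bool = False          # stop evaluating further rules on match

def evaluate(rules, proto, port):
    """Traverse the rule list start to end; the last matching rule wins,
    unless a matching rule is marked final, which ends evaluation early."""
    winner = "pass"              # default when nothing matches
    for r in rules:
        if r.proto not in (None, proto):
            continue
        if r.port not in (None, port):
            continue
        winner = r.action
        if r.final:
            break
    return winner

rules = [
    Rule("block"),                       # default deny...
    Rule("pass", proto="tcp", port=80),  # ...but a later rule wins: allow http
]
assert evaluate(rules, "tcp", 80) == "pass"
assert evaluate(rules, "udp", 53) == "block"
```

The skip-steps optimization precomputes, for each rule, how far ahead evaluation can jump when a field cannot match, so runs of similar rules cost one comparison instead of one per rule.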
Pf was compared against IPFilter as well as Linux's Iptables by measuring throughput and latency with increasing traffic rates and different packet sizes. In a test with a fixed rule set size of 100, Iptables outperformed the other two filters, whose results were close to each other. In a second test, where the rule set size was continually increased, Iptables consistently had about twice the throughput of the other two (which evaluate the rule set on both the incoming and outgoing interfaces). A third test compared only pf and IPFilter, using a single rule that created state in the state table with a fixed state-entry size; pf reached an overloaded state much later than IPFilter. The experiment was repeated with a variable state-entry size, and pf performed much better than IPFilter for a small number of states.
In general, rule set evaluation is expensive, and benchmarks only reflect extreme cases, whereas in real life other behavior should be observed. Furthermore, the benchmarks show that stateful filtering can actually improve performance, since state-table lookups are cheap compared to rule evaluation.
ENHANCING NFS CROSS-ADMINISTRATIVE
DOMAIN ACCESS
Joseph Spadavecchia and Erez Zadok, Stony Brook University
The speaker presented modifications to an NFS server that allow improved NFS access between administrative domains. The problem lies in the fact that NFS assumes a shared UID/GID space, which makes it unsafe to export files outside the administrative domain of the server. The design goals were to leave the protocol and existing clients unchanged, require a minimum amount of server changes, and provide flexibility and increased performance.
To solve the problem, two techniques are utilized: "range-mapping" and "file-cloaking." Range-mapping maps IDs between client and server. The mapping is performed on a per-export basis and has to be manually set up in an export file; the mappings can be 1-1, N-N, or N-1. In file-cloaking, the server restricts file access based on UID/GID and special cloaking-mask bits, so users can only access files their own permissions allow. The policy on whether or not a file is visible to others is set by the cloaking mask, which is logically ANDed with the file's protection bits. File-cloaking only works, however, if the client doesn't hold cached copies of directory contents and file attributes; because of this, clients are forced to re-read directories by incrementing the mtime value of a directory each time it is listed.
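The two techniques can be sketched as follows. This is my own illustration, not the server's code or export-file syntax: the `(client_start, server_start, count)` range format and the helper names are hypothetical.

```python
def range_map(uid, mappings):
    """Map a client UID into the server's ID space. mappings: list of
    (client_start, server_start, count) ranges, as set up per export."""
    for cstart, sstart, count in mappings:
        if cstart <= uid < cstart + count:
            return sstart + (uid - cstart)
    return uid                       # unmapped IDs pass through unchanged

def visible(file_uid, file_mode, client_uid, cloak_mask):
    """File-cloaking check (sketch): non-owners see the file's protection
    bits ANDed with the cloaking mask; an all-zero result hides the file."""
    if file_uid == client_uid:
        return True                  # owners always see their own files
    return (file_mode & cloak_mask) != 0

# Client UIDs 1000-1999 map onto server UIDs 5000-5999 (hypothetical export).
assert range_map(1042, [(1000, 5000, 1000)]) == 5042
assert visible(5042, 0o644, 5042, 0o077)       # owner sees the file
assert not visible(5042, 0o600, 6000, 0o077)   # mode 0600 is cloaked from others
```

A 1-1 mapping is a range of count 1; N-1 mappings collapse a whole client range onto a single server ID, which is how foreign users can be squashed to an unprivileged account.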
To test the performance of the modified server, five different NFS configurations were evaluated: an unmodified NFS server; one with the modified code included but not used; one with only range-mapping enabled; one with only file-cloaking enabled; and one with all modifications enabled. For each setup, different file system benchmarks were run. The results show only a small overhead when the modifications are used, generally an increase of below 5%. Another experiment tested the performance of the system with an increasing number of mapped or cloaked entries; an increase from 10 to 1000 entries resulted in a maximum performance cost of about 14%.
One member of the audience pointedout that if clients choose to ignore thechanged mtimes from the server andthus still hold caches of the directoryentries the file-cloaking mechanismcould be defeated After a rather lengthydebate the speaker had to concede thatthe model doesnrsquot add any extra securityAnother question was asked about scala-bility of the setup of range mappingThe speaker referred to application-leveltools that could be developed for thatpurpose
The software is available at ftp://ftp.fsl.cs.sunysb.edu/pub/enf.
ENGINEERING OPEN SOURCE SOFTWARE
Summarized by Teri Lampoudi
NINGAUI: A LINUX CLUSTER FOR BUSINESS
Andrew Hume, AT&T Labs – Research; Scott Daniels, Electronic Data Systems Corp.
Ningaui is a general-purpose, highly available, resilient architecture built from commodity software and hardware. Emphatically, however, Ningaui is not a Beowulf. Hume calls the cluster design the "Swiss canton model," in which there are a number of loosely affiliated, independent nodes with data replicated among them. Jobs are assigned by bidding and leases, and cluster services done as session-based servers are sited via generic job assignment. The emphasis is on keeping the architecture end-to-end, checking all work via checksums, and logging everything. The resilience and high availability required by their goal of 8-to-5 maintenance – vs. the typical 24/7 model, where people get paged whenever the slightest thing goes wrong, regardless of the time of day – is achieved by job restartability. Finally, all computation is performed on local data, without the use of NFS or network-attached storage.

Hume's message is a hopeful one: despite the many problems encountered – things like kernel and service limits, auto-installation problems, TCP storms, and the like – the existence of source code and the paranoid practice of logging and checksumming everything has helped. The final product performs reasonably well, and it appears that the resilient techniques employed do make a difference. One drawback, however, is that the software mentioned in the paper is not yet available for download.
CPCMS: A CONFIGURATION MANAGEMENT SYSTEM BASED ON CRYPTOGRAPHIC NAMES
Jonathan S. Shapiro and John Vanderburgh, Johns Hopkins University
This paper won the FREENIX Best Paper award.

The basic notion behind the project is the fact that everyone has a pet complaint about CVS, and yet it is currently the configuration manager in most widespread use. Shapiro has unleashed an alternative. Interestingly, he did not begin but ended with the usual host of reasons why CVS is bad. The talk instead began abruptly by characterizing the job
of a software configuration manager, continued by stating the namespaces which it must handle, and wrapped up with the challenges faced.
X MEETS Z: VERIFYING CORRECTNESS IN THE PRESENCE OF POSIX THREADS
Bart Massey, Portland State University; Robert T. Bauer, Rational Software Corp.
Massey delivered a humorous talk on the insight gained from applying Z formal specification notation to system software design, rather than the more informal analysis and design process normally used.

The story is told with respect to writing XCB, which replaces the Xlib protocol layer and is supposed to be thread-friendly. But where threads are concerned, deadlock avoidance becomes a hard problem that cannot be solved in an ad-hoc manner. But full model checking is also too hard. In this case, Massey resorted to Z specification to model the XCB lower layer, abstract away locking and data transfers, and locate fundamental issues. Essentially, the difficulties of searching the literature and locating information relevant to the problem at hand were overcome. As Massey put it, "the formal method saved the day."
FILE SYSTEMS
Summarized by Bosko Milekic
PLANNED EXTENSIONS TO THE LINUX EXT2/EXT3 FILESYSTEM
Theodore Y. Ts'o, IBM; Stephen Tweedie, Red Hat
The speaker presented improvements to the Linux Ext2 file system, with the goal of allowing for various expansions while striving to maintain compatibility with older code. Improvements have been facilitated by a few extra superblock fields that were added to Ext2 not long before the Linux 2.0 kernel was released. The fields allow for file system features to be added without compromising the existing setup; this is done by providing
bits indicating the impact of the added features with respect to compatibility. Namely, a file system with the "incompat" bit marked is not allowed to be mounted. Similarly, a "read-only" marking would only allow the file system to be mounted read-only.
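The scheme amounts to two bitmask checks at mount time. The flag names and values below are illustrative stand-ins, not the real Ext2 superblock layout:

```python
# Sketch of feature-compatibility bits in the spirit of the Ext2 scheme.
# Flag values are illustrative, not the actual on-disk layout.

INCOMPAT_KNOWN = 0b0011   # "incompat" features this kernel implements
RO_COMPAT_KNOWN = 0b0001  # "read-only compat" features it implements


def can_mount(incompat_bits, ro_compat_bits, read_only):
    # Any unknown "incompat" feature forbids mounting entirely.
    if incompat_bits & ~INCOMPAT_KNOWN:
        return False
    # Unknown "read-only compat" features force a read-only mount.
    if (ro_compat_bits & ~RO_COMPAT_KNOWN) and not read_only:
        return False
    return True
```

An old kernel thus refuses outright anything marked with a feature it cannot safely ignore, while merely downgrading to read-only for features that only affect writes.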
Directory indexing replaces linear directory searches with a faster search using a fixed-depth tree and hashed keys. File system size can be dynamically increased, and the expanded inode, doubled from 128 to 256 bytes, allows for more extensions.
Other potential improvements were discussed as well, in particular pre-allocation for contiguous files, which allows for better performance in certain setups by pre-allocating contiguous blocks. Security-related modifications, extended attributes, and ACLs were mentioned. An implementation of these features already exists but has not yet been merged into the mainline Ext2/3 code.
RECENT FILESYSTEM OPTIMISATIONS ON FREEBSD
Ian Dowse, Corvil Networks; David Malone, CNRI, Dublin Institute of Technology
David Malone presented four important file system optimizations for the FreeBSD OS: soft updates, dirpref, vmiodir, and dirhash. It turns out that certain combinations of the optimizations (beautifully illustrated in the paper) may yield performance improvements of anywhere between 2 and 10 orders of magnitude for real-world applications.
All four of the optimizations deal with file system metadata. Soft updates allow for asynchronous metadata updates. Dirpref changes the way directories are organized, attempting to place child directories closer to their parents, thereby increasing locality of reference and reducing disk-seek times. Vmiodir trades some extra memory in order to achieve better directory caching. Finally, dirhash, which was implemented by Ian Dowse, changes the way in which entries are searched in a directory. Specifically, FFS uses a linear search to find an entry by name; dirhash builds a hashtable of directory entries on the fly. For a directory of size n with a working set of m files, a search that in certain cases could have been O(n·m) has been reduced, due to dirhash, to effectively O(n + m).
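A toy model makes the complexity claim concrete; this is only a sketch of the idea, not the FFS implementation:

```python
# Toy illustration of the dirhash idea: build a hash table of directory
# entries once, so each subsequent name lookup is O(1) instead of O(n).

def linear_lookup(entries, name):
    """FFS-style linear scan: O(n) per lookup, O(n*m) for m lookups."""
    for i, entry in enumerate(entries):
        if entry == name:
            return i
    return -1


class DirHash:
    """Hash table built on the fly: O(n) to build, then O(1) per lookup."""
    def __init__(self, entries):
        self.index = {name: i for i, name in enumerate(entries)}

    def lookup(self, name):
        return self.index.get(name, -1)
```

Resolving m names in an n-entry directory costs m full scans the first way, but only one table build plus m constant-time probes the second way, matching the O(n·m) vs. O(n + m) figures above.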
FILESYSTEM PERFORMANCE AND SCALABILITY IN LINUX 2.4.17
Ray Bryant, SGI; Ruth Forester, IBM LTC; John Hawkes, SGI
This talk focused on performance evaluation of a number of file systems available and commonly deployed on Linux machines. Comparisons under various configurations of Ext2, Ext3, ReiserFS, XFS, and JFS were presented.

The benchmarks chosen for the data gathering were pgmeter, filemark, and the AIM Benchmark Suite VII. Pgmeter measures the rate of data transfer of reads/writes of a file under a synthetic workload. Filemark is similar to postmark in that it is an operation-intensive benchmark, although filemark is threaded and offers various other features that postmark lacks. AIM VII measures performance for various file-system-related functionalities; it offers various metrics under an imposed workload, thus stressing the performance of the file system not only under I/O load but also under significant CPU load.

Tests were run on three different setups: a small, a medium, and a large configuration. ReiserFS and Ext2 appear at the top of the pile for smaller and medium setups. Notably, XFS and JFS perform worse for smaller system configurations than the others, although XFS clearly appears to generate better numbers than JFS. It should be noted that XFS seems to scale well under a higher load. This was most evident in the large-system results, where XFS appears to offer the best overall results.
THINGS TO THINK ABOUT
Summarized by Bosko Milekic
SPEEDING UP KERNEL SCHEDULER BY REDUCING CACHE MISSES
Shuji Yamamura, Akira Hirai, Mitsuru Sato, Masao Yamamoto, Akira Naruse, and Kouichi Kumon, Fujitsu Labs
This was an interesting talk pertaining to the effects of cache coloring for task structures in the Linux kernel scheduler (Linux kernel 2.4.x). The speaker first presented some interesting benchmark numbers for the Linux scheduler, showing that as the number of processes on the task queue was increased, the performance decreased. The authors used some really nifty hardware to measure the number of bus transactions throughout their tests and were thus able to reasonably quantify the impact that cache misses had in the Linux scheduler.

Their experiments led them to implement a cache coloring scheme for task structures, which were previously aligned on 8KB boundaries and therefore were eventually mapped to the same cache lines. This unfortunate placement of task structures in memory induced a significant number of cache misses as the number of tasks grew in the scheduler.

The implemented solution consisted of aligning task structures to more evenly distribute cached entries across the L2 cache. The result was inevitably fewer cache misses in the scheduler. Some negative effects were observed in certain situations; these were primarily due to more cache slots being used by task-structure data in the scheduler, thus forcing data previously cached there to be pushed out.
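The effect is easy to model. The cache geometry below is an invented miniature, but it shows why identical alignment collapses all task structures into one cache set and why a per-task color offset spreads them out:

```python
# Model of why 8KB-aligned task structures collide in a set-indexed
# cache, and how a per-task "color" offset spreads them out.
# The cache geometry here is an illustrative assumption.

LINE = 64   # cache line size in bytes
SETS = 32   # number of cache sets (tiny, for illustration)


def cache_set(addr):
    return (addr // LINE) % SETS


def task_addrs(n, align=8192, color_step=0):
    """Addresses of n task structures; color_step shifts each task
    by a growing multiple of one cache line."""
    return [i * align + (i % SETS) * color_step for i in range(n)]


# All 8KB-aligned tasks map to the same cache set...
uncolored = {cache_set(a) for a in task_addrs(16)}
# ...while coloring by one cache line per task spreads them out.
colored = {cache_set(a) for a in task_addrs(16, color_step=LINE)}
```

With no coloring all 16 task structures land in a single set and evict one another; with coloring they occupy 16 distinct sets, at the cost of those sets no longer being available to other data, which matches the negative effects noted above.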
WORK-IN-PROGRESS REPORTS
Summarized by Brennan Reynolds
RESOURCE VIRTUALIZATION TECHNIQUES FOR WIDE-AREA OVERLAY NETWORKS
Kartik Gopalan, Stony Brook University
This work addressed the issue of provisioning a maximum number of virtual overlay networks (VONs) with diverse quality of service (QoS) requirements on a single physical network. Each of the VONs is logically isolated from the others to ensure the QoS requirements. Gopalan mentioned several critical research issues with this problem that are currently being investigated. Dealing with how to provision the network at various levels (link, route, or path) and then enforce the provisioning at run-time is one of the toughest challenges. Currently, Gopalan has developed several algorithms to handle admission control, end-to-end QoS, route selection, scheduling, and fault tolerance in the network.
For more information visit http://www.ecsl.cs.sunysb.edu.
VISUALIZING SOFTWARE INSTABILITY
Jennifer Bevan, University of California, Santa Cruz
Detection of instability in software has typically been an afterthought. The point in the development cycle when the software is reviewed for instability is usually after it is difficult and costly to go back and perform major modifications to the code base. Bevan has developed a technique to allow the visualization of unstable regions of code that can be used much earlier in the development cycle. Her technique creates a time series of dependency graphs that include clusters and lines called fault lines. From the graphs, a developer is able to easily determine where the unstable sections of code are and proactively restructure them. She is currently working on a prototype implementation.
For more information visit http://www.cse.ucsc.edu/~jbevan/evo_viz.
RELIABLE AND SCALABLE PEER-TO-PEER WEB DOCUMENT SHARING
Li Xiao, College of William and Mary
The idea presented by Xiao would allow end users to share the content of their Web browser caches with neighboring Internet users. The rationale for this is that today's end users are increasingly connected to the Internet over high-speed links, and the browser caches are becoming large storage systems. Therefore, if an individual accesses a page which does not exist in their local cache, Xiao is suggesting that they first query other end users for the content before trying to access the machine hosting the original. This strategy does have some serious problems associated with it that still need to be addressed, including ensuring the integrity of the content and protecting the identity and privacy of the end users.
SEREL: FAST BOOTING FOR UNIX
Leni Mayo, Fast Boot Software
Serel is a tool that generates a visual representation of a UNIX machine's boot-up sequence. It can be used to identify the critical path and can show if a particular service or process blocks for an extended period of time. This information could be used to determine where the largest performance gains could be realized by tuning the order of execution at boot-up. Serel creates a dependency graph, expressed in XML, during boot-up. This graph is then used to create the visual representation. Currently the tool only works on POSIX-compliant systems, but Mayo stated that he would be porting it to other platforms. Other extensions that were mentioned included having the metadata include the use of shared libraries and monitoring the suspend/resume sequence of portable machines.
For more information visit http://www.fastboot.org.
BERKELEY DB XML
John Merrells, Sleepycat Software
Merrells gave a quick introduction and overview of the new XML library for Berkeley DB. The library specializes in storage and retrieval of XML content, queried through XPath. The library allows for multiple containers per document and stores everything natively as XML. The user is also given a wide range of elements to create indices with, including edges, elements, text strings, or presence. The XPath component consists of a query parser, generator, optimizer, and execution engine. To conclude his presentation, Merrells gave a live demonstration of the software.
For more information visit http://www.sleepycat.com/xml.
CLUSTER-ON-DEMAND (COD)
Justin Moore, Duke University
Modern clusters are growing at a rapid rate. Many have pushed beyond the 5000-machine mark, and deploying them results in large expenses as well as management and provisioning issues. Furthermore, if the cluster is "rented" out to various users, it is very time-consuming to configure it to a user's specs, regardless of how long they need to use it. The COD work presented creates dynamic virtual clusters within a given physical cluster. The goal was to have a provisioning tool that would automatically select a chunk of available nodes and install the operating system and middleware specified by the customer in a short period of time. This would allow a greater use of resources, since multiple virtual clusters can exist at once. By using a virtual cluster, the size can be changed dynamically. Moore stated that they have created a working prototype and are currently testing and benchmarking its performance.
For more information visit http://www.cs.duke.edu/~justin/cod.
CATACOMB
Elias Sinderson, University of California, Santa Cruz
Catacomb is a project to develop a database-backed DASL module for the Apache Web server. It was designed as a replacement for the existing WebDAV module. The initial release of the module only contains support for a MySQL database but could be extended to others. Sinderson briefly touched on the performance of the module: it was comparable to the mod_dav Apache module for all query types but search. The presentation was concluded with remarks about adding support for the lock method and including ACL specifications in the future.
For more information visit http://ocean.cse.ucsc.edu/catacomb.
SELF-ORGANIZING STORAGE
Dan Ellard, Harvard University
Ellard's presentation introduced a storage system that tuned itself based on the workload of the system, without requiring the intervention of the user. The intelligence was implemented as a virtual self-organizing disk that resides below any file system. The virtual disk observes the access patterns exhibited by the system and then attempts to predict what information will be accessed next. An experiment was done using an NFS trace at a large ISP on one of their email servers. Ellard's self-organizing storage system worked well, which he attributes to the fact that most of the files being requested were large email boxes. Areas of future work include exploration of the length and detail of the data-collection stage, as well as the CPU impact of running the virtual disk layer.
VISUAL DEBUGGING
John Costigan, Virginia Tech
Costigan feels that the state of current debugging facilities in the UNIX world is not as good as it should be. He proposes the addition of several elements to programs like ddd and gdb. The first addition is including separate heap and stack components. Another is to display only certain fields of a data structure and have the ability to zoom in and out if needed. Finally, the debugger should provide the ability to visually present complex data structures, including linked lists, trees, etc. He said that a beta version of a debugger with these abilities is currently available.
For more information visit http://infovis.cs.vt.edu/datastruct.
IMPROVING APPLICATION PERFORMANCE THROUGH SYSTEM-CALL COMPOSITION
Amit Purohit, Stony Brook University
Web servers perform a huge number of context switches and much internal data copying during normal operation. These two elements can drastically limit the performance of an application, regardless of the hardware platform it is run on. The Compound System Call (CoSy) framework is an attempt to reduce the performance penalty for context switches and data copies. It includes a user-level library and several kernel facilities that can be used via system calls. The library provides the programmer with a complete set of memory-management functions. Performance tests using the CoSy framework showed a large saving for context switches and data copies. The only area where the savings between conventional libraries and CoSy were negligible was for very large file copies.
For more information visit http://www.cs.sunysb.edu/~purohit.
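The core idea, amortizing one kernel crossing over many operations, can be modeled abstractly. The API below is invented for illustration and is not the CoSy interface:

```python
# Abstract model of compounding: N separate "system calls" cost N
# user/kernel crossings, while one compound call costs a single crossing.
# This API is invented for illustration; it is not the CoSy interface.

CROSSING_COST = 1  # abstract cost of one user/kernel boundary crossing


class FakeKernel:
    def __init__(self):
        self.crossings = 0
        self.files = {}

    def syscall(self, op, *args):
        """One operation, one crossing."""
        self.crossings += CROSSING_COST
        return self._dispatch(op, args)

    def compound(self, ops):
        """One crossing executes a whole list of operations in order."""
        self.crossings += CROSSING_COST
        result = None
        for op, *args in ops:
            result = self._dispatch(op, tuple(args))
        return result

    def _dispatch(self, op, args):
        if op == "write":
            name, data = args
            self.files[name] = self.files.get(name, "") + data
        elif op == "read":
            return self.files.get(args[0], "")
        return None
```

In this model, three individual calls accumulate three crossings while the equivalent compound accumulates one; a real implementation would also keep intermediate buffers inside the kernel, which is where the data-copy savings come from.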
ELASTIC QUOTAS
John Oscar, Columbia University
Oscar began by stating that most resources are flexible but that, to date, disks have not been. While approximately 80 percent of files are short-lived, disk quotas are hard limits imposed by administrators. The idea behind elastic quotas is to have a non-permanent storage area that each user can temporarily use. Oscar suggested the creation of an ehome directory structure to be used in deploying elastic quotas. Currently he has implemented elastic quotas as a stackable file system.

There were also several trends apparent in the data. Many users would ssh to their remote host (good) but then launch an email reader on the local machine, which would connect via POP to the same host and send the password in clear text (bad). This situation can easily be remedied by tunneling protocols like POP through SSH, but it appeared that many people were not aware this could be done. While most of his comments on the use of the wireless network were negative, the list of passwords he had collected showed that people were indeed using strong passwords. His recommendations were to educate and encourage people to use protocols like IPSec, SSH, and SSL when conducting work over a wireless network, because you never know who else is listening.
SYSTRACE
Niels Provos, University of Michigan
How can one be sure the applications one is using actually do exactly what their developers said they do? Short answer: you can't. People today are using a large number of complex applications, which means it is impossible to check each application thoroughly for security vulnerabilities. There are tools out there that can help, though. Provos has developed a tool called systrace that allows a user to generate a policy of acceptable system calls a particular application can make. If the application attempts to make a call that is not defined in the policy, the user is notified and allowed to choose an action. Systrace includes an auto-policy generation mechanism, which uses a training phase to record the actions of all programs the user executes. If the user chooses not to be bothered by applications breaking policy, systrace allows default enforcement actions to be set. Currently systrace is implemented for FreeBSD and NetBSD, with a Linux version coming out shortly.
For more information visit http://www.citi.umich.edu/u/provos/systrace.
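Conceptually, the mechanism looks like the sketch below; this is an illustration of train-then-enforce policy checking, not systrace's actual policy language or engine:

```python
# Toy model of policy-based system-call interposition in the style the
# talk describes: a training phase records observed calls, and an
# enforcement phase flags anything outside the learned policy. This is
# an illustration only, not systrace's policy format or engine.

def train(observed_calls):
    """Build a policy (an allow-set) from a training run."""
    return set(observed_calls)


def enforce(policy, call, default_action="deny"):
    """Return the action for one attempted system call."""
    if call in policy:
        return "permit"
    return default_action  # or prompt the user to choose an action


policy = train(["open", "read", "close"])
```

The `default_action` parameter stands in for the configurable default enforcement the summary mentions, so unattended runs need not prompt the user at every violation.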
VERIFIABLE SECRET REDISTRIBUTION FOR SURVIVABLE STORAGE SYSTEMS
Ted Wong, Carnegie Mellon University
Wong presented a protocol that can be used to re-create a file distributed over a set of servers, even if one of the servers is damaged. In his scheme, the user must choose how many shares the file is split into and the number of servers it will be stored across. The goal of this work is to provide persistent, secure storage of information, even if it comes under attack. Wong stated that his design of the protocol was complete, and he is currently building a prototype implementation of it.
For more information visit http://www.cs.cmu.edu/~tmwong/research.
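The building block behind such schemes is threshold (k-of-n) secret sharing. The minimal Shamir-style sketch below shows only the split/recover primitive; it is not Wong's verifiable redistribution protocol:

```python
# Minimal Shamir-style (k-of-n) secret sharing over a prime field: any k
# shares reconstruct the secret; fewer reveal nothing. This sketches the
# split/recover primitive only, not the verifiable redistribution scheme.
import random

P = 2**31 - 1  # a prime modulus (illustrative size, not crypto-strength)


def split(secret, k, n):
    """Create n shares, any k of which reconstruct the secret."""
    coeffs = [secret] + [random.randrange(P) for _ in range(k - 1)]

    def f(x):
        return sum(c * pow(x, i, P) for i, c in enumerate(coeffs)) % P

    return [(x, f(x)) for x in range(1, n + 1)]


def recover(shares):
    """Lagrange interpolation at x = 0 recovers the secret."""
    total = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, P - 2, P)) % P
    return total
```

With shares spread across five servers and a threshold of three, any damaged server can be tolerated; redistributing to a new server set without ever reassembling the secret is the harder problem Wong's protocol addresses.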
THE GURU IS IN SESSIONS
LINUX ON LAPTOP/PDA
Bdale Garbee, HP Linux Systems Operation
Summarized by Brennan Reynolds
This was a roundtable-like discussion, with Garbee acting as moderator. He is currently in charge of the Debian distribution of Linux and is involved in porting Linux to platforms other than i386. He has successfully helped port the kernel to the Alpha, SPARC, and ARM, and is currently working on a version to run on VAX machines.
A discussion was held on which file system is best for battery life. The ReiserFS and Ext3 file systems were discussed; people commented that when using Ext3 on their laptops the disk would never spin down and thus was always consuming a large amount of power. One suggestion was to use a RAM-based file system for any volume that requires a large number of writes, and to only have the file system written to disk at infrequent intervals or when the machine is shut down or suspended.
A question was directed to Garbee about his knowledge of how the HP-Compaq merger would affect Linux. Garbee thought that the merger was good for Linux, both within the company and for the open source community as a whole. He stated that the new company would be the number one shipper of systems with Linux as the base operating system. He also fielded a question about the support of older HP servers and their ability to run Linux, saying that indeed Linux has been ported to them, and that he personally had several machines in his basement running it.

A question which sparked a large number of responses concerned problems people had with running Linux on mobile platforms. Most people actually did not have many problems at all. There were a few cases of a particular machine model not working, but there were only two widespread issues: docking stations and the new ACPI power management scheme. Docking stations still appear to be somewhat of a headache, but most people had developed ad-hoc solutions for getting them to work. The ACPI power management scheme developed by Intel does not appear to have a quick solution. Ted Ts'o, one of the head kernel developers, stated that there are fundamental architectural problems with the 2.4 series kernel that do not easily allow ACPI to be added. However, he also stated that ACPI is supported in the newest development kernel, 2.5, and will be included in 2.6.
The final topic was the possibility of purchasing a laptop without paying any charge/tax for Microsoft Windows. The consensus was that currently this is not possible. Even if the machine comes with Linux or without any operating system, the vendors have contracts in place which require them to pay Microsoft for each machine they ship if that machine is capable of running a Microsoft operating system. The only way this will change is if the demand for
other operating systems increases to the point that vendors renegotiate their contracts with Microsoft, and no one saw this happening in the near future.
LARGE CLUSTERS
Andrew Hume, AT&T Labs – Research
Summarized by Teri Lampoudi
This was a session on clusters, big data, and resilient computing. Given that the audience was primarily interested in clusters, and not necessarily big data or beating nodes to a pulp, the conversation revolved mainly around what it takes to make a heterogeneous cluster resilient. Hume also presented a FREENIX paper on his current cluster project, which explained in more detail much of what was abstractly claimed in the guru session.

Hume's use of the word "cluster" referred not to what I had assumed would be a Beowulf-type system, but to a loosely coupled farm of machines in which concurrency was much less of an issue than it would be in a Beowulf. In fact, the architecture Hume described was designed to deal with large amounts of independent transaction processing, essentially the process of billing calls, which requires no interprocess message passing or anything of the sort a parallel scientific application might.

Where does the large data come in? Since the transactions in question consist of large numbers of flat files, a mechanism for getting the files onto nodes and off-loading the results is necessary. In this particular cluster, named Ningaui, this task is handled by the "replication manager," which generated a large amount of interest in the audience. All management functions in the cluster, as well as job allocation, constitute services that are to be bid for and leased out to nodes. Furthermore, failures are handled by simply repeating the failed task until it is completed properly, if that is possible, and if it is not, then looking through detailed logs to discover
what went wrong. Logging and testing for the successful completion of each task are what make this model resilient. Every management operation is logged at some appropriate level of detail, generating 120MB/day, and every single file transfer is md5-checksummed. This was characterized by Hume as a "patient" style of management, where no one controls the cluster, scheduling behavior is emergent rather than stringently planned, and nodes are free to drop in and out of the cluster; a downside to this model is the increase in latency, offset by the fact that the cluster continues to function unattended outside of 8-to-5 support hours.

In response to questions about the projected scalability of such a scheme, Hume said the system would presumably scale from the present 60 nodes to about 100 without modifications, but that scaling higher would be a matter for further design. The guru session ended on a somewhat unrelated note regarding the problems of getting data off mainframes and onto Linux machines – issues of fixed- vs. variable-length encoding of data blocks resulting in useful information being stripped by FTP, and problems in converting COBOL copybooks to something useful on a C-based architecture. To reiterate Hume's claim, there are homemade solutions for some of these problems, and the reader should feel free to contact Mr. Hume for them.
WRITING PORTABLE APPLICATIONS
Nick Stoughton, MSB Consultants
Summarized by Josh Lothian
Nick Stoughton addressed a concern that developers are facing more and more: writing portable applications. He proposed that there is no such thing as a "portable application," only those that have been ported. Developers are still in search of the Holy Grail of applications programming: a write-once, compile-anywhere program. Languages such as Java are leading up to this, but virtual machines still have to be ported, paths will still be different, etc. Web-based applications are another area that could be promising for portable applications.

Nick went on to talk about the future of portability. The recently released POSIX 2001 standards are expected to help the situation. The 2001 revision expands the original POSIX spec into seven volumes. Even though POSIX 2001 is more specific than the previous release, Nick pointed out that even this standard allows for small differences where vendors may define their own behaviors. This can only hurt developers in the long run. Along these lines, it was pointed out that there may be room for automated tools that have the capability to check application code and determine the level of portability and standards compliance of the source.
NETWORK MANAGEMENT SYSTEM PERFORMANCE TUNING
Jeff Allen, Tellme Networks, Inc.
Summarized by Matt Selsky
Jeff Allen, author of Cricket, discussed network tuning. The basic idea of tuning is: measure, twiddle, and measure again. Cricket can be used for large installations, but it's overkill for smaller installations, for which MRTG is better suited. You don't need Cricket to do measurement. Doing something like a shell script that dumps data to Excel is good, but you need to get some sort of measurement. Cricket was not designed for billing; it was meant to help answer questions.

Useful tools and techniques covered included looking for the difference between machines in order to identify the causes for the differences; measuring, but also thinking about what you're measuring and why; and remembering to use strace and tcpdump to identify problems. Problems repeat themselves; performance tuning requires an intricate understanding of all the layers involved to solve the problem and simplify the automation. (Having a computer science background helps but is not essential.) If you can reproduce the problem, you then should investigate each piece to find the bottleneck. If you can't reproduce the problem, then measure. You need to understand all the system interactions. Measurement can help determine whether things are actually slow or if the user is imagining it.

Troubleshooting begins with a hunch, but scientific processes are essential. You should be able to determine the event stream, or when each event occurs; having observation logs can help. Some hints provided include checking cron for periodic anomalies, slowing down /bin/rm to avoid I/O overload, and looking for unusual kernel CPU usage.

Also, try to use lightweight monitoring to reduce overhead. You don't need to monitor every resource on every system, but you should monitor those resources on those systems that are essential. Don't check things too often, since you can introduce overhead that way. Lots of information can be gathered from the /proc file system interface to the kernel.
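As a sketch of such a lightweight check, the following parses /proc/meminfo-style output and flags one essential condition; the threshold and the choice of check are invented for illustration:

```python
# Lightweight monitoring sketch: parse /proc/meminfo-style text and flag
# a single essential condition rather than polling every resource.
# The threshold and the specific check are illustrative assumptions.

def parse_meminfo(text):
    """Parse 'Key:   value kB' lines into a dict of kB values."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            info[key.strip()] = int(fields[0])
    return info


def memory_low(info, min_free_ratio=0.05):
    """Alert only when free memory falls below a fraction of total."""
    return info["MemFree"] / info["MemTotal"] < min_free_ratio


SAMPLE = "MemTotal:  255908 kB\nMemFree:    3544 kB"
```

In practice the text would come from reading /proc/meminfo on a schedule loose enough not to perturb the system being measured, in line with the "don't check too often" advice above.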
BSD BOF
Host: Kirk McKusick, Author and Consultant
Summarized by Bosko Milekic
The BSD Birds of a Feather session at USENIX started off with five quick and informative talks and ended with an exciting discussion of licensing issues.

Christos Zoulas, for the NetBSD Project, went over the primary goals of the NetBSD Project (namely portability, a clean architecture, and security) and then proceeded to briefly discuss some of the challenges that the project has encountered. The successes and areas for improvement of 1.6 were then examined. NetBSD has recently seen various improvements in its cross-build system, packaging system, and third-party tool support. For what concerns the kernel, a zero-copy TCP/UDP implementation has been integrated, and a new pmap
code for arm32 has been introduced, as have some other rather major changes to various subsystems. More information is available in Christos's slides, which are now available at http://www.netbsd.org/gallery/events/usenix2002.
Theo de Raadt of the OpenBSD Project presented a rundown of various new things that the OpenBSD and OpenSSH teams have been looking at. Theo's talk included an amusing description (and pictures) of some of the events that transpired during OpenBSD's hack-a-thon, which occurred the week before USENIX '02. Finally, Theo mentioned that, following complications with the IPFilter license, the OpenBSD team had done a fairly extensive license audit.

Robert Watson of the FreeBSD Project brought up various examples of FreeBSD being used in the real world. He went on to describe a wide variety of changes that will surface with the upcoming FreeBSD release 5.0. Notably, 5.0 will introduce a re-architecting of the kernel aimed at providing much more scalable support for SMP machines. 5.0 will also include an early implementation of KSE, FreeBSD's version of scheduler activations, large improvements to pccard, and significant framework bits from the TrustedBSD project.

Mike Karels from Wind River Systems presented an overview of how BSD/OS has been evolving following the Wind River Systems acquisition of BSDi's software assets. Ernie Prabhakar from Apple discussed Apple's OS X and its success on the desktop. He explained how OS X aims to bring BSD to the desktop while striving to maintain a strong and stable core, one that is heavily based on BSD technology.

The BoF finished with a discussion on licensing; specifically, some folks questioned the impact of Apple's licensing for code from their core OS (the Darwin Project) on the BSD community as a whole.
CLOSING SESSION
HOW FLIES FLY
Michael H. Dickinson, University of California, Berkeley
Summarized by JD Welch
In this lively and offbeat talk, Dickinson discussed his research into the mechanisms of flight in the fruit fly (Drosophila melanogaster) and related the autonomic behavior to that of a technological system. Insects are the most successful organisms on earth, due in no small part to their early adoption of flight. Flies travel through space on straight trajectories interrupted by saccades, jumpy turning motions somewhat analogous to the fast, jumpy movements of the human eye. Using a variety of monitoring techniques, including high-speed video in a "virtual reality flight arena," Dickinson and his colleagues have observed that flies respond to visual cues to decide when to saccade during flight. For example, more visual texture makes flies saccade earlier (i.e., further away from the arena wall), described by Dickinson as a "collision control algorithm."
Through a combination of sensory inputs, flies can make decisions about their flight path. Flies possess a visual "expansion detector" which, at a certain threshold, causes the animal to turn in a certain direction. However, expansion in front of the fly sometimes causes it to land. How does the fly decide? Using the virtual reality device to replicate various visual scenarios, Dickinson observed that the flies fixate on vertical objects that move back and forth across the field. Expansion on the left side of the animal causes it to turn right, and vice versa, while expansion directly in front of the animal triggers the legs to flare out in preparation for landing.
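The turn-or-land rule described above is simple enough to sketch as a decision procedure. The toy Python model below is an illustration, not the researchers' code; the 0..1 signal ranges and the 0.8 landing threshold are invented:

```python
def steer(expansion_left, expansion_right, landing_threshold=0.8):
    """Toy model of the expansion-driven steering rule from the talk.

    Inputs are hypothetical "expansion detector" signals (0..1) for the
    left and right visual fields; the threshold value is invented here.
    """
    frontal = min(expansion_left, expansion_right)
    if frontal > landing_threshold:
        return "extend legs to land"   # strong expansion dead ahead
    if expansion_left > expansion_right:
        return "saccade right"         # looming on the left: turn away
    if expansion_right > expansion_left:
        return "saccade left"          # looming on the right: turn away
    return "fly straight"

print(steer(0.9, 0.85))  # expansion directly in front: landing response
print(steer(0.6, 0.1))   # expansion on the left: turn right
```

The point of the sketch is only the branching structure: a frontal-expansion check gates the landing reflex, and asymmetric expansion drives the turn direction.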
If the eyes detect these changes, how are the responses implemented? The flies' wings have "power muscles," controlled by mechanical resonance (as opposed to the nervous system), which drive the wings, combined with neurally activated "steering muscles," which change the configuration of the wing joints. Subtle variations in the timing of impulses correspond to changes in wing movement, controlled by sensors in the "wing-pit." A small wing-like structure, the haltere, controls equilibrium by beating constantly.
Voluntary control is accomplished by the halteres, whose signals can interfere with the autonomic control of the stroke cycle. The halteres have "steering muscles" as well, and information derived from the visual system can turn off the haltere or trick it into a "virtual" problem requiring a response.
Dickinson has also studied the aerodynamics of insect flight using a device called the "Robo Fly," an oversized mechanical insect wing suspended in a large tank of mineral oil. Interestingly, Dickinson observed that the average lift generated by a fly's wings is under its body weight; the flies use three mechanisms to overcome this, including rotating the wings (rotational lift).
Insects are extraordinarily robust creatures, and because Dickinson analyzed the problem in a systems-oriented way, these observations and analyses are immediately applicable to technology: the response system can be used as an efficient search algorithm for control systems in autonomous vehicles, for example.
THE AFS WORKSHOP
Summarized by Garry Zacheiss, MIT
Love Hornquist-Astrand of the Arla development team presented the Arla status report. Arla 0.35.8 has been released. Scheduled for release soon are improved support for Tru64 UNIX, MacOS X, and FreeBSD; improved volume handling; and implementation of more of the vos/pts subcommands. It was stressed that MacOS X is considered an important platform, and that a GUI configuration manager for the cache manager, a GUI ACL manager for the Finder, and a graphical login that obtains Kerberos tickets and AFS tokens at login time were all under development.
Future goals planned for the underlying AFS protocols include GSSAPI/SPNEGO support for Rx, performance improvements to Rx, an enhanced disconnected mode, and IPv6 support for Rx; an experimental patch is already available for the latter. Future Arla-specific goals include improved performance, partial file reading, and increased stability on several platforms. Work is also in progress on the RXGSS protocol extensions (integrating GSSAPI into Rx); a partial implementation exists, and work continues as developers find time.
Derrick Brashear of the OpenAFS development team presented the OpenAFS status report. OpenAFS 1.2.5 was released immediately prior to the conference; it fixed a remotely exploitable denial-of-service vulnerability on several OpenAFS platforms, most notably IRIX and AIX. Future work planned for OpenAFS includes better support for MacOS X, including working around Finder interaction issues. Better support for the BSDs is also planned: FreeBSD has a partially implemented client, while NetBSD and OpenBSD have only server programs available right now. AIX 5, Tru64 5.1A, and MacOS X 10.2 (aka Jaguar) are all planned for the future. Other planned enhancements include nested groups in the ptserver (code donated by the University of Michigan awaits integration), disconnected AFS, and further work on a native Windows client. Derrick stated that the guiding principles of OpenAFS were to maintain compatibility with IBM AFS, support new platforms, ease the administrative burden, and add new functionality.
AFS performance was discussed. The openafs.org cell consists of two file servers, one in Stockholm, Sweden, and one in Pittsburgh, PA. AFS works reasonably well over trans-Atlantic links,
although OpenAFS clients don't determine which file server to talk to very effectively. Arla clients use RTTs to the server to determine the optimal file server to fetch replicated data from; modifications to OpenAFS to support this behavior in the future are desired.
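The RTT-based selection that Arla does can be sketched in a few lines. In this illustration (not Arla's code) the server names are hypothetical and the probe is a pluggable callable standing in for a real network measurement:

```python
def pick_replica(servers, probe_rtt):
    """Choose the replica with the lowest measured round-trip time.

    `servers` is a list of server names; `probe_rtt` is a callable
    returning an RTT estimate in milliseconds for a given server.
    """
    rtts = {s: probe_rtt(s) for s in servers}
    return min(rtts, key=rtts.get)

# Toy probe with fixed "measurements" standing in for real pings.
fake_rtt = {"stockholm.example.org": 180.0, "pittsburgh.example.org": 35.0}
best = pick_replica(list(fake_rtt), fake_rtt.get)
print(best)  # the lower-latency replica wins
```

A real client would refresh these measurements periodically rather than probing once, but the core decision is just this minimum over observed latencies.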
Jimmy Engelbrecht and Harald Barth of KTH discussed their AFSCrawler script, written to determine how many AFS cells and clients were in the world, what implementations/versions they were (Arla vs. IBM AFS vs. OpenAFS), and how much data was in AFS. The script unfortunately triggered a bug in IBM AFS 3.6-derived code, causing some clients to panic while handling a specific RPC. This has since been fixed in OpenAFS 1.2.5 and the most recent IBM AFS patch level, and all AFS users are strongly encouraged to upgrade. No release of Arla is vulnerable to this particular denial-of-service attack. There was an extended discussion of the usefulness of this exploration; many sites believed this was useful information and such scanning should continue in the future, but only on an opt-in basis.
Many sites face the problem of managing Kerberos/AFS credentials for batch-scheduled jobs. Specifically, most batch processing software needs to be modified to forward tickets as part of the batch submission process, renew tickets and tokens while the job is in the queue and for the lifetime of the job, and properly destroy credentials when the job completes. Ken Hornstein of NRL was able to pay a commercial vendor to support Kerberos 4/5 credential management in their product, although they did not implement AFS token management. MIT has implemented some of the desired functionality in OpenPBS and might be able to make it available to other interested sites.
Tools to simplify AFS administration were discussed, including:
AFS Balancer: A tool written by CMU to automate the process of balancing disk usage across all servers in a cell. Available from ftp://ftp.andrew.cmu.edu/pub/AFS-Tools/balance-1.1b.tar.gz.
Themis: KTH's enhanced version of the AFS tool "package" for updating files on local disk from a central AFS image. KTH's enhancements include allowing the deletion of files, simplifying the process of adding a file, and allowing the merging of multiple rule sets for determining which files are updated. Themis is available from the Arla CVS repository.
Stanford was presented as an example of a large AFS site. Stanford's AFS usage consists of approximately 14TB of data in AFS, in the form of approximately 100,000 volumes; 33TB of storage is available in their primary cell, ir.stanford.edu. Their file servers consist entirely of Solaris machines running a combination of Transarc 3.6 patch-level 3 and OpenAFS 1.2.x, while their database servers run OpenAFS 1.2.2. Their cell consists of 25 file servers, using a combination of EMC and Sun StorEdge hardware. Stanford continues to use the kaserver for their authentication infrastructure, with future plans to migrate entirely to an MIT Kerberos 5 KDC.
Stanford has approximately 3,400 clients on their campus, not including SLAC (Stanford Linear Accelerator); approximately 2,100 AFS clients from outside Stanford contact their cell every month. Their supported clients are almost entirely IBM AFS 3.6 clients, although they plan to release OpenAFS clients soon. Stanford currently supports only UNIX clients; there is some on-campus presence of Windows clients, but Stanford has never publicly released or supported it. They do intend to release and support the MacOS X client in the near future.
All Stanford students, faculty, and staff are assigned AFS home directories, with a default quota of 50MB, for a total of approximately 550GB of user home directories. Other significant uses of AFS storage are data volumes for workstation software (400GB) and volumes for course Web sites and assignments (100GB).
AFS usage at Intel was also presented. Intel has been an AFS site since 1994. They had bad experiences with the IBM 3.5 Linux client; their experience with OpenAFS on Linux 2.4.x kernels has been much better. They use and are satisfied with the OpenAFS IA64 Linux port. Intel has hundreds of OpenAFS 1.2.3 and 1.2.4 clients in many production cells accessing data stored on IBM AFS file servers, and they have not encountered any interoperability issues. Intel has some concerns about OpenAFS: they would like to purchase commercial support for OpenAFS, and to see OpenAFS support for HP-UX on both PA-RISC and Itanium hardware. HP-UX support is currently unavailable due to a specific HP-UX header file being unavailable from HP; this may be available soon. Intel has not yet committed to migrating their file servers to OpenAFS and is unsure if it will do so without commercial support.
Backups are a traditional topic of discussion at AFS workshops, and this time was no exception. Many users complain that the traditional AFS backup tools ("backup" and "butc") are complex and difficult to automate, requiring many home-grown scripts and much user intervention for error recovery. An additional complaint was that the traditional AFS tools do not support file- or directory-level backups and restores; data must be backed up and restored at the volume level.
Mitch Collinsworth of Cornell presented work to make AMANDA, the free backup software from the University of Maryland, suitable for AFS backups. Using AMANDA for AFS backups allows one to share AFS backup tapes with non-AFS backups, easily run multiple backups in parallel, automate error recovery, and provide a robust degraded mode that prevents tape errors from stopping backups altogether. Their implementation allows for full volume restores as well as individual directory and file restores. They have finished coding this work and are in the process of testing and documenting it.
Peter Honeyman of CITI at the University of Michigan spoke about work he has proposed to replace Rx with RPCSEC GSS in OpenAFS; this would allow AFS to use a TCP-based transport mechanism rather than the UDP-based Rx, and possibly gain better congestion control, dynamic adaptation, and fragmentation avoidance as a result. RPCSEC GSS uses the GSSAPI to authenticate SUN ONC RPC. RPCSEC GSS is transport-agnostic, provides strong security, is a developing Internet standard, and has multiple open source implementations. Backward compatibility with existing AFS servers and clients is an important goal of this project.
THE JOY OF BREAKING THINGS
Pat Parseghian, Transmeta
Summarized by Juan Navarro
Pat Parseghian described her experience in testing the Crusoe microprocessor at the Transmeta Lab for Compatibility (TLC), where the motto is "You make it, we break it" and the goal is to make engineers miserable.
She first gave an overview of Crusoe, whose most distinctive feature is a code-morphing layer that translates x86 instructions into native VLIW instructions. This peculiarity makes the testing process particularly challenging, because it involves testing two processors (the "external" x86 and the "internal" VLIW), and also because of variations in how the code-morphing layer works (it may interpret code the first few times it sees an instruction sequence, then translate it and save the result in a translation cache afterwards). There are also reproducibility issues, since the VLIW processor may run at different speeds to save energy.
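The interpret-first, translate-when-hot behavior described above is the classic staged-JIT pattern. The following toy model is a hedged sketch of that pattern, with an invented class name and hotness threshold; it is not Transmeta's actual logic:

```python
class ToyCodeMorpher:
    """Illustrative (invented) model of an interpret-then-translate JIT,
    loosely mirroring the code-morphing behavior described in the talk."""

    def __init__(self, hot_threshold=3):
        self.hot_threshold = hot_threshold
        self.counts = {}            # times each instruction sequence was seen
        self.translation_cache = {}  # sequences already translated to "native"

    def execute(self, seq):
        if seq in self.translation_cache:
            return "native"          # run the cached translation
        self.counts[seq] = self.counts.get(seq, 0) + 1
        if self.counts[seq] >= self.hot_threshold:
            self.translation_cache[seq] = True   # translate once hot
            return "translated"
        return "interpreted"         # cold code is interpreted

cm = ToyCodeMorpher()
print([cm.execute("loop-body") for _ in range(5)])
# interpreted twice, translated on the third sighting, then native
```

The sketch also hints at why testing is hard: the same instruction sequence exercises different machinery (interpreter, translator, translation cache) depending on execution history.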
Pat then described the tests that they subject the processor to (hardware compatibility tests, common applications, operating systems, envelope-pushing games, and hard-to-acquire legacy applications) and the issues that must be considered when defining testing policies. She gave some testing tips, including organizational issues like tracking resources and results.
To illustrate the testing process, Pat gave a list of possible causes of a system crash that must be investigated. If it is the silicon, then it might be because it is damaged, or because of a manufacturing problem or a design flaw. If the problem is in the code-morphing layer, is the fault with the interpreter or with the translator? The fault could also be external to the processor: it could be the homemade system BIOS, a faulty mainboard, or an operator error. Or Transmeta might not be at fault at all; the crash might be due to a bug in the OS or the application. The key to pinpointing the cause of a problem is to isolate it by identifying the conditions that reproduce the problem and repeating those conditions on other test platforms and non-Crusoe systems. Then relevant factors must be identified, based on product knowledge, past experience, and common sense.
To conclude, Pat suggested that some of TLC's testing lessons can be applied to products that the audience was involved in, and assured us that breaking things is fun.
TECHNOLOGY LIBERTY FREEDOM AND
WASHINGTON
Alan Davidson, Center for Democracy and Technology
Summarized by Steve Bauer
"Experience should teach us to be most on our guard to protect liberty when the Government's purposes are beneficent. Men born to freedom are naturally alert to repel invasion of their liberty by evil-minded rulers. The greatest dangers to liberty lurk in insidious encroachment by men of zeal, well-meaning but without understanding." - Louis Brandeis, Olmstead v. US
Alan Davidson, an associate director at the CDT (http://www.cdt.org) since 1996, concluded his talk with this quote. In many ways it aptly characterizes the importance to the USENIX community of the topics he covered. The major themes of the talk were the impact of law on individual liberty, system architectures, and appropriate responses by the technical community.
The first part of the talk provided the audience with an overview of legislation either already introduced or likely to be introduced in the US Congress. This included various proposals to protect children, such as relegating all sexual content to a .xxx domain or providing kid-safe zones such as .kids.us. Other similar laws discussed were the Children's Online Protection Act and legislation dealing with virtual child pornography.
Other lower-profile pieces covered were laws prohibiting false contact information in emails and domain registration databases. Similar proposals exist that would prohibit misleading subject lines in emails and require messages to clearly identify whether they are advertisements. Briefly mentioned were laws impacting online gambling.
Bills and laws affecting the architectural design of systems and networks, and various efforts to establish boundaries and borders in cyberspace, were then discussed. These included the Consumer Broadband and Digital Television Promotion Act and the French cases against Yahoo and its CEO for allowing the sale of Nazi materials on the Yahoo auction site.
A major part of the talk focused on the technological aspects of the USA PATRIOT Act, passed in the wake of the September 11 attacks. Topics included "pen registers" for the Internet, expanded government power to conduct secret searches, roving wiretaps, nationwide service of warrants, sharing of grand jury and intelligence information, and the establishment of an identification system for visitors to the US.
The technological community needs to be aware of, and care about, the impact of law on individual liberty and on the systems the community builds. Specific suggestions included designing privacy and anonymity into architectures and limiting information collection, particularly any personally identifiable data, to only what is essential. Finally, Alan emphasized that many people simply do not understand the implications of law; thus the technology community has an important role in helping make these implications clear.
TAKING AN OPEN SOURCE PROJECT TO
MARKET
Eric Allman, Sendmail, Inc.
Summarized by Scott Kilroy
Eric started the talk with the upsides and downsides of open source software. The upsides include rapid feedback, high-quality work (mostly), lots of hands, and an intensely engineering-driven approach. The downsides include no support structure, limited project marketing expertise, and volunteers having limited time.
In 1998 Sendmail was becoming a "success disaster." A success disaster includes two key elements: (1) volunteers start spending all their time supporting current functionality instead of enhancing it (this heavy burden can lead to project stagnation and eventually death); (2) competition from better-funded sources can lead to FUD (fear, uncertainty, and doubt) about the project.
Eric then led listeners down the path he took to turn Sendmail into a business. Eric envisioned Sendmail, Inc. as a smallish and close-knit company, but soon realized that the family atmosphere he desired could not last as the company grew. He observed that maybe only the first 10 people will buy into your vision, after which each individual is primarily interested in something else (e.g., power, wealth, or status). He warns that there will be people working for you whom you do not like. Eric particularly did not care for salespeople, but he emphasized how important sales, marketing, finance, and legal people are to companies.
The nature of companies in infancy can be summed up as: you need money, and you need investors in order to get money. Investors want a return on investment, so money management becomes critical; companies therefore must optimize money functions. Eric's experiences with investors were lessons all in themselves. More than giving money, good investors can provide connections and insight, so you don't want investors who don't believe in you.
A company must have bug history, change management, and people who understand the product and have a sense of the target market. Eric now sees the importance of marketing target research. A company can never forget the simple equation that drives business: profit = revenue - expense.
Finally, the concept of value should be based on what is valuable to the customer. Eric observed that his customers needed documentation, extra value in the product that they couldn't get elsewhere, and technical support.
Eric learned hard lessons along the way. If you want to do interesting open source, it might be best not to be too successful: you don't want to let the larger dragons (companies) notice you. If you want a commercial user base, you have to manage process, watch corporate culture, provide value to customers, watch bottom lines, and develop a thick skin.
Starting a company is easy, but perpetuating it is extremely difficult.
INFORMATION VISUALIZATION FOR
SYSTEMS PEOPLE
Tamara Munzner, University of British Columbia
Summarized by JD Welch
Munzner presented interesting evidence that information visualization, through interactive representations of data, can help people perform tasks more effectively and reduce the load on working memory.
A popular technique in visualizing data is abstracting it through node-link representation, for example, a list of cross-references in a book. Representing the relationships between references in a graphical (node-link) manner "offloads cognition to the human perceptual system." These graphs can be produced manually, but automated drawing allows for increased complexity and quicker production times.
The way in which people interpret visual data is important in designing effective visualization systems, and attention to pre-attentive visual cues is key to success. Picking out a red dot among a field of identically shaped blue dots is significantly faster than finding the same data represented in a mixture of hue and shape variations. There are many pre-attentive visual cues, including size, orientation, intersection, and intensity. The accuracy of these cues can be ranked, with position triggering high accuracy and color or density on the low end.
Visual cues combined with different data types (quantitative, e.g., 10 inches; ordered, e.g., small, medium, large; and categorical, e.g., apples, oranges) can also be ranked, with spatial position being best for all types. Beyond this commonality, however, accuracy varies widely: length ranks second for quantitative data but eighth for ordinal and ninth for categorical data types.
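The ranking facts stated above can be captured in a tiny lookup table. The sketch below encodes only the ranks the summary gives (position first everywhere; length second, eighth, and ninth); the table layout and function name are invented, and ranks for any other channels would have to come from the literature:

```python
# Partial effectiveness ranking reconstructed only from the facts above
# (lower rank = a more accurate channel for that data type).
EFFECTIVENESS_RANK = {
    "quantitative": {"position": 1, "length": 2},
    "ordinal":      {"position": 1, "length": 8},
    "categorical":  {"position": 1, "length": 9},
}

def better_channel(data_type, a, b):
    """Return whichever visual channel is ranked more effective for the type."""
    ranks = EFFECTIVENESS_RANK[data_type]
    return a if ranks[a] < ranks[b] else b

print(better_channel("ordinal", "position", "length"))  # position
```

A designer choosing an encoding could consult such a table per data type, which is exactly the kind of guideline Munzner described.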
These guidelines about visual perception can be applied to information visualization, which, unlike scientific visualization, focuses on abstract data and the choice of spatialization. Several techniques are used to clarify abstract data, including multi-part glyphs, where changes in individual parts are incorporated into an easier-to-understand gestalt, as well as interactivity, motion, and animation. In large data sets, techniques like "focus + context," where a zoomed portion of the graph is shown along with a thumbnail view of the entire graph, are used to minimize user disorientation.
Future problems include dealing with huge databases such as the Human Genome, reckoning with dynamic data like the changing structure of the Web or real-time network monitoring, and transforming "pixel-bound" displays into large "digital wallpaper"-type systems.
FIXING NETWORK SECURITY BY HACKING
THE BUSINESS CLIMATE
Bruce Schneier, Counterpane Internet Security
Summarized by Florian Buchholz
Bruce Schneier identified security as one of the fundamental building blocks of the Internet: a certain degree of security is needed for all things on the Net, and the limits of security will become limits of the Net itself. However, companies are hesitant to employ security measures. Science is doing well, but one cannot see the benefits in the real world. An increasing number of users means more problems affecting more people; old problems such as buffer overflows haven't gone away, and new problems show up. Furthermore, the amount of expertise needed to launch attacks is decreasing.
Schneier argues that complexity is the enemy of security, and that while security is getting better, the growth in complexity is outpacing it. As security is fundamentally a people problem, one shouldn't focus on technologies but rather should look at businesses, business motivations, and business costs. Traditionally, one can distinguish between two security models: threat avoidance, where security is absolute, and risk management, where security is relative and one has to mitigate the risk with technologies and procedures. In the latter model, one has to find a point with reasonable security at an acceptable cost.
After a brief discussion of how security is handled in businesses today, in which he concluded that businesses "talk big about it but do as little as possible," Schneier identified four necessary steps to fix the problems.
First, enforce liabilities. Today no real consequences have to be feared from security incidents. Holding people accountable will increase the transparency of security and give an incentive to make processes public. As possible options to achieve this, he listed industry-defined standards, federal regulation, and lawsuits. The problems with enforcement, however, lie in difficulties associated with the international nature of the problem and the fact that complexity makes assigning liabilities difficult. Furthermore, fear of liability could have a chilling effect on free software development and could stifle new companies.
As a second step, Schneier identified the need to allow partners to transfer liability. Insurance would spread liability risk among a group and would be a CEO's primary risk-analysis tool. There is a need for standardized risk models and protection profiles, and for more security as opposed to empty press releases. Schneier predicts that insurance will create a marketplace for security, where customers and vendors will have the ability to accept liabilities from each other. Computer-security insurance should soon be as common as household or fire insurance, and from that development insurance will become the driving factor of the security business.
The next step is to provide mechanisms to reduce risks, both before and after software is released; techniques and processes to improve software quality, as well as an evolution of security management, are therefore needed. Currently, security products try to rebuild the walls (such as physical badges and entry doorways) that were lost when getting connected. Schneier believes this "fortress metaphor" is bad; one should think of the problem more in terms of a city. Since most businesses cannot afford proper security, outsourcing is the only way to make security scalable. With outsourcing, the concept of best practices becomes important, and insurance companies can be tied to them; outsourcing will level the playing field.
As a final step, Schneier predicts that rational prosecution and education will lead to deterrence. He claims that people feel safe because they live in a lawful society, whereas the Internet is classified as lawless, very much like the American West in the 1800s or a society ruled by warlords. This is because it is difficult to prove who an attacker is, and prosecution is hampered by complicated evidence gathering and irrational prosecution. Schneier believes, however, that education will play a major role in turning the Internet into a lawful society; specifically, he pointed out that we need laws that can be explained.
Risks won't go away; the best we can do is manage them. A company able to manage risk better will be more profitable, and we need to give CEOs the necessary risk-management tools.
There were numerous questions. Listed below are the more complicated ones, in a paraphrased Q&A format.
Q: Will liability be effective? Will insurance companies be willing to accept the risks? A: The government might have to step in; it remains to be seen how it plays out.
Q: Is there an analogy to the real world in the fact that, in a lawless society, only the rich can afford security? Security solutions differentiate between classes. A: Schneier disagreed with the statement, giving an analogy to front-door locks, but conceded that there might be special cases.
Q: Doesn't homogeneity hurt security? A: Homogeneity is oversold. Diverse types can be more survivable, but given the limited number of options, the difference will be negligible.
Q: Regulations in airbag protection have led to deaths in some cases. How can we keep the pendulum from swinging the other way? A: Lobbying will not be prevented. An imperfect solution is probable; there might be reversals of requirements, such as the airbag one.
Q: What about personal liability? A: This will be analogous to auto insurance; liability comes with computer/Net access.
Q: If the rule of law is to become reality, there must be a law-enforcement function that applies to a physical space. You cannot do that with any existing government agency for the whole world. Should an organization like the UN assume that role? A: Schneier was not convinced global enforcement is possible.
Q: What advice should we take away from this talk? A: Liability is coming. Since the network is important to our infrastructure, eventually the problems will be solved in a legal environment. You need to start thinking about how to solve the problems and how the solutions will affect us.
The entire speech (including slides) is available at http://www.counterpane.com/presentation4.pdf.
LIFE IN AN OPEN SOURCE STARTUP
Daryll Strauss, Consultant
Summarized by Teri Lampoudi
This talk was packed with morsels of insight and tidbits of information about what life in an open source startup is like (though "was like" might be more appropriate), what the issues are in starting, maintaining, and commercializing an open source project, and the way hardware vendors treat open source developers. Strauss began by tracing the timeline of his involvement with developing 3-D graphics support for Linux: from the 3dfx Voodoo1 driver for Linux in 1997, to the establishment of Precision Insight in 1998, to its buyout by VA Linux, to the dismantling of his group by VA, to what the future may hold.
Obviously there are benefits from doing open source development, both for the developer and for the end user. An important point was that open source develops entire technologies, not just products. A common misapprehension concerns the inherent difficulty of managing a project that accepts code from a large number of developers, some volunteer and some paid: the notion that code can just be thrown over the wall into the world is a Big Lie.
How does one start and keep afloat an open source development company? In the case of Precision Insight, the subject matter (developing graphics and OpenGL for Linux) required a considerable amount of expertise: intimate knowledge of the hardware, the libraries, X, and the Linux kernel, as well as the end applications. Expertise is marketable. What helps even further is having a visible virtual team of experts, people who have an established track record of contributions to open source in the particular area of expertise. Support from the hardware and software vendors came next. While everyone was willing to contract PI to write the portions of code that were specific to their hardware, no one wanted to pay the bill for developing the underlying foundation that was necessary for the drivers to be useful. In the PI model, the infrastructure cost was shared by several customers at once, and the technology kept moving. Once a driver is written for a vendor, the vendor is perfectly capable of figuring out how to write the next driver necessary, thereby obviating the need to contract the job. This, along with questions of protecting intellectual property (hardware design, in this case), is deeply problematic for the open source mode of development. This is not to say that there is no way to get over such problems, but they are likely to arise more often than not.
Management issues seem equally important. In the PI setting, contracts were flexible (both a good and a bad thing) and developers overworked. Further, development "in a fishbowl," that is, under public scrutiny, is not easy. Strauss stressed the value of good communication and the use of various out-of-sync communication methods like IRC and mailing lists.
Finally, Strauss discussed what portions of software can be free and what can be proprietary. His suggestion was that horizontal markets want to be free, whereas vertically one can develop proprietary solutions. The talk closed with a small stroll through the problems that project politics brings up, from things like merging branches to mailing list and source tree readability.
GENERAL TRACK SESSIONS
FILE SYSTEMS
Summarized by Haijin Yan
STRUCTURE AND PERFORMANCE OF THE
DIRECT ACCESS FILE SYSTEM
Kostas Magoutis, Salimah Addetia, Alexandra Fedorova, and Margo I. Seltzer, Harvard University; Jeffrey Chase, Andrew Gallatin, Richard Kisley, and Rajiv Wickremesinghe, Duke University; Eran Gabber, Lucent Technologies
This paper won the Best Paper award
The Direct Access File System (DAFS) is a new standard for network-attached storage over direct-access transport networks. DAFS takes advantage of user-level network interface standards that enable client-side functionality in which remote data access resides in a library rather than in the kernel. This reduces the memory-copy overhead of data movement, as well as protocol overhead. Remote Direct Memory Access (RDMA) is a direct-access transport mechanism that allows the network adapter to reduce copy overhead by accessing application buffers directly.
This paper explores the fundamental structural and performance characteristics of network file access using a user-level file system structure on a direct-access transport network with RDMA. It describes DAFS-based client and server reference implementations for FreeBSD and reports experimental results comparing DAFS to a zero-copy NFS implementation. It illustrates the benefits and trade-offs of these techniques to provide a basis for informed choices about deployment of DAFS-based systems and similar extensions to other network file protocols such as NFS. Experiments show that DAFS gives applications direct control over an I/O system and increases the client CPU's utilization while the client is doing I/O. Future work includes addressing longstanding problems related to the integration of the application and the file system for high-performance applications.
CONQUEST: BETTER PERFORMANCE THROUGH
A DISK/PERSISTENT-RAM HYBRID FILE
SYSTEM
An-I A. Wang, Peter Reiher, and Gerald J. Popek, UCLA; Geoffrey M. Kuenning, Harvey Mudd College
Motivated by the declining cost of persistent RAM, the authors propose the Conquest file system, which stores all small files, metadata, executables, and shared libraries in persistent RAM; disks hold only the data content of the remaining large files. Compared to alternatives such as caching and RAM file systems, Conquest has the advantages of efficiency, consistency, and reliability at a reduced cost. Using popular benchmarks, experiments show that Conquest incurs little overhead while achieving faster performance. Future work includes designing mechanisms for adjusting the file-size threshold dynamically and finding a better disk layout for large data blocks.
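The small-file/large-file split can be illustrated with a toy sketch. The class name and the 1MB threshold below are illustrative assumptions, not details from the paper:

```python
# Toy sketch of Conquest-style placement: small files, metadata,
# executables, and shared libraries live in persistent RAM; only the
# data of large files goes to disk. The class name and the 1 MB
# threshold are illustrative assumptions, not taken from the paper.
SMALL_FILE_THRESHOLD = 1 << 20  # bytes; files at or below this stay in RAM

class ConquestLikeFS:
    def __init__(self):
        self.ram = {}   # persistent RAM: all metadata + small-file data
        self.disk = {}  # disk: data content of large files only

    def write(self, name, data):
        self.ram[("meta", name)] = {"size": len(data)}  # metadata always in RAM
        if len(data) <= SMALL_FILE_THRESHOLD:
            self.ram[("data", name)] = data
        else:
            self.disk[name] = data

    def read(self, name):
        if ("data", name) in self.ram:
            return self.ram[("data", name)]
        return self.disk.get(name)
```

A fixed threshold is what makes the dynamic-threshold future work interesting: the right cutoff depends on the RAM/disk size ratio and the workload's file-size distribution.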
EXPLOITING GRAY-BOX KNOWLEDGE OF
BUFFER-CACHE MANAGEMENT
Nathan C. Burnett, John Bent, Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau, University of Wisconsin, Madison
Knowing what algorithm is used to manage the buffer cache is very important for improving application performance, but there is currently no interface for discovering it. This paper introduces Dust, a simple fingerprinting tool that is able to identify the buffer-cache replacement policy. Dust automatically identifies the cache size and replacement policy based on how the policy weighs access order, recency, frequency, and long-term history. Through simulation, Dust was able to distinguish between a variety of replacement policies found in the literature: FIFO, LRU, LFU, Clock, Segmented FIFO, 2Q, and LRU-K. Further experiments fingerprinting real operating systems such as NetBSD, Linux, and Solaris show that Dust is able to identify the hidden cache replacement algorithm.
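The fingerprinting idea can be shown with a toy simulation. This is a sketch of the general approach, not Dust's actual probe patterns: drive a cache of known size with an access pattern whose hit count differs by policy, then match the observed count against simulated policies.

```python
# Toy version of the fingerprinting idea: a fixed-size cache is driven
# with an access pattern whose hit count differs by replacement policy,
# and the observed hit count is matched against simulations. Real Dust
# probes size, recency, frequency, and history; this separates FIFO/LRU.
def run(policy, accesses, size=3):
    cache, hits = [], 0
    for block in accesses:
        if block in cache:
            hits += 1
            if policy == "LRU":       # a hit refreshes recency under LRU
                cache.remove(block)
                cache.append(block)
        else:
            if len(cache) == size:
                cache.pop(0)          # evict the head of the list
            cache.append(block)
    return hits

def fingerprint(observed_hits, accesses):
    # Return the simulated policies consistent with the observed hit count.
    return [p for p in ("FIFO", "LRU") if run(p, accesses) == observed_hits]

# Re-touching block 1 keeps it alive under LRU but not under FIFO.
pattern = [1, 2, 3, 1, 4, 1, 5, 1]
```

On this pattern the FIFO cache scores 2 hits and the LRU cache 3, so the hit count alone identifies the policy.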
69 October 2002 login
By knowing the underlying cache replacement policy, a cache-aware Web server can reschedule requests on a cached-request-first policy to obtain a performance improvement. Experiments show that cache-aware scheduling improves average response time and system throughput.
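A minimal sketch of cached-request-first scheduling (the function and its interface are illustrative, not the paper's):

```python
# Minimal cached-request-first scheduler: serve requests whose objects
# are already in the cache before cold ones, keeping arrival order
# within each class so cold requests are not reordered among themselves.
def schedule(requests, cached):
    warm = [r for r in requests if r in cached]
    cold = [r for r in requests if r not in cached]
    return warm + cold
```

Serving warm requests first shortens their queueing delay at little cost to cold requests, which were going to wait on the disk anyway.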
OPERATING SYSTEMS
(AND DANCING BEARS)
Summarized by Matt Butner
THE JX OPERATING SYSTEM
Michael Golm, Meik Felser, Christian Wawersich, and Jürgen Kleinöder, University of Erlangen-Nürnberg
The talk opened by emphasizing the need for, and the practicality of, a functional Java OS. The JX OS attempts to mimic the recent trend toward using highly abstracted languages in application development in order to create an OS as functional and powerful as one written in a lower-level language.
JX is a microkernel solution that uses separate JVMs for each entity of the kernel and, in some cases, for each application. The separated domains do not share objects and have no thread migration, and each domain implements its own code. The most interesting part of the presentation was the discussion of system-level programming with Java; key areas discussed were memory management and interrupt handlers. The authors concluded by noting that their type-safe and modular design resulted in a robust system with great configuration flexibility and acceptable performance.
DESIGN EVOLUTION OF THE EROS
SINGLE-LEVEL STORE
Jonathan S. Shapiro, Johns Hopkins University; Jonathan Adams, University of Pennsylvania
This presentation outlined current characteristics of file systems, including some of their least desirable ones, and was based on the revival of the EROS single-level-store approach in current attempts to capitalize on system consistency and efficiency. Their solution does not relate directly to EROS but rather to the design and use of the constructs and notions on which EROS is based. By extending the mapping of the memory system to include the disk, systems are further able to ensure global persistence without regard for disk structure. In explaining the need for absolute system consistency and an environment which, by definition, is not allowed to crash, the goal of having such an exhaustive design becomes clear. The cool part of this work is the availability of its design and architecture to the public.
THINK A SOFTWARE FRAMEWORK FOR
COMPONENT-BASED OPERATING SYSTEM
KERNELS
Jean-Philippe Fassino, France Télécom R&D; Jean-Bernard Stefani, INRIA; Julia Lawall, DIKU; Gilles Muller, INRIA
This presentation discussed the need for component-based operating systems and the structures required to ensure flexibility for systems of arbitrary size. Think provides a binding model that gives OS developers and architects uniform components to follow, ensuring consistent implementations at any system scale. The goal, however, is not to force developers into a predefined kernel but to promote the use of certain components in varied ways.
Think's concentration is primarily on embedded systems, where short development time is necessary but constrained by rigorous needs and limited resources. Building flexible systems in such an environment can be costly and implementation-specific, but Think creates an environment supported by the ability to dynamically load type-safe components. This allows for more flexible systems that retain functionality because of Think's ability to bind fine-grained components. Benchmarks revealed that dedicated microkernels can show performance improvements in comparison to monolithic kernels. Notable future work includes building components for low-end appliances and the development of a real-time OS component library.
BUILDING SERVICES
Summarized by Pradipta De
NINJA: A FRAMEWORK FOR NETWORK
SERVICES
J. Robert von Behren, Eric A. Brewer, Nikita Borisov, Michael Chen, Matt Welsh, Josh MacDonald, Jeremy Lau, and David Culler, University of California, Berkeley
Robert von Behren presented Ninja, an ongoing project that aims to provide a framework for building robust and scalable Internet-based network services such as Web hosting, instant messaging, email, and file-sharing applications. Robert drew attention to the difficulties of writing cluster applications: one has to take care of data consistency and issues of concurrency, and for robust applications there are problems related to fault tolerance. Ninja works as a wrapper to relieve the user of these problems. The goal of the project is building network services that are scalable and highly available, maintain persistent data, provide graceful degradation, and support online evolution.
The use of clusters distinguishes this setup from generic distributed systems in terms of reliability and security, as well as providing a partition-free network. The next important feature of this project is the new programming model, which is more restrictive than a general programming model but still expressive enough to write most applications. This model, described as "single program, multiple connection," uses intertask parallelism instead of multithreaded concurrency and is characterized by all nodes running the same program, with connections delegated to the nodes by a centralized connection manager (CM). The CM takes care of hiding the details of mapping an external connection to an internal node. Ninja also uses a design called "staged event-driven architecture," in which each service is broken down into a set of stages connected by event queues. This architecture is suitable for a modular design and helps in graceful degradation by adaptive load shedding from the event queues.
The presentation concluded with examples of the implementation and evaluation of a Web server and an email system called NinjaMail, to show the ease of authoring and the efficacy of using the Ninja framework for service development.
USING COHORT SCHEDULING TO ENHANCE
SERVER PERFORMANCE
James R. Larus and Michael Parkes, Microsoft Research
James Larus presented a new scheduling policy for increasing the throughput of server applications. Cohort scheduling batches the execution of similar operations arising in different server requests. The usual programming paradigm for handling server requests is to spawn multiple concurrent threads and switch from one thread to another whenever a thread blocks for I/O. Since the threads in a server mostly execute unrelated pieces of code, locality of reference is reduced, and with it the effectiveness of the various caching mechanisms. One way to solve this problem is to throw in more hardware, but Larus presented a complementary view in which program behavior is investigated and used to improve performance.
The problem in this scheme is identifying pieces of code that can be batched together for processing. One simple way is to look at the next program-counter values and use them to club threads together. This talk, however, presented a new programming abstraction, "staged computation," which replaces the thread model with "stages." A stage is an abstraction for operations with similar behavior and locality. The StagedServer library, a collection of C++ classes that implement staged computation and cohort scheduling on either a uniprocessor or a multiprocessor, can be used for programming in this model and for defining stages.
The presentation showed an experimental evaluation of cohort scheduling against the thread-based model, using two server implementations: a Web server, which is mainly I/O bound, and a publish-subscribe server, which is mainly compute bound. The SURGE benchmark was used for the first experiment and the Fabret workload for the second. The results showed that the cohort-scheduling-based implementation gave better throughput than the thread-based implementation at high loads.
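A toy sketch of staged computation and cohort scheduling, in Python rather than the StagedServer C++ classes; all names here are illustrative:

```python
from collections import deque

# Minimal staged-computation sketch: each stage queues pending
# operations and runs them as a batch (a "cohort"), so similar code and
# data stay hot in the caches, instead of one thread per request.
class Stage:
    def __init__(self, name, work, next_stage=None):
        self.name, self.work, self.next_stage = name, work, next_stage
        self.queue = deque()

    def enqueue(self, item):
        self.queue.append(item)

    def run_cohort(self):
        # Execute every queued operation back to back, forwarding
        # results to the next stage in the pipeline.
        while self.queue:
            result = self.work(self.queue.popleft())
            if self.next_stage:
                self.next_stage.enqueue(result)

results = []
reply = Stage("reply", lambda s: results.append(s.upper()))
read = Stage("read", lambda s: s + "!", next_stage=reply)
for req in ["a", "b", "c"]:
    read.enqueue(req)
read.run_cohort()   # all reads run back to back...
reply.run_cohort()  # ...then all replies
```

The scheduler runs one stage's whole queue before moving on, which is exactly the locality bet cohort scheduling makes.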
NETWORK PERFORMANCE
Summarized by Xiaobo Fan
ETE: PASSIVE END-TO-END INTERNET SERVICE
PERFORMANCE MONITORING
Yun Fu and Amin Vahdat, Duke University; Ludmila Cherkasova and Wenting Tang, HP Labs
This paper won the Best Student Paper award.
Ludmila Cherkasova began by listing several questions most Web service providers want answered in order to improve service quality. She reviewed the difficulties of making accurate and efficient end-to-end Web service measurements and the shortcomings of currently available approaches. The authors propose a passive trace-based architecture called EtE to monitor Web server performance on behalf of end users.
The first step is to collect network packets passively. The second module reconstructs all TCP connections and extracts HTTP transactions. To reconstruct Web page accesses, they first build a knowledge base indexed by client IP and URL and then group objects into the related Web pages in which they are embedded. Statistical analysis is used to handle non-matched objects. EtE Monitor can generate three groups of metrics to measure Web service performance: response time, Web caching, and page abortion.
To demonstrate the benefits of EtE Monitor, Cherkasova talked about two case studies and, based on the calculated metrics, gave some insightful explanations of what is happening behind the variations in Web performance. Through validation, they claim their approach provides a very close approximation of the real scenario.
THE PERFORMANCE OF REMOTE DISPLAY
MECHANISMS FOR THIN-CLIENT COMPUTING
S. Jae Yang, Jason Nieh, Matt Selsky, and Nikhil Tiwari, Columbia University
Noting the trend toward thin-client computing, the authors compared different techniques and design choices by measuring the performance of six popular thin-client platforms: Citrix MetaFrame, Microsoft Terminal Services, Sun Ray, Tarantella, VNC, and X. After pointing out several challenges in benchmarking thin clients, Yang proposed slow-motion benchmarking to achieve non-invasive packet monitoring and consistent visual quality. Basically, they insert delays between separate visual events in the benchmark applications on the server side so that the client's display updates can catch up with the server's processing speed. The experiments were conducted on an emulated network over a range of network bandwidths.
Their results show that thin clients can provide good performance for Web applications in LAN environments, but only some platforms performed well on the video benchmark. Pixel-based encoding may achieve better performance and bandwidth efficiency than high-level graphics. Display caching and compression should be used with care.
A MECHANISM FOR TCP-FRIENDLY
TRANSPORT-LEVEL PROTOCOL
COORDINATION
David E. Ott and Ketan Mayer-Patel, University of North Carolina, Chapel Hill
A revised transport-level protocol optimized for cluster-to-cluster (C-to-C) applications is what David Ott tried to explain in his talk. A C-to-C application class is identified as one set of processes communicating with another set of processes across a common Internet path. The fundamental problem with C-to-C applications is how to coordinate all C-to-C communication flows so that they share a consistent view of the common C-to-C network, adapt to changing network conditions, and cooperate to meet specific requirements. Aggregation points (APs) are placed at the first and last hops of the common data path to probe network conditions (latency, bandwidth, loss rate, etc.). To carry and transfer this information, a new protocol – the Coordination Protocol (CP) – is inserted between the network layer (IP) and the transport layer (TCP, UDP, etc.). Ott illustrated how the CP header is updated and used as packets originate from the source and traverse the local and remote APs to arrive at their destination, and how an AP maintains a per-cluster state table and detects network conditions. Simulation results suggest this coordination mechanism is effective in sharing common network resources among C-to-C communication flows.
STORAGE SYSTEMS
Summarized by Praveen Yalagandula
MY CACHE OR YOURS? MAKING STORAGE
MORE EXCLUSIVE
Theodore M. Wong, Carnegie Mellon University; John Wilkes, HP Labs
Theodore Wong explained the inefficiency of current "inclusive" caching schemes in storage area networks: when a client accesses a block, the block is read from the disk and is cached both at the disk array cache and in the client's cache. He then presented the concept of "exclusive" caching, where a block accessed by a client is cached only in that client's cache. For eviction from the client's cache, they introduce a DEMOTE operation to move the data block to the tail of the array cache.
The new exclusive caching schemes were evaluated on both single-client and multiple-client systems. The single-client results are presented for two different types of synthetic workloads: Random (transaction-processing-type workloads) and Zipf (typical Web workloads). The exclusive policy was quite effective in achieving higher hit rates and lower read latencies. They also showed a 22-times hit-rate improvement over inclusive techniques for a real-life workload, httpd, in a single-client setting.
The DEMOTE scheme also performed well on multiple-client systems when the data accessed by the clients is disjoint. For shared-data workloads, the scheme performed worse than the inclusive schemes. Theodore then presented an adaptive exclusive caching scheme: a block accessed by a client is placed at the tail of that client's cache and also in the array cache at a place determined by the popularity of the block. The popularity of a block is measured by maintaining a ghost cache that accumulates the number of times each block is accessed. This new adaptive scheme achieved a maximum speedup of 52% in mean latency in experiments with real-life workloads.
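The basic DEMOTE idea can be sketched as follows. This is a simplified model with LRU caches; the class and function names are illustrative, and the adaptive ghost-cache variant is omitted:

```python
from collections import OrderedDict

# Simplified model of exclusive caching with DEMOTE: a block lives in
# the client cache or the array cache, never both, and the client's
# eviction victim is demoted to the most-recent end of the array cache.
class LRUCache:
    def __init__(self, size):
        self.size, self.d = size, OrderedDict()

    def hit(self, k):
        if k in self.d:
            self.d.move_to_end(k)   # refresh recency on a hit
            return True
        return False

    def insert(self, k):
        self.d[k] = True
        self.d.move_to_end(k)
        if len(self.d) > self.size:
            return self.d.popitem(last=False)[0]  # LRU victim
        return None

client, array = LRUCache(2), LRUCache(2)

def read(block):
    if client.hit(block):
        return "client hit"
    if block in array.d:
        del array.d[block]       # exclusivity: promote out of the array
        outcome = "array hit"
    else:
        outcome = "disk read"
    victim = client.insert(block)
    if victim is not None:
        array.insert(victim)     # DEMOTE the victim to the array cache
    return outcome
```

Because demoted blocks land in the array cache instead of being duplicated there on every read, the two caches together behave like one cache of their combined size.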
For more information, visit http://www.cs.cmu.edu/~tmwong/research and http://www.hpl.hp.com/research/itc/csl/ssp.
BRIDGING THE INFORMATION GAP IN
STORAGE PROTOCOL STACKS
Timothy E. Denehy, Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau, University of Wisconsin, Madison
Currently there is a huge information gap between storage systems and file systems. The interface that storage systems expose to file systems, based on blocks and providing only read/write operations, is very narrow. This leads to poor performance overall, because of duplicated functionality in both systems and reduced functionality resulting from the storage system's lack of file information.
The speaker presented two enhancements: ExRAID, an exposed RAID, and ILFS, an informed log-structured file system. ExRAID is an enhancement to the block-based RAID storage system that exposes three types of information to the file system: (1) regions – contiguous portions of the address space comprising one or multiple disks; (2) performance information about queue lengths and throughput of the regions, revealing disk heterogeneity to the file system; and (3) failure information – dynamically updated information conveying the number of tolerable failures in each region.
ILFS allows online incremental expansion of the storage space, performs dynamic parallelism using ExRAID's performance information to perform well on heterogeneous storage systems, and provides a range of mechanisms at different granularities for file redundancy using ExRAID's region failure characteristics. These new features were added to LFS with only a 19% increase in code size.
More information: http://www.cs.wisc.edu/wind
MAXIMIZING THROUGHPUT IN REPLICATED
DISK STRIPING OF VARIABLE
BIT-RATE STREAMS
Stergios V. Anastasiadis, Duke University; Kenneth C. Sevcik and Michael Stumm, University of Toronto
There is an increasing demand for the continuous real-time streaming of media files. Disk striping is a common technique for supporting many connections on media servers. But even with 1.2 million hours of mean time between failures, a system with 7,000 disks will see more than one disk failure per week. Fault tolerance can be achieved by using data replication and reserving extra bandwidth during normal operation. This work focused on supporting the most common variable bit-rate stream formats (e.g., MPEG).
The authors used the prototype media server EXEDRA to implement and evaluate different replication and bandwidth reservation schemes. The system supports variable bit-rate streams, does stride-based disk allocation, and is capable of variable-grain disk striping. Two replication policies are presented: deterministic, in which data blocks are replicated in a round-robin fashion on the secondary replicas, and random, in which data blocks are replicated on randomly chosen secondary replicas. Two bandwidth reservation techniques are also presented: mirroring reservation, in which disk bandwidth is reserved for both the primary and the replicas of the media file during playback, and minimum reservation, a more efficient scheme in which bandwidth is reserved only for the sum of the primary data access time and the maximum of the backup data access times required in each round.
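The difference between the two reservation policies reduces to simple per-round arithmetic; the access times below are made-up units for illustration:

```python
# Per-round bandwidth reservation arithmetic for the two policies
# described above; access times (arbitrary units) are made up.
def mirroring_reservation(primary_time, backup_times):
    # Reserve bandwidth for the primary and for every replica, every round.
    return primary_time + sum(backup_times)

def minimum_reservation(primary_time, backup_times):
    # Reserve for the primary plus only the single worst-case backup.
    return primary_time + max(backup_times)
```

For any set of backup access times, `minimum_reservation` never exceeds `mirroring_reservation`, and the gap grows as more replicas are added, which is where the throughput advantage in the results comes from.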
Experimental results showed that deterministic replica placement is better than random placement for small disks, minimum disk bandwidth reservation achieves twice the throughput of mirroring, and fault tolerance can be achieved with minimal impact on throughput.
TOOLS
Summarized by Li Xiao
SIMPLE AND GENERAL STATISTICAL PROFILING
WITH PCT
Charles Blake and Steven Bauer, MIT
This talk introduced the Profile Collection Toolkit (PCT), a sampling-based CPU profiling facility. A novel aspect of PCT is that it allows sampling of semantically rich data such as function call stacks, function parameters, local or global variables, CPU registers, and other execution contexts. The design objectives of PCT were driven by user needs and by the inadequacies or inaccessibility of prior systems.
The rich data collection capability is achieved via a debugger-controller program, dbctl. The talk then introduced the profiling collector and data-collection activation. PCT file format options provide flexibility and various ways to store exact states. PCT also has analysis options; two examples were given, "Simple" and "Mixed User and Linux Code."
The talk then described how general sampling helps in the following areas: a debugger-controller state machine, example script fragments, a simple example program, and general value reports in terms of call-path histograms, numeric value histograms, and data generality. Related work such as gprof and Expect was then summarized. The contribution of this work is a portable, general value-sampling tool. Its limitations are that PCT does not cope well with stripped binaries and that count inaccuracies arise from its statistical nature; given the former, compiling with -g is preferred over stripping.
Possible future directions include supporting more debuggers, such as dbx and xdb; script generators for other kinds of traces; more canned reports for general values; and a libgdb-based library sampler. PCT runs on almost any UNIX-like system and is available at http://pdos.lcs.mit.edu/pct.
ENGINEERING A DIFFERENCING AND
COMPRESSION DATA FORMAT
David G. Korn and Kiem-Phong Vo, AT&T Laboratories – Research
This talk began with an equation: "Differencing + Compression = Delta Compression." Compression removes information redundancy within a data set; examples are gzip, bzip, compress, and pack. Differencing encodes the differences between two data sets; examples are diff -e, fdelta, bdiff, and xdelta. Delta compression compresses one data set given another, combining differencing and compression, and reduces to pure compression when there is no commonality.
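The "compress one data set given another" idea can be sketched with zlib's preset-dictionary support, which seeds the compressor with the reference version so content shared with it costs almost nothing. This illustrates delta compression generically; Vcdiff itself defines its own instruction encoding and is not zlib-based:

```python
import hashlib
import zlib

# Delta compression via zlib's preset dictionary: the reference version
# seeds the compressor, so bytes shared with it are encoded as matches.
def delta_compress(target: bytes, reference: bytes) -> bytes:
    c = zlib.compressobj(zdict=reference)
    return c.compress(target) + c.flush()

def delta_decompress(blob: bytes, reference: bytes) -> bytes:
    d = zlib.decompressobj(zdict=reference)
    return d.decompress(blob) + d.flush()

# Deterministic high-entropy "old version," so plain compression gains
# little while delta compression against it gains a lot.
old = b"".join(hashlib.sha256(bytes([i])).digest() for i in range(64))
new = old + b"one small edit appended to the end"
```

The decompressor needs the same reference to reverse the transform, which is the defining property of a delta format such as Vcdiff.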
After presenting the overall scheme for a delta compressor and showing delta compression performance, the talk discussed the encoding format of the newly designed Vcdiff for delta compression. A Vcdiff instruction-code table consists of 256 entries, each coding up to a pair of instructions, and encodes 1-byte indices and any additional data.
The talk then showed Vcdiff's performance on Web data collected from CNN, compared with the W3C-standard Gdiff, Gdiff+gzip, and diff+gzip, where Gdiff was computed using Vcdiff delta instructions. The results of two experiments, "First" and "Successive," were presented: in "First," each file is compressed against the first file collected; in "Successive," each file is compressed against the one from the previous hour. The diff+gzip combination did not work well because diff is line-oriented; Vcdiff performed favorably compared with the other formats.
Vcdiff is part of the Vcodex package. Vcodex is a platform for all common data transformations: delta compression, plain compression, encryption, and transcoding (e.g., uuencode, base64). It is structured in three layers for maximum usability; the base library uses Disciplines and Methods interfaces.
The code for Vcdiff can be found at http://www.research.att.com/sw/tools. The authors are moving Vcdiff toward an IETF standard as a comprehensive platform for transforming data; refer to http://www.ietf.org/internet-drafts/draft-korn-vcdiff-06.txt.
WHERE IN THE NET
Summarized by Amit Purohit
A PRECISE AND EFFICIENT EVALUATION OF
THE PROXIMITY BETWEEN WEB CLIENTS AND
THEIR LOCAL DNS SERVERS
Zhuoqing Morley Mao, University of California, Berkeley; Charles D. Cranor, Fred Douglis, Michael Rabinovich, Oliver Spatscheck, and Jia Wang, AT&T Labs – Research
Content distribution networks (CDNs) attempt to improve Web performance by delivering Web content to end users from servers located at the edge of the network. When a Web client requests content, the CDN dynamically chooses a server to route the request to, usually the one nearest the client. As CDNs have access only to the IP address of the client's local DNS server (LDNS), the CDN's authoritative DNS server maps the client's LDNS to a geographic region within a particular network and combines that with network and server load information to perform CDN server selection.
This method has two limitations. First, it is based on the implicit assumption that clients are close to their LDNS, which may not always be valid. Second, a single request from an LDNS can represent a differing number of Web clients, called the hidden load factor. Knowledge of the hidden load factor can be used to achieve better load distribution.
The talk dealt with the extent of the first limitation and its impact on CDN server selection. To determine the associations between clients and their LDNS, a simple, non-intrusive, and efficient mapping technique was developed. The data collected was used to study the impact of proximity on DNS-based server selection, using four different proximity metrics: (1) autonomous system (AS) clustering – observing whether a client is in the same AS as its LDNS – concluded that LDNS is very good for coarse-grained server selection, since 64% of the associations belong to the same AS; (2) network clustering – observing whether a client is in the same network-aware cluster (NAC) as its LDNS – implied that DNS is less useful for finer-grained server selection, since only 16% of clients and their LDNS are in the same NAC; (3) traceroute divergence – the length of the divergent paths to the client and its LDNS from a probe point, as measured with traceroute – implies that most clients are topologically close to their LDNS as viewed from a randomly chosen probe site; (4) round-trip time (RTT) correlation (some CDNs select servers based on the RTT between the CDN server and the client's LDNS) examined the correlation between message RTTs from a probe point to the client and to its LDNS; the results imply that this is a reasonable metric for avoiding very distant servers.
A study of the impact that client-LDNS associations have on DNS-based server selection concludes that knowing the client's IP address would allow more accurate server selection in a large number of cases. The optimality of server selection also depends on server density, placement, and the selection algorithms.
Further information can be found at http://www.eecs.berkeley.edu/~zmao/myresearch.html or by contacting zmao@eecs.berkeley.edu.
GEOGRAPHIC PROPERTIES OF INTERNET
ROUTING
Lakshminarayanan Subramanian, University of California, Berkeley; Venkata N. Padmanabhan, Microsoft Research; Randy H. Katz, University of California, Berkeley
Geographic information can provide insights into the structure and functioning of the Internet, including interactions between different autonomous systems, by analyzing certain properties of Internet routing. It can be used to measure and quantify routing properties such as circuitous routing, hot-potato routing, and geographic fault tolerance.
Traceroute was used to gather the required data, and the Geotrack tool was used to determine the location of the nodes along each network path. This enables the computation of "linearized distances": the sum of the geographic distances between successive pairs of routers along the path.
To measure the circuitousness of a path, a metric called the "distance ratio" is defined as the ratio of the linearized distance of a path to the geographic distance between the source and destination of the path. From the data, it was observed that the circuitousness of a route depends on both the geographic and the network locations of the end hosts. A large value of the distance ratio flags paths that are highly circuitous, possibly (though not necessarily) because of routing anomalies. It was also shown that the minimum delay between end hosts is strongly correlated with the linearized distance of the path.
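The distance-ratio metric is easy to state in code. This is a sketch: great-circle distance stands in for the paper's geographic distance, and the example route is made up:

```python
from math import asin, cos, radians, sin, sqrt

# Sketch of the "distance ratio": linearized distance along the router
# path divided by the direct source-destination distance.
def haversine_km(a, b):
    (lat1, lon1), (lat2, lon2) = ((radians(x), radians(y)) for x, y in (a, b))
    h = sin((lat2 - lat1) / 2) ** 2 + \
        cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371 * asin(sqrt(h))  # Earth radius ~6371 km

def distance_ratio(path):
    linearized = sum(haversine_km(path[i], path[i + 1])
                     for i in range(len(path) - 1))
    return linearized / haversine_km(path[0], path[-1])

# A deliberately circuitous route: west coast -> northwest -> east coast.
route = [(37.87, -122.27), (47.61, -122.33), (40.71, -74.01)]
```

A ratio near 1 means the path hugs the great circle; a large ratio flags a circuitous route worth investigating.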
Geographic information can be used to study various aspects of wide-area Internet paths that traverse multiple ISPs. It was found that end-to-end Internet paths tend to be more circuitous than intra-ISP paths, the cause being the peering relationships between ISPs. Also, paths traversing substantial distances within two or more ISPs tend to be more circuitous than paths largely traversing only a single ISP. Another finding is that ISPs generally employ hot-potato routing.
Geographic information can also be used to capture the fact that two seemingly unrelated routers can be susceptible to correlated failures. Using the geographic information of routers, one can construct a geographic topology of an ISP and, from it, determine the tolerance of the ISP's network to the total failure of the network in a geographic region.
For further information, contact lakme@cs.berkeley.edu.
PROVIDING PROCESS ORIGIN INFORMATION
TO AID IN NETWORK TRACEBACK
Florian P. Buchholz, Purdue University; Clay Shields, Georgetown University
Network traceback is currently limited because host audit systems do not maintain enough information to match incoming network traffic to outgoing network traffic. The talk presented an alternative: assigning origin information to every process and logging it during interactive login creation.
The current implementation concentrates mainly on interactive sessions, in which an event is logged when a new connection is established using SSH or Telnet. The method is effective and could successfully determine stepping stones and reliably detect the source of a DDoS attack. The speaker then discussed related work in this area, which led to a discussion of the latest developments in the area of "host causality."
The speaker ended with the limitations and future directions of the framework, which could be extended to find many applications in the area of security. The current implementation does not handle startup scripts and cron jobs, but incorporating the origin information into the file system could solve this problem. Logging is currently implemented only as a proof of concept; making it safe in various ways could be another important aspect of future work.
PROGRAMMING
Summarized by Amit Purohit
CYCLONE: A SAFE DIALECT OF C
Trevor Jim, AT&T Labs – Research; Greg Morrisett, Dan Grossman, Michael Hicks, James Cheney, and Yanling Wang, Cornell University
Cyclone is designed to provide protection against attacks such as buffer overflows, format string attacks, and memory management errors. The current structure of C allows programmers to write vulnerable programs. Cyclone extends C so that it has the safety guarantees of Java while keeping C's syntax, types, and semantics untouched.
The Cyclone compiler performs static analysis of the source code and inserts runtime checks into the compiled output at places where the analysis cannot determine that the code execution will not violate safety constraints. Cyclone imposes many restrictions to preserve safety, such as NULL checks that do not exist in normal C.
The speaker then walked through some sample code written in Cyclone and showed how it tackles safety problems. Porting an existing C application to Cyclone is fairly easy, with fewer than 10% of the lines requiring change. The current implementation concentrates more on safety than on performance, so the performance penalty is substantial. Cyclone was able to find many lingering bugs. The final step of the project will be to write a compiler that converts a normal C program into an equivalent Cyclone program.
COOPERATIVE TASK MANAGEMENT WITHOUT
MANUAL STACK MANAGEMENT
Atul Adya, Jon Howell, Marvin Theimer, William J. Bolosky, and John R. Douceur, Microsoft Research
The speaker described the definitions of, and motivations behind, the distinct concepts of task management, stack management, I/O management, conflict management, and data partitioning. Conventional concurrent programming uses preemptive task management and exploits the automatic stack management of a standard language. In the second approach, cooperative tasks are organized as event handlers that yield control by returning control to the event scheduler, manually unrolling their stacks. In this project, the authors adopted a hybrid approach that makes it possible for both stack management styles to coexist: programmers can code assuming one or the other style, depending on the application. The speaker also gave a detailed example of how to use their mechanism to switch between the two styles.
The talk then continued with the implementation. They were able to preempt many subtle concurrency problems by using cooperative task management. Paying a cost up front to reduce a subtle race condition proved to be a good investment.
Though the choice of task management is fundamental, the choice of stack management can be left to individual taste. This project enables use of any type of stack management in conjunction with cooperative task management.
IMPROVING WAIT-FREE ALGORITHMS FOR
INTERPROCESS COMMUNICATION IN
EMBEDDED REAL-TIME SYSTEMS
Hai Huang, Padmanabhan Pillai, Kang G. Shin, University of Michigan
The main characteristic of a real-time, time-sensitive system is its predictable response time, but concurrency management creates hurdles to achieving this goal because of the use of locks to maintain consistency. To solve this problem, many wait-free algorithms have been developed, but these are typically high-cost solutions. By taking advantage of the temporal characteristics of the system, however, the time and space overhead can be reduced.
The speaker presented an algorithm for temporal concurrency control and applied this technique to improve three wait-free algorithms. A single-writer multiple-reader wait-free algorithm and a double-buffer algorithm were proposed.
Using their transformation mechanism, they achieved an improvement of 17–66% in ACET and a 14–70% reduction in memory requirements for IPC algorithms. This mechanism is extensible and can be applied to other non-blocking IPC algorithms as well. Future work involves reducing the synchronization overheads in more general IPC algorithms with multiple-writer semantics.
MOBILITY
Summarized by Praveen Yalagandula
ROBUST POSITIONING ALGORITHMS FOR
DISTRIBUTED AD-HOC WIRELESS SENSOR
NETWORKS
Chris Savarese, Jan Rabaey, University of California, Berkeley; Koen Langendoen, Delft University of Technology
The Pico Radio Network comprises more than a hundred sensors, monitors, and actuators equipped with wireless connectivity, with the following properties: (1) no infrastructure; (2) computation in a distributed fashion instead of centralized computation; (3) dynamic topology; (4) limited radio range; (5) nodes that can act as repeaters; (6) devices of low cost and with low power; and (7) sparse anchor nodes – nodes with GPS capability. A positioning algorithm determines the geographical position of each node in the network.
In two-dimensional space, each node needs three reference positions to estimate its geographical position. There are two problems that make positioning difficult in the Pico Radio Network setting: (1) a sparse anchor node problem, and (2) a range error problem.
The two-phase approach that the authors have taken in solving the problem consists of (1) a hop-terrain algorithm – in this first phase, each node roughly guesses its location from the distance calculated over multiple hops to the anchor points; and (2) a refinement algorithm – in this second phase, each node uses its neighbors' positions to refine its own position estimate. To guarantee convergence, this approach uses confidence metrics, assigning a value of 1.0 to anchor nodes and 0.1 to start for other nodes, and increasing these with each iteration.
A simulation tool, OMNeT++, was used for both phases and in various scenarios. The results show that they achieved position errors of less than 33% in a scenario with 5% anchor nodes, an average connectivity of 7, and a 5% range measurement error.
Guidelines for anchor node deployment are: high connectivity (>10), a reasonable fraction of anchor nodes (>5%), and, for anchor placement, covered edges.
APPLICATION-SPECIFIC NETWORK
MANAGEMENT FOR ENERGY-AWARE
STREAMING OF POPULAR MULTIMEDIA
FORMATS
Surendar Chandra, University of Georgia; Amin Vahdat, Duke University
The main hindrance in supporting the increasing demand for mobile multimedia on PDAs is the battery capacity of these small devices. The idle-time power consumption is almost the same as the receive power consumption on typical wireless network cards (1319mW vs. 1425mW), while the sleep state consumes far less power (177mW). To let the system consume energy proportional to the stream quality, the network card should transition to the sleep state aggressively between packets.
The speaker presented previous work in which different multimedia streams were studied for PDAs: MS Media, Real Media, and QuickTime. It was found that if the inter-packet gap is predictable, then huge savings are possible – for example, about 80% savings in MS Media streams with few packet losses. The limitation of the IEEE 802.11b power-save mode comes into play when there are two or more nodes, producing a delay between beacon time and when the node receives/sends packets. This badly affects the streams, since higher energy is consumed while waiting.
The authors propose traffic shaping for energy conservation, done by proxy such that packets arrive at regular intervals. This is achieved using a local proxy in the access point and a client-side proxy in the mobile host. Simulations show that traffic shaping reduces energy consumption and also reveal that higher-bandwidth streams have a lower energy metric (mJ/kB).
More information is at http://greenhouse.cs.uga.edu.
CHARACTERIZING ALERT AND BROWSE
SERVICES OF MOBILE CLIENTS
Atul Adya, Paramvir Bahl, and Lili Qiu, Microsoft Research
Even though there is a dramatic increase in Internet access from wireless devices, few studies have characterized this traffic. In this paper the authors characterize the traffic observed on the MSN Web site with both notification and browse traces.
USENIX 2002
Around 33 million browsing accesses and about 325 million notification entries are present in the traces. Three types of analyses are done for each of these two services: content analysis, concerning the most popular content categories and their distribution; popularity analysis, i.e., the popularity distribution of documents; and user behavior analysis.
The analysis of the notification logs shows that document access rates follow a Zipf-like distribution, with most of the accesses concentrated on a small number of messages, and that the accesses exhibit geographical locality – users from the same locality tend to receive similar notification content. The browse log analysis shows that a smaller set of URLs is accessed most of the time, though the access pattern does not fit any Zipf curve, and the highly accessed URLs remain stable. A correlation study between the notification and browsing services shows that wireless users have a moderate correlation of 0.12.
FREENIX TRACK SESSIONS
BUILDING APPLICATIONS
Summarized by Matt Butner
INTERACTIVE 3-D GRAPHICS APPLICATIONS
FOR TCL
Oliver Kersting, Jürgen Döllner, Hasso Plattner Institute for Software Systems Engineering, University of Potsdam
The integration of 3-D image rendering functionality into a scripting language permits interactive and animated 3-D development and application without the formalities and precision demanded by low-level C/C++ graphics and visualization libraries. The large and complex C++ API of the Virtual Rendering System (VRS) can be combined with the conveniences of the Tcl scripting language. The mapping of class interfaces is done via an automated process that generates the respective wrapper classes, which ensures complete API accessibility and functionality without any significant performance penalty. The general C++-to-Tcl mapper SWIG grants the necessary object and hierarchical abilities without the use of object-oriented Tcl extensions. The final API mapping techniques address C++ features such as classes, overloaded methods, enumerations, and inheritance relations. All are implemented in a proof of concept that maps the complete C++ API of the VRS to Tcl and are showcased in a complete interactive 3-D map system.
THE AGFL GRAMMAR WORK LAB
Cornelis H.A. Koster, Erik Verbruggen, University of Nijmegen (KUN)
The growth and implementation of Natural Language Processing (NLP) is the cornerstone of the continued evolution and implementation of truly intelligent search machines and services. In part due to the growing collections of computer-stored, human-readable documents in the public domain, the implementation of linguistic analysis will become necessary for desirable precision and resolution. Subsequently, such tools and linguistic resources must be released into the public domain, and the authors have done so with the release of the AGFL Grammar Work Lab under the GNU Public License as a tool for linguistic research and the development of NLP-based applications.
The AGFL (Affix Grammars over a Finite Lattice) Grammar Work Lab meshes context-free grammars with finite set-valued features applicable to a range of languages. In computer science terms, "Syntax rules are procedures with parameters and a non-deterministic execution." The English Phrases for Information Retrieval (EP4IR), released with the AGFL-GWL as a robust grammar of English, is an AGFL-GWL-generated English parser that outputs "Head/Modifier" frames. The sentences "CompanyX sponsored this conference" and "This conference was sponsored by CompanyX" both generate [CompanyX[sponsored conference]]. The hope is that public availability of such tools will encourage further development of grammar and lexical software systems.
SWILL: A SIMPLE EMBEDDED WEB SERVER
LIBRARY
Sotiria Lampoudi, David M. Beazley, University of Chicago
This paper won the FREENIX Best Student Paper award.
SWILL (Simple Web Interface Link Library) is a simple Web server in the form of a C library whose development was motivated by a wish to give cool applications an interface to the Web. The SWILL library provides a simple interface that can be efficiently implemented for tasks that vary from flexible Web-based monitoring to software debugging and diagnostics. Though originally designed to be integrated with high-performance scientific simulation software, the interface is generic enough to allow for unbounded uses. SWILL is a single-threaded server relying upon non-blocking I/O through the creation of a temporary server which services I/O requests. SWILL does not provide SSL or cryptographic authentication but does have HTTP authentication abilities.
A fantastic feature of SWILL is its support for SPMD-style parallel applications which utilize MPI, proving valuable for Beowulf clusters and large parallel machines. Another practical application was the implementation of SWILL in a modified Yalnix emulator by University of Chicago operating systems courses, which utilized the added Yalnix functionality for OS development and debugging. SWILL requires minimal memory overhead and relies upon the HTTP/1.0 protocol.
NETWORK PERFORMANCE
Summarized by Florian Buchholz
LINUX NFS CLIENT WRITE PERFORMANCE
Chuck Lever, Network Appliance; Peter Honeyman, CITI, University of Michigan
Lever introduced a benchmark to measure NFS client write performance. Client performance is difficult to measure due to hindrances such as poor hardware or bandwidth limitations. Furthermore, measuring application performance does not identify weaknesses specifically at the client side. Thus a benchmark was developed that tries to exercise only data transfers in one direction, between server and application. For this purpose the benchmark was based on the block sequential write portion of the Bonnie file system benchmark. Once a benchmark for NFS clients is established, it can be used to improve client performance.
The performance measurements were performed with an SMP Linux client and both a Linux NFS server and a Network Appliance F85 filer. During testing, a periodic jump in write latency was discovered. This was due to a rather large number of pending write operations that were scheduled to be written after certain threshold values were exceeded. By introducing a separate daemon that flushes the cached write requests, the spikes could be eliminated, but as a result the average latency grew over time. The problem could be traced to a function that scans a linked list of write requests. After a hashtable was added to improve lookup performance, the latency improved considerably.
The improved client was then used to measure throughput against the two servers. A discrepancy between the Linux server and the filer tests was noticed, and the reason was traced back to a global kernel lock that was unnecessarily held when accessing the network stack. After correcting this, performance improved further. However, the measurements showed that a client may run slower when paired with fast servers on fast networks. This is due to heavy client interrupt loads, more network processing on the client side, and more global kernel lock contention.
The source code of the project is available at http://www.citi.umich.edu/projects/nfs-perf/patches.
A STUDY OF THE RELATIVE COSTS OF
NETWORK SECURITY PROTOCOLS
Stefan Miltchev and Sotiris Ioannidis, University of Pennsylvania; Angelos Keromytis, Columbia University
With the increasing need for security and integrity of remote network services, it becomes important to quantify the communication overhead of IPSec and compare it to alternatives such as SSH, SCP, and HTTPS.
For this purpose the authors set up three testing networks: a direct link, two hosts separated by two gateways, and three hosts connecting through one gateway. Protocols were compared in each setup, and manual keying was used to eliminate connection setup costs. For IPSec, the encryption algorithms AES, DES, 3DES, hardware DES, and hardware 3DES were used. In detail, FTP was compared to SFTP, SCP, and FTP over IPSec; HTTP to HTTPS and HTTP over IPSec; and NFS to NFS over IPSec and local disk performance.
The result of the measurements was that IPSec outperforms the other popular encryption schemes. Overall, unencrypted communication was fastest, but in some cases, like FTP, the overhead can be small. The use of crypto hardware can significantly improve performance. For future work, the inclusion of setup costs, hardware-accelerated SSL, SFTP, and SSH was mentioned.
CONGESTION CONTROL IN LINUX TCP
Pasi Sarolahti, University of Helsinki; Alexey Kuznetsov, Institute for Nuclear Research at Moscow
Having attended the talk and read the paper, I am still unclear whether the authors are merely describing the design decisions of TCP congestion control or whether they are actually the creators of that part of the Linux code; my guess leans toward the former.
In the talk the speaker compared the TCP congestion control measures specified by the IETF in RFCs with the actual Linux implementation, which conforms to the basic principles but nevertheless has differences. A specific emphasis was placed on retransmission mechanisms and the congestion window. Also, several TCP enhancements – the NewReno algorithm, Selective ACKs (SACK), Forward ACKs (FACK) – were discussed and compared.
In some instances Linux does not conform to the IETF specifications. The fast recovery does not fully follow RFC 2582, since the threshold for triggering retransmit is adjusted dynamically and the congestion window's size is not changed. Also, the round-trip-time estimator and the RTO calculation differ from RFC 2988, since Linux uses more conservative RTT estimates and a minimum RTO of 200ms. The performance measurements showed that with the additional Linux-specific features enabled, slightly higher throughput, a steadier data flow, and fewer unnecessary retransmissions can be achieved.
XTREME XCITEMENT
Summarized by Steve Bauer
THE FUTURE IS COMING: WHERE THE X
WINDOW SYSTEM SHOULD GO
Jim Gettys, Compaq Computer Corp.
Jim Gettys, one of the principal authors of the X Window System, outlined the near-term objectives for the system, primarily focusing on the changes and infrastructure required to enable replication and migration of X applications. Providing better support for this functionality would enable users to retrieve or duplicate X applications between their servers at home and work.
One interesting example of an application that is currently capable of migration and replication is Emacs. To create a new frame on DISPLAY, try "M-x make-frame-on-display <Return> DISPLAY <Return>".
However, technical challenges make replication and migration difficult in general. These include the "major headaches" of server-side fonts, the nonuniformity of X servers and screen sizes, and the need to appropriately retrofit toolkits. Authentication and authorization issues are obviously also important. The rest of the talk delved into some of the details of these interesting technical challenges.
HACKING IN THE KERNEL
Summarized by Hai Huang
AN IMPLEMENTATION OF SCHEDULER
ACTIVATIONS ON THE NETBSD OPERATING
SYSTEM
Nathan J. Williams, Wasabi Systems
Scheduler activation is an old idea. Basically, there are benefits and drawbacks to using solely kernel-level or user-level threading; scheduler activations combine the two layers of control to provide more concurrency in the system.
In his talk, Nathan gave a fairly detailed description of the implementation of scheduler activations in the NetBSD kernel. One important change in the implementation is to differentiate the thread context from the process context. This is done by defining a separate data structure for these threads and relocating some of the information that was embedded in the process context to these thread contexts. The stack was a particular concern, due to the upcall: special handling must be done to make sure that the upcall doesn't disturb the stack, so that the preempted user-level thread can continue afterwards. Lastly, Nathan explained that signals are handled by upcalls.
AUTHORIZATION AND CHARGING IN PUBLIC
WLANS USING FREEBSD AND 802.1X
Pekka Nikander, Ericsson Research NomadicLab
The 802.1x standards are well known in the wireless community as link-layer authentication protocols. In this talk Pekka explained some novel ways of using the 802.1x protocols that might be of interest to people on the move. It is possible to set up a public WLAN that supports various charging schemes via virtual tokens, which people can purchase or earn and later use.
This is implemented on FreeBSD using the netgraph facility. It is basically a filter in the link layer that differentiates traffic based on the MAC address of the client node, which is either authenticated, denied, or let through. The overhead for this service is fairly minimal.
ACPI IMPLEMENTATION ON FREEBSD
Takanori Watanabe, Kobe University
ACPI (Advanced Configuration and Power Interface) was proposed as a joint effort by Intel, Toshiba, and Microsoft to provide a standard, finer-grained method of managing the power states of individual devices within a system. Such low-level power management is especially important for mobile and embedded systems that are powered by fixed-capacity batteries.
Takanori explained that the ACPI specification is composed of three parts: tables, BIOS, and registers. He was able to implement some of the functionality of ACPI in the FreeBSD kernel. The ACPI Component Architecture, implemented by Intel, provides a high-level ACPI API to the operating system; Takanori's ACPI implementation is built upon this underlying layer of APIs.
ACCESS CONTROL
Summarized by Florian Buchholz
DESIGN AND PERFORMANCE OF THE
OPENBSD STATEFUL PACKET FILTER (PF)
Daniel Hartmeier, Systor AG
Daniel Hartmeier described the new stateful packet filter (pf) that replaced IPFilter in the OpenBSD 3.0 release. IPFilter could no longer be included due to licensing issues, and thus there was a need to write a new filter, making use of optimized data structures.
The filter rules are implemented as a linked list which is traversed from start to end. Two actions may be taken according to the rules: "pass" or "block." A "pass" action forwards the packet, and a "block" action drops it. Where more than one rule matches a packet, the last rule wins. Rules that are marked as "final" will immediately terminate any further rule evaluation. An optimization called "skip-steps" was also implemented, where blocks of similar rules are skipped if they cannot match the current packet; these skip-steps are calculated when the rule set is loaded. Furthermore, a state table keeps track of TCP connections, and only packets that match the sequence numbers are allowed. UDP and ICMP queries and replies are considered in the state table, where initial packets create an entry for a pseudo-connection with a low initial timeout value. The state table is implemented using a balanced binary search tree. NAT mappings are also stored in the state table, whereas application proxies reside in user space. The packet filter is also able to perform fragment reassembly and to modulate TCP sequence numbers to protect hosts behind the firewall.
Pf was compared against IPFilter as well as Linux's iptables by measuring throughput and latency with increasing traffic rates and different packet sizes. In a test with a fixed rule-set size of 100, iptables outperformed the other two filters, whose results were close to each other. In a second test, where the rule-set size was continually increased, iptables consistently had about twice the throughput of the other two (which evaluate the rule set on both the incoming and outgoing interfaces). A third test compared only pf and IPFilter, using a single rule that created state in the state table with a fixed state-entry size; pf reached an overloaded state much later than IPFilter. The experiment was repeated with a variable state-entry size, and pf performed much better than IPFilter for a small number of states.
In general, rule-set evaluation is expensive, and benchmarks only reflect extreme cases, whereas in real life other behavior should be observed. Furthermore, the benchmarks show that stateful filtering can actually improve performance, due to cheap state-table lookups as compared to rule evaluation.
ENHANCING NFS CROSS-ADMINISTRATIVE
DOMAIN ACCESS
Joseph Spadavecchia and Erez Zadok, Stony Brook University
The speaker presented modifications to an NFS server that allow improved NFS access between administrative domains. The problem lies in the fact that NFS assumes a shared UID/GID space, which makes it unsafe to export files outside the administrative domain of the server. Design goals were to leave the protocol and existing clients unchanged, to require a minimum amount of server changes, flexibility, and increased performance.
To solve the problem, two techniques are utilized: "range-mapping" and "file-cloaking." Range-mapping maps IDs between client and server. The mapping is performed on a per-export basis and has to be manually set up in an export file; the mappings can be 1-1, N-N, or N-1. In file-cloaking, the server restricts file access based on UID/GID and special cloaking-mask bits, so users can only access their own files' permissions. The policy on whether or not a file is visible to others is set by the cloaking mask, which is logically ANDed with the file's protection bits. File-cloaking only works, however, if the client doesn't hold cached copies of directory contents and file attributes. Because of this, the clients are forced to re-read directories, by incrementing the mtime value of the directory each time it is listed.
To test the performance of the modified server, five different NFS configurations were evaluated: an unmodified NFS server was compared against one with the modified code included but not used, one with only range-mapping enabled, one with only file-cloaking enabled, and one version with all modifications enabled. For each setup, different file system benchmarks were run. The results show only a small overhead when the modifications are used, generally an increase of below 5%. Another experiment tested the performance of the system with an increasing number of mapped or cloaked entries; the results show that an increase from 10 to 1000 entries resulted in a maximum of about a 14% cost in performance.
One member of the audience pointed out that if clients choose to ignore the changed mtimes from the server and thus still hold caches of the directory entries, the file-cloaking mechanism could be defeated. After a rather lengthy debate, the speaker had to concede that the model doesn't add any extra security. Another question was asked about the scalability of setting up the range mapping; the speaker referred to application-level tools that could be developed for that purpose.
The software is available at ftp://ftp.fsl.cs.sunysb.edu/pub/enf.
ENGINEERING OPEN SOURCE
SOFTWARE
Summarized by Teri Lampoudi
NINGAUI A LINUX CLUSTER FOR BUSINESS
Andrew Hume, AT&T Labs – Research; Scott Daniels, Electronic Data Systems Corp.
Ningaui is a general-purpose, highly available, resilient architecture built from commodity software and hardware. Emphatically, however, Ningaui is not a Beowulf. Hume calls the cluster design the "Swiss canton model," in which there are a number of loosely affiliated, independent nodes with data replicated among them. Jobs are assigned by bidding and leases, and cluster services done as session-based servers are sited via generic job assignment. The emphasis is on keeping the architecture end-to-end, checking all work via checksums, and logging everything. The resilience and high availability required by their goal of 8-to-5 maintenance – vs. the typical 24/7 model, where people get paged whenever the slightest thing goes wrong, regardless of the time of day – is achieved by job restartability. Finally, all computation is performed on local data, without the use of NFS or network-attached storage.
Hume's message is a hopeful one: despite the many problems encountered – things like kernel and service limits, auto-installation problems, TCP storms, and the like – the existence of source code and the paranoid practice of logging and checksumming everything has helped. The final product performs reasonably well, and it appears that the resilience techniques employed do make a difference. One drawback, however, is that the software mentioned in the paper is not yet available for download.
CPCMS: A CONFIGURATION MANAGEMENT
SYSTEM BASED ON CRYPTOGRAPHIC NAMES
Jonathan S. Shapiro and John Vanderburgh, Johns Hopkins University
This paper won the FREENIX Best Paper award.
The basic notion behind the project is the fact that everyone has a pet complaint about CVS, and yet it is currently the configuration manager in most widespread use. Shapiro has unleashed an alternative. Interestingly, he did not begin but ended with the usual host of reasons why CVS is bad. The talk instead began abruptly by characterizing the job of a software configuration manager, continued by stating the namespaces which it must handle, and wrapped up with the challenges faced.
X MEETS Z: VERIFYING CORRECTNESS IN THE
PRESENCE OF POSIX THREADS
Bart Massey, Portland State University; Robert T. Bauer, Rational Software Corp.
Massey delivered a humorous talk on the insight gained from applying the Z formal specification notation to system software design, rather than the more informal analysis and design process normally used.
The story is told with respect to writing XCB, which replaces the Xlib protocol layer and is supposed to be thread-friendly. But where threads are concerned, deadlock avoidance becomes a hard problem that cannot be solved in an ad hoc manner, while full model checking is also too hard. In this case, Massey resorted to Z specification to model the XCB lower layer, abstract away locking and data transfers, and locate fundamental issues. Essentially, the difficulties of searching the literature and locating information relevant to the problem at hand were overcome. As Massey put it, "the formal method saved the day."
FILE SYSTEMS
Summarized by Bosko Milekic
PLANNED EXTENSIONS TO THE LINUX
EXT2/EXT3 FILESYSTEM
Theodore Y. Ts'o, IBM; Stephen Tweedie, Red Hat
The speaker presented improvements to the Linux Ext2 file system, with the goal of allowing for various expansions while striving to maintain compatibility with older code. The improvements have been facilitated by a few extra superblock fields that were added to Ext2 not long before the Linux 2.0 kernel was released. The fields allow file system features to be added without compromising the existing setup; this is done by providing bits indicating the impact of the added features with respect to compatibility. Namely, a file system with the "incompat" bit marked is not allowed to be mounted. Similarly, a "read-only" marking would only allow the file system to be mounted read-only.
Directory indexing replaces linear directory searches with a faster search using a fixed-depth tree and hashed keys. File system size can be dynamically increased, and the expanded inode, doubled from 128 to 256 bytes, allows for more extensions.
Other potential improvements were discussed as well, in particular pre-allocation for contiguous files, which allows for better performance in certain setups by pre-allocating contiguous blocks. Security-related modifications, extended attributes, and ACLs were mentioned. An implementation of these features already exists but has not yet been merged into the mainline Ext2/3 code.
RECENT FILESYSTEM OPTIMISATIONS ON
FREEBSD
Ian Dowse, Corvil Networks; David Malone, CNRI, Dublin Institute of Technology
David Malone presented four important file system optimizations for the FreeBSD OS: soft updates, dirpref, vmiodir, and dirhash. It turns out that certain combinations of the optimizations (beautifully illustrated in the paper) may yield performance improvements of anywhere between 2 and 10 orders of magnitude for real-world applications.
All four of the optimizations deal with file system metadata. Soft updates allow for asynchronous metadata updates. Dirpref changes the way directories are organized, attempting to place child directories closer to their parents, thereby increasing locality of reference and reducing disk-seek times. Vmiodir trades some extra memory for better directory caching. Finally, dirhash, which was implemented by Ian Dowse, changes the way in which entries are searched in a directory. Specifically, FFS uses a linear search to find an entry by name; dirhash builds a hashtable of directory entries on the fly. For a directory of size n with a working set of m files, a search that in certain cases could have been O(nm) has been reduced, thanks to dirhash, to effectively O(n + m).
FILESYSTEM PERFORMANCE AND SCALABILITY
IN LINUX 2.4.17
Ray Bryant, SGI; Ruth Forester, IBM LTC; John Hawkes, SGI
This talk focused on performance evaluation of a number of file systems available and commonly deployed on Linux machines. Comparisons under various configurations of Ext2, Ext3, ReiserFS, XFS, and JFS were presented.
The benchmarks chosen for the data gathering were pgmeter, filemark, and the AIM Benchmark Suite VII. Pgmeter measures the rate of data transfer of reads/writes of a file under a synthetic workload. Filemark is similar to postmark in that it is an operation-intensive benchmark, although filemark is threaded and offers various other features that postmark lacks. AIM VII measures performance for various file-system-related functionalities; it offers various metrics under an imposed workload, thus stressing the performance of the file system not only under I/O load but also under significant CPU load.
Tests were run on three different setups: a small, a medium, and a large configuration. ReiserFS and Ext2 appear at the top of the pile for smaller and medium setups. Notably, XFS and JFS perform worse for smaller system configurations than the others, although XFS clearly generates better numbers than JFS. It should be noted that XFS seems to scale well under a higher load; this was most evident in the large-system results, where XFS appears to offer the best overall results.
THINGS TO THINK ABOUT
Summarized by Bosko Milekic
SPEEDING UP KERNEL SCHEDULER BY
REDUCING CACHE MISSES
Shuji Yamamura, Akira Hirai, Mitsuru Sato, Masao Yamamoto, Akira Naruse, Kouichi Kumon, Fujitsu Labs
This was an interesting talk pertaining to the effects of cache coloring for task structures in the Linux kernel scheduler (Linux kernel 2.4.x). The speaker first presented some interesting benchmark numbers for the Linux scheduler, showing that as the number of processes on the task queue increased, the performance decreased. The authors used some really nifty hardware to measure the number of bus transactions throughout their tests and were thus able to reasonably quantify the impact that cache misses had on the Linux scheduler.
Their experiments led them to implement a cache coloring scheme for task structures, which were previously aligned on 8KB boundaries and therefore were eventually mapped to the same cache lines. This unfortunate placement of task structures in memory induced a significant number of cache misses as the number of tasks grew in the scheduler.
The implemented solution consisted of aligning task structures to more evenly distribute cached entries across the L2 cache. The result was inevitably fewer cache misses in the scheduler. Some negative effects were observed in certain situations; these were primarily due to more cache slots being used by task-structure data in the scheduler, thus forcing data previously cached there to be pushed out.
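The collision the authors describe is easy to see with a little arithmetic. The sketch below uses a hypothetical cache geometry (64-byte lines, 1024 sets; not the paper's numbers or code): structures placed on 8KB boundaries land in only a handful of cache sets, while a small per-task "color" offset spreads them out.

```python
# Illustrative sketch: why 8KB-aligned task structures collide in a
# set-indexed cache, and how a per-task color offset spreads them.
# Assumed cache geometry (hypothetical): 64-byte lines, 1024 sets.
LINE, SETS = 64, 1024

def cache_set(addr):
    # The set index is the line number modulo the number of sets.
    return (addr // LINE) % SETS

# 32 task structures, one per 8KB boundary.
aligned = [t * 8192 for t in range(32)]
# Cache coloring: stagger each structure by one cache line per color.
colored = [t * 8192 + (t % 16) * LINE for t in range(32)]

aligned_sets = {cache_set(a) for a in aligned}
colored_sets = {cache_set(a) for a in colored}
print(len(aligned_sets), len(colored_sets))  # 8 16
```

With strict 8KB alignment, all 32 structures share just 8 sets and evict one another; the colored layout doubles the number of sets in use in this toy configuration.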
WORK-IN-PROGRESS REPORTS
Summarized by Brennan Reynolds
RESOURCE VIRTUALIZATION TECHNIQUES FOR WIDE-AREA OVERLAY NETWORKS
Kartik Gopalan, Stony Brook University
This work addressed the issue of provisioning a maximum number of virtual overlay networks (VONs) with diverse quality-of-service (QoS) requirements on a single physical network. Each of the VONs is logically isolated from the others to ensure the QoS requirements. Gopalan mentioned several critical research issues with this problem that are currently being investigated. Dealing with how to provision the network at various levels (link, route, or path) and then enforce the provisioning at run-time is one of the toughest challenges. Currently, Gopalan has developed several algorithms to handle admission control, end-to-end QoS, route selection, scheduling, and fault tolerance in the network.
For more information, visit http://www.ecsl.cs.sunysb.edu.
VISUALIZING SOFTWARE INSTABILITY
Jennifer Bevan, University of California, Santa Cruz
Detection of instability in software has typically been an afterthought. The point in the development cycle when the software is reviewed for instability is usually after it is difficult and costly to go back and perform major modifications to the code base. Bevan has developed a technique to allow the visualization of unstable regions of code that can be used much earlier in the development cycle. Her technique creates a time series of dependency graphs that include clusters and lines called fault lines. From the graphs, a developer is able to easily determine where the unstable sections of code are and proactively restructure them. She is currently working on a prototype implementation.
For more information, visit http://www.cse.ucsc.edu/~jbevan/evo_viz.
RELIABLE AND SCALABLE PEER-TO-PEER WEB DOCUMENT SHARING
Li Xiao, College of William and Mary
The idea presented by Xiao would allow end users to share the contents of their Web browser caches with neighboring Internet users. The rationale for this is that today's end users are increasingly connected to the Internet over high-speed links, and browser caches are becoming large storage systems. Therefore, if an individual accesses a page which does not exist in their local cache, Xiao suggests that they first query other end users for the content before trying to access the machine hosting the original. This strategy does have some serious problems associated with it that still need to be addressed, including ensuring the integrity of the content and protecting the identity and privacy of the end users.
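The lookup order Xiao proposed can be sketched as a simple fallback chain (a toy model with illustrative names only; the real system must also address the integrity and privacy concerns noted above):

```python
# Toy lookup: local browser cache first, then neighbors' caches, and
# only then the origin server.
def fetch(url, local_cache, neighbors, fetch_origin):
    if url in local_cache:
        return local_cache[url], "local"
    for peer_cache in neighbors:      # query neighboring users' caches
        if url in peer_cache:
            return peer_cache[url], "peer"
    return fetch_origin(url), "origin"  # fall back to the origin host

local = {}
peers = [{"http://example.com/a": b"<html>a</html>"}]
page, source = fetch("http://example.com/a", local, peers,
                     lambda u: b"from origin")
print(source)  # peer
```

The attraction is that a nearby peer's cache may be faster and cheaper to reach than the origin server; the cost is the trust problem the summary mentions.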
SEREL: FAST BOOTING FOR UNIX
Leni Mayo, Fast Boot Software
Serel is a tool that generates a visual representation of a UNIX machine's boot-up sequence. It can be used to identify the critical path and can show if a particular service or process blocks for an extended period of time. This information could be used to determine where the largest performance gains could be realized by tuning the order of execution at boot-up. Serel creates a dependency graph, expressed in XML, during boot-up. This graph is then used to create the visual representation. Currently the tool only works on POSIX-compliant systems, but Mayo stated that he would be porting it to other platforms. Other extensions that were mentioned included having the metadata include the use of shared libraries and monitoring the suspend/resume sequence of portable machines.
For more information, visit http://www.fastboot.org.
BERKELEY DB XML
John Merrells, Sleepycat Software
Merrells gave a quick introduction and overview of the new XML library for Berkeley DB. The library specializes in the storage and retrieval of XML content, queried via XPath. The library allows for multiple containers per document and stores everything natively as XML. The user is also given a wide range of elements to create indices with, including edges, elements, text strings, or presence. The XPath engine consists of a query parser, generator, optimizer, and execution engine. To conclude his presentation, Merrells gave a live demonstration of the software.
For more information, visit http://www.sleepycat.com/xml.
CLUSTER-ON-DEMAND (COD)
Justin Moore, Duke University
Modern clusters are growing at a rapid rate. Many have pushed beyond the 5000-machine mark, and deploying them results in large expenses as well as management and provisioning issues. Furthermore, if the cluster is "rented" out to various users, it is very time-consuming to configure it to a user's specs, regardless of how long they need to use it. The COD work presented creates dynamic virtual clusters within a given physical cluster. The goal was to have a provisioning tool that would automatically select a chunk of available nodes and install the operating system and middleware specified by the customer in a short period of time. This would allow a greater use of resources, since multiple virtual clusters can exist at once. By using a virtual cluster, the size can be changed dynamically. Moore stated that they have created a working prototype and are currently testing and benchmarking its performance.
For more information, visit http://www.cs.duke.edu/~justin/cod.
CATACOMB
Elias Sinderson, University of California, Santa Cruz
Catacomb is a project to develop a database-backed DASL module for the Apache Web server. It was designed as a replacement for the WebDAV module. The initial release of the module only contains support for a MySQL database, but it could be extended to others. Sinderson briefly touched on the module's performance: it was comparable to the mod_dav Apache module for all query types but search. The presentation was concluded with remarks about adding support for the LOCK method and including ACL specifications in the future.
For more information, visit http://ocean.cse.ucsc.edu/catacomb.
SELF-ORGANIZING STORAGE
Dan Ellard, Harvard University
Ellard's presentation introduced a storage system that tuned itself based on the workload of the system, without requiring the intervention of the user. The intelligence was implemented as a virtual self-organizing disk that resides below any file system. The virtual disk observes the access patterns exhibited by the system and then attempts to predict what information will be accessed next. An experiment was done using an NFS trace at a large ISP on one of their email servers. Ellard's self-organizing storage system worked well, which he attributes to the fact that most of the files being requested were large email boxes. Areas of future work include exploration of the length and detail of the data-collection stage, as well as the CPU impact of running the virtual disk layer.
VISUAL DEBUGGING
John Costigan, Virginia Tech
Costigan feels that the state of current debugging facilities in the UNIX world is not as good as it should be. He proposes the addition of several elements to programs like ddd and gdb. The first addition is including separate heap and stack components. Another is to display only certain fields of a data structure and to have the ability to zoom in and out if needed. Finally, the debugger should provide the ability to visually present complex data structures, including linked lists, trees, etc. He said that a beta version of a debugger with these abilities is currently available.
For more information, visit http://infovis.cs.vt.edu/datastruct.
IMPROVING APPLICATION PERFORMANCE THROUGH SYSTEM-CALL COMPOSITION
Amit Purohit, Stony Brook University
Web servers perform a huge number of context switches and internal data copies during normal operation. These two elements can drastically limit the performance of an application, regardless of the hardware platform it is run on. The Compound System Call (CoSy) framework is an attempt to reduce the performance penalty for context switches and data copies. It includes a user-level library and several kernel facilities that can be used via system calls. The library provides the programmer with a complete set of memory-management functions. Performance tests using the CoSy framework showed a large saving for context switches and data copies. The only area where the savings between conventional libraries and CoSy were negligible was for very large file copies.
For more information, visit http://www.cs.sunysb.edu/~purohit.
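CoSy itself is a research framework, but the per-syscall cost it targets can be illustrated with standard POSIX calls: batching many small writes into one vectored write issues a single system call instead of a thousand. This is a sketch of the motivation, not CoSy's API:

```python
import os, tempfile

chunks = [b"x" * 64 for _ in range(1000)]

# One write() system call per chunk: 1000 kernel crossings.
fd1, p1 = tempfile.mkstemp()
for c in chunks:
    os.write(fd1, c)
os.close(fd1)

# One writev() system call for all chunks: a single kernel crossing.
fd2, p2 = tempfile.mkstemp()
os.writev(fd2, chunks)
os.close(fd2)

size1, size2 = os.path.getsize(p1), os.path.getsize(p2)
os.unlink(p1)
os.unlink(p2)
print(size1, size2)  # 64000 64000
```

Both files end up identical; the difference is how many times the process crossed into the kernel, which is exactly the overhead CoSy composes system calls to avoid.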
ELASTIC QUOTAS
John Oscar, Columbia University
Oscar began by stating that most resources are flexible, but that to date disks have not been. While approximately 80 percent of files are short-lived, disk quotas are hard limits imposed by administrators. The idea behind elastic quotas is to have a non-permanent storage area that each user can temporarily use. Oscar suggested the creation of an ehome directory structure to be used in deploying elastic quotas. Currently he has implemented elastic quotas as a stackable file system.

There were also several trends apparent in the data. Many users would ssh to their remote host (good), but then launch an email reader on the local machine, which would connect via POP to the same host and send the password in clear text (bad). This situation can easily be remedied by tunneling protocols like POP through SSH, but it appeared that many people were not aware this could be done. While most of his comments on the use of the wireless network were negative, the list of passwords he had collected showed that people were indeed using strong passwords. His recommendations were to educate and encourage people to use protocols like IPSec, SSH, and SSL when conducting work over a wireless network, because you never know who else is listening.
SYSTRACE
Niels Provos, University of Michigan
How can one be sure the applications one is using actually do exactly what their developers said they do? Short answer: you can't. People today are using a large number of complex applications, which means it is impossible to check each application thoroughly for security vulnerabilities. There are tools out there that can help, though. Provos has developed a tool called systrace that allows a user to generate a policy of acceptable system calls a particular application can make. If the application attempts to make a call that is not defined in the policy, the user is notified and allowed to choose an action. Systrace includes an auto-policy generation mechanism, which uses a training phase to record the actions of all programs the user executes. If the user chooses not to be bothered by applications breaking policy, systrace allows default enforcement actions to be set. Currently systrace is implemented for FreeBSD and NetBSD, with a Linux version coming out shortly.
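The mechanism described, a per-application whitelist learned in a training phase plus a default action for everything else, can be modeled in a few lines. This is a toy model only: systrace's real policies are per-syscall rules with their own syntax, and enforcement happens in the kernel.

```python
# Toy model of policy-based syscall filtering (not systrace's code).
DEFAULT = "deny"  # systrace lets the user choose a default action

def train(observed_calls):
    # Auto-policy generation: permit every call seen during training.
    return {call: "permit" for call in observed_calls}

def check(policy, syscall):
    # Calls outside the policy fall through to the default action.
    return policy.get(syscall, DEFAULT)

policy = train(["open", "read", "write", "close"])
print(check(policy, "read"))    # permit
print(check(policy, "execve"))  # deny
```

The interactive mode Provos described corresponds to replacing the default action with a prompt to the user.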
For more information, visit http://www.citi.umich.edu/u/provos/systrace.
VERIFIABLE SECRET REDISTRIBUTION FOR SURVIVABLE STORAGE SYSTEMS
Ted Wong, Carnegie Mellon University
Wong presented a protocol that can be used to re-create a file distributed over a set of servers, even if one of the servers is damaged. In his scheme, the user must choose how many shares the file is split into and the number of servers it will be stored across. The goal of this work is to provide persistent, secure storage of information, even if it comes under attack. Wong stated that his design of the protocol was complete, and he is currently building a prototype implementation of it.
For more information, visit http://www.cs.cmu.edu/~tmwong/research.
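Schemes of this kind build on threshold secret sharing, where any k of n shares reconstruct the data. As background, here is a minimal Shamir-style (k, n) sketch; it is illustrative only, and Wong's verifiable redistribution protocol adds considerably more (verification that redistributed shares are consistent, among other properties).

```python
import random

P = 2**61 - 1  # a Mersenne prime; all arithmetic is in GF(P)

def split(secret, n, k):
    """Split `secret` into n shares; any k reconstruct it."""
    coeffs = [secret] + [random.randrange(P) for _ in range(k - 1)]
    def f(x):  # evaluate the degree-(k-1) polynomial at x (Horner)
        acc = 0
        for c in reversed(coeffs):
            acc = (acc * x + c) % P
        return acc
    return [(x, f(x)) for x in range(1, n + 1)]

def combine(shares):
    """Lagrange interpolation at x = 0 recovers the secret."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num = den = 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        secret = (secret + yi * num * pow(den, P - 2, P)) % P
    return secret

shares = split(123456789, n=5, k=3)
print(combine(shares[:3]))  # 123456789
```

With k = 3 and n = 5, any two servers can be lost (or compromised) and the file is still recoverable from the remaining three shares.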
THE GURU IS IN SESSIONS
LINUX ON LAPTOP/PDA
Bdale Garbee, HP Linux Systems Operation
Summarized by Brennan Reynolds
This was a roundtable-like discussion with Garbee acting as moderator. He is currently in charge of the Debian distribution of Linux and is involved in porting Linux to platforms other than i386. He has successfully helped port the kernel to the Alpha, SPARC, and ARM, and is currently working on a version to run on VAX machines.
A discussion was held on which file system is best for battery life. The Reiser and Ext3 file systems were discussed; people commented that when using Ext3 on their laptops, the disk would never spin down and thus was always consuming a large amount of power. One suggestion was to use a RAM-based file system for any volume that requires a large number of writes, and to only have the file system written to disk at infrequent intervals or when the machine is shut down or suspended.
A question was directed to Garbee about his knowledge of how the HP-Compaq merger would affect Linux. Garbee thought that the merger was good for Linux, both within the company and for the open source community as a whole. He stated that the new company would be the number one shipper of systems with Linux as the base operating system. He also fielded a question about the support of older HP servers and their ability to run Linux, saying that indeed Linux has been ported to them, and he personally had several machines in his basement running it.
A question which sparked a large number of responses concerned problems people had with running Linux on mobile platforms. Most people actually did not have many problems at all. There were a few cases of a particular machine model not working, but there were only two widespread issues: docking stations and the new ACPI power management scheme. Docking stations still appear to be somewhat of a headache, but most people had developed ad hoc solutions for getting them to work. The ACPI power management scheme developed by Intel does not appear to have a quick solution. Ted Ts'o, one of the head kernel developers, stated that there are fundamental architectural problems with the 2.4 series kernel that do not easily allow ACPI to be added. However, he also stated that ACPI is supported in the newest development kernel, 2.5, and will be included in 2.6.
The final topic was the possibility of purchasing a laptop without paying any charge/tax for Microsoft Windows. The consensus was that currently this is not possible. Even if the machine comes with Linux or without any operating system, the vendors have contracts in place which require them to pay Microsoft for each machine they ship if that machine is capable of running a Microsoft operating system. The only way this will change is if the demand for
other operating systems increases to the point that vendors renegotiate their contracts with Microsoft, and no one saw this happening in the near future.
LARGE CLUSTERS
Andrew Hume, AT&T Labs - Research
Summarized by Teri Lampoudi
This was a session on clusters, big data, and resilient computing. Given that the audience was primarily interested in clusters, and not necessarily big data or beating nodes to a pulp, the conversation revolved mainly around what it takes to make a heterogeneous cluster resilient. Hume also presented a FREENIX paper on his current cluster project, which explained in more detail much of what was abstractly claimed in the guru session.
Hume's use of the word "cluster" referred not to what I had assumed would be a Beowulf-type system, but to a loosely coupled farm of machines in which concurrency was much less of an issue than it would be in a Beowulf. In fact, the architecture Hume described was designed to deal with large amounts of independent transaction processing, essentially the process of billing calls, which requires no interprocess message passing or anything of the sort a parallel scientific application might.
Where does the large data come in? Since the transactions in question consist of large numbers of flat files, a mechanism for getting the files onto nodes and off-loading the results is necessary. In this particular cluster, named Ningaui, this task is handled by the "replication manager," which generated a large amount of interest in the audience. All management functions in the cluster, as well as job allocation, constitute services that are to be bid for and leased out to nodes. Furthermore, failures are handled by simply repeating the failed task until it is completed properly, if that is possible, and if it is not, then looking through detailed logs to discover
what went wrong. Logging and testing for the successful completion of each task are what make this model resilient. Every management operation is logged at some appropriate level of detail, generating 120MB/day, and every single file transfer is MD5 checksummed. This was characterized by Hume as a "patient" style of management, where no one controls the cluster, scheduling behavior is emergent rather than stringently planned, and nodes are free to drop in and out of the cluster. A downside to this model is the increase in latency, offset by the fact that the cluster continues to function unattended outside of 8-to-5 support hours.
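The per-transfer integrity check is straightforward to reproduce with a standard library. This is a sketch of the idea (the function names and the demo are illustrative, not Ningaui's code): digest the file on both ends of a transfer and repeat the transfer if the digests disagree.

```python
import hashlib, shutil, tempfile

def md5sum(path, bufsize=1 << 20):
    # Stream the file so large transfers need not fit in memory.
    h = hashlib.md5()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(bufsize), b""):
            h.update(block)
    return h.hexdigest()

def transfer_ok(src, dst):
    # Repeat the transfer (or raise an alarm) when this returns False.
    return md5sum(src) == md5sum(dst)

# Hypothetical demo: copy a file and verify the copy.
src = tempfile.mkstemp()[1]
with open(src, "wb") as f:
    f.write(b"billing-batch-records")
dst = src + ".copy"
shutil.copy(src, dst)
ok = transfer_ok(src, dst)
print(ok)  # True
```

Checksumming every transfer is cheap relative to the cost of silently corrupting billing data, which is the trade-off the "patient" management style relies on.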
In response to questions about the projected scalability of such a scheme, Hume said the system would presumably scale from the present 60 nodes to about 100 without modifications, but that scaling higher would be a matter for further design. The guru session ended on a somewhat unrelated note regarding the problems of getting data off mainframes and onto Linux machines: issues of fixed- vs. variable-length encoding of data blocks resulting in useful information being stripped by FTP, and problems in converting COBOL copybooks to something useful on a C-based architecture. To reiterate Hume's claim, there are homemade solutions for some of these problems, and the reader should feel free to contact Mr. Hume for them.
WRITING PORTABLE APPLICATIONS
Nick Stoughton, MSB Consultants
Summarized by Josh Lothian
Nick Stoughton addressed a concern that developers are facing more and more: writing portable applications. He proposed that there is no such thing as a "portable application," only those that have been ported. Developers are still in search of the Holy Grail of applications programming: a write-once, compile-anywhere program. Languages such as Java are leading up to this, but virtual machines still have to be ported, paths will still be different, etc. Web-based applications are another area that could be promising for portable applications.
Nick went on to talk about the future of portability. The recently released POSIX 2001 standards are expected to help the situation. The 2001 revision expands the original POSIX spec into seven volumes. Even though POSIX 2001 is more specific than the previous release, Nick pointed out that even this standard allows for small differences where vendors may define their own behaviors. This can only hurt developers in the long run. Along these lines, it was pointed out that there may be room for automated tools that have the capability to check application code and determine the level of portability and standards compliance of the source.
NETWORK MANAGEMENT SYSTEM PERFORMANCE TUNING
Jeff Allen, Tellme Networks, Inc.
Summarized by Matt Selsky
Jeff Allen, author of Cricket, discussed network tuning. The basic idea of tuning is: measure, twiddle, and measure again. Cricket can be used for large installations, but it's overkill for smaller installations, for which MRTG is better suited. You don't need Cricket to do measurement. Doing something like a shell script that dumps data to Excel is good, but you need to get some sort of measurement. Cricket was not designed for billing; it was meant to help answer questions.
Useful tools and techniques covered included: looking for the difference between machines in order to identify the causes for the differences; measuring, but also thinking about what you're measuring and why; and remembering to use strace and tcpdump to identify problems. Problems repeat themselves; performance tuning requires an intricate understanding of all the layers involved to solve the problem and simplify the automation. (Having a computer science background helps but is not essential.) If you can reproduce the problem, you then should investigate each piece to find the bottleneck. If you can't reproduce the problem, then measure. You need to understand all the system interactions. Measurement can help determine whether things are actually slow or if the user is imagining it.
Troubleshooting begins with a hunch, but scientific processes are essential. You should be able to determine the event stream, or when each event occurs; having observation logs can help. Some hints provided include checking cron for periodic anomalies, slowing down /bin/rm to avoid I/O overload, and looking for unusual kernel CPU usage.
Also, try to use lightweight monitoring to reduce overhead. You don't need to monitor every resource on every system, but you should monitor those resources on those systems that are essential. Don't check things too often, since you can introduce overhead that way. Lots of information can be gathered from the /proc file system interface to the kernel.
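As an example of the lightweight end of the spectrum, a sampler can read kernel counters directly from /proc instead of running a heavy agent. This Linux-specific sketch pulls the three load averages (field layout per proc(5)):

```python
def loadavg():
    # /proc/loadavg looks like: "0.42 0.31 0.25 1/123 4567"
    try:
        with open("/proc/loadavg") as f:
            one, five, fifteen = f.read().split()[:3]
        return float(one), float(five), float(fifteen)
    except OSError:
        return None  # no procfs on this platform

print(loadavg())
```

A cron job that appends this tuple to a log costs almost nothing and still gives the observation history that the troubleshooting advice above depends on.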
BSD BOF
Host: Kirk McKusick, Author and Consultant
Summarized by Bosko Milekic
The BSD Birds of a Feather session at USENIX started off with five quick and informative talks and ended with an exciting discussion of licensing issues.
Christos Zoulas, for the NetBSD Project, went over the primary goals of the NetBSD Project (namely portability, a clean architecture, and security) and then proceeded to briefly discuss some of the challenges that the project has encountered. The successes of and areas for improvement in 1.6 were then examined. NetBSD has recently seen various improvements in its cross-build system, packaging system, and third-party tool support. For what concerns the kernel, a zero-copy TCP/UDP implementation has been integrated, and a new pmap
code for arm32 has been introduced, as have some other rather major changes to various subsystems. More information is available in Christos's slides, which are now available at http://www.netbsd.org/gallery/events/usenix2002.
Theo de Raadt of the OpenBSD Project presented a rundown of various new things that the OpenBSD and OpenSSH teams have been looking at. Theo's talk included an amusing description (and pictures) of some of the events that transpired during OpenBSD's hack-a-thon, which occurred the week before USENIX '02. Finally, Theo mentioned that following complications with the IPFilter license, the OpenBSD team had done a fairly extensive license audit.
Robert Watson of the FreeBSD Project brought up various examples of FreeBSD being used in the real world. He went on to describe a wide variety of changes that will surface with the upcoming FreeBSD release 5.0. Notably, 5.0 will introduce a re-architecting of the kernel aimed at providing much more scalable support for SMP machines. 5.0 will also include an early implementation of KSE, FreeBSD's version of scheduler activations, large improvements to pccard, and significant framework bits from the TrustedBSD project.
Mike Karels from Wind River Systems presented an overview of how BSD/OS has been evolving following the Wind River Systems acquisition of BSDi's software assets. Ernie Prabhakar from Apple discussed Apple's OS X and its success on the desktop. He explained how OS X aims to bring BSD to the desktop while striving to maintain a strong and stable core, one that is heavily based on BSD technology.
The BoF finished with a discussion on licensing; specifically, some folks questioned the impact of Apple's licensing for code from their core OS (the Darwin Project) on the BSD community as a whole.
CLOSING SESSION
HOW FLIES FLY
Michael H. Dickinson, University of California, Berkeley
Summarized by JD Welch
In this lively and offbeat talk, Dickinson discussed his research into the mechanisms of flight in the fruit fly (Drosophila melanogaster) and related the autonomic behavior to that of a technological system. Insects are the most successful organisms on earth, due in no small part to their early adoption of flight. Flies travel through space on straight trajectories interrupted by saccades, jumpy turning motions somewhat analogous to the fast, jumpy movements of the human eye. Using a variety of monitoring techniques, including high-speed video in a "virtual reality flight arena," Dickinson and his colleagues have observed that flies respond to visual cues to decide when to saccade during flight. For example, more visual texture makes flies saccade earlier (i.e., further away from the arena wall), described by Dickinson as a "collision control algorithm."
Through a combination of sensory inputs, flies can make decisions about their flight path. Flies possess a visual "expansion detector" which, at a certain threshold, causes the animal to turn in a certain direction. However, expansion in front of the fly sometimes causes it to land. How does the fly decide? Using the virtual reality device to replicate various visual scenarios, Dickinson observed that the flies fixate on vertical objects that move back and forth across the field. Expansion on the left side of the animal causes it to turn right, and vice versa, while expansion directly in front of the animal triggers the legs to flare out in preparation for landing.
If the eyes detect these changes, how are the responses implemented? The flies' wings have "power muscles," controlled by mechanical resonance (as opposed to the nervous system), which drive the
wings, combined with neurally activated "steering muscles," which change the configuration of the wing joints. Subtle variations in the timing of impulses correspond to changes in wing movement, controlled by sensors in the "wing-pit." A small wing-like structure, the haltere, controls equilibrium by beating constantly.
Voluntary control is accomplished by the halteres, whose signals can interfere with the autonomic control of the stroke cycle. The halteres have "steering muscles" as well, and information derived from the visual system can turn off the haltere or trick it into a "virtual" problem requiring a response.
Dickinson has also studied the aerodynamics of insect flight using a device called the "Robo Fly," an oversized mechanical insect wing suspended in a large tank of mineral oil. Interestingly, Dickinson observed that the average lift generated by a fly's wings is under its body weight; the flies use three mechanisms to overcome this, including rotating the wings (rotational lift).
Insects are extraordinarily robust creatures, and because Dickinson analyzed the problem in a systems-oriented way, these observations and analysis are immediately applicable to technology; the response system can be used as an efficient search algorithm for control systems in autonomous vehicles, for example.
THE AFS WORKSHOP
Summarized by Garry Zacheiss, MIT
Love Hornquist-Astrand of the Arla development team presented the Arla status report. Arla 0.35.8 has been released. Scheduled for release soon are improved support for Tru64 UNIX, MacOS X, and FreeBSD; improved volume handling; and implementation of more of the vos and pts subcommands. It was stressed that MacOS X is considered an important platform, and that a GUI configuration manager for the cache
manager, a GUI ACL manager for the Finder, and a graphical login that obtains Kerberos tickets and AFS tokens at login time were all under development.
Future goals planned for the underlying AFS protocols include GSSAPI/SPNEGO support for Rx, performance improvements to Rx, an enhanced disconnected mode, and IPv6 support for Rx; an experimental patch is already available for the latter. Future Arla-specific goals include improved performance, partial file reading, and increased stability on several platforms. Work is also in progress on the RXGSS protocol extensions (integrating GSSAPI into Rx). A partial implementation exists, and work continues as developers find time.
Derrick Brashear of the OpenAFS development team presented the OpenAFS status report. OpenAFS 1.2.5 was released immediately prior to the conference; it fixed a remotely exploitable denial-of-service vulnerability on several OpenAFS platforms, most notably IRIX and AIX. Future work planned for OpenAFS includes better support for MacOS X, including working around Finder interaction issues. Better support for the BSDs is also planned: FreeBSD has a partially implemented client, while NetBSD and OpenBSD have only server programs available right now. AIX 5, Tru64 5.1A, and MacOS X 10.2 (a.k.a. Jaguar) are all planned for the future. Other planned enhancements include nested groups in the ptserver (code donated by the University of Michigan awaits integration), disconnected AFS, and further work on a native Windows client. Derrick stated that the guiding principles of OpenAFS were to maintain compatibility with IBM AFS, support new platforms, ease the administrative burden, and add new functionality.
AFS performance was discussed. The openafs.org cell consists of two file servers, one in Stockholm, Sweden, and one in Pittsburgh, PA. AFS works reasonably well over trans-Atlantic links,
although OpenAFS clients don't determine which file server to talk to very effectively. Arla clients use RTTs to the server to determine the optimal file server to fetch replicated data from. Modifications to OpenAFS to support this behavior in the future are desired.
Jimmy Engelbrecht and Harald Barth of KTH discussed their AFSCrawler script, written to determine how many AFS cells and clients were in the world, what implementations/versions they were (Arla vs. IBM AFS vs. OpenAFS), and how much data was in AFS. The script unfortunately triggered a bug in IBM AFS 3.6-derived code, causing some clients to panic while handling a specific RPC. This has since been fixed in OpenAFS 1.2.5 and the most recent IBM AFS patch level, and all AFS users are strongly encouraged to upgrade. No release of Arla is vulnerable to this particular denial-of-service attack. There was an extended discussion of the usefulness of this exploration. Many sites believed this was useful information and such scanning should continue in the future, but only on an opt-in basis.
Many sites face the problem of managing Kerberos/AFS credentials for batch-scheduled jobs. Specifically, most batch processing software needs to be modified to forward tickets as part of the batch submission process, renew tickets and tokens while the job is in the queue and for the lifetime of the job, and properly destroy credentials when the job completes. Ken Hornstein of NRL was able to pay a commercial vendor to support Kerberos 4/5 credential management in their product, although they did not implement AFS token management. MIT has implemented some of the desired functionality in OpenPBS and might be able to make it available to other interested sites.
Tools to simplify AFS administration were discussed, including:
AFS Balancer: a tool written by CMU to automate the process of balancing disk usage across all servers in a cell. Available from ftp://ftp.andrew.cmu.edu/pub/AFS-Tools/balance-1.1b.tar.gz.
Themis: KTH's enhanced version of the AFS tool "package" for updating files on local disk from a central AFS image. KTH's enhancements include allowing the deletion of files, simplifying the process of adding a file, and allowing the merging of multiple rule sets for determining which files are updated. Themis is available from the Arla CVS repository.
Stanford was presented as an example of a large AFS site. Stanford's AFS usage consists of approximately 14TB of data in AFS, in the form of approximately 100,000 volumes; 33TB of storage is available in their primary cell, ir.stanford.edu. Their file servers consist entirely of Solaris machines running a combination of Transarc 3.6 patch-level 3 and OpenAFS 1.2.x, while their database servers run OpenAFS 1.2.2. Their cell consists of 25 file servers using a combination of EMC and Sun StorEdge hardware. Stanford continues to use the kaserver for their authentication infrastructure, with future plans to migrate entirely to an MIT Kerberos 5 KDC.
Stanford has approximately 3,400 clients on their campus, not including SLAC (Stanford Linear Accelerator); approximately 2,100 AFS clients from outside Stanford contact their cell every month. Their supported clients are almost entirely IBM AFS 3.6 clients, although they plan to release OpenAFS clients soon. Stanford currently supports only UNIX clients. There is some on-campus presence of Windows clients, but Stanford has never publicly released or supported it. They do intend to release and support the MacOS X client in the near future.
All Stanford students, faculty, and staff are assigned AFS home directories, with a default quota of 50MB, for a total of approximately 550GB of user home directories. Other significant uses of AFS
storage are data volumes for workstation software (400GB) and volumes for course Web sites and assignments (100GB).
AFS usage at Intel was also presented. Intel has been an AFS site since 1994. They had bad experiences with the IBM 3.5 Linux client; their experience with OpenAFS on Linux 2.4.x kernels has been much better. They use and are satisfied with the OpenAFS IA64 Linux port. Intel has hundreds of OpenAFS 1.2.3 and 1.2.4 clients in many production cells, accessing data stored on IBM AFS file servers, and they have not encountered any interoperability issues. Intel has some concerns about OpenAFS: they would like to purchase commercial support for OpenAFS, and to see OpenAFS support for HP-UX on both PA-RISC and Itanium hardware. HP-UX support is currently unavailable due to a specific HP-UX header file being unavailable from HP; this may be available soon. Intel has not yet committed to migrating their file servers to OpenAFS and are unsure if they will do so without commercial support.
Backups are a traditional topic of discussion at AFS workshops, and this time was no exception. Many users complain that the traditional AFS backup tools ("backup" and "butc") are complex and difficult to automate, requiring many home-grown scripts and much user intervention for error recovery. An additional complaint was that the traditional AFS tools do not support file- or directory-level backups and restores; data must be backed up and restored at the volume level.
Mitch Collinsworth of Cornell presented work to make AMANDA, the free backup software from the University of Maryland, suitable for AFS backups. Using AMANDA for AFS backups allows one to share AFS backup tapes with non-AFS backups, easily run multiple backups in parallel, automate error recovery, and provide a robust degraded mode that prevents tape errors from stopping backups altogether. Their implementation allows for full volume restores as well as individual directory and file restores. They have finished coding this work and are in the process of testing and documenting it.
Peter Honeyman of CITI at the University of Michigan spoke about work he has proposed to replace Rx with RPCSEC GSS in OpenAFS; this would allow AFS to use a TCP-based transport mechanism rather than the UDP-based Rx, and possibly gain better congestion control, dynamic adaptation, and fragmentation avoidance as a result. RPCSEC GSS uses the GSSAPI to authenticate Sun ONC RPC. RPCSEC GSS is transport-agnostic, provides strong security, is a developing Internet standard, and has multiple open source implementations. Backward compatibility with existing AFS servers and clients is an important goal of this project.
quality work (mostly), lots of hands, and an intensely engineering-driven approach. The downsides include no support structure, limited project marketing expertise, and volunteers having limited time.
In 1998, Sendmail was becoming a success disaster. A success disaster includes two key elements: (1) volunteers start spending all their time supporting current functionality instead of enhancing it (this heavy burden can lead to project stagnation and, eventually, death); (2) competition from better-funded sources can lead to FUD (fear, uncertainty, and doubt) about the project.
Eric then led listeners down the path he took to turn Sendmail into a business. Eric envisioned Sendmail Inc. as a smallish and close-knit company but soon realized that the family atmosphere he desired could not last as the company grew. He observed that maybe only the first 10 people will buy into your vision, after which each individual is primarily interested in something else (e.g., power, wealth, or status). He warns that there will be people working for you whom you do not like. Eric particularly did not care for salespeople, but he emphasized how important sales, marketing, finance, and legal people are to companies.
The nature of companies in infancy can be summed up as: you need money, and you need investors in order to get money. Investors want a return on investment, so money management becomes critical; companies therefore must optimize money functions. Eric's experiences with investors were lessons all in themselves. More than giving money, good investors can provide connections and insight, so you don't want investors who don't believe in you.
A company must have bug history, change management, and people who understand the product and have a sense of the target market. Eric now sees the importance of marketing target research. A company can never forget
the simple equation that drives business: profit = revenue - expense.
Finally, the concept of value should be based on what is valuable to the customer. Eric observed that his customers needed documentation, extra value in the product that they couldn't get elsewhere, and technical support.
Eric learned hard lessons along the way. If you want to do interesting open source, it might be best not to be too successful: you don't want to let the larger dragons (companies) notice you. If you want a commercial user base, you have to manage process, watch corporate culture, provide value to customers, watch bottom lines, and develop a thick skin.
Starting a company is easy, but perpetuating it is extremely difficult.
INFORMATION VISUALIZATION FOR SYSTEMS PEOPLE
Tamara Munzner, University of British Columbia
Summarized by JD Welch
Munzner presented interesting evidence that information visualization, through interactive representations of data, can help people perform tasks more effectively and reduce the load on working memory.
A popular technique in visualizing data is abstracting it through node-link representation – for example, a list of cross-references in a book. Representing the relationships between references in a graphical (node-link) manner "offloads cognition to the human perceptual system." These graphs can be produced manually, but automated drawing allows for increased complexity and quicker production times.
The way in which people interpret visual data is important in designing effective visualization systems. Attention to pre-attentive visual cues is key to success: picking out a red dot among a field of identically shaped blue dots is significantly faster than finding the same data represented in a mixture of hue and shape variations. There are many pre-attentive visual cues, including size, orientation, intersection, and intensity. The accuracy of these cues can be ranked, with position triggering high accuracy and color or density on the low end.
Visual cues combined with different data types – quantitative (e.g., 10 inches), ordered (small, medium, large), and categorical (apples, oranges) – can also be ranked, with spatial position being best for all types. Beyond this commonality, however, accuracy varied widely, with length ranked second for quantitative data, eighth for ordinal, and ninth for categorical data types.
These guidelines about visual perception can be applied to information visualization, which, unlike scientific visualization, focuses on abstract data and the choice of spatialization. Several techniques are used to clarify abstract data, including multi-part glyphs, where changes in individual parts are incorporated into an easier-to-understand gestalt; interactivity; motion; and animation. In large data sets, techniques like "focus + context," where a zoomed portion of the graph is shown along with a thumbnail view of the entire graph, are used to minimize user disorientation.
Future problems include dealing with huge databases such as the Human Genome, reckoning with dynamic data like the changing structure of the Web or real-time network monitoring, and transforming "pixel-bound" displays into large "digital wallpaper"-type systems.
FIXING NETWORK SECURITY BY HACKING THE BUSINESS CLIMATE
Bruce Schneier, Counterpane Internet Security
Summarized by Florian Buchholz
Bruce Schneier identified security as one of the fundamental building blocks of the Internet. A certain degree of security is needed for all things on the Net, and the limits of security will become limits of the Net itself. However, companies are hesitant to employ security measures. Science is doing well, but one cannot see the benefits in the real world. An increasing number of users means more problems affecting more people. Old problems such as buffer overflows haven't gone away, and new problems show up. Furthermore, the amount of expertise needed to launch attacks is decreasing.
Schneier argues that complexity is the enemy of security and that, while security is getting better, the growth in complexity is outpacing it. Since security is fundamentally a people problem, one shouldn't focus on technologies but rather should look at businesses, business motivations, and business costs. Traditionally, one can distinguish between two security models: threat avoidance, where security is absolute, and risk management, where security is relative and one has to mitigate the risk with technologies and procedures. In the latter model, one has to find a point with reasonable security at an acceptable cost.
After a brief discussion of how security is handled in businesses today, in which he concluded that businesses "talk big about it but do as little as possible," Schneier identified four necessary steps to fix the problems.
First, enforce liabilities. Today, no real consequences have to be feared from security incidents. Holding people accountable will increase the transparency of security and give an incentive to make processes public. As possible options to achieve this, he listed industry-defined standards, federal regulation, and lawsuits. The problems with enforcement, however, lie in difficulties associated with the international nature of the problem and the fact that complexity makes assigning liabilities difficult. Furthermore, fear of liability could have a chilling effect on free software development and could stifle new companies.
As a second step, Schneier identified the need to allow partners to transfer liability. Insurance would spread liability risk among a group and would be a CEO's primary risk-analysis tool. There is a need for standardized risk models and protection profiles, and for more security as opposed to empty press releases. Schneier predicts that insurance will create a marketplace for security where customers and vendors will have the ability to accept liabilities from each other. Computer-security insurance should soon be as common as household or fire insurance, and from that development insurance will become the driving factor of the security business.
The next step is to provide mechanisms to reduce risks, both before and after software is released; techniques and processes to improve software quality, as well as an evolution of security management, are therefore needed. Currently, security products try to rebuild the walls – such as physical badges and entry doorways – that were lost when getting connected. Schneier believes this "fortress metaphor" is bad; one should think of the problem more in terms of a city. Since most businesses cannot afford proper security, outsourcing is the only way to make security scalable. With outsourcing, the concept of best practices becomes important, and insurance companies can be tied to them; outsourcing will level the playing field.
As a final step, Schneier predicts that rational prosecution and education will lead to deterrence. He claims that people feel safe because they live in a lawful society, whereas the Internet is classified as lawless, very much like the American West in the 1800s or a society ruled by warlords. This is because it is difficult to prove who an attacker is, and prosecution is hampered by complicated evidence gathering and irrational prosecution. Schneier believes, however, that education will play a major role in turning the Internet into a lawful society. Specifically, he pointed out that we need laws that can be explained.
Risks won't go away; the best we can do is manage them. A company able to manage risk better will be more profitable, and we need to give CEOs the necessary risk-management tools.
There were numerous questions. Listed below are the more complicated ones, in a paraphrased Q&A format.
Q: Will liability be effective? Will insurance companies be willing to accept the risks? A: The government might have to step in; it remains to be seen how it plays out.
Q: Is there an analogy to the real world in the fact that, in a lawless society, only the rich can afford security? Security solutions differentiate between classes. A: Schneier disagreed with the statement, giving an analogy to front-door locks, but conceded that there might be special cases.
Q: Doesn't homogeneity hurt security? A: Homogeneity is oversold. Diverse types can be more survivable, but given the limited number of options, the difference will be negligible.
Q: Regulations in airbag protection have led to deaths in some cases. How can we keep the pendulum from swinging the other way? A: Lobbying will not be prevented. An imperfect solution is probable; there might be reversals of requirements, such as the airbag one.
Q: What about personal liability? A: This will be analogous to auto insurance; liability comes with computer/Net access.
Q: If the rule of law is to become reality, there must be a law-enforcement function that applies to a physical space. You cannot do that with any existing government agency for the whole world. Should an organization like the UN assume that role? A: Schneier was not convinced global enforcement is possible.
Q: What advice should we take away from this talk? A: Liability is coming. Since the network is important to our infrastructure, eventually the problems
will be solved in a legal environment. You need to start thinking about how to solve the problems and how the solutions will affect us.
The entire speech (including slides) is available at http://www.counterpane.com/presentation4.pdf.
LIFE IN AN OPEN SOURCE STARTUP
Daryll Strauss, Consultant
Summarized by Teri Lampoudi
This talk was packed with morsels of insight and tidbits of information about what life in an open source startup is like (though "was like" might be more appropriate), what the issues are in starting, maintaining, and commercializing an open source project, and the way hardware vendors treat open source developers. Strauss began by tracing the timeline of his involvement with developing 3-D graphics support for Linux: from the 3dfx Voodoo1 driver for Linux in 1997, to the establishment of Precision Insight in 1998, to its buyout by VA Linux, to the dismantling of his group by VA, to what the future may hold.
Obviously, there are benefits from doing open source development, both for the developer and for the end user. An important point was that open source develops entire technologies, not just products. One common misapprehension concerns the inherent difficulty of managing a project that accepts code from a large number of developers, some volunteer and some paid. The notion that code can just be thrown over the wall into the world is a Big Lie.
How does one start, and keep afloat, an open source development company? In the case of Precision Insight, the subject matter – developing graphics and OpenGL for Linux – required a considerable amount of expertise: intimate knowledge of the hardware, the libraries, X, and the Linux kernel, as well as the end applications. Expertise is marketable. What helps even further is having a visible virtual team of experts, people who have an established track record of contributions to open source
in the particular area of expertise. Support from the hardware and software vendors came next. While everyone was willing to contract PI to write the portions of code that were specific to their hardware, no one wanted to pay the bill for developing the underlying foundation that was necessary for the drivers to be useful. In the PI model, the infrastructure cost was shared by several customers at once, and the technology kept moving. Once a driver is written for a vendor, the vendor is perfectly capable of figuring out how to write the next driver necessary, thereby obviating the need to contract the job. This, and questions of protecting intellectual property – hardware design, in this case – are deeply problematic with the open source mode of development. This is not to say that there is no way to get over them, but they are likely to arise more often than not.
Management issues seem equally important. In the PI setting, contracts were flexible (both a good and a bad thing) and developers overworked. Further, development "in a fishbowl," that is, under public scrutiny, is not easy. Strauss stressed the value of good communication and the use of various out-of-sync communication methods like IRC and mailing lists.
Finally, Strauss discussed what portions of software can be free and what can be proprietary. His suggestion was that horizontal markets want to be free, whereas vertically one can develop proprietary solutions. The talk closed with a small stroll through the problems that project politics brings up, from things like merging branches to mailing-list and source-tree readability.
GENERAL TRACK SESSIONS
FILE SYSTEMS
Summarized by Haijin Yan
STRUCTURE AND PERFORMANCE OF THE DIRECT ACCESS FILE SYSTEM
Kostas Magoutis, Salimah Addetia, Alexandra Fedorova, and Margo I. Seltzer, Harvard University; Jeffrey Chase, Andrew Gallatin, Richard Kisley, and Rajiv Wickremesinghe, Duke University; and Eran Gabber, Lucent Technologies
This paper won the Best Paper award.
The Direct Access File System (DAFS) is a new standard for network-attached storage over direct-access transport networks. DAFS takes advantage of user-level network interface standards that enable client-side functionality in which remote data access resides in a library rather than in the kernel. This reduces the memory-copy overhead of data movement, as well as protocol overhead. Remote Direct Memory Access (RDMA) is a direct-access transport network mechanism that allows the network adapter to reduce copy overhead by accessing application buffers directly.
This paper explores the fundamental structural and performance characteristics of network file access using a user-level file system structure on a direct-access transport network with RDMA. It describes DAFS-based client and server reference implementations for FreeBSD and reports experimental results comparing DAFS to a zero-copy NFS implementation. It illustrates the benefits and trade-offs of these techniques, to provide a basis for informed choices about deployment of DAFS-based systems and similar extensions to other network file protocols, such as NFS. Experiments show that DAFS gives applications direct control over an I/O system and increases the client CPU's usage while the client is doing I/O. Future work includes addressing longstanding problems related to the integration of the application and file system for high-performance applications.
CONQUEST: BETTER PERFORMANCE THROUGH A DISK/PERSISTENT-RAM HYBRID FILE SYSTEM
An-I A. Wang, Peter Reiher, and Gerald J. Popek, UCLA; Geoffrey M. Kuenning, Harvey Mudd College
Motivated by the declining cost of persistent RAM, the authors propose the Conquest file system, which stores all small files, metadata, executables, and shared libraries in persistent RAM; disks hold only the data content of the remaining large files. Compared to alternatives such as caching and RAM file systems, Conquest has the advantages of efficiency, consistency, and reliability at a reduced cost. Using popular benchmarks, experiments show that Conquest incurs little overhead while achieving faster performance. Future work includes designing mechanisms for adjusting the file-size threshold dynamically and finding a better disk layout for large data blocks.
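The split Conquest makes can be illustrated with a small sketch. Everything here (the class, the 1MB cutoff, the dictionary "stores") is invented for illustration and is not the authors' code: files at or below a size threshold live entirely in persistent RAM, while larger files keep their metadata in RAM and their data content on disk.

```python
# Sketch of Conquest-style placement: small files and all metadata in
# persistent RAM, large-file data content on disk. Names and the
# threshold are illustrative, not from the Conquest sources.
THRESHOLD = 1 << 20  # hypothetical 1MB small-file cutoff

class HybridFS:
    def __init__(self):
        self.ram = {}    # path -> bytes (whole small files)
        self.meta = {}   # path -> dict (all metadata stays in RAM)
        self.disk = {}   # path -> bytes (stand-in for on-disk blocks)

    def write(self, path, data):
        self.meta[path] = {"size": len(data), "in_ram": len(data) <= THRESHOLD}
        if len(data) <= THRESHOLD:
            self.ram[path] = data          # whole file in persistent RAM
            self.disk.pop(path, None)
        else:
            self.disk[path] = data         # only large-file data hits disk
            self.ram.pop(path, None)

    def read(self, path):
        m = self.meta[path]                # metadata lookup never touches disk
        return self.ram[path] if m["in_ram"] else self.disk[path]

fs = HybridFS()
fs.write("/etc/motd", b"hello")            # small -> RAM
fs.write("/video.avi", b"x" * (2 << 20))   # large -> disk
```

Note how the metadata dictionary is consulted on every read but the disk is only touched for large-file content; this is the source of the "little overhead, faster performance" claim for small-file workloads.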
EXPLOITING GRAY-BOX KNOWLEDGE OF BUFFER-CACHE MANAGEMENT
Nathan C. Burnett, John Bent, Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau, University of Wisconsin, Madison
Knowing what algorithm is used to manage the buffer cache is very important for improving application performance, but there is currently no interface for discovering it. This paper introduces Dust, a simple fingerprinting tool that is able to identify the buffer-cache replacement policy. Dust automatically identifies the cache size and replacement policy based on how the cache treats access order, recency, frequency, and long-term history. Through simulation, Dust was able to distinguish between a variety of replacement-algorithm policies found in the literature: FIFO, LRU, LFU, Clock, Segmented FIFO, 2Q, and LRU-K. Further experiments fingerprinting real operating systems, such as NetBSD, Linux, and Solaris, show that Dust is able to identify the hidden cache replacement algorithm.
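As a toy illustration of the fingerprinting idea (not the Dust tool itself; the cache models and the probe pattern are invented here), one can distinguish FIFO from LRU by re-referencing an old block and then forcing an eviction: FIFO evicts by insertion order regardless, while LRU spares the re-referenced block, so the survivor set reveals the policy.

```python
# Toy Dust-style fingerprint: fill the cache, touch the oldest block
# again, then overflow it. Which block survives identifies the policy.
from collections import OrderedDict

class Cache:
    def __init__(self, size, policy):
        self.size, self.policy, self.d = size, policy, OrderedDict()
    def access(self, block):
        if block in self.d:
            if self.policy == "LRU":        # recency matters only for LRU
                self.d.move_to_end(block)
            return
        if len(self.d) >= self.size:
            self.d.popitem(last=False)      # evict head (oldest/least recent)
        self.d[block] = True

def fingerprint(policy):
    c = Cache(3, policy)
    for b in ("a", "b", "c"):
        c.access(b)
    c.access("a")      # re-reference the oldest block
    c.access("d")      # overflow: forces one eviction
    return "LRU" if "a" in c.d else "FIFO"

print(fingerprint("FIFO"), fingerprint("LRU"))  # FIFO LRU
```

The real tool works the same way in spirit but uses timing of reads against an unknown kernel cache, and richer probe patterns to separate frequency- and history-sensitive policies such as LFU, 2Q, and LRU-K.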
By knowing the underlying cache replacement policy, a cache-aware Web server can reschedule requests using a cached-request-first policy to obtain a performance improvement. Experiments show that cache-aware scheduling improves average response time and system throughput.
OPERATING SYSTEMS (AND DANCING BEARS)
Summarized by Matt Butner
THE JX OPERATING SYSTEM
Michael Golm, Meik Felser, Christian Wawersich, and Jürgen Kleinöder, University of Erlangen-Nürnberg
The talk opened by emphasizing the need for, and the practicality of, a functional Java OS. The JX OS implementation attempts to mimic the recent trend toward using highly abstracted languages in application development, in order to create an OS as functional and powerful as one written in a lower-level language.
JX is a microkernel solution that uses separate JVMs for each entity of the kernel and, in some cases, for each application. The separated domains do not share objects and have no thread migration, and each domain implements its own code. The interesting part of the presentation was the discussion of system-level programming with Java; key areas discussed were memory management and interrupt handlers. The authors concluded by noting that their type-safe and modular system results in a robust system with great configuration flexibility and acceptable performance.
DESIGN EVOLUTION OF THE EROS SINGLE-LEVEL STORE
Jonathan S. Shapiro, Johns Hopkins University; Jonathan Adams, University of Pennsylvania
This presentation outlined current characteristics of file systems and some of their least desirable characteristics. It was based on the revival of the EROS single-level-store approach in current attempts to capitalize on system consistency and efficiency. Their solution did not relate directly to EROS, but rather to the design and use of the constructs and notions on which EROS is based. By extending the mapping of the memory system to include the disk, systems are further able to ensure global persistence without regard for a disk structure. In explaining the need for absolute system consistency and an environment which, by definition, is not allowed to crash, the goal of having such an exhaustive design becomes clear. The cool part of this work is the availability of its design and architecture to the public.
THINK: A SOFTWARE FRAMEWORK FOR COMPONENT-BASED OPERATING SYSTEM KERNELS
Jean-Philippe Fassino, France Télécom R&D; Jean-Bernard Stefani, INRIA; Julia Lawall, DIKU; Gilles Muller, INRIA
This presentation discussed the need for component-based operating systems and the structures needed to ensure flexibility for arbitrarily sized systems. Think provides a binding model that maps uniform components for OS developers and architects to follow, ensuring consistent implementations for an arbitrary system size. However, the goal is not to force developers into a predefined kernel but to promote the use of certain components in varied ways.
Think's concentration is primarily on embedded systems, where short development time is necessary but is constrained by rigorous needs and limited resources. The need to build flexible systems in such an environment can be costly and implementation-specific, but Think creates an environment supported by the ability to dynamically load type-safe components. This allows for more flexible systems that retain functionality because of Think's ability to bind fine-grained components. Benchmarks revealed that dedicated microkernels can show performance improvements in comparison to monolithic kernels. Notable future work includes building components for low-end appliances and the development of a real-time OS component library.
BUILDING SERVICES
Summarized by Pradipta De
NINJA: A FRAMEWORK FOR NETWORK SERVICES
J. Robert von Behren, Eric A. Brewer, Nikita Borisov, Michael Chen, Matt Welsh, Josh MacDonald, Jeremy Lau, and David Culler, University of California, Berkeley
Robert von Behren presented Ninja, an ongoing project that aims to provide a framework for building robust and scalable Internet-based network services like Web-hosting, instant-messaging, email, and file-sharing applications. Robert drew attention to the difficulties of writing cluster applications: one has to take care of data consistency and issues of concurrency, and for robust applications there are problems related to fault tolerance. Ninja works as a wrapper to relieve the user of these problems. The goal of the project is building network services that are scalable and highly available, maintaining persistent data, providing graceful degradation, and supporting online evolution.
The use of clusters distinguishes this setup from generic distributed systems in terms of reliability and security, as well as providing a partition-free network. The next important feature of this project is the new programming model, which is more restrictive than a general programming model but still expressive enough to write most applications. This model, described as "single program, multiple connection," uses intertask parallelism instead of multithreaded concurrency and is characterized by all nodes running the same program, with connections delegated to the nodes by a centralized connection manager (CM). A CM takes care of hiding the details of mapping an external connection to an internal node. Ninja
also uses a design called "staged event-driven architecture," where each service is broken down into a set of stages connected by event queues. This architecture is suitable for a modular design and helps in graceful degradation by adaptive load shedding from the event queues.
The presentation concluded with examples of implementation and evaluation of a Web server and an email system called NinjaMail, to show the ease of authoring and the efficacy of using the Ninja framework for service development.
USING COHORT SCHEDULING TO ENHANCE SERVER PERFORMANCE
James R. Larus and Michael Parkes, Microsoft Research
James Larus presented a new scheduling policy for increasing the throughput of server applications. Cohort scheduling batches the execution of similar operations arising in different server requests. The usual programming paradigm in handling server requests is to spawn multiple concurrent threads and switch from one thread to another whenever a thread gets blocked for I/O. Since the threads in a server mostly execute unrelated pieces of code, the locality of reference is reduced, and hence the effectiveness of different caching mechanisms. One way to solve this problem is to throw in more hardware, but Larus presented a complementary view in which the program's behavior is investigated and used to improve performance.
The problem in this scheme is to identify pieces of code that can be batched together for processing. One simple way is to look at the next program-counter values and use them to club threads together. However, this talk presented a new programming abstraction, "staged computation," which replaces the thread model with "stages." A stage is an abstraction for operations with similar behavior and locality. The StagedServer library can be used for programming in this model; it is a collection of C++ classes that implement staged computation and cohort scheduling on either a uniprocessor or multiprocessor, and it can be used to define stages.
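The batching idea can be sketched in a few lines (an illustrative toy in Python, not the C++ StagedServer library; all names are invented): each stage owns a queue, and the scheduler drains one stage's entire waiting cohort before visiting the next, so operations with similar behavior and locality run back-to-back instead of interleaving.

```python
# Toy cohort scheduler: requests flow through per-stage queues, and each
# stage's whole cohort is executed together before moving on.
from collections import deque

class Stage:
    def __init__(self, name, op):
        self.name, self.op, self.queue = name, op, deque()
    def enqueue(self, item):
        self.queue.append(item)

def run_cohorts(stages):
    results = []
    while any(s.queue for s in stages):
        for stage in stages:                 # visit stages round-robin
            cohort = list(stage.queue)       # batch everything waiting here
            stage.queue.clear()
            for item in cohort:              # execute the cohort together
                out, next_stage = stage.op(item)
                if next_stage is None:
                    results.append(out)
                else:
                    next_stage.enqueue(out)
    return results

# Two toy stages: parse a request, then format a reply.
reply = Stage("reply", lambda s: (f"<{s}>", None))
parse = Stage("parse", lambda s: (s.strip().upper(), reply))
for req in (" get a ", " get b "):
    parse.enqueue(req)
print(run_cohorts([parse, reply]))  # ['<GET A>', '<GET B>']
```

In a real server the payoff comes from the processor: a cohort repeatedly executes the same code over nearby data, so instruction and data caches stay warm, which is exactly the locality loss that per-request threading causes.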
The presentation showed an experimental evaluation of cohort scheduling against the thread-based model by implementing two servers: a Web server, which is mainly I/O bound, and a publish-subscribe server, which is mainly compute bound. The SURGE benchmark was used for the first experiment and the Fabret workload for the second. The results showed that the cohort-scheduling-based implementation gave better throughput than the thread-based implementation at high loads.
NETWORK PERFORMANCE
Summarized by Xiaobo Fan
ETE: PASSIVE END-TO-END INTERNET SERVICE PERFORMANCE MONITORING
Yun Fu and Amin Vahdat, Duke University; Ludmila Cherkasova and Wenting Tang, HP Labs
This paper won the Best Student Paper award.
Ludmila Cherkasova began by listing several questions most Web service providers want answered in order to improve service quality. She reviewed the difficulties of making accurate and efficient end-to-end Web service measurements and the shortcomings of currently available approaches. They propose a passive trace-based architecture called EtE to monitor Web server performance on behalf of end users.
The first step is to collect network packets passively. The second module reconstructs all TCP connections and extracts HTTP transactions. To reconstruct Web page accesses, they first build a knowledge base indexed by client IP and URL, and then group objects into the related Web pages in which they are embedded. Statistical analysis is used to handle non-matched objects. EtE Monitor can generate three groups of metrics to measure Web service performance: response time, Web caching, and page abortion.
To demonstrate the benefits of EtE Monitor, Cherkasova talked about two case studies and, based on the calculated metrics, gave some insightful explanations about what's happening behind the variations in Web performance. Through validation, they claim their approach provides a very close approximation of the real scenario.
THE PERFORMANCE OF REMOTE DISPLAY MECHANISMS FOR THIN-CLIENT COMPUTING
S. Jae Yang, Jason Nieh, Matt Selsky, and Nikhil Tiwari, Columbia University
Noting the trend toward thin-client computing, the authors compared different techniques and design choices in measuring the performance of six popular thin-client platforms – Citrix MetaFrame, Microsoft Terminal Services, Sun Ray, Tarantella, VNC, and X. After pointing out several challenges in benchmarking thin clients, Yang proposed slow-motion benchmarking to achieve non-invasive packet monitoring and consistent visual quality. Basically, they insert delays between separate visual events in the benchmark applications on the server side, so that the client's display update can catch up with the server's processing speed. The experiments are conducted on an emulated network over a range of network bandwidths.
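The delay-insertion idea can be sketched as a tiny replay harness (all names here are hypothetical; the authors instrument real benchmark applications, not Python callables): pad each scripted visual event with a delay long enough that the client finishes rendering before the next event fires, so captured packets can be attributed to individual events.

```python
# Sketch of slow-motion benchmarking: replay events with inter-event
# delays so each display update completes before the next begins.
import time

def run_slow_motion(events, delay_s):
    """Replay (name, action) events with padding; return per-event timings."""
    timings = []
    for name, action in events:
        start = time.monotonic()
        action()                      # trigger one visual update on the server
        time.sleep(delay_s)           # let the client's display fully catch up
        timings.append((name, time.monotonic() - start))
    return timings

# Toy usage: two no-op "visual events" padded with 10ms delays.
report = run_slow_motion([("page1", lambda: None), ("page2", lambda: None)], 0.01)
for name, elapsed in report:
    print(f"{name}: {elapsed:.3f}s")
```

The point of the padding is measurement hygiene rather than realism: with events isolated in time, per-event network traffic and visual quality can be assessed from packet captures alone, without instrumenting the closed-source clients.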
Their results show that thin clients can provide good performance for Web applications in LAN environments, but only some platforms performed well on the video benchmark. Pixel-based encoding may achieve better performance and bandwidth efficiency than high-level graphics. Display caching and compression should be used with care.
A MECHANISM FOR TCP-FRIENDLY TRANSPORT-LEVEL PROTOCOL COORDINATION
David E. Ott and Ketan Mayer-Patel, University of North Carolina, Chapel Hill
A revised transport-level protocol optimized for cluster-to-cluster (C-to-C)
applications is what David Ott tries to explain in his talk. A C-to-C application class is identified as one set of processes communicating with another set of processes across a common Internet path. The fundamental problem with C-to-C applications is how to coordinate all C-to-C communication flows so that they share a consistent view of the common C-to-C network, adapt to changing network conditions, and cooperate to meet specific requirements. Aggregation points (APs) are placed at the first and last hops of the common data path to probe network conditions (latency, bandwidth, loss rate, etc.). To carry and transfer this information, a new protocol – the Coordination Protocol (CP) – is inserted between the network layer (IP) and the transport layer (TCP, UDP, etc.). Ott illustrated how the CP header is updated and used as packets originate from the source and traverse the local and remote APs to arrive at their destination, and how an AP maintains a per-cluster state table and detects network conditions. Through simulation results, this coordination mechanism appears effective in sharing common network resources among C-to-C communication flows.
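The shim-layer idea can be sketched with an invented header layout (the field names, sizes, and units below are illustrative assumptions, not the CP format defined in the paper): a small header rides between IP and the transport header, and each aggregation point rewrites its condition fields as packets pass.

```python
# Hypothetical CP-style shim header: a cluster id plus the network
# conditions an aggregation point has observed. Layout is invented.
import struct

CP_FMT = "!IHHI"   # cluster id, latency (ms), loss rate (x1e-4), bandwidth (kbps)

def pack_cp(cluster_id, latency_ms, loss_x1e4, bw_kbps):
    return struct.pack(CP_FMT, cluster_id, latency_ms, loss_x1e4, bw_kbps)

def ap_update(header, latency_ms, loss_x1e4, bw_kbps):
    """An aggregation point rewrites the condition fields as the packet passes."""
    cluster_id, _, _, _ = struct.unpack(CP_FMT, header)
    return pack_cp(cluster_id, latency_ms, loss_x1e4, bw_kbps)

hdr = pack_cp(cluster_id=7, latency_ms=0, loss_x1e4=0, bw_kbps=0)
hdr = ap_update(hdr, latency_ms=42, loss_x1e4=13, bw_kbps=1500)
print(struct.unpack(CP_FMT, hdr))  # (7, 42, 13, 1500)
```

Because every flow between the two clusters carries the same shim, endpoints can read one shared, up-to-date view of the common path instead of each TCP or UDP flow probing it independently.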
STORAGE SYSTEMS
Summarized by Praveen Yalagandula
MY CACHE OR YOURS? MAKING STORAGE MORE EXCLUSIVE
Theodore M. Wong, Carnegie Mellon University; John Wilkes, HP Labs
Theodore Wong explained the inefficiency of current "inclusive" caching schemes in storage area networks: when a client accesses a block, the block is read from the disk and is cached both at the disk array cache and in the client's cache. He then presented the concept of "exclusive" caching, in which a block accessed by a client is cached only in that client's cache. On eviction from the client's cache, a DEMOTE operation moves the data block to the tail of the array cache.
The new exclusive caching schemes were evaluated on both single-client and multiple-client systems. The single-client results are presented for two different types of synthetic workloads: Random (transaction-processing-type workloads) and Zipf (typical Web workloads). The exclusive policy was quite effective in achieving higher hit rates and lower read latencies. They also showed a 2.2-times hit-rate improvement over inclusive techniques on a real-life workload, httpd, in a single-client setting.
The DEMOTE scheme also performed well on multiple-client systems when the data accessed by clients is disjoint. On shared-data workloads, the scheme performed worse than the inclusive schemes. Theodore then presented an adaptive exclusive caching scheme: a block accessed by a client is placed at the tail of that client's cache and also at an appropriate place in the array cache, determined by the popularity of the block. Popularity is measured by maintaining a ghost cache that accumulates the number of times each block is accessed. This adaptive scheme achieved a maximum 52% speedup in mean latency in experiments with real-life workloads.
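The DEMOTE operation and its exclusivity property can be sketched with a pair of LRU caches (a toy model, not the paper's implementation):

```python
from collections import OrderedDict

class DemoteCache:
    """Toy sketch of exclusive caching with DEMOTE: a block evicted from
    the client cache is demoted into the array cache instead of being
    dropped, and a block read into the client is removed from the array
    cache, so the two caches hold disjoint blocks."""
    def __init__(self, client_size, array_size):
        self.client = OrderedDict()   # LRU order: oldest first
        self.array = OrderedDict()
        self.client_size, self.array_size = client_size, array_size

    def read(self, block):
        if block in self.client:                  # client cache hit
            self.client.move_to_end(block)
            return "client"
        where = "array" if self.array.pop(block, None) else "disk"
        self.client[block] = True
        if len(self.client) > self.client_size:
            victim, _ = self.client.popitem(last=False)
            self.array[victim] = True             # DEMOTE to array tail
            if len(self.array) > self.array_size:
                self.array.popitem(last=False)
        return where

cache = DemoteCache(client_size=2, array_size=2)
hits = [cache.read(b) for b in [1, 2, 3, 1]]   # 1 is demoted, then re-read
```

In the trace above, block 1 misses in both caches, is later demoted when block 3 displaces it, and the final read of block 1 is served from the array cache rather than the disk.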
For more information, visit http://www.cs.cmu.edu/~tmwong/research and http://www.hpl.hp.com/research/itc/csl/ssp.
BRIDGING THE INFORMATION GAP IN STORAGE PROTOCOL STACKS
Timothy E. Denehy, Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau, University of Wisconsin, Madison
Currently there is a huge information gap between storage systems and file systems. The interface exposed by storage systems to file systems – based on blocks and providing only read/write operations – is very narrow. This leads to poor performance as a whole, because of duplicated functionality in both systems, and to reduced functionality, resulting from the storage system's lack of file information.
The speaker presented two enhancements: ExRAID, an exposed RAID, and ILFS, an Informed Log-Structured File System. ExRAID is an enhancement to the block-based RAID storage interface that exposes three types of information to the file system: (1) regions – contiguous portions of the address space comprising one or multiple disks; (2) performance information – queue lengths and throughput of the regions, revealing disk heterogeneity to the file system; and (3) failure information – dynamically updated information conveying the number of tolerable failures in each region.
ILFS allows online incremental expansion of the storage space; performs dynamic parallelism, using ExRAID's performance information to perform well on heterogeneous storage systems; and provides a range of mechanisms, at different granularities, for file redundancy, using ExRAID's region failure characteristics. These new features were added to LFS with only a 19% increase in code size.
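How a file system might act on ExRAID-style performance information can be sketched as a simple region-selection policy (the field names and scoring rule here are invented for illustration, not ILFS's actual logic):

```python
def pick_region(regions):
    """Choose a write target by scoring each exposed region on its current
    queue length relative to its observed throughput -- a toy version of
    dynamic parallelism over heterogeneous disks."""
    return min(regions, key=lambda r: r["queue_len"] / r["throughput_mbs"])

regions = [
    {"name": "fast-disk", "queue_len": 4, "throughput_mbs": 40.0},
    {"name": "slow-disk", "queue_len": 1, "throughput_mbs": 5.0},
]
best = pick_region(regions)
```

Even though the slow disk has a shorter queue, its low throughput makes the fast disk the better target, which is exactly the kind of decision a block interface alone cannot support.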
More information: http://www.cs.wisc.edu/wind.
MAXIMIZING THROUGHPUT IN REPLICATED DISK STRIPING OF VARIABLE BIT-RATE STREAMS
Stergios V. Anastasiadis, Duke University; Kenneth C. Sevcik and Michael Stumm, University of Toronto
There is an increasing demand for continuous, real-time streaming of media files. Disk striping is a common technique for supporting many connections on media servers. But even with 1.2 million hours of mean time between failures, there will be more than one disk failure per week in a system with 7,000 disks. Fault tolerance can be achieved by using data replication and reserving extra bandwidth during normal operation. This work focused on supporting the most common variable bit-rate stream formats (e.g., MPEG).
The authors used the prototype media server EXEDRA to implement and evaluate different replication and bandwidth-reservation schemes. The system supports variable bit-rate streams, does stride-based disk allocation, and is capable of variable-grain disk striping. Two replication policies are presented: deterministic – data blocks are replicated in a round-robin fashion on the secondary replicas; and random – data blocks are replicated on randomly chosen secondary replicas. Two bandwidth-reservation techniques are also presented: mirroring reservation – disk bandwidth is reserved for both the primary and the replicas of the media file during playback; and minimum reservation – a more efficient scheme in which bandwidth is reserved only for the sum of the primary data access time and the maximum of the backup data access times required in each round.
Experimental results showed that deterministic replica placement is better than random placement for small disks, that minimum disk-bandwidth reservation achieves twice the throughput of mirroring, and that fault tolerance can be achieved with minimal impact on throughput.
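The two reservation schemes reduce to simple per-round arithmetic (a sketch following the description above; the access-time values are arbitrary illustrative units):

```python
def mirroring_reservation(primary, backups):
    """Reserve bandwidth for the primary and every replica each round."""
    return primary + sum(backups)

def minimum_reservation(primary, backups):
    """Reserve only the primary access time plus the worst-case backup
    access time needed in a round -- the more efficient scheme."""
    return primary + max(backups)

p, b = 10.0, [6.0, 4.0]     # primary and two backup access times
mirrored = mirroring_reservation(p, b)   # reserves 20.0
minimal = minimum_reservation(p, b)      # reserves 16.0
```

Because `max(backups) <= sum(backups)` whenever there is more than one replica, the minimum scheme never reserves more than mirroring, which is where its throughput advantage comes from.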
TOOLS
Summarized by Li Xiao
SIMPLE AND GENERAL STATISTICAL PROFILING WITH PCT
Charles Blake and Steven Bauer, MIT
This talk introduced the Profile Collection Toolkit (PCT) – a sampling-based CPU profiling facility. A novel aspect of PCT is that it allows sampling of semantically rich data such as function call stacks, function parameters, local or global variables, CPU registers, or other execution contexts. The design objectives of PCT were driven by user needs and by the inadequacies or inaccessibility of prior systems.
The rich data-collection capability is achieved via a debugger-controller program, dbctl. The talk then introduced the profiling collector and data-collection activation. PCT file-format options provide flexibility and various ways to store exact states. PCT also has analysis options; two examples were given – "Simple" and "Mixed User and Linux Code."
The talk then went into how general sampling helps, covering a debugger-controller state machine, example script fragments, a simple example program, and general value reports in terms of call-path histograms, numeric value histograms, and data generality. Related work, such as gprof and Expect, was then summarized. The contribution of this work is a portable, general value-sampling tool. Its limitations are that PCT does not work well with stripped binaries and that counting inaccuracies arise from its statistical nature. Given this limitation, compiling with -g is preferred over stripping binaries.
Possible future directions include support for more debuggers, such as dbx and xdb; script generators for other kinds of traces; more canned reports for general values; and a libgdb-based library sampler. PCT is available at http://pdos.lcs.mit.edu/pct and runs on almost any UNIX-like system.
ENGINEERING A DIFFERENCING AND COMPRESSION DATA FORMAT
David G. Korn and Kiem-Phong Vo, AT&T Laboratories – Research
This talk began with an equation: "Differencing + Compression = Delta Compression." Compression removes information redundancy within a data set; examples are gzip, bzip, compress, and pack. Differencing encodes the differences between two data sets; examples are diff -e, fdelta, bdiff, and xdelta. Delta compression compresses one data set given another, combining differencing and compression, and reduces to pure compression when there is no commonality.
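The "differencing + compression" recipe can be illustrated with a toy COPY/ADD delta coder whose instruction stream is then compressed (this shows the general idea only, not the Vcdiff encoding):

```python
import zlib

def delta_encode(source, target, min_match=4):
    """Encode target as COPY(offset, length) references into source plus
    ADD literals, then compress the resulting instruction stream."""
    index = {}
    for i in range(len(source) - min_match + 1):
        index.setdefault(source[i:i + min_match], i)  # first occurrence wins
    out, i, lit = [], 0, b""
    while i < len(target):
        j = index.get(target[i:i + min_match])
        if j is None:
            lit += target[i:i + 1]        # no match: accumulate a literal
            i += 1
            continue
        n = min_match                     # greedily extend the match
        while (i + n < len(target) and j + n < len(source)
               and target[i + n] == source[j + n]):
            n += 1
        if lit:
            out.append(b"A" + len(lit).to_bytes(4, "big") + lit)
            lit = b""
        out.append(b"C" + j.to_bytes(4, "big") + n.to_bytes(4, "big"))
        i += n
    if lit:
        out.append(b"A" + len(lit).to_bytes(4, "big") + lit)
    return zlib.compress(b"".join(out))

def delta_decode(source, delta):
    data, out, i = zlib.decompress(delta), b"", 0
    while i < len(data):
        if data[i:i + 1] == b"A":
            n = int.from_bytes(data[i + 1:i + 5], "big")
            out += data[i + 5:i + 5 + n]
            i += 5 + n
        else:
            j = int.from_bytes(data[i + 1:i + 5], "big")
            n = int.from_bytes(data[i + 5:i + 9], "big")
            out += source[j:j + n]
            i += 9
    return out

src = b"the quick brown fox jumps over the lazy dog " * 20
tgt = src.replace(b"lazy", b"sleepy", 3)
delta = delta_encode(src, tgt)
```

With a mostly shared source and target, the delta is dominated by a few COPY instructions, and the compression pass squeezes the remaining redundancy out of the instruction stream itself.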
After presenting an overall scheme for a delta compressor and showing delta-compression performance, the talk discussed the encoding format of Vcdiff, a newly designed format for delta compression. A Vcdiff instruction code table consists of 256 entries, each coding up to a pair of instructions as 1-byte indices plus any additional data.
The talk then showed Vcdiff's performance on Web data collected from CNN, compared with the W3C-standard Gdiff, Gdiff+gzip, and diff+gzip, where Gdiff was computed using Vcdiff delta instructions. The results of two experiments, "First" and "Successive," were presented: in "First," each file is compressed against the first file collected; in "Successive," each file is compressed against the one from the previous hour. diff+gzip did not work well because diff is line-oriented; Vcdiff compared favorably with the other formats.
Vcdiff is part of the Vcodex package. Vcodex is a platform for all common data transformations: delta compression, plain compression, encryption, and transcoding (e.g., uuencode, base64). It is structured in three layers for maximum usability; the base library uses Disciplines and Methods interfaces.
The code for Vcdiff can be found at http://www.research.att.com/sw/tools. The authors are moving Vcdiff toward IETF standardization as a comprehensive platform for transforming data; see http://www.ietf.org/internet-drafts/draft-korn-vcdiff-06.txt.
WHERE IN THE NET
Summarized by Amit Purohit
A PRECISE AND EFFICIENT EVALUATION OF THE PROXIMITY BETWEEN WEB CLIENTS AND THEIR LOCAL DNS SERVERS
Zhuoqing Morley Mao, University of California, Berkeley; Charles D. Cranor, Fred Douglis, Michael Rabinovich, Oliver Spatscheck, and Jia Wang, AT&T Labs – Research
Content distribution networks (CDNs) attempt to improve Web performance by delivering Web content to end users from servers located at the edge of the network. When a Web client requests content, the CDN dynamically chooses a server to route the request to – usually the one nearest the client. As CDNs have access only to the IP address of the client's local DNS server (LDNS), the CDN's authoritative DNS server maps the client's LDNS to a geographic region within a particular network and combines that with network and server load information to perform CDN server selection.
This method has two limitations. First, it is based on the implicit assumption that clients are close to their LDNS, which may not always be valid. Second, a single request from an LDNS can represent a differing number of Web clients – called the hidden load factor. Knowledge of the hidden load factor can be used to achieve better load distribution.
The talk dealt with the extent of the first limitation and its impact on CDN server selection. To determine the associations between clients and their LDNS, a simple, non-intrusive, and efficient mapping technique was developed. The data collected was used to study the impact of proximity on DNS-based server selection, using four different proximity metrics: (1) autonomous system (AS) clustering – observing whether a client is in the same AS as its LDNS – concluded that LDNS is very good for coarse-grained server selection, since 64% of the associations belong to the same AS; (2) network clustering – observing whether a client is in the same network-aware cluster (NAC) as its LDNS – implied that DNS is less useful for finer-grained server selection, since only 16% of clients and their LDNS are in the same NAC; (3) traceroute divergence – the length of the divergent paths to the client and its LDNS from a probe point, using traceroute – implied that most clients are topologically close to their LDNS as viewed from a randomly chosen probe site; (4) round-trip time (RTT) correlation (some CDNs select servers based on the RTT between the CDN server and the client's LDNS) examined the correlation between the message RTTs from a probe point to the client and to its LDNS; the results imply this is a reasonable metric to use to avoid really distant servers.
A study of the impact that client-LDNS associations have on DNS-based server selection concluded that knowing the client's IP address would allow more accurate server selection in a large number of cases. The optimality of server selection also depends on server density, placement, and the selection algorithms.
Further information can be found at http://www.eecs.berkeley.edu/~zmao/myresearch.html or by contacting zmao@eecs.berkeley.edu.
GEOGRAPHIC PROPERTIES OF INTERNET ROUTING
Lakshminarayanan Subramanian, Venkata N. Padmanabhan, Microsoft Research; Randy H. Katz, University of California at Berkeley
Geographic information can provide insights into the structure and functioning of the Internet, including interactions between different autonomous systems, by analyzing certain properties of Internet routing. It can be used to measure and quantify routing properties such as circuitous routing, hot-potato routing, and geographic fault tolerance.
Traceroute was used to gather the required data, and the Geotrack tool was used to determine the location of the nodes along each network path. This enables the computation of "linearized distance" – the sum of the geographic distances between successive pairs of routers along the path.
In order to measure the circuitousness of a path, a metric called the "distance ratio" was defined as the ratio of the linearized distance of a path to the geographic distance between the source and destination of the path. From the data, it was observed that the circuitousness of a route depends on both the geographic and the network location of the end hosts. A large value of the distance ratio flags paths that are highly circuitous, possibly (though not necessarily) because of routing anomalies. It was also shown that the minimum delay between end hosts is strongly correlated with the linearized distance of the path.
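The linearized-distance and distance-ratio metrics are straightforward to compute from router coordinates (a sketch; the coordinates below are illustrative city locations standing in for geolocated routers):

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(a, b):
    """Great-circle distance between two (lat, lon) points, in km."""
    lat1, lon1, lat2, lon2 = map(radians, (*a, *b))
    h = sin((lat2 - lat1) / 2) ** 2 + \
        cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371 * asin(sqrt(h))

def distance_ratio(router_path):
    """Linearized distance of a path divided by the end-to-end geographic
    distance; values well above 1 flag circuitous routes."""
    linearized = sum(haversine_km(router_path[i], router_path[i + 1])
                     for i in range(len(router_path) - 1))
    return linearized / haversine_km(router_path[0], router_path[-1])

# A Berkeley -> Seattle -> New York route is circuitous relative to the
# direct Berkeley -> New York geographic distance.
path = [(37.87, -122.27), (47.61, -122.33), (40.71, -74.01)]
ratio = distance_ratio(path)
```

A ratio near 1 indicates a nearly direct route; the detour above yields a ratio of roughly 1.2.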
Geographic information can be used to study various aspects of wide-area Internet paths that traverse multiple ISPs. It was found that end-to-end Internet paths tend to be more circuitous than intra-ISP paths, the cause being the peering relationships between ISPs. Also, paths traversing substantial distances within two or more ISPs tend to be more circuitous than paths largely traversing only a single ISP. Another finding is that ISPs generally employ hot-potato routing.
Geographic information can also be used to capture the fact that two seemingly unrelated routers can be susceptible to correlated failures. Using the geographic information of routers, one can construct a geographic topology of an ISP and, from it, find the tolerance of the ISP's network to total network failure in a geographic region.
For further information, contact lakme@cs.berkeley.edu.
PROVIDING PROCESS ORIGIN INFORMATION TO AID IN NETWORK TRACEBACK
Florian P. Buchholz, Purdue University; Clay Shields, Georgetown University
Network traceback is currently limited because host audit systems do not maintain enough information to match incoming network traffic to outgoing network traffic. The talk presented an alternative: assigning origin information to every process and logging it during interactive login creation.
The current implementation concentrates mainly on interactive sessions, in which an event is logged when a new connection is established using SSH or Telnet. The method is effective: it could successfully determine stepping stones and reliably detect the source of a DDoS attack. The speaker then talked about related work in this area, which led to a discussion of the latest developments in the area of "host causality."
The speaker ended with the limitations and future directions of the framework, which could be extended to many applications in the area of security. The current implementation doesn't handle startup scripts and cron jobs, but incorporating the origin information into the file system could solve this problem. At present, logging is implemented only as a proof of concept; making it safe could be another important aspect of future work.
PROGRAMMING
Summarized by Amit Purohit
CYCLONE: A SAFE DIALECT OF C
Trevor Jim, AT&T Labs – Research; Greg Morrisett, Dan Grossman, Michael Hicks, James Cheney, and Yanling Wang, Cornell University
Cyclone is designed to provide protection against attacks such as buffer overflows, format-string attacks, and memory-management errors. The current structure of C allows programmers to write vulnerable programs. Cyclone extends C so that it has the safety guarantees of Java while keeping C's syntax, types, and semantics untouched.
The Cyclone compiler performs static analysis of the source code and inserts runtime checks into the compiled output at places where the analysis cannot determine that code execution will not violate safety constraints. Cyclone imposes many restrictions to preserve safety, such as NULL checks; these checks do not exist in normal C.
The speaker then walked through some sample code written in Cyclone and how it tackles safety problems. Porting an existing C application to Cyclone is fairly easy, with fewer than 10% of the lines requiring change. The current implementation concentrates more on safety than on performance, so the performance penalty is substantial. Cyclone was able to find many lingering bugs. The final step of the project will be to write a compiler that converts a normal C program into an equivalent Cyclone program.
COOPERATIVE TASK MANAGEMENT WITHOUT MANUAL STACK MANAGEMENT
Atul Adya, Jon Howell, Marvin Theimer, William J. Bolosky, and John R. Douceur, Microsoft Research
The speaker described the definitions of and motivations behind the distinct concepts of task management, stack management, I/O management, conflict management, and data partitioning. Conventional concurrent programming uses preemptive task management and exploits the automatic stack management of a standard language. In the second approach, cooperative tasks are organized as event handlers that yield by returning control to the event scheduler, manually unrolling their stacks. In this project, the authors adopted a hybrid approach that makes it possible for both stack-management styles to coexist; programmers can thus code assuming one or the other style, depending on the application. The speaker also gave a detailed example of how to use their mechanism to switch between the two styles.
The talk then continued with the implementation. They were able to preempt many subtle concurrency problems by using cooperative task management; paying a cost up front to avoid a subtle race condition proved to be a good investment.
Though the choice of task management is fundamental, the choice of stack management can be left to individual taste. This project enables the use of any type of stack management in conjunction with cooperative task management.
IMPROVING WAIT-FREE ALGORITHMS FOR INTERPROCESS COMMUNICATION IN EMBEDDED REAL-TIME SYSTEMS
Hai Huang, Padmanabhan Pillai, and Kang G. Shin, University of Michigan
The main characteristic of a real-time, time-sensitive system is its predictable response time. But concurrency management creates hurdles to achieving this goal because of the use of locks to maintain consistency. To solve this problem, many wait-free algorithms have been developed, but these are typically high-cost solutions. By taking advantage of the temporal characteristics of the system, however, the time and space overhead can be reduced.
The speaker presented an algorithm for temporal concurrency control and applied this technique to improve three wait-free algorithms. A single-writer, multiple-reader wait-free algorithm and a double-buffer algorithm were proposed.
Using their transformation mechanism, they achieved a 17–66% improvement in ACET and a 14–70% reduction in memory requirements for IPC algorithms. The mechanism is extensible and can be applied to other non-blocking IPC algorithms as well. Future work involves reducing the synchronization overheads in more general IPC algorithms with multiple-writer semantics.
MOBILITY
Summarized by Praveen Yalagandula
ROBUST POSITIONING ALGORITHMS FOR DISTRIBUTED AD-HOC WIRELESS SENSOR NETWORKS
Chris Savarese and Jan Rabaey, University of California, Berkeley; Koen Langendoen, Delft University of Technology
The Pico Radio Network comprises more than a hundred sensors, monitors, and actuators equipped with wireless connectivity, with the following properties: (1) no infrastructure; (2) computation in a distributed fashion instead of centralized computation; (3) dynamic topology; (4) limited radio range; (5) nodes that can act as repeaters; (6) devices of low cost and low power; and (7) sparse anchor nodes – nodes with GPS capability. A positioning algorithm determines the geographical position of each node in the network.
In two-dimensional space, each node needs three reference positions to estimate its geographical position. Two problems make positioning difficult in a Pico Radio Network-type setting: (1) the sparse anchor node problem and (2) the range error problem.
The two-phase approach the authors have taken consists of (1) a hop-terrain algorithm – in the first phase, each node roughly guesses its location from the distances to the anchor points calculated over multiple hops; and (2) a refinement algorithm – in the second phase, each node uses its neighbors' positions to refine its own position estimate. To guarantee convergence, this approach uses confidence metrics, assigning a value of 1.0 to anchor nodes and 0.1 initially to other nodes, and increasing these with each iteration.
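The refinement phase can be sketched as iterative minimization of squared range errors (a toy gradient-descent version; the paper's actual refinement uses neighbor positions and confidence weights, which are omitted here):

```python
def refine(anchors, dists, start, steps=2000, lr=0.02):
    """Starting from a rough hop-terrain estimate, nudge the position to
    reduce the squared error between estimated and measured ranges."""
    x, y = start
    for _ in range(steps):
        gx = gy = 0.0
        for (ax, ay), d in zip(anchors, dists):
            dx, dy = x - ax, y - ay
            r = (dx * dx + dy * dy) ** 0.5 or 1e-9   # avoid divide-by-zero
            gx += 2 * (r - d) * dx / r               # gradient of (r - d)^2
            gy += 2 * (r - d) * dy / r
        x, y = x - lr * gx, y - lr * gy
    return x, y

anchors = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
true_pos = (3.0, 4.0)
dists = [((true_pos[0] - ax) ** 2 + (true_pos[1] - ay) ** 2) ** 0.5
         for ax, ay in anchors]
est = refine(anchors, dists, start=(5.0, 5.0))   # converges near (3, 4)
```

With exact range measurements the estimate converges to the true position; with noisy ranges (the range error problem above) the residual error is what the confidence-weighted iteration is designed to contain.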
The OMNeT++ simulation tool was used for both phases in various scenarios. The results show that they achieved position errors of less than 33% in a scenario with 5% anchor nodes, an average connectivity of 7, and a 5% range-measurement error.
Guidelines for anchor node deployment are high connectivity (>10), a reasonable fraction of anchor nodes (>5%), and, for anchor placement, covered edges.
APPLICATION-SPECIFIC NETWORK MANAGEMENT FOR ENERGY-AWARE STREAMING OF POPULAR MULTIMEDIA FORMATS
Surendar Chandra, University of Georgia, and Amin Vahdat, Duke University
The main hindrance to supporting the increasing demand for mobile multimedia on PDAs is the battery capacity of these small devices. Idle-time power consumption is almost the same as receive power consumption on typical wireless network cards (1319 mW vs. 1425 mW), while the sleep state consumes far less power (177 mW). For the system to consume energy proportional to the stream quality, the network card should transition to the sleep state aggressively between packets.
The speaker presented previous work in which different multimedia streams – MS Media, Real Media, and QuickTime – were studied on PDAs. It was found that if the inter-packet gap is predictable, then huge savings are possible: for example, about 80% savings in MS Media streams with few packet losses. The limitation of the IEEE 802.11b power-save mode comes into play when there are two or more nodes, producing a delay between the beacon time and when a node receives or sends packets. This badly affects the streams, since more energy is consumed while waiting.
The authors propose traffic shaping for energy conservation, done by proxy so that packets arrive at regular intervals. This is achieved using a local proxy at the access point and a client-side proxy on the mobile host. Simulations show that traffic shaping reduces energy consumption and also reveal that higher-bandwidth streams have a lower energy metric (mJ/kB).
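The potential savings follow from simple power arithmetic using the card figures quoted above (a toy model that ignores wakeup transitions and beacon effects):

```python
# Power draws (mW) for a typical wireless card, as quoted in the talk.
IDLE, RECV, SLEEP = 1319.0, 1425.0, 177.0

def stream_energy_j(stream_secs, recv_secs, shaped):
    """Energy (joules) over a stream: receiving always costs RECV; in the
    gaps between packets an unshaped card sits idle, while a shaped card
    (predictable inter-packet gaps) can sleep instead."""
    gap = stream_secs - recv_secs
    return (RECV * recv_secs + (SLEEP if shaped else IDLE) * gap) / 1000.0

unshaped = stream_energy_j(60.0, 5.0, shaped=False)
shaped = stream_energy_j(60.0, 5.0, shaped=True)
savings = 1 - shaped / unshaped
```

For a 60-second stream that spends 5 seconds actually receiving, sleeping through the gaps cuts energy use by roughly 80%, consistent with the savings reported for predictable MS Media streams.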
More information is at http://greenhouse.cs.uga.edu.
CHARACTERIZING ALERT AND BROWSE SERVICES OF MOBILE CLIENTS
Atul Adya, Paramvir Bahl, and Lili Qiu, Microsoft Research
Even though there has been a dramatic increase in Internet access from wireless devices, few studies have characterized this traffic. In this paper, the authors characterize the traffic observed on the MSN Web site using both notification and browse traces.
Around 33 million browsing accesses and about 325 million notification entries are present in the traces. Three types of analyses were done for each of the two services: content analysis, concerning the most popular content categories and their distribution; popularity analysis, the popularity distribution of documents; and user behavior analysis.
The analysis of the notification logs shows that document access rates follow a Zipf-like distribution, with most of the accesses concentrated on a small number of messages, and that the accesses exhibit geographical locality – users from the same locality tend to receive similar notification content. The browse-log analysis shows that a smaller set of URLs is accessed most of the time, though the access pattern does not fit any Zipf curve, and that the most highly accessed URLs remain stable. A correlation study between the notification and browsing services shows that wireless users have a moderate correlation of 0.12.
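A quick way to check for Zipf-like behavior of the kind described above is to fit the slope of the rank-frequency curve on a log-log scale (a generic sketch, not the authors' analysis code):

```python
from math import log

def loglog_slope(counts):
    """Least-squares slope of log(count) vs. log(rank) for counts sorted
    in decreasing order; a slope near -1 suggests a Zipf-like
    popularity distribution."""
    xs = [log(r) for r in range(1, len(counts) + 1)]
    ys = [log(c) for c in counts]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

# Synthetic access counts following an ideal Zipf law with exponent 1.
counts = [1000 / r for r in range(1, 51)]
slope = loglog_slope(counts)   # close to -1
```

Real traces rarely fit this well; the browse logs above are an example of popularity data where the log-log plot visibly deviates from a straight line.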
FREENIX TRACK SESSIONS
BUILDING APPLICATIONS
Summarized by Matt Butner
INTERACTIVE 3-D GRAPHICS APPLICATIONS FOR TCL
Oliver Kersting and Jürgen Döllner, Hasso Plattner Institute for Software Systems Engineering, University of Potsdam
The integration of 3-D image-rendering functionality into a scripting language permits interactive and animated 3-D development and application without the formalities and precision demanded by low-level C/C++ graphics and visualization libraries. The large and complex C++ API of the Virtual Rendering System (VRS) can be combined with the conveniences of the Tcl scripting language. The mapping of class interfaces is done via an automated process that generates the respective wrapper classes, which ensures complete API accessibility and functionality without any significant performance penalty. The general C++-to-Tcl mapper SWIG grants the necessary object and hierarchy abilities without the use of object-oriented Tcl extensions. The final API-mapping techniques address C++ features such as classes, overloaded methods, enumerations, and inheritance relations. All are implemented in a proof of concept that maps the complete C++ API of the VRS to Tcl, showcased in a complete interactive 3-D map system.
THE AGFL GRAMMAR WORK LAB
Cornelis H.A. Koster and Erik Verbruggen, University of Nijmegen (KUN)
The growth and implementation of natural language processing (NLP) is the cornerstone of the continued evolution and implementation of truly intelligent search machines and services. In part due to the growing collections of computer-stored, human-readable documents in the public domain, the implementation of linguistic analysis will become necessary for desirable precision and resolution. Subsequently, such tools and linguistic resources must be released into the public domain, and the authors have done so with the release of the AGFL Grammar Work Lab under the GNU Public License, as a tool for linguistic research and the development of NLP-based applications.
The AGFL (Affix Grammars over a Finite Lattice) Grammar Work Lab meshes context-free grammars with finite set-valued features, applicable to a range of languages. In computer science terms, "Syntax rules are procedures with parameters and a non-deterministic execution." The English Phrases for Information Retrieval (EP4IR) grammar, released with the AGFL-GWL as a robust grammar of English, yields an AGFL-GWL-generated English parser that outputs "Head/Modifier" frames. The sentences "CompanyX sponsored this conference" and "This conference was sponsored by CompanyX" both generate [CompanyX [sponsored conference]]. The hope is that public availability of such tools will encourage further development of grammar and lexical software systems.
SWILL: A SIMPLE EMBEDDED WEB SERVER LIBRARY
Sotiria Lampoudi and David M. Beazley, University of Chicago
This paper won the FREENIX Best Student Paper award.
SWILL (Simple Web Interface Link Library) is a simple Web server in the form of a C library, whose development was motivated by a wish to give cool applications an interface to the Web. The SWILL library provides a simple interface that can be efficiently implemented for tasks that vary from flexible Web-based monitoring to software debugging and diagnostics. Though originally designed to be integrated with high-performance scientific simulation software, the interface is generic enough to allow for unbounded uses. SWILL is a single-threaded server relying upon non-blocking I/O through the creation of a temporary server that services I/O requests. SWILL does not provide SSL or cryptographic authentication, but it does have HTTP authentication abilities.
A fantastic feature of SWILL is its support for SPMD-style parallel applications that use MPI, proving valuable for Beowulf clusters and large parallel machines. Another practical application was the integration of SWILL into a modified Yalnix emulator for the University of Chicago's operating systems courses, which used the added Yalnix functionality for OS development and debugging. SWILL requires minimal memory overhead and relies upon the HTTP/1.0 protocol.
NETWORK PERFORMANCE
Summarized by Florian Buchholz
LINUX NFS CLIENT WRITE PERFORMANCE
Chuck Lever, Network Appliance; Peter Honeyman, CITI, University of Michigan
Lever introduced a benchmark to measure NFS client write performance. Client performance is difficult to measure due to hindrances such as poor hardware or bandwidth limitations. Furthermore, measuring application performance does not identify weaknesses specifically at the client side. Thus, a benchmark was developed that tries to exercise only data transfers in one direction, between server and application. For this purpose, the benchmark was based on the block-sequential-write portion of the Bonnie file system benchmark. Once a benchmark for NFS clients is established, it can be used to improve client performance.
The performance measurements were performed with an SMP Linux client against both a Linux NFS server and a Network Appliance F85 filer. During testing, a periodic jump in write latency was discovered. This was due to a rather large number of pending write operations that were scheduled to be written after certain threshold values were exceeded. By introducing a separate daemon that flushes the cached write requests, the spikes could be eliminated, but as a result the average latency grew over time. The problem was traced to a function that scans a linked list of write requests; after a hashtable was added to improve lookup performance, the latency improved considerably.
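The fix amounts to replacing an O(n) list scan with a constant-time lookup, which can be sketched as follows (a toy structure, not the Linux NFS client code):

```python
class WriteQueue:
    """Pending write requests kept both in submission order (as the
    original linked list was) and in a hashtable keyed by file offset,
    so lookups no longer scan the whole list."""
    def __init__(self):
        self.requests = []      # original structure: O(n) to search
        self.by_offset = {}     # added index: O(1) average lookup

    def add(self, offset, data):
        req = {"offset": offset, "data": data}
        self.requests.append(req)
        self.by_offset[offset] = req

    def find(self, offset):
        return self.by_offset.get(offset)

q = WriteQueue()
q.add(4096, b"x")
req = q.find(4096)
```

With thousands of pending writes, the per-request scan cost is what made average latency grow over time; an index keyed on the lookup field removes that growth.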
The improved client was then used to measure throughput against the two servers. A discrepancy between the Linux server and the filer tests was noticed, and the reason was traced back to a global kernel lock that was unnecessarily held when accessing the network stack. After correcting this, performance improved further. However, the measurements showed that a client may run slower when paired with fast servers on fast networks. This is due to heavy client interrupt loads, more network processing on the client side, and more global kernel lock contention.
The source code of the project is available at http://www.citi.umich.edu/projects/nfs-perf/patches.
A STUDY OF THE RELATIVE COSTS OF NETWORK SECURITY PROTOCOLS
Stefan Miltchev and Sotiris Ioannidis, University of Pennsylvania; Angelos Keromytis, Columbia University
With the increasing need for security and integrity of remote network services, it becomes important to quantify the communication overhead of IPSec and compare it to alternatives such as SSH, SCP, and HTTPS.
For this purpose, the authors set up three testing networks: a direct link, two hosts separated by two gateways, and three hosts connecting through one gateway. Protocols were compared in each setup, and manual keying was used to eliminate connection-setup costs. For IPSec, the encryption algorithms AES, DES, 3DES, hardware DES, and hardware 3DES were used. In detail, FTP was compared to SFTP, SCP, and FTP over IPSec; HTTP to HTTPS and HTTP over IPSec; and NFS to NFS over IPSec and local disk performance.
The result of the measurements was that IPSec outperforms the other popular encryption schemes. Overall, unencrypted communication was fastest, but in some cases, like FTP, the overhead can be small. The use of crypto hardware can significantly improve performance. For future work, the inclusion of setup costs, hardware-accelerated SSL, SFTP, and SSH was mentioned.
CONGESTION CONTROL IN LINUX TCP
Pasi Sarolahti, University of Helsinki; Alexey Kuznetsov, Institute for Nuclear Research, Moscow
Having attended the talk and read the paper, I am still unclear about whether the authors are merely describing the design decisions of TCP congestion control or whether they are actually the creators of that part of the Linux code. My guess leans toward the former.
In the talk, the speaker compared the TCP congestion control measures specified by the IETF in the RFCs with the actual Linux implementation, which does conform to the basic principles but nevertheless has differences. A specific emphasis was placed on retransmission mechanisms and the congestion window. Also, several TCP enhancements, namely the NewReno algorithm, Selective ACKs (SACK), and Forward ACKs (FACK), were discussed and compared.
In some instances Linux does not conform to the IETF specifications. The fast recovery does not fully follow RFC 2582, since the threshold for triggering retransmit is adjusted dynamically and the congestion window's size is not changed. Also, the round-trip-time estimator and the RTO calculation differ from RFC 2988, since Linux uses more conservative RTT estimates and a minimum RTO of 200ms. The performance measurements showed that with the additional Linux-specific features enabled, slightly higher throughput, a steadier data flow, and fewer unnecessary retransmissions can be achieved.
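The RTO behavior being contrasted can be sketched as follows (an illustrative model, not the kernel code: the constants follow RFC 2988, alpha = 1/8 and beta = 1/4, while the 200ms floor is the Linux-specific choice; RFC 2988 itself recommends a 1-second minimum):

```python
# RFC 2988-style retransmission timeout estimation with a Linux-like
# 200ms minimum. srtt is the smoothed RTT, rttvar the RTT variance.

ALPHA, BETA = 1 / 8, 1 / 4

def update_rto(srtt, rttvar, rtt_sample, min_rto=0.2):
    """Return updated (srtt, rttvar, rto) for one RTT measurement."""
    if srtt is None:                      # first measurement
        srtt = rtt_sample
        rttvar = rtt_sample / 2
    else:                                 # subsequent measurements
        rttvar = (1 - BETA) * rttvar + BETA * abs(srtt - rtt_sample)
        srtt = (1 - ALPHA) * srtt + ALPHA * rtt_sample
    rto = max(min_rto, srtt + 4 * rttvar)  # never drop below the floor
    return srtt, rttvar, rto
```

On a LAN where measured RTTs are a few milliseconds, the floor dominates, which is exactly where the choice of 200ms versus 1 second matters.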
XTREME XCITEMENT
Summarized by Steve Bauer
THE FUTURE IS COMING: WHERE THE X WINDOW SYSTEM SHOULD GO
Jim Gettys, Compaq Computer Corp.
Jim Gettys, one of the principal authors of the X Window System, outlined the near-term objectives for the system, primarily focusing on the changes and infrastructure required to enable replication and migration of X applications. Providing better support for this functionality would enable users to retrieve or duplicate X applications between their servers at home and work.
One interesting example of an application that currently is capable of migration and replication is Emacs. To create a new frame on DISPLAY, try "M-x make-frame-on-display <Return> DISPLAY <Return>".
However, technical challenges make replication and migration difficult in general. These include the "major headaches" of server-side fonts, the nonuniformity of X servers and screen sizes, and the need to appropriately retrofit toolkits. Authentication and authorization issues are obviously also important. The rest of the talk delved into some of the details of these interesting technical challenges.
HACKING IN THE KERNEL
Summarized by Hai Huang
AN IMPLEMENTATION OF SCHEDULER ACTIVATIONS ON THE NETBSD OPERATING SYSTEM
Nathan J. Williams, Wasabi Systems
Scheduler activation is an old idea. Basically, there are benefits and drawbacks to using solely kernel-level or user-level threading; scheduler activations combine the two layers of control to provide more concurrency in the system.
In his talk, Nathan gave a fairly detailed description of the implementation of scheduler activations in the NetBSD kernel. One important change in the implementation is to differentiate the thread context from the process context. This is done by defining a separate data structure for these threads and relocating some of the information that was embedded in the process context to these thread contexts. The stack was especially a concern, due to the upcall: special handling must be done to make sure that the upcall doesn't mess up the stack, so that the preempted user-level thread can continue afterwards. Lastly, Nathan explained that signals were handled by upcalls.
AUTHORIZATION AND CHARGING IN PUBLIC WLANS USING FREEBSD AND 802.1X
Pekka Nikander, Ericsson Research NomadicLab
The 802.1x standards are well known in the wireless community as link-layer authentication protocols. In this talk, Pekka explained some novel ways of using the 802.1x protocols that might be of interest to people on the move. It is possible to set up a public WLAN that would support various charging schemes via virtual tokens, which people can purchase or earn and later use.
This is implemented on FreeBSD using the netgraph utility. It is basically a filter in the link layer that differentiates traffic based on the MAC address of the client node, which is either authenticated, denied, or let through. The overhead for this service is fairly minimal.
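The per-MAC filtering decision might be modeled like this (a hypothetical sketch; the real implementation is a FreeBSD netgraph node, and the state names and actions here are invented for illustration):

```python
# Toy model of a link-layer filter that decides what to do with a frame
# based on the sending MAC's authentication state: authenticated clients
# pass, denied clients are dropped, and unknown clients are redirected
# to the 802.1X authenticator.

AUTHENTICATED, DENIED, UNKNOWN = "authenticated", "denied", "unknown"

class MacFilter:
    def __init__(self):
        self.state = {}                # MAC address -> state

    def set_state(self, mac, state):
        self.state[mac] = state

    def classify(self, mac):
        """Return the action for a frame from this MAC."""
        s = self.state.get(mac, UNKNOWN)
        if s == AUTHENTICATED:
            return "pass"
        if s == DENIED:
            return "drop"
        return "redirect"
```

Because the decision is a single table lookup per frame, it is consistent with the talk's claim that the overhead is minimal.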
ACPI IMPLEMENTATION ON FREEBSD
Takanori Watanabe, Kobe University
ACPI (Advanced Configuration and Power Interface) was proposed as a joint effort by Intel, Toshiba, and Microsoft to provide a standard, finer-grained method of managing the power states of individual devices within a system. Such low-level power management is especially important for mobile and embedded systems that are powered by fixed-capacity batteries.
Takanori explained that the ACPI specification is composed of three parts: tables, BIOS, and registers. He was able to implement some functionality of ACPI in a FreeBSD kernel. The ACPI Component Architecture, implemented by Intel, provides a high-level ACPI API to the operating system; Takanori's ACPI implementation is built upon this underlying layer of APIs.
ACCESS CONTROL
Summarized by Florian Buchholz
DESIGN AND PERFORMANCE OF THE OPENBSD STATEFUL PACKET FILTER (PF)
Daniel Hartmeier, Systor AG
Daniel Hartmeier described the new stateful packet filter (pf) that replaced IPFilter in the OpenBSD 3.0 release. IPFilter could no longer be included due to licensing issues, and thus there was a need to write a new filter, making use of optimized data structures.
The filter rules are implemented as a linked list which is traversed from start to end. Two actions may be taken according to the rules, "pass" or "block": a "pass" action forwards the packet, and a "block" action will drop it. Where more than one rule matches a packet, the last rule wins. Rules that are marked as "final" will immediately terminate any further rule evaluation. An optimization called "skip-steps" was also implemented, where blocks of similar rules are skipped if they cannot match the current packet; these skip-steps are calculated when the rule set is loaded. Furthermore, a state table keeps track of TCP connections, and only packets that match the sequence numbers are allowed. UDP and ICMP queries and replies are considered in the state table, where initial packets will create an entry for a pseudo-connection with a low initial timeout value. The state table is implemented using a balanced binary search tree. NAT mappings are also stored in the state table, whereas application proxies reside in user space. The packet filter is also able to perform fragment reassembly and to modulate TCP sequence numbers to protect hosts behind the firewall.
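The last-match-wins evaluation with "final" rules can be sketched as follows (illustrative only, not the pf source; the skip-step optimization and state table are omitted):

```python
# Toy model of pf-style rule evaluation: walk the rules first to last,
# remember the action of the most recent match, and stop immediately if
# a matching rule is marked final.

class Rule:
    def __init__(self, action, match, final=False):
        self.action = action           # "pass" or "block"
        self.match = match             # predicate over a packet
        self.final = final

def evaluate(rules, packet, default="pass"):
    decision = default
    for rule in rules:
        if rule.match(packet):
            decision = rule.action     # last match wins...
            if rule.final:
                return decision        # ...unless the rule is final
    return decision
```

The sketch also shows why skip-steps pay off: without them, every packet visits every rule, so the cost grows linearly with rule-set size, consistent with the benchmark results below.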
Pf was compared against IPFilter as well as Linux's iptables by measuring throughput and latency with increasing traffic rates and different packet sizes. In a test with a fixed rule-set size of 100, iptables outperformed the other two filters, whose results were close to each other. In a second test, where the rule-set size was continually increased, iptables consistently had about twice the throughput of the other two (which evaluate the rule set on both the incoming and outgoing interfaces). A third test compared only pf and IPFilter, using a single rule that created state in the state table, with a fixed state-entry size. Pf reached an overloaded state much later than IPFilter. The experiment was repeated with a variable state-entry size, and pf performed much better than IPFilter for a small number of states.
In general, rule-set evaluation is expensive, and benchmarks only reflect extreme cases, whereas in real life other behavior should be observed. Furthermore, the benchmarks show that stateful filtering can actually improve performance, due to cheap state-table lookups as compared to rule evaluation.
ENHANCING NFS CROSS-ADMINISTRATIVE DOMAIN ACCESS
Joseph Spadavecchia and Erez Zadok, Stony Brook University
The speaker presented modifications to an NFS server that allow improved NFS access between administrative domains. The problem lies in the fact that NFS assumes a shared UID/GID space, which makes it unsafe to export files outside the administrative domain of the server. Design goals were to leave the protocol and existing clients unchanged, a minimum amount of server changes, flexibility, and increased performance.
To solve the problem, two techniques are utilized: "range-mapping" and "file-cloaking." Range-mapping maps IDs between client and server. The mapping is performed on a per-export basis and has to be manually set up in an export file; the mappings can be 1-1, N-N, or N-1. In file-cloaking, the server restricts file access based on UID/GID and special cloaking-mask bits. Here, users can only access their own files; the policy on whether or not a file is visible to others is set by the cloaking mask, which is logically ANDed with the
file's protection bits. File-cloaking only works, however, if the client doesn't hold cached copies of directory contents and file attributes. Because of this, the clients are forced to re-read directories, by incrementing the mtime value of the directory each time it is listed.
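The two techniques might be modeled as follows (a hypothetical sketch; the function names, mapping format, and exact mask semantics are invented for illustration, not taken from the paper):

```python
# Toy model of range-mapping and file-cloaking. range_map translates a
# client UID into the server's ID space using configured ranges;
# visible() ANDs a cloaking mask with the file's permission bits to
# decide whether a non-owner may see the file at all.

def range_map(uid, mappings):
    """mappings: list of (client_base, server_base, length) ranges."""
    for client_base, server_base, length in mappings:
        if client_base <= uid < client_base + length:
            return server_base + (uid - client_base)
    return None                        # unmapped IDs get no access

def visible(file_mode, file_uid, requester_uid, cloak_mask):
    """Owners always see their files; others see a cloaked file only if
    the masked permission bits still grant 'other' read access."""
    if file_uid == requester_uid:
        return True
    return (file_mode & cloak_mask) & 0o004 != 0
```

A usage example: with the single range (1000, 60000, 100), client UID 1005 maps to server UID 60005, while UIDs outside the range stay unmapped.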
To test the performance of the modified server, five different NFS configurations were evaluated: an unmodified NFS server was compared against one with the modified code included but not used, one with only range-mapping enabled, one with only file-cloaking enabled, and one version with all modifications enabled. For each setup, different file system benchmarks were run. The results show only a small overhead when the modifications are used, generally an increase of below 5 percent. Another experiment tested the performance of the system with an increasing number of mapped or cloaked entries. The results show that an increase from 10 to 1,000 entries resulted in a maximum cost of about 14 percent in performance.
One member of the audience pointed out that if clients choose to ignore the changed mtimes from the server, and thus still hold caches of the directory entries, the file-cloaking mechanism could be defeated. After a rather lengthy debate, the speaker had to concede that the model doesn't add any extra security. Another question was asked about the scalability of the setup of range-mapping; the speaker referred to application-level tools that could be developed for that purpose.
The software is available at ftp://ftp.fsl.cs.sunysb.edu/pub/enf.
ENGINEERING OPEN SOURCE SOFTWARE
Summarized by Teri Lampoudi
NINGAUI: A LINUX CLUSTER FOR BUSINESS
Andrew Hume, AT&T Labs–Research; Scott Daniels, Electronic Data Systems Corp.
Ningaui is a general-purpose, highly available, resilient architecture built from commodity software and hardware. Emphatically, however, Ningaui is not a Beowulf. Hume calls the cluster design the "Swiss canton model," in which there are a number of loosely affiliated, independent nodes with data replicated among them. Jobs are assigned by bidding and leases, and cluster services done as session-based servers are sited via generic job assignment. The emphasis is on keeping the architecture end-to-end, checking all work via checksums, and logging everything. The resilience and high availability required by their goal of 8-to-5 maintenance (versus the typical 24/7 model, where people get paged whenever the slightest thing goes wrong, regardless of the time of day) is achieved by job restartability. Finally, all computation is performed on local data, without the use of NFS or network-attached storage.
Hume's message is a hopeful one: despite the many problems encountered (things like kernel and service limits, auto-installation problems, TCP storms, and the like), the existence of source code and the paranoid practice of logging and checksumming everything has helped. The final product performs reasonably well, and it appears that the resilient techniques employed do make a difference. One drawback, however, is that the software mentioned in the paper is not yet available for download.
CPCMS: A CONFIGURATION MANAGEMENT SYSTEM BASED ON CRYPTOGRAPHIC NAMES
Jonathan S. Shapiro and John Vanderburgh, Johns Hopkins University
This paper won the FREENIX Best Paper award.
The basic notion behind the project is the fact that everyone has a pet complaint about CVS, and yet it is currently the configuration manager in most widespread use. Shapiro has unleashed an alternative. Interestingly, he did not begin, but ended, with the usual host of reasons why CVS is bad. The talk instead began abruptly by characterizing the job
of a software configuration manager, continued by stating the namespaces it must handle, and wrapped up with the challenges faced.
X MEETS Z: VERIFYING CORRECTNESS IN THE PRESENCE OF POSIX THREADS
Bart Massey, Portland State University; Robert T. Bauer, Rational Software Corp.
Massey delivered a humorous talk on the insight gained from applying the Z formal specification notation to system software design, rather than the more informal analysis and design process normally used.
The story is told with respect to writing XCB, which replaces the Xlib protocol layer and is supposed to be thread-friendly. But where threads are concerned, deadlock avoidance becomes a hard problem that cannot be solved in an ad hoc manner, and full model checking is also too hard. In this case, Massey resorted to Z specification to model the XCB lower layer, abstract away locking and data transfers, and locate fundamental issues. Essentially, the difficulties of searching the literature and locating information relevant to the problem at hand were overcome. As Massey put it, "the formal method saved the day."
FILE SYSTEMS
Summarized by Bosko Milekic
PLANNED EXTENSIONS TO THE LINUX EXT2/EXT3 FILESYSTEM
Theodore Y. Ts'o, IBM; Stephen Tweedie, Red Hat
The speaker presented improvements to the Linux Ext2 file system, with the goal of allowing for various expansions while striving to maintain compatibility with older code. Improvements have been facilitated by a few extra superblock fields that were added to Ext2 not long before the Linux 2.0 kernel was released. The fields allow for file system features to be added without compromising the existing setup; this is done by providing
bits indicating the impact of the added features with respect to compatibility. Namely, a file system with the "incompat" bit marked is not allowed to be mounted. Similarly, a "read-only" marking would only allow the file system to be mounted read-only.
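The compatibility-bit logic can be sketched like this (illustrative, with invented flag values; the real Ext2 superblock carries distinct compat, ro_compat, and incompat bitmasks):

```python
# Toy model of Ext2-style feature compatibility bits: a superblock
# flagging an "incompat" feature the kernel does not support cannot be
# mounted at all, while an unsupported "ro_compat" feature still allows
# a read-only mount (safe to read, unsafe to modify).

def mount_decision(sb_incompat, sb_ro_compat,
                   supported_incompat, supported_ro_compat):
    """Return 'refuse', 'read-only', or 'read-write'."""
    if sb_incompat & ~supported_incompat:
        return "refuse"                # unknown feature: cannot read safely
    if sb_ro_compat & ~supported_ro_compat:
        return "read-only"             # unknown feature: cannot write safely
    return "read-write"
```

This is what lets new on-disk features ship without breaking old kernels: an old kernel simply refuses, or mounts read-only, instead of corrupting a file system it does not fully understand.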
Directory indexing replaces linear directory searches with a faster search using a fixed-depth tree and hashed keys. File system size can be dynamically increased, and the expanded inode, doubled from 128 to 256 bytes, allows for more extensions.
Other potential improvements were discussed as well, in particular pre-allocation for contiguous files, which allows for better performance in certain setups by pre-allocating contiguous blocks. Security-related modifications, extended attributes, and ACLs were mentioned. An implementation of these features already exists but has not yet been merged into the mainline Ext2/3 code.
RECENT FILESYSTEM OPTIMISATIONS ON FREEBSD
Ian Dowse, Corvil Networks; David Malone, CNRI, Dublin Institute of Technology
David Malone presented four important file system optimizations for the FreeBSD OS: soft updates, dirpref, vmiodir, and dirhash. It turns out that certain combinations of the optimizations (beautifully illustrated in the paper) may yield performance improvements of anywhere between 2 and 10 orders of magnitude for real-world applications.
All four of the optimizations deal with file system metadata. Soft updates allow for asynchronous metadata updates. Dirpref changes the way directories are organized, attempting to place child directories closer to their parents, thereby increasing locality of reference and reducing disk-seek times. Vmiodir trades some extra memory for better directory caching. Finally, dirhash, which was implemented by Ian Dowse, changes the way in which entries are searched in a directory: specifically, FFS uses a linear search to find an entry by name, while dirhash builds a hash table of directory entries on the fly. For a directory of size n with a working set of m files, a search that in certain cases could have been O(n*m) has been reduced, thanks to dirhash, to effectively O(n + m).
FILESYSTEM PERFORMANCE AND SCALABILITY IN LINUX 2.4.17
Ray Bryant, SGI; Ruth Forester, IBM LTC; John Hawkes, SGI
This talk focused on performance evaluation of a number of file systems available and commonly deployed on Linux machines. Comparisons under various configurations of Ext2, Ext3, ReiserFS, XFS, and JFS were presented.
The benchmarks chosen for the data gathering were pgmeter, filemark, and the AIM Benchmark Suite VII. Pgmeter measures the rate of data transfer of reads/writes of a file under a synthetic workload. Filemark is similar to postmark in that it is an operation-intensive benchmark, although filemark is threaded and offers various other features that postmark lacks. AIM VII measures performance for various file-system-related functionalities; it offers various metrics under an imposed workload, thus stressing the performance of the file system not only under I/O load but also under significant CPU load.
Tests were run on three different setups: a small, a medium, and a large configuration. ReiserFS and Ext2 appear at the top of the pile for smaller and medium setups. Notably, XFS and JFS perform worse for smaller system configurations than the others, although XFS clearly appears to generate better numbers than JFS. It should be noted that XFS seems to scale well under a higher load; this was most evident in the large-system results, where XFS appears to offer the best overall results.
THINGS TO THINK ABOUT
Summarized by Bosko Milekic
SPEEDING UP KERNEL SCHEDULER BY REDUCING CACHE MISSES
Shuji Yamamura, Akira Hirai, Mitsuru Sato, Masao Yamamoto, Akira Naruse, and Kouichi Kumon, Fujitsu Labs
This was an interesting talk pertaining to the effects of cache coloring for task structures in the Linux kernel scheduler (Linux kernel 2.4.x). The speaker first presented some interesting benchmark numbers for the Linux scheduler, showing that as the number of processes on the task queue was increased, the performance decreased. The authors used some really nifty hardware to measure the number of bus transactions throughout their tests and were thus able to reasonably quantify the impact that cache misses had in the Linux scheduler.
Their experiments led them to implement a cache coloring scheme for task structures, which were previously aligned on 8KB boundaries and therefore were eventually mapped to the same cache lines. This unfortunate placement of task structures in memory induced a significant number of cache misses as the number of tasks grew in the scheduler.
The implemented solution consisted of aligning task structures to more evenly distribute cached entries across the L2 cache. The result was inevitably fewer cache misses in the scheduler. Some negative effects were observed in certain situations, primarily due to more cache slots being used by task-structure data in the scheduler, thus forcing data previously cached there to be pushed out.
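A toy model of the problem and the fix (the cache geometry and the coloring scheme below are invented for illustration, not the paper's parameters):

```python
# With 8KB-aligned task structures, every structure's start address maps
# to one of only a few cache sets, so structures evict each other; adding
# a per-task cache-line-sized "color" offset spreads them across sets.

CACHE_LINE = 64
NUM_SETS = 1024                        # e.g. a direct-mapped cache model

def cache_set(addr):
    """Cache set index for a given byte address."""
    return (addr // CACHE_LINE) % NUM_SETS

def task_addrs(n, stride=8192, colors=1):
    """Start addresses of n task structures; each gets one of `colors`
    cache-line-sized offsets (colors=1 means strict 8KB alignment)."""
    return [i * stride + (i % colors) * CACHE_LINE for i in range(n)]
```

With strict 8KB alignment the 64 structures below land on just 8 of the 1,024 sets; with 16 colors they spread over more sets, which is the effect that reduced the measured cache misses.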
WORK-IN-PROGRESS REPORTS
Summarized by Brennan Reynolds
RESOURCE VIRTUALIZATION TECHNIQUES FOR WIDE-AREA OVERLAY NETWORKS
Kartik Gopalan, Stony Brook University
This work addressed the issue of provisioning a maximum number of virtual overlay networks (VONs) with diverse quality-of-service (QoS) requirements on a single physical network. Each of the VONs is logically isolated from the others to ensure the QoS requirements. Gopalan mentioned several critical research issues with this problem that are currently being investigated. Dealing with how to provision the network at various levels (link, route, or path) and then enforce the provisioning at run-time is one of the toughest challenges. Currently, Gopalan has developed several algorithms to handle admission control, end-to-end QoS, route selection, scheduling, and fault tolerance in the network.
For more information, visit http://www.ecsl.cs.sunysb.edu/.
VISUALIZING SOFTWARE INSTABILITY
Jennifer Bevan, University of California, Santa Cruz
Detection of instability in software has typically been an afterthought. The point in the development cycle when the software is reviewed for instability usually comes after it is difficult and costly to go back and perform major modifications to the code base. Bevan has developed a technique to allow the visualization of unstable regions of code that can be used much earlier in the development cycle. Her technique creates a time series of dependency graphs that include clusters and lines called fault lines. From the graphs, a developer is able to easily determine where the unstable sections of code are and proactively restructure them. She is currently working on a prototype implementation.
For more information, visit http://www.cse.ucsc.edu/~jbevan/evo_viz/.
RELIABLE AND SCALABLE PEER-TO-PEER WEB DOCUMENT SHARING
Li Xiao, College of William and Mary
The idea presented by Xiao would allow end users to share the content of their Web browser caches with neighboring Internet users. The rationale for this is that today's end users are increasingly connected to the Internet over high-speed links, and the browser caches are becoming large storage systems. Therefore, if an individual accesses a page that does not exist in their local cache, Xiao suggests that they first query other end users for the content before trying to access the machine hosting the original. This strategy does have some serious problems associated with it that still need to be addressed, including ensuring the integrity of the content and protecting the identity and privacy of the end users.
SEREL: FAST BOOTING FOR UNIX
Leni Mayo, Fast Boot Software
Serel is a tool that generates a visual representation of a UNIX machine's boot-up sequence. It can be used to identify the critical path and can show if a particular service or process blocks for an extended period of time. This information could be used to determine where the largest performance gains could be realized by tuning the order of execution at boot-up. Serel creates a dependency graph, expressed in XML, during boot-up; this graph is then used to create the visual representation. Currently the tool only works on POSIX-compliant systems, but Mayo stated that he would be porting it to other platforms. Other extensions that were mentioned included having the metadata cover the use of shared libraries and monitoring the suspend/resume sequence of portable machines.
For more information, visit http://www.fastboot.org/.
BERKELEY DB XML
John Merrells, Sleepycat Software
Merrells gave a quick introduction and overview of the new XML library for Berkeley DB. The library specializes in storage and retrieval of XML content, queried via XPath. It allows for multiple containers per document and stores everything natively as XML. The user is also given a wide range of elements to create indices with, including edges, elements, text strings, or presence. The XPath component consists of a query parser, generator, optimizer, and execution engine. To conclude his presentation, Merrells gave a live demonstration of the software.
For more information, visit http://www.sleepycat.com/xml/.
CLUSTER-ON-DEMAND (COD)
Justin Moore, Duke University
Modern clusters are growing at a rapid rate; many have pushed beyond the 5,000-machine mark, and deploying them results in large expenses as well as management and provisioning issues. Furthermore, if the cluster is "rented" out to various users, it is very time-consuming to configure it to a user's specs, regardless of how long they need to use it. The COD work presented creates dynamic virtual clusters within a given physical cluster. The goal was to have a provisioning tool that would automatically select a chunk of available nodes and install the operating system and middleware specified by the customer in a short period of time. This would allow a greater use of resources, since multiple virtual clusters can exist at once, and by using a virtual cluster the size can be changed dynamically. Moore stated that they have created a working prototype and are currently testing and benchmarking its performance.
For more information, visit http://www.cs.duke.edu/~justin/cod/.
CATACOMB
Elias Sinderson, University of California, Santa Cruz
Catacomb is a project to develop a database-backed DASL module for the Apache Web server. It was designed as a replacement for WebDAV. The initial release of the module only contains support for a MySQL database but could be extended to others. Sinderson briefly touched on the performance of the module: it was comparable to the mod_dav Apache module for all query types but search. The presentation was concluded with remarks about adding support for the LOCK method and including ACL specifications in the future.
For more information, visit http://ocean.cse.ucsc.edu/catacomb/.
SELF-ORGANIZING STORAGE
Dan Ellard, Harvard University
Ellard's presentation introduced a storage system that tunes itself based on the workload of the system, without requiring the intervention of the user. The intelligence is implemented as a virtual self-organizing disk that resides below any file system. The virtual disk observes the access patterns exhibited by the system and then attempts to predict what information will be accessed next. An experiment was done using an NFS trace from an email server at a large ISP. Ellard's self-organizing storage system worked well, which he attributes to the fact that most of the files being requested were large email boxes. Areas of future work include exploration of the length and detail of the data-collection stage, as well as the CPU impact of running the virtual disk layer.
VISUAL DEBUGGING
John Costigan, Virginia Tech
Costigan feels that the state of current debugging facilities in the UNIX world is not as good as it should be. He proposes the addition of several elements to programs like ddd and gdb. The first addition is including separate heap and stack components. Another is to display only certain fields of a data structure, and to have the ability to zoom in and out if needed. Finally, the debugger should provide the ability to visually present complex data structures, including linked lists, trees, etc. He said that a beta version of a debugger with these abilities is currently available.
For more information, visit http://infovis.cs.vt.edu/datastruct/.
IMPROVING APPLICATION PERFORMANCE THROUGH SYSTEM-CALL COMPOSITION
Amit Purohit, Stony Brook University
Web servers perform a huge number of context switches and internal data copies during normal operation. These two elements can drastically limit the performance of an application, regardless of the hardware platform it is run on. The Compound System Call (CoSy) framework is an attempt to reduce the performance penalty of context switches and data copies. It includes a user-level library and several kernel facilities that can be used via system calls. The library provides the programmer with a complete set of memory-management functions. Performance tests using the CoSy framework showed a large saving in context switches and data copies; the only area where the savings between conventional libraries and CoSy were negligible was for very large file copies.
For more information, visit http://www.cs.sunysb.edu/~purohit/.
ELASTIC QUOTAS
John Oscar, Columbia University
Oscar began by stating that most resources are flexible, but that to date disks have not been. While approximately 80 percent of files are short-lived, disk quotas are hard limits imposed by administrators. The idea behind elastic quotas is to have a non-permanent storage area that each user can temporarily use. Oscar suggested the creation of an "ehome" directory structure to be used in deploying elastic quotas. Currently he has implemented elastic quotas as a stackable file system.
There were also several trends apparent in the data: many users would ssh to their remote host (good) but then launch an email reader on the local machine, which would connect via POP to the same host and send the password in clear text (bad). This situation can easily be remedied by tunneling protocols like POP through SSH, but it appeared that many people were not aware this could be done. While most of his comments on the use of the wireless network were negative, the list of passwords he had collected showed that people were indeed using strong passwords. His recommendations were to educate and encourage people to use protocols like IPSec, SSH, and SSL when conducting work over a wireless network, because you never know who else is listening.
SYSTRACE
Niels Provos, University of Michigan
How can one be sure the applications one is using actually do exactly what their developers said they do? Short answer: you can't. People today are using a large number of complex applications, which means it is impossible to check each application thoroughly for security vulnerabilities. There are tools out there that can help, though. Provos has developed a tool called systrace that allows a user to generate a policy of acceptable system calls a particular application can make. If the application attempts to make a call that is not defined in the policy, the user is notified and allowed to choose an action. Systrace includes an auto-policy generation mechanism, which uses a training phase to record the actions of all programs the user executes. If the user chooses not to be bothered by applications breaking policy, systrace allows default enforcement actions to be set. Currently systrace is implemented for FreeBSD and NetBSD, with a Linux version coming out shortly.
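The enforcement logic might be modeled like this (a minimal sketch with invented names, not systrace's actual policy language or implementation):

```python
# Toy model of policy-based system-call confinement: a training phase
# records the calls an application makes; afterwards, calls on the
# allow-list are permitted and anything else falls back to a default
# action ("ask" the user, "deny", or "permit").

class Policy:
    def __init__(self, default="ask"):
        self.allowed = set()
        self.default = default

    def train(self, syscall):
        self.allowed.add(syscall)      # auto-policy generation phase

    def check(self, syscall):
        if syscall in self.allowed:
            return "permit"
        return self.default            # unlisted call: fall back
```

The design choice worth noting is the default action: "ask" gives interactive control, while "deny" turns the policy into a strict sandbox for unattended use.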
For more information, visit http://www.citi.umich.edu/u/provos/systrace/.
VERIFIABLE SECRET REDISTRIBUTION FOR SURVIVABLE STORAGE SYSTEMS
Ted Wong, Carnegie Mellon University
Wong presented a protocol that can be used to re-create a file distributed over a set of servers, even if one of the servers is damaged. In his scheme, the user must choose how many shares the file is split into and the number of servers it will be stored across. The goal of this work is to provide persistent, secure storage of information even if it comes under attack. Wong stated that his design of the protocol is complete, and he is currently building a prototype implementation of it.
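Wong's protocol is a threshold scheme; as a simpler stand-in, here is XOR-based n-of-n secret splitting, which shows the share/recombine idea but, unlike a real threshold scheme such as Shamir's, tolerates no lost shares (tolerating loss is precisely what redistribution across servers is for):

```python
# XOR n-of-n secret splitting: n-1 shares are random, and the last share
# is chosen so that the XOR of all n shares reconstructs the secret.
# Any n-1 shares together reveal nothing about the secret.

import os

def split(secret: bytes, n: int):
    """Split into n shares; XOR of all n shares equals the secret."""
    shares = [os.urandom(len(secret)) for _ in range(n - 1)]
    last = secret
    for s in shares:
        last = bytes(a ^ b for a, b in zip(last, s))
    shares.append(last)
    return shares

def combine(shares):
    """XOR all shares back together to recover the secret."""
    out = bytes(len(shares[0]))
    for s in shares:
        out = bytes(a ^ b for a, b in zip(out, s))
    return out
```

In a k-of-n threshold scheme, any k shares suffice, so up to n-k servers can be lost or compromised; the XOR sketch is the degenerate k = n case.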
For more information, visit http://www.cs.cmu.edu/~tmwong/research/.
THE GURU IS IN SESSIONS
LINUX ON LAPTOP/PDA
Bdale Garbee, HP Linux Systems Operation
Summarized by Brennan Reynolds
This was a roundtable-like discussion, with Garbee acting as moderator. He is currently in charge of the Debian distribution of Linux and is involved in porting Linux to platforms other than i386. He has successfully helped port the kernel to the Alpha, SPARC, and ARM, and is currently working on a version to run on VAX machines.
A discussion was held on which file system is best for battery life. The ReiserFS and Ext3 file systems were discussed; people commented that when using Ext3 on their laptops, the disk would never spin down and thus was always consuming a large amount of power. One suggestion was to use a RAM-based file system for any volume that requires a large number of writes, and to only have the file system written to disk at infrequent intervals or when the machine is shut down or suspended.
A question was directed to Garbee about his knowledge of how the HP-Compaq merger would affect Linux. Garbee thought that the merger was good for Linux, both within the company and for the open source community as a whole. He stated that the new company would be the number one shipper of systems with Linux as the base operating system. He also fielded a question about the support of older HP servers and their ability to run Linux, saying that indeed Linux has been ported to them, and that he personally had several machines in his basement running it.
A question which sparked a large number of responses concerned problems people had with running Linux on mobile platforms. Most people actually did not have many problems at all. There were a few cases of a particular machine model not working, but there were only two widespread issues: docking stations and the new ACPI power management scheme. Docking stations still appear to be somewhat of a headache, but most people had developed ad hoc solutions for getting them to work. The ACPI power management scheme developed by Intel does not appear to have a quick solution. Ted Ts'o, one of the head kernel developers, stated that there are fundamental architectural problems with the 2.4 series kernel that do not easily allow ACPI to be added. However, he also stated that ACPI is supported in the newest development kernel, 2.5, and will be included in 2.6.
The final topic was the possibility of purchasing a laptop without paying any charge/tax for Microsoft Windows. The consensus was that currently this is not possible. Even if the machine comes with Linux or without any operating system, the vendors have contracts in place which require them to pay Microsoft for each machine they ship if that machine is capable of running a Microsoft operating system. The only way this will change is if the demand for other operating systems increases to the point that vendors renegotiate their contracts with Microsoft, and no one saw this happening in the near future.
LARGE CLUSTERS
Andrew Hume, AT&T Labs – Research
Summarized by Teri Lampoudi
This was a session on clusters, big data, and resilient computing. Given that the audience was primarily interested in clusters, and not necessarily big data or beating nodes to a pulp, the conversation revolved mainly around what it takes to make a heterogeneous cluster resilient. Hume also presented a FREENIX paper on his current cluster project, which explained in more detail much of what was abstractly claimed in the guru session.
Hume's use of the word "cluster" referred not to what I had assumed would be a Beowulf-type system, but to a loosely coupled farm of machines in which concurrency was much less of an issue than it would be in a Beowulf. In fact, the architecture Hume described was designed to deal with large amounts of independent transaction processing, essentially the process of billing calls, which requires no interprocess message passing or anything of the sort a parallel scientific application might.
Where does the large data come in? Since the transactions in question consist of large numbers of flat files, a mechanism for getting the files onto nodes and off-loading the results is necessary. In this particular cluster, named Ningaui, this task is handled by the "replication manager," which generated a large amount of interest in the audience. All management functions in the cluster, as well as job allocation, constitute services that are to be bid for and leased out to nodes. Furthermore, failures are handled by simply repeating the failed task until it is completed properly, if that is possible, and, if it is not, then looking through detailed logs to discover what went wrong. Logging and testing for the successful completion of each task are what make this model resilient. Every management operation is logged at some appropriate level of detail, generating 120MB/day, and every single file transfer is MD5-checksummed. This was characterized by Hume as a "patient" style of management, where no one controls the cluster: scheduling behavior is emergent rather than stringently planned, and nodes are free to drop in and out of the cluster. A downside to this model is the increase in latency, offset by the fact that the cluster continues to function unattended outside of 8-to-5 support hours.
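The retry-and-verify pattern described above, repeat the task until its output checks out and log every attempt, can be sketched as follows. This is an illustrative reconstruction, not Ningaui code; the function and parameter names are invented:

```python
import hashlib
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("cluster-sketch")

def md5_of(data):
    """Checksum used to verify each transfer, as in Hume's cluster."""
    return hashlib.md5(data).hexdigest()

def transfer_with_retry(payload, send, expected_md5, max_attempts=5):
    """Repeat a transfer until its MD5 checksum verifies, logging each attempt.
    `send` is any callable that moves bytes and returns what arrived."""
    for attempt in range(1, max_attempts + 1):
        received = send(payload)
        digest = md5_of(received)
        if digest == expected_md5:
            log.info("attempt %d: transfer verified (md5=%s)", attempt, digest)
            return True
        log.warning("attempt %d: checksum mismatch, retrying", attempt)
    log.error("gave up after %d attempts; consult the logs", max_attempts)
    return False
```

The detailed log trail is what makes the "patient" management style workable: when retries are exhausted, an operator reads the log rather than the cluster stopping.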
In response to questions about the projected scalability of such a scheme, Hume said the system would presumably scale from the present 60 nodes to about 100 without modifications, but that scaling higher would be a matter for further design. The guru session ended on a somewhat unrelated note regarding the problems of getting data off mainframes and onto Linux machines: issues of fixed- vs. variable-length encoding of data blocks resulting in useful information being stripped by FTP, and problems in converting COBOL copybooks to something useful on a C-based architecture. To reiterate Hume's claim, there are homemade solutions for some of these problems, and the reader should feel free to contact Mr. Hume for them.
WRITING PORTABLE APPLICATIONS
Nick Stoughton, MSB Consultants
Summarized by Josh Lothian
Nick Stoughton addressed a concern that developers are facing more and more: writing portable applications. He proposed that there is no such thing as a "portable application," only those that have been ported. Developers are still in search of the Holy Grail of applications programming: a write-once, compile-anywhere program. Languages such as Java are leading up to this, but virtual machines still have to be ported, paths will still be different, etc. Web-based applications are another area that could be promising for portable applications.
Nick went on to talk about the future of portability. The recently released POSIX 2001 standards are expected to help the situation. The 2001 revision expands the original POSIX spec into seven volumes. Even though POSIX 2001 is more specific than the previous release, Nick pointed out that even this standard allows for small differences where vendors may define their own behaviors. This can only hurt developers in the long run. Along these lines, it was pointed out that there may be room for automated tools that have the capability to check application code and determine the level of portability and standards compliance of the source.
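A crude version of the portability-checking tool suggested above might simply flag known non-portable calls in C source. This toy sketch is not any real checker, and the hazard list is an invented, far-from-exhaustive illustration:

```python
import re

# A few calls that are common portability hazards (illustrative list only):
# nonstandard, removed, or differently behaved across UNIX variants.
NONPORTABLE = {
    "gets": "removed/unsafe; use fgets",
    "bcopy": "BSD-ism; POSIX prefers memcpy/memmove",
    "bzero": "BSD-ism; POSIX prefers memset",
}

def scan_c_source(source):
    """Return (line_number, identifier, note) for each suspicious call site."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for name, note in NONPORTABLE.items():
            if re.search(rf"\b{name}\s*\(", line):
                findings.append((lineno, name, note))
    return findings
```

A real tool would of course need a proper parser and a catalog derived from the standard itself; the point is only that mechanical checking of this kind is feasible.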
NETWORK MANAGEMENT SYSTEM PERFORMANCE TUNING
Jeff Allen, Tellme Networks, Inc.
Summarized by Matt Selsky
Jeff Allen, author of Cricket, discussed network tuning. The basic idea of tuning is: measure, twiddle, and measure again. Cricket can be used for large installations, but it's overkill for smaller installations, for which MRTG is better suited. You don't need Cricket to do measurement: doing something like a shell script that dumps data to Excel is good, but you need to get some sort of measurement. Cricket was not designed for billing; it was meant to help answer questions.
Useful tools and techniques covered included looking for the difference between machines in order to identify the causes for the differences; measuring, but also thinking about what you're measuring and why; and remembering to use strace and tcpdump to identify problems. Problems repeat themselves; performance tuning requires an intricate understanding of all the layers involved to solve the problem and simplify the automation. (Having a computer science background helps, but is not essential.) If you can reproduce the problem, you then should investigate each piece to find the bottleneck. If you can't reproduce the problem, then measure. You need to understand all the system interactions. Measurement can help determine whether things are actually slow or if the user is imagining it.
Troubleshooting begins with a hunch, but scientific processes are essential. You should be able to determine the event stream, or when each event occurs; having observation logs can help. Some hints provided include checking cron for periodic anomalies, slowing down /bin/rm to avoid I/O overload, and looking for unusual kernel CPU usage.
Also, try to use lightweight monitoring to reduce overhead. You don't need to monitor every resource on every system, but you should monitor those resources on those systems that are essential. Don't check things too often, since you can introduce overhead that way. Lots of information can be gathered from the /proc file system interface to the kernel.
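As one example of the lightweight /proc-based monitoring Allen recommended, a single read of /proc/loadavg yields load averages and run-queue counts at almost no cost. The parsing sketch below is the author's illustration, not a tool from the talk:

```python
def parse_loadavg(text):
    """Parse a /proc/loadavg-style line, e.g. '0.15 0.10 0.05 1/234 5678'."""
    one, five, fifteen, runq, last_pid = text.split()
    running, total = runq.split("/")
    return {
        "load1": float(one), "load5": float(five), "load15": float(fifteen),
        "running": int(running), "threads": int(total), "last_pid": int(last_pid),
    }

def sample_loadavg(path="/proc/loadavg"):
    # One small read from /proc is far cheaper than a heavyweight agent,
    # which is the whole point of lightweight monitoring.
    with open(path) as f:
        return parse_loadavg(f.read())
```

Sampling a file like this every minute or two, rather than every second, follows the advice above about not checking things so often that the monitoring itself becomes overhead.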
BSD BOF
Host: Kirk McKusick, Author and Consultant
Summarized by Bosko Milekic
The BSD Birds of a Feather session at USENIX started off with five quick and informative talks and ended with an exciting discussion of licensing issues.
Christos Zoulas, for the NetBSD Project, went over the primary goals of the NetBSD Project (namely portability, a clean architecture, and security) and then proceeded to briefly discuss some of the challenges that the project has encountered. The successes and areas for improvement of 1.6 were then examined. NetBSD has recently seen various improvements in its cross-build system, packaging system, and third-party tool support. As for the kernel, a zero-copy TCP/UDP implementation has been integrated, and new pmap code for arm32 has been introduced, as have some other rather major changes to various subsystems. More information is available in Christos's slides, which are now available at http://www.netbsd.org/gallery/events/usenix2002.
Theo de Raadt of the OpenBSD Project presented a rundown of various new things that the OpenBSD and OpenSSH teams have been looking at. Theo's talk included an amusing description (and pictures) of some of the events that transpired during OpenBSD's hack-a-thon, which occurred the week before USENIX '02. Finally, Theo mentioned that, following complications with the IPFilter license, the OpenBSD team had done a fairly extensive license audit.
Robert Watson of the FreeBSD Project brought up various examples of FreeBSD being used in the real world. He went on to describe a wide variety of changes that will surface with the upcoming FreeBSD release 5.0. Notably, 5.0 will introduce a re-architecting of the kernel aimed at providing much more scalable support for SMP machines. 5.0 will also include an early implementation of KSE, FreeBSD's version of scheduler activations, large improvements to pccard, and significant framework bits from the TrustedBSD project.
Mike Karels from Wind River Systems presented an overview of how BSD/OS has been evolving following the Wind River Systems acquisition of BSDi's software assets. Ernie Prabhakar from Apple discussed Apple's OS X and its success on the desktop. He explained how OS X aims to bring BSD to the desktop while striving to maintain a strong and stable core, one that is heavily based on BSD technology.
The BoF finished with a discussion on licensing; specifically, some folks questioned the impact of Apple's licensing for code from their core OS (the Darwin Project) on the BSD community as a whole.
CLOSING SESSION
HOW FLIES FLY
Michael H. Dickinson, University of California, Berkeley
Summarized by JD Welch
In this lively and offbeat talk, Dickinson discussed his research into the mechanisms of flight in the fruit fly (Drosophila melanogaster) and related the autonomic behavior to that of a technological system. Insects are the most successful organisms on earth, due in no small part to their early adoption of flight. Flies travel through space on straight trajectories interrupted by saccades, jumpy turning motions somewhat analogous to the fast, jumpy movements of the human eye. Using a variety of monitoring techniques, including high-speed video in a "virtual reality flight arena," Dickinson and his colleagues have observed that flies respond to visual cues to decide when to saccade during flight. For example, more visual texture makes flies saccade earlier (i.e., further away from the arena wall), described by Dickinson as a "collision control algorithm."
Through a combination of sensory inputs, flies can make decisions about their flight path. Flies possess a visual "expansion detector" which, at a certain threshold, causes the animal to turn in a certain direction. However, expansion in front of the fly sometimes causes it to land. How does the fly decide? Using the virtual reality device to replicate various visual scenarios, Dickinson observed that the flies fixate on vertical objects that move back and forth across the field. Expansion on the left side of the animal causes it to turn right, and vice versa, while expansion directly in front of the animal triggers the legs to flare out in preparation for landing.
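The decision rule Dickinson described, side expansion steers away, frontal expansion triggers landing, reads naturally as a tiny control algorithm. The sketch below is the summarizer's paraphrase of that rule, not Dickinson's model; the threshold and names are invented:

```python
from enum import Enum

class Action(Enum):
    TURN_RIGHT = "turn right"
    TURN_LEFT = "turn left"
    PREPARE_TO_LAND = "flare legs"
    CRUISE = "keep flying"

def expansion_response(left, right, frontal, threshold=1.0):
    """Map expansion-detector cues to behavior as described in the talk:
    left-side expansion -> turn right, right-side -> turn left,
    frontal expansion -> prepare to land."""
    if frontal >= threshold:
        return Action.PREPARE_TO_LAND
    if left >= threshold and left >= right:
        return Action.TURN_RIGHT
    if right >= threshold:
        return Action.TURN_LEFT
    return Action.CRUISE
```

This kind of threshold-based reflex is also the form in which the behavior could transfer to the autonomous-vehicle control systems mentioned at the end of the talk.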
If the eyes detect these changes, how are the responses implemented? The flies' wings have "power muscles," controlled by mechanical resonance (as opposed to the nervous system), which drive the wings, combined with neurally activated "steering muscles," which change the configuration of the wing joints. Subtle variations in the timing of impulses correspond to changes in wing movement, controlled by sensors in the "wing-pit." A small wing-like structure, the haltere, controls equilibrium by beating constantly.
Voluntary control is accomplished by the halteres, whose signals can interfere with the autonomic control of the stroke cycle. The halteres have "steering muscles" as well, and information derived from the visual system can turn off the haltere or trick it into a "virtual" problem requiring a response.
Dickinson has also studied the aerodynamics of insect flight using a device called the "Robo Fly," an oversized mechanical insect wing suspended in a large tank of mineral oil. Interestingly, Dickinson observed that the average lift generated by the flies' wings is under its body weight; the flies use three mechanisms to overcome this, including rotating the wings (rotational lift).
Insects are extraordinarily robust creatures, and because Dickinson analyzed the problem in a systems-oriented way, these observations and analysis are immediately applicable to technology; the response system can be used as an efficient search algorithm for control systems in autonomous vehicles, for example.
THE AFS WORKSHOP
Summarized by Garry Zacheiss, MIT
Love Hornquist-Astrand of the Arla development team presented the Arla status report. Arla 0.35.8 has been released. Scheduled for release soon are improved support for Tru64 UNIX, MacOS X, and FreeBSD; improved volume handling; and implementation of more of the vos/pts subcommands. It was stressed that MacOS X is considered an important platform, and that a GUI configuration manager for the cache manager, a GUI ACL manager for the Finder, and a graphical login that obtains Kerberos tickets and AFS tokens at login time were all under development.
Future goals planned for the underlying AFS protocols include GSSAPI/SPNEGO support for Rx, performance improvements to Rx, an enhanced disconnected mode, and IPv6 support for Rx; an experimental patch is already available for the latter. Future Arla-specific goals include improved performance, partial file reading, and increased stability for several platforms. Work is also in progress on the RXGSS protocol extensions (integrating GSSAPI into Rx). A partial implementation exists, and work continues as developers find time.
Derrick Brashear of the OpenAFS development team presented the OpenAFS status report. OpenAFS 1.2.5 was released immediately prior to the conference; it fixed a remotely exploitable denial-of-service attack on several OpenAFS platforms, most notably IRIX and AIX. Future work planned for OpenAFS includes better support for MacOS X, including working around Finder interaction issues. Better support for the BSDs is also planned: FreeBSD has a partially implemented client, while NetBSD and OpenBSD have only server programs available right now. AIX 5, Tru64 5.1A, and MacOS X 10.2 (aka Jaguar) are all planned for the future. Other planned enhancements include nested groups in the ptserver (code donated by the University of Michigan awaits integration), disconnected AFS, and further work on a native Windows client. Derrick stated that the guiding principles of OpenAFS were to maintain compatibility with IBM AFS, support new platforms, ease the administrative burden, and add new functionality.
AFS performance was discussed. The openafs.org cell consists of two file servers, one in Stockholm, Sweden, and one in Pittsburgh, PA. AFS works reasonably well over trans-Atlantic links, although OpenAFS clients don't determine which file server to talk to very effectively. Arla clients use RTTs to the server to determine the optimal file server to fetch replicated data from. Modifications to OpenAFS to support this behavior in the future are desired.
Jimmy Engelbrecht and Harald Barth of KTH discussed their AFSCrawler script, written to determine how many AFS cells and clients were in the world, what implementations/versions they were (Arla vs. IBM AFS vs. OpenAFS), and how much data was in AFS. The script unfortunately triggered a bug in IBM AFS 3.6-derived code, causing some clients to panic while handling a specific RPC. This has since been fixed in OpenAFS 1.2.5 and the most recent IBM AFS patch level, and all AFS users are strongly encouraged to upgrade. No release of Arla is vulnerable to this particular denial-of-service attack. There was an extended discussion of the usefulness of this exploration. Many sites believed this was useful information and such scanning should continue in the future, but only on an opt-in basis.
Many sites face the problem of managing Kerberos/AFS credentials for batch-scheduled jobs. Specifically, most batch processing software needs to be modified to forward tickets as part of the batch submission process, renew tickets and tokens while the job is in the queue and for the lifetime of the job, and properly destroy credentials when the job completes. Ken Hornstein of NRL was able to pay a commercial vendor to support Kerberos 4/5 credential management in their product, although they did not implement AFS token management. MIT has implemented some of the desired functionality in OpenPBS and might be able to make it available to other interested sites.
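The renewal requirement above reduces to a scheduling question: re-acquire credentials before less than some safety margin of the ticket lifetime remains, for as long as the job is queued or running. This abstract sketch is the summarizer's illustration (a real deployment would shell out to kinit/aklog at each renewal point); all names are invented:

```python
from dataclasses import dataclass

@dataclass
class Ticket:
    issued_at: float  # seconds since epoch
    lifetime: float   # seconds until expiry

def needs_renewal(ticket, now, margin=0.25):
    """Renew once less than `margin` of the lifetime remains, so a queued
    job never runs with credentials that are about to expire."""
    remaining = (ticket.issued_at + ticket.lifetime) - now
    return remaining < margin * ticket.lifetime

def renewal_times(ticket, job_end, margin=0.25):
    """All times at which a renewal would fire while the job is queued/running."""
    times = []
    t = ticket.issued_at
    while True:
        t += ticket.lifetime * (1 - margin)
        if t >= job_end:
            break
        times.append(t)
    return times
```

Destroying the credentials when the job completes (the third requirement in the text) would simply be a final step after `job_end`.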
Tools to simplify AFS administration were discussed, including:
AFS Balancer: A tool written by CMU to automate the process of balancing disk usage across all servers in a cell. Available from ftp://ftp.andrew.cmu.edu/pub/AFS-Tools/balance-1.1b.tar.gz.
Themis: Themis is KTH's enhanced version of the AFS tool "package" for updating files on local disk from a central AFS image. KTH's enhancements include allowing the deletion of files, simplifying the process of adding a file, and allowing the merging of multiple rule sets for determining which files are updated. Themis is available from the Arla CVS repository.
Stanford was presented as an example of a large AFS site. Stanford's AFS usage consists of approximately 1.4TB of data in AFS, in the form of approximately 100,000 volumes; 3.3TB of storage is available in their primary cell, ir.stanford.edu. Their file servers consist entirely of Solaris machines, running a combination of Transarc 3.6 patch-level 3 and OpenAFS 1.2.x, while their database servers run OpenAFS 1.2.2. Their cell consists of 25 file servers, using a combination of EMC and Sun StorEdge hardware. Stanford continues to use the kaserver for their authentication infrastructure, with future plans to migrate entirely to an MIT Kerberos 5 KDC.
Stanford has approximately 3,400 clients on their campus, not including SLAC (Stanford Linear Accelerator); approximately 2,100 AFS clients from outside Stanford contact their cell every month. Their supported clients are almost entirely IBM AFS 3.6 clients, although they plan to release OpenAFS clients soon. Stanford currently supports only UNIX clients. There is some on-campus presence of Windows clients, but Stanford has never publicly released or supported it. They do intend to release and support the MacOS X client in the near future.
All Stanford students, faculty, and staff are assigned AFS home directories, with a default quota of 50MB, for a total of approximately 550GB of user home directories. Other significant uses of AFS storage are data volumes for workstation software (400GB) and volumes for course Web sites and assignments (100GB).
AFS usage at Intel was also presented. Intel has been an AFS site since 1994. They had bad experiences with the IBM 3.5 Linux client; their experience with OpenAFS on Linux 2.4.x kernels has been much better. They use and are satisfied with the OpenAFS IA64 Linux port. Intel has hundreds of OpenAFS 1.2.3 and 1.2.4 clients in many production cells, accessing data stored on IBM AFS file servers; they have not encountered any interoperability issues. Intel has some concerns about OpenAFS: they would like to purchase commercial support for OpenAFS, and to see OpenAFS support for HP-UX on both PA-RISC and Itanium hardware. HP-UX support is currently unavailable due to a specific HP-UX header file being unavailable from HP; this may be available soon. Intel has not yet committed to migrating their file servers to OpenAFS, and they are unsure if they will do so without commercial support.
Backups are a traditional topic of discussion at AFS workshops, and this time was no exception. Many users complain that the traditional AFS backup tools ("backup" and "butc") are complex and difficult to automate, requiring many home-grown scripts and much user intervention for error recovery. An additional complaint was that the traditional AFS tools do not support file- or directory-level backups and restores; data must be backed up and restored at the volume level.
Mitch Collinsworth of Cornell presented work to make AMANDA, the free backup software from the University of Maryland, suitable for AFS backups. Using AMANDA for AFS backups allows one to share AFS backup tapes with non-AFS backups, easily run multiple backups in parallel, automate error recovery, and provide a robust degraded mode that prevents tape errors from stopping backups altogether. Their implementation allows for full volume restores as well as individual directory and file restores. They have finished coding this work and are in the process of testing and documenting it.
Peter Honeyman of CITI at the University of Michigan spoke about work he has proposed to replace Rx with RPCSEC GSS in OpenAFS. This would allow AFS to use a TCP-based transport mechanism rather than the UDP-based Rx, and possibly gain better congestion control, dynamic adaptation, and fragmentation avoidance as a result. RPCSEC GSS uses the GSSAPI to authenticate SUN ONC RPC. RPCSEC GSS is transport agnostic, provides strong security, is a developing Internet standard, and has multiple open source implementations. Backward compatibility with existing AFS servers and clients is an important goal of this project.
hesitant to employ security measures. Science is doing well, but one cannot see the benefits in the real world. An increasing number of users means more problems affecting more people. Old problems such as buffer overflows haven't gone away, and new problems show up. Furthermore, the amount of expertise needed to launch attacks is decreasing.
Schneier argues that complexity is the enemy of security, and that while security is getting better, the growth in complexity is outpacing it. As security is fundamentally a people problem, one shouldn't focus on technologies but rather should look at businesses, business motivations, and business costs. Traditionally, one can distinguish between two security models: threat avoidance, where security is absolute, and risk management, where security is relative and one has to mitigate the risk with technologies and procedures. In the latter model, one has to find a point with reasonable security at an acceptable cost.
After a brief discussion on how security is handled in businesses today, in which he concluded that businesses "talk big about it but do as little as possible," Schneier identified four necessary steps to fix the problems.
First, enforce liabilities. Today, no real consequences have to be feared from security incidents. Holding people accountable will increase the transparency of security and give an incentive to make processes public. As possible options to achieve this, he listed industry-defined standards, federal regulation, and lawsuits. The problems with enforcement, however, lie in difficulties associated with the international nature of the problem and the fact that complexity makes assigning liabilities difficult. Furthermore, fear of liability could have a chilling effect on free software development and could stifle new companies.
As a second step, Schneier identified the need to allow partners to transfer liability. Insurance would spread liability risk among a group and would be a CEO's primary risk analysis tool. There is a need for standardized risk models and protection profiles, and for more security, as opposed to empty press releases. Schneier predicts that insurance will create a marketplace for security where customers and vendors will have the ability to accept liabilities from each other. Computer-security insurance should soon be as common as household or fire insurance, and from that development, insurance will become the driving factor of the security business.
The next step is to provide mechanisms to reduce risks, both before and after software is released; techniques and processes to improve software quality, as well as an evolution of security management, are therefore needed. Currently, security products try to rebuild the walls, such as physical badges and entry doorways, that were lost when getting connected. Schneier believes this "fortress metaphor" is bad; one should think of the problem more in terms of a city. Since most businesses cannot afford proper security, outsourcing is the only way to make security scalable. With outsourcing, the concept of best practices becomes important, and insurance companies can be tied to them; outsourcing will level the playing field.
As a final step, Schneier predicts that rational prosecution and education will lead to deterrence. He claims that people feel safe because they live in a lawful society, whereas the Internet is classified as lawless, very much like the American West in the 1800s or a society ruled by warlords. This is because it is difficult to prove who an attacker is, and prosecution is hampered by complicated evidence gathering and irrational prosecution. Schneier believes, however, that education will play a major role in turning the Internet into a lawful society. Specifically, he pointed out that we need laws that can be explained.
Risks won't go away; the best we can do is manage them. A company able to manage risk better will be more profitable, and we need to give CEOs the necessary risk-management tools.
There were numerous questions. Listed below are the more complicated ones, in a paraphrased Q&A format.
Q: Will liability be effective? Will insurance companies be willing to accept the risks? A: The government might have to step in; it needs to be seen how it plays out.
Q: Is there an analogy to the real world in the fact that in a lawless society only the rich can afford security? Security solutions differentiate between classes. A: Schneier disagreed with the statement, giving an analogy to front-door locks, but conceded that there might be special cases.
Q: Doesn't homogeneity hurt security? A: Homogeneity is oversold. Diverse types can be more survivable, but given the limited number of options, the difference will be negligible.
Q: Regulations in airbag protection have led to deaths in some cases. How can we keep the pendulum from swinging the other way? A: Lobbying will not be prevented. An imperfect solution is probable; there might be reversals of requirements, such as the airbag one.
Q: What about personal liability? A: This will be analogous to auto insurance; liability comes with computer/Net access.
Q: If the rule of law is to become reality, there must be a law enforcement function that applies to a physical space. You cannot do that with any existing government agency for the whole world. Should an organization like the UN assume that role? A: Schneier was not convinced global enforcement is possible.
Q: What advice should we take away from this talk? A: Liability is coming. Since the network is important to our infrastructure, eventually the problems will be solved in a legal environment. You need to start thinking about how to solve the problems and how the solutions will affect us.
The entire speech (including slides) is available at http://www.counterpane.com/presentation4.pdf.
LIFE IN AN OPEN SOURCE STARTUP
Daryll Strauss, Consultant
Summarized by Teri Lampoudi
This talk was packed with morsels of insight and tidbits of information about what life in an open source startup is like (though "was like" might be more appropriate), what the issues are in starting, maintaining, and commercializing an open source project, and the way hardware vendors treat open source developers. Strauss began by tracing the timeline of his involvement with developing 3-D graphics support for Linux: from the 3dfx Voodoo1 driver for Linux in 1997, to the establishment of Precision Insight in 1998, to its buyout by VA Linux, to the dismantling of his group by VA, to what the future may hold.
Obviously, there are benefits from doing open source development, both for the developer and for the end user. An important point was that open source develops entire technologies, not just products. A common misapprehension concerns the inherent difficulty of managing a project that accepts code from a large number of developers, some volunteer and some paid: the notion that code can just be thrown over the wall into the world is a Big Lie.
How does one start, and keep afloat, an open source development company? In the case of Precision Insight, the subject matter, developing graphics and OpenGL for Linux, required a considerable amount of expertise: intimate knowledge of the hardware, the libraries, X, and the Linux kernel, as well as the end applications. Expertise is marketable. What helps even further is having a visible virtual team of experts, people who have an established track record of contributions to open source in the particular area of expertise. Support from the hardware and software vendors came next. While everyone was willing to contract PI to write the portions of code that were specific to their hardware, no one wanted to pay the bill for developing the underlying foundation that was necessary for the drivers to be useful. In the PI model, the infrastructure cost was shared by several customers at once, and the technology kept moving. Once a driver is written for a vendor, the vendor is perfectly capable of figuring out how to write the next driver necessary, thereby obviating the need to contract the job. This, and questions of protecting intellectual property (hardware design, in this case), are deeply problematic with the open source mode of development. This is not to say that there is no way to get over them, but they are likely to arise more often than not.
Management issues seem equally important. In the PI setting, contracts were flexible (both a good and a bad thing) and developers overworked. Further, development "in a fishbowl," that is, under public scrutiny, is not easy. Strauss stressed the value of good communication and the use of various asynchronous communication methods like IRC and mailing lists.
Finally, Strauss discussed what portions of software can be free and what can be proprietary. His suggestion was that horizontal markets want to be free, whereas vertically one can develop proprietary solutions. The talk closed with a small stroll through the problems that project politics brings up, from things like merging branches to mailing list and source tree readability.
GENERAL TRACK SESSIONS
FILE SYSTEMS
Summarized by Haijin Yan
STRUCTURE AND PERFORMANCE OF THE DIRECT ACCESS FILE SYSTEM
Kostas Magoutis, Salimah Addetia, Alexandra Fedorova, and Margo I. Seltzer, Harvard University; Jeffrey Chase, Andrew Gallatin, Richard Kisley, and Rajiv Wickremesinghe, Duke University; and Eran Gabber, Lucent Technologies
This paper won the Best Paper award
The Direct Access File System (DAFS) is a new standard for network-attached storage over direct-access transport networks. DAFS takes advantage of user-level network interface standards that enable client-side functionality in which remote data access resides in a library rather than in the kernel. This reduces the overhead of memory copy for data movement and protocol overhead. Remote Direct Memory Access (RDMA) is a direct access transport network that allows the network adapter to reduce copy overhead by accessing application buffers directly.
This paper explores the fundamental structural and performance characteristics of network file access using a user-level file system structure on a direct-access transport network with RDMA. It describes DAFS-based client and server reference implementations for FreeBSD and reports experimental results comparing DAFS to a zero-copy NFS implementation. It illustrates the benefits and trade-offs of these techniques to provide a basis for informed choices about deployment of DAFS-based systems and similar extensions to other network file protocols, such as NFS. Experiments show that DAFS gives applications direct control over an I/O system and increases the client CPU's usage while the client is doing I/O. Future work includes how to address longstanding problems related to the integration of the application and file system for high-performance applications.
CONQUEST: BETTER PERFORMANCE THROUGH A DISK/PERSISTENT-RAM HYBRID FILE SYSTEM
An-I A. Wang, Peter Reiher, and Gerald J. Popek, UCLA; Geoffrey M. Kuenning, Harvey Mudd College
Motivated by the declining cost of persistent RAM, the authors propose the Conquest file system, which stores all small files, metadata, executables, and shared libraries in persistent RAM; disks hold only the data content of remaining large files. Compared to alternatives such as caching and RAM file systems, Conquest has the advantages of efficiency, consistency, and reliability at a reduced cost. Using popular benchmarks, experiments show that Conquest incurs little overhead while achieving faster performance. Future work includes designing mechanisms for adjusting the file-size threshold dynamically and finding a better disk layout for large data blocks.
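The placement policy described above reduces to a simple rule. This sketch is a hypothetical paraphrase, not Conquest code; the threshold value and names are invented (the paper treats the file-size threshold as tunable, as the future-work note says):

```python
# Conquest-style placement, paraphrased: everything except the data content
# of large files lives in persistent RAM.
SIZE_THRESHOLD = 1 << 20  # 1MB; illustrative only, the real cutoff is tunable

def placement(kind, size):
    """Decide where a file system object lives.
    kind: 'metadata', 'executable', 'library', or 'data'."""
    if kind in ("metadata", "executable", "library"):
        return "persistent-ram"   # always RAM, regardless of size
    if size <= SIZE_THRESHOLD:
        return "persistent-ram"   # small files stay in RAM entirely
    return "disk"                 # only large-file data content hits the disk
```

Because metadata and small files never touch the disk, the disk layout only has to serve large sequential data, which is what makes the "better disk layout for large data blocks" question worth pursuing.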
EXPLOITING GRAY-BOX KNOWLEDGE OF BUFFER-CACHE MANAGEMENT
Nathan C. Burnett, John Bent, Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau, University of Wisconsin, Madison
Knowing what algorithm is used to manage the buffer cache is very important for improving application performance, but there is currently no interface for discovering it. This paper introduces Dust, a simple fingerprinting tool that is able to identify the buffer-cache replacement policy. Dust automatically identifies the cache size and replacement policy based on the configuring attributes of access order, recency, frequency, and long-term history. Through simulation, Dust was able to distinguish between a variety of replacement algorithm policies found in the literature: FIFO, LRU, LFU, Clock, Segmented FIFO, 2Q, and LRU-K. Further experiments fingerprinting real operating systems such as NetBSD, Linux, and Solaris show that Dust is able to identify the hidden cache replacement algorithm.
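The fingerprinting idea, inferring a policy from how a crafted access pattern behaves, can be shown with a toy simulator. The probe pattern and cache size here are invented; real fingerprinting must drive the OS buffer cache through the file system rather than a simulated one.

```python
# Toy illustration of fingerprinting a cache replacement policy: the
# same probe pattern produces different hit sequences under FIFO and
# LRU, so observing the hits identifies the policy.
from collections import OrderedDict

def simulate(policy, accesses, size=3):
    """Tiny cache simulator; returns the list of blocks that hit."""
    cache = OrderedDict()
    hits = []
    for block in accesses:
        if block in cache:
            hits.append(block)
            if policy == "LRU":            # recency matters only for LRU
                cache.move_to_end(block)
        else:
            if len(cache) >= size:
                cache.popitem(last=False)  # evict the oldest entry
            cache[block] = True
    return hits

# Probe: fill the cache, touch block 0 again, force one eviction, then
# re-probe block 0.  Under LRU the extra touch saved block 0; under
# FIFO it did not.
pattern = [0, 1, 2, 0, 3, 0]
lru_hits = simulate("LRU", pattern)
fifo_hits = simulate("FIFO", pattern)
```

Distinguishing frequency- and history-based policies (LFU, 2Q, LRU-K) takes richer probes along the same lines.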
By knowing the underlying cache replacement policy, a cache-aware Web server can reschedule requests on a cached-request-first policy to obtain a performance improvement. Experiments show that cache-aware scheduling improves average response time and system throughput.
OPERATING SYSTEMS (AND DANCING BEARS)
Summarized by Matt Butner
THE JX OPERATING SYSTEM
Michael Golm, Meik Felser, Christian Wawersich, and Jürgen Kleinöder, University of Erlangen-Nürnberg
The talk opened by emphasizing the need for, and the practicality of, a functional Java OS. The implementation, the JX OS, attempts to mimic the recent trend toward using highly abstracted languages in application development in order to create an OS as functional and powerful as one written in a lower-level language.
JX is a micro-kernel solution that uses separate JVMs for each entity of the kernel and, in some cases, for each application. The separated domains do not share objects and have no thread migration, and each domain implements its own code. The interesting part of the presentation was the discussion of system-level programming with Java; key areas discussed were memory management and interrupt handlers. The authors concluded by noting that their type-safe and modular approach resulted in a robust system with great configuration flexibility and acceptable performance.
DESIGN EVOLUTION OF THE EROS SINGLE-LEVEL STORE
Jonathan S. Shapiro, Johns Hopkins University; Jonathan Adams, University of Pennsylvania
This presentation outlined the current characteristics of file systems and some of their least desirable properties. It was based on the revival of the EROS single-level-store approach in current attempts to capitalize on system consistency and efficiency. The solution presented did not relate directly to EROS but rather to the design and use of the constructs and notions on which EROS is based. By extending the mapping of the memory system to include the disk, systems are further able to ensure global persistence without regard for disk structure. In explaining the need for absolute system consistency and an environment which, by definition, is not allowed to crash, the goal of having such an exhaustive design becomes clear. The cool part of this work is the availability of its design and architecture to the public.
THINK: A SOFTWARE FRAMEWORK FOR COMPONENT-BASED OPERATING SYSTEM KERNELS
Jean-Philippe Fassino, France Télécom R&D; Jean-Bernard Stefani, INRIA; Julia Lawall, DIKU; Gilles Muller, INRIA
This presentation discussed the need for component-based operating systems and corresponding structures to ensure flexibility for arbitrary-sized systems. Think provides a binding model that maps uniform components for OS developers and architects to follow, ensuring consistent implementations for an arbitrary system size. However, the goal is not to force developers into a predefined kernel but to promote the use of certain components in varied ways.
Think's concentration is primarily on embedded systems, where short development time is necessary but is constrained by rigorous needs and limited resources. The need to build flexible systems in such an environment can be costly and implementation-specific, but Think creates an environment supported by the ability to dynamically load type-safe components. This allows for more flexible systems that retain functionality because of Think's ability to bind fine-grained components. Benchmarks revealed that dedicated micro-kernels can show performance improvements in comparison to monolithic kernels. Notable future work includes building components for low-end appliances and the development of a real-time OS component library.
BUILDING SERVICES
Summarized by Pradipta De
NINJA: A FRAMEWORK FOR NETWORK SERVICES
J. Robert von Behren, Eric A. Brewer, Nikita Borisov, Michael Chen, Matt Welsh, Josh MacDonald, Jeremy Lau, and David Culler, University of California, Berkeley
Robert von Behren presented Ninja, an ongoing project that aims to provide a framework for building robust and scalable Internet-based network services like Web hosting, instant messaging, email, and file-sharing applications. Robert drew attention to the difficulties of writing cluster applications: one has to take care of data consistency and issues of concurrency, and for robust applications there are problems related to fault tolerance. Ninja works as a wrapper to relieve the user of these problems. The goals of the project are building network services that are scalable and highly available, maintaining persistent data, providing graceful degradation, and supporting online evolution.
The use of clusters distinguishes this setup from generic distributed systems in terms of reliability and security, as well as providing a partition-free network. The next important feature of this project is the new programming model, which is more restrictive than a general programming model but still expressive enough to write most applications. This model, described as "single program, multiple connection," uses intertask parallelism instead of multithreaded concurrency and is characterized by all nodes running the same program, with connections delegated to the nodes by a centralized connection manager (CM). The CM takes care of hiding the details of mapping an external connection to an internal node. Ninja also uses a design called "staged event-driven architecture," where each service is broken down into a set of stages connected by event queues. This architecture is suitable for a modular design and helps in graceful degradation by adaptive load shedding from the event queues.
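The staged, queue-connected structure can be sketched minimally as below, with a bounded queue standing in for adaptive load shedding. The stage names and the queue limit are illustrative assumptions, not Ninja's API.

```python
# Minimal sketch of a staged event-driven pipeline: each stage has an
# event queue and a handler; a full queue sheds load instead of growing
# without bound.  Stage names and limits are invented for illustration.
from collections import deque

class Stage:
    def __init__(self, name, handler, limit):
        self.name, self.handler = name, handler
        self.queue, self.limit = deque(), limit
        self.dropped = 0

    def enqueue(self, event):
        if len(self.queue) >= self.limit:
            self.dropped += 1          # shed load at the queue boundary
        else:
            self.queue.append(event)

    def run(self, next_stage=None):
        while self.queue:
            out = self.handler(self.queue.popleft())
            if next_stage is not None:
                next_stage.enqueue(out)

results = []
parse = Stage("parse", lambda req: req.upper(), limit=2)
reply = Stage("reply", lambda req: results.append("200 " + req), limit=10)

for r in ("a", "b", "c"):   # the third request is shed at the parse stage
    parse.enqueue(r)
parse.run(next_stage=reply)
reply.run()
```

Because overload shows up as explicit queue growth at one stage, the shedding decision is local, which is what makes the degradation graceful.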
The presentation concluded with examples of the implementation and evaluation of a Web server and an email system called NinjaMail, showing the ease of authoring and the efficacy of using the Ninja framework for service development.
USING COHORT-SCHEDULING TO ENHANCE SERVER PERFORMANCE
James R. Larus and Michael Parkes, Microsoft Research
James Larus presented a new scheduling policy for increasing the throughput of server applications. Cohort scheduling batches the execution of similar operations arising in different server requests. The usual programming paradigm in handling server requests is to spawn multiple concurrent threads and switch from one thread to another whenever a thread gets blocked for I/O. Since the threads in a server mostly execute unrelated pieces of code, the locality of reference is reduced, and hence the effectiveness of different caching mechanisms. One way to solve this problem is to throw in more hardware, but Larus presented a complementary view in which program behavior is investigated and used to improve performance.
The problem in this scheme is to identify pieces of code that can be batched together for processing. One simple way is to look at the next program-counter values and use them to club threads together. However, this talk presented a new programming abstraction, "staged computation," which replaces the thread model with "stages." A stage is an abstraction for operations with similar behavior and locality. The StagedServer library can be used for programming in this model. It is a collection of C++ classes that implement staged computation and cohort scheduling on either a uniprocessor or multiprocessor, and it can be used to define stages.
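The difference between the two execution orders can be sketched abstractly. This toy comparison only shows the ordering cohort scheduling aims for; it is not the StagedServer library's API, and the stage and request names are invented.

```python
# Contrast of execution orders on the same work items.  Both run every
# (stage, request) pair; only the order differs, and the order is what
# drives cache locality.

def thread_order(requests, stages):
    # Thread-per-request style: each request runs all of its stages in
    # turn, so the processor keeps jumping between unrelated code.
    return [(s, r) for r in requests for s in stages]

def cohort_order(requests, stages):
    # Cohort style: run the same stage across all requests back to
    # back, so that stage's code and data stay hot in the caches.
    return [(s, r) for s in stages for r in requests]

stages = ["parse", "work", "respond"]
threaded = thread_order(["r1", "r2"], stages)
cohorted = cohort_order(["r1", "r2"], stages)
```

Both schedules do identical work; the cohort one simply groups the work so the same instructions and data are reused before being evicted.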
The presentation showed an experimental evaluation of cohort scheduling against the thread-based model by implementing two servers: a Web server, which is mainly I/O bound, and a publish-subscribe server, which is mainly compute bound. The SURGE benchmark was used for the first experiment and the Fabret workload for the second. The results showed that the cohort-scheduling-based implementation gave better throughput than the thread-based implementation at high loads.
NETWORK PERFORMANCE
Summarized by Xiaobo Fan
ETE: PASSIVE END-TO-END INTERNET SERVICE PERFORMANCE MONITORING
Yun Fu and Amin Vahdat, Duke University; Ludmila Cherkasova and Wenting Tang, HP Labs
This paper won the Best Student Paper award.
Ludmila Cherkasova began by listing several questions most Web service providers want answered in order to improve service quality. She reviewed the difficulties of making accurate and efficient end-to-end Web service measurements and the shortcomings of currently available approaches. The authors propose a passive trace-based architecture called EtE to monitor Web server performance on behalf of end users.
The first step is to collect network packets passively. The second module reconstructs all TCP connections and extracts HTTP transactions. To reconstruct Web page accesses, they first build a knowledge base indexed by client IP and URL and then group objects to the related Web pages they are embedded in. Statistical analysis is used to handle non-matched objects. EtE Monitor can generate three groups of metrics to measure Web service performance: response time, Web caching, and page abortion.
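One way to picture the page-reconstruction step is to group object requests under the page named by their referer, within a think-time window. The field names and the one-second window below are assumptions for illustration, not EtE's actual heuristics.

```python
# Hedged sketch of grouping object-level HTTP requests into page
# accesses.  Each request is (client, url, referer, time); a referer of
# None marks the container (HTML) fetch itself.  The WINDOW value is an
# invented think-time bound, not a figure from the paper.
WINDOW = 1.0

def group_pages(requests):
    """Return {(client, page_url): [embedded object urls]}."""
    pages, last_seen = {}, {}
    for client, url, referer, t in sorted(requests, key=lambda r: r[3]):
        if referer is None:                 # an HTML fetch opens a new page
            pages[(client, url)] = []
            last_seen[(client, url)] = t
        else:
            key = (client, referer)
            if key in pages and t - last_seen[key] <= WINDOW:
                pages[key].append(url)      # object joins its page access
                last_seen[key] = t
            # otherwise: a non-matched object, left to statistical analysis
    return pages

reqs = [("c1", "/index.html", None, 0.0),
        ("c1", "/logo.gif", "/index.html", 0.2),
        ("c1", "/style.css", "/index.html", 0.4)]
pages = group_pages(reqs)
```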
To demonstrate the benefits of EtE Monitor, Cherkasova talked about two case studies and, based on the calculated metrics, gave some insightful explanations about what's happening behind the variations in Web performance. Through validation, they claim their approach provides a very close approximation to the real scenario.
THE PERFORMANCE OF REMOTE DISPLAY MECHANISMS FOR THIN-CLIENT COMPUTING
S. Jae Yang, Jason Nieh, Matt Selsky, and Nikhil Tiwari, Columbia University
Noting the trend toward thin-client computing, the authors compared different techniques and design choices by measuring the performance of six popular thin-client platforms: Citrix MetaFrame, Microsoft Terminal Services, Sun Ray, Tarantella, VNC, and X. After pointing out several challenges in benchmarking thin clients, Yang proposed slow-motion benchmarking to achieve non-invasive packet monitoring and consistent visual quality. Basically, they insert delays between separate visual events in the benchmark applications on the server side so that the client's display update can catch up with the server's processing speed. The experiments were conducted on an emulated network over a range of network bandwidths.
Their results show that thin clients can provide good performance for Web applications in LAN environments, but only some platforms performed well on the video benchmark. Pixel-based encoding may achieve better performance and bandwidth efficiency than high-level graphics. Display caching and compression should be used with care.
A MECHANISM FOR TCP-FRIENDLY TRANSPORT-LEVEL PROTOCOL COORDINATION
David E. Ott and Ketan Mayer-Patel, University of North Carolina, Chapel Hill
A revised transport-level protocol optimized for cluster-to-cluster (C-to-C) applications is what David Ott tried to explain in his talk. A C-to-C application class is identified as one set of processes communicating with another set of processes across a common Internet path. The fundamental problem with C-to-C applications is how to coordinate all C-to-C communication flows so that they share a consistent view of the common C-to-C network, adapt to changing network conditions, and cooperate to meet specific requirements. Aggregation points (APs) are placed at the first and last hops of the common data path to probe network conditions (latency, bandwidth, loss rate, etc.). To carry and transfer this information, a new protocol, the Coordination Protocol (CP), is inserted between the network layer (IP) and the transport layer (TCP, UDP, etc.). Ott illustrated how the CP header is updated and used when packets originate from a source and traverse the local and remote APs to arrive at their destination, and how an AP maintains a per-cluster state table and detects network conditions. Through simulation results, this coordination mechanism appears effective in sharing common network resources among C-to-C communication flows.
STORAGE SYSTEMS
Summarized by Praveen Yalagandula
MY CACHE OR YOURS? MAKING STORAGE MORE EXCLUSIVE
Theodore M. Wong, Carnegie Mellon University; John Wilkes, HP Labs
Theodore Wong explained the inefficiency of current "inclusive" caching schemes in storage area networks: when a client accesses a block, the block is read from the disk and is cached both at the disk array cache and at the client's cache. He then presented the concept of "exclusive" caching, where a block accessed by a client is cached only in that client's cache. On eviction from the client's cache, a DEMOTE operation moves the data block to the tail of the array cache.
The new exclusive caching schemes are evaluated on both single-client and multiple-client systems. The single-client results are presented for two different types of synthetic workloads: Random (transaction-processing-type workloads) and Zipf (typical Web workloads). The exclusive policy was quite effective in achieving higher hit rates and lower read latencies. They also showed a 2.2-times hit-rate improvement over inclusive techniques in the case of a real-life workload, httpd, with a single-client setting.
The DEMOTE scheme also performed well in the case of multiple-client systems when the data accessed by clients is disjoint. In the case of shared-data workloads, this scheme performed worse than the inclusive schemes. Theodore then presented an adaptive exclusive caching scheme: a block accessed by a client is placed at the tail of the client's cache and also in the array cache at an appropriate place determined by the popularity of the block. The popularity of the block is measured by maintaining a ghost cache to accumulate the number of times each block is accessed. This new adaptive scheme achieved a maximum of 52% speedup in mean latency in the experiments with real-life workloads.
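The basic DEMOTE idea can be sketched with two small LRU caches. The cache sizes and access sequence are invented, and the adaptive, ghost-cache machinery described above is omitted.

```python
# Sketch of exclusive caching with DEMOTE: a block lives in the client
# cache OR the array cache, never both; the client's eviction victim is
# demoted into the array cache instead of being discarded.
from collections import OrderedDict

class LRU:
    def __init__(self, size):
        self.size, self.d = size, OrderedDict()

    def touch(self, b):                 # insert/refresh; returns evicted block
        if b in self.d:
            self.d.move_to_end(b)
        else:
            self.d[b] = True
        if len(self.d) > self.size:
            return self.d.popitem(last=False)[0]
        return None

client, array = LRU(2), LRU(2)

def read(block):
    if block in client.d:
        client.touch(block)
        return "client hit"
    if block in array.d:
        del array.d[block]              # exclusivity: block leaves the array
        result = "array hit"
    else:
        result = "disk read"
    victim = client.touch(block)
    if victim is not None:
        array.touch(victim)             # DEMOTE the victim into the array
    return result

results = [read(b) for b in (1, 2, 3, 1)]
```

With an inclusive scheme the array cache would hold the same hot blocks as the client and the final access would still go to disk; here the demoted block turns it into an array hit.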
For more information, visit http://www.cs.cmu.edu/~tmwong/research and http://www.hpl.hp.com/research/itc/csl/ssp.
BRIDGING THE INFORMATION GAP IN STORAGE PROTOCOL STACKS
Timothy E. Denehy, Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau, University of Wisconsin, Madison
Currently, there is a huge information gap between storage systems and file systems. The interface exposed by storage systems to file systems, based on blocks and providing only read/write operations, is very narrow. This leads to poor performance as a whole, because of duplicated functionality in both systems and reduced functionality resulting from storage systems' lack of file information.
The speaker presented two enhancements: ExRAID, an exposed RAID, and ILFS, an informed log-structured file system. ExRAID is an enhancement to the block-based RAID storage system that exposes three types of information to the file system: (1) regions, contiguous portions of the address space comprising one or multiple disks; (2) performance information about queue lengths and throughput of the regions, revealing disk heterogeneity to the file system; and (3) failure information, dynamically updated information conveying the number of tolerable failures in each region.
ILFS allows online incremental expansion of the storage space, performs dynamic parallelism using ExRAID's performance information to perform well on heterogeneous storage systems, and provides a range of different mechanisms, with different granularities, for file redundancy using ExRAID's region failure characteristics. These new features were added to LFS with only a 19% increase in code size.
More information: http://www.cs.wisc.edu/wind
MAXIMIZING THROUGHPUT IN REPLICATED DISK STRIPING OF VARIABLE BIT-RATE STREAMS
Stergios V. Anastasiadis, Duke University; Kenneth C. Sevcik and Michael Stumm, University of Toronto
There is an increasing demand for the continuous real-time streaming of media files. Disk striping is a common technique used for supporting many connections on media servers. But even with 1.2 million hours of mean time between failures, with 7,000 disks there will be more than one disk failure per week. Fault tolerance can be achieved by using data replication and reserving extra bandwidth during normal operation. This work focused on supporting the most common variable bit-rate stream formats (e.g., MPEG).
The authors used the prototype media server EXEDRA to implement and evaluate different replication and bandwidth reservation schemes. The system supports a variable bit-rate scheme, does stride-based disk allocation, and is capable of supporting variable-grain disk striping. Two replication policies are presented: deterministic, in which data blocks are replicated in a round-robin fashion on the secondary replicas, and random, in which data blocks are replicated on randomly chosen secondary replicas. Two bandwidth reservation techniques are also presented: mirroring reservation, in which disk bandwidth is reserved for both the primary and the replicas of the media file during playback, and minimum reservation, a more efficient scheme in which bandwidth is reserved only for the sum of the primary data access time and the maximum of the backup data access times required in each round.
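The two reservation formulas can be made concrete with a small worked example; the per-round access times below are invented numbers, not measurements from the paper.

```python
# Worked example of the two bandwidth reservation policies, using
# made-up per-round disk access times (seconds of disk time per round)
# for a three-disk stripe.
primary = [0.30, 0.10, 0.20]   # time to read each disk's primary data
backup = [0.25, 0.15, 0.10]    # time to read its backup data on failure

# Mirroring reservation: reserve for the primaries AND all replicas
# during playback.
mirroring = sum(primary) + sum(backup)

# Minimum reservation: reserve only the total primary access time plus
# the worst single backup access time a round could require.
minimum = sum(primary) + max(backup)
```

Under these numbers the minimum scheme reserves 0.85s of disk time per round versus 1.10s for mirroring, which is the headroom that lets it admit more streams.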
Experimental results showed that deterministic replica placement is better than random placement for small disks, minimum disk bandwidth reservation achieves twice the throughput of mirroring, and fault tolerance can be achieved with minimal impact on throughput.
TOOLS
Summarized by Li Xiao
SIMPLE AND GENERAL STATISTICAL PROFILING WITH PCT
Charles Blake and Steven Bauer, MIT
This talk introduced the Profile Collection Toolkit (PCT), a sampling-based CPU profiling facility. A novel aspect of PCT is that it allows sampling of semantically rich data such as function call stacks, function parameters, local or global variables, CPU registers, or other execution contexts. The design objectives of PCT were driven by user needs and by the inadequacies or inaccessibility of prior systems.
The rich data collection capability is achieved via a debugger-controller program, dbctl. The talk then introduced the profiling collector and data-collection activation. PCT file-format options provide flexibility and various ways to store exact states. PCT also has analysis options; two examples were given, "Simple" and "Mixed User and Linux Code."
The talk then went into how general sampling helps in the following areas: a debugger-controller state machine, example script fragments, a simple example program, and general value reports in terms of call-path histograms, numeric value histograms, and data generality. Related work such as gprof and Expect was then summarized. The contribution of this work is to provide a portable, general value-sampling tool. Its limitations are that PCT offers little support for stripped binaries and that count inaccuracies occur because of its statistical nature; given this, building with -g is preferred over stripping.
Possible future directions included supporting more debuggers, such as dbx and xdb, script generators for other kinds of traces, more canned reports for general values, and a libgdb-based library sampler. PCT is available at http://pdos.lcs.mit.edu/pct and runs on almost any UNIX-like system.
ENGINEERING A DIFFERENCING AND COMPRESSION DATA FORMAT
David G. Korn and Kiem-Phong Vo, AT&T Laboratories – Research
This talk began with an equation: differencing + compression = delta compression. Compression removes information redundancy in a data set; examples are gzip, bzip, compress, and pack. Differencing encodes the differences between two data sets; examples are diff -e, fdelta, bdiff, and xdelta. Delta compression compresses a data set given another one, combining differencing and compression, and reduces to pure compression when there is no commonality.
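The equation can be demonstrated with zlib's preset-dictionary feature, which is not Vcdiff but illustrates the same idea: compressing a new version against an old one costs far less than compressing it alone, and degenerates to plain compression when the two share nothing.

```python
# Delta compression in the equation's sense, sketched with zlib's
# preset dictionary: content shared with the reference version becomes
# cheap back-references.  (This illustrates the concept; Vcdiff's
# actual encoding format is different.)
import random
import zlib

def delta_compress(target: bytes, reference: bytes) -> bytes:
    c = zlib.compressobj(zdict=reference)
    return c.compress(target) + c.flush()

def delta_decompress(blob: bytes, reference: bytes) -> bytes:
    d = zlib.decompressobj(zdict=reference)
    return d.decompress(blob)

random.seed(0)
old = bytes(random.randrange(256) for _ in range(4096))   # "version 1"
new = old[:2048] + b"INSERTED TEXT" + old[2048:]          # a small edit

delta = delta_compress(new, old)   # differencing + compression
plain = zlib.compress(new)         # pure compression, no reference
```

Because `old` here is incompressible random data, plain compression saves almost nothing, while the delta shrinks to a handful of copy instructions plus the inserted bytes.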
After presenting an overall scheme for a delta compressor and showing delta compression performance, the talk discussed the encoding format of Vcdiff, newly designed for delta compression. A Vcdiff instruction-code table consists of 256 entries, each coding up to a pair of instructions, and encodes one-byte indices and any additional data.
The talk then showed Vcdiff's performance on Web data collected from CNN, compared with the W3C standard Gdiff, Gdiff+gzip, and diff+gzip, where Gdiff was computed using Vcdiff delta instructions. The results of two experiments, "First" and "Successive," were presented: in "First," each file is compressed against the first file collected; in "Successive," each file is compressed against the one from the previous hour. diff+gzip did not work well because diff is line-oriented; Vcdiff performed favorably compared with the other formats.
Vcdiff is part of the Vcodex package. Vcodex is a platform for all common data transformations: delta compression, plain compression, encryption, and transcoding (e.g., uuencode, base64). It is structured in three layers for maximum usability; the base library uses Disciplines and Methods interfaces.
The code for Vcdiff can be found at http://www.research.att.com/sw/tools. They are moving Vcdiff to the IETF standard as a comprehensive platform for transforming data; please refer to http://www.ietf.org/internet-drafts/draft-korn-vcdiff-06.txt.
WHERE IN THE NET
Summarized by Amit Purohit
A PRECISE AND EFFICIENT EVALUATION OF THE PROXIMITY BETWEEN WEB CLIENTS AND THEIR LOCAL DNS SERVERS
Zhuoqing Morley Mao, University of California, Berkeley; Charles D. Cranor, Fred Douglis, Michael Rabinovich, Oliver Spatscheck, and Jia Wang, AT&T Labs – Research
Content Distribution Networks (CDNs) attempt to improve Web performance by delivering Web content to end users from servers located at the edge of the network. When a Web client requests content, the CDN dynamically chooses a server to route the request to, usually the one that is nearest the client. As CDNs have access only to the IP address of the local DNS server (LDNS) of the client, the CDN's authoritative DNS server maps the client's LDNS to a geographic region within a particular network and combines that with network and server load information to perform CDN server selection.
This method has two limitations. First, it is based on the implicit assumption that clients are close to their LDNS, which may not always be valid. Second, a single request from an LDNS can represent a differing number of Web clients, called the hidden load factor; knowledge of the hidden load factor can be used to achieve better load distribution.
The talk dealt with the extent of the first limitation and its impact on CDN server selection. To determine the associations between clients and their LDNS, a simple, non-intrusive, and efficient mapping technique was developed. The data collected was used to study the impact of proximity on DNS-based server selection using four different proximity metrics. (1) Autonomous system (AS) clustering, observing whether a client is in the same AS as its LDNS, concluded that LDNS is very good for coarse-grained server selection, as 64% of the associations belong to the same AS. (2) Network clustering, observing whether a client is in the same network-aware cluster (NAC) as its LDNS, implied that DNS is less useful for finer-grained server selection, since only 16% of clients and LDNS are in the same NAC. (3) Traceroute divergence, the length of the divergent paths to the client and its LDNS from a probe point using traceroute, implies that most clients are topologically close to their LDNS as viewed from a randomly chosen probe site. (4) Round-trip time (RTT) correlation (some CDNs select servers based on the RTT between the CDN server and the client's LDNS) examines the correlation between the message RTTs from a probe point to the client and to its local DNS; the results imply this is a reasonable metric to use to avoid really distant servers.
A study of the impact that client-LDNS associations have on DNS-based server selection concludes that knowing the client's IP address would allow more accurate server selection in a large number of cases. The optimality of the server selection also depends on server density, placement, and selection algorithms.
Further information can be found at http://www.eecs.berkeley.edu/~zmao/myresearch.html or by contacting zmao@eecs.berkeley.edu.
GEOGRAPHIC PROPERTIES OF INTERNET ROUTING
Lakshminarayanan Subramanian, Venkata N. Padmanabhan, Microsoft Research; Randy H. Katz, University of California at Berkeley
Geographic information can provide insights into the structure and functioning of the Internet, including interactions between different autonomous systems, by analyzing certain properties of Internet routing. It can be used to measure and quantify routing properties such as circuitous routing, hot-potato routing, and geographic fault tolerance.
Traceroute has been used to gather the required data, and the GeoTrack tool has been used to determine the location of the nodes along each network path. This enables the computation of "linearized distances," the sum of the geographic distances between successive pairs of routers along the path.
In order to measure the circuitousness of a path, a metric called the "distance ratio" has been defined as the ratio of the linearized distance of a path to the geographic distance between the source and destination of the path. From the data it has been observed that the circuitousness of a route depends on both the geographic and network locations of the end hosts. A large value of the distance ratio enables us to flag paths that are highly circuitous, possibly (though not necessarily) because of routing anomalies. It is also shown that the minimum delay between end hosts is strongly correlated with the linearized distance of the path.
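The distance ratio is straightforward to compute once router locations are known. The coordinates below are made-up router positions chosen to exaggerate circuitousness, not GeoTrack output.

```python
# Distance ratio = linearized distance (sum over successive router
# pairs) / geographic source-to-destination distance.  A ratio near 1
# means a direct route; a large ratio flags a circuitous one.
from math import radians, sin, cos, asin, sqrt

def great_circle_km(a, b):
    """Haversine distance between two (lat, lon) points, in km."""
    (la1, lo1), (la2, lo2) = [(radians(x), radians(y)) for x, y in (a, b)]
    h = sin((la2 - la1) / 2) ** 2 + cos(la1) * cos(la2) * sin((lo2 - lo1) / 2) ** 2
    return 2 * 6371 * asin(sqrt(h))

def distance_ratio(hops):
    linearized = sum(great_circle_km(hops[i], hops[i + 1])
                     for i in range(len(hops) - 1))
    return linearized / great_circle_km(hops[0], hops[-1])

# A deliberately circuitous path: Seattle -> New York -> Los Angeles.
path = [(47.6, -122.3), (40.7, -74.0), (34.1, -118.2)]
ratio = distance_ratio(path)   # well above 1, flagging circuitousness
```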
Geographic information can be used to study various aspects of wide-area Internet paths that traverse multiple ISPs. It was found that end-to-end Internet paths tend to be more circuitous than intra-ISP paths, the cause being the peering relationships between ISPs. Also, paths traversing substantial distances within two or more ISPs tend to be more circuitous than paths largely traversing only a single ISP. Another finding is that ISPs generally employ hot-potato routing.
Geographic information can also be used to capture the fact that two seemingly unrelated routers can be susceptible to correlated failures. Using the geographic information of routers, we can construct a geographic topology of an ISP and, from it, find the tolerance of an ISP's network to total network failure in a geographic region.
For further information, contact lakme@cs.berkeley.edu.
PROVIDING PROCESS ORIGIN INFORMATION TO AID IN NETWORK TRACEBACK
Florian P. Buchholz, Purdue University; Clay Shields, Georgetown University
Network traceback is currently limited because host audit systems do not maintain enough information to match incoming network traffic to outgoing network traffic. The talk presented an alternative: assigning origin information to every process and logging it during interactive login creation.
The current implementation concentrates mainly on interactive sessions, in which an event is logged when a new connection is established using SSH or Telnet. The method is effective and could successfully determine stepping stones and reliably detect the source of a DDoS attack. The speaker then talked about the related work done in this area, which led to a discussion of the latest developments in the area of "host causality."
The speaker ended with the limitations and future directions of the framework. The framework could be extended and could find many applications in the area of security. The current implementation doesn't take care of startup scripts and cron jobs, but incorporating the origin information into the file system could solve this problem. In the current implementation, logging is implemented only as a proof of concept; making it safe in various ways could be another important aspect of future work.
PROGRAMMING
Summarized by Amit Purohit
CYCLONE: A SAFE DIALECT OF C
Trevor Jim, AT&T Labs – Research; Greg Morrisett, Dan Grossman, Michael Hicks, James Cheney, and Yanling Wang, Cornell University
Cyclone is designed to provide protection against attacks such as buffer overflows, format string attacks, and memory management errors. The current structure of C allows programmers to write vulnerable programs. Cyclone extends C so that it has the safety guarantees of Java while keeping C's syntax, types, and semantics untouched.
The Cyclone compiler performs static analysis of the source code and inserts runtime checks into the compiled output at places where the analysis cannot determine that the code execution will not violate safety constraints. Cyclone imposes many restrictions to preserve safety, such as NULL checks; these checks do not exist in normal C.
The speaker then talked about some sample code written in Cyclone and how it tackles safety problems. Porting an existing C application to Cyclone is fairly easy, with fewer than 10% of lines requiring change. The current implementation concentrates more on safety than on performance, so the performance penalty is substantial. Cyclone was able to find many lingering bugs. The final step of the project will be to write a compiler that converts a normal C program into an equivalent Cyclone program.
COOPERATIVE TASK MANAGEMENT WITHOUT MANUAL STACK MANAGEMENT
Atul Adya, Jon Howell, Marvin Theimer, William J. Bolosky, and John R. Douceur, Microsoft Research
The speaker described the definitions and motivations behind the distinct concepts of task management, stack management, I/O management, conflict management, and data partitioning. Conventional concurrent programming uses preemptive task management and exploits the automatic stack management of a standard language. In the second approach, cooperative tasks are organized as event handlers that yield control by returning control to the event scheduler, manually unrolling their stacks. In this project, the authors adopted a hybrid approach that makes it possible for both stack management styles to coexist; programmers can thus code assuming one or the other of these stack management styles, depending upon the application. The speaker also gave a detailed example of how to use their mechanism to switch between the two styles.
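The cooperative, yield-at-blocking-points style reads naturally as Python generators. This sketch shows only the manual-yield flavor of cooperative task management, not the paper's hybrid mechanism.

```python
# Cooperative task management sketched with generators: each task is an
# event-handler-like unit that yields at its would-be blocking points,
# and a tiny scheduler resumes tasks round-robin.  Yielding plays the
# role of manually returning control to the event scheduler.
from collections import deque

def task(name, steps, log):
    for i in range(steps):
        log.append(f"{name}:{i}")
        yield                      # cooperative yield point

def scheduler(tasks):
    ready = deque(tasks)
    while ready:
        t = ready.popleft()
        try:
            next(t)                # run until the task's next yield
            ready.append(t)        # still has work: requeue it
        except StopIteration:
            pass                   # task finished

log = []
scheduler([task("a", 2, log), task("b", 2, log)])
```

The generator keeps its local state across yields, which is exactly the convenience of automatic stack management that plain event handlers give up.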
The talk then continued with the implementation. The authors were able to preempt many subtle concurrency problems by using cooperative task management; paying a cost up front to reduce a subtle race condition proved to be a good investment.
Though the choice of task management is fundamental, the choice of stack management can be left to individual taste. This project enables the use of any type of stack management in conjunction with cooperative task management.
IMPROVING WAIT-FREE ALGORITHMS FOR INTERPROCESS COMMUNICATION IN EMBEDDED REAL-TIME SYSTEMS
Hai Huang, Padmanabhan Pillai, and Kang G. Shin, University of Michigan
The main characteristic of a real-time, time-sensitive system is its predictable response time, but concurrency management creates hurdles to achieving this goal because of the use of locks to maintain consistency. To solve this problem, many wait-free algorithms have been developed, but these are typically high-cost solutions. By taking advantage of the temporal characteristics of the system, however, the time and space overhead can be reduced.
The speaker presented an algorithm for temporal concurrency control and applied this technique to improve three wait-free algorithms. A single-writer multiple-reader wait-free algorithm and a double-buffer algorithm were proposed.
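The double-buffer idea can be sketched as follows. This shows only the core buffer-flip, an assumption about the flavor of such algorithms, and ignores the atomicity and memory-ordering guarantees a real embedded implementation must provide.

```python
# Sketch of a double-buffer, single-writer scheme: the writer fills the
# inactive buffer and then flips an index; readers always read the
# buffer the index names.  Neither side ever takes a lock or blocks.
class DoubleBuffer:
    def __init__(self):
        self.buf = [None, None]
        self.active = 0            # index readers use

    def write(self, value):        # single writer only
        inactive = 1 - self.active
        self.buf[inactive] = value # fill the buffer no reader is named to
        self.active = inactive     # the flip publishes the new value

    def read(self):                # any number of readers, never blocks
        return self.buf[self.active]

db = DoubleBuffer()
db.write("sample-1")
db.write("sample-2")
```

In C for a real-time system, the flip would be a single atomic store with release semantics; the space savings the paper reports come from bounding, via timing analysis, how many such buffers are actually needed.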
Using their transformation mechanism, they achieved an improvement of 17-66% in ACET and a 14-70% reduction in memory requirements for IPC algorithms. The mechanism is extensible and can be applied to other non-blocking IPC algorithms as well. Future work involves reducing the synchronization overheads in more general IPC algorithms with multiple-writer semantics.
MOBILITY
Summarized by Praveen Yalagandula
ROBUST POSITIONING ALGORITHMS FOR
DISTRIBUTED AD-HOC WIRELESS SENSOR
NETWORKS
Chris Savarese, Jan Rabaey, University of California, Berkeley; Koen Langendoen, Delft University of Technology
The Pico Radio Network comprises more than a hundred sensors, monitors, and actuators equipped with wireless connectivity, with the following properties: (1) no infrastructure; (2) computation in a distributed fashion instead of centralized computation; (3) dynamic topology; (4) limited radio range; (5) nodes that can act as repeaters; (6) devices of low cost and with low power; and (7) sparse anchor nodes – nodes with GPS capability. A positioning algorithm determines the geographical position of each node in the network.
In two-dimensional space, each node needs three reference positions to estimate its geographical position. There are two problems that make positioning difficult in the Pico Radio Network setting: (1) a sparse anchor node problem, and (2) a range error problem.
The two-phase approach that the authors have taken in solving the problem consists of (1) a hop-terrain algorithm – in this first phase, each node roughly guesses its location by the distance calculated using multi-hops to the anchor points; and (2) a refinement algorithm – in this second phase, each node uses its neighbors' positions to refine its own position estimate. To guarantee convergence, this approach uses confidence metrics, assigning a value of 1.0 for anchor nodes and 0.1 to start for other nodes, and increasing these with each iteration.
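A one-dimensional toy (numbers invented, not from the paper) shows the flavor of the refinement phase: each iteration replaces the node's estimate with a confidence-weighted combination of the positions implied by its references, and the node's own confidence grows toward that of an anchor.

```python
def refine(position, confidence, references):
    # references: (ref_position, ref_confidence, measured_range, side)
    # where side is +1/-1 for which side of the reference the node lies.
    estimates, weights = [], []
    for ref_pos, ref_conf, rng, side in references:
        estimates.append(ref_pos + side * rng)   # position implied by ref
        weights.append(ref_conf)                 # trust high-confidence refs
    new_pos = sum(e * w for e, w in zip(estimates, weights)) / sum(weights)
    new_conf = min(1.0, confidence + 0.1)        # grows each iteration
    return new_pos, new_conf

# Node starts with a rough hop-based guess (3.0) and low confidence (0.1);
# anchors sit at 0 and 10 (confidence 1.0), measured at ranges 4 and 6.
pos, conf = 3.0, 0.1
for _ in range(3):
    pos, conf = refine(pos, conf, [(0.0, 1.0, 4.0, +1), (10.0, 1.0, 6.0, -1)])
```

The real algorithm works in 2-D with noisy ranges and many neighbors, so convergence is not this immediate; the confidence weighting is what keeps badly placed nodes from dragging their neighbors along.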
A simulation tool, OMNeT++, was used for both phases and in various scenarios. The results show that they achieved position errors of less than 33% in a scenario with 5% anchor nodes, an average connectivity of 7, and 5% range measurement error.
Guidelines for anchor node deployment are high connectivity (>10), a reasonable fraction of anchor nodes (>5%), and, for anchor placement, covered edges.
APPLICATION-SPECIFIC NETWORK
MANAGEMENT FOR ENERGY-AWARE
STREAMING OF POPULAR MULTIMEDIA
FORMATS
Surendar Chandra, University of Georgia, and Amin Vahdat, Duke University
The main hindrance in supporting the increasing demand for mobile multimedia on PDAs is the battery capacity of these small devices. The idle-time power consumption is almost the same as the receive power consumption on typical wireless network cards (1319mW vs. 1425mW), while the sleep state consumes far less power (177mW). To let the system consume energy proportional to the stream quality, the network card should transition to the sleep state aggressively between each packet.
The speaker presented previous work where different multimedia streams were studied for PDAs: MS Media, Real Media, and QuickTime. It was found that if the inter-packet gap is predictable, then huge savings are possible; for example, about 80% savings in MS Media streams with few packet losses. A limitation of the IEEE 802.11b power-save mode comes into play when there are two or more nodes, producing a delay between beacon time and when the node receives/sends packets. This badly affects the streams, since higher energy is consumed while waiting.
The authors propose traffic shaping for energy conservation, where this is done by proxy such that packets arrive at regular intervals. This is achieved using a local proxy in the access point and a client-side proxy in the mobile host. Simulations show that traffic shaping reduces energy consumption and also reveal that the higher-bandwidth streams have a lower energy metric (mJ/kB).
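The power figures quoted above make the argument easy to check with arithmetic. This sketch (the session lengths are invented) compares a session whose inter-packet gaps are spent idling against one that sleeps between packets:

```python
# Card power states as quoted above, in milliwatts.
IDLE_MW, RECV_MW, SLEEP_MW = 1319, 1425, 177

def session_energy_mj(recv_s, gap_s, sleep_between_packets):
    # Energy = power * time; mW * s = mJ.
    gap_power = SLEEP_MW if sleep_between_packets else IDLE_MW
    return RECV_MW * recv_s + gap_power * gap_s

# Assumed example: 10 s spent receiving, 90 s of inter-packet gap.
idle_mj = session_energy_mj(10, 90, sleep_between_packets=False)
sleep_mj = session_energy_mj(10, 90, sleep_between_packets=True)
savings = 1 - sleep_mj / idle_mj
```

With these assumed numbers the sleeping card uses under a quarter of the energy (roughly 77% savings), consistent with the large savings reported above for streams with predictable gaps.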
More information is at http://greenhouse.cs.uga.edu.
CHARACTERIZING ALERT AND BROWSE SERVICES OF MOBILE CLIENTS
Atul Adya, Paramvir Bahl, and Lili Qiu, Microsoft Research
Even though there is a dramatic increase in Internet access from wireless devices, there are not many studies done on characterizing this traffic. In this paper, the authors characterize the traffic observed on the MSN Web site with both notification and browse traces.
Around 33 million browsing accesses and about 325 million notification entries are present in the traces. Three types of analyses are done for each of these two services: content analysis, concerning the most popular content categories and their distribution; popularity analysis, or the popularity distribution of documents; and user behavior analysis.
The analysis of the notification logs shows that document access rates follow a Zipf-like distribution, with most of the accesses concentrated on a small number of messages, and that the accesses exhibit geographical locality – users from the same locality tend to receive similar notification content. The browser log analysis shows that a smaller set of URLs are accessed most of the time, though the access pattern does not fit any Zipf curve, and the highly accessed URLs remain stable. A correlation study between the notification and browsing services shows that wireless users have a moderate correlation of 0.12.
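A "Zipf-like" claim of this kind is usually checked by looking at the slope of the rank-frequency curve on log-log axes: under Zipf, the frequency of the rank-r item is proportional to 1/r^a, so log(freq) vs. log(rank) is roughly a straight line of slope -a. The sketch below (synthetic counts, not the MSN traces) estimates that slope by least squares:

```python
import math

def zipf_slope(freqs):
    # freqs: access counts sorted in descending order (rank 1 first).
    xs = [math.log(r) for r in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # Ordinary least-squares slope of log(freq) against log(rank).
    return (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))

# Synthetic counts drawn from an exact 1/r distribution (slope near -1).
counts = [round(10000 / r) for r in range(1, 101)]
slope = zipf_slope(counts)
```

A slope near -1 indicates a classic Zipf curve; the browse traces' failure to fit any Zipf curve would show up here as a clearly nonlinear log-log plot rather than a stable slope.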
FREENIX TRACK SESSIONS
BUILDING APPLICATIONS
Summarized by Matt Butner
INTERACTIVE 3-D GRAPHICS APPLICATIONS
FOR TCL
Oliver Kersting, Jürgen Döllner, Hasso Plattner Institute for Software Systems Engineering, University of Potsdam
The integration of 3-D image rendering functionality into a scripting language permits interactive and animated 3-D development and application without the formalities and precision demanded by low-level C/C++ graphics and visualization libraries. The large and complex C++ API of the Virtual Rendering System (VRS) can be combined with the conveniences of the Tcl scripting language. The mapping of class interfaces is done via an automated process and generates respective wrapper classes, all of which ensures complete API accessibility and functionality without any significant performance penalty. The general C++-to-Tcl mapper SWIG grants the necessary object and hierarchical abilities without the use of object-oriented Tcl extensions. The final API mapping techniques address C++ features such as classes, overloaded methods, enumerations, and inheritance relations. All are implemented in a proof-of-concept that maps the complete C++ API of the VRS to Tcl and are showcased in a complete interactive 3-D map system.
THE AGFL GRAMMAR WORK LAB
Cornelis H.A. Coster, Erik Verbruggen, University of Nijmegen (KUN)
The growth and implementation of Natural Language Processing (NLP) is the cornerstone of the continued evolution and implementation of truly intelligent search machines and services. In part due to the growing collections of computer-stored, human-readable documents in the public domain, the implementation of linguistic analysis will become necessary for desirable precision and resolution. Subsequently, such tools and linguistic resources must be released into the public domain, and they have done so with the release of the AGFL Grammar Work Lab under the GNU Public License, as a tool for linguistic research and the development of NLP-based applications.
The AGFL (Affix Grammars over a Finite Lattice) Grammar Work Lab meshes context-free grammars with finite set-valued features and is applicable to a range of languages. In computer science terms, "syntax rules are procedures with parameters and a non-deterministic execution." The English Phrases for Information Retrieval (EP4IR), released with the AGFL-GWL as a robust grammar of English, is an AGFL-GWL-generated English parser that outputs "Head/Modifier" frames. The sentences "CompanyX sponsored this conference" and "This conference was sponsored by CompanyX" both generate [CompanyX[sponsored conference]]. The hope is that public availability of such tools will encourage further development of grammar and lexical software systems.
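The active/passive example can be mimicked with a deliberately crude toy, nothing like AGFL's affix grammars; it only shows why both word orders can normalize to the same Head/Modifier frame.

```python
def to_frame(sentence):
    # Toy normalizer: recognize only "X V ... O" and "O was V by X".
    w = sentence.rstrip(".").split()
    if "was" in w and "by" in w:             # crude passive pattern
        subj = w[-1]                         # agent follows "by"
        verb = w[w.index("was") + 1]
        head_obj = w[w.index("was") - 1]     # head noun before "was"
    else:                                    # crude active pattern
        subj, verb = w[0], w[1]
        head_obj = w[-1]                     # head noun of the object
    return f"[{subj}[{verb} {head_obj}]]"

a = to_frame("CompanyX sponsored this conference")
p = to_frame("This conference was sponsored by CompanyX")
```

Both sentences reduce to the same frame, which is what makes Head/Modifier output useful for information retrieval: a query can match either surface form.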
SWILL: A SIMPLE EMBEDDED WEB SERVER
LIBRARY
Sotiria Lampoudi, David M. Beazley, University of Chicago
This paper won the FREENIX Best Student Paper award.
SWILL (Simple Web Interface Link Library) is a simple Web server in the form of a C library whose development was motivated by a wish to give cool applications an interface to the Web. The SWILL library provides a simple interface that can be efficiently implemented for tasks that vary from flexible Web-based monitoring to software debugging and diagnostics. Though originally designed to be integrated with high-performance scientific simulation software, the interface is generic enough to allow for unbounded uses. SWILL is a single-threaded server relying upon non-blocking I/O through the creation of a temporary server which services I/O requests. SWILL does not provide SSL or cryptographic authentication, but does have HTTP authentication abilities.
A fantastic feature of SWILL is its support for SPMD-style parallel applications which utilize MPI, proving valuable for Beowulf clusters and large parallel machines. Another practical application was the implementation of SWILL in a modified Yalnix emulator by University of Chicago operating systems courses, which utilized the added Yalnix functionality for OS development and debugging. SWILL requires minimal memory overhead and relies upon the HTTP/1.0 protocol.
NETWORK PERFORMANCE
Summarized by Florian Buchholz
LINUX NFS CLIENT WRITE PERFORMANCE
Chuck Lever, Network Appliance; Peter Honeyman, CITI, University of Michigan
Lever introduced a benchmark to measure NFS client write performance. Client performance is difficult to measure due to hindrances such as poor hardware or bandwidth limitations. Furthermore, measuring application performance does not identify weaknesses specifically at the client side. Thus, a benchmark was developed trying to exercise only data transfers in one direction between server and application. For this purpose, the benchmark was based on the block sequential write portion of the Bonnie file system benchmark. Once a benchmark for NFS clients is established, it can be used to improve client performance.
The performance measurements were performed with an SMP Linux client and both a Linux NFS server and a Network Appliance F85 filer. During testing, a periodic jump in write latency time was discovered. This was due to a rather large number of pending write operations that were scheduled to be written after certain threshold values were exceeded. By introducing a separate daemon that flushes the cached write requests, the spikes could be eliminated, but as a result the average latency grows over time. The problem could be traced to a function that scans a linked list of write requests. After a hashtable was added to improve lookup performance, the latency improved considerably.
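The fix described, replacing a linear scan of pending write requests with a hashtable lookup, can be sketched as follows (illustrative Python, not the Linux NFS client code; field names are invented):

```python
class PendingWrites:
    def __init__(self):
        self.requests = []      # original structure: a list, O(n) scan
        self.by_offset = {}     # added: hashtable index, O(1) lookup

    def add(self, offset, data):
        req = {"offset": offset, "data": data}
        self.requests.append(req)
        self.by_offset[offset] = req

    def find_slow(self, offset):
        # The scan that made average latency grow with the queue length.
        for req in self.requests:
            if req["offset"] == offset:
                return req
        return None

    def find_fast(self, offset):
        # Constant-time replacement for the scan.
        return self.by_offset.get(offset)

pw = PendingWrites()
for i in range(1000):
    pw.add(i * 4096, b"")
```

Both lookups return the same request, but the scan cost grows with the number of pending writes while the hashtable probe does not, which is why the latency growth disappeared.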
The improved client was then used to measure throughput against the two servers. A discrepancy between the Linux server and the filer test was noticed, and the reason for that was traced back to a global kernel lock that was unnecessarily held when accessing the network stack. After correcting this, performance improved further. However, the measurements showed that a client may run slower when paired with fast servers on fast networks. This is due to heavy client interrupt loads, more network processing on the client side, and more global kernel lock contention.
The source code of the project is available at http://www.citi.umich.edu/projects/nfs-perf/patches.
A STUDY OF THE RELATIVE COSTS OF
NETWORK SECURITY PROTOCOLS
Stefan Miltchev and Sotiris Ioannidis, University of Pennsylvania; Angelos Keromytis, Columbia University
With the increasing need for security and integrity of remote network services, it becomes important to quantify the communication overhead of IPSec and compare it to alternatives such as SSH, SCP, and HTTPS.
For this purpose, the authors set up three testing networks: a direct link, two hosts separated by two gateways, and three hosts connecting through one gateway. Protocols were compared in each setup, and manual keying was used to eliminate connection setup costs. For IPSec, the different encryption algorithms AES, DES, 3DES, hardware DES, and hardware 3DES were used. In detail, FTP was compared to SFTP, SCP, and FTP over IPSec; HTTP to HTTPS and HTTP over IPSec; and NFS to NFS over IPSec and local disk performance.
The result of the measurements was that IPSec outperforms other popular encryption schemes. Overall, unencrypted communication was fastest, but in some cases, like FTP, the overhead can be small. The use of crypto hardware can significantly improve performance. For future work, the inclusion of setup costs, hardware-accelerated SSL, SFTP, and SSH was mentioned.
CONGESTION CONTROL IN LINUX TCP
Pasi Sarolahti, University of Helsinki; Alexey Kuznetsov, Institute for Nuclear Research at Moscow
Having attended the talk and read the paper, I am still unclear about whether the authors are merely describing the design decisions of TCP congestion control or whether they are actually the creators of that part of the Linux code. My guess leans toward the former.
In the talk, the speaker compared the TCP protocol congestion control measures according to IETF and RFC specifications with the actual Linux implementation, which does conform to the basic principles but nevertheless has differences. A specific emphasis was placed on retransmission mechanisms and the congestion window. Also, several TCP enhancements – the NewReno algorithm, Selective ACKs (SACK), Forward ACKs (FACK) – were discussed and compared.
In some instances, Linux does not conform to the IETF specifications. The fast recovery does not fully follow RFC 2582, since the threshold for triggering retransmit is adjusted dynamically and the congestion window's size is not changed. Also, the round-trip-time estimator and the RTO calculation differ from RFC 2988, since Linux uses more conservative RTT estimates and a minimum RTO of 200ms. The performance measures showed that with the additional Linux-specific features enabled, slightly higher throughput, more steady data flow, and fewer unnecessary retransmissions can be achieved.
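For reference, a simplified RFC 2988-style RTO computation with the 200ms floor mentioned above looks like this (a sketch only; the actual Linux code keeps different state and units):

```python
def update_rto(srtt, rttvar, rtt_sample, min_rto=0.2):
    # Times in seconds; min_rto=0.2 models the 200ms Linux minimum.
    if srtt is None:                        # first measurement
        srtt, rttvar = rtt_sample, rtt_sample / 2
    else:
        alpha, beta = 1 / 8, 1 / 4          # RFC 2988 gains
        # RTTVAR is updated before SRTT, per the RFC.
        rttvar = (1 - beta) * rttvar + beta * abs(srtt - rtt_sample)
        srtt = (1 - alpha) * srtt + alpha * rtt_sample
    rto = max(min_rto, srtt + 4 * rttvar)   # clamp to the minimum RTO
    return srtt, rttvar, rto

# A fast LAN path (about 10ms RTTs) never gets an RTO below the floor.
srtt = rttvar = None
for sample in [0.010, 0.011, 0.009, 0.010]:
    srtt, rttvar, rto = update_rto(srtt, rttvar, sample)
```

The clamp is what makes the estimator "conservative" on fast networks: the smoothed RTT terms are tiny, so the minimum dominates and spurious retransmissions are avoided at the cost of slower loss detection.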
XTREME XCITEMENT
Summarized by Steve Bauer
THE FUTURE IS COMING: WHERE THE X WINDOW SYSTEM SHOULD GO
Jim Gettys, Compaq Computer Corp.
Jim Gettys, one of the principal authors of the X Window System, outlined the near-term objectives for the system, primarily focusing on the changes and infrastructure required to enable replication and migration of X applications. Providing better support for this functionality would enable users to retrieve or duplicate X applications between their servers at home and work.
One interesting example of an application that currently is capable of migration and replication is Emacs. To create a new frame on DISPLAY, try "M-x make-frame-on-display <Return> DISPLAY <Return>".
However, technical challenges make replication and migration difficult in general. These include the "major headaches" of server-side fonts, the nonuniformity of X servers and screen sizes, and the need to appropriately retrofit toolkits. Authentication and authorization issues are obviously also important. The rest of the talk delved into some of the details of these interesting technical challenges.
HACKING IN THE KERNEL
Summarized by Hai Huang
AN IMPLEMENTATION OF SCHEDULER
ACTIVATIONS ON THE NETBSD OPERATING
SYSTEM
Nathan J. Williams, Wasabi Systems
Scheduler activation is an old idea. Basically, there are benefits and drawbacks to using solely kernel-level or user-level threading. Scheduler activation is able to combine the two layers of control to provide more concurrency in the system.
In his talk, Nathan gave a fairly detailed description of the implementation of scheduler activation in the NetBSD kernel. One important change in the implementation is to differentiate the thread context from the process context. This is done by defining a separate data structure for these threads and relocating some of the information that was embedded in the process context to these thread contexts. The stack was especially a concern, due to the upcall: special handling must be done to make sure that the upcall doesn't mess up the stack, so that the preempted user-level thread can continue afterwards. Lastly, Nathan explained that signals are handled by upcalls.
AUTHORIZATION AND CHARGING IN PUBLIC
WLANS USING FREEBSD AND 802.1X
Pekka Nikander, Ericsson Research NomadicLab
The 802.1x standards are well known in the wireless community as link-layer authentication protocols. In this talk, Pekka explained some novel ways of using the 802.1x protocols that might be of interest to people on the move. It is possible to set up a public WLAN that would support various charging schemes via virtual tokens, which people can purchase or earn and later use.
This is implemented on FreeBSD using the netgraph utility. It is basically a filter in the link layer that differentiates traffic based on the MAC address of the client node, which is either authenticated, denied, or let through. The overhead for this service is fairly minimal.
ACPI IMPLEMENTATION ON FREEBSD
Takanori Watanabe, Kobe University
ACPI (Advanced Configuration and Power Interface) was proposed as a joint effort by Intel, Toshiba, and Microsoft to provide a standard, finer-grained method of managing the power states of individual devices within a system. Such low-level power management is especially important for those mobile and embedded systems that are powered by fixed-capacity batteries.
Takanori explained that the ACPI specification is composed of three parts: tables, BIOS, and registers. He was able to implement some of the functionality of ACPI in a FreeBSD kernel. The ACPI Component Architecture was implemented by Intel, and it provides a high-level ACPI API to the operating system. Takanori's ACPI implementation is built upon this underlying layer of APIs.
ACCESS CONTROL
Summarized by Florian Buchholz
DESIGN AND PERFORMANCE OF THE
OPENBSD STATEFUL PACKET FILTER (PF)
Daniel Hartmeier, Systor AG
Daniel Hartmeier described the new stateful packet filter (pf) that replaced IPFilter in the OpenBSD 3.0 release. IPFilter could no longer be included due to licensing issues, and thus there was a need to write a new filter making use of optimized data structures.
The filter rules are implemented as a linked list which is traversed from start to end. Two actions may be taken according to the rules, "pass" or "block": a "pass" action forwards the packet, and a "block" action will drop it. Where more than one rule matches a packet, the last rule wins. Rules that are marked as "final" will immediately terminate any further rule evaluation. An optimization called "skip-steps" was also implemented, where blocks of similar rules are skipped if they cannot match the current packet; these skip-steps are calculated when the rule set is loaded. Furthermore, a state table keeps track of TCP connections: only packets that match the sequence numbers are allowed. UDP and ICMP queries and replies are considered in the state table, where initial packets will create an entry for a pseudo-connection with a low initial timeout value. The state table is implemented using a balanced binary search tree. NAT mappings are also stored in the state table, whereas application proxies reside in user space. The packet filter is also able to perform fragment reassembly and to modulate TCP sequence numbers to protect hosts behind the firewall.
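The rule semantics described, last match wins unless a rule is marked "final", can be modeled in a few lines (an illustrative model, not the pf source; the default-pass policy and the example rules are invented):

```python
def evaluate(rules, packet):
    # rules: list of (predicate, action, final) tuples, in file order.
    decision = "pass"                   # assumed default policy
    for predicate, action, final in rules:
        if predicate(packet):
            decision = action           # last match overrides earlier ones
            if final:
                return decision         # "final" short-circuits the scan
    return decision

rules = [
    (lambda p: True,                    "block", False),  # block all...
    (lambda p: p["port"] == 80,         "pass",  False),  # ...except web
    (lambda p: p["src"] == "10.0.0.66", "block", True),   # banned host
]
assert evaluate(rules, {"src": "10.0.0.1", "port": 80}) == "pass"
assert evaluate(rules, {"src": "10.0.0.1", "port": 22}) == "block"
assert evaluate(rules, {"src": "10.0.0.66", "port": 80}) == "block"
```

Because every non-final rule must be scanned, evaluation cost grows with the rule set, which is exactly what the skip-step optimization and the state table (one cheap lookup instead of a full scan per packet) are there to avoid.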
Pf was compared against IPFilter as well as Linux's Iptables by measuring throughput and latency with increasing traffic rates and different packet sizes. In a test with a fixed rule set size of 100, Iptables outperformed the other two filters, whose results were close to each other. In a second test, where the rule set size was continually increased, Iptables consistently had about twice the throughput of the other two (which evaluate the rule set on both the incoming and outgoing interfaces). A third test compared only pf and IPFilter, using a single rule that created state in the state table, with a fixed-state entry size. Pf reached an overloaded state much later than IPFilter. The experiment was repeated with a variable-state entry size, and pf performed much better than IPFilter for a small number of states.
In general, rule set evaluation is expensive, and benchmarks only reflect extreme cases, whereas in real life other behavior should be observed. Furthermore, the benchmarks show that stateful filtering can actually improve performance, due to cheap state-table lookups as compared to rule evaluation.
ENHANCING NFS CROSS-ADMINISTRATIVE
DOMAIN ACCESS
Joseph Spadavecchia and Erez Zadok, Stony Brook University
The speaker presented modifications to an NFS server that allow improved NFS access between administrative domains. A problem lies in the fact that NFS assumes a shared UID/GID space, which makes it unsafe to export files outside the administrative domain of the server. Design goals were to leave the protocol and existing clients unchanged, a minimum amount of server changes, flexibility, and increased performance.
To solve the problem, two techniques are utilized: "range-mapping" and "file-cloaking." Range-mapping maps IDs between client and server. The mapping is performed on a per-export basis and has to be manually set up in an export file. The mappings can be 1-1, N-N, or N-1. In file-cloaking, the server restricts file access based on UID/GID and special cloaking-mask bits. Here, users can only access their own file permissions. The policy on whether or not the file is visible to others is set by the cloaking mask, which is logically ANDed with the file's protection bits. File-cloaking only works, however, if the client doesn't hold cached copies of directory contents and file attributes. Because of this, the clients are forced to re-read directories, by incrementing the mtime value of the directory each time it is listed.
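Range-mapping can be sketched as a small translation table consulted on each request (hypothetical layout and names; the real mappings are configured per export in the server's export file):

```python
class RangeMap:
    def __init__(self):
        self.ranges = []  # (client_base, server_base, length)

    def add(self, client_base, server_base, length=1):
        self.ranges.append((client_base, server_base, length))

    def to_server(self, client_id):
        # Translate a client-side UID/GID into the server's ID space.
        for cbase, sbase, length in self.ranges:
            if cbase <= client_id < cbase + length:
                return sbase + (client_id - cbase)  # N-N within a range
        return None  # unmapped IDs get no access (assumed policy)

m = RangeMap()
m.add(1000, 5000, length=100)   # clients 1000-1099 -> server 5000-5099
m.add(0, 65534)                 # 1-1: client root squashed to "nobody"
```

An N-1 mapping would simply point several client ranges at the same server ID; the per-export granularity is what lets two domains with colliding UID spaces share one server safely.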
To test the performance of the modified server, five different NFS configurations were evaluated: an unmodified NFS server was compared against one server with the modified code included but not used, one with only range-mapping enabled, one with only file-cloaking enabled, and one version with all modifications enabled. For each setup, different file system benchmarks were run. The results show only a small overhead when the modifications are used, generally an increase of below 5%. Another experiment tested the performance of the system with an increasing number of mapped or cloaked entries. The results show that an increase from 10 to 1000 entries resulted in a maximum of about a 14% cost in performance.
One member of the audience pointed out that if clients choose to ignore the changed mtimes from the server, and thus still hold caches of the directory entries, the file-cloaking mechanism could be defeated. After a rather lengthy debate, the speaker had to concede that the model doesn't add any extra security. Another question was asked about scalability of the setup of range-mapping. The speaker referred to application-level tools that could be developed for that purpose.
The software is available at ftp://ftp.fsl.cs.sunysb.edu/pub/enf.
ENGINEERING OPEN SOURCE
SOFTWARE
Summarized by Teri Lampoudi
NINGAUI A LINUX CLUSTER FOR BUSINESS
Andrew Hume, AT&T Labs – Research; Scott Daniels, Electronic Data Systems Corp.
Ningaui is a general-purpose, highly available, resilient architecture built from commodity software and hardware. Emphatically, however, Ningaui is not a Beowulf. Hume calls the cluster design the "Swiss canton model," in which there are a number of loosely affiliated, independent nodes with data replicated among them. Jobs are assigned by bidding and leases, and cluster services done as session-based servers are sited via generic job assignment. The emphasis is on keeping the architecture end-to-end, checking all work via checksums, and logging everything. The resilience and high availability required by their goal of 8-to-5 maintenance – vs. the typical 24/7 model, where people get paged whenever the slightest thing goes wrong, regardless of the time of day – is achieved by job restartability. Finally, all computation is performed on local data, without the use of NFS or network-attached storage.
Hume's message is a hopeful one: despite the many problems encountered – things like kernel and service limits, auto-installation problems, TCP storms, and the like – the existence of source code and the paranoid practice of logging and checksumming everything has helped. The final product performs reasonably well, and it appears that the resilient techniques employed do make a difference. One drawback, however, is that the software mentioned in the paper is not yet available for download.
CPCMS: A CONFIGURATION MANAGEMENT
SYSTEM BASED ON CRYPTOGRAPHIC NAMES
Jonathan S. Shapiro and John Vanderburgh, Johns Hopkins University
This paper won the FREENIX Best Paper award.
The basic notion behind the project is the fact that everyone has a pet complaint about CVS, and yet it is currently the configuration manager in most widespread use. Shapiro has unleashed an alternative. Interestingly, he did not begin but ended with the usual host of reasons why CVS is bad. The talk instead began abruptly by characterizing the job of a software configuration manager, continued by stating the namespaces which it must handle, and wrapped up with the challenges faced.
X MEETS Z: VERIFYING CORRECTNESS IN THE
PRESENCE OF POSIX THREADS
Bart Massey, Portland State University; Robert T. Bauer, Rational Software Corp.
Massey delivered a humorous talk on the insight gained from applying Z formal specification notation to system software design, rather than the more informal analysis and design process normally used.
The story is told with respect to writing XCB, which replaces the Xlib protocol layer and is supposed to be thread-friendly. But where threads are concerned, deadlock avoidance becomes a hard problem that cannot be solved in an ad hoc manner, while full model checking is too hard. In this case, Massey resorted to Z specification to model the XCB lower layer, abstract away locking and data transfers, and locate fundamental issues. Essentially, the difficulties of searching the literature and locating information relevant to the problem at hand were overcome. As Massey put it, "the formal method saved the day."
FILE SYSTEMS
Summarized by Bosko Milekic
PLANNED EXTENSIONS TO THE LINUX
EXT2/EXT3 FILESYSTEM
Theodore Y. Ts'o, IBM; Stephen Tweedie, Red Hat
The speaker presented improvements to the Linux Ext2 file system, with the goal of allowing for various expansions while striving to maintain compatibility with older code. Improvements have been facilitated by a few extra superblock fields that were added to Ext2 not long before the Linux 2.0 kernel was released. The fields allow for file system features to be added without compromising the existing setup; this is done by providing bits indicating the impact of the added features with respect to compatibility. Namely, a file system with the "incompat" bit marked is not allowed to be mounted. Similarly, a "read-only" marking would only allow the file system to be mounted read-only.
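The compatibility-bit policy amounts to a three-way mount decision. The flag names below are invented for illustration; only the policy mirrors the description above:

```python
# Hypothetical feature-flag classes (illustrative names, not the real
# Ext2 superblock flags).
INCOMPAT_FLAGS = {"dir_index_v2"}     # unknown => refuse to mount
RO_COMPAT_FLAGS = {"large_file"}      # unknown => mount read-only

def mount_decision(fs_features, kernel_features):
    # Only features the kernel does NOT understand matter.
    unknown = fs_features - kernel_features
    if unknown & INCOMPAT_FLAGS:
        return "refuse"               # "incompat" bit set
    if unknown & RO_COMPAT_FLAGS:
        return "read-only"            # "read-only"-compat bit set
    return "read-write"               # everything understood

assert mount_decision({"dir_index_v2"}, set()) == "refuse"
assert mount_decision({"large_file"}, set()) == "read-only"
assert mount_decision({"large_file"}, {"large_file"}) == "read-write"
```

The appeal of the scheme is that an old kernel degrades safely on a new file system without needing to know anything about the new feature itself, only its compatibility class.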
Directory indexing replaces linear directory searches with a faster search using a fixed-depth tree and hashed keys. File system size can be dynamically increased, and the expanded inode, doubled from 128 to 256 bytes, allows for more extensions.
Other potential improvements were discussed as well, in particular pre-allocation for contiguous files, which allows for better performance in certain setups by pre-allocating contiguous blocks. Security-related modifications, extended attributes, and ACLs were mentioned. An implementation of these features already exists but has not yet been merged into the mainline Ext2/3 code.
RECENT FILESYSTEM OPTIMISATIONS ON
FREEBSD
Ian Dowse, Corvil Networks; David Malone, CNRI, Dublin Institute of Technology
David Malone presented four important file system optimizations for the FreeBSD OS: soft updates, dirpref, vmiodir, and dirhash. It turns out that certain combinations of the optimizations (beautifully illustrated in the paper) may yield performance improvements of anywhere between 2 and 10 orders of magnitude for real-world applications.
All four of the optimizations deal with file system metadata. Soft updates allow for asynchronous metadata updates. Dirpref changes the way directories are organized, attempting to place child directories closer to their parents, thereby increasing locality of reference and reducing disk-seek times. Vmiodir trades some extra memory in order to achieve better directory caching. Finally, dirhash, which was implemented by Ian Dowse, changes the way in which entries are searched in a directory: specifically, FFS uses a linear search to find an entry by name, while dirhash builds a hashtable of directory entries on the fly. For a directory of size n with a working set of m files, a search that in certain cases could have been O(n·m) has been reduced, due to dirhash, to effectively O(n + m).
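The complexity claim can be demonstrated by counting comparisons (a sketch of the idea, not the FFS code):

```python
def lookups_linear(entries, names):
    # m lookups, each scanning up to n entries: O(n*m) in the worst case.
    ops = 0
    for name in names:
        for entry in entries:
            ops += 1
            if entry == name:
                break
    return ops

def lookups_dirhash(entries, names):
    # One pass to build the hashtable, then one probe per lookup: O(n+m).
    ops = len(entries)
    table = {e: True for e in entries}
    for name in names:
        ops += 1
        assert name in table
    return ops

entries = [f"file{i}" for i in range(1000)]
names = entries[-100:]   # unfavorable case: wanted names near the end
```

For this 1000-entry directory and 100 lookups near the end, the linear scan performs 95,050 comparisons against 1,100 operations for the hashtable, which is the gap behind the large speedups reported.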
FILESYSTEM PERFORMANCE AND SCALABILITY
IN LINUX 2.4.17
Ray Bryant, SGI; Ruth Forester, IBM LTC; John Hawkes, SGI
This talk focused on performance evaluation of a number of file systems available and commonly deployed on Linux machines. Comparisons under various configurations of Ext2, Ext3, ReiserFS, XFS, and JFS were presented.
The benchmarks chosen for the data gathering were pgmeter, filemark, and the AIM Benchmark Suite VII. Pgmeter measures the rate of data transfer of reads/writes of a file under a synthetic workload. Filemark is similar to postmark in that it is an operation-intensive benchmark, although filemark is threaded and offers various other features that postmark lacks. AIM VII measures performance for various file-system-related functionalities; it offers various metrics under an imposed workload, thus stressing the performance of the file system not only under I/O load but also under significant CPU load.
Tests were run on three different setups: a small, a medium, and a large configuration. ReiserFS and Ext2 appear at the top of the pile for smaller and medium setups. Notably, XFS and JFS perform worse for smaller system configurations than the others, although XFS clearly appears to generate better numbers than JFS. It should be noted that XFS seems to scale well under a higher load. This was most evident in the large-system results, where XFS appears to offer the best overall results.
THINGS TO THINK ABOUT
Summarized by Bosko Milekic
SPEEDING UP KERNEL SCHEDULER BY REDUCING CACHE MISSES
Shuji Yamamura, Akira Hirai, Mitsuru Sato, Masao Yamamoto, Akira Naruse, Kouichi Kumon, Fujitsu Labs
This was an interesting talk pertaining to the effects of cache coloring for task structures in the Linux kernel scheduler (Linux kernel 2.4.x). The speaker first presented some interesting benchmark numbers for the Linux scheduler, showing that as the number of processes on the task queue was increased, the performance decreased. The authors used some really nifty hardware to measure the number of bus transactions throughout their tests and were thus able to reasonably quantify the impact that cache misses had in the Linux scheduler.
Their experiments led them to implement a cache coloring scheme for task structures, which were previously aligned on 8KB boundaries and therefore were eventually being mapped to the same cache lines. This unfortunate placement of task structures in memory induced a significant number of cache misses as the number of tasks grew in the scheduler.
The implemented solution consisted of aligning task structures to more evenly distribute cached entries across the L2 cache. The result was inevitably fewer cache misses in the scheduler. Some negative effects were observed in certain situations: these were primarily due to more cache slots being used by task-structure data in the scheduler, thus forcing data previously cached there to be pushed out.
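The effect can be reproduced with toy cache arithmetic (cache geometry and offsets assumed, not taken from the paper): structures aligned on 8KB boundaries collide in a handful of cache sets, while adding a per-task "color" offset of a few cache lines spreads them out.

```python
# Assumed geometry: 512KB, 8-way, 64-byte lines => 1024 sets.
CACHE_SIZE = 512 * 1024
LINE = 64
WAYS = 8
SETS = CACHE_SIZE // (LINE * WAYS)

def cache_set(addr):
    # Which set a given physical address maps to.
    return (addr // LINE) % SETS

# 32 task structures: 8KB-aligned vs. colored by (task_index % 16) lines.
aligned = [i * 8192 for i in range(32)]
colored = [i * 8192 + (i % 16) * LINE for i in range(32)]

aligned_sets = {cache_set(a) for a in aligned}
colored_sets = {cache_set(a) for a in colored}
```

With these assumed parameters, the 32 aligned structures crowd into only 8 distinct sets while the colored ones occupy 16, halving the conflict pressure; the negative effects mentioned above correspond to those extra sets now being unavailable to other kernel data.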
WORK-IN-PROGRESS REPORTS
Summarized by Brennan Reynolds
RESOURCE VIRTUALIZATION TECHNIQUES FOR
WIDE-AREA OVERLAY NETWORKS
Kartik Gopalan, Stony Brook University
This work addressed the issue of provisioning a maximum number of virtual overlay networks (VONs) with diverse quality of service (QoS) requirements on a single physical network. Each of the VONs is logically isolated from the others to ensure the QoS requirements. Gopalan mentioned several critical research issues with this problem that are currently being investigated. Dealing with how to provision the network at various levels (link, route, or path), and then enforcing the provisioning at run-time, is one of the toughest challenges. Currently, Gopalan has developed several algorithms to handle admission control, end-to-end QoS, route selection, scheduling, and fault tolerance in the network.
For more information, visit http://www.ecsl.cs.sunysb.edu/.
VISUALIZING SOFTWARE INSTABILITY
Jennifer Bevan, University of California, Santa Cruz
Detection of instability in software has typically been an afterthought. The point in the development cycle when the software is reviewed for instability is usually after it is difficult and costly to go back and perform major modifications to the code base. Bevan has developed a technique to allow the visualization of unstable regions of code that can be used much earlier in the development cycle. Her technique creates a time series of dependence graphs that include clusters and lines called fault lines. From the graphs, a developer is able to easily determine where the unstable sections of code are and proactively restructure them. She is currently working on a prototype implementation.
For more information, visit http://www.cse.ucsc.edu/~jbevan/evo_viz/.
RELIABLE AND SCALABLE PEER-TO-PEER WEB
DOCUMENT SHARING
Li Xiao, College of William and Mary
The idea presented by Xiao would allow end users to share the content of their Web browser caches with neighboring Internet users. The rationale for this is that today's end users are increasingly connected to the Internet over high-speed links, and browser caches are becoming large storage systems. Therefore, if an individual accesses a page which does not exist in their local cache, Xiao suggests that they first query other end users for the content before trying to access the machine hosting the original. This strategy does have some serious problems associated with it that still need to be addressed, including ensuring the integrity of the content and protecting the identity and privacy of the end users.
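The integrity problem has a standard mitigation: accept a neighbor's copy only if it matches a digest obtained from a trusted source. A toy sketch of that check (our illustration, not Xiao's protocol — peers are modeled as plain dictionaries):

```python
import hashlib

# Toy model: each peer's browser cache is a dict mapping URL -> page
# content. A requester asks its neighbors for a URL before going to the
# origin server, and rejects tampered copies by checking a digest
# obtained from a trusted source. All names here are invented.

def digest(content: bytes) -> str:
    return hashlib.sha256(content).hexdigest()

def fetch(url, neighbors, origin, trusted_digest):
    """Try neighbor caches first; fall back to the origin server."""
    for peer in neighbors:
        content = peer.get(url)
        if content is not None and digest(content) == trusted_digest:
            return content, "peer"      # verified cached copy
    return origin[url], "origin"        # last resort: fetch the original

origin = {"http://example.com/": b"hello world"}
good_peer = {"http://example.com/": b"hello world"}
evil_peer = {"http://example.com/": b"malware!"}   # tampered copy

d = digest(origin["http://example.com/"])
content, source = fetch("http://example.com/", [evil_peer, good_peer], origin, d)
```

The tampered copy from `evil_peer` fails the digest check and is skipped; the verified copy from `good_peer` is served instead.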
SEREL: FAST BOOTING FOR UNIX
Leni Mayo, Fast Boot Software
Serel is a tool that generates a visual representation of a UNIX machine's boot-up sequence. It can be used to identify the critical path and can show whether a particular service or process blocks for an extended period of time. This information could be used to determine where the largest performance gains could be realized by tuning the order of execution at boot-up. Serel creates a dependency graph, expressed in XML, during boot-up. This graph is then used to create the visual representation. Currently the tool only works on POSIX-compliant systems, but Mayo stated that he would be porting it to other platforms. Other extensions that were mentioned included having the metadata include the use of shared libraries and monitoring the suspend/resume sequence of portable machines.
For more information, visit http://www.fastboot.org/.
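Serel's XML format isn't reproduced in the summary, but the critical-path idea itself is simple: given each service's duration and dependencies, total boot time is governed by the longest dependency chain. A sketch with an invented boot graph:

```python
from functools import lru_cache

# Hypothetical boot graph: service -> (duration in seconds, dependencies).
# All names and timings are invented; Serel records the real graph (in
# XML) while the machine actually boots.
BOOT = {
    "kernel":  (1.0, []),
    "syslogd": (0.5, ["kernel"]),
    "network": (4.0, ["kernel"]),
    "nfs":     (2.0, ["network"]),
    "gettys":  (0.2, ["syslogd"]),
}

@lru_cache(maxsize=None)
def finish_time(svc):
    """Earliest completion time: own duration plus the latest-finishing
    dependency (0 for services with no dependencies)."""
    duration, deps = BOOT[svc]
    return duration + max((finish_time(d) for d in deps), default=0.0)

# The service that finishes last ends the critical path; tuning effort
# pays off most along the chain leading to it.
slowest = max(BOOT, key=finish_time)
```

Here `nfs` finishes last, so reordering or parallelizing `syslogd` and `gettys` would gain nothing; only the `kernel -> network -> nfs` chain matters.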
BERKELEY DB XML
John Merrells, Sleepycat Software
Merrells gave a quick introduction and overview of the new XML library for Berkeley DB. The library specializes in storage and retrieval of XML content through a tool called XPath. The library allows for multiple containers per document and stores everything natively as XML. The user is also given a wide range of elements to create indices with, including edges, elements, text strings, or presence. The XPath tool consists of a query parser, generator, optimizer, and execution engine. To conclude his presentation, Merrells gave a live demonstration of the software.
For more information, visit http://www.sleepycat.com/xml/.
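The Berkeley DB XML API itself isn't shown in the summary; to give a feel for XPath-style retrieval over natively stored XML, here is a stand-in using Python's standard library (ElementTree implements a subset of XPath; the document is invented):

```python
import xml.etree.ElementTree as ET

# Stand-in only: ElementTree, not the Berkeley DB XML API, is used to
# show the flavor of XPath queries against XML stored as XML.
doc = ET.fromstring("""
<container>
  <paper id="1"><title>Cache Coloring</title><track>FREENIX</track></paper>
  <paper id="2"><title>CoSy</title><track>WiP</track></paper>
</container>
""")

# All titles of papers in the FREENIX track:
freenix_titles = [p.findtext("title")
                  for p in doc.findall("paper")
                  if p.findtext("track") == "FREENIX"]

# Attribute predicates are part of the supported XPath subset:
second_title = doc.find("paper[@id='2']/title").text
```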
CLUSTER-ON-DEMAND (COD)
Justin Moore, Duke University
Modern clusters are growing at a rapid rate. Many have pushed beyond the 5000-machine mark, and deploying them results in large expenses as well as management and provisioning issues. Furthermore, if the cluster is "rented" out to various users, it is very time-consuming to configure it to a user's specs, regardless of how long they need to use it. The COD work presented creates dynamic virtual clusters within a given physical cluster. The goal was to have a provisioning tool that would automatically select a chunk of available nodes and install the operating system and middleware specified by the customer in a short period of time. This would allow a greater use of resources, since multiple virtual clusters can exist at once, and the size of a virtual cluster can be changed dynamically. Moore stated that they have created a working prototype and are currently testing and benchmarking its performance.
For more information, visit http://www.cs.duke.edu/~justin/cod/.
CATACOMB
Elias Sinderson, University of California, Santa Cruz
Catacomb is a project to develop a database-backed DASL module for the Apache Web server. It was designed as a replacement for the WebDAV module. The initial release of the module only contains support for a MySQL database but could be extended to others. Sinderson briefly touched on the performance of her module: it was comparable to the mod_dav Apache module for all query types but search. The presentation was concluded with remarks about adding support for the LOCK method and including ACL specifications in the future.
For more information, visit http://ocean.cse.ucsc.edu/catacomb/.
SELF-ORGANIZING STORAGE
Dan Ellard, Harvard University
Ellard's presentation introduced a storage system that tunes itself based on the workload of the system, without requiring the intervention of the user. The intelligence is implemented as a virtual self-organizing disk that resides below any file system. The virtual disk observes the access patterns exhibited by the system and then attempts to predict what information will be accessed next. An experiment was done using an NFS trace at a large ISP on one of their email servers. Ellard's self-organizing storage system worked well, which he attributes to the fact that most of the files being requested were large email boxes. Areas of future work include exploration of the length and detail of the data-collection stage, as well as the CPU impact of running the virtual disk layer.
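The summary doesn't say how the prediction works; one minimal stand-in is a first-order model that remembers which block most often follows each block, sketched here:

```python
from collections import Counter, defaultdict

# Toy stand-in for the "predict what's accessed next" idea (not
# Ellard's actual predictor): count, for each block, which block most
# often followed it in the observed trace.
class SuccessorPredictor:
    def __init__(self):
        self.followers = defaultdict(Counter)
        self.prev = None

    def observe(self, block):
        """Feed one access from the trace."""
        if self.prev is not None:
            self.followers[self.prev][block] += 1
        self.prev = block

    def predict(self, block):
        """Most likely next block, or None if never seen."""
        counts = self.followers.get(block)
        return counts.most_common(1)[0][0] if counts else None

predictor = SuccessorPredictor()
for blk in [7, 8, 9, 7, 8, 9, 7, 8, 3]:
    predictor.observe(blk)
# A virtual disk layer below the file system could prefetch, or
# relocate next to each other, blocks that predict each other.
```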
VISUAL DEBUGGING
John Costigan, Virginia Tech
Costigan feels that the state of current debugging facilities in the UNIX world is not as good as it should be. He proposes the addition of several elements to programs like ddd and gdb. The first addition is including separate heap and stack components. Another is to display only certain fields of a data structure and have the ability to zoom in and out if needed. Finally, the debugger should provide the ability to visually present complex data structures, including linked lists, trees, etc. He said that a beta version of a debugger with these abilities is currently available.
For more information, visit http://infovis.cs.vt.edu/datastruct/.
IMPROVING APPLICATION PERFORMANCE
THROUGH SYSTEM-CALL COMPOSITION
Amit Purohit, Stony Brook University
Web servers perform a huge number of context switches and internal data copies during normal operation. These two elements can drastically limit the performance of an application, regardless of the hardware platform it is run on. The Compound System Call (CoSy) framework is an attempt to reduce the performance penalty for context switches and data copies. It includes a user-level library and several kernel facilities that can be used via system calls. The library provides the programmer with a complete set of memory-management functions. Performance tests using the CoSy framework showed a large saving for context switches and data copies; the only area where the savings between conventional libraries and CoSy were negligible was for very large file copies.
For more information, visit http://www.cs.sunysb.edu/~purohit/.
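CoSy's interface isn't detailed in the summary; the sketch below just makes the cost it attacks concrete, by counting the system calls a conventional user-space copy loop issues. Each block crosses the user/kernel boundary twice and costs a pair of context switches:

```python
import os
import tempfile

def copy_read_write(src_path, dst_path, bufsize=64 * 1024):
    """Conventional copy loop: one read + one write system call per
    block, with the data copied through a user-space buffer each time."""
    calls = 0
    src = os.open(src_path, os.O_RDONLY)
    dst = os.open(dst_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC)
    try:
        while True:
            buf = os.read(src, bufsize)
            calls += 1
            if not buf:
                break
            os.write(dst, buf)
            calls += 1
    finally:
        os.close(src)
        os.close(dst)
    return calls

with tempfile.TemporaryDirectory() as d:
    src = os.path.join(d, "src")
    dst = os.path.join(d, "dst")
    with open(src, "wb") as f:
        f.write(b"x" * (1 << 20))            # 1MB payload
    syscalls = copy_read_write(src, dst)     # 16 reads + EOF read + 16 writes
    copied_ok = open(dst, "rb").read() == b"x" * (1 << 20)
# A compound call (or an in-kernel primitive such as sendfile(2)) would
# collapse all of those calls and user-space copies into one request.
```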
ELASTIC QUOTAS
John Oscar, Columbia University
Oscar began by stating that most resources are flexible, but that to date disks have not been. While approximately 80 percent of files are short-lived, disk quotas are hard limits imposed by administrators. The idea behind elastic quotas is to have a non-permanent storage area that each user can temporarily use. Oscar suggested the creation of an ehome directory structure to be used in deploying elastic quotas. Currently he has implemented elastic quotas as a stackable file system.

There were also several trends apparent in the data. Many users would ssh to their remote host (good) but then launch an email reader on the local machine, which would connect via POP to the same host and send the password in clear text (bad). This situation can easily be remedied by tunneling protocols like POP through SSH, but it appeared that many people were not aware this could be done. While most of his comments on the use of the wireless network were negative, the list of passwords he had collected showed that people were indeed using strong passwords. His recommendations were to educate and encourage people to use protocols like IPSec, SSH, and SSL when conducting work over a wireless network, because you never know who else is listening.
SYSTRACE
Niels Provos, University of Michigan
How can one be sure the applications one is using actually do exactly what their developers said they do? Short answer: you can't. People today are using a large number of complex applications, which means it is impossible to check each application thoroughly for security vulnerabilities. There are tools out there that can help, though. Provos has developed a tool called systrace that allows a user to generate a policy of acceptable system calls a particular application can make. If the application attempts to make a call that is not defined in the policy, the user is notified and allowed to choose an action. Systrace includes an auto-policy generation mechanism, which uses a training phase to record the actions of all programs the user executes. If the user chooses not to be bothered by applications breaking policy, systrace allows default enforcement actions to be set. Currently systrace is implemented for FreeBSD and NetBSD, with a Linux version coming out shortly.
For more information, visit http://www.citi.umich.edu/u/provos/systrace/.
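Systrace's actual policy grammar isn't reproduced here; the toy below only mimics the enforcement logic described above — look up the attempted call in a policy and permit or deny it, falling back to a default action (such as asking the user) when no rule matches. The rule names and arguments are invented:

```python
# Toy model of the enforcement logic only; this is NOT systrace's real
# policy syntax. Rules map (syscall, argument) to an action; "*" matches
# any argument; unmatched calls fall back to a default action, like
# systrace prompting the user.
POLICY = {
    ("fsread", "/etc/passwd"): "permit",
    ("fsread", "/etc/shadow"): "deny",
    ("connect", "*"): "deny",
}

def decide(syscall, arg, policy=POLICY, default="ask"):
    """Action to take for an attempted system call: an exact-argument
    rule wins over a wildcard rule, which wins over the default."""
    for key in ((syscall, arg), (syscall, "*")):
        if key in policy:
            return policy[key]
    return default
```

A training phase of the kind the talk described amounts to running the program, recording each `(syscall, arg)` it makes, and adding a "permit" rule for it.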
VERIFIABLE SECRET REDISTRIBUTION FOR
SURVIVABLE STORAGE SYSTEMS
Ted Wong, Carnegie Mellon University
Wong presented a protocol that can be used to re-create a file distributed over a set of servers, even if one of the servers is damaged. In his scheme, the user must choose how many shares the file is split into and the number of servers it will be stored across. The goal of this work is to provide persistent, secure storage of information, even if it comes under attack. Wong stated that his design of the protocol was complete, and he is currently building a prototype implementation of it.
For more information, visit http://www.cs.cmu.edu/~tmwong/research/.
THE GURU IS IN SESSIONS
LINUX ON LAPTOP/PDA
Bdale Garbee, HP Linux Systems Operation
Summarized by Brennan Reynolds
This was a roundtable-like discussion, with Garbee acting as moderator. He is currently in charge of the Debian distribution of Linux and is involved in porting Linux to platforms other than i386. He has successfully helped port the kernel to the Alpha, SPARC, and ARM, and is currently working on a version to run on VAX machines.
A discussion was held on which file system is best for battery life. The ReiserFS and Ext3 file systems were discussed; people commented that when using Ext3 on their laptops, the disk would never spin down and thus was always consuming a large amount of power. One suggestion was to use a RAM-based file system for any volume that requires a large number of writes, and to have the file system written to disk only at infrequent intervals or when the machine is shut down or suspended.
A question was directed to Garbee about his knowledge of how the HP-Compaq merger would affect Linux. Garbee thought that the merger was good for Linux, both within the company and for the open source community as a whole. He stated that the new company would be the number one shipper of systems with Linux as the base operating system. He also fielded a question about the support of older HP servers and their ability to run Linux, saying that indeed Linux has been ported to them, and that he personally had several machines in his basement running it.
A question which sparked a large number of responses concerned problems people had with running Linux on mobile platforms. Most people actually did not have many problems at all. There were a few cases of a particular machine model not working, but there were only two widespread issues: docking stations and the new ACPI power management scheme. Docking stations still appear to be somewhat of a headache, but most people had developed ad hoc solutions for getting them to work. The ACPI power management scheme developed by Intel does not appear to have a quick solution. Ted Ts'o, one of the head kernel developers, stated that there are fundamental architectural problems with the 2.4 series kernel that do not easily allow ACPI to be added. However, he also stated that ACPI is supported in the newest development kernel, 2.5, and will be included in 2.6.
The final topic was the possibility of purchasing a laptop without paying any charge/tax for Microsoft Windows. The consensus was that currently this is not possible. Even if the machine comes with Linux or without any operating system, the vendors have contracts in place which require them to pay Microsoft for each machine they ship if that machine is capable of running a Microsoft operating system. The only way this will change is if the demand for
other operating systems increases to the point that vendors renegotiate their contracts with Microsoft, and no one saw this happening in the near future.
LARGE CLUSTERS
Andrew Hume, AT&T Labs – Research
Summarized by Teri Lampoudi
This was a session on clusters, big data, and resilient computing. Given that the audience was primarily interested in clusters, and not necessarily big data or beating nodes to a pulp, the conversation revolved mainly around what it takes to make a heterogeneous cluster resilient. Hume also presented a FREENIX paper on his current cluster project, which explained in more detail much of what was abstractly claimed in the guru session.
Hume's use of the word "cluster" referred not to what I had assumed would be a Beowulf-type system, but to a loosely coupled farm of machines in which concurrency was much less of an issue than it would be in a Beowulf. In fact, the architecture Hume described was designed to deal with large amounts of independent transaction processing — essentially the process of billing calls — which requires no interprocess message passing or anything of the sort a parallel scientific application might.
Where does the large data come in? Since the transactions in question consist of large numbers of flat files, a mechanism for getting the files onto nodes and off-loading the results is necessary. In this particular cluster, named Ningaui, this task is handled by the "replication manager," which generated a large amount of interest in the audience. All management functions in the cluster, as well as job allocation, constitute services that are bid for and leased out to nodes. Furthermore, failures are handled by simply repeating the failed task until it is completed properly, if that is possible, and if it is not, then looking through detailed logs to discover
what went wrong. Logging and testing for the successful completion of each task are what make this model resilient. Every management operation is logged at some appropriate level of detail, generating 120MB/day, and every single file transfer is MD5 checksummed. This was characterized by Hume as a "patient" style of management, where no one controls the cluster: scheduling behavior is emergent rather than stringently planned, and nodes are free to drop in and out of the cluster. A downside to this model is the increase in latency, offset by the fact that the cluster continues to function unattended outside of 8-to-5 support hours.
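Per-transfer checksumming of this kind is straightforward to sketch; a minimal version (filenames invented; Ningaui's actual tooling and logging are not shown in the summary):

```python
import hashlib
import os
import shutil
import tempfile

# Sketch of verify-every-transfer: compute the source digest, copy,
# recompute on the replica, and fail loudly on any mismatch.
def md5_file(path, bufsize=1 << 16):
    """MD5 digest of a file, streamed so large inputs don't fill memory."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        while chunk := f.read(bufsize):
            h.update(chunk)
    return h.hexdigest()

def checked_transfer(src, dst):
    """Copy a file, then verify the replica against the source digest."""
    expected = md5_file(src)
    shutil.copyfile(src, dst)
    if md5_file(dst) != expected:
        raise IOError(f"checksum mismatch: {src} -> {dst}")
    return expected

with tempfile.TemporaryDirectory() as d:
    src = os.path.join(d, "records.dat")
    dst = os.path.join(d, "records.copy")
    with open(src, "wb") as f:
        f.write(b"call-detail record\n" * 1000)
    transfer_digest = checked_transfer(src, dst)
```

In a repeat-until-success model like Ningaui's, a mismatch simply means the transfer is logged as failed and retried.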
In response to questions about the projected scalability of such a scheme, Hume said the system would presumably scale from the present 60 nodes to about 100 without modifications, but that scaling higher would be a matter for further design. The guru session ended on a somewhat unrelated note regarding the problems of getting data off mainframes and onto Linux machines: issues of fixed- vs. variable-length encoding of data blocks resulting in useful information being stripped by FTP, and problems in converting COBOL copybooks to something useful on a C-based architecture. To reiterate Hume's claim, there are homemade solutions for some of these problems, and the reader should feel free to contact Mr. Hume for them.
WRITING PORTABLE APPLICATIONS
Nick Stoughton, MSB Consultants
Summarized by Josh Lothian
Nick Stoughton addressed a concern that developers are facing more and more: writing portable applications. He proposed that there is no such thing as a "portable application," only those that have been ported. Developers are still in search of the Holy Grail of applications programming: a write-once, compile-anywhere program. Languages such as Java are leading up to this, but virtual machines still have to be ported, paths will still be different, etc. Web-based applications are another area that could be promising for portable applications.
Nick went on to talk about the future of portability. The recently released POSIX 2001 standards are expected to help the situation. The 2001 revision expands the original POSIX spec into seven volumes. Even though POSIX 2001 is more specific than the previous release, Nick pointed out that even this standard allows for small differences where vendors may define their own behaviors. This can only hurt developers in the long run. Along these lines, it was pointed out that there may be room for automated tools that can check application code and determine the level of portability and standards compliance of the source.
NETWORK MANAGEMENT SYSTEM
PERFORMANCE TUNING
Jeff Allen, Tellme Networks, Inc.
Summarized by Matt Selsky
Jeff Allen, author of Cricket, discussed network tuning. The basic idea of tuning is: measure, twiddle, and measure again. Cricket can be used for large installations, but it's overkill for smaller installations, for which MRTG is better suited. You don't need Cricket to do measurement; doing something like a shell script that dumps data to Excel is fine, but you need to get some sort of measurement. Cricket was not designed for billing; it was meant to help answer questions.
Useful tools and techniques covered included: looking for the difference between machines in order to identify the causes of the differences; measuring, but also thinking about what you're measuring and why; and remembering to use strace and tcpdump to identify problems. Problems repeat themselves; performance tuning requires an intricate understanding of all the layers involved to solve the problem and simplify the automation. (Having a computer science background helps, but is not essential.) If you can reproduce the problem, you then should investigate each piece to find the bottleneck. If you can't reproduce the problem, then measure. You need to understand all the system interactions. Measurement can help determine whether things are actually slow or if the user is imagining it.
Troubleshooting begins with a hunch, but scientific processes are essential. You should be able to determine the event stream, or when each event occurs; having observation logs can help. Some hints provided included checking cron for periodic anomalies, slowing down /bin/rm to avoid I/O overload, and looking for unusual kernel CPU usage.
Also, try to use lightweight monitoring to reduce overhead. You don't need to monitor every resource on every system, but you should monitor those resources on those systems that are essential. Don't check things too often, since you can introduce overhead that way. Lots of information can be gathered from the /proc file system interface to the kernel.
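Reading /proc directly is about as lightweight as monitoring gets: a single small file read, with no agents or subprocesses. A sketch (falling back to a canned sample line on non-Linux systems):

```python
# /proc/loadavg is one line: three load averages, a running/total task
# count, and the last PID. Parsing it costs one tiny read.
def parse_loadavg(text):
    """Parse the three load averages from a /proc/loadavg line."""
    one, five, fifteen = (float(x) for x in text.split()[:3])
    return {"1min": one, "5min": five, "15min": fifteen}

sample = "0.42 0.36 0.30 1/123 4567"   # canned line for non-Linux systems
try:
    with open("/proc/loadavg") as f:
        sample = f.read()
except OSError:
    pass

load = parse_loadavg(sample)
```

A monitoring loop that polls a handful of such files on essential hosts, at a modest interval, follows the advice above: cheap enough not to perturb what it measures.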
BSD BOF
Host: Kirk McKusick, Author and Consultant
Summarized by Bosko Milekic
The BSD Birds of a Feather session at USENIX started off with five quick and informative talks and ended with an exciting discussion of licensing issues.
Christos Zoulas, for the NetBSD Project, went over the primary goals of the NetBSD Project (namely portability, a clean architecture, and security) and then proceeded to briefly discuss some of the challenges that the project has encountered. The successes and areas for improvement of 1.6 were then examined. NetBSD has recently seen various improvements in its cross-build system, packaging system, and third-party tool support. As for the kernel, a zero-copy TCP/UDP implementation has been integrated, and new pmap
code for arm32 has been introduced, as have some other rather major changes to various subsystems. More information is available in Christos's slides, which are now available at http://www.netbsd.org/gallery/events/usenix2002/.
Theo de Raadt of the OpenBSD Project presented a rundown of various new things that the OpenBSD and OpenSSH teams have been looking at. Theo's talk included an amusing description (and pictures) of some of the events that transpired during OpenBSD's hack-a-thon, which occurred the week before USENIX '02. Finally, Theo mentioned that, following complications with the IPFilter license, the OpenBSD team had done a fairly extensive license audit.
Robert Watson of the FreeBSD Project brought up various examples of FreeBSD being used in the real world. He went on to describe a wide variety of changes that will surface with the upcoming FreeBSD release 5.0. Notably, 5.0 will introduce a re-architecting of the kernel aimed at providing much more scalable support for SMP machines. 5.0 will also include an early implementation of KSE, FreeBSD's version of scheduler activations, large improvements to pccard, and significant framework bits from the TrustedBSD project.
Mike Karels from Wind River Systems presented an overview of how BSD/OS has been evolving following the Wind River Systems acquisition of BSDi's software assets. Ernie Prabhakar from Apple discussed Apple's OS X and its success on the desktop. He explained how OS X aims to bring BSD to the desktop while striving to maintain a strong and stable core, one that is heavily based on BSD technology.
The BoF finished with a discussion on licensing; specifically, some folks questioned the impact of Apple's licensing for code from their core OS (the Darwin Project) on the BSD community as a whole.
CLOSING SESSION
HOW FLIES FLY
Michael H. Dickinson, University of California, Berkeley
Summarized by JD Welch
In this lively and offbeat talk, Dickinson discussed his research into the mechanisms of flight in the fruit fly (Drosophila melanogaster) and related the autonomic behavior to that of a technological system. Insects are the most successful organisms on earth, due in no small part to their early adoption of flight. Flies travel through space on straight trajectories interrupted by saccades, jumpy turning motions somewhat analogous to the fast, jumpy movements of the human eye. Using a variety of monitoring techniques, including high-speed video in a "virtual reality flight arena," Dickinson and his colleagues have observed that flies respond to visual cues to decide when to saccade during flight. For example, more visual texture makes flies saccade earlier (i.e., farther away from the arena wall), described by Dickinson as a "collision-control algorithm."
Through a combination of sensory inputs, flies can make decisions about their flight path. Flies possess a visual "expansion detector" which, at a certain threshold, causes the animal to turn in a certain direction. However, expansion in front of the fly sometimes causes it to land. How does the fly decide? Using the virtual reality device to replicate various visual scenarios, Dickinson observed that the flies fixate on vertical objects that move back and forth across the field. Expansion on the left side of the animal causes it to turn right, and vice versa, while expansion directly in front of the animal triggers the legs to flare out in preparation for landing.
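That decision rule is simple enough to caricature as a controller. The thresholds below are invented, and the real sensorimotor system is continuous and far richer, but the sketch captures the described behavior:

```python
# Caricature of the described decision rule; thresholds are invented
# and the real system is continuous, not a three-way branch.
def fly_response(exp_left, exp_right, exp_front,
                 land_threshold=0.8, turn_threshold=0.3):
    """Map visual expansion (normalized to [0, 1]) to a motor response."""
    if exp_front > land_threshold:
        return "flare legs (land)"     # frontal expansion: prepare to land
    if max(exp_left, exp_right) > turn_threshold:
        # lateral expansion drives a saccade away from the expanding side
        return "saccade right" if exp_left > exp_right else "saccade left"
    return "fly straight"
```

It is this kind of compact, reactive mapping from sensors to actions that Dickinson suggests carries over to control systems for autonomous vehicles.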
If the eyes detect these changes, how are the responses implemented? The flies' wings have "power muscles" controlled by mechanical resonance (as opposed to the nervous system), which drive the
wings, combined with neurally activated "steering muscles," which change the configuration of the wing joints. Subtle variations in the timing of impulses correspond to changes in wing movement, controlled by sensors in the "wing-pit." A small wing-like structure, the haltere, controls equilibrium by beating constantly.
Voluntary control is accomplished by the halteres, whose signals can interfere with the autonomic control of the stroke cycle. The halteres have "steering muscles" as well, and information derived from the visual system can turn off the haltere or trick it into a "virtual" problem requiring a response.
Dickinson has also studied the aerodynamics of insect flight using a device called the "Robo Fly," an oversized mechanical insect wing suspended in a large tank of mineral oil. Interestingly, Dickinson observed that the average lift generated by the flies' wings is less than their body weight; the flies use three mechanisms to overcome this, including rotating the wings (rotational lift).
Insects are extraordinarily robust creatures, and because Dickinson analyzed the problem in a systems-oriented way, these observations and analysis are immediately applicable to technology: the response system can be used as an efficient search algorithm for control systems in autonomous vehicles, for example.
THE AFS WORKSHOP
Summarized by Garry Zacheiss, MIT
Love Hornquist-Astrand of the Arla development team presented the Arla status report. Arla 0.35.8 has been released. Scheduled for release soon are improved support for Tru64 UNIX, MacOS X, and FreeBSD; improved volume handling; and implementation of more of the vos/pts subcommands. It was stressed that MacOS X is considered an important platform, and that a GUI configuration manager for the cache
manager, a GUI ACL manager for the Finder, and a graphical login that obtains Kerberos tickets and AFS tokens at login time were all under development.
Future goals planned for the underlying AFS protocols include GSSAPI/SPNEGO support for Rx, performance improvements to Rx, an enhanced disconnected mode, and IPv6 support for Rx; an experimental patch is already available for the latter. Future Arla-specific goals include improved performance, partial file reading, and increased stability for several platforms. Work is also in progress on the RXGSS protocol extensions (integrating GSSAPI into Rx); a partial implementation exists, and work continues as developers find time.
Derrick Brashear of the OpenAFS development team presented the OpenAFS status report. OpenAFS 1.2.5 was released immediately prior to the conference; it fixed a remotely exploitable denial-of-service vulnerability in several OpenAFS platforms, most notably IRIX and AIX. Future work planned for OpenAFS includes better support for MacOS X, including working around Finder interaction issues. Better support for the BSDs is also planned: FreeBSD has a partially implemented client, while NetBSD and OpenBSD have only server programs available right now. AIX 5, Tru64 5.1A, and MacOS X 10.2 (aka Jaguar) are all planned for the future. Other planned enhancements include nested groups in the ptserver (code donated by the University of Michigan awaits integration), disconnected AFS, and further work on a native Windows client. Derrick stated that the guiding principles of OpenAFS were to maintain compatibility with IBM AFS, support new platforms, ease the administrative burden, and add new functionality.
AFS performance was discussed. The openafs.org cell consists of two file servers, one in Stockholm, Sweden, and one in Pittsburgh, PA. AFS works reasonably well over trans-Atlantic links,
although OpenAFS clients don't determine which file server to talk to very effectively. Arla clients use RTTs to the server to determine the optimal file server to fetch replicated data from. Modifications to OpenAFS to support this behavior in the future are desired.
Jimmy Engelbrecht and Harald Barth of KTH discussed their AFSCrawler script, written to determine how many AFS cells and clients were in the world, what implementations and versions they were (Arla vs. IBM AFS vs. OpenAFS), and how much data was in AFS. The script unfortunately triggered a bug in IBM AFS 3.6–derived code, causing some clients to panic while handling a specific RPC. This has since been fixed in OpenAFS 1.2.5 and the most recent IBM AFS patch level, and all AFS users are strongly encouraged to upgrade. No release of Arla is vulnerable to this particular denial-of-service attack. There was an extended discussion of the usefulness of this exploration. Many sites believed this was useful information and such scanning should continue in the future, but only on an opt-in basis.
Many sites face the problem of managing Kerberos/AFS credentials for batch-scheduled jobs. Specifically, most batch processing software needs to be modified to forward tickets as part of the batch submission process, renew tickets and tokens while the job is in the queue and for the lifetime of the job, and properly destroy credentials when the job completes. Ken Hornstein of NRL was able to pay a commercial vendor to support Kerberos 4/5 credential management in their product, although they did not implement AFS token management. MIT has implemented some of the desired functionality in OpenPBS and might be able to make it available to other interested sites.
Tools to simplify AFS administration were discussed, including:

AFS Balancer: A tool written by CMU to automate the process of balancing disk usage across all servers in a cell. Available from ftp://ftp.andrew.cmu.edu/pub/AFS-Tools/balance-1.1b.tar.gz.

Themis: KTH's enhanced version of the AFS tool "package" for updating files on local disk from a central AFS image. KTH's enhancements include allowing the deletion of files, simplifying the process of adding a file, and allowing the merging of multiple rule sets for determining which files are updated. Themis is available from the Arla CVS repository.
Stanford was presented as an example of a large AFS site. Stanford's AFS usage consists of approximately 1.4TB of data in AFS, in the form of approximately 100,000 volumes; 3.3TB of storage is available in their primary cell, ir.stanford.edu. Their file servers consist entirely of Solaris machines running a combination of Transarc 3.6 patch-level 3 and OpenAFS 1.2.x, while their database servers run OpenAFS 1.2.2. Their cell consists of 25 file servers, using a combination of EMC and Sun StorEdge hardware. Stanford continues to use the kaserver for their authentication infrastructure, with future plans to migrate entirely to an MIT Kerberos 5 KDC.
Stanford has approximately 3,400 clients on their campus, not including SLAC (Stanford Linear Accelerator); approximately 2,100 AFS clients from outside Stanford contact their cell every month. Their supported clients are almost entirely IBM AFS 3.6 clients, although they plan to release OpenAFS clients soon. Stanford currently supports only UNIX clients. There is some on-campus presence of Windows clients, but Stanford has never publicly released or supported it. They do intend to release and support the MacOS X client in the near future.
All Stanford students, faculty, and staff are assigned AFS home directories, with a default quota of 50MB, for a total of approximately 550GB of user home directories. Other significant uses of AFS
storage are data volumes for workstation software (400GB) and volumes for course Web sites and assignments (100GB).
AFS usage at Intel was also presented. Intel has been an AFS site since 1994. They had bad experiences with the IBM 3.5 Linux client; their experience with OpenAFS on Linux 2.4.x kernels has been much better. They use and are satisfied with the OpenAFS IA-64 Linux port. Intel has hundreds of OpenAFS 1.2.3 and 1.2.4 clients in many production cells accessing data stored on IBM AFS file servers; they have not encountered any interoperability issues. Intel has some concerns about OpenAFS: they would like to purchase commercial support for OpenAFS, and to see OpenAFS support for HP-UX on both PA-RISC and Itanium hardware. HP-UX support is currently unavailable due to a specific HP-UX header file being unavailable from HP; this may be available soon. Intel has not yet committed to migrating their file servers to OpenAFS and is unsure if they will do so without commercial support.
Backups are a traditional topic of discussion at AFS workshops, and this time was no exception. Many users complain that the traditional AFS backup tools ("backup" and "butc") are complex and difficult to automate, requiring many home-grown scripts and much user intervention for error recovery. An additional complaint was that the traditional AFS tools do not support file- or directory-level backups and restores; data must be backed up and restored at the volume level.
Mitch Collinsworth of Cornell presented work to make AMANDA, the free backup software from the University of Maryland, suitable for AFS backups. Using AMANDA for AFS backups allows one to share AFS backup tapes with non-AFS backups, easily run multiple backups in parallel, automate error recovery, and provide a robust degraded mode that prevents tape errors from stopping backups altogether. Their implementation allows for full volume restores as well as individual directory and file restores. They have finished coding this work and are in the process of testing and documenting it.
Peter Honeyman of CITI at the University of Michigan spoke about work he has proposed to replace Rx with RPCSEC GSS in OpenAFS; this would allow AFS to use a TCP-based transport mechanism rather than the UDP-based Rx, and possibly gain better congestion control, dynamic adaptation, and fragmentation avoidance as a result. RPCSEC GSS uses the GSSAPI to authenticate Sun ONC RPC. RPCSEC GSS is transport-agnostic, provides strong security, is a developing Internet standard, and has multiple open source implementations. Backward compatibility with existing AFS servers and clients is an important goal of this project.
will be solved in a legal environment. You need to start thinking about how to solve the problems and how the solutions will affect us.
The entire speech (including slides) is available at http://www.counterpane.com/presentation4.pdf.
LIFE IN AN OPEN SOURCE STARTUP
Daryll Strauss, Consultant
Summarized by Teri Lampoudi
This talk was packed with morsels of insight and tidbits of information about what life in an open source startup is like (though "was like" might be more appropriate), what the issues are in starting, maintaining, and commercializing an open source project, and the way hardware vendors treat open source developers. Strauss began by tracing the timeline of his involvement with developing 3-D graphics support for Linux: from the 3dfx Voodoo1 driver for Linux in 1997, to the establishment of Precision Insight in 1998, to its buyout by VA Linux, to the dismantling of his group by VA, to what the future may hold.
Obviously there are benefits from doing open source development, both for the developer and for the end user. An important point was that open source develops entire technologies, not just products. One common misapprehension concerns the inherent difficulty of managing a project that accepts code from a large number of developers, some volunteer and some paid: the notion that code can just be thrown over the wall into the world is a Big Lie.
How does one start, and keep afloat, an open source development company? In the case of Precision Insight, the subject matter – developing graphics and OpenGL for Linux – required a considerable amount of expertise: intimate knowledge of the hardware, the libraries, X, and the Linux kernel, as well as the end applications. Expertise is marketable. What helps even further is having a visible virtual team of experts, people who have an established track record of contributions to open source in the particular area of expertise. Support from the hardware and software vendors came next. While everyone was willing to contract PI to write the portions of code that were specific to their hardware, no one wanted to pay the bill for developing the underlying foundation that was necessary for the drivers to be useful. In the PI model, the infrastructure cost was shared by several customers at once, and the technology kept moving. Once a driver is written for a vendor, the vendor is perfectly capable of figuring out how to write the next driver necessary, thereby obviating the need to contract the job. This, and questions of protecting intellectual property – hardware design in this case – are deeply problematic with the open source mode of development. This is not to say that there is no way to get over them, but they are likely to arise more often than not.
Management issues seem equally important. In the PI setting, contracts were flexible – both a good and a bad thing – and developers were overworked. Further, development "in a fishbowl," that is, under public scrutiny, is not easy. Strauss stressed the value of good communication and the use of various asynchronous communication methods like IRC and mailing lists.
Finally, Strauss discussed which portions of software can be free and which can be proprietary. His suggestion was that horizontal markets want to be free, whereas vertically one can develop proprietary solutions. The talk closed with a small stroll through the problems that project politics brings up, from things like merging branches to mailing list and source tree readability.
GENERAL TRACK SESSIONS
FILE SYSTEMS
Summarized by Haijin Yan
STRUCTURE AND PERFORMANCE OF THE DIRECT ACCESS FILE SYSTEM
Kostas Magoutis, Salimah Addetia, Alexandra Fedorova, and Margo I. Seltzer, Harvard University; Jeffrey Chase, Andrew Gallatin, Richard Kisley, and Rajiv Wickremesinghe, Duke University; and Eran Gabber, Lucent Technologies
This paper won the Best Paper award.
The Direct Access File System (DAFS) is a new standard for network-attached storage over direct-access transport networks. DAFS takes advantage of user-level network interface standards that enable client-side functionality in which remote data access resides in a library rather than in the kernel. This reduces the memory-copy overhead of data movement as well as protocol overhead. Remote Direct Memory Access (RDMA) is a direct-access transport feature that allows the network adapter to reduce copy overhead by accessing application buffers directly.
This paper explores the fundamental structural and performance characteristics of network file access using a user-level file system structure on a direct-access transport network with RDMA. It describes DAFS-based client and server reference implementations for FreeBSD and reports experimental results comparing DAFS to a zero-copy NFS implementation. It illustrates the benefits and trade-offs of these techniques to provide a basis for informed choices about deployment of DAFS-based systems and similar extensions to other network file protocols, such as NFS. Experiments show that DAFS gives applications direct control over an I/O system and increases the client CPU's usage while the client is doing I/O. Future work includes how to address longstanding problems related to the integration of the application and file system for high-performance applications.
CONQUEST: BETTER PERFORMANCE THROUGH A DISK/PERSISTENT-RAM HYBRID FILE SYSTEM
An-I A. Wang, Peter Reiher, and Gerald J. Popek, UCLA; Geoffrey M. Kuenning, Harvey Mudd College
Motivated by the declining cost of persistent RAM, the authors propose the Conquest file system, which stores all small files, metadata, executables, and shared libraries in persistent RAM; disks hold only the data content of the remaining large files. Compared to alternatives such as caching and RAM file systems, Conquest has the advantages of efficiency, consistency, and reliability at a reduced cost. Using popular benchmarks, experiments show that Conquest incurs little overhead while achieving faster performance. Future work includes designing mechanisms for adjusting the file-size threshold dynamically and finding a better disk layout for large data blocks.
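The split Conquest makes between media can be sketched as a simple placement rule; the threshold value and function name below are illustrative assumptions, not values taken from the paper:

```python
# Sketch of Conquest-style placement: small files (plus metadata, executables,
# and shared libraries) live in persistent RAM; only large-file data goes to
# disk. The 1MB cutoff here is a hypothetical stand-in for the tunable
# file-size threshold the authors want to adjust dynamically.
SMALL_FILE_THRESHOLD = 1 << 20  # 1MB, illustrative

def place(file_size: int) -> str:
    """Return the storage tier for a file's data content."""
    return "persistent-ram" if file_size < SMALL_FILE_THRESHOLD else "disk"

print(place(4096))        # a small file stays in persistent RAM
print(place(100 << 20))   # a large media file's data goes to disk
```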
EXPLOITING GRAY-BOX KNOWLEDGE OF BUFFER-CACHE MANAGEMENT
Nathan C. Burnett, John Bent, Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau, University of Wisconsin, Madison
Knowing what algorithm is used to manage the buffer cache is very important for improving application performance; however, there is currently no interface for discovering it. This paper introduces Dust, a simple fingerprinting tool that is able to identify the buffer-cache replacement policy. Dust automatically identifies the cache size and replacement policy based on the configuring attributes of access order, recency, frequency, and long-term history. Through simulation, Dust was able to distinguish between a variety of replacement algorithm policies found in the literature: FIFO, LRU, LFU, Clock, Segmented FIFO, 2Q, and LRU-K. Further experiments fingerprinting real operating systems such as NetBSD, Linux, and Solaris show that Dust is able to identify the hidden cache replacement algorithm.
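The core idea of fingerprinting can be illustrated with a toy probe that separates two of the policies above; this is a minimal sketch in the spirit of Dust, not the authors' tool:

```python
# FIFO and LRU differ only in whether a cache hit refreshes a block's position.
# A probe trace that re-touches an old block and then forces one eviction
# therefore leaves different residents behind, which fingerprints the policy.
from collections import OrderedDict

def simulate(policy, capacity, accesses):
    """Toy cache model; returns the set of blocks resident after the trace."""
    cache = OrderedDict()
    for b in accesses:
        if b in cache:
            if policy == "LRU":       # LRU refreshes recency on a hit...
                cache.move_to_end(b)
            # ...FIFO leaves the queue order unchanged on a hit.
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)   # evict the head of the queue
            cache[b] = True
    return set(cache)

# Touch blocks 0..3, re-touch 0, then insert 4 to force one eviction:
trace = [0, 1, 2, 3, 0, 4]
assert 0 in simulate("LRU", 4, trace)       # re-touched block survives
assert 0 not in simulate("FIFO", 4, trace)  # oldest insertion is evicted
```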
By knowing the underlying cache replacement policy, a cache-aware Web server can reschedule requests with a cached-request-first policy to obtain a performance improvement. Experiments show that cache-aware scheduling improves average response time and system throughput.
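The cached-request-first idea can be sketched as follows; the function and the set of predicted-resident files are hypothetical illustrations, not the paper's implementation:

```python
# Hedged sketch of cache-aware scheduling: given a prediction (derived from
# the inferred replacement policy) of which files are resident in the buffer
# cache, serve the predicted hits before the misses so short requests are not
# stuck behind requests that must go to disk.
def schedule(requests, predicted_cached):
    """Order requests so predicted cache hits are served before misses."""
    hits = [r for r in requests if r in predicted_cached]
    misses = [r for r in requests if r not in predicted_cached]
    return hits + misses

pending = ["a.html", "big.iso", "b.html"]
print(schedule(pending, predicted_cached={"a.html", "b.html"}))
```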
OPERATING SYSTEMS (AND DANCING BEARS)
Summarized by Matt Butner
THE JX OPERATING SYSTEM
Michael Golm, Meik Felser, Christian Wawersich, and Jürgen Kleinöder, University of Erlangen-Nürnberg
The talk opened by emphasizing the need for, and the practicality of, a functional Java OS. The JX OS implementation attempts to mimic the recent trend toward using highly abstracted languages in application development, in order to create an OS as functional and powerful as one written in a lower-level language.
JX is a microkernel solution that uses separate JVMs for each entity of the kernel and, in some cases, for each application. The separated domains do not share objects and have no thread migration, and each domain implements its own code. The interesting part of the presentation was the discussion of system-level programming with Java; key areas discussed were memory management and interrupt handlers. The authors concluded by noting that their type-safe and modular approach resulted in a robust system with great configuration flexibility and acceptable performance.
DESIGN EVOLUTION OF THE EROS SINGLE-LEVEL STORE
Jonathan S. Shapiro, Johns Hopkins University; Jonathan Adams, University of Pennsylvania
This presentation outlined current characteristics of file systems, including some of their least desirable ones. It was based on the revival of the EROS single-level-store approach in current attempts to capitalize on system consistency and efficiency. Their solution did not relate directly to EROS, but rather to the design and use of the constructs and notions on which EROS is based. By extending the mapping of the memory system to include the disk, systems are further able to ensure global persistence without regard for a disk structure. In explaining the need for absolute system consistency, and an environment which by definition is not allowed to crash, the goal of having such an exhaustive design becomes clear. The cool part of this work is the availability of its design and architecture to the public.
THINK: A SOFTWARE FRAMEWORK FOR COMPONENT-BASED OPERATING SYSTEM KERNELS
Jean-Philippe Fassino, France Télécom R&D; Jean-Bernard Stefani, INRIA; Julia Lawall, DIKU; Gilles Muller, INRIA
This presentation discussed the need for component-based operating systems and the respective structures needed to ensure flexibility for arbitrary-sized systems. Think provides a binding model that maps uniform components for OS developers and architects to follow, ensuring consistent implementations for an arbitrary system size. However, the goal is not to force developers into a predefined kernel but to promote the use of certain components in varied ways.
Think's concentration is primarily on embedded systems, where short development time is necessary but is constrained by rigorous needs and limited resources. The need to build flexible systems in such an environment can be costly and implementation-specific, but Think creates an environment supported by the ability to dynamically load type-safe components. This allows for more flexible systems that retain functionality because of Think's ability to bind fine-grained components. Benchmarks revealed that dedicated microkernels can show performance improvements in comparison to monolithic kernels. Notable future work includes building components for low-end appliances and the development of a real-time OS component library.
BUILDING SERVICES
Summarized by Pradipta De
NINJA: A FRAMEWORK FOR NETWORK SERVICES
J. Robert von Behren, Eric A. Brewer, Nikita Borisov, Michael Chen, Matt Welsh, Josh MacDonald, Jeremy Lau, and David Culler, University of California, Berkeley
Robert von Behren presented Ninja, an ongoing project that aims to provide a framework for building robust and scalable Internet-based network services like Web-hosting, instant-messaging, email, and file-sharing applications. Robert drew attention to the difficulties of writing cluster applications: one has to take care of data consistency and issues of concurrency, and for robust applications there are problems related to fault tolerance. Ninja works as a wrapper to relieve the user of these problems. The goal of the project is building network services that are scalable and highly available, maintaining persistent data, providing graceful degradation, and supporting online evolution.
The use of clusters distinguishes this setup from generic distributed systems in terms of reliability and security, as well as providing a partition-free network. The next important feature of this project is the new programming model, which is more restrictive than a general programming model but still expressive enough to write most applications. This model, described as "single program, multiple connection," uses intertask parallelism instead of multithreaded concurrency and is characterized by all nodes running the same program, with connections delegated to the nodes by a centralized connection manager (CM). The CM takes care of hiding the details of mapping an external connection to an internal node. Ninja also uses a design called "staged event-driven architecture," where each service is broken down into a set of stages connected by event queues. This architecture is suitable for a modular design and helps in graceful degradation by adaptive load shedding from the event queues.
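The staged design described above can be sketched in a few lines; this is a toy model of stages, queues, and load shedding, with made-up handlers, and is not Ninja's actual API:

```python
# Toy staged event-driven pipeline: each stage owns a handler and a bounded
# event queue; a full queue sheds the event instead of blocking, which is a
# crude model of the adaptive load shedding mentioned in the talk.
from collections import deque

class Stage:
    def __init__(self, handler, capacity=4):
        self.handler = handler
        self.queue = deque()
        self.capacity = capacity
        self.shed = 0              # events dropped under overload

    def enqueue(self, event):
        if len(self.queue) >= self.capacity:
            self.shed += 1         # load shedding: drop rather than block
            return False
        self.queue.append(event)
        return True

    def run(self, next_stage=None):
        while self.queue:
            result = self.handler(self.queue.popleft())
            if next_stage is not None:
                next_stage.enqueue(result)

responses = []
parse = Stage(lambda req: req.upper())                 # hypothetical stage 1
respond = Stage(lambda req: responses.append("200 " + req))  # stage 2
for r in ["get /a", "get /b"]:
    parse.enqueue(r)
parse.run(next_stage=respond)
respond.run()
print(responses)
```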
The presentation concluded with examples of the implementation and evaluation of a Web server and an email system called NinjaMail, to show the ease of authoring and the efficacy of using the Ninja framework for service development.
USING COHORT SCHEDULING TO ENHANCE SERVER PERFORMANCE
James R. Larus and Michael Parkes, Microsoft Research
James Larus presented a new scheduling policy for increasing the throughput of server applications. Cohort scheduling batches the execution of similar operations arising in different server requests. The usual programming paradigm in handling server requests is to spawn multiple concurrent threads and switch from one thread to another whenever a thread gets blocked for I/O. Since the threads in a server mostly execute unrelated pieces of code, the locality of reference is reduced, and hence so is the effectiveness of different caching mechanisms. One way to solve this problem is to throw in more hardware, but Larus presented a complementary view in which the program's behavior is investigated and used to improve performance.
The problem in this scheme is identifying pieces of code that can be batched together for processing. One simple way is to look at the next program counter values and use them to club threads together. However, this talk presented a new programming abstraction, "staged computation," which replaces the thread model with "stages." A stage is an abstraction for operations with similar behavior and locality. The StagedServer library can be used for programming in this model. It is a collection of C++ classes that implement staged computation and cohort scheduling on either a uniprocessor or a multiprocessor, and it can be used to define stages.
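The batching idea behind cohort scheduling can be shown in miniature; this sketch uses made-up operation names and Python rather than the C++ StagedServer library, purely to illustrate the execution order:

```python
# Minimal sketch of cohort scheduling: rather than interleaving each request's
# operations (as a thread-per-request server does), run one operation across
# the whole cohort of requests before moving to the next operation, keeping
# that operation's code and data hot in the caches.
def cohort_schedule(requests, operations):
    """operations: list of (name, fn). Returns final per-request state and an
    execution log demonstrating the batched order."""
    state = {r: r for r in requests}
    log = []
    for name, fn in operations:
        for r in requests:          # one cohort: same operation, all requests
            state[r] = fn(state[r])
            log.append((name, r))
    return state, log

ops = [("parse", str.strip), ("lower", str.lower)]   # hypothetical operations
state, log = cohort_schedule(["  GET /A  ", " GET /B "], ops)
print([name for name, _ in log])   # all "parse" steps run before any "lower"
```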
The presentation showed an experimental evaluation of cohort scheduling against the thread-based model, implemented in two servers: a Web server, which is mainly I/O bound, and a publish-subscribe server, which is mainly compute bound. The SURGE benchmark was used for the first experiment and the Fabret workload for the second. The results showed that the cohort-scheduling-based implementation gave better throughput than the thread-based implementation at high loads.
NETWORK PERFORMANCE
Summarized by Xiaobo Fan
ETE: PASSIVE END-TO-END INTERNET SERVICE PERFORMANCE MONITORING
Yun Fu and Amin Vahdat, Duke University; Ludmila Cherkasova and Wenting Tang, HP Labs
This paper won the Best Student Paper award.
Ludmila Cherkasova began by listing several questions most Web service providers want answered in order to improve service quality. She reviewed the difficulties of making accurate and efficient end-to-end Web service measurements and the shortcomings of currently available approaches. They propose a passive trace-based architecture called EtE to monitor Web server performance on behalf of end users.
The first step is to collect network packets passively. The second module reconstructs all TCP connections and extracts HTTP transactions. To reconstruct Web page accesses, they first build a knowledge base indexed by client IP and URL, and then group objects into the related Web pages they are embedded in. Statistical analysis is used to handle non-matched objects. EtE Monitor can generate three groups of metrics to measure Web service performance: response time, Web caching, and page abortion.
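The page-reconstruction step can be sketched with a toy grouping rule; the one-second window, the log format, and the "most recent page wins" heuristic are illustrative assumptions, not EtE's actual algorithm:

```python
# Hedged sketch of EtE-style page grouping: attribute each embedded-object
# request to the page access from the same client that most recently preceded
# it, within a time window; anything else is a non-matched object.
WINDOW = 1.0  # seconds, an assumed value

def group_accesses(log):
    """log: (timestamp, client_ip, url, is_html) tuples, sorted by time.
    Returns (container_page, object_url) pairs; container is None if unmatched."""
    pages = {}                         # client_ip -> (page_url, last_ts)
    grouped = []
    for ts, ip, url, is_html in log:
        if is_html:
            pages[ip] = (url, ts)      # a new page access starts a group
            grouped.append((url, url))
        elif ip in pages and ts - pages[ip][1] <= WINDOW:
            page, _ = pages[ip]
            grouped.append((page, url))
            pages[ip] = (page, ts)
        else:
            grouped.append((None, url))   # non-matched object
    return grouped

log = [(0.0, "1.2.3.4", "/index.html", True),
       (0.2, "1.2.3.4", "/logo.gif", False),
       (9.0, "1.2.3.4", "/late.gif", False)]
print(group_accesses(log))
```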
To demonstrate the benefits of EtE Monitor, Cherkasova talked about two case studies and, based on the calculated metrics, gave some insightful explanations of what's happening behind the variations in Web performance. Through validation, they claim their approach provides a very close approximation of the real scenario.
THE PERFORMANCE OF REMOTE DISPLAY MECHANISMS FOR THIN-CLIENT COMPUTING
S. Jae Yang, Jason Nieh, Matt Selsky, and Nikhil Tiwari, Columbia University
Noting the trend toward thin-client computing, the authors compared different techniques and design choices in measuring the performance of six popular thin-client platforms: Citrix MetaFrame, Microsoft Terminal Services, Sun Ray, Tarantella, VNC, and X. After pointing out several challenges in benchmarking thin clients, Yang proposed slow-motion benchmarking to achieve non-invasive packet monitoring and consistent visual quality. Basically, they insert delays between separate visual events in the benchmark applications on the server side so that the client's display update can catch up with the server's processing speed. The experiments are conducted on an emulated network over a range of network bandwidths.
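The delay-insertion idea can be sketched as follows; the event list, delay value, and function names are assumptions for illustration, not the authors' benchmark harness:

```python
# Sketch of slow-motion benchmarking as described above: pad the gap between
# visual events so the thin client finishes rendering one update before the
# next begins; per-event packet captures then cleanly isolate each update.
import time

def play_slow_motion(events, delay_s):
    """Trigger each visual event, then pause so client updates never overlap."""
    timestamps = []
    for trigger in events:
        trigger()                  # e.g., scroll one page or show one frame
        timestamps.append(time.monotonic())
        time.sleep(delay_s)        # the inserted slow-motion delay
    return timestamps

# Three placeholder "visual events" spaced at least 10ms apart:
stamps = play_slow_motion([lambda: None] * 3, delay_s=0.01)
```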
Their results show that thin clients can provide good performance for Web applications in LAN environments, but only some platforms performed well on the video benchmark. Pixel-based encoding may achieve better performance and bandwidth efficiency than high-level graphics. Display caching and compression should be used with care.
A MECHANISM FOR TCP-FRIENDLY TRANSPORT-LEVEL PROTOCOL COORDINATION
David E. Ott and Ketan Mayer-Patel, University of North Carolina, Chapel Hill
A revised transport-level protocol optimized for cluster-to-cluster (C-to-C) applications is what David Ott tried to explain in his talk. A C-to-C application class is identified as one set of processes communicating with another set of processes across a common Internet path. The fundamental problem with C-to-C applications is how to coordinate all C-to-C communication flows so that they share a consistent view of the common C-to-C network, adapt to changing network conditions, and cooperate to meet specific requirements. Aggregation points (APs) are placed at the first and last hop of the common data path to probe network conditions (latency, bandwidth, loss rate, etc.). To carry and transfer this information, a new protocol – the Coordination Protocol (CP) – is inserted between the network layer (IP) and the transport layer (TCP, UDP, etc.). Ott illustrated how the CP header is updated and used as packets originate from the source and traverse the local and remote APs to arrive at their destination, and how an AP maintains a per-cluster state table and detects network conditions. Through simulation results, this coordination mechanism appears effective in sharing common network resources among C-to-C communication flows.
STORAGE SYSTEMS
Summarized by Praveen Yalagandula
MY CACHE OR YOURS? MAKING STORAGE MORE EXCLUSIVE
Theodore M. Wong, Carnegie Mellon University; John Wilkes, HP Labs
Theodore Wong explained the inefficiency of current "inclusive" caching schemes in storage area networks: when a client accesses a block, the block is read from the disk and is cached both in the disk array cache and in the client's cache. He then presented the concept of "exclusive" caching, in which a block accessed by a client is cached only in that client's cache. On eviction from the client's cache, they have come up with a DEMOTE operation to move the data block to the tail of the array cache.
The new exclusive caching schemes are evaluated on both single-client and multiple-client systems. The single-client results are presented for two different types of synthetic workloads: Random (transaction-processing-type workloads) and Zipf (typical Web workloads). The exclusive policy was quite effective in achieving higher hit rates and lower read latencies. They also showed a 2.2-times hit-rate improvement over inclusive techniques for a real-life workload, httpd, in a single-client setting.
The DEMOTE scheme also performed well on multiple-client systems when the data accessed by the clients is disjoint. For shared-data workloads, this scheme performed worse than the inclusive schemes. Theodore then presented an adaptive exclusive caching scheme: a block accessed by a client is placed at the tail of that client's cache and also in the array cache, at a position determined by the popularity of the block. The popularity of the block is measured by maintaining a ghost cache that accumulates the number of times each block is accessed. This new adaptive scheme achieved a maximum speedup of 52% in mean latency in the experiments with real-life workloads.
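The basic DEMOTE mechanism can be sketched with two toy LRU caches; cache sizes, block names, and the code structure are illustrative, not the paper's implementation:

```python
# Toy sketch of DEMOTE-style exclusive caching: a block lives either in the
# client cache or in the array cache, never both; the client's evicted victim
# is demoted to the tail (MRU end) of the array cache instead of being dropped.
from collections import OrderedDict

class Cache:
    def __init__(self, capacity):
        self.data = OrderedDict()   # LRU order: head = next eviction victim
        self.capacity = capacity

    def insert(self, block):
        """Insert at the MRU tail; return the evicted LRU victim, if any."""
        victim = None
        if block not in self.data and len(self.data) >= self.capacity:
            victim, _ = self.data.popitem(last=False)
        self.data[block] = True
        self.data.move_to_end(block)
        return victim

client, array = Cache(2), Cache(4)   # hypothetical sizes

def client_read(block):
    if block in client.data:
        client.data.move_to_end(block)
        return "client-hit"
    hit = "array-hit" if array.data.pop(block, None) else "disk-read"
    demoted = client.insert(block)   # exclusive: the array copy was removed
    if demoted is not None:
        array.insert(demoted)        # the DEMOTE operation
    return hit

for b in ["a", "b", "c"]:
    client_read(b)                   # "a" ends up demoted to the array cache
```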
For more information, visit http://www.cs.cmu.edu/~tmwong/research and http://www.hpl.hp.com/research/itc/csl/ssp.
BRIDGING THE INFORMATION GAP IN STORAGE PROTOCOL STACKS
Timothy E. Denehy, Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau, University of Wisconsin, Madison
Currently there is a huge information gap between storage systems and file systems. The interface that storage systems expose to file systems, based on blocks and providing only read/write operations, is very narrow. This leads to poor performance as a whole, because of duplicated functionality in both systems and reduced functionality resulting from the storage system's lack of file information.
The speaker presented two enhancements: ExRAID, an exposed RAID, and ILFS, an Informed Log-Structured File System. ExRAID is an enhancement to the block-based RAID storage system that exposes three types of information to the file system: (1) regions – contiguous portions of the address space comprising one or multiple disks; (2) performance information about queue lengths and throughput of the regions, revealing disk heterogeneity to the file system; and (3) failure information – dynamically updated information conveying the number of tolerable failures in each region.
ILFS allows online incremental expansion of the storage space, performs dynamic parallelism using ExRAID's performance information to perform well on heterogeneous storage systems, and provides a range of mechanisms at different granularities for file redundancy using ExRAID's region failure characteristics. These new features were added to LFS with only a 19% increase in code size.
More information: http://www.cs.wisc.edu/wind
MAXIMIZING THROUGHPUT IN REPLICATED DISK STRIPING OF VARIABLE BIT-RATE STREAMS
Stergios V. Anastasiadis, Duke University; Kenneth C. Sevcik and Michael Stumm, University of Toronto
There is an increasing demand for the continuous real-time streaming of media files. Disk striping is a common technique used for supporting many connections on media servers. But even with 1.2 million hours of mean time between failures, there will be more than one disk failure per week in a system with 7,000 disks. Fault tolerance can be achieved by using data replication and reserving extra bandwidth during normal operation. This work focused on supporting the most common variable bit-rate stream formats (e.g., MPEG).
The authors used the prototype media server EXEDRA to implement and evaluate different replication and bandwidth reservation schemes. This system supports a variable bit-rate scheme, does stride-based disk allocation, and is capable of supporting variable-grain disk striping. Two replication policies are presented: deterministic – data blocks are replicated in a round-robin fashion on the secondary replicas; and random – data blocks are replicated on randomly chosen secondary replicas. Two bandwidth reservation techniques are also presented: mirroring reservation – disk bandwidth is reserved for both the primary and the replicas of the media file during playback; and minimum reservation – a more efficient scheme in which bandwidth is reserved only for the sum of the primary data access time and the maximum of the backup data access times required in each round.
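The two placement policies can be contrasted in a few lines; the functions below are an illustrative sketch of round-robin versus random replica choice, not EXEDRA's allocation code:

```python
# Sketch of the two replica-placement policies described above: deterministic
# (round-robin, derived from the block id) versus random choice of a secondary
# disk. Both must avoid placing the replica on the block's primary disk.
import random

def deterministic_replica(block_id, primary, num_disks):
    """Round-robin: the replica offset cycles with the block id, skipping
    the primary disk."""
    offset = 1 + block_id % (num_disks - 1)
    return (primary + offset) % num_disks

def random_replica(primary, num_disks, rng):
    """Random: any disk other than the primary, chosen uniformly."""
    return rng.choice([d for d in range(num_disks) if d != primary])

rng = random.Random(0)   # seeded for reproducibility
for blk in range(8):
    primary = blk % 4
    det = deterministic_replica(blk, primary, num_disks=4)
    rnd = random_replica(primary, num_disks=4, rng=rng)
    assert det != primary and rnd != primary   # replica never on the primary
```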
Experimental results showed that deterministic replica placement is better than random placement for small disks, that minimum disk bandwidth reservation achieves twice the throughput of mirroring, and that fault tolerance can be achieved with a minimal impact on throughput.
TOOLS
Summarized by Li Xiao
SIMPLE AND GENERAL STATISTICAL PROFILING WITH PCT
Charles Blake and Steven Bauer, MIT
This talk introduced the Profile Collection Toolkit (PCT), a sampling-based CPU profiling facility. A novel aspect of PCT is that it allows sampling of semantically rich data such as function call stacks, function parameters, local or global variables, CPU registers, or other execution contexts. The design objectives of PCT were driven by user needs and the inadequacies or inaccessibility of prior systems.
The rich data collection capability is achieved via a debugger-controller program, dbctl. The talk then introduced the profiling collector and data collection activation. PCT file format options provide flexibility and various ways to store exact states. PCT also has analysis options; two examples were given, "Simple" and "Mixed User and Linux Code."
The talk then went into how general sampling helps in the following areas: a debugger-controller state machine, example script fragments, a simple example program, and general value reports in terms of call-path histograms, numeric value histograms, and data generality. Related work such as gprof and Expect was then summarized. The contribution of this work is to provide a portable, general value-sampling tool. The limitations are that PCT does not cope well with stripped binaries, and count inaccuracies occur because of its statistical nature. Given this limitation, building with -g is preferred over stripping.
Possible future directions include supporting more debuggers, such as dbx and xdb; script generators for other kinds of traces; more canned reports for general values; and a libgdb-based library sampler. PCT is available at http://pdos.lcs.mit.edu/pct and runs on almost any UNIX-like system.
ENGINEERING A DIFFERENCING AND COMPRESSION DATA FORMAT
David G. Korn and Kiem-Phong Vo, AT&T Laboratories – Research
This talk began with an equation: "Differencing + Compression = Delta Compression." Compression removes information redundancy in a data set; examples are gzip, bzip, compress, and pack. Differencing encodes the differences between two data sets; examples are diff -e, fdelta, bdiff, and xdelta. Delta compression compresses a data set given another one, combining differencing and compression, and reduces to pure compression when there's no commonality.
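The equation can be demonstrated with stand-in components; the sketch below uses Python's difflib and zlib as the differencer and compressor, purely to illustrate the idea, and has nothing to do with the Vcdiff format itself:

```python
# Minimal illustration of "Differencing + Compression = Delta Compression":
# encode the new version as an edit script against the old one, then compress
# the script. When the versions share content, the result is far smaller than
# compressing the new version alone.
import difflib, zlib

def delta_compress(old: bytes, new: bytes) -> bytes:
    """Compress a unified diff of old -> new (a toy delta, not Vcdiff)."""
    diff = "\n".join(difflib.unified_diff(
        old.decode().splitlines(), new.decode().splitlines(), lineterm=""))
    return zlib.compress(diff.encode())

# Two versions that differ by one appended line:
old = b"\n".join(b"line %d common text" % i for i in range(200))
new = old + b"\none new line at the end"

delta = delta_compress(old, new)
plain = zlib.compress(new)
print(len(delta), "<", len(plain))   # the delta exploits the shared content
```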
After presenting an overall scheme for a delta compressor and showing delta compression performance, the talk discussed the encoding format of the newly designed Vcdiff for delta compression. A Vcdiff instruction code table consists of 256 entries, each coding up to a pair of instructions, and recodes 1-byte indices and any additional data.
The talk then showed Vcdiff's performance on Web data collected from CNN, compared with the W3C standard Gdiff, Gdiff+gzip, and diff+gzip, where Gdiff was computed using Vcdiff delta instructions. The results of two experiments, "First" and "Successive," were presented: in "First," each file is compressed against the first file collected; in "Successive," each file is compressed against the one from the previous hour. The diff+gzip combination did not work well because diff is line-oriented. Vcdiff performed favorably compared with the other formats.
Vcdiff is part of the Vcodex package. Vcodex is a platform for all common data transformations: delta compression, plain compression, encryption, and transcoding (e.g., uuencode, base64). It is structured in three layers for maximum usability; the base library uses the Disciplines and Methods interfaces.
The code for Vcdiff can be found at http://www.research.att.com/sw/tools. They are moving Vcdiff toward IETF standardization as a comprehensive platform for transforming data; please refer to http://www.ietf.org/internet-drafts/draft-korn-vcdiff-06.txt.
WHERE IN THE NET
Summarized by Amit Purohit
A PRECISE AND EFFICIENT EVALUATION OF THE PROXIMITY BETWEEN WEB CLIENTS AND THEIR LOCAL DNS SERVERS
Zhuoqing Morley Mao, University of California, Berkeley; Charles D. Cranor, Fred Douglis, Michael Rabinovich, Oliver Spatscheck, and Jia Wang, AT&T Labs – Research
Content Distribution Networks (CDNs) attempt to improve Web performance by delivering Web content to end users from servers located at the edge of the network. When a Web client requests content, the CDN dynamically chooses a server to route the request to, usually the one that is nearest the client. As CDNs have access only to the IP address of the local DNS server (LDNS) of the client, the CDN's authoritative DNS server maps the client's LDNS to a geographic region within a particular network and combines that with network and server load information to perform CDN server selection.
This method has two limitations. First, it is based on the implicit assumption that clients are close to their LDNS, which may not always be valid. Second, a single request from an LDNS can represent a differing number of Web clients, called the hidden load factor. Knowledge of the hidden load factor can be used to achieve better load distribution.
The talk dealt with the extent of the first limitation and its impact on CDN server selection. To determine the associations between clients and their LDNS, a simple, non-intrusive, and efficient mapping technique was developed. The data collected was used to study the impact of proximity on DNS-based server selection, using four different proximity metrics: (1) autonomous system (AS) clustering – observing whether a client is in the same AS as its LDNS – concluded that LDNS is very good for coarse-grained server selection, as 64% of the associations belong to the same AS; (2) network clustering – observing whether a client is in the same network-aware cluster (NAC) as its LDNS – implied that DNS is less useful for finer-grained server selection, since only 16% of clients and their LDNS are in the same NAC; (3) traceroute divergence – the length of the divergent paths to the client and its LDNS from a probe point, measured using traceroute – implies that most clients are topologically close to their LDNS as viewed from a randomly chosen probe site; (4) round-trip time (RTT) correlation (some CDNs select servers based on the RTT between the CDN server and the client's LDNS) examines the correlation between the message RTTs from a probe point to the client and to its LDNS; the results imply this is a reasonable metric to use to avoid really distant servers.
A study of the impact that client-LDNS associations have on DNS-based server selection concludes that knowing the client's IP address would allow more accurate server selection in a large number of cases. The optimality of the server selection also depends on server density, placement, and the selection algorithms.
Further information can be found at http://www.eecs.berkeley.edu/~zmao/myresearch.html or by contacting zmao@eecs.berkeley.edu.
GEOGRAPHIC PROPERTIES OF INTERNET ROUTING
Lakshminarayanan Subramanian, University of California at Berkeley; Venkata N. Padmanabhan, Microsoft Research; Randy H. Katz, University of California at Berkeley
Geographic information can provide insights into the structure and functioning of the Internet, including interactions between different autonomous systems, by analyzing certain properties of Internet routing. It can be used to measure and quantify certain routing properties, such as circuitous routing, hot-potato routing, and geographic fault tolerance.
Traceroute was used to gather the required data, and the Geotrack tool was used to determine the location of the nodes along each network path. This enables the computation of "linearized distances" – the sum of the geographic distances between successive pairs of routers along the path.
In order to measure the circuitousness of a path, a metric called the "distance ratio" has been defined as the ratio of the linearized distance of a path to the geographic distance between the source and destination of the path. From the data it has been observed that the circuitousness of a route depends on both the geographic and network locations of the end hosts. A large value of the distance ratio enables us to flag paths that are highly circuitous, possibly (though not necessarily) because of routing anomalies. It is also shown that the minimum delay between end hosts is strongly correlated with the linearized distance of the path.
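The two quantities above compose directly: linearized distance is a sum of great-circle hops, and the distance ratio divides it by the end-to-end great-circle distance. A minimal sketch, with router coordinates invented for illustration:

```python
# Sketch of the "distance ratio" metric: linearized distance (sum of
# great-circle distances between successive routers) divided by the
# great-circle distance between the end hosts. Coordinates are invented.
from math import radians, sin, cos, asin, sqrt

def great_circle_km(p, q):
    """Haversine distance between two (lat, lon) points, in km."""
    lat1, lon1, lat2, lon2 = map(radians, (*p, *q))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371 * asin(sqrt(a))

def linearized_distance(path):
    return sum(great_circle_km(a, b) for a, b in zip(path, path[1:]))

def distance_ratio(path):
    return linearized_distance(path) / great_circle_km(path[0], path[-1])

# A direct path has ratio 1.0; a detour inflates it.
direct = [(37.9, -122.3), (40.7, -74.0)]                   # Berkeley -> New York
detour = [(37.9, -122.3), (47.6, -122.3), (40.7, -74.0)]   # via Seattle
```

A path routed through a distant intermediate city yields a ratio well above 1, which is exactly the kind of path the study flags as circuitous.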
Geographic information can be used to study various aspects of wide-area Internet paths that traverse multiple ISPs. It was found that end-to-end Internet paths tend to be more circuitous than intra-ISP paths, the cause for this being the peering relationships between ISPs. Also, paths traversing substantial distances within two or more ISPs tend to be more circuitous than paths largely traversing only a single ISP. Another finding is that ISPs generally employ hot-potato routing.
Geographic information can also be used to capture the fact that two seemingly unrelated routers can be susceptible to correlated failures. By using the geographic information of routers, we can construct a geographic topology of an ISP and use it to find the tolerance of an ISP's network to the total network failure in a geographic region.
For further information, contact lakme@cs.berkeley.edu.
PROVIDING PROCESS ORIGIN INFORMATION TO AID IN NETWORK TRACEBACK
Florian P. Buchholz, Purdue University; Clay Shields, Georgetown University
Network traceback is currently limited because host audit systems do not maintain enough information to match incoming network traffic to outgoing network traffic. The talk presented an alternative: assigning origin information to every process and logging it during interactive login creation.
The current implementation concentrates mainly on interactive sessions, in which an event is logged when a new connection is established using SSH or Telnet. The method is effective and could successfully determine stepping stones and reliably detect the source of a DDoS attack. The speaker then talked about the related work done in this area, which led to discussion of the latest development in the area of "host causality."
The speaker ended with the limitations and future directions of the framework. This framework could be extended and could find many applications in the area of security. The current implementation doesn't take care of startup scripts and cron jobs, but incorporating the origin information in the file system could solve this problem. In the current implementation, logging is just implemented as a proof of concept. It could be made safe in many ways, and this could be another important aspect of future work.
PROGRAMMING
Summarized by Amit Purohit
CYCLONE: A SAFE DIALECT OF C
Trevor Jim, AT&T Labs – Research; Greg Morrisett, Dan Grossman, Michael Hicks, James Cheney, Yanling Wang, Cornell University
Cyclone is designed to provide protection against attacks such as buffer overflows, format string attacks, and memory management errors. The current structure of C allows programmers to write vulnerable programs. Cyclone extends C so that it has the safety guarantees of Java while keeping C syntax, types, and semantics untouched.
The Cyclone compiler performs static analysis of the source code and inserts runtime checks into the compiled output at places where the analysis cannot determine that the code execution will not violate safety constraints. Cyclone imposes many restrictions to preserve safety, such as NULL checks. These checks do not exist in normal C.
The speaker then talked about some sample code written in Cyclone and how it tackles safety problems. Porting an existing C application to Cyclone is pretty easy, with fewer than 10% of the code requiring change. The current implementation concentrates more on safety than on performance – hence the performance penalty is substantial. Cyclone was able to find many lingering bugs. The final step of the project will be to write a compiler to convert a normal C program to an equivalent Cyclone program.
COOPERATIVE TASK MANAGEMENT WITHOUT MANUAL STACK MANAGEMENT
Atul Adya, Jon Howell, Marvin Theimer, William J. Bolosky, John R. Douceur, Microsoft Research
The speaker described the definitions and motivations behind the distinct concepts of task management, stack management, I/O management, conflict management, and data partitioning. Conventional concurrent programming uses preemptive task management and exploits the automatic stack management of a standard language. In the second approach, cooperative tasks are organized as event handlers that yield control by returning control to the event scheduler, manually unrolling their stacks. In this project they have adopted a hybrid approach that makes it possible for both stack management styles to coexist. Thus programmers can code assuming one or the other of these stack management styles will operate, depending upon the application. The speaker also gave a detailed example of how to use their mechanism to switch between the two styles.
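The two styles can be sketched in miniature with one event scheduler that runs both plain callbacks (manual stack management: any state must be saved explicitly between events) and generator-based tasks (automatic-style stack management: locals survive across yields). This is a toy illustration of the coexistence idea, not the authors' mechanism.

```python
# Toy sketch: one scheduler, two stack-management styles coexisting.
from collections import deque

class Scheduler:
    def __init__(self):
        self.ready = deque()
        self.log = []

    def spawn_callback(self, fn):
        self.ready.append(("cb", fn))

    def spawn_generator(self, gen):
        self.ready.append(("gen", gen))

    def run(self):
        while self.ready:
            kind, item = self.ready.popleft()
            if kind == "cb":
                item(self)                 # runs to completion and returns
            else:
                try:
                    next(item)             # runs until its next yield
                    self.ready.append(("gen", item))
                except StopIteration:
                    pass

def handler(sched):
    # Manual style: an event handler with no persistent stack of its own.
    sched.log.append("callback ran")

def counting_task(sched, n):
    # Automatic style: the loop variable lives on the generator's stack
    # across yields -- no hand-written state machine needed.
    for i in range(n):
        sched.log.append(f"task step {i}")
        yield

sched = Scheduler()
sched.spawn_generator(counting_task(sched, 2))
sched.spawn_callback(handler)
sched.run()
```

Each `yield` plays the role of a cooperative yield point; the callback interleaves between the task's steps without either side preempting the other.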
The talk then continued with the implementation. They were able to preempt many subtle concurrency problems by using cooperative task management. Paying a cost up front to reduce a subtle race condition proved to be a good investment.
Though the choice of task management is fundamental, the choice of stack management can be left to individual taste. This project enables use of any type of stack management in conjunction with cooperative task management.
IMPROVING WAIT-FREE ALGORITHMS FOR INTERPROCESS COMMUNICATION IN EMBEDDED REAL-TIME SYSTEMS
Hai Huang, Padmanabhan Pillai, Kang G. Shin, University of Michigan
The main characteristic of a real-time, time-sensitive system is its predictable response time. But concurrency management creates hurdles to achieving this goal because of the use of locks to maintain consistency. To solve this problem, many wait-free algorithms have been developed, but these are typically high-cost solutions. By taking advantage of the temporal characteristics of the system, however, the time and space overhead can be reduced.
The speaker presented an algorithm for temporal concurrency control and applied this technique to improve three wait-free algorithms. A single-writer multiple-reader wait-free algorithm and a double-buffer algorithm were proposed.
Using their transformation mechanism, they achieved an improvement of 17–66% in ACET (average-case execution time) and a 14–70% reduction in memory requirements for IPC algorithms. This mechanism is extensible and can be applied to other non-blocking IPC algorithms as well. Future work involves reducing the synchronization overheads in more general IPC algorithms with multiple-writer semantics.
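The double-buffer idea mentioned above can be sketched as follows: the writer always fills the buffer the reader is not using and then publishes it by flipping an index, so neither side ever waits on a lock. This single-threaded demo only shows the ordering of operations; a real embedded implementation depends on the publish being an atomic store, and the class here is an invented illustration, not the paper's algorithm.

```python
# Sketch of a wait-free single-writer double buffer: write into the
# spare buffer, then "atomically" publish it by flipping an index.
class DoubleBuffer:
    def __init__(self, initial):
        self.bufs = [initial, None]
        self.latest = 0          # index of the last fully written buffer

    def write(self, value):
        spare = 1 - self.latest  # never the buffer a reader may be using
        self.bufs[spare] = value
        self.latest = spare      # publish: readers now see the new value

    def read(self):
        return self.bufs[self.latest]

db = DoubleBuffer({"temp": 20})
db.write({"temp": 21})
db.write({"temp": 22})
```

A reader interrupted mid-`read` still sees a complete, consistent snapshot, because the writer never touches the published buffer; the temporal techniques in the paper go further by exploiting task timing to shrink the buffering needed.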
MOBILITY
Summarized by Praveen Yalagandula
ROBUST POSITIONING ALGORITHMS FOR DISTRIBUTED AD-HOC WIRELESS SENSOR NETWORKS
Chris Savarese, Jan Rabaey, University of California, Berkeley; Koen Langendoen, Delft University of Technology
The Pico Radio Network comprises more than a hundred sensors, monitors, and actuators equipped with wireless connectivity, with the following properties: (1) no infrastructure, (2) computation in a distributed fashion instead of centralized computation, (3) dynamic topology, (4) limited radio range,
(5) nodes that can act as repeaters, (6) devices of low cost and with low power, and (7) sparse anchor nodes – nodes with GPS capability. A positioning algorithm determines the geographical position of each node in the network.
In two-dimensional space, each node needs three reference positions to estimate its geographical position. There are two problems that make positioning difficult in the Pico Radio Network setting: (1) a sparse anchor node problem and (2) a range error problem.
The two-phase approach that the authors have taken in solving the problem consists of (1) a hop-terrain algorithm – in this first phase, each node roughly guesses its location by the distance calculated using multi-hops to the anchor points – and (2) a refinement algorithm – in this second phase, each node uses its neighbors' positions to refine its own position estimate. To guarantee convergence, this approach uses confidence metrics, assigning a value of 1.0 for anchor nodes and 0.1 to start for other nodes, and increasing these with each iteration.
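The refinement phase can be sketched as an iteration that pulls a node's estimate toward the confidence-weighted average of the points its neighbors' range measurements project it to. The update rule and numbers below are a simplified illustration under that interpretation, not the paper's exact algorithm.

```python
# Simplified sketch of confidence-weighted position refinement.
from math import hypot

def refine(estimate, neighbors, iterations=200):
    """neighbors: list of ((x, y), measured_range, confidence)."""
    x, y = estimate
    for _ in range(iterations):
        wx = wy = wsum = 0.0
        for (nx, ny), rng, conf in neighbors:
            dx, dy = x - nx, y - ny
            d = hypot(dx, dy) or 1e-9
            # Point at distance `rng` from the neighbor, toward our
            # current estimate: where this range says we should be.
            px, py = nx + dx / d * rng, ny + dy / d * rng
            wx += conf * px
            wy += conf * py
            wsum += conf
        x, y = wx / wsum, wy / wsum   # confidence-weighted average
    return x, y

# Three anchors (confidence 1.0) with exact ranges to the true point (3, 4).
anchors = [((0.0, 0.0), 5.0, 1.0), ((6.0, 0.0), 5.0, 1.0), ((3.0, 9.0), 5.0, 1.0)]
pos = refine((1.0, 1.0), anchors)
```

With exact ranges the iteration converges to the true position; with noisy ranges the confidence weights let well-localized neighbors (anchors) dominate, which is what keeps the scheme stable.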
A simulation tool, OMNeT++, was used for both phases and in various scenarios. The results show that they achieved position errors of less than 33% in a scenario with 5% anchor nodes, an average connectivity of 7, and 5% range measurement error.
Guidelines for anchor node deployment are high connectivity (>10), a reasonable fraction of anchor nodes (>5%), and, for anchor placement, covered edges.
APPLICATION-SPECIFIC NETWORK MANAGEMENT FOR ENERGY-AWARE STREAMING OF POPULAR MULTIMEDIA FORMATS
Surendar Chandra, University of Georgia, and Amin Vahdat, Duke University
The main hindrance in supporting the increasing demand for mobile multimedia on PDAs is the battery capacity of the small devices. The idle-time power consumption is almost the same as the receive power consumption on typical wireless network cards (1319mW vs. 1425mW), while the sleep state consumes far less power (177mW). To let the system consume energy proportional to the stream quality, the network card should transition to the sleep state aggressively between each packet.
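A back-of-the-envelope calculation with the quoted power numbers shows why sleeping between packets matters: idling between packets burns nearly receive-level power, while sleeping makes energy roughly proportional to the time actually spent receiving. The stream duration and duty cycle below are invented for illustration.

```python
# Energy comparison using the power figures quoted in the talk
# (receive 1425mW, idle 1319mW, sleep 177mW); traffic figures invented.
RECV_MW, IDLE_MW, SLEEP_MW = 1425, 1319, 177

def stream_energy_mj(duration_s, recv_fraction, gap_state_mw):
    """Millijoules for a stream: receiving for `recv_fraction` of the
    time, spending the gaps at `gap_state_mw`."""
    recv_t = duration_s * recv_fraction
    gap_t = duration_s - recv_t
    return RECV_MW * recv_t + gap_state_mw * gap_t

# A 60-second stream where the radio is busy 10% of the time:
always_on = stream_energy_mj(60, 0.10, IDLE_MW)          # gaps spent idle
aggressive_sleep = stream_energy_mj(60, 0.10, SLEEP_MW)  # gaps spent asleep
savings = 1 - aggressive_sleep / always_on
```

At a 10% duty cycle the sleep strategy cuts energy by roughly three quarters, which is in the same ballpark as the savings the talk reports for predictable streams.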
The speaker presented previous work in which different multimedia streams were studied for PDAs: MS Media, Real Media, and QuickTime. It was found that if the inter-packet gap is predictable, then huge savings are possible – for example, about 80% savings in MS Media streams with few packet losses. The limitation of the IEEE 802.11b power-save mode comes into play when there are two or more nodes, producing a delay between beacon time and when the node receives/sends packets. This badly affects the streams, since higher energy is consumed while waiting.
The authors propose traffic shaping for energy conservation, where this is done by proxy such that packets arrive at regular intervals. This is achieved using a local proxy in the access point and a client-side proxy in the mobile host. Simulations show that traffic shaping reduces energy consumption and also reveal that the higher-bandwidth streams have a lower energy metric (mJ/kB).
More information is at http://greenhouse.cs.uga.edu.
CHARACTERIZING ALERT AND BROWSE SERVICES OF MOBILE CLIENTS
Atul Adya, Paramvir Bahl, and Lili Qiu, Microsoft Research
Even though there is a dramatic increase in Internet access from wireless devices, not many studies have been done on characterizing this traffic. In this paper, the authors characterize the traffic observed on the MSN Web site with both notification and browse traces.
Around 33 million browsing accesses and about 325 million notification entries are present in the traces. Three types of analyses are done for each of these two services: content analysis, concerning the most popular content categories and their distribution; popularity analysis, or the popularity distribution of documents; and user behavior analysis.
The analysis of the notification logs shows that document access rates follow a Zipf-like distribution, with most of the accesses concentrated on a small number of messages, and that accesses exhibit geographical locality – users from the same locality tend to receive similar notification content. The browse log analysis shows that a smaller set of URLs is accessed most of the time (though the access pattern does not fit any Zipf curve) and that the highly accessed URLs remain stable. A correlation study between the notification and browsing services shows that wireless users have a moderate correlation of 0.12.
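The standard way to test for a Zipf-like distribution, as in analyses of this kind, is to plot log frequency against log rank and fit a line; a slope near -1 indicates a classic Zipf law, and a poor linear fit (as with the browse logs above) indicates the pattern is not Zipfian. A sketch with synthetic counts:

```python
# Checking for a Zipf-like distribution: least-squares slope of
# log(count) vs. log(rank). Counts here are synthetic, not trace data.
from math import log

def zipf_slope(counts):
    pts = [(log(rank), log(c))
           for rank, c in enumerate(sorted(counts, reverse=True), start=1)]
    n = len(pts)
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    num = sum((x - mx) * (y - my) for x, y in pts)
    den = sum((x - mx) ** 2 for x, _ in pts)
    return num / den

# Perfectly Zipfian synthetic counts: frequency proportional to 1/rank.
synthetic = [1_000_000 // r for r in range(1, 201)]
slope = zipf_slope(synthetic)
```

Real access logs rarely fit this cleanly; the interesting result in the paper is precisely which service does (notifications) and which does not (browsing).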
FREENIX TRACK SESSIONS
BUILDING APPLICATIONS
Summarized by Matt Butner
INTERACTIVE 3-D GRAPHICS APPLICATIONS FOR TCL
Oliver Kersting, Jürgen Döllner, Hasso Plattner Institute for Software Systems Engineering, University of Potsdam
The integration of 3-D image rendering functionality into a scripting language permits interactive and animated 3-D development and application without the formalities and precision demanded by low-level C/C++ graphics and visualization libraries. The large and complex C++ API of the Virtual Rendering System (VRS) can be combined with the conveniences of the Tcl scripting language. The mapping of class interfaces is done via an automated process that generates respective wrapper classes, which ensures complete API accessibility and functionality without any significant performance penalty. The general C++-to-Tcl mapper SWIG grants the necessary object and hierarchical abilities without the use of object-oriented Tcl extensions. The final API mapping techniques address C++ features such as classes, overloaded methods, enumerations, and inheritance relations. All are implemented in a proof of concept that maps the complete C++ API of the VRS to Tcl and are showcased in a complete interactive 3-D map system.
THE AGFL GRAMMAR WORK LAB
Cornelis H.A. Coster, Erik Verbruggen, University of Nijmegen (KUN)
The growth and implementation of Natural Language Processing (NLP) is the cornerstone of the continued evolution and implementation of truly intelligent search machines and services. In part due to the growing collections of computer-stored, human-readable documents in the public domain, the implementation of linguistic analysis will become necessary for desirable precision and resolution. Subsequently, such tools and linguistic resources must be released into the public domain, and they have done so with the release of the AGFL Grammar Work Lab under the GNU Public License as a tool for linguistic research and the development of NLP-based applications.
The AGFL (Affix Grammars over a Finite Lattice) Grammar Work Lab meshes context-free grammars with finite set-valued features that are applicable to a range of languages. In computer science terms, "Syntax rules are procedures with parameters and a non-deterministic execution." The English Phrases for Information Retrieval (EP4IR), released with the AGFL-GWL as a robust grammar of English, is an AGFL-GWL-generated English parser that outputs "Head/Modifier" frames. The sentences "CompanyX sponsored this conference" and "This conference was sponsored by CompanyX" both generate [CompanyX[sponsored conference]]. The hope is that public availability of such tools will encourage further development of grammar and lexical software systems.
SWILL: A SIMPLE EMBEDDED WEB SERVER LIBRARY
Sotiria Lampoudi, David M. Beazley, University of Chicago
This paper won the FREENIX Best Student Paper award.
SWILL (Simple Web Interface Link Library) is a simple Web server in the form of a C library whose development was motivated by a wish to give cool applications an interface to the Web. The SWILL library provides a simple interface that can be efficiently implemented for tasks that vary from flexible Web-based monitoring to software debugging and diagnostics. Though originally designed to be integrated with high-performance scientific simulation software, the interface is generic enough to allow for unbounded uses. SWILL is a single-threaded server relying upon non-blocking I/O through the creation of a temporary server which services I/O requests. SWILL does not provide SSL or cryptographic authentication, but it does have HTTP authentication abilities.
A fantastic feature of SWILL is its support for SPMD-style parallel applications which utilize MPI, proving valuable for Beowulf clusters and large parallel machines. Another practical application was the implementation of SWILL in a modified Yalnix emulator by University of Chicago operating systems courses, which utilized the added Yalnix functionality for OS development and debugging. SWILL requires minimal memory overhead and relies upon the HTTP/1.0 protocol.
NETWORK PERFORMANCE
Summarized by Florian Buchholz
LINUX NFS CLIENT WRITE PERFORMANCE
Chuck Lever, Network Appliance; Peter Honeyman, CITI, University of Michigan
Lever introduced a benchmark to measure NFS client write performance. Client performance is difficult to measure due to hindrances such as poor hardware or bandwidth limitations. Furthermore, measuring application performance does not identify weaknesses specifically at the client side. Thus a benchmark was developed that tries to exercise only data transfers in one direction, between server and application. For this purpose, the benchmark was based on the block sequential write portion of the Bonnie file system benchmark. Once a benchmark for NFS clients is established, it can be used to improve client performance.
The performance measurements were performed with an SMP Linux client and both a Linux NFS server and a Network Appliance F85 filer. During testing, a periodic jump in write latency was discovered. This was due to a rather large number of pending write operations that were scheduled to be written after certain threshold values were exceeded. By introducing a separate daemon that flushes the cached write requests, the spikes could be eliminated, but as a result the average latency grows over time. The problem could be traced to a function that scans a linked list of write requests. After a hashtable was added to improve lookup performance, the latency improved considerably.
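The fix described above is a classic one: replacing a linear scan of pending requests with a hashed index. A schematic sketch (names invented, not the Linux NFS client code):

```python
# Sketch: pending write requests looked up by linked-list scan (O(n))
# vs. by a hash table keyed on file offset (O(1)).
class PendingWrites:
    def __init__(self):
        self.order = []        # retains issue order, like the old list
        self.by_offset = {}    # the added hash-table index

    def add(self, offset, data):
        req = {"offset": offset, "data": data}
        self.order.append(req)
        self.by_offset[offset] = req

    def lookup_scan(self, offset):
        # Old behavior: walk the whole list for every lookup.
        for req in self.order:
            if req["offset"] == offset:
                return req
        return None

    def lookup_hashed(self, offset):
        # Improved behavior: constant-time dictionary lookup.
        return self.by_offset.get(offset)

pw = PendingWrites()
for i in range(1000):
    pw.add(i * 4096, b"x")
```

With thousands of cached write requests in flight, the scan cost is paid on every request, which is exactly how it came to dominate the client's write latency.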
The improved client was then used to measure throughput against the two servers. A discrepancy between the Linux server and the filer tests was noticed, and the reason for it was traced back to a global kernel lock that was unnecessarily held when accessing the network stack. After correcting this, performance improved further. However, the measurements showed that a client may run slower when paired with fast servers on fast networks. This is due to heavy client interrupt loads, more network processing on the client side, and more global kernel lock contention.
The source code of the project is available at http://www.citi.umich.edu/projects/nfs-perf/patches.
A STUDY OF THE RELATIVE COSTS OF NETWORK SECURITY PROTOCOLS
Stefan Miltchev and Sotiris Ioannidis, University of Pennsylvania; Angelos Keromytis, Columbia University
With the increasing need for security and integrity of remote network services, it becomes important to quantify the communication overhead of IPSec and compare it to alternatives such as SSH, SCP, and HTTPS.
For this purpose, the authors set up three testing networks: a direct link, two hosts separated by two gateways, and three hosts connecting through one gateway. Protocols were compared in each setup, and manual keying was used to eliminate connection setup costs. For IPSec, the different encryption algorithms AES, DES, 3DES, hardware DES, and hardware 3DES were used. In detail, FTP was compared to SFTP, SCP, and FTP over IPSec; HTTP to HTTPS and HTTP over IPSec; and NFS to NFS over IPSec and local disk performance.
The result of the measurements was that IPSec outperforms other popular encryption schemes. Overall, unencrypted communication was fastest, but in some cases, like FTP, the overhead can be small. The use of crypto hardware can significantly improve performance. For future work, the inclusion of setup costs, hardware-accelerated SSL, SFTP, and SSH was mentioned.
CONGESTION CONTROL IN LINUX TCP
Pasi Sarolahti, University of Helsinki; Alexey Kuznetsov, Institute for Nuclear Research at Moscow
Having attended the talk and read the paper, I am still unclear about whether the authors are merely describing the design decisions of TCP congestion control or whether they are actually the creators of that part of the Linux code. My guess leans toward the former.
In the talk, the speaker compared the TCP congestion control measures according to IETF and RFC specifications with the actual Linux implementation, which does conform to the basic principles but nevertheless has differences. A specific emphasis was placed on retransmission mechanisms and the congestion window. Also, several TCP enhancements – the NewReno algorithm, Selective ACKs (SACK), Forward ACKs (FACK) – were discussed and compared.
In some instances, Linux does not conform to the IETF specifications. The fast recovery does not fully follow RFC 2582, since the threshold for triggering retransmit is adjusted dynamically and the congestion window's size is not changed. Also, the round-trip-time estimator and the RTO calculation differ from RFC 2988, since Linux uses more conservative RTT estimates and a minimum RTO of 200ms. The performance measures showed that with the additional Linux-specific features enabled, slightly higher throughput, more steady data flow, and fewer unnecessary retransmissions can be achieved.
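The RTO computation being contrasted here can be sketched directly from RFC 2988: a smoothed RTT (SRTT) and a variance estimate (RTTVAR) are updated per sample, and RTO = SRTT + 4·RTTVAR, clamped below by a floor. The 200ms floor mirrors the Linux behavior described above (the RFC's own floor is 1 second); this is an illustrative sketch, not kernel code.

```python
# RFC 2988-style RTO computation with a configurable lower bound.
def make_rto_estimator(min_rto=0.2, alpha=1 / 8, beta=1 / 4):
    state = {"srtt": None, "rttvar": None}

    def update(rtt_sample):
        if state["srtt"] is None:
            # First measurement (RFC 2988, 2.2): SRTT = R, RTTVAR = R/2.
            state["srtt"] = rtt_sample
            state["rttvar"] = rtt_sample / 2
        else:
            # Subsequent measurements (2.3): exponential smoothing.
            state["rttvar"] = (1 - beta) * state["rttvar"] + beta * abs(state["srtt"] - rtt_sample)
            state["srtt"] = (1 - alpha) * state["srtt"] + alpha * rtt_sample
        return max(min_rto, state["srtt"] + 4 * state["rttvar"])

    return update

update = make_rto_estimator()
first = update(0.100)   # SRTT=0.1s, RTTVAR=0.05s -> RTO = 0.1 + 4*0.05
```

On a fast LAN where measured RTTs stay far below the floor, the clamp dominates: the RTO sits at the 200ms minimum no matter how small the samples get, which is where a lower floor pays off.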
XTREME XCITEMENT
Summarized by Steve Bauer
THE FUTURE IS COMING: WHERE THE X WINDOW SYSTEM SHOULD GO
Jim Gettys, Compaq Computer Corp.
Jim Gettys, one of the principal authors of the X Window System, outlined the near-term objectives for the system, primarily focusing on the changes and infrastructure required to enable replication and migration of X applications. Providing better support for this functionality would enable users to retrieve or duplicate X applications between their servers at home and work.
One interesting example of an application that currently is capable of migration and replication is Emacs. To create a new frame on DISPLAY, try "M-x make-frame-on-display <Return> DISPLAY <Return>".
However, technical challenges make replication and migration difficult in general. These include the "major headaches" of server-side fonts, the nonuniformity of X servers and screen sizes, and the need to appropriately retrofit toolkits. Authentication and authorization issues are obviously also important. The rest of the talk delved into some of the details of these interesting technical challenges.
HACKING IN THE KERNEL
Summarized by Hai Huang
AN IMPLEMENTATION OF SCHEDULER ACTIVATIONS ON THE NETBSD OPERATING SYSTEM
Nathan J. Williams, Wasabi Systems
Scheduler activations are an old idea. Basically, there are benefits and drawbacks to using solely kernel-level or user-level threading. Scheduler activations are able to combine the two layers of control to provide more concurrency in the system.
In his talk, Nathan gave a fairly detailed description of the implementation of scheduler activations in the NetBSD kernel. One important change in the implementation was to differentiate the thread context from the process context. This is done by defining a separate data structure for these threads and relocating some of the information that was embedded in the process context to these thread contexts. The stack was a particular concern due to the upcall: special handling must be done to make sure that the upcall doesn't mess up the stack, so that the preempted user-level thread can continue afterwards. Lastly, Nathan explained that signals are handled by upcalls.
AUTHORIZATION AND CHARGING IN PUBLIC WLANS USING FREEBSD AND 802.1X
Pekka Nikander, Ericsson Research NomadicLab
The 802.1x standards are well known in the wireless community as link-layer authentication protocols. In this talk, Pekka explained some novel ways of using the 802.1x protocols that might be of interest to people on the move. It is possible to set up a public WLAN that would support various charging schemes via virtual tokens, which people can purchase or earn and later use.
This is implemented on FreeBSD using the netgraph utility. It is basically a filter in the link layer that differentiates traffic based on the MAC address of the client node, which is either authenticated, denied, or let through. The overhead for this service is fairly minimal.
ACPI IMPLEMENTATION ON FREEBSD
Takanori Watanabe Kobe University
ACPI (Advanced Configuration and Power Interface) was proposed as a joint effort by Intel, Toshiba, and Microsoft to provide a standard, finer-grained method of managing the power states of individual devices within a system. Such low-level power management is especially important for mobile and embedded systems that are powered by fixed-capacity batteries.
Takanori explained that the ACPI specification is composed of three parts: tables, BIOS, and registers. He was able to implement some functionality of ACPI in a FreeBSD kernel. The ACPI Component Architecture was implemented by Intel, and it provides a high-level ACPI API to the operating system. Takanori's ACPI implementation is built upon this underlying layer of APIs.
ACCESS CONTROL
Summarized by Florian Buchholz
DESIGN AND PERFORMANCE OF THE OPENBSD STATEFUL PACKET FILTER (PF)
Daniel Hartmeier, Systor AG
Daniel Hartmeier described the new stateful packet filter (pf) that replaced IPFilter in the OpenBSD 3.0 release. IPFilter could no longer be included due to licensing issues, and thus there was a need to write a new filter making use of optimized data structures.
The filter rules are implemented as a linked list which is traversed from start to end. Two actions may be taken according to the rules, "pass" or "block": a "pass" action forwards the packet, and a "block" action will drop it. Where more than one rule matches a packet, the last rule wins. Rules that are marked as "final" will immediately terminate any further rule evaluation. An optimization called "skip-steps" was also implemented, where blocks of similar rules are skipped if they cannot match the current packet; these skip-steps are calculated when the rule set is loaded. Furthermore, a state table keeps track of TCP connections: only packets that match the sequence numbers are allowed. UDP and ICMP queries and replies are considered in the state table, where initial packets create an entry for a pseudo-connection with a low initial timeout value. The state table is implemented using a balanced binary search tree. NAT mappings are also stored in the state table, whereas application proxies reside in user space. The packet filter is also able to perform fragment reassembly and to modulate TCP sequence numbers to protect hosts behind the firewall.
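The last-match-wins evaluation with "final" short-circuiting can be sketched in a few lines. This is a schematic of the semantics described in the talk, not OpenBSD code; the rules and packets are invented.

```python
# Sketch of pf-style rule evaluation: walk the list start to end, the
# LAST matching rule decides, and a "final" rule stops evaluation.
def evaluate(rules, packet):
    """rules: list of (match_predicate, action, final)."""
    decision = "pass"               # illustrative default
    for match, action, final in rules:
        if match(packet):
            decision = action
            if final:
                break               # no later rule can override
    return decision

rules = [
    (lambda p: True,               "block", False),  # block everything...
    (lambda p: p["dport"] == 80,   "pass",  False),  # ...except HTTP
    (lambda p: p["src"] == "evil", "block", True),   # final: always drop
]

http = evaluate(rules, {"src": "a",    "dport": 80})
ssh  = evaluate(rules, {"src": "a",    "dport": 22})
evil = evaluate(rules, {"src": "evil", "dport": 80})
```

Note how the "final" rule wins even though the HTTP pass rule matched earlier; without the flag, the later match would simply have been the last word anyway. The skip-steps optimization prunes this walk by jumping over runs of rules that share a field the packet cannot match.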
Pf was compared against IPFilter as well as Linux's Iptables by measuring throughput and latency with increasing traffic rates and different packet sizes. In a test with a fixed rule set size of 100, Iptables outperformed the other two filters, whose results were close to each other. In a second test, where the rule set size was continually increased, Iptables consistently had about twice the throughput of the other two (which evaluate the rule set on both the incoming and outgoing interfaces). A third test compared only pf and IPFilter, using a single rule that created state in the state table with a fixed state-entry size. Pf reached an overloaded state much later than IPFilter. The experiment was repeated with a variable state-entry size, and pf performed much better than IPFilter for a small number of states.
In general, rule set evaluation is expensive, and benchmarks only reflect extreme cases, whereas in real life other behavior should be observed. Furthermore, the benchmarks show that stateful filtering can actually improve performance, due to cheap state-table lookups as compared to rule evaluation.
ENHANCING NFS CROSS-ADMINISTRATIVE DOMAIN ACCESS
Joseph Spadavecchia and Erez Zadok, Stony Brook University
The speaker presented modifications to an NFS server that allow improved NFS access between administrative domains. The problem lies in the fact that NFS assumes a shared UID/GID space, which makes it unsafe to export files outside the administrative domain of the server. Design goals were to leave the protocol and existing clients unchanged, a minimum amount of server changes, flexibility, and increased performance.
To solve the problem, two techniques are utilized: "range-mapping" and "file-cloaking." Range-mapping maps IDs between client and server. The mapping is performed on a per-export basis and has to be manually set up in an export file. The mappings can be 1-1, N-N, or N-1. In file-cloaking, the server restricts file access based on UID/GID and special cloaking-mask bits. Here users can only access their own file permissions. The policy on whether or not the file is visible to others is set by the cloaking mask, which is logically ANDed with the file's protection bits. File-cloaking only works, however, if the client doesn't hold cached copies of directory contents and file attributes. Because of this, the clients are forced to re-read directories, by incrementing the mtime value of the directory each time it is listed.
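The range-mapping table can be sketched as a per-export list of entries, each translating a client UID range either one-to-one onto a shifted server range (covering the 1-1 and N-N cases) or collapsing it to a single server UID (the N-1 case). The layout and numbers below are invented for illustration, not the paper's export-file syntax.

```python
# Sketch of per-export UID range-mapping between client and server.
NOBODY = 65534  # unmapped IDs squashed to "nobody" (illustrative policy)

def map_uid(table, client_uid):
    """table: list of (client_lo, client_hi, server_lo, collapse)."""
    for lo, hi, server_lo, collapse in table:
        if lo <= client_uid <= hi:
            # N-1: whole range collapses to one server UID.
            # 1-1 / N-N: preserve the offset within the range.
            return server_lo if collapse else server_lo + (client_uid - lo)
    return NOBODY

export_table = [
    (1000, 1999, 5000, False),   # N-N: client 1000..1999 -> server 5000..5999
    (0, 0, 0, False),            # 1-1: root stays root on this export
    (2000, 2999, 60001, True),   # N-1: guest accounts share one server UID
]
```

The same translation runs in reverse for IDs in server replies, so existing clients see a coherent ID space without any protocol change.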
To test the performance of the modified server, five different NFS configurations were evaluated: an unmodified NFS server was compared against one with the modified code included but not used, one with only range-mapping enabled, one with only file-cloaking enabled, and one version with all modifications enabled. For each setup, different file system benchmarks were run. The results show only a small overhead when the modifications are used, generally an increase of below 5%. Another experiment tested the performance of the system with an increasing number of mapped or cloaked entries. The results show that an increase from 10 to 1,000 entries resulted in a maximum of about 14% cost in performance.
One member of the audience pointed out that if clients choose to ignore the changed mtimes from the server and thus still hold caches of the directory entries, the file-cloaking mechanism could be defeated. After a rather lengthy debate, the speaker had to concede that the model doesn't add any extra security. Another question was asked about scalability of the setup of range-mapping. The speaker referred to application-level tools that could be developed for that purpose.
The software is available at ftp://ftp.fsl.cs.sunysb.edu/pub/enf.
ENGINEERING OPEN SOURCE SOFTWARE
Summarized by Teri Lampoudi
NINGAUI: A LINUX CLUSTER FOR BUSINESS
Andrew Hume, AT&T Labs – Research; Scott Daniels, Electronic Data Systems Corp.
Ningaui is a general-purpose, highly available, resilient architecture built from commodity software and hardware. Emphatically, however, Ningaui is not a Beowulf. Hume calls the cluster design the "Swiss canton model," in which there are a number of loosely affiliated independent nodes with data replicated among them. Jobs are assigned by bidding and leases, and cluster services done as session-based servers are sited via generic job assignment. The emphasis is on keeping the architecture end-to-end, checking all work via checksums, and logging everything. The resilience and high availability required by their goal of 8-5 maintenance – vs. the typical 24-7 model, where people get paged whenever the slightest thing goes wrong, regardless of the time of day – are achieved by job restartability. Finally, all computation is performed on local data, without the use of NFS or network-attached storage.
Hume's message is a hopeful one: despite the many problems encountered – things like kernel and service limits, auto-installation problems, TCP storms, and the like – the existence of source code and the paranoid practice of logging and checksumming everything has helped. The final product performs reasonably well, and it appears that the resilient techniques employed do make a difference. One drawback, however, is that the software mentioned in the paper is not yet available for download.
CPCMS: A CONFIGURATION MANAGEMENT SYSTEM BASED ON CRYPTOGRAPHIC NAMES
Jonathan S. Shapiro and John Vanderburgh, Johns Hopkins University
This paper won the FREENIX Best Paper award.
The basic notion behind the project isthe fact that everyone has a pet com-plaint about CVS and yet it is currentlythe configuration manager in mostwidespread use Shapiro has unleashedan alternative Interestingly he did notbegin but ended with the usual host ofreasons why CVS is bad The talk insteadbegan abruptly by characterizing the job
of a software configuration manager, continued by stating the namespaces it must handle, and wrapped up with the challenges faced.
X MEETS Z: VERIFYING CORRECTNESS IN THE PRESENCE OF POSIX THREADS
Bart Massey, Portland State University; Robert T. Bauer, Rational Software Corp.
Massey delivered a humorous talk on the insight gained from applying Z formal specification notation to system software design, rather than the more informal analysis and design process normally used.
The story is told with respect to writing XCB, which replaces the Xlib protocol layer and is supposed to be thread-friendly. But where threads are concerned, deadlock avoidance becomes a hard problem that cannot be solved in an ad hoc manner. Full model checking is also too hard. In this case, Massey resorted to Z specification to model the XCB lower layer, abstract away locking and data transfers, and locate fundamental issues. Essentially, the difficulties of searching the literature and locating information relevant to the problem at hand were overcome. As Massey put it, "the formal method saved the day."
FILE SYSTEMS
Summarized by Bosko Milekic
PLANNED EXTENSIONS TO THE LINUX EXT2/EXT3 FILESYSTEM
Theodore Y. Ts'o, IBM; Stephen Tweedie, Red Hat
The speaker presented improvements to the Linux Ext2 file system, with the goal of allowing for various expansions while striving to maintain compatibility with older code. Improvements have been facilitated by a few extra superblock fields that were added to Ext2 not long before the Linux 2.0 kernel was released. The fields allow for file system features to be added without compromising the existing setup; this is done by providing
bits indicating the impact of the added features with respect to compatibility. Namely, a file system with the "incompat" bit marked is not allowed to be mounted. Similarly, a "read-only" marking would only allow the file system to be mounted read-only.
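The compatibility-bit logic described above can be sketched as follows. This is a minimal model of the mount decision, not kernel code; the feature names are invented for illustration, and real Ext2 stores these bits as flag words in the superblock.

```python
# Features this hypothetical kernel understands (names are illustrative).
KNOWN_INCOMPAT = {"filetype"}
KNOWN_RO_COMPAT = {"sparse_super"}

def mount_allowed(fs_incompat, fs_ro_compat, read_only):
    """Mimic the decision: unknown 'incompat' features forbid any mount;
    unknown 'read-only' features allow only a read-only mount."""
    if fs_incompat - KNOWN_INCOMPAT:
        return False                 # unknown incompat bit: refuse to mount
    if (fs_ro_compat - KNOWN_RO_COMPAT) and not read_only:
        return False                 # unknown ro-compat bit: read-only only
    return True
```

An older kernel that predates a feature thus degrades safely instead of corrupting the file system.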
Directory indexing replaces linear directory searches with a faster search using a fixed-depth tree and hashed keys. File system size can be dynamically increased, and the expanded inode, doubled from 128 to 256 bytes, allows for more extensions.
Other potential improvements were discussed as well, in particular pre-allocation for contiguous files, which allows for better performance in certain setups by pre-allocating contiguous blocks. Security-related modifications, extended attributes, and ACLs were mentioned. An implementation of these features already exists but has not yet been merged into the mainline Ext2/3 code.
RECENT FILESYSTEM OPTIMISATIONS ON
FREEBSD
Ian Dowse, Corvil Networks; David Malone, CNRI, Dublin Institute of Technology
David Malone presented four important file system optimizations for FreeBSD: soft updates, dirpref, vmiodir, and dirhash. It turns out that certain combinations of the optimizations (beautifully illustrated in the paper) may yield performance improvements of anywhere between 2 and 10 orders of magnitude for real-world applications.
All four of the optimizations deal with file system metadata. Soft updates allow for asynchronous metadata updates. Dirpref changes the way directories are organized, attempting to place child directories closer to their parents, thereby increasing locality of reference and reducing disk-seek times. Vmiodir trades some extra memory for better directory caching. Finally, dirhash, which was implemented by Ian Dowse, changes the way in which entries
are searched in a directory. Specifically, FFS uses a linear search to find an entry by name; dirhash builds a hashtable of directory entries on the fly. For a directory of size n with a working set of m files, a search that in certain cases could have been O(n·m) has been reduced, due to dirhash, to effectively O(n + m).
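The contrast can be sketched in a few lines. This is a toy model of the two lookup strategies, not FFS code: the linear scan costs O(n) per name, while a dirhash-style table costs O(n) once to build and O(1) per lookup thereafter.

```python
def linear_lookup(entries, name):
    """FFS-style linear scan: O(n) work for every single lookup."""
    for i, entry in enumerate(entries):
        if entry == name:
            return i
    return -1

class DirHash:
    """dirhash-style index built on the fly: pay O(n) once, O(1) after."""
    def __init__(self, entries):
        self.index = {}
        for i, entry in enumerate(entries):
            self.index.setdefault(entry, i)

    def lookup(self, name):
        return self.index.get(name, -1)
```

For m lookups over an n-entry directory, the totals are roughly n·m versus n + m operations, matching the complexity claim above.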
FILESYSTEM PERFORMANCE AND SCALABILITY IN LINUX 2.4.17
Ray Bryant, SGI; Ruth Forester, IBM LTC; John Hawkes, SGI
This talk focused on performance evaluation of a number of file systems available and commonly deployed on Linux machines. Comparisons under various configurations of Ext2, Ext3, ReiserFS, XFS, and JFS were presented.
The benchmarks chosen for the data gathering were pgmeter, filemark, and the AIM Benchmark Suite VII. Pgmeter measures the rate of data transfer of reads/writes of a file under a synthetic workload. Filemark is similar to postmark in that it is an operation-intensive benchmark, although filemark is threaded and offers various other features that postmark lacks. AIM VII measures performance for various file-system-related functionalities; it offers various metrics under an imposed workload, thus stressing the performance of the file system not only under I/O load but also under significant CPU load.
Tests were run on three different setups: a small, a medium, and a large configuration. ReiserFS and Ext2 appear at the top of the pile for small and medium setups. Notably, XFS and JFS perform worse for smaller system configurations than the others, although XFS clearly appears to generate better numbers than JFS. It should be noted that XFS seems to scale well under a higher load; this was most evident in the large-system results, where XFS appears to offer the best overall results.
THINGS TO THINK ABOUT
Summarized by Bosko Milekic
SPEEDING UP KERNEL SCHEDULER BY REDUCING CACHE MISSES
Shuji Yamamura, Akira Hirai, Mitsuru Sato, Masao Yamamoto, Akira Naruse, Kouichi Kumon, Fujitsu Labs
This was an interesting talk pertaining to the effects of cache coloring for task structures in the Linux kernel scheduler (Linux kernel 2.4.x). The speaker first presented some interesting benchmark numbers for the Linux scheduler, showing that as the number of processes on the task queue increased, the performance decreased. The authors used some really nifty hardware to measure the number of bus transactions throughout their tests and were thus able to reasonably quantify the impact that cache misses had in the Linux scheduler.
Their experiments led them to implement a cache coloring scheme for task structures, which were previously aligned on 8KB boundaries and therefore eventually mapped to the same cache lines. This unfortunate placement of task structures in memory induced a significant number of cache misses as the number of tasks in the scheduler grew.
The implemented solution consisted of aligning task structures to more evenly distribute cached entries across the L2 cache. The result was inevitably fewer cache misses in the scheduler. Some negative effects were observed in certain situations; these were primarily due to more cache slots being used by task-structure data in the scheduler, thus forcing data previously cached there to be pushed out.
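The collision the authors fixed can be illustrated with a toy direct-mapped cache model. The cache geometry below is hypothetical (not the hardware from the paper), but it shows why structures on identical 8KB-aligned boundaries index into the same cache set, and how staggering each structure by one cache line per "color" spreads them out.

```python
WAY_SIZE = 8 * 1024   # bytes of cache covered before set indices wrap
LINE = 64             # cache line size in bytes

def cache_set(addr):
    """Set index of the first line of a structure at this address."""
    return (addr % WAY_SIZE) // LINE

# Uncolored: every task structure starts on an 8KB boundary, so each one
# maps to the very same cache set and they evict one another.
uncolored = [cache_set(i * 8192) for i in range(8)]

# Colored: offset each structure by one cache line per task ("color"),
# distributing the structures across distinct sets.
colors = WAY_SIZE // LINE
colored = [cache_set(i * 8192 + (i % colors) * LINE) for i in range(8)]
```

The same padding trick also explains the observed downside: colored task structures occupy more distinct cache slots, pushing out other cached data.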
WORK-IN-PROGRESS REPORTS
Summarized by Brennan Reynolds
RESOURCE VIRTUALIZATION TECHNIQUES FOR
WIDE-AREA OVERLAY NETWORKS
Kartik Gopalan, Stony Brook University
This work addressed the issue of provisioning a maximum number of virtual overlay networks (VONs) with diverse quality-of-service (QoS) requirements on a single physical network. Each of the VONs is logically isolated from the others to ensure its QoS requirements. Gopalan mentioned several critical research issues with this problem that are currently being investigated. Dealing with how to provision the network at various levels (link, route, or path), and then enforce the provisioning at run-time, is one of the toughest challenges. Currently, Gopalan has developed several algorithms to handle admission control, end-to-end QoS, route selection, scheduling, and fault tolerance in the network.
For more information visit http://www.ecsl.cs.sunysb.edu/
VISUALIZING SOFTWARE INSTABILITY
Jennifer Bevan, University of California, Santa Cruz
Detection of instability in software has typically been an afterthought. The point in the development cycle when the software is reviewed for instability usually comes after it is difficult and costly to go back and perform major modifications to the code base. Bevan has developed a technique that allows the visualization of unstable regions of code and can be used much earlier in the development cycle. Her technique creates a time series of dependency graphs that include clusters and lines called fault lines. From the graphs, a developer can easily determine where the unstable sections of code are and proactively restructure them. She is currently working on a prototype implementation.
For more information visit http://www.cse.ucsc.edu/~jbevan/evo_viz/
RELIABLE AND SCALABLE PEER-TO-PEER WEB
DOCUMENT SHARING
Li Xiao, College of William and Mary
The idea presented by Xiao would allow end users to share the contents of their Web browser caches with neighboring Internet users. The rationale for this is that today's end users are increasingly connected to the Internet over high-speed links, and browser caches are becoming large storage systems. Therefore, if an individual accesses a page that does not exist in their local cache, Xiao suggests that they first query other end users for the content before trying to access the machine hosting the original. This strategy does have some serious problems associated with it that still need to be addressed, including ensuring the integrity of the content and protecting the identity and privacy of the end users.
SEREL: FAST BOOTING FOR UNIX
Leni Mayo, Fast Boot Software
Serel is a tool that generates a visual representation of a UNIX machine's boot-up sequence. It can be used to identify the critical path and can show if a particular service or process blocks for an extended period of time. This information could be used to determine where the largest performance gains could be realized by tuning the order of execution at boot-up. Serel creates a dependency graph, expressed in XML, during boot-up; this graph is then used to create the visual representation. Currently the tool only works on POSIX-compliant systems, but Mayo stated that he would be porting it to other platforms. Other extensions that were mentioned included having the metadata cover the use of shared libraries and monitoring the suspend/resume sequence of portable machines.
For more information visit http://www.fastboot.org/
BERKELEY DB XML
John Merrells, Sleepycat Software
Merrells gave a quick introduction and overview of the new XML library for Berkeley DB. The library specializes in storage and retrieval of XML content through XPath queries. The library allows for multiple containers per document and stores everything natively as XML. The user is also given a wide range of elements to create indices with, including edges, elements, text strings, or presence. The XPath engine consists of a query parser, generator, optimizer, and execution engine. To conclude his presentation, Merrells gave a live demonstration of the software.
For more information visit http://www.sleepycat.com/xml/
CLUSTER-ON-DEMAND (COD)
Justin Moore, Duke University
Modern clusters are growing at a rapid rate; many have pushed beyond the 5000-machine mark, and deploying them results in large expenses as well as management and provisioning issues. Furthermore, if the cluster is "rented" out to various users, it is very time-consuming to configure it to a user's specs, regardless of how long they need to use it. The COD work presented creates dynamic virtual clusters within a given physical cluster. The goal was to have a provisioning tool that would automatically select a chunk of available nodes and install the operating system and middleware specified by the customer in a short period of time. This would allow greater use of resources, since multiple virtual clusters can exist at once, and the size of a virtual cluster can be changed dynamically. Moore stated that they have created a working prototype and are currently testing and benchmarking its performance.
For more information visit http://www.cs.duke.edu/~justin/cod/
CATACOMB
Elias Sinderson, University of California, Santa Cruz
Catacomb is a project to develop a database-backed DASL module for the Apache Web server. It was designed as a replacement for the existing WebDAV module. The initial release of the module only contains support for a MySQL database but could be extended to others. Sinderson briefly touched on the module's performance: it was comparable to the mod_dav Apache module for all query types but search. The presentation concluded with remarks about adding support for the lock method and including ACL specifications in the future.
For more information visit http://ocean.cse.ucsc.edu/catacomb/
SELF-ORGANIZING STORAGE
Dan Ellard, Harvard University
Ellard's presentation introduced a storage system that tunes itself based on the workload of the system, without requiring the intervention of the user. The intelligence is implemented as a virtual self-organizing disk that resides below any file system. The virtual disk observes the access patterns exhibited by the system and then attempts to predict what information will be accessed next. An experiment was done using an NFS trace from one of the email servers at a large ISP. Ellard's self-organizing storage system worked well, which he attributes to the fact that most of the files being requested were large email boxes. Areas of future work include exploration of the length and detail of the data-collection stage, as well as the CPU impact of running the virtual disk layer.
VISUAL DEBUGGING
John Costigan, Virginia Tech
Costigan feels that the state of current debugging facilities in the UNIX world is not as good as it should be. He proposes the addition of several elements to programs like ddd and gdb. The first addition is including separate heap and
stack components. Another is to display only certain fields of a data structure, with the ability to zoom in and out as needed. Finally, the debugger should provide the ability to visually present complex data structures, including linked lists, trees, etc. He said that a beta version of a debugger with these abilities is currently available.
For more information visit http://infovis.cs.vt.edu/datastruct/
IMPROVING APPLICATION PERFORMANCE
THROUGH SYSTEM-CALL COMPOSITION
Amit Purohit, Stony Brook University
Web servers perform a huge number of context switches and internal data copies during normal operation. These two elements can drastically limit the performance of an application, regardless of the hardware platform it is run on. The Compound System Call (CoSy) framework is an attempt to reduce the performance penalty for context switches and data copies. It includes a user-level library and several kernel facilities that can be used via system calls. The library provides the programmer with a complete set of memory-management functions. Performance tests using the CoSy framework showed a large saving for context switches and data copies; the only area where the savings between conventional libraries and CoSy were negligible was for very large file copies.
For more information visit http://www.cs.sunysb.edu/~purohit/
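The cost model behind compound calls can be sketched abstractly. This toy model (not the CoSy API; the class and method names are invented) counts user/kernel boundary crossings: batching N operations into one compound call pays the crossing cost once instead of N times.

```python
class KernelBoundary:
    """Toy accounting of user/kernel boundary crossings."""
    def __init__(self):
        self.crossings = 0

    def syscall(self, op):
        """Conventional path: one crossing per operation."""
        self.crossings += 1
        return op()

    def compound(self, ops):
        """Compound path: one crossing for the whole batch of operations."""
        self.crossings += 1
        return [op() for op in ops]
```

With five operations, the conventional path costs five crossings and the compound path one, which is the effect the CoSy measurements quantify for real workloads.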
ELASTIC QUOTAS
John Oscar, Columbia University
Oscar began by stating that most resources are flexible, but that to date disks have not been. While approximately 80 percent of files are short-lived, disk quotas are hard limits imposed by administrators. The idea behind elastic quotas is to have a non-permanent storage area that each user can temporarily use. Oscar suggested the creation of an ehome directory structure to be used in deploying elastic quotas. Currently he
has implemented elastic quotas as a stackable file system.

There were also several trends apparent in the data. Many users would ssh to their remote host (good) but then launch an email reader on the local machine, which would connect via POP to the same host and send the password in clear text (bad). This situation can easily be remedied by tunneling protocols like POP through SSH, but it appeared that many people were not aware this could be done. While most of his comments on the use of the wireless network were negative, the list of passwords he had collected showed that people were indeed using strong passwords. His recommendations were to educate and encourage people to use protocols like IPsec, SSH, and SSL when conducting work over a wireless network, because you never know who else is listening.
SYSTRACE
Niels Provos, University of Michigan
How can one be sure the applications one is using actually do exactly what their developers said they do? Short answer: you can't. People today are using a large number of complex applications, which means it is impossible to check each application thoroughly for security vulnerabilities. There are tools that can help, though. Provos has developed a tool called systrace that allows a user to generate a policy of acceptable system calls a particular application can make. If the application attempts to make a call that is not defined in the policy, the user is notified and allowed to choose an action. Systrace includes an auto-policy generation mechanism, which uses a training phase to record the actions of all programs the user executes. If the user chooses not to be bothered by applications breaking policy, systrace allows default enforcement actions to be set. Currently systrace is implemented for FreeBSD and NetBSD, with a Linux version coming out shortly.
For more information visit http://www.citi.umich.edu/u/provos/systrace/
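The policy-plus-training model described above can be sketched in miniature. This is a toy simulation of the idea, not systrace's real policy language or engine: a training phase permits every call it observed, and enforcement falls back to a configurable default action (such as asking the user) for anything unlisted.

```python
def train(observed_calls):
    """Auto-policy generation: permit every system call seen in training."""
    return {call: "permit" for call in observed_calls}

def enforce(policy, call, default="ask"):
    """Return the action for one system call: 'permit', 'deny', or 'ask'
    (prompt the user), with `default` applied to calls not in the policy."""
    return policy.get(call, default)
```

Setting `default="deny"` corresponds to the unattended mode mentioned in the talk, where the user is not prompted for policy violations.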
VERIFIABLE SECRET REDISTRIBUTION FOR
SURVIVABLE STORAGE SYSTEMS
Ted Wong, Carnegie Mellon University
Wong presented a protocol that can be used to re-create a file distributed over a set of servers, even if one of the servers is damaged. In his scheme, the user must choose how many shares the file is split into and the number of servers it will be stored across. The goal of this work is to provide persistent, secure storage of information even if it comes under attack. Wong stated that his design of the protocol is complete, and he is currently building a prototype implementation of it.
For more information visit http://www.cs.cmu.edu/~tmwong/research/
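The share-splitting idea can be illustrated with the simplest possible scheme: n-of-n XOR splitting. This is only a sketch of splitting a secret into server shares; Wong's protocol is strictly stronger (it tolerates a damaged server and is verifiable), whereas this toy version loses the secret if any single share is lost.

```python
import os

def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def split(secret, n):
    """n-of-n XOR split: n-1 random shares plus one 'correction' share."""
    shares = [os.urandom(len(secret)) for _ in range(n - 1)]
    last = secret
    for s in shares:
        last = xor_bytes(last, s)
    return shares + [last]          # store one share per server

def combine(shares):
    """XOR of all n shares reconstructs the original secret."""
    out = bytes(len(shares[0]))
    for s in shares:
        out = xor_bytes(out, s)
    return out
```

Threshold schemes in the style of Shamir secret sharing replace this all-or-nothing property with "any k of n shares suffice," which is what makes recovery from a damaged server possible.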
THE GURU IS IN SESSIONS
LINUX ON LAPTOP/PDA
Bdale Garbee, HP Linux Systems Operation
Summarized by Brennan Reynolds
This was a roundtable-like discussion with Garbee acting as moderator. He is currently in charge of the Debian distribution of Linux and is involved in porting Linux to platforms other than i386. He has successfully helped port the kernel to the Alpha, SPARC, and ARM, and is currently working on a version to run on VAX machines.
A discussion was held on which file system is best for battery life. The ReiserFS and Ext3 file systems were discussed; people commented that when using Ext3 on their laptops, the disk would never spin down and thus was always consuming a large amount of power. One suggestion was to use a RAM-based file system for any volume that requires a large number of writes, and to have the file system written to disk only at infrequent intervals or when the machine is shut down or suspended.
A question was directed to Garbee about his knowledge of how the HP-Compaq merger would affect Linux. Garbee thought that the merger was good for Linux, both within the company and for the open source community as a whole. He stated that the new company would be the number-one shipper of systems with Linux as the base operating system. He also fielded a question about the support of older HP servers and their ability to run Linux, saying that Linux has indeed been ported to them and that he personally had several such machines running it in his basement.
A question which sparked a large number of responses concerned problems people had with running Linux on mobile platforms. Most people actually did not have many problems at all. There were a few cases of a particular machine model not working, but there were only two widespread issues: docking stations and the new ACPI power management scheme. Docking stations still appear to be somewhat of a headache, but most people had developed ad hoc solutions for getting them to work. The ACPI power management scheme developed by Intel does not appear to have a quick solution. Ted Ts'o, one of the head kernel developers, stated that there are fundamental architectural problems with the 2.4 series kernel that do not easily allow ACPI to be added. However, he also stated that ACPI is supported in the newest development kernel, 2.5, and will be included in 2.6.
The final topic was the possibility of purchasing a laptop without paying any charge/tax for Microsoft Windows. The consensus was that currently this is not possible. Even if the machine comes with Linux or without any operating system, the vendors have contracts in place which require them to pay Microsoft for each machine they ship, if that machine is capable of running a Microsoft operating system. The only way this will change is if the demand for
other operating systems increases to the point that vendors renegotiate their contracts with Microsoft, and no one saw this happening in the near future.
LARGE CLUSTERS
Andrew Hume, AT&T Labs – Research
Summarized by Teri Lampoudi
This was a session on clusters, big data, and resilient computing. Given that the audience was primarily interested in clusters, and not necessarily in big data or beating nodes to a pulp, the conversation revolved mainly around what it takes to make a heterogeneous cluster resilient. Hume also presented a FREENIX paper on his current cluster project, which explained in more detail much of what was abstractly claimed in the guru session.
Hume's use of the word "cluster" referred not to what I had assumed would be a Beowulf-type system, but to a loosely coupled farm of machines in which concurrency is much less of an issue than it would be in a Beowulf. In fact, the architecture Hume described was designed to deal with large amounts of independent transaction processing, essentially the process of billing calls, which requires no interprocess message passing or anything of the sort a parallel scientific application might.
Where does the large data come in? Since the transactions in question consist of large numbers of flat files, a mechanism for getting the files onto nodes and off-loading the results is necessary. In this particular cluster, named Ningaui, this task is handled by the "replication manager," which generated a large amount of interest in the audience. All management functions in the cluster, as well as job allocation, constitute services that are bid for and leased out to nodes. Furthermore, failures are handled by simply repeating the failed task until it is completed properly, if that is possible, and if it is not, by looking through detailed logs to discover
what went wrong. Logging and testing for the successful completion of each task are what make this model resilient. Every management operation is logged at some appropriate level of detail, generating 120MB/day, and every single file transfer is MD5 checksummed. This was characterized by Hume as a "patient" style of management in which no one controls the cluster: scheduling behavior is emergent rather than stringently planned, and nodes are free to drop in and out of the cluster. A downside to this model is an increase in latency, offset by the fact that the cluster continues to function unattended outside of 8-to-5 support hours.
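The checksum-log-retry discipline just described can be sketched as follows. This is only an illustration of the pattern, with names of my own invention, not Ningaui code: every attempt at a task is checksummed against the expected MD5 digest and logged, and failed attempts are simply repeated.

```python
import hashlib

def run_until_verified(task, expected_md5, max_attempts=5):
    """Repeat a task until its output checksums correctly, logging
    every attempt; give up (and point at the logs) after max_attempts."""
    log = []
    for attempt in range(1, max_attempts + 1):
        output = task()
        ok = hashlib.md5(output).hexdigest() == expected_md5
        log.append((attempt, ok))          # every operation is logged
        if ok:
            return output, log
    raise RuntimeError("task never verified; inspect the logs")
```

Because each task is independently restartable, a transient failure costs only latency, which matches the "patient" management style Hume described.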
In response to questions about the projected scalability of such a scheme, Hume said the system would presumably scale from the present 60 nodes to about 100 without modifications, but that scaling higher would be a matter for further design. The guru session ended on a somewhat unrelated note regarding the problems of getting data off mainframes and onto Linux machines: issues of fixed- vs. variable-length encoding of data blocks resulting in useful information being stripped by FTP, and problems in converting COBOL copybooks to something useful on a C-based architecture. To reiterate Hume's claim, there are homemade solutions for some of these problems, and the reader should feel free to contact Mr. Hume for them.
WRITING PORTABLE APPLICATIONS
Nick Stoughton, MSB Consultants
Summarized by Josh Lothian
Nick Stoughton addressed a concern that developers are facing more and more: writing portable applications. He proposed that there is no such thing as a "portable application," only those that have been ported. Developers are still in search of the Holy Grail of applications programming: a write-once, compile-anywhere program. Languages such as Java are leading up to this, but virtual
machines still have to be ported, paths will still be different, etc. Web-based applications are another area that could be promising for portable applications.
Nick went on to talk about the future of portability. The recently released POSIX 2001 standards are expected to help the situation. The 2001 revision expands the original POSIX spec into seven volumes. Even though POSIX 2001 is more specific than the previous release, Nick pointed out that even this standard allows for small differences where vendors may define their own behaviors; this can only hurt developers in the long run. Along these lines, it was pointed out that there may be room for automated tools with the capability to check application code and determine the level of portability and standards compliance of the source.
NETWORK MANAGEMENT SYSTEM
PERFORMANCE TUNING
Jeff Allen, Tellme Networks, Inc.
Summarized by Matt Selsky
Jeff Allen, author of Cricket, discussed network tuning. The basic idea of tuning is: measure, twiddle, and measure again. Cricket can be used for large installations, but it's overkill for smaller installations, for which MRTG is better suited. You don't need Cricket to do measurement; doing something like a shell script that dumps data to Excel is good, but you need to get some sort of measurement. Cricket was not designed for billing; it was meant to help answer questions.
Useful tools and techniques covered included looking for the difference between machines in order to identify the causes of the differences; measuring, but also thinking about what you're measuring and why; and remembering to use strace and tcpdump to identify problems. Problems repeat themselves; performance tuning requires an intricate understanding of all the layers involved to solve the problem and simplify the automation. (Having a computer science
background helps but is not essential.) If you can reproduce the problem, you should then investigate each piece to find the bottleneck. If you can't reproduce the problem, then measure. You need to understand all the system interactions. Measurement can help determine whether things are actually slow or if the user is imagining it.
Troubleshooting begins with a hunch, but scientific processes are essential. You should be able to determine the event stream, or when each event occurs; having observation logs can help. Some hints provided include checking cron for periodic anomalies, slowing down /bin/rm to avoid I/O overload, and looking for unusual kernel CPU usage.
Also, try to use lightweight monitoring to reduce overhead. You don't need to monitor every resource on every system, but you should monitor those resources on those systems that are essential. Don't check things too often, since you can introduce overhead that way. Lots of information can be gathered from the proc file system interface to the kernel.
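As one example of cheap, proc-based monitoring, the single line exposed by Linux's /proc/loadavg can be parsed in a few lines. The sample string below is made up for the test; on a real Linux system the same function would be fed the contents of "/proc/loadavg".

```python
def parse_loadavg(text):
    """Parse the /proc/loadavg format: three load averages, a
    running/total process count pair, and the most recent PID."""
    one, five, fifteen, runq, last_pid = text.split()
    running, total = runq.split("/")
    return {
        "load1": float(one),
        "load5": float(five),
        "load15": float(fifteen),
        "running": int(running),
        "threads": int(total),
        "last_pid": int(last_pid),
    }
```

Polling a file like this every minute or so costs almost nothing, which is exactly the kind of lightweight measurement the talk recommended over heavyweight agents.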
BSD BOF
Host: Kirk McKusick, Author and Consultant
Summarized by Bosko Milekic
The BSD Birds of a Feather session at USENIX started off with five quick and informative talks and ended with an exciting discussion of licensing issues.
Christos Zoulas, for the NetBSD Project, went over the primary goals of the NetBSD Project (namely portability, a clean architecture, and security) and then proceeded to briefly discuss some of the challenges the project has encountered. The successes of, and areas for improvement in, 1.6 were then examined. NetBSD has recently seen various improvements in its cross-build system, packaging system, and third-party tool support. As far as the kernel is concerned, a zero-copy TCP/UDP implementation has been integrated, and new pmap
code for arm32 has been introduced, as have some other rather major changes to various subsystems. More information is available in Christos's slides, which are now available at http://www.netbsd.org/gallery/events/usenix2002/
Theo de Raadt of the OpenBSD Project presented a rundown of various new things that the OpenBSD and OpenSSH teams have been looking at. Theo's talk included an amusing description (and pictures) of some of the events that transpired during OpenBSD's hack-a-thon, which occurred the week before USENIX '02. Finally, Theo mentioned that, following complications with the IPFilter license, the OpenBSD team had done a fairly extensive license audit.
Robert Watson of the FreeBSD Project brought up various examples of FreeBSD being used in the real world. He went on to describe a wide variety of changes that will surface with the upcoming FreeBSD release 5.0. Notably, 5.0 will introduce a re-architecting of the kernel aimed at providing much more scalable support for SMP machines. 5.0 will also include an early implementation of KSE, FreeBSD's version of scheduler activations, large improvements to pccard, and significant framework bits from the TrustedBSD project.
Mike Karels from Wind River Systems presented an overview of how BSD/OS has been evolving following the Wind River Systems acquisition of BSDi's software assets. Ernie Prabhakar from Apple discussed Apple's OS X and its success on the desktop. He explained how OS X aims to bring BSD to the desktop while striving to maintain a strong and stable core, one that is heavily based on BSD technology.
The BoF finished with a discussion of licensing; specifically, some folks questioned the impact of Apple's licensing for code from their core OS (the Darwin Project) on the BSD community as a whole.
CLOSING SESSION
HOW FLIES FLY
Michael H. Dickinson, University of California, Berkeley
Summarized by JD Welch
In this lively and offbeat talk, Dickinson discussed his research into the mechanisms of flight in the fruit fly (Drosophila melanogaster) and related its autonomic behavior to that of a technological system. Insects are the most successful organisms on earth, due in no small part to their early adoption of flight. Flies travel through space on straight trajectories interrupted by saccades, jumpy turning motions somewhat analogous to the fast, jumpy movements of the human eye. Using a variety of monitoring techniques, including high-speed video in a "virtual reality flight arena," Dickinson and his colleagues have observed that flies respond to visual cues to decide when to saccade during flight. For example, more visual texture makes flies saccade earlier (i.e., further away from the arena wall), described by Dickinson as a "collision-control algorithm."
Through a combination of sensory inputs, flies can make decisions about their flight path. Flies possess a visual "expansion detector" which, at a certain threshold, causes the animal to turn in a certain direction. However, expansion in front of the fly sometimes causes it to land. How does the fly decide? Using the virtual reality device to replicate various visual scenarios, Dickinson observed that the flies fixate on vertical objects that move back and forth across the field. Expansion on the left side of the animal causes it to turn right, and vice versa, while expansion directly in front of the animal triggers the legs to flare out in preparation for landing.
If the eyes detect these changes, how are the responses implemented? The flies' wings have "power muscles," controlled by mechanical resonance (as opposed to the nervous system), which drive the
wings, combined with neurally activated "steering muscles," which change the configuration of the wing joints. Subtle variations in the timing of impulses correspond to changes in wing movement, controlled by sensors in the "wing-pit." A small wing-like structure, the haltere, controls equilibrium by beating constantly.
Voluntary control is accomplished by the halteres, whose signals can interfere with the autonomic control of the stroke cycle. The halteres have "steering muscles" as well, and information derived from the visual system can turn off the haltere or trick it into responding to a "virtual" problem.
Dickinson has also studied the aerodynamics of insect flight using a device called the "Robo Fly," an oversized mechanical insect wing suspended in a large tank of mineral oil. Interestingly, Dickinson observed that the average lift generated by the flies' wings is less than their body weight; the flies use three mechanisms to overcome this, including rotating the wings (rotational lift).
Insects are extraordinarily robust creatures, and because Dickinson analyzed the problem in a systems-oriented way, these observations and analyses are immediately applicable to technology: the response system can be used as an efficient search algorithm for control systems in autonomous vehicles, for example.
THE AFS WORKSHOP
Summarized by Garry Zacheiss, MIT
Love Hörnquist-Åstrand of the Arla development team presented the Arla status report. Arla 0.35.8 has been released. Scheduled for release soon are improved support for Tru64 UNIX, MacOS X, and FreeBSD; improved volume handling; and implementation of more of the vos and pts subcommands. It was stressed that MacOS X is considered an important platform, and that a GUI configuration manager for the cache manager, a GUI ACL manager for the Finder, and a graphical login that obtains Kerberos tickets and AFS tokens at login time were all under development.
Future goals planned for the underlying AFS protocols include GSSAPI/SPNEGO support for Rx, performance improvements to Rx, an enhanced disconnected mode, and IPv6 support for Rx; an experimental patch is already available for the latter. Future Arla-specific goals include improved performance, partial file reading, and increased stability on several platforms. Work is also in progress on the RXGSS protocol extensions (integrating GSSAPI into Rx); a partial implementation exists, and work continues as developers find time.
Derrick Brashear of the OpenAFS development team presented the OpenAFS status report. OpenAFS 1.2.5 was released immediately prior to the conference; it fixed a remotely exploitable denial-of-service vulnerability on several OpenAFS platforms, most notably IRIX and AIX. Future work planned for OpenAFS includes better support for MacOS X, including working around Finder interaction issues. Better support for the BSDs is also planned: FreeBSD has a partially implemented client, while NetBSD and OpenBSD have only server programs available right now. AIX 5, Tru64 5.1A, and MacOS X 10.2 (a.k.a. Jaguar) are all planned for the future. Other planned enhancements include nested groups in the ptserver (code donated by the University of Michigan awaits integration), disconnected AFS, and further work on a native Windows client. Derrick stated that the guiding principles of OpenAFS were to maintain compatibility with IBM AFS, support new platforms, ease the administrative burden, and add new functionality.
AFS performance was discussed. The openafs.org cell consists of two file servers, one in Stockholm, Sweden, and one in Pittsburgh, PA. AFS works reasonably well over trans-Atlantic links, although OpenAFS clients don't determine which file server to talk to very effectively. Arla clients use RTTs to the server to determine the optimal file server to fetch replicated data from; modifications to OpenAFS to support this behavior in the future are desired.
Jimmy Engelbrecht and Harald Barth of KTH discussed their AFSCrawler script, written to determine how many AFS cells and clients were in the world, which implementations and versions they were running (Arla vs. IBM AFS vs. OpenAFS), and how much data was in AFS. The script unfortunately triggered a bug in IBM AFS 3.6–derived code, causing some clients to panic while handling a specific RPC. This has since been fixed in OpenAFS 1.2.5 and the most recent IBM AFS patch level, and all AFS users are strongly encouraged to upgrade. No release of Arla is vulnerable to this particular denial-of-service attack. There was an extended discussion of the usefulness of this exploration; many sites believed this was useful information and that such scanning should continue in the future, but only on an opt-in basis.
Many sites face the problem of managing Kerberos/AFS credentials for batch-scheduled jobs. Specifically, most batch processing software needs to be modified to forward tickets as part of the batch submission process, renew tickets and tokens while the job is in the queue and for the lifetime of the job, and properly destroy credentials when the job completes. Ken Hornstein of NRL was able to pay a commercial vendor to support Kerberos 4/5 credential management in their product, although they did not implement AFS token management. MIT has implemented some of the desired functionality in OpenPBS and might be able to make it available to other interested sites.
Tools to simplify AFS administration were discussed, including:
AFS Balancer: A tool written by CMU to automate the process of balancing disk usage across all servers in a cell. Available from ftp://ftp.andrew.cmu.edu/pub/AFS-Tools/balance-1.1b.tar.gz.

Themis: KTH's enhanced version of the AFS tool "package" for updating files on local disk from a central AFS image. KTH's enhancements include allowing the deletion of files, simplifying the process of adding a file, and allowing the merging of multiple rule sets for determining which files are updated. Themis is available from the Arla CVS repository.
Stanford was presented as an example of a large AFS site. Stanford's AFS usage consists of approximately 1.4TB of data in AFS, in the form of approximately 100,000 volumes; 3.3TB of storage is available in their primary cell, ir.stanford.edu. Their file servers consist entirely of Solaris machines running a combination of Transarc 3.6 patch-level 3 and OpenAFS 1.2.x, while their database servers run OpenAFS 1.2.2. Their cell consists of 25 file servers using a combination of EMC and Sun StorEdge hardware. Stanford continues to use the kaserver for their authentication infrastructure, with future plans to migrate entirely to an MIT Kerberos 5 KDC.
Stanford has approximately 3,400 clients on their campus, not including SLAC (Stanford Linear Accelerator); approximately 2,100 AFS clients from outside Stanford contact their cell every month. Their supported clients are almost entirely IBM AFS 3.6 clients, although they plan to release OpenAFS clients soon. Stanford currently supports only UNIX clients. There is some on-campus presence of Windows clients, but Stanford has never publicly released or supported them. They do intend to release and support the MacOS X client in the near future.
All Stanford students, faculty, and staff are assigned AFS home directories with a default quota of 50MB, for a total of approximately 550GB of user home directories. Other significant uses of AFS storage are data volumes for workstation software (400GB) and volumes for course Web sites and assignments (100GB).
AFS usage at Intel was also presented. Intel has been an AFS site since 1994. They had bad experiences with the IBM 3.5 Linux client; their experience with OpenAFS on Linux 2.4.x kernels has been much better. They use and are satisfied with the OpenAFS IA-64 Linux port. Intel has hundreds of OpenAFS 1.2.3 and 1.2.4 clients in many production cells accessing data stored on IBM AFS file servers, and they have not encountered any interoperability issues. Intel has some concerns about OpenAFS: they would like to purchase commercial support for OpenAFS, and to see OpenAFS support for HP-UX on both PA-RISC and Itanium hardware. HP-UX support is currently unavailable due to a specific HP-UX header file being unavailable from HP; this may be resolved soon. Intel has not yet committed to migrating their file servers to OpenAFS and is unsure if it will do so without commercial support.
Backups are a traditional topic of discussion at AFS workshops, and this time was no exception. Many users complain that the traditional AFS backup tools ("backup" and "butc") are complex and difficult to automate, requiring many home-grown scripts and much user intervention for error recovery. An additional complaint was that the traditional AFS tools do not support file- or directory-level backups and restores; data must be backed up and restored at the volume level.
Mitch Collinsworth of Cornell presented work to make AMANDA, the free backup software from the University of Maryland, suitable for AFS backups. Using AMANDA for AFS backups allows one to share AFS backup tapes with non-AFS backups, easily run multiple backups in parallel, automate error recovery, and provide a robust degraded mode that prevents tape errors from stopping backups altogether. Their implementation allows for full volume restores as well as individual directory and file restores. They have finished coding this work and are in the process of testing and documenting it.
Peter Honeyman of CITI at the University of Michigan spoke about work he has proposed to replace Rx with RPCSEC GSS in OpenAFS. This would allow AFS to use a TCP-based transport mechanism rather than the UDP-based Rx, and possibly gain better congestion control, dynamic adaptation, and fragmentation avoidance as a result. RPCSEC GSS uses the GSSAPI to authenticate SUN ONC RPC. RPCSEC GSS is transport-agnostic, provides strong security, is a developing Internet standard, and has multiple open source implementations. Backward compatibility with existing AFS servers and clients is an important goal of this project.
CONQUEST: BETTER PERFORMANCE THROUGH A DISK/PERSISTENT-RAM HYBRID FILE SYSTEM
An-I A. Wang, Peter Reiher, and Gerald J. Popek, UCLA; Geoffrey M. Kuenning, Harvey Mudd College
Motivated by the declining cost of persistent RAM, the authors propose the Conquest file system, which stores all small files, metadata, executables, and shared libraries in persistent RAM; disks hold only the data content of the remaining large files. Compared to alternatives such as caching and RAM file systems, Conquest has the advantages of efficiency, consistency, and reliability at a reduced cost. Using popular benchmarks, experiments show that Conquest incurs little overhead while achieving faster performance. Future work includes designing mechanisms for adjusting the file-size threshold dynamically and finding a better disk layout for large data blocks.
EXPLOITING GRAY-BOX KNOWLEDGE OF BUFFER-CACHE MANAGEMENT
Nathan C. Burnett, John Bent, Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau, University of Wisconsin, Madison
Knowing what algorithm is used to manage the buffer cache is very important for improving application performance; however, there is currently no interface for discovering this algorithm. This paper introduces Dust, a simple fingerprinting tool that is able to identify the buffer-cache replacement policy. Dust automatically identifies the cache size and replacement policy based on the distinguishing attributes of access order, recency, frequency, and long-term history. Through simulation, Dust was able to distinguish between a variety of replacement algorithms found in the literature: FIFO, LRU, LFU, Clock, Segmented FIFO, 2Q, and LRU-K. Further experiments fingerprinting real operating systems such as NetBSD, Linux, and Solaris show that Dust is able to identify the hidden cache replacement algorithm.
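The fingerprinting idea can be illustrated in miniature: drive an unknown cache with an access pattern whose outcome differs by policy, then observe hits and misses. The sketch below (not Dust itself; the cache classes and probe pattern are invented for illustration) separates FIFO from LRU by checking whether re-touching a block protects it from eviction:

```python
from collections import OrderedDict

class FIFOCache:
    def __init__(self, size):
        self.size, self.d = size, OrderedDict()
    def access(self, k):
        hit = k in self.d
        if not hit:
            if len(self.d) >= self.size:
                self.d.popitem(last=False)  # evict oldest insertion
            self.d[k] = True
        return hit

class LRUCache:
    def __init__(self, size):
        self.size, self.d = size, OrderedDict()
    def access(self, k):
        hit = k in self.d
        if hit:
            self.d.move_to_end(k)           # refresh recency
        else:
            if len(self.d) >= self.size:
                self.d.popitem(last=False)  # evict least recently used
            self.d[k] = True
        return hit

def fingerprint(cache):
    """Fill the cache, re-touch the first block to set its recency,
    force one eviction, then see whether the re-touched block survived."""
    for k in range(4):
        cache.access(k)
    cache.access(0)         # recency probe
    cache.access(99)        # forces exactly one eviction
    return cache.access(0)  # hit => recency matters (LRU-like policy)

print(fingerprint(FIFOCache(4)))  # False: FIFO evicted block 0 anyway
print(fingerprint(LRUCache(4)))   # True: LRU kept the re-touched block
```

Dust extends the same principle with patterns that also expose frequency and long-term history, which is what lets it tell apart the longer list of policies above.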
By knowing the underlying cache replacement policy, a cache-aware Web server can reschedule requests with a cached-request-first policy to obtain a performance improvement. Experiments show that cache-aware scheduling improves average response time and system throughput.
OPERATING SYSTEMS
(AND DANCING BEARS)
Summarized by Matt Butner
THE JX OPERATING SYSTEM
Michael Golm, Meik Felser, Christian Wawersich, and Jürgen Kleinöder, University of Erlangen-Nürnberg
The talk opened by emphasizing the need for, and the practicality of, a functional Java OS. The JX OS attempts to mimic the recent trend toward using highly abstracted languages in application development, in order to create an OS as functional and powerful as one written in a lower-level language.
JX is a microkernel solution that uses separate JVMs for each entity of the kernel and, in some cases, for each application. The separated domains do not share objects and have no thread migration, and each domain implements its own code. The interesting part of the presentation was the discussion of system-level programming with Java; key areas discussed were memory management and interrupt handlers. The authors concluded by noting that their type-safe and modular approach resulted in a robust system with great configuration flexibility and acceptable performance.
DESIGN EVOLUTION OF THE EROS SINGLE-LEVEL STORE
Jonathan S. Shapiro, Johns Hopkins University; Jonathan Adams, University of Pennsylvania
This presentation outlined current characteristics of file systems and some of their least desirable properties. It was based on the revival of the EROS single-level-store approach in current attempts to capitalize on system consistency and efficiency. Their solution did not relate directly to EROS, but rather to the design and use of the constructs and notions on which EROS is based. By extending the mapping of the memory system to include the disk, systems are further able to ensure global persistence without regard for a disk structure. In explaining the need for absolute system consistency and an environment which, by definition, is not allowed to crash, the goal of having such an exhaustive design becomes clear. The cool part of this work is the availability of its design and architecture to the public.
THINK: A SOFTWARE FRAMEWORK FOR COMPONENT-BASED OPERATING SYSTEM KERNELS
Jean-Philippe Fassino, France Télécom R&D; Jean-Bernard Stefani, INRIA; Julia Lawall, DIKU; Gilles Muller, INRIA
This presentation discussed the need for component-based operating systems and corresponding structures to ensure flexibility for arbitrarily sized systems. Think provides a binding model that maps uniform components for OS developers and architects to follow, ensuring consistent implementations for an arbitrary system size. However, the goal is not to force developers into a predefined kernel but to promote the use of certain components in varied ways.
Think's concentration is primarily on embedded systems, where short development time is necessary but is constrained by rigorous needs and limited resources. The need to build flexible systems in such an environment can be costly and implementation-specific, but Think creates an environment supported by the ability to dynamically load type-safe components. This allows for more flexible systems that retain functionality because of Think's ability to bind fine-grained components. Benchmarks revealed that dedicated microkernels can show performance improvements in comparison to monolithic kernels. Notable future work includes building components for low-end appliances and the development of a real-time OS component library.
BUILDING SERVICES
Summarized by Pradipta De
NINJA: A FRAMEWORK FOR NETWORK SERVICES
J. Robert von Behren, Eric A. Brewer, Nikita Borisov, Michael Chen, Matt Welsh, Josh MacDonald, Jeremy Lau, and David Culler, University of California, Berkeley
Robert von Behren presented Ninja, an ongoing project that aims to provide a framework for building robust and scalable Internet-based network services, like Web-hosting, instant-messaging, email, and file-sharing applications. Robert drew attention to the difficulties of writing cluster applications: one has to take care of data consistency and issues of concurrency, and for robust applications there are problems related to fault tolerance. Ninja works as a wrapper to relieve the user of these problems. The goals of the project are building network services that are scalable and highly available, maintaining persistent data, providing graceful degradation, and supporting online evolution.
The use of clusters distinguishes this setup from generic distributed systems in terms of reliability and security, as well as providing a partition-free network. The next important feature of this project is the new programming model, which is more restrictive than a general programming model but still expressive enough to write most applications. This model, described as "single program, multiple connection," uses intertask parallelism instead of multithreaded concurrency and is characterized by all nodes running the same program, with connections delegated to the nodes by a centralized connection manager (CM). The CM takes care of hiding the details of mapping an external connection to an internal node. Ninja also uses a design called "staged event-driven architecture," where each service is broken down into a set of stages connected by event queues. This architecture is suitable for a modular design and helps in graceful degradation by adaptive load shedding from the event queues.
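The staged event-driven idea can be sketched in a few lines: each stage owns a bounded event queue and a handler, stages are chained by queues, and a full queue sheds load instead of blocking. This is only a toy illustration of the pattern (the class, stage names, and API below are invented; the real Ninja framework is far richer):

```python
import queue
import threading

class Stage:
    """One stage of a staged event-driven pipeline: a bounded event
    queue plus a handler thread; a full queue sheds the event rather
    than blocking the caller (graceful degradation under overload)."""
    def __init__(self, handler, out=None, maxsize=64):
        self.q = queue.Queue(maxsize)
        self.handler, self.out = handler, out
        threading.Thread(target=self._run, daemon=True).start()

    def enqueue(self, event):
        try:
            self.q.put_nowait(event)
            return True           # event accepted
        except queue.Full:
            return False          # event shed: adaptive load shedding point

    def _run(self):
        while True:
            result = self.handler(self.q.get())
            if self.out is not None:
                self.out.enqueue(result)   # pass result to the next stage

# A two-stage toy service: "parse" feeds "respond" through its queue.
results = queue.Queue()
respond = Stage(lambda body: results.put(body.upper()))
parse = Stage(lambda raw: raw.strip(), out=respond)

parse.enqueue(" get /index ")
reply = results.get(timeout=5)
print(reply)  # GET /INDEX
```

Because each stage's queue length is observable, a scheduler can throttle or shed at exactly the stage that is overloaded, which is what makes the architecture degrade gracefully.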
The presentation concluded with examples of the implementation and evaluation of a Web server and an email system called NinjaMail, to show the ease of authoring and the efficacy of using the Ninja framework for service development.
USING COHORT SCHEDULING TO ENHANCE SERVER PERFORMANCE
James R. Larus and Michael Parkes, Microsoft Research
James Larus presented a new scheduling policy for increasing the throughput of server applications. Cohort scheduling batches the execution of similar operations arising in different server requests. The usual programming paradigm in handling server requests is to spawn multiple concurrent threads and switch from one thread to another whenever a thread gets blocked for I/O. Since the threads in a server mostly execute unrelated pieces of code, the locality of reference is reduced, and hence so is the effectiveness of the various caching mechanisms. One way to solve this problem is to throw in more hardware, but Larus presented a complementary view in which program behavior is investigated and used to improve performance.
The problem in this scheme is to identify pieces of code that can be batched together for processing. One simple way is to look at the next program-counter values and use them to club threads together. However, this talk presented a new programming abstraction, "staged computation," which replaces the thread model with "stages." A stage is an abstraction for operations with similar behavior and locality. The StagedServer library can be used for programming in this model. It is a collection of C++ classes that implement staged computation and cohort scheduling on either a uniprocessor or a multiprocessor, and it can be used to define stages.
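The scheduling idea above can be sketched compactly: instead of interleaving whole requests (parse, lookup, parse, lookup, ...), group the pending operations of each stage and run each group back-to-back. StagedServer is a C++ library; the Python below is only a toy illustration of the cohort idea, with invented names:

```python
from collections import defaultdict

def cohort_schedule(requests):
    """requests: one list of (stage_name, thunk) steps per request.
    Runs step i of every request grouped by stage, so operations that
    touch the same code and data execute back-to-back (a cohort),
    improving locality versus thread-style interleaving."""
    trace = []
    depth = max(len(r) for r in requests)
    for i in range(depth):
        by_stage = defaultdict(list)
        for r in requests:
            if i < len(r):
                stage, thunk = r[i]
                by_stage[stage].append(thunk)
        for stage, thunks in sorted(by_stage.items()):  # deterministic order
            for t in thunks:
                trace.append(stage)
                t()   # run the whole cohort of this stage together
    return trace

noop = lambda: None
reqs = [[("parse", noop), ("lookup", noop)],
        [("parse", noop), ("lookup", noop)]]
# A thread scheduler would tend to interleave parse, lookup, parse, lookup.
print(cohort_schedule(reqs))  # ['parse', 'parse', 'lookup', 'lookup']
```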
The presentation showed an experimental evaluation of cohort scheduling against the thread-based model, implemented in two servers: a Web server, which is mainly I/O bound, and a publish-subscribe server, which is mainly compute bound. The SURGE benchmark was used for the first experiment and the Fabret workload for the second. The results showed that the cohort-scheduling-based implementation gave better throughput than the thread-based implementation at high loads.
NETWORK PERFORMANCE
Summarized by Xiaobo Fan
ETE: PASSIVE END-TO-END INTERNET SERVICE PERFORMANCE MONITORING
Yun Fu and Amin Vahdat, Duke University; Ludmila Cherkasova and Wenting Tang, HP Labs
This paper won the Best Student Paper award.
Ludmila Cherkasova began by listing several questions most Web service providers want answered in order to improve service quality. She reviewed the difficulties of making accurate and efficient end-to-end Web service measurements and the shortcomings of currently available approaches. The authors propose a passive trace-based architecture called EtE to monitor Web server performance on behalf of end users.
The first step is to collect network packets passively. The second module reconstructs all TCP connections and extracts HTTP transactions. To reconstruct Web page accesses, they first build a knowledge base indexed by client IP and URL, and then group objects with the related Web pages they are embedded in. Statistical analysis is used to handle non-matched objects. EtE Monitor can generate three groups of metrics to measure Web service performance: response time, Web caching, and page abortion.
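The grouping step can be illustrated with a toy version of the second module (invented names and a drastically simplified model: the real EtE builds its knowledge base from the traffic itself and uses statistical analysis for objects that don't match):

```python
def group_page_accesses(transactions, knowledge_base):
    """Fold a stream of (client_ip, url) HTTP transactions into page
    accesses, using a knowledge base that maps each container page to
    the set of objects embedded in it."""
    pages = []
    open_pages = {}  # client_ip -> most recent container-page entry
    for ip, url in transactions:
        if url in knowledge_base:
            # A container page starts a new page access for this client.
            entry = {"client": ip, "page": url, "objects": []}
            pages.append(entry)
            open_pages[ip] = entry
        elif ip in open_pages and url in knowledge_base[open_pages[ip]["page"]]:
            # An embedded object belongs to the client's open page access.
            open_pages[ip]["objects"].append(url)
        # Non-matched objects would go to statistical analysis (omitted).
    return pages

kb = {"/index.html": {"/logo.gif", "/style.css"}}
log = [("10.0.0.1", "/index.html"),
       ("10.0.0.1", "/logo.gif"),
       ("10.0.0.1", "/style.css")]
pages = group_page_accesses(log, kb)
print(pages)
```

From page accesses grouped this way, per-page response times (first packet to last embedded object) and abortion rates fall out naturally.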
To demonstrate the benefits of EtE Monitor, Cherkasova talked about two case studies and, based on the calculated metrics, gave some insightful explanations of what's happening behind the variations in Web performance. Through validation, they claim their approach provides a very close approximation of the real scenario.
THE PERFORMANCE OF REMOTE DISPLAY MECHANISMS FOR THIN-CLIENT COMPUTING
S. Jae Yang, Jason Nieh, Matt Selsky, and Nikhil Tiwari, Columbia University
Noting the trend toward thin-client computing, the authors compared different techniques and design choices in measuring the performance of six popular thin-client platforms: Citrix MetaFrame, Microsoft Terminal Services, Sun Ray, Tarantella, VNC, and X. After pointing out several challenges in benchmarking thin clients, Yang proposed slow-motion benchmarking to achieve non-invasive packet monitoring and consistent visual quality. Basically, they insert delays between separate visual events in the benchmark applications on the server side, so that the client's display update can catch up with the server's processing speed. The experiments were conducted on an emulated network over a range of network bandwidths.
Their results show that thin clients can provide good performance for Web applications in LAN environments, but only some platforms performed well on the video benchmark. Pixel-based encoding may achieve better performance and bandwidth efficiency than high-level graphics. Display caching and compression should be used with care.
A MECHANISM FOR TCP-FRIENDLY TRANSPORT-LEVEL PROTOCOL COORDINATION
David E. Ott and Ketan Mayer-Patel, University of North Carolina, Chapel Hill
A revised transport-level protocol optimized for cluster-to-cluster (C-to-C) applications is what David Ott tried to explain in his talk. A C-to-C application class is identified as one set of processes communicating with another set of processes across a common Internet path. The fundamental problem with C-to-C applications is how to coordinate all C-to-C communication flows so that they share a consistent view of the common C-to-C network, adapt to changing network conditions, and cooperate to meet specific requirements. Aggregation points (APs) are placed at the first and last hops of the common data path to probe network conditions (latency, bandwidth, loss rate, etc.). To carry and transfer this information, a new protocol, the Coordination Protocol (CP), is inserted between the network layer (IP) and the transport layer (TCP, UDP, etc.). Ott illustrated how the CP header is updated and used as packets originate from a source and traverse the local and remote APs to arrive at their destination, and how an AP maintains a per-cluster state table and detects network conditions. Through simulation results, this coordination mechanism appears effective in sharing common network resources among C-to-C communication flows.
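Since CP sits between IP and the transport layer, each packet carries a small CP header that the APs read and rewrite with the probed conditions. The talk summary does not give the actual field layout, so the layout below is entirely hypothetical, purely to make the shim-layer idea concrete:

```python
import struct

# Hypothetical CP header layout (NOT the real one, which the summary
# does not specify): flow id, probed RTT in ms, loss rate in units of
# 1/1000, and available bandwidth in kbit/s, in network byte order.
CP_FMT = "!IHHI"  # 4 + 2 + 2 + 4 = 12 bytes between IP and TCP/UDP

def pack_cp(flow_id, rtt_ms, loss_pm, bw_kbps):
    """Build the shim header an AP would stamp onto a passing packet."""
    return struct.pack(CP_FMT, flow_id, rtt_ms, loss_pm, bw_kbps)

def unpack_cp(hdr):
    """What a receiving endpoint (or the remote AP) reads back out."""
    flow_id, rtt_ms, loss_pm, bw_kbps = struct.unpack(CP_FMT, hdr)
    return {"flow_id": flow_id, "rtt_ms": rtt_ms,
            "loss_pm": loss_pm, "bw_kbps": bw_kbps}

hdr = pack_cp(7, 120, 4, 1500)
print(unpack_cp(hdr)["rtt_ms"])  # 120
```

The point of the design is that every flow in the cluster sees the same probed numbers, so TCP and non-TCP flows can adapt consistently instead of each discovering the path conditions independently.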
STORAGE SYSTEMS
Summarized by Praveen Yalagandula
MY CACHE OR YOURS? MAKING STORAGE MORE EXCLUSIVE
Theodore M. Wong, Carnegie Mellon University; John Wilkes, HP Labs
Theodore Wong explained the inefficiency of current "inclusive" caching schemes in storage area networks: when a client accesses a block, the block is read from the disk and is cached both in the disk array cache and in the client's cache. He then presented the concept of "exclusive" caching, in which a block accessed by a client is cached only in that client's cache. On eviction from the client's cache, a DEMOTE operation moves the data block to the tail of the array cache.
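A minimal sketch of the exclusive read path with DEMOTE (the class and function names are invented, and the eviction/placement details are simplified from the talk summary, not taken from the paper):

```python
from collections import OrderedDict

class Cache:
    """An LRU list: the MRU end is last; the LRU 'tail' is first."""
    def __init__(self, size):
        self.size, self.d = size, OrderedDict()
    def probe(self, b, discard=False):
        if b not in self.d:
            return False
        if discard:
            del self.d[b]            # exclusive: keep at most one cached copy
        else:
            self.d.move_to_end(b)    # refresh recency on a client hit
        return True
    def add_mru(self, b):
        self.d[b] = True
        self.d.move_to_end(b)
        if len(self.d) > self.size:
            return self.d.popitem(last=False)[0]   # return the evicted block
        return None
    def add_tail(self, b):
        if len(self.d) >= self.size:
            self.d.popitem(last=False)             # make room at the tail
        self.d[b] = True
        self.d.move_to_end(b, last=False)          # demoted blocks enter at the tail

def read(client, array, b, stats):
    """Exclusive-caching read path with DEMOTE on client eviction."""
    if client.probe(b):
        stats["client_hits"] += 1
        return
    if array.probe(b, discard=True):   # read-and-discard keeps caches exclusive
        stats["array_hits"] += 1
    else:
        stats["disk_reads"] += 1
    victim = client.add_mru(b)
    if victim is not None:
        array.add_tail(victim)         # DEMOTE: hand the evicted block down

stats = {"client_hits": 0, "array_hits": 0, "disk_reads": 0}
client, array = Cache(2), Cache(2)
for b in [1, 2, 3, 1]:
    read(client, array, b, stats)
print(stats)  # {'client_hits': 0, 'array_hits': 1, 'disk_reads': 3}
```

Note how the re-read of block 1 hits in the array only because the client demoted it there; under inclusive caching the two caches would instead hold duplicate copies of the client's working set.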
The new exclusive caching schemes were evaluated on both single-client and multiple-client systems. The single-client results are presented for two different types of synthetic workloads: Random (transaction-processing-type workloads) and Zipf (typical Web workloads). The exclusive policy was quite effective in achieving higher hit rates and lower read latencies. They also showed a 2.2-times hit-rate improvement over inclusive techniques on a real-life workload, httpd, in a single-client setting.
The DEMOTE scheme also performed well on multiple-client systems when the data accessed by the clients is disjoint. On shared-data workloads, this scheme performed worse than the inclusive schemes. Theodore then presented an adaptive exclusive caching scheme: a block accessed by a client is placed at the tail of the client's cache and also in the array cache at an appropriate place determined by the popularity of the block. The popularity of the block is measured by maintaining a ghost cache to accumulate the number of times each block is accessed. This new adaptive scheme achieved a maximum of 52% speedup in mean latency in experiments with real-life workloads.
For more information, visit http://www.cs.cmu.edu/~tmwong/research and http://www.hpl.hp.com/research/itc/csl/ssp.
BRIDGING THE INFORMATION GAP IN STORAGE PROTOCOL STACKS
Timothy E. Denehy, Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau, University of Wisconsin, Madison
Currently, there is a huge information gap between storage systems and file systems. The interface exposed by storage systems to file systems, based on blocks and providing only read/write operations, is very narrow. This leads to poor performance as a whole, because of duplicated functionality in both systems and reduced functionality resulting from storage systems' lack of file information.
The speaker presented two enhancements: ExRAID, an exposed RAID, and ILFS, an Informed Log-Structured File System. ExRAID is an enhancement to the block-based RAID storage system that exposes three types of information to the file system: (1) regions, contiguous portions of the address space comprising one or multiple disks; (2) performance information about queue lengths and throughput of the regions, revealing disk heterogeneity to the file system; and (3) failure information, dynamically updated information conveying the number of tolerable failures in each region.
ILFS allows online incremental expansion of the storage space, performs dynamic parallelism using ExRAID's performance information to perform well on heterogeneous storage systems, and provides a range of mechanisms, at different granularities, for file redundancy using ExRAID's region failure characteristics. These new features were added to LFS with only a 19% increase in code size.
More information: http://www.cs.wisc.edu/wind.
MAXIMIZING THROUGHPUT IN REPLICATED DISK STRIPING OF VARIABLE BIT-RATE STREAMS
Stergios V. Anastasiadis, Duke University; Kenneth C. Sevcik and Michael Stumm, University of Toronto
There is an increasing demand for the continuous real-time streaming of media files. Disk striping is a common technique used for supporting many connections on media servers. But even with 1.2 million hours of mean time between failures, there will be more than one disk failure per week in a system with 7,000 disks. Fault tolerance can be achieved by using data replication and reserving extra bandwidth during normal operation. This work focused on supporting the most common variable bit-rate stream formats (e.g., MPEG).
The authors used the prototype media server EXEDRA to implement and evaluate different replication and bandwidth-reservation schemes. The system supports a variable bit-rate scheme, does stride-based disk allocation, and is capable of supporting variable-grain disk striping. Two replication policies are presented: deterministic, in which data blocks are replicated in a round-robin fashion on the secondary replicas, and random, in which data blocks are replicated on randomly chosen secondary replicas. Two bandwidth-reservation techniques are also presented: mirroring reservation, in which disk bandwidth is reserved for both the primary and the replicas of the media file during playback, and minimum reservation, a more efficient scheme in which bandwidth is reserved only for the sum of the primary data access time and the maximum of the backup data access times required in each round.
Experimental results showed that deterministic replica placement is better than random placement for small disks, that minimum disk-bandwidth reservation achieves twice the throughput of mirroring, and that fault tolerance can be achieved with a minimal impact on throughput.
TOOLS
Summarized by Li Xiao
SIMPLE AND GENERAL STATISTICAL PROFILING WITH PCT
Charles Blake and Steven Bauer, MIT
This talk introduced the Profile Collection Toolkit (PCT), a sampling-based CPU profiling facility. A novel aspect of PCT is that it allows sampling of semantically rich data such as function call stacks, function parameters, local or global variables, CPU registers, or other execution contexts. The design objectives of PCT were driven by user needs and by the inadequacies or inaccessibility of prior systems.
The rich data collection capability is achieved via a debugger-controller program, dbctl. The talk then introduced the profiling collector and data-collection activation. PCT file-format options provide flexibility and various ways to store exact states. PCT also has analysis options; two examples were given, "Simple" and "Mixed User and Linux Code."
The talk then went into how general sampling helps, covering a debugger-controller state machine, example script fragments, a simple example program, and general value reports in the form of call-path histograms, numeric value histograms, and data generality. Related work, such as gprof and Expect, was then summarized. The contribution of this work is a portable, general value-sampling tool. Its limitations are that PCT does not cope well with stripped binaries and that count inaccuracies occur because of its statistical nature; given this, compiling with -g is preferred over stripping.
Possible future directions include supporting more debuggers, such as dbx and xdb; script generators for other kinds of traces; more canned reports for general values; and a libgdb-based library sampler. PCT is available at http://pdos.lcs.mit.edu/pct and runs on almost any UNIX-like system.
ENGINEERING A DIFFERENCING AND COMPRESSION DATA FORMAT
David G. Korn and Kiem-Phong Vo, AT&T Laboratories – Research
This talk began with an equation: "Differencing + Compression = Delta Compression." Compression removes information redundancy in a data set; examples are gzip, bzip, compress, and pack. Differencing encodes the differences between two data sets; examples are diff -e, fdelta, bdiff, and xdelta. Delta compression compresses one data set given another, combining differencing and compression, and reduces to pure compression when there's no commonality.
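The equation can be demonstrated without Vcdiff itself: zlib's preset-dictionary feature lets a compressor encode bytes shared with a reference version as cheap back-references, which is delta compression in miniature. This is only a stand-in for illustration; it is not the Vcdiff format:

```python
import random
import zlib

def delta_compress(source: bytes, target: bytes) -> bytes:
    """Delta-compress `target` against `source` by supplying the source
    as zlib's preset dictionary: bytes shared with the source become
    back-references instead of literals."""
    c = zlib.compressobj(level=9, zdict=source)
    return c.compress(target) + c.flush()

def delta_decompress(source: bytes, delta: bytes) -> bytes:
    """Reverse the above; the receiver must hold the same source."""
    d = zlib.decompressobj(zdict=source)
    return d.decompress(delta) + d.flush()

random.seed(0)                                   # incompressible "old version"
old = bytes(random.randrange(256) for _ in range(4000))
new = old[:2000] + b"a small inserted edit" + old[2000:]

delta = delta_compress(old, new)
plain = zlib.compress(new, 9)
assert delta_decompress(old, delta) == new       # lossless round trip
print(len(delta) < len(plain))  # True: the delta exploits the commonality
```

When `old` and `new` share nothing, the preset dictionary buys nothing and the scheme degenerates to plain compression, matching the talk's observation.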
After presenting an overall scheme for a delta compressor and showing delta compression performance, the talk discussed the encoding format of the newly designed Vcdiff for delta compression. A Vcdiff instruction-code table consists of 256 entries, each coding up to a pair of instructions, and encodes one-byte indices and any additional data.
The talk then showed Vcdiff's performance on Web data collected from CNN, compared with the W3C standard Gdiff, Gdiff+gzip, and diff+gzip, where Gdiff was computed using Vcdiff delta instructions. The results of two experiments, "First" and "Successive," were presented. In "First," each file is compressed against the first file collected; in "Successive," each file is compressed against the one from the previous hour. diff+gzip did not work well because diff is line-oriented; Vcdiff performed favorably compared with the other formats.
Vcdiff is part of the Vcodex package. Vcodex is a platform for all common data transformations: delta compression, plain compression, encryption, and transcoding (e.g., uuencode, base64). It is structured in three layers for maximum usability; the base library uses the Disciplines and Methods interfaces.
The code for Vcdiff can be found at http://www.research.att.com/sw/tools. The authors are moving Vcdiff toward IETF standardization as a comprehensive platform for transforming data; please refer to http://www.ietf.org/internet-drafts/draft-korn-vcdiff-06.txt.
WHERE IN THE NET
Summarized by Amit Purohit
A PRECISE AND EFFICIENT EVALUATION OF THE PROXIMITY BETWEEN WEB CLIENTS AND THEIR LOCAL DNS SERVERS
Zhuoqing Morley Mao, University of California, Berkeley; Charles D. Cranor, Fred Douglis, Michael Rabinovich, Oliver Spatscheck, and Jia Wang, AT&T Labs – Research
Content Distribution Networks (CDNs) attempt to improve Web performance by delivering Web content to end users from servers located at the edge of the network. When a Web client requests content, the CDN dynamically chooses a server to route the request to, usually the one that is nearest the client. As CDNs have access only to the IP address of the local DNS server (LDNS) of the client, the CDN's authoritative DNS server maps the client's LDNS to a geographic region within a particular network and combines that with network and server load information to perform CDN server selection.
This method has two limitations. First, it is based on the implicit assumption that clients are close to their LDNS, which may not always be valid. Second, a single request from an LDNS can represent a differing number of Web clients – called the hidden load factor. Knowledge of the hidden load factor can be used to achieve better load distribution.
The talk dealt with the extent of the first limitation and its impact on CDN server selection. To determine the associations between clients and their LDNS, a simple, non-intrusive, and efficient mapping technique was developed. The data collected was used to study the impact of proximity on DNS-based server selection using four different proximity metrics: (1) autonomous system (AS) clustering – observing whether a client is in the same AS as its LDNS – concluded that LDNS is very good for coarse-grained server selection, as 64% of the associations belong to the same AS; (2) network clustering – observing whether a client is in the same network-aware cluster (NAC) as its LDNS – implied that DNS is less useful for finer-grained server selection, since only 16% of client-LDNS pairs are in the same NAC; (3) traceroute divergence – the length of the divergent paths to the client and its LDNS from a probe point, measured using traceroute – implies that most clients are topologically close to their LDNS as viewed from a randomly chosen probe site; (4) round-trip time (RTT) correlation (some CDNs select servers based on the RTT between the CDN server and the client's LDNS) examines the correlation between the message RTTs from a probe point to the client and to its local DNS; the results imply this is a reasonable metric to use to avoid really distant servers.
A study of the impact that client-LDNS associations have on DNS-based server selection concludes that knowing the client's IP address would allow more accurate server selection in a large number of cases. The optimality of the server selection also depends on the server density, placement, and selection algorithms.
Further information can be found at http://www.eecs.berkeley.edu/~zmao/myresearch.html or by contacting zmao@eecs.berkeley.edu.
GEOGRAPHIC PROPERTIES OF INTERNET ROUTING
Lakshminarayanan Subramanian, Venkata N. Padmanabhan, Microsoft Research; Randy H. Katz, University of California at Berkeley
Geographic information can provide insights into the structure and functioning of the Internet, including interactions between different autonomous systems, by analyzing certain properties of Internet routing. It can be used to measure and quantify routing properties such as circuitous routing, hot-potato routing, and geographic fault tolerance.
Traceroute has been used to gather the required data, and the Geotrack tool has been used to determine the location of the nodes along each network path. This enables the computation of "linearized distance," which is the sum of the geographic distances between successive pairs of routers along the path.
In order to measure the circuitousness of a path, a metric called the "distance ratio" has been defined as the ratio of the linearized distance of a path to the geographic distance between the source and destination of the path. From the data, it has been observed that the circuitousness of a route depends on both the geographic and network locations of the end hosts. A large value of the distance ratio enables us to flag paths that are highly
circuitous, possibly (though not necessarily) because of routing anomalies. It is also shown that the minimum delay between end hosts is strongly correlated with the linearized distance of the path.
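The two distance notions can be sketched in a few lines (an illustrative reconstruction, not the authors' tooling; the helper names and the sample coordinates below are assumptions):

```python
import math

def great_circle_km(p, q):
    """Great-circle (haversine) distance between (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371 * math.asin(math.sqrt(a))

def linearized_distance(path):
    """Sum of geographic distances between successive routers on a path."""
    return sum(great_circle_km(a, b) for a, b in zip(path, path[1:]))

def distance_ratio(path):
    """Linearized distance divided by the end-to-end geographic distance."""
    return linearized_distance(path) / great_circle_km(path[0], path[-1])

# Assumed example: a San Francisco -> New York path routed via Miami.
sf, ny, miami = (37.77, -122.42), (40.71, -74.01), (25.76, -80.19)
direct = [sf, ny]
via_miami = [sf, miami, ny]
```

A direct path has a distance ratio of exactly 1; a detour through Miami pushes the ratio well above 1, which is exactly the kind of path the metric flags as circuitous.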
Geographic information can be used to study various aspects of wide-area Internet paths that traverse multiple ISPs. It was found that end-to-end Internet paths tend to be more circuitous than intra-ISP paths, the cause being the peering relationships between ISPs. Also, paths traversing substantial distances within two or more ISPs tend to be more circuitous than paths largely traversing only a single ISP. Another finding is that ISPs generally employ hot-potato routing.
Geographic information can also be used to capture the fact that two seemingly unrelated routers can be susceptible to correlated failures. By using the geographic information of routers, we can construct a geographic topology of an ISP and use it to find the tolerance of an ISP's network to the total network failure in a geographic region.
For further information, contact lakme@cs.berkeley.edu.
PROVIDING PROCESS ORIGIN INFORMATION TO AID IN NETWORK TRACEBACK
Florian P. Buchholz, Purdue University; Clay Shields, Georgetown University
Network traceback is currently limited because host audit systems do not maintain enough information to match incoming network traffic to outgoing network traffic. The talk presented an alternative: assigning origin information to every process and logging it during interactive login creation.
The current implementation concentrates mainly on interactive sessions, in which an event is logged when a new connection is established using SSH or Telnet. The method is effective and could successfully determine stepping stones and reliably detect the source of a DDoS attack. The speaker then talked about the related work done in this area,
which led to discussion of the latest development in the area of "host causality."
The speaker ended with the limitations and future directions of the framework. This framework could be extended and could find many applications in the area of security. The current implementation doesn't take care of startup scripts and cron jobs, but incorporating the origin information into the file system could solve this problem. In the current implementation, logging is implemented only as a proof of concept; it could be made safe in many ways, and this could be another important aspect of future work.
PROGRAMMING
Summarized by Amit Purohit
CYCLONE: A SAFE DIALECT OF C
Trevor Jim, AT&T Labs – Research; Greg Morrisett, Dan Grossman, Michael Hicks, James Cheney, and Yanling Wang, Cornell University
Cyclone is designed to provide protection against attacks such as buffer overflows, format string attacks, and memory management errors. The current structure of C allows programmers to write vulnerable programs. Cyclone extends C so that it has the safety guarantees of Java while keeping C's syntax, types, and semantics largely untouched.
The Cyclone compiler performs static analysis of the source code and inserts runtime checks into the compiled output at places where the analysis cannot determine that the code execution will not violate safety constraints. Cyclone imposes many restrictions to preserve safety, such as NULL checks; these checks do not exist in normal C.
The speaker then talked about some sample code written in Cyclone and how it tackles safety problems. Porting an existing C application to Cyclone is fairly easy, with fewer than 10% of the lines requiring change. The current implementation concentrates more on safety than on performance – hence the performance penalty is substantial. Cyclone was able to find many lingering bugs. The final step of the project will be to write a compiler to convert a normal C program to an equivalent Cyclone program.
COOPERATIVE TASK MANAGEMENT WITHOUT MANUAL STACK MANAGEMENT
Atul Adya, Jon Howell, Marvin Theimer, William J. Bolosky, and John R. Douceur, Microsoft Research
The speaker described the definitions and motivations behind the distinct concepts of task management, stack management, I/O management, conflict management, and data partitioning. Conventional concurrent programming uses preemptive task management and exploits the automatic stack management of a standard language. In the second approach, cooperative tasks are organized as event handlers that yield control by returning control to the event scheduler, manually unrolling their stacks. In this project, the authors adopted a hybrid approach that makes it possible for both stack management styles to coexist: programmers can code assuming one or the other of these stack management styles will operate, depending upon the application. The speaker also gave a detailed example of how to use their mechanism to switch between the two styles.
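The contrast between the two stack-management styles, and how they can coexist under one scheduler, can be sketched with a toy event loop (illustration only, not the authors' Microsoft Research implementation; Python generators stand in for automatic stack management, and plain callbacks for manual stack management):

```python
from collections import deque

class Scheduler:
    """A toy event scheduler on which both stack-management styles coexist."""
    def __init__(self):
        self.ready = deque()

    def spawn_generator_task(self, gen):
        # "Automatic" stack management: the generator keeps its own
        # stack and local state; yielding returns control to the scheduler.
        def step():
            try:
                next(gen)
                self.ready.append(step)  # reschedule after each yield
            except StopIteration:
                pass
        self.ready.append(step)

    def spawn_event_task(self, callback):
        # "Manual" stack management: state must be carried in captured
        # variables; the stack is unrolled every time the callback returns.
        self.ready.append(callback)

    def run(self):
        while self.ready:
            self.ready.popleft()()

log = []

def gen_task():
    log.append("gen step 1")
    yield
    log.append("gen step 2")

def make_event_task(state):
    def cb():
        log.append("event step %d" % state["n"])
        state["n"] += 1
        if state["n"] <= 2:
            sched.spawn_event_task(cb)  # manually re-arm; state lives in closure
    return cb

sched = Scheduler()
sched.spawn_generator_task(gen_task())
sched.spawn_event_task(make_event_task({"n": 1}))
sched.run()
```

Both tasks interleave on the same scheduler, even though one carries its state on a real (generator) stack and the other threads it through closures by hand.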
The talk then continued with the implementation. They were able to preempt many subtle concurrency problems by using cooperative task management; paying a cost up-front to reduce a subtle race condition proved to be a good investment.
Though the choice of task management is fundamental, the choice of stack management can be left to individual taste. This project enables the use of any type of stack management in conjunction with cooperative task management.
IMPROVING WAIT-FREE ALGORITHMS FOR INTERPROCESS COMMUNICATION IN EMBEDDED REAL-TIME SYSTEMS
Hai Huang, Padmanabhan Pillai, and Kang G. Shin, University of Michigan
The main characteristic of a real-time, time-sensitive system is its predictable response time, but concurrency management creates hurdles to achieving this goal because of the use of locks to maintain consistency. To solve this problem, many wait-free algorithms have been developed, but these are typically high-cost solutions. By taking advantage of the temporal characteristics of the system, however, the time and space overhead can be reduced.
The speaker presented an algorithm for temporal concurrency control and applied this technique to improve three wait-free algorithms. A single-writer multiple-reader wait-free algorithm and a double-buffer algorithm were proposed.
Using their transformation mechanism, they achieved an improvement of 17–66% in ACET and a 14–70% reduction in memory requirements for IPC algorithms. This mechanism is extensible and can be applied to other non-blocking IPC algorithms as well. Future work involves reducing the synchronizing overheads in more general IPC algorithms with multiple-writer semantics.
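The double-buffer idea for a single writer and a single reader can be sketched as follows (a simplified illustration of the general technique, not the paper's algorithm or its temporal concurrency control; a real implementation needs atomic stores and memory barriers):

```python
class DoubleBuffer:
    """Single-writer/single-reader double buffer: the writer never blocks.
    It fills the inactive slot, then publishes it by flipping `current`
    (the flip stands in for a single atomic store)."""
    def __init__(self, initial):
        self.slots = [initial, initial]
        self.current = 0  # index of the slot the reader should read

    def write(self, value):
        idx = 1 - self.current   # writer owns the slot the reader isn't using
        self.slots[idx] = value
        self.current = idx       # single publish step; no locks taken

    def read(self):
        return self.slots[self.current]

buf = DoubleBuffer(0)
buf.write(42)
```

Because the writer only touches the inactive slot and publication is one store, neither side ever waits on the other, which is the essence of a wait-free exchange.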
MOBILITY
Summarized by Praveen Yalagandula
ROBUST POSITIONING ALGORITHMS FOR DISTRIBUTED AD-HOC WIRELESS SENSOR NETWORKS
Chris Savarese and Jan Rabaey, University of California, Berkeley; Koen Langendoen, Delft University of Technology
The Pico Radio Network comprises more than a hundred sensors, monitors, and actuators equipped with wireless connectivity, with the following properties: (1) no infrastructure, (2) computation in a distributed fashion instead of centralized computation, (3) dynamic topology, (4) limited radio range,
(5) nodes that can act as repeaters, (6) devices of low cost and with low power, and (7) sparse anchor nodes – nodes with GPS capability. A positioning algorithm determines the geographical position of each node in the network.
In two-dimensional space, each node needs three reference positions to estimate its geographical position. Two problems make positioning difficult in the Pico Radio Network setting: (1) a sparse anchor node problem and (2) a range error problem.
The two-phase approach that the authors have taken in solving the problem consists of (1) a hop-terrain algorithm – in this first phase, each node roughly guesses its location from the distance calculated using multi-hops to the anchor points – and (2) a refinement algorithm – in this second phase, each node uses its neighbors' positions to refine its own position estimate. To guarantee convergence, this approach uses confidence metrics, assigning a value of 1.0 for anchor nodes and 0.1 to start for other nodes, and increasing these with each iteration.
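The refinement phase can be illustrated with a toy relaxation that repeatedly nudges a position estimate toward consistency with measured ranges to reference nodes (an assumed simplification, not the authors' algorithm or its confidence weighting):

```python
import math

def refine(anchors, ranges, start, iterations=100):
    """Toy refinement: for each reference node, project the current
    estimate onto the circle of the measured range, then average the
    projections. With consistent ranges this converges to the true spot."""
    x, y = start
    for _ in range(iterations):
        targets = []
        for (ax, ay), r in zip(anchors, ranges):
            dx, dy = x - ax, y - ay
            d = math.hypot(dx, dy) or 1e-9
            # point at distance r from the anchor, toward the current estimate
            targets.append((ax + dx / d * r, ay + dy / d * r))
        x = sum(t[0] for t in targets) / len(targets)
        y = sum(t[1] for t in targets) / len(targets)
    return x, y

# Assumed toy scenario: three anchors, exact range measurements.
anchors = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
true_pos = (2.0, 3.0)
ranges = [math.hypot(true_pos[0] - ax, true_pos[1] - ay) for ax, ay in anchors]
estimate = refine(anchors, ranges, start=(5.0, 5.0))
```

With noisy ranges the fixed point is only approximate, which is why the real algorithm weights updates by per-node confidence.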
A simulation tool, OMNeT++, was used for both phases and in various scenarios. The results show that the authors achieved position errors of less than 33% in a scenario with 5% anchor nodes, an average connectivity of 7, and 5% range measurement error.
Guidelines for anchor node deployment are high connectivity (>10), a reasonable fraction of anchor nodes (>5%), and, for anchor placement, covered edges.
APPLICATION-SPECIFIC NETWORK MANAGEMENT FOR ENERGY-AWARE STREAMING OF POPULAR MULTIMEDIA FORMATS
Surendar Chandra, University of Georgia, and Amin Vahdat, Duke University
The main hindrance in supporting the increasing demand for mobile multimedia on PDAs is the battery capacity of the small devices. The idle-time power consumption is almost the same as the receive power consumption on typical wireless network cards (1319mW vs. 1425mW), while the sleep state consumes far less power (177mW). To let the system consume energy proportional to the stream quality, the network card should transition to the sleep state aggressively between packets.
The speaker presented previous work in which different multimedia streams were studied for PDAs: MS Media, Real Media, and QuickTime. It was found that if the inter-packet gap is predictable, then huge savings are possible – for example, about 80% savings in MS Media streams with few packet losses. The limitation of the IEEE 802.11b power-save mode comes into play when there are two or more nodes, producing a delay between beacon time and when the node receives/sends packets. This badly affects the streams, since higher energy is consumed while waiting.
The authors propose traffic shaping for energy conservation, done by proxy such that packets arrive at regular intervals. This is achieved using a local proxy in the access point and a client-side proxy in the mobile host. Simulations show that traffic shaping reduces energy consumption, and also reveal that higher-bandwidth streams have a lower energy metric (mJ/kB).
More information is at http://greenhouse.cs.uga.edu.
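A back-of-envelope calculation with the power figures quoted above shows why sleeping between packets pays off (the packet timing here is an assumed illustration, not measured data from the paper):

```python
# Power figures quoted in the talk, in milliwatts.
IDLE_MW, RECV_MW, SLEEP_MW = 1319, 1425, 177

def stream_energy_mj(gap_ms, recv_ms, between_mw, seconds=1.0):
    """Energy (mJ) to receive `seconds` of a stream with one packet per
    gap_ms, spending recv_ms receiving and the rest in `between_mw` state."""
    packets = seconds * 1000 / gap_ms
    per_packet_uj = recv_ms * RECV_MW + (gap_ms - recv_ms) * between_mw  # mW*ms = uJ
    return packets * per_packet_uj / 1000  # uJ -> mJ

# Assumed illustration: one packet every 100ms, 5ms of receive time each.
awake = stream_energy_mj(100, 5, IDLE_MW)      # card stays idle between packets
sleeping = stream_energy_mj(100, 5, SLEEP_MW)  # card sleeps between packets
savings = 1 - sleeping / awake
```

With these assumed timings the sketch yields savings of roughly 80%, the same magnitude the talk reported for predictable MS Media streams.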
CHARACTERIZING ALERT AND BROWSE SERVICES OF MOBILE CLIENTS
Atul Adya, Paramvir Bahl, and Lili Qiu, Microsoft Research
Even though there is a dramatic increase in Internet access from wireless devices, not many studies have characterized this traffic. In this paper, the authors characterize the traffic observed on the MSN Web site with both notification and browse traces.
Around 33 million browsing accesses and about 325 million notification entries are present in the traces. Three types of analyses are done for each of these two services: content analysis, concerning the most popular content categories and their distribution; popularity analysis, or the popularity distribution of documents; and user behavior analysis.
The analysis of the notification logs shows that document access rates follow a Zipf-like distribution, with most of the accesses concentrated on a small number of messages, and that the accesses exhibit geographical locality – users from the same locality tend to receive similar notification content. The browse log analysis shows that a smaller set of URLs is accessed most of the time, though the access pattern does not fit any Zipf curve, and that the highly accessed URLs remain stable. A correlation study between the notification and browsing services shows that wireless users have a moderate correlation of 0.12.
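A quick way to check for Zipf-like behavior of the kind described is to fit the slope of the log-log rank-frequency curve (a generic sketch; the authors' traces and fitting method are not shown here, and the synthetic counts below are an assumption):

```python
import math

def loglog_slope(counts):
    """Least-squares slope of log(frequency) vs. log(rank); a slope near
    -1 indicates a classic Zipf-like distribution."""
    counts = sorted(counts, reverse=True)
    xs = [math.log(rank) for rank in range(1, len(counts) + 1)]
    ys = [math.log(c) for c in counts]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

# Synthetic accesses: the document at rank r is accessed ~ 1000/r times.
zipf_counts = [1000 / r for r in range(1, 101)]
slope = loglog_slope(zipf_counts)
```

Access counts that do not fit a Zipf curve, like the browse URLs described above, would show a clearly non-linear log-log plot rather than a slope near -1.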
FREENIX TRACK SESSIONS
BUILDING APPLICATIONS
Summarized by Matt Butner
INTERACTIVE 3-D GRAPHICS APPLICATIONS
FOR TCL
Oliver Kersting Juumlrgen Doumlllner HassoPlattner Institute for Software SystemsEngineering University of Potsdam
The integration of 3-D image renderingfunctionality into a scripting languagepermits interactive and animated 3-Ddevelopment and application withoutthe formalities and precision demandedby low-level CC++ graphics and visual-ization libraries The large and complexC++ API of the Virtual Rendering Sys-tem (VRS) can be combined with theconveniences of the Tcl scripting lan-guage The mapping of class interfaces isdone via an automated process and gen-erates respective wrapper classes all ofwhich ensures complete API accessibilityand functionality without any signifi-
76 Vol 27 No 5 login
cant performance retribution The gen-eral C++-to-Tcl mapper SWIG grantsthe necessary object and hierarchal abili-ties without the use of object-orientedTcl extensions The final API mappingtechniques address C++ features such asclasses overloaded methods enumera-tions and inheritance relations All areimplemented in a proof-of-concept thatmaps the complete C++ API of the VRSto Tcl and are showcased in a completeinteractive 3-D-map system
THE AGFL GRAMMAR WORK LAB
Cornelis HA Coster Erik VerbruggenUniversity of Nijmegen (KUN)
The growth and implementation of Nat-ural Language Processing (NLP) is thecornerstone of the continued evolutionand implementation of truly intelligentsearch machines and services In partdue to the growing collections of com-puter-stored human-readable docu-ments in the public domain theimplementation of linguistic analysiswill become necessary for desirable pre-cision and resolution Subsequentlysuch tools and linguistic resources mustbe released into the public domain andthey have done so with the release of theAGFL Grammar Work Lab under theGNU Public License as a tool for lin-guistic research and the development ofNLP-based applications
The AGFL (Affix Grammars over aFinite Lattice) Grammar Work Labmeshes context-free grammars withfinite set-valued features that are accept-able to a range of languages In com-puter science terms ldquoSyntax rules areprocedures with parameters and a non-deterministic executionrdquo The EnglishPhrases for Information Retrieval(EP4IR) released with the AGFL-GWLas a robust grammar of English is anAGFL-GWL generated English parserthat outputs ldquoHeadModifiedrdquo framesThe sentences ldquoCompanyX sponsoredthis conferencerdquo and ldquoThis conferencewas sponsored by CompanyXrdquo both gen-erate [CompanyX[sponsored confer-
ence]] The hope is that public availabil-ity of such tools will encourage furtherdevelopment of grammar and lexicalsoftware systems
SWILL A SIMPLE EMBEDDED WEB SERVER
LIBRARY
Sotiria Lampoudi David M BeazleyUniversity of Chicago
This paper won the FREENIX Best Stu-dent Paper award
SWILL (Simple Web Interface LinkLibrary) is a simple Web server in theform of a C library whose developmentwas motivated by a wish to give coolapplications an interface to the Web TheSWILL library provides a simple inter-face that can be efficiently implementedfor tasks that vary from flexible Web-based monitoring to software debuggingand diagnostics Though originallydesigned to be integrated with high-per-formance scientific simulation softwarethe interface is generic enough to allowfor unbounded uses SWILL is a single-threaded server relying upon non-blocking IO through the creation of atemporary server which services IOrequests SWILL does not provide SSL orcryptographic authentication but doeshave HTTP authentication abilities
A fantastic feature of SWILL is its sup-port for SPMD-style parallel applica-tions which utilize MPI provingvaluable for Beowulf clusters and largeparallel machines Another practicalapplication was the implementation ofSWILL in a modified Yalnix emulator byUniversity of Chicago Operating Sys-tems courses which utilized the addedYalnix functionality for OS developmentand debugging SWILL requires minimalmemory overhead and relies upon theHTTP10 protocol
NETWORK PERFORMANCE
Summarized by Florian Buchholz
LINUX NFS CLIENT WRITE PERFORMANCE
Chuck Lever Network Appliance PeterHoneyman CITI University of Michi-gan
Lever introduced a benchmark to mea-sure an NFS client write performanceClient performance is difficult to mea-sure due to hindrances such as poorhardware or bandwidth limitations Fur-thermore measuring application perfor-mance does not identify weaknessesspecifically at the client side Thus abenchmark was developed trying toexercise only data transfers in one direc-tion between server and application Forthis purpose the benchmark was basedon the block sequential write portion ofthe Bonnie file system benchmark Oncea benchmark for NFS clients is estab-lished it can be used to improve clientperformance
The performance measurements wereperformed with an SMP Linux clientand both a Linux NFS server and a Net-work Appliance F85 filer During testinga periodic jump in write latency timewas discovered This was due to a ratherlarge number of pending write opera-tions that were scheduled to be writtenafter certain threshold values wereexceeded By introducing a separate dae-mon that flushes the cached writerequest the spikes could be eliminatedbut as a result the average latency growsover time The problem could be tracedto a function that scans a linked list ofwrite requests After having added ahashtable to improve lookup perfor-mance the latency improved consider-ably
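The fix described, indexing pending write requests by a hash key instead of scanning a linked list, can be sketched generically (illustration only, not the Linux NFS client code; the choice of key and field names here is an assumption):

```python
class PendingWrites:
    """Index pending write requests by (inode, page) so that matching an
    incoming request is an O(1) dictionary lookup instead of an O(n)
    scan over every pending write."""
    def __init__(self):
        self.by_key = {}  # (inode, page) -> request record

    def add(self, inode, page, data):
        self.by_key[(inode, page)] = {"inode": inode, "page": page, "data": data}

    def find(self, inode, page):
        return self.by_key.get((inode, page))

    def remove(self, inode, page):
        return self.by_key.pop((inode, page), None)

pw = PendingWrites()
pw.add(1, 2, b"x")
pw.add(1, 3, b"y")
```

As the number of pending writes grows past the flush threshold, the linear scan's cost grows with it, which is exactly the latency growth the daemon exposed; the hash lookup stays constant.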
The improved client was then used tomeasure throughput against the twoservers A discrepancy between theLinux server and the filer test wasnoticed and the reason for that tracedback to a global kernel lock that wasunnecessarily held when accessing thenetwork stack After correcting this per-formance improved further However
77October 2002 login
C
ON
FERE
NC
ERE
PORT
Sthe measurements showed that a clientmay run slower when paired with fastservers on fast networks This is due toheavy client interrupt loads more net-work processing on the client side andmore global kernel lock contention
The source code of the project is avail-able at httpwwwcitiumicheduprojectsnfs-perfpatches
A STUDY OF THE RELATIVE COSTS OF
NETWORK SECURITY PROTOCOLS
Stefan Miltchev and Sotiris IoannidisUniversity of Pennsylvania AngelosKeromytis Columbia University
With the increasing need for securityand integrity of remote network ser-vices it becomes important to quantifythe communication overhead of IPSecand compare it to alternatives such asSSH SCP and HTTPS
For this purpose the authors set upthree testing networks direct link twohosts separated by two gateways andthree hosts connecting through onegateway Protocols were compared ineach setup and manual keying was usedto eliminate connection setup costs ForIPSec the different encryption algo-rithms AES DES 3DES hardware DESand hardware 3DES were used In detailFTP was compared to SFTP SCP andFTP over IPSec HTTP to HTTPS andHTTP over IPSec and NFS to NFS overIPSec and local disk performance
The result of the measurements werethat IPSec outperforms other popularencryption schemes Overall unen-crypted communication was fastest butin some cases like FTP the overhead canbe small The use of crypto hardwarecan significantly improve performanceFor future work the inclusion of setupcosts hardware-accelerated SSL SFTPand SSH were mentioned
CONGESTION CONTROL IN LINUX TCP
Pasi Sarolahti University of HelsinkiAlexey Kuznetsov Institute for NuclearResearch at Moscow
Having attended the talk and read thepaper I am still unclear about whetherthe authors are merely describing thedesign decisions of TCP congestion con-trol or whether they are actually the cre-ators of that part of the Linux code Myguess leans toward the former
In the talk the speaker compared theTCP protocol congestion control meas-ures according to IETF and RFC specifi-cations with the actual Linux implemen-tation which does conform to the basicprinciples but nevertheless has differ-ences A specific emphasis was placed onretransmission mechanisms and thecongestion window Also several TCPenhancements ndash the NewReno algo-rithm Selective ACKs (SACK) ForwardACKs (FACK) ndash were discussed andcompared
In some instances Linux does not con-form to the IETF specifications The fastrecovery does not fully follow RFC 2582since the threshold for triggering re-transmit is adjusted dynamically and thecongestion windowrsquos size is not changedAlso the roundtrip-time estimator andthe RTO calculation differ from RFC2988 since it uses more conservativeRTT estimates and a minimum RTO of200ms The performance measuresshowed that with the additional Linux-specific features enabled slightly higherthroughput more steady data flow andfewer unnecessary retransmissions canbe achieved
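The RTO difference can be sketched with the RFC 2988 SRTT/RTTVAR estimator, parameterized by its floor: 1 second per the RFC versus the 200ms minimum described for Linux (a sketch of the published formulas, not the kernel code):

```python
class RtoEstimator:
    """RFC 2988 smoothed-RTT estimator with a configurable RTO floor."""
    def __init__(self, min_rto=1.0):
        self.min_rto = min_rto
        self.srtt = None
        self.rttvar = None

    def sample(self, rtt):
        if self.srtt is None:
            # First measurement: SRTT = R, RTTVAR = R/2 (RFC 2988, section 2).
            self.srtt, self.rttvar = rtt, rtt / 2
        else:
            # beta = 1/4, alpha = 1/8, per the RFC.
            self.rttvar = 0.75 * self.rttvar + 0.25 * abs(self.srtt - rtt)
            self.srtt = 0.875 * self.srtt + 0.125 * rtt

    def rto(self):
        return max(self.min_rto, self.srtt + 4 * self.rttvar)

rfc = RtoEstimator(min_rto=1.0)       # RFC 2988 floor of one second
linuxish = RtoEstimator(min_rto=0.2)  # 200ms floor, as described for Linux
for _ in range(50):                   # a steady 50ms round-trip time
    rfc.sample(0.05)
    linuxish.sample(0.05)
```

On a steady 50ms path the raw estimate converges near 50ms, so the 200ms floor retransmits far sooner than the RFC's one-second floor; that is the "more conservative" behavior the summary refers to.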
XTREME XCITEMENT
Summarized by Steve Bauer
THE FUTURE IS COMING WHERE THE X
WINDOW SHOULD GO
Jim Gettys Compaq Computer Corp
Jim Gettys one of the principal authorsof the X Window System outlined thenear-term objectives for the system pri-marily focusing on the changes andinfrastructure required to enable replica-tion and migration of X applicationsProviding better support for this func-tionality would enable users to retrieveor duplicate X applications betweentheir servers at home and work
USENIX 2002
One interesting example of an applica-tion that currently is capable of migra-tion and replication is Emacs To create anew frame on DISPLAY try ldquoM-x make-frame-on-display ltReturngt DISPLAYltReturngtrdquo
However technical challenges makereplication and migration difficult ingeneral These include the ldquomajorheadachesrdquo of server-side fonts thenonuniformity of X servers and screensizes and the need to appropriatelyretrofit toolkits Authentication andauthorization issues are obviously alsoimportant The rest of the talk delvedinto some of the details of these interest-ing technical challenges
HACKING IN THE KERNEL
Summarized by Hai Huang
AN IMPLEMENTATION OF SCHEDULER
ACTIVATIONS ON THE NETBSD OPERATING
SYSTEM
Nathan J Williams Wasabi Systems
Scheduler activation is an old idea Basi-cally there are benefits and drawbacks tousing solely kernel-level or user-levelthreading Scheduler activation is able tocombine the two layers of control toprovide more concurrency in the sys-tem
In his talk Nathan gave a fairly detaileddescription of the implementation ofscheduler activation in the NetBSD ker-nel One important change in the imple-mentation is to differentiate the threadcontext from the process context This isdone by defining a separate data struc-ture for these threads and relocatingsome of the information that wasembedded in the process context tothese thread contexts Stack was espe-cially a concern due to the upcall Spe-cial handling must be done to make surethat the upcall doesnrsquot mess up the stackso that the preempted user-level threadcan continue afterwards Lastly Nathanexplained that signals were handled byupcalls
78 Vol 27 No 5 login
AUTHORIZATION AND CHARGING IN PUBLIC
WLANS USING FREEBSD AND 8021X
Pekka Nikander Ericsson ResearchNomadicLab
8021x standards are well known in thewireless community as link-layerauthentication protocols In this talkPekka explained some novel ways ofusing the 8021x protocols that might beof interest to people on the move It ispossible to set up a public WLAN thatwould support various chargingschemes via virtual tokens which peoplecan purchase or earn and later use
This is implemented on FreeBSD usingthe netgraph utility It is basically a filterin the link layer that would differentiatetraffic based on the MAC address of theclient node which is either authenti-cated denied or let through The over-head for this service is fairly minimal
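The per-client filtering logic can be modeled as a table keyed by MAC address (a toy model of the idea, not the actual netgraph node; the port numbers and state names are invented for illustration):

```python
AUTHENTICATED, DENIED, UNKNOWN = "auth", "denied", "unknown"

class MacFilter:
    """Link-layer filter sketch: forward frames from authenticated MACs,
    drop denied ones, and pass unknown clients only toward the
    authenticator so they can complete 802.1x."""
    def __init__(self, authenticator_port=1, uplink_port=2):
        self.status = {}
        self.authenticator_port = authenticator_port
        self.uplink_port = uplink_port

    def set_status(self, mac, status):
        self.status[mac] = status

    def route(self, mac):
        s = self.status.get(mac, UNKNOWN)
        if s == AUTHENTICATED:
            return self.uplink_port      # normal traffic flows upstream
        if s == DENIED:
            return None                  # drop the frame
        return self.authenticator_port   # only 802.1x traffic gets through

f = MacFilter()
f.set_status("00:11:22:33:44:55", AUTHENTICATED)
f.set_status("66:77:88:99:aa:bb", DENIED)
```

Because the decision is a single table lookup per frame, the per-packet overhead of such a filter stays small, consistent with the minimal overhead reported.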
ACPI IMPLEMENTATION ON FREEBSD
Takanori Watanabe Kobe University
ACPI (Advanced Configuration andPower Management Interface) was pro-posed as a joint effort by Intel Toshibaand Microsoft to provide a standard andfiner-control method of managingpower states of individual devices withina system Such low-level power manage-ment is especially important for thosemobile and embedded systems that arepowered by fixed-capacity energy batter-ies
Takanori explained that the ACPI speci-fication is composed of three partstables BIOS and registers He was ableto implement some functionalities ofACPI in a FreeBSD kernel ACPI Com-ponent Architecture was implementedby Intel and it provides a high-levelACPI API to the operating systemTakanorirsquos ACPI implementation is builtupon this underlying layer of APIs
ACCESS CONTROL
Summarized by Florian Buchholz
DESIGN AND PERFORMANCE OF THE
OPENBSD STATEFUL PACKET FILTER (PF)
Daniel Hartmeier Systor AG
Daniel Hartmeier described the newstateful packet filter (pf) that replacedIPFilter in the OpenBSD 30 releaseIPFilter could no longer be included dueto licensing issues and thus there was aneed to write a new filter making use ofoptimized data structures
The filter rules are implemented as alinked list which is traversed from startto end Two actions may be takenaccording to the rules ldquopassrdquo or ldquoblockrdquoA ldquopassrdquo action forwards the packet anda ldquoblockrdquo action will drop it Wheremore than one rule matches a packetthe last rule wins Rules that are markedas ldquofinalrdquo will immediately terminate any further rule evaluation An opti-mization called ldquoskip-stepsrdquo was alsoimplemented where blocks of similarrules are skipped if they cannot matchthe current packet These skip-steps arecalculated when the rule set is loadedFurthermore a state table keeps track ofTCP connections Only packets thatmatch the sequence numbers areallowed UDP and ICMP queries andreplies are considered in the state tablewhere initial packets will create an entryfor a pseudo-connection with a low ini-tial timeout value The state table isimplemented using a balanced binarysearch tree NAT mappings are alsostored in the state table whereas appli-cation proxies reside in user space Thepacket filter also is able to perform frag-ment reassembly and to modulate TCPsequence numbers to protect hostsbehind the firewall
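The last-match-wins evaluation with "final" rules can be sketched as follows (a simplified model: real pf rules match on many more fields, the default action here is an assumption, and skip-steps are omitted):

```python
def evaluate(rules, packet):
    """Traverse rules first to last; the last matching rule wins unless
    a matching rule is marked final, which ends evaluation immediately."""
    action = "pass"  # assumed default when nothing matches
    for rule in rules:
        if all(packet.get(k) == v for k, v in rule["match"].items()):
            action = rule["action"]
            if rule.get("final"):
                break
    return action

# Illustrative rule set: block everything, re-allow web traffic,
# but always block one misbehaving source (a final rule).
rules = [
    {"match": {}, "action": "block"},
    {"match": {"dport": 80}, "action": "pass"},
    {"match": {"src": "10.0.0.66"}, "action": "block", "final": True},
]
```

Because the last match wins, broad rules go first and exceptions later, which is the opposite of first-match filters and the reason skip-steps over the full list pay off.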
Pf was compared against IPFilter as wellas Linuxrsquos Iptables by measuringthroughput and latency with increasingtraffic rates and different packet sizes Ina test with a fixed rule set size of 100Iptables outperformed the other two fil-ters whose results were close to each
other In a second test where the rule setsize was continually increased Iptablesconsistently had about twice thethroughput of the other two (whichevaluate the rule set on both the incom-ing and outgoing interfaces) A third testcompared only pf and IPFilter using asingle rule that created state in the statetable with a fixed-state entry size Pfreached an overloaded state much laterthan IPFilter The experiment wasrepeated with a variable-state entry sizeand pf performed much better thanIPFilter for a small number of states
In general rule set evaluation is expen-sive and benchmarks only reflectextreme cases whereas in real life otherbehavior should be observed Further-more the benchmarks show that statefulfiltering can actually improve perfor-mance due to cheap state-table lookupsas compared to rule evaluation
ENHANCING NFS CROSS-ADMINISTRATIVE
DOMAIN ACCESS
Joseph Spadavecchia and Erez ZadokStony Brook University
The speaker presented modification toan NFS server that allows an improvedNFS access between administrativedomains A problem lies in the fact thatNFS assumes a shared UIDGID spacewhich makes it unsafe to export filesoutside the administrative domain ofthe server Design goals were to leaveprotocol and existing clients unchangeda minimum amount of server changesflexibility and increased performance
To solve the problem two techniques areutilized ldquorange-mappingrdquo and ldquofile-cloakingrdquo Range-mapping maps IDsbetween client and server The mappingis performed on a per-export basis andhas to be manually set up in an exportfile The mappings can be 1-1 N-N orN-1 In file-cloaking the server restrictsfile access based on UIDGID and spe-cial cloaking-mask bits Here users canonly access their own file permissionsThe policy on whether or not the file isvisible to others is set by the cloakingmask which is logically ANDed with the
79October 2002 login
C
ON
FERE
NC
ERE
PORT
Sfilersquos protection bits File-cloaking onlyworks however if the client doesnrsquot holdcached copies of directory contents andfile-attributes Because of this the clientsare forced to re-read directories byincrementing the mtime value of thedirectory each time it is listed
To test the performance of the modifiedserver five different NFS configurationswere evaluated An unmodified NFSserver was compared against one serverwith the modified code included but notused one with only range-mappingenabled one with only file-cloakingenabled and one version with all modi-fications enabled For each setup differ-ent file system benchmarks were runThe results show only a small overheadwhen the modifications are used gener-ally an increase of below 5 Anotherexperiment tested the performance ofthe system with an increasing number ofmapped or cloaked entries on a systemThe results show that an increase from10 to 1000 entries resulted in a maxi-mum of about 14 cost in performance
One member of the audience pointedout that if clients choose to ignore thechanged mtimes from the server andthus still hold caches of the directoryentries the file-cloaking mechanismcould be defeated After a rather lengthydebate the speaker had to concede thatthe model doesnrsquot add any extra securityAnother question was asked about scala-bility of the setup of range mappingThe speaker referred to application-leveltools that could be developed for thatpurpose
The software is available at ftp://ftp.fsl.cs.sunysb.edu/pub/enf.
ENGINEERING OPEN SOURCE SOFTWARE
Summarized by Teri Lampoudi
NINGAUI: A LINUX CLUSTER FOR BUSINESS
Andrew Hume, AT&T Labs – Research; Scott Daniels, Electronic Data Systems Corp.
Ningaui is a general-purpose, highly available, resilient architecture built from commodity software and hardware. Emphatically, however, Ningaui is not a Beowulf. Hume calls the cluster design the "Swiss canton model," in which there are a number of loosely affiliated, independent nodes with data replicated among them. Jobs are assigned by bidding and leases, and cluster services done as session-based servers are sited via generic job assignment. The emphasis is on keeping the architecture end-to-end, checking all work via checksums, and logging everything. The resilience and high availability required by their goal of 8-to-5 maintenance – vs. the typical 24/7 model, where people get paged whenever the slightest thing goes wrong, regardless of the time of day – are achieved by job restartability. Finally, all computation is performed on local data, without the use of NFS or network-attached storage.
Hume's message is a hopeful one: despite the many problems encountered – things like kernel and service limits, auto-installation problems, TCP storms, and the like – the existence of source code and the paranoid practice of logging and checksumming everything has helped. The final product performs reasonably well, and it appears that the resilience techniques employed do make a difference. One drawback, however, is that the software mentioned in the paper is not yet available for download.
CPCMS: A CONFIGURATION MANAGEMENT SYSTEM BASED ON CRYPTOGRAPHIC NAMES
Jonathan S. Shapiro and John Vanderburgh, Johns Hopkins University
This paper won the FREENIX Best Paper award.
The basic notion behind the project is that everyone has a pet complaint about CVS, and yet it is currently the configuration manager in most widespread use. Shapiro has unleashed an alternative. Interestingly, he did not begin but ended with the usual host of reasons why CVS is bad. The talk instead began abruptly by characterizing the job of a software configuration manager, continued by stating the namespaces it must handle, and wrapped up with the challenges faced.
X MEETS Z: VERIFYING CORRECTNESS IN THE PRESENCE OF POSIX THREADS
Bart Massey, Portland State University; Robert T. Bauer, Rational Software Corp.
Massey delivered a humorous talk on the insight gained from applying Z formal specification notation to system software design, rather than the more informal analysis and design process normally used.
The story is told with respect to writing XCB, which replaces the Xlib protocol layer and is supposed to be thread-friendly. But where threads are concerned, deadlock avoidance becomes a hard problem that cannot be solved in an ad hoc manner, while full model checking is also too hard. In this case, Massey resorted to Z specification to model the XCB lower layer, abstract away locking and data transfers, and locate fundamental issues. The difficulties of searching the literature and locating information relevant to the problem at hand were thus overcome. As Massey put it, "the formal method saved the day."
FILE SYSTEMS
Summarized by Bosko Milekic
PLANNED EXTENSIONS TO THE LINUX EXT2/EXT3 FILESYSTEM
Theodore Y. Ts'o, IBM; Stephen Tweedie, Red Hat
The speaker presented improvements to the Linux Ext2 file system, with the goal of allowing for various expansions while striving to maintain compatibility with older code. Improvements have been facilitated by a few extra superblock fields that were added to Ext2 not long before the Linux 2.0 kernel was released. The fields allow file system features to be added without compromising the existing setup; this is done by providing
bits indicating the impact of the added features with respect to compatibility. Namely, a file system with the "incompat" bit marked is not allowed to be mounted. Similarly, a "read-only" marking would only allow the file system to be mounted read-only.
Directory indexing replaces linear directory searches with a faster search using a fixed-depth tree and hashed keys. File system size can be dynamically increased, and the expanded inode, doubled from 128 to 256 bytes, allows for more extensions.
Other potential improvements were discussed as well, in particular pre-allocation for contiguous files, which allows for better performance in certain setups by pre-allocating contiguous blocks. Security-related modifications, extended attributes, and ACLs were mentioned. An implementation of these features already exists but has not yet been merged into the mainline Ext2/3 code.
RECENT FILESYSTEM OPTIMISATIONS ON FREEBSD
Ian Dowse, Corvil Networks; David Malone, CNRI, Dublin Institute of Technology
David Malone presented four important file system optimizations for the FreeBSD OS: soft updates, dirpref, vmiodir, and dirhash. It turns out that certain combinations of the optimizations (beautifully illustrated in the paper) may yield performance improvements of anywhere between 2 and 10 orders of magnitude for real-world applications.
All four of the optimizations deal with file system metadata. Soft updates allow for asynchronous metadata updates. Dirpref changes the way directories are organized, attempting to place child directories closer to their parents, thereby increasing locality of reference and reducing disk-seek times. Vmiodir trades some extra memory for better directory caching. Finally, dirhash, which was implemented by Ian Dowse, changes the way in which entries are searched in a directory. Specifically, whereas FFS uses a linear search to find an entry by name, dirhash builds a hash table of directory entries on the fly. For a directory of size n with a working set of m files, a search that in certain cases could have been O(n·m) has, thanks to dirhash, been reduced to effectively O(n + m).
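The effect of dirhash can be illustrated with a toy model (names and structure are invented for illustration; the real implementation builds its table inside FFS, in the kernel):

```python
# Toy model of dirhash: a linear directory scan costs O(n) per lookup,
# while a hash table built once on first access makes each subsequent
# lookup O(1) on average.

directory = [("file%04d" % i, i) for i in range(1000)]  # (name, inode)

def lookup_linear(name):
    for entry_name, inode in directory:   # O(n) scan, as in plain FFS
        if entry_name == name:
            return inode
    return None

# Built once per directory, then reused for the whole working set.
dirhash = {entry_name: inode for entry_name, inode in directory}

def lookup_hashed(name):
    return dirhash.get(name)              # O(1) average per lookup

assert lookup_linear("file0999") == lookup_hashed("file0999") == 999
```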
FILESYSTEM PERFORMANCE AND SCALABILITY IN LINUX 2.4.17
Ray Bryant, SGI; Ruth Forester, IBM LTC; John Hawkes, SGI
This talk focused on performance evaluation of a number of file systems available and commonly deployed on Linux machines. Comparisons of Ext2, Ext3, ReiserFS, XFS, and JFS under various configurations were presented.
The benchmarks chosen for the data gathering were pgmeter, filemark, and the AIM Benchmark Suite VII. Pgmeter measures the rate of data transfer of reads/writes of a file under a synthetic workload. Filemark is similar to PostMark in that it is an operation-intensive benchmark, although filemark is threaded and offers various other features that PostMark lacks. AIM VII measures performance for various file-system-related functionalities; it offers various metrics under an imposed workload, thus stressing the performance of the file system not only under I/O load but also under significant CPU load.
Tests were run on three different setups: a small, a medium, and a large configuration. ReiserFS and Ext2 appear at the top of the pile for small and medium setups. Notably, XFS and JFS perform worse for smaller system configurations than the others, although XFS clearly appears to generate better numbers than JFS. It should be noted that XFS seems to scale well under a higher load. This was most evident in the large-system results, where XFS appears to offer the best overall results.
THINGS TO THINK ABOUT
Summarized by Bosko Milekic
SPEEDING UP KERNEL SCHEDULER BY REDUCING CACHE MISSES
Shuji Yamamura, Akira Hirai, Mitsuru Sato, Masao Yamamoto, Akira Naruse, and Kouichi Kumon, Fujitsu Labs
This was an interesting talk pertaining to the effects of cache coloring for task structures in the Linux kernel scheduler (Linux kernel 2.4.x). The speaker first presented some interesting benchmark numbers for the Linux scheduler, showing that as the number of processes on the task queue was increased, the performance decreased. The authors used some really nifty hardware to measure the number of bus transactions throughout their tests and were thus able to reasonably quantify the impact that cache misses had on the Linux scheduler.
Their experiments led them to implement a cache-coloring scheme for task structures, which were previously aligned on 8KB boundaries and therefore were eventually being mapped to the same cache lines. This unfortunate placement of task structures in memory induced a significant number of cache misses as the number of tasks in the scheduler grew.
The implemented solution consisted of offsetting task structures so as to distribute cached entries more evenly across the L2 cache. The result was, inevitably, fewer cache misses in the scheduler. Some negative effects were observed in certain situations, primarily due to more cache slots being used by task-structure data in the scheduler, thus forcing out data previously cached there.
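The effect can be sketched numerically; the cache geometry and offsets below are invented for illustration and do not reflect the paper's actual parameters:

```python
# Sketch of the cache-coloring idea: task structures allocated on a
# fixed 8KB boundary all map to the same cache sets, while adding a
# small per-task "color" offset spreads them across the cache.

CACHE_LINE = 64     # assumed line size, bytes
NUM_SETS = 128      # assumed number of L2 sets (illustrative only)

def cache_set(addr):
    return (addr // CACHE_LINE) % NUM_SETS

aligned = [i * 8192 for i in range(16)]                       # plain 8KB alignment
colored = [i * 8192 + (i % 8) * CACHE_LINE for i in range(16)]  # 8 colors

# All strictly-aligned task structures collide in a single cache set,
# while the colored layout uses 8 distinct sets.
assert len({cache_set(a) for a in aligned}) == 1
assert len({cache_set(a) for a in colored}) == 8
```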
WORK-IN-PROGRESS REPORTS
Summarized by Brennan Reynolds
RESOURCE VIRTUALIZATION TECHNIQUES FOR WIDE-AREA OVERLAY NETWORKS
Kartik Gopalan, SUNY Stony Brook
This work addressed the issue of provisioning a maximum number of virtual overlay networks (VONs) with diverse quality-of-service (QoS) requirements on a single physical network. Each of the VONs is logically isolated from the others to ensure its QoS requirements. Gopalan mentioned several critical research issues with this problem that are currently being investigated. Dealing with how to provision the network at various levels (link, route, or path), and then enforce the provisioning at run-time, is one of the toughest challenges. Currently, Gopalan has developed several algorithms to handle admission control, end-to-end QoS, route selection, scheduling, and fault tolerance in the network.
For more information, visit http://www.ecsl.cs.sunysb.edu.
VISUALIZING SOFTWARE INSTABILITY
Jennifer Bevan, University of California, Santa Cruz
Detection of instability in software has typically been an afterthought. The point in the development cycle when the software is reviewed for instability usually comes after it is already difficult and costly to go back and perform major modifications to the code base. Bevan has developed a technique to allow the visualization of unstable regions of code that can be used much earlier in the development cycle. Her technique creates a time series of dependency graphs that include clusters and lines called fault lines. From the graphs, a developer is easily able to determine where the unstable sections of code are and proactively restructure them. She is currently working on a prototype implementation.
For more information, visit http://www.cse.ucsc.edu/~jbevan/evo_viz.
RELIABLE AND SCALABLE PEER-TO-PEER WEB DOCUMENT SHARING
Li Xiao, College of William and Mary
The idea presented by Xiao would allow end users to share the contents of their Web browser caches with neighboring Internet users. The rationale for this is that today's end users are increasingly connected to the Internet over high-speed links, and browser caches are becoming large storage systems. Therefore, if an individual accesses a page that does not exist in their local cache, Xiao suggests that they first query other end users for the content before trying to access the machine hosting the original. This strategy does have some serious problems associated with it that still need to be addressed, including ensuring the integrity of the content and protecting the identity and privacy of the end users.
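The lookup order can be sketched as follows (a toy model; all names are hypothetical, and the integrity and privacy checks the talk called out as open problems are deliberately omitted):

```python
def fetch(url, local_cache, peer_caches, fetch_origin):
    """Sketch of peer-assisted browsing: check the local browser cache,
    then neighboring users' caches, and only then the origin server.
    Peer-supplied content is taken on trust here, which is exactly the
    integrity problem noted in the talk."""
    if url in local_cache:
        return local_cache[url]
    for peer in peer_caches:
        if url in peer:
            local_cache[url] = peer[url]   # keep a local copy
            return peer[url]
    content = fetch_origin(url)
    local_cache[url] = content
    return content

origin_calls = []
def origin(url):
    origin_calls.append(url)
    return "body-of-" + url

local = {}
peers = [{"http://a/": "cached-a"}]
assert fetch("http://a/", local, peers, origin) == "cached-a"
assert fetch("http://b/", local, peers, origin) == "body-of-http://b/"
assert origin_calls == ["http://b/"]   # only the full miss hit the origin
```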
SEREL: FAST BOOTING FOR UNIX
Leni Mayo, Fast Boot Software
Serel is a tool that generates a visual representation of a UNIX machine's boot-up sequence. It can be used to identify the critical path and can show whether a particular service or process blocks for an extended period of time. This information could be used to determine where the largest performance gains could be realized by tuning the order of execution at boot-up. Serel creates a dependency graph, expressed in XML, during boot-up. This graph is then used to create the visual representation. Currently the tool only works on POSIX-compliant systems, but Mayo stated that he would be porting it to other platforms. Other extensions that were mentioned included having the metadata cover the use of shared libraries and monitoring the suspend/resume sequence of portable machines.
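A boot-time dependency graph like the one Serel records can be ordered and analyzed as follows (the graph contents are invented, and Serel's actual XML schema is not shown in the summary):

```python
from graphlib import TopologicalSorter

# Toy boot dependency graph: each service lists the services it must
# wait for, mirroring the kind of graph Serel extracts during boot-up.
deps = {
    "syslog": set(),
    "network": {"syslog"},
    "nfs": {"network"},
    "sshd": {"network", "syslog"},
}

# Any valid boot order starts a service only after its dependencies.
order = list(TopologicalSorter(deps).static_order())
pos = {svc: i for i, svc in enumerate(order)}
assert all(pos[d] < pos[s] for s, ds in deps.items() for d in ds)
```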
For more information, visit http://www.fastboot.org.
BERKELEY DB XML
John Merrells, Sleepycat Software
Merrells gave a quick introduction to and overview of the new XML library for Berkeley DB. The library specializes in the storage and retrieval of XML content, queried via XPath. The library allows multiple containers per document and stores everything natively as XML. The user is also given a wide range of elements on which to create indices, including edges, elements, text strings, or presence. The XPath tool consists of a query parser, generator, optimizer, and execution engine. To conclude his presentation, Merrells gave a live demonstration of the software.
For more information, visit http://www.sleepycat.com/xml.
CLUSTER-ON-DEMAND (COD)
Justin Moore, Duke University
Modern clusters are growing at a rapid rate; many have pushed beyond the 5000-machine mark, and deploying them results in large expenses as well as management and provisioning issues. Furthermore, if the cluster is "rented" out to various users, it is very time-consuming to configure it to a user's specs, regardless of how long they need to use it. The COD work presented creates dynamic virtual clusters within a given physical cluster. The goal was to have a provisioning tool that would automatically select a chunk of available nodes and install the operating system and middleware specified by the customer in a short period of time. This would allow greater use of resources, since multiple virtual clusters can exist at once, and by using a virtual cluster, the size can be changed dynamically. Moore stated that they have created a working prototype and are currently testing and benchmarking its performance.
For more information, visit http://www.cs.duke.edu/~justin/cod.
CATACOMB
Elias Sinderson, University of California, Santa Cruz
Catacomb is a project to develop a database-backed DASL module for the Apache Web server, designed as a replacement for the WebDAV module. The initial release only contains support for a MySQL database but could be extended to others. Sinderson briefly touched on the performance of the module: it was comparable to the mod_dav Apache module for all query types but search. The presentation concluded with remarks about adding support for the LOCK method and including ACL specifications in the future.
For more information, visit http://ocean.cse.ucsc.edu/catacomb.
SELF-ORGANIZING STORAGE
Dan Ellard, Harvard University
Ellard's presentation introduced a storage system that tunes itself based on the workload of the system, without requiring the intervention of the user. The intelligence is implemented as a virtual, self-organizing disk that resides below any file system. The virtual disk observes the access patterns exhibited by the system and then attempts to predict what information will be accessed next. An experiment was done using an NFS trace from one of the email servers at a large ISP. Ellard's self-organizing storage system worked well, which he attributes to the fact that most of the files being requested were large email boxes. Areas of future work include exploration of the length and detail of the data-collection stage, as well as the CPU impact of running the virtual disk layer.
VISUAL DEBUGGING
John Costigan, Virginia Tech
Costigan feels that the state of current debugging facilities in the UNIX world is not as good as it should be. He proposes the addition of several elements to programs like ddd and gdb. The first addition is including separate heap and stack components. Another is to display only certain fields of a data structure, with the ability to zoom in and out if needed. Finally, the debugger should provide the ability to visually present complex data structures, including linked lists, trees, etc. He said that a beta version of a debugger with these abilities is currently available.
For more information, visit http://infovis.cs.vt.edu/datastruct.
IMPROVING APPLICATION PERFORMANCE THROUGH SYSTEM-CALL COMPOSITION
Amit Purohit, SUNY Stony Brook
Web servers perform a huge number of context switches and much internal data copying during normal operation. These two elements can drastically limit the performance of an application, regardless of the hardware platform it is run on. The Compound System Call (CoSy) framework is an attempt to reduce the performance penalty of context switches and data copies. It includes a user-level library and several kernel facilities that can be used via system calls. The library provides the programmer with a complete set of memory-management functions. Performance tests using the CoSy framework showed a large saving in context switches and data copies. The only area where the savings between conventional libraries and CoSy were negligible was for very large file copies.
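The batching idea behind compound calls can be illustrated abstractly; this toy model only counts boundary crossings, and none of the names below are CoSy's actual API:

```python
# Toy model of compound system calls: a file copy pays one "kernel
# crossing" per read and per write when done conventionally, but a
# compound call ships the whole loop across the boundary once.

crossings = 0

def syscall(op, *args):
    global crossings
    crossings += 1          # each call models one user/kernel transition
    return op(*args)

def copy_conventional(chunks):
    out = []
    for c in chunks:
        data = syscall(lambda d: d, c)   # models a read()
        syscall(out.append, data)        # models a write()
    return out

def copy_compound(chunks):
    # The whole loop runs on the kernel side of a single crossing.
    return syscall(lambda: list(chunks))

chunks = ["a", "b", "c", "d"]
crossings = 0; copy_conventional(chunks); conv = crossings
crossings = 0; copy_compound(chunks); comp = crossings
assert (conv, comp) == (8, 1)   # 2 crossings per chunk vs. 1 total
```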
For more information, visit http://www.cs.sunysb.edu/~purohit.
ELASTIC QUOTAS
John Oscar, Columbia University
Oscar began by stating that most resources are flexible, but that to date disks have not been: while approximately 80 percent of files are short-lived, disk quotas are hard limits imposed by administrators. The idea behind elastic quotas is to have a non-permanent storage area that each user can temporarily use. Oscar suggested the creation of an ehome directory structure to be used in deploying elastic quotas. Currently, he has implemented elastic quotas as a stackable file system.
There were also several trends apparent in the data. Many users would ssh to their remote host (good), but then launch an email reader on the local machine, which would connect via POP to the same host and send the password in clear text (bad). This situation can easily be remedied by tunneling protocols like POP through SSH, but it appeared that many people were not aware this could be done. While most of his comments on the use of the wireless network were negative, the list of passwords he had collected showed that people were indeed using strong passwords. His recommendations were to educate and encourage people to use protocols like IPsec, SSH, and SSL when conducting work over a wireless network, because you never know who else is listening.
SYSTRACE
Niels Provos, University of Michigan
How can one be sure the applications one is using actually do exactly what their developers said they do? Short answer: you can't. People today are using a large number of complex applications, which means it is impossible to check each application thoroughly for security vulnerabilities. There are tools that can help, though. Provos has developed a tool called systrace that allows a user to generate a policy of acceptable system calls a particular application can make. If the application attempts to make a call that is not defined in the policy, the user is notified and allowed to choose an action. Systrace includes an auto-policy generation mechanism, which uses a training phase to record the actions of all programs the user executes. If the user chooses not to be bothered by applications breaking policy, systrace allows default enforcement actions to be set. Currently systrace is implemented for FreeBSD and NetBSD, with a Linux version coming out shortly.
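Conceptually, the enforcement loop looks like this; it is a sketch of the idea only, and does not reflect systrace's real policy grammar, kernel hook, or API:

```python
# Sketch of systrace-style policy enforcement: each system call an
# application makes is checked against an allow-list; anything not in
# the policy is referred back to the user (modeled here as a callback).

def enforce(policy, syscall_name, ask_user):
    if syscall_name in policy:
        return "permit"
    verdict = ask_user(syscall_name)   # interactive prompt in the real tool
    if verdict == "permit-always":
        policy.add(syscall_name)       # training phase grows the policy
        return "permit"
    return "deny"

policy = {"read", "write", "mmap"}
assert enforce(policy, "read", ask_user=None) == "permit"
assert enforce(policy, "connect", ask_user=lambda s: "deny") == "deny"
assert enforce(policy, "open", ask_user=lambda s: "permit-always") == "permit"
assert "open" in policy   # learned, so it won't prompt again
```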
For more information, visit http://www.citi.umich.edu/u/provos/systrace.
VERIFIABLE SECRET REDISTRIBUTION FOR SURVIVABLE STORAGE SYSTEMS
Ted Wong, Carnegie Mellon University
Wong presented a protocol that can be used to re-create a file distributed over a set of servers even if one of the servers is damaged. In his scheme, the user must choose how many shares the file is split into and the number of servers it will be stored across. The goal of this work is to provide persistent, secure storage of information even if it comes under attack. Wong stated that his design of the protocol is complete, and he is currently building a prototype implementation of it.
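As a toy illustration of splitting a secret across servers, here is simple n-of-n XOR splitting; Wong's protocol uses m-of-n threshold sharing with verifiable redistribution, which is substantially more involved than this sketch:

```python
import secrets
from functools import reduce

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def split(secret: bytes, n: int):
    """n-of-n XOR splitting: n-1 shares are random, and the last share
    makes the XOR of all n shares equal the secret. Any n-1 shares
    together look like uniform random bytes."""
    shares = [secrets.token_bytes(len(secret)) for _ in range(n - 1)]
    shares.append(reduce(xor, shares, secret))
    return shares

def combine(shares):
    return reduce(xor, shares)

key = b"file-encryption-key"
shares = split(key, 5)           # e.g., one share per storage server
assert combine(shares) == key    # all five shares reconstruct the key
```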
For more information, visit http://www.cs.cmu.edu/~tmwong/research.
THE GURU IS IN SESSIONS
LINUX ON LAPTOP/PDA
Bdale Garbee, HP Linux Systems Operation
Summarized by Brennan Reynolds
This was a roundtable-like discussion, with Garbee acting as moderator. He is currently in charge of the Debian distribution of Linux and is involved in porting Linux to platforms other than i386. He has successfully helped port the kernel to the Alpha, SPARC, and ARM, and is currently working on a version to run on VAX machines.
A discussion was held on which file system is best for battery life. The ReiserFS and Ext3 file systems were discussed; people commented that when using Ext3 on their laptops, the disk would never spin down and thus was always consuming a large amount of power. One suggestion was to use a RAM-based file system for any volume that requires a large number of writes, and to have the file system written to disk only at infrequent intervals or when the machine is shut down or suspended.
A question was directed to Garbee about his knowledge of how the HP-Compaq merger would affect Linux. Garbee thought that the merger was good for Linux, both within the company and for the open source community as a whole. He stated that the new company would be the number one shipper of systems with Linux as the base operating system. He also fielded a question about the support of older HP servers and their ability to run Linux, saying that indeed Linux has been ported to them, and he personally had several machines in his basement running it.
A question which sparked a large number of responses concerned problems people had with running Linux on mobile platforms. Most people actually did not have many problems at all. There were a few cases of a particular machine model not working, but there were only two widespread issues: docking stations and the new ACPI power management scheme. Docking stations still appear to be somewhat of a headache, but most people had developed ad hoc solutions for getting them to work. The ACPI power management scheme developed by Intel does not appear to have a quick solution. Ted Ts'o, one of the head kernel developers, stated that there are fundamental architectural problems with the 2.4 series kernel that do not easily allow ACPI to be added. However, he also stated that ACPI is supported in the newest development kernel, 2.5, and will be included in 2.6.
The final topic was the possibility of purchasing a laptop without paying any charge/tax for Microsoft Windows. The consensus was that currently this is not possible: even if the machine comes with Linux or without any operating system, the vendors have contracts in place which require them to pay Microsoft for each machine they ship if that machine is capable of running a Microsoft operating system. The only way this will change is if the demand for other operating systems increases to the point that vendors renegotiate their contracts with Microsoft, and no one saw this happening in the near future.
LARGE CLUSTERS
Andrew Hume, AT&T Labs – Research
Summarized by Teri Lampoudi
This was a session on clusters, big data, and resilient computing. Given that the audience was primarily interested in clusters, and not necessarily big data or beating nodes to a pulp, the conversation revolved mainly around what it takes to make a heterogeneous cluster resilient. Hume also presented a FREENIX paper on his current cluster project, which explained in more detail much of what was abstractly claimed in the guru session.
Hume's use of the word "cluster" referred not to what I had assumed would be a Beowulf-type system, but to a loosely coupled farm of machines in which concurrency is much less of an issue than it would be in a Beowulf. In fact, the architecture Hume described was designed to deal with large amounts of independent transaction processing – essentially the process of billing calls – which requires no interprocess message passing or anything of the sort a parallel scientific application might.
Where does the large data come in? Since the transactions in question consist of large numbers of flat files, a mechanism for getting the files onto nodes and off-loading the results is necessary. In this particular cluster, named Ningaui, this task is handled by the "replication manager," which generated a large amount of interest in the audience. All management functions in the cluster, as well as job allocation, constitute services that are bid for and leased out to nodes. Furthermore, failures are handled by simply repeating the failed task until it is completed properly, if that is possible, and if it is not, then looking through detailed logs to discover
what went wrong. Logging and testing for the successful completion of each task are what make this model resilient. Every management operation is logged at some appropriate level of detail, generating 120MB/day, and every single file transfer is MD5-checksummed. This was characterized by Hume as a "patient" style of management, where no one controls the cluster: scheduling behavior is emergent rather than stringently planned, and nodes are free to drop in and out of the cluster. A downside to this model is the increase in latency, offset by the fact that the cluster continues to function unattended outside of 8-to-5 support hours.
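Checksumming every transfer is straightforward to do incrementally; a minimal version of the idea, using MD5 via Python's hashlib rather than Ningaui's actual tooling:

```python
import hashlib

def md5_of_stream(chunks) -> str:
    """Checksum a transfer incrementally, as you would while streaming
    a large flat file between nodes; the sender and receiver compare
    digests to verify the transfer completed intact."""
    h = hashlib.md5()
    for chunk in chunks:
        h.update(chunk)
    return h.hexdigest()

sent = [b"call-record-1\n", b"call-record-2\n"]
received = [b"call-record-1\n", b"call-record-2\n"]
assert md5_of_stream(sent) == md5_of_stream(received)

corrupted = [b"call-record-1\n", b"call-record-X\n"]
assert md5_of_stream(sent) != md5_of_stream(corrupted)  # corruption detected
```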
In response to questions about the projected scalability of such a scheme, Hume said the system would presumably scale from the present 60 nodes to about 100 without modifications, but that scaling higher would be a matter for further design. The guru session ended on a somewhat unrelated note regarding the problems of getting data off mainframes and onto Linux machines: issues of fixed- vs. variable-length encoding of data blocks resulting in useful information being stripped by FTP, and problems in converting COBOL copybooks to something useful on a C-based architecture. To reiterate Hume's claim, there are homemade solutions for some of these problems, and the reader should feel free to contact Mr. Hume for them.
WRITING PORTABLE APPLICATIONS
Nick Stoughton, MSB Consultants
Summarized by Josh Lothian
Nick Stoughton addressed a concern that developers are facing more and more: writing portable applications. He proposed that there is no such thing as a "portable application," only those that have been ported. Developers are still in search of the Holy Grail of applications programming: a write-once, compile-anywhere program. Languages such as Java are leading up to this, but virtual machines still have to be ported, paths will still be different, etc. Web-based applications are another area that could be promising for portable applications.
Nick went on to talk about the future of portability. The recently released POSIX 2001 standards are expected to help the situation; the 2001 revision expands the original POSIX spec into seven volumes. Even though POSIX 2001 is more specific than the previous release, Nick pointed out that even this standard allows for small differences where vendors may define their own behaviors. This can only hurt developers in the long run. Along these lines, it was pointed out that there may be room for automated tools capable of checking application code and determining the level of portability and standards compliance of the source.
NETWORK MANAGEMENT SYSTEM PERFORMANCE TUNING
Jeff Allen, Tellme Networks, Inc.
Summarized by Matt Selsky
Jeff Allen, author of Cricket, discussed network tuning. The basic idea of tuning is: measure, twiddle, and measure again. Cricket can be used for large installations, but it's overkill for smaller installations, for which MRTG is better suited. You don't need Cricket to do measurement; doing something like a shell script that dumps data to Excel is fine, but you need to get some sort of measurement. Cricket was not designed for billing; it was meant to help answer questions.
Useful tools and techniques covered included: looking for the differences between machines in order to identify the causes of those differences; measuring, but also thinking about what you're measuring and why; and remembering to use strace and tcpdump to identify problems. Problems repeat themselves; performance tuning requires an intricate understanding of all the layers involved to solve the problem and simplify the automation. (Having a computer science background helps but is not essential.) If you can reproduce the problem, you then should investigate each piece to find the bottleneck. If you can't reproduce the problem, then measure. You need to understand all the system interactions. Measurement can help determine whether things are actually slow or if the user is imagining it.
Troubleshooting begins with a hunch, but scientific processes are essential. You should be able to determine the event stream, or when each event occurs; having observation logs can help. Some hints provided include checking cron for periodic anomalies, slowing down /bin/rm to avoid I/O overload, and looking for unusual kernel CPU usage.
Also, try to use lightweight monitoring to reduce overhead. You don't need to monitor every resource on every system, but you should monitor those resources on those systems that are essential. Don't check things too often, since you can introduce overhead that way. Lots of information can be gathered from the /proc file system interface to the kernel.
BSD BOF
Host: Kirk McKusick, Author and Consultant
Summarized by Bosko Milekic
The BSD Birds of a Feather session at USENIX started off with five quick and informative talks and ended with an exciting discussion of licensing issues.
Christos Zoulas, for the NetBSD Project, went over the primary goals of the NetBSD Project (namely portability, a clean architecture, and security) and then proceeded to briefly discuss some of the challenges the project has encountered. The successes of, and areas for improvement in, release 1.6 were then examined. NetBSD has recently seen various improvements in its cross-build system, packaging system, and third-party tool support. As for the kernel, a zero-copy TCP/UDP implementation has been integrated, and new pmap
code for arm32 has been introduced, as have some other rather major changes to various subsystems. More information is available in Christos's slides, which are now available at http://www.netbsd.org/gallery/events/usenix2002.
Theo de Raadt of the OpenBSD Project presented a rundown of various new things that the OpenBSD and OpenSSH teams have been looking at. Theo's talk included an amusing description (and pictures) of some of the events that transpired during OpenBSD's hack-a-thon, which occurred the week before USENIX '02. Finally, Theo mentioned that, following complications with the IPFilter license, the OpenBSD team had done a fairly extensive license audit.
Robert Watson of the FreeBSD Project brought up various examples of FreeBSD being used in the real world. He went on to describe a wide variety of changes that will surface with the upcoming FreeBSD release 5.0. Notably, 5.0 will introduce a re-architecting of the kernel aimed at providing much more scalable support for SMP machines. 5.0 will also include an early implementation of KSE, FreeBSD's version of scheduler activations, large improvements to pccard, and significant framework bits from the TrustedBSD project.
Mike Karels from Wind River Systems presented an overview of how BSD/OS has been evolving following the Wind River Systems acquisition of BSDi's software assets. Ernie Prabhakar from Apple discussed Apple's OS X and its success on the desktop. He explained how OS X aims to bring BSD to the desktop while striving to maintain a strong and stable core, one that is heavily based on BSD technology.
The BoF finished with a discussion of licensing; specifically, some folks questioned the impact of Apple's licensing of code from their core OS (the Darwin Project) on the BSD community as a whole.
CLOSING SESSION
HOW FLIES FLY
Michael H. Dickinson, University of California, Berkeley
Summarized by JD Welch
In this lively and offbeat talk, Dickinson discussed his research into the mechanisms of flight in the fruit fly (Drosophila melanogaster) and related its autonomic behavior to that of a technological system. Insects are the most successful organisms on earth, due in no small part to their early adoption of flight. Flies travel through space on straight trajectories interrupted by saccades, jumpy turning motions somewhat analogous to the fast, jumpy movements of the human eye. Using a variety of monitoring techniques, including high-speed video in a "virtual reality flight arena," Dickinson and his colleagues have observed that flies respond to visual cues to decide when to saccade during flight. For example, more visual texture makes flies saccade earlier (i.e., further away from the arena wall), described by Dickinson as a "collision-control algorithm."
Through a combination of sensory inputs, flies can make decisions about their flight path. Flies possess a visual "expansion detector" which, at a certain threshold, causes the animal to turn in a certain direction. However, expansion in front of the fly sometimes causes it to land. How does the fly decide? Using the virtual reality device to replicate various visual scenarios, Dickinson observed that the flies fixate on vertical objects that move back and forth across the field. Expansion on the left side of the animal causes it to turn right, and vice versa, while expansion directly in front of the animal triggers the legs to flare out in preparation for landing.
If the eyes detect these changes, how are the responses implemented? The flies' wings have "power muscles" controlled by mechanical resonance (as opposed to the nervous system), which drive the
wings, combined with neurally activated "steering muscles," which change the configuration of the wing joints. Subtle variations in the timing of impulses correspond to changes in wing movement, controlled by sensors in the "wing-pit." A small wing-like structure, the haltere, controls equilibrium by beating constantly.
Voluntary control is accomplished by the halteres, whose signals can interfere with the autonomic control of the stroke cycle. The halteres have "steering muscles" as well, and information derived from the visual system can turn off the haltere or trick it into a "virtual" problem requiring a response.
Dickinson has also studied the aerodynamics of insect flight using a device called the "Robo Fly," an oversized mechanical insect wing suspended in a large tank of mineral oil. Interestingly, Dickinson observed that the average lift generated by the flies' wings is less than the fly's body weight; the flies use three mechanisms to overcome this, including rotating the wings (rotational lift).
Insects are extraordinarily robust creatures, and because Dickinson analyzed the problem in a systems-oriented way, these observations and analyses are immediately applicable to technology: the response system can be used as an efficient search algorithm for control systems in autonomous vehicles, for example.
THE AFS WORKSHOP
Summarized by Garry Zacheiss, MIT
Love Hornquist-Astrand of the Arla development team presented the Arla status report. Arla 0.35.8 has been released. Scheduled for release soon are improved support for Tru64 UNIX, MacOS X, and FreeBSD; improved volume handling; and implementation of more of the vos/pts subcommands. It was stressed that MacOS X is considered an important platform, and that a GUI configuration manager for the cache
manager, a GUI ACL manager for the Finder, and a graphical login that obtains Kerberos tickets and AFS tokens at login time were all under development.
Future goals planned for the underlying AFS protocols include GSSAPI/SPNEGO support for Rx, performance improvements to Rx, an enhanced disconnected mode, and IPv6 support for Rx; an experimental patch is already available for the latter. Future Arla-specific goals include improved performance, partial file reading, and increased stability on several platforms. Work is also in progress on the RXGSS protocol extensions (integrating GSSAPI into Rx); a partial implementation exists, and work continues as developers find time.
Derrick Brashear of the OpenAFS development team presented the OpenAFS status report. OpenAFS 1.2.5 was released immediately prior to the conference; it fixed a remotely exploitable denial-of-service vulnerability on several OpenAFS platforms, most notably IRIX and AIX. Future work planned for OpenAFS includes better support for MacOS X, including working around Finder interaction issues. Better support for the BSDs is also planned: FreeBSD has a partially implemented client, while NetBSD and OpenBSD have only server programs available right now. AIX 5, Tru64 5.1A, and MacOS X 10.2 (aka Jaguar) are all planned for the future. Other planned enhancements include nested groups in the ptserver (code donated by the University of Michigan awaits integration), disconnected AFS, and further work on a native Windows client. Derrick stated that the guiding principles of OpenAFS were to maintain compatibility with IBM AFS, support new platforms, ease the administrative burden, and add new functionality.
AFS performance was discussed. The openafs.org cell consists of two file servers, one in Stockholm, Sweden, and one in Pittsburgh, PA. AFS works reasonably well over trans-Atlantic links, although OpenAFS clients don't determine which file server to talk to very effectively. Arla clients use RTTs to the server to determine the optimal file server from which to fetch replicated data. Modifications to OpenAFS to support this behavior in the future are desired.
Jimmy Engelbrecht and Harald Barth of KTH discussed their AFSCrawler script, written to determine how many AFS cells and clients were in the world, which implementations and versions they were running (Arla vs. IBM AFS vs. OpenAFS), and how much data was in AFS. The script unfortunately triggered a bug in IBM AFS 3.6-derived code, causing some clients to panic while handling a specific RPC. This has since been fixed in OpenAFS 1.2.5 and the most recent IBM AFS patch level, and all AFS users are strongly encouraged to upgrade. No release of Arla is vulnerable to this particular denial-of-service attack. There was an extended discussion of the usefulness of this exploration; many sites believed it was useful information and that such scanning should continue in the future, but only on an opt-in basis.
Many sites face the problem of managing Kerberos/AFS credentials for batch-scheduled jobs. Specifically, most batch processing software needs to be modified to forward tickets as part of the batch submission process, renew tickets and tokens while the job is in the queue and for the lifetime of the job, and properly destroy credentials when the job completes. Ken Hornstein of NRL was able to pay a commercial vendor to support Kerberos 4/5 credential management in their product, although they did not implement AFS token management. MIT has implemented some of the desired functionality in OpenPBS and might be able to make it available to other interested sites.
Tools to simplify AFS administration were discussed, including:
AFS Balancer: A tool written by CMU to automate the process of balancing disk usage across all servers in a cell. Available from ftp://ftp.andrew.cmu.edu/pub/AFS-Tools/balance-1.1b.tar.gz.
Themis: KTH's enhanced version of the AFS tool "package" for updating files on local disk from a central AFS image. KTH's enhancements include allowing the deletion of files, simplifying the process of adding a file, and allowing the merging of multiple rule sets for determining which files are updated. Themis is available from the Arla CVS repository.
Stanford was presented as an example of a large AFS site. Stanford's AFS usage consists of approximately 14TB of data in AFS, in the form of approximately 100,000 volumes; 33TB of storage is available in their primary cell, ir.stanford.edu. Their file servers consist entirely of Solaris machines running a combination of Transarc 3.6 patch-level 3 and OpenAFS 1.2.x, while their database servers run OpenAFS 1.2.2. Their cell consists of 25 file servers using a combination of EMC and Sun StorEdge hardware. Stanford continues to use the kaserver for their authentication infrastructure, with future plans to migrate entirely to an MIT Kerberos 5 KDC.
Stanford has approximately 3,400 clients on their campus, not including SLAC (Stanford Linear Accelerator); approximately 2,100 AFS clients from outside Stanford contact their cell every month. Their supported clients are almost entirely IBM AFS 3.6 clients, although they plan to release OpenAFS clients soon. Stanford currently supports only UNIX clients. There is some on-campus presence of Windows clients, but Stanford has never publicly released or supported it. They do intend to release and support the MacOS X client in the near future.
All Stanford students, faculty, and staff are assigned AFS home directories with a default quota of 50MB, for a total of approximately 550GB of user home directories. Other significant uses of AFS
storage are data volumes for workstation software (400GB) and volumes for course Web sites and assignments (100GB).
AFS usage at Intel was also presented. Intel has been an AFS site since 1994. They had bad experiences with the IBM 3.5 Linux client; their experience with OpenAFS on Linux 2.4.x kernels has been much better. They use, and are satisfied with, the OpenAFS IA64 Linux port. Intel has hundreds of OpenAFS 1.2.3 and 1.2.4 clients in many production cells accessing data stored on IBM AFS file servers, and they have not encountered any interoperability issues. Intel has some concerns about OpenAFS: they would like to purchase commercial support for OpenAFS and to see OpenAFS support for HP-UX on both PA-RISC and Itanium hardware. HP-UX support is currently unavailable because a specific HP-UX header file is unavailable from HP; this may be resolved soon. Intel has not yet committed to migrating their file servers to OpenAFS and is unsure whether they will do so without commercial support.
Backups are a traditional topic of discussion at AFS workshops, and this time was no exception. Many users complain that the traditional AFS backup tools ("backup" and "butc") are complex and difficult to automate, requiring many home-grown scripts and much user intervention for error recovery. An additional complaint was that the traditional AFS tools do not support file- or directory-level backups and restores; data must be backed up and restored at the volume level.
Mitch Collinsworth of Cornell presented work to make AMANDA, the free backup software from the University of Maryland, suitable for AFS backups. Using AMANDA for AFS backups allows one to share AFS backup tapes with non-AFS backups, easily run multiple backups in parallel, automate error recovery, and provide a robust degraded mode that prevents tape errors from stopping backups altogether. Their implementation allows for full volume restores as well as individual directory and file restores. They have finished coding this work and are in the process of testing and documenting it.
Peter Honeyman of CITI at the University of Michigan spoke about work he has proposed to replace Rx with RPCSEC GSS in OpenAFS. This would allow AFS to use a TCP-based transport mechanism rather than the UDP-based Rx, and possibly gain better congestion control, dynamic adaptation, and fragmentation avoidance as a result. RPCSEC GSS uses the GSSAPI to authenticate SUN ONC RPC. RPCSEC GSS is transport-agnostic, provides strong security, is a developing Internet standard, and has multiple open source implementations. Backward compatibility with existing AFS servers and clients is an important goal of this project.
lithic kernels. Notable future work includes building components for low-end appliances and the development of a real-time OS component library.
BUILDING SERVICES
Summarized by Pradipta De
NINJA: A FRAMEWORK FOR NETWORK SERVICES
J. Robert von Behren, Eric A. Brewer, Nikita Borisov, Michael Chen, Matt Welsh, Josh MacDonald, Jeremy Lau, and David Culler, University of California, Berkeley
Robert von Behren presented Ninja, an ongoing project that aims to provide a framework for building robust and scalable Internet-based network services such as Web hosting, instant messaging, email, and file-sharing applications. Robert drew attention to the difficulties of writing cluster applications: one has to take care of data consistency and issues of concurrency, and for robust applications there are problems related to fault tolerance. Ninja works as a wrapper to relieve the user of these problems. The goal of the project is building network services that are scalable and highly available, maintain persistent data, provide graceful degradation, and support online evolution.
The use of clusters distinguishes this setup from generic distributed systems in terms of reliability and security, as well as providing a partition-free network. The next important feature of this project is the new programming model, which is more restrictive than a general programming model but still expressive enough to write most applications. This model, described as "single program, multiple connection," uses intertask parallelism instead of multithreaded concurrency and is characterized by all nodes running the same program, with connections delegated to the nodes by a centralized connection manager (CM). The CM takes care of hiding the details of mapping an external connection to an internal node. Ninja
also uses a design called "staged event-driven architecture," where each service is broken down into a set of stages connected by event queues. This architecture is suitable for a modular design and helps in graceful degradation by adaptive load shedding from the event queues.
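The staged, queue-connected design can be sketched in a few lines. This is an illustrative toy, not Ninja's actual implementation; the stage names, capacities, and load-shedding policy here are invented for the example:

```python
from collections import deque

class Stage:
    """One stage in a staged event-driven service: a handler plus a
    bounded event queue. When the queue is full, new events are shed
    (dropped) rather than accepted -- the adaptive load-shedding idea."""
    def __init__(self, name, handler, capacity=4):
        self.name = name
        self.handler = handler      # event -> list of (next_stage, event)
        self.queue = deque()
        self.capacity = capacity
        self.shed = 0

    def enqueue(self, event):
        if len(self.queue) >= self.capacity:
            self.shed += 1          # graceful degradation: drop, don't block
            return False
        self.queue.append(event)
        return True

def run(stages):
    """Drain all stage queues, forwarding events between stages."""
    progress = True
    while progress:
        progress = False
        for stage in stages:
            if stage.queue:
                event = stage.queue.popleft()
                for next_stage, out in stage.handler(event):
                    next_stage.enqueue(out)
                progress = True

# A toy two-stage "service": parse requests, then render responses.
responses = []

def render_handler(ev):
    responses.append(ev.upper())    # terminal stage: emit the response
    return []

render = Stage("render", render_handler)
parse = Stage("parse", lambda ev: [(render, ev.strip())])

for req in ["  get /a ", " get /b "]:
    parse.enqueue(req)
run([parse, render])
print(responses)   # -> ['GET /A', 'GET /B']
```

Because each stage owns its queue, overload in one stage drops work at that queue instead of collapsing the whole service.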
The presentation concluded with examples of the implementation and evaluation of a Web server and an email system called NinjaMail, showing the ease of authoring and the efficacy of using the Ninja framework for service development.
USING COHORT SCHEDULING TO ENHANCE SERVER PERFORMANCE
James R. Larus and Michael Parkes, Microsoft Research
James Larus presented a new scheduling policy for increasing the throughput of server applications. Cohort scheduling batches the execution of similar operations arising in different server requests. The usual programming paradigm for handling server requests is to spawn multiple concurrent threads and switch from one thread to another whenever a thread blocks on I/O. Since the threads in a server mostly execute unrelated pieces of code, the locality of reference is reduced, and hence the effectiveness of the various caching mechanisms. One way to solve this problem is to throw in more hardware, but Larus presented a complementary view in which program behavior is investigated and used to improve performance.
The problem in this scheme is to identify pieces of code that can be batched together for processing. One simple way is to look at the next program counter values and use them to club threads together. However, this talk presented a new programming abstraction, "staged computation," which replaces the thread model with "stages." A stage is an abstraction for operations with similar behavior and locality. The StagedServer library, a collection of C++ classes that implement staged computation and cohort scheduling on either a uniprocessor or a multiprocessor, can be used to program in this model and to define stages.
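The reordering idea can be illustrated with a small sketch (the real StagedServer is a C++ library; the request shapes and stage names below are invented):

```python
from collections import defaultdict

# Each request is a list of (stage_name, work_item) steps.
requests = {
    "r1": [("parse", "r1"), ("lookup", "r1"), ("reply", "r1")],
    "r2": [("parse", "r2"), ("lookup", "r2"), ("reply", "r2")],
    "r3": [("parse", "r3"), ("lookup", "r3"), ("reply", "r3")],
}

def thread_order(reqs):
    """Thread-per-request style: each request runs to completion,
    interleaving unrelated code paths (poor locality)."""
    order = []
    for steps in reqs.values():
        order.extend(steps)
    return order

def cohort_order(reqs):
    """Cohort scheduling: batch the same stage across requests, so the
    code and data for one stage stay hot in the caches."""
    cohorts = defaultdict(list)
    stage_seq = []
    for steps in reqs.values():
        for stage, item in steps:
            if stage not in cohorts:
                stage_seq.append(stage)
            cohorts[stage].append((stage, item))
    return [op for stage in stage_seq for op in cohorts[stage]]

print(cohort_order(requests)[:3])   # three "parse" ops back to back
```

The cohort order runs all "parse" work, then all "lookup" work, then all "reply" work, which is exactly the batching of similar operations the talk described.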
The presentation showed an experimental evaluation of cohort scheduling against the thread-based model, implementing two servers: a Web server, which is mainly I/O bound, and a publish-subscribe server, which is mainly compute bound. The SURGE benchmark was used for the first experiment and the Fabret workload for the second. The results showed that the cohort-scheduling-based implementation gave better throughput than the thread-based implementation at high loads.
NETWORK PERFORMANCE
Summarized by Xiaobo Fan
ETE: PASSIVE END-TO-END INTERNET SERVICE PERFORMANCE MONITORING
Yun Fu and Amin Vahdat, Duke University; Ludmila Cherkasova and Wenting Tang, HP Labs
This paper won the Best Student Paper award.
Ludmila Cherkasova began by listing several questions most Web service providers want answered in order to improve service quality. She reviewed the difficulties of making accurate and efficient end-to-end Web service measurements and the shortcomings of currently available approaches. The authors propose a passive trace-based architecture called EtE to monitor Web server performance on behalf of end users.
The first step is to collect network packets passively. The second module reconstructs all TCP connections and extracts HTTP transactions. To reconstruct Web page accesses, they first build a knowledge base indexed by client IP and URL, and then group objects into the related Web pages in which they are embedded. Statistical analysis is used to handle non-matched objects. EtE Monitor can generate three groups of metrics to measure Web service performance: response time, Web caching, and page abortion.
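The grouping step can be sketched roughly as follows. This is a deliberately simplified toy: in EtE the knowledge base is learned from observed traffic and the matching is statistical, whereas here the page-to-object mapping, URLs, and timing rules are all invented for illustration:

```python
# Toy reconstruction: group object requests into page accesses.
# A tiny "knowledge base" maps container pages to the objects they
# embed (hard-coded here; EtE learns this from traffic).
knowledge_base = {
    "/index.html": {"/logo.gif", "/style.css"},
    "/news.html": {"/logo.gif"},
}

def group_accesses(transactions):
    """transactions: list of (client_ip, url, timestamp).
    Returns a dict mapping (client_ip, page, start_time) -> the URLs
    attributed to that page access."""
    pages = {}
    current = {}                      # client_ip -> open page access key
    for ip, url, t in transactions:
        if url in knowledge_base:     # a container page starts a new access
            current[ip] = (ip, url, t)
            pages[current[ip]] = [url]
        elif ip in current and url in knowledge_base[current[ip][1]]:
            pages[current[ip]].append(url)   # embedded object of open page
        # else: a non-matched object, handled statistically in EtE
    return pages

trace = [("1.2.3.4", "/index.html", 0.0),
         ("1.2.3.4", "/logo.gif", 0.1),
         ("1.2.3.4", "/style.css", 0.2)]
print(group_accesses(trace))
```

Once objects are attributed to a page access, per-page response time and abortion metrics fall out of the first and last timestamps in each group.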
To demonstrate the benefits of EtE Monitor, Cherkasova talked about two case studies and, based on the calculated metrics, gave some insightful explanations of what is happening behind the variations in Web performance. Through validation, they claim their approach provides a very close approximation of the real scenario.
THE PERFORMANCE OF REMOTE DISPLAY MECHANISMS FOR THIN-CLIENT COMPUTING
S. Jae Yang, Jason Nieh, Matt Selsky, and Nikhil Tiwari, Columbia University
Noting the trend toward thin-client computing, the authors compared different techniques and design choices by measuring the performance of six popular thin-client platforms: Citrix MetaFrame, Microsoft Terminal Services, Sun Ray, Tarantella, VNC, and X. After pointing out several challenges in benchmarking thin clients, Yang proposed slow-motion benchmarking to achieve non-invasive packet monitoring and consistent visual quality. Basically, they insert delays between separate visual events in the benchmark applications on the server side, so that the client's display update can catch up with the server's processing speed. The experiments are conducted on an emulated network over a range of network bandwidths.
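The core of the technique is that spacing visual events apart lets a packet sniffer attribute captured traffic to individual events. A minimal sketch, with made-up event names, timestamps, and byte counts (the real benchmarks modify the applications themselves):

```python
def slow_motion_schedule(events, delay):
    """Space benchmark events a fixed delay apart so each display update
    finishes before the next event fires (illustrative sketch)."""
    return [(i * delay, ev) for i, ev in enumerate(events)]

def per_event_bytes(packets, schedule, delay):
    """Attribute sniffed packets to the event in whose window they fall,
    giving per-event data transferred without instrumenting the client."""
    totals = {ev: 0 for _, ev in schedule}
    for t, nbytes in packets:
        for start, ev in schedule:
            if start <= t < start + delay:
                totals[ev] += nbytes
                break
    return totals

schedule = slow_motion_schedule(["page1", "page2"], delay=1.0)
packets = [(0.1, 1500), (0.3, 900), (1.2, 700)]  # (timestamp, bytes) from sniffer
print(per_event_bytes(packets, schedule, 1.0))   # {'page1': 2400, 'page2': 700}
```

Per-event byte counts and the time span of each event's packets are then enough to reason about latency and visual quality per platform.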
Their results show that thin clients can provide good performance for Web applications in LAN environments, but only some platforms performed well on the video benchmark. Pixel-based encoding may achieve better performance and bandwidth efficiency than high-level graphics. Display caching and compression should be used with care.
A MECHANISM FOR TCP-FRIENDLY TRANSPORT-LEVEL PROTOCOL COORDINATION
David E. Ott and Ketan Mayer-Patel, University of North Carolina, Chapel Hill
A revised transport-level protocol opti-mized for cluster-to-cluster (C-to-C)
applications is what David Ott tried to explain in his talk. A C-to-C application class is identified as one set of processes communicating with another set of processes across a common Internet path. The fundamental problem with C-to-C applications is how to coordinate all C-to-C communication flows so that they share a consistent view of the common C-to-C network, adapt to changing network conditions, and cooperate to meet specific requirements. Aggregation points (APs) are placed at the first and last hops of the common data path to probe network conditions (latency, bandwidth, loss rate, etc.). To carry and transfer this information, a new protocol, the Coordination Protocol (CP), is inserted between the network layer (IP) and the transport layer (TCP, UDP, etc.). Ott illustrated how the CP header is updated and used as packets originate from the source and traverse the local and remote APs to arrive at their destination, and how an AP maintains a per-cluster state table and detects network conditions. Through simulation results, this coordination mechanism appears effective in sharing common network resources among C-to-C communication flows.
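The shim-layer idea can be sketched as a header that APs stamp in flight. The field names and sizes below are invented for the example; the paper defines the actual CP header layout:

```python
from dataclasses import dataclass

@dataclass
class CPHeader:
    """Illustrative Coordination Protocol header carried between the IP
    and transport layers (hypothetical fields, not the paper's layout)."""
    flow_id: int          # identifies the C-to-C flow within the cluster
    latency_ms: float     # network conditions measured by the APs
    loss_rate: float
    bandwidth_kbps: float

def stamp_at_ap(header, probe):
    """An aggregation point overwrites the condition fields from its
    per-cluster state table before forwarding the packet, so endpoints
    behind it all see the same view of the shared path."""
    header.latency_ms = probe["latency_ms"]
    header.loss_rate = probe["loss_rate"]
    header.bandwidth_kbps = probe["bandwidth_kbps"]
    return header

h = stamp_at_ap(CPHeader(7, 0.0, 0.0, 0.0),
                {"latency_ms": 42.0, "loss_rate": 0.01,
                 "bandwidth_kbps": 1200.0})
print(h)
```

Because every flow's packets pass through the same two APs, every endpoint receives an identical picture of the shared path and can adapt consistently.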
STORAGE SYSTEMS
Summarized by Praveen Yalagandula
MY CACHE OR YOURS? MAKING STORAGE MORE EXCLUSIVE
Theodore M. Wong, Carnegie Mellon University, and John Wilkes, HP Labs
Theodore Wong explained the inefficiency of current "inclusive" caching schemes in storage area networks: when a client accesses a block, the block is read from disk and cached both in the disk array cache and in the client's cache. He then presented the concept of "exclusive" caching, in which a block accessed by a client is cached only in that client's cache. On eviction from the client's cache, a DEMOTE operation moves the data block to the tail of the array cache.
The new exclusive caching schemes were evaluated on both single-client and multiple-client systems. The single-client results were presented for two different types of synthetic workloads: Random (transaction-processing-type workloads) and Zipf (typical Web workloads). The exclusive policy was quite effective in achieving higher hit rates and lower read latencies. They also showed a 2.2-times hit-rate improvement over inclusive techniques for a real-life workload, httpd, in a single-client setting.
The DEMOTE scheme also performed well on multiple-client systems when the data accessed by the clients was disjoint. For shared-data workloads, this scheme performed worse than the inclusive schemes. Theodore then presented an adaptive exclusive caching scheme: a block accessed by a client is placed at the tail of the client's cache and also in the array cache at an appropriate position determined by the popularity of the block. The popularity of a block is measured by maintaining a ghost cache that accumulates the number of times each block is accessed. This new adaptive scheme achieved a maximum speedup of 52% in mean latency in experiments with real-life workloads.
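The DEMOTE interaction described above can be sketched with two LRU caches (a minimal illustration of the idea, not the paper's implementation; cache sizes and block names are invented):

```python
from collections import OrderedDict

class LRUCache:
    def __init__(self, size):
        self.size = size
        self.data = OrderedDict()    # oldest (LRU) first

    def insert_tail(self, block):
        """Insert at the MRU end; on overflow, evict and return the LRU
        block (None if nothing was evicted)."""
        self.data.pop(block, None)
        self.data[block] = True
        if len(self.data) > self.size:
            victim, _ = self.data.popitem(last=False)
            return victim
        return None

    def remove(self, block):
        return self.data.pop(block, None) is not None

def read(block, client, array):
    """Exclusive caching: a hit in the array cache removes the block
    there (it now lives only at the client); a client eviction DEMOTEs
    the victim back to the tail of the array cache."""
    array.remove(block)              # exclusive: keep at most one copy
    victim = client.insert_tail(block)
    if victim is not None:
        array.insert_tail(victim)    # DEMOTE on eviction

client, array = LRUCache(2), LRUCache(2)
for b in ["a", "b", "c"]:
    read(b, client, array)
print(sorted(client.data), sorted(array.data))   # ['b', 'c'] ['a']
```

Together the two caches behave like one large cache: the aggregate holds "a", "b", and "c" with no duplicates, which is exactly the win over inclusive caching.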
For more information, visit http://www.cs.cmu.edu/~tmwong/research and http://www.hpl.hp.com/research/itc/csl/ssp.
BRIDGING THE INFORMATION GAP IN STORAGE PROTOCOL STACKS
Timothy E. Denehy, Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau, University of Wisconsin, Madison
Currently there is a huge information gap between storage systems and file systems. The interface storage systems expose to file systems, based on blocks and providing only read/write operations, is very narrow. This leads to poor performance overall, because of duplicated functionality in both systems and
reduced functionality resulting from the storage system's lack of file information.
The speaker presented two enhancements: ExRAID, an exposed RAID, and ILFS, an Informed Log-Structured File System. ExRAID is an enhancement to the block-based RAID storage system that exposes three types of information to the file system: (1) regions, contiguous portions of the address space comprising one or multiple disks; (2) performance information about queue lengths and throughput of the regions, revealing disk heterogeneity to the file system; and (3) failure information, dynamically updated information conveying the number of tolerable failures in each region.
ILFS allows online incremental expansion of the storage space, performs dynamic parallelism using ExRAID's performance information to perform well on heterogeneous storage systems, and provides a range of mechanisms with different granularities for file redundancy using ExRAID's region failure characteristics. These new features were added to LFS with only a 19% increase in code size.
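A sketch of how a file system might use the exposed information (the field names and the placement policy here are invented for illustration; ExRAID's real interface differs):

```python
# Hypothetical region information an ExRAID-style layer might expose.
regions = [
    {"name": "fast", "queue_len": 2, "throughput_mbps": 60,
     "tolerable_failures": 1},
    {"name": "slow", "queue_len": 9, "throughput_mbps": 20,
     "tolerable_failures": 0},
]

def pick_write_region(regions, need_redundancy):
    """A file system above ExRAID can steer writes using the exposed
    performance and failure information: restrict to regions that can
    still tolerate a failure when redundancy is required, then pick the
    least-loaded region relative to its throughput."""
    candidates = [r for r in regions
                  if not need_redundancy or r["tolerable_failures"] > 0]
    return min(candidates, key=lambda r: r["queue_len"] / r["throughput_mbps"])

print(pick_write_region(regions, need_redundancy=True)["name"])   # fast
```

With a block-only interface, none of this information crosses the boundary, so the file system cannot make either the performance or the redundancy decision.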
More information: http://www.cs.wisc.edu/wind
MAXIMIZING THROUGHPUT IN REPLICATED DISK STRIPING OF VARIABLE BIT-RATE STREAMS
Stergios V. Anastasiadis, Duke University; Kenneth C. Sevcik and Michael Stumm, University of Toronto
There is an increasing demand for the continuous real-time streaming of media files. Disk striping is a common technique for supporting many connections on media servers, but even with 1.2 million hours of mean time between failures, there will be roughly one disk failure per week with 7,000 disks. Fault tolerance can be achieved by using data replication and reserving extra bandwidth during normal operation. This work focused on supporting the most common variable bit-rate stream formats (e.g., MPEG).
The authors used the prototype media server EXEDRA to implement and evaluate different replication and bandwidth reservation schemes. The system supports variable bit-rate streams, does stride-based disk allocation, and is capable of supporting variable-grain disk striping. Two replication policies were presented: deterministic, in which data blocks are replicated in a round-robin fashion on the secondary replicas, and random, in which data blocks are replicated on randomly chosen secondary replicas. Two bandwidth reservation techniques were also presented: mirroring reservation, in which disk bandwidth is reserved for both the primary and the replicas of the media file during playback, and minimum reservation, a more efficient scheme in which bandwidth is reserved only for the sum of the primary data access time and the maximum of the backup data access times required in each round.
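The difference between the two reservation policies is simple arithmetic per round; the access times below are made up for illustration:

```python
# Per-round disk-time reservation under the two policies described above.
primary_ms = [30, 25, 40]   # primary data access time per stripe in a round
backup_ms = [28, 22, 35]    # corresponding backup (replica) access times

# Mirroring reservation: reserve for the primary AND all replicas
# every round, as if all copies were always being read.
mirroring = sum(primary_ms) + sum(backup_ms)

# Minimum reservation: primary time plus only the worst-case backup
# time, since after a single failure at most one replica stream must
# actually be served in any round.
minimum = sum(primary_ms) + max(backup_ms)

print(mirroring, minimum)   # 180 130
```

The spare reservation shrinks from the sum of all backup times to the single largest one, which is where the roughly twofold throughput advantage of minimum reservation reported in the talk comes from.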
Experimental results showed that deterministic replica placement is better than random placement for small disks, that minimum disk bandwidth reservation achieves twice the throughput of mirroring, and that fault tolerance can be achieved with minimal impact on throughput.
TOOLS
Summarized by Li Xiao
SIMPLE AND GENERAL STATISTICAL PROFILING WITH PCT
Charles Blake and Steven Bauer, MIT
This talk introduced the Profile Collection Toolkit (PCT), a sampling-based CPU profiling facility. A novel aspect of PCT is that it allows sampling of semantically rich data such as function call stacks, function parameters, local or global variables, CPU registers, and other execution contexts. The design objectives of PCT were driven by user needs and by the inadequacies or inaccessibility of prior systems.
The rich data collection capability is achieved via a debugger-controller program, dbctl. The talk then introduced the profiling collector and data collection activation. PCT file format options provide flexibility and various ways to store exact states. PCT also has analysis options; two examples given were "Simple" and "Mixed User and Linux Code."
The talk then went into how general sampling helps, covering a debugger-controller state machine, example script fragments, a simple example program, and general value reports in terms of call-path histograms, numeric value histograms, and data generality. Related work, such as gprof and Expect, was then summarized. The contribution of this work is a portable, general value-sampling tool. Its limitations are that PCT does not cope well with stripped binaries and that inaccuracies arise from its statistical nature; given this, building with -g is preferred over stripping.
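The report side of such a profiler is easy to illustrate. The sketch below only shows aggregating sampled call stacks into a call-path histogram, one of the report types mentioned above; the stacks themselves are invented, and PCT's actual collection goes through its debugger-controller rather than anything shown here:

```python
from collections import Counter

def callpath_histogram(samples):
    """Aggregate sampled call stacks (outermost -> innermost) into a
    call-path histogram: how often each full path was observed."""
    return Counter(tuple(stack) for stack in samples)

# Pretend the sampler fired five times and captured these stacks.
samples = [
    ["main", "parse"],
    ["main", "parse"],
    ["main", "render", "draw"],
    ["main", "parse"],
    ["main", "render", "draw"],
]
hist = callpath_histogram(samples)
print(hist.most_common(1))   # [(('main', 'parse'), 3)]
```

With enough samples, the histogram's relative frequencies approximate the fraction of CPU time spent under each call path, which is the statistical-profiling bargain (and the source of the inaccuracies noted above).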
Possible future directions include support for more debuggers, such as dbx and xdb, script generators for other kinds of traces, more canned reports for general values, and a libgdb-based library sampler. PCT is available at http://pdos.lcs.mit.edu/pct and runs on almost any UNIX-like system.
ENGINEERING A DIFFERENCING AND COMPRESSION DATA FORMAT
David G. Korn and Kiem-Phong Vo, AT&T Laboratories – Research
This talk began with an equation: "Differencing + Compression = Delta Compression." Compression removes information redundancy in a data set; examples are gzip, bzip, compress, and pack. Differencing encodes the differences between two data sets; examples are diff -e, fdelta, bdiff, and xdelta. Delta compression compresses a data set given another one, combining differencing and compression, and reduces to pure compression when there is no commonality.
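The differencing half of the equation can be sketched with COPY/ADD instructions, the core idea behind Vcdiff-style encoding. This is a toy built on Python's difflib, not the Vcdiff format itself, which adds instruction-code tables and compression of the result:

```python
from difflib import SequenceMatcher

def delta(source, target):
    """Encode target as COPY(offset, length)-from-source and ADD(data)
    instructions (a sketch of the differencing idea, not real Vcdiff)."""
    ops = []
    for tag, i1, i2, j1, j2 in SequenceMatcher(None, source,
                                               target).get_opcodes():
        if tag == "equal":
            ops.append(("COPY", i1, i2 - i1))     # reuse bytes from source
        elif tag in ("replace", "insert"):
            ops.append(("ADD", target[j1:j2]))    # new bytes, stored literally
        # "delete" consumes only source, so it emits nothing for target
    return ops

def apply_delta(source, ops):
    """Reconstruct the target from the source plus the instruction list."""
    out = []
    for op in ops:
        if op[0] == "COPY":
            _, off, n = op
            out.append(source[off:off + n])
        else:
            out.append(op[1])
    return "".join(out)

src = "the quick brown fox"
tgt = "the quick red fox"
ops = delta(src, tgt)
print(apply_delta(src, ops) == tgt)   # True
```

When source and target share nothing, every instruction is an ADD and the scheme degenerates to storing (and then compressing) the target, matching the "reduces to pure compression" remark above.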
After presenting an overall scheme for a delta compressor and showing delta compression performance, the talk discussed the encoding format of Vcdiff, a newly designed format for delta compression. A Vcdiff instruction code table consists of 256 entries, each coding up to a pair of instructions, and recodes 1-byte indices and any additional data.
The talk then showed Vcdiff's performance on Web data collected from CNN, compared with the W3C standard Gdiff, Gdiff+gzip, and diff+gzip, where Gdiff was computed using Vcdiff delta instructions. The results of two experiments, "First" and "Successive," were presented: in "First," each file is compressed against the first file collected; in "Successive," each file is compressed against the one from the previous hour. diff+gzip did not work well because diff is line-oriented; Vcdiff performed favorably compared with the other formats.
Vcdiff is part of the Vcodex package. Vcodex is a platform for all common data transformations: delta compression, plain compression, encryption, and transcoding (e.g., uuencode, base64). It is structured in three layers for maximum usability; the base library uses Disciplines and Methods interfaces.
The code for Vcdiff can be found at http://www.research.att.com/sw/tools. The authors are moving Vcdiff toward IETF standardization as a comprehensive platform for transforming data; please refer to http://www.ietf.org/internet-drafts/draft-korn-vcdiff-06.txt.
WHERE IN THE NET
Summarized by Amit Purohit
A PRECISE AND EFFICIENT EVALUATION OF THE PROXIMITY BETWEEN WEB CLIENTS AND THEIR LOCAL DNS SERVERS
Zhuoqing Morley Mao, University of California, Berkeley; Charles D. Cranor, Fred Douglis, Michael Rabinovich, Oliver Spatscheck, and Jia Wang, AT&T Labs – Research
Content Distribution Networks (CDNs) attempt to improve Web performance by delivering Web content to end users from servers located at the edge of the network. When a Web client requests content, the CDN dynamically chooses a server to route the request to, usually the
one that is nearest the client. As CDNs have access only to the IP address of the client's local DNS server (LDNS), the CDN's authoritative DNS server maps the client's LDNS to a geographic region within a particular network and combines that with network and server load information to perform CDN server selection.
This method has two limitations. First, it is based on the implicit assumption that clients are close to their LDNS, which may not always be valid. Second, a single request from an LDNS can represent a differing number of Web clients, called the hidden load factor. Knowledge of the hidden load factor can be used to achieve better load distribution.
The talk dealt with the extent of the first limitation and its impact on CDN server selection. To determine the associations between clients and their LDNS, the authors developed a simple, non-intrusive, and efficient mapping technique. The data collected was used to study the impact of proximity on DNS-based server selection using four different proximity metrics. (1) Autonomous system (AS) clustering, observing whether a client is in the same AS as its LDNS, concluded that LDNS is very good for coarse-grained server selection, since 64% of the associations belong to the same AS. (2) Network clustering, observing whether a client is in the same network-aware cluster (NAC) as its LDNS, implied that DNS is less useful for finer-grained server selection, since only 16% of clients and their LDNS are in the same NAC. (3) Traceroute divergence, the length of the divergent paths to the client and its LDNS from a probe point using traceroute, implies that most clients are topologically close to their LDNS as viewed from a randomly chosen probe site. (4) Round-trip time (RTT) correlation (some CDNs select servers based on the RTT between the CDN server and the client's LDNS) examines the correlation between the message RTTs from a probe point to the client and to its LDNS; the results imply this is a reasonable metric to use to avoid really distant servers.
A study of the impact that client-LDNS associations have on DNS-based server selection concludes that knowing the client's IP address would allow more accurate server selection in a large number of cases. The optimality of the server selection also depends on server density, placement, and the selection algorithms.
Further information can be found at http://www.eecs.berkeley.edu/~zmao/myresearch.html or by contacting zmao@eecs.berkeley.edu.
GEOGRAPHIC PROPERTIES OF INTERNET ROUTING
Lakshminarayanan Subramanian, University of California at Berkeley; Venkata N. Padmanabhan, Microsoft Research; Randy H. Katz, University of California at Berkeley
Geographic information can provide insights into the structure and functioning of the Internet, including interactions between different autonomous systems, by analyzing certain properties of Internet routing. It can be used to measure and quantify routing properties such as circuitous routing, hot-potato routing, and geographic fault tolerance.
Traceroute was used to gather the required data, and the Geotrack tool was used to determine the location of the nodes along each network path. This enables the computation of "linearized distances": the sum of the geographic distances between successive pairs of routers along the path.
To measure the circuitousness of a path, a metric called the "distance ratio" is defined as the ratio of the linearized distance of a path to the geographic distance between the source and destination of the path. From the data, it was observed that the circuitousness of a route depends on both the geographic and the network location of the end hosts. A large value of the distance ratio flags paths that are highly
USENIX 2002
circuitous possibly (though not neces-sarily) because of routing anomalies Itis also shown that the minimum delaybetween end hosts is strongly correlatedwith the linearized distance of the path
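The two quantities above compose directly: sum great-circle hop distances for the linearized distance, then divide by the end-to-end great-circle distance. A sketch with approximate, invented router coordinates:

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(a, b):
    """Great-circle distance between two (lat, lon) points, in km."""
    lat1, lon1, lat2, lon2 = map(radians, (*a, *b))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371 * asin(sqrt(h))

def distance_ratio(path):
    """Linearized distance (sum of hop-to-hop geographic distances)
    divided by the end-to-end geographic distance."""
    linearized = sum(haversine_km(path[i], path[i + 1]) for i in range(len(path) - 1))
    return linearized / haversine_km(path[0], path[-1])

# A circuitous Seattle -> San Francisco path routed via New York
# (coordinates approximate, chosen only to illustrate the metric).
path = [(47.6, -122.3), (40.7, -74.0), (37.8, -122.4)]
print(distance_ratio(path))  # well above 1, flagging a circuitous route
```

A direct path has a ratio near 1; values far above 1 are the ones the study flags as potentially anomalous.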
Geographic information can be used to study various aspects of wide-area Internet paths that traverse multiple ISPs. It was found that end-to-end Internet paths tend to be more circuitous than intra-ISP paths, the cause for this being the peering relationships between ISPs. Also, paths traversing substantial distances within two or more ISPs tend to be more circuitous than paths largely traversing only a single ISP. Another finding is that ISPs generally employ hot-potato routing.
Geographic information can also be used to capture the fact that two seemingly unrelated routers can be susceptible to correlated failures. By using the geographic information of routers, we can construct a geographic topology of an ISP. Using this, we can find the tolerance of an ISP's network to the total network failure in a geographic region.
For further information, contact lakme@cs.berkeley.edu.
PROVIDING PROCESS ORIGIN INFORMATION TO AID IN NETWORK TRACEBACK
Florian P. Buchholz, Purdue University; Clay Shields, Georgetown University
Network traceback is currently limited because host audit systems do not maintain enough information to match incoming network traffic to outgoing network traffic. The talk presented an alternative: assigning origin information to every process and logging it during interactive login creation.
The current implementation concentrates mainly on interactive sessions, in which an event is logged when a new connection is established using SSH or Telnet. The method is effective and could successfully determine stepping stones and reliably detect the source of a DDoS attack. The speaker then talked about the related work done in this area, which led to discussion of the latest development in the area of "host causality."
The speaker ended with the limitations and future directions of the framework. This framework could be extended and could find many applications in the area of security. The current implementation doesn't take care of startup scripts and cron jobs, but incorporating the origin information in the FS could solve this problem. In the current implementation, logging is just implemented as a proof of concept. It could be made safe in many ways, and this could be another important aspect of future work.
PROGRAMMING
Summarized by Amit Purohit
CYCLONE: A SAFE DIALECT OF C
Trevor Jim, AT&T Labs – Research; Greg Morrisett, Dan Grossman, Michael Hicks, James Cheney, Yanling Wang, Cornell University
Cyclone is designed to provide protection against attacks such as buffer overflows, format string attacks, and memory management errors. The current structure of C allows programmers to write vulnerable programs. Cyclone extends C so that it has the safety guarantee of Java while keeping C syntax, types, and semantics untouched.
The Cyclone compiler performs static analysis of the source code and inserts runtime checks into the compiled output at places where the analysis cannot determine that the code execution will not violate safety constraints. Cyclone imposes many restrictions to preserve safety, such as NULL checks. These checks do not exist in normal C.
The speaker then talked about some sample code written in Cyclone and how it tackles safety problems. Porting an existing C application to Cyclone is pretty easy, with fewer than 10% of changes required. The current implementation concentrates more on safety than on performance – hence the performance penalty is substantial. Cyclone was able to find many lingering bugs. The final step of the project will be to write a compiler to convert a normal C program to an equivalent Cyclone program.
COOPERATIVE TASK MANAGEMENT WITHOUT MANUAL STACK MANAGEMENT
Atul Adya, Jon Howell, Marvin Theimer, William J. Bolosky, John R. Douceur, Microsoft Research
The speaker described the definitions and motivations behind the distinct concepts of task management, stack management, I/O management, conflict management, and data partitioning. Conventional concurrent programming uses preemptive task management and exploits the automatic stack management of a standard language. In the second approach, cooperative tasks are organized as event handlers that yield control by returning control to the event scheduler, manually unrolling their stacks. In this project they have adopted a hybrid approach that makes it possible for both stack management styles to coexist. Thus, programmers can code assuming one or the other of these stack management styles will operate, depending upon the application. The speaker also gave a detailed example of how to use their mechanism to switch between the two styles.
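The cooperative, event-handler style described above can be sketched with generators standing in for event handlers: each task runs to an explicit yield point and returns control to a scheduler, with no preemption. This shows only the cooperative side; the authors' hybrid mechanism (in C, for their own codebase) additionally lets thread-style automatic stacks coexist with it.

```python
# Toy cooperative scheduler: tasks yield at explicit points, so control
# returns to the scheduler without preemption (manual stack management).
from collections import deque

log = []

def task(name, steps):
    """A cooperative task: does one unit of work, then yields control."""
    for i in range(steps):
        log.append(f"{name}{i}")
        yield

def run(tasks):
    ready = deque(tasks)
    while ready:
        t = ready.popleft()
        try:
            next(t)          # resume the task until its next yield point
            ready.append(t)  # still runnable, so reschedule it
        except StopIteration:
            pass             # task completed

run([task("a", 2), task("b", 1)])
print(log)  # ['a0', 'b0', 'a1'] -- the tasks interleave at yield points only
```

Because a task only loses control at a yield, races between yield points simply cannot occur, which is the concurrency benefit the talk attributes to cooperative task management.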
The talk then continued with the implementation. They were able to preempt many subtle concurrency problems by using cooperative task management. Paying a cost up front to reduce a subtle race condition proved to be a good investment.
Though the choice of task management is fundamental, the choice of stack management can be left to individual taste. This project enables use of any type of stack management in conjunction with cooperative task management.
IMPROVING WAIT-FREE ALGORITHMS FOR INTERPROCESS COMMUNICATION IN EMBEDDED REAL-TIME SYSTEMS
Hai Huang, Padmanabhan Pillai, Kang G. Shin, University of Michigan
The main characteristic of a real-time, time-sensitive system is its predictable response time. But concurrency management creates hurdles to achieving this goal because of the use of locks to maintain consistency. To solve this problem, many wait-free algorithms have been developed, but these are typically high-cost solutions. By taking advantage of the temporal characteristics of the system, however, the time and space overhead can be reduced.
The speaker presented an algorithm for temporal concurrency control and applied this technique to improve three wait-free algorithms. A single-writer multiple-reader wait-free algorithm and a double-buffer algorithm were proposed.
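The double-buffer idea can be sketched as follows. This is an illustrative single-writer, multiple-reader skeleton, not the paper's algorithm verbatim: the writer fills the inactive slot and publishes it with one index flip, so readers never need a lock and never observe a half-written value.

```python
# Minimal single-writer / multiple-reader double buffer (illustrative only).
class DoubleBuffer:
    def __init__(self):
        self.slots = [None, None]
        self.active = 0                    # index that readers consume from

    def write(self, value):
        inactive = 1 - self.active
        self.slots[inactive] = value       # writer works on the hidden slot
        self.active = inactive             # single publish step flips readers over

    def read(self):
        return self.slots[self.active]     # never a partially written slot

buf = DoubleBuffer()
buf.write("sample-1")
assert buf.read() == "sample-1"
buf.write("sample-2")
assert buf.read() == "sample-2"
```

In a real embedded system the flip of `active` must be an atomic store; the paper's temporal analysis goes further, using task timing to shrink the number of buffers and checks needed.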
Using their transformation mechanism, they achieved an improvement of 17–66% in ACET and a 14–70% reduction in memory requirements for IPC algorithms. This mechanism is extensible and can be applied to other non-blocking IPC algorithms as well. Future work involves reducing the synchronizing overheads in more general IPC algorithms with multiple-writer semantics.
MOBILITY
Summarized by Praveen Yalagandula
ROBUST POSITIONING ALGORITHMS FOR DISTRIBUTED AD-HOC WIRELESS SENSOR NETWORKS
Chris Savarese, Jan Rabaey, University of California, Berkeley; Koen Langendoen, Delft University of Technology
The Pico Radio Network comprises more than a hundred sensors, monitors, and actuators equipped with wireless connectivity, with the following properties: (1) no infrastructure; (2) computation in a distributed fashion instead of centralized computation; (3) dynamic topology; (4) limited radio range; (5) nodes that can act as repeaters; (6) devices of low cost and with low power; and (7) sparse anchor nodes – nodes with GPS capability. A positioning algorithm determines the geographical position of each node in the network.
In two-dimensional space, each node needs three reference positions to estimate its geographical position. There are two problems that make positioning difficult in the Pico Radio Network setting: (1) a sparse anchor node problem and (2) a range error problem.
The two-phase approach that the authors have taken in solving the problem consists of (1) a hop-terrain algorithm – in this first phase, each node roughly guesses its location from the distance calculated using multi-hops to the anchor points – and (2) a refinement algorithm – in this second phase, each node uses its neighbors' positions to refine its own position estimate. To guarantee convergence, this approach uses confidence metrics, assigning a value of 1.0 for anchor nodes and 0.1 to start for other nodes, and increasing these with each iteration.
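One refinement iteration can be sketched as a confidence-weighted nudge toward each neighbor's range circle. This is a toy version of the second phase only, with invented coordinates and exact ranges; the real algorithm also propagates and updates confidences across the network.

```python
# Toy refinement step: a node moves toward the point at the measured range
# along the line to each neighbor, weighted by that neighbor's confidence
# (1.0 for anchors). Purely illustrative, unit-less coordinates.
from math import hypot

def refine(pos, neighbors):
    """One iteration of position refinement from (position, range, confidence) triples."""
    x, y = pos
    wx = wy = wsum = 0.0
    for (nx, ny), measured_range, conf in neighbors:
        d = hypot(x - nx, y - ny) or 1e-9        # avoid division by zero
        tx = nx + (x - nx) * measured_range / d  # point at the measured range
        ty = ny + (y - ny) * measured_range / d
        wx += conf * tx
        wy += conf * ty
        wsum += conf
    return (wx / wsum, wy / wsum)

# Node truly at (1, 1); three anchors with exact range measurements.
r = hypot(1, 1)
anchors = [((0, 0), r, 1.0), ((2, 0), r, 1.0), ((0, 2), r, 1.0)]
est = (3.0, 4.0)                                 # poor initial hop-terrain guess
for _ in range(50):
    est = refine(est, anchors)
print(est)  # converges close to the true position (1, 1)
```

With noisy ranges the iteration settles near, rather than at, the true position, which is why the paper reports position error as a percentage of range.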
A simulation tool, OMNeT++, was used for both phases and in various scenarios. The results show that they achieved position errors of less than 33% in a scenario with 5% anchor nodes, an average connectivity of 7, and 5% range measurement error.
Guidelines for anchor node deployment are high connectivity (>10), a reasonable fraction of anchor nodes (>5%), and, for anchor placement, covered edges.
APPLICATION-SPECIFIC NETWORK MANAGEMENT FOR ENERGY-AWARE STREAMING OF POPULAR MULTIMEDIA FORMATS
Surendar Chandra, University of Georgia, and Amin Vahdat, Duke University
The main hindrance in supporting the increasing demand for mobile multimedia on PDAs is the battery capacity of the small devices. The idle-time power consumption is almost the same as the receive power consumption on typical wireless network cards (1319mW vs. 1425mW), while the sleep state consumes far less power (177mW). To let the system consume energy proportional to the stream quality, the network card should transition to the sleep state aggressively between each packet.
The speaker presented previous work in which different multimedia streams for PDAs were studied: MS Media, Real Media, and QuickTime. It was found that if the inter-packet gap is predictable, then huge savings are possible – for example, about 80% savings in MS Media streams with few packet losses. The limitation of the IEEE 802.11b power-save mode comes into play when there are two or more nodes, producing a delay between beacon time and when the node receives/sends packets. This badly affects the streams, since higher energy is consumed while waiting.
The authors propose traffic shaping for energy conservation, done by a proxy such that packets arrive at regular intervals. This is achieved using a local proxy in the access point and a client-side proxy in the mobile host. Simulations show that traffic shaping reduces energy consumption and also reveal that the higher-bandwidth streams have a lower energy metric (mJ/kB).
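A back-of-the-envelope model using the power numbers quoted above (receive 1425mW, idle 1319mW, sleep 177mW) shows why predictable gaps matter: if shaping lets the card sleep rather than idle between packets, most of the gap energy disappears. The packet counts and timings below are invented for illustration.

```python
# Energy model: n packets, each taking recv_s seconds to receive, separated
# by gap_s-second gaps spent either idle (unshaped) or asleep (shaped).
RECEIVE_MW, IDLE_MW, SLEEP_MW = 1425, 1319, 177

def energy_mj(n_packets, recv_s, gap_s, sleep_between):
    """Total energy in mJ (mW * s) for one stream segment."""
    gap_mw = SLEEP_MW if sleep_between else IDLE_MW
    return n_packets * recv_s * RECEIVE_MW + (n_packets - 1) * gap_s * gap_mw

idle = energy_mj(100, 0.005, 0.095, sleep_between=False)   # card stays idle
slept = energy_mj(100, 0.005, 0.095, sleep_between=True)   # card sleeps in gaps
print(round(1 - slept / idle, 2))  # 0.82 -- roughly 80% of the energy saved
```

This matches the order of savings reported for predictable MS Media streams, and dividing the shaped energy by bytes delivered gives the mJ/kB metric used in the paper.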
More information is at http://greenhouse.cs.uga.edu.
CHARACTERIZING ALERT AND BROWSE SERVICES OF MOBILE CLIENTS
Atul Adya, Paramvir Bahl, and Lili Qiu, Microsoft Research
Even though there is a dramatic increase in Internet access from wireless devices, there are not many studies done on characterizing this traffic. In this paper, the authors characterize the traffic observed on the MSN Web site with both notification and browse traces.
Around 33 million browsing accesses and about 325 million notification entries are present in the traces. Three types of analyses are done for each of these two services: content analysis, concerning the most popular content categories and their distribution; popularity analysis, or the popularity distribution of documents; and user behavior analysis.
The analysis of the notification logs shows that document access rates follow a Zipf-like distribution, with most of the accesses concentrated on a small number of messages, and the accesses exhibit geographical locality – users from the same locality tend to receive similar notification content. The browser log analysis shows that a smaller set of URLs is accessed most of the time, though the access pattern does not fit any Zipf curve, and the highly accessed URLs remain stable. A correlation study between the notification and browsing services shows that wireless users have a moderate correlation of 0.12.
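The Zipf-like test above amounts to checking whether log(access count) falls linearly against log(rank): a Zipf distribution with exponent alpha gives a straight line of slope -alpha. A sketch on synthetic counts (the study, of course, fitted real MSN log data):

```python
# Check for Zipf-like behavior by fitting the slope of log(count) vs. log(rank).
from math import log

def fitted_slope(counts):
    """Least-squares slope of log(count) against log(rank), ranks from 1."""
    pts = [(log(rank), log(c))
           for rank, c in enumerate(sorted(counts, reverse=True), start=1)]
    n = len(pts)
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    num = sum((x - mx) * (y - my) for x, y in pts)
    den = sum((x - mx) ** 2 for x, _ in pts)
    return num / den

zipf_counts = [1000 / rank for rank in range(1, 101)]  # ideal Zipf, alpha = 1
print(round(fitted_slope(zipf_counts), 2))  # -1.0
```

A notification trace that is Zipf-like yields a slope near -alpha; the browse trace's poor fit to any such line is what the authors mean by "does not fit any Zipf curve."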
FREENIX TRACK SESSIONS
BUILDING APPLICATIONS
Summarized by Matt Butner
INTERACTIVE 3-D GRAPHICS APPLICATIONS FOR TCL
Oliver Kersting, Jürgen Döllner, Hasso Plattner Institute for Software Systems Engineering, University of Potsdam
The integration of 3-D image rendering functionality into a scripting language permits interactive and animated 3-D development and application without the formalities and precision demanded by low-level C/C++ graphics and visualization libraries. The large and complex C++ API of the Virtual Rendering System (VRS) can be combined with the conveniences of the Tcl scripting language. The mapping of class interfaces is done via an automated process and generates respective wrapper classes, all of which ensures complete API accessibility and functionality without any significant performance penalty. The general C++-to-Tcl mapper SWIG grants the necessary object and hierarchical abilities without the use of object-oriented Tcl extensions. The final API mapping techniques address C++ features such as classes, overloaded methods, enumerations, and inheritance relations. All are implemented in a proof-of-concept that maps the complete C++ API of the VRS to Tcl and are showcased in a complete interactive 3-D-map system.
THE AGFL GRAMMAR WORK LAB
Cornelis H.A. Coster, Erik Verbruggen, University of Nijmegen (KUN)
The growth and implementation of Natural Language Processing (NLP) is the cornerstone of the continued evolution and implementation of truly intelligent search machines and services. In part due to the growing collections of computer-stored, human-readable documents in the public domain, the implementation of linguistic analysis will become necessary for desirable precision and resolution. Subsequently, such tools and linguistic resources must be released into the public domain, and they have done so with the release of the AGFL Grammar Work Lab under the GNU General Public License as a tool for linguistic research and the development of NLP-based applications.
The AGFL (Affix Grammars over a Finite Lattice) Grammar Work Lab meshes context-free grammars with finite set-valued features that are acceptable to a range of languages. In computer science terms, "Syntax rules are procedures with parameters and a non-deterministic execution." The English Phrases for Information Retrieval (EP4IR), released with the AGFL-GWL as a robust grammar of English, is an AGFL-GWL-generated English parser that outputs "HeadModified" frames. The sentences "CompanyX sponsored this conference" and "This conference was sponsored by CompanyX" both generate [CompanyX [sponsored conference]]. The hope is that public availability of such tools will encourage further development of grammar and lexical software systems.
SWILL: A SIMPLE EMBEDDED WEB SERVER LIBRARY
Sotiria Lampoudi, David M. Beazley, University of Chicago
This paper won the FREENIX Best Student Paper award.
SWILL (Simple Web Interface Link Library) is a simple Web server in the form of a C library, whose development was motivated by a wish to give cool applications an interface to the Web. The SWILL library provides a simple interface that can be efficiently implemented for tasks that vary from flexible Web-based monitoring to software debugging and diagnostics. Though originally designed to be integrated with high-performance scientific simulation software, the interface is generic enough to allow for unbounded uses. SWILL is a single-threaded server relying upon non-blocking I/O through the creation of a temporary server which services I/O requests. SWILL does not provide SSL or cryptographic authentication, but does have HTTP authentication abilities.
A fantastic feature of SWILL is its support for SPMD-style parallel applications which utilize MPI, proving valuable for Beowulf clusters and large parallel machines. Another practical application was the implementation of SWILL in a modified Yalnix emulator by University of Chicago operating systems courses, which utilized the added Yalnix functionality for OS development and debugging. SWILL requires minimal memory overhead and relies upon the HTTP/1.0 protocol.
NETWORK PERFORMANCE
Summarized by Florian Buchholz
LINUX NFS CLIENT WRITE PERFORMANCE
Chuck Lever, Network Appliance; Peter Honeyman, CITI, University of Michigan
Lever introduced a benchmark to measure NFS client write performance. Client performance is difficult to measure due to hindrances such as poor hardware or bandwidth limitations. Furthermore, measuring application performance does not identify weaknesses specifically at the client side. Thus, a benchmark was developed trying to exercise only data transfers in one direction, between server and application. For this purpose, the benchmark was based on the block sequential write portion of the Bonnie file system benchmark. Once a benchmark for NFS clients is established, it can be used to improve client performance.
The performance measurements were performed with an SMP Linux client and both a Linux NFS server and a Network Appliance F85 filer. During testing, a periodic jump in write latency time was discovered. This was due to a rather large number of pending write operations that were scheduled to be written after certain threshold values were exceeded. By introducing a separate daemon that flushes the cached write requests, the spikes could be eliminated, but as a result the average latency grows over time. The problem could be traced to a function that scans a linked list of write requests. After a hashtable was added to improve lookup performance, the latency improved considerably.
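The fix just described is the classic O(n)-scan versus O(1)-hash trade. A minimal sketch with illustrative names (these are not the kernel's actual data structures): keep the list, but also index pending requests by offset so lookups stop scanning.

```python
# Pending-write lookup two ways: linear scan (the old slow path) vs. a
# hashtable keyed by file offset (the fix). Names are illustrative only.
class WriteCache:
    def __init__(self):
        self.requests = []   # linked-list stand-in: the original slow path
        self.index = {}      # offset -> request: the added hashtable

    def add(self, offset, data):
        req = (offset, data)
        self.requests.append(req)
        self.index[offset] = req

    def find_scan(self, offset):
        for off, data in self.requests:    # O(n) walk per lookup
            if off == offset:
                return data
        return None

    def find_hashed(self, offset):         # O(1) average per lookup
        req = self.index.get(offset)
        return req[1] if req else None

cache = WriteCache()
for i in range(1000):
    cache.add(i * 4096, f"block-{i}")
assert cache.find_scan(999 * 4096) == cache.find_hashed(999 * 4096) == "block-999"
```

With thousands of cached writes outstanding, per-request scans dominate latency, which is why the hashtable made such a visible difference in the benchmark.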
The improved client was then used to measure throughput against the two servers. A discrepancy between the Linux server and the filer test was noticed, and the reason for it was traced back to a global kernel lock that was unnecessarily held when accessing the network stack. After correcting this, performance improved further. However, the measurements showed that a client may run slower when paired with fast servers on fast networks. This is due to heavy client interrupt loads, more network processing on the client side, and more global kernel lock contention.
The source code of the project is available at http://www.citi.umich.edu/projects/nfs-perf/patches.
A STUDY OF THE RELATIVE COSTS OF NETWORK SECURITY PROTOCOLS
Stefan Miltchev and Sotiris Ioannidis, University of Pennsylvania; Angelos Keromytis, Columbia University
With the increasing need for security and integrity of remote network services, it becomes important to quantify the communication overhead of IPSec and compare it to alternatives such as SSH, SCP, and HTTPS.
For this purpose, the authors set up three testing networks: a direct link, two hosts separated by two gateways, and three hosts connecting through one gateway. Protocols were compared in each setup, and manual keying was used to eliminate connection setup costs. For IPSec, the different encryption algorithms AES, DES, 3DES, hardware DES, and hardware 3DES were used. In detail, FTP was compared to SFTP, SCP, and FTP over IPSec; HTTP to HTTPS and HTTP over IPSec; and NFS to NFS over IPSec and local disk performance.
The results of the measurements were that IPSec outperforms other popular encryption schemes. Overall, unencrypted communication was fastest, but in some cases, like FTP, the overhead can be small. The use of crypto hardware can significantly improve performance. For future work, the inclusion of setup costs, hardware-accelerated SSL, SFTP, and SSH was mentioned.
CONGESTION CONTROL IN LINUX TCP
Pasi Sarolahti, University of Helsinki; Alexey Kuznetsov, Institute for Nuclear Research at Moscow
Having attended the talk and read the paper, I am still unclear about whether the authors are merely describing the design decisions of TCP congestion control or whether they are actually the creators of that part of the Linux code. My guess leans toward the former.
In the talk, the speaker compared the TCP protocol congestion control measures according to IETF and RFC specifications with the actual Linux implementation, which does conform to the basic principles but nevertheless has differences. A specific emphasis was placed on retransmission mechanisms and the congestion window. Also, several TCP enhancements – the NewReno algorithm, Selective ACKs (SACK), Forward ACKs (FACK) – were discussed and compared.
In some instances, Linux does not conform to the IETF specifications. The fast recovery does not fully follow RFC 2582, since the threshold for triggering retransmit is adjusted dynamically and the congestion window's size is not changed. Also, the round-trip-time estimator and the RTO calculation differ from RFC 2988, since Linux uses more conservative RTT estimates and a minimum RTO of 200ms. The performance measures showed that with the additional Linux-specific features enabled, slightly higher throughput, more steady data flow, and fewer unnecessary retransmissions can be achieved.
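The RTO difference mentioned above is easy to see numerically. This sketch implements RFC 2988's standard estimator (SRTT/RTTVAR exponential averages, RTO = SRTT + 4*RTTVAR) with the minimum RTO as a parameter: the RFC floors RTO at 1 second, while the talk notes Linux uses a 200ms floor. Clock granularity and Linux's other estimator changes are ignored here.

```python
# RFC 2988-style RTO computation with a configurable minimum (simplified).
def rto_estimator(samples, min_rto):
    """Feed RTT samples (seconds) through the RFC 2988 estimator; return RTO."""
    srtt = rttvar = None
    for r in samples:
        if srtt is None:
            srtt, rttvar = r, r / 2                 # first RTT measurement
        else:
            rttvar = 0.75 * rttvar + 0.25 * abs(srtt - r)
            srtt = 0.875 * srtt + 0.125 * r         # EWMA per RFC 2988
    return max(srtt + 4 * rttvar, min_rto)          # clamp to the floor

samples = [0.020, 0.022, 0.019, 0.021]              # a steady 20 ms path
print(rto_estimator(samples, min_rto=1.0))          # 1.0: pinned at the RFC floor
print(rto_estimator(samples, min_rto=0.2))          # 0.2: pinned at a 200 ms floor
```

On a fast LAN the raw estimate is tens of milliseconds, so the floor dominates: a 200ms minimum retransmits lost segments far sooner than the RFC's 1-second minimum would.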
XTREME XCITEMENT
Summarized by Steve Bauer
THE FUTURE IS COMING: WHERE THE X WINDOW SHOULD GO
Jim Gettys, Compaq Computer Corp.
Jim Gettys, one of the principal authors of the X Window System, outlined the near-term objectives for the system, primarily focusing on the changes and infrastructure required to enable replication and migration of X applications. Providing better support for this functionality would enable users to retrieve or duplicate X applications between their servers at home and work.
One interesting example of an application that currently is capable of migration and replication is Emacs. To create a new frame on DISPLAY, try "M-x make-frame-on-display <Return> DISPLAY <Return>".
However, technical challenges make replication and migration difficult in general. These include the "major headaches" of server-side fonts, the nonuniformity of X servers and screen sizes, and the need to appropriately retrofit toolkits. Authentication and authorization issues are obviously also important. The rest of the talk delved into some of the details of these interesting technical challenges.
HACKING IN THE KERNEL
Summarized by Hai Huang
AN IMPLEMENTATION OF SCHEDULER ACTIVATIONS ON THE NETBSD OPERATING SYSTEM
Nathan J. Williams, Wasabi Systems
Scheduler activation is an old idea. Basically, there are benefits and drawbacks to using solely kernel-level or user-level threading. Scheduler activation is able to combine the two layers of control to provide more concurrency in the system.
In his talk, Nathan gave a fairly detailed description of the implementation of scheduler activations in the NetBSD kernel. One important change in the implementation is to differentiate the thread context from the process context. This is done by defining a separate data structure for these threads and relocating some of the information that was embedded in the process context to these thread contexts. The stack was especially a concern, due to the upcall. Special handling must be done to make sure that the upcall doesn't mess up the stack, so that the preempted user-level thread can continue afterwards. Lastly, Nathan explained that signals were handled by upcalls.
AUTHORIZATION AND CHARGING IN PUBLIC WLANS USING FREEBSD AND 802.1X
Pekka Nikander, Ericsson Research NomadicLab
The 802.1x standards are well known in the wireless community as link-layer authentication protocols. In this talk, Pekka explained some novel ways of using the 802.1x protocols that might be of interest to people on the move. It is possible to set up a public WLAN that would support various charging schemes via virtual tokens, which people can purchase or earn and later use.
This is implemented on FreeBSD using the netgraph utility. It is basically a filter in the link layer that differentiates traffic based on the MAC address of the client node, which is either authenticated, denied, or let through. The overhead for this service is fairly minimal.
ACPI IMPLEMENTATION ON FREEBSD
Takanori Watanabe, Kobe University
ACPI (Advanced Configuration and Power Interface) was proposed as a joint effort by Intel, Toshiba, and Microsoft to provide a standard, finer-grained method of managing the power states of individual devices within a system. Such low-level power management is especially important for those mobile and embedded systems that are powered by fixed-capacity energy batteries.
Takanori explained that the ACPI specification is composed of three parts: tables, BIOS, and registers. He was able to implement some functionalities of ACPI in a FreeBSD kernel. The ACPI Component Architecture was implemented by Intel, and it provides a high-level ACPI API to the operating system. Takanori's ACPI implementation is built upon this underlying layer of APIs.
ACCESS CONTROL
Summarized by Florian Buchholz
DESIGN AND PERFORMANCE OF THE OPENBSD STATEFUL PACKET FILTER (PF)
Daniel Hartmeier, Systor AG
Daniel Hartmeier described the new stateful packet filter (pf) that replaced IPFilter in the OpenBSD 3.0 release. IPFilter could no longer be included due to licensing issues, and thus there was a need to write a new filter making use of optimized data structures.
The filter rules are implemented as a linked list, which is traversed from start to end. Two actions may be taken according to the rules: "pass" or "block." A "pass" action forwards the packet, and a "block" action will drop it. Where more than one rule matches a packet, the last rule wins. Rules that are marked as "final" will immediately terminate any further rule evaluation. An optimization called "skip-steps" was also implemented, where blocks of similar rules are skipped if they cannot match the current packet. These skip-steps are calculated when the rule set is loaded. Furthermore, a state table keeps track of TCP connections; only packets that match the sequence numbers are allowed. UDP and ICMP queries and replies are considered in the state table, where initial packets will create an entry for a pseudo-connection with a low initial timeout value. The state table is implemented using a balanced binary search tree. NAT mappings are also stored in the state table, whereas application proxies reside in user space. The packet filter also is able to perform fragment reassembly and to modulate TCP sequence numbers to protect hosts behind the firewall.
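The last-match-wins evaluation order with "final" rules can be modeled in a few lines. This toy omits skip-steps (which merely jump over blocks of rules that cannot match) and the state table; the rules and packets are invented:

```python
# Toy model of pf-style rule evaluation: walk rules in order, remember the
# last matching rule's action, and stop early only at a "final" rule.
def evaluate(rules, packet):
    decision = "pass"                       # default policy for this sketch
    for match, action, final in rules:
        if match(packet):
            decision = action               # last matching rule wins...
            if final:
                break                       # ...unless the rule is final
    return decision

rules = [
    (lambda p: True,               "block", False),  # block everything...
    (lambda p: p["port"] == 80,    "pass",  False),  # ...but pass web traffic
    (lambda p: p["src"] == "evil", "block", True),   # final: stop evaluating
]
assert evaluate(rules, {"src": "good", "port": 80}) == "pass"
assert evaluate(rules, {"src": "good", "port": 25}) == "block"
assert evaluate(rules, {"src": "evil", "port": 80}) == "block"
```

Because every packet normally walks the whole list, rule evaluation cost grows with rule set size, which is exactly why the state-table fast path measured below pays off.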
Pf was compared against IPFilter as well as Linux's Iptables by measuring throughput and latency with increasing traffic rates and different packet sizes. In a test with a fixed rule set size of 100, Iptables outperformed the other two filters, whose results were close to each other. In a second test, where the rule set size was continually increased, Iptables consistently had about twice the throughput of the other two (which evaluate the rule set on both the incoming and outgoing interfaces). A third test compared only pf and IPFilter, using a single rule that created state in the state table, with a fixed-state entry size. Pf reached an overloaded state much later than IPFilter. The experiment was repeated with a variable-state entry size, and pf performed much better than IPFilter for a small number of states.
In general, rule set evaluation is expensive, and benchmarks only reflect extreme cases, whereas in real life other behavior should be observed. Furthermore, the benchmarks show that stateful filtering can actually improve performance, due to cheap state-table lookups as compared to rule evaluation.
ENHANCING NFS CROSS-ADMINISTRATIVE DOMAIN ACCESS
Joseph Spadavecchia and Erez Zadok, Stony Brook University
The speaker presented modifications to an NFS server that allow improved NFS access between administrative domains. A problem lies in the fact that NFS assumes a shared UID/GID space, which makes it unsafe to export files outside the administrative domain of the server. Design goals were to leave the protocol and existing clients unchanged, a minimum amount of server changes, flexibility, and increased performance.
To solve the problem, two techniques are utilized: "range-mapping" and "file-cloaking." Range-mapping maps IDs between client and server. The mapping is performed on a per-export basis and has to be manually set up in an export file. The mappings can be 1-1, N-N, or N-1. In file-cloaking, the server restricts file access based on UID/GID and special cloaking-mask bits. Here users can only access their own file permissions. The policy on whether or not the file is visible to others is set by the cloaking mask, which is logically ANDed with the file's protection bits. File-cloaking only works, however, if the client doesn't hold cached copies of directory contents and file attributes. Because of this, the clients are forced to re-read directories, by incrementing the mtime value of the directory each time it is listed.
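A 1-1 range-mapping can be sketched as a per-export translation table. The fallback to a "nobody" UID for unmapped IDs is an assumption of this sketch, not a detail from the talk, and the UID ranges are invented:

```python
# Illustrative per-export UID range-mapping: each (client_lo, client_hi,
# server_lo) triple maps a contiguous client UID range 1-1 onto server UIDs.
NOBODY = 65534  # assumed fallback for unmapped IDs (not from the talk)

def make_mapper(ranges):
    def map_uid(client_uid):
        for client_lo, client_hi, server_lo in ranges:
            if client_lo <= client_uid <= client_hi:
                return server_lo + (client_uid - client_lo)
        return NOBODY
    return map_uid

# One export: map client UIDs 1000-1999 onto server UIDs 5000-5999.
uid_map = make_mapper([(1000, 1999, 5000)])
assert uid_map(1042) == 5042
assert uid_map(3000) == NOBODY   # outside every mapped range
```

N-1 mappings would collapse a whole client range onto a single server ID, which is useful for treating all foreign users as one untrusted identity.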
To test the performance of the modified server, five different NFS configurations were evaluated: an unmodified NFS server was compared against one server with the modified code included but not used, one with only range-mapping enabled, one with only file-cloaking enabled, and one version with all modifications enabled. For each setup, different file system benchmarks were run. The results show only a small overhead when the modifications are used, generally an increase of below 5%. Another experiment tested the performance of the system with an increasing number of mapped or cloaked entries. The results show that an increase from 10 to 1000 entries resulted in a maximum of about a 14% cost in performance.
One member of the audience pointed out that if clients choose to ignore the changed mtimes from the server and thus still hold caches of the directory entries, the file-cloaking mechanism could be defeated. After a rather lengthy debate, the speaker had to concede that the model doesn't add any extra security. Another question was asked about the scalability of the setup of range-mapping. The speaker referred to application-level tools that could be developed for that purpose.
The software is available at ftp://ftp.fsl.cs.sunysb.edu/pub/enf.
ENGINEERING OPEN SOURCE SOFTWARE
Summarized by Teri Lampoudi
NINGAUI: A LINUX CLUSTER FOR BUSINESS
Andrew Hume, AT&T Labs – Research; Scott Daniels, Electronic Data Systems Corp.
Ningaui is a general-purpose, highly available, resilient architecture built from commodity software and hardware. Emphatically, however, Ningaui is not a Beowulf. Hume calls the cluster design the "Swiss canton model," in which there are a number of loosely affiliated independent nodes with data replicated among them. Jobs are assigned by bidding and leases, and cluster services done as session-based servers are sited via generic job assignment. The emphasis is on keeping the architecture end-to-end, checking all work via checksums, and logging everything. The resilience and high availability required by their goal of 8-5 maintenance – vs. the typical 24-7 model, where people get paged whenever the slightest thing goes wrong regardless of the time of day – is achieved by job restartability. Finally, all computation is performed on local data, without the use of NFS or network-attached storage.
Hume's message is a hopeful one: despite the many problems encountered – things like kernel and service limits, auto-installation problems, TCP storms, and the like – the existence of source code and the paranoid practice of logging and checksumming everything has helped. The final product performs reasonably well, and it appears that the resilient techniques employed do make a difference. One drawback, however, is that the software mentioned in the paper is not yet available for download.
CPCMS: A CONFIGURATION MANAGEMENT SYSTEM BASED ON CRYPTOGRAPHIC NAMES
Jonathan S. Shapiro and John Vanderburgh, Johns Hopkins University
This paper won the FREENIX Best Paper award.
The basic notion behind the project is the fact that everyone has a pet complaint about CVS, and yet it is currently the configuration manager in most widespread use. Shapiro has unleashed an alternative. Interestingly, he did not begin but ended with the usual host of reasons why CVS is bad. The talk instead began abruptly by characterizing the job of a software configuration manager, continued by stating the namespaces which it must handle, and wrapped up with the challenges faced.
X MEETS Z: VERIFYING CORRECTNESS IN THE PRESENCE OF POSIX THREADS
Bart Massey, Portland State University; Robert T. Bauer, Rational Software Corp.
Massey delivered a humorous talk on the insight gained from applying Z formal specification notation to system software design, rather than the more informal analysis and design process normally used.
The story is told with respect to writing XCB, which replaces the Xlib protocol layer and is supposed to be thread-friendly. But where threads are concerned, deadlock avoidance becomes a hard problem that cannot be solved in an ad hoc manner, and full model checking is also too hard. In this case, Massey resorted to Z specification to model the XCB lower layer, abstract away locking and data transfers, and locate fundamental issues. Essentially, the difficulties of searching the literature and locating information relevant to the problem at hand were overcome. As Massey put it, "the formal method saved the day."
FILE SYSTEMS
Summarized by Bosko Milekic
PLANNED EXTENSIONS TO THE LINUX
EXT2/EXT3 FILESYSTEM
Theodore Y. Ts'o, IBM; Stephen Tweedie, Red Hat
The speaker presented improvements to the Linux Ext2 file system, with the goal of allowing for various expansions while striving to maintain compatibility with older code. Improvements have been facilitated by a few extra superblock fields that were added to Ext2 not long before the Linux 2.0 kernel was released. The fields allow for file system features to be added without compromising the existing setup; this is done by providing
bits indicating the impact of the added features with respect to compatibility. Namely, a file system with the "incompat" bit marked is not allowed to be mounted. Similarly, a "read-only" marking would only allow the file system to be mounted read-only.
Directory indexing replaces linear directory searches with a faster search using a fixed-depth tree and hashed keys. File system size can be dynamically increased, and the expanded inode, doubled from 128 to 256 bytes, allows for more extensions.
Other potential improvements were discussed as well, in particular pre-allocation for contiguous files, which allows for better performance in certain setups by pre-allocating contiguous blocks. Security-related modifications, extended attributes, and ACLs were mentioned. An implementation of these features already exists but has not yet been merged into the mainline Ext2/3 code.
RECENT FILESYSTEM OPTIMISATIONS ON
FREEBSD
Ian Dowse, Corvil Networks; David Malone, CNRI, Dublin Institute of Technology
David Malone presented four important file system optimizations for the FreeBSD OS: soft updates, dirpref, vmiodir, and dirhash. It turns out that certain combinations of the optimizations (beautifully illustrated in the paper) may yield performance improvements of anywhere between 2 and 10 orders of magnitude for real-world applications.
All four of the optimizations deal with file system metadata. Soft updates allow for asynchronous metadata updates. Dirpref changes the way directories are organized, attempting to place child directories closer to their parents, thereby increasing locality of reference and reducing disk-seek times. Vmiodir trades some extra memory in order to achieve better directory caching. Finally, dirhash, which was implemented by Ian Dowse, changes the way in which entries
are searched in a directory. Specifically, FFS uses a linear search to find an entry by name; dirhash builds a hashtable of directory entries on the fly. For a directory of size n with a working set of m files, a search that in certain cases could have been O(nm) has been reduced, due to dirhash, to effectively O(n + m).
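The speedup is easy to see in miniature. The sketch below is illustrative only, not the FreeBSD implementation (which builds its hash tables in kernel memory and keeps them consistent as the directory changes); the file names are made up. It contrasts a linear FFS-style name search with a dirhash-style table built once per directory:

```python
# Hypothetical directory contents; FFS stores entries as an unordered
# list, so looking up one name is a linear scan.
entries = ["README", "Makefile", "kern.c", "vfs.c", "dirhash.c"]

def lookup_linear(name):
    # O(n) per lookup; m lookups over a working set cost O(n*m).
    for offset, entry in enumerate(entries):
        if entry == name:
            return offset
    return -1

# dirhash-style: pay O(n) once to build a table, then each of the
# m subsequent lookups is O(1), for O(n + m) overall.
name_to_offset = {entry: offset for offset, entry in enumerate(entries)}

def lookup_hashed(name):
    return name_to_offset.get(name, -1)
```

Both functions return the same offsets; only the cost per lookup differs, which is why the gains show up in operation-intensive workloads.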
FILESYSTEM PERFORMANCE AND SCALABILITY
IN LINUX 2.4.17
Ray Bryant, SGI; Ruth Forester, IBM LTC; John Hawkes, SGI
This talk focused on performance evaluation of a number of file systems available and commonly deployed on Linux machines. Comparisons under various configurations of Ext2, Ext3, ReiserFS, XFS, and JFS were presented.
The benchmarks chosen for the data gathering were pgmeter, filemark, and the AIM Benchmark Suite VII. Pgmeter measures the rate of data transfer of reads/writes of a file under a synthetic workload. Filemark is similar to postmark in that it is an operation-intensive benchmark, although filemark is threaded and offers various other features that postmark lacks. AIM VII measures performance for various file-system-related functionalities; it offers various metrics under an imposed workload, thus stressing the performance of the file system not only under I/O load but also under significant CPU load.
Tests were run on three different setups: a small, a medium, and a large configuration. ReiserFS and Ext2 appear at the top of the pile for small and medium setups. Notably, XFS and JFS perform worse for smaller system configurations than the others, although XFS clearly appears to generate better numbers than JFS. It should be noted that XFS seems to scale well under a higher load. This was most evident in the large-system results, where XFS appears to offer the best overall results.
THINGS TO THINK ABOUT
Summarized by Bosko Milekic
SPEEDING UP KERNEL SCHEDULER BY
REDUCING CACHE MISSES
Shuji Yamamura, Akira Hirai, Mitsuru Sato, Masao Yamamoto, Akira Naruse, and Kouichi Kumon, Fujitsu Labs
This was an interesting talk pertaining to the effects of cache coloring for task structures in the Linux kernel scheduler (Linux kernel 2.4.x). The speaker first presented some interesting benchmark numbers for the Linux scheduler, showing that as the number of processes on the task queue was increased, the performance decreased. The authors used some really nifty hardware to measure the number of bus transactions throughout their tests and were thus able to reasonably quantify the impact that cache misses had in the Linux scheduler.
Their experiments led them to implement a cache coloring scheme for task structures, which were previously aligned on 8KB boundaries and therefore were eventually being mapped to the same cache lines. This unfortunate placement of task structures in memory induced a significant number of cache misses as the number of tasks in the scheduler grew.
The implemented solution consisted of offsetting task structures to more evenly distribute cached entries across the L2 cache. The result was inevitably fewer cache misses in the scheduler. Some negative effects were observed in certain situations; these were primarily due to more cache slots being used by task-structure data in the scheduler, thus forcing data previously cached there to be pushed out.
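The effect can be sketched numerically. The toy cache geometry below (64-byte lines, 128 sets per way, eight colors) is an assumption for illustration, not taken from the paper; it shows why identically aligned structures collide in one cache set and how a small per-task offset spreads them out:

```python
# Illustrative model, not the authors' code. With an 8KB way
# (128 sets x 64-byte lines), every 8KB-aligned address maps to set 0.
LINE_SIZE = 64
NUM_SETS = 128
NUM_COLORS = 8

def cache_set(addr):
    # Which cache set an address maps to in this toy geometry.
    return (addr // LINE_SIZE) % NUM_SETS

# Sixteen task structures at 8KB-aligned addresses: all collide in
# the same set, so they keep evicting each other.
aligned = [task * 8192 for task in range(16)]

# Cache coloring: shift each task structure by a per-task multiple of
# the line size, so consecutive tasks land in different sets.
colored = [addr + (task % NUM_COLORS) * LINE_SIZE
           for task, addr in enumerate(aligned)]
```

The trade-off the authors observed also falls out of this model: the colored layout occupies eight sets instead of one, evicting whatever other data previously lived in those sets.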
WORK-IN-PROGRESS REPORTS
Summarized by Brennan Reynolds
RESOURCE VIRTUALIZATION TECHNIQUES FOR
WIDE-AREA OVERLAY NETWORKS
Kartik Gopalan, Stony Brook University
This work addressed the issue of provisioning a maximum number of virtual overlay networks (VONs) with diverse quality of service (QoS) requirements on a single physical network. Each of the VONs is logically isolated from the others to ensure the QoS requirements. Gopalan mentioned several critical research issues with this problem that are currently being investigated. Dealing with how to provision the network at various levels (link, route, or path) and then enforce the provisioning at runtime is one of the toughest challenges. Currently, Gopalan has developed several algorithms to handle admission control, end-to-end QoS, route selection, scheduling, and fault tolerance in the network.
For more information, visit http://www.ecsl.cs.sunysb.edu.
VISUALIZING SOFTWARE INSTABILITY
Jennifer Bevan, University of California, Santa Cruz
Detection of instability in software has typically been an afterthought. The point in the development cycle when the software is reviewed for instability is usually after it is difficult and costly to go back and perform major modifications to the code base. Bevan has developed a technique to allow the visualization of unstable regions of code that can be used much earlier in the development cycle. Her technique creates a time series of dependency graphs that include clusters and lines called fault lines. From the graphs, a developer is able to easily determine where the unstable sections of code are and proactively restructure them. She is currently working on a prototype implementation.
For more information, visit http://www.cse.ucsc.edu/~jbevan/evo_viz.
RELIABLE AND SCALABLE PEER-TO-PEER WEB
DOCUMENT SHARING
Li Xiao, College of William and Mary
The idea presented by Xiao would allow end users to share the content of their Web browser caches with neighboring Internet users. The rationale for this is that today's end users are increasingly connected to the Internet over high-speed links, and the browser caches are becoming large storage systems. Therefore, if an individual accesses a page which does not exist in their local cache, Xiao is suggesting that they first query other end users for the content before trying to access the machine hosting the original. This strategy does have some serious problems associated with it that still need to be addressed, including ensuring the integrity of the content and protecting the identity and privacy of the end users.
SEREL: FAST BOOTING FOR UNIX
Leni Mayo, Fast Boot Software
Serel is a tool that generates a visual representation of a UNIX machine's boot-up sequence. It can be used to identify the critical path and can show if a particular service or process blocks for an extended period of time. This information could be used to determine where the largest performance gains could be realized by tuning the order of execution at boot-up. Serel creates a dependency graph, expressed in XML, during boot-up. This graph is then used to create the visual representation. Currently the tool only works on POSIX-compliant systems, but Mayo stated that he would be porting it to other platforms. Other extensions that were mentioned included having the metadata include the use of shared libraries and monitoring the suspend/resume sequence of portable machines.
For more information, visit http://www.fastboot.org.
BERKELEY DB XML
John Merrells, Sleepycat Software
Merrells gave a quick introduction and overview of the new XML library for Berkeley DB. The library specializes in storage and retrieval of XML content through a tool called XPath. The library allows for multiple containers per document and stores everything natively as XML. The user is also given a wide range of elements to create indices with, including edges, elements, text strings, or presence. The XPath tool consists of a query parser, generator, optimizer, and execution engine. To conclude his presentation, Merrells gave a live demonstration of the software.
For more information, visit http://www.sleepycat.com/xml.
CLUSTER-ON-DEMAND (COD)
Justin Moore, Duke University
Modern clusters are growing at a rapid rate. Many have pushed beyond the 5000-machine mark, and deploying them results in large expenses as well as management and provisioning issues. Furthermore, if the cluster is "rented" out to various users, it is very time-consuming to configure it to a user's specs, regardless of how long they need to use it. The COD work presented creates dynamic virtual clusters within a given physical cluster. The goal was to have a provisioning tool that would automatically select a chunk of available nodes and install the operating system and middleware specified by the customer in a short period of time. This would allow a greater use of resources, since multiple virtual clusters can exist at once. By using a virtual cluster, the size can be changed dynamically. Moore stated that they have created a working prototype and are currently testing and benchmarking its performance.
For more information, visit http://www.cs.duke.edu/~justin/cod.
CATACOMB
Elias Sinderson, University of California, Santa Cruz
Catacomb is a project to develop a database-backed DASL module for the Apache Web server. It was designed as a replacement for WebDAV. The initial release of the module only contains support for a MySQL database, but it could be extended to others. Sinderson briefly touched on the performance of the module: it was comparable to the mod_dav Apache module for all query types but search. The presentation was concluded with remarks about adding support for the lock method and including ACL specifications in the future.
For more information, visit http://ocean.cse.ucsc.edu/catacomb.
SELF-ORGANIZING STORAGE
Dan Ellard, Harvard University
Ellard's presentation introduced a storage system that tunes itself based on the workload of the system, without requiring the intervention of the user. The intelligence is implemented as a virtual self-organizing disk that resides below any file system. The virtual disk observes the access patterns exhibited by the system and then attempts to predict what information will be accessed next. An experiment was done using an NFS trace taken at a large ISP on one of their email servers. Ellard's self-organizing storage system worked well, which he attributes to the fact that most of the files being requested were large email boxes. Areas of future work include exploration of the length and detail of the data-collection stage, as well as the CPU impact of running the virtual disk layer.
VISUAL DEBUGGING
John Costigan, Virginia Tech
Costigan feels that the state of current debugging facilities in the UNIX world is not as good as it should be. He proposes the addition of several elements to programs like ddd and gdb. The first addition is including separate heap and
stack components. Another is to display only certain fields of a data structure and have the ability to zoom in and out if needed. Finally, the debugger should provide the ability to visually present complex data structures, including linked lists, trees, etc. He said that a beta version of a debugger with these abilities is currently available.
For more information, visit http://infovis.cs.vt.edu/datastruct.
IMPROVING APPLICATION PERFORMANCE
THROUGH SYSTEM-CALL COMPOSITION
Amit Purohit, Stony Brook University
Web servers perform a huge number of context switches and internal data copies during normal operation. These two elements can drastically limit the performance of an application, regardless of the hardware platform it is run on. The Compound System Call (CoSy) framework is an attempt to reduce the performance penalty for context switches and data copies. It includes a user-level library and several kernel facilities that can be used via system calls. The library provides the programmer with a complete set of memory-management functions. Performance tests using the CoSy framework showed a large saving for context switches and data copies. The only area where the savings between conventional libraries and CoSy were negligible was for very large file copies.
For more information, visit http://www.cs.sunysb.edu/~purohit.
ELASTIC QUOTAS
John Oscar, Columbia University
Oscar began by stating that most resources are flexible, but that to date disks have not been. While approximately 80 percent of files are short-lived, disk quotas are hard limits imposed by administrators. The idea behind elastic quotas is to have a non-permanent storage area that each user can temporarily use. Oscar suggested the creation of an ehome directory structure to be used in deploying elastic quotas. Currently he
has implemented elastic quotas as a stackable file system.

There were also several trends apparent in the data. Many users would ssh to their remote host (good) but then launch an email reader on the local machine, which would connect via POP to the same host and send the password in clear text (bad). This situation can easily be remedied by tunneling protocols like POP through SSH, but it appeared that many people were not aware this could be done. While most of his comments on the use of the wireless network were negative, the list of passwords he had collected showed that people were indeed using strong passwords. His recommendations were to educate and encourage people to use protocols like IPSec, SSH, and SSL when conducting work over a wireless network, because you never know who else is listening.
SYSTRACE
Niels Provos, University of Michigan
How can one be sure the applications one is using actually do exactly what their developers said they do? Short answer: you can't. People today are using a large number of complex applications, which means it is impossible to check each application thoroughly for security vulnerabilities. There are tools out there that can help, though. Provos has developed a tool called systrace that allows a user to generate a policy of acceptable system calls a particular application can make. If the application attempts to make a call that is not defined in the policy, the user is notified and allowed to choose an action. Systrace includes an auto-policy-generation mechanism, which uses a training phase to record the actions of all programs the user executes. If the user chooses not to be bothered by applications breaking policy, systrace allows default enforcement actions to be set. Currently systrace is implemented for FreeBSD and NetBSD, with a Linux version coming out shortly.
For more information, visit http://www.citi.umich.edu/u/provos/systrace.
VERIFIABLE SECRET REDISTRIBUTION FOR
SURVIVABLE STORAGE SYSTEMS
Ted Wong, Carnegie Mellon University
Wong presented a protocol that can be used to re-create a file distributed over a set of servers, even if one of the servers is damaged. In his scheme, the user must choose how many shares the file is split into and the number of servers it will be stored across. The goal of this work is to provide persistent, secure storage of information, even if it comes under attack. Wong stated that his design of the protocol is complete, and he is currently building a prototype implementation of it.
For more information, visit http://www.cs.cmu.edu/~tmwong/research.
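Schemes like Wong's are built on secret sharing: a file is split into shares such that a quorum reconstructs it and fewer shares reveal nothing. The sketch below shows only the simplest degenerate case, an n-of-n XOR split (all shares required), to illustrate what splitting into shares means; a survivable store needs a true threshold (m-of-n) scheme such as Shamir's, plus the verifiable redistribution that was the subject of the talk. The data value is hypothetical.

```python
import os
from functools import reduce

def xor_bytes(a, b):
    # Byte-wise XOR of two equal-length byte strings.
    return bytes(x ^ y for x, y in zip(a, b))

def split(secret, n):
    # n-1 shares are pure random pads; the last share XORs the secret
    # with all of them. Any n-1 shares alone are uniformly random.
    shares = [os.urandom(len(secret)) for _ in range(n - 1)]
    shares.append(reduce(xor_bytes, shares, secret))
    return shares

def combine(shares):
    # XORing all n shares cancels the pads and recovers the secret.
    return reduce(xor_bytes, shares)
```

Each share would be stored on a different server; the XOR variant cannot tolerate a lost server, which is exactly why a threshold scheme is needed in practice.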
THE GURU IS IN SESSIONS
LINUX ON LAPTOP/PDA
Bdale Garbee, HP Linux Systems Operation
Summarized by Brennan Reynolds
This was a roundtable-like discussion, with Garbee acting as moderator. He is currently in charge of the Debian distribution of Linux and is involved in porting Linux to platforms other than i386. He has successfully helped port the kernel to the Alpha, SPARC, and ARM, and is currently working on a version to run on VAX machines.
A discussion was held on which file system is best for battery life. The ReiserFS and Ext3 file systems were discussed; people commented that when using Ext3 on their laptops, the disk would never spin down and thus was always consuming a large amount of power. One suggestion was to use a RAM-based file system for any volume that requires a large number of writes, and to only have the file system written to disk at infrequent intervals or when the machine is shut down or suspended.
A question was directed to Garbee about his knowledge of how the HP-Compaq merger would affect Linux. Garbee thought that the merger was good for Linux, both within the company and for the open source community as a whole. He stated that the new company would be the number one shipper of systems with Linux as the base operating system. He also fielded a question about the support of older HP servers and their ability to run Linux, saying that indeed Linux has been ported to them, and he personally had several machines in his basement running it.
A question which sparked a large number of responses concerned problems people had with running Linux on mobile platforms. Most people actually did not have many problems at all. There were a few cases of a particular machine model not working, but there were only two widespread issues: docking stations and the new ACPI power management scheme. Docking stations still appear to be somewhat of a headache, but most people had developed ad hoc solutions for getting them to work. The ACPI power management scheme developed by Intel does not appear to have a quick solution. Ted Ts'o, one of the head kernel developers, stated that there are fundamental architectural problems with the 2.4 series kernel that do not easily allow ACPI to be added. However, he also stated that ACPI is supported in the newest development kernel, 2.5, and will be included in 2.6.
The final topic was the possibility of purchasing a laptop without paying any charge/tax for Microsoft Windows. The consensus was that currently this is not possible. Even if the machine comes with Linux or without any operating system, the vendors have contracts in place which require them to pay Microsoft for each machine they ship if that machine is capable of running a Microsoft operating system. The only way this will change is if the demand for
other operating systems increases to the point that vendors renegotiate their contracts with Microsoft, and no one saw this happening in the near future.
LARGE CLUSTERS
Andrew Hume, AT&T Labs – Research
Summarized by Teri Lampoudi
This was a session on clusters, big data, and resilient computing. Given that the audience was primarily interested in clusters, and not necessarily big data or beating nodes to a pulp, the conversation revolved mainly around what it takes to make a heterogeneous cluster resilient. Hume also presented a FREENIX paper on his current cluster project, which explained in more detail much of what was abstractly claimed in the guru session.
Hume's use of the word "cluster" referred not to what I had assumed would be a Beowulf-type system, but to a loosely coupled farm of machines in which concurrency was much less of an issue than it would be in a Beowulf. In fact, the architecture Hume described was designed to deal with large amounts of independent transaction processing, essentially the process of billing calls, which requires no interprocess message passing or anything of the sort a parallel scientific application might.
Where does the large data come in? Since the transactions in question consist of large numbers of flat files, a mechanism for getting the files onto nodes and off-loading the results is necessary. In this particular cluster, named Ningaui, this task is handled by the "replication manager," which generated a large amount of interest in the audience. All management functions in the cluster, as well as job allocation, constitute services that are to be bid for and leased out to nodes. Furthermore, failures are handled by simply repeating the failed task until it is completed properly, if that is possible, and, if it is not, by looking through detailed logs to discover
what went wrong. Logging and testing for the successful completion of each task are what make this model resilient. Every management operation is logged at some appropriate level of detail, generating 120MB/day, and every single file transfer is MD5-checksummed. This was characterized by Hume as a "patient" style of management, where no one controls the cluster: scheduling behavior is emergent rather than stringently planned, and nodes are free to drop in and out of the cluster. A downside to this model is the increase in latency, offset by the fact that the cluster continues to function unattended outside of 8-to-5 support hours.
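Checksumming every transfer is cheap to replicate. A minimal sketch (not Ningaui's code; the verify step and file names are hypothetical) of accepting a copy only when streamed MD5 digests match:

```python
import hashlib

def md5_of(path, chunk_size=1 << 16):
    # Stream the file in chunks so large billing files need not fit
    # in memory; returns the hex digest.
    digest = hashlib.md5()
    with open(path, "rb") as f:
        while True:
            block = f.read(chunk_size)
            if not block:
                break
            digest.update(block)
    return digest.hexdigest()

def transfer_ok(src_path, dst_path):
    # Accept the transfer only if source and destination digests match.
    return md5_of(src_path) == md5_of(dst_path)
```

In the "patient" model described above, a mismatch would simply be logged and the transfer task repeated until it verified.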
In response to questions about the projected scalability of such a scheme, Hume said the system would presumably scale from the present 60 nodes to about 100 without modifications, but that scaling higher would be a matter for further design. The guru session ended on a somewhat unrelated note regarding the problems of getting data off mainframes and onto Linux machines – issues of fixed- vs. variable-length encoding of data blocks resulting in useful information being stripped by FTP, and problems in converting COBOL copybooks to something useful on a C-based architecture. To reiterate Hume's claim, there are homemade solutions for some of these problems, and the reader should feel free to contact Mr. Hume for them.
WRITING PORTABLE APPLICATIONS
Nick Stoughton, MSB Consultants
Summarized by Josh Lothian
Nick Stoughton addressed a concern that developers are facing more and more: writing portable applications. He proposed that there is no such thing as a "portable application," only those that have been ported. Developers are still in search of the Holy Grail of applications programming: a write-once, compile-anywhere program. Languages such as Java are leading up to this, but virtual
machines still have to be ported, paths will still be different, etc. Web-based applications are another area that could be promising for portable applications.
Nick went on to talk about the future of portability. The recently released POSIX 2001 standards are expected to help the situation. The 2001 revision expands the original POSIX spec into seven volumes. Even though POSIX 2001 is more specific than the previous release, Nick pointed out that even this standard allows for small differences where vendors may define their own behaviors. This can only hurt developers in the long run. Along these lines, it was pointed out that there may be room for automated tools that have the capability to check application code and determine the level of portability and standards compliance of the source.
NETWORK MANAGEMENT SYSTEM
PERFORMANCE TUNING
Jeff Allen, Tellme Networks, Inc.
Summarized by Matt Selsky
Jeff Allen, author of Cricket, discussed network tuning. The basic idea of tuning is: measure, twiddle, and measure again. Cricket can be used for large installations, but it's overkill for smaller installations, for which MRTG is better suited. You don't need Cricket to do measurement. Doing something like a shell script that dumps data to Excel is good, but you need to get some sort of measurement. Cricket was not designed for billing; it was meant to help answer questions.
Useful tools and techniques covered included looking for the difference between machines in order to identify the causes for the differences; measuring, but also thinking about what you're measuring and why; and remembering to use strace and tcpdump to identify problems. Problems repeat themselves; performance tuning requires an intricate understanding of all the layers involved to solve the problem and simplify the automation. (Having a computer science
background helps, but is not essential.) If you can reproduce the problem, you then should investigate each piece to find the bottleneck. If you can't reproduce the problem, then measure. You need to understand all the system interactions. Measurement can help determine whether things are actually slow or if the user is imagining it.
Troubleshooting begins with a hunch, but scientific processes are essential. You should be able to determine the event stream, or when each event occurs; having observation logs can help. Some hints provided include checking cron for periodic anomalies, slowing down /bin/rm to avoid I/O overload, and looking for unusual kernel CPU usage.
Also, try to use lightweight monitoring to reduce overhead. You don't need to monitor every resource on every system, but you should monitor those resources on those systems that are essential. Don't check things too often, since you can introduce overhead that way. Lots of information can be gathered from the /proc file system interface to the kernel.
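As an illustration of how light such monitoring can be, this sketch parses the "key: value kB" lines of Linux's /proc/meminfo. The field names are real /proc/meminfo keys, but the sample values here are made up so the snippet runs anywhere:

```python
def parse_meminfo(text):
    # /proc/meminfo lines look like "MemFree:   69936 kB".
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if sep and rest.strip():
            info[key.strip()] = int(rest.split()[0])
    return info

sample = "MemTotal:  255908 kB\nMemFree:   69936 kB\nBuffers:   15812 kB\n"
stats = parse_meminfo(sample)

# On a live Linux system, one cheap poll per interval would be:
#     parse_meminfo(open("/proc/meminfo").read())
```

A cron job running something this small every few minutes imposes essentially no load, in the spirit of the lightweight-monitoring advice above.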
BSD BOF
Host: Kirk McKusick, Author and Consultant
Summarized by Bosko Milekic
The BSD Birds of a Feather session at USENIX started off with five quick and informative talks and ended with an exciting discussion of licensing issues.
Christos Zoulas, for the NetBSD Project, went over the primary goals of the NetBSD Project (namely portability, a clean architecture, and security) and then proceeded to briefly discuss some of the challenges that the project has encountered. The successes of, and areas for improvement in, 1.6 were then examined. NetBSD has recently seen various improvements in its cross-build system, packaging system, and third-party tool support. For what concerns the kernel, a zero-copy TCP/UDP implementation has been integrated, and a new pmap
code for arm32 has been introduced, as have some other rather major changes to various subsystems. More information is available in Christos's slides, which are now available at http://www.netbsd.org/gallery/events/usenix2002.
Theo de Raadt of the OpenBSD Project presented a rundown of various new things that the OpenBSD and OpenSSH teams have been looking at. Theo's talk included an amusing description (and pictures) of some of the events that transpired during OpenBSD's hack-a-thon, which occurred the week before USENIX '02. Finally, Theo mentioned that, following complications with the IPFilter license, the OpenBSD team had done a fairly extensive license audit.
Robert Watson of the FreeBSD Project brought up various examples of FreeBSD being used in the real world. He went on to describe a wide variety of changes that will surface with the upcoming FreeBSD release 5.0. Notably, 5.0 will introduce a re-architecting of the kernel aimed at providing much more scalable support for SMP machines. 5.0 will also include an early implementation of KSE, FreeBSD's version of scheduler activations, large improvements to pccard, and significant framework bits from the TrustedBSD project.
Mike Karels from Wind River Systems presented an overview of how BSD/OS has been evolving following the Wind River Systems acquisition of BSDi's software assets. Ernie Prabhakar from Apple discussed Apple's OS X and its success on the desktop. He explained how OS X aims to bring BSD to the desktop while striving to maintain a strong and stable core, one that is heavily based on BSD technology.
The BoF finished with a discussion on licensing; specifically, some folks questioned the impact of Apple's licensing for code from their core OS (the Darwin Project) on the BSD community as a whole.
CLOSING SESSION
HOW FLIES FLY
Michael H. Dickinson, University of California, Berkeley
Summarized by JD Welch
In this lively and offbeat talk, Dickinson discussed his research into the mechanisms of flight in the fruit fly (Drosophila melanogaster) and related the autonomic behavior to that of a technological system. Insects are the most successful organisms on earth, due in no small part to their early adoption of flight. Flies travel through space on straight trajectories interrupted by saccades, jumpy turning motions somewhat analogous to the fast, jumpy movements of the human eye. Using a variety of monitoring techniques, including high-speed video in a "virtual reality flight arena," Dickinson and his colleagues have observed that flies respond to visual cues to decide when to saccade during flight. For example, more visual texture makes flies saccade earlier (i.e., farther away from the arena wall), described by Dickinson as a "collision-control algorithm."
Through a combination of sensory inputs, flies can make decisions about their flight path. Flies possess a visual "expansion detector" which, at a certain threshold, causes the animal to turn in a certain direction. However, expansion in front of the fly sometimes causes it to land. How does the fly decide? Using the virtual reality device to replicate various visual scenarios, Dickinson observed that the flies fixate on vertical objects that move back and forth across the field. Expansion on the left side of the animal causes it to turn right, and vice versa, while expansion directly in front of the animal triggers the legs to flare out in preparation for landing.
If the eyes detect these changes, how are the responses implemented? The flies' wings have "power muscles," controlled by mechanical resonance (as opposed to the nervous system), which drive the
wings, combined with neurally activated "steering muscles," which change the configuration of the wing joints. Subtle variations in the timing of impulses correspond to changes in wing movement, controlled by sensors in the "wing-pit." A small wing-like structure, the haltere, controls equilibrium by beating constantly.
Voluntary control is accomplished by the halteres, whose signals can interfere with the autonomic control of the stroke cycle. The halteres have "steering muscles" as well, and information derived from the visual system can turn off the haltere or trick it into a "virtual" problem requiring a response.
Dickinson has also studied the aerodynamics of insect flight using a device called the "Robo Fly," an oversized mechanical insect wing suspended in a large tank of mineral oil. Interestingly, Dickinson observed that the average lift generated by the flies' wings is under their body weight; the flies use three mechanisms to overcome this, including rotating the wings (rotational lift).
Insects are extraordinarily robust creatures, and because Dickinson analyzed the problem in a systems-oriented way, these observations and analysis are immediately applicable to technology: the response system can be used as an efficient search algorithm for control systems in autonomous vehicles, for example.
THE AFS WORKSHOP
Summarized by Garry Zacheiss, MIT
Love Hörnquist-Åstrand of the Arla development team presented the Arla status report. Arla 0.35.8 has been released. Scheduled for release soon are improved support for Tru64 UNIX, MacOS X, and FreeBSD, improved volume handling, and implementation of more of the vos/pts subcommands. It was stressed that MacOS X is considered an important platform, and that a GUI configuration manager for the cache
manager, a GUI ACL manager for the Finder, and a graphical login that obtains Kerberos tickets and AFS tokens at login time were all under development.
Future goals planned for the underlying AFS protocols include GSSAPI/SPNEGO support for Rx, performance improvements to Rx, an enhanced disconnected mode, and IPv6 support for Rx; an experimental patch is already available for the latter. Future Arla-specific goals include improved performance, partial file reading, and increased stability on several platforms. Work is also in progress on the RXGSS protocol extensions (integrating GSSAPI into Rx); a partial implementation exists, and work continues as developers find time.
Derrick Brashear of the OpenAFS development team presented the OpenAFS status report. OpenAFS 1.2.5 was released immediately prior to the conference; it fixed a remotely exploitable denial-of-service vulnerability on several OpenAFS platforms, most notably IRIX and AIX. Future work planned for OpenAFS includes better support for MacOS X, including working around Finder interaction issues. Better support for the BSDs is also planned: FreeBSD has a partially implemented client, while NetBSD and OpenBSD have only server programs available right now. AIX 5, Tru64 5.1A, and MacOS X 10.2 (a.k.a. Jaguar) are all planned for the future. Other planned enhancements include nested groups in the ptserver (code donated by the University of Michigan awaits integration), disconnected AFS, and further work on a native Windows client. Derrick stated that the guiding principles of OpenAFS were to maintain compatibility with IBM AFS, support new platforms, ease the administrative burden, and add new functionality.
AFS performance was discussed. The openafs.org cell consists of two file servers, one in Stockholm, Sweden, and one in Pittsburgh, PA. AFS works reasonably well over trans-Atlantic links,
although OpenAFS clients don't determine which file server to talk to very effectively. Arla clients use RTTs to the server to determine the optimal file server from which to fetch replicated data; modifications to OpenAFS to support this behavior in the future are desired.
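The summary doesn't include Arla's actual selection logic; a minimal sketch of the idea, choosing the replica file server with the smallest measured RTT (server names and RTT values here are hypothetical):

```python
def pick_replica(rtts):
    """Given a dict mapping replica file servers to measured RTTs (ms),
    return the server with the lowest RTT, i.e., the 'closest' one."""
    if not rtts:
        raise ValueError("no replica servers known")
    return min(rtts, key=rtts.get)

# Hypothetical measurements against the two openafs.org file servers.
rtts = {"stockholm": 110.0, "pittsburgh": 35.0}
best = pick_replica(rtts)
```

A real client would re-probe periodically, since RTTs drift with network conditions.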
Jimmy Engelbrecht and Harald Barth of KTH discussed their AFSCrawler script, written to determine how many AFS cells and clients exist in the world, which implementations/versions they run (Arla vs. IBM AFS vs. OpenAFS), and how much data is stored in AFS. The script unfortunately triggered a bug in IBM AFS 3.6-derived code, causing some clients to panic while handling a specific RPC. This has since been fixed in OpenAFS 1.2.5 and the most recent IBM AFS patch level, and all AFS users are strongly encouraged to upgrade. No release of Arla is vulnerable to this particular denial-of-service attack. There was an extended discussion of the usefulness of this exploration. Many sites believed it produced useful information and that such scanning should continue in the future, but only on an opt-in basis.
Many sites face the problem of managing Kerberos/AFS credentials for batch-scheduled jobs. Specifically, most batch-processing software needs to be modified to forward tickets as part of the batch submission process, renew tickets and tokens while the job is in the queue and for the lifetime of the job, and properly destroy credentials when the job completes. Ken Hornstein of NRL was able to pay a commercial vendor to support Kerberos 4/5 credential management in their product, although they did not implement AFS token management. MIT has implemented some of the desired functionality in OpenPBS and might be able to make it available to other interested sites.
Tools to simplify AFS administration were discussed, including:
AFS Balancer: a tool written by CMU to automate the process of balancing disk usage across all servers in a cell. Available from ftp://ftp.andrew.cmu.edu/pub/AFS-Tools/balance-1.1b.tar.gz.
Themis: KTH's enhanced version of the AFS tool "package" for updating files on local disk from a central AFS image. KTH's enhancements include allowing the deletion of files, simplifying the process of adding a file, and allowing the merging of multiple rule sets for determining which files are updated. Themis is available from the Arla CVS repository.
Stanford was presented as an example of a large AFS site. Stanford's AFS usage consists of approximately 14TB of data in approximately 100,000 volumes; 33TB of storage is available in their primary cell, ir.stanford.edu. Their file servers consist entirely of Solaris machines running a combination of Transarc 3.6 patch-level 3 and OpenAFS 1.2.x, while their database servers run OpenAFS 1.2.2. Their cell consists of 25 file servers using a combination of EMC and Sun StorEdge hardware. Stanford continues to use the kaserver for their authentication infrastructure, with future plans to migrate entirely to an MIT Kerberos 5 KDC.
Stanford has approximately 3,400 clients on their campus, not including SLAC (Stanford Linear Accelerator); approximately 2,100 AFS clients from outside Stanford contact their cell every month. Their supported clients are almost entirely IBM AFS 3.6 clients, although they plan to release OpenAFS clients soon. Stanford currently supports only UNIX clients. There is some on-campus presence of Windows clients, but Stanford has never publicly released or supported them. They do intend to release and support the MacOS X client in the near future.
All Stanford students, faculty, and staff are assigned AFS home directories, with a default quota of 50MB, for a total of approximately 550GB of user home directories. Other significant uses of AFS
87 October 2002 ;login:
storage are data volumes for workstation software (400GB) and volumes for course Web sites and assignments (100GB).
AFS usage at Intel was also presented. Intel has been an AFS site since 1994. They had bad experiences with the IBM 3.5 Linux client; their experience with OpenAFS on Linux 2.4.x kernels has been much better. They use, and are satisfied with, the OpenAFS IA64 Linux port. Intel has hundreds of OpenAFS 1.2.3 and 1.2.4 clients in many production cells accessing data stored on IBM AFS file servers, and they have not encountered any interoperability issues. Intel has some concerns about OpenAFS: they would like to purchase commercial support for OpenAFS and to see OpenAFS support for HP-UX on both PA-RISC and Itanium hardware. HP-UX support is currently unavailable because a specific HP-UX header file is unavailable from HP, though this may be resolved soon. Intel has not yet committed to migrating their file servers to OpenAFS and is unsure whether they will do so without commercial support.
Backups are a traditional topic of discussion at AFS workshops, and this time was no exception. Many users complain that the traditional AFS backup tools ("backup" and "butc") are complex and difficult to automate, requiring many home-grown scripts and much user intervention for error recovery. An additional complaint was that the traditional AFS tools do not support file- or directory-level backups and restores; data must be backed up and restored at the volume level.
Mitch Collinsworth of Cornell presented work to make AMANDA, the free backup software from the University of Maryland, suitable for AFS backups. Using AMANDA for AFS backups allows one to share AFS backup tapes with non-AFS backups, easily run multiple backups in parallel, automate error recovery, and provide a robust degraded mode that prevents tape errors from
stopping backups altogether. Their implementation allows for full volume restores as well as individual directory and file restores. They have finished coding this work and are in the process of testing and documenting it.
Peter Honeyman of CITI at the University of Michigan spoke about work he has proposed to replace Rx with RPCSEC_GSS in OpenAFS. This would allow AFS to use a TCP-based transport mechanism rather than the UDP-based Rx, possibly gaining better congestion control, dynamic adaptation, and fragmentation avoidance as a result. RPCSEC_GSS uses the GSSAPI to authenticate SUN ONC RPC. RPCSEC_GSS is transport-agnostic, provides strong security, is a developing Internet standard, and has multiple open source implementations. Backward compatibility with existing AFS servers and clients is an important goal of this project.
To demonstrate the benefits of EtE monitor, Cherkasova talked about two case studies and, based on the calculated metrics, gave some insightful explanations of what is happening behind the variations in Web performance. Through validation, they claim their approach provides a very close approximation of the real scenario.
THE PERFORMANCE OF REMOTE DISPLAY
MECHANISMS FOR THIN-CLIENT COMPUTING
S. Jae Yang, Jason Nieh, Matt Selsky, and Nikhil Tiwari, Columbia University
Noting the trend toward thin-client computing, the authors compared different techniques and design choices by measuring the performance of six popular thin-client platforms: Citrix MetaFrame, Microsoft Terminal Services, Sun Ray, Tarantella, VNC, and X. After pointing out several challenges in benchmarking thin clients, Yang proposed slow-motion benchmarking to achieve non-invasive packet monitoring and consistent visual quality. Basically, they insert delays between separate visual events in the benchmark applications on the server side so that the client's display updates can catch up with the server's processing speed. The experiments were conducted on an emulated network over a range of network bandwidths.
Their results show that thin clients can provide good performance for Web applications in LAN environments, but only some platforms performed well on the video benchmark. Pixel-based encoding may achieve better performance and bandwidth efficiency than high-level graphics. Display caching and compression should be used with care.
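The platform-specific details aren't given in the summary; a toy sketch of the slow-motion idea, pacing benchmark events with a fixed delay so each display update can complete before the next event fires (the events and the delay value are illustrative):

```python
import time

def run_slow_motion(events, delay_s):
    """Play back benchmark 'events' (callables) with a fixed delay after
    each one, so the client's display update for one event can finish
    before the next begins -- the essence of slow-motion benchmarking."""
    timestamps = []
    for ev in events:
        ev()                              # trigger one visual event
        timestamps.append(time.monotonic())
        time.sleep(delay_s)               # let the client catch up
    return timestamps

stamps = run_slow_motion([lambda: None] * 3, delay_s=0.01)
gaps = [b - a for a, b in zip(stamps, stamps[1:])]
```

In the real methodology the delay is chosen large enough that captured network traffic cleanly separates per-event updates.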
A MECHANISM FOR TCP-FRIENDLY
TRANSPORT-LEVEL PROTOCOL
COORDINATION
David E. Ott and Ketan Mayer-Patel, University of North Carolina, Chapel Hill
A revised transport-level protocol optimized for cluster-to-cluster (C-to-C)
applications is what David Ott tried to explain in his talk. A C-to-C application class is identified as one set of processes communicating with another set of processes across a common Internet path. The fundamental problem for C-to-C applications is how to coordinate all C-to-C communication flows so that they share a consistent view of the common C-to-C network, adapt to changing network conditions, and cooperate to meet specific requirements. Aggregation points (APs) are placed at the first and last hop of the common data path to probe network conditions (latency, bandwidth, loss rate, etc.). To carry and transfer this information, a new protocol, the Coordination Protocol (CP), is inserted between the network layer (IP) and the transport layer (TCP, UDP, etc.). Ott illustrated how the CP header is updated and used as packets originate from the source and traverse the local and remote APs to arrive at their destination, and how an AP maintains a per-cluster state table and detects network conditions. In simulation results, this coordination mechanism appears effective in sharing common network resources among C-to-C communication flows.
STORAGE SYSTEMS
Summarized by Praveen Yalagandula
MY CACHE OR YOURS? MAKING STORAGE
MORE EXCLUSIVE
Theodore M. Wong, Carnegie Mellon University; John Wilkes, HP Labs
Theodore Wong explained the inefficiency of current "inclusive" caching schemes in storage area networks: when a client accesses a block, the block is read from disk and cached both in the disk array cache and in the client's cache. He then presented the concept of "exclusive" caching, in which a block accessed by a client is cached only in that client's cache. On eviction from the client's cache, a DEMOTE operation moves the data block to the tail of the array cache.
The new exclusive caching schemes were evaluated on both single-client and multiple-client systems. The single-client results were presented for two different types of synthetic workloads: Random (transaction-processing-type workloads) and Zipf (typical Web workloads). The exclusive policy was quite effective in achieving higher hit rates and lower read latencies. They also showed a 2.2-times hit-rate improvement over inclusive techniques for a real-life workload, httpd, in a single-client setting.
The DEMOTE scheme also performed well on multiple-client systems when the data accessed by clients is disjoint. For shared-data workloads, this scheme performed worse than the inclusive schemes. Theodore then presented an adaptive exclusive caching scheme: a block accessed by a client is placed at the tail of the client's cache and also in the array cache, at a position determined by the popularity of the block. The popularity of a block is measured by maintaining a ghost cache that accumulates the number of times each block is accessed. This new adaptive scheme achieved a maximum speedup of 52% in mean latency in experiments with real-life workloads.
For more information, visit http://www.cs.cmu.edu/~tmwong/research and http://www.hpl.hp.com/research/itc/csl/ssp.
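The paper's data structures aren't reproduced in the summary; a toy sketch of exclusive caching with DEMOTE, using LRU caches in which a client-side eviction is demoted into the array cache instead of being discarded (the cache sizes and access sequence are made up for illustration):

```python
from collections import OrderedDict

class LRUCache:
    def __init__(self, size):
        self.size, self.data = size, OrderedDict()  # front = LRU head, end = MRU tail

    def __contains__(self, block):
        return block in self.data

    def touch(self, block):
        """Insert/refresh a block at the MRU tail; return the evicted block, if any."""
        self.data[block] = True
        self.data.move_to_end(block)
        if len(self.data) > self.size:
            return self.data.popitem(last=False)[0]  # evict the LRU head
        return None

def read(block, client, array, disk_reads):
    """Exclusive caching: a block lives in at most one cache; on client
    eviction it is DEMOTEd to the array cache tail rather than dropped."""
    if block in client:
        client.touch(block)
        return "client-hit"
    if block in array:
        del array.data[block]          # move, don't copy: caches stay exclusive
        hit = "array-hit"
    else:
        disk_reads.append(block)
        hit = "miss"
    victim = client.touch(block)
    if victim is not None:
        array.touch(victim)            # DEMOTE the victim into the array cache
    return hit

client, array, reads = LRUCache(2), LRUCache(2), []
hits = [read(b, client, array, reads) for b in "ABCA"]
# 'A' misses, is later evicted by 'C', demoted, then found in the array cache.
```

The adaptive variant described above would additionally consult a ghost cache to decide where in the array cache a demoted block lands.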
BRIDGING THE INFORMATION GAP IN
STORAGE PROTOCOL STACKS
Timothy E. Denehy, Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau, University of Wisconsin, Madison
Currently, there is a huge information gap between storage systems and file systems. The interface that storage systems expose to file systems, based on blocks and providing only read/write operations, is very narrow. This leads to poor performance overall, because of duplicated functionality in both systems and
reduced functionality resulting from the storage system's lack of file information.
The speaker presented two enhancements: ExRAID, an exposed RAID, and ILFS, an Informed Log-Structured File System. ExRAID is an enhancement to the block-based RAID storage system that exposes three types of information to the file system: (1) regions, contiguous portions of the address space comprising one or multiple disks; (2) performance information about queue lengths and throughput of the regions, revealing disk heterogeneity to the file system; and (3) failure information, dynamically updated to convey the number of tolerable failures in each region.
ILFS allows online incremental expansion of the storage space, performs dynamic parallelism using ExRAID's performance information to perform well on heterogeneous storage systems, and provides a range of mechanisms, at different granularities, for file redundancy using ExRAID's region failure characteristics. These new features were added to LFS with only a 19% increase in code size.
More information: http://www.cs.wisc.edu/wind.
MAXIMIZING THROUGHPUT IN REPLICATED
DISK STRIPING OF VARIABLE
BIT-RATE STREAMS
Stergios V. Anastasiadis, Duke University; Kenneth C. Sevcik and Michael Stumm, University of Toronto
There is increasing demand for continuous real-time streaming of media files. Disk striping is a common technique for supporting many connections on media servers. But even with 1.2 million hours of mean time between failures, a system with 7,000 disks will see more than one disk failure per week. Fault tolerance can be achieved by using data replication and reserving extra bandwidth during normal operation. This work focused on supporting the most common variable bit-rate stream formats (e.g., MPEG).
The authors used the prototype media server EXEDRA to implement and evaluate different replication and bandwidth-reservation schemes. The system supports a variable bit-rate scheme, does stride-based disk allocation, and is capable of supporting variable-grain disk striping. Two replication policies were presented: deterministic, in which data blocks are replicated in round-robin fashion on the secondary replicas, and random, in which data blocks are replicated on randomly chosen secondary replicas. Two bandwidth-reservation techniques were also presented: mirroring reservation, in which disk bandwidth is reserved for both the primary and the replicas of the media file during playback, and minimum reservation, a more efficient scheme in which bandwidth is reserved only for the sum of the primary data access time and the maximum of the backup data access times required in each round.
Experimental results showed that deterministic replica placement is better than random placement for small disks, that minimum disk-bandwidth reservation achieves twice the throughput of mirroring, and that fault tolerance can be achieved with minimal impact on throughput.
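The two reservation policies as described above reduce to simple arithmetic per round; a sketch, with hypothetical per-round access times:

```python
def mirroring_reservation(primary, backups):
    """Reserve disk time for the primary and every replica in each round."""
    return primary + sum(backups)

def minimum_reservation(primary, backups):
    """Reserve disk time for the primary plus only the worst-case backup."""
    return primary + max(backups)

# Hypothetical per-round access times (ms): one primary, two replicas.
p, b = 10.0, [10.0, 10.0]
mirror, minimum = mirroring_reservation(p, b), minimum_reservation(p, b)
```

With symmetric access times and two replicas, the minimum scheme reserves two-thirds of what mirroring does, which is the intuition behind its throughput advantage.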
TOOLS
Summarized by Li Xiao
SIMPLE AND GENERAL STATISTICAL PROFILING
WITH PCT
Charles Blake and Steven Bauer, MIT
This talk introduced the Profile Collection Toolkit (PCT), a sampling-based CPU profiling facility. A novel aspect of PCT is that it allows sampling of semantically rich data such as function call stacks, function parameters, local or global variables, CPU registers, and other execution contexts. The design objectives of PCT were driven by user needs and by the inadequacies or inaccessibility of prior systems.
The rich data-collection capability is achieved via a debugger-controller program, dbctl. The talk then introduced the profiling collector and data-collection activation. PCT file-format options provide flexibility and various ways to store exact states. PCT also has analysis options; two examples were given, "Simple" and "Mixed User and Linux Code."
The talk then went into how general sampling helps in the following areas: a debugger-controller state machine, example script fragments, a simple example program, and general value reports in terms of call-path histograms, numeric value histograms, and data generality. Related work, such as gprof and Expect, was then summarized. The contribution of this work is a portable, general value-sampling tool. The limitations are that PCT has little support for stripped binaries and that count inaccuracies occur because of its statistical nature; given this, building with -g is preferred over stripping.
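PCT's actual report code isn't shown in the summary; a sketch of the call-path-histogram idea, counting identical sampled stacks (the sample data is hypothetical):

```python
from collections import Counter

def call_path_histogram(samples):
    """Count how often each full call path (outermost to innermost frame)
    was observed across stack samples, as a statistical profiler would."""
    return Counter(tuple(stack) for stack in samples)

# Hypothetical stack samples captured at timer interrupts.
samples = [
    ["main", "parse", "read_token"],
    ["main", "parse", "read_token"],
    ["main", "emit"],
]
hist = call_path_histogram(samples)
```

Sample counts are proportional to time spent, so the most frequent paths approximate the hottest paths, subject to the statistical inaccuracy noted above.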
Possible future directions include supporting more debuggers, such as dbx and xdb; script generators for other kinds of traces; more canned reports for general values; and a libgdb-based library sampler. PCT is available at http://pdos.lcs.mit.edu/pct and runs on almost any UNIX-like system.
ENGINEERING A DIFFERENCING AND
COMPRESSION DATA FORMAT
David G. Korn and Kiem-Phong Vo, AT&T Laboratories – Research
This talk began with an equation: "Differencing + Compression = Delta Compression." Compression removes information redundancy within a data set; examples are gzip, bzip, compress, and pack. Differencing encodes the differences between two data sets; examples are diff -e, fdelta, bdiff, and xdelta. Delta compression compresses one data set given another, combining differencing and compression, and reduces to pure compression when there is no commonality.
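Vcdiff's real instruction encoding is not covered here; a minimal illustration of the "differencing + compression" equation using Python's standard library (difflib for the differencing step, zlib for the compression step), not the Vcdiff format itself:

```python
import difflib, json, zlib

def delta_compress(source: bytes, target: bytes) -> bytes:
    """Encode target as COPY/ADD instructions against source (differencing),
    then compress the instruction stream (compression)."""
    ops = []
    sm = difflib.SequenceMatcher(a=source, b=target, autojunk=False)
    for tag, i1, i2, j1, j2 in sm.get_opcodes():
        if tag == "equal":
            ops.append(["copy", i1, i2])                           # reuse source bytes
        else:
            ops.append(["add", target[j1:j2].decode("latin-1")])   # literal data
    return zlib.compress(json.dumps(ops).encode("latin-1"))

def delta_decompress(source: bytes, delta: bytes) -> bytes:
    """Rebuild the target by replaying COPY/ADD instructions."""
    out = bytearray()
    for op in json.loads(zlib.decompress(delta)):
        if op[0] == "copy":
            out += source[op[1]:op[2]]
        else:
            out += op[1].encode("latin-1")
    return bytes(out)
```

When source and target share nothing, the "copy" instructions vanish and the scheme degenerates to plain compression of the literals, mirroring the reduction noted in the talk.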
After presenting the overall scheme for a delta compressor and showing delta-compression performance, the talk discussed the encoding format of the newly designed Vcdiff for delta compression. A Vcdiff instruction-code table consists of 256 entries, each coding up to a pair of instructions, and encodes 1-byte indices and any additional data.
The talk then showed Vcdiff's performance on Web data collected from CNN, compared with the W3C standard Gdiff, Gdiff+gzip, and diff+gzip, where Gdiff was computed using Vcdiff delta instructions. The results of two experiments, "First" and "Successive," were presented: in "First," each file is compressed against the first file collected; in "Successive," each file is compressed against the one from the previous hour. diff+gzip did not work well because diff is line-oriented. Vcdiff performed favorably compared with the other formats.
Vcdiff is part of the Vcodex package. Vcodex is a platform for all common data transformations: delta compression, plain compression, encryption, and transcoding (e.g., uuencode, base64). It is structured in three layers for maximum usability; the base library uses Disciplines and Methods interfaces.
The code for Vcdiff can be found at http://www.research.att.com/sw/tools. They are moving Vcdiff toward IETF standardization as a comprehensive platform for transforming data; refer to http://www.ietf.org/internet-drafts/draft-korn-vcdiff-06.txt.
WHERE IN THE NET
Summarized by Amit Purohit
A PRECISE AND EFFICIENT EVALUATION OF
THE PROXIMITY BETWEEN WEB CLIENTS AND
THEIR LOCAL DNS SERVERS
Zhuoqing Morley Mao, University of California, Berkeley; Charles D. Cranor, Fred Douglis, Michael Rabinovich, Oliver Spatscheck, and Jia Wang, AT&T Labs – Research
Content Distribution Networks (CDNs) attempt to improve Web performance by delivering Web content to end users from servers located at the edge of the network. When a Web client requests content, the CDN dynamically chooses a server to route the request to, usually the
one that is nearest the client. As CDNs have access only to the IP address of the client's local DNS server (LDNS), the CDN's authoritative DNS server maps the client's LDNS to a geographic region within a particular network and combines that with network and server load information to perform CDN server selection.
This method has two limitations. First, it is based on the implicit assumption that clients are close to their LDNS, which may not always be valid. Second, a single request from an LDNS can represent a differing number of Web clients, called the hidden load factor. Knowledge of the hidden load factor can be used to achieve better load distribution.
The talk dealt with the extent of the first limitation and its impact on CDN server selection. To determine the associations between clients and their LDNS, a simple, non-intrusive, and efficient mapping technique was developed. The data collected was used to study the impact of proximity on DNS-based server selection, using four different proximity metrics: (1) autonomous system (AS) clustering, observing whether a client is in the same AS as its LDNS, concluded that LDNS is very good for coarse-grained server selection, since 64% of the associations belong to the same AS; (2) network clustering, observing whether a client is in the same network-aware cluster (NAC) as its LDNS, implied that DNS is less useful for finer-grained server selection, since only 16% of clients and their LDNS are in the same NAC; (3) traceroute divergence, the length of the divergent paths to the client and its LDNS from a probe point, implies that most clients are topologically close to their LDNS as viewed from a randomly chosen probe site; (4) round-trip time (RTT) correlation (some CDNs select servers based on the RTT between the CDN server and the client's LDNS) examines the correlation between message RTTs from a probe point to the client and to its local DNS; the
results imply this is a reasonable metric for avoiding really distant servers.
A study of the impact that client-LDNS associations have on DNS-based server selection concludes that knowing the client's IP address would allow more accurate server selection in a large number of cases. The optimality of server selection also depends on server density, placement, and the selection algorithms.
Further information can be found at http://www.eecs.berkeley.edu/~zmao/myresearch.html or by contacting zmao@eecs.berkeley.edu.
GEOGRAPHIC PROPERTIES OF INTERNET
ROUTING
Lakshminarayanan Subramanian and Venkata N. Padmanabhan, Microsoft Research; Randy H. Katz, University of California at Berkeley
Geographic information can provide insights into the structure and functioning of the Internet, including interactions between different autonomous systems, by analyzing certain properties of Internet routing. It can be used to measure and quantify routing properties such as circuitous routing, hot-potato routing, and geographic fault tolerance.
Traceroute was used to gather the required data, and the GeoTrack tool was used to determine the location of the nodes along each network path. This enables the computation of "linearized distance," the sum of the geographic distances between successive pairs of routers along the path.
To measure the circuitousness of a path, a metric called the "distance ratio" is defined as the ratio of the linearized distance of a path to the geographic distance between the source and destination of the path. From the data, it was observed that the circuitousness of a route depends on both the geographic and the network location of the end hosts. A large value of the distance ratio enables us to flag paths that are highly
circuitous, possibly (though not necessarily) because of routing anomalies. It was also shown that the minimum delay between end hosts is strongly correlated with the linearized distance of the path.
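The distance-ratio metric is easy to sketch with great-circle distances; the router coordinates below are hypothetical stand-ins for GeoTrack's output:

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (*a, *b))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371 * asin(sqrt(h))

def distance_ratio(path):
    """Linearized distance (sum over successive router pairs) divided by the
    direct source-destination distance; values well above 1 flag circuitous routes."""
    linearized = sum(haversine_km(p, q) for p, q in zip(path, path[1:]))
    return linearized / haversine_km(path[0], path[-1])

# Hypothetical route: Berkeley -> Seattle -> Chicago -> New York.
path = [(37.87, -122.27), (47.61, -122.33), (41.88, -87.63), (40.71, -74.01)]
ratio = distance_ratio(path)
```

A two-hop direct path gives a ratio of exactly 1; detours push it upward.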
Geographic information can be used to study various aspects of wide-area Internet paths that traverse multiple ISPs. It was found that end-to-end Internet paths tend to be more circuitous than intra-ISP paths, the cause being the peering relationships between ISPs. Also, paths traversing substantial distances within two or more ISPs tend to be more circuitous than paths largely traversing only a single ISP. Another finding is that ISPs generally employ hot-potato routing.
Geographic information can also be used to capture the fact that two seemingly unrelated routers can be susceptible to correlated failures. Using the geographic information of routers, one can construct a geographic topology of an ISP and, from it, find the tolerance of the ISP's network to total network failure in a geographic region.
For further information, contact lakme@cs.berkeley.edu.
PROVIDING PROCESS ORIGIN INFORMATION
TO AID IN NETWORK TRACEBACK
Florian P. Buchholz, Purdue University; Clay Shields, Georgetown University
Network traceback is currently limited because host audit systems do not maintain enough information to match incoming network traffic to outgoing network traffic. The talk presented an alternative: assigning origin information to every process and logging it during interactive login creation.
The current implementation concentrates mainly on interactive sessions, in which an event is logged when a new connection is established using SSH or Telnet. The method is effective: it could successfully determine stepping stones and reliably detect the source of a DDoS attack. The speaker then talked about related work done in this area,
which led to discussion of the latest developments in the area of "host causality."
The speaker ended with the limitations and future directions of the framework, which could be extended to many applications in the area of security. The current implementation doesn't handle startup scripts and cron jobs, but incorporating the origin information into the file system could solve this problem. In the current implementation, logging is implemented only as a proof of concept; it could be hardened in many ways, and this could be another important aspect of future work.
PROGRAMMING
Summarized by Amit Purohit
CYCLONE: A SAFE DIALECT OF C
Trevor Jim, AT&T Labs – Research; Greg Morrisett, Dan Grossman, Michael Hicks, James Cheney, and Yanling Wang, Cornell University
Cyclone is designed to provide protection against attacks such as buffer overflows, format-string attacks, and memory-management errors. The current structure of C allows programmers to write vulnerable programs. Cyclone extends C so that it has the safety guarantees of Java while keeping C's syntax, types, and semantics untouched.
The Cyclone compiler performs static analysis of the source code and inserts runtime checks into the compiled output at places where the analysis cannot determine that code execution will not violate safety constraints. Cyclone imposes many restrictions to preserve safety, such as NULL checks; these checks do not exist in normal C.
The speaker then discussed some sample code written in Cyclone and how it tackles safety problems. Porting an existing C application to Cyclone is fairly easy, with fewer than 10% of lines requiring change. The current implementation concentrates more on safety than on performance, so the performance penalty is substantial. Cyclone was able to find many lingering bugs. The final step of the project will be to write a compiler that converts a normal C program into an equivalent Cyclone program.
COOPERATIVE TASK MANAGEMENT WITHOUT
MANUAL STACK MANAGEMENT
Atul Adya, Jon Howell, Marvin Theimer, William J. Bolosky, and John R. Douceur, Microsoft Research
The speaker described the definitions of and motivations behind the distinct concepts of task management, stack management, I/O management, conflict management, and data partitioning. Conventional concurrent programming uses preemptive task management and exploits the automatic stack management of a standard language. In the second approach, cooperative tasks are organized as event handlers that yield control by returning control to the event scheduler, manually unrolling their stacks. In this project, the authors adopted a hybrid approach that makes it possible for both stack-management styles to coexist: programmers can code assuming one or the other style, depending on the application. The speaker also gave a detailed example of how to use their mechanism to switch between the two styles.
The talk then continued with the implementation. They were able to preempt many subtle concurrency problems by using cooperative task management; paying a cost up front to reduce subtle race conditions proved to be a good investment.
Though the choice of task management is fundamental, the choice of stack management can be left to individual taste. This project enables the use of either type of stack management in conjunction with cooperative task management.
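The paper's mechanism targets systems code at Microsoft and isn't reproduced here; a sketch of the cooperative, event-handler style it builds on, using Python generators that yield control to a simple round-robin scheduler instead of being preempted:

```python
from collections import deque

def scheduler(tasks):
    """Round-robin event scheduler: each task runs until it yields,
    cooperatively returning control instead of being preempted."""
    ready = deque(tasks)
    trace = []
    while ready:
        task = ready.popleft()
        try:
            trace.append(next(task))   # run the task up to its next yield
            ready.append(task)         # still alive: requeue it
        except StopIteration:
            pass                       # task finished; drop it
    return trace

def worker(name, steps):
    for i in range(steps):
        yield f"{name}:{i}"            # yield control back to the scheduler

trace = scheduler([worker("a", 2), worker("b", 2)])
# round-robin interleaving: ['a:0', 'b:0', 'a:1', 'b:1']
```

Generators keep each task's stack automatically across yields, which is one way to get cooperative scheduling without the manual stack unrolling the authors contrast against.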
IMPROVING WAIT-FREE ALGORITHMS FOR
INTERPROCESS COMMUNICATION IN
EMBEDDED REAL-TIME SYSTEMS
Hai Huang, Padmanabhan Pillai, and Kang G. Shin, University of Michigan
The main characteristic of a real-time, time-sensitive system is its predictable response time. But concurrency management creates hurdles to achieving this goal because of the use of locks to maintain consistency. To solve this problem, many wait-free algorithms have been developed, but these are typically high-cost solutions. By taking advantage of the temporal characteristics of the system, however, the time and space overhead can be reduced.
The speaker presented an algorithm for temporal concurrency control and applied the technique to improve three wait-free algorithms. A single-writer, multiple-reader wait-free algorithm and a double-buffer algorithm were proposed.
Using their transformation mechanism, they achieved a 17-66% improvement in ACET and a 14-70% reduction in memory requirements for IPC algorithms. The mechanism is extensible and can be applied to other non-blocking IPC algorithms as well. Future work involves reducing the synchronization overheads in more general IPC algorithms with multiple-writer semantics.
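The transformed algorithms themselves aren't given in the summary; a sketch of the double-buffer idea for single-writer IPC, where the writer publishes a completed buffer with one index flip (Python's GIL stands in for the atomic store, and the reader-collision handling of the real algorithms is omitted):

```python
class DoubleBuffer:
    """Single-writer double buffer: the writer fills the inactive slot and
    then publishes it by flipping an index, so a reader never observes a
    partially written message."""

    def __init__(self):
        self.slots = [None, None]
        self.latest = 0                 # index of the last fully written slot

    def write(self, msg):
        inactive = 1 - self.latest
        self.slots[inactive] = msg      # fill the slot readers aren't directed to
        self.latest = inactive          # publish: a single index update

    def read(self):
        return self.slots[self.latest]  # always a complete message

buf = DoubleBuffer()
buf.write("v1")
buf.write("v2")
```

Neither side ever blocks on a lock, which is what makes the pattern attractive for predictable-latency embedded IPC.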
MOBILITY
Summarized by Praveen Yalagandula
ROBUST POSITIONING ALGORITHMS FOR
DISTRIBUTED AD-HOC WIRELESS SENSOR
NETWORKS
Chris Savarese and Jan Rabaey, University of California, Berkeley; Koen Langendoen, Delft University of Technology
The PicoRadio network comprises more than a hundred sensors, monitors, and actuators equipped with wireless connectivity, with the following properties: (1) no infrastructure; (2) distributed rather than centralized computation; (3) dynamic topology; (4) limited radio range;
(5) nodes that can act as repeaters; (6) devices of low cost and with low power; and (7) sparse anchor nodes, i.e., nodes with GPS capability. A positioning algorithm determines the geographical position of each node in the network.
In two-dimensional space, each node needs three reference positions to estimate its geographical position. Two problems make positioning difficult in the PicoRadio network setting: (1) the sparse-anchor-node problem and (2) the range-error problem.
The two-phase approach the authors have taken consists of (1) a hop-terrain algorithm, in which each node roughly guesses its location from distances calculated over multiple hops to the anchor points, and (2) a refinement algorithm, in which each node uses its neighbors' positions to refine its own position estimate. To guarantee convergence, the approach uses confidence metrics, assigning a value of 1.0 to anchor nodes and 0.1 initially to other nodes, and increasing these with each iteration.
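The refinement phase can be sketched as iterative, confidence-weighted lateration; the anchor layout, starting estimate, and confidence schedule below are illustrative assumptions, not the paper's exact algorithm:

```python
from math import hypot

def refine(pos, confidence, neighbors, iterations=50):
    """One node's refinement loop: repeatedly move the position estimate
    toward the points that lie the measured range away from each neighbor,
    weighting neighbors by their confidence (anchors = 1.0)."""
    x, y = pos
    conf = confidence
    for _ in range(iterations):
        wx = wy = wsum = 0.0
        for (nx, ny), nconf, rng in neighbors:
            dx, dy = x - nx, y - ny
            d = hypot(dx, dy) or 1e-9
            # the point at distance 'rng' from this neighbor, toward our estimate
            px, py = nx + dx / d * rng, ny + dy / d * rng
            wx += nconf * px; wy += nconf * py; wsum += nconf
        x, y = wx / wsum, wy / wsum
        conf = min(1.0, conf + 0.1)     # confidence grows with each iteration
    return (x, y), conf

# Hypothetical setup: three anchors, true node position (1, 1), exact ranges.
anchors = [((0.0, 0.0), 1.0, hypot(1, 1)),
           ((2.0, 0.0), 1.0, hypot(1, 1)),
           ((0.0, 2.0), 1.0, hypot(1, 1))]
est, conf = refine((0.3, 0.2), 0.1, anchors)
```

With noisy ranges (the paper's 5% range error) the iteration settles near, rather than at, the true position, which is why the reported errors are bounded rather than zero.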
A simulation tool, OMNeT++, was used for both phases and in various scenarios. The results show that they achieved position errors of less than 33% in a scenario with 5% anchor nodes, an average connectivity of 7, and 5% range measurement error.
Guidelines for anchor node deployment are high connectivity (>10), a reasonable fraction of anchor nodes (>5%), and, for anchor placement, covered edges.
APPLICATION-SPECIFIC NETWORK
MANAGEMENT FOR ENERGY-AWARE
STREAMING OF POPULAR MULTIMEDIA
FORMATS
Surendar Chandra, University of Georgia, and Amin Vahdat, Duke University
The main hindrance in supporting the increasing demand for mobile multimedia on PDAs is the battery capacity of these small devices. The idle-time power consumption is almost the same as the receive power consumption on typical wireless network cards (1319mW vs. 1425mW), while the sleep state consumes far less power (177mW). To let the system consume energy proportional to the stream quality, the network card should transition to the sleep state aggressively between packets.
The speaker presented previous work where different multimedia streams were studied for PDAs: MS Media, Real Media, and QuickTime. It was found that if the inter-packet gap is predictable, then huge savings are possible – for example, about 80% savings in MS Media streams with few packet losses. The limitation of the IEEE 802.11b power-save mode comes into play when there are two or more nodes, producing a delay between beacon time and when the node receives/sends packets. This badly affects the streams, since higher energy is consumed while waiting.
The authors propose traffic shaping for energy conservation, done by proxy such that packets arrive at regular intervals. This is achieved using a local proxy in the access point and a client-side proxy in the mobile host. Simulations show that traffic shaping reduces energy consumption; they also reveal that higher-bandwidth streams have a lower energy metric (mJ/kB).
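The scale of the potential saving follows from back-of-the-envelope arithmetic with the power numbers quoted above; the packet spacing and service time below are invented for illustration:

```python
# Power draws quoted in the talk (milliwatts)
IDLE_MW, RECV_MW, SLEEP_MW = 1319, 1425, 177

def energy_mj(n_packets, gap_s, recv_s, sleep_between=False):
    """Energy in millijoules to receive n_packets spaced gap_s apart,
    each taking recv_s seconds to receive; idle or sleep between packets."""
    between_mw = SLEEP_MW if sleep_between else IDLE_MW
    return n_packets * (recv_s * RECV_MW + (gap_s - recv_s) * between_mw)

# Hypothetical stream: 100 packets, 100ms apart, 5ms receive time each
awake = energy_mj(100, 0.100, 0.005)
asleep = energy_mj(100, 0.100, 0.005, sleep_between=True)
print(f"always awake: {awake:.0f} mJ, sleeping between packets: {asleep:.0f} mJ")
# → always awake: 13243 mJ, sleeping between packets: 2394 mJ
```

With these made-up numbers, sleeping between packets cuts energy by roughly 80%, in line with the savings reported for MS Media streams with predictable inter-packet gaps.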
More information is at http://greenhouse.cs.uga.edu.
CHARACTERIZING ALERT AND BROWSE SERVICES OF MOBILE CLIENTS
Atul Adya, Paramvir Bahl, and Lili Qiu, Microsoft Research
Even though there is a dramatic increase in Internet access from wireless devices, there are not many studies characterizing this traffic. In this paper the authors characterize the traffic observed on the MSN Web site with both notification and browse traces.
Around 33 million browsing accesses and about 325 million notification entries are present in the traces. Three types of analyses are done for each of these two services: content analysis, concerning the most popular content categories and their distribution; popularity analysis, or the popularity distribution of documents; and user-behavior analysis.
The analysis of the notification logs shows that document access rates follow a Zipf-like distribution, with most of the accesses concentrated on a small number of messages, and that the accesses exhibit geographical locality – users from the same locality tend to receive similar notification content. The browse log analysis shows that a smaller set of URLs is accessed most of the time, though the access pattern does not fit any Zipf curve, and the highly accessed URLs remain stable. A correlation study between the notification and browsing services shows that wireless users have a moderate correlation of 0.12.
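A Zipf-like distribution means the frequency of the rank-r item falls off roughly as r^(-alpha); a quick, generic way to eyeball this in a trace is to regress log frequency on log rank (a sketch, not the authors' methodology):

```python
import math

def zipf_alpha(frequencies):
    """Estimate the Zipf exponent by least-squares fit of
    log(freq) against log(rank) for rank-ordered frequencies."""
    freqs = sorted(frequencies, reverse=True)
    xs = [math.log(r) for r in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
            sum((x - mx) ** 2 for x in xs)
    return -slope  # alpha > 0 for a Zipf-like fall-off

# Synthetic counts drawn from an exact Zipf curve with alpha = 1
counts = [1000 / r for r in range(1, 101)]
print(round(zipf_alpha(counts), 2))  # → 1.0
```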
FREENIX TRACK SESSIONS
BUILDING APPLICATIONS
Summarized by Matt Butner
INTERACTIVE 3-D GRAPHICS APPLICATIONS
FOR TCL
Oliver Kersting and Jürgen Döllner, Hasso Plattner Institute for Software Systems Engineering, University of Potsdam
The integration of 3-D image rendering functionality into a scripting language permits interactive and animated 3-D development and application without the formalities and precision demanded by low-level C/C++ graphics and visualization libraries. The large and complex C++ API of the Virtual Rendering System (VRS) can be combined with the conveniences of the Tcl scripting language. The mapping of class interfaces is done via an automated process that generates the respective wrapper classes, all of which ensures complete API accessibility and functionality without any significant performance penalty. The general C++-to-Tcl mapper SWIG grants the necessary object and hierarchical abilities without the use of object-oriented Tcl extensions. The final API mapping techniques address C++ features such as classes, overloaded methods, enumerations, and inheritance relations. All are implemented in a proof-of-concept that maps the complete C++ API of the VRS to Tcl and are showcased in a complete interactive 3-D map system.
THE AGFL GRAMMAR WORK LAB
Cornelis H.A. Coster and Erik Verbruggen, University of Nijmegen (KUN)
The growth of Natural Language Processing (NLP) is the cornerstone of the continued evolution of truly intelligent search machines and services. In part due to the growing collections of computer-stored, human-readable documents in the public domain, the implementation of linguistic analysis will become necessary for desirable precision and resolution. Subsequently, such tools and linguistic resources must be released into the public domain, and the authors have done so with the release of the AGFL Grammar Work Lab, under the GNU Public License, as a tool for linguistic research and the development of NLP-based applications.
The AGFL (Affix Grammars over a Finite Lattice) Grammar Work Lab meshes context-free grammars with finite set-valued features, and is applicable to a range of languages. In computer science terms, "syntax rules are procedures with parameters and a non-deterministic execution." The English Phrases for Information Retrieval (EP4IR) grammar, released with the AGFL-GWL as a robust grammar of English, drives a generated English parser that outputs "Head/Modifier" frames. The sentences "CompanyX sponsored this conference" and "This conference was sponsored by CompanyX" both generate [CompanyX [sponsored conference]]. The hope is that public availability of such tools will encourage further development of grammar and lexical software systems.
SWILL A SIMPLE EMBEDDED WEB SERVER
LIBRARY
Sotiria Lampoudi and David M. Beazley, University of Chicago
This paper won the FREENIX Best Student Paper award.
SWILL (Simple Web Interface Link Library) is a simple Web server in the form of a C library, whose development was motivated by a wish to give cool applications an interface to the Web. The SWILL library provides a simple interface that can be efficiently implemented for tasks that vary from flexible Web-based monitoring to software debugging and diagnostics. Though originally designed to be integrated with high-performance scientific simulation software, the interface is generic enough to allow for unbounded uses. SWILL is a single-threaded server relying upon non-blocking I/O through the creation of a temporary server which services I/O requests. SWILL does not provide SSL or cryptographic authentication, but it does have HTTP authentication abilities.
A fantastic feature of SWILL is its support for SPMD-style parallel applications which utilize MPI, proving valuable for Beowulf clusters and large parallel machines. Another practical application was the use of SWILL in a modified Yalnix emulator for the University of Chicago operating systems courses, which utilized the added Yalnix functionality for OS development and debugging. SWILL requires minimal memory overhead and relies upon the HTTP/1.0 protocol.
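SWILL itself is a C library and its API is not reproduced here; purely as an analogy, the idea of embedding a small web interface inside a running application can be sketched with Python's standard library:

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical application state we want to expose over HTTP
counters = {"iterations": 42}

class MonitorHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Serve a plain-text dump of the application's counters
        body = repr(counters).encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):  # keep the app's stdout quiet
        pass

server = HTTPServer(("127.0.0.1", 0), MonitorHandler)  # port 0: pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
print(f"monitoring interface on port {server.server_port}")
```

A real embedded interface would, like SWILL, let the application register handler functions and poll for requests between units of work; the threaded server above is only the closest standard-library analogy.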
NETWORK PERFORMANCE
Summarized by Florian Buchholz
LINUX NFS CLIENT WRITE PERFORMANCE
Chuck Lever, Network Appliance; Peter Honeyman, CITI, University of Michigan
Lever introduced a benchmark to measure NFS client write performance. Client performance is difficult to measure due to hindrances such as poor hardware or bandwidth limitations; furthermore, measuring application performance does not identify weaknesses specifically at the client side. Thus a benchmark was developed that tries to exercise only data transfers in one direction, between server and application. For this purpose the benchmark was based on the block-sequential-write portion of the Bonnie file system benchmark. Once a benchmark for NFS clients is established, it can be used to improve client performance.
The performance measurements were performed with an SMP Linux client and both a Linux NFS server and a Network Appliance F85 filer. During testing, a periodic jump in write latency was discovered. This was due to a rather large number of pending write operations that were scheduled to be written after certain threshold values were exceeded. By introducing a separate daemon that flushes the cached write requests, the spikes could be eliminated, but as a result the average latency grows over time. The problem could be traced to a function that scans a linked list of write requests; after a hashtable was added to improve lookup performance, the latency improved considerably.
The improved client was then used to measure throughput against the two servers. A discrepancy between the Linux server and the filer tests was noticed, and the reason was traced back to a global kernel lock that was unnecessarily held when accessing the network stack. After correcting this, performance improved further. However,
the measurements showed that a client may run slower when paired with fast servers on fast networks. This is due to heavy client interrupt loads, more network processing on the client side, and more global kernel lock contention.
The source code of the project is available at http://www.citi.umich.edu/projects/nfs-perf/patches.
A STUDY OF THE RELATIVE COSTS OF
NETWORK SECURITY PROTOCOLS
Stefan Miltchev and Sotiris Ioannidis, University of Pennsylvania; Angelos Keromytis, Columbia University
With the increasing need for security and integrity of remote network services, it becomes important to quantify the communication overhead of IPSec and compare it to alternatives such as SSH, SCP, and HTTPS.
For this purpose the authors set up three testing networks: a direct link, two hosts separated by two gateways, and three hosts connecting through one gateway. Protocols were compared in each setup, and manual keying was used to eliminate connection setup costs. For IPSec, the encryption algorithms AES, DES, 3DES, hardware DES, and hardware 3DES were used. In detail, FTP was compared to SFTP, SCP, and FTP over IPSec; HTTP to HTTPS and HTTP over IPSec; and NFS to NFS over IPSec and local disk performance.
The result of the measurements was that IPSec outperforms the other popular encryption schemes. Overall, unencrypted communication was fastest, but in some cases, like FTP, the overhead can be small. The use of crypto hardware can significantly improve performance. For future work, the inclusion of setup costs and of hardware-accelerated SSL, SFTP, and SSH was mentioned.
CONGESTION CONTROL IN LINUX TCP
Pasi Sarolahti, University of Helsinki; Alexey Kuznetsov, Institute for Nuclear Research at Moscow
Having attended the talk and read the paper, I am still unclear about whether the authors are merely describing the design decisions of TCP congestion control or whether they are actually the creators of that part of the Linux code; my guess leans toward the former.
In the talk, the speaker compared the TCP congestion control measures specified by the IETF RFCs with the actual Linux implementation, which conforms to the basic principles but nevertheless has differences. Specific emphasis was placed on the retransmission mechanisms and the congestion window. Several TCP enhancements – the NewReno algorithm, Selective ACKs (SACK), and Forward ACKs (FACK) – were also discussed and compared.
In some instances Linux does not conform to the IETF specifications. Fast recovery does not fully follow RFC 2582, since the threshold for triggering retransmit is adjusted dynamically and the congestion window's size is not changed. Also, the round-trip-time estimator and the RTO calculation differ from RFC 2988, since Linux uses more conservative RTT estimates and a minimum RTO of 200ms. The performance measurements showed that with the additional Linux-specific features enabled, slightly higher throughput, a steadier data flow, and fewer unnecessary retransmissions can be achieved.
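For comparison, the RFC 2988 estimator keeps a smoothed RTT and a variance term and computes RTO = SRTT + 4·RTTVAR; a minimal sketch, with a Linux-style 200ms floor added as an option (illustrative, not the kernel code):

```python
class RtoEstimator:
    """RFC 2988-style RTO computation, with an optional minimum
    RTO floor like the 200ms used by Linux."""

    def __init__(self, min_rto=0.0):
        self.srtt = None
        self.rttvar = None
        self.min_rto = min_rto

    def update(self, rtt):
        if self.srtt is None:                   # first measurement
            self.srtt, self.rttvar = rtt, rtt / 2
        else:                                   # alpha = 1/8, beta = 1/4
            self.rttvar = 0.75 * self.rttvar + 0.25 * abs(self.srtt - rtt)
            self.srtt = 0.875 * self.srtt + 0.125 * rtt
        return max(self.min_rto, self.srtt + 4 * self.rttvar)

plain = RtoEstimator()
linux_like = RtoEstimator(min_rto=0.2)
for rtt in [0.010, 0.012, 0.011]:               # a quiet 10ms-RTT LAN
    rto_plain, rto_linux = plain.update(rtt), linux_like.update(rtt)
print(rto_plain, rto_linux)   # on fast networks the floor dominates
```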
XTREME XCITEMENT
Summarized by Steve Bauer
THE FUTURE IS COMING: WHERE THE X
WINDOW SHOULD GO
Jim Gettys, Compaq Computer Corp.
Jim Gettys, one of the principal authors of the X Window System, outlined the near-term objectives for the system, primarily focusing on the changes and infrastructure required to enable replication and migration of X applications. Providing better support for this functionality would enable users to retrieve or duplicate X applications between their servers at home and work.
One interesting example of an application that is currently capable of migration and replication is Emacs. To create a new frame on DISPLAY, try "M-x make-frame-on-display <Return> DISPLAY <Return>".
However, technical challenges make replication and migration difficult in general. These include the "major headaches" of server-side fonts, the nonuniformity of X servers and screen sizes, and the need to appropriately retrofit toolkits. Authentication and authorization issues are obviously also important. The rest of the talk delved into some of the details of these interesting technical challenges.
HACKING IN THE KERNEL
Summarized by Hai Huang
AN IMPLEMENTATION OF SCHEDULER
ACTIVATIONS ON THE NETBSD OPERATING
SYSTEM
Nathan J. Williams, Wasabi Systems
Scheduler activation is an old idea. Basically, there are benefits and drawbacks to using solely kernel-level or user-level threading; scheduler activations combine the two layers of control to provide more concurrency in the system.
In his talk, Nathan gave a fairly detailed description of the implementation of scheduler activations in the NetBSD kernel. One important change in the implementation is to differentiate the thread context from the process context. This is done by defining a separate data structure for these threads and relocating some of the information that was embedded in the process context to these thread contexts. The stack was a particular concern, due to upcalls: special handling must be done to make sure that an upcall doesn't mess up the stack, so that the preempted user-level thread can continue afterwards. Lastly, Nathan explained that signals are handled by upcalls.
AUTHORIZATION AND CHARGING IN PUBLIC
WLANS USING FREEBSD AND 802.1X
Pekka Nikander, Ericsson Research NomadicLab
The 802.1X standards are well known in the wireless community as link-layer authentication protocols. In this talk, Pekka explained some novel ways of using the 802.1X protocols that might be of interest to people on the move: it is possible to set up a public WLAN that supports various charging schemes via virtual tokens, which people can purchase or earn and later use.
This is implemented on FreeBSD using the netgraph utility. It is basically a filter in the link layer that differentiates traffic based on the MAC address of the client node, which is either authenticated, denied, or let through. The overhead for this service is fairly minimal.
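The per-frame decision is simple to state; a schematic sketch of the link-layer classification (states and addresses invented for illustration, not the netgraph node's implementation):

```python
# Per-client link-layer states, keyed by MAC address (illustrative only)
AUTHENTICATED, DENIED = "authenticated", "denied"
clients = {
    "00:11:22:33:44:55": AUTHENTICATED,  # paid with a virtual token
    "66:77:88:99:aa:bb": DENIED,         # token exhausted
}

def classify(frame_src_mac):
    """Decide what to do with a frame: forward paying clients,
    drop denied ones, and redirect unknown MACs to the charging portal."""
    state = clients.get(frame_src_mac)
    if state == AUTHENTICATED:
        return "forward"
    if state == DENIED:
        return "drop"
    return "redirect"  # unknown client: let through only to the portal

print(classify("00:11:22:33:44:55"))  # → forward
```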
ACPI IMPLEMENTATION ON FREEBSD
Takanori Watanabe, Kobe University
ACPI (Advanced Configuration and Power Interface) was proposed as a joint effort by Intel, Toshiba, and Microsoft to provide a standard, finer-grained method of managing the power states of individual devices within a system. Such low-level power management is especially important for mobile and embedded systems that are powered by fixed-capacity batteries.
Takanori explained that the ACPI specification is composed of three parts: tables, BIOS, and registers. He was able to implement some of the functionality of ACPI in the FreeBSD kernel. The ACPI Component Architecture, implemented by Intel, provides a high-level ACPI API to the operating system; Takanori's ACPI implementation is built upon this underlying layer of APIs.
ACCESS CONTROL
Summarized by Florian Buchholz
DESIGN AND PERFORMANCE OF THE
OPENBSD STATEFUL PACKET FILTER (PF)
Daniel Hartmeier, Systor AG
Daniel Hartmeier described the new stateful packet filter (pf) that replaced IPFilter in the OpenBSD 3.0 release. IPFilter could no longer be included due to licensing issues, and thus there was a need to write a new filter, making use of optimized data structures.
The filter rules are implemented as a linked list which is traversed from start to end. Two actions may be taken according to the rules, "pass" or "block": a "pass" action forwards the packet, and a "block" action drops it. Where more than one rule matches a packet, the last rule wins; rules that are marked as "final" will immediately terminate any further rule evaluation. An optimization called "skip-steps" was also implemented, in which blocks of similar rules are skipped if they cannot match the current packet; these skip-steps are calculated when the rule set is loaded. Furthermore, a state table keeps track of TCP connections, and only packets that match the sequence numbers are allowed. UDP and ICMP queries and replies are considered in the state table, where initial packets create an entry for a pseudo-connection with a low initial timeout value. The state table is implemented using a balanced binary search tree. NAT mappings are also stored in the state table, whereas application proxies reside in user space. The packet filter is also able to perform fragment reassembly and to modulate TCP sequence numbers to protect hosts behind the firewall.
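The last-match-wins traversal, with "final" rules cutting evaluation short, can be sketched generically (the rule representation is invented and is unrelated to pf's actual syntax or data structures):

```python
# Each rule: (action, predicate, final). Illustrative representation only.
def evaluate(rules, packet):
    """Traverse the whole list; the last matching rule wins,
    unless a matching rule is marked final, which stops evaluation."""
    decision = "pass"                      # default if nothing matches
    for action, matches, final in rules:
        if matches(packet):
            decision = action
            if final:
                break                      # a "final" rule ends evaluation
    return decision

rules = [
    ("block", lambda p: True,                    False),  # block all...
    ("pass",  lambda p: p["dport"] == 80,        False),  # ...except HTTP
    ("block", lambda p: p["src"] == "10.0.0.66", True),   # banned host, final
]
print(evaluate(rules, {"src": "10.0.0.1",  "dport": 80}))   # → pass
print(evaluate(rules, {"src": "10.0.0.66", "dport": 80}))   # → block
```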
Pf was compared against IPFilter as well as Linux's iptables by measuring throughput and latency with increasing traffic rates and different packet sizes. In a test with a fixed rule set size of 100, iptables outperformed the other two filters, whose results were close to each other. In a second test, where the rule set size was continually increased, iptables consistently had about twice the throughput of the other two (which evaluate the rule set on both the incoming and outgoing interfaces). A third test compared only pf and IPFilter, using a single rule that created state in the state table, with a fixed state-entry size; pf reached an overloaded state much later than IPFilter. The experiment was repeated with a variable state-entry size, and pf performed much better than IPFilter for a small number of states.
In general, rule set evaluation is expensive, and benchmarks only reflect extreme cases, whereas in real life other behavior should be observed. Furthermore, the benchmarks show that stateful filtering can actually improve performance, due to cheap state-table lookups as compared to rule evaluation.
ENHANCING NFS CROSS-ADMINISTRATIVE
DOMAIN ACCESS
Joseph Spadavecchia and Erez Zadok, Stony Brook University
The speaker presented modifications to an NFS server that allow improved NFS access between administrative domains. The problem lies in the fact that NFS assumes a shared UID/GID space, which makes it unsafe to export files outside the administrative domain of the server. Design goals were to leave the protocol and existing clients unchanged, to require a minimum amount of server changes, and to offer flexibility and increased performance.
To solve the problem, two techniques are utilized: "range-mapping" and "file-cloaking." Range-mapping maps IDs between client and server; the mapping is performed on a per-export basis and has to be manually set up in an export file. The mappings can be 1-1, N-N, or N-1. In file-cloaking, the server restricts file access based on UID/GID and special cloaking-mask bits: users can only access their own file permissions. The policy on whether or not the file is visible to others is set by the cloaking mask, which is logically ANDed with the
file's protection bits. File-cloaking only works, however, if the client doesn't hold cached copies of directory contents and file attributes; because of this, the clients are forced to re-read directories by incrementing the mtime value of the directory each time it is listed.
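To make the two mechanisms concrete, a schematic sketch (the mapping table, mask semantics, and all values are invented for illustration, not the paper's export-file format):

```python
# Hypothetical per-export range-mapping: client UID range -> server UID range
RANGE_MAPS = [((1000, 1999), (51000, 51999))]   # an N-N mapping, invented values

def map_uid(client_uid):
    """Translate a client UID into the server's UID space (N-N case)."""
    for (clo, chi), (slo, shi) in RANGE_MAPS:
        if clo <= client_uid <= chi:
            return slo + (client_uid - clo)
    return None  # unmapped: deny or squash, policy-dependent

def visible_mode(mode, cloak_mask, is_owner):
    """File-cloaking sketch: owners see their own permission bits;
    everyone else sees the mode ANDed with the cloaking mask."""
    return mode if is_owner else mode & cloak_mask

print(map_uid(1042))                                    # → 51042
print(oct(visible_mode(0o644, 0o000, is_owner=False)))  # cloaked: 0o0
print(oct(visible_mode(0o644, 0o000, is_owner=True)))   # owner sees 0o644
```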
To test the performance of the modified server, five different NFS configurations were evaluated: an unmodified NFS server was compared against one with the modified code included but not used, one with only range-mapping enabled, one with only file-cloaking enabled, and one version with all modifications enabled. For each setup, different file system benchmarks were run. The results show only a small overhead when the modifications are used, generally an increase of below 5%. Another experiment tested the performance of the system with an increasing number of mapped or cloaked entries; the results show that an increase from 10 to 1000 entries resulted in a maximum cost of about 14% in performance.
One member of the audience pointed out that if clients choose to ignore the changed mtimes from the server and thus still hold caches of the directory entries, the file-cloaking mechanism could be defeated. After a rather lengthy debate, the speaker had to concede that the model doesn't add any extra security. Another question was asked about the scalability of setting up range-mapping; the speaker referred to application-level tools that could be developed for that purpose.
The software is available at ftp://ftp.fsl.cs.sunysb.edu/pub/enf.
ENGINEERING OPEN SOURCE
SOFTWARE
Summarized by Teri Lampoudi
NINGAUI A LINUX CLUSTER FOR BUSINESS
Andrew Hume, AT&T Labs – Research; Scott Daniels, Electronic Data Systems Corp.
Ningaui is a general-purpose, highly available, resilient architecture built from commodity software and hardware. Emphatically, however, Ningaui is not a Beowulf. Hume calls the cluster design the "Swiss canton model," in which there are a number of loosely affiliated, independent nodes with data replicated among them. Jobs are assigned by bidding and leases, and cluster services done as session-based servers are sited via generic job assignment. The emphasis is on keeping the architecture end-to-end, checking all work via checksums, and logging everything. The resilience and high availability required by their goal of 8-5 maintenance – vs. the typical 24-7 model, where people get paged whenever the slightest thing goes wrong, regardless of the time of day – are achieved by job restartability. Finally, all computation is performed on local data, without the use of NFS or network-attached storage.
Hume's message is a hopeful one: despite the many problems encountered – things like kernel and service limits, auto-installation problems, TCP storms, and the like – the existence of source code and the paranoid practice of logging and checksumming everything has helped. The final product performs reasonably well, and it appears that the resilience techniques employed do make a difference. One drawback, however, is that the software mentioned in the paper is not yet available for download.
CPCMS A CONFIGURATION MANAGEMENT
SYSTEM BASED ON CRYPTOGRAPHIC NAMES
Jonathan S. Shapiro and John Vanderburgh, Johns Hopkins University
This paper won the FREENIX Best Paper award.
The basic notion behind the project is the fact that everyone has a pet complaint about CVS, and yet it is currently the configuration manager in most widespread use. Shapiro has unleashed an alternative. Interestingly, he did not begin but ended with the usual host of reasons why CVS is bad; the talk instead began abruptly by characterizing the job
of a software configuration manager, continued by stating the namespaces it must handle, and wrapped up with the challenges faced.
X MEETS Z VERIFYING CORRECTNESS IN THE
PRESENCE OF POSIX THREADS
Bart Massey, Portland State University; Robert T. Bauer, Rational Software Corp.
Massey delivered a humorous talk on the insight gained from applying Z formal specification notation to system software design, rather than the more informal analysis and design process normally used.
The story is told with respect to writing XCB, which replaces the Xlib protocol layer and is supposed to be thread-friendly. But where threads are concerned, deadlock avoidance becomes a hard problem that cannot be solved in an ad hoc manner, while full model checking is also too hard. In this case Massey resorted to Z specification to model the XCB lower layer, abstract away locking and data transfers, and locate fundamental issues. Essentially, the difficulties of searching the literature and locating information relevant to the problem at hand were overcome. As Massey put it, "the formal method saved the day."
FILE SYSTEMS
Summarized by Bosko Milekic
PLANNED EXTENSIONS TO THE LINUX
EXT2/EXT3 FILESYSTEM
Theodore Y. Ts'o, IBM; Stephen Tweedie, Red Hat
The speaker presented improvements to the Linux Ext2 file system, with the goal of allowing for various expansions while striving to maintain compatibility with older code. Improvements have been facilitated by a few extra superblock fields that were added to Ext2 not long before the Linux 2.0 kernel was released. The fields allow file system features to be added without compromising the existing setup; this is done by providing
bits indicating the impact of the added features with respect to compatibility. Namely, a file system with the "incompat" bit marked is not allowed to be mounted, while a "read-only" marking only allows the file system to be mounted read-only.
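The three-way compatibility scheme can be sketched as follows (the flag names and sets are simplified stand-ins for the real Ext2 superblock bitmaps):

```python
# Simplified stand-ins for Ext2's compat / ro_compat / incompat feature sets.
KNOWN_COMPAT    = {0b001}          # features we can safely ignore
KNOWN_RO_COMPAT = {0b001, 0b010}
KNOWN_INCOMPAT  = {0b001}

def can_mount(compat, ro_compat, incompat, readonly=False):
    """A kernel may mount if it understands every incompat feature,
    and every ro_compat feature unless mounting read-only;
    unknown compat features never prevent mounting."""
    if any(bit not in KNOWN_INCOMPAT for bit in incompat):
        return False                   # refuse the mount outright
    if not readonly and any(bit not in KNOWN_RO_COMPAT for bit in ro_compat):
        return False                   # only a read-only mount is allowed
    return True

print(can_mount({0b100}, set(), set()))                 # → True (ignorable)
print(can_mount(set(), {0b100}, set()))                 # → False (rw refused)
print(can_mount(set(), {0b100}, set(), readonly=True))  # → True
print(can_mount(set(), set(), {0b100}))                 # → False
```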
Directory indexing replaces linear directory searches with a faster search using a fixed-depth tree and hashed keys. File system size can be dynamically increased, and the expanded inode, doubled from 128 to 256 bytes, allows for more extensions.
Other potential improvements were discussed as well, in particular pre-allocation for contiguous files, which allows for better performance in certain setups by pre-allocating contiguous blocks. Security-related modifications, extended attributes, and ACLs were mentioned. An implementation of these features already exists but has not yet been merged into the mainline Ext2/3 code.
RECENT FILESYSTEM OPTIMISATIONS ON
FREEBSD
Ian Dowse, Corvil Networks; David Malone, CNRI, Dublin Institute of Technology
David Malone presented four important file system optimizations for the FreeBSD OS: soft updates, dirpref, vmiodir, and dirhash. It turns out that certain combinations of the optimizations (beautifully illustrated in the paper) may yield performance improvements of anywhere between 2 and 10 orders of magnitude for real-world applications.
All four of the optimizations deal with file system metadata. Soft updates allow for asynchronous metadata updates. Dirpref changes the way directories are organized, attempting to place child directories closer to their parents, thereby increasing locality of reference and reducing disk-seek times. Vmiodir trades some extra memory for better directory caching. Finally, dirhash, which was implemented by Ian Dowse, changes the way in which entries are searched in a directory. Specifically, whereas FFS uses a linear search to find an entry by name, dirhash builds a hashtable of directory entries on the fly. For a directory of size n with a working set of m files, a search that in certain cases could have been O(n·m) has been reduced, thanks to dirhash, to effectively O(n + m).
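The complexity claim is easy to see in miniature: m lookups by linear scan cost O(n) each, while building a hashtable once costs O(n) and each lookup O(1) expected. A toy model (not the FFS code):

```python
# Toy model of dirhash: n directory entries, m lookups by name.
entries = [f"file{i:04d}" for i in range(1000)]          # n = 1000

def lookup_linear(name):
    for i, e in enumerate(entries):      # O(n) per lookup, like plain FFS
        if e == name:
            return i
    return -1

table = {e: i for i, e in enumerate(entries)}            # built once: O(n)

def lookup_hashed(name):
    return table.get(name, -1)           # O(1) expected per lookup

working_set = ["file0007", "file0999", "nope"]           # the m lookups
assert [lookup_linear(n) for n in working_set] == \
       [lookup_hashed(n) for n in working_set] == [7, 999, -1]
```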
FILESYSTEM PERFORMANCE AND SCALABILITY
IN LINUX 2.4.17
Ray Bryant, SGI; Ruth Forester, IBM LTC; John Hawkes, SGI
This talk focused on performance evaluation of a number of file systems available and commonly deployed on Linux machines. Comparisons under various configurations of Ext2, Ext3, ReiserFS, XFS, and JFS were presented.
The benchmarks chosen for the data gathering were pgmeter, filemark, and the AIM Benchmark Suite VII. Pgmeter measures the rate of data transfer of reads/writes of a file under a synthetic workload. Filemark is similar to postmark in that it is an operation-intensive benchmark, although filemark is threaded and offers various other features that postmark lacks. AIM VII measures performance for various file-system-related functionalities; it offers various metrics under an imposed workload, thus stressing the performance of the file system not only under I/O load but also under significant CPU load.
Tests were run on three different setups: a small, a medium, and a large configuration. ReiserFS and Ext2 appear at the top of the pile for the smaller and medium setups. Notably, XFS and JFS perform worse for smaller system configurations than the others, although XFS clearly appears to generate better numbers than JFS. It should be noted that XFS seems to scale well under a higher load; this was most evident in the large-system results, where XFS appears to offer the best overall results.
THINGS TO THINK ABOUT
Summarized by Bosko Milekic
SPEEDING UP KERNEL SCHEDULER BY REDUCING CACHE MISSES
Shuji Yamamura, Akira Hirai, Mitsuru Sato, Masao Yamamoto, Akira Naruse, and Kouichi Kumon, Fujitsu Labs
This was an interesting talk pertaining to the effects of cache coloring for task structures in the Linux kernel scheduler (Linux kernel 2.4.x). The speaker first presented some interesting benchmark numbers for the Linux scheduler, showing that as the number of processes on the task queue was increased, the performance decreased. The authors used some really nifty hardware to measure the number of bus transactions throughout their tests and were thus able to reasonably quantify the impact that cache misses had in the Linux scheduler.
Their experiments led them to implement a cache coloring scheme for task structures, which were previously aligned on 8KB boundaries and therefore were eventually mapped to the same cache lines. This unfortunate placement of task structures in memory induced a significant number of cache misses as the number of tasks grew in the scheduler.
The implemented solution consisted of aligning task structures to more evenly distribute cached entries across the L2 cache. The result was inevitably fewer cache misses in the scheduler. Some negative effects were observed in certain situations, primarily due to more cache slots being used by task-structure data in the scheduler, thus forcing data previously cached there to be pushed out.
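Why 8KB-aligned structures collide is plain cache arithmetic; a small sketch with invented cache parameters (a 256KB direct-mapped cache with 64-byte lines, not the hardware studied in the paper):

```python
# Invented cache geometry: 256KB direct-mapped cache, 64-byte lines.
CACHE_SIZE, LINE_SIZE = 256 * 1024, 64
NUM_SETS = CACHE_SIZE // LINE_SIZE            # 4096 sets

def cache_set(addr):
    return (addr // LINE_SIZE) % NUM_SETS

# Task structures at 8KB-aligned addresses land on a small cycle of sets:
aligned = {cache_set(task * 8192) for task in range(64)}
# Offsetting each task's structure by one extra cache line spreads them out:
colored = {cache_set(task * 8192 + task * LINE_SIZE) for task in range(64)}
print(len(aligned), len(colored))   # → 32 64
```

With strict 8KB alignment the 64 structures share only 32 distinct cache sets, so they evict one another; the per-task "color" offset gives each one its own set.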
WORK-IN-PROGRESS REPORTS
Summarized by Brennan Reynolds
RESOURCE VIRTUALIZATION TECHNIQUES FOR
WIDE-AREA OVERLAY NETWORKS
Kartik Gopalan, Stony Brook University
This work addressed the issue of provisioning a maximum number of virtual overlay networks (VONs) with diverse quality of service (QoS) requirements on a single physical network. Each of the VONs is logically isolated from the others to ensure the QoS requirements. Gopalan mentioned several critical research issues with this problem that are currently being investigated. Dealing with how to provision the network at various levels (link, route, or path) and then enforce the provisioning at run-time is one of the toughest challenges. Currently, Gopalan has developed several algorithms to handle admission control, end-to-end QoS, route selection, scheduling, and fault tolerance in the network.
For more information, visit http://www.ecsl.cs.sunysb.edu.
VISUALIZING SOFTWARE INSTABILITY
Jennifer Bevan, University of California, Santa Cruz
Detection of instability in software has typically been an afterthought. The point in the development cycle when the software is reviewed for instability usually comes after it is difficult and costly to go back and perform major modifications to the code base. Bevan has developed a technique to allow the visualization of unstable regions of code that can be used much earlier in the development cycle. Her technique creates a time series of dependence graphs that include clusters and lines called fault lines. From the graphs, a developer is able to easily determine where the unstable sections of code are and proactively restructure them. She is currently working on a prototype implementation.
For more information, visit http://www.cse.ucsc.edu/~jbevan/evo_viz.
RELIABLE AND SCALABLE PEER-TO-PEER WEB
DOCUMENT SHARING
Li Xiao, College of William and Mary
The idea presented by Xiao would allow end users to share the content of their Web browser caches with neighboring Internet users. The rationale for this is that today's end users are increasingly connected to the Internet over high-speed links, and the browser caches are becoming large storage systems. Therefore, if an individual accesses a page which does not exist in their local cache, Xiao is suggesting that they first query other end users for the content before trying to access the machine hosting the original. This strategy does have some serious problems associated with it that still need to be addressed, including ensuring the integrity of the content and protecting the identity and privacy of the end users.
SEREL FAST BOOTING FOR UNIX
Leni Mayo, Fast Boot Software
Serel is a tool that generates a visual representation of a UNIX machine's boot-up sequence. It can be used to identify the critical path and can show whether a particular service or process blocks for an extended period of time. This information could be used to determine where the largest performance gains could be realized by tuning the order of execution at boot-up. Serel creates a dependency graph, expressed in XML, during boot-up. This graph is then used to create the visual representation. Currently the tool only works on POSIX-compliant systems, but Mayo stated that he would be porting it to other platforms. Other extensions that were mentioned included having the metadata cover the use of shared libraries and monitoring the suspend/resume sequence of portable machines.
For more information, visit http://www.fastboot.org.
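The critical-path idea can be sketched in a few lines: given a dependency graph of services with durations, the critical path is the dependency chain whose durations sum highest. The XML schema below is invented for illustration; Serel's real format is not shown in the summary.

```python
# Sketch of critical-path extraction from a boot-time dependency graph.
# The XML form here is hypothetical, not Serel's actual schema.
import xml.etree.ElementTree as ET

BOOT_XML = """
<boot>
  <service name="kernel"  secs="2"/>
  <service name="network" secs="5" after="kernel"/>
  <service name="syslogd" secs="1" after="kernel"/>
  <service name="nfsd"    secs="4" after="network syslogd"/>
</boot>
"""

def critical_path(xml_text):
    services = {}
    for s in ET.fromstring(xml_text).findall("service"):
        services[s.get("name")] = (float(s.get("secs")),
                                   (s.get("after") or "").split())
    memo = {}
    def longest(name):  # (total seconds, path) of the longest chain ending at `name`
        if name not in memo:
            secs, deps = services[name]
            best = max((longest(d) for d in deps),
                       default=(0.0, []), key=lambda t: t[0])
            memo[name] = (best[0] + secs, best[1] + [name])
        return memo[name]
    return max((longest(n) for n in services), key=lambda t: t[0])

total, path = critical_path(BOOT_XML)
print(total, path)  # 11.0 ['kernel', 'network', 'nfsd']
```

Here tuning `network` would shorten boot-up, while tuning `syslogd` (off the critical path) would not; that is the kind of conclusion Serel's visualization is meant to support.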
BERKELEY DB XML
John Merrells, Sleepycat Software
Merrells gave a quick introduction to and overview of the new XML library for Berkeley DB. The library specializes in storage and retrieval of XML content, queried through a tool called XPath. The library allows for multiple containers per document and stores everything natively as XML. The user is also given a wide range of elements on which to create indices, including edges, elements, text strings, or presence. The XPath tool consists of a query parser, generator, optimizer, and execution engine. To conclude his presentation, Merrells gave a live demonstration of the software.
For more information, visit http://www.sleepycat.com/xml.
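To illustrate the kind of path-based retrieval XPath provides, the sketch below uses Python's `xml.etree.ElementTree`, which supports a subset of XPath. This is a stand-in for the concept only, not the Berkeley DB XML API, and the sample documents are invented.

```python
# XPath-style queries select nodes by document structure rather than by key.
import xml.etree.ElementTree as ET

doc = ET.fromstring("""
<catalog>
  <book id="1"><title>TCP/IP Illustrated</title><price>55</price></book>
  <book id="2"><title>The Design of the UNIX OS</title><price>60</price></book>
</catalog>
""")

titles = [t.text for t in doc.findall("./book/title")]   # all titles
book2 = doc.find("./book[@id='2']/title").text           # predicate on an attribute
print(titles)
print(book2)
```

An index on, say, the `title` element or the `id` attribute is what lets a system like Berkeley DB XML answer such structural queries without scanning every document.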
CLUSTER-ON-DEMAND (COD)
Justin Moore, Duke University
Modern clusters are growing at a rapid rate. Many have pushed beyond the 5000-machine mark, and deploying them results in large expenses as well as management and provisioning issues. Furthermore, if the cluster is "rented" out to various users, it is very time-consuming to configure it to a user's specs, regardless of how long they need to use it. The COD work presented creates dynamic virtual clusters within a given physical cluster. The goal was to have a provisioning tool that would automatically select a chunk of available nodes and install the operating system and middleware specified by the customer in a short period of time. This would allow greater use of resources, since multiple virtual clusters can exist at once. By using a virtual cluster, the size can be changed dynamically. Moore stated that they have created a working prototype and are currently testing and benchmarking its performance.
For more information, visit http://www.cs.duke.edu/~justin/cod.
CATACOMB
Elias Sinderson, University of California, Santa Cruz
Catacomb is a project to develop a database-backed DASL module for the Apache Web server. It was designed as a replacement for the WebDAV module. The initial release of the module only contains support for a MySQL database, but it could be extended to others. Sinderson briefly touched on the module's performance: it was comparable to the mod_dav Apache module for all query types but search. The presentation was concluded with remarks about adding support for the LOCK method and including ACL specifications in the future.
For more information, visit http://ocean.cse.ucsc.edu/catacomb.
SELF-ORGANIZING STORAGE
Dan Ellard, Harvard University
Ellard's presentation introduced a storage system that tunes itself based on the workload of the system, without requiring the intervention of the user. The intelligence is implemented as a virtual, self-organizing disk that resides below any file system. The virtual disk observes the access patterns exhibited by the system and then attempts to predict what information will be accessed next. An experiment was done using an NFS trace from one of the email servers at a large ISP. Ellard's self-organizing storage system worked well, which he attributes to the fact that most of the files being requested were large email boxes. Areas of future work include exploration of the length and detail of the data-collection stage, as well as the CPU impact of running the virtual disk layer.
VISUAL DEBUGGING
John Costigan, Virginia Tech
Costigan feels that the state of current debugging facilities in the UNIX world is not as good as it should be. He proposes the addition of several elements to programs like ddd and gdb. The first addition is including separate heap and stack components. Another is to display only certain fields of a data structure, with the ability to zoom in and out if needed. Finally, the debugger should provide the ability to visually present complex data structures, including linked lists, trees, etc. He said that a beta version of a debugger with these abilities is currently available.
For more information, visit http://infovis.cs.vt.edu/datastruct.
IMPROVING APPLICATION PERFORMANCE
THROUGH SYSTEM-CALL COMPOSITION
Amit Purohit, Stony Brook University
Web servers perform a huge number of context switches and internal data copies during normal operation. These two elements can drastically limit the performance of an application, regardless of the hardware platform it is run on. The Compound System Call (CoSy) framework is an attempt to reduce the performance penalty of context switches and data copies. It includes a user-level library and several kernel facilities that can be used via system calls. The library provides the programmer with a complete set of memory-management functions. Performance tests using the CoSy framework showed a large saving in context switches and data copies. The only area where the savings between conventional libraries and CoSy were negligible was for very large file copies.
For more information, visit http://www.cs.sunysb.edu/~purohit.
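The intuition behind compound system calls can be shown with a toy cost model. This is an illustrative model only, not the real CoSy API, and the overhead numbers are made up: if each system call pays a fixed user/kernel boundary-crossing cost, executing a batch of operations in one crossing amortizes that cost.

```python
# Toy model of why composing system calls helps (assumed costs, in microseconds).
CROSSING_US = 5.0   # assumed per-syscall user/kernel crossing overhead
WORK_US = 1.0       # assumed per-operation in-kernel work

def conventional_cost(n_ops):
    # one boundary crossing per operation
    return n_ops * (CROSSING_US + WORK_US)

def compound_cost(n_ops):
    # one boundary crossing for the whole batch
    return CROSSING_US + n_ops * WORK_US

print(conventional_cost(100))  # 600.0
print(compound_cost(100))      # 105.0
```

The model also suggests why the summary notes negligible savings for very large file copies: when per-operation work dominates, the amortized crossing cost hardly matters.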
ELASTIC QUOTAS
John Oscar, Columbia University
Oscar began by stating that most resources are flexible, but that to date disks have not been. While approximately 80 percent of files are short-lived, disk quotas are hard limits imposed by administrators. The idea behind elastic quotas is to have a non-permanent storage area that each user can temporarily use. Oscar suggested the creation of an ehome directory structure to be used in deploying elastic quotas. Currently he has implemented elastic quotas as a stackable file system.

There were also several trends apparent in the data. Many users would ssh to their remote host (good) but then launch an email reader on the local machine, which would connect via POP to the same host and send the password in clear text (bad). This situation can easily be remedied by tunneling protocols like POP through SSH, but it appeared that many people were not aware this could be done. While most of his comments on the use of the wireless network were negative, the list of passwords he had collected showed that people were indeed using strong passwords. His recommendations were to educate and encourage people to use protocols like IPSec, SSH, and SSL when conducting work over a wireless network, because you never know who else is listening.
SYSTRACE
Niels Provos, University of Michigan
How can one be sure the applications one is using actually do exactly what their developers said they do? Short answer: you can't. People today are using a large number of complex applications, which means it is impossible to check each application thoroughly for security vulnerabilities. There are tools out there that can help, though. Provos has developed a tool called systrace that allows a user to generate a policy of acceptable system calls a particular application can make. If the application attempts to make a call that is not defined in the policy, the user is notified and allowed to choose an action. Systrace includes an auto-policy generation mechanism, which uses a training phase to record the actions of all programs the user executes. If the user chooses not to be bothered by applications breaking policy, systrace allows default enforcement actions to be set. Currently systrace is implemented for FreeBSD and NetBSD, with a Linux version coming out shortly.
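A generated policy is a plain text file that maps each system call, optionally qualified by its arguments, to an action. The fragment below is an approximate illustration of what such a policy looks like (reconstructed from memory of the systrace policy format, not taken from the talk):

```
Policy: /bin/ls, Emulation: native
    native-__sysctl: permit
    native-fsread: filename eq "/etc/malloc.conf" then permit
    native-fsread: filename match "/usr/lib/libc.so.*" then permit
    native-fswrite: filename eq "/dev/tty" then permit
    native-exit: permit
```

Any call not matched by a rule falls through to the user prompt or to the configured default enforcement action.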
For more information, visit http://www.citi.umich.edu/u/provos/systrace.
VERIFIABLE SECRET REDISTRIBUTION FOR
SURVIVABLE STORAGE SYSTEMS
Ted Wong, Carnegie Mellon University
Wong presented a protocol that can be used to re-create a file distributed over a set of servers even if one of the servers is damaged. In his scheme, the user must choose how many shares the file is split into and the number of servers it will be stored across. The goal of this work is to provide persistent, secure storage of information even if it comes under attack. Wong stated that his design of the protocol is complete, and he is currently building a prototype implementation of it.
For more information, visit http://www.cs.cmu.edu/~tmwong/research.
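The share-splitting primitive underneath such a scheme can be sketched with plain Shamir threshold sharing: a secret is split into n shares such that any k of them reconstruct it, so the loss of a server destroys nothing. This sketch shows only that primitive; Wong's protocol adds verifiability and redistribution of shares on top of a scheme like this.

```python
# Illustrative threshold secret sharing (Shamir): split a secret into n
# shares so that any k reconstruct it via Lagrange interpolation mod P.
import random

P = 2**127 - 1  # a Mersenne prime; all arithmetic is mod P

def split(secret, k, n):
    """Split `secret` into n shares, any k of which reconstruct it."""
    coeffs = [secret] + [random.randrange(P) for _ in range(k - 1)]
    def f(x):
        return sum(c * pow(x, i, P) for i, c in enumerate(coeffs)) % P
    return [(x, f(x)) for x in range(1, n + 1)]

def reconstruct(shares):
    """Interpolate the degree-(k-1) polynomial at x=0 from k shares."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        secret = (secret + yi * num * pow(den, P - 2, P)) % P
    return secret

shares = split(123456789, k=3, n=5)
assert reconstruct(shares[:3]) == 123456789   # any 3 of 5 shares suffice
assert reconstruct(shares[2:5]) == 123456789
```

With k=3 and n=5, any two servers can be damaged and the file is still recoverable, which is the survivability property the talk targets.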
THE GURU IS IN SESSIONS
LINUX ON LAPTOP/PDA
Bdale Garbee, HP Linux Systems Operation
Summarized by Brennan Reynolds
This was a roundtable-like discussion, with Garbee acting as moderator. He is currently in charge of the Debian distribution of Linux and is involved in porting Linux to platforms other than i386. He has successfully helped port the kernel to the Alpha, SPARC, and ARM, and is currently working on a version to run on VAX machines.
A discussion was held on which file system is best for battery life. The ReiserFS and ext3 file systems were discussed; people commented that when using ext3 on their laptops, the disk would never spin down and thus was always consuming a large amount of power. One suggestion was to use a RAM-based file system for any volume that requires a large number of writes, and to have the file system written to disk only at infrequent intervals or when the machine is shut down or suspended.
A question was directed to Garbee about his knowledge of how the HP-Compaq merger would affect Linux. Garbee thought that the merger was good for Linux, both within the company and for the open source community as a whole. He stated that the new company would be the number one shipper of systems with Linux as the base operating system. He also fielded a question about the support of older HP servers and their ability to run Linux, saying that indeed Linux has been ported to them, and that he personally had several machines in his basement running it.
A question which sparked a large number of responses concerned problems people had with running Linux on mobile platforms. Most people actually did not have many problems at all. There were a few cases of a particular machine model not working, but there were only two widespread issues: docking stations and the new ACPI power-management scheme. Docking stations still appear to be somewhat of a headache, but most people had developed ad hoc solutions for getting them to work. The ACPI power-management scheme developed by Intel does not appear to have a quick solution. Ted Ts'o, one of the head kernel developers, stated that there are fundamental architectural problems with the 2.4 series kernel that do not easily allow ACPI to be added. However, he also stated that ACPI is supported in the newest development kernel, 2.5, and will be included in 2.6.
The final topic was the possibility of purchasing a laptop without paying any charge/tax for Microsoft Windows. The consensus was that currently this is not possible. Even if the machine comes with Linux or without any operating system, the vendors have contracts in place which require them to pay Microsoft for each machine they ship if that machine is capable of running a Microsoft operating system. The only way this will change is if the demand for
other operating systems increases to the point that vendors renegotiate their contracts with Microsoft, and no one saw this happening in the near future.
LARGE CLUSTERS
Andrew Hume, AT&T Labs - Research
Summarized by Teri Lampoudi
This was a session on clusters, big data, and resilient computing. Given that the audience was primarily interested in clusters, and not necessarily in big data or beating nodes to a pulp, the conversation revolved mainly around what it takes to make a heterogeneous cluster resilient. Hume also presented a FREENIX paper on his current cluster project, which explained in more detail much of what was abstractly claimed in the guru session.
Hume's use of the word "cluster" referred not to what I had assumed would be a Beowulf-type system, but to a loosely coupled farm of machines in which concurrency is much less of an issue than it would be in a Beowulf. In fact, the architecture Hume described was designed to deal with large amounts of independent transaction processing, essentially the process of billing calls, which requires no interprocess message passing or anything of the sort a parallel scientific application might.
Where does the large data come in? Since the transactions in question consist of large numbers of flat files, a mechanism for getting the files onto nodes and off-loading the results is necessary. In this particular cluster, named Ningaui, this task is handled by the "replication manager," which generated a large amount of interest in the audience. All management functions in the cluster, as well as job allocation, constitute services that are bid for and leased out to nodes. Furthermore, failures are handled by simply repeating the failed task until it is completed properly, if that is possible, and if it is not, then looking through detailed logs to discover
what went wrong. Logging and testing for the successful completion of each task are what make this model resilient. Every management operation is logged at some appropriate level of detail, generating 120MB/day, and every single file transfer is MD5-checksummed. This was characterized by Hume as a "patient" style of management, where no one controls the cluster, scheduling behavior is emergent rather than stringently planned, and nodes are free to drop in and out of the cluster. A downside to this model is an increase in latency, offset by the fact that the cluster continues to function unattended outside of 8-to-5 support hours.
In response to questions about the projected scalability of such a scheme, Hume said the system would presumably scale from the present 60 nodes to about 100 without modifications, but that scaling higher would be a matter for further design. The guru session ended on a somewhat unrelated note regarding the problems of getting data off mainframes and onto Linux machines: issues of fixed- vs. variable-length encoding of data blocks resulting in useful information being stripped by FTP, and problems in converting COBOL copybooks to something useful on a C-based architecture. To reiterate Hume's claim, there are homemade solutions for some of these problems, and the reader should feel free to contact Mr. Hume for them.
WRITING PORTABLE APPLICATIONS
Nick Stoughton, MSB Consultants
Summarized by Josh Lothian
Nick Stoughton addressed a concern that developers are facing more and more: writing portable applications. He proposed that there is no such thing as a "portable application," only those that have been ported. Developers are still in search of the Holy Grail of applications programming: a write-once, compile-anywhere program. Languages such as Java are leading up to this, but virtual
machines still have to be ported, paths will still be different, etc. Web-based applications are another area that could be promising for portable applications.
Nick went on to talk about the future of portability. The recently released POSIX 2001 standards are expected to help the situation. The 2001 revision expands the original POSIX spec into seven volumes. Even though POSIX 2001 is more specific than the previous release, Nick pointed out that even this standard allows for small differences where vendors may define their own behaviors. This can only hurt developers in the long run. Along these lines, it was pointed out that there may be room for automated tools that have the capability to check application code and determine the level of portability and standards compliance of the source.
NETWORK MANAGEMENT SYSTEM
PERFORMANCE TUNING
Jeff Allen, Tellme Networks, Inc.
Summarized by Matt Selsky
Jeff Allen, author of Cricket, discussed network tuning. The basic idea of tuning is: measure, twiddle, and measure again. Cricket can be used for large installations, but it's overkill for smaller installations, for which MRTG is better suited. You don't need Cricket to do measurement; doing something like a shell script that dumps data to Excel is good, but you need to get some sort of measurement. Cricket was not designed for billing; it was meant to help answer questions.
Useful tools and techniques covered included: looking for the difference between machines in order to identify the causes of the differences; measuring, but also thinking about what you're measuring and why; and remembering to use strace and tcpdump to identify problems. Problems repeat themselves; performance tuning requires an intricate understanding of all the layers involved in order to solve the problem and simplify the automation. (Having a computer science
background helps, but is not essential.) If you can reproduce the problem, you then should investigate each piece to find the bottleneck. If you can't reproduce the problem, then measure. You need to understand all the system interactions. Measurement can help determine whether things are actually slow or if the user is imagining it.
Troubleshooting begins with a hunch, but scientific processes are essential. You should be able to determine the event stream, or when each event occurs; having observation logs can help. Some hints provided include checking cron for periodic anomalies, slowing down /bin/rm to avoid I/O overload, and looking for unusual kernel CPU usage.
Also, try to use lightweight monitoring to reduce overhead. You don't need to monitor every resource on every system, but you should monitor those resources on those systems that are essential. Don't check things too often, since you can introduce overhead that way. Lots of information can be gathered from the /proc file-system interface to the kernel.
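As a minimal sketch of that kind of lightweight monitoring, the snippet below reads and parses Linux's /proc/loadavg (three load averages, running/total tasks, last PID). The parsing is separated from the reading so it can be exercised without a live /proc; the dictionary keys are invented for illustration.

```python
# Lightweight monitoring via the /proc interface (Linux /proc/loadavg format).
def parse_loadavg(text):
    fields = text.split()
    one, five, fifteen = (float(f) for f in fields[:3])
    running, total = (int(n) for n in fields[3].split("/"))
    return {"1min": one, "5min": five, "15min": fifteen,
            "running": running, "tasks": total}

def read_loadavg(path="/proc/loadavg"):
    # A single small read per polling interval keeps the overhead negligible.
    with open(path) as f:
        return parse_loadavg(f.read())

sample = "0.42 0.30 0.24 2/153 4711\n"
print(parse_loadavg(sample)["1min"])  # 0.42
```

Polling a handful of such files every minute or so gives useful trend data without the overhead the speaker warns against.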
BSD BOF
Host: Kirk McKusick, Author and Consultant
Summarized by Bosko Milekic
The BSD Birds of a Feather session at USENIX started off with five quick and informative talks and ended with an exciting discussion of licensing issues.
Christos Zoulas, for the NetBSD Project, went over the primary goals of the NetBSD Project (namely portability, a clean architecture, and security) and then proceeded to briefly discuss some of the challenges that the project has encountered. The successes of, and areas for improvement in, release 1.6 were then examined. NetBSD has recently seen various improvements in its cross-build system, packaging system, and third-party tool support. As far as the kernel is concerned, a zero-copy TCP/UDP implementation has been integrated, and a new pmap
code for arm32 has been introduced, as have some other rather major changes to various subsystems. More information is available in Christos's slides, which are now available at http://www.netbsd.org/gallery/events/usenix2002.
Theo de Raadt of the OpenBSD Project presented a rundown of various new things that the OpenBSD and OpenSSH teams have been looking at. Theo's talk included an amusing description (and pictures) of some of the events that transpired during OpenBSD's hack-a-thon, which occurred the week before USENIX '02. Finally, Theo mentioned that, following complications with the IPFilter license, the OpenBSD team had done a fairly extensive license audit.
Robert Watson of the FreeBSD Project brought up various examples of FreeBSD being used in the real world. He went on to describe a wide variety of changes that will surface with the upcoming FreeBSD release 5.0. Notably, 5.0 will introduce a re-architecting of the kernel aimed at providing much more scalable support for SMP machines. 5.0 will also include an early implementation of KSE, FreeBSD's version of scheduler activations, large improvements to pccard, and significant framework bits from the TrustedBSD project.
Mike Karels from Wind River Systems presented an overview of how BSD/OS has been evolving following the Wind River Systems acquisition of BSDi's software assets. Ernie Prabhakar from Apple discussed Apple's OS X and its success on the desktop. He explained how OS X aims to bring BSD to the desktop while striving to maintain a strong and stable core, one that is heavily based on BSD technology.
The BoF finished with a discussion on licensing; specifically, some folks questioned the impact of Apple's licensing for code from their core OS (the Darwin Project) on the BSD community as a whole.
CLOSING SESSION
HOW FLIES FLY
Michael H. Dickinson, University of California, Berkeley
Summarized by JD Welch
In this lively and offbeat talk, Dickinson discussed his research into the mechanisms of flight in the fruit fly (Drosophila melanogaster) and related the autonomic behavior to that of a technological system. Insects are the most successful organisms on earth, due in no small part to their early adoption of flight. Flies travel through space on straight trajectories interrupted by saccades, jumpy turning motions somewhat analogous to the fast, jumpy movements of the human eye. Using a variety of monitoring techniques, including high-speed video in a "virtual reality flight arena," Dickinson and his colleagues have observed that flies respond to visual cues to decide when to saccade during flight. For example, more visual texture makes flies saccade earlier (i.e., further away from the arena wall), described by Dickinson as a "collision-control algorithm."
Through a combination of sensory inputs, flies can make decisions about their flight path. Flies possess a visual "expansion detector" which, at a certain threshold, causes the animal to turn in a certain direction. However, expansion in front of the fly sometimes causes it to land. How does the fly decide? Using the virtual reality device to replicate various visual scenarios, Dickinson observed that the flies fixate on vertical objects that move back and forth across the field. Expansion on the left side of the animal causes it to turn right, and vice versa, while expansion directly in front of the animal triggers the legs to flare out in preparation for landing.
If the eyes detect these changes, how are the responses implemented? The flies' wings have "power muscles," controlled by mechanical resonance (as opposed to the nervous system), which drive the
wings, combined with neurally activated "steering muscles," which change the configuration of the wing joints. Subtle variations in the timing of impulses correspond to changes in wing movement, controlled by sensors in the "wing-pit." A small wing-like structure, the haltere, controls equilibrium by beating constantly.
Voluntary control is accomplished by the halteres, whose signals can interfere with the autonomic control of the stroke cycle. The halteres have "steering muscles" as well, and information derived from the visual system can turn off the haltere or trick it into a "virtual" problem requiring a response.
Dickinson has also studied the aerodynamics of insect flight using a device called the "Robo Fly," an oversized mechanical insect wing suspended in a large tank of mineral oil. Interestingly, Dickinson observed that the average lift generated by the flies' wings is less than their body weight; the flies use three mechanisms to overcome this, including rotating the wings (rotational lift).
Insects are extraordinarily robust creatures, and because Dickinson analyzed the problem in a systems-oriented way, these observations and analyses are immediately applicable to technology; the response system can be used as an efficient search algorithm for control systems in autonomous vehicles, for example.
THE AFS WORKSHOP
Summarized by Garry Zacheiss, MIT
Love Hornquist-Astrand of the Arla development team presented the Arla status report. Arla 0.35.8 has been released. Scheduled for release soon are improved support for Tru64 UNIX, MacOS X, and FreeBSD; improved volume handling; and implementation of more of the vos and pts subcommands. It was stressed that MacOS X is considered an important platform, and that a GUI configuration manager for the cache
manager, a GUI ACL manager for the Finder, and a graphical login that obtains Kerberos tickets and AFS tokens at login time were all under development.
Future goals planned for the underlying AFS protocols include GSSAPI/SPNEGO support for Rx, performance improvements to Rx, an enhanced disconnected mode, and IPv6 support for Rx; an experimental patch is already available for the latter. Future Arla-specific goals include improved performance, partial file reading, and increased stability on several platforms. Work is also in progress on the RXGSS protocol extensions (integrating GSSAPI into Rx). A partial implementation exists, and work continues as developers find time.
Derrick Brashear of the OpenAFS development team presented the OpenAFS status report. OpenAFS 1.2.5 was released immediately prior to the conference; it fixed a remotely exploitable denial-of-service vulnerability on several OpenAFS platforms, most notably IRIX and AIX. Future work planned for OpenAFS includes better support for MacOS X, including working around Finder interaction issues. Better support for the BSDs is also planned; FreeBSD has a partially implemented client, while NetBSD and OpenBSD have only server programs available right now. AIX 5, Tru64 5.1A, and MacOS X 10.2 (aka Jaguar) are all planned for the future. Other planned enhancements include nested groups in the ptserver (code donated by the University of Michigan awaits integration), disconnected AFS, and further work on a native Windows client. Derrick stated that the guiding principles of OpenAFS were to maintain compatibility with IBM AFS, support new platforms, ease the administrative burden, and add new functionality.
AFS performance was discussed. The openafs.org cell consists of two file servers, one in Stockholm, Sweden, and one in Pittsburgh, PA. AFS works reasonably well over trans-Atlantic links,
although OpenAFS clients don't determine which file server to talk to very effectively. Arla clients use RTTs to the server to determine the optimal file server to fetch replicated data from. Modifications to OpenAFS to support this behavior in the future are desired.
Jimmy Engelbrecht and Harald Barth of KTH discussed their AFSCrawler script, written to determine how many AFS cells and clients were in the world, what implementations and versions they were running (Arla vs. IBM AFS vs. OpenAFS), and how much data was in AFS. The script unfortunately triggered a bug in IBM AFS 3.6-derived code, causing some clients to panic while handling a specific RPC. This has since been fixed in OpenAFS 1.2.5 and the most recent IBM AFS patch level, and all AFS users are strongly encouraged to upgrade. No release of Arla is vulnerable to this particular denial-of-service attack. There was an extended discussion of the usefulness of this exploration. Many sites believed this was useful information and that such scanning should continue in the future, but only on an opt-in basis.
Many sites face the problem of managing Kerberos/AFS credentials for batch-scheduled jobs. Specifically, most batch-processing software needs to be modified to forward tickets as part of the batch submission process, renew tickets and tokens while the job is in the queue and for the lifetime of the job, and properly destroy credentials when the job completes. Ken Hornstein of NRL was able to pay a commercial vendor to support Kerberos 4/5 credential management in their product, although they did not implement AFS token management. MIT has implemented some of the desired functionality in OpenPBS and might be able to make it available to other interested sites.
Tools to simplify AFS administration were discussed, including:
AFS Balancer: A tool written by CMU to automate the process of balancing disk usage across all
servers in a cell. Available from ftp://ftp.andrew.cmu.edu/pub/AFS-Tools/balance-1.1b.tar.gz.
Themis: Themis is KTH's enhanced version of the AFS tool "package" for updating files on local disk from a central AFS image. KTH's enhancements include allowing the deletion of files, simplifying the process of adding a file, and allowing the merging of multiple rule sets for determining which files are updated. Themis is available from the Arla CVS repository.
Stanford was presented as an example of a large AFS site. Stanford's AFS usage consists of approximately 14TB of data in AFS, in the form of approximately 100,000 volumes; 33TB of storage is available in their primary cell, ir.stanford.edu. Their file servers consist entirely of Solaris machines running a combination of Transarc 3.6 patch-level 3 and OpenAFS 1.2.x, while their database servers run OpenAFS 1.2.2. Their cell consists of 25 file servers using a combination of EMC and Sun StorEdge hardware. Stanford continues to use the kaserver for their authentication infrastructure, with future plans to migrate entirely to an MIT Kerberos 5 KDC.
Stanford has approximately 3,400 clients on their campus, not including SLAC (Stanford Linear Accelerator); approximately 2,100 AFS clients from outside Stanford contact their cell every month. Their supported clients are almost entirely IBM AFS 3.6 clients, although they plan to release OpenAFS clients soon. Stanford currently supports only UNIX clients. There is some on-campus presence of Windows clients, but Stanford has never publicly released or supported it. They do intend to release and support the MacOS X client in the near future.
All Stanford students, faculty, and staff are assigned AFS home directories, with a default quota of 50MB, for a total of approximately 550GB of user home directories. Other significant uses of AFS
storage are data volumes for workstation software (400GB) and volumes for course Web sites and assignments (100GB).
AFS usage at Intel was also presented. Intel has been an AFS site since 1994. They had bad experiences with the IBM 3.5 Linux client; their experience with OpenAFS on Linux 2.4.x kernels has been much better. They use and are satisfied with the OpenAFS IA64 Linux port. Intel has hundreds of OpenAFS 1.2.3 and 1.2.4 clients in many production cells accessing data stored on IBM AFS file servers, and they have not encountered any interoperability issues. Intel has some concerns about OpenAFS: they would like to purchase commercial support for OpenAFS, and to see OpenAFS support for HP-UX on both PA-RISC and Itanium hardware. HP-UX support is currently unavailable due to a specific HP-UX header file being unavailable from HP; this may be available soon. Intel has not yet committed to migrating their file servers to OpenAFS and is unsure if they will do so without commercial support.
Backups are a traditional topic of discussion at AFS workshops, and this time was no exception. Many users complain that the traditional AFS backup tools ("backup" and "butc") are complex and difficult to automate, requiring many home-grown scripts and much user intervention for error recovery. An additional complaint was that the traditional AFS tools do not support file- or directory-level backups and restores; data must be backed up and restored at the volume level.
Mitch Collinsworth of Cornell presented work to make AMANDA, the free backup software from the University of Maryland, suitable for AFS backups. Using AMANDA for AFS backups allows one to share AFS backup tapes with non-AFS backups, easily run multiple backups in parallel, automate error recovery, and provide a robust degraded mode that prevents tape errors from
stopping backups altogether. Their implementation allows for full volume restores as well as individual directory and file restores. They have finished coding this work and are in the process of testing and documenting it.
Peter Honeyman of CITI at the University of Michigan spoke about work he has proposed to replace Rx with RPCSEC GSS in OpenAFS; this would allow AFS to use a TCP-based transport mechanism rather than the UDP-based Rx, and possibly gain better congestion control, dynamic adaptation, and fragmentation avoidance as a result. RPCSEC GSS uses the GSSAPI to authenticate SUN ONC RPC. RPCSEC GSS is transport-agnostic, provides strong security, is a developing Internet standard, and has multiple open source implementations. Backward compatibility with existing AFS servers and clients is an important goal of this project.
reduced functionality resulting from storage systems' lack of file information.
The speaker presented two enhancements: ExRAID, an exposed RAID, and ILFS, an Informed Log-Structured File System. ExRAID is an enhancement to the block-based RAID storage system that exposes the following three types of information to the file system: (1) regions, contiguous portions of the address space comprising one or multiple disks; (2) performance information about queue lengths and throughput of the regions, revealing disk heterogeneity to the file systems; and (3) failure information, dynamically updated information conveying the number of tolerable failures in each region.
ILFS allows online incremental expansion of the storage space; performs dynamic parallelism, using ExRAID's performance information to perform well on heterogeneous storage systems; and provides a range of different mechanisms, with different granularities, for redundancy of files, using ExRAID's region failure characteristics. These new features are added to LFS with only a 19% increase in the code size.
More information: http://www.cs.wisc.edu/wind
MAXIMIZING THROUGHPUT IN REPLICATED DISK STRIPING OF VARIABLE BIT-RATE STREAMS
Stergios V. Anastasiadis, Duke University; Kenneth C. Sevcik and Michael Stumm, University of Toronto
There is an increasing demand for the continuous real-time streaming of media files. Disk striping is a common technique used for supporting many connections on media servers. But even with 1.2 million hours of mean time between failures, there will be roughly one disk failure per week with 7000 disks. Fault tolerance can be achieved by using data replication and reserving extra bandwidth during normal operation. This work focused on supporting the most common variable bit-rate stream formats (e.g., MPEG).
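The quoted failure rate can be sanity-checked with a quick back-of-the-envelope calculation (a sketch; the 1.2-million-hour MTBF and the 7000-disk count are the figures from the talk, and independent failures are assumed):

```python
# Expected disk failures per week for a large array, assuming
# independent failures and a per-disk MTBF quoted in hours.
def failures_per_week(num_disks: int, mtbf_hours: float) -> float:
    hours_per_week = 24 * 7
    return num_disks * hours_per_week / mtbf_hours

rate = failures_per_week(7000, 1.2e6)
print(f"{rate:.2f} failures/week")  # roughly one failure per week
```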
The authors used the prototype media server EXEDRA to implement and evaluate different replication and bandwidth reservation schemes. This system supports a variable bit-rate scheme, does stride-based disk allocation, and is capable of supporting variable-grain disk striping. Two replication policies are presented: deterministic, in which data blocks are replicated in a round-robin fashion on the secondary replicas; and random, in which data blocks are replicated on randomly chosen secondary replicas. Two bandwidth reservation techniques are also presented: mirroring reservation, in which disk bandwidth is reserved for both the primary and the replicas of the media file during playback; and minimum reservation, a more efficient scheme in which bandwidth is reserved only for the sum of the primary data access time and the maximum of the backup data access times required in each round.
Experimental results showed that deterministic replica placement is better than random placement for small disks, that minimum disk bandwidth reservation achieves twice the throughput of mirroring, and that fault tolerance can be achieved with a minimal impact on throughput.
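The two placement policies can be sketched as block-to-disk maps (a simplified illustration, not the EXEDRA implementation; the disk count and the seeded RNG are assumptions):

```python
import random

def deterministic_placement(block: int, num_disks: int) -> tuple:
    """Round-robin: primary on block % n, replica on the next disk."""
    primary = block % num_disks
    replica = (primary + 1) % num_disks
    return primary, replica

def random_placement(block: int, num_disks: int, rng: random.Random) -> tuple:
    """Primary round-robin, replica on a randomly chosen other disk."""
    primary = block % num_disks
    choices = [d for d in range(num_disks) if d != primary]
    return primary, rng.choice(choices)

rng = random.Random(0)
print([deterministic_placement(b, 4) for b in range(4)])
# [(0, 1), (1, 2), (2, 3), (3, 0)]
```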
TOOLS
Summarized by Li Xiao
SIMPLE AND GENERAL STATISTICAL PROFILING WITH PCT
Charles Blake and Steven Bauer, MIT
This talk introduced the Profile Collection Toolkit (PCT), a sampling-based CPU profiling facility. A novel aspect of PCT is that it allows sampling of semantically rich data such as function call stacks, function parameters, local or global variables, CPU registers, or other execution contexts. The design objectives of PCT were driven by user needs and the inadequacies or inaccessibility of prior systems.
The rich data collection capability is achieved via a debugger-controller program, dbctl. The talk then introduced the profiling collector and data-collection activation. PCT file format options provide flexibility and various ways to store exact states. PCT also has analysis options; two examples were given, "Simple" and "Mixed User and Linux Code."
The talk then went into how general sampling helps in the following areas: a debugger-controller state machine, example script fragments, a simple example program, and general value reports in terms of call-path histograms, numeric value histograms, and data generality. Related work such as gprof and Expect was then summarized. The contribution of this work is to provide a portable, general value-sampling tool. The limitations are that PCT does not work well with stripped binaries, and count inaccuracies occur because of its statistical nature. Given this limitation, compiling with -g is preferred over stripping binaries.
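The call-path histogram report can be illustrated by aggregating periodic stack samples (a toy sketch; the sample data and function names are invented, and real PCT gathers samples via a debugger rather than in-process):

```python
from collections import Counter

def call_path_histogram(samples):
    """Count how often each call path (outermost -> innermost) was sampled."""
    return Counter(tuple(stack) for stack in samples)

# Each sample is the call stack observed at one sampling tick.
samples = [
    ["main", "parse", "read_token"],
    ["main", "parse", "read_token"],
    ["main", "compress"],
]
for path, count in call_path_histogram(samples).most_common():
    print(f"{count:3d}  {' -> '.join(path)}")
```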
Possible future directions include supporting more debuggers, such as dbx and xdb; script generators for other kinds of traces; more canned reports for general values; and a libgdb-based library sampler. PCT runs on almost any UNIX-like system and is available at http://pdos.lcs.mit.edu/pct.
ENGINEERING A DIFFERENCING AND COMPRESSION DATA FORMAT
David G. Korn and Kiem-Phong Vo, AT&T Laboratories – Research
This talk began with an equation: "Differencing + Compression = Delta Compression." Compression removes information redundancy in a data set; examples are gzip, bzip, compress, and pack. Differencing encodes the differences between two data sets; examples are diff -e, fdelta, bdiff, and xdelta. Delta compression compresses a data set given another one, combining differencing and compression, and reduces to pure compression when there is no commonality.
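The point that delta compression degenerates to plain compression when the inputs share nothing can be demonstrated with zlib's preset-dictionary feature (a minimal sketch using zlib as a stand-in; Vcdiff's actual encoding is entirely different, and the sample data is invented):

```python
import zlib

def delta_compress(target: bytes, source: bytes) -> bytes:
    """Compress `target` using `source` as a preset dictionary."""
    comp = zlib.compressobj(9, zlib.DEFLATED, 15, 9,
                            zlib.Z_DEFAULT_STRATEGY, zdict=source)
    return comp.compress(target) + comp.flush()

source = b"the quick brown fox jumps over the lazy dog " * 20
similar = source.replace(b"quick", b"swift")
unrelated = bytes(range(256)) * 4

# A similar target deltas down to less than plain compression...
assert len(delta_compress(similar, source)) < len(zlib.compress(similar, 9))
# ...while an unrelated target gains essentially nothing from the source.
print(len(delta_compress(unrelated, source)), len(zlib.compress(unrelated, 9)))
```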
After presenting an overall scheme for a delta compressor and showing delta compression performance, the talk discussed the encoding format of the newly designed Vcdiff for delta compression. A Vcdiff instruction code table consists of 256 entries, each coding up to a pair of instructions; instructions are encoded as 1-byte indices into the table plus any additional data.
The talk then showed Vcdiff's performance with Web data collected from CNN, compared with the W3C standard Gdiff, Gdiff+gzip, and diff+gzip, where Gdiff was computed using Vcdiff delta instructions. The results of two experiments, "First" and "Successive," were presented. In "First," each file is compressed against the first file collected; in "Successive," each file is compressed against the one from the previous hour. diff+gzip did not work well because diff is line-oriented. Vcdiff performed favorably compared with the other formats.
Vcdiff is part of the Vcodex package. Vcodex is a platform for all common data transformations: delta compression, plain compression, encryption, and transcoding (e.g., uuencode, base64). It is structured in three layers for maximum usability; the base library uses Disciplines and Methods interfaces.
The code for Vcdiff can be found at http://www.research.att.com/sw/tools. The authors are moving Vcdiff toward an IETF standard as a comprehensive platform for transforming data; please refer to http://www.ietf.org/internet-drafts/draft-korn-vcdiff-06.txt.
WHERE IN THE NET
Summarized by Amit Purohit
A PRECISE AND EFFICIENT EVALUATION OF THE PROXIMITY BETWEEN WEB CLIENTS AND THEIR LOCAL DNS SERVERS
Zhuoqing Morley Mao, University of California, Berkeley; Charles D. Cranor, Fred Douglis, Michael Rabinovich, Oliver Spatscheck, and Jia Wang, AT&T Labs – Research
Content Distribution Networks (CDNs) attempt to improve Web performance by delivering Web content to end users from servers located at the edge of the network. When a Web client requests content, the CDN dynamically chooses a server to route the request to, usually the
one that is nearest the client. As CDNs have access only to the IP address of the local DNS server (LDNS) of the client, the CDN's authoritative DNS server maps the client's LDNS to a geographic region within a particular network and combines that with network and server load information to perform CDN server selection.
This method has two limitations. First, it is based on the implicit assumption that clients are close to their LDNS, which may not always be valid. Second, a single request from an LDNS can represent a differing number of Web clients, called the hidden load factor. Knowledge of the hidden load factor can be used to achieve better load distribution.
The extent of the first limitation and its impact on CDN server selection were examined. To determine the associations between clients and their LDNS, a simple, non-intrusive, and efficient mapping technique was developed. The data collected was used to study the impact of proximity on DNS-based server selection, using four different proximity metrics. (1) Autonomous system (AS) clustering, observing whether a client is in the same AS as its LDNS, concluded that LDNS is very good for coarse-grained server selection, as 64% of the associations belong to the same AS. (2) Network clustering, observing whether a client is in the same network-aware cluster (NAC) as its LDNS, implied that DNS is less useful for finer-grained server selection, since only 16% of clients and LDNS are in the same NAC. (3) Traceroute divergence, the length of the divergent paths to the client and its LDNS from a probe point using traceroute, implies that most clients are topologically close to their LDNS as viewed from a randomly chosen probe site. (4) Round-trip time (RTT) correlation (some CDNs select servers based on RTT between the CDN server and the client's LDNS) examines the correlation between the message RTTs from a probe point to the client and to its local DNS; the results imply this is a reasonable metric to use to avoid really distant servers.
A study of the impact that client-LDNS associations have on DNS-based server selection concludes that knowing the client's IP address would allow more accurate server selection in a large number of cases. The optimality of the server selection also depends on server density, placement, and the selection algorithms.
Further information can be found at http://www.eecs.berkeley.edu/~zmao/myresearch.html or by contacting zmao@eecs.berkeley.edu.
GEOGRAPHIC PROPERTIES OF INTERNET ROUTING
Lakshminarayanan Subramanian, Venkata N. Padmanabhan, Microsoft Research; Randy H. Katz, University of California at Berkeley
Geographic information can provide insights into the structure and functioning of the Internet, including interactions between different autonomous systems, by analyzing certain properties of Internet routing. It can be used to measure and quantify routing properties such as circuitous routing, hot-potato routing, and geographic fault tolerance.
Traceroute was used to gather the required data, and the Geotrack tool was used to determine the location of the nodes along each network path. This enables the computation of "linearized distances," the sum of the geographic distances between successive pairs of routers along the path.
In order to measure the circuitousness of a path, a metric called the "distance ratio" is defined as the ratio of the linearized distance of a path to the geographic distance between the source and destination of the path. From the data, it has been observed that the circuitousness of a route depends on both the geographic and network locations of the end hosts. A large value of the distance ratio enables us to flag paths that are highly
circuitous, possibly (though not necessarily) because of routing anomalies. It is also shown that the minimum delay between end hosts is strongly correlated with the linearized distance of the path.
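The distance ratio can be computed directly from router coordinates along a path (a sketch; the great-circle helper and the sample coordinates are illustrative, not the paper's Geotrack data):

```python
from math import radians, sin, cos, asin, sqrt

def great_circle_km(a, b):
    """Haversine distance between two (lat, lon) points, in km."""
    lat1, lon1, lat2, lon2 = map(radians, (*a, *b))
    h = sin((lat2 - lat1) / 2) ** 2 + \
        cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371 * asin(sqrt(h))

def distance_ratio(path):
    """Linearized distance along the path divided by the direct
    source-to-destination distance; 1.0 means a geographically direct route."""
    linearized = sum(great_circle_km(p, q) for p, q in zip(path, path[1:]))
    return linearized / great_circle_km(path[0], path[-1])

# A hypothetical Seattle -> Chicago path detouring through San Francisco.
path = [(47.6, -122.3), (37.8, -122.4), (41.9, -87.6)]
print(f"{distance_ratio(path):.2f}")  # well above 1, flagging a circuitous route
```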
Geographic information can be used to study various aspects of wide-area Internet paths that traverse multiple ISPs. It was found that end-to-end Internet paths tend to be more circuitous than intra-ISP paths, the cause for this being the peering relationships between ISPs. Also, paths traversing substantial distances within two or more ISPs tend to be more circuitous than paths largely traversing only a single ISP. Another finding is that ISPs generally employ hot-potato routing.
Geographic information can also be used to capture the fact that two seemingly unrelated routers can be susceptible to correlated failures. By using the geographic information of routers, we can construct a geographic topology of an ISP, and using this we can find the tolerance of an ISP's network to the total network failure of a geographic region.
For further information, contact lakme@cs.berkeley.edu.
PROVIDING PROCESS ORIGIN INFORMATION TO AID IN NETWORK TRACEBACK
Florian P. Buchholz, Purdue University; Clay Shields, Georgetown University
Network traceback is currently limited because host audit systems do not maintain enough information to match incoming network traffic to outgoing network traffic. The talk presented an alternative: assigning origin information to every process and logging it during interactive login creation.
The current implementation concentrates mainly on interactive sessions, in which an event is logged when a new connection is established using SSH or Telnet. The method is effective and could successfully determine stepping stones and reliably detect the source of a DDoS attack. The speaker then talked about the related work done in this area,
which led to a discussion of the latest development in the area of "host causality."
The speaker ended with the limitations and future directions of the framework. The framework could be extended and could find many applications in the area of security. The current implementation doesn't handle startup scripts and cron jobs, but incorporating the origin information into the file system could solve this problem. In the current implementation, logging is implemented only as a proof of concept; making it safe in various ways could be another important aspect of future work.
PROGRAMMING
Summarized by Amit Purohit
CYCLONE: A SAFE DIALECT OF C
Trevor Jim, AT&T Labs – Research; Greg Morrisett, Dan Grossman, Michael Hicks, James Cheney, and Yanling Wang, Cornell University
Cyclone is designed to provide protection against attacks such as buffer overflows, format string attacks, and memory management errors. The current structure of C allows programmers to write vulnerable programs. Cyclone extends C so that it has the safety guarantees of Java while keeping C's syntax, types, and semantics largely untouched.
The Cyclone compiler performs static analysis of the source code and inserts runtime checks into the compiled output at places where the analysis cannot determine that the code execution will not violate safety constraints. Cyclone imposes many restrictions to preserve safety, such as NULL checks; these checks do not exist in normal C.
The speaker then talked about some sample code written in Cyclone and how it tackles safety problems. Porting an existing C application to Cyclone is fairly easy, with fewer than 10% of the lines requiring change. The current implementation concentrates more on safety than on performance, hence the performance penalty is substantial. Cyclone was able to find many lingering bugs. The final step of the project will be to write a compiler to convert a normal C program to an equivalent Cyclone program.
COOPERATIVE TASK MANAGEMENT WITHOUT MANUAL STACK MANAGEMENT
Atul Adya, Jon Howell, Marvin Theimer, William J. Bolosky, and John R. Douceur, Microsoft Research
The speaker described the definitions and motivations behind the distinct concepts of task management, stack management, I/O management, conflict management, and data partitioning. Conventional concurrent programming uses preemptive task management and exploits the automatic stack management of a standard language. In the second approach, cooperative tasks are organized as event handlers that yield control by returning control to the event scheduler, manually unrolling their stacks. In this project, the authors adopted a hybrid approach that makes it possible for both stack management styles to coexist; programmers can code assuming one or the other of these stack management styles will operate, depending upon the application. The speaker also gave a detailed example of how to use their mechanism to switch between the two styles.
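The contrast between the two styles can be sketched with Python generators, which keep their local state automatically yet yield control cooperatively (an illustrative analogy only, not the paper's C mechanism; the scheduler and task bodies are invented):

```python
def task(name, steps):
    """A cooperative task: each `yield` returns control to the scheduler,
    but local state (the loop counter) survives across yields."""
    for i in range(steps):
        print(f"{name} step {i}")
        yield  # cooperative yield point

def run(tasks):
    """A minimal round-robin event scheduler."""
    while tasks:
        t = tasks.pop(0)
        try:
            next(t)
            tasks.append(t)       # not finished: reschedule
        except StopIteration:
            pass                  # task completed

run([task("A", 2), task("B", 1)])
# A step 0 / B step 0 / A step 1
```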
The talk then continued with the implementation. The authors were able to prevent many subtle concurrency problems by using cooperative task management; paying a cost up front to reduce subtle race conditions proved to be a good investment.
Though the choice of task management is fundamental, the choice of stack management can be left to individual taste. This project enables the use of any type of stack management in conjunction with cooperative task management.
IMPROVING WAIT-FREE ALGORITHMS FOR INTERPROCESS COMMUNICATION IN EMBEDDED REAL-TIME SYSTEMS
Hai Huang, Padmanabhan Pillai, and Kang G. Shin, University of Michigan
The main characteristic of a real-time, time-sensitive system is its predictable response time. But concurrency management creates hurdles to achieving this goal because of the use of locks to maintain consistency. To solve this problem, many wait-free algorithms have been developed, but these are typically high-cost solutions. By taking advantage of the temporal characteristics of the system, however, the time and space overhead can be reduced.
The speaker presented an algorithm for temporal concurrency control and applied this technique to improve three wait-free algorithms. A single-writer multiple-reader wait-free algorithm and a double-buffer algorithm were proposed.
Using their transformation mechanism, the authors achieved a 17–66% improvement in ACET and a 14–70% reduction in memory requirements for IPC algorithms. This mechanism is extensible and can be applied to other non-blocking IPC algorithms as well. Future work involves reducing the synchronization overheads in more general IPC algorithms with multiple-writer semantics.
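A classic single-writer double-buffer exchange, the kind of algorithm these transformations improve, can be sketched as follows (a simplified single-process illustration with invented names; a real implementation needs atomic operations and memory barriers):

```python
class DoubleBuffer:
    """Single-writer, multiple-reader exchange without locks:
    the writer fills the inactive buffer, then flips an index."""
    def __init__(self, initial):
        self.buffers = [initial, None]
        self.active = 0  # index readers should read; flipped to publish

    def write(self, value):
        inactive = 1 - self.active
        self.buffers[inactive] = value   # never touches what readers see
        self.active = inactive           # publish (atomic store in real code)

    def read(self):
        return self.buffers[self.active]

db = DoubleBuffer({"rpm": 0})
db.write({"rpm": 900})
print(db.read())  # {'rpm': 900}
```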
MOBILITY
Summarized by Praveen Yalagandula
ROBUST POSITIONING ALGORITHMS FOR DISTRIBUTED AD-HOC WIRELESS SENSOR NETWORKS
Chris Savarese and Jan Rabaey, University of California, Berkeley; Koen Langendoen, Delft University of Technology
The Pico Radio Network comprises more than a hundred sensors, monitors, and actuators equipped with wireless connectivity, with the following properties: (1) no infrastructure; (2) computation in a distributed fashion instead of centralized computation; (3) dynamic topology; (4) limited radio range;
(5) nodes that can act as repeaters; (6) devices of low cost and with low power; and (7) sparse anchor nodes, i.e., nodes with GPS capability. A positioning algorithm determines the geographical position of each node in the network.
In two-dimensional space, each node needs three reference positions to estimate its geographical position. Two problems make positioning difficult in the Pico Radio Network setting: (1) a sparse anchor node problem, and (2) a range error problem.
The two-phase approach that the authors have taken in solving the problem consists of (1) a hop-terrain algorithm, in which each node roughly estimates its location from distances to the anchor points computed over multiple hops, and (2) a refinement algorithm, in which each node uses its neighbors' positions to refine its own position estimate. To guarantee convergence, this approach uses confidence metrics, assigning a value of 1.0 to anchor nodes and a starting value of 0.1 to other nodes, and increasing these with each iteration.
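The refinement phase is essentially iterative least-squares trilateration against neighbors' estimated positions (a toy sketch with made-up coordinates and ranges; the real algorithm also weights neighbors by their confidence values):

```python
def refine(pos, neighbors, step=0.3, iters=200):
    """Nudge `pos` so its distances to neighbors match the measured ranges.
    `neighbors` is a list of ((x, y), measured_range) pairs."""
    x, y = pos
    for _ in range(iters):
        gx = gy = 0.0
        for (nx, ny), r in neighbors:
            dx, dy = x - nx, y - ny
            d = (dx * dx + dy * dy) ** 0.5 or 1e-9
            err = d - r                    # positive: too far from neighbor
            gx += err * dx / d             # gradient of squared range error
            gy += err * dy / d
        x -= step * gx
        y -= step * gy
    return x, y

# True position (3, 4); noise-free ranges measured to three anchors.
anchors = [((0.0, 0.0), 5.0), ((6.0, 0.0), 5.0), ((0.0, 8.0), 5.0)]
print(refine((1.0, 1.0), anchors))  # converges near (3, 4)
```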
A simulation tool, OMNeT++, was used for both phases in various scenarios. The results show that they achieved position errors of less than 33% in a scenario with 5% anchor nodes, an average connectivity of 7, and 5% range measurement error.
Guidelines for anchor node deployment are high connectivity (>10), a reasonable fraction of anchor nodes (>5%), and, for anchor placement, covered edges.
APPLICATION-SPECIFIC NETWORK MANAGEMENT FOR ENERGY-AWARE STREAMING OF POPULAR MULTIMEDIA FORMATS
Surendar Chandra, University of Georgia, and Amin Vahdat, Duke University
The main hindrance to supporting the increasing demand for mobile multimedia on PDAs is the battery capacity of these small devices. The idle-time power consumption is almost the same as the receive power consumption on typical wireless network cards (1319mW vs. 1425mW), while the sleep state consumes far less power (177mW). To let the system consume energy proportional to the stream quality, the network card should transition to the sleep state aggressively between packets.
The speaker presented previous work in which different multimedia streams for PDAs were studied: MS Media, Real Media, and QuickTime. It was found that if the inter-packet gap is predictable, then huge savings are possible, for example, about 80% savings in MS Media streams with few packet losses. The limitation of the IEEE 802.11b power-save mode comes into play when there are two or more nodes, producing a delay between the beacon time and when the node receives or sends packets. This badly affects the streams, since higher energy is consumed while waiting.
The authors propose traffic shaping for energy conservation, done by proxy such that packets arrive at regular intervals. This is achieved using a local proxy in the access point and a client-side proxy in the mobile host. Simulations show that traffic shaping reduces energy consumption; they also reveal that higher-bandwidth streams have a lower energy metric (mJ/kB).
More information is at http://greenhouse.cs.uga.edu.
CHARACTERIZING ALERT AND BROWSE SERVICES OF MOBILE CLIENTS
Atul Adya, Paramvir Bahl, and Lili Qiu, Microsoft Research
Even though there has been a dramatic increase in Internet access from wireless devices, not many studies have characterized this traffic. In this paper, the authors characterize the traffic observed on the MSN Web site with both notification and browse traces.
Around 33 million browsing accesses and about 325 million notification entries are present in the traces. Three types of analyses were done for each of these two services: content analysis, concerning the most popular content categories and their distribution; popularity analysis, or the popularity distribution of documents; and user behavior analysis.
The analysis of the notification logs shows that document access rates follow a Zipf-like distribution, with most of the accesses concentrated on a small number of messages, and that the accesses exhibit geographical locality: users from the same locality tend to receive similar notification content. The browser log analysis shows that a small set of URLs is accessed most of the time, though the access pattern does not fit any Zipf curve, and that the highly accessed URLs remain stable. A correlation study between the notification and browsing services shows that wireless users have a moderate correlation of 0.12.
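Whether access counts are "Zipf-like" is usually judged from the slope of the rank-frequency plot in log-log space (a sketch with synthetic counts; real traces need more care with ties and the distribution tail):

```python
from math import log

def loglog_slope(counts):
    """Least-squares slope of log(count) vs. log(rank); a Zipf-like
    distribution gives a roughly straight line with slope near -1."""
    counts = sorted(counts, reverse=True)
    xs = [log(rank) for rank in range(1, len(counts) + 1)]
    ys = [log(c) for c in counts]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

# Synthetic accesses drawn exactly from a Zipf law: count ~ 1/rank.
zipf_counts = [round(10000 / r) for r in range(1, 201)]
print(f"{loglog_slope(zipf_counts):.2f}")  # close to -1
```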
FREENIX TRACK SESSIONS
BUILDING APPLICATIONS
Summarized by Matt Butner
INTERACTIVE 3-D GRAPHICS APPLICATIONS FOR TCL
Oliver Kersting and Jürgen Döllner, Hasso Plattner Institute for Software Systems Engineering, University of Potsdam
The integration of 3-D image rendering functionality into a scripting language permits interactive and animated 3-D development and application without the formalities and precision demanded by low-level C/C++ graphics and visualization libraries. The large and complex C++ API of the Virtual Rendering System (VRS) can be combined with the conveniences of the Tcl scripting language. The mapping of class interfaces is done via an automated process that generates respective wrapper classes, all of which ensures complete API accessibility and functionality without any significant performance penalty. The general C++-to-Tcl mapper SWIG grants the necessary object and hierarchical abilities without the use of object-oriented Tcl extensions. The final API mapping techniques address C++ features such as classes, overloaded methods, enumerations, and inheritance relations. All are implemented in a proof of concept that maps the complete C++ API of the VRS to Tcl and are showcased in a complete interactive 3-D map system.
THE AGFL GRAMMAR WORK LAB
Cornelis H.A. Koster and Erik Verbruggen, University of Nijmegen (KUN)
The growth and implementation of Natural Language Processing (NLP) is the cornerstone of the continued evolution and implementation of truly intelligent search machines and services. In part due to the growing collections of computer-stored, human-readable documents in the public domain, the implementation of linguistic analysis will become necessary for desirable precision and resolution. Subsequently, such tools and linguistic resources must be released into the public domain, and the authors have done so with the release of the AGFL Grammar Work Lab under the GNU Public License, as a tool for linguistic research and the development of NLP-based applications.
The AGFL (Affix Grammars over a Finite Lattice) Grammar Work Lab meshes context-free grammars with finite set-valued features that are acceptable to a range of languages. In computer science terms, "Syntax rules are procedures with parameters and a non-deterministic execution." The English Phrases for Information Retrieval (EP4IR), released with the AGFL-GWL as a robust grammar of English, is an AGFL-GWL-generated English parser that outputs "Head/Modifier" frames. The sentences "CompanyX sponsored this conference" and "This conference was sponsored by CompanyX" both generate [CompanyX [sponsored conference]]. The hope is that public availability of such tools will encourage further development of grammar and lexical software systems.
SWILL: A SIMPLE EMBEDDED WEB SERVER LIBRARY
Sotiria Lampoudi and David M. Beazley, University of Chicago
This paper won the FREENIX Best Student Paper award.
SWILL (Simple Web Interface Link Library) is a simple Web server in the form of a C library whose development was motivated by a wish to give cool applications an interface to the Web. The SWILL library provides a simple interface that can be efficiently implemented for tasks that vary from flexible Web-based monitoring to software debugging and diagnostics. Though originally designed to be integrated with high-performance scientific simulation software, the interface is generic enough to allow for unbounded uses. SWILL is a single-threaded server relying upon non-blocking I/O through the creation of a temporary server which services I/O requests. SWILL does not provide SSL or cryptographic authentication, but it does have HTTP authentication abilities.
A fantastic feature of SWILL is its support for SPMD-style parallel applications which utilize MPI, proving valuable for Beowulf clusters and large parallel machines. Another practical application was the implementation of SWILL in a modified Yalnix emulator used by University of Chicago operating systems courses, which utilized the added Yalnix functionality for OS development and debugging. SWILL requires minimal memory overhead and relies upon the HTTP/1.0 protocol.
NETWORK PERFORMANCE
Summarized by Florian Buchholz
LINUX NFS CLIENT WRITE PERFORMANCE
Chuck Lever, Network Appliance; Peter Honeyman, CITI, University of Michigan
Lever introduced a benchmark to measure NFS client write performance. Client performance is difficult to measure due to hindrances such as poor hardware or bandwidth limitations. Furthermore, measuring application performance does not identify weaknesses specifically at the client side. Thus, a benchmark was developed that tries to exercise only data transfers in one direction, between server and application. For this purpose, the benchmark was based on the block-sequential-write portion of the Bonnie file system benchmark. Once a benchmark for NFS clients is established, it can be used to improve client performance.
The performance measurements were performed with an SMP Linux client and both a Linux NFS server and a Network Appliance F85 filer. During testing, a periodic jump in write latency was discovered. This was due to a rather large number of pending write operations that were scheduled to be written after certain threshold values were exceeded. By introducing a separate daemon that flushes the cached write requests, the spikes could be eliminated, but as a result the average latency grew over time. The problem was traced to a function that scans a linked list of write requests. After a hashtable was added to improve lookup performance, the latency improved considerably.
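The fix, replacing a linear scan of pending requests with a hashed lookup, can be sketched generically (illustrative Python, not the actual kernel patch; the request structure and offsets are invented):

```python
class PendingWrites:
    """Pending write requests indexed by page offset.
    A linked-list version must scan all entries; a hash makes lookup O(1)."""
    def __init__(self):
        self.by_offset = {}          # offset -> request data

    def add(self, offset, data):
        self.by_offset[offset] = data

    def find(self, offset):
        return self.by_offset.get(offset)  # one probe instead of a scan

def find_by_scan(requests, offset):
    """The original approach: walk the whole list until a match."""
    for off, data in requests:
        if off == offset:
            return data
    return None

pw = PendingWrites()
pw.add(4096, b"dirty page")
print(pw.find(4096))  # b'dirty page'
```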
The improved client was then used to measure throughput against the two servers. A discrepancy between the Linux server and the filer tests was noticed, and the reason for that was traced back to a global kernel lock that was unnecessarily held when accessing the network stack. After correcting this, performance improved further. However,
the measurements showed that a client may run slower when paired with fast servers on fast networks. This is due to heavy client interrupt loads, more network processing on the client side, and more global kernel lock contention.
The source code of the project is available at http://www.citi.umich.edu/projects/nfs-perf/patches.
A STUDY OF THE RELATIVE COSTS OF NETWORK SECURITY PROTOCOLS
Stefan Miltchev and Sotiris Ioannidis, University of Pennsylvania; Angelos Keromytis, Columbia University
With the increasing need for security and integrity of remote network services, it becomes important to quantify the communication overhead of IPSec and compare it to alternatives such as SSH, SCP, and HTTPS.
For this purpose, the authors set up three testing networks: a direct link, two hosts separated by two gateways, and three hosts connecting through one gateway. Protocols were compared in each setup, and manual keying was used to eliminate connection setup costs. For IPSec, the different encryption algorithms AES, DES, 3DES, hardware DES, and hardware 3DES were used. In detail, FTP was compared to SFTP, SCP, and FTP over IPSec; HTTP to HTTPS and HTTP over IPSec; and NFS to NFS over IPSec and local disk performance.
The result of the measurements was that IPSec outperforms the other popular encryption schemes. Overall, unencrypted communication was fastest, but in some cases, like FTP, the overhead can be small. The use of crypto hardware can significantly improve performance. For future work, the inclusion of setup costs and of hardware-accelerated SSL, SFTP, and SSH was mentioned.
CONGESTION CONTROL IN LINUX TCP
Pasi Sarolahti, University of Helsinki; Alexey Kuznetsov, Institute for Nuclear Research at Moscow
Having attended the talk and read the paper, I am still unclear about whether the authors are merely describing the design decisions of TCP congestion control or whether they actually created that part of the Linux code; my guess leans toward the former.
In the talk, the speaker compared the TCP congestion control measures specified by the IETF in RFCs with the actual Linux implementation, which conforms to the basic principles but nevertheless has differences. A specific emphasis was placed on retransmission mechanisms and the congestion window. Also, several TCP enhancements, the NewReno algorithm, Selective ACKs (SACK), and Forward ACKs (FACK), were discussed and compared.
In some instances, Linux does not conform to the IETF specifications. Fast recovery does not fully follow RFC 2582, since the threshold for triggering retransmit is adjusted dynamically and the congestion window's size is not changed. Also, the round-trip-time estimator and the RTO calculation differ from RFC 2988, since Linux uses more conservative RTT estimates and a minimum RTO of 200ms. The performance measurements showed that with the additional Linux-specific features enabled, slightly higher throughput, more steady data flow, and fewer unnecessary retransmissions can be achieved.
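The standard RTO calculation that Linux adapts looks like this (a sketch of the RFC 2988 SRTT/RTTVAR estimator with a Linux-style 200ms floor; the sample RTTs are invented, and the RFC itself specifies a 1-second floor):

```python
def make_rto_estimator(min_rto=0.2, k=4, alpha=1/8, beta=1/4):
    """RFC 2988 estimator returning RTO = SRTT + k*RTTVAR, clamped
    to a 200ms floor as Linux does (the RFC uses a 1s floor)."""
    srtt = rttvar = None
    def update(rtt):
        nonlocal srtt, rttvar
        if srtt is None:
            srtt, rttvar = rtt, rtt / 2        # first measurement
        else:
            rttvar = (1 - beta) * rttvar + beta * abs(srtt - rtt)
            srtt = (1 - alpha) * srtt + alpha * rtt
        return max(min_rto, srtt + k * rttvar)
    return update

rto = make_rto_estimator()
for sample in (0.010, 0.012, 0.011):   # a fast LAN: RTTs around 10ms
    last = rto(sample)
print(last)  # clamped at the 200ms floor: 0.2
```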
XTREME XCITEMENT
Summarized by Steve Bauer
THE FUTURE IS COMING: WHERE THE X WINDOW SYSTEM SHOULD GO
Jim Gettys, Compaq Computer Corp.
Jim Gettys, one of the principal authors of the X Window System, outlined the near-term objectives for the system, primarily focusing on the changes and infrastructure required to enable replication and migration of X applications. Providing better support for this functionality would enable users to retrieve or duplicate X applications between their servers at home and work.
One interesting example of an application that currently is capable of migration and replication is Emacs. To create a new frame on DISPLAY, try "M-x make-frame-on-display <Return> DISPLAY <Return>".
However, technical challenges make replication and migration difficult in general. These include the "major headaches" of server-side fonts, the nonuniformity of X servers and screen sizes, and the need to appropriately retrofit toolkits. Authentication and authorization issues are obviously also important. The rest of the talk delved into some of the details of these interesting technical challenges.
HACKING IN THE KERNEL
Summarized by Hai Huang
AN IMPLEMENTATION OF SCHEDULER
ACTIVATIONS ON THE NETBSD OPERATING
SYSTEM
Nathan J. Williams, Wasabi Systems
Scheduler activation is an old idea. Basically, there are benefits and drawbacks to using solely kernel-level or user-level threading; scheduler activations combine the two layers of control to provide more concurrency in the system.
In his talk, Nathan gave a fairly detailed description of the implementation of scheduler activations in the NetBSD kernel. One important change in the implementation is to differentiate the thread context from the process context. This is done by defining a separate data structure for these threads and relocating some of the information that was embedded in the process context to these thread contexts. The stack was a particular concern because of the upcall: special handling must be done to make sure that the upcall doesn't corrupt the stack, so that the preempted user-level thread can continue afterwards. Lastly, Nathan explained that signals are handled by upcalls.
AUTHORIZATION AND CHARGING IN PUBLIC
WLANS USING FREEBSD AND 802.1X
Pekka Nikander, Ericsson Research NomadicLab
The 802.1X standards are well known in the wireless community as link-layer authentication protocols. In this talk, Pekka explained some novel ways of using the 802.1X protocols that might be of interest to people on the move. It is possible to set up a public WLAN that supports various charging schemes via virtual tokens, which people can purchase or earn and later use.
This is implemented on FreeBSD using the netgraph facility. It is basically a filter in the link layer that differentiates traffic based on the MAC address of the client node, which is either authenticated, denied, or let through. The overhead for this service is fairly minimal.
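The core of such a link-layer filter is a per-client decision keyed on the MAC address. The following Python sketch only conveys the idea; the real system is a kernel netgraph node, and the state names here are illustrative, not taken from the paper.

```python
def make_mac_filter(authenticated, denied):
    # Per-client state keyed on the link-layer (MAC) address, in the
    # spirit of the netgraph-based filter described in the talk.
    # `authenticated` and `denied` are sets of MAC-address strings.
    def filter_frame(src_mac):
        if src_mac in denied:
            return "drop"
        if src_mac in authenticated:
            return "pass"
        # Unknown clients are steered to the authenticator so an
        # 802.1X exchange (or token purchase) can take place.
        return "redirect-to-authenticator"
    return filter_frame
```

Because the check is a single set lookup per frame, the per-packet overhead stays small, consistent with the "fairly minimal" overhead reported.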
ACPI IMPLEMENTATION ON FREEBSD
Takanori Watanabe, Kobe University
ACPI (Advanced Configuration and Power Interface) was proposed as a joint effort by Intel, Toshiba, and Microsoft to provide a standard, finer-grained method of managing the power states of individual devices within a system. Such low-level power management is especially important for mobile and embedded systems that run on fixed-capacity batteries.
Takanori explained that the ACPI specification is composed of three parts: tables, BIOS, and registers. He has implemented some of the functionality of ACPI in the FreeBSD kernel. The ACPI Component Architecture, implemented by Intel, provides a high-level ACPI API to the operating system; Takanori's ACPI implementation is built upon this underlying layer of APIs.
ACCESS CONTROL
Summarized by Florian Buchholz
DESIGN AND PERFORMANCE OF THE
OPENBSD STATEFUL PACKET FILTER (PF)
Daniel Hartmeier, Systor AG
Daniel Hartmeier described the new stateful packet filter (pf) that replaced IPFilter in the OpenBSD 3.0 release. IPFilter could no longer be included due to licensing issues, so there was a need to write a new filter that makes use of optimized data structures.
The filter rules are implemented as a linked list which is traversed from start to end. Two actions may be taken according to the rules, "pass" or "block": a "pass" action forwards the packet, and a "block" action drops it. Where more than one rule matches a packet, the last rule wins. Rules that are marked as "final" immediately terminate any further rule evaluation. An optimization called "skip-steps" was also implemented, where blocks of similar rules are skipped if they cannot match the current packet; these skip-steps are calculated when the rule set is loaded. Furthermore, a state table keeps track of TCP connections, and only packets that match the sequence numbers are allowed. UDP and ICMP queries and replies are also considered in the state table, where initial packets create an entry for a pseudo-connection with a low initial timeout value. The state table is implemented using a balanced binary search tree. NAT mappings are also stored in the state table, whereas application proxies reside in user space. The packet filter is also able to perform fragment reassembly and to modulate TCP sequence numbers to protect hosts behind the firewall.
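The last-match-wins scan with "final" termination can be sketched as follows. This is a conceptual Python model of the evaluation order only, assuming a default-pass policy for illustration; it ignores skip-steps, the state table, and everything else that makes pf fast.

```python
def evaluate(rules, packet):
    # pf-style evaluation: rules are scanned in order, the LAST matching
    # rule decides the action, and a rule marked "final" stops the scan
    # immediately.  Each rule is (match_predicate, action, final).
    decision = "pass"  # illustrative default policy
    for match, action, final in rules:
        if match(packet):
            decision = action
            if final:
                break
    return decision
```

A "final" rule therefore wins even when a later rule would also match, which is exactly why skip-steps over rules that cannot match pay off: most of the list is never touched.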
Pf was compared against IPFilter as well as Linux's iptables by measuring throughput and latency with increasing traffic rates and different packet sizes. In a test with a fixed rule-set size of 100, iptables outperformed the other two filters, whose results were close to each other. In a second test, where the rule-set size was continually increased, iptables consistently had about twice the throughput of the other two (which evaluate the rule set on both the incoming and outgoing interfaces). A third test compared only pf and IPFilter, using a single rule that created state in the state table, with a fixed state-entry size; pf reached an overloaded state much later than IPFilter. The experiment was repeated with a variable state-entry size, and pf performed much better than IPFilter for a small number of states.
In general, rule-set evaluation is expensive, and benchmarks only reflect extreme cases, whereas in real life other behavior should be observed. Furthermore, the benchmarks show that stateful filtering can actually improve performance, since state-table lookups are cheap compared to rule evaluation.
ENHANCING NFS CROSS-ADMINISTRATIVE
DOMAIN ACCESS
Joseph Spadavecchia and Erez Zadok, Stony Brook University
The speaker presented modifications to an NFS server that allow improved NFS access between administrative domains. The problem lies in the fact that NFS assumes a shared UID/GID space, which makes it unsafe to export files outside the administrative domain of the server. The design goals were to leave the protocol and existing clients unchanged, to require a minimal amount of server changes, and to provide flexibility and increased performance.
To solve the problem, two techniques are utilized: "range-mapping" and "file-cloaking." Range-mapping maps IDs between client and server. The mapping is performed on a per-export basis and has to be set up manually in an export file. The mappings can be 1-1, N-N, or N-1. In file-cloaking, the server restricts file access based on UID/GID and special cloaking-mask bits; users can only access files according to their own permissions. The policy on whether or not the file is visible to others is set by the cloaking mask, which is logically ANDed with the file's protection bits. File-cloaking only works, however, if the client doesn't hold cached copies of directory contents and file attributes. Because of this, the clients are forced to re-read directories: the server increments the directory's mtime value each time it is listed.
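The per-export ID translation can be pictured as a small table of ranges. The sketch below is an illustrative Python model of 1-1 range-mapping only (the N-N and N-1 cases and the export-file syntax are the paper's, not shown here), with made-up ID values.

```python
def make_range_map(mappings):
    # mappings: list of (client_base, server_base, length) tuples for
    # one export, mirroring the manually configured ranges described in
    # the talk.  Client ID c in [client_base, client_base+length) maps
    # to server ID server_base + (c - client_base).
    def to_server_id(client_id):
        for cbase, sbase, length in mappings:
            if cbase <= client_id < cbase + length:
                return sbase + (client_id - cbase)
        return None  # unmapped IDs could, e.g., be squashed to "nobody"
    return to_server_id
```

Because lookup is per-export and per-ID, the same client UID can map to different server UIDs on different exports, which is what makes cross-domain exporting safe.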
To test the performance of the modified server, five different NFS configurations were evaluated: an unmodified NFS server was compared against one with the modified code included but not used, one with only range-mapping enabled, one with only file-cloaking enabled, and one with all modifications enabled. For each setup, different file-system benchmarks were run. The results show only a small overhead when the modifications are used, generally an increase of below 5 percent. Another experiment tested the performance of the system with an increasing number of mapped or cloaked entries; an increase from 10 to 1,000 entries resulted in a maximum performance cost of about 14 percent.
One member of the audience pointed out that if clients choose to ignore the changed mtimes from the server, and thus still hold caches of the directory entries, the file-cloaking mechanism could be defeated. After a rather lengthy debate, the speaker had to concede that the model doesn't add any extra security. Another question was asked about the scalability of setting up range mappings; the speaker referred to application-level tools that could be developed for that purpose.
The software is available at ftp://ftp.fsl.cs.sunysb.edu/pub/enf.
ENGINEERING OPEN SOURCE
SOFTWARE
Summarized by Teri Lampoudi
NINGAUI: A LINUX CLUSTER FOR BUSINESS
Andrew Hume, AT&T Labs - Research; Scott Daniels, Electronic Data Systems Corp.
Ningaui is a general-purpose, highly available, resilient architecture built from commodity software and hardware. Emphatically, however, Ningaui is not a Beowulf. Hume calls the cluster design the "Swiss canton model," in which there are a number of loosely affiliated, independent nodes with data replicated among them. Jobs are assigned by bidding and leases, and cluster services, done as session-based servers, are sited via generic job assignment. The emphasis is on keeping the architecture end-to-end, checking all work via checksums, and logging everything. The resilience and high availability required by their goal of 8-to-5 maintenance (vs. the typical 24/7 model, where people get paged whenever the slightest thing goes wrong, regardless of the time of day) are achieved by job restartability. Finally, all computation is performed on local data, without the use of NFS or network-attached storage.
Hume's message is a hopeful one: despite the many problems encountered (things like kernel and service limits, auto-installation problems, TCP storms, and the like), the existence of source code and the paranoid practice of logging and checksumming everything has helped. The final product performs reasonably well, and it appears that the resilience techniques employed do make a difference. One drawback, however, is that the software mentioned in the paper is not yet available for download.
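The "checksum everything" discipline is easy to illustrate. The sketch below hashes a transfer as a stream of chunks, the way one would checksum a large flat file moved between nodes; it is a generic MD5 illustration, not Ningaui's actual transfer code.

```python
import hashlib

def md5_stream(chunks):
    # Incrementally checksum a transfer; `chunks` is any iterable of
    # bytes, so arbitrarily large files never need to fit in memory.
    h = hashlib.md5()
    for block in chunks:
        h.update(block)
    return h.hexdigest()

def transfer_ok(sent_chunks, received_chunks):
    # End-to-end check: the receiver's digest must match the sender's,
    # so corruption anywhere along the path is detected.
    return md5_stream(sent_chunks) == md5_stream(received_chunks)
```

Checking the digest at the endpoints, rather than trusting the transport, is what makes the "repeat the failed task until it completes properly" recovery strategy safe.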
CPCMS A CONFIGURATION MANAGEMENT
SYSTEM BASED ON CRYPTOGRAPHIC NAMES
Jonathan S. Shapiro and John Vanderburgh, Johns Hopkins University
This paper won the FREENIX Best Paper award.
The basic notion behind the project is the fact that everyone has a pet complaint about CVS, and yet it is currently the configuration manager in most widespread use. Shapiro has unleashed an alternative. Interestingly, he did not begin but ended with the usual host of reasons why CVS is bad. The talk instead began abruptly by characterizing the job of a software configuration manager, continued by stating the namespaces it must handle, and wrapped up with the challenges faced.
X MEETS Z VERIFYING CORRECTNESS IN THE
PRESENCE OF POSIX THREADS
Bart Massey, Portland State University; Robert T. Bauer, Rational Software Corp.
Massey delivered a humorous talk on the insight gained from applying the Z formal specification notation to system software design, rather than the more informal analysis and design process normally used.
The story is told with respect to writing XCB, which replaces the Xlib protocol layer and is supposed to be thread-friendly. But where threads are concerned, deadlock avoidance becomes a hard problem that cannot be solved in an ad-hoc manner, while full model checking is too hard. In this case, Massey resorted to a Z specification to model the XCB lower layer, abstract away locking and data transfers, and locate fundamental issues. Essentially, the difficulties of searching the literature and locating information relevant to the problem at hand were overcome. As Massey put it, "the formal method saved the day."
FILE SYSTEMS
Summarized by Bosko Milekic
PLANNED EXTENSIONS TO THE LINUX
EXT2/EXT3 FILESYSTEM
Theodore Y. Ts'o, IBM; Stephen Tweedie, Red Hat
The speaker presented improvements to the Linux Ext2 file system, with the goal of allowing for various expansions while striving to maintain compatibility with older code. Improvements have been facilitated by a few extra superblock fields that were added to Ext2 not long before the Linux 2.0 kernel was released. The fields allow file-system features to be added without compromising the existing setup; this is done by providing bits indicating the impact of the added features with respect to compatibility. Namely, a file system with the "incompat" bit marked is not allowed to be mounted. Similarly, a "read-only" marking would only allow the file system to be mounted read-only.
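The mount-time policy implied by these bits can be sketched as follows. This is an illustrative Python model of the decision logic only; the feature names used in the test are made up, and the real kernel works on bitmask fields in the superblock rather than sets of strings.

```python
COMPAT, RO_COMPAT, INCOMPAT = "compat", "ro_compat", "incompat"

def can_mount(kernel_features, fs_features, read_only):
    # Ext2-style feature policy: a feature the kernel does not know
    # about is fatal if flagged "incompat", permits only a read-only
    # mount if flagged "ro_compat", and is harmless if merely "compat".
    for kind in (COMPAT, RO_COMPAT, INCOMPAT):
        unknown = fs_features.get(kind, set()) - kernel_features.get(kind, set())
        if kind == INCOMPAT and unknown:
            return False
        if kind == RO_COMPAT and unknown and not read_only:
            return False
    return True
```

This three-way classification is what lets new on-disk features ship without breaking old kernels: an old kernel safely refuses (or degrades to read-only) instead of corrupting a file system it does not fully understand.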
Directory indexing replaces linear directory searches with a faster search using a fixed-depth tree and hashed keys. File-system size can be dynamically increased, and the expanded inode, doubled from 128 to 256 bytes, allows for more extensions.
Other potential improvements were discussed as well, in particular preallocation for contiguous files, which allows for better performance in certain setups by preallocating contiguous blocks. Security-related modifications, extended attributes, and ACLs were also mentioned. An implementation of these features already exists but has not yet been merged into the mainline Ext2/3 code.
RECENT FILESYSTEM OPTIMISATIONS ON
FREEBSD
Ian Dowse, Corvil Networks; David Malone, CNRI, Dublin Institute of Technology
David Malone presented four important file-system optimizations for the FreeBSD OS: soft updates, dirpref, vmiodir, and dirhash. It turns out that certain combinations of the optimizations (beautifully illustrated in the paper) may yield performance improvements of anywhere between 2 and 10 orders of magnitude for real-world applications.
All four of the optimizations deal with file-system metadata. Soft updates allow for asynchronous metadata updates. Dirpref changes the way directories are organized, attempting to place child directories closer to their parents, thereby increasing locality of reference and reducing disk-seek times. Vmiodir trades some extra memory for better directory caching. Finally, dirhash, which was implemented by Ian Dowse, changes the way in which entries are searched in a directory. Specifically, FFS uses a linear search to find an entry by name; dirhash instead builds a hash table of directory entries on the fly. For a directory of size n with a working set of m files, a search that in certain cases could have been O(n*m) has been reduced, thanks to dirhash, to effectively O(n + m).
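The complexity argument is easy to see in miniature. The sketch below contrasts the two lookup strategies in Python; it models the idea only, not the FFS on-disk layout or the kernel's actual hash function.

```python
def linear_lookup(entries, name):
    # FFS-style scan: every lookup walks the directory, O(n) each time,
    # so m lookups over an n-entry directory cost O(n * m).
    for i, entry in enumerate(entries):
        if entry == name:
            return i
    return -1

def build_dirhash(entries):
    # dirhash-style: one O(n) pass builds a table the first time the
    # directory is touched; each later lookup is O(1), so m lookups
    # cost O(n + m) overall.
    return {entry: i for i, entry in enumerate(entries)}
```

The table is built lazily ("on the fly"), so directories that are never searched repeatedly pay nothing beyond the first scan.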
FILESYSTEM PERFORMANCE AND SCALABILITY
IN LINUX 2.4.17
Ray Bryant, SGI; Ruth Forester, IBM LTC; John Hawkes, SGI
This talk focused on performance evaluation of a number of file systems available and commonly deployed on Linux machines. Comparisons under various configurations of Ext2, Ext3, ReiserFS, XFS, and JFS were presented.
The benchmarks chosen for the data gathering were pgmeter, filemark, and the AIM Benchmark Suite VII. Pgmeter measures the rate of data transfer of reads/writes of a file under a synthetic workload. Filemark is similar to postmark in that it is an operation-intensive benchmark, although filemark is threaded and offers various other features that postmark lacks. AIM VII measures performance for various file-system-related functionalities; it offers various metrics under an imposed workload, thus stressing the performance of the file system not only under I/O load but also under significant CPU load.
Tests were run on three different setups: a small, a medium, and a large configuration. ReiserFS and Ext2 appear at the top of the pile for the smaller and medium setups. Notably, XFS and JFS perform worse for smaller system configurations than the others, although XFS clearly appears to generate better numbers than JFS. It should be noted that XFS seems to scale well under a higher load; this was most evident in the large-system results, where XFS appears to offer the best overall results.
THINGS TO THINK ABOUT
Summarized by Bosko Milekic
SPEEDING UP KERNEL SCHEDULER BY REDUCING
CACHE MISSES
Shuji Yamamura, Akira Hirai, Mitsuru Sato, Masao Yamamoto, Akira Naruse, and Kouichi Kumon, Fujitsu Labs
This was an interesting talk pertaining to the effects of cache coloring for task structures in the Linux kernel scheduler (Linux kernel 2.4.x). The speaker first presented some interesting benchmark numbers for the Linux scheduler, showing that as the number of processes on the task queue was increased, the performance decreased. The authors used some really nifty hardware to measure the number of bus transactions throughout their tests and were thus able to reasonably quantify the impact that cache misses had on the Linux scheduler.
Their experiments led them to implement a cache-coloring scheme for task structures, which were previously aligned on 8KB boundaries and therefore were eventually mapped to the same cache lines. This unfortunate placement of task structures in memory induced a significant number of cache misses as the number of tasks in the scheduler grew.
The implemented solution consisted of aligning task structures to more evenly distribute cached entries across the L2 cache. The result was inevitably fewer cache misses in the scheduler. Some negative effects were observed in certain situations, primarily due to more cache slots being used by task-structure data in the scheduler, thus forcing out data that was previously cached there.
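The collision the authors fixed can be demonstrated numerically. The Python sketch below uses an illustrative cache geometry (64-byte lines, 128 sets, so the set index repeats every 8KB); the actual Fujitsu hardware and the kernel's chosen color offsets differ.

```python
def cache_set(addr, line_size=64, n_sets=128):
    # Which set of a direct-indexed cache an address maps to.  With
    # 64-byte lines and 128 sets, the index pattern repeats every 8KB.
    return (addr // line_size) % n_sets

def task_addrs(n, stride=8192, color_shift=0):
    # Task structures at 8KB-aligned addresses; a nonzero per-task
    # "color" offset staggers them so they no longer share cache sets.
    return [i * stride + i * color_shift for i in range(n)]
```

With `color_shift=0` every task structure lands in the same cache set, so the structures evict one another as the run queue is walked; a one-line-per-task offset spreads them across distinct sets, which is the essence of the coloring fix.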
WORK-IN-PROGRESS REPORTS
Summarized by Brennan Reynolds
RESOURCE VIRTUALIZATION TECHNIQUES FOR
WIDE-AREA OVERLAY NETWORKS
Kartik Gopalan, Stony Brook University
This work addressed the issue of provisioning a maximum number of virtual overlay networks (VONs) with diverse quality-of-service (QoS) requirements on a single physical network. Each of the VONs is logically isolated from the others to ensure its QoS requirements. Gopalan mentioned several critical research issues with this problem that are currently being investigated. Dealing with how to provision the network at various levels (link, route, or path) and then enforce the provisioning at run-time is one of the toughest challenges. Currently, Gopalan has developed several algorithms to handle admission control, end-to-end QoS, route selection, scheduling, and fault tolerance in the network.
For more information, visit http://www.ecsl.cs.sunysb.edu.
VISUALIZING SOFTWARE INSTABILITY
Jennifer Bevan, University of California, Santa Cruz
Detection of instability in software has typically been an afterthought. The point in the development cycle when the software is reviewed for instability is usually after it is difficult and costly to go back and perform major modifications to the code base. Bevan has developed a technique to allow the visualization of unstable regions of code that can be used much earlier in the development cycle. Her technique creates a time series of dependency graphs that include clusters and lines called fault lines. From the graphs, a developer is able to easily determine where the unstable sections of code are and proactively restructure them. She is currently working on a prototype implementation.
For more information, visit http://www.cse.ucsc.edu/~jbevan/evo_viz.
RELIABLE AND SCALABLE PEER-TO-PEER WEB
DOCUMENT SHARING
Li Xiao, College of William and Mary
The idea presented by Xiao would allow end users to share the content of their Web-browser caches with neighboring Internet users. The rationale for this is that today's end users are increasingly connected to the Internet over high-speed links, and browser caches are becoming large storage systems. Therefore, if an individual accesses a page that does not exist in their local cache, Xiao suggests that they first query other end users for the content before trying to access the machine hosting the original. This strategy does have some serious problems associated with it that still need to be addressed, including ensuring the integrity of the content and protecting the identity and privacy of the end users.
SEREL: FAST BOOTING FOR UNIX
Leni Mayo, Fast Boot Software
Serel is a tool that generates a visual representation of a UNIX machine's boot-up sequence. It can be used to identify the critical path and can show if a particular service or process blocks for an extended period of time. This information could be used to determine where the largest performance gains could be realized by tuning the order of execution at boot-up. Serel creates a dependency graph, expressed in XML, during boot-up; this graph is then used to create the visual representation. Currently the tool only works on POSIX-compliant systems, but Mayo stated that he would be porting it to other platforms. Other extensions that were mentioned included having the metadata cover the use of shared libraries and monitoring the suspend/resume sequence of portable machines.
For more information, visit http://www.fastboot.org.
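Given such a dependency graph, the critical path is simply the longest chain of start-up work. The Python sketch below shows one way to compute it; the service names and durations are invented for illustration, and Serel's actual XML format and algorithm are not shown.

```python
def critical_path(durations, deps):
    # durations: service -> time it takes to start (seconds)
    # deps:      service -> list of services it must wait for
    # finish(s) = duration(s) + latest finish time of any prerequisite;
    # the critical path ends at the service with the largest finish time.
    memo = {}
    def finish(s):
        if s not in memo:
            memo[s] = durations[s] + max(
                (finish(d) for d in deps.get(s, ())), default=0)
        return memo[s]
    slowest = max(durations, key=finish)
    return slowest, finish(slowest)
```

Services off the critical path can be reordered or parallelized freely; only shortening the critical chain itself reduces total boot time, which is exactly the tuning opportunity Serel is meant to expose.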
BERKELEY DB XML
John Merrells, Sleepycat Software
Merrells gave a quick introduction and overview of the new XML library for Berkeley DB. The library specializes in storage and retrieval of XML content, queried via XPath. The library allows for multiple documents per container and stores everything natively as XML. The user is also given a wide range of elements to create indices with, including edges, elements, text strings, or presence. The XPath implementation consists of a query parser, generator, optimizer, and execution engine. To conclude his presentation, Merrells gave a live demonstration of the software.
For more information, visit http://www.sleepycat.com/xml.
CLUSTER-ON-DEMAND (COD)
Justin Moore, Duke University
Modern clusters are growing at a rapid rate; many have pushed beyond the 5,000-machine mark, and deploying them results in large expenses as well as management and provisioning issues. Furthermore, if the cluster is "rented" out to various users, it is very time-consuming to configure it to a user's specs, regardless of how long they need to use it. The COD work presented creates dynamic virtual clusters within a given physical cluster. The goal was to have a provisioning tool that would automatically select a chunk of available nodes and install the operating system and middleware specified by the customer in a short period of time. This would allow greater use of resources, since multiple virtual clusters can exist at once, and the size of a virtual cluster can be changed dynamically. Moore stated that they have created a working prototype and are currently testing and benchmarking its performance.
For more information, visit http://www.cs.duke.edu/~justin/cod.
CATACOMB
Elias Sinderson, University of California, Santa Cruz
Catacomb is a project to develop a database-backed DASL module for the Apache Web server. It was designed as a replacement for the WebDAV module. The initial release of the module only contains support for a MySQL database but could be extended to others. Sinderson briefly touched on the performance of the module: it was comparable to the mod_dav Apache module for all query types but search. The presentation was concluded with remarks about adding support for the LOCK method and including ACL specifications in the future.
For more information, visit http://ocean.cse.ucsc.edu/catacomb.
SELF-ORGANIZING STORAGE
Dan Ellard, Harvard University
Ellard's presentation introduced a storage system that tunes itself based on the workload of the system, without requiring the intervention of the user. The intelligence is implemented as a virtual self-organizing disk that resides below any file system. The virtual disk observes the access patterns exhibited by the system and then attempts to predict what information will be accessed next. An experiment was done using an NFS trace from one of the email servers at a large ISP. Ellard's self-organizing storage system worked well, which he attributes to the fact that most of the files being requested were large email boxes. Areas of future work include exploration of the length and detail of the data-collection stage, as well as the CPU impact of running the virtual disk layer.
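One simple way such a layer could learn from an access trace is a successor predictor: remember which block most often follows each block, then prefetch or co-locate accordingly. The Python sketch below is a generic stand-in for whatever policy Ellard's system actually uses, with invented block numbers.

```python
from collections import Counter, defaultdict

def build_predictor(trace):
    # For each block in the observed trace, count which block followed
    # it, and predict the most frequent successor.  A layer below the
    # file system could use such predictions to prefetch or to place
    # related blocks near one another on disk.
    follows = defaultdict(Counter)
    for cur, nxt in zip(trace, trace[1:]):
        follows[cur][nxt] += 1
    return {blk: counts.most_common(1)[0][0]
            for blk, counts in follows.items()}
```

Because the predictor is built purely from observed behavior, it requires no user intervention, matching the talk's self-tuning theme.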
VISUAL DEBUGGING
John Costigan, Virginia Tech
Costigan feels that the state of current debugging facilities in the UNIX world is not as good as it should be. He proposes the addition of several elements to programs like ddd and gdb. The first addition is including separate heap and stack components. Another is to display only certain fields of a data structure, with the ability to zoom in and out as needed. Finally, the debugger should provide the ability to visually present complex data structures, including linked lists, trees, etc. He said that a beta version of a debugger with these abilities is currently available.
For more information, visit http://infovis.cs.vt.edu/datastruct.
IMPROVING APPLICATION PERFORMANCE
THROUGH SYSTEM-CALL COMPOSITION
Amit Purohit, Stony Brook University
Web servers perform a huge number of context switches and internal data copies during normal operation. These two elements can drastically limit the performance of an application, regardless of the hardware platform it is run on. The Compound System Call (CoSy) framework is an attempt to reduce the performance penalty of context switches and data copies. It includes a user-level library and several kernel facilities that can be used via system calls. The library provides the programmer with a complete set of memory-management functions. Performance tests using the CoSy framework showed a large saving for context switches and data copies; the only area where the savings over conventional libraries were negligible was for very large file copies.
For more information, visit http://www.cs.sunysb.edu/~purohit.
ELASTIC QUOTAS
John Oscar, Columbia University
Oscar began by stating that most resources are flexible, but that to date disks have not been. While approximately 80 percent of files are short-lived, disk quotas are hard limits imposed by administrators. The idea behind elastic quotas is to have a non-permanent storage area that each user can temporarily use. Oscar suggested the creation of an /ehome directory structure to be used in deploying elastic quotas. Currently he has implemented elastic quotas as a stackable file system.
There were also several trends apparent in the data. Many users would ssh to their remote host (good) but then launch an email reader on the local machine, which would connect via POP to the same host and send the password in clear text (bad). This situation can easily be remedied by tunneling protocols like POP through SSH, but it appeared that many people were not aware this could be done. While most of his comments on the use of the wireless network were negative, the list of passwords he had collected showed that people were indeed using strong passwords. His recommendations were to educate and encourage people to use protocols like IPsec, SSH, and SSL when conducting work over a wireless network, because you never know who else is listening.
SYSTRACE
Niels Provos, University of Michigan
How can one be sure the applications one is using actually do exactly what their developers said they do? Short answer: you can't. People today are using a large number of complex applications, which means it is impossible to check each application thoroughly for security vulnerabilities. There are tools out there that can help, though. Provos has developed a tool called systrace that allows a user to generate a policy of acceptable system calls a particular application can make. If the application attempts to make a call that is not defined in the policy, the user is notified and allowed to choose an action. Systrace includes an auto-policy-generation mechanism, which uses a training phase to record the actions of all programs the user executes. If the user chooses not to be bothered by applications breaking policy, systrace allows default enforcement actions to be set. Currently systrace is implemented for FreeBSD and NetBSD, with a Linux version coming out shortly.
For more information, visit http://www.citi.umich.edu/u/provos/systrace.
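The enforcement model just described reduces to a per-application allow list with a configurable default. The Python sketch below models the decision only; the real systrace policy grammar, kernel hooks, and interactive prompts are not reproduced here.

```python
def make_policy(allowed, default_action="ask"):
    # systrace-like check: system calls listed in the policy are
    # permitted; anything else triggers the configured default, which
    # models either prompting the user ("ask") or a preset enforcement
    # action such as "deny".
    def check(syscall):
        return "permit" if syscall in allowed else default_action
    return check
```

A training run corresponds to collecting the `allowed` set by recording every call the program makes; switching `default_action` to "deny" corresponds to the unattended enforcement mode mentioned above.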
VERIFIABLE SECRET REDISTRIBUTION FOR
SURVIVABLE STORAGE SYSTEMS
Ted Wong, Carnegie Mellon University
Wong presented a protocol that can be used to re-create a file distributed over a set of servers, even if one of the servers is damaged. In his scheme, the user must choose how many shares the file is split into and the number of servers it will be stored across. The goal of this work is to provide persistent, secure storage of information even if it comes under attack. Wong stated that his design of the protocol is complete, and he is currently building a prototype implementation of it.
For more information, visit http://www.cs.cmu.edu/~tmwong/research.
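The share-splitting idea underlying such systems can be illustrated with Shamir-style threshold sharing: any k of n shares reconstruct the secret, so the loss of a server destroys nothing. This Python sketch conveys only that flavor; the verifiable *redistribution* of shares, which is the paper's actual contribution, is omitted, and the field and coefficients below are illustrative choices.

```python
PRIME = 2**61 - 1  # a Mersenne prime; all arithmetic is mod PRIME

def make_shares(secret, k, n, coeffs):
    # Embed `secret` as the constant term of a degree-(k-1) polynomial;
    # a share is the polynomial evaluated at x = 1..n.  Any k shares
    # determine the polynomial, fewer reveal nothing.
    poly = [secret] + list(coeffs)
    assert len(poly) == k
    def eval_poly(x):
        acc = 0
        for c in reversed(poly):  # Horner's rule
            acc = (acc * x + c) % PRIME
        return acc
    return [(x, eval_poly(x)) for x in range(1, n + 1)]

def reconstruct(shares):
    # Lagrange interpolation at x = 0 recovers the constant term.
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num = den = 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % PRIME
                den = den * (xi - xj) % PRIME
        secret = (secret + yi * num * pow(den, PRIME - 2, PRIME)) % PRIME
    return secret
```

In a storage system the "secret" would be a key or file block, and the user-chosen share count and server count map directly to the k and n parameters above.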
THE GURU IS IN SESSIONS
LINUX ON LAPTOP/PDA
Bdale Garbee, HP Linux Systems Operation
Summarized by Brennan Reynolds
This was a roundtable-like discussion with Garbee acting as moderator. He is currently in charge of the Debian distribution of Linux and is involved in porting Linux to platforms other than i386. He has successfully helped port the kernel to the Alpha, SPARC, and ARM, and is currently working on a version to run on VAX machines.
A discussion was held on which file system is best for battery life. The ReiserFS and Ext3 file systems were discussed; people commented that when using Ext3 on their laptops, the disk would never spin down and thus was always consuming a large amount of power. One suggestion was to use a RAM-based file system for any volume that requires a large number of writes, and to write the file system to disk only at infrequent intervals or when the machine is shut down or suspended.
A question was directed to Garbee about his knowledge of how the HP-Compaq merger would affect Linux. Garbee thought that the merger was good for Linux, both within the company and for the open source community as a whole. He stated that the new company would be the number one shipper of systems with Linux as the base operating system. He also fielded a question about the support of older HP servers and their ability to run Linux, saying that indeed Linux has been ported to them, and he personally had several machines in his basement running it.
A question which sparked a large number of responses concerned problems people had with running Linux on mobile platforms. Most people actually did not have many problems at all. There were a few cases of a particular machine model not working, but there were only two widespread issues: docking stations and the new ACPI power-management scheme. Docking stations still appear to be somewhat of a headache, but most people had developed ad-hoc solutions for getting them to work. The ACPI power-management scheme does not appear to have a quick solution. Ted Ts'o, one of the head kernel developers, stated that there are fundamental architectural problems with the 2.4 series kernel that do not easily allow ACPI to be added. However, he also stated that ACPI is supported in the newest development kernel, 2.5, and will be included in 2.6.
The final topic was the possibility of purchasing a laptop without paying any charge/tax for Microsoft Windows. The consensus was that currently this is not possible. Even if the machine comes with Linux or without any operating system, the vendors have contracts in place which require them to pay Microsoft for each machine they ship if that machine is capable of running a Microsoft operating system. The only way this will change is if the demand for other operating systems increases to the point that vendors renegotiate their contracts with Microsoft, and no one saw this happening in the near future.
LARGE CLUSTERS
Andrew Hume, AT&T Labs - Research
Summarized by Teri Lampoudi
This was a session on clusters big dataand resilient computing Given that theaudience was primarily interested inclusters and not necessarily big data orbeating nodes to a pulp the conversa-tion revolved mainly around what ittakes to make a heterogeneous clusterresilient Hume also presented aFREENIX paper on his current clusterproject which explained in more detailmuch of what was abstractly claimed inthe guru session
Humersquos use of the word ldquoclusterrdquoreferred not to what I had assumedwould be a Beowulf-type system but to aloosely coupled farm of machines inwhich concurrency was much less of anissue than it would be in a Beowulf Infact the architecture Hume describedwas designed to deal with large amountsof independent transaction processingessentially the process of billing callswhich requires no interprocess messagepassing or anything of the sort a parallelscientific application might
Where does the large data come inSince the transactions in question con-sist of large numbers of flat files amechanism for getting the files ontonodes and off-loading the results is nec-essary In this particular cluster namedNingaui this task is handled by theldquoreplication managerrdquo which generateda large amount of interest in the audi-ence All management functions in thecluster as well as job allocation consti-tute services that are to be bid for andleased out to nodes Furthermore fail-ures are handled by simply repeating thefailed task until it is completed properlyif that is possible and if it is not thenlooking through detailed logs to discover
what went wrong. Logging and testing for the successful completion of each task are what make this model resilient. Every management operation is logged at some appropriate level of detail, generating 120MB/day, and every single file transfer is md5 checksummed. This was characterized by Hume as a "patient" style of management where no one controls the cluster: scheduling behavior is emergent rather than stringently planned, and nodes are free to drop in and out of the cluster. A downside to this model is the increase in latency, offset by the fact that the cluster continues to function unattended outside of 8-to-5 support hours.
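The per-transfer integrity check Hume described can be sketched in a few lines. This is a minimal illustration, not Ningaui's actual code, and the function names here are hypothetical:

```python
import hashlib

def md5sum(path, chunk_size=1 << 20):
    """Compute the MD5 digest of a file, reading in chunks to bound memory."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def transfer_ok(src_path, dst_path):
    """Accept a transfer only if source and destination digests match."""
    return md5sum(src_path) == md5sum(dst_path)
```

In the "patient" model, a failed check simply causes the transfer to be logged and repeated rather than halting the cluster.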
In response to questions about the projected scalability of such a scheme, Hume said the system would presumably scale from the present 60 nodes to about 100 without modifications, but that scaling higher would be a matter for further design. The guru session ended on a somewhat unrelated note regarding the problems of getting data off mainframes and onto Linux machines: issues of fixed- vs. variable-length encoding of data blocks resulting in useful information being stripped by FTP, and problems in converting COBOL copybooks to something useful on a C-based architecture. To reiterate Hume's claim, there are homemade solutions for some of these problems, and the reader should feel free to contact Mr. Hume for them.
WRITING PORTABLE APPLICATIONS
Nick Stoughton MSB Consultants
Summarized by Josh Lothian
Nick Stoughton addressed a concern that developers are facing more and more: writing portable applications. He proposed that there is no such thing as a "portable application," only those that have been ported. Developers are still in search of the Holy Grail of applications programming: a write-once, compile-anywhere program. Languages such as Java are leading up to this, but virtual machines still have to be ported, paths will still be different, etc. Web-based applications are another area that could be promising for portable applications.
Nick went on to talk about the future of portability. The recently released POSIX 2001 standards are expected to help the situation. The 2001 revision expands the original POSIX spec into seven volumes. Even though POSIX 2001 is more specific than the previous release, Nick pointed out that even this standard allows for small differences where vendors may define their own behaviors. This can only hurt developers in the long run. Along these lines, it was pointed out that there may be room for automated tools that have the capability to check application code and determine the level of portability and standards compliance of the source.
NETWORK MANAGEMENT SYSTEM
PERFORMANCE TUNING
Jeff Allen, Tellme Networks, Inc.
Summarized by Matt Selsky
Jeff Allen, author of Cricket, discussed network tuning. The basic idea of tuning is: measure, twiddle, and measure again. Cricket can be used for large installations, but it's overkill for smaller installations, for which MRTG is better suited. You don't need Cricket to do measurement. Doing something like a shell script that dumps data to Excel is good, but you need to get some sort of measurement. Cricket was not designed for billing; it was meant to help answer questions.
Useful tools and techniques covered included: looking for the difference between machines in order to identify the causes for the differences; measuring, but also thinking about what you're measuring and why; and remembering to use strace and tcpdump to identify problems. Problems repeat themselves; performance tuning requires an intricate understanding of all the layers involved to solve the problem and simplify the automation. (Having a computer science background helps but is not essential.) If you can reproduce the problem, you then should investigate each piece to find the bottleneck. If you can't reproduce the problem, then measure. You need to understand all the system interactions. Measurement can help determine whether things are actually slow or if the user is imagining it.
Troubleshooting begins with a hunch, but scientific processes are essential. You should be able to determine the event stream, or when each event occurs; having observation logs can help. Some hints provided include checking cron for periodic anomalies, slowing down /bin/rm to avoid I/O overload, and looking for unusual kernel CPU usage.
Also, try to use lightweight monitoring to reduce overhead. You don't need to monitor every resource on every system, but you should monitor those resources on those systems that are essential. Don't check things too often, since you can introduce overhead that way. Lots of information can be gathered from the /proc file system interface to the kernel.
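As a small illustration of this kind of lightweight /proc-based sampling (a sketch assuming the standard Linux /proc layout, not a tool from the talk):

```python
def read_loadavg(path="/proc/loadavg"):
    """Return the 1-, 5-, and 15-minute load averages from /proc/loadavg."""
    with open(path) as f:
        one, five, fifteen = f.read().split()[:3]
    return float(one), float(five), float(fifteen)

def read_meminfo(path="/proc/meminfo"):
    """Parse /proc/meminfo lines of the form 'MemTotal:  1024 kB' into a dict."""
    info = {}
    with open(path) as f:
        for line in f:
            key, _, rest = line.partition(":")
            fields = rest.split()
            if fields:
                info[key] = int(fields[0])  # value in kB
    return info
```

Reading these pseudo-files costs far less than running a full agent on every host, which is the point of monitoring lightly and infrequently.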
BSD BOF
Host: Kirk McKusick, Author and Consultant
Summarized by Bosko Milekic
The BSD Birds of a Feather session at USENIX started off with five quick and informative talks and ended with an exciting discussion of licensing issues.
Christos Zoulas, for the NetBSD Project, went over the primary goals of the NetBSD Project (namely portability, a clean architecture, and security) and then proceeded to briefly discuss some of the challenges that the project has encountered. The successes and areas for improvement of 1.6 were then examined. NetBSD has recently seen various improvements in its cross-build system, packaging system, and third-party tool support. For what concerns the kernel, a zero-copy TCP/UDP implementation has been integrated, and a new pmap
code for arm32 has been introduced, as have some other rather major changes to various subsystems. More information is available in Christos's slides, which are now available at http://www.netbsd.org/gallery/events/usenix2002.
Theo de Raadt of the OpenBSD Project presented a rundown of various new things that the OpenBSD and OpenSSH teams have been looking at. Theo's talk included an amusing description (and pictures) of some of the events that transpired during OpenBSD's hack-a-thon, which occurred the week before USENIX '02. Finally, Theo mentioned that following complications with the IPFilter license, the OpenBSD team had done a fairly extensive license audit.
Robert Watson of the FreeBSD Project brought up various examples of FreeBSD being used in the real world. He went on to describe a wide variety of changes that will surface with the upcoming FreeBSD release 5.0. Notably, 5.0 will introduce a re-architecting of the kernel aimed at providing much more scalable support for SMP machines. 5.0 will also include an early implementation of KSE, FreeBSD's version of scheduler activations, large improvements to pccard, and significant framework bits from the TrustedBSD project.
Mike Karels from Wind River Systems presented an overview of how BSD/OS has been evolving following the Wind River Systems acquisition of BSDi's software assets. Ernie Prabhakar from Apple discussed Apple's OS X and its success on the desktop. He explained how OS X aims to bring BSD to the desktop while striving to maintain a strong and stable core, one that is heavily based on BSD technology.
The BoF finished with a discussion on licensing; specifically, some folks questioned the impact of Apple's licensing for code from their core OS (the Darwin Project) on the BSD community as a whole.
CLOSING SESSION
HOW FLIES FLY
Michael H. Dickinson, University of California, Berkeley
Summarized by JD Welch
In this lively and offbeat talk, Dickinson discussed his research into the mechanisms of flight in the fruit fly (Drosophila melanogaster) and related the autonomic behavior to that of a technological system. Insects are the most successful organisms on earth, due in no small part to their early adoption of flight. Flies travel through space on straight trajectories interrupted by saccades, jumpy turning motions somewhat analogous to the fast, jumpy movements of the human eye. Using a variety of monitoring techniques, including high-speed video in a "virtual reality flight arena," Dickinson and his colleagues have observed that flies respond to visual cues to decide when to saccade during flight. For example, more visual texture makes flies saccade earlier (i.e., further away from the arena wall), described by Dickinson as a "collision control algorithm."
Through a combination of sensory inputs, flies can make decisions about their flight path. Flies possess a visual "expansion detector," which, at a certain threshold, causes the animal to turn a certain direction. However, expansion in front of the fly sometimes causes it to land. How does the fly decide? Using the virtual reality device to replicate various visual scenarios, Dickinson observed that the flies fixate on vertical objects that move back and forth across the field. Expansion on the left side of the animal causes it to turn right, and vice versa, while expansion directly in front of the animal triggers the legs to flare out in preparation for landing.
If the eyes detect these changes, how are the responses implemented? The flies' wings have "power muscles," controlled by mechanical resonance (as opposed to the nervous system), which drive the wings, combined with neurally activated "steering muscles," which change the configuration of the wing joints. Subtle variations in the timing of impulses correspond to changes in wing movement, controlled by sensors in the "wing-pit." A small wing-like structure, the haltere, controls equilibrium by beating constantly.
Voluntary control is accomplished by the halteres, whose signals can interfere with the autonomic control of the stroke cycle. The halteres have "steering muscles" as well, and information derived from the visual system can turn off the haltere or trick it into a "virtual" problem requiring a response.
Dickinson has also studied the aerodynamics of insect flight using a device called the "Robo Fly," an oversized mechanical insect wing suspended in a large tank of mineral oil. Interestingly, Dickinson observed that the average lift generated by the flies' wings is under its body weight; the flies use three mechanisms to overcome this, including rotating the wings (rotational lift).
Insects are extraordinarily robust creatures, and because Dickinson analyzed the problem in a systems-oriented way, these observations and analysis are immediately applicable to technology: the response system can be used as an efficient search algorithm for control systems in autonomous vehicles, for example.
THE AFS WORKSHOP
Summarized by Garry Zacheiss, MIT
Love Hornquist-Astrand of the Arla development team presented the Arla status report. Arla 0.35.8 has been released. Scheduled for release soon are improved support for Tru64 UNIX, MacOS X, and FreeBSD, improved volume handling, and implementation of more of the vos/pts subcommands. It was stressed that MacOS X is considered an important platform, and that a GUI configuration manager for the cache manager, a GUI ACL manager for the Finder, and a graphical login that obtains Kerberos tickets and AFS tokens at login time were all under development.
Future goals planned for the underlying AFS protocols include GSSAPI/SPNEGO support for Rx, performance improvements to Rx, an enhanced disconnected mode, and IPv6 support for Rx; an experimental patch is already available for the latter. Future Arla-specific goals include improved performance, partial file reading, and increased stability for several platforms. Work is also in progress for the RXGSS protocol extensions (integrating GSSAPI into Rx). A partial implementation exists, and work continues as developers find time.
Derrick Brashear of the OpenAFS development team presented the OpenAFS status report. OpenAFS 1.2.5 was released immediately prior to the conference; it fixed a remotely exploitable denial-of-service attack in several OpenAFS platforms, most notably IRIX and AIX. Future work planned for OpenAFS includes better support for MacOS X, including working around Finder interaction issues. Better support for the BSDs is also planned: FreeBSD has a partially implemented client, while NetBSD and OpenBSD have only server programs available right now. AIX 5, Tru64 5.1A, and MacOS X 10.2 (aka Jaguar) are all planned for the future. Other planned enhancements include nested groups in the ptserver (code donated by the University of Michigan awaits integration), disconnected AFS, and further work on a native Windows client. Derrick stated that the guiding principles of OpenAFS were to maintain compatibility with IBM AFS, support new platforms, ease the administrative burden, and add new functionality.
AFS performance was discussed. The openafs.org cell consists of two file servers, one in Stockholm, Sweden, and one in Pittsburgh, PA. AFS works reasonably well over trans-Atlantic links, although OpenAFS clients don't determine which file server to talk to very effectively. Arla clients use RTTs to the server to determine the optimal file server to fetch replicated data from. Modifications to OpenAFS to support this behavior in the future are desired.
Jimmy Engelbrecht and Harald Barth of KTH discussed their AFSCrawler script, written to determine how many AFS cells and clients were in the world, what implementations/versions they were (Arla vs. IBM AFS vs. OpenAFS), and how much data was in AFS. The script unfortunately triggered a bug in IBM AFS 3.6-derived code, causing some clients to panic while handling a specific RPC. This has since been fixed in OpenAFS 1.2.5 and the most recent IBM AFS patch level, and all AFS users are strongly encouraged to upgrade. No release of Arla is vulnerable to this particular denial-of-service attack. There was an extended discussion of the usefulness of this exploration. Many sites believed this was useful information and such scanning should continue in the future, but only on an opt-in basis.
Many sites face the problem of managing Kerberos/AFS credentials for batch-scheduled jobs. Specifically, most batch processing software needs to be modified to forward tickets as part of the batch submission process, renew tickets and tokens while the job is in the queue and for the lifetime of the job, and properly destroy credentials when the job completes. Ken Hornstein of NRL was able to pay a commercial vendor to support Kerberos 4/5 credential management in their product, although they did not implement AFS token management. MIT has implemented some of the desired functionality in OpenPBS and might be able to make it available to other interested sites.
Tools to simplify AFS administration were discussed, including:
AFS Balancer: A tool written by CMU to automate the process of balancing disk usage across all servers in a cell. Available from ftp://ftp.andrew.cmu.edu/pub/AFS-Tools/balance-1.1b.tar.gz.
Themis: KTH's enhanced version of the AFS tool "package" for updating files on local disk from a central AFS image. KTH's enhancements include allowing the deletion of files, simplifying the process of adding a file, and allowing the merging of multiple rule sets for determining which files are updated. Themis is available from the Arla CVS repository.
Stanford was presented as an example of a large AFS site. Stanford's AFS usage consists of approximately 14TB of data in AFS, in the form of approximately 100,000 volumes; 33TB of storage is available in their primary cell, ir.stanford.edu. Their file servers consist entirely of Solaris machines, running a combination of Transarc 3.6 patch-level 3 and OpenAFS 1.2.x, while their database servers run OpenAFS 1.2.2. Their cell consists of 25 file servers, using a combination of EMC and Sun StorEdge hardware. Stanford continues to use the kaserver for their authentication infrastructure, with future plans to migrate entirely to an MIT Kerberos 5 KDC.
Stanford has approximately 3,400 clients on their campus, not including SLAC (Stanford Linear Accelerator); approximately 2,100 AFS clients from outside Stanford contact their cell every month. Their supported clients are almost entirely IBM AFS 3.6 clients, although they plan to release OpenAFS clients soon. Stanford currently supports only UNIX clients. There is some on-campus presence of Windows clients, but Stanford has never publicly released or supported it. They do intend to release and support the MacOS X client in the near future.
All Stanford students, faculty, and staff are assigned AFS home directories, with a default quota of 50MB, for a total of approximately 550GB of user home directories. Other significant uses of AFS storage are data volumes for workstation software (400GB) and volumes for course Web sites and assignments (100GB).
AFS usage at Intel was also presented. Intel has been an AFS site since 1994. They had bad experiences with the IBM 3.5 Linux client; their experience with OpenAFS on Linux 2.4.x kernels has been much better. They use and are satisfied with the OpenAFS IA64 Linux port. Intel has hundreds of OpenAFS 1.2.3 and 1.2.4 clients in many production cells accessing data stored on IBM AFS file servers, and they have not encountered any interoperability issues. Intel has some concerns about OpenAFS: they would like to purchase commercial support for OpenAFS and to see OpenAFS support for HP-UX on both PA-RISC and Itanium hardware. HP-UX support is currently unavailable due to a specific HP-UX header file being unavailable from HP; this may be available soon. Intel has not yet committed to migrating their file servers to OpenAFS and are unsure if they will do so without commercial support.
Backups are a traditional topic of discussion at AFS workshops, and this time was no exception. Many users complain that the traditional AFS backup tools ("backup" and "butc") are complex and difficult to automate, requiring many home-grown scripts and much user intervention for error recovery. An additional complaint was that the traditional AFS tools do not support file- or directory-level backups and restores; data must be backed up and restored at the volume level.
Mitch Collinsworth of Cornell presented work to make AMANDA, the free backup software from the University of Maryland, suitable for AFS backups. Using AMANDA for AFS backups allows one to share AFS backup tapes with non-AFS backups, easily run multiple backups in parallel, automate error recovery, and provide a robust degraded mode that prevents tape errors from stopping backups altogether. Their implementation allows for full volume restores as well as individual directory and file restores. They have finished coding this work and are in the process of testing and documenting it.
Peter Honeyman of CITI at the University of Michigan spoke about work he has proposed to replace Rx with RPCSEC GSS in OpenAFS; this would allow AFS to use a TCP-based transport mechanism rather than the UDP-based Rx, and possibly gain better congestion control, dynamic adaptation, and fragmentation avoidance as a result. RPCSEC GSS uses the GSSAPI to authenticate SUN ONC RPC. RPCSEC GSS is transport agnostic, provides strong security, is a developing Internet standard, and has multiple open source implementations. Backward compatibility with existing AFS servers and clients is an important goal of this project.
The Vcdiff instruction code table consists of 256 entries, each coding up to a pair of instructions, and encodes 1-byte indices and any additional data.
The talk then showed Vcdiff's performance with Web data collected from CNN, compared with the W3C standard Gdiff, Gdiff+gzip, and diff+gzip, where Gdiff was computed using Vcdiff delta instructions. The results of two experiments, "First" and "Successive," were presented. In "First," each file is compressed against the first file collected; in "Successive," each file is compressed against the one in the previous hour. The diff+gzip did not work well because diff was line-oriented. Vcdiff performed favorably compared with other formats.
Vcdiff is part of the Vcodex package. Vcodex is a platform for all common data transformations: delta compression, plain compression, encryption, and transcoding (e.g., uuencode, base64). It is structured in three layers for maximum usability. The base library uses Disciplines and Methods interfaces.
The code for Vcdiff can be found at http://www.research.att.com/sw/tools. They are moving Vcdiff to the IETF standard as a comprehensive platform for transforming data. Please refer to http://www.ietf.org/internet-drafts/draft-korn-vcdiff-06.txt.
WHERE IN THE NET
Summarized by Amit Purohit
A PRECISE AND EFFICIENT EVALUATION OF
THE PROXIMITY BETWEEN WEB CLIENTS AND
THEIR LOCAL DNS SERVERS
Zhuoqing Morley Mao, University of California, Berkeley; Charles D. Cranor, Fred Douglis, Michael Rabinovich, Oliver Spatscheck, and Jia Wang, AT&T Labs – Research
Content Distribution Networks (CDNs) attempt to improve Web performance by delivering Web content to end users from servers located at the edge of the network. When a Web client requests content, the CDN dynamically chooses a server to route the request to, usually the one that is nearest the client. As CDNs have access only to the IP address of the local DNS server (LDNS) of the client, the CDN's authoritative DNS server maps the client's LDNS to a geographic region within a particular network and combines that with network and server load information to perform CDN server selection.
This method has two limitations. First, it is based on the implicit assumption that clients are close to their LDNS; this may not always be valid. Second, a single request from an LDNS can represent a differing number of Web clients, called the hidden load factor. Knowledge of the hidden load factor can be used to achieve better load distribution.
The extent of the first limitation and its impact on the CDN server selection is dealt with. To determine the associations between clients and their LDNS, a simple, non-intrusive, and efficient mapping technique was developed. The data collected was used to study the impact of proximity on DNS-based server selection, using four different proximity metrics. (1) Autonomous system (AS) clustering, observing whether a client is in the same AS as its LDNS, concluded that LDNS is very good for coarse-grained server selection, as 64% of the associations belong to the same AS. (2) Network clustering, observing whether a client is in the same network-aware cluster (NAC), implied that DNS is less useful for finer-grained server selection, since only 16% of the client and LDNS pairs are in the same NAC. (3) Traceroute divergence, the length of the divergent paths to the client and its LDNS from a probe point using traceroute, implies that most clients are topologically close to their LDNS, as viewed from a randomly chosen probe site. (4) Round-trip time (RTT) correlation (some CDNs select servers based on RTT between the CDN server and the client's LDNS) examines the correlation between the message RTTs from a probe point to the client and its local DNS; the results imply this is a reasonable metric to use to avoid really distant servers.
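The AS-clustering metric above amounts to checking, for each observed client/LDNS pair, whether both addresses map to the same origin AS. A minimal sketch (the IP-to-AS table here is a hypothetical stand-in for a real BGP-derived mapping, not the paper's dataset):

```python
def same_as_fraction(pairs, ip_to_as):
    """Fraction of (client, LDNS) pairs whose addresses map to the same AS.

    pairs    -- iterable of (client_ip, ldns_ip) tuples
    ip_to_as -- dict mapping IP address -> origin AS number
    """
    pairs = list(pairs)
    matched = sum(
        1
        for client, ldns in pairs
        if client in ip_to_as and ldns in ip_to_as
        and ip_to_as[client] == ip_to_as[ldns]
    )
    return matched / len(pairs)
```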
A study of the impact that client-LDNS associations have on DNS-based server selection concludes that knowing the client's IP address would allow more accurate server selection in a large number of cases. The optimality of the server selection also depends on the server density, placement, and selection algorithms.
Further information can be found at http://www.eecs.berkeley.edu/~zmao/myresearch.html or by contacting zmao@eecs.berkeley.edu.
GEOGRAPHIC PROPERTIES OF INTERNET
ROUTING
Lakshminarayanan Subramanian and Venkata N. Padmanabhan, Microsoft Research; Randy H. Katz, University of California at Berkeley
Geographic information can provide insights into the structure and functioning of the Internet, including interactions between different autonomous systems, by analyzing certain properties of Internet routing. It can be used to measure and quantify certain routing properties, such as circuitous routing, hot-potato routing, and geographic fault tolerance.
Traceroute has been used to gather the required data, and the Geotrack tool has been used to determine the location of the nodes along each network path. This enables the computation of "linearized distances," which is the sum of the geographic distances between successive pairs of routers along the path.
In order to measure the circuitousness of a path, a metric "distance ratio" has been defined as the ratio of the linearized distance of a path to the geographic distance between the source and destination of the path. From the data, it has been observed that the circuitousness of a route depends on both the geographic and network location of the end hosts. A large value of the distance ratio enables us to flag paths that are highly circuitous, possibly (though not necessarily) because of routing anomalies. It is also shown that the minimum delay between end hosts is strongly correlated with the linearized distance of the path.
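The two metrics just defined can be sketched directly. This is an illustrative reconstruction using great-circle distances between (latitude, longitude) points, not the paper's Geotrack tooling:

```python
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_KM = 6371.0

def great_circle_km(a, b):
    """Great-circle (haversine) distance between two (lat, lon) points, in km."""
    lat1, lon1, lat2, lon2 = map(radians, (*a, *b))
    h = sin((lat2 - lat1) / 2) ** 2 \
        + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))

def linearized_distance(path):
    """Sum of geographic distances between successive routers along the path."""
    return sum(great_circle_km(p, q) for p, q in zip(path, path[1:]))

def distance_ratio(path):
    """Linearized distance divided by the direct source-destination distance."""
    return linearized_distance(path) / great_circle_km(path[0], path[-1])
```

A ratio near 1 indicates a geographically direct route; large values flag circuitous paths.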
Geographic information can be used to study various aspects of wide-area Internet paths that traverse multiple ISPs. It was found that end-to-end Internet paths tend to be more circuitous than intra-ISP paths, the cause for this being the peering relationships between ISPs. Also, paths traversing substantial distances within two or more ISPs tend to be more circuitous than paths largely traversing only a single ISP. Another finding is that ISPs generally employ hot-potato routing.
Geographic information can also be used to capture the fact that two seemingly unrelated routers can be susceptible to correlated failures. By using the geographic information of routers, we can construct a geographic topology of an ISP. Using this, we can find the tolerance of an ISP's network to the total network failure in a geographic region.
For further information, contact lakme@cs.berkeley.edu.
PROVIDING PROCESS ORIGIN INFORMATION
TO AID IN NETWORK TRACEBACK
Florian P. Buchholz, Purdue University; Clay Shields, Georgetown University
Network traceback is currently limited because host audit systems do not maintain enough information to match incoming network traffic to outgoing network traffic. The talk presented an alternative: assigning origin information to every process and logging it during interactive login creation.
The current implementation concentrates mainly on interactive sessions, in which an event is logged when a new connection is established using SSH or Telnet. The method is effective and could successfully determine stepping stones and reliably detect the source of a DDoS attack. The speaker then talked about the related work done in this area, which led to discussion of the latest development in the area of "host causality."
The speaker ended with the limitations and future directions of the framework. This framework could be extended and could find many applications in the area of security. The current implementation doesn't take care of startup scripts and cron jobs, but incorporating the origin information in the file system could solve this problem. In the current implementation, logging is just implemented as a proof of concept. It could be made safe in many ways, and this could be another important aspect of future work.
PROGRAMMING
Summarized by Amit Purohit
CYCLONE: A SAFE DIALECT OF C
Trevor Jim, AT&T Labs – Research; Greg Morrisett, Dan Grossman, Michael Hicks, James Cheney, and Yanling Wang, Cornell University
Cyclone is designed to provide protection against attacks such as buffer overflows, format string attacks, and memory management errors. The current structure of C allows programmers to write vulnerable programs. Cyclone extends C so that it has the safety guarantees of Java while keeping C syntax, types, and semantics largely untouched.
The Cyclone compiler performs static analysis of the source code and inserts runtime checks into the compiled output at places where the analysis cannot determine that the code execution will not violate safety constraints. Cyclone imposes many restrictions to preserve safety, such as NULL checks; these checks do not exist in normal C.
The speaker then talked about some sample code written in Cyclone and how it tackles safety problems. Porting an existing C application to Cyclone is pretty easy, with fewer than 10% of lines requiring changes. The current implementation concentrates more on safety than on performance, hence the performance penalty is substantial. Cyclone was able to find many lingering bugs. The final step of the project will be to write a compiler to convert a normal C program to an equivalent Cyclone program.
COOPERATIVE TASK MANAGEMENT WITHOUT
MANUAL STACK MANAGEMENT
Atul Adya, Jon Howell, Marvin Theimer, William J. Bolosky, and John R. Douceur, Microsoft Research
The speaker described the definitions and motivations behind the distinct concepts of task management, stack management, I/O management, conflict management, and data partitioning. Conventional concurrent programming uses preemptive task management and exploits the automatic stack management of a standard language. In the second approach, cooperative tasks are organized as event handlers that yield control by returning control to the event scheduler, manually unrolling their stacks. In this project, they have adopted a hybrid approach that makes it possible for both stack management styles to coexist. Thus, programmers can code assuming one or the other of these stack management styles will operate, depending upon the application. The speaker also gave a detailed example of how to use their mechanism to switch between the two styles.
The talk then continued with the implementation. They were able to preempt many subtle concurrency problems by using cooperative task management. Paying a cost up-front to reduce a subtle race condition proved to be a good investment.
Though the choice of task management is fundamental, the choice of stack management can be left to individual taste. This project enables use of any type of stack management in conjunction with cooperative task management.
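The two stack-management styles the paper contrasts can be illustrated in miniature: in the manual style, an event handler must thread its state (here, a running total) through every step explicitly, while a generator-based cooperative task keeps its locals on its own stack across yields. This is a loose Python analogy, not the paper's C-based mechanism:

```python
# Manual stack management: each step is an event handler, and the "stack"
# (the running total and position) is passed along explicitly.
def start_sum(items, done):
    step(items, 0, 0, done)

def step(items, i, total, done):
    if i == len(items):
        done(total)              # final event: deliver the result
    else:
        step(items, i + 1, total + items[i], done)

# Automatic stack management: a generator task keeps its locals alive
# across yields, so no state needs to be threaded through by hand.
def sum_task(items):
    total = 0
    for x in items:
        total += x
        yield                    # yield control back to the scheduler
    yield total

def run(task):
    """A trivial cooperative scheduler: drive one task to completion."""
    result = None
    for value in task:
        result = value
    return result
```

The hybrid approach in the paper lets both styles coexist against the same cooperative scheduler.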
IMPROVING WAIT-FREE ALGORITHMS FOR
INTERPROCESS COMMUNICATION IN
EMBEDDED REAL-TIME SYSTEMS
Hai Huang, Padmanabhan Pillai, and Kang G. Shin, University of Michigan
The main characteristic of a real-time, time-sensitive system is its predictable response time. But concurrency management creates hurdles to achieving this goal because of the use of locks to maintain consistency. To solve this problem, many wait-free algorithms have been developed, but these are typically high-cost solutions. By taking advantage of the temporal characteristics of the system, however, the time and space overhead can be reduced.
The speaker presented an algorithm for temporal concurrency control and applied this technique to improve three wait-free algorithms. A single-writer multiple-reader wait-free algorithm and a double-buffer algorithm were proposed.
Using their transformation mechanism, they achieved an improvement of 17-66% in ACET and a 14-70% reduction in memory requirements for IPC algorithms. This mechanism is extensible and can be applied to other non-blocking IPC algorithms as well. Future work involves reducing the synchronizing overheads in more general IPC algorithms with multiple-writer semantics.
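As a rough illustration of the double-buffer idea for single-writer/single-reader IPC (a generic textbook sketch, not the authors' transformed algorithm): the writer fills the slot the reader is not using, then publishes it by flipping an index, so neither side ever blocks on a lock. In a real embedded C implementation the index flip must be a single atomic store with appropriate memory ordering:

```python
class DoubleBuffer:
    """Single-writer/single-reader double buffer: the writer updates the
    inactive slot, then flips an index to publish it; the reader always
    reads the currently published slot, so no locks are needed."""

    def __init__(self, initial):
        self.slots = [initial, initial]
        self.current = 0              # index of the published slot

    def write(self, value):
        inactive = 1 - self.current
        self.slots[inactive] = value  # fill the slot the reader can't see
        self.current = inactive       # publish (must be one atomic store)

    def read(self):
        return self.slots[self.current]
```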
MOBILITY
Summarized by Praveen Yalagandula
ROBUST POSITIONING ALGORITHMS FOR
DISTRIBUTED AD-HOC WIRELESS SENSOR
NETWORKS
Chris Savarese and Jan Rabaey, University of California, Berkeley; Koen Langendoen, Delft University of Technology
The Pico Radio Network comprises more than a hundred sensors, monitors, and actuators equipped with wireless connectivity, with the following properties: (1) no infrastructure; (2) computation in a distributed fashion instead of centralized computation; (3) dynamic topology; (4) limited radio range;
(5) nodes that can act as repeaters; (6) devices of low cost and with low power; and (7) sparse anchor nodes, i.e., nodes with GPS capability. A positioning algorithm determines the geographical position of each node in the network.
In two-dimensional space, each node needs three reference positions to estimate its geographical position. There are two problems that make positioning difficult in the Pico Radio Network type of setting: (1) a sparse anchor node problem, and (2) a range error problem.
The two-phase approach that the authors have taken in solving the problem consists of (1) a hop-terrain algorithm: in this first phase, each node roughly guesses its location from the distances calculated using multi-hops to the anchor points; and (2) a refinement algorithm: in this second phase, each node uses its neighbors' positions to refine its own position estimate. To guarantee convergence, this approach uses confidence metrics, assigning a value of 1.0 for anchor nodes and 0.1 to start for other nodes, and increasing these with each iteration.
A simulation tool, OMNeT++, was used for both phases and in various scenarios. The results show that the authors achieved position errors of less than 33% in a scenario with 5% anchor nodes, an average connectivity of 7, and a 5% range-measurement error.
Guidelines for anchor node deployment are high connectivity (>10), a reasonable fraction of anchor nodes (>5%), and, for anchor placement, covered edges.
APPLICATION-SPECIFIC NETWORK
MANAGEMENT FOR ENERGY-AWARE
STREAMING OF POPULAR MULTIMEDIA
FORMATS
Surendar Chandra, University of Georgia, and Amin Vahdat, Duke University
The main hindrance in supporting the increasing demand for mobile multimedia on PDAs is the battery capacity of these small devices. The idle-time power consumption is almost the same as the receive power consumption on typical wireless network cards (1319mW vs. 1425mW), while the sleep state consumes far less power (177mW). To let the system consume energy proportional to the stream quality, the network card should transition to the sleep state aggressively between packets.
The speaker presented previous work in which different multimedia streams were studied for PDAs: MS Media, Real Media, and QuickTime. It was found that if the inter-packet gap is predictable, then huge savings are possible – for example, about 80% savings in MS Media streams with few packet losses. The limitation of the IEEE 802.11b power-save mode comes into play when there are two or more nodes, producing a delay between beacon time and when the node receives/sends packets. This badly affects the streams, since higher energy is consumed while waiting.
The authors propose traffic shaping for energy conservation, done by proxy such that packets arrive at regular intervals. This is achieved using a local proxy in the access point and a client-side proxy in the mobile host. Simulations show that traffic shaping reduces energy consumption and also reveal that higher-bandwidth streams have a lower energy metric (mJ/kB).
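Using the power figures quoted above, the payoff of sleeping between packets can be sketched as a back-of-the-envelope calculation (illustrative only; the packet count, timings, and the omission of sleep/wake transition overheads are assumptions):

```python
# Power draws reported in the talk for a typical WLAN card (mW)
P_RECV, P_IDLE, P_SLEEP = 1425.0, 1319.0, 177.0

def stream_energy_mj(n_packets, t_recv_s, gap_s, sleep_between=False):
    """Energy (mJ) to receive n_packets, each taking t_recv_s seconds,
    separated by gap_s seconds of either idling or sleeping."""
    recv = n_packets * t_recv_s * P_RECV
    gaps = (n_packets - 1) * gap_s
    between = gaps * (P_SLEEP if sleep_between else P_IDLE)
    return recv + between
```

For a hypothetical stream of 1000 packets with 2ms receive times and regular 100ms gaps, sleeping between packets cuts energy by roughly 80–85%, consistent with the savings reported for predictable inter-packet gaps.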
More information is at http://greenhouse.cs.uga.edu.
CHARACTERIZING ALERT AND BROWSE
SERVICES OF MOBILE CLIENTS
Atul Adya, Paramvir Bahl, and Lili Qiu, Microsoft Research
Even though there is a dramatic increase in Internet access from wireless devices, not many studies have been done on characterizing this traffic. In this paper the authors characterize the traffic observed on the MSN Web site with both notification and browse traces.
USENIX 2002
Around 33 million browsing accesses and about 325 million notification entries are present in the traces. Three types of analyses were done for each of these two services: content analysis, concerning the most popular content categories and their distribution; popularity analysis, or the popularity distribution of documents; and user-behavior analysis.
The analysis of the notification logs shows that document access rates follow a Zipf-like distribution, with most of the accesses concentrated on a small number of messages, and that the accesses exhibit geographical locality – users from the same locality tend to receive similar notification content. The browse log analysis shows that a smaller set of URLs is accessed most of the time, though the access pattern does not fit any Zipf curve, and the highly accessed URLs remain stable. A correlation study between the notification and browsing services shows that wireless users have a moderate correlation of 0.12.
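A Zipf-like distribution means the r-th most popular document's access count falls off roughly as 1/r^a. One simple way to check a trace for this (a generic sketch, not the authors' methodology) is to fit the slope of the rank-frequency plot in log-log space:

```python
import math

def zipf_exponent(counts):
    """Estimate the Zipf exponent: negated slope of log(frequency)
    vs. log(rank), fitted by ordinary least squares."""
    freqs = sorted(counts, reverse=True)
    xs = [math.log(r) for r in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
            sum((x - mx) ** 2 for x in xs)
    return -slope  # close to 1.0 for a classic Zipf distribution
```

An exponent near 1 (and a straight rank-frequency line) indicates Zipf-like behavior, as seen in the notification logs; the browse logs would show a poor straight-line fit instead.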
FREENIX TRACK SESSIONS
BUILDING APPLICATIONS
Summarized by Matt Butner
INTERACTIVE 3-D GRAPHICS APPLICATIONS
FOR TCL
Oliver Kersting, Jürgen Döllner, Hasso Plattner Institute for Software Systems Engineering, University of Potsdam
The integration of 3-D image rendering functionality into a scripting language permits interactive and animated 3-D development and application without the formalities and precision demanded by low-level C/C++ graphics and visualization libraries. The large and complex C++ API of the Virtual Rendering System (VRS) can be combined with the conveniences of the Tcl scripting language. The mapping of class interfaces is done via an automated process that generates the respective wrapper classes, all of which ensures complete API accessibility and functionality without any significant performance penalty. The general C++-to-Tcl mapper SWIG grants the necessary object and hierarchical abilities without the use of object-oriented Tcl extensions. The final API mapping techniques address C++ features such as classes, overloaded methods, enumerations, and inheritance relations. All are implemented in a proof-of-concept that maps the complete C++ API of the VRS to Tcl and are showcased in a complete interactive 3-D map system.
THE AGFL GRAMMAR WORK LAB
Cornelis H.A. Coster, Erik Verbruggen, University of Nijmegen (KUN)
The growth and implementation of Natural Language Processing (NLP) is the cornerstone of the continued evolution and implementation of truly intelligent search machines and services. In part due to the growing collections of computer-stored, human-readable documents in the public domain, the implementation of linguistic analysis will become necessary for desirable precision and resolution. Subsequently, such tools and linguistic resources must be released into the public domain, and the authors have done so with the release of the AGFL Grammar Work Lab, under the GNU Public License, as a tool for linguistic research and the development of NLP-based applications.
The AGFL (Affix Grammars over a Finite Lattice) Grammar Work Lab meshes context-free grammars with finite set-valued features, an approach applicable to a range of languages. In computer science terms, "Syntax rules are procedures with parameters and a non-deterministic execution." The English Phrases for Information Retrieval (EP4IR) grammar, released with the AGFL-GWL as a robust grammar of English, yields an AGFL-GWL-generated English parser that outputs "Head/Modifier" frames. The sentences "CompanyX sponsored this conference" and "This conference was sponsored by CompanyX" both generate [CompanyX [sponsored conference]]. The hope is that public availability of such tools will encourage further development of grammar and lexical software systems.
SWILL: A SIMPLE EMBEDDED WEB SERVER
LIBRARY
Sotiria Lampoudi, David M. Beazley, University of Chicago
This paper won the FREENIX Best Student Paper award.
SWILL (Simple Web Interface Link Library) is a simple Web server in the form of a C library whose development was motivated by a wish to give "cool" applications an interface to the Web. The SWILL library provides a simple interface that can be efficiently implemented for tasks that vary from flexible Web-based monitoring to software debugging and diagnostics. Though originally designed to be integrated with high-performance scientific simulation software, the interface is generic enough to allow for unbounded uses. SWILL is a single-threaded server relying upon non-blocking I/O through the creation of a temporary server that services I/O requests. SWILL does not provide SSL or cryptographic authentication but does have HTTP authentication abilities.
A fantastic feature of SWILL is its support for SPMD-style parallel applications that utilize MPI, proving valuable for Beowulf clusters and large parallel machines. Another practical application was the implementation of SWILL in a modified Yalnix emulator by University of Chicago operating systems courses, which utilized the added Yalnix functionality for OS development and debugging. SWILL requires minimal memory overhead and relies upon the HTTP/1.0 protocol.
NETWORK PERFORMANCE
Summarized by Florian Buchholz
LINUX NFS CLIENT WRITE PERFORMANCE
Chuck Lever, Network Appliance; Peter Honeyman, CITI, University of Michigan
Lever introduced a benchmark to measure NFS client write performance. Client performance is difficult to measure due to hindrances such as poor hardware or bandwidth limitations. Furthermore, measuring application performance does not identify weaknesses specifically at the client side. Thus, a benchmark was developed that tries to exercise only data transfers in one direction, between server and application. For this purpose the benchmark was based on the block-sequential-write portion of the Bonnie file system benchmark. Once a benchmark for NFS clients is established, it can be used to improve client performance.
The performance measurements were performed with an SMP Linux client and both a Linux NFS server and a Network Appliance F85 filer. During testing, a periodic jump in write latency was discovered. This was due to a rather large number of pending write operations that were scheduled to be written after certain threshold values were exceeded. By introducing a separate daemon that flushes the cached write requests, the spikes could be eliminated, but as a result the average latency grows over time. The problem could be traced to a function that scans a linked list of write requests. After a hashtable was added to improve lookup performance, the latency improved considerably.
The improved client was then used to measure throughput against the two servers. A discrepancy between the Linux server and the filer test was noticed, and the reason for it was traced back to a global kernel lock that was unnecessarily held when accessing the network stack. After correcting this, performance improved further. However, the measurements showed that a client may run slower when paired with fast servers on fast networks. This is due to heavy client interrupt loads, more network processing on the client side, and more global kernel lock contention.
The source code of the project is available at http://www.citi.umich.edu/projects/nfs-perf/patches.
A STUDY OF THE RELATIVE COSTS OF
NETWORK SECURITY PROTOCOLS
Stefan Miltchev and Sotiris Ioannidis, University of Pennsylvania; Angelos Keromytis, Columbia University
With the increasing need for security and integrity of remote network services, it becomes important to quantify the communication overhead of IPSec and compare it to alternatives such as SSH, SCP, and HTTPS.
For this purpose the authors set up three testing networks: a direct link, two hosts separated by two gateways, and three hosts connecting through one gateway. Protocols were compared in each setup, and manual keying was used to eliminate connection-setup costs. For IPSec, the different encryption algorithms AES, DES, 3DES, hardware DES, and hardware 3DES were used. In detail, FTP was compared to SFTP, SCP, and FTP over IPSec; HTTP to HTTPS and HTTP over IPSec; and NFS to NFS over IPSec and local disk performance.
The result of the measurements was that IPSec outperforms other popular encryption schemes. Overall, unencrypted communication was fastest, but in some cases, like FTP, the overhead can be small. The use of crypto hardware can significantly improve performance. For future work, the inclusion of setup costs, hardware-accelerated SSL, SFTP, and SSH was mentioned.
CONGESTION CONTROL IN LINUX TCP
Pasi Sarolahti, University of Helsinki; Alexey Kuznetsov, Institute for Nuclear Research at Moscow
Having attended the talk and read the paper, I am still unclear about whether the authors are merely describing the design decisions of TCP congestion control or whether they are actually the creators of that part of the Linux code. My guess leans toward the former.
In the talk, the speaker compared the TCP congestion-control measures specified in IETF RFCs with the actual Linux implementation, which does conform to the basic principles but nevertheless has differences. A specific emphasis was placed on retransmission mechanisms and the congestion window. Also, several TCP enhancements – the NewReno algorithm, Selective ACKs (SACK), Forward ACKs (FACK) – were discussed and compared.
In some instances Linux does not conform to the IETF specifications. The fast recovery does not fully follow RFC 2582, since the threshold for triggering retransmit is adjusted dynamically and the congestion window's size is not changed. Also, the round-trip-time estimator and the RTO calculation differ from RFC 2988, since Linux uses more conservative RTT estimates and a minimum RTO of 200ms. The performance measures showed that with the additional Linux-specific features enabled, slightly higher throughput, more steady data flow, and fewer unnecessary retransmissions can be achieved.
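The RFC 2988 computation that the Linux implementation deviates from can be sketched as follows (a minimal illustration of the smoothed-RTT/variance recurrence; RFC 2988 specifies a 1-second minimum RTO, while Linux clamps at 200ms):

```python
class RtoEstimator:
    """RFC 2988-style RTO computation (alpha=1/8, beta=1/4, K=4).
    min_rto is 1.0s per the RFC; Linux clamps at 0.2s instead."""
    def __init__(self, min_rto=1.0):
        self.srtt = self.rttvar = None
        self.min_rto = min_rto

    def sample(self, r):
        """Feed one RTT measurement r (seconds); return the new RTO."""
        if self.srtt is None:
            # first measurement: SRTT = R, RTTVAR = R/2
            self.srtt, self.rttvar = r, r / 2.0
        else:
            self.rttvar = 0.75 * self.rttvar + 0.25 * abs(self.srtt - r)
            self.srtt = 0.875 * self.srtt + 0.125 * r
        return max(self.min_rto, self.srtt + 4.0 * self.rttvar)
```

On a steady 50ms path the variance term decays, so the RFC floor pins the RTO at a full second while the Linux-style 200ms floor allows much earlier retransmission.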
XTREME XCITEMENT
Summarized by Steve Bauer
THE FUTURE IS COMING: WHERE THE X
WINDOW SHOULD GO
Jim Gettys, Compaq Computer Corp.
Jim Gettys, one of the principal authors of the X Window System, outlined the near-term objectives for the system, primarily focusing on the changes and infrastructure required to enable replication and migration of X applications. Providing better support for this functionality would enable users to retrieve or duplicate X applications between their servers at home and work.
One interesting example of an application that currently is capable of migration and replication is Emacs. To create a new frame on DISPLAY, try "M-x make-frame-on-display <Return> DISPLAY <Return>".
However, technical challenges make replication and migration difficult in general. These include the "major headaches" of server-side fonts, the nonuniformity of X servers and screen sizes, and the need to appropriately retrofit toolkits. Authentication and authorization issues are obviously also important. The rest of the talk delved into some of the details of these interesting technical challenges.
HACKING IN THE KERNEL
Summarized by Hai Huang
AN IMPLEMENTATION OF SCHEDULER
ACTIVATIONS ON THE NETBSD OPERATING
SYSTEM
Nathan J. Williams, Wasabi Systems
Scheduler activation is an old idea. Basically, there are benefits and drawbacks to using solely kernel-level or user-level threading. Scheduler activation is able to combine the two layers of control to provide more concurrency in the system.
In his talk, Nathan gave a fairly detailed description of the implementation of scheduler activation in the NetBSD kernel. One important change in the implementation is to differentiate the thread context from the process context. This is done by defining a separate data structure for these threads and relocating some of the information that was embedded in the process context to these thread contexts. The stack was especially a concern, due to the upcall: special handling must be done to make sure that the upcall doesn't mess up the stack, so that the preempted user-level thread can continue afterwards. Lastly, Nathan explained that signals are handled by upcalls.
AUTHORIZATION AND CHARGING IN PUBLIC
WLANS USING FREEBSD AND 802.1X
Pekka Nikander, Ericsson Research NomadicLab
The 802.1x standards are well known in the wireless community as link-layer authentication protocols. In this talk Pekka explained some novel ways of using the 802.1x protocols that might be of interest to people on the move. It is possible to set up a public WLAN that would support various charging schemes via virtual tokens, which people can purchase or earn and later use.
This is implemented on FreeBSD using the netgraph utility. It is basically a filter in the link layer that differentiates traffic based on the MAC address of the client node, which is either authenticated, denied, or let through. The overhead for this service is fairly minimal.
ACPI IMPLEMENTATION ON FREEBSD
Takanori Watanabe, Kobe University
ACPI (Advanced Configuration and Power Interface) was proposed as a joint effort by Intel, Toshiba, and Microsoft to provide a standard and finer-grained method of managing the power states of individual devices within a system. Such low-level power management is especially important for mobile and embedded systems that are powered by fixed-capacity batteries.
Takanori explained that the ACPI specification is composed of three parts: tables, BIOS, and registers. He was able to implement some functionalities of ACPI in a FreeBSD kernel. The ACPI Component Architecture was implemented by Intel, and it provides a high-level ACPI API to the operating system. Takanori's ACPI implementation is built upon this underlying layer of APIs.
ACCESS CONTROL
Summarized by Florian Buchholz
DESIGN AND PERFORMANCE OF THE
OPENBSD STATEFUL PACKET FILTER (PF)
Daniel Hartmeier, Systor AG
Daniel Hartmeier described the new stateful packet filter (pf) that replaced IPFilter in the OpenBSD 3.0 release. IPFilter could no longer be included due to licensing issues, and thus there was a need to write a new filter making use of optimized data structures.
The filter rules are implemented as a linked list, which is traversed from start to end. Two actions may be taken according to the rules: "pass" or "block." A "pass" action forwards the packet, and a "block" action will drop it. Where more than one rule matches a packet, the last rule wins. Rules that are marked as "final" will immediately terminate any further rule evaluation. An optimization called "skip-steps" was also implemented, in which blocks of similar rules are skipped if they cannot match the current packet; these skip-steps are calculated when the rule set is loaded. Furthermore, a state table keeps track of TCP connections, and only packets that match the sequence numbers are allowed. UDP and ICMP queries and replies are considered in the state table as well, where initial packets will create an entry for a pseudo-connection with a low initial timeout value. The state table is implemented using a balanced binary search tree. NAT mappings are also stored in the state table, whereas application proxies reside in user space. The packet filter is also able to perform fragment reassembly and to modulate TCP sequence numbers to protect hosts behind the firewall.
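The core rule-evaluation semantics described above (last match wins, "final" short-circuits) can be sketched in a few lines (an illustrative model only, not pf's actual C code or rule syntax):

```python
# A miniature model of pf's evaluation: rules are scanned in order,
# the *last* matching rule decides the fate of the packet, and a rule
# marked "final" stops evaluation immediately.

def evaluate(rules, packet, default="pass"):
    decision = default
    for rule in rules:
        if rule["match"](packet):
            decision = rule["action"]  # later matches override earlier ones
            if rule.get("final"):
                break                  # "final" terminates rule evaluation
    return decision
```

A typical "block everything, then pass what you want" rule set falls out naturally: a catch-all block rule first, followed by more specific pass rules that win by matching last.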
Pf was compared against IPFilter as well as Linux's iptables by measuring throughput and latency with increasing traffic rates and different packet sizes. In a test with a fixed rule-set size of 100, iptables outperformed the other two filters, whose results were close to each other. In a second test, where the rule-set size was continually increased, iptables consistently had about twice the throughput of the other two (which evaluate the rule set on both the incoming and outgoing interfaces). A third test compared only pf and IPFilter, using a single rule that created state in the state table, with a fixed state-entry size. Pf reached an overloaded state much later than IPFilter. The experiment was repeated with a variable state-entry size, and pf performed much better than IPFilter for a small number of states.
In general, rule-set evaluation is expensive, and benchmarks only reflect extreme cases, whereas in real life other behavior should be observed. Furthermore, the benchmarks show that stateful filtering can actually improve performance, since state-table lookups are cheap compared to rule evaluation.
ENHANCING NFS CROSS-ADMINISTRATIVE
DOMAIN ACCESS
Joseph Spadavecchia and Erez Zadok, Stony Brook University
The speaker presented modifications to an NFS server that allow improved NFS access between administrative domains. A problem lies in the fact that NFS assumes a shared UID/GID space, which makes it unsafe to export files outside the administrative domain of the server. Design goals were to leave the protocol and existing clients unchanged, a minimum amount of server changes, flexibility, and increased performance.
To solve the problem, two techniques are utilized: "range-mapping" and "file-cloaking." Range-mapping maps IDs between client and server. The mapping is performed on a per-export basis and has to be manually set up in an export file; the mappings can be 1-1, N-N, or N-1. In file-cloaking, the server restricts file access based on UID/GID and special cloaking-mask bits. Here users can only access their own file permissions. The policy on whether or not the file is visible to others is set by the cloaking mask, which is logically ANDed with the file's protection bits. File-cloaking only works, however, if the client doesn't hold cached copies of directory contents and file attributes. Because of this, the clients are forced to re-read directories, by incrementing the mtime value of the directory each time it is listed.
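The two techniques can be sketched as follows (a toy model; the mapping-table format, the 0o700 default mask, and the helper names are assumptions for illustration, not the paper's interface):

```python
# Range-mapping: translate a client UID into the server's ID space
# using per-export (client_base, server_base, length) ranges.

def range_map(uid, mappings):
    for client_base, server_base, length in mappings:
        if client_base <= uid < client_base + length:
            return server_base + (uid - client_base)
    return None  # unmapped ID: treat as unauthorized

# File-cloaking: non-owners see the file's mode ANDed with the
# cloaking mask; with a 0o700 mask, a file is effectively invisible
# to everyone but its owner.

def cloak(mode, owner_uid, requester_uid, mask=0o700):
    return mode if requester_uid == owner_uid else mode & mask
```

For example, a 1-1 range mapping of 100 UIDs starting at 1000 on the client and 5000 on the server sends client UID 1005 to server UID 5005, and a cloaked 0o644 file shows no group/other bits to a non-owner.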
To test the performance of the modified server, five different NFS configurations were evaluated: an unmodified NFS server was compared against one server with the modified code included but not used, one with only range-mapping enabled, one with only file-cloaking enabled, and one version with all modifications enabled. For each setup, different file system benchmarks were run. The results show only a small overhead when the modifications are used, generally an increase of below 5%. Another experiment tested the performance of the system with an increasing number of mapped or cloaked entries. The results show that an increase from 10 to 1000 entries resulted in a maximum of about a 14% cost in performance.
One member of the audience pointed out that if clients choose to ignore the changed mtimes from the server, and thus still hold caches of the directory entries, the file-cloaking mechanism could be defeated. After a rather lengthy debate, the speaker had to concede that the model doesn't add any extra security. Another question was asked about the scalability of setting up range mapping. The speaker referred to application-level tools that could be developed for that purpose.
The software is available at ftp://ftp.fsl.cs.sunysb.edu/pub/enf.
ENGINEERING OPEN SOURCE
SOFTWARE
Summarized by Teri Lampoudi
NINGAUI: A LINUX CLUSTER FOR BUSINESS
Andrew Hume, AT&T Labs – Research; Scott Daniels, Electronic Data Systems Corp.
Ningaui is a general-purpose, highly available, resilient architecture built from commodity software and hardware. Emphatically, however, Ningaui is not a Beowulf. Hume calls the cluster design the "Swiss canton model," in which there are a number of loosely affiliated, independent nodes with data replicated among them. Jobs are assigned by bidding and leases, and cluster services done as session-based servers are sited via generic job assignment. The emphasis is on keeping the architecture end-to-end, checking all work via checksums, and logging everything. The resilience and high availability required by their goal of 8-5 maintenance – vs. the typical 24-7 model, where people get paged whenever the slightest thing goes wrong, regardless of the time of day – is achieved by job restartability. Finally, all computation is performed on local data, without the use of NFS or network-attached storage.
Hume's message is a hopeful one: despite the many problems encountered – things like kernel and service limits, auto-installation problems, TCP storms, and the like – the existence of source code and the paranoid practice of logging and checksumming everything has helped. The final product performs reasonably well, and it appears that the resilience techniques employed do make a difference. One drawback, however, is that the software mentioned in the paper is not yet available for download.
CPCMS: A CONFIGURATION MANAGEMENT
SYSTEM BASED ON CRYPTOGRAPHIC NAMES
Jonathan S. Shapiro and John Vanderburgh, Johns Hopkins University
This paper won the FREENIX Best Paper award.
The basic notion behind the project is the fact that everyone has a pet complaint about CVS, and yet it is currently the configuration manager in most widespread use. Shapiro has unleashed an alternative. Interestingly, he did not begin, but ended, with the usual host of reasons why CVS is bad. The talk instead began abruptly by characterizing the job of a software configuration manager, continued by stating the namespaces it must handle, and wrapped up with the challenges faced.
X MEETS Z: VERIFYING CORRECTNESS IN THE
PRESENCE OF POSIX THREADS
Bart Massey, Portland State University; Robert T. Bauer, Rational Software Corp.
Massey delivered a humorous talk on the insight gained from applying the Z formal specification notation to system software design, rather than the more informal analysis and design process normally used.
The story is told with respect to writing XCB, which replaces the Xlib protocol layer and is supposed to be thread-friendly. But where threads are concerned, deadlock avoidance becomes a hard problem that cannot be solved in an ad hoc manner, yet full model checking is also too hard. In this case, Massey resorted to Z specification to model the XCB lower layer, abstract away locking and data transfers, and locate fundamental issues. Essentially, the difficulties of searching the literature and locating information relevant to the problem at hand were overcome. As Massey put it, "the formal method saved the day."
FILE SYSTEMS
Summarized by Bosko Milekic
PLANNED EXTENSIONS TO THE LINUX
EXT2/EXT3 FILESYSTEM
Theodore Y. Ts'o, IBM; Stephen Tweedie, Red Hat
The speaker presented improvements to the Linux Ext2 file system, with the goal of allowing for various expansions while striving to maintain compatibility with older code. Improvements have been facilitated by a few extra superblock fields that were added to Ext2 not long before the Linux 2.0 kernel was released. The fields allow for file system features to be added without compromising the existing setup; this is done by providing bits indicating the impact of the added features with respect to compatibility. Namely, a file system with the "incompat" bit marked is not allowed to be mounted. Similarly, a "read-only" marking would only allow the file system to be mounted read-only.
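The compatibility-bit logic can be sketched as a small decision function (an illustrative model; the feature names and set representation are assumptions, not the actual Ext2 superblock constants):

```python
# Feature sets this hypothetical kernel knows how to handle.
SUPPORTED_COMPAT = {"dir_prealloc"}
SUPPORTED_RO_COMPAT = {"sparse_super"}
SUPPORTED_INCOMPAT = {"filetype"}

def mount_decision(sb):
    """sb holds the feature sets recorded in the superblock.
    Unknown 'incompat' features forbid mounting entirely; unknown
    'ro_compat' features force a read-only mount; unknown plain
    'compat' features are always safe to ignore."""
    if sb["incompat"] - SUPPORTED_INCOMPAT:
        return "refuse"
    if sb["ro_compat"] - SUPPORTED_RO_COMPAT:
        return "read-only"
    return "read-write"
```

This is what lets an old kernel safely mount a file system carrying features it has never heard of, as long as those features were flagged at the right compatibility level.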
Directory indexing replaces linear directory searches with a faster search using a fixed-depth tree and hashed keys. File system size can be dynamically increased, and the expanded inode, doubled from 128 to 256 bytes, allows for more extensions.
Other potential improvements were discussed as well, in particular pre-allocation for contiguous files, which allows for better performance in certain setups by pre-allocating contiguous blocks. Security-related modifications, extended attributes, and ACLs were mentioned. An implementation of these features already exists but has not yet been merged into the mainline Ext2/3 code.
RECENT FILESYSTEM OPTIMISATIONS ON
FREEBSD
Ian Dowse, Corvil Networks; David Malone, CNRI, Dublin Institute of Technology
David Malone presented four important file system optimizations for FreeBSD OS: soft updates, dirpref, vmiodir, and dirhash. It turns out that certain combinations of the optimizations (beautifully illustrated in the paper) may yield performance improvements of anywhere between 2 and 10 orders of magnitude for real-world applications.
All four of the optimizations deal with file system metadata. Soft updates allow for asynchronous metadata updates. Dirpref changes the way directories are organized, attempting to place child directories closer to their parents, thereby increasing locality of reference and reducing disk-seek times. Vmiodir trades some extra memory in order to achieve better directory caching. Finally, dirhash, which was implemented by Ian Dowse, changes the way in which entries are searched in a directory. Specifically, FFS uses a linear search to find an entry by name; dirhash builds a hashtable of directory entries on the fly. For a directory of size n with a working set of m files, a search that in certain cases could have been O(n·m) has been reduced, due to dirhash, to effectively O(n + m).
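The complexity claim is easy to see in miniature (a sketch of the idea only, not the FreeBSD kernel implementation): the n-entry hashtable build is paid once, after which each of the m lookups is constant time, giving O(n + m) rather than m linear scans at O(n·m).

```python
def linear_lookup(entries, name):
    """FFS-style lookup: scan the directory linearly, O(n) per call."""
    for i, e in enumerate(entries):
        if e == name:
            return i
    return -1

class DirHash:
    """Dirhash-style lookup: one O(n) pass builds an index, then each
    lookup is O(1)."""
    def __init__(self, entries):
        self.index = {e: i for i, e in enumerate(entries)}
    def lookup(self, name):
        return self.index.get(name, -1)
```

Both give identical answers; only the cost model differs, which is why the win grows with directory size and lookup count.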
FILESYSTEM PERFORMANCE AND SCALABILITY
IN LINUX 2.4.17
Ray Bryant, SGI; Ruth Forester, IBM LTC; John Hawkes, SGI
This talk focused on performance evaluation of a number of file systems available and commonly deployed on Linux machines. Comparisons under various configurations of Ext2, Ext3, ReiserFS, XFS, and JFS were presented.
The benchmarks chosen for the data gathering were pgmeter, filemark, and the AIM Benchmark Suite VII. Pgmeter measures the rate of data transfer of reads/writes of a file under a synthetic workload. Filemark is similar to postmark in that it is an operation-intensive benchmark, although filemark is threaded and offers various other features that postmark lacks. AIM VII measures performance for various file-system-related functionalities; it offers various metrics under an imposed workload, thus stressing the performance of the file system not only under I/O load but also under significant CPU load.
Tests were run on three different setups: a small, a medium, and a large configuration. ReiserFS and Ext2 appear at the top of the pile for smaller and medium setups. Notably, XFS and JFS perform worse for smaller system configurations than the others, although XFS clearly appears to generate better numbers than JFS. It should be noted that XFS seems to scale well under a higher load. This was most evident in the large-system results, where XFS appears to offer the best overall results.
THINGS TO THINK ABOUT
Summarized by Bosko Milekic
SPEEDING UP KERNEL SCHEDULER BY
REDUCING CACHE MISSES
Shuji Yamamura, Akira Hirai, Mitsuru Sato, Masao Yamamoto, Akira Naruse, Kouichi Kumon, Fujitsu Labs
This was an interesting talk pertaining to the effects of cache coloring for task structures in the Linux kernel scheduler (Linux kernel 2.4.x). The speaker first presented some interesting benchmark numbers for the Linux scheduler, showing that as the number of processes on the task queue was increased, the performance decreased. The authors used some really nifty hardware to measure the number of bus transactions throughout their tests and were thus able to reasonably quantify the impact that cache misses had in the Linux scheduler.
Their experiments led them to implement a cache-coloring scheme for task structures, which were previously aligned on 8KB boundaries and therefore were eventually mapped to the same cache lines. This unfortunate placement of task structures in memory induced a significant number of cache misses as the number of tasks grew in the scheduler.
The implemented solution consisted of aligning task structures to more evenly distribute cached entries across the L2 cache. The result was inevitably fewer cache misses in the scheduler. Some negative effects were observed in certain situations; these were primarily due to more cache slots being used by task-structure data in the scheduler, thus forcing data previously cached there to be pushed out.
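The collision effect of fixed 8KB alignment, and how a per-task coloring offset spreads structures across the cache, can be sketched with a toy direct-mapped cache (the cache geometry and coloring stride here are assumptions for illustration, not the paper's parameters):

```python
LINE = 64           # cache line size (bytes)
CACHE_LINES = 8192  # lines in a hypothetical direct-mapped 512KB L2
ALIGN = 8192        # original 8KB alignment of task structures

def lines_used(n_tasks, colors=1):
    """Count distinct cache lines touched by the first word of each
    task structure; a per-task coloring offset shifts each base address
    by one cache line per color."""
    used = set()
    for i in range(n_tasks):
        addr = i * ALIGN + (i % colors) * LINE
        used.add((addr // LINE) % CACHE_LINES)
    return len(used)
```

With no coloring, every 8KB-aligned structure lands on one of only 64 line indices in this toy cache, so hot fields of many tasks evict each other; adding a 128-color offset doubles the number of distinct lines in use, illustrating why the measured miss rate dropped.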
WORK-IN-PROGRESS REPORTS
Summarized by Brennan Reynolds
RESOURCE VIRTUALIZATION TECHNIQUES FOR
WIDE-AREA OVERLAY NETWORKS
Kartik Gopalan, Stony Brook University
This work addressed the issue of provisioning a maximum number of virtual overlay networks (VONs) with diverse quality-of-service (QoS) requirements on a single physical network. Each of the VONs is logically isolated from the others to ensure the QoS requirements. Gopalan mentioned several critical research issues with this problem that are currently being investigated. Dealing with how to provision the network at various levels (link, route, or path) and then enforce the provisioning at run-time is one of the toughest challenges. Currently, Gopalan has developed several algorithms to handle admission control, end-to-end QoS, route selection, scheduling, and fault tolerance in the network.
For more information, visit http://www.ecsl.cs.sunysb.edu.
VISUALIZING SOFTWARE INSTABILITY
Jennifer Bevan, University of California, Santa Cruz
Detection of instability in software has typically been an afterthought. The point in the development cycle when the software is reviewed for instability is usually after it is difficult and costly to go back and perform major modifications to the code base. Bevan has developed a technique to allow the visualization of unstable regions of code that can be used much earlier in the development cycle. Her technique creates a time series of dependency graphs that include clusters and lines called fault lines. From the graphs, a developer is able to easily determine where the unstable sections of code are and proactively restructure them. She is currently working on a prototype implementation.
For more information, visit http://www.cse.ucsc.edu/~jbevan/evo_viz.
RELIABLE AND SCALABLE PEER-TO-PEER WEB
DOCUMENT SHARING
Li Xiao, College of William and Mary
The idea presented by Xiao would allow end users to share the content of their Web browser caches with neighboring Internet users. The rationale for this is that today's end users are increasingly connected to the Internet over high-speed links, and the browser caches are becoming large storage systems. Therefore, if an individual accesses a page that does not exist in their local cache, Xiao suggests that they first query other end users for the content before trying to access the machine hosting the original. This strategy does have some serious problems associated with it that still need to be addressed, including ensuring the integrity of the content and protecting the identity and privacy of the end users.
SEREL: FAST BOOTING FOR UNIX
Leni Mayo, Fast Boot Software
Serel is a tool that generates a visual representation of a UNIX machine's boot-up sequence. It can be used to identify the critical path and can show if a particular service or process blocks for an extended period of time. This information could be used to determine where the largest performance gains could be realized by tuning the order of execution at boot-up. Serel creates a dependency graph, expressed in XML, during boot-up; this graph is then used to create the visual representation. Currently the tool works only on POSIX-compliant systems, but Mayo stated that he would be porting it to other platforms. Other extensions that were mentioned included having the metadata cover the use of shared libraries and monitoring the suspend/resume sequence of portable machines.
For more information, visit http://www.fastboot.org.
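The critical-path idea is easy to state concretely: treat boot as a DAG of services with start-up durations and find the most expensive dependency chain. The graph below stands in for Serel's XML record; the service names and durations are invented for illustration.

```python
def critical_path(deps, duration):
    """deps maps service -> services it must wait for; duration maps
    service -> its own start-up time. Returns the most expensive
    dependency chain and the total boot time it implies."""
    memo = {}
    def finish(svc):
        if svc not in memo:
            memo[svc] = duration[svc] + max(
                (finish(d) for d in deps.get(svc, [])), default=0)
        return memo[svc]
    last = max(duration, key=finish)       # service that finishes last
    path = [last]
    while deps.get(path[-1]):              # walk back along costliest edges
        path.append(max(deps[path[-1]], key=finish))
    return list(reversed(path)), finish(last)
```

Tuning effort goes to services on the returned path; shortening anything off it cannot reduce total boot time.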
BERKELEY DB XML
John Merrells, Sleepycat Software
Merrells gave a quick introduction to and overview of the new XML library for Berkeley DB. The library specializes in storage and retrieval of XML content through a tool called XPath. The library allows for multiple containers per document and stores everything natively as XML. The user is also given a wide range of elements to create indices with, including edges, elements, text strings, and presence. The XPath tool consists of a query parser, generator, optimizer, and execution engine. To conclude his presentation, Merrells gave a live demonstration of the software.
For more information, visit http://www.sleepycat.com/xml.
CLUSTER-ON-DEMAND (COD)
Justin Moore, Duke University
Modern clusters are growing at a rapid rate. Many have pushed beyond the 5,000-machine mark, and deploying them results in large expenses as well as management and provisioning issues. Furthermore, if the cluster is "rented" out to various users, it is very time-consuming to configure it to a user's specs, regardless of how long they need to use it. The COD work presented creates dynamic virtual clusters within a given physical cluster. The goal was a provisioning tool that would automatically select a chunk of available nodes and install the operating system and middleware specified by the customer in a short period of time. This would allow greater use of resources, since multiple virtual clusters can exist at once, and the size of a virtual cluster can be changed dynamically. Moore stated that they have created a working prototype and are currently testing and benchmarking its performance.
For more information, visit http://www.cs.duke.edu/~justin/cod.
CATACOMB
Elias Sinderson, University of California, Santa Cruz
Catacomb is a project to develop a database-backed DASL module for the Apache Web server, designed as a replacement for the WebDAV module. The initial release only contains support for a MySQL database but could be extended to others. Sinderson briefly touched on the performance of the module: it was comparable to the mod_dav Apache module for all query types but search. The presentation concluded with remarks about adding support for the LOCK method and including ACL specifications in the future.
For more information, visit http://ocean.cse.ucsc.edu/catacomb.
SELF-ORGANIZING STORAGE
Dan Ellard, Harvard University
Ellard's presentation introduced a storage system that tunes itself based on the workload of the system, without requiring the intervention of the user. The intelligence is implemented as a virtual self-organizing disk that resides below any file system. The virtual disk observes the access patterns exhibited by the system and then attempts to predict what information will be accessed next. An experiment was done using an NFS trace taken at a large ISP on one of their email servers. Ellard's self-organizing storage system worked well, which he attributes to the fact that most of the files being requested were large email boxes. Areas of future work include exploring the length and detail of the data-collection stage, as well as the CPU impact of running the virtual disk layer.
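The summary doesn't say how Ellard's virtual disk predicts the next access, so as a rough stand-in for that step, a first-order model can track, for each block, the block most often requested immediately after it:

```python
from collections import Counter, defaultdict

class AccessPredictor:
    """Toy next-access predictor: remembers, per block, the most common
    successor seen in the observed access stream (illustrative only)."""
    def __init__(self):
        self.successors = defaultdict(Counter)
        self.prev = None

    def observe(self, block):
        if self.prev is not None:
            self.successors[self.prev][block] += 1
        self.prev = block

    def predict(self, block):
        """Most frequent successor seen so far, or None if unseen."""
        counts = self.successors.get(block)
        return counts.most_common(1)[0][0] if counts else None
```

A layer like this sits below the file system, so it sees raw block requests regardless of which file system is mounted above it.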
VISUAL DEBUGGING
John Costigan, Virginia Tech
Costigan feels that the state of current debugging facilities in the UNIX world is not as good as it should be. He proposes the addition of several elements to programs like ddd and gdb. The first addition is including separate heap and stack components. Another is to display only certain fields of a data structure, with the ability to zoom in and out if needed. Finally, the debugger should provide the ability to visually present complex data structures, including linked lists, trees, etc. He said that a beta version of a debugger with these abilities is currently available.
For more information, visit http://infovis.cs.vt.edu/datastruct.
IMPROVING APPLICATION PERFORMANCE THROUGH SYSTEM-CALL COMPOSITION
Amit Purohit, Stony Brook University
Web servers perform a huge number of context switches and much internal data copying during normal operation. These two elements can drastically limit the performance of an application, regardless of the hardware platform it is run on. The Compound System Call (CoSy) framework is an attempt to reduce the performance penalty of context switches and data copies. It includes a user-level library and several kernel facilities that can be used via system calls. The library provides the programmer with a complete set of memory-management functions. Performance tests using the CoSy framework showed large savings in context switches and data copies. The only area where the savings between conventional libraries and CoSy were negligible was very large file copies.
For more information, visit http://www.cs.sunysb.edu/~purohit.
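The summary doesn't give CoSy's API, but the payoff of batching kernel crossings can be seen with an existing batched call: writev(2) hands the kernel several buffers in one system call instead of issuing one write(2) per buffer (Unix-only sketch).

```python
import os

def write_batched(fd, chunks):
    """Write all chunks with a single writev(2) system call, instead of
    one user/kernel crossing per chunk. Returns total bytes written."""
    return os.writev(fd, chunks)
```

The spirit is the same as CoSy's, one crossing for many operations, though CoSy composes arbitrary system calls rather than just output buffers.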
ELASTIC QUOTAS
John Oscar, Columbia University
Oscar began by stating that most resources are flexible but that, to date, disks have not been: while approximately 80 percent of files are short-lived, disk quotas are hard limits imposed by administrators. The idea behind elastic quotas is to have a non-permanent storage area that each user can temporarily use. Oscar suggested the creation of an ehome directory structure to be used in deploying elastic quotas. Currently, he has implemented elastic quotas as a stackable file system.
There were also several trends apparent in the data. Many users would ssh to their remote host (good) but then launch an email reader on the local machine, which would connect via POP to the same host and send the password in clear text (bad). This situation can easily be remedied by tunneling protocols like POP through SSH, but it appeared that many people were not aware this could be done. While most of his comments on the use of the wireless network were negative, the list of passwords he had collected showed that people were indeed using strong passwords. His recommendations were to educate and encourage people to use protocols like IPSec, SSH, and SSL when conducting work over a wireless network, because you never know who else is listening.
SYSTRACE
Niels Provos, University of Michigan
How can one be sure the applications one is using actually do exactly what their developers say they do? Short answer: you can't. People today are using a large number of complex applications, which means it is impossible to check each application thoroughly for security vulnerabilities. There are tools out there that can help, though. Provos has developed a tool called systrace that allows a user to generate a policy of acceptable system calls a particular application can make. If the application attempts to make a call that is not defined in the policy, the user is notified and allowed to choose an action. Systrace includes an auto-policy-generation mechanism, which uses a training phase to record the actions of all programs the user executes. If the user chooses not to be bothered by applications breaking policy, systrace allows default enforcement actions to be set. Currently systrace is implemented for FreeBSD and NetBSD, with a Linux version coming out shortly.
For more information, visit http://www.citi.umich.edu/u/provos/systrace.
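The permit/deny/ask flow described above can be sketched as a small policy table. The policy format and function name below are invented for illustration; they are not systrace's actual policy grammar.

```python
# Toy model of systrace-style enforcement: a recorded policy maps system
# calls to actions, and calls outside the policy fall back to a default
# action ("ask" stands in for prompting the user interactively).
POLICY = {"open": "permit", "read": "permit", "write": "permit",
          "connect": "deny"}

def check_syscall(name, default="ask"):
    """Action for one system call: the policy entry, else the default."""
    return POLICY.get(name, default)
```

Setting a default enforcement action corresponds to unattended operation: every out-of-policy call is denied (or permitted) rather than prompting the user.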
VERIFIABLE SECRET REDISTRIBUTION FOR SURVIVABLE STORAGE SYSTEMS
Ted Wong, Carnegie Mellon University
Wong presented a protocol that can be used to re-create a file distributed over a set of servers, even if one of the servers is damaged. In his scheme, the user must choose how many shares the file is split into and the number of servers it will be stored across. The goal of this work is to provide persistent, secure storage of information even if it comes under attack. Wong stated that his design of the protocol is complete, and he is currently building a prototype implementation of it.
For more information, visit http://www.cs.cmu.edu/~tmwong/research.
THE GURU IS IN SESSIONS
LINUX ON LAPTOP/PDA
Bdale Garbee, HP Linux Systems Operation
Summarized by Brennan Reynolds
This was a roundtable-like discussion, with Garbee acting as moderator. He is currently in charge of the Debian distribution of Linux and is involved in porting Linux to platforms other than i386. He has successfully helped port the kernel to the Alpha, SPARC, and ARM, and is currently working on a version to run on VAX machines.
A discussion was held on which file system is best for battery life. The Reiser and Ext3 file systems were discussed. People commented that when using Ext3 on their laptops the disk would never spin down and thus was always consuming a large amount of power. One suggestion was to use a RAM-based file system for any volume that requires a large number of writes, and to have the file system written to disk only at infrequent intervals or when the machine is shut down or suspended.
A question was directed to Garbee about his knowledge of how the HP-Compaq merger would affect Linux. Garbee thought that the merger was good for Linux, both within the company and for the open source community as a whole. He stated that the new company would be the number-one shipper of systems with Linux as the base operating system. He also fielded a question about the support of older HP servers and their ability to run Linux, saying that indeed Linux has been ported to them and he personally had several machines in his basement running it.
A question which sparked a large number of responses concerned problems people had with running Linux on mobile platforms. Most people actually did not have many problems at all. There were a few cases of a particular machine model not working, but there were only two widespread issues: docking stations and the new ACPI power management scheme. Docking stations still appear to be somewhat of a headache, but most people had developed ad hoc solutions for getting them to work. The ACPI power management scheme developed by Intel does not appear to have a quick solution. Ted Ts'o, one of the head kernel developers, stated that there are fundamental architectural problems with the 2.4 series kernel that do not easily allow ACPI to be added. However, he also stated that ACPI is supported in the newest development kernel, 2.5, and will be included in 2.6.
The final topic was the possibility of purchasing a laptop without paying any charge/tax for Microsoft Windows. The consensus was that currently this is not possible. Even if the machine comes with Linux or without any operating system, the vendors have contracts in place which require them to pay Microsoft for each machine they ship if that machine is capable of running a Microsoft operating system. The only way this will change is if the demand for other operating systems increases to the point that vendors renegotiate their contracts with Microsoft, and no one saw this happening in the near future.
LARGE CLUSTERS
Andrew Hume, AT&T Labs – Research
Summarized by Teri Lampoudi
This was a session on clusters, big data, and resilient computing. Given that the audience was primarily interested in clusters, and not necessarily big data or beating nodes to a pulp, the conversation revolved mainly around what it takes to make a heterogeneous cluster resilient. Hume also presented a FREENIX paper on his current cluster project, which explained in more detail much of what was abstractly claimed in the guru session.
Hume's use of the word "cluster" referred not to what I had assumed would be a Beowulf-type system, but to a loosely coupled farm of machines in which concurrency is much less of an issue than it would be in a Beowulf. In fact, the architecture Hume described was designed to deal with large amounts of independent transaction processing, essentially the process of billing calls, which requires no interprocess message passing or anything of the sort a parallel scientific application might.
Where does the large data come in? Since the transactions in question consist of large numbers of flat files, a mechanism for getting the files onto nodes and off-loading the results is necessary. In this particular cluster, named Ningaui, this task is handled by the "replication manager," which generated a large amount of interest in the audience. All management functions in the cluster, as well as job allocation, constitute services that are bid for and leased out to nodes. Furthermore, failures are handled by simply repeating the failed task until it is completed properly, if that is possible, and, if it is not, by looking through detailed logs to discover
what went wrong. Logging and testing for the successful completion of each task are what make this model resilient. Every management operation is logged at some appropriate level of detail, generating 120MB/day, and every single file transfer is md5 checksummed. This was characterized by Hume as a "patient" style of management, where no one controls the cluster, scheduling behavior is emergent rather than stringently planned, and nodes are free to drop in and out of the cluster. A downside to this model is an increase in latency, offset by the fact that the cluster continues to function unattended outside of 8-to-5 support hours.
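The retry-until-verified discipline for file movement can be sketched as below; the function names are invented, and Ningaui's actual tooling differs, but the shape of "checksum every transfer, repeat on failure" is as described.

```python
import hashlib

def transfer_until_verified(fetch, expected_md5, max_attempts=5):
    """Repeat a transfer (fetch is a callable returning bytes) until the
    payload's md5 matches the expected digest, as each Ningaui transfer
    is checksummed; give up only after max_attempts failures."""
    for attempt in range(1, max_attempts + 1):
        data = fetch()
        if hashlib.md5(data).hexdigest() == expected_md5:
            return data, attempt
    raise RuntimeError("transfer failed verification %d times" % max_attempts)
```

Combined with detailed logging of every attempt, this is what lets the cluster run unattended: transient failures are absorbed, and only persistent ones need a human reading logs.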
In response to questions about the projected scalability of such a scheme, Hume said the system would presumably scale from the present 60 nodes to about 100 without modifications, but scaling higher would be a matter for further design. The guru session ended on a somewhat unrelated note regarding the problems of getting data off mainframes and onto Linux machines: issues of fixed- vs. variable-length encoding of data blocks resulting in useful information being stripped by FTP, and problems in converting COBOL copybooks to something useful on a C-based architecture. To reiterate Hume's claim, there are homemade solutions for some of these problems, and the reader should feel free to contact Mr. Hume for them.
WRITING PORTABLE APPLICATIONS
Nick Stoughton, MSB Consultants
Summarized by Josh Lothian
Nick Stoughton addressed a concern that developers are facing more and more: writing portable applications. He proposed that there is no such thing as a "portable application," only those that have been ported. Developers are still in search of the Holy Grail of applications programming: a write-once, compile-anywhere program. Languages such as Java are leading up to this, but virtual machines still have to be ported, paths will still be different, etc. Web-based applications are another area that could be promising for portable applications.
Nick went on to talk about the future of portability. The recently released POSIX 2001 standards are expected to help the situation. The 2001 revision expands the original POSIX spec into seven volumes. Even though POSIX 2001 is more specific than the previous release, Nick pointed out that even this standard allows for small differences where vendors may define their own behaviors. This can only hurt developers in the long run. Along these lines, it was pointed out that there may be room for automated tools with the capability to check application code and determine the level of portability and standards compliance of the source.
NETWORK MANAGEMENT SYSTEM PERFORMANCE TUNING
Jeff Allen, Tellme Networks, Inc.
Summarized by Matt Selsky
Jeff Allen, author of Cricket, discussed network tuning. The basic idea of tuning is: measure, twiddle, and measure again. Cricket can be used for large installations, but it's overkill for smaller installations, for which MRTG is better suited. You don't need Cricket to do measurement: doing something like a shell script that dumps data to Excel is fine, but you need some sort of measurement. Cricket was not designed for billing; it was meant to help answer questions.
Useful tools and techniques covered included: looking for the differences between machines in order to identify their causes; measuring, but also thinking about what you're measuring and why; and remembering to use strace and tcpdump to identify problems. Problems repeat themselves; performance tuning requires an intricate understanding of all the layers involved to solve the problem and simplify the automation. (Having a computer science background helps but is not essential.) If you can reproduce the problem, you then should investigate each piece to find the bottleneck. If you can't reproduce the problem, then measure. You need to understand all the system interactions. Measurement can help determine whether things are actually slow or the user is imagining it.
Troubleshooting begins with a hunch, but scientific processes are essential. You should be able to determine the event stream, or when each event occurs; having observation logs can help. Some hints provided included checking cron for periodic anomalies, slowing down /bin/rm to avoid I/O overload, and looking for unusual kernel CPU usage.
Also, try to use lightweight monitoring to reduce overhead. You don't need to monitor every resource on every system, but you should monitor those resources on those systems that are essential. Don't check things too often, since you can introduce overhead that way. Lots of information can be gathered from the /proc file system interface to the kernel.
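Lightweight monitoring in practice often means scraping /proc rather than running a heavy agent. A sketch of that style, parsing /proc/meminfo-format "Key: value kB" lines (run here on a captured snapshot string so it stays self-contained):

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style lines into a dict of values in kB.
    In real use, text would come from open('/proc/meminfo').read()."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            info[key.strip()] = int(fields[0])
    return info
```

A cron job doing this every few minutes and alerting on thresholds costs almost nothing, which is the point of the advice above.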
BSD BOF
Host: Kirk McKusick, Author and Consultant
Summarized by Bosko Milekic
The BSD Birds of a Feather session at USENIX started off with five quick and informative talks and ended with an exciting discussion of licensing issues.
Christos Zoulas, for the NetBSD Project, went over the primary goals of the NetBSD Project (namely portability, a clean architecture, and security) and then proceeded to briefly discuss some of the challenges that the project has encountered. The successes of and areas for improvement in 1.6 were then examined. NetBSD has recently seen various improvements in its cross-build system, packaging system, and third-party tool support. As for the kernel, a zero-copy TCP/UDP implementation has been integrated, and a new pmap
code for arm32 has been introduced, as have some other rather major changes to various subsystems. More information is available in Christos's slides, which are now available at http://www.netbsd.org/gallery/events/usenix2002.
Theo de Raadt of the OpenBSD Project presented a rundown of various new things that the OpenBSD and OpenSSH teams have been looking at. Theo's talk included an amusing description (and pictures) of some of the events that transpired during OpenBSD's hack-a-thon, which occurred the week before USENIX '02. Finally, Theo mentioned that, following complications with the IPFilter license, the OpenBSD team had done a fairly extensive license audit.
Robert Watson of the FreeBSD Project brought up various examples of FreeBSD being used in the real world. He went on to describe a wide variety of changes that will surface with the upcoming FreeBSD release 5.0. Notably, 5.0 will introduce a re-architecting of the kernel aimed at providing much more scalable support for SMP machines. 5.0 will also include an early implementation of KSE, FreeBSD's version of scheduler activations, large improvements to pccard, and significant framework bits from the TrustedBSD project.
Mike Karels from Wind River Systems presented an overview of how BSD/OS has been evolving following Wind River Systems' acquisition of BSDi's software assets. Ernie Prabhakar from Apple discussed Apple's OS X and its success on the desktop. He explained how OS X aims to bring BSD to the desktop while striving to maintain a strong and stable core, one that is heavily based on BSD technology.
The BoF finished with a discussion of licensing; specifically, some folks questioned the impact of Apple's licensing of code from their core OS (the Darwin Project) on the BSD community as a whole.
CLOSING SESSION
HOW FLIES FLY
Michael H. Dickinson, University of California, Berkeley
Summarized by JD Welch
In this lively and offbeat talk, Dickinson discussed his research into the mechanisms of flight in the fruit fly (Drosophila melanogaster) and related this autonomic behavior to that of a technological system. Insects are the most successful organisms on earth, due in no small part to their early adoption of flight. Flies travel through space on straight trajectories interrupted by saccades, jumpy turning motions somewhat analogous to the fast, jumpy movements of the human eye. Using a variety of monitoring techniques, including high-speed video in a "virtual reality flight arena," Dickinson and his colleagues have observed that flies respond to visual cues to decide when to saccade during flight. For example, more visual texture makes flies saccade earlier (i.e., further away from the arena wall), described by Dickinson as a "collision-control algorithm."
Through a combination of sensory inputs, flies can make decisions about their flight path. Flies possess a visual "expansion detector" which, at a certain threshold, causes the animal to turn in a certain direction. However, expansion in front of the fly sometimes causes it to land. How does the fly decide? Using the virtual reality device to replicate various visual scenarios, Dickinson observed that the flies fixate on vertical objects that move back and forth across the field. Expansion on the left side of the animal causes it to turn right, and vice versa, while expansion directly in front of the animal triggers the legs to flare out in preparation for landing.
If the eyes detect these changes, how are the responses implemented? The flies' wings have "power muscles," controlled by mechanical resonance (as opposed to the nervous system), which drive the wings, combined with neurally activated "steering muscles," which change the configuration of the wing joints. Subtle variations in the timing of impulses correspond to changes in wing movement, controlled by sensors in the "wing-pit." A small wing-like structure, the haltere, controls equilibrium by beating constantly.
Voluntary control is accomplished by the halteres, whose signals can interfere with the autonomic control of the stroke cycle. The halteres have "steering muscles" as well, and information derived from the visual system can turn off the haltere or trick it into a "virtual" problem requiring a response.
Dickinson has also studied the aerodynamics of insect flight using a device called the "Robo Fly," an oversized mechanical insect wing suspended in a large tank of mineral oil. Interestingly, Dickinson observed that the average lift generated by a fly's wings is less than its body weight; the flies use three mechanisms to overcome this, including rotating the wings (rotational lift).
Insects are extraordinarily robust creatures, and because Dickinson analyzed the problem in a systems-oriented way, these observations and analyses are immediately applicable to technology: the response system can be used as an efficient search algorithm for control systems in autonomous vehicles, for example.
THE AFS WORKSHOP
Summarized by Garry Zacheiss, MIT
Love Hornquist-Astrand of the Arla development team presented the Arla status report. Arla 0.35.8 has been released. Scheduled for release soon are improved support for Tru64 UNIX, MacOS X, and FreeBSD; improved volume handling; and implementation of more of the vos/pts subcommands. It was stressed that MacOS X is considered an important platform, and that a GUI configuration manager for the cache manager, a GUI ACL manager for the Finder, and a graphical login that obtains Kerberos tickets and AFS tokens at login time were all under development.
Future goals planned for the underlying AFS protocols include GSSAPI/SPNEGO support for Rx, performance improvements to Rx, an enhanced disconnected mode, and IPv6 support for Rx; an experimental patch is already available for the latter. Future Arla-specific goals include improved performance, partial file reading, and increased stability on several platforms. Work is also in progress on the RXGSS protocol extensions (integrating GSSAPI into Rx). A partial implementation exists, and work continues as developers find time.
Derrick Brashear of the OpenAFS development team presented the OpenAFS status report. OpenAFS 1.2.5 was released immediately prior to the conference; it fixed a remotely exploitable denial-of-service vulnerability on several OpenAFS platforms, most notably IRIX and AIX. Future work planned for OpenAFS includes better support for MacOS X, including working around Finder interaction issues. Better support for the BSDs is also planned: FreeBSD has a partially implemented client, while NetBSD and OpenBSD have only server programs available right now. AIX 5, Tru64 5.1A, and MacOS X 10.2 (aka Jaguar) are all planned for the future. Other planned enhancements include nested groups in the ptserver (code donated by the University of Michigan awaits integration), disconnected AFS, and further work on a native Windows client. Derrick stated that the guiding principles of OpenAFS are to maintain compatibility with IBM AFS, support new platforms, ease the administrative burden, and add new functionality.
AFS performance was discussed. The openafs.org cell consists of two file servers, one in Stockholm, Sweden, and one in Pittsburgh, PA. AFS works reasonably well over trans-Atlantic links, although OpenAFS clients don't determine which file server to talk to very effectively. Arla clients use RTTs to the server to determine the optimal file server from which to fetch replicated data. Modifications to OpenAFS to support this behavior in the future are desired.
Jimmy Engelbrecht and Harald Barth of KTH discussed their AFSCrawler script, written to determine how many AFS cells and clients were in the world, which implementations/versions they were running (Arla vs. IBM AFS vs. OpenAFS), and how much data was in AFS. The script unfortunately triggered a bug in IBM AFS 3.6-derived code, causing some clients to panic while handling a specific RPC. This has since been fixed in OpenAFS 1.2.5 and the most recent IBM AFS patch level, and all AFS users are strongly encouraged to upgrade. No release of Arla is vulnerable to this particular denial-of-service attack. There was an extended discussion of the usefulness of this exploration. Many sites believed this was useful information and that such scanning should continue in the future, but only on an opt-in basis.
Many sites face the problem of managing Kerberos/AFS credentials for batch-scheduled jobs. Specifically, most batch-processing software needs to be modified to forward tickets as part of the batch submission process, renew tickets and tokens while the job is in the queue and for the lifetime of the job, and properly destroy credentials when the job completes. Ken Hornstein of NRL was able to pay a commercial vendor to support Kerberos 4/5 credential management in their product, although they did not implement AFS token management. MIT has implemented some of the desired functionality in OpenPBS and might be able to make it available to other interested sites.
Tools to simplify AFS administration were discussed, including:
AFS Balancer: A tool written by CMU to automate the process of balancing disk usage across all servers in a cell. Available from ftp://ftp.andrew.cmu.edu/pub/AFS-Tools/balance-1.1b.tar.gz.
Themis: KTH's enhanced version of the AFS tool "package" for updating files on local disk from a central AFS image. KTH's enhancements include allowing the deletion of files, simplifying the process of adding a file, and allowing the merging of multiple rule sets for determining which files are updated. Themis is available from the Arla CVS repository.
Stanford was presented as an example of a large AFS site. Stanford's AFS usage consists of approximately 1.4TB of data in AFS, in the form of approximately 100,000 volumes; 3.3TB of storage is available in their primary cell, ir.stanford.edu. Their file servers consist entirely of Solaris machines running a combination of Transarc 3.6 patch-level 3 and OpenAFS 1.2.x, while their database servers run OpenAFS 1.2.2. Their cell consists of 25 file servers using a combination of EMC and Sun StorEdge hardware. Stanford continues to use the kaserver for their authentication infrastructure, with future plans to migrate entirely to an MIT Kerberos 5 KDC.
Stanford has approximately 3,400 clients on their campus, not including SLAC (Stanford Linear Accelerator); approximately 2,100 AFS clients from outside Stanford contact their cell every month. Their supported clients are almost entirely IBM AFS 3.6 clients, although they plan to release OpenAFS clients soon. Stanford currently supports only UNIX clients. There is some on-campus presence of Windows clients, but Stanford has never publicly released or supported it. They do intend to release and support the MacOS X client in the near future.
All Stanford students, faculty, and staff are assigned AFS home directories, with a default quota of 50MB, for a total of approximately 550GB of user home directories. Other significant uses of AFS
storage are data volumes for workstation software (400GB) and volumes for course Web sites and assignments (100GB).
AFS usage at Intel was also presented. Intel has been an AFS site since 1994. They had bad experiences with the IBM 3.5 Linux client; their experience with OpenAFS on Linux 2.4.x kernels has been much better. They use and are satisfied with the OpenAFS IA64 Linux port. Intel has hundreds of OpenAFS 1.2.3 and 1.2.4 clients in many production cells accessing data stored on IBM AFS file servers, and they have not encountered any interoperability issues. Intel has some concerns about OpenAFS: they would like to purchase commercial support for OpenAFS and to see OpenAFS support for HP-UX on both PA-RISC and Itanium hardware. HP-UX support is currently unavailable due to a specific HP-UX header file being unavailable from HP; this may be available soon. Intel has not yet committed to migrating their file servers to OpenAFS and is unsure if they will do so without commercial support.
Backups are a traditional topic of discussion at AFS workshops, and this time was no exception. Many users complain that the traditional AFS backup tools ("backup" and "butc") are complex and difficult to automate, requiring many home-grown scripts and much user intervention for error recovery. An additional complaint was that the traditional AFS tools do not support file- or directory-level backups and restores; data must be backed up and restored at the volume level.
Mitch Collinsworth of Cornell presented work to make AMANDA, the free backup software from the University of Maryland, suitable for AFS backups. Using AMANDA for AFS backups allows one to share AFS backup tapes with non-AFS backups, easily run multiple backups in parallel, automate error recovery, and provide a robust degraded mode that prevents tape errors from stopping backups altogether. Their implementation allows for full volume restores as well as individual directory and file restores. They have finished coding this work and are in the process of testing and documenting it.
Peter Honeyman of CITI at the University of Michigan spoke about work he has proposed to replace Rx with RPCSEC GSS in OpenAFS. This would allow AFS to use a TCP-based transport mechanism rather than the UDP-based Rx, and possibly gain better congestion control, dynamic adaptation, and fragmentation avoidance as a result. RPCSEC GSS uses the GSSAPI to authenticate SUN ONC RPC. RPCSEC GSS is transport-agnostic, provides strong security, is a developing Internet standard, and has multiple open source implementations. Backward compatibility with existing AFS servers and clients is an important goal of this project.
circuitous, possibly (though not necessarily) because of routing anomalies. It is also shown that the minimum delay between end hosts is strongly correlated with the linearized distance of the path.
Geographic information can be used to study various aspects of wide-area Internet paths that traverse multiple ISPs. It was found that end-to-end Internet paths tend to be more circuitous than intra-ISP paths, the cause for this being the peering relationships between ISPs. Also, paths traversing substantial distances within two or more ISPs tend to be more circuitous than paths largely traversing only a single ISP. Another finding is that ISPs generally employ hot-potato routing.
Geographic information can also be used to capture the fact that two seemingly unrelated routers can be susceptible to correlated failures. By using the geographic information of routers, we can construct a geographic topology of an ISP. Using this, we can find the tolerance of an ISP's network to the total network failure in a geographic region.
For further information, contact lakme@cs.berkeley.edu.
PROVIDING PROCESS ORIGIN INFORMATION TO AID IN NETWORK TRACEBACK
Florian P. Buchholz, Purdue University; Clay Shields, Georgetown University
Network traceback is currently limited because host audit systems do not maintain enough information to match incoming network traffic to outgoing network traffic. The talk presented an alternative: assigning origin information to every process and logging it during interactive login creation.
The current implementation concentrates mainly on interactive sessions, in which an event is logged when a new connection is established using SSH or Telnet. The method is effective and could successfully determine stepping stones and reliably detect the source of a DDoS attack. The speaker then talked about the related work done in this area,
which led to discussion of the latest development in the area of "host causality."
The speaker ended with the limitations and future directions of the framework. This framework could be extended and could find many applications in the area of security. The current implementation doesn't take care of startup scripts and cron jobs, but incorporating the origin information in the file system could solve this problem. In the current implementation, logging is implemented only as a proof of concept; it could be made safe in many ways, and this could be another important aspect of future work.
PROGRAMMING
Summarized by Amit Purohit
CYCLONE: A SAFE DIALECT OF C
Trevor Jim, AT&T Labs – Research; Greg Morrisett, Dan Grossman, Michael Hicks, James Cheney, Yanling Wang, Cornell University
Cyclone is designed to provide protection against attacks such as buffer overflows, format string attacks, and memory management errors. The current structure of C allows programmers to write vulnerable programs. Cyclone extends C so that it has the safety guarantee of Java while keeping C syntax, types, and semantics untouched.
The Cyclone compiler performs static analysis of the source code and inserts runtime checks into the compiled output at places where the analysis cannot determine that the code execution will not violate safety constraints. Cyclone imposes many restrictions to preserve safety, such as NULL checks. These checks do not exist in normal C.
The speaker then talked about some sample code written in Cyclone and how it tackles safety problems. Porting an existing C application to Cyclone is pretty easy, with fewer than 10% of the code requiring change. The current implementation concentrates more on safety than on performance; hence the performance penalty is substantial. Cyclone was able to find many lingering bugs. The final step of the project will be to write a compiler to convert a normal C program to an equivalent Cyclone program.
COOPERATIVE TASK MANAGEMENT WITHOUT MANUAL STACK MANAGEMENT
Atul Adya, Jon Howell, Marvin Theimer, William J. Bolosky, John R. Douceur, Microsoft Research
The speaker described the definitions and motivations behind the distinct concepts of task management, stack management, I/O management, conflict management, and data partitioning. Conventional concurrent programming uses preemptive task management and exploits the automatic stack management of a standard language. In the second approach, cooperative tasks are organized as event handlers that yield control by returning control to the event scheduler, manually unrolling their stacks. In this project they have adopted a hybrid approach that makes it possible for both stack management styles to coexist. Thus, programmers can code assuming one or the other of these stack management styles will operate, depending upon the application. The speaker also gave a detailed example of how to use their mechanism to switch between the two styles.
The talk then continued with the implementation. They were able to preempt many subtle concurrency problems by using cooperative task management. Paying a cost up front to reduce a subtle race condition proved to be a good investment.
Though the choice of task management is fundamental, the choice of stack management can be left to individual taste. This project enables use of any type of stack management in conjunction with cooperative task management.
IMPROVING WAIT-FREE ALGORITHMS FOR INTERPROCESS COMMUNICATION IN EMBEDDED REAL-TIME SYSTEMS
Hai Huang, Padmanabhan Pillai, Kang G. Shin, University of Michigan
The main characteristic of a real-time, time-sensitive system is its predictable response time. But concurrency management creates hurdles to achieving this goal because of the use of locks to maintain consistency. To solve this problem, many wait-free algorithms have been developed, but these are typically high-cost solutions. By taking advantage of the temporal characteristics of the system, however, the time and space overhead can be reduced.
The speaker presented an algorithm for temporal concurrency control and applied this technique to improve three wait-free algorithms. A single-writer multiple-reader wait-free algorithm and a double-buffer algorithm were proposed.
Using their transformation mechanism, they achieved an improvement of 17–66% in ACET and a 14–70% reduction in memory requirements for IPC algorithms. This mechanism is extensible and can be applied to other non-blocking IPC algorithms as well. Future work involves reducing the synchronization overheads in more general IPC algorithms with multiple-writer semantics.
MOBILITY
Summarized by Praveen Yalagandula
ROBUST POSITIONING ALGORITHMS FOR DISTRIBUTED AD-HOC WIRELESS SENSOR NETWORKS
Chris Savarese, Jan Rabaey, University of California, Berkeley; Koen Langendoen, Delft University of Technology
The Pico Radio network comprises more than a hundred sensors, monitors, and actuators equipped with wireless connectivity, with the following properties: (1) no infrastructure; (2) computation in a distributed fashion instead of centralized computation; (3) dynamic topology; (4) limited radio range;
(5) nodes that can act as repeaters; (6) devices of low cost and with low power; and (7) sparse anchor nodes – nodes with GPS capability. A positioning algorithm determines the geographical position of each node in the network.
In two-dimensional space, each node needs three reference positions to estimate its geographical position. There are two problems that make positioning difficult in the Pico Radio network setting: (1) a sparse anchor node problem and (2) a range error problem.
The two-phase approach that the authors have taken in solving the problem consists of (1) a hop-terrain algorithm – in this first phase, each node roughly guesses its location by the distance calculated using multi-hops to the anchor points – and (2) a refinement algorithm – in this second phase, each node uses its neighbors' positions to refine its own position estimate. To guarantee convergence, this approach uses confidence metrics, assigning a value of 1.0 for anchor nodes and 0.1 to start for other nodes, and increasing these with each iteration.
A simulation tool, OMNeT++, was used for both phases and in various scenarios. The results show that they achieved position errors of less than 33% in a scenario with 5% anchor nodes, an average connectivity of 7, and 5% range measurement error.
Guidelines for anchor node deployment: high connectivity (>10), a reasonable fraction of anchor nodes (>5%), and, for anchor placement, covered edges.
APPLICATION-SPECIFIC NETWORK MANAGEMENT FOR ENERGY-AWARE STREAMING OF POPULAR MULTIMEDIA FORMATS
Surendar Chandra, University of Georgia, and Amin Vahdat, Duke University
The main hindrance in supporting the increasing demand for mobile multimedia on PDAs is the battery capacity of these small devices. The idle-time power consumption is almost the same as the receive power consumption on typical wireless network cards (1319mW vs. 1425mW), while the sleep state consumes far less power (177mW). To let the system consume energy proportional to the stream quality, the network card should transition to the sleep state aggressively between each packet.
The speaker presented previous work in which different multimedia streams were studied for PDAs: MS Media, Real Media, and QuickTime. It was found that if the inter-packet gap is predictable, then huge savings are possible – for example, about 80% savings in MS Media streams with few packet losses. The limitation of the IEEE 802.11b power-save mode comes into play when there are two or more nodes, producing a delay between beacon time and when the node receives/sends packets. This badly affects the streams, since higher energy is consumed while waiting.
The authors propose traffic shaping for energy conservation, done by proxy such that packets arrive at regular intervals. This is achieved using a local proxy in the access point and a client-side proxy in the mobile host. Simulations show that traffic shaping reduces energy consumption and also reveal that the higher-bandwidth streams have a lower energy metric (mJ/kB).
More information is at http://greenhouse.cs.uga.edu.
CHARACTERIZING ALERT AND BROWSE SERVICES OF MOBILE CLIENTS
Atul Adya, Paramvir Bahl, and Lili Qiu, Microsoft Research
Even though there is a dramatic increase in Internet access from wireless devices, there are not many studies done on characterizing this traffic. In this paper, the authors characterize the traffic observed on the MSN Web site with both notification and browse traces.
Around 33 million browsing accesses and about 325 million notification entries are present in the traces. Three types of analyses are done for each of these two services: content analysis, concerning the most popular content categories and their distribution; popularity analysis, or the popularity distribution of documents; and user behavior analysis.
The analysis of the notification logs shows that document access rates follow a Zipf-like distribution, with most of the accesses concentrated on a small number of messages, and that the accesses exhibit geographical locality – users from the same locality tend to receive similar notification content. The browser log analysis shows that a smaller set of URLs is accessed most times, though the access pattern does not fit any Zipf curve, and the highly accessed URLs remain stable. A correlation study between the notification and browsing services shows that wireless users have a moderate correlation of 0.12.
FREENIX TRACK SESSIONS
BUILDING APPLICATIONS
Summarized by Matt Butner
INTERACTIVE 3-D GRAPHICS APPLICATIONS FOR TCL
Oliver Kersting, Jürgen Döllner, Hasso Plattner Institute for Software Systems Engineering, University of Potsdam
The integration of 3-D image-rendering functionality into a scripting language permits interactive and animated 3-D development and application without the formalities and precision demanded by low-level C/C++ graphics and visualization libraries. The large and complex C++ API of the Virtual Rendering System (VRS) can be combined with the conveniences of the Tcl scripting language. The mapping of class interfaces is done via an automated process that generates respective wrapper classes, all of which ensures complete API accessibility and functionality without any significant performance penalty. The general C++-to-Tcl mapper SWIG grants the necessary object and hierarchical abilities without the use of object-oriented Tcl extensions. The final API mapping techniques address C++ features such as classes, overloaded methods, enumerations, and inheritance relations. All are implemented in a proof of concept that maps the complete C++ API of the VRS to Tcl and are showcased in a complete interactive 3-D map system.
THE AGFL GRAMMAR WORK LAB
Cornelis H.A. Koster, Erik Verbruggen, University of Nijmegen (KUN)
The growth and implementation of Natural Language Processing (NLP) is the cornerstone of the continued evolution and implementation of truly intelligent search machines and services. In part due to the growing collections of computer-stored, human-readable documents in the public domain, the implementation of linguistic analysis will become necessary for desirable precision and resolution. Subsequently, such tools and linguistic resources must be released into the public domain, and they have done so with the release of the AGFL Grammar Work Lab under the GNU Public License as a tool for linguistic research and the development of NLP-based applications.
The AGFL (Affix Grammars over a Finite Lattice) Grammar Work Lab meshes context-free grammars with finite set-valued features that are acceptable to a range of languages. In computer science terms, "Syntax rules are procedures with parameters and a non-deterministic execution." The English Phrases for Information Retrieval (EP4IR), released with the AGFL-GWL as a robust grammar of English, is an AGFL-GWL-generated English parser that outputs "Head/Modifier" frames. The sentences "CompanyX sponsored this conference" and "This conference was sponsored by CompanyX" both generate [CompanyX [sponsored conference]]. The hope is that public availability of such tools will encourage further development of grammar and lexical software systems.
SWILL: A SIMPLE EMBEDDED WEB SERVER LIBRARY
Sotiria Lampoudi, David M. Beazley, University of Chicago
This paper won the FREENIX Best Student Paper award.
SWILL (Simple Web Interface Link Library) is a simple Web server in the form of a C library whose development was motivated by a wish to give cool applications an interface to the Web. The SWILL library provides a simple interface that can be efficiently implemented for tasks that vary from flexible Web-based monitoring to software debugging and diagnostics. Though originally designed to be integrated with high-performance scientific simulation software, the interface is generic enough to allow for unbounded uses. SWILL is a single-threaded server relying upon non-blocking I/O through the creation of a temporary server which services I/O requests. SWILL does not provide SSL or cryptographic authentication but does have HTTP authentication abilities.
A fantastic feature of SWILL is its support for SPMD-style parallel applications which utilize MPI, proving valuable for Beowulf clusters and large parallel machines. Another practical application was the implementation of SWILL in a modified Yalnix emulator by University of Chicago operating systems courses, which utilized the added Yalnix functionality for OS development and debugging. SWILL requires minimal memory overhead and relies upon the HTTP/1.0 protocol.
NETWORK PERFORMANCE
Summarized by Florian Buchholz
LINUX NFS CLIENT WRITE PERFORMANCE
Chuck Lever, Network Appliance; Peter Honeyman, CITI, University of Michigan
Lever introduced a benchmark to measure NFS client write performance. Client performance is difficult to measure due to hindrances such as poor hardware or bandwidth limitations. Furthermore, measuring application performance does not identify weaknesses specifically at the client side. Thus, a benchmark was developed trying to exercise only data transfers in one direction between server and application. For this purpose, the benchmark was based on the block sequential write portion of the Bonnie file system benchmark. Once a benchmark for NFS clients is established, it can be used to improve client performance.
The performance measurements were performed with an SMP Linux client and both a Linux NFS server and a Network Appliance F85 filer. During testing, a periodic jump in write latency was discovered. This was due to a rather large number of pending write operations that were scheduled to be written after certain threshold values were exceeded. By introducing a separate daemon that flushes the cached write requests, the spikes could be eliminated, but as a result the average latency grew over time. The problem was traced to a function that scans a linked list of write requests. After a hash table was added to improve lookup performance, the latency improved considerably.
The improved client was then used to measure throughput against the two servers. A discrepancy between the Linux server and the filer tests was noticed, and the reason for it was traced back to a global kernel lock that was unnecessarily held when accessing the network stack. After correcting this, performance improved further. However,
the measurements showed that a client may run slower when paired with fast servers on fast networks. This is due to heavy client interrupt loads, more network processing on the client side, and more global kernel lock contention.
The source code of the project is available at http://www.citi.umich.edu/projects/nfs-perf/patches.
A STUDY OF THE RELATIVE COSTS OF NETWORK SECURITY PROTOCOLS
Stefan Miltchev and Sotiris Ioannidis, University of Pennsylvania; Angelos Keromytis, Columbia University
With the increasing need for security and integrity of remote network services, it becomes important to quantify the communication overhead of IPsec and compare it to alternatives such as SSH, SCP, and HTTPS.
For this purpose, the authors set up three testing networks: a direct link, two hosts separated by two gateways, and three hosts connecting through one gateway. Protocols were compared in each setup, and manual keying was used to eliminate connection setup costs. For IPsec, the different encryption algorithms AES, DES, 3DES, hardware DES, and hardware 3DES were used. In detail, FTP was compared to SFTP, SCP, and FTP over IPsec; HTTP to HTTPS and HTTP over IPsec; and NFS to NFS over IPsec and local disk performance.
The result of the measurements was that IPsec outperforms other popular encryption schemes. Overall, unencrypted communication was fastest, but in some cases, like FTP, the overhead can be small. The use of crypto hardware can significantly improve performance. For future work, the inclusion of setup costs, hardware-accelerated SSL, SFTP, and SSH was mentioned.
CONGESTION CONTROL IN LINUX TCP
Pasi Sarolahti, University of Helsinki; Alexey Kuznetsov, Institute for Nuclear Research at Moscow
Having attended the talk and read the paper, I am still unclear about whether the authors are merely describing the design decisions of TCP congestion control or whether they are actually the creators of that part of the Linux code. My guess leans toward the former.
In the talk, the speaker compared the TCP congestion control measures specified by the IETF in the RFCs with the actual Linux implementation, which does conform to the basic principles but nevertheless has differences. A specific emphasis was placed on retransmission mechanisms and the congestion window. Also, several TCP enhancements – the NewReno algorithm, Selective ACKs (SACK), Forward ACKs (FACK) – were discussed and compared.
In some instances Linux does not conform to the IETF specifications. The fast recovery does not fully follow RFC 2582, since the threshold for triggering retransmit is adjusted dynamically and the congestion window's size is not changed. Also, the roundtrip-time estimator and the RTO calculation differ from RFC 2988, since Linux uses more conservative RTT estimates and a minimum RTO of 200ms. The performance measures showed that with the additional Linux-specific features enabled, slightly higher throughput, more steady data flow, and fewer unnecessary retransmissions can be achieved.
XTREME XCITEMENT
Summarized by Steve Bauer
THE FUTURE IS COMING: WHERE THE X WINDOW SYSTEM SHOULD GO
Jim Gettys, Compaq Computer Corp.
Jim Gettys, one of the principal authors of the X Window System, outlined the near-term objectives for the system, primarily focusing on the changes and infrastructure required to enable replication and migration of X applications. Providing better support for this functionality would enable users to retrieve or duplicate X applications between their servers at home and work.
One interesting example of an application that currently is capable of migration and replication is Emacs. To create a new frame on DISPLAY, try "M-x make-frame-on-display <Return> DISPLAY <Return>".
However, technical challenges make replication and migration difficult in general. These include the "major headaches" of server-side fonts, the nonuniformity of X servers and screen sizes, and the need to appropriately retrofit toolkits. Authentication and authorization issues are obviously also important. The rest of the talk delved into some of the details of these interesting technical challenges.
HACKING IN THE KERNEL
Summarized by Hai Huang
AN IMPLEMENTATION OF SCHEDULER ACTIVATIONS ON THE NETBSD OPERATING SYSTEM
Nathan J. Williams, Wasabi Systems
Scheduler activation is an old idea. Basically, there are benefits and drawbacks to using solely kernel-level or user-level threading. Scheduler activation is able to combine the two layers of control to provide more concurrency in the system.
In his talk, Nathan gave a fairly detailed description of the implementation of scheduler activations in the NetBSD kernel. One important change in the implementation is to differentiate the thread context from the process context. This is done by defining a separate data structure for these threads and relocating some of the information that was embedded in the process context to these thread contexts. The stack was especially a concern, due to the upcall: special handling must be done to make sure that the upcall doesn't mess up the stack, so that the preempted user-level thread can continue afterwards. Lastly, Nathan explained that signals are handled by upcalls.
AUTHORIZATION AND CHARGING IN PUBLIC WLANS USING FREEBSD AND 802.1X
Pekka Nikander, Ericsson Research NomadicLab
The 802.1X standards are well known in the wireless community as link-layer authentication protocols. In this talk, Pekka explained some novel ways of using the 802.1X protocols that might be of interest to people on the move. It is possible to set up a public WLAN that would support various charging schemes via virtual tokens, which people can purchase or earn and later use.
This is implemented on FreeBSD using the netgraph utility. It is basically a filter in the link layer that differentiates traffic based on the MAC address of the client node, which is either authenticated, denied, or let through. The overhead for this service is fairly minimal.
ACPI IMPLEMENTATION ON FREEBSD
Takanori Watanabe, Kobe University
ACPI (Advanced Configuration and Power Interface) was proposed as a joint effort by Intel, Toshiba, and Microsoft to provide a standard, finer-grained method of managing the power states of individual devices within a system. Such low-level power management is especially important for mobile and embedded systems that are powered by fixed-capacity batteries.
Takanori explained that the ACPI specification is composed of three parts: tables, BIOS, and registers. He was able to implement some functionality of ACPI in the FreeBSD kernel. The ACPI Component Architecture, implemented by Intel, provides a high-level ACPI API to the operating system; Takanori's ACPI implementation is built upon this underlying layer of APIs.
ACCESS CONTROL
Summarized by Florian Buchholz
DESIGN AND PERFORMANCE OF THE OPENBSD STATEFUL PACKET FILTER (PF)
Daniel Hartmeier, Systor AG
Daniel Hartmeier described the new stateful packet filter (pf) that replaced IPFilter in the OpenBSD 3.0 release. IPFilter could no longer be included due to licensing issues, and thus there was a need to write a new filter making use of optimized data structures.
The filter rules are implemented as a linked list, which is traversed from start to end. Two actions may be taken according to the rules, "pass" or "block": a "pass" action forwards the packet, and a "block" action drops it. Where more than one rule matches a packet, the last rule wins. Rules that are marked as "final" will immediately terminate any further rule evaluation. An optimization called "skip-steps" was also implemented, where blocks of similar rules are skipped if they cannot match the current packet; these skip-steps are calculated when the rule set is loaded. Furthermore, a state table keeps track of TCP connections, and only packets that match the sequence numbers are allowed. UDP and ICMP queries and replies are considered in the state table, where initial packets create an entry for a pseudo-connection with a low initial timeout value. The state table is implemented using a balanced binary search tree. NAT mappings are also stored in the state table, whereas application proxies reside in user space. The packet filter is also able to perform fragment reassembly and to modulate TCP sequence numbers to protect hosts behind the firewall.
Pf was compared against IPFilter as well as Linux's iptables by measuring throughput and latency with increasing traffic rates and different packet sizes. In a test with a fixed rule-set size of 100, iptables outperformed the other two filters, whose results were close to each other. In a second test, where the rule-set size was continually increased, iptables consistently had about twice the throughput of the other two (which evaluate the rule set on both the incoming and outgoing interfaces). A third test compared only pf and IPFilter, using a single rule that created state in the state table with a fixed state-entry size; pf reached an overloaded state much later than IPFilter. The experiment was repeated with a variable state-entry size, and pf performed much better than IPFilter for a small number of states.
In general, rule-set evaluation is expensive, and benchmarks only reflect extreme cases, whereas in real life other behavior should be observed. Furthermore, the benchmarks show that stateful filtering can actually improve performance, due to cheap state-table lookups as compared to rule evaluation.
ENHANCING NFS CROSS-ADMINISTRATIVE DOMAIN ACCESS
Joseph Spadavecchia and Erez Zadok, Stony Brook University
The speaker presented modifications to an NFS server that allow improved NFS access between administrative domains. The problem lies in the fact that NFS assumes a shared UID/GID space, which makes it unsafe to export files outside the administrative domain of the server. Design goals were to leave the protocol and existing clients unchanged, require a minimal amount of server changes, and provide flexibility and increased performance.
To solve the problem, two techniques are utilized: "range-mapping" and "file-cloaking." Range-mapping maps IDs between client and server. The mapping is performed on a per-export basis and has to be manually set up in an export file. The mappings can be 1-1, N-N, or N-1. In file-cloaking, the server restricts file access based on UID/GID and special cloaking-mask bits, so users can only access their own files. The policy on whether or not a file is visible to others is set by the cloaking mask, which is logically ANDed with the
file's protection bits. File-cloaking only works, however, if the client doesn't hold cached copies of directory contents and file attributes. Because of this, the clients are forced to re-read directories by incrementing the mtime value of the directory each time it is listed.
To test the performance of the modified server, five different NFS configurations were evaluated: an unmodified NFS server was compared against one server with the modified code included but not used, one with only range-mapping enabled, one with only file-cloaking enabled, and one version with all modifications enabled. For each setup, different file system benchmarks were run. The results show only a small overhead when the modifications are used, generally an increase of below 5%. Another experiment tested the performance of the system with an increasing number of mapped or cloaked entries. The results show that an increase from 10 to 1000 entries resulted in a maximum of about 14% cost in performance.
One member of the audience pointed out that if clients choose to ignore the changed mtimes from the server and thus still hold caches of the directory entries, the file-cloaking mechanism could be defeated. After a rather lengthy debate, the speaker had to concede that the model doesn't add any extra security. Another question was asked about the scalability of the setup of range-mapping; the speaker referred to application-level tools that could be developed for that purpose.
The software is available at ftp://ftp.fsl.cs.sunysb.edu/pub/enf.
ENGINEERING OPEN SOURCE
SOFTWARE
Summarized by Teri Lampoudi
NINGAUI: A LINUX CLUSTER FOR BUSINESS
Andrew Hume, AT&T Labs – Research; Scott Daniels, Electronic Data Systems Corp.
Ningaui is a general-purpose, highly available, resilient architecture built from commodity software and hardware. Emphatically, however, Ningaui is not a Beowulf. Hume calls the cluster design the "Swiss canton model," in which there are a number of loosely affiliated, independent nodes with data replicated among them. Jobs are assigned by bidding and leases, and cluster services done as session-based servers are sited via generic job assignment. The emphasis is on keeping the architecture end-to-end, checking all work via checksums, and logging everything. The resilience and high availability required by their goal of 8-5 maintenance – vs. the typical 24-7 model, where people get paged whenever the slightest thing goes wrong, regardless of the time of day – are achieved by job restartability. Finally, all computation is performed on local data, without the use of NFS or network-attached storage.
Hume's message is a hopeful one: despite the many problems encountered – things like kernel and service limits, auto-installation problems, TCP storms, and the like – the existence of source code and the paranoid practice of logging and checksumming everything has helped. The final product performs reasonably well, and it appears that the resilient techniques employed do make a difference. One drawback, however, is that the software mentioned in the paper is not yet available for download.
CPCMS: A CONFIGURATION MANAGEMENT SYSTEM BASED ON CRYPTOGRAPHIC NAMES
Jonathan S. Shapiro and John Vanderburgh, Johns Hopkins University
This paper won the FREENIX Best Paper award.
The basic notion behind the project is the fact that everyone has a pet complaint about CVS, and yet it is currently the configuration manager in most widespread use. Shapiro has unleashed an alternative. Interestingly, he did not begin but ended with the usual host of reasons why CVS is bad. The talk instead began abruptly by characterizing the job of a software configuration manager, continued by stating the namespaces which it must handle, and wrapped up with the challenges faced.
X MEETS Z: VERIFYING CORRECTNESS IN THE PRESENCE OF POSIX THREADS
Bart Massey, Portland State University; Robert T. Bauer, Rational Software Corp.
Massey delivered a humorous talk on the insight gained from applying the Z formal specification notation to system software design, rather than the more informal analysis and design process normally used.
The story is told with respect to writing XCB, which replaces the Xlib protocol layer and is supposed to be thread-friendly. But where threads are concerned, deadlock avoidance becomes a hard problem that cannot be solved in an ad hoc manner, while full model checking is too hard. In this case, Massey resorted to Z specification to model the XCB lower layer, abstract away locking and data transfers, and locate fundamental issues. Essentially, the difficulties of searching the literature and locating information relevant to the problem at hand were overcome. As Massey put it, "the formal method saved the day."
FILE SYSTEMS
Summarized by Bosko Milekic
PLANNED EXTENSIONS TO THE LINUX EXT2/EXT3 FILESYSTEM
Theodore Y. Ts'o, IBM; Stephen Tweedie, Red Hat
The speaker presented improvements to the Linux Ext2 file system, with the goal of allowing for various expansions while striving to maintain compatibility with older code. Improvements have been facilitated by a few extra superblock fields that were added to Ext2 not long before the Linux 2.0 kernel was released. The fields allow for file system features to be added without compromising the existing setup; this is done by providing bits indicating the impact of the added features with respect to compatibility. Namely, a file system with the "incompat" bit marked is not allowed to be mounted. Similarly, a "read-only" marking would only allow the file system to be mounted read-only.
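The mount-time decision described above can be sketched in a few lines. The feature names and supported sets below are purely illustrative, not the real Ext2 on-disk bit values:

```python
# Sketch of Ext2-style feature-compatibility checks.  The feature names and
# supported sets are made up for illustration; the real ones are bit flags
# in the superblock.
SUPPORTED_COMPAT = {"dir_prealloc"}        # safe to ignore if unknown
SUPPORTED_RO_COMPAT = {"sparse_super"}     # unknown -> mount read-only
SUPPORTED_INCOMPAT = {"filetype"}          # unknown -> refuse to mount

def mount_decision(compat, ro_compat, incompat):
    """Return 'rw', 'ro', or 'refuse' for a filesystem's feature sets."""
    if incompat - SUPPORTED_INCOMPAT:
        return "refuse"   # an unknown "incompat" feature forbids mounting
    if ro_compat - SUPPORTED_RO_COMPAT:
        return "ro"       # an unknown read-only feature allows ro mounts only
    return "rw"           # unknown "compat" features are harmless

assert mount_decision({"dir_prealloc"}, set(), set()) == "rw"
assert mount_decision(set(), {"big_inodes"}, set()) == "ro"
assert mount_decision(set(), set(), {"journal_dev"}) == "refuse"
```

The point of the three-way split is that old kernels can keep mounting file systems that only add "compat" features, degrade gracefully for "ro-compat" ones, and fail safely otherwise.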
Directory indexing replaces linear directory searches with a faster search using a fixed-depth tree and hashed keys. File system size can be dynamically increased, and the expanded inode, doubled from 128 to 256 bytes, allows for more extensions.
Other potential improvements were discussed as well, in particular pre-allocation for contiguous files, which allows for better performance in certain setups by pre-allocating contiguous blocks. Security-related modifications, extended attributes, and ACLs were mentioned. An implementation of these features already exists but has not yet been merged into the mainline Ext2/3 code.
RECENT FILESYSTEM OPTIMISATIONS ON FREEBSD
Ian Dowse, Corvil Networks; David Malone, CNRI, Dublin Institute of Technology
David Malone presented four important file system optimizations for the FreeBSD OS: soft updates, dirpref, vmiodir, and dirhash. It turns out that certain combinations of the optimizations (beautifully illustrated in the paper) may yield performance improvements of anywhere between 2 and 10 orders of magnitude for real-world applications.
All four of the optimizations deal with file system metadata. Soft updates allow for asynchronous metadata updates. Dirpref changes the way directories are organized, attempting to place child directories closer to their parents, thereby increasing locality of reference and reducing disk-seek times. Vmiodir trades some extra memory in order to achieve better directory caching. Finally, dirhash, which was implemented by Ian Dowse, changes the way in which entries are searched in a directory. Specifically, FFS uses a linear search to find an entry by name; dirhash builds a hashtable of directory entries on the fly. For a directory of size n with a working set of m files, a search that in certain cases could have been O(n*m) has been reduced, due to dirhash, to effectively O(n + m).
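The dirhash idea can be illustrated with a toy sketch (this is not the FreeBSD implementation): the first lookup in a directory pays a linear scan to build a hash table, and subsequent lookups in the same directory are O(1) on average.

```python
# Toy illustration of dirhash: lazily build a hash table over a directory's
# on-disk entries on the first lookup, then answer later lookups from it.
class Directory:
    def __init__(self, entries):
        self.entries = entries      # list of (name, inode) pairs, as on disk
        self._hash = None           # built lazily, like dirhash

    def lookup(self, name):
        if self._hash is None:
            # One O(n) pass builds the table; a real kernel would also cap
            # the memory spent on these tables and evict them under pressure.
            self._hash = {n: ino for n, ino in self.entries}
        return self._hash.get(name)

d = Directory([("README", 12), ("src", 34), ("Makefile", 56)])
assert d.lookup("src") == 34       # first call builds the table
assert d.lookup("Makefile") == 56  # later calls hit the hash table
assert d.lookup("missing") is None
```

This is where the O(n + m) figure comes from: one O(n) build plus m cheap lookups, instead of m separate O(n) scans.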
FILESYSTEM PERFORMANCE AND SCALABILITY IN LINUX 2.4.17
Ray Bryant, SGI; Ruth Forester, IBM LTC; John Hawkes, SGI
This talk focused on performance evaluation of a number of file systems available and commonly deployed on Linux machines. Comparisons under various configurations of Ext2, Ext3, ReiserFS, XFS, and JFS were presented.
The benchmarks chosen for the data gathering were pgmeter, filemark, and the AIM Benchmark Suite VII. Pgmeter measures the rate of data transfer of reads/writes of a file under a synthetic workload. Filemark is similar to postmark in that it is an operation-intensive benchmark, although filemark is threaded and offers various other features that postmark lacks. AIM VII measures performance for various file-system-related functionalities; it offers various metrics under an imposed workload, thus stressing the performance of the file system not only under I/O load but also under significant CPU load.
Tests were run on three different setups: a small, a medium, and a large configuration. ReiserFS and Ext2 appear at the top of the pile for smaller and medium setups. Notably, XFS and JFS perform worse for smaller system configurations than the others, although XFS clearly appears to generate better numbers than JFS. It should be noted that XFS seems to scale well under a higher load. This was most evident in the large-system results, where XFS appears to offer the best overall results.
THINGS TO THINK ABOUT
Summarized by Bosko Milekic
SPEEDING UP KERNEL SCHEDULER BY REDUCING CACHE MISSES
Shuji Yamamura, Akira Hirai, Mitsuru Sato, Masao Yamamoto, Akira Naruse, and Kouichi Kumon, Fujitsu Labs
This was an interesting talk pertaining to the effects of cache coloring for task structures in the Linux kernel scheduler (Linux kernel 2.4.x). The speaker first presented some interesting benchmark numbers for the Linux scheduler, showing that as the number of processes on the task queue was increased, the performance decreased. The authors used some really nifty hardware to measure the number of bus transactions throughout their tests and were thus able to reasonably quantify the impact that cache misses had in the Linux scheduler.
Their experiments led them to implement a cache coloring scheme for task structures, which were previously aligned on 8KB boundaries and therefore were eventually being mapped to the same cache lines. This unfortunate placement of task structures in memory induced a significant number of cache misses as the number of tasks grew in the scheduler.
The implemented solution consisted of aligning task structures to more evenly distribute cached entries across the L2 cache. The result was inevitably fewer cache misses in the scheduler. Some negative effects were observed in certain situations; these were primarily due to more cache slots being used by task-structure data in the scheduler, thus forcing data previously cached there to be pushed out.
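The collision effect can be shown with a toy cache model. The line size, set count, and coloring offsets below are assumptions for illustration, not the authors' parameters:

```python
# Toy model of why same-aligned structures collide in a set-associative
# cache, and how a per-object "color" offset spreads them across sets.
CACHE_LINE = 64          # bytes per cache line (assumed)
NUM_SETS = 128           # number of sets in our toy L2 cache (assumed)

def cache_set(addr):
    """Map a byte address to its cache set index."""
    return (addr // CACHE_LINE) % NUM_SETS

# Without coloring: every 8KB-aligned structure maps to the same set,
# because 8192 bytes is an exact multiple of CACHE_LINE * NUM_SETS.
aligned = [task * 8192 for task in range(16)]
assert len({cache_set(a) for a in aligned}) == 1

# With coloring: stagger each structure by one extra cache line, so the
# sixteen structures land in sixteen different sets.
colored = [task * 8192 + (task % NUM_SETS) * CACHE_LINE for task in range(16)]
assert len({cache_set(a) for a in colored}) == 16
```

The trade-off the speakers observed shows up here too: coloring makes task data occupy many sets instead of one, which reduces scheduler misses but can evict other data that used to live in those sets.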
WORK-IN-PROGRESS REPORTS
Summarized by Brennan Reynolds
RESOURCE VIRTUALIZATION TECHNIQUES FOR WIDE-AREA OVERLAY NETWORKS
Kartik Gopalan, Stony Brook University
This work addressed the issue of provisioning a maximum number of virtual overlay networks (VONs) with diverse quality-of-service (QoS) requirements on a single physical network. Each of the VONs is logically isolated from the others to ensure the QoS requirements. Gopalan mentioned several critical research issues with this problem that are currently being investigated. Dealing with how to provision the network at various levels (link, route, or path) and then enforce the provisioning at run-time is one of the toughest challenges. Currently, Gopalan has developed several algorithms to handle admission control, end-to-end QoS, route selection, scheduling, and fault tolerance in the network.
For more information, visit http://www.ecsl.cs.sunysb.edu/.
VISUALIZING SOFTWARE INSTABILITY
Jennifer Bevan, University of California, Santa Cruz
Detection of instability in software has typically been an afterthought. The point in the development cycle when the software is reviewed for instability is usually after it is difficult and costly to go back and perform major modifications to the code base. Bevan has developed a technique to allow the visualization of unstable regions of code that can be used much earlier in the development cycle. Her technique creates a time series of dependency graphs that include clusters and lines called fault lines. From the graphs, a developer is able to easily determine where the unstable sections of code are and proactively restructure them. She is currently working on a prototype implementation.
For more information, visit http://www.cse.ucsc.edu/~jbevan/evo_viz/.
RELIABLE AND SCALABLE PEER-TO-PEER WEB DOCUMENT SHARING
Li Xiao, College of William and Mary
The idea presented by Xiao would allow end users to share the content of their Web browser caches with neighboring Internet users. The rationale for this is that today's end users are increasingly connected to the Internet over high-speed links, and the browser caches are becoming large storage systems. Therefore, if an individual accesses a page which does not exist in their local cache, Xiao is suggesting that they first query other end users for the content before trying to access the machine hosting the original. This strategy does have some serious problems associated with it that still need to be addressed, including ensuring the integrity of the content and protecting the identity and privacy of the end users.
SEREL: FAST BOOTING FOR UNIX
Leni Mayo, Fast Boot Software
Serel is a tool that generates a visual representation of a UNIX machine's boot-up sequence. It can be used to identify the critical path and can show if a particular service or process blocks for an extended period of time. This information could be used to determine where the largest performance gains could be realized by tuning the order of execution at boot-up. Serel creates a dependency graph, expressed in XML, during boot-up. This graph is then used to create the visual representation. Currently the tool only works on POSIX-compliant systems, but Mayo stated that he would be porting it to other platforms. Other extensions that were mentioned included having the metadata include the use of shared libraries and monitoring the suspend/resume sequence of portable machines.
For more information, visit http://www.fastboot.org/.
BERKELEY DB XML
John Merrells, Sleepycat Software
Merrells gave a quick introduction and overview of the new XML library for Berkeley DB. The library specializes in storage and retrieval of XML content through a tool called XPath. The library allows for multiple containers per document and stores everything natively as XML. The user is also given a wide range of elements to create indices with, including edges, elements, text strings, or presence. The XPath tool consists of a query parser, generator, optimizer, and execution engine. To conclude his presentation, Merrells gave a live demonstration of the software.
For more information, visit http://www.sleepycat.com/xml/.
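Berkeley DB XML's own API is not shown in the talk summary; as a stand-in, this sketch runs a simple XPath-style query with Python's standard-library ElementTree, just to illustrate the kind of retrieval such a query engine performs. The document and element names are invented:

```python
# Minimal XPath-style retrieval over an XML "container" of documents,
# using Python's stdlib ElementTree (not the Berkeley DB XML API).
import xml.etree.ElementTree as ET

container = ET.fromstring(
    "<container>"
    "<doc id='1'><title>Rx protocol</title></doc>"
    "<doc id='2'><title>dirhash</title></doc>"
    "</container>"
)

# Attribute predicate: find the document whose id is 2, then read its title.
titles = [d.findtext("title") for d in container.findall(".//doc[@id='2']")]
assert titles == ["dirhash"]
```

A native XML store adds indices (on elements, text, presence, and so on) so that queries like this avoid walking the whole document tree.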
CLUSTER-ON-DEMAND (COD)
Justin Moore, Duke University
Modern clusters are growing at a rapid rate. Many have pushed beyond the 5000-machine mark, and deploying them results in large expenses as well as management and provisioning issues. Furthermore, if the cluster is "rented" out to various users, it is very time-consuming to configure it to a user's specs, regardless of how long they need to use it. The COD work presented creates dynamic virtual clusters within a given physical cluster. The goal was to have a provisioning tool that would automatically select a chunk of available nodes and install the operating system and middleware specified by the customer in a short period of time. This would allow a greater use of resources, since multiple virtual clusters can exist at once. By using a virtual cluster, the size can be changed dynamically. Moore stated that they have created a working prototype and are currently testing and benchmarking its performance.
For more information, visit http://www.cs.duke.edu/~justin/cod/.
CATACOMB
Elias Sinderson, University of California, Santa Cruz
Catacomb is a project to develop a database-backed DASL module for the Apache Web server. It was designed as a replacement for the WebDAV module. The initial release only contains support for a MySQL database but could be extended to others. Sinderson briefly touched on the performance of the module: it was comparable to the mod_dav Apache module for all query types but search. The presentation was concluded with remarks about adding support for the lock method and including ACL specifications in the future.
For more information, visit http://ocean.cse.ucsc.edu/catacomb/.
SELF-ORGANIZING STORAGE
Dan Ellard, Harvard University
Ellard's presentation introduced a storage system that tunes itself based on the workload of the system, without requiring the intervention of the user. The intelligence is implemented as a virtual self-organizing disk that resides below any file system. The virtual disk observes the access patterns exhibited by the system and then attempts to predict what information will be accessed next. An experiment was done using an NFS trace at a large ISP on one of their email servers. Ellard's self-organizing storage system worked well, which he attributes to the fact that most of the files being requested were large email boxes. Areas of future work include exploration of the length and detail of the data-collection stage, as well as the CPU impact of running the virtual disk layer.
VISUAL DEBUGGING
John Costigan, Virginia Tech
Costigan feels that the state of current debugging facilities in the UNIX world is not as good as it should be. He proposes the addition of several elements to programs like ddd and gdb. The first addition is including separate heap and stack components. Another is to display only certain fields of a data structure, with the ability to zoom in and out if needed. Finally, the debugger should provide the ability to visually present complex data structures, including linked lists, trees, etc. He said that a beta version of a debugger with these abilities is currently available.
For more information, visit http://infovis.cs.vt.edu/datastruct/.
IMPROVING APPLICATION PERFORMANCE THROUGH SYSTEM-CALL COMPOSITION
Amit Purohit, Stony Brook University
Web servers perform a huge number of context switches and internal data copies during normal operation. These two elements can drastically limit the performance of an application, regardless of the hardware platform it is run on. The Compound System Call (CoSy) framework is an attempt to reduce the performance penalty for context switches and data copies. It includes a user-level library and several kernel facilities that can be used via system calls. The library provides the programmer with a complete set of memory-management functions. Performance tests using the CoSy framework showed a large saving for context switches and data copies. The only area where the savings between conventional libraries and CoSy were negligible was for very large file copies.
For more information, visit http://www.cs.sunysb.edu/~purohit/.
ELASTIC QUOTAS
John Oscar, Columbia University
Oscar began by stating that most resources are flexible, but that to date disks have not been. While approximately 80 percent of files are short-lived, disk quotas are hard limits imposed by administrators. The idea behind elastic quotas is to have a non-permanent storage area that each user can temporarily use. Oscar suggested the creation of an /ehome directory structure to be used in deploying elastic quotas. Currently he has implemented elastic quotas as a stackable file system.

There were also several trends apparent in the data. Many users would ssh to their remote host (good) but then launch an email reader on the local machine, which would connect via POP to the same host and send the password in clear text (bad). This situation can easily be remedied by tunneling protocols like POP through SSH, but it appeared that many people were not aware this could be done. While most of his comments on the use of the wireless network were negative, the list of passwords he had collected showed that people were indeed using strong passwords. His recommendations were to educate and encourage people to use protocols like IPSec, SSH, and SSL when conducting work over a wireless network, because you never know who else is listening.
SYSTRACE
Niels Provos, University of Michigan
How can one be sure the applications one is using actually do exactly what their developers said they do? Short answer: you can't. People today are using a large number of complex applications, which means it is impossible to check each application thoroughly for security vulnerabilities. There are tools out there that can help, though. Provos has developed a tool called systrace that allows a user to generate a policy of acceptable system calls a particular application can make. If the application attempts to make a call that is not defined in the policy, the user is notified and allowed to choose an action. Systrace includes an auto-policy generation mechanism, which uses a training phase to record the actions of all programs the user executes. If the user chooses not to be bothered by applications breaking policy, systrace allows default enforcement actions to be set. Currently systrace is implemented for FreeBSD and NetBSD, with a Linux version coming out shortly.
For more information, visit http://www.citi.umich.edu/u/provos/systrace/.
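The policy-lookup behavior described above can be sketched in a few lines. This toy is not systrace itself; the syscall names and actions are illustrative, and real systrace policies also match on syscall arguments:

```python
# Toy sketch of policy-based syscall filtering: a policy maps syscall names
# to "permit"/"deny", and calls not covered by the policy fall back to a
# default action (ask the user, or enforce a preset answer).
policy = {"open": "permit", "read": "permit", "connect": "deny"}

def check(syscall, default="ask"):
    """Return the action for a syscall: the policy entry, else the default."""
    return policy.get(syscall, default)

assert check("read") == "permit"
assert check("connect") == "deny"
assert check("unlink") == "ask"                   # not in policy: ask the user
assert check("unlink", default="deny") == "deny"  # or enforce a default action
```

The training phase described in the talk amounts to running the program, recording each syscall it makes, and adding "permit" entries to the policy table.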
VERIFIABLE SECRET REDISTRIBUTION FOR SURVIVABLE STORAGE SYSTEMS
Ted Wong, Carnegie Mellon University
Wong presented a protocol that can be used to re-create a file distributed over a set of servers, even if one of the servers is damaged. In his scheme, the user must choose how many shares the file is split into and the number of servers it will be stored across. The goal of this work is to provide persistent, secure storage of information even if it comes under attack. Wong stated that his design of the protocol is complete, and he is currently building a prototype implementation of it.
For more information, visit http://www.cs.cmu.edu/~tmwong/research/.
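As background for the share-splitting idea, here is a minimal Shamir-style k-of-n secret-sharing sketch over a prime field. This is not Wong's verifiable redistribution protocol, and the parameters are illustrative; it only shows how "any k of n shares reconstruct the file" can work:

```python
# Minimal Shamir (k-of-n) secret sharing over a prime field: the secret is
# the constant term of a random degree-(k-1) polynomial, shares are points
# on it, and any k points recover the constant term by interpolation.
import random

P = 2**61 - 1  # a Mersenne prime, large enough for small integer secrets

def split(secret, k, n):
    """Create n shares, any k of which reconstruct the secret."""
    coeffs = [secret] + [random.randrange(P) for _ in range(k - 1)]
    def poly(x):
        return sum(c * pow(x, i, P) for i, c in enumerate(coeffs)) % P
    return [(x, poly(x)) for x in range(1, n + 1)]

def combine(shares):
    """Lagrange interpolation at x=0 recovers the secret."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num = den = 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        # pow(den, P-2, P) is the modular inverse of den (Fermat's little
        # theorem), so each term is yi * l_i(0).
        secret = (secret + yi * num * pow(den, P - 2, P)) % P
    return secret

shares = split(42, k=3, n=5)
assert combine(shares[:3]) == 42   # any 3 of the 5 shares suffice
assert combine(shares[1:4]) == 42
```

Wong's contribution, per the talk, is redistributing such shares to a new server set verifiably, so a damaged or compromised server can be replaced without reassembling the secret in one place.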
THE GURU IS IN SESSIONS
LINUX ON LAPTOP/PDA
Bdale Garbee, HP Linux Systems Operation
Summarized by Brennan Reynolds
This was a roundtable-like discussion with Garbee acting as moderator. He is currently in charge of the Debian distribution of Linux and is involved in porting Linux to platforms other than i386. He has successfully helped port the kernel to the Alpha, SPARC, and ARM, and is currently working on a version to run on VAX machines.
A discussion was held on which file system is best for battery life. The Reiser and Ext3 file systems were discussed; people commented that when using Ext3 on their laptops the disk would never spin down and thus was always consuming a large amount of power. One suggestion was to use a RAM-based file system for any volume that requires a large number of writes, and to only have the file system written to disk at infrequent intervals or when the machine is shut down or suspended.
A question was directed to Garbee about his knowledge of how the HP-Compaq merger would affect Linux. Garbee thought that the merger was good for Linux, both within the company and for the open source community as a whole. He stated that the new company would be the number one shipper of systems with Linux as the base operating system. He also fielded a question about the support of older HP servers and their ability to run Linux, saying that indeed Linux has been ported to them and he personally had several machines in his basement running it.
A question which sparked a large number of responses concerned problems people had with running Linux on mobile platforms. Most people actually did not have many problems at all. There were a few cases of a particular machine model not working, but there were only two widespread issues: docking stations and the new ACPI power management scheme. Docking stations still appear to be somewhat of a headache, but most people had developed ad hoc solutions for getting them to work. The ACPI power management scheme developed by Intel does not appear to have a quick solution. Ted Ts'o, one of the head kernel developers, stated that there are fundamental architectural problems with the 2.4 series kernel that do not easily allow ACPI to be added. However, he also stated that ACPI is supported in the newest development kernel, 2.5, and will be included in 2.6.
The final topic was the possibility of purchasing a laptop without paying any charge/tax for Microsoft Windows. The consensus was that currently this is not possible. Even if the machine comes with Linux or without any operating system, the vendors have contracts in place which require them to pay Microsoft for each machine they ship, if that machine is capable of running a Microsoft operating system. The only way this will change is if the demand for other operating systems increases to the point that vendors renegotiate their contracts with Microsoft, and no one saw this happening in the near future.
LARGE CLUSTERS
Andrew Hume, AT&T Labs Research
Summarized by Teri Lampoudi
This was a session on clusters, big data, and resilient computing. Given that the audience was primarily interested in clusters, and not necessarily big data or beating nodes to a pulp, the conversation revolved mainly around what it takes to make a heterogeneous cluster resilient. Hume also presented a FREENIX paper on his current cluster project, which explained in more detail much of what was abstractly claimed in the guru session.
Hume's use of the word "cluster" referred not to what I had assumed would be a Beowulf-type system, but to a loosely coupled farm of machines in which concurrency was much less of an issue than it would be in a Beowulf. In fact, the architecture Hume described was designed to deal with large amounts of independent transaction processing, essentially the process of billing calls, which requires no interprocess message passing or anything of the sort a parallel scientific application might.
Where does the large data come in? Since the transactions in question consist of large numbers of flat files, a mechanism for getting the files onto nodes and off-loading the results is necessary. In this particular cluster, named Ningaui, this task is handled by the "replication manager," which generated a large amount of interest in the audience. All management functions in the cluster, as well as job allocation, constitute services that are to be bid for and leased out to nodes. Furthermore, failures are handled by simply repeating the failed task until it is completed properly, if that is possible; if it is not, detailed logs are examined to discover what went wrong. Logging and testing for the successful completion of each task are what make this model resilient. Every management operation is logged at some appropriate level of detail, generating 120MB/day, and every single file transfer is MD5 checksummed. This was characterized by Hume as a "patient" style of management, where no one controls the cluster, scheduling behavior is emergent rather than stringently planned, and nodes are free to drop in and out of the cluster. A downside to this model is the increase in latency, offset by the fact that the cluster continues to function unattended outside of 8-to-5 support hours.
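The "checksum everything" practice amounts to computing a digest before a transfer and verifying it on the receiving side. Ningaui's replication manager is not public, so this is only a hedged sketch of the idea, with invented data:

```python
# Sketch of checksummed file transfer: record an MD5 digest in the transfer
# log before sending, recompute it on arrival, and re-transfer on mismatch.
import hashlib

def md5_of(data, chunk=1 << 16):
    """Hash a byte string incrementally, as one would a large file."""
    h = hashlib.md5()
    for i in range(0, len(data), chunk):
        h.update(data[i:i + chunk])
    return h.hexdigest()

payload = b"call-detail record, one per line\n" * 1000  # invented sample data
sent_digest = md5_of(payload)       # written to the management log at send time
received = payload                  # ...what arrives after the network hop
assert md5_of(received) == sent_digest   # a mismatch would trigger a re-send
```

Pairing the digest check with "repeat the task until its log shows success" is what lets the cluster tolerate flaky nodes without central control.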
In response to questions about the projected scalability of such a scheme, Hume said the system would presumably scale from the present 60 nodes to about 100 without modifications, but that scaling higher would be a matter for further design. The guru session ended on a somewhat unrelated note regarding the problems of getting data off mainframes and onto Linux machines: issues of fixed- vs. variable-length encoding of data blocks resulting in useful information being stripped by FTP, and problems in converting COBOL copybooks to something useful on a C-based architecture. To reiterate Hume's claim, there are homemade solutions for some of these problems, and the reader should feel free to contact Mr. Hume for them.
WRITING PORTABLE APPLICATIONS
Nick Stoughton, MSB Consultants
Summarized by Josh Lothian
Nick Stoughton addressed a concern that developers are facing more and more: writing portable applications. He proposed that there is no such thing as a "portable application," only those that have been ported. Developers are still in search of the Holy Grail of applications programming: a write-once, compile-anywhere program. Languages such as Java are leading up to this, but virtual machines still have to be ported, paths will still be different, etc. Web-based applications are another area that could be promising for portable applications.
Nick went on to talk about the future of portability. The recently released POSIX 2001 standards are expected to help the situation. The 2001 revision expands the original POSIX spec into seven volumes. Even though POSIX 2001 is more specific than the previous release, Nick pointed out that even this standard allows for small differences where vendors may define their own behaviors. This can only hurt developers in the long run. Along these lines, it was pointed out that there may be room for automated tools that have the capability to check application code and determine the level of portability and standards compliance of the source.
NETWORK MANAGEMENT SYSTEM PERFORMANCE TUNING
Jeff Allen, Tellme Networks, Inc.
Summarized by Matt Selsky
Jeff Allen, author of Cricket, discussed network tuning. The basic idea of tuning is: measure, twiddle, and measure again. Cricket can be used for large installations, but it's overkill for smaller installations, for which MRTG is better suited. You don't need Cricket to do measurement; doing something like a shell script that dumps data to Excel is good, but you need to get some sort of measurement. Cricket was not designed for billing; it was meant to help answer questions.
Useful tools and techniques covered included looking for the difference between machines in order to identify the causes for the differences; measuring, but also thinking about what you're measuring and why; and remembering to use strace and tcpdump to identify problems. Problems repeat themselves; performance tuning requires an intricate understanding of all the layers involved to solve the problem and simplify the automation. (Having a computer science background helps but is not essential.) If you can reproduce the problem, you then should investigate each piece to find the bottleneck. If you can't reproduce the problem, then measure. You need to understand all the system interactions. Measurement can help determine whether things are actually slow or if the user is imagining it.
Troubleshooting begins with a hunch, but scientific processes are essential. You should be able to determine the event stream, or when each event occurs; having observation logs can help. Some hints provided include checking cron for periodic anomalies, slowing down /bin/rm to avoid I/O overload, and looking for unusual kernel CPU usage.
Also, try to use lightweight monitoring to reduce overhead. You don't need to monitor every resource on every system, but you should monitor those resources on those systems that are essential. Don't check things too often, since you can introduce overhead that way. Lots of information can be gathered from the /proc file system interface to the kernel.
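As an example of lightweight monitoring via /proc, the sketch below parses the aggregate CPU line of /proc/stat. A captured sample line (invented numbers) is used so the example is self-contained; on a Linux box you would read `open("/proc/stat")` instead:

```python
# Lightweight monitoring sketch: parse the "cpu" line of /proc/stat.
# The first seven counters (Linux 2.6-era layout) are, in order:
# user, nice, system, idle, iowait, irq, softirq (all in jiffies).
SAMPLE = "cpu  4705 150 1120 16250856 2290 127 456"  # invented sample line

def cpu_fields(stat_line):
    """Return the /proc/stat cpu counters as a dict of jiffy counts."""
    names = ["user", "nice", "system", "idle", "iowait", "irq", "softirq"]
    values = [int(v) for v in stat_line.split()[1:]]
    return dict(zip(names, values))

fields = cpu_fields(SAMPLE)
busy = sum(v for k, v in fields.items() if k != "idle")
assert fields["idle"] == 16250856
assert busy == 4705 + 150 + 1120 + 2290 + 127 + 456
```

Sampling such counters every minute or so, and computing deltas between samples, gives utilization figures at negligible cost, which is exactly the trade-off the session recommended.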
BSD BOF
Host: Kirk McKusick, Author and Consultant
Summarized by Bosko Milekic
The BSD Birds of a Feather session at USENIX started off with five quick and informative talks and ended with an exciting discussion of licensing issues.
Christos Zoulas, for the NetBSD Project, went over the primary goals of the NetBSD Project (namely portability, a clean architecture, and security) and then proceeded to briefly discuss some of the challenges that the project has encountered. The successes and areas for improvement of 1.6 were then examined. NetBSD has recently seen various improvements in its cross-build system, packaging system, and third-party tool support. For what concerns the kernel, a zero-copy TCP/UDP implementation has been integrated, and a new pmap code for arm32 has been introduced, as have some other rather major changes to various subsystems. More information is available in Christos's slides, which are now available at http://www.netbsd.org/gallery/events/usenix2002/.
Theo de Raadt of the OpenBSD Project presented a rundown of various new things that the OpenBSD and OpenSSH teams have been looking at. Theo's talk included an amusing description (and pictures) of some of the events that transpired during OpenBSD's hack-a-thon, which occurred the week before USENIX '02. Finally, Theo mentioned that, following complications with the IPFilter license, the OpenBSD team had done a fairly extensive license audit.
Robert Watson of the FreeBSD Project brought up various examples of FreeBSD being used in the real world. He went on to describe a wide variety of changes that will surface with the upcoming FreeBSD release 5.0. Notably, 5.0 will introduce a re-architecting of the kernel aimed at providing much more scalable support for SMP machines. 5.0 will also include an early implementation of KSE, FreeBSD's version of scheduler activations, large improvements to pccard, and significant framework bits from the TrustedBSD project.
Mike Karels from Wind River Systems presented an overview of how BSD/OS has been evolving following the Wind River Systems acquisition of BSDi's software assets. Ernie Prabhakar from Apple discussed Apple's OS X and its success on the desktop. He explained how OS X aims to bring BSD to the desktop while striving to maintain a strong and stable core, one that is heavily based on BSD technology.
The BoF finished with a discussion on licensing; specifically, some folks questioned the impact of Apple's licensing for code from their core OS (the Darwin Project) on the BSD community as a whole.
CLOSING SESSION
HOW FLIES FLY
Michael H. Dickinson, University of California, Berkeley
Summarized by JD Welch
In this lively and offbeat talk, Dickinson discussed his research into the mechanisms of flight in the fruit fly (Drosophila melanogaster) and related the autonomic behavior to that of a technological system. Insects are the most successful organisms on earth, due in no small part to their early adoption of flight. Flies travel through space on straight trajectories interrupted by saccades, jumpy turning motions somewhat analogous to the fast, jumpy movements of the human eye. Using a variety of monitoring techniques, including high-speed video in a "virtual reality flight arena," Dickinson and his colleagues have observed that flies respond to visual cues to decide when to saccade during flight. For example, more visual texture makes flies saccade earlier (i.e., further away from the arena wall), described by Dickinson as a "collision-control algorithm."
Through a combination of sensory inputs, flies can make decisions about their flight path. Flies possess a visual "expansion detector," which, at a certain threshold, causes the animal to turn in a certain direction. However, expansion in front of the fly sometimes causes it to land. How does the fly decide? Using the virtual reality device to replicate various visual scenarios, Dickinson observed that the flies fixate on vertical objects that move back and forth across the field. Expansion on the left side of the animal causes it to turn right, and vice versa, while expansion directly in front of the animal triggers the legs to flare out in preparation for landing.
If the eyes detect these changes, how are the responses implemented? The flies' wings have "power muscles," controlled by mechanical resonance (as opposed to the nervous system), which drive the wings, combined with neurally activated "steering muscles," which change the configuration of the wing joints. Subtle variations in the timing of impulses correspond to changes in wing movement, controlled by sensors in the "wing-pit." A small wing-like structure, the haltere, controls equilibrium by beating constantly.
Voluntary control is accomplished by the halteres, whose signals can interfere with the autonomic control of the stroke cycle. The halteres have "steering muscles" as well, and information derived from the visual system can turn off the haltere or trick it into a "virtual" problem requiring a response.
Dickinson has also studied the aerodynamics of insect flight using a device called the "Robo Fly," an oversized mechanical insect wing suspended in a large tank of mineral oil. Interestingly, Dickinson observed that the average lift generated by the flies' wings is under its body weight; the flies use three mechanisms to overcome this, including rotating the wings (rotational lift).
Insects are extraordinarily robust creatures, and because Dickinson analyzed the problem in a systems-oriented way, these observations and analysis are immediately applicable to technology: the response system can be used as an efficient search algorithm for control systems in autonomous vehicles, for example.
THE AFS WORKSHOP
Summarized by Garry Zacheiss, MIT
Love Hornquist-Astrand of the Arla development team presented the Arla status report. Arla 0.35.8 has been released. Scheduled for release soon are improved support for Tru64 UNIX, MacOS X, and FreeBSD; improved volume handling; and implementation of more of the vos/pts subcommands. It was stressed that MacOS X is considered an important platform, and that a GUI configuration manager for the cache manager, a GUI ACL manager for the Finder, and a graphical login that obtains Kerberos tickets and AFS tokens at login time were all under development.
Future goals planned for the underlying AFS protocols include GSSAPI/SPNEGO support for Rx, performance improvements to Rx, an enhanced disconnected mode, and IPv6 support for Rx; an experimental patch is already available for the latter. Future Arla-specific goals include improved performance, partial file reading, and increased stability on several platforms. Work is also in progress on the RXGSS protocol extensions (integrating GSSAPI into Rx); a partial implementation exists, and work continues as developers find time.
Derrick Brashear of the OpenAFS development team presented the OpenAFS status report. OpenAFS 1.2.5 was released immediately prior to the conference; it fixed a remotely exploitable denial-of-service vulnerability on several OpenAFS platforms, most notably IRIX and AIX. Future work planned for OpenAFS includes better support for MacOS X, including working around Finder interaction issues. Better support for the BSDs is also planned: FreeBSD has a partially implemented client, while NetBSD and OpenBSD have only server programs available right now. AIX 5, Tru64 5.1A, and MacOS X 10.2 (a.k.a. Jaguar) are all planned for the future. Other planned enhancements include nested groups in the ptserver (code donated by the University of Michigan awaits integration), disconnected AFS, and further work on a native Windows client. Derrick stated that the guiding principles of OpenAFS were to maintain compatibility with IBM AFS, support new platforms, ease the administrative burden, and add new functionality.
AFS performance was discussed. The openafs.org cell consists of two file servers, one in Stockholm, Sweden, and one in Pittsburgh, PA; AFS works reasonably well over trans-Atlantic links, although OpenAFS clients don't determine which file server to talk to very effectively. Arla clients use RTTs to the server to determine the optimal file server from which to fetch replicated data. Modifications to OpenAFS to support this behavior in the future are desired.
Jimmy Engelbrecht and Harald Barth of KTH discussed their AFSCrawler script, written to determine how many AFS cells and clients there are in the world, which implementations/versions they run (Arla vs. IBM AFS vs. OpenAFS), and how much data is stored in AFS. The script unfortunately triggered a bug in IBM AFS 3.6-derived code, causing some clients to panic while handling a specific RPC. This has since been fixed in OpenAFS 1.2.5 and the most recent IBM AFS patch level, and all AFS users are strongly encouraged to upgrade; no release of Arla is vulnerable to this particular denial-of-service attack. There was an extended discussion of the usefulness of this exploration; many sites believed it was useful information and that such scanning should continue in the future, but only on an opt-in basis.
Many sites face the problem of managing Kerberos/AFS credentials for batch-scheduled jobs. Specifically, most batch processing software needs to be modified to forward tickets as part of the batch submission process, renew tickets and tokens while the job is in the queue and for the lifetime of the job, and properly destroy credentials when the job completes. Ken Hornstein of NRL was able to pay a commercial vendor to support Kerberos 4/5 credential management in their product, although they did not implement AFS token management. MIT has implemented some of the desired functionality in OpenPBS and might be able to make it available to other interested sites.
Tools to simplify AFS administration were discussed, including:
AFS Balancer: A tool written by CMU to automate the process of balancing disk usage across all servers in a cell. Available from ftp://ftp.andrew.cmu.edu/pub/AFS-Tools/balance-1.1b.tar.gz.
Themis: KTH's enhanced version of the AFS tool "package" for updating files on local disk from a central AFS image. KTH's enhancements include allowing the deletion of files, simplifying the process of adding a file, and allowing the merging of multiple rule sets for determining which files are updated. Themis is available from the Arla CVS repository.
Stanford was presented as an example of a large AFS site. Stanford's AFS usage consists of approximately 14TB of data in AFS, in the form of approximately 100,000 volumes; 33TB of storage is available in their primary cell, ir.stanford.edu. Their file servers consist entirely of Solaris machines, running a combination of Transarc 3.6 patch-level 3 and OpenAFS 1.2.x, while their database servers run OpenAFS 1.2.2. Their cell consists of 25 file servers, using a combination of EMC and Sun StorEdge hardware. Stanford continues to use the kaserver for their authentication infrastructure, with future plans to migrate entirely to an MIT Kerberos 5 KDC.
Stanford has approximately 3,400 clients on their campus, not including SLAC (Stanford Linear Accelerator); approximately 2,100 AFS clients from outside Stanford contact their cell every month. Their supported clients are almost entirely IBM AFS 3.6 clients, although they plan to release OpenAFS clients soon. Stanford currently supports only UNIX clients; there is some on-campus presence of Windows clients, but Stanford has never publicly released or supported them. They do intend to release and support the MacOS X client in the near future.
All Stanford students, faculty, and staff are assigned AFS home directories, with a default quota of 50MB, for a total of approximately 550GB of user home directories. Other significant uses of AFS storage are data volumes for workstation software (400GB) and volumes for course Web sites and assignments (100GB).
AFS usage at Intel was also presented. Intel has been an AFS site since 1994. They had bad experiences with the IBM 3.5 Linux client; their experience with OpenAFS on Linux 2.4.x kernels has been much better. They use and are satisfied with the OpenAFS IA64 Linux port. Intel has hundreds of OpenAFS 1.2.3 and 1.2.4 clients in many production cells, accessing data stored on IBM AFS file servers, and they have not encountered any interoperability issues. Intel has some concerns about OpenAFS: they would like to purchase commercial support for OpenAFS, and to see OpenAFS support for HP-UX on both PA-RISC and Itanium hardware. HP-UX support is currently unavailable because a specific HP-UX header file is unavailable from HP, though this may be resolved soon. Intel has not yet committed to migrating their file servers to OpenAFS and is unsure whether they will do so without commercial support.
Backups are a traditional topic of discussion at AFS workshops, and this time was no exception. Many users complain that the traditional AFS backup tools ("backup" and "butc") are complex and difficult to automate, requiring many home-grown scripts and much user intervention for error recovery. An additional complaint was that the traditional AFS tools do not support file- or directory-level backups and restores; data must be backed up and restored at the volume level.
Mitch Collinsworth of Cornell presented work to make AMANDA, the free backup software from the University of Maryland, suitable for AFS backups. Using AMANDA for AFS backups allows one to share AFS backup tapes with non-AFS backups, easily run multiple backups in parallel, automate error recovery, and provide a robust degraded mode that prevents tape errors from stopping backups altogether. Their implementation allows for full volume restores as well as individual directory and file restores. They have finished coding this work and are in the process of testing and documenting it.
Peter Honeyman of CITI at the University of Michigan spoke about work he has proposed to replace Rx with RPCSEC GSS in OpenAFS; this would allow AFS to use a TCP-based transport mechanism rather than the UDP-based Rx, and possibly gain better congestion control, dynamic adaptation, and fragmentation avoidance as a result. RPCSEC GSS uses the GSSAPI to authenticate SUN ONC RPC. RPCSEC GSS is transport-agnostic, provides strong security, is a developing Internet standard, and has multiple open source implementations. Backward compatibility with existing AFS servers and clients is an important goal of this project.
IMPROVING WAIT-FREE ALGORITHMS FOR
INTERPROCESS COMMUNICATION IN
EMBEDDED REAL-TIME SYSTEMS
Hai Huang, Padmanabhan Pillai, Kang G. Shin, University of Michigan
The main characteristic of a real-time, time-sensitive system is its predictable response time, but concurrency management creates hurdles to achieving this goal because of the use of locks to maintain consistency. To solve this problem, many wait-free algorithms have been developed, but these are typically high-cost solutions. By taking advantage of the temporal characteristics of the system, however, the time and space overhead can be reduced.
The speaker presented an algorithm for temporal concurrency control and applied this technique to improve three wait-free algorithms. A single-writer multiple-reader wait-free algorithm and a double-buffer algorithm were proposed.
Using their transformation mechanism, they achieved an improvement of 17-66% in ACET and a 14-70% reduction in memory requirements for IPC algorithms. This mechanism is extensible and can be applied to other non-blocking IPC algorithms as well. Future work involves reducing the synchronizing overheads in more general IPC algorithms with multiple-writer semantics.
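The paper's own transformations are not reproduced in the summary, but the flavor of a single-writer, multiple-reader double buffer can be sketched as follows (a minimal illustration with hypothetical names, not the authors' algorithm; a real embedded implementation would rely on an atomic index store and memory barriers rather than Python's interpreter lock):

```python
class DoubleBuffer:
    """Single-writer, multiple-reader exchange without locks: the writer
    fills the inactive buffer, then publishes it by flipping an index;
    readers always dereference the currently active index."""

    def __init__(self, initial):
        self._bufs = [initial, None]
        self._active = 0  # index of the buffer readers should use

    def read(self):
        # Readers never block: one snapshot of the active index, one read.
        return self._bufs[self._active]

    def write(self, value):
        # The single writer works on the inactive buffer, so readers are
        # never exposed to a half-written value, then flips the index
        # (a single atomic store in a real system).
        inactive = 1 - self._active
        self._bufs[inactive] = value
        self._active = inactive
```

Readers that started before a flip simply see the older, still-consistent buffer, which is what makes such schemes attractive for predictable real-time response.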
MOBILITY
Summarized by Praveen Yalagandula
ROBUST POSITIONING ALGORITHMS FOR
DISTRIBUTED AD-HOC WIRELESS SENSOR
NETWORKS
Chris Savarese, Jan Rabaey, University of California, Berkeley; Koen Langendoen, Delft University of Technology
The Pico Radio Network comprises more than a hundred sensors, monitors, and actuators equipped with wireless connectivity, with the following properties: (1) no infrastructure; (2) computation in a distributed fashion instead of centralized computation; (3) dynamic topology; (4) limited radio range; (5) nodes that can act as repeaters; (6) devices of low cost and with low power; and (7) sparse anchor nodes, i.e., nodes with GPS capability. A positioning algorithm determines the geographical position of each node in the network.
In two-dimensional space, each node needs three reference positions to estimate its geographical position. There are two problems that make positioning difficult in the Pico Radio Network setting: (1) a sparse anchor node problem and (2) a range error problem.
The two-phase approach that the authors have taken to solving the problem consists of (1) a hop-terrain algorithm: in this first phase, each node roughly guesses its location from distances calculated using multi-hop counts to the anchor points; and (2) a refinement algorithm: in this second phase, each node uses its neighbors' positions to refine its own position estimate. To guarantee convergence, this approach uses confidence metrics, assigning a value of 1.0 to anchor nodes and 0.1 to other nodes to start, and increasing these with each iteration.
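The summary gives no code for the two phases, but the geometry behind a position estimate from three references can be sketched (a simplified illustration, not the authors' implementation; it ignores the confidence weighting and range noise the paper addresses):

```python
def trilaterate(anchors, dists):
    """Estimate a 2-D position from three reference points and measured
    distances by linearizing the three circle equations: subtracting the
    first equation from the other two leaves a 2x2 linear system."""
    (x0, y0), (x1, y1), (x2, y2) = anchors
    d0, d1, d2 = dists
    a11, a12 = 2 * (x1 - x0), 2 * (y1 - y0)
    a21, a22 = 2 * (x2 - x0), 2 * (y2 - y0)
    b1 = d0 ** 2 - d1 ** 2 + x1 ** 2 - x0 ** 2 + y1 ** 2 - y0 ** 2
    b2 = d0 ** 2 - d2 ** 2 + x2 ** 2 - x0 ** 2 + y2 ** 2 - y0 ** 2
    det = a11 * a22 - a12 * a21  # zero if the references are collinear
    return ((b1 * a22 - b2 * a12) / det, (a11 * b2 - a21 * b1) / det)
```

In the refinement phase, each node would repeat such an estimate every iteration, using its neighbors' current position estimates as references and weighting them by confidence.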
A simulation tool, OMNeT++, was used for both phases and in various scenarios. The results show that they achieved position errors of less than 33% in a scenario with 5% anchor nodes, an average connectivity of 7, and 5% range measurement error.
Guidelines for anchor node deployment are high connectivity (>10), a reasonable fraction of anchor nodes (>5%), and, for anchor placement, covered edges.
APPLICATION-SPECIFIC NETWORK
MANAGEMENT FOR ENERGY-AWARE
STREAMING OF POPULAR MULTIMEDIA
FORMATS
Surendar Chandra, University of Georgia, and Amin Vahdat, Duke University
The main hindrance in supporting the increasing demand for mobile multimedia on PDAs is the battery capacity of these small devices. The idle-time power consumption is almost the same as the receive power consumption on typical wireless network cards (1319mW vs. 1425mW), while the sleep state consumes far less power (177mW). To let the system consume energy proportional to the stream quality, the network card should transition to the sleep state aggressively between packets.
The speaker presented previous work in which different multimedia streams were studied for PDAs: MS Media, Real Media, and QuickTime. It was found that if the inter-packet gap is predictable, then huge savings are possible; for example, about 80% savings in MS Media streams with few packet losses. The limitation of the IEEE 802.11b power-save mode comes into play when there are two or more nodes, producing a delay between beacon time and when the node receives/sends packets. This badly affects the streams, since higher energy is consumed while waiting.
The authors propose traffic shaping for energy conservation, performed by proxies so that packets arrive at regular intervals. This is achieved using a local proxy in the access point and a client-side proxy in the mobile host. Simulations show that traffic shaping reduces energy consumption; they also reveal that higher-bandwidth streams have a lower energy metric (mJ/kB).
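The power numbers quoted above make the potential savings easy to estimate. The sketch below plugs them into a toy energy model (the per-packet timings are invented for illustration; only the 1319/1425/177mW figures come from the talk):

```python
IDLE_MW, RECV_MW, SLEEP_MW = 1319, 1425, 177  # figures quoted in the talk

def stream_energy_mj(n_packets, recv_ms, gap_ms, sleep_between_packets):
    """Energy in mJ for receiving a stream: mW * seconds = mJ.
    Each packet costs receive power; each inter-packet gap is spent
    either idle (card awake) or in the sleep state."""
    recv_mj = n_packets * RECV_MW * recv_ms / 1000.0
    gap_power = SLEEP_MW if sleep_between_packets else IDLE_MW
    gap_mj = (n_packets - 1) * gap_power * gap_ms / 1000.0
    return recv_mj + gap_mj
```

With, say, 2ms of receive time and a 98ms gap per packet, sleeping in the gaps cuts energy use by roughly 85% in this toy model, consistent in spirit with the ~80% savings reported for predictable MS Media streams.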
More information is at http://greenhouse.cs.uga.edu.
CHARACTERIZING ALERT AND BROWSE SERVICES OF MOBILE CLIENTS
Atul Adya, Paramvir Bahl, and Lili Qiu, Microsoft Research
Even though there has been a dramatic increase in Internet access from wireless devices, few studies have characterized this traffic. In this paper, the authors characterize the traffic observed on the MSN Web site using both notification and browse traces.
Around 33 million browsing accesses and about 325 million notification entries are present in the traces. Three types of analyses are done for each of these two services: content analysis, concerning the most popular content categories and their distribution; popularity analysis, or the popularity distribution of documents; and user behavior analysis.
The analysis of the notification logs shows that document access rates follow a Zipf-like distribution, with most of the accesses concentrated on a small number of messages, and that accesses exhibit geographical locality: users from the same locality tend to receive similar notification content. The browser log analysis shows that a smaller set of URLs is accessed most of the time, though the access pattern does not fit any Zipf curve, and the highly accessed URLs remain stable. A correlation study between the notification and browsing services shows that wireless users have a moderate correlation of 0.12.
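A Zipf-like popularity law of the kind described here is easy to model: the r-th most popular document receives accesses proportional to 1/r^s. The sketch below (illustrative only, not the authors' analysis code) shows how top-heavy such a distribution is:

```python
def zipf_counts(n_docs, s=1.0, total_accesses=100000):
    """Expected access counts when popularity follows count(r) ~ 1/r^s."""
    weights = [1.0 / r ** s for r in range(1, n_docs + 1)]
    norm = sum(weights)
    return [total_accesses * w / norm for w in weights]

def top_fraction(counts, k):
    """Fraction of all accesses captured by the k most popular documents."""
    return sum(counts[:k]) / sum(counts)
```

With s = 1 and 1000 documents, the 10 most popular documents alone capture almost 40% of all accesses, the concentration on "a small number of messages" that the notification logs exhibit.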
FREENIX TRACK SESSIONS
BUILDING APPLICATIONS
Summarized by Matt Butner
INTERACTIVE 3-D GRAPHICS APPLICATIONS
FOR TCL
Oliver Kersting, Jürgen Döllner, Hasso Plattner Institute for Software Systems Engineering, University of Potsdam
The integration of 3-D image rendering functionality into a scripting language permits interactive and animated 3-D development and application without the formalities and precision demanded by low-level C/C++ graphics and visualization libraries. The large and complex C++ API of the Virtual Rendering System (VRS) can be combined with the conveniences of the Tcl scripting language. The mapping of class interfaces is done via an automated process that generates the respective wrapper classes, which ensures complete API accessibility and functionality without any significant performance penalty. The general C++-to-Tcl mapper SWIG grants the necessary object and hierarchical abilities without the use of object-oriented Tcl extensions. The final API mapping techniques address C++ features such as classes, overloaded methods, enumerations, and inheritance relations. All are implemented in a proof of concept that maps the complete C++ API of the VRS to Tcl and is showcased in a complete interactive 3-D map system.
THE AGFL GRAMMAR WORK LAB
Cornelis H.A. Coster, Erik Verbruggen, University of Nijmegen (KUN)
The growth and implementation of Natural Language Processing (NLP) is the cornerstone of the continued evolution and implementation of truly intelligent search machines and services. In part due to the growing collections of computer-stored, human-readable documents in the public domain, the implementation of linguistic analysis will become necessary for desirable precision and resolution. Consequently, such tools and linguistic resources must be released into the public domain, and the authors have done so with the release of the AGFL Grammar Work Lab, under the GNU Public License, as a tool for linguistic research and the development of NLP-based applications.
The AGFL (Affix Grammars over a Finite Lattice) Grammar Work Lab meshes context-free grammars with finite set-valued features, and is applicable to a range of languages. In computer science terms, "syntax rules are procedures with parameters and a non-deterministic execution." The English Phrases for Information Retrieval (EP4IR), released with the AGFL-GWL as a robust grammar of English, is an AGFL-GWL-generated English parser that outputs "Head/Modifier" frames. The sentences "CompanyX sponsored this conference" and "This conference was sponsored by CompanyX" both generate [CompanyX[sponsored conference]]. The hope is that public availability of such tools will encourage further development of grammar and lexical software systems.
SWILL: A SIMPLE EMBEDDED WEB SERVER LIBRARY
Sotiria Lampoudi, David M. Beazley, University of Chicago
This paper won the FREENIX Best Student Paper award.
SWILL (Simple Web Interface Link Library) is a simple Web server in the form of a C library whose development was motivated by a wish to give cool applications an interface to the Web. The SWILL library provides a simple interface that can be efficiently implemented for tasks that vary from flexible Web-based monitoring to software debugging and diagnostics. Though originally designed to be integrated with high-performance scientific simulation software, the interface is generic enough to allow for unbounded uses. SWILL is a single-threaded server relying upon non-blocking I/O through the creation of a temporary server which services I/O requests. SWILL does not provide SSL or cryptographic authentication, but it does have HTTP authentication abilities.
A fantastic feature of SWILL is its support for SPMD-style parallel applications which utilize MPI, proving valuable for Beowulf clusters and large parallel machines. Another practical application was the use of SWILL in a modified Yalnix emulator by University of Chicago operating systems courses, which utilized the added Yalnix functionality for OS development and debugging. SWILL requires minimal memory overhead and relies upon the HTTP/1.0 protocol.
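SWILL itself is a C library; purely as a concept demo, the same "embed a monitoring web server inside the application" idea can be sketched in Python with the standard library (hypothetical names, no relation to SWILL's actual API):

```python
import http.server
import json
import threading

class AppMonitor:
    """Runs a tiny HTTP server on a background thread so a running
    application can expose its internal state to a browser."""

    def __init__(self):
        self.stats = {"iterations": 0}  # state the application updates
        monitor = self

        class Handler(http.server.BaseHTTPRequestHandler):
            def do_GET(self):
                body = json.dumps(monitor.stats).encode()
                self.send_response(200)
                self.send_header("Content-Type", "application/json")
                self.send_header("Content-Length", str(len(body)))
                self.end_headers()
                self.wfile.write(body)

            def log_message(self, *args):
                pass  # keep the application's own output clean

        # Port 0 lets the OS pick a free port.
        self.server = http.server.HTTPServer(("127.0.0.1", 0), Handler)
        threading.Thread(target=self.server.serve_forever,
                         daemon=True).start()

    @property
    def url(self):
        return "http://127.0.0.1:%d/" % self.server.server_address[1]
```

SWILL achieves the same effect in a single thread with non-blocking I/O, which matters when the host application is an MPI simulation that cannot tolerate a free-running thread.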
NETWORK PERFORMANCE
Summarized by Florian Buchholz
LINUX NFS CLIENT WRITE PERFORMANCE
Chuck Lever, Network Appliance; Peter Honeyman, CITI, University of Michigan
Lever introduced a benchmark to measure NFS client write performance. Client performance is difficult to measure due to hindrances such as poor hardware or bandwidth limitations; furthermore, measuring application performance does not identify weaknesses specifically on the client side. Thus, a benchmark was developed that tries to exercise only data transfers in one direction, between server and application. For this purpose, the benchmark was based on the block sequential write portion of the Bonnie file system benchmark. Once a benchmark for NFS clients is established, it can be used to improve client performance.
The performance measurements were performed with an SMP Linux client against both a Linux NFS server and a Network Appliance F85 filer. During testing, a periodic jump in write latency was discovered. This was due to a rather large number of pending write operations that were scheduled to be written after certain threshold values were exceeded. By introducing a separate daemon that flushes the cached write requests, the spikes could be eliminated, but as a result the average latency grew over time. The problem was traced to a function that scans a linked list of write requests; after a hash table was added to improve lookup performance, the latency improved considerably.
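The fix described, replacing a linear scan of pending write requests with a hash-table lookup, can be illustrated as follows (hypothetical structure, not the Linux NFS client code):

```python
class PendingWrites:
    """Pending write requests keyed by (file, offset). find_scan models
    the original linked-list walk, O(n) per lookup; find_hash models the
    added hash table, O(1), which removed the latency growth."""

    def __init__(self):
        self._list = []    # original scheme: scan every entry
        self._index = {}   # added scheme: direct lookup by key

    def add(self, file_id, offset, data):
        req = (file_id, offset, data)
        self._list.append(req)
        self._index[(file_id, offset)] = req

    def find_scan(self, file_id, offset):
        for f, o, d in self._list:  # cost grows with cached requests
            if (f, o) == (file_id, offset):
                return d
        return None

    def find_hash(self, file_id, offset):
        req = self._index.get((file_id, offset))
        return req[2] if req else None
```

The list walk gets slower as the cache of pending writes grows, which is exactly the "average latency grows over time" symptom; the keyed lookup stays constant-time.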
The improved client was then used to measure throughput against the two servers. A discrepancy between the Linux server and the filer tests was noticed, and the reason was traced back to a global kernel lock that was unnecessarily held when accessing the network stack. After correcting this, performance improved further. However, the measurements showed that a client may run slower when paired with fast servers on fast networks. This is due to heavy client interrupt loads, more network processing on the client side, and more global kernel lock contention.
The source code of the project is available at http://www.citi.umich.edu/projects/nfs-perf/patches/.
A STUDY OF THE RELATIVE COSTS OF
NETWORK SECURITY PROTOCOLS
Stefan Miltchev and Sotiris Ioannidis, University of Pennsylvania; Angelos Keromytis, Columbia University
With the increasing need for security and integrity of remote network services, it becomes important to quantify the communication overhead of IPSec and compare it to alternatives such as SSH, SCP, and HTTPS.
For this purpose, the authors set up three testing networks: a direct link, two hosts separated by two gateways, and three hosts connecting through one gateway. Protocols were compared in each setup, and manual keying was used to eliminate connection setup costs. For IPSec, the encryption algorithms AES, DES, 3DES, hardware DES, and hardware 3DES were used. In detail, FTP was compared to SFTP, SCP, and FTP over IPSec; HTTP to HTTPS and HTTP over IPSec; and NFS to NFS over IPSec and local disk performance.
The result of the measurements was that IPSec outperforms the other popular encryption schemes. Overall, unencrypted communication was fastest, but in some cases, such as FTP, the overhead can be small. The use of crypto hardware can significantly improve performance. For future work, the inclusion of setup costs and of hardware-accelerated SSL, SFTP, and SSH was mentioned.
CONGESTION CONTROL IN LINUX TCP
Pasi Sarolahti, University of Helsinki; Alexey Kuznetsov, Institute for Nuclear Research at Moscow
Having attended the talk and read the paper, I am still unclear about whether the authors are merely describing the design decisions of TCP congestion control or whether they are actually the creators of that part of the Linux code; my guess leans toward the former.
In the talk, the speaker compared the TCP congestion control measures specified by the IETF in RFCs with the actual Linux implementation, which conforms to the basic principles but nevertheless has differences. A specific emphasis was placed on the retransmission mechanisms and the congestion window. Several TCP enhancements, the NewReno algorithm, Selective ACKs (SACK), and Forward ACKs (FACK), were also discussed and compared.
In some instances, Linux does not conform to the IETF specifications. Fast recovery does not fully follow RFC 2582, since the threshold for triggering retransmit is adjusted dynamically and the congestion window's size is not changed. The round-trip-time estimator and the RTO calculation also differ from RFC 2988, in that Linux uses more conservative RTT estimates and a minimum RTO of 200ms. The performance measurements showed that with the additional Linux-specific features enabled, slightly higher throughput, a steadier data flow, and fewer unnecessary retransmissions can be achieved.
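The RFC 2988 estimator the talk compared against is compact enough to sketch. Below, the update rules are the RFC's; the 200ms floor is the Linux divergence mentioned above (the RFC itself specifies a 1-second minimum):

```python
class RtoEstimator:
    """Round-trip-time estimator in the style of RFC 2988, with a
    configurable minimum RTO (0.2s here, mirroring the Linux choice)."""

    def __init__(self, min_rto=0.200):
        self.srtt = None
        self.rttvar = None
        self.min_rto = min_rto

    def sample(self, rtt):
        """Feed one RTT measurement (seconds); return the new RTO."""
        if self.srtt is None:
            # First measurement: SRTT = R, RTTVAR = R/2.
            self.srtt = rtt
            self.rttvar = rtt / 2.0
        else:
            # RTTVAR = 3/4 RTTVAR + 1/4 |SRTT - R|, then
            # SRTT = 7/8 SRTT + 1/8 R (in that order, per the RFC).
            self.rttvar = 0.75 * self.rttvar + 0.25 * abs(self.srtt - rtt)
            self.srtt = 0.875 * self.srtt + 0.125 * rtt
        return max(self.min_rto, self.srtt + 4.0 * self.rttvar)
```

On a LAN with sub-millisecond RTTs the raw formula would yield a tiny RTO; the floor is what guards against spurious retransmissions when the timer fires early.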
XTREME XCITEMENT
Summarized by Steve Bauer
THE FUTURE IS COMING: WHERE THE X WINDOW SYSTEM SHOULD GO
Jim Gettys, Compaq Computer Corp.
Jim Gettys, one of the principal authors of the X Window System, outlined the near-term objectives for the system, primarily focusing on the changes and infrastructure required to enable replication and migration of X applications. Providing better support for this functionality would enable users to retrieve or duplicate X applications between their servers at home and at work.
One interesting example of an application that is currently capable of migration and replication is Emacs. To create a new frame on DISPLAY, try "M-x make-frame-on-display <Return> DISPLAY <Return>".
However, technical challenges make replication and migration difficult in general. These include the "major headaches" of server-side fonts, the nonuniformity of X servers and screen sizes, and the need to appropriately retrofit toolkits. Authentication and authorization issues are obviously also important. The rest of the talk delved into some of the details of these interesting technical challenges.
HACKING IN THE KERNEL
Summarized by Hai Huang
AN IMPLEMENTATION OF SCHEDULER
ACTIVATIONS ON THE NETBSD OPERATING
SYSTEM
Nathan J. Williams, Wasabi Systems
Scheduler activations are an old idea. Basically, there are benefits and drawbacks to using solely kernel-level or user-level threading; scheduler activations combine the two layers of control to provide more concurrency in the system.
In his talk, Nathan gave a fairly detailed description of the implementation of scheduler activations in the NetBSD kernel. One important change in the implementation is to differentiate the thread context from the process context. This is done by defining a separate data structure for these threads and relocating some of the information that was embedded in the process context to these thread contexts. The stack was a particular concern because of the upcall: special handling must be done to make sure that the upcall doesn't disturb the stack, so that the preempted user-level thread can continue afterwards. Lastly, Nathan explained that signals are handled by upcalls.
AUTHORIZATION AND CHARGING IN PUBLIC
WLANS USING FREEBSD AND 802.1X
Pekka Nikander, Ericsson Research NomadicLab
The 802.1X standards are well known in the wireless community as link-layer authentication protocols. In this talk, Pekka explained some novel ways of using the 802.1X protocols that might be of interest to people on the move. It is possible to set up a public WLAN that supports various charging schemes via virtual tokens, which people can purchase or earn and later use.
This is implemented on FreeBSD using the netgraph utility. It is basically a filter in the link layer that differentiates traffic based on the MAC address of the client node, which is either authenticated, denied, or let through. The overhead of this service is fairly minimal.
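The per-frame decision such a link-layer filter makes is simple to state (a schematic sketch, not the FreeBSD netgraph implementation):

```python
def classify_frame(mac, authenticated, denied):
    """Per-frame decision of a link-layer filter keyed on the client's
    MAC address: forward traffic for authenticated clients, drop denied
    ones, and steer everyone else toward the authenticator."""
    if mac in denied:
        return "drop"
    if mac in authenticated:
        return "forward"
    return "to_authenticator"  # e.g., only 802.1X/EAP traffic allowed
```

A client's MAC moves between the sets as tokens are purchased, spent, or exhausted, which is how the various charging schemes plug into the same filter.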
ACPI IMPLEMENTATION ON FREEBSD
Takanori Watanabe, Kobe University
ACPI (Advanced Configuration and Power Interface) was proposed as a joint effort by Intel, Toshiba, and Microsoft to provide a standard, finer-grained method of managing the power states of individual devices within a system. Such low-level power management is especially important for mobile and embedded systems that are powered by fixed-capacity batteries.
Takanori explained that the ACPI specification is composed of three parts: tables, BIOS, and registers. He was able to implement some of the functionality of ACPI in the FreeBSD kernel. The ACPI Component Architecture, implemented by Intel, provides a high-level ACPI API to the operating system; Takanori's ACPI implementation is built upon this underlying layer of APIs.
ACCESS CONTROL
Summarized by Florian Buchholz
DESIGN AND PERFORMANCE OF THE
OPENBSD STATEFUL PACKET FILTER (PF)
Daniel Hartmeier, Systor AG
Daniel Hartmeier described the new stateful packet filter (pf) that replaced IPFilter in the OpenBSD 3.0 release. IPFilter could no longer be included due to licensing issues, and thus there was a need to write a new filter, making use of optimized data structures.
The filter rules are implemented as a linked list which is traversed from start to end. Two actions may be taken according to the rules, "pass" or "block": a "pass" action forwards the packet, while a "block" action drops it. Where more than one rule matches a packet, the last rule wins; rules that are marked as "final" immediately terminate any further rule evaluation. An optimization called "skip-steps" was also implemented, in which blocks of similar rules are skipped if they cannot match the current packet; these skip-steps are calculated when the rule set is loaded. Furthermore, a state table keeps track of TCP connections, and only packets that match the sequence numbers are allowed. UDP and ICMP queries and replies are also considered in the state table, where initial packets create an entry for a pseudo-connection with a low initial timeout value. The state table is implemented using a balanced binary search tree. NAT mappings are also stored in the state table, whereas application proxies reside in user space. The packet filter is also able to perform fragment reassembly and to modulate TCP sequence numbers to protect hosts behind the firewall.
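The last-match-wins evaluation with "final" rules can be sketched as follows (a toy model with dict-based rules and a hypothetical default decision; the real pf is C and adds skip-steps and the state table on top of this):

```python
def evaluate(rules, packet):
    """pf-style rule evaluation: scan all rules first to last; the LAST
    matching rule decides, except that a rule marked 'final' stops the
    scan immediately."""
    decision = "pass"  # default when nothing matches (assumption)
    for rule in rules:
        matches = all(packet.get(k) == v for k, v in rule["match"].items())
        if matches:
            decision = rule["action"]
            if rule.get("final"):
                break  # no later rule can override this one
    return decision
```

Because every packet normally walks the whole list, evaluation cost grows with rule set size, which is exactly what the skip-step optimization and the state table short-circuit.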
Pf was compared against IPFilter as well as Linux's iptables by measuring throughput and latency with increasing traffic rates and different packet sizes. In a test with a fixed rule set size of 100, iptables outperformed the other two filters, whose results were close to each other. In a second test, where the rule set size was continually increased, iptables consistently had about twice the throughput of the other two (which evaluate the rule set on both the incoming and outgoing interfaces). A third test compared only pf and IPFilter, using a single rule that created state in the state table, with a fixed state-entry size; pf reached an overloaded state much later than IPFilter. The experiment was repeated with a variable state-entry size, and pf performed much better than IPFilter for a small number of states.
In general, rule set evaluation is expensive, and benchmarks only reflect extreme cases, whereas in real life other behavior should be observed. Furthermore, the benchmarks show that stateful filtering can actually improve performance, since cheap state-table lookups replace expensive rule evaluation.
ENHANCING NFS CROSS-ADMINISTRATIVE
DOMAIN ACCESS
Joseph Spadavecchia and Erez Zadok, Stony Brook University
The speaker presented modifications to an NFS server that allow improved NFS access between administrative domains. The problem lies in the fact that NFS assumes a shared UID/GID space, which makes it unsafe to export files outside the administrative domain of the server. The design goals were to leave the protocol and existing clients unchanged, to require a minimum of server changes, and to provide flexibility and increased performance.
To solve the problem, two techniques are utilized: "range-mapping" and "file-cloaking." Range-mapping maps IDs between the client and the server. The mapping is performed on a per-export basis and has to be set up manually in an export file; the mappings can be 1-1, N-N, or N-1. In file-cloaking, the server restricts file access based on UID/GID and special cloaking-mask bits, so users can access only their own files' permissions. The policy on whether or not a file is visible to others is set by the cloaking mask, which is logically ANDed with the file's protection bits. File-cloaking only works, however, if the client doesn't hold cached copies of directory contents and file attributes; because of this, the clients are forced to re-read directories, by incrementing the mtime value of the directory each time it is listed.
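Both techniques are easy to sketch (illustrative only; the names and the N-1 squashing rule are assumptions, not the authors' code):

```python
def map_id(ranges, client_id):
    """Range-mapping sketch: each entry maps a client ID range onto a
    server ID range. Equal-length ranges give 1-1 / N-N shifts; a
    single-ID server range squashes the whole client range (N-1)."""
    for c_lo, c_hi, s_lo, s_hi in ranges:
        if c_lo <= client_id <= c_hi:
            if s_lo == s_hi:
                return s_lo                   # N-1: many IDs collapse
            return s_lo + (client_id - c_lo)  # 1-1 or N-N: shifted
    return None  # unmapped IDs get no server identity

def cloaked_mode(mode, cloak_mask):
    """File-cloaking sketch: the cloaking mask is ANDed with the file's
    protection bits, hiding whatever permissions the mask clears."""
    return mode & cloak_mask
```

For example, mapping a remote domain's UIDs 1000-1999 onto local 5000-5999 keeps them distinct, while an N-1 entry can collapse all unknown users onto a single "nobody" ID; a cloak mask of 0o700 strips the group and other bits from every exported file.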
To test the performance of the modified server, five different NFS configurations were evaluated: an unmodified NFS server; one with the modified code included but not used; one with only range-mapping enabled; one with only file-cloaking enabled; and one with all modifications enabled. For each setup, different file system benchmarks were run. The results show only a small overhead when the modifications are used, generally an increase of below 5%. Another experiment tested the performance of the system with an increasing number of mapped or cloaked entries; the results show that an increase from 10 to 1000 entries resulted in a maximum cost in performance of about 14%.
One member of the audience pointed out that if clients choose to ignore the changed mtimes from the server, and thus still hold caches of the directory entries, the file-cloaking mechanism could be defeated. After a rather lengthy debate, the speaker had to concede that the model doesn't add any extra security. Another question was asked about the scalability of setting up range-mapping; the speaker referred to application-level tools that could be developed for that purpose.
The software is available at ftp://ftp.fsl.cs.sunysb.edu/pub/enf.
ENGINEERING OPEN SOURCE
SOFTWARE
Summarized by Teri Lampoudi
NINGAUI: A LINUX CLUSTER FOR BUSINESS
Andrew Hume, AT&T Labs - Research; Scott Daniels, Electronic Data Systems Corp.
Ningaui is a general-purpose, highly available, resilient architecture built from commodity software and hardware. Emphatically, however, Ningaui is not a Beowulf. Hume calls the cluster design the "Swiss canton model," in which there are a number of loosely affiliated, independent nodes with data replicated among them. Jobs are assigned by bidding and leases, and cluster services done as session-based servers are sited via generic job assignment. The emphasis is on keeping the architecture end-to-end, checking all work via checksums, and logging everything. The resilience and high availability required by their goal of 8-to-5 maintenance (vs. the typical 24/7 model, where people get paged whenever the slightest thing goes wrong, regardless of the time of day) is achieved by job restartability. Finally, all computation is performed on local data, without the use of NFS or network-attached storage.
Hume's message is a hopeful one: despite the many problems encountered (things like kernel and service limits, auto-installation problems, TCP storms, and the like), the existence of source code and the paranoid practice of logging and checksumming everything has helped. The final product performs reasonably well, and it appears that the resilient techniques employed do make a difference. One drawback, however, is that the software mentioned in the paper is not yet available for download.
CPCMS: A CONFIGURATION MANAGEMENT SYSTEM BASED ON CRYPTOGRAPHIC NAMES
Jonathan S. Shapiro and John Vanderburgh, Johns Hopkins University
This paper won the FREENIX Best Paper award.
The basic notion behind the project is the fact that everyone has a pet complaint about CVS, and yet it is currently the configuration manager in most widespread use. Shapiro has unleashed an alternative. Interestingly, he did not begin but ended with the usual host of reasons why CVS is bad. The talk instead began abruptly by characterizing the job of a software configuration manager, continued by stating the namespaces which it must handle, and wrapped up with the challenges faced.
X MEETS Z: VERIFYING CORRECTNESS IN THE PRESENCE OF POSIX THREADS
Bart Massey, Portland State University; Robert T. Bauer, Rational Software Corp.
Massey delivered a humorous talk on the insight gained from applying Z formal specification notation to system software design, rather than the more informal analysis and design process normally used.
The story is told with respect to writing XCB, which replaces the Xlib protocol layer and is supposed to be thread-friendly. But where threads are concerned, deadlock avoidance becomes a hard problem that cannot be solved in an ad hoc manner. But full model checking is also too hard. In this case, Massey resorted to Z specification to model the XCB lower layer, abstract away locking and data transfers, and locate fundamental issues. Essentially, the difficulties of searching the literature and locating information relevant to the problem at hand were overcome. As Massey put it, "the formal method saved the day."
FILE SYSTEMS
Summarized by Bosko Milekic
PLANNED EXTENSIONS TO THE LINUX EXT2/EXT3 FILESYSTEM
Theodore Y. Ts'o, IBM; Stephen Tweedie, Red Hat
The speaker presented improvements to the Linux Ext2 file system, with the goal of allowing for various expansions while striving to maintain compatibility with older code. Improvements have been facilitated by a few extra superblock fields that were added to Ext2 not long before the Linux 2.0 kernel was released. The fields allow for file system features to be added without compromising the existing setup; this is done by providing bits indicating the impact of the added features with respect to compatibility. Namely, a file system with the "incompat" bit marked is not allowed to be mounted. Similarly, a "read-only" marking would only allow the file system to be mounted read-only.
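The compatibility-bit scheme can be sketched roughly as follows (a simplified illustration; the flag names and values are hypothetical stand-ins, not the actual Ext2 on-disk constants):

```python
# Sketch of feature-compatibility bits in a superblock: an unknown
# "incompat" feature forbids mounting, while an unknown "ro_compat"
# feature forces a read-only mount. Flag values here are invented.

SUPPORTED_INCOMPAT = 0b0001    # features this kernel fully understands
SUPPORTED_RO_COMPAT = 0b0010

def mount_mode(incompat_bits, ro_compat_bits):
    if incompat_bits & ~SUPPORTED_INCOMPAT:
        return "refuse"        # unknown incompatible feature present
    if ro_compat_bits & ~SUPPORTED_RO_COMPAT:
        return "read-only"     # unknown feature, but safe to read
    return "read-write"

print(mount_mode(0b0001, 0b0000))  # read-write
print(mount_mode(0b0100, 0b0000))  # refuse
print(mount_mode(0b0000, 0b1000))  # read-only
```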
Directory indexing replaces linear directory searches with a faster search using a fixed-depth tree and hashed keys. File system size can be dynamically increased, and the expanded inode, doubled from 128 to 256 bytes, allows for more extensions.
Other potential improvements were discussed as well, in particular pre-allocation for contiguous files, which allows for better performance in certain setups by pre-allocating contiguous blocks. Security-related modifications, extended attributes, and ACLs were mentioned. An implementation of these features already exists but has not yet been merged into the mainline Ext2/Ext3 code.
RECENT FILESYSTEM OPTIMISATIONS ON FREEBSD
Ian Dowse, Corvil Networks; David Malone, CNRI, Dublin Institute of Technology
David Malone presented four important file system optimizations for FreeBSD OS: soft updates, dirpref, vmiodir, and dirhash. It turns out that certain combinations of the optimizations (beautifully illustrated in the paper) may yield performance improvements of anywhere between 2 and 10 orders of magnitude for real-world applications.
All four of the optimizations deal with file system metadata. Soft updates allow for asynchronous metadata updates. Dirpref changes the way directories are organized, attempting to place child directories closer to their parents, thereby increasing locality of reference and reducing disk-seek times. Vmiodir trades some extra memory in order to achieve better directory caching. Finally, dirhash, which was implemented by Ian Dowse, changes the way in which entries are searched in a directory. Specifically, whereas FFS uses a linear search to find an entry by name, dirhash builds a hash table of directory entries on the fly. For a directory of size n with a working set of m files, a search that in certain cases could have been O(n·m) has been reduced, due to dirhash, to effectively O(n + m).
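The idea behind dirhash can be illustrated with a toy sketch (Python rather than the kernel's C, and greatly simplified): one O(n) pass builds a hash table, after which each of the m lookups is constant time instead of a scan, giving O(n + m) overall.

```python
# Toy illustration of the dirhash idea: replace repeated linear scans
# of a directory with one hash-table build plus O(1) lookups.

def linear_lookup(entries, name):
    # FFS-style scan: O(n) per lookup.
    for i, e in enumerate(entries):
        if e == name:
            return i
    return None

def build_dirhash(entries):
    # One O(n) pass; afterwards each lookup is O(1) on average,
    # so m lookups cost O(n + m) rather than O(n * m).
    return {e: i for i, e in enumerate(entries)}

entries = [f"file{i}" for i in range(1000)]
table = build_dirhash(entries)
print(linear_lookup(entries, "file999"))  # 999
print(table["file999"])                   # 999
```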
FILESYSTEM PERFORMANCE AND SCALABILITY IN LINUX 2.4.17
Ray Bryant, SGI; Ruth Forester, IBM LTC; John Hawkes, SGI
This talk focused on performance evaluation of a number of file systems available and commonly deployed on Linux machines. Comparisons under various configurations of Ext2, Ext3, ReiserFS, XFS, and JFS were presented.
The benchmarks chosen for the data gathering were pgmeter, filemark, and the AIM Benchmark Suite VII. Pgmeter measures the rate of data transfer of reads/writes of a file under a synthetic workload. Filemark is similar to postmark in that it is an operation-intensive benchmark, although filemark is threaded and offers various other features that postmark lacks. AIM VII measures performance for various file-system-related functionalities; it offers various metrics under an imposed workload, thus stressing the performance of the file system not only under I/O load but also under significant CPU load.
Tests were run on three different setups: a small, a medium, and a large configuration. ReiserFS and Ext2 appear at the top of the pile for smaller and medium setups. Notably, XFS and JFS perform worse for smaller system configurations than the others, although XFS clearly appears to generate better numbers than JFS. It should be noted that XFS seems to scale well under a higher load. This was most evident in the large-system results, where XFS appears to offer the best overall results.
THINGS TO THINK ABOUT
Summarized by Bosko Milekic
SPEEDING UP KERNEL SCHEDULER BY REDUCING CACHE MISSES
Shuji Yamamura, Akira Hirai, Mitsuru Sato, Masao Yamamoto, Akira Naruse, and Kouichi Kumon, Fujitsu Labs
This was an interesting talk pertaining to the effects of cache coloring for task structures in the Linux kernel scheduler (Linux kernel 2.4.x). The speaker first presented some interesting benchmark numbers for the Linux scheduler, showing that as the number of processes on the task queue was increased, the performance decreased. The authors used some really nifty hardware to measure the number of bus transactions throughout their tests, and were thus able to reasonably quantify the impact that cache misses had in the Linux scheduler.
Their experiments led them to implement a cache coloring scheme for task structures, which were previously aligned on 8KB boundaries and therefore were eventually mapped to the same cache lines. This unfortunate placement of task structures in memory induced a significant number of cache misses as the number of tasks grew in the scheduler.
The implemented solution consisted of aligning task structures to more evenly distribute cached entries across the L2 cache. The result was inevitably fewer cache misses in the scheduler. Some negative effects were observed in certain situations; these were primarily due to more cache slots being used by task-structure data in the scheduler, thus forcing data previously cached there to be pushed out.
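Why identical alignment hurts can be seen with a small model (a simplified sketch with made-up cache parameters, not the authors' implementation): structures placed at a fixed power-of-two stride all land in the same cache sets, while adding a per-structure color offset spreads them out.

```python
# Toy model of cache coloring: a set-indexed cache maps an address to
# set (addr // line_size) % num_sets. The geometry here is invented
# purely for illustration; an 8KB stride is an exact multiple of
# SETS * LINE, so uncolored structures all alias to the same set.

LINE = 64    # bytes per cache line (illustrative)
SETS = 128   # number of cache sets (illustrative); 128 * 64 = 8192

def cache_set(addr):
    return (addr // LINE) % SETS

# Task structures at a fixed 8KB stride: every one maps to set 0.
uncolored = [cache_set(i * 8192) for i in range(8)]
print(uncolored)  # [0, 0, 0, 0, 0, 0, 0, 0]

# Coloring: offset each structure by one extra cache line per task,
# spreading the structures across distinct sets.
colored = [cache_set(i * 8192 + (i % SETS) * LINE) for i in range(8)]
print(colored)    # [0, 1, 2, 3, 4, 5, 6, 7]
```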
WORK-IN-PROGRESS REPORTS
Summarized by Brennan Reynolds
RESOURCE VIRTUALIZATION TECHNIQUES FOR WIDE-AREA OVERLAY NETWORKS
Kartik Gopalan, Stony Brook University
This work addressed the issue of provisioning a maximum number of virtual overlay networks (VONs) with diverse quality-of-service (QoS) requirements on a single physical network. Each of the VONs is logically isolated from the others to ensure the QoS requirements. Gopalan mentioned several critical research issues with this problem that are currently being investigated. Dealing with how to provision the network at various levels (link, route, or path) and then enforce the provisioning at run-time is one of the toughest challenges. Currently, Gopalan has developed several algorithms to handle admission control, end-to-end QoS, route selection, scheduling, and fault tolerance in the network.
For more information, visit http://www.ecsl.cs.sunysb.edu.
VISUALIZING SOFTWARE INSTABILITY
Jennifer Bevan, University of California, Santa Cruz
Detection of instability in software has typically been an afterthought. The point in the development cycle when the software is reviewed for instability is usually after it is difficult and costly to go back and perform major modifications to the code base. Bevan has developed a technique to allow the visualization of unstable regions of code that can be used much earlier in the development cycle. Her technique creates a time series of dependency graphs that include clusters and lines called fault lines. From the graphs, a developer is able to easily determine where the unstable sections of code are and proactively restructure them. She is currently working on a prototype implementation.
For more information, visit http://www.cse.ucsc.edu/~jbevan/evo_viz.
RELIABLE AND SCALABLE PEER-TO-PEER WEB DOCUMENT SHARING
Li Xiao, College of William and Mary
The idea presented by Xiao would allow end users to share the content of their Web browser caches with neighboring Internet users. The rationale for this is that today's end users are increasingly connected to the Internet over high-speed links, and the browser caches are becoming large storage systems. Therefore, if an individual accesses a page which does not exist in their local cache, Xiao is suggesting that they first query other end users for the content before trying to access the machine hosting the original. This strategy does have some serious problems associated with it that still need to be addressed, including ensuring the integrity of the content and protecting the identity and privacy of the end users.
SEREL: FAST BOOTING FOR UNIX
Leni Mayo, Fast Boot Software
Serel is a tool that generates a visual representation of a UNIX machine's boot-up sequence. It can be used to identify the critical path and can show if a particular service or process blocks for an extended period of time. This information could be used to determine where the largest performance gains could be realized by tuning the order of execution at boot-up. Serel creates a dependency graph, expressed in XML, during boot-up. This graph is then used to create the visual representation. Currently the tool only works on POSIX-compliant systems, but Mayo stated that he would be porting it to other platforms. Other extensions that were mentioned included having the metadata include the use of shared libraries, and monitoring the suspend/resume sequence of portable machines.
For more information, visit http://www.fastboot.org.
BERKELEY DB XML
John Merrells, Sleepycat Software
Merrells gave a quick introduction and overview of the new XML library for Berkeley DB. The library specializes in storage and retrieval of XML content through a tool called XPath. The library allows for multiple containers per document and stores everything natively as XML. The user is also given a wide range of elements to create indices with, including edges, elements, text strings, or presence. The XPath tool consists of a query parser, generator, optimizer, and execution engine. To conclude his presentation, Merrells gave a live demonstration of the software.
For more information, visit http://www.sleepycat.com/xml.
CLUSTER-ON-DEMAND (COD)
Justin Moore, Duke University
Modern clusters are growing at a rapid rate. Many have pushed beyond the 5,000-machine mark, and deploying them results in large expenses as well as management and provisioning issues. Furthermore, if the cluster is "rented" out to various users, it is very time-consuming to configure it to a user's specs, regardless of how long they need to use it. The COD work presented creates dynamic virtual clusters within a given physical cluster. The goal was to have a provisioning tool that would automatically select a chunk of available nodes and install the operating system and middleware specified by the customer in a short period of time. This would allow a greater use of resources, since multiple virtual clusters can exist at once. By using a virtual cluster, the size can be changed dynamically. Moore stated that they have created a working prototype and are currently testing and benchmarking its performance.
For more information, visit http://www.cs.duke.edu/~justin/cod.
CATACOMB
Elias Sinderson, University of California, Santa Cruz
Catacomb is a project to develop a database-backed DASL module for the Apache Web server. It was designed as a replacement for the WebDAV module. The initial release of the module only contains support for a MySQL database, but could be extended to others. Sinderson briefly touched on the performance of his module: it was comparable to the mod_dav Apache module for all query types but search. The presentation was concluded with remarks about adding support for the lock method and including ACL specifications in the future.
For more information, visit http://ocean.cse.ucsc.edu/catacomb.
SELF-ORGANIZING STORAGE
Dan Ellard, Harvard University
Ellard's presentation introduced a storage system that tuned itself based on the workload of the system, without requiring the intervention of the user. The intelligence was implemented as a virtual self-organizing disk that resides below any file system. The virtual disk observes the access patterns exhibited by the system and then attempts to predict what information will be accessed next. An experiment was done using an NFS trace at a large ISP on one of their email servers. Ellard's self-organizing storage system worked well, which he attributes to the fact that most of the files being requested were large email boxes. Areas of future work include exploration of the length and detail of the data-collection stage, as well as the CPU impact of running the virtual disk layer.
VISUAL DEBUGGING
John Costigan, Virginia Tech
Costigan feels that the state of current debugging facilities in the UNIX world is not as good as it should be. He proposes the addition of several elements to programs like ddd and gdb. The first addition is including separate heap and stack components. Another is to display only certain fields of a data structure, and to have the ability to zoom in and out if needed. Finally, the debugger should provide the ability to visually present complex data structures, including linked lists, trees, etc. He said that a beta version of a debugger with these abilities is currently available.
For more information, visit http://infovis.cs.vt.edu/datastruct.
IMPROVING APPLICATION PERFORMANCE THROUGH SYSTEM-CALL COMPOSITION
Amit Purohit, Stony Brook University
Web servers perform a huge number of context switches and internal data copying during normal operation. These two elements can drastically limit the performance of an application, regardless of the hardware platform it is run on. The Compound System Call (CoSy) framework is an attempt to reduce the performance penalty for context switches and data copies. It includes a user-level library and several kernel facilities that can be used via system calls. The library provides the programmer with a complete set of memory-management functions. Performance tests using the CoSy framework showed a large saving for context switches and data copies. The only area where the savings between conventional libraries and CoSy were negligible was for very large file copies.
For more information, visit http://www.cs.sunysb.edu/~purohit.
ELASTIC QUOTAS
John Oscar, Columbia University
Oscar began by stating that most resources are flexible, but that to date disks have not been. While approximately 80 percent of files are short-lived, disk quotas are hard limits imposed by administrators. The idea behind elastic quotas is to have a non-permanent storage area that each user can temporarily use. Oscar suggested the creation of an ehome directory structure to be used in deploying elastic quotas. Currently he has implemented elastic quotas as a stackable file system.
There were also several trends apparent in the data. Many users would ssh to their remote host (good) but then launch an email reader on the local machine, which would connect via POP to the same host and send the password in clear text (bad). This situation can easily be remedied by tunneling protocols like POP through SSH, but it appeared that many people were not aware this could be done. While most of his comments on the use of the wireless network were negative, the list of passwords he had collected showed that people were indeed using strong passwords. His recommendations were to educate and encourage people to use protocols like IPSec, SSH, and SSL when conducting work over a wireless network, because you never know who else is listening.
SYSTRACE
Niels Provos, University of Michigan
How can one be sure the applications one is using actually do exactly what their developers said they do? Short answer: you can't. People today are using a large number of complex applications, which means it is impossible to check each application thoroughly for security vulnerabilities. There are tools out there that can help, though. Provos has developed a tool called systrace that allows a user to generate a policy of acceptable system calls a particular application can make. If the application attempts to make a call that is not defined in the policy, the user is notified and allowed to choose an action. Systrace includes an auto-policy generation mechanism, which uses a training phase to record the actions of all programs the user executes. If the user chooses not to be bothered by applications breaking policy, systrace allows default enforcement actions to be set. Currently systrace is implemented for FreeBSD and NetBSD, with a Linux version coming out shortly.
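The decision logic of such a policy-based syscall filter can be modeled abstractly (a toy sketch only; systrace's real policy grammar and kernel enforcement look nothing like this, and the policy entries below are hypothetical):

```python
# Toy model of policy-based system-call filtering: each syscall an
# application attempts is checked against a per-application policy;
# calls absent from the policy fall back to a default action.

def check(policy, syscall, default="ask-user"):
    return policy.get(syscall, default)

# Hypothetical policy, as might be recorded during a training phase.
policy = {"open": "permit", "read": "permit", "connect": "deny"}

print(check(policy, "read"))     # permit
print(check(policy, "connect"))  # deny
print(check(policy, "unlink"))   # ask-user (not covered by policy)
```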
For more information, visit http://www.citi.umich.edu/u/provos/systrace.
VERIFIABLE SECRET REDISTRIBUTION FOR SURVIVABLE STORAGE SYSTEMS
Ted Wong, Carnegie Mellon University
Wong presented a protocol that can be used to re-create a file distributed over a set of servers, even if one of the servers is damaged. In his scheme, the user must choose how many shares the file is split into and the number of servers it will be stored across. The goal of this work is to provide persistent, secure storage of information, even if it comes under attack. Wong stated that his design of the protocol is complete, and he is currently building a prototype implementation of it.
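Schemes of this kind build on threshold secret sharing; the underlying idea can be sketched with textbook Shamir-style sharing (a generic illustration of a (2, n) threshold scheme, not Wong's verifiable-redistribution protocol, and with an arbitrarily chosen prime field):

```python
# Generic (2, n) Shamir-style secret sharing over a prime field:
# any two shares recover the secret; one share alone reveals nothing.
import random

P = 2**61 - 1  # a Mersenne prime, comfortably larger than the secret

def split(secret, n):
    a = random.randrange(P)  # random slope of a degree-1 polynomial
    return [(x, (secret + a * x) % P) for x in range(1, n + 1)]

def reconstruct(s1, s2):
    (x1, y1), (x2, y2) = s1, s2
    # Lagrange interpolation at x = 0 for a degree-1 polynomial.
    inv = pow(x2 - x1, P - 2, P)  # modular inverse via Fermat's little theorem
    a = ((y2 - y1) * inv) % P
    return (y1 - a * x1) % P

shares = split(123456789, 5)
print(reconstruct(shares[0], shares[3]))  # 123456789
```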
For more information, visit http://www.cs.cmu.edu/~tmwong/research.
THE GURU IS IN SESSIONS
LINUX ON LAPTOP/PDA
Bdale Garbee, HP Linux Systems Operation
Summarized by Brennan Reynolds
This was a roundtable-like discussion, with Garbee acting as moderator. He is currently in charge of the Debian distribution of Linux and is involved in porting Linux to platforms other than i386. He has successfully helped port the kernel to the Alpha, SPARC, and ARM, and is currently working on a version to run on VAX machines.
A discussion was held on which file system is best for battery life. The ReiserFS and Ext3 file systems were discussed; people commented that when using Ext3 on their laptops, the disk would never spin down and thus was always consuming a large amount of power. One suggestion was to use a RAM-based file system for any volume that requires a large number of writes, and to only have the file system written to disk at infrequent intervals or when the machine is shut down or suspended.
A question was directed to Garbee about his knowledge of how the HP-Compaq merger would affect Linux. Garbee thought that the merger was good for Linux, both within the company and for the open source community as a whole. He stated that the new company would be the number-one shipper of systems with Linux as the base operating system. He also fielded a question about the support of older HP servers and their ability to run Linux, saying that indeed Linux has been ported to them, and he personally had several machines in his basement running it.
A question which sparked a large number of responses concerned problems people had with running Linux on mobile platforms. Most people actually did not have many problems at all. There were a few cases of a particular machine model not working, but there were only two widespread issues: docking stations and the new ACPI power management scheme. Docking stations still appear to be somewhat of a headache, but most people had developed ad hoc solutions for getting them to work. The ACPI power management scheme developed by Intel does not appear to have a quick solution. Ted Ts'o, one of the head kernel developers, stated that there are fundamental architectural problems with the 2.4 series kernel that do not easily allow ACPI to be added. However, he also stated that ACPI is supported in the newest development kernel, 2.5, and will be included in 2.6.
The final topic was the possibility of purchasing a laptop without paying any charge/tax for Microsoft Windows. The consensus was that currently this is not possible. Even if the machine comes with Linux or without any operating system, the vendors have contracts in place which require them to pay Microsoft for each machine they ship if that machine is capable of running a Microsoft operating system. The only way this will change is if the demand for other operating systems increases to the point that vendors renegotiate their contracts with Microsoft, and no one saw this happening in the near future.
LARGE CLUSTERS
Andrew Hume, AT&T Labs – Research
Summarized by Teri Lampoudi
This was a session on clusters, big data, and resilient computing. Given that the audience was primarily interested in clusters, and not necessarily big data or beating nodes to a pulp, the conversation revolved mainly around what it takes to make a heterogeneous cluster resilient. Hume also presented a FREENIX paper on his current cluster project, which explained in more detail much of what was abstractly claimed in the guru session.
Hume's use of the word "cluster" referred not to what I had assumed would be a Beowulf-type system, but to a loosely coupled farm of machines in which concurrency was much less of an issue than it would be in a Beowulf. In fact, the architecture Hume described was designed to deal with large amounts of independent transaction processing, essentially the process of billing calls, which requires no interprocess message passing or anything of the sort a parallel scientific application might.
Where does the large data come in? Since the transactions in question consist of large numbers of flat files, a mechanism for getting the files onto nodes and off-loading the results is necessary. In this particular cluster, named Ningaui, this task is handled by the "replication manager," which generated a large amount of interest in the audience. All management functions in the cluster, as well as job allocation, constitute services that are to be bid for and leased out to nodes. Furthermore, failures are handled by simply repeating the failed task until it is completed properly, if that is possible, and if it is not, then looking through detailed logs to discover what went wrong. Logging and testing for the successful completion of each task are what make this model resilient. Every management operation is logged at some appropriate level of detail, generating 120MB/day, and every single file transfer is md5 checksummed. This was characterized by Hume as a "patient" style of management, where no one controls the cluster, scheduling behavior is emergent rather than stringently planned, and nodes are free to drop in and out of the cluster; a downside to this model is the increase in latency, offset by the fact that the cluster continues to function unattended outside of 8-to-5 support hours.
In response to questions about the projected scalability of such a scheme, Hume said the system would presumably scale from the present 60 nodes to about 100 without modifications, but that scaling higher would be a matter for further design. The guru session ended on a somewhat unrelated note regarding the problems of getting data off mainframes and onto Linux machines: issues of fixed- vs. variable-length encoding of data blocks resulting in useful information being stripped by FTP, and problems in converting COBOL copybooks to something useful on a C-based architecture. To reiterate Hume's claim, there are homemade solutions for some of these problems, and the reader should feel free to contact Mr. Hume for them.
WRITING PORTABLE APPLICATIONS
Nick Stoughton, MSB Consultants
Summarized by Josh Lothian
Nick Stoughton addressed a concern that developers are facing more and more: writing portable applications. He proposed that there is no such thing as a "portable application," only those that have been ported. Developers are still in search of the Holy Grail of applications programming: a write-once, compile-anywhere program. Languages such as Java are leading up to this, but virtual machines still have to be ported, paths will still be different, etc. Web-based applications are another area that could be promising for portable applications.
Nick went on to talk about the future of portability. The recently released POSIX 2001 standards are expected to help the situation. The 2001 revision expands the original POSIX spec into seven volumes. Even though POSIX 2001 is more specific than the previous release, Nick pointed out that even this standard allows for small differences where vendors may define their own behaviors; this can only hurt developers in the long run. Along these lines, it was pointed out that there may be room for automated tools that have the capability to check application code and determine the level of portability and standards compliance of the source.
NETWORK MANAGEMENT SYSTEM PERFORMANCE TUNING
Jeff Allen, Tellme Networks, Inc.
Summarized by Matt Selsky
Jeff Allen, author of Cricket, discussed network tuning. The basic idea of tuning is: measure, twiddle, and measure again. Cricket can be used for large installations, but it's overkill for smaller installations, for which MRTG is better suited. You don't need Cricket to do measurement; doing something like a shell script that dumps data to Excel is good, but you need to get some sort of measurement. Cricket was not designed for billing; it was meant to help answer questions.
Useful tools and techniques covered included looking for the difference between machines in order to identify the causes for the differences; measuring, but also thinking about what you're measuring and why; and remembering to use strace and tcpdump to identify problems. Problems repeat themselves; performance tuning requires an intricate understanding of all the layers involved to solve the problem and simplify the automation. (Having a computer science background helps, but is not essential.) If you can reproduce the problem, you then should investigate each piece to find the bottleneck. If you can't reproduce the problem, then measure. You need to understand all the system interactions. Measurement can help determine whether things are actually slow or if the user is imagining it.
Troubleshooting begins with a hunch, but scientific processes are essential. You should be able to determine the event stream, or when each event occurs; having observation logs can help. Some hints provided include checking cron for periodic anomalies, slowing down /bin/rm to avoid I/O overload, and looking for unusual kernel CPU usage.
Also, try to use lightweight monitoring to reduce overhead. You don't need to monitor every resource on every system, but you should monitor those resources on those systems that are essential. Don't check things too often, since you can introduce overhead that way. Lots of information can be gathered from the /proc file system interface to the kernel.
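As an example of the kind of lightweight measurement discussed, a few lines suffice to sample the kernel's /proc interface on Linux (the path and field layout are Linux-specific; the parsing helper below is a generic sketch):

```python
# Lightweight monitoring sketch: parse the three load averages from a
# /proc/loadavg-style line without spawning any external tools.
import os

def parse_loadavg(text):
    # The first three whitespace-separated fields are the 1-, 5-,
    # and 15-minute load averages.
    return tuple(float(f) for f in text.split()[:3])

if os.path.exists("/proc/loadavg"):          # Linux only
    with open("/proc/loadavg") as f:
        print(parse_loadavg(f.read()))
else:                                        # sample line for other systems
    print(parse_loadavg("0.42 0.17 0.09 1/123 4567"))
```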
BSD BOF
Host: Kirk McKusick, Author and Consultant
Summarized by Bosko Milekic
The BSD Birds of a Feather session at USENIX started off with five quick and informative talks and ended with an exciting discussion of licensing issues.
Christos Zoulas, for the NetBSD Project, went over the primary goals of the NetBSD Project (namely portability, a clean architecture, and security) and then proceeded to briefly discuss some of the challenges that the project has encountered. The successes and areas for improvement of 1.6 were then examined. NetBSD has recently seen various improvements in its cross-build system, packaging system, and third-party tool support. For what concerns the kernel, a zero-copy TCP/UDP implementation has been integrated, and a new pmap code for arm32 has been introduced, as have some other rather major changes to various subsystems. More information is available in Christos's slides, which are now available at http://www.netbsd.org/gallery/events/usenix2002.
Theo de Raadt, of the OpenBSD Project, presented a rundown of various new things that the OpenBSD and OpenSSH teams have been looking at. Theo's talk included an amusing description (and pictures) of some of the events that transpired during OpenBSD's hack-a-thon, which occurred the week before USENIX '02. Finally, Theo mentioned that, following complications with the IPFilter license, the OpenBSD team had done a fairly extensive license audit.
Robert Watson, of the FreeBSD Project, brought up various examples of FreeBSD being used in the real world. He went on to describe a wide variety of changes that will surface with the upcoming FreeBSD release 5.0. Notably, 5.0 will introduce a re-architecting of the kernel aimed at providing much more scalable support for SMP machines. 5.0 will also include an early implementation of KSE, FreeBSD's version of scheduler activations, large improvements to pccard, and significant framework bits from the TrustedBSD project.
Mike Karels, from Wind River Systems, presented an overview of how BSD/OS has been evolving following the Wind River Systems acquisition of BSDi's software assets. Ernie Prabhakar, from Apple, discussed Apple's OS X and its success on the desktop. He explained how OS X aims to bring BSD to the desktop while striving to maintain a strong and stable core, one that is heavily based on BSD technology.
The BoF finished with a discussion on licensing; specifically, some folks questioned the impact of Apple's licensing for code from their core OS (the Darwin Project) on the BSD community as a whole.
CLOSING SESSION
HOW FLIES FLY
Michael H. Dickinson, University of California, Berkeley
Summarized by JD Welch
In this lively and offbeat talk, Dickinson discussed his research into the mechanisms of flight in the fruit fly (Drosophila melanogaster) and related the autonomic behavior to that of a technological system. Insects are the most successful organisms on earth, due in no small part to their early adoption of flight. Flies travel through space on straight trajectories interrupted by saccades, jumpy turning motions somewhat analogous to the fast, jumpy movements of the human eye. Using a variety of monitoring techniques, including high-speed video in a "virtual reality flight arena," Dickinson and his colleagues have observed that flies respond to visual cues to decide when to saccade during flight. For example, more visual texture makes flies saccade earlier (i.e., further away from the arena wall), described by Dickinson as a "collision-control algorithm."
Through a combination of sensory inputs, flies can make decisions about their flight path. Flies possess a visual "expansion detector" which, at a certain threshold, causes the animal to turn in a certain direction. However, expansion in front of the fly sometimes causes it to land. How does the fly decide? Using the virtual reality device to replicate various visual scenarios, Dickinson observed that the flies fixate on vertical objects that move back and forth across the field. Expansion on the left side of the animal causes it to turn right, and vice versa, while expansion directly in front of the animal triggers the legs to flare out in preparation for landing.
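The steering logic described above can be caricatured in a few lines of code. The numeric expansion encoding and the landing threshold below are invented for illustration, not taken from the talk:

```python
def fly_response(left_expansion, right_expansion, land_threshold=0.8):
    """Toy model of the fly's decision rule: returns 'turn-right',
    'turn-left', or 'land'. Expansion values are hypothetical 0..1
    measures of visual looming on each side."""
    # Expansion seen by both eyes at once approximates a frontal object.
    frontal = min(left_expansion, right_expansion)
    if frontal >= land_threshold:
        return "land"  # looming object dead ahead: flare legs and land
    # Otherwise steer away from the side with the stronger expansion.
    return "turn-right" if left_expansion > right_expansion else "turn-left"
```

The point of the sketch is only the three-way structure of the decision: lateral expansion steers, strong frontal expansion triggers the landing reflex.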
If the eyes detect these changes, how are the responses implemented? The flies' wings have "power muscles," controlled by mechanical resonance (as opposed to the nervous system), which drive the wings, combined with neurally activated "steering muscles," which change the configuration of the wing joints. Subtle variations in the timing of impulses correspond to changes in wing movement, controlled by sensors in the "wing-pit." A small wing-like structure, the haltere, controls equilibrium by beating constantly.
Voluntary control is accomplished by the halteres, whose signals can interfere with the autonomic control of the stroke cycle. The halteres have "steering muscles" as well, and information derived from the visual system can turn off the haltere or trick it into a "virtual" problem requiring a response.
Dickinson has also studied the aerodynamics of insect flight using a device called the "Robo Fly," an oversized mechanical insect wing suspended in a large tank of mineral oil. Interestingly, Dickinson observed that the average lift generated by a fly's wings is less than its body weight; the flies use three mechanisms to overcome this, including rotating the wings (rotational lift).
Insects are extraordinarily robust creatures, and because Dickinson analyzed the problem in a systems-oriented way, these observations and analyses are immediately applicable to technology: the response system could be used as an efficient search algorithm for control systems in autonomous vehicles, for example.
THE AFS WORKSHOP
Summarized by Garry Zacheiss, MIT
Love Hörnquist-Åstrand of the Arla development team presented the Arla status report. Arla 0.35.8 has been released. Scheduled for release soon are improved support for Tru64 UNIX, MacOS X, and FreeBSD; improved volume handling; and implementation of more of the vos/pts subcommands. It was stressed that MacOS X is considered an important platform, and that a GUI configuration manager for the cache manager, a GUI ACL manager for the Finder, and a graphical login that obtains Kerberos tickets and AFS tokens at login time were all under development.
Future goals planned for the underlying AFS protocols include GSSAPI/SPNEGO support for Rx, performance improvements to Rx, an enhanced disconnected mode, and IPv6 support for Rx; an experimental patch is already available for the latter. Future Arla-specific goals include improved performance, partial file reading, and increased stability on several platforms. Work is also in progress on the RXGSS protocol extensions (integrating GSSAPI into Rx). A partial implementation exists, and work continues as developers find time.
Derrick Brashear of the OpenAFS development team presented the OpenAFS status report. OpenAFS 1.2.5 was released immediately prior to the conference; it fixed a remotely exploitable denial-of-service vulnerability on several OpenAFS platforms, most notably IRIX and AIX. Future work planned for OpenAFS includes better support for MacOS X, including working around Finder interaction issues. Better support for the BSDs is also planned: FreeBSD has a partially implemented client, while NetBSD and OpenBSD have only server programs available right now. AIX 5, Tru64 5.1A, and MacOS X 10.2 (a.k.a. Jaguar) are all planned for the future. Other planned enhancements include nested groups in the ptserver (code donated by the University of Michigan awaits integration), disconnected AFS, and further work on a native Windows client. Derrick stated that the guiding principles of OpenAFS were to maintain compatibility with IBM AFS, support new platforms, ease the administrative burden, and add new functionality.
AFS performance was discussed. The openafs.org cell consists of two file servers, one in Stockholm, Sweden, and one in Pittsburgh, PA. AFS works reasonably well over trans-Atlantic links, although OpenAFS clients don't determine which file server to talk to very effectively. Arla clients use RTTs to the server to determine the optimal file server to fetch replicated data from. Modifications to OpenAFS to support this behavior in the future are desired.
Jimmy Engelbrecht and Harald Barth of KTH discussed their AFSCrawler script, written to determine how many AFS cells and clients were in the world, what implementations/versions they were (Arla vs. IBM AFS vs. OpenAFS), and how much data was in AFS. The script unfortunately triggered a bug in IBM AFS 3.6-derived code, causing some clients to panic while handling a specific RPC. This has since been fixed in OpenAFS 1.2.5 and the most recent IBM AFS patch level, and all AFS users are strongly encouraged to upgrade. No release of Arla is vulnerable to this particular denial-of-service attack. There was an extended discussion of the usefulness of this exploration. Many sites believed this was useful information and that such scanning should continue in the future, but only on an opt-in basis.
Many sites face the problem of managing Kerberos/AFS credentials for batch-scheduled jobs. Specifically, most batch processing software needs to be modified to forward tickets as part of the batch submission process, renew tickets and tokens while the job is in the queue and for the lifetime of the job, and properly destroy credentials when the job completes. Ken Hornstein of NRL was able to pay a commercial vendor to support Kerberos 4/5 credential management in their product, although they did not implement AFS token management. MIT has implemented some of the desired functionality in OpenPBS and might be able to make it available to other interested sites.
Tools to simplify AFS administration were discussed, including:
AFS Balancer: a tool written by CMU to automate the process of balancing disk usage across all servers in a cell. Available from ftp://ftp.andrew.cmu.edu/pub/AFS-Tools/balance-1.1b.tar.gz.

Themis: KTH's enhanced version of the AFS tool "package" for updating files on local disk from a central AFS image. KTH's enhancements include allowing the deletion of files, simplifying the process of adding a file, and allowing the merging of multiple rule sets for determining which files are updated. Themis is available from the Arla CVS repository.
Stanford was presented as an example of a large AFS site. Stanford's AFS usage consists of approximately 1.4TB of data in AFS, in the form of approximately 100,000 volumes; 3.3TB of storage is available in their primary cell, ir.stanford.edu. Their file servers consist entirely of Solaris machines running a combination of Transarc 3.6 patch-level 3 and OpenAFS 1.2.x, while their database servers run OpenAFS 1.2.2. Their cell consists of 25 file servers using a combination of EMC and Sun StorEdge hardware. Stanford continues to use the kaserver for their authentication infrastructure, with future plans to migrate entirely to an MIT Kerberos 5 KDC.
Stanford has approximately 3,400 clients on their campus, not including SLAC (Stanford Linear Accelerator); approximately 2,100 AFS clients from outside Stanford contact their cell every month. Their supported clients are almost entirely IBM AFS 3.6 clients, although they plan to release OpenAFS clients soon. Stanford currently supports only UNIX clients. There is some on-campus presence of Windows clients, but Stanford has never publicly released or supported them. They do intend to release and support the MacOS X client in the near future.
All Stanford students, faculty, and staff are assigned AFS home directories, with a default quota of 50MB, for a total of approximately 550GB of user home directories. Other significant uses of AFS storage are data volumes for workstation software (400GB) and volumes for course Web sites and assignments (100GB).
AFS usage at Intel was also presented. Intel has been an AFS site since 1994. They had bad experiences with the IBM 3.5 Linux client; their experience with OpenAFS on Linux 2.4.x kernels has been much better. They use and are satisfied with the OpenAFS IA64 Linux port. Intel has hundreds of OpenAFS 1.2.3 and 1.2.4 clients in many production cells accessing data stored on IBM AFS file servers, and they have not encountered any interoperability issues. Intel has some concerns about OpenAFS: they would like to purchase commercial support for OpenAFS, and to see OpenAFS support for HP-UX on both PA-RISC and Itanium hardware. HP-UX support is currently unavailable due to a specific HP-UX header file being unavailable from HP; this may be available soon. Intel has not yet committed to migrating their file servers to OpenAFS and is unsure whether it will do so without commercial support.
Backups are a traditional topic of discussion at AFS workshops, and this time was no exception. Many users complain that the traditional AFS backup tools ("backup" and "butc") are complex and difficult to automate, requiring many home-grown scripts and much user intervention for error recovery. An additional complaint was that the traditional AFS tools do not support file- or directory-level backups and restores; data must be backed up and restored at the volume level.
Mitch Collinsworth of Cornell presented work to make AMANDA, the free backup software from the University of Maryland, suitable for AFS backups. Using AMANDA for AFS backups allows one to share AFS backup tapes with non-AFS backups, easily run multiple backups in parallel, automate error recovery, and provide a robust degraded mode that prevents tape errors from stopping backups altogether. Their implementation allows for full volume restores as well as individual directory and file restores. They have finished coding this work and are in the process of testing and documenting it.
Peter Honeyman of CITI at the University of Michigan spoke about work he has proposed to replace Rx with RPCSEC GSS in OpenAFS. This would allow AFS to use a TCP-based transport mechanism rather than the UDP-based Rx, and possibly gain better congestion control, dynamic adaptation, and fragmentation avoidance as a result. RPCSEC GSS uses the GSSAPI to authenticate Sun ONC RPC. RPCSEC GSS is transport-agnostic, provides strong security, is a developing Internet standard, and has multiple open source implementations. Backward compatibility with existing AFS servers and clients is an important goal of this project.
Around 33 million browsing accesses and about 325 million notification entries are present in the traces. Three types of analyses were done for each of these two services: content analysis, concerning the most popular content categories and their distribution; popularity analysis, or the popularity distribution of documents; and user behavior analysis.
The analysis of the notification logs shows that document access rates follow a Zipf-like distribution, with most of the accesses concentrated on a small number of messages, and that accesses exhibit geographical locality: users from the same locality tend to receive similar notification content. The browser log analysis shows that a small set of URLs is accessed most of the time, though the access pattern does not fit any Zipf curve, and the highly accessed URLs remain stable. A correlation study between the notification and browsing services shows that wireless users have a moderate correlation of 0.12.
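As a rough illustration of what a "Zipf-like distribution with most accesses concentrated on a small number of documents" looks like, the following sketch draws synthetic accesses (not the paper's traces) with probability proportional to 1/rank and measures how concentrated they are:

```python
import collections
import random

def zipf_sample(n_docs, n_accesses, s=1.0, seed=42):
    """Draw accesses where the document at rank r is hit with weight 1/r**s."""
    rng = random.Random(seed)
    weights = [1.0 / (rank ** s) for rank in range(1, n_docs + 1)]
    hits = rng.choices(range(n_docs), weights=weights, k=n_accesses)
    return collections.Counter(hits)

counts = zipf_sample(n_docs=1000, n_accesses=50000)
# Fraction of all accesses that land on the 100 most popular documents:
top_share = sum(c for _, c in counts.most_common(100)) / 50000
```

With an exponent of 1.0, the top 10% of documents absorb well over half of all accesses, which is the qualitative pattern the log analysis reports.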
FREENIX TRACK SESSIONS
BUILDING APPLICATIONS
Summarized by Matt Butner
INTERACTIVE 3-D GRAPHICS APPLICATIONS
FOR TCL
Oliver Kersting and Jürgen Döllner, Hasso Plattner Institute for Software Systems Engineering, University of Potsdam
The integration of 3-D image-rendering functionality into a scripting language permits interactive and animated 3-D development and application without the formalities and precision demanded by low-level C/C++ graphics and visualization libraries. The large and complex C++ API of the Virtual Rendering System (VRS) can be combined with the conveniences of the Tcl scripting language. The mapping of class interfaces is done via an automated process that generates the respective wrapper classes, all of which ensures complete API accessibility and functionality without any significant performance penalty. The general C++-to-Tcl mapper SWIG grants the necessary object and hierarchical abilities without the use of object-oriented Tcl extensions. The final API mapping techniques address C++ features such as classes, overloaded methods, enumerations, and inheritance relations. All are implemented in a proof of concept that maps the complete C++ API of the VRS to Tcl and are showcased in a complete interactive 3-D map system.
THE AGFL GRAMMAR WORK LAB
Cornelis H.A. Koster and Erik Verbruggen, University of Nijmegen (KUN)
The growth of Natural Language Processing (NLP) is a cornerstone of the continued evolution of truly intelligent search machines and services. In part due to the growing collections of computer-stored, human-readable documents in the public domain, the implementation of linguistic analysis will become necessary for desirable precision and resolution. Such tools and linguistic resources must therefore be released into the public domain, and the authors have done so with the release of the AGFL Grammar Work Lab, under the GNU Public License, as a tool for linguistic research and the development of NLP-based applications.
The AGFL (Affix Grammars over a Finite Lattice) Grammar Work Lab meshes context-free grammars with finite set-valued features, an approach applicable to a range of languages. In computer science terms, "Syntax rules are procedures with parameters and a non-deterministic execution." The English Phrases for Information Retrieval (EP4IR) grammar, released with the AGFL-GWL as a robust grammar of English, yields an AGFL-GWL-generated English parser that outputs "Head/Modifier" frames. The sentences "CompanyX sponsored this conference" and "This conference was sponsored by CompanyX" both generate [CompanyX[sponsored conference]]. The hope is that public availability of such tools will encourage further development of grammar and lexical software systems.
SWILL: A SIMPLE EMBEDDED WEB SERVER
LIBRARY
Sotiria Lampoudi and David M. Beazley, University of Chicago
This paper won the FREENIX Best Student Paper award.
SWILL (Simple Web Interface Link Library) is a simple Web server in the form of a C library, whose development was motivated by a wish to give cool applications an interface to the Web. The SWILL library provides a simple interface that can be efficiently implemented for tasks that vary from flexible Web-based monitoring to software debugging and diagnostics. Though originally designed to be integrated with high-performance scientific simulation software, the interface is generic enough to allow for unbounded uses. SWILL is a single-threaded server relying upon non-blocking I/O through the creation of a temporary server which services I/O requests. SWILL does not provide SSL or cryptographic authentication, but it does have HTTP authentication abilities.
A fantastic feature of SWILL is its support for SPMD-style parallel applications which utilize MPI, proving valuable for Beowulf clusters and large parallel machines. Another practical application was the implementation of SWILL in a modified Yalnix emulator by University of Chicago operating systems courses, which utilized the added Yalnix functionality for OS development and debugging. SWILL requires minimal memory overhead and relies upon the HTTP/1.0 protocol.
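SWILL itself is a C library, but the underlying idea, embedding a tiny HTTP server inside a running application so its internal state can be inspected from a browser, can be sketched with Python's standard library (the handler and the `state` dictionary are invented for this illustration, not SWILL's API):

```python
import http.server
import threading
import urllib.request

class StatusHandler(http.server.BaseHTTPRequestHandler):
    """Serve a snapshot of application state over HTTP, SWILL-style."""
    state = {"iterations": 0}  # hypothetical application-internal state

    def do_GET(self):
        body = repr(self.state).encode()
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the embedded server quiet

# Bind an ephemeral port and serve from a background thread, so the
# "application" (the main thread) keeps running.
server = http.server.HTTPServer(("127.0.0.1", 0), StatusHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

StatusHandler.state["iterations"] = 41  # application does its work
url = "http://127.0.0.1:%d/" % server.server_address[1]
reply = urllib.request.urlopen(url).read().decode()
server.shutdown()
```

The same pattern, in C and single-threaded via non-blocking I/O, is what lets a simulation expose monitoring pages while it computes.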
NETWORK PERFORMANCE
Summarized by Florian Buchholz
LINUX NFS CLIENT WRITE PERFORMANCE
Chuck Lever, Network Appliance; Peter Honeyman, CITI, University of Michigan
Lever introduced a benchmark to measure NFS client write performance. Client performance is difficult to measure due to hindrances such as poor hardware or bandwidth limitations. Furthermore, measuring application performance does not identify weaknesses specifically at the client side. Thus, a benchmark was developed that tries to exercise only data transfers in one direction, between server and application. For this purpose, the benchmark was based on the block-sequential-write portion of the Bonnie file system benchmark. Once a benchmark for NFS clients is established, it can be used to improve client performance.
The performance measurements were performed with an SMP Linux client and both a Linux NFS server and a Network Appliance F85 filer. During testing, a periodic jump in write latency was discovered. This was due to a rather large number of pending write operations that were scheduled to be written after certain threshold values were exceeded. By introducing a separate daemon that flushes the cached write requests, the spikes could be eliminated, but as a result the average latency grew over time. The problem was traced to a function that scans a linked list of write requests. After a hash table was added to improve lookup performance, the latency improved considerably.
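The fix described, replacing a linear scan of pending write requests with a hashed lookup, amounts to the following toy model (the file offset standing in as the key is an assumption for illustration, not the kernel's actual data structure):

```python
class WriteQueue:
    """Pending write requests indexed two ways: list scan vs. hash lookup."""

    def __init__(self):
        self.requests = []    # linked-list analogue: ordered pending writes
        self.by_offset = {}   # hash-table analogue keyed by file offset

    def add(self, offset, data):
        req = (offset, data)
        self.requests.append(req)
        self.by_offset[offset] = req

    def find_scan(self, offset):
        # O(n) walk of the whole list, like the original kernel function.
        for req in self.requests:
            if req[0] == offset:
                return req
        return None

    def find_hashed(self, offset):
        # O(1) average case after the reported fix.
        return self.by_offset.get(offset)
```

With thousands of pending writes, the difference between the two lookups is exactly the kind of growing average latency the benchmark exposed.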
The improved client was then used to measure throughput against the two servers. A discrepancy between the Linux server and the filer tests was noticed, and the reason was traced back to a global kernel lock that was unnecessarily held when accessing the network stack. After correcting this, performance improved further. However, the measurements showed that a client may run slower when paired with fast servers on fast networks. This is due to heavy client interrupt loads, more network processing on the client side, and more global kernel lock contention.
The source code of the project is available at http://www.citi.umich.edu/projects/nfs-perf/patches/.
A STUDY OF THE RELATIVE COSTS OF
NETWORK SECURITY PROTOCOLS
Stefan Miltchev and Sotiris Ioannidis, University of Pennsylvania; Angelos Keromytis, Columbia University
With the increasing need for security and integrity of remote network services, it becomes important to quantify the communication overhead of IPSec and compare it to alternatives such as SSH, SCP, and HTTPS.
For this purpose, the authors set up three testing networks: a direct link, two hosts separated by two gateways, and three hosts connecting through one gateway. Protocols were compared in each setup, and manual keying was used to eliminate connection setup costs. For IPSec, the encryption algorithms AES, DES, 3DES, hardware DES, and hardware 3DES were used. In detail, FTP was compared to SFTP, SCP, and FTP over IPSec; HTTP to HTTPS and HTTP over IPSec; and NFS to NFS over IPSec and to local disk performance.
The result of the measurements was that IPSec outperforms the other popular encryption schemes. Overall, unencrypted communication was fastest, but in some cases, like FTP, the overhead can be small. The use of crypto hardware can significantly improve performance. For future work, the inclusion of setup costs, hardware-accelerated SSL, SFTP, and SSH was mentioned.
CONGESTION CONTROL IN LINUX TCP
Pasi Sarolahti, University of Helsinki; Alexey Kuznetsov, Institute for Nuclear Research, Moscow
Having attended the talk and read the paper, I am still unclear about whether the authors are merely describing the design decisions of TCP congestion control or whether they are actually the creators of that part of the Linux code. My guess leans toward the former.
In the talk, the speaker compared the TCP congestion control measures specified by the IETF in the relevant RFCs with the actual Linux implementation, which conforms to the basic principles but nevertheless has differences. A specific emphasis was placed on retransmission mechanisms and the congestion window. Also, several TCP enhancements (the NewReno algorithm, Selective ACKs (SACK), and Forward ACKs (FACK)) were discussed and compared.
In some instances, Linux does not conform to the IETF specifications. The fast recovery does not fully follow RFC 2582, since the threshold for triggering retransmit is adjusted dynamically and the congestion window's size is not changed. Also, the round-trip-time estimator and the RTO calculation differ from RFC 2988, since Linux uses more conservative RTT estimates and a minimum RTO of 200ms. The performance measurements showed that with the additional Linux-specific features enabled, slightly higher throughput, more steady data flow, and fewer unnecessary retransmissions can be achieved.
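The RFC 2988 smoothing and RTO floor mentioned above look roughly like this; the 200ms floor is the Linux value reported in the talk, whereas RFC 2988 itself specifies a 1-second minimum:

```python
def update_rtt(srtt, rttvar, sample, alpha=0.125, beta=0.25):
    """RFC 2988-style smoothing of a new RTT sample (standard gains)."""
    rttvar = (1 - beta) * rttvar + beta * abs(srtt - sample)
    srtt = (1 - alpha) * srtt + alpha * sample
    return srtt, rttvar

def rto(srtt, rttvar, min_rto=0.2):
    """Retransmission timeout: SRTT + 4*RTTVAR, floored at min_rto.
    min_rto=0.2 (200ms) reflects the Linux behavior described here;
    RFC 2988 mandates a 1-second floor."""
    return max(srtt + 4 * rttvar, min_rto)
```

On a LAN where SRTT and RTTVAR are tiny, the floor dominates, which is exactly where the 200ms-vs-1s choice changes retransmission behavior.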
XTREME XCITEMENT
Summarized by Steve Bauer
THE FUTURE IS COMING: WHERE THE X
WINDOW SYSTEM SHOULD GO
Jim Gettys, Compaq Computer Corp.
Jim Gettys, one of the principal authors of the X Window System, outlined the near-term objectives for the system, primarily focusing on the changes and infrastructure required to enable replication and migration of X applications. Providing better support for this functionality would enable users to retrieve or duplicate X applications between their servers at home and at work.
USENIX 2002
One interesting example of an application that currently is capable of migration and replication is Emacs. To create a new frame on DISPLAY, try "M-x make-frame-on-display <Return> DISPLAY <Return>".
However, technical challenges make replication and migration difficult in general. These include the "major headaches" of server-side fonts, the nonuniformity of X servers and screen sizes, and the need to appropriately retrofit toolkits. Authentication and authorization issues are obviously also important. The rest of the talk delved into some of the details of these interesting technical challenges.
HACKING IN THE KERNEL
Summarized by Hai Huang
AN IMPLEMENTATION OF SCHEDULER
ACTIVATIONS ON THE NETBSD OPERATING
SYSTEM
Nathan J. Williams, Wasabi Systems
Scheduler activations are an old idea. Basically, there are benefits and drawbacks to using solely kernel-level or solely user-level threading; scheduler activations combine the two layers of control to provide more concurrency in the system.
In his talk, Nathan gave a fairly detailed description of the implementation of scheduler activations in the NetBSD kernel. One important change in the implementation is to differentiate the thread context from the process context. This is done by defining a separate data structure for threads and relocating some of the information that was embedded in the process context to these thread contexts. The stack was a particular concern, due to upcalls: special handling must be done to make sure that an upcall doesn't corrupt the stack, so that the preempted user-level thread can continue afterwards. Lastly, Nathan explained that signals are handled by upcalls.
AUTHORIZATION AND CHARGING IN PUBLIC
WLANS USING FREEBSD AND 802.1X
Pekka Nikander, Ericsson Research NomadicLab
The 802.1x standards are well known in the wireless community as link-layer authentication protocols. In this talk, Pekka explained some novel ways of using the 802.1x protocols that might be of interest to people on the move: it is possible to set up a public WLAN that supports various charging schemes via virtual tokens, which people can purchase or earn and later use.
This is implemented on FreeBSD using the netgraph utility. It is basically a filter in the link layer that differentiates traffic based on the MAC address of the client node, which is either authenticated, denied, or let through. The overhead for this service is fairly minimal.
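The per-client decision the filter makes can be sketched as a three-way classification on the source MAC address; the category names and the redirect behavior for unknown clients are invented here for illustration:

```python
def classify(mac, authenticated, denied):
    """Link-layer decision per client node: authenticated MACs pass,
    denied MACs are dropped, and unknown MACs are redirected (e.g.
    to a hypothetical token-purchase page)."""
    if mac in denied:
        return "drop"
    if mac in authenticated:
        return "pass"
    return "redirect"
```

Because the check is a set lookup on the MAC address, it is cheap enough to run on every frame, which matches the "fairly minimal" overhead noted above.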
ACPI IMPLEMENTATION ON FREEBSD
Takanori Watanabe, Kobe University
ACPI (Advanced Configuration and Power Interface) was proposed as a joint effort by Intel, Toshiba, and Microsoft to provide a standard, finer-grained method of managing the power states of individual devices within a system. Such low-level power management is especially important for mobile and embedded systems that are powered by fixed-capacity batteries.
Takanori explained that the ACPI specification is composed of three parts: tables, BIOS, and registers. He was able to implement some functionality of ACPI in the FreeBSD kernel. The ACPI Component Architecture, implemented by Intel, provides a high-level ACPI API to the operating system; Takanori's ACPI implementation is built upon this underlying layer of APIs.
ACCESS CONTROL
Summarized by Florian Buchholz
DESIGN AND PERFORMANCE OF THE
OPENBSD STATEFUL PACKET FILTER (PF)
Daniel Hartmeier, Systor AG
Daniel Hartmeier described the new stateful packet filter (pf) that replaced IPFilter in the OpenBSD 3.0 release. IPFilter could no longer be included due to licensing issues, and thus there was a need to write a new filter, making use of optimized data structures.
The filter rules are implemented as a linked list which is traversed from start to end. Two actions may be taken according to the rules, "pass" or "block": a "pass" action forwards the packet, and a "block" action drops it. Where more than one rule matches a packet, the last rule wins. Rules that are marked as "final" immediately terminate any further rule evaluation. An optimization called "skip-steps" was also implemented, in which blocks of similar rules are skipped if they cannot match the current packet; these skip-steps are calculated when the rule set is loaded. Furthermore, a state table keeps track of TCP connections, and only packets that match the sequence numbers are allowed. UDP and ICMP queries and replies are also considered in the state table, where initial packets create an entry for a pseudo-connection with a low initial timeout value. The state table is implemented using a balanced binary search tree. NAT mappings are also stored in the state table, whereas application proxies reside in user space. The packet filter is also able to perform fragment reassembly and to modulate TCP sequence numbers to protect hosts behind the firewall.
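The last-matching-rule-wins evaluation with "final" rules can be sketched as follows; the predicate functions, the packet representation, and the default action are simplifications for illustration, not pf's actual rule syntax:

```python
def evaluate(rules, packet):
    """Walk the rule list start to end; the last matching rule's action
    wins, unless a matching rule is marked final, which stops evaluation
    immediately. Each rule is a (predicate, action, final) triple."""
    action = "pass"  # assumed default when nothing matches
    for predicate, act, final in rules:
        if predicate(packet):
            action = act
            if final:
                break
    return action

# A classic last-match-wins idiom: block everything, then allow SSH,
# with a final rule cutting off a specific source immediately.
rules = [
    (lambda p: True,               "block", False),
    (lambda p: p["port"] == 22,    "pass",  False),
    (lambda p: p["src"] == "evil", "block", True),
]
```

The skip-steps optimization mentioned above prunes this walk: contiguous rules that cannot possibly match the packet (say, all for another interface) are jumped over in one step.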
Pf was compared against IPFilter as well as Linux's Iptables by measuring throughput and latency with increasing traffic rates and different packet sizes. In a test with a fixed rule set size of 100, Iptables outperformed the other two filters, whose results were close to each other. In a second test, where the rule set size was continually increased, Iptables consistently had about twice the throughput of the other two (which evaluate the rule set on both the incoming and outgoing interfaces). A third test compared only pf and IPFilter, using a single rule that created state in the state table with a fixed state-entry size; pf reached an overloaded state much later than IPFilter. The experiment was repeated with a variable state-entry size, and pf performed much better than IPFilter for a small number of states.
In general, rule set evaluation is expensive, and benchmarks only reflect extreme cases, whereas in real life other behavior should be observed. Furthermore, the benchmarks show that stateful filtering can actually improve performance, since cheap state-table lookups replace expensive rule evaluation.
ENHANCING NFS CROSS-ADMINISTRATIVE
DOMAIN ACCESS
Joseph Spadavecchia and Erez Zadok, Stony Brook University
The speaker presented modifications to an NFS server that allow improved NFS access between administrative domains. The problem lies in the fact that NFS assumes a shared UID/GID space, which makes it unsafe to export files outside the administrative domain of the server. The design goals were to leave the protocol and existing clients unchanged, require a minimal amount of server changes, and provide flexibility and increased performance.
To solve the problem, two techniques are utilized: "range-mapping" and "file-cloaking." Range-mapping maps IDs between client and server. The mapping is performed on a per-export basis and has to be manually set up in an export file; the mappings can be 1-1, N-N, or N-1. In file-cloaking, the server restricts file access based on UID/GID and special cloaking-mask bits, so users can only access what their own file permissions allow. The policy on whether or not a file is visible to others is set by the cloaking mask, which is logically ANDed with the file's protection bits. File-cloaking only works, however, if the client doesn't hold cached copies of directory contents and file attributes. Because of this, the clients are forced to re-read directories: the server increments the mtime value of a directory each time it is listed.
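A minimal sketch of the two techniques, assuming a simple 1-1 mapping over one UID range and an invented cloaking-mask value (the real export-file syntax and mask semantics may differ):

```python
def range_map(uid, client_base, server_base, length):
    """Map a client UID into the server's UID space (1-1 over a range).
    UIDs outside the configured range get no mapping, i.e. no access."""
    if client_base <= uid < client_base + length:
        return server_base + (uid - client_base)
    return None

def visible_mode(mode, owner_uid, accessor_uid, cloak_mask=0o770):
    """File-cloaking sketch: the owner sees the true mode; everyone
    else sees the permission bits ANDed with the cloaking mask
    (the 0o770 value is hypothetical, chosen to hide 'other' bits)."""
    return mode if accessor_uid == owner_uid else mode & cloak_mask
```

With the 0o770 mask, a world-readable 0o644 file appears as 0o640 to non-owners, which is the "only your own permissions" effect described above.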
To test the performance of the modified server, five different NFS configurations were evaluated: an unmodified NFS server was compared against one with the modified code included but not used, one with only range-mapping enabled, one with only file-cloaking enabled, and one with all modifications enabled. For each setup, different file system benchmarks were run. The results show only a small overhead when the modifications are used, generally an increase of below 5%. Another experiment tested the performance of the system with an increasing number of mapped or cloaked entries; the results show that an increase from 10 to 1,000 entries resulted in a maximum cost of about 14% in performance.
One member of the audience pointed out that if clients choose to ignore the changed mtimes from the server, and thus still hold caches of the directory entries, the file-cloaking mechanism could be defeated. After a rather lengthy debate, the speaker had to concede that the model doesn't add any extra security. Another question was asked about the scalability of setting up range mapping; the speaker referred to application-level tools that could be developed for that purpose.
The software is available at ftp://ftp.fsl.cs.sunysb.edu/pub/enf.
ENGINEERING OPEN SOURCE
SOFTWARE
Summarized by Teri Lampoudi
NINGAUI: A LINUX CLUSTER FOR BUSINESS
Andrew Hume, AT&T Labs - Research; Scott Daniels, Electronic Data Systems Corp.
Ningaui is a general-purpose, highly available, resilient architecture built from commodity software and hardware. Emphatically, however, Ningaui is not a Beowulf. Hume calls the cluster design the "Swiss canton model," in which there are a number of loosely affiliated, independent nodes with data replicated among them. Jobs are assigned by bidding and leases, and cluster services done as session-based servers are sited via generic job assignment. The emphasis is on keeping the architecture end-to-end, checking all work via checksums, and logging everything. The resilience and high availability required by their goal of 8-to-5 maintenance (vs. the typical 24-7 model, where people get paged whenever the slightest thing goes wrong, regardless of the time of day) are achieved by job restartability. Finally, all computation is performed on local data, without the use of NFS or network-attached storage.
Hume's message is a hopeful one. Despite the many problems encountered (things like kernel and service limits, auto-installation problems, TCP storms, and the like), the existence of source code and the paranoid practice of logging and checksumming everything has helped. The final product performs reasonably well, and it appears that the resilience techniques employed do make a difference. One drawback, however, is that the software mentioned in the paper is not yet available for download.
CPCMS: A CONFIGURATION MANAGEMENT
SYSTEM BASED ON CRYPTOGRAPHIC NAMES
Jonathan S. Shapiro and John Vanderburgh, Johns Hopkins University
This paper won the FREENIX Best Paper award.
The basic notion behind the project is the fact that everyone has a pet complaint about CVS, and yet it is currently the configuration manager in most widespread use. Shapiro has unleashed an alternative. Interestingly, he did not begin but ended with the usual host of reasons why CVS is bad. The talk instead began abruptly by characterizing the job of a software configuration manager, continued by stating the namespaces it must handle, and wrapped up with the challenges faced.
X MEETS Z: VERIFYING CORRECTNESS IN THE
PRESENCE OF POSIX THREADS
Bart Massey, Portland State University; Robert T. Bauer, Rational Software Corp.
Massey delivered a humorous talk on the insight gained from applying the Z formal specification notation to system software design, rather than the more informal analysis and design process normally used.
The story is told with respect to writing XCB, which replaces the Xlib protocol layer and is supposed to be thread-friendly. But where threads are concerned, deadlock avoidance becomes a hard problem that cannot be solved in an ad hoc manner, yet full model checking is also too hard. In this case, Massey resorted to Z specification to model the XCB lower layer, abstract away locking and data transfers, and locate fundamental issues. Essentially, the difficulties of searching the literature and locating information relevant to the problem at hand were overcome. As Massey put it, "the formal method saved the day."
FILE SYSTEMS
Summarized by Bosko Milekic
PLANNED EXTENSIONS TO THE LINUX EXT2/EXT3 FILESYSTEM
Theodore Y. Ts'o, IBM; Stephen Tweedie, Red Hat
The speaker presented improvements to the Linux Ext2 file system, with the goal of allowing for various expansions while striving to maintain compatibility with older code. Improvements have been facilitated by a few extra superblock fields that were added to Ext2 not long before the Linux 2.0 kernel was released. The fields allow for file system features to be added without compromising the existing setup; this is done by providing
bits indicating the impact of the added features with respect to compatibility. Namely, a file system with an unrecognized "incompat" feature bit may not be mounted at all, while an unrecognized "read-only" feature bit allows the file system to be mounted only read-only.
Directory indexing replaces linear directory searches with a faster search using a fixed-depth tree and hashed keys. File system size can be dynamically increased, and the expanded inode, doubled from 128 to 256 bytes, allows for more extensions.
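The lookup structure can be pictured as a shallow index whose hash-selected slots point directly at leaf blocks of entries, so a lookup touches a fixed number of blocks regardless of directory size. The sketch below is a toy illustration of that idea only; the hash function, block count, and class names are invented here and are not Ext2/Ext3's actual on-disk htree format.

```python
# Toy sketch of fixed-depth hashed directory indexing (the idea behind
# the Ext2/Ext3 directory index); all constants and names are illustrative.
BLOCK_ENTRIES = 4  # tiny number of leaf blocks, to keep the example small

def name_hash(name):
    # Stand-in string hash; the real implementation uses a tuned hash.
    h = 0
    for ch in name:
        h = (h * 131 + ord(ch)) & 0xFFFFFFFF
    return h

class HtreeDir:
    """One-level, fixed-depth index: a root maps hash ranges to leaf blocks."""
    def __init__(self):
        self.leaves = [dict() for _ in range(BLOCK_ENTRIES)]

    def _leaf(self, name):
        # The top bits of the hash select a leaf directly, so a lookup
        # reads the root plus exactly one leaf block -- never a full scan.
        return self.leaves[(name_hash(name) >> 30) & (BLOCK_ENTRIES - 1)]

    def add(self, name, inode):
        self._leaf(name)[name] = inode

    def lookup(self, name):
        return self._leaf(name).get(name)
```
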
Other potential improvements were discussed as well, in particular pre-allocation for contiguous files, which allows for better performance in certain setups by pre-allocating contiguous blocks. Security-related modifications, extended attributes, and ACLs were mentioned. An implementation of these features already exists but has not yet been merged into the mainline Ext2/3 code.
RECENT FILESYSTEM OPTIMISATIONS ON
FREEBSD
Ian Dowse, Corvil Networks; David Malone, CNRI, Dublin Institute of Technology
David Malone presented four important file system optimizations for the FreeBSD OS: soft updates, dirpref, vmiodir, and dirhash. It turns out that certain combinations of the optimizations (beautifully illustrated in the paper) may yield performance improvements of anywhere between 2 and 10 orders of magnitude for real-world applications.
All four of the optimizations deal with file system metadata. Soft updates allow for asynchronous metadata updates. Dirpref changes the way directories are organized, attempting to place child directories closer to their parents, thereby increasing locality of reference and reducing disk-seek times. Vmiodir trades some extra memory in order to achieve better directory caching. Finally, dirhash, which was implemented by Ian Dowse, changes the way in which entries
are searched in a directory. Specifically, FFS uses a linear search to find an entry by name; dirhash instead builds a hash table of directory entries on the fly. For a directory of size n with a working set of m files, a search that in certain cases could have been O(nm) has been reduced, due to dirhash, to effectively O(n + m).
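The asymptotic claim can be illustrated with a short sketch: a linear scan pays O(n) per lookup, while a hash table built in one O(n) pass answers each subsequent lookup in roughly constant time. The function names below are illustrative, not FreeBSD's actual dirhash code.

```python
# Sketch of why dirhash helps: a plain FFS-style linear scan vs. an
# on-the-fly hash table of directory entries.
def linear_lookup(entries, name):
    """O(n) per lookup, as in a plain linear directory scan."""
    for entry_name, inode in entries:
        if entry_name == name:
            return inode
    return None

def build_dirhash(entries):
    """One O(n) pass builds the table the first time a directory is used."""
    return {entry_name: inode for entry_name, inode in entries}

# For m lookups in a directory of n entries: the linear scan costs O(n*m)
# in total, while dirhash costs O(n) to build plus O(1) per lookup,
# i.e. O(n + m) overall.
```
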
FILESYSTEM PERFORMANCE AND SCALABILITY
IN LINUX 2.4.17
Ray Bryant, SGI; Ruth Forester, IBM LTC; John Hawkes, SGI
This talk focused on performance evaluation of a number of file systems available and commonly deployed on Linux machines. Comparisons under various configurations of Ext2, Ext3, ReiserFS, XFS, and JFS were presented.
The benchmarks chosen for the data gathering were pgmeter, filemark, and the AIM Benchmark Suite VII. Pgmeter measures the rate of data transfer of reads/writes of a file under a synthetic workload. Filemark is similar to postmark in that it is an operation-intensive benchmark, although filemark is threaded and offers various other features that postmark lacks. AIM VII measures performance for various file-system-related functionalities; it offers various metrics under an imposed workload, thus stressing the performance of the file system not only under I/O load but also under significant CPU load.
Tests were run on three different setups: a small, a medium, and a large configuration. ReiserFS and Ext2 appear at the top of the pile for small and medium setups. Notably, XFS and JFS perform worse for smaller system configurations than the others, although XFS clearly appears to generate better numbers than JFS. It should be noted that XFS seems to scale well under a higher load; this was most evident in the large-system results, where XFS appears to offer the best overall results.
THINGS TO THINK ABOUT
Summarized by Bosko Milekic
SPEEDING UP KERNEL SCHEDULER BY REDUCING CACHE MISSES
Shuji Yamamura, Akira Hirai, Mitsuru Sato, Masao Yamamoto, Akira Naruse, and Kouichi Kumon, Fujitsu Labs
This was an interesting talk pertaining to the effects of cache coloring for task structures in the Linux kernel scheduler (Linux kernel 2.4.x). The speaker first presented some interesting benchmark numbers for the Linux scheduler, showing that as the number of processes on the task queue was increased, the performance decreased. The authors used some really nifty hardware to measure the number of bus transactions throughout their tests and were thus able to reasonably quantify the impact that cache misses had in the Linux scheduler.
Their experiments led them to implement a cache coloring scheme for task structures, which were previously aligned on 8KB boundaries and therefore were eventually mapped to the same cache lines. This unfortunate placement of task structures in memory induced a significant number of cache misses as the number of tasks grew in the scheduler.
The implemented solution consisted of aligning task structures so as to more evenly distribute cached entries across the L2 cache. The result was inevitably fewer cache misses in the scheduler. Some negative effects were observed in certain situations; these were primarily due to more cache slots being used by task-structure data in the scheduler, thus forcing data previously cached there to be pushed out.
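A toy model shows why identical 8KB alignment hurts and why a small per-task color offset helps. The cache geometry (64-byte lines, 64 sets) and the coloring rule below are invented for illustration and do not reflect the paper's actual implementation.

```python
# Toy model of the cache-conflict problem: task structures allocated on
# identical 8KB boundaries all map to the same set of a small toy cache,
# while a per-task "color" offset spreads them across sets.
LINE = 64      # bytes per cache line (illustrative)
SETS = 64      # number of sets in the toy cache (illustrative)
STRIDE = 8192  # task structures on 8KB boundaries, as in the talk

def cache_set(addr):
    # Which cache set a byte address maps to in our toy cache.
    return (addr // LINE) % SETS

def task_addrs(n, color_step=0):
    # color_step shifts each successive task by a few cache lines.
    return [i * STRIDE + (i * color_step) % STRIDE for i in range(n)]

# With pure 8KB alignment, every task structure lands in the same set;
# adding one cache line of offset per task spreads them out.
uncolored = {cache_set(a) for a in task_addrs(32)}
colored = {cache_set(a) for a in task_addrs(32, color_step=LINE)}
```
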
WORK-IN-PROGRESS REPORTS
Summarized by Brennan Reynolds
RESOURCE VIRTUALIZATION TECHNIQUES FOR
WIDE-AREA OVERLAY NETWORKS
Kartik Gopalan, Stony Brook University
This work addressed the issue of provisioning a maximum number of virtual overlay networks (VONs) with diverse quality-of-service (QoS) requirements on a single physical network. Each of the VONs is logically isolated from the others to ensure the QoS requirements. Gopalan mentioned several critical research issues with this problem that are currently being investigated. Dealing with how to provision the network at various levels (link, route, or path) and then enforce the provisioning at run-time is one of the toughest challenges. Currently, Gopalan has developed several algorithms to handle admission control, end-to-end QoS, route selection, scheduling, and fault tolerance in the network.
For more information visit http://www.ecsl.cs.sunysb.edu
VISUALIZING SOFTWARE INSTABILITY
Jennifer Bevan, University of California, Santa Cruz
Detection of instability in software has typically been an afterthought. The point in the development cycle when the software is reviewed for instability usually comes after it is already difficult and costly to go back and perform major modifications to the code base. Bevan has developed a technique to allow the visualization of unstable regions of code that can be used much earlier in the development cycle. Her technique creates a time series of dependency graphs that include clusters and lines called fault lines. From the graphs, a developer is able to easily determine where the unstable sections of code are and proactively restructure them. She is currently working on a prototype implementation.
For more information visit http://www.cse.ucsc.edu/~jbevan/evo_viz
RELIABLE AND SCALABLE PEER-TO-PEER WEB
DOCUMENT SHARING
Li Xiao, College of William and Mary
The idea presented by Xiao would allow end users to share the contents of their Web browser caches with neighboring Internet users. The rationale for this is that today's end users are increasingly connected to the Internet over high-speed links, and their browser caches are becoming large storage systems. Therefore, if an individual accesses a page which does not exist in their local cache, Xiao is suggesting that they first query other end users for the content before trying to access the machine hosting the original. This strategy does have some serious problems associated with it that still need to be addressed, including ensuring the integrity of the content and protecting the identity and privacy of the end users.
SEREL: FAST BOOTING FOR UNIX
Leni Mayo, Fast Boot Software
Serel is a tool that generates a visual representation of a UNIX machine's boot-up sequence. It can be used to identify the critical path and can show if a particular service or process blocks for an extended period of time. This information could be used to determine where the largest performance gains could be realized by tuning the order of execution at boot-up. Serel creates a dependency graph, expressed in XML, during boot-up. This graph is then used to create the visual representation. Currently the tool only works on POSIX-compliant systems, but Mayo stated that he would be porting it to other platforms. Other extensions that were mentioned included having the metadata include the use of shared libraries and monitoring the suspend/resume sequence of portable machines.
For more information visit http://www.fastboot.org
BERKELEY DB XML
John Merrells, Sleepycat Software
Merrells gave a quick introduction and overview of the new XML library for Berkeley DB. The library specializes in storage and retrieval of XML content, queried through XPath. The library allows for multiple documents per container and stores everything natively as XML. The user is also given a wide range of elements to create indices with, including edges, elements, text strings, or presence. The XPath implementation consists of a query parser, generator, optimizer, and execution engine. To conclude his presentation, Merrells gave a live demonstration of the software.
For more information visit http://www.sleepycat.com/xml
CLUSTER-ON-DEMAND (COD)
Justin Moore, Duke University
Modern clusters are growing at a rapid rate. Many have pushed beyond the 5000-machine mark, and deploying them results in large expenses as well as management and provisioning issues. Furthermore, if the cluster is "rented" out to various users, it is very time-consuming to configure it to a user's specs, regardless of how long they need to use it. The COD work presented creates dynamic virtual clusters within a given physical cluster. The goal was to have a provisioning tool that would automatically select a chunk of available nodes and install the operating system and middleware specified by the customer in a short period of time. This would allow a greater use of resources, since multiple virtual clusters can exist at once. By using a virtual cluster, the size can be changed dynamically. Moore stated that they have created a working prototype and are currently testing and benchmarking its performance.
For more information visit http://www.cs.duke.edu/~justin/cod
CATACOMB
Elias Sinderson, University of California, Santa Cruz
Catacomb is a project to develop a database-backed DASL module for the Apache Web server. It was designed as a replacement for the WebDAV module. The initial release of the module only contains support for a MySQL database but could be extended to others. Sinderson briefly touched on the performance of his module: it was comparable to the mod_dav Apache module for all query types but search. The presentation was concluded with remarks about adding support for the LOCK method and including ACL specifications in the future.
For more information visit http://ocean.cse.ucsc.edu/catacomb
SELF-ORGANIZING STORAGE
Dan Ellard, Harvard University
Ellard's presentation introduced a storage system that tunes itself based on the workload of the system, without requiring the intervention of the user. The intelligence was implemented as a virtual self-organizing disk that resides below any file system. The virtual disk observes the access patterns exhibited by the system and then attempts to predict what information will be accessed next. An experiment was done using an NFS trace from a large ISP on one of their email servers. Ellard's self-organizing storage system worked well, which he attributes to the fact that most of the files being requested were large email boxes. Areas of future work include exploration of the length and detail of the data-collection stage, as well as the CPU impact of running the virtual disk layer.
VISUAL DEBUGGING
John Costigan, Virginia Tech
Costigan feels that the state of current debugging facilities in the UNIX world is not as good as it should be. He proposes the addition of several elements to programs like ddd and gdb. The first addition is including separate heap and
stack components. Another is to display only certain fields of a data structure and to have the ability to zoom in and out if needed. Finally, the debugger should provide the ability to visually present complex data structures, including linked lists, trees, etc. He said that a beta version of a debugger with these abilities is currently available.
For more information visit http://infovis.cs.vt.edu/datastruct
IMPROVING APPLICATION PERFORMANCE
THROUGH SYSTEM-CALL COMPOSITION
Amit Purohit, Stony Brook University
Web servers perform a huge number of context switches and internal data copies during normal operation. These two elements can drastically limit the performance of an application, regardless of the hardware platform it is run on. The Compound System Call (CoSy) framework is an attempt to reduce the performance penalty for context switches and data copies. It includes a user-level library and several kernel facilities that can be used via system calls. The library provides the programmer with a complete set of memory-management functions. Performance tests using the CoSy framework showed a large saving for context switches and data copies. The only area where the savings between conventional libraries and CoSy were negligible was for very large file copies.
For more information visit http://www.cs.sunysb.edu/~purohit
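The performance argument behind compound system calls can be modeled abstractly: if every user/kernel boundary crossing carries a fixed cost, then batching several operations into one compound call pays that cost once instead of once per operation. The model below is an illustration of the concept only, not the actual CoSy API.

```python
# Toy model of the compound-system-call idea: count boundary crossings
# for per-operation syscalls vs. one batched "compound" call.
CROSSING_COST = 1  # abstract cost per user/kernel boundary crossing

class KernelModel:
    def __init__(self):
        self.crossings = 0
        self.log = []

    def syscall(self, op, arg):
        self.crossings += CROSSING_COST   # one crossing per call
        self.log.append((op, arg))

    def compound_syscall(self, ops):
        self.crossings += CROSSING_COST   # one crossing for the whole batch
        self.log.extend(ops)

k1, k2 = KernelModel(), KernelModel()
ops = [("read", i) for i in range(10)]
for op, arg in ops:
    k1.syscall(op, arg)       # ten separate crossings
k2.compound_syscall(ops)      # one crossing, same work performed
```
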
ELASTIC QUOTAS
John Oscar, Columbia University
Oscar began by stating that most resources are flexible but that, to date, disks have not been. While approximately 80 percent of files are short-lived, disk quotas are hard limits imposed by administrators. The idea behind elastic quotas is to have a non-permanent storage area that each user can temporarily use. Oscar suggested the creation of an "ehome" directory structure to be used in deploying elastic quotas. Currently he
has implemented elastic quotas as a stackable file system.

There were also several trends apparent in the data: many users would ssh to their remote host (good) but then launch an email reader on the local machine, which would connect via POP to the same host and send the password in clear text (bad). This situation can easily be remedied by tunneling protocols like POP through SSH, but it appeared that many people were not aware this could be done. While most of his comments on the use of the wireless network were negative, the list of passwords he had collected showed that people were indeed using strong passwords. His recommendations were to educate and encourage people to use protocols like IPSec, SSH, and SSL when conducting work over a wireless network, because you never know who else is listening.
SYSTRACE
Niels Provos, University of Michigan
How can one be sure the applications one is using actually do exactly what their developers said they do? Short answer: you can't. People today are using a large number of complex applications, which means it is impossible to check each application thoroughly for security vulnerabilities. There are tools out there that can help, though. Provos has developed a tool called systrace that allows a user to generate a policy of acceptable system calls a particular application can make. If the application attempts to make a call that is not defined in the policy, the user is notified and allowed to choose an action. Systrace includes an auto-policy-generation mechanism, which uses a training phase to record the actions of all programs the user executes. If the user chooses not to be bothered by applications breaking policy, systrace allows default enforcement actions to be set. Currently systrace is implemented for FreeBSD and NetBSD, with a Linux version coming out shortly.
For more information visit http://www.citi.umich.edu/u/provos/systrace
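The core idea of per-application syscall policies can be sketched in a few lines: a table maps each system call an application may issue to an action, with a default (such as asking the user) for anything unlisted. The policy format below is invented for illustration and is not systrace's actual policy syntax.

```python
# Minimal sketch of policy-based system-call filtering in the style of
# systrace. The policy dictionary and action names are illustrative only.
def check_syscall(policy, call, default="ask"):
    """Return 'permit', 'deny', or 'ask' (i.e., notify the user)."""
    return policy.get(call, default)

# A hypothetical policy for a browser-like application.
browser_policy = {
    "open": "permit",
    "read": "permit",
    "write": "permit",
    "connect": "ask",     # consult the user before network access
    "execve": "deny",     # never allowed to spawn other programs
}
```

In a real enforcement tool, the "ask" path is where the training phase records decisions, so repeated runs converge on a complete permit/deny policy.
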
VERIFIABLE SECRET REDISTRIBUTION FOR
SURVIVABLE STORAGE SYSTEMS
Ted Wong, Carnegie Mellon University
Wong presented a protocol that can be used to re-create a file distributed over a set of servers, even if one of the servers is damaged. In his scheme, the user must choose how many shares the file is split into and the number of servers it will be stored across. The goal of this work is to provide persistent, secure storage of information, even if it comes under attack. Wong stated that his design of the protocol was complete, and he is currently building a prototype implementation of it.
For more information visit http://www.cs.cmu.edu/~tmwong/research
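The share-splitting idea underlying such survivable storage is threshold secret sharing: any k of n shares reconstruct the data, while fewer reveal nothing. Below is a minimal sketch in the style of Shamir's scheme; Wong's verifiable redistribution protocol adds integrity checks on top of this basic mechanism, and the code here only illustrates the threshold idea.

```python
# Minimal (k, n) threshold sharing sketch (Shamir-style), for illustration.
import random

PRIME = 2**127 - 1  # field modulus; all arithmetic is mod this prime

def split(secret, k, n):
    """Split `secret` into n shares, any k of which reconstruct it."""
    # Random degree-(k-1) polynomial with the secret as constant term.
    coeffs = [secret] + [random.randrange(PRIME) for _ in range(k - 1)]
    # Share x is the polynomial evaluated at x (x = 0 holds the secret).
    return [(x, sum(c * pow(x, e, PRIME) for e, c in enumerate(coeffs)) % PRIME)
            for x in range(1, n + 1)]

def reconstruct(shares):
    """Lagrange-interpolate the polynomial at x = 0 from k shares."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num = den = 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % PRIME
                den = den * (xi - xj) % PRIME
        # Modular inverse via Fermat's little theorem (PRIME is prime).
        secret = (secret + yi * num * pow(den, PRIME - 2, PRIME)) % PRIME
    return secret
```
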
THE GURU IS IN SESSIONS
LINUX ON LAPTOP/PDA
Bdale Garbee, HP Linux Systems Operation
Summarized by Brennan Reynolds
This was a roundtable-like discussion, with Garbee acting as moderator. He is currently in charge of the Debian distribution of Linux and is involved in porting Linux to platforms other than i386. He has successfully helped port the kernel to the Alpha, SPARC, and ARM, and is currently working on a version to run on VAX machines.
A discussion was held on which file system is best for battery life. The ReiserFS and Ext3 file systems were discussed; people commented that when using Ext3 on their laptops, the disk would never spin down and thus was always consuming a large amount of power. One suggestion was to use a RAM-based file system for any volume that requires a large number of writes, and to only have the file system written to disk at infrequent intervals or when the machine is shut down or suspended.
A question was directed to Garbee about his knowledge of how the HP-Compaq merger would affect Linux. Garbee thought that the merger was good for Linux, both within the company and for the open source community as a whole. He stated that the new company would be the number one shipper of systems with Linux as the base operating system. He also fielded a question about the support of older HP servers and their ability to run Linux, saying that indeed Linux has been ported to them, and he personally had several machines in his basement running it.
A question which sparked a large number of responses concerned problems people had with running Linux on mobile platforms. Most people actually did not have many problems at all. There were a few cases of a particular machine model not working, but there were only two widespread issues: docking stations and the new ACPI power management scheme. Docking stations still appear to be somewhat of a headache, but most people had developed ad hoc solutions for getting them to work. The ACPI power management scheme developed by Intel does not appear to have a quick solution. Ted Ts'o, one of the head kernel developers, stated that there are fundamental architectural problems with the 2.4 series kernel that do not easily allow ACPI to be added. However, he also stated that ACPI is supported in the newest development kernel, 2.5, and will be included in 2.6.
The final topic was the possibility of purchasing a laptop without paying any charge/tax for Microsoft Windows. The consensus was that currently this is not possible. Even if the machine comes with Linux or without any operating system, the vendors have contracts in place which require them to pay Microsoft for each machine they ship if that machine is capable of running a Microsoft operating system. The only way this will change is if the demand for
other operating systems increases to the point that vendors renegotiate their contracts with Microsoft, and no one saw this happening in the near future.
LARGE CLUSTERS
Andrew Hume, AT&T Labs – Research
Summarized by Teri Lampoudi
This was a session on clusters, big data, and resilient computing. Given that the audience was primarily interested in clusters, and not necessarily big data or beating nodes to a pulp, the conversation revolved mainly around what it takes to make a heterogeneous cluster resilient. Hume also presented a FREENIX paper on his current cluster project, which explained in more detail much of what was abstractly claimed in the guru session.
Hume's use of the word "cluster" referred not to what I had assumed would be a Beowulf-type system, but to a loosely coupled farm of machines in which concurrency was much less of an issue than it would be in a Beowulf. In fact, the architecture Hume described was designed to deal with large amounts of independent transaction processing, essentially the process of billing calls, which requires no interprocess message passing or anything of the sort a parallel scientific application might.
Where does the large data come in? Since the transactions in question consist of large numbers of flat files, a mechanism for getting the files onto nodes and off-loading the results is necessary. In this particular cluster, named Ningaui, this task is handled by the "replication manager," which generated a large amount of interest in the audience. All management functions in the cluster, as well as job allocation, constitute services that are to be bid for and leased out to nodes. Furthermore, failures are handled by simply repeating the failed task until it is completed properly, if that is possible, and, if it is not, by looking through detailed logs to discover
what went wrong. Logging and testing for the successful completion of each task are what make this model resilient. Every management operation is logged at some appropriate level of detail, generating 120MB/day, and every single file transfer is md5 checksummed. This was characterized by Hume as a "patient" style of management, where no one controls the cluster; scheduling behavior is emergent rather than stringently planned, and nodes are free to drop in and out of the cluster. A downside to this model is the increase in latency, offset by the fact that the cluster continues to function unattended outside of 8-to-5 support hours.
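Checksumming every transfer is straightforward to sketch: the sender ships a digest alongside the payload, and the receiver recomputes it before accepting the work, repeating the task on a mismatch. The function names below are illustrative; Hume's replication manager is, of course, considerably more involved.

```python
# Sketch of md5-checksummed file transfers in the Ningaui spirit.
import hashlib

def md5_of(data):
    return hashlib.md5(data).hexdigest()

def send(data):
    """Sender ships the payload along with its checksum."""
    return data, md5_of(data)

def receive(data, claimed_md5):
    """Receiver recomputes the digest and refuses corrupted transfers."""
    if md5_of(data) != claimed_md5:
        raise ValueError("checksum mismatch: transfer corrupted, repeat task")
    return data
```
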
In response to questions about the projected scalability of such a scheme, Hume said the system would presumably scale from the present 60 nodes to about 100 without modifications, but that scaling higher would be a matter for further design. The guru session ended on a somewhat unrelated note regarding the problems of getting data off mainframes and onto Linux machines: issues of fixed- vs. variable-length encoding of data blocks resulting in useful information being stripped by FTP, and problems in converting COBOL copybooks to something useful on a C-based architecture. To reiterate Hume's claim, there are homemade solutions for some of these problems, and the reader should feel free to contact Mr. Hume for them.
WRITING PORTABLE APPLICATIONS
Nick Stoughton, MSB Consultants
Summarized by Josh Lothian
Nick Stoughton addressed a concern that developers are facing more and more: writing portable applications. He proposed that there is no such thing as a "portable application," only those that have been ported. Developers are still in search of the Holy Grail of applications programming: a write-once, compile-anywhere program. Languages such as Java are leading up to this, but virtual
machines still have to be ported, paths will still be different, etc. Web-based applications are another area that could be promising for portable applications.
Nick went on to talk about the future of portability. The recently released POSIX 2001 standards are expected to help the situation. The 2001 revision expands the original POSIX spec into seven volumes. Even though POSIX 2001 is more specific than the previous release, Nick pointed out that even this standard allows for small differences where vendors may define their own behaviors; this can only hurt developers in the long run. Along these lines, it was pointed out that there may be room for automated tools that have the capability to check application code and determine the level of portability and standards compliance of the source.
NETWORK MANAGEMENT SYSTEM
PERFORMANCE TUNING
Jeff Allen, Tellme Networks, Inc.
Summarized by Matt Selsky
Jeff Allen, author of Cricket, discussed network tuning. The basic idea of tuning is: measure, twiddle, and measure again. Cricket can be used for large installations, but it's overkill for smaller installations, for which MRTG is better suited. You don't need Cricket to do measurement; doing something like a shell script that dumps data to Excel is good, but you need to get some sort of measurement. Cricket was not designed for billing; it was meant to help answer questions.
Useful tools and techniques covered included looking for the difference between machines in order to identify the causes for the differences; measuring, but also thinking about what you're measuring and why; and remembering to use strace and tcpdump to identify problems. Problems repeat themselves; performance tuning requires an intricate understanding of all the layers involved to solve the problem and simplify the automation. (Having a computer science
background helps, but is not essential.) If you can reproduce the problem, you then should investigate each piece to find the bottleneck. If you can't reproduce the problem, then measure. You need to understand all the system interactions. Measurement can help determine whether things are actually slow or if the user is imagining it.
Troubleshooting begins with a hunch, but scientific processes are essential. You should be able to determine the event stream, or when each event occurs; having observation logs can help. Some hints provided include checking cron for periodic anomalies, slowing down /bin/rm to avoid I/O overload, and looking for unusual kernel CPU usage.
Also, try to use lightweight monitoring to reduce overhead. You don't need to monitor every resource on every system, but you should monitor those resources on those systems that are essential. Don't check things too often, since you can introduce overhead that way. Lots of information can be gathered from the /proc file system interface to the kernel.
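As an example of the kind of lightweight measurement available through /proc, the sketch below parses the aggregate "cpu" line of Linux's /proc/stat (user, nice, system, and idle tick counters) and computes utilization between two samples. The helper names are our own; only the /proc/stat line format is assumed.

```python
# Lightweight-monitoring sketch: CPU utilization from /proc/stat samples.
def cpu_times(stat_text):
    """Parse the aggregate 'cpu' line from /proc/stat into named fields."""
    for line in stat_text.splitlines():
        if line.startswith("cpu "):
            fields = [int(v) for v in line.split()[1:5]]
            return dict(zip(("user", "nice", "system", "idle"), fields))
    raise ValueError("no aggregate cpu line found")

def cpu_busy_percent(before, after):
    """CPU utilization between two samples, as a percentage."""
    busy = sum(after[k] - before[k] for k in ("user", "nice", "system"))
    total = busy + (after["idle"] - before["idle"])
    return 100.0 * busy / total if total else 0.0
```

On a live system the two samples would come from reading /proc/stat a few seconds apart; polling less often, as the talk suggests, keeps the monitoring overhead negligible.
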
BSD BOF
Host: Kirk McKusick, Author and Consultant
Summarized by Bosko Milekic
The BSD Birds of a Feather session at USENIX started off with five quick and informative talks and ended with an exciting discussion of licensing issues.
Christos Zoulas, for the NetBSD Project, went over the primary goals of the NetBSD Project (namely, portability, a clean architecture, and security) and then proceeded to briefly discuss some of the challenges that the project has encountered. The successes and areas for improvement of 1.6 were then examined. NetBSD has recently seen various improvements in its cross-build system, packaging system, and third-party tool support. For what concerns the kernel, a zero-copy TCP/UDP implementation has been integrated, and a new pmap
code for arm32 has been introduced, as have some other rather major changes to various subsystems. More information is available in Christos's slides, which are now available at http://www.netbsd.org/gallery/events/usenix2002
Theo de Raadt of the OpenBSD Project presented a rundown of various new things that the OpenBSD and OpenSSH teams have been looking at. Theo's talk included an amusing description (and pictures) of some of the events that transpired during OpenBSD's hackathon, which occurred the week before USENIX '02. Finally, Theo mentioned that, following complications with the IPFilter license, the OpenBSD team had done a fairly extensive license audit.
Robert Watson of the FreeBSD Project brought up various examples of FreeBSD being used in the real world. He went on to describe a wide variety of changes that will surface with the upcoming FreeBSD release 5.0. Notably, 5.0 will introduce a re-architecting of the kernel aimed at providing much more scalable support for SMP machines. 5.0 will also include an early implementation of KSE, FreeBSD's version of scheduler activations, large improvements to pccard, and significant framework bits from the TrustedBSD project.
Mike Karels from Wind River Systems presented an overview of how BSD/OS has been evolving following the Wind River Systems acquisition of BSDi's software assets. Ernie Prabhakar from Apple discussed Apple's OS X and its success on the desktop. He explained how OS X aims to bring BSD to the desktop while striving to maintain a strong and stable core, one that is heavily based on BSD technology.
The BoF finished with a discussion on licensing; specifically, some folks questioned the impact of Apple's licensing for code from their core OS (the Darwin Project) on the BSD community as a whole.
CLOSING SESSION
HOW FLIES FLY
Michael H. Dickinson, University of California, Berkeley
Summarized by JD Welch
In this lively and offbeat talk, Dickinson discussed his research into the mechanisms of flight in the fruit fly (Drosophila melanogaster) and related the autonomic behavior to that of a technological system. Insects are the most successful organisms on earth, due in no small part to their early adoption of flight. Flies travel through space on straight trajectories interrupted by saccades, jumpy turning motions somewhat analogous to the fast, jumpy movements of the human eye. Using a variety of monitoring techniques, including high-speed video in a "virtual reality flight arena," Dickinson and his colleagues have observed that flies respond to visual cues to decide when to saccade during flight. For example, more visual texture makes flies saccade earlier (i.e., farther away from the arena wall), described by Dickinson as a "collision-control algorithm."
Through a combination of sensory inputs, flies can make decisions about their flight path. Flies possess a visual "expansion detector" which, at a certain threshold, causes the animal to turn in a certain direction. However, expansion in front of the fly sometimes causes it to land. How does the fly decide? Using the virtual reality device to replicate various visual scenarios, Dickinson observed that the flies fixate on vertical objects that move back and forth across the field. Expansion on the left side of the animal causes it to turn right, and vice versa, while expansion directly in front of the animal triggers the legs to flare out in preparation for landing.
If the eyes detect these changes, how are the responses implemented? The flies' wings have "power muscles," controlled by mechanical resonance (as opposed to the nervous system), which drive the
wings, combined with neurally activated "steering muscles," which change the configuration of the wing joints. Subtle variations in the timing of impulses correspond to changes in wing movement, controlled by sensors in the "wing-pit." A small wing-like structure, the haltere, controls equilibrium by beating constantly.
Voluntary control is accomplished by the halteres, whose signals can interfere with the autonomic control of the stroke cycle. The halteres have "steering muscles" as well, and information derived from the visual system can turn off the haltere or trick it into a "virtual" problem requiring a response.
Dickinson has also studied the aerodynamics of insect flight using a device called the "Robo Fly," an oversized mechanical insect wing suspended in a large tank of mineral oil. Interestingly, Dickinson observed that the average lift generated by the flies' wings is under its body weight; the flies use three mechanisms to overcome this, including rotating the wings (rotational lift).
Insects are extraordinarily robust creatures, and because Dickinson analyzed the problem in a systems-oriented way, these observations and analysis are immediately applicable to technology; the response system can be used as an efficient search algorithm for control systems in autonomous vehicles, for example.
THE AFS WORKSHOP
Summarized by Garry Zacheiss, MIT
Love Hornquist-Astrand of the Arla development team presented the Arla status report. Arla 0.35.8 has been released. Scheduled for release soon are improved support for Tru64 UNIX, MacOS X, and FreeBSD; improved volume handling; and implementation of more of the vos/pts subcommands. It was stressed that MacOS X is considered an important platform, and that a GUI configuration manager for the cache manager, a GUI ACL manager for the Finder, and a graphical login that obtains Kerberos tickets and AFS tokens at login time were all under development.
Future goals planned for the underlying AFS protocols include GSSAPI/SPNEGO support for Rx, performance improvements to Rx, an enhanced disconnected mode, and IPv6 support for Rx; an experimental patch is already available for the latter. Future Arla-specific goals include improved performance, partial file reading, and increased stability on several platforms. Work is also in progress on the RXGSS protocol extensions (integrating GSSAPI into Rx); a partial implementation exists, and work continues as developers find time.
Derrick Brashear of the OpenAFS development team presented the OpenAFS status report. OpenAFS 1.2.5 was released immediately prior to the conference; it fixed a remotely exploitable denial-of-service vulnerability on several OpenAFS platforms, most notably IRIX and AIX. Future work planned for OpenAFS includes better support for MacOS X, including working around Finder interaction issues. Better support for the BSDs is also planned: FreeBSD has a partially implemented client, while NetBSD and OpenBSD have only server programs available right now. Support for AIX 5, Tru64 5.1A, and MacOS X 10.2 (aka Jaguar) is planned for the future. Other planned enhancements include nested groups in the ptserver (code donated by the University of Michigan awaits integration), disconnected AFS, and further work on a native Windows client. Derrick stated that the guiding principles of OpenAFS were to maintain compatibility with IBM AFS, support new platforms, ease the administrative burden, and add new functionality.
AFS performance was discussed. The openafs.org cell consists of two file servers, one in Stockholm, Sweden, and one in Pittsburgh, PA. AFS works reasonably well over trans-Atlantic links, although OpenAFS clients don't determine which file server to talk to very effectively. Arla clients use RTTs to the server to determine the optimal file server from which to fetch replicated data. Modifications to OpenAFS to support this behavior in the future are desired.
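The replica-selection idea Arla uses can be stated simply: probe the round-trip time to each server holding a replica and prefer the lowest. A minimal user-space sketch of that policy follows; the probe function and port are illustrative assumptions (a real AFS client would time Rx pings, not TCP connects):

```python
import socket
import time

def measure_rtt(host, port=7000, timeout=2.0):
    """Estimate RTT by timing a TCP connect to the host.
    (Illustrative only; an AFS client would time Rx pings.)"""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            pass
    except OSError:
        return float("inf")  # unreachable servers sort last
    return time.monotonic() - start

def pick_replica(servers, probe=measure_rtt):
    """Return the replica server with the lowest measured RTT."""
    return min(servers, key=probe)
```

Injecting a different probe function makes the policy easy to test without a network, which is also how one might simulate the trans-Atlantic case described above.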
Jimmy Engelbrecht and Harald Barth of KTH discussed their AFSCrawler script, written to determine how many AFS cells and clients there are in the world, which implementations/versions they run (Arla vs. IBM AFS vs. OpenAFS), and how much data is in AFS. The script unfortunately triggered a bug in IBM AFS 3.6-derived code, causing some clients to panic while handling a specific RPC. This has since been fixed in OpenAFS 1.2.5 and the most recent IBM AFS patch level, and all AFS users are strongly encouraged to upgrade. No release of Arla is vulnerable to this particular denial-of-service attack. There was an extended discussion of the usefulness of this exploration; many sites believed this was useful information and that such scanning should continue in the future, but only on an opt-in basis.
Many sites face the problem of managing Kerberos/AFS credentials for batch-scheduled jobs. Specifically, most batch processing software needs to be modified to forward tickets as part of the batch submission process, renew tickets and tokens while the job is in the queue and for the lifetime of the job, and properly destroy credentials when the job completes. Ken Hornstein of NRL was able to pay a commercial vendor to support Kerberos 4/5 credential management in their product, although they did not implement AFS token management. MIT has implemented some of the desired functionality in OpenPBS and might be able to make it available to other interested sites.
Tools to simplify AFS administration were discussed, including:
AFS Balancer: A tool written by CMU to automate the process of balancing disk usage across all servers in a cell. Available from ftp://ftp.andrew.cmu.edu/pub/AFS-Tools/balance-1.1b.tar.gz
Themis: Themis is KTH's enhanced version of the AFS tool "package" for updating files on local disk from a central AFS image. KTH's enhancements include allowing the deletion of files, simplifying the process of adding a file, and allowing the merging of multiple rule sets for determining which files are updated. Themis is available from the Arla CVS repository.
Stanford was presented as an example of a large AFS site. Stanford's AFS usage consists of approximately 14TB of data in AFS, in the form of approximately 100,000 volumes; 33TB of storage is available in their primary cell, ir.stanford.edu. Their file servers consist entirely of Solaris machines running a combination of Transarc 3.6 patch-level 3 and OpenAFS 1.2.x, while their database servers run OpenAFS 1.2.2. Their cell consists of 25 file servers using a combination of EMC and Sun StorEdge hardware. Stanford continues to use the kaserver for their authentication infrastructure, with future plans to migrate entirely to an MIT Kerberos 5 KDC.
Stanford has approximately 3,400 clients on their campus, not including SLAC (Stanford Linear Accelerator); approximately 2,100 AFS clients from outside Stanford contact their cell every month. Their supported clients are almost entirely IBM AFS 3.6 clients, although they plan to release OpenAFS clients soon. Stanford currently supports only UNIX clients. There is some on-campus presence of Windows clients, but Stanford has never publicly released or supported it. They do intend to release and support the MacOS X client in the near future.
All Stanford students, faculty, and staff are assigned AFS home directories with a default quota of 50MB, for a total of approximately 550GB of user home directories. Other significant uses of AFS storage are data volumes for workstation software (400GB) and volumes for course Web sites and assignments (100GB).
AFS usage at Intel was also presented. Intel has been an AFS site since 1994. They had bad experiences with the IBM 3.5 Linux client; their experience with OpenAFS on Linux 2.4.x kernels has been much better. They use, and are satisfied with, the OpenAFS IA64 Linux port. Intel has hundreds of OpenAFS 1.2.3 and 1.2.4 clients in many production cells accessing data stored on IBM AFS file servers, and they have not encountered any interoperability issues. Intel has some concerns about OpenAFS: they would like to purchase commercial support for OpenAFS, and to see OpenAFS support for HP-UX on both PA-RISC and Itanium hardware. HP-UX support is currently unavailable because a specific HP-UX header file is unavailable from HP; this may be resolved soon. Intel has not yet committed to migrating their file servers to OpenAFS and is unsure whether it will do so without commercial support.
Backups are a traditional topic of discussion at AFS workshops, and this time was no exception. Many users complain that the traditional AFS backup tools ("backup" and "butc") are complex and difficult to automate, requiring many home-grown scripts and much user intervention for error recovery. An additional complaint was that the traditional AFS tools do not support file- or directory-level backups and restores; data must be backed up and restored at the volume level.
Mitch Collinsworth of Cornell presented work to make AMANDA, the free backup software from the University of Maryland, suitable for AFS backups. Using AMANDA for AFS backups allows one to share AFS backup tapes with non-AFS backups, easily run multiple backups in parallel, automate error recovery, and provide a robust degraded mode that prevents tape errors from stopping backups altogether. Their implementation allows for full volume restores as well as individual directory and file restores. They have finished coding this work and are in the process of testing and documenting it.
Peter Honeyman of CITI at the University of Michigan spoke about work he has proposed to replace Rx with RPCSEC GSS in OpenAFS. This would allow AFS to use a TCP-based transport mechanism rather than the UDP-based Rx, and possibly gain better congestion control, dynamic adaptation, and fragmentation avoidance as a result. RPCSEC GSS uses the GSSAPI to authenticate SUN ONC RPC. RPCSEC GSS is transport-agnostic, provides strong security, is a developing Internet standard, and has multiple open source implementations. Backward compatibility with existing AFS servers and clients is an important goal of this project.
NETWORK PERFORMANCE
Summarized by Florian Buchholz
LINUX NFS CLIENT WRITE PERFORMANCE
Chuck Lever, Network Appliance; Peter Honeyman, CITI, University of Michigan
Lever introduced a benchmark to measure NFS client write performance. Client performance is difficult to measure due to hindrances such as poor hardware or bandwidth limitations. Furthermore, measuring application performance does not identify weaknesses specifically on the client side. Thus, a benchmark was developed that tries to exercise only data transfers in one direction, between server and application. For this purpose, the benchmark was based on the block-sequential-write portion of the Bonnie file system benchmark. Once a benchmark for NFS clients is established, it can be used to improve client performance.
The performance measurements were performed with an SMP Linux client and both a Linux NFS server and a Network Appliance F85 filer. During testing, a periodic jump in write latency was discovered. This was due to a rather large number of pending write operations that were scheduled to be written after certain threshold values were exceeded. By introducing a separate daemon that flushes the cached write requests, the spikes could be eliminated, but as a result the average latency grew over time. The problem was traced to a function that scans a linked list of write requests. After a hashtable was added to improve lookup performance, the latency improved considerably.
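The class of fix described here, replacing a linear scan of pending requests with a hashed index, can be illustrated outside the kernel. The request structure and keys below are invented for illustration, not the Linux NFS client's actual data structures:

```python
# Pending write requests indexed two ways: a list (O(n) scan per
# lookup, as in the original client) and a dict keyed by file
# offset (O(1) average, as in the patched client).
class WriteQueue:
    def __init__(self):
        self.requests = []      # retained for ordered flushing
        self.by_offset = {}     # hashtable: offset -> request

    def add(self, offset, data):
        req = {"offset": offset, "data": data}
        self.requests.append(req)
        self.by_offset[offset] = req

    def find_linear(self, offset):
        for req in self.requests:   # what the slow path did
            if req["offset"] == offset:
                return req
        return None

    def find_hashed(self, offset):  # what the fixed path does
        return self.by_offset.get(offset)
```

Both lookups return the same request; only the cost per call differs, which is why the average latency stopped growing with queue depth once the hashtable was in place.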
The improved client was then used to measure throughput against the two servers. A discrepancy between the Linux server and the filer tests was noticed, and the reason was traced back to a global kernel lock that was unnecessarily held when accessing the network stack. After correcting this, performance improved further. However, the measurements showed that a client may run slower when paired with fast servers on fast networks. This is due to heavy client interrupt loads, more network processing on the client side, and more global kernel lock contention.
The source code of the project is available at http://www.citi.umich.edu/projects/nfs-perf/patches.
A STUDY OF THE RELATIVE COSTS OF
NETWORK SECURITY PROTOCOLS
Stefan Miltchev and Sotiris Ioannidis, University of Pennsylvania; Angelos Keromytis, Columbia University
With the increasing need for security and integrity of remote network services, it becomes important to quantify the communication overhead of IPSec and compare it to alternatives such as SSH, SCP, and HTTPS.
For this purpose, the authors set up three testing networks: a direct link, two hosts separated by two gateways, and three hosts connecting through one gateway. Protocols were compared in each setup, and manual keying was used to eliminate connection setup costs. For IPSec, the encryption algorithms AES, DES, 3DES, hardware DES, and hardware 3DES were used. In detail, FTP was compared to SFTP, SCP, and FTP over IPSec; HTTP to HTTPS and HTTP over IPSec; and NFS to NFS over IPSec and local disk performance.
The result of the measurements was that IPSec outperforms other popular encryption schemes. Overall, unencrypted communication was fastest, but in some cases, like FTP, the overhead can be small. The use of crypto hardware can significantly improve performance. For future work, the inclusion of setup costs and of hardware-accelerated SSL, SFTP, and SSH was mentioned.
CONGESTION CONTROL IN LINUX TCP
Pasi Sarolahti, University of Helsinki; Alexey Kuznetsov, Institute for Nuclear Research at Moscow
Having attended the talk and read the paper, I am still unclear about whether the authors are merely describing the design decisions of TCP congestion control or whether they are actually the creators of that part of the Linux code. My guess leans toward the former.
In the talk, the speaker compared the TCP congestion control measures specified by the IETF in RFCs with the actual Linux implementation, which conforms to the basic principles but nevertheless has differences. A specific emphasis was placed on retransmission mechanisms and the congestion window. Also, several TCP enhancements – the NewReno algorithm, Selective ACKs (SACK), and Forward ACKs (FACK) – were discussed and compared.
In some instances, Linux does not conform to the IETF specifications. Fast recovery does not fully follow RFC 2582, since the threshold for triggering retransmit is adjusted dynamically and the congestion window's size is not changed. Also, the round-trip-time estimator and the RTO calculation differ from RFC 2988, since Linux uses more conservative RTT estimates and a minimum RTO of 200ms. The performance measurements showed that with the additional Linux-specific features enabled, slightly higher throughput, a steadier data flow, and fewer unnecessary retransmissions can be achieved.
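The RTO computation being compared here is worth seeing concretely. The sketch below is the textbook RFC 2988 estimator with a Linux-style 200ms floor bolted on; it is not the kernel's actual code, which uses fixed-point arithmetic and further refinements:

```python
# RFC 2988 RTT estimator with a Linux-style 200ms minimum RTO.
# RFC 2988 itself specifies a 1-second minimum; Linux's lower
# floor is one of the deviations discussed in the talk.
ALPHA, BETA = 1 / 8, 1 / 4
MIN_RTO = 0.200  # seconds

def update_rto(srtt, rttvar, sample):
    """Feed one RTT sample (seconds); return (srtt, rttvar, rto)."""
    if srtt is None:                      # first measurement
        srtt, rttvar = sample, sample / 2
    else:
        rttvar = (1 - BETA) * rttvar + BETA * abs(srtt - sample)
        srtt = (1 - ALPHA) * srtt + ALPHA * sample
    rto = max(MIN_RTO, srtt + 4 * rttvar)
    return srtt, rttvar, rto
```

On a fast LAN where srtt + 4*rttvar is only a few milliseconds, the floor dominates, which is exactly where a 200ms versus 1s minimum changes retransmission behavior.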
XTREME XCITEMENT
Summarized by Steve Bauer
THE FUTURE IS COMING: WHERE THE X
WINDOW SHOULD GO
Jim Gettys, Compaq Computer Corp.
Jim Gettys, one of the principal authors of the X Window System, outlined the near-term objectives for the system, primarily focusing on the changes and infrastructure required to enable replication and migration of X applications. Providing better support for this functionality would enable users to retrieve or duplicate X applications between their servers at home and at work.
One interesting example of an application that is currently capable of migration and replication is Emacs. To create a new frame on DISPLAY, try "M-x make-frame-on-display <Return> DISPLAY <Return>".
However, technical challenges make replication and migration difficult in general. These include the "major headaches" of server-side fonts, the nonuniformity of X servers and screen sizes, and the need to appropriately retrofit toolkits. Authentication and authorization issues are obviously also important. The rest of the talk delved into some of the details of these interesting technical challenges.
HACKING IN THE KERNEL
Summarized by Hai Huang
AN IMPLEMENTATION OF SCHEDULER
ACTIVATIONS ON THE NETBSD OPERATING
SYSTEM
Nathan J. Williams, Wasabi Systems
Scheduler activations are an old idea. Basically, there are benefits and drawbacks to using solely kernel-level or user-level threading; scheduler activations combine the two layers of control to provide more concurrency in the system.
In his talk, Nathan gave a fairly detailed description of the implementation of scheduler activations in the NetBSD kernel. One important change in the implementation is differentiating the thread context from the process context. This is done by defining a separate data structure for threads and relocating some of the information that was embedded in the process context to these thread contexts. The stack was a particular concern, due to upcalls: special handling must be done to make sure that an upcall doesn't corrupt the stack, so that the preempted user-level thread can continue afterwards. Lastly, Nathan explained that signals are handled by upcalls.
AUTHORIZATION AND CHARGING IN PUBLIC
WLANS USING FREEBSD AND 802.1X
Pekka Nikander, Ericsson Research NomadicLab
The 802.1x standards are well known in the wireless community as link-layer authentication protocols. In this talk, Pekka explained some novel ways of using the 802.1x protocols that might be of interest to people on the move: it is possible to set up a public WLAN that supports various charging schemes via virtual tokens, which people can purchase or earn and later use.
This is implemented on FreeBSD using the netgraph facility. It is basically a filter in the link layer that differentiates traffic based on the MAC address of the client node, which is either authenticated, denied, or let through. The overhead for this service is fairly minimal.
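The per-frame decision the filter makes is simple to state: look up the client's MAC address and forward, drop, or divert accordingly. A toy user-space sketch of that logic follows; the state names and table are illustrative assumptions, since the real implementation is a netgraph node inside the FreeBSD kernel:

```python
from enum import Enum

class State(Enum):
    AUTHENTICATED = "authenticated"  # valid token: forward traffic
    DENIED = "denied"                # drop silently
    UNKNOWN = "unknown"              # not yet seen

def classify(mac, table):
    """Return the action for a frame from the given MAC address."""
    state = table.get(mac.lower(), State.UNKNOWN)
    if state is State.AUTHENTICATED:
        return "forward"
    if state is State.DENIED:
        return "drop"
    return "divert"  # e.g., hand to the 802.1x authenticator
```

Because the decision is a single table lookup per frame, it is plausible that the overhead stays minimal, as reported.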
ACPI IMPLEMENTATION ON FREEBSD
Takanori Watanabe, Kobe University
ACPI (Advanced Configuration and Power Interface) was proposed as a joint effort by Intel, Toshiba, and Microsoft to provide a standard, finer-grained method of managing the power states of individual devices within a system. Such low-level power management is especially important for mobile and embedded systems that are powered by fixed-capacity batteries.
Takanori explained that the ACPI specification is composed of three parts: tables, BIOS, and registers. He was able to implement some functionality of ACPI in the FreeBSD kernel. The ACPI Component Architecture, implemented by Intel, provides a high-level ACPI API to the operating system; Takanori's ACPI implementation is built upon this underlying layer of APIs.
ACCESS CONTROL
Summarized by Florian Buchholz
DESIGN AND PERFORMANCE OF THE
OPENBSD STATEFUL PACKET FILTER (PF)
Daniel Hartmeier, Systor AG
Daniel Hartmeier described the new stateful packet filter (pf) that replaced IPFilter in the OpenBSD 3.0 release. IPFilter could no longer be included due to licensing issues, so there was a need to write a new filter making use of optimized data structures.
The filter rules are implemented as a linked list, which is traversed from start to end. Two actions may be taken according to the rules, "pass" or "block": a "pass" action forwards the packet, and a "block" action drops it. Where more than one rule matches a packet, the last rule wins. Rules that are marked as "final" immediately terminate any further rule evaluation. An optimization called "skip-steps" was also implemented, where blocks of similar rules are skipped if they cannot match the current packet; these skip-steps are calculated when the rule set is loaded. Furthermore, a state table keeps track of TCP connections, and only packets that match the sequence numbers are allowed. UDP and ICMP queries and replies are considered in the state table, where initial packets create an entry for a pseudo-connection with a low initial timeout value. The state table is implemented using a balanced binary search tree. NAT mappings are also stored in the state table, whereas application proxies reside in user space. The packet filter is also able to perform fragment reassembly and to modulate TCP sequence numbers to protect hosts behind the firewall.
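The last-match-wins traversal with an early-exit flag can be modeled in a few lines. This is a deliberately tiny sketch: real pf rules match on many more fields, and the "final" flag described here corresponds, if memory serves, to pf's `quick` keyword:

```python
# Toy model of pf-style rule evaluation: traverse all rules,
# remember the last match ("last rule wins"), and stop early on
# a rule flagged final. Rules here match only on destination
# port; a port of None matches any packet.
def evaluate(rules, packet):
    action = "pass"                  # default if nothing matches
    for rule in rules:
        if rule["port"] in (None, packet["port"]):
            action = rule["action"]  # last matching rule wins...
            if rule.get("final"):
                break                # ...unless marked final
    return action
```

The skip-step optimization would sit inside this loop, jumping over whole blocks of rules whose precomputed match conditions already exclude the packet.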
Pf was compared against IPFilter as well as Linux's iptables by measuring throughput and latency with increasing traffic rates and different packet sizes. In a test with a fixed rule-set size of 100, iptables outperformed the other two filters, whose results were close to each other. In a second test, where the rule-set size was continually increased, iptables consistently had about twice the throughput of the other two (which evaluate the rule set on both the incoming and outgoing interfaces). A third test compared only pf and IPFilter, using a single rule that created state in the state table, with a fixed state-entry size; pf reached an overloaded state much later than IPFilter. The experiment was repeated with a variable state-entry size, and pf performed much better than IPFilter for a small number of states.
In general, rule-set evaluation is expensive, and benchmarks only reflect extreme cases, whereas in real life other behavior should be observed. Furthermore, the benchmarks show that stateful filtering can actually improve performance, since cheap state-table lookups replace rule evaluation.
ENHANCING NFS CROSS-ADMINISTRATIVE
DOMAIN ACCESS
Joseph Spadavecchia and Erez Zadok, Stony Brook University
The speaker presented modifications to an NFS server that allow improved NFS access between administrative domains. The problem lies in the fact that NFS assumes a shared UID/GID space, which makes it unsafe to export files outside the administrative domain of the server. The design goals were to leave the protocol and existing clients unchanged, to require a minimum of server changes, and to provide flexibility and increased performance.
To solve the problem, two techniques are utilized: "range-mapping" and "file-cloaking." Range-mapping maps IDs between client and server. The mapping is performed on a per-export basis and has to be manually set up in an export file; the mappings can be 1-1, N-N, or N-1. In file-cloaking, the server restricts file access based on UID/GID and special cloaking-mask bits. Here, users can only access their own file permissions. The policy on whether or not the file is visible to others is set by the cloaking mask, which is logically ANDed with the file's protection bits. File-cloaking only works, however, if the client doesn't hold cached copies of directory contents and file attributes; because of this, the clients are forced to re-read directories by incrementing the mtime value of the directory each time it is listed.
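Both mechanisms are small transformations and can be mimicked in a few lines. The table format and mask semantics below are a guess at the general idea, not the paper's actual export-file syntax or on-disk format:

```python
# Hypothetical per-export range-mapping table:
# (client_uid_start, client_uid_end, server_uid_base)
RANGE_MAP = [(1000, 1999, 5000)]  # client 1000-1999 -> server 5000-5999

def map_uid(client_uid):
    """Translate a client UID to a server UID (N-N range mapping)."""
    for lo, hi, base in RANGE_MAP:
        if lo <= client_uid <= hi:
            return base + (client_uid - lo)
    return None  # unmapped: a real server would deny or squash

def effective_mode(mode, is_owner, cloak_mask):
    """AND the cloaking mask into the bits shown to non-owners."""
    return mode if is_owner else mode & cloak_mask
```

For example, with a cloaking mask of 0o770, a 0o644 file keeps its owner and group bits for non-owners but loses its "other" bits entirely, so outside users cannot even see that the file is world-readable.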
To test the performance of the modified server, five different NFS configurations were evaluated: an unmodified NFS server was compared against one with the modified code included but not used, one with only range-mapping enabled, one with only file-cloaking enabled, and one with all modifications enabled. For each setup, different file system benchmarks were run. The results show only a small overhead when the modifications are used, generally an increase of below 5%. Another experiment tested the performance of the system with an increasing number of mapped or cloaked entries; the results show that an increase from 10 to 1,000 entries resulted in a maximum of about 14% cost in performance.
One member of the audience pointed out that if clients choose to ignore the changed mtimes from the server, and thus still hold caches of the directory entries, the file-cloaking mechanism could be defeated. After a rather lengthy debate, the speaker had to concede that the model doesn't add any extra security. Another question was asked about the scalability of setting up range mappings; the speaker referred to application-level tools that could be developed for that purpose.
The software is available at ftp://ftp.fsl.cs.sunysb.edu/pub/enf
ENGINEERING OPEN SOURCE
SOFTWARE
Summarized by Teri Lampoudi
NINGAUI: A LINUX CLUSTER FOR BUSINESS
Andrew Hume, AT&T Labs – Research; Scott Daniels, Electronic Data Systems Corp.
Ningaui is a general-purpose, highly available, resilient architecture built from commodity software and hardware. Emphatically, however, Ningaui is not a Beowulf. Hume calls the cluster design the "Swiss canton model," in which there are a number of loosely affiliated, independent nodes with data replicated among them. Jobs are assigned by bidding and leases, and cluster services done as session-based servers are sited via generic job assignment. The emphasis is on keeping the architecture end-to-end, checking all work via checksums, and logging everything. The resilience and high availability required by their goal of 8-5 maintenance – vs. the typical 24-7 model, where people get paged whenever the slightest thing goes wrong, regardless of the time of day – are achieved by job restartability. Finally, all computation is performed on local data, without the use of NFS or network-attached storage.
Hume's message is a hopeful one: despite the many problems encountered – things like kernel and service limits, auto-installation problems, TCP storms, and the like – the existence of source code and the paranoid practice of logging and checksumming everything has helped. The final product performs reasonably well, and it appears that the resilience techniques employed do make a difference. One drawback, however, is that the software mentioned in the paper is not yet available for download.
CPCMS: A CONFIGURATION MANAGEMENT
SYSTEM BASED ON CRYPTOGRAPHIC NAMES
Jonathan S. Shapiro and John Vanderburgh, Johns Hopkins University
This paper won the FREENIX Best Paper award.
The basic notion behind the project is the fact that everyone has a pet complaint about CVS, and yet it is currently the configuration manager in most widespread use. Shapiro has unleashed an alternative. Interestingly, he did not begin but ended with the usual host of reasons why CVS is bad. The talk instead began abruptly by characterizing the job of a software configuration manager, continued by stating the namespaces it must handle, and wrapped up with the challenges faced.
X MEETS Z VERIFYING CORRECTNESS IN THE
PRESENCE OF POSIX THREADS
Bart Massey, Portland State University; Robert T. Bauer, Rational Software Corp.
Massey delivered a humorous talk on the insight gained from applying the Z formal specification notation to system software design, rather than the more informal analysis and design process normally used.
The story is told with respect to writing XCB, which replaces the Xlib protocol layer and is supposed to be thread-friendly. Where threads are concerned, deadlock avoidance becomes a hard problem that cannot be solved in an ad hoc manner, but full model checking is also too hard. In this case, Massey resorted to a Z specification to model the XCB lower layer, abstract away locking and data transfers, and locate fundamental issues. Essentially, the difficulties of searching the literature and locating information relevant to the problem at hand were overcome. As Massey put it, "the formal method saved the day."
FILE SYSTEMS
Summarized by Bosko Milekic
PLANNED EXTENSIONS TO THE LINUX
EXT2/EXT3 FILESYSTEM
Theodore Y. Ts'o, IBM; Stephen Tweedie, Red Hat
The speaker presented improvements to the Linux Ext2 file system, with the goal of allowing for various expansions while striving to maintain compatibility with older code. Improvements have been facilitated by a few extra superblock fields that were added to Ext2 not long before the Linux 2.0 kernel was released. The fields allow file system features to be added without compromising the existing setup; this is done by providing bits indicating the impact of the added features with respect to compatibility. Namely, a file system with the "incompat" bit marked is not allowed to be mounted. Similarly, a "read-only" marking only allows the file system to be mounted read-only.
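The mount-time decision these superblock fields enable boils down to a bitmask check. The sketch below uses invented feature bits; the real superblock fields are `s_feature_compat`, `s_feature_incompat`, and `s_feature_ro_compat`:

```python
# Sketch of Ext2-style feature-compatibility checking at mount
# time. Feature-bit values are illustrative, not the kernel's.
SUPPORTED_INCOMPAT = 0b0001    # incompat features this kernel knows
SUPPORTED_RO_COMPAT = 0b0001   # ro-compat features this kernel knows

def check_mount(fs_incompat, fs_ro_compat, read_only):
    """Return True if this kernel may mount the file system."""
    if fs_incompat & ~SUPPORTED_INCOMPAT:
        return False    # unknown "incompat" bit: refuse outright
    if fs_ro_compat & ~SUPPORTED_RO_COMPAT and not read_only:
        return False    # unknown "ro-compat" bit: read-only only
    return True         # fully-compat features are always safe
```

Fully-compatible features need no bit check at all: an old kernel can safely ignore them, which is what makes this scheme forward-extensible.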
Directory indexing replaces linear directory searches with a faster search using a fixed-depth tree and hashed keys. The file system size can be dynamically increased, and an expanded inode, doubled from 128 to 256 bytes, allows for more extensions.
Other potential improvements were discussed as well, in particular preallocation for contiguous files, which allows for better performance in certain setups by preallocating contiguous blocks. Security-related modifications, extended attributes, and ACLs were mentioned. An implementation of these features already exists but has not yet been merged into the mainline Ext2/3 code.
RECENT FILESYSTEM OPTIMISATIONS ON
FREEBSD
Ian Dowse, Corvil Networks; David Malone, CNRI, Dublin Institute of Technology
David Malone presented four important file system optimizations for the FreeBSD OS: soft updates, dirpref, vmiodir, and dirhash. It turns out that certain combinations of the optimizations (beautifully illustrated in the paper) may yield performance improvements of anywhere between 2 and 10 orders of magnitude for real-world applications.
All four of the optimizations deal with file system metadata. Soft updates allow for asynchronous metadata updates. Dirpref changes the way directories are organized, attempting to place child directories closer to their parents, thereby increasing locality of reference and reducing disk-seek times. Vmiodir trades some extra memory for better directory caching. Finally, dirhash, which was implemented by Ian Dowse, changes the way in which entries are searched in a directory: FFS uses a linear search to find an entry by name, while dirhash builds a hashtable of directory entries on the fly. For a directory of size n with a working set of m files, a search that in certain cases could have been O(nm) has been reduced, thanks to dirhash, to effectively O(n + m).
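The complexity argument is easy to see in miniature: pay O(n) once to hash the directory, then answer each of the m lookups in O(1) on average, for O(n + m) total instead of O(nm). A simplified sketch (the real implementation builds tables on demand in the kernel and caps their total memory use):

```python
# Simplified dirhash: the first lookup in a directory builds a
# hashtable (O(n)); subsequent lookups in that directory are
# O(1) on average, instead of O(n) each with a linear scan.
class DirHash:
    def __init__(self):
        self._tables = {}   # directory id -> {name: inode}

    def lookup(self, dir_id, entries, name):
        table = self._tables.get(dir_id)
        if table is None:                  # build on first use
            table = {e_name: ino for e_name, ino in entries}
            self._tables[dir_id] = table
        return table.get(name)             # None if absent
```

This also shows why dirhash helps most for large directories with many repeated lookups: a single lookup in a cold directory still costs the full O(n) build.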
FILESYSTEM PERFORMANCE AND SCALABILITY
IN LINUX 2.4.17
Ray Bryant, SGI; Ruth Forester, IBM LTC; John Hawkes, SGI
This talk focused on performance evaluation of a number of file systems available and commonly deployed on Linux machines. Comparisons under various configurations of Ext2, Ext3, ReiserFS, XFS, and JFS were presented.
The benchmarks chosen for the data gathering were pgmeter, filemark, and the AIM Benchmark Suite VII. Pgmeter measures the rate of data transfer of reads/writes of a file under a synthetic workload. Filemark is similar to postmark in that it is an operation-intensive benchmark, although filemark is threaded and offers various other features that postmark lacks. AIM VII measures performance for various file-system-related functionalities; it offers various metrics under an imposed workload, thus stressing the performance of the file system not only under I/O load but also under significant CPU load.
Tests were run on three different setups: a small, a medium, and a large configuration. ReiserFS and Ext2 appear at the top of the pile for smaller and medium setups. Notably, XFS and JFS perform worse for smaller system configurations than the others, although XFS clearly appears to generate better numbers than JFS. It should be noted that XFS seems to scale well under a higher load; this was most evident in the large-system results, where XFS appears to offer the best overall results.
THINGS TO THINK ABOUT
Summarized by Bosko Milekic
SPEEDING UP KERNEL SCHEDULER BY REDUCING CACHE MISSES
Shuji Yamamura, Akira Hirai, Mitsuru Sato, Masao Yamamoto, Akira Naruse, and Kouichi Kumon, Fujitsu Labs
This was an interesting talk pertaining to the effects of cache coloring for task structures in the Linux kernel scheduler (Linux kernel 2.4.x). The speaker first presented some interesting benchmark numbers for the Linux scheduler, showing that as the number of processes on the task queue increased, performance decreased. The authors used some really nifty hardware to measure the number of bus transactions throughout their tests and were thus able to reasonably quantify the impact that cache misses had on the Linux scheduler.
Their experiments led them to implement a cache-coloring scheme for task structures, which were previously aligned on 8KB boundaries and therefore were eventually mapped to the same cache lines. This unfortunate placement of task structures in memory induced a significant number of cache misses as the number of tasks grew in the scheduler.
The implemented solution consisted of aligning task structures to more evenly distribute cached entries across the L2 cache. The result was inevitably fewer cache misses in the scheduler. Some negative effects were observed in certain situations, primarily due to more cache slots being used by task-structure data in the scheduler, thus forcing data previously cached there to be pushed out.
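The underlying arithmetic is worth making explicit: if the structure alignment is a multiple of the cache's (line size × number of sets), every structure lands in the same cache set, and adding a small per-task "color" offset spreads them out. The cache geometry and stride below are illustrative, not the paper's measured hardware:

```python
# Why 8KB-aligned structures collide: when the alignment is a
# multiple of line_size * num_sets, every structure maps to the
# same cache set. A per-task color offset spreads them out.
LINE = 64    # bytes per cache line (illustrative)
SETS = 64    # number of sets: LINE * SETS = 4KB of set mappings

def cache_set(addr):
    return (addr // LINE) % SETS

def task_addr(i, align=8192, color_stride=0):
    """Address of task i: fixed alignment plus optional coloring."""
    return i * align + (i % 8) * color_stride

uncolored = {cache_set(task_addr(i)) for i in range(32)}
colored = {cache_set(task_addr(i, color_stride=LINE)) for i in range(32)}
```

With no coloring, all 32 task structures contend for one cache set; with a one-line color stride they spread across eight sets, which is the effect (fewer conflict misses, at the price of occupying more of the cache) described in the talk.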
WORK-IN-PROGRESS REPORTS
Summarized by Brennan Reynolds
RESOURCE VIRTUALIZATION TECHNIQUES FOR
WIDE-AREA OVERLAY NETWORKS
Kartik Gopalan, Stony Brook University
This work addressed the issue of provisioning a maximum number of virtual overlay networks (VONs) with diverse quality-of-service (QoS) requirements on a single physical network. Each of the VONs is logically isolated from the others to ensure the QoS requirements. Gopalan mentioned several critical research issues with this problem that are currently being investigated. Dealing with how to provision the network at various levels (link, route, or path) and then enforce the provisioning at run-time is one of the toughest challenges. Currently, Gopalan has developed several algorithms to handle admission control, end-to-end QoS, route selection, scheduling, and fault tolerance in the network.
For more information, visit http://www.ecsl.cs.sunysb.edu.
VISUALIZING SOFTWARE INSTABILITY
Jennifer Bevan, University of California, Santa Cruz
Detection of instability in software has typically been an afterthought. The point in the development cycle when the software is reviewed for instability usually comes after it is difficult and costly to go back and perform major modifications to the code base. Bevan has developed a technique to allow the visualization of unstable regions of code that can be used much earlier in the development cycle. Her technique creates a time series of dependence graphs that include clusters and lines called fault lines. From the graphs, a developer is able to easily determine where the unstable sections of code are and proactively restructure them. She is currently working on a prototype implementation.
For more information, visit http://www.cse.ucsc.edu/~jbevan/evo_viz/.
RELIABLE AND SCALABLE PEER-TO-PEER WEB
DOCUMENT SHARING
Li Xiao, College of William and Mary
The idea presented by Xiao would allow end users to share the contents of their Web browser caches with neighboring Internet users. The rationale for this is that today's end users are increasingly connected to the Internet over high-speed links, and browser caches are becoming large storage systems. Therefore, if an individual accesses a page which does not exist in their local cache, Xiao suggests that they first query other end users for the content before trying to access the machine hosting the original. This strategy does have some serious problems associated with it that still need to be addressed, including ensuring the integrity of the content and protecting the identity and privacy of the end users.
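The lookup order Xiao described (local cache, then peers, then origin) can be sketched in a few lines. The function and peer interface below are illustrative only, not Xiao's actual protocol:

```python
def fetch(url, local_cache, peers, fetch_origin):
    """Return page content, preferring local cache, then peers, then origin.

    local_cache: dict mapping url -> content
    peers: objects with a .get(url) method returning content or None
    fetch_origin: callable that retrieves the page from its origin server
    """
    if url in local_cache:
        return local_cache[url]
    for peer in peers:
        content = peer.get(url)  # None on a peer-cache miss
        if content is not None:
            local_cache[url] = content
            return content
    content = fetch_origin(url)
    local_cache[url] = content
    return content
```

Note that the integrity problem mentioned above is visible even in this sketch: nothing stops a peer from returning tampered content.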
SEREL FAST BOOTING FOR UNIX
Leni Mayo, Fast Boot Software
Serel is a tool that generates a visual representation of a UNIX machine's boot-up sequence. It can be used to identify the critical path and can show if a particular service or process blocks for an extended period of time. This information could be used to determine where the largest performance gains could be realized by tuning the order of execution at boot-up. Serel creates a dependency graph, expressed in XML, during boot-up. This graph is then used to create the visual representation. Currently the tool only works on POSIX-compliant systems, but Mayo stated that he would be porting it to other platforms. Other extensions that were mentioned included having the metadata include the use of shared libraries and monitoring the suspend/resume sequence of portable machines.
For more information, visit http://www.fastboot.org/.
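The critical-path idea amounts to a longest-path computation over the service dependency graph. The sketch below is a hypothetical illustration of that computation, not Serel's code (Serel itself emits XML and renders it visually):

```python
def critical_path(durations, deps):
    """durations: {service: seconds}; deps: {service: [prerequisites]}.
    Returns (total_time, path) for the longest dependency chain."""
    memo = {}  # service -> (earliest finish time, best predecessor)

    def finish(svc):
        if svc not in memo:
            best_t, best_p = 0.0, None
            for p in deps.get(svc, []):
                t = finish(p)
                if t > best_t:
                    best_t, best_p = t, p
            memo[svc] = (best_t + durations[svc], best_p)
        return memo[svc][0]

    last = max(durations, key=finish)   # service that finishes latest
    total = finish(last)
    path = []
    while last is not None:             # walk predecessors backwards
        path.append(last)
        last = memo[last][1]
    return total, list(reversed(path))
```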
BERKELEY DB XML
John Merrells, Sleepycat Software
Merrells gave a quick introduction and overview of the new XML library for Berkeley DB. The library specializes in storage and retrieval of XML content, which is queried via XPath. The library allows for multiple containers per document and stores everything natively as XML. The user is also given a wide range of elements to create indices with, including edges, elements, text strings, or presence. The XPath engine consists of a query parser, generator, optimizer, and execution engine. To conclude his presentation, Merrells gave a live demonstration of the software.
For more information, visit http://www.sleepycat.com/xml/.
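As a rough illustration of XPath-style retrieval over XML content, here is the same flavor of query done with Python's standard library; the Berkeley DB XML API itself is not shown in the summary, so none of its calls appear here:

```python
import xml.etree.ElementTree as ET

# A tiny stand-in for a document container (contents invented).
doc = """<container>
  <doc id="1"><title>Rx notes</title></doc>
  <doc id="2"><title>AFS guide</title></doc>
</container>"""

root = ET.fromstring(doc)

# ElementTree supports a subset of XPath; a full engine such as the one
# described above also parses, optimizes, and plans richer queries.
titles = [t.text for t in root.findall("./doc/title")]
second = root.find("./doc[@id='2']/title").text
```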
CLUSTER-ON-DEMAND (COD)
Justin Moore, Duke University
Modern clusters are growing at a rapid rate; many have pushed beyond the 5000-machine mark, and deploying them results in large expenses as well as management and provisioning issues. Furthermore, if the cluster is "rented" out to various users, it is very time-consuming to configure it to a user's specs, regardless of how long they need to use it. The COD work presented creates dynamic virtual clusters within a given physical cluster. The goal was to have a provisioning tool that would automatically select a chunk of available nodes and install the operating system and middleware specified by the customer in a short period of time. This would allow a greater use of resources, since multiple virtual clusters can exist at once, and the size of a virtual cluster can be changed dynamically. Moore stated that they have created a working prototype and are currently testing and benchmarking its performance.
For more information, visit http://www.cs.duke.edu/~justin/cod/.
CATACOMB
Elias Sinderson, University of California, Santa Cruz
Catacomb is a project to develop a database-backed DASL module for the Apache Web server. It was designed as a replacement for the standard WebDAV module. The initial release of the module only contains support for a MySQL database but could be extended to others. Sinderson briefly touched on the performance of his module: it was comparable to the mod_dav Apache module for all query types but search. The presentation was concluded with remarks about adding support for the LOCK method and including ACL specifications in the future.
For more information, visit http://ocean.cse.ucsc.edu/catacomb/.
SELF-ORGANIZING STORAGE
Dan Ellard, Harvard University
Ellard's presentation introduced a storage system that tuned itself based on the workload of the system, without requiring the intervention of the user. The intelligence was implemented as a virtual self-organizing disk that resides below any file system. The virtual disk observes the access patterns exhibited by the system and then attempts to predict what information will be accessed next. An experiment was done using an NFS trace taken at a large ISP on one of their email servers. Ellard's self-organizing storage system worked well, which he attributes to the fact that most of the files being requested were large email boxes. Areas of future work include exploration of the length and detail of the data-collection stage, as well as the CPU impact of running the virtual disk layer.
VISUAL DEBUGGING
John Costigan, Virginia Tech
Costigan feels that the state of current debugging facilities in the UNIX world is not as good as it should be. He proposes the addition of several elements to programs like ddd and gdb. The first addition is including separate heap and stack components. Another is to display only certain fields of a data structure, with the ability to zoom in and out if needed. Finally, the debugger should provide the ability to visually present complex data structures, including linked lists, trees, etc. He said that a beta version of a debugger with these abilities is currently available.
For more information, visit http://infovis.cs.vt.edu/datastruct/.
IMPROVING APPLICATION PERFORMANCE
THROUGH SYSTEM-CALL COMPOSITION
Amit Purohit, Stony Brook University
Web servers perform a huge number of context switches and much internal data copying during normal operation. These two elements can drastically limit the performance of an application, regardless of the hardware platform it is run on. The Compound System Call (CoSy) framework is an attempt to reduce the performance penalty of context switches and data copies. It includes a user-level library and several kernel facilities that can be used via system calls. The library provides the programmer with a complete set of memory-management functions. Performance tests using the CoSy framework showed large savings for context switches and data copies. The only area where the savings between conventional libraries and CoSy were negligible was for very large file copies.
For more information, visit http://www.cs.sunysb.edu/~purohit/.
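A toy cost model shows why composing system calls pays off: each kernel crossing has a fixed overhead, so bundling many operations behind one crossing amortizes it. The constants and functions below are invented for illustration and bear no relation to CoSy's actual interface:

```python
CROSSING_COST = 1.0   # modeled cost of one user/kernel boundary crossing
WORK_COST = 0.1       # modeled cost of the operation itself

def separate_calls(n):
    """n ordinary system calls: n crossings plus n units of work."""
    return n * (CROSSING_COST + WORK_COST)

def compound_call(n):
    """One compound call: a single crossing, then n units of work in-kernel."""
    return CROSSING_COST + n * WORK_COST

# With 100 small operations, the compound version pays for one crossing
# instead of 100, which is the effect CoSy is after.
savings = separate_calls(100) - compound_call(100)
```

The model also suggests why very large file copies showed negligible savings: there, the per-byte work dominates and the crossing overhead is already a small fraction of the total.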
ELASTIC QUOTAS
John Oscar, Columbia University
Oscar began by stating that most resources are flexible, but that to date disks have not been. While approximately 80 percent of files are short-lived, disk quotas are hard limits imposed by administrators. The idea behind elastic quotas is to have a non-permanent storage area that each user can temporarily use. Oscar suggested the creation of an "ehome" directory structure to be used in deploying elastic quotas. Currently he has implemented elastic quotas as a stackable file system.

There were also several trends apparent in the data. Many users would ssh to their remote host (good) but then launch an email reader on the local machine, which would connect via POP to the same host and send the password in clear text (bad). This situation can easily be remedied by tunneling protocols like POP through SSH, but it appeared that many people were not aware this could be done. While most of his comments on the use of the wireless network were negative, the list of passwords he had collected showed that people were indeed using strong passwords. His recommendations were to educate and encourage people to use protocols like IPsec, SSH, and SSL when conducting work over a wireless network, because you never know who else is listening.
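The tunneling fix mentioned above is a one-liner; the hostname, user, and local port here are illustrative:

```
# Forward local port 1100 to the POP3 port (110) on the mail host, so a
# mail reader pointed at localhost:1100 sends its password inside the
# encrypted SSH session instead of in the clear.
ssh -L 1100:localhost:110 user@mailhost
```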
SYSTRACE
Niels Provos, University of Michigan
How can one be sure the applications one is using actually do exactly what their developers said they do? Short answer: you can't. People today are using a large number of complex applications, which means it is impossible to check each application thoroughly for security vulnerabilities. There are tools out there that can help, though. Provos has developed a tool called systrace that allows a user to generate a policy of acceptable system calls a particular application can make. If the application attempts to make a call that is not defined in the policy, the user is notified and allowed to choose an action. Systrace includes an auto-policy generation mechanism, which uses a training phase to record the actions of all programs the user executes. If the user chooses not to be bothered by applications breaking policy, systrace allows default enforcement actions to be set. Currently systrace is implemented for FreeBSD and NetBSD, with a Linux version coming out shortly.
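A systrace policy is a list of rules mapping system calls to actions. The fragment below is a hand-written illustration of the general format, with an invented program and rules; consult the systrace documentation for the exact grammar:

```
Policy: /usr/bin/fetchmail, Emulation: native
    native-fsread: filename eq "/etc/resolv.conf" then permit
    native-connect: sockaddr match "inet-*:110" then permit
    native-fswrite: filename match "/etc/*" then deny
```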
For more information, visit http://www.citi.umich.edu/u/provos/systrace/.
VERIFIABLE SECRET REDISTRIBUTION FOR
SURVIVABLE STORAGE SYSTEMS
Ted Wong, Carnegie Mellon University
Wong presented a protocol that can be used to re-create a file distributed over a set of servers, even if one of the servers is damaged. In his scheme, the user must choose how many shares the file is split into and the number of servers it will be stored across. The goal of this work is to provide persistent, secure storage of information even if it comes under attack. Wong stated that his design of the protocol was complete, and he is currently building a prototype implementation of it.
For more information, visit http://www.cs.cmu.edu/~tmwong/research/.
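Wong's protocol adds verifiability and share redistribution on top of threshold secret sharing. The classic Shamir scheme underlying such systems can be sketched as follows (toy field size, for illustration only; a real system would operate on much larger fields and add the verification machinery):

```python
import random

P = 2**31 - 1  # a small Mersenne prime field, for illustration

def make_shares(secret, k, n):
    """Split secret into n shares; any k of them reconstruct it."""
    # Random degree-(k-1) polynomial with the secret as constant term.
    coeffs = [secret] + [random.randrange(P) for _ in range(k - 1)]
    def poly(x):
        acc = 0
        for c in reversed(coeffs):   # Horner evaluation mod P
            acc = (acc * x + c) % P
        return acc
    return [(x, poly(x)) for x in range(1, n + 1)]

def reconstruct(shares):
    """Lagrange interpolation at x = 0 over the prime field."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % P
                den = (den * (xi - xj)) % P
        # pow(den, P-2, P) is the modular inverse, since P is prime.
        secret = (secret + yi * num * pow(den, P - 2, P)) % P
    return secret
```

Losing a server then costs nothing as long as k of the n shares survive, which is the property the redistribution protocol preserves over time.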
THE GURU IS IN SESSIONS
LINUX ON LAPTOP/PDA
Bdale Garbee, HP Linux Systems Operation
Summarized by Brennan Reynolds
This was a roundtable-like discussion, with Garbee acting as moderator. He is currently in charge of the Debian distribution of Linux and is involved in porting Linux to platforms other than i386. He has successfully helped port the kernel to the Alpha, SPARC, and ARM, and is currently working on a version to run on VAX machines.
A discussion was held on which file system is best for battery life. The ReiserFS and ext3 file systems were discussed; people commented that when using ext3 on their laptops the disk would never spin down and thus was always consuming a large amount of power. One suggestion was to use a RAM-based file system for any volume that requires a large number of writes, and to have the file system written to disk only at infrequent intervals or when the machine is shut down or suspended.
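The RAM-based file system suggestion might look like this in /etc/fstab; the mount point and size are invented for illustration, and note that the contents are lost at power-off unless flushed to disk first:

```
# Keep a write-heavy volume in RAM so the disk can spin down.
tmpfs  /var/volatile  tmpfs  defaults,noatime,size=32m  0  0
```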
A question was directed to Garbee about his knowledge of how the HP-Compaq merger would affect Linux. Garbee thought that the merger was good for Linux, both within the company and for the open source community as a whole. He stated that the new company would be the number one shipper of systems with Linux as the base operating system. He also fielded a question about the support of older HP servers and their ability to run Linux, saying that Linux has indeed been ported to them and that he personally had several machines in his basement running it.
A question which sparked a large number of responses concerned problems people had with running Linux on mobile platforms. Most people actually did not have many problems at all. There were a few cases of a particular machine model not working, but there were only two widespread issues: docking stations and the new ACPI power management scheme. Docking stations still appear to be somewhat of a headache, but most people had developed ad hoc solutions for getting them to work. The ACPI power management scheme developed by Intel does not appear to have a quick solution. Ted Ts'o, one of the head kernel developers, stated that there are fundamental architectural problems with the 2.4 series kernel that do not easily allow ACPI to be added. However, he also stated that ACPI is supported in the newest development kernel, 2.5, and will be included in 2.6.
The final topic was the possibility of purchasing a laptop without paying any charge/tax for Microsoft Windows. The consensus was that currently this is not possible. Even if the machine comes with Linux or without any operating system, the vendors have contracts in place which require them to pay Microsoft for each machine they ship if that machine is capable of running a Microsoft operating system. The only way this will change is if the demand for
other operating systems increases to the point that vendors renegotiate their contracts with Microsoft, and no one saw this happening in the near future.
LARGE CLUSTERS
Andrew Hume, AT&T Labs – Research
Summarized by Teri Lampoudi
This was a session on clusters, big data, and resilient computing. Given that the audience was primarily interested in clusters, and not necessarily big data or beating nodes to a pulp, the conversation revolved mainly around what it takes to make a heterogeneous cluster resilient. Hume also presented a FREENIX paper on his current cluster project, which explained in more detail much of what was abstractly claimed in the guru session.
Hume's use of the word "cluster" referred not to what I had assumed would be a Beowulf-type system, but to a loosely coupled farm of machines in which concurrency is much less of an issue than it would be in a Beowulf. In fact, the architecture Hume described was designed to deal with large amounts of independent transaction processing, essentially the process of billing calls, which requires no interprocess message passing or anything of the sort a parallel scientific application might.
Where does the large data come in? Since the transactions in question consist of large numbers of flat files, a mechanism for getting the files onto nodes and off-loading the results is necessary. In this particular cluster, named Ningaui, this task is handled by the "replication manager," which generated a large amount of interest in the audience. All management functions in the cluster, as well as job allocation, constitute services that are bid for and leased out to nodes. Furthermore, failures are handled by simply repeating the failed task until it is completed properly, if that is possible, and if it is not, then looking through detailed logs to discover
what went wrong. Logging and testing for the successful completion of each task are what make this model resilient. Every management operation is logged at some appropriate level of detail, generating 120MB/day, and every single file transfer is MD5-checksummed. This was characterized by Hume as a "patient" style of management, where no one controls the cluster: scheduling behavior is emergent rather than stringently planned, and nodes are free to drop in and out of the cluster. A downside to this model is the increase in latency, offset by the fact that the cluster continues to function unattended outside of 8-to-5 support hours.
In response to questions about the projected scalability of such a scheme, Hume said the system would presumably scale from the present 60 nodes to about 100 without modifications, but that scaling higher would be a matter for further design. The guru session ended on a somewhat unrelated note regarding the problems of getting data off mainframes and onto Linux machines: issues of fixed- vs. variable-length encoding of data blocks resulting in useful information being stripped by FTP, and problems in converting COBOL copybooks to something useful on a C-based architecture. To reiterate Hume's claim, there are homemade solutions for some of these problems, and the reader should feel free to contact Mr. Hume for them.
WRITING PORTABLE APPLICATIONS
Nick Stoughton, MSB Consultants
Summarized by Josh Lothian
Nick Stoughton addressed a concern that developers are facing more and more: writing portable applications. He proposed that there is no such thing as a "portable application," only those that have been ported. Developers are still in search of the Holy Grail of applications programming: a write-once, compile-anywhere program. Languages such as Java are leading up to this, but virtual machines still have to be ported, paths will still be different, etc. Web-based applications are another area that could be promising for portable applications.
Nick went on to talk about the future of portability. The recently released POSIX 2001 standards are expected to help the situation; the 2001 revision expands the original POSIX spec into seven volumes. Even though POSIX 2001 is more specific than the previous release, Nick pointed out that even this standard allows for small differences where vendors may define their own behaviors. This can only hurt developers in the long run. Along these lines, it was pointed out that there may be room for automated tools that have the capability to check application code and determine the level of portability and standards compliance of the source.
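At its simplest, the automated compliance checker floated at the end might scan source for calls outside the standard. This sketch is hypothetical, with deliberately tiny function lists; a real checker would carry the full POSIX 2001 function index and do proper parsing rather than pattern matching:

```python
import re

# Illustrative allow/deny lists, not the real POSIX index.
POSIX_FUNCS = {"open", "read", "write", "close", "fork", "execve"}
NONPORTABLE = {"sendfile", "epoll_wait", "kqueue"}  # platform extensions

CALL = re.compile(r"\b([a-z_][a-z0-9_]*)\s*\(")

def portability_report(source):
    """Return the platform-specific calls found in a C source fragment."""
    calls = set(CALL.findall(source))
    return sorted(calls & NONPORTABLE)

code = "fd = open(path); epoll_wait(ep, evs, 64, -1);"
```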
NETWORK MANAGEMENT SYSTEM
PERFORMANCE TUNING
Jeff Allen, Tellme Networks, Inc.
Summarized by Matt Selsky
Jeff Allen, author of Cricket, discussed network tuning. The basic idea of tuning is: measure, twiddle, and measure again. Cricket can be used for large installations, but it's overkill for smaller installations, for which MRTG is better suited. You don't need Cricket to do measurement; doing something like a shell script that dumps data to Excel is good, but you need to get some sort of measurement. Cricket was not designed for billing; it was meant to help answer questions.
Useful tools and techniques covered included: looking for the difference between machines in order to identify the causes for the differences; measuring, but also thinking about what you're measuring and why; and remembering to use strace and tcpdump to identify problems. Problems repeat themselves; performance tuning requires an intricate understanding of all the layers involved to solve the problem and simplify the automation. (Having a computer science background helps, but is not essential.) If you can reproduce the problem, you then should investigate each piece to find the bottleneck. If you can't reproduce the problem, then measure. You need to understand all the system interactions. Measurement can help determine whether things are actually slow or if the user is imagining it.
Troubleshooting begins with a hunch, but scientific processes are essential. You should be able to determine the event stream, or when each event occurs; having observation logs can help. Some hints provided include checking cron for periodic anomalies, slowing down /bin/rm to avoid I/O overload, and looking for unusual kernel CPU usage.
Also, try to use lightweight monitoring to reduce overhead. You don't need to monitor every resource on every system, but you should monitor those resources on those systems that are essential. Don't check things too often, since you can introduce overhead that way. Lots of information can be gathered from the /proc file-system interface to the kernel.
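Reading counters out of /proc is about as lightweight as monitoring gets. The sketch below parses a captured /proc/meminfo-style sample (values invented) rather than the live file, so it runs on any platform:

```python
# A captured fragment in /proc/meminfo format; on a Linux box you would
# read the live file instead of this constant.
SAMPLE = """MemTotal:  2048000 kB
MemFree:    512000 kB
Buffers:    128000 kB
"""

def parse_meminfo(text):
    """Return {field: kilobytes} from /proc/meminfo-style text."""
    info = {}
    for line in text.splitlines():
        if ":" in line:
            key, rest = line.split(":", 1)
            info[key.strip()] = int(rest.split()[0])
    return info
```

A periodic poll of a couple of such fields, at a modest interval, gives the essential picture without the overhead the session warned about.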
BSD BOF
Host: Kirk McKusick, Author and Consultant
Summarized by Bosko Milekic
The BSD Birds of a Feather session at USENIX started off with five quick and informative talks and ended with an exciting discussion of licensing issues.
Christos Zoulas, for the NetBSD Project, went over the primary goals of the NetBSD Project (namely portability, a clean architecture, and security) and then proceeded to briefly discuss some of the challenges that the project has encountered. The successes and areas for improvement of 1.6 were then examined. NetBSD has recently seen various improvements in its cross-build system, packaging system, and third-party tool support. As for the kernel, a zero-copy TCP/UDP implementation has been integrated, and new pmap
code for arm32 has been introduced, as have some other rather major changes to various subsystems. More information is available in Christos's slides, which are now available at http://www.netbsd.org/gallery/events/usenix2002/.
Theo de Raadt of the OpenBSD Project presented a rundown of various new things that the OpenBSD and OpenSSH teams have been looking at. Theo's talk included an amusing description (and pictures) of some of the events that transpired during OpenBSD's hack-a-thon, which occurred the week before USENIX '02. Finally, Theo mentioned that, following complications with the IPFilter license, the OpenBSD team had done a fairly extensive license audit.
Robert Watson of the FreeBSD Project brought up various examples of FreeBSD being used in the real world. He went on to describe a wide variety of changes that will surface with the upcoming FreeBSD release 5.0. Notably, 5.0 will introduce a re-architecting of the kernel aimed at providing much more scalable support for SMP machines. 5.0 will also include an early implementation of KSE, FreeBSD's version of scheduler activations, large improvements to pccard, and significant framework bits from the TrustedBSD project.
Mike Karels from Wind River Systems presented an overview of how BSD/OS has been evolving following the Wind River Systems acquisition of BSDi's software assets. Ernie Prabhakar from Apple discussed Apple's OS X and its success on the desktop. He explained how OS X aims to bring BSD to the desktop while striving to maintain a strong and stable core, one that is heavily based on BSD technology.
The BoF finished with a discussion on licensing; specifically, some folks questioned the impact of Apple's licensing for code from their core OS (the Darwin Project) on the BSD community as a whole.
CLOSING SESSION
HOW FLIES FLY
Michael H. Dickinson, University of California, Berkeley
Summarized by JD Welch
In this lively and offbeat talk, Dickinson discussed his research into the mechanisms of flight in the fruit fly (Drosophila melanogaster) and related the autonomic behavior to that of a technological system. Insects are the most successful organisms on earth, due in no small part to their early adoption of flight. Flies travel through space on straight trajectories interrupted by saccades, jumpy turning motions somewhat analogous to the fast, jumpy movements of the human eye. Using a variety of monitoring techniques, including high-speed video in a "virtual reality flight arena," Dickinson and his colleagues have observed that flies respond to visual cues to decide when to saccade during flight. For example, more visual texture makes flies saccade earlier (i.e., farther away from the arena wall), described by Dickinson as a "collision-control algorithm."
Through a combination of sensory inputs, flies can make decisions about their flight path. Flies possess a visual "expansion detector" which, at a certain threshold, causes the animal to turn in a certain direction. However, expansion in front of the fly sometimes causes it to land. How does the fly decide? Using the virtual reality device to replicate various visual scenarios, Dickinson observed that the flies fixate on vertical objects that move back and forth across the field. Expansion on the left side of the animal causes it to turn right, and vice versa, while expansion directly in front of the animal triggers the legs to flare out in preparation for landing.
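The decision rules described above can be caricatured as a tiny controller; the threshold and names are invented for illustration, not taken from Dickinson's models:

```python
def fly_response(left_exp, front_exp, right_exp, land_threshold=0.8):
    """Map visual expansion on three sides to a steering decision."""
    if front_exp >= land_threshold:
        return "extend legs to land"   # strong frontal expansion: land
    if left_exp > right_exp:
        return "turn right"            # expansion on the left steers right
    if right_exp > left_exp:
        return "turn left"             # and vice versa
    return "fly straight"
```

Rules of exactly this shape are what make the behavior attractive as a cheap obstacle-avoidance policy for autonomous vehicles, as noted at the end of the talk.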
If the eyes detect these changes, how are the responses implemented? The flies' wings have "power muscles" controlled by mechanical resonance (as opposed to the nervous system), which drive the
wings, combined with neurally activated "steering muscles" which change the configuration of the wing joints. Subtle variations in the timing of impulses correspond to changes in wing movement, controlled by sensors in the "wing-pit." A small wing-like structure, the haltere, controls equilibrium by beating constantly.
Voluntary control is accomplished by the halteres, whose signals can interfere with the autonomic control of the stroke cycle. The halteres have "steering muscles" as well, and information derived from the visual system can turn off the haltere or trick it into a "virtual" problem requiring a response.
Dickinson has also studied the aerodynamics of insect flight using a device called the "Robo Fly," an oversized mechanical insect wing suspended in a large tank of mineral oil. Interestingly, Dickinson observed that the average lift generated by a fly's wings is less than its body weight; the flies use three mechanisms to overcome this, including rotating the wings (rotational lift).
Insects are extraordinarily robust creatures, and because Dickinson analyzed the problem in a systems-oriented way, these observations and analysis are immediately applicable to technology; the response system can be used as an efficient search algorithm for control systems in autonomous vehicles, for example.
THE AFS WORKSHOP
Summarized by Garry Zacheiss, MIT
Love Hornquist-Astrand of the Arla development team presented the Arla status report. Arla 0.35.8 has been released. Scheduled for release soon are improved support for Tru64 UNIX, MacOS X, and FreeBSD; improved volume handling; and implementation of more of the vos/pts subcommands. It was stressed that MacOS X is considered an important platform, and that a GUI configuration manager for the cache
manager, a GUI ACL manager for the Finder, and a graphical login that obtains Kerberos tickets and AFS tokens at login time were all under development.
Future goals planned for the underlying AFS protocols include GSSAPI/SPNEGO support for Rx, performance improvements to Rx, an enhanced disconnected mode, and IPv6 support for Rx; an experimental patch is already available for the latter. Future Arla-specific goals include improved performance, partial file reading, and increased stability on several platforms. Work is also in progress on the RXGSS protocol extensions (integrating GSSAPI into Rx); a partial implementation exists, and work continues as developers find time.
Derrick Brashear of the OpenAFS development team presented the OpenAFS status report. OpenAFS 1.2.5 was released immediately prior to the conference; it fixed a remotely exploitable denial-of-service attack in several OpenAFS platforms, most notably IRIX and AIX. Future work planned for OpenAFS includes better support for MacOS X, including working around Finder interaction issues. Better support for the BSDs is also planned; FreeBSD has a partially implemented client, while NetBSD and OpenBSD have only server programs available right now. AIX 5, Tru64 5.1A, and MacOS X 10.2 (a.k.a. Jaguar) are all planned for the future. Other planned enhancements include nested groups in the ptserver (code donated by the University of Michigan awaits integration), disconnected AFS, and further work on a native Windows client. Derrick stated that the guiding principles of OpenAFS were to maintain compatibility with IBM AFS, support new platforms, ease the administrative burden, and add new functionality.
AFS performance was discussed. The openafs.org cell consists of two file servers, one in Stockholm, Sweden, and one in Pittsburgh, PA. AFS works reasonably well over trans-Atlantic links,
although OpenAFS clients don't determine which file server to talk to very effectively. Arla clients use RTTs to the server to determine the optimal file server to fetch replicated data from; modifications to OpenAFS to support this behavior in the future are desired.
Jimmy Engelbrecht and Harald Barth of KTH discussed their AFSCrawler script, written to determine how many AFS cells and clients were in the world, what implementations/versions they were (Arla vs. IBM AFS vs. OpenAFS), and how much data was in AFS. The script unfortunately triggered a bug in IBM AFS 3.6-derived code, causing some clients to panic while handling a specific RPC. This has since been fixed in OpenAFS 1.2.5 and the most recent IBM AFS patch level, and all AFS users are strongly encouraged to upgrade. No release of Arla is vulnerable to this particular denial-of-service attack. There was an extended discussion of the usefulness of this exploration; many sites believed this was useful information and such scanning should continue in the future, but only on an opt-in basis.
Many sites face the problem of managing Kerberos/AFS credentials for batch-scheduled jobs. Specifically, most batch processing software needs to be modified to forward tickets as part of the batch submission process, renew tickets and tokens while the job is in the queue and for the lifetime of the job, and properly destroy credentials when the job completes. Ken Hornstein of NRL was able to pay a commercial vendor to support Kerberos 4/5 credential management in their product, although they did not implement AFS token management. MIT has implemented some of the desired functionality in OpenPBS and might be able to make it available to other interested sites.
Tools to simplify AFS administration were discussed, including:
AFS Balancer: a tool written by CMU to automate the process of balancing disk usage across all servers in a cell. Available from ftp://ftp.andrew.cmu.edu/pub/AFS-Tools/balance-1.1b.tar.gz.

Themis: KTH's enhanced version of the AFS tool "package" for updating files on local disk from a central AFS image. KTH's enhancements include allowing the deletion of files, simplifying the process of adding a file, and allowing the merging of multiple rule sets for determining which files are updated. Themis is available from the Arla CVS repository.
Stanford was presented as an example of a large AFS site. Stanford's AFS usage consists of approximately 1.4TB of data in AFS, in the form of approximately 100,000 volumes; 3.3TB of storage is available in their primary cell, ir.stanford.edu. Their file servers consist entirely of Solaris machines running a combination of Transarc 3.6 patch-level 3 and OpenAFS 1.2.x, while their database servers run OpenAFS 1.2.2. Their cell consists of 25 file servers using a combination of EMC and Sun StorEdge hardware. Stanford continues to use the kaserver for their authentication infrastructure, with future plans to migrate entirely to an MIT Kerberos 5 KDC.
Stanford has approximately 3400 clients on their campus, not including SLAC (Stanford Linear Accelerator); approximately 2100 AFS clients from outside Stanford contact their cell every month. Their supported clients are almost entirely IBM AFS 3.6 clients, although they plan to release OpenAFS clients soon. Stanford currently supports only UNIX clients. There is some on-campus presence of Windows clients, but Stanford has never publicly released or supported it. They do intend to release and support the MacOS X client in the near future.
All Stanford students, faculty, and staff are assigned AFS home directories, with a default quota of 50MB, for a total of approximately 550GB of user home directories. Other significant uses of AFS
storage are data volumes for workstation software (400GB) and volumes for course Web sites and assignments (100GB).
AFS usage at Intel was also presented. Intel has been an AFS site since 1994. They had bad experiences with the IBM 3.5 Linux client; their experience with OpenAFS on Linux 2.4.x kernels has been much better. They use and are satisfied with the OpenAFS IA64 Linux port. Intel has hundreds of OpenAFS 1.2.3 and 1.2.4 clients in many production cells accessing data stored on IBM AFS file servers; they have not encountered any interoperability issues. Intel has some concerns about OpenAFS: they would like to purchase commercial support for OpenAFS and to see OpenAFS support for HP-UX on both PA-RISC and Itanium hardware. HP-UX support is currently unavailable due to a specific HP-UX header file being unavailable from HP, though this may be available soon. Intel has not yet committed to migrating their file servers to OpenAFS and are unsure if they will do so without commercial support.
Backups are a traditional topic of discussion at AFS workshops, and this time was no exception. Many users complain that the traditional AFS backup tools ("backup" and "butc") are complex and difficult to automate, requiring many home-grown scripts and much user intervention for error recovery. An additional complaint was that the traditional AFS tools do not support file- or directory-level backups and restores; data must be backed up and restored at the volume level.
Mitch Collinsworth of Cornell presented work to make AMANDA, the free backup software from the University of Maryland, suitable for AFS backups. Using AMANDA for AFS backups allows one to share AFS backup tapes with non-AFS backups, easily run multiple backups in parallel, automate error recovery, and provide a robust degraded mode that prevents tape errors from
stopping backups altogether. Their implementation allows for full volume restores as well as individual directory and file restores. They have finished coding this work and are in the process of testing and documenting it.
Peter Honeyman of CITI at the University of Michigan spoke about work he has proposed to replace Rx with RPCSEC GSS in OpenAFS. This would allow AFS to use a TCP-based transport mechanism rather than the UDP-based Rx, and possibly gain better congestion control, dynamic adaptation, and fragmentation avoidance as a result. RPCSEC GSS uses the GSSAPI to authenticate Sun ONC RPC. RPCSEC GSS is transport agnostic, provides strong security, is a developing Internet standard, and has multiple open source implementations. Backward compatibility with existing AFS servers and clients is an important goal of this project.
One interesting example of an application that currently is capable of migration and replication is Emacs. To create a new frame on DISPLAY, try "M-x make-frame-on-display <Return> DISPLAY <Return>".
However, technical challenges make replication and migration difficult in general. These include the "major headaches" of server-side fonts, the nonuniformity of X servers and screen sizes, and the need to appropriately retrofit toolkits. Authentication and authorization issues are obviously also important. The rest of the talk delved into some of the details of these interesting technical challenges.
HACKING IN THE KERNEL
Summarized by Hai Huang
AN IMPLEMENTATION OF SCHEDULER ACTIVATIONS ON THE NETBSD OPERATING SYSTEM
Nathan J. Williams, Wasabi Systems
Scheduler activation is an old idea. Basically, there are benefits and drawbacks to using solely kernel-level or solely user-level threading; scheduler activations combine the two layers of control to provide more concurrency in the system.
In his talk, Nathan gave a fairly detailed description of the implementation of scheduler activations in the NetBSD kernel. One important change in the implementation is to differentiate the thread context from the process context. This is done by defining a separate data structure for these threads and relocating some of the information that was embedded in the process context to these thread contexts. The stack was a particular concern because of the upcall: special handling must be done to make sure that the upcall doesn't disturb the stack, so that the preempted user-level thread can continue afterwards. Lastly, Nathan explained that signals are handled by upcalls.
78 Vol 27 No 5 login
AUTHORIZATION AND CHARGING IN PUBLIC WLANS USING FREEBSD AND 802.1X
Pekka Nikander, Ericsson Research NomadicLab
The 802.1x standards are well known in the wireless community as link-layer authentication protocols. In this talk, Pekka explained some novel ways of using the 802.1x protocols that might be of interest to people on the move. It is possible to set up a public WLAN that would support various charging schemes via virtual tokens, which people can purchase or earn and later use.
This is implemented on FreeBSD using the netgraph facility. It is basically a filter in the link layer that differentiates traffic based on the MAC address of the client node, which is either authenticated, denied, or let through. The overhead for this service is fairly minimal.
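The paper's filter is a FreeBSD netgraph node running in the kernel; as a rough illustration only, the classification decision it makes per frame might look like this (the MAC addresses and the redirect behavior for unknown stations are invented for the example):

```python
# Illustrative sketch of a link-layer filter classifying frames by
# source MAC address. The real implementation is a netgraph node in
# the FreeBSD kernel, not Python.

AUTHENTICATED = {"00:11:22:33:44:55"}   # stations holding a valid token
DENIED = {"de:ad:be:ef:00:01"}          # explicitly blocked stations

def classify(src_mac):
    """Return what the filter does with a frame from src_mac."""
    if src_mac in DENIED:
        return "drop"
    if src_mac in AUTHENTICATED:
        return "pass"
    # Unknown stations get redirected so 802.1x authentication
    # (or token purchase) can take place first.
    return "redirect-to-authenticator"
```

The point is that a single set lookup per frame keeps the per-packet overhead minimal, matching the talk's claim.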
ACPI IMPLEMENTATION ON FREEBSD
Takanori Watanabe, Kobe University
ACPI (Advanced Configuration and Power Interface) was proposed as a joint effort by Intel, Toshiba, and Microsoft to provide a standard, finer-grained method of managing the power states of individual devices within a system. Such low-level power management is especially important for mobile and embedded systems that are powered by fixed-capacity batteries.
Takanori explained that the ACPI specification is composed of three parts: tables, BIOS, and registers. He was able to implement some of the functionality of ACPI in a FreeBSD kernel. The ACPI Component Architecture, implemented by Intel, provides a high-level ACPI API to the operating system; Takanori's ACPI implementation is built upon this underlying layer of APIs.
ACCESS CONTROL
Summarized by Florian Buchholz
DESIGN AND PERFORMANCE OF THE OPENBSD STATEFUL PACKET FILTER (PF)
Daniel Hartmeier, Systor AG
Daniel Hartmeier described the new stateful packet filter (pf) that replaced IPFilter in the OpenBSD 3.0 release. IPFilter could no longer be included due to licensing issues, and thus there was a need to write a new filter making use of optimized data structures.
The filter rules are implemented as a linked list which is traversed from start to end. Two actions may be taken according to the rules, "pass" or "block": a "pass" action forwards the packet, and a "block" action drops it. Where more than one rule matches a packet, the last rule wins. Rules that are marked as "final" immediately terminate any further rule evaluation. An optimization called "skip-steps" was also implemented, in which blocks of similar rules are skipped if they cannot match the current packet; these skip-steps are calculated when the rule set is loaded. Furthermore, a state table keeps track of TCP connections, and only packets that match the sequence numbers are allowed. UDP and ICMP queries and replies are also considered in the state table, where initial packets create an entry for a pseudo-connection with a low initial timeout value. The state table is implemented using a balanced binary search tree. NAT mappings are also stored in the state table, whereas application proxies reside in user space. The packet filter is also able to perform fragment reassembly and to modulate TCP sequence numbers to protect hosts behind the firewall.
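The last-match-wins evaluation with "final" rules and skip-steps can be sketched as follows. This is a toy, not pf's code: the rule format is invented, the default action is assumed to be "pass", and the real pf precomputes skip-steps over several packet fields, not just the protocol as here.

```python
# Toy sketch of pf-style rule evaluation: last matching rule wins,
# "final" rules stop evaluation early, and precomputed skip-steps
# jump over runs of rules that share a field the packet cannot match.

def build_skip_steps(rules):
    """skip[i] = number of consecutive rules starting at i that share
    rule i's 'proto' field, so one mismatch can skip them all."""
    skip = [1] * len(rules)
    for i in range(len(rules) - 2, -1, -1):
        if rules[i]["proto"] == rules[i + 1]["proto"]:
            skip[i] = skip[i + 1] + 1
    return skip

def evaluate(rules, skip, packet):
    action = "pass"                      # assumed default policy
    i = 0
    while i < len(rules):
        r = rules[i]
        if r["proto"] != packet["proto"]:
            i += skip[i]                 # whole same-proto block can't match
            continue
        if r.get("port") in (None, packet["port"]):
            action = r["action"]         # last match wins...
            if r.get("final"):
                break                    # ...unless the rule is final
        i += 1
    return action
```

Skip-steps do not change which rule wins; they only avoid inspecting rules whose shared field already rules them out, which is why they can be computed once at rule-load time.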
Pf was compared against IPFilter as well as Linux's iptables by measuring throughput and latency with increasing traffic rates and different packet sizes. In a test with a fixed rule set size of 100, iptables outperformed the other two filters, whose results were close to each other. In a second test, where the rule set size was continually increased, iptables consistently had about twice the throughput of the other two (which evaluate the rule set on both the incoming and outgoing interfaces). A third test compared only pf and IPFilter, using a single rule that created state in the state table, with a fixed state-entry size; pf reached an overloaded state much later than IPFilter. The experiment was repeated with a variable state-entry size, and pf performed much better than IPFilter for a small number of states.
In general, rule set evaluation is expensive, and benchmarks only reflect extreme cases, whereas in real life other behavior should be observed. Furthermore, the benchmarks show that stateful filtering can actually improve performance, since cheap state-table lookups replace expensive rule evaluation.
ENHANCING NFS CROSS-ADMINISTRATIVE DOMAIN ACCESS
Joseph Spadavecchia and Erez Zadok, Stony Brook University
The speaker presented modifications to an NFS server that allow improved NFS access between administrative domains. The problem lies in the fact that NFS assumes a shared UID/GID space, which makes it unsafe to export files outside the administrative domain of the server. Design goals were to leave the protocol and existing clients unchanged, require a minimal amount of server changes, and provide flexibility and increased performance.
To solve the problem, two techniques are utilized: "range-mapping" and "file-cloaking." Range-mapping maps IDs between client and server. The mapping is performed on a per-export basis and has to be manually set up in an export file; the mappings can be 1-1, N-N, or N-1. In file-cloaking, the server restricts file access based on UID/GID and special cloaking-mask bits, so users can only access their own files' permissions. The policy on whether or not the file is visible to others is set by the cloaking mask, which is logically ANDed with the
file's protection bits. File-cloaking only works, however, if the client doesn't hold cached copies of directory contents and file attributes. Because of this, the clients are forced to re-read directories, by incrementing the mtime value of the directory each time it is listed.
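The two techniques described above can be sketched in a few lines. This is only an illustration of the idea, not the authors' server code: the range-table layout and function names are invented, and the real server applies these checks inside the NFS request path.

```python
# Hedged sketch of range-mapping and file-cloaking (illustrative only).

def range_map(uid, ranges):
    """Map a client UID into the server's ID space using per-export
    ranges of the form (client_start, server_start, length)."""
    for c_start, s_start, length in ranges:
        if c_start <= uid < c_start + length:
            return s_start + (uid - c_start)
    return None  # unmapped IDs get no access

def visible_mode(file_uid, file_mode, caller_uid, cloak_mask):
    """File-cloaking: the owner sees the true permission bits, while
    everyone else sees them ANDed with the cloaking mask. A mask of 0
    effectively hides the file from non-owners."""
    if caller_uid == file_uid:
        return file_mode
    return file_mode & cloak_mask
```

As the audience discussion below points out, this visibility control depends on clients actually re-fetching directory data, which is why the server bumps the directory mtime on every listing.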
To test the performance of the modified server, five different NFS configurations were evaluated: an unmodified NFS server was compared against one with the modified code included but not used, one with only range-mapping enabled, one with only file-cloaking enabled, and one with all modifications enabled. For each setup, different file system benchmarks were run. The results show only a small overhead when the modifications are used, generally an increase of below 5%. Another experiment tested the performance of the system with an increasing number of mapped or cloaked entries; an increase from 10 to 1000 entries resulted in a maximum performance cost of about 14%.
One member of the audience pointed out that if clients choose to ignore the changed mtimes from the server, and thus still hold caches of the directory entries, the file-cloaking mechanism could be defeated. After a rather lengthy debate, the speaker had to concede that the model doesn't add any extra security. Another question was asked about the scalability of the range-mapping setup; the speaker referred to application-level tools that could be developed for that purpose.
The software is available at ftp://ftp.fsl.cs.sunysb.edu/pub/enf.
ENGINEERING OPEN SOURCE SOFTWARE
Summarized by Teri Lampoudi
NINGAUI: A LINUX CLUSTER FOR BUSINESS
Andrew Hume, AT&T Labs-Research; Scott Daniels, Electronic Data Systems Corp.
Ningaui is a general-purpose, highly available, resilient architecture built from commodity software and hardware. Emphatically, however, Ningaui is not a Beowulf. Hume calls the cluster design the "Swiss canton model," in which there are a number of loosely affiliated, independent nodes with data replicated among them. Jobs are assigned by bidding and leases, and cluster services done as session-based servers are sited via generic job assignment. The emphasis is on keeping the architecture end-to-end, checking all work via checksums, and logging everything. The resilience and high availability required by their goal of 8-to-5 maintenance (vs. the typical 24/7 model, where people get paged whenever the slightest thing goes wrong, regardless of the time of day) is achieved by job restartability. Finally, all computation is performed on local data, without the use of NFS or network-attached storage.
Hume's message is a hopeful one: despite the many problems encountered (things like kernel and service limits, auto-installation problems, TCP storms, and the like), the existence of source code and the paranoid practice of logging and checksumming everything has helped. The final product performs reasonably well, and it appears that the resilience techniques employed do make a difference. One drawback, however, is that the software mentioned in the paper is not yet available for download.
CPCMS: A CONFIGURATION MANAGEMENT SYSTEM BASED ON CRYPTOGRAPHIC NAMES
Jonathan S. Shapiro and John Vanderburgh, Johns Hopkins University
This paper won the FREENIX Best Paper award.
The basic notion behind the project is the fact that everyone has a pet complaint about CVS, and yet it is currently the configuration manager in most widespread use. Shapiro has unleashed an alternative. Interestingly, he did not begin but ended with the usual host of reasons why CVS is bad. The talk instead began abruptly by characterizing the job of a software configuration manager, continued by stating the namespaces it must handle, and wrapped up with the challenges faced.
X MEETS Z: VERIFYING CORRECTNESS IN THE PRESENCE OF POSIX THREADS
Bart Massey, Portland State University; Robert T. Bauer, Rational Software Corp.
Massey delivered a humorous talk on the insight gained from applying the Z formal specification notation to system software design, rather than the more informal analysis and design process normally used.
The story is told with respect to writing XCB, which replaces the Xlib protocol layer and is supposed to be thread-friendly. But where threads are concerned, deadlock avoidance becomes a hard problem that cannot be solved in an ad hoc manner, while full model checking is also too hard. In this case, Massey resorted to Z specification to model the XCB lower layer, abstract away locking and data transfers, and locate fundamental issues. The difficulties of searching the literature and locating information relevant to the problem at hand were thereby overcome. As Massey put it, "the formal method saved the day."
FILE SYSTEMS
Summarized by Bosko Milekic
PLANNED EXTENSIONS TO THE LINUX EXT2/EXT3 FILESYSTEM
Theodore Y. Ts'o, IBM; Stephen Tweedie, Red Hat
The speaker presented improvements to the Linux Ext2 file system, with the goal of allowing for various expansions while striving to maintain compatibility with older code. Improvements have been facilitated by a few extra superblock fields that were added to Ext2 not long before the Linux 2.0 kernel was released. The fields allow file system features to be added without compromising the existing setup; this is done by providing bits indicating the impact of the added features with respect to compatibility. Namely, a file system with the "incompat" bit marked is not allowed to be mounted, while a "read-only" marking would only allow the file system to be mounted read-only.
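The compatibility-bitmap idea can be sketched as follows. The flag names and values here are invented for illustration; the real Ext2 superblock defines three separate bitmaps (compat, ro_compat, and incompat) with its own flag assignments.

```python
# Illustrative sketch of Ext2-style feature compatibility bits.
# "compat" features can always be ignored; "ro_compat" features are
# safe only for read-only mounts; "incompat" features must be
# understood or the mount is refused.

FEATURE_RO_COMPAT_LARGE_FILE = 0x02   # invented flag values
FEATURE_INCOMPAT_COMPRESSION = 0x04

SUPPORTED_INCOMPAT = 0x00             # this "kernel" knows no incompat features
SUPPORTED_RO_COMPAT = FEATURE_RO_COMPAT_LARGE_FILE

def can_mount(incompat, ro_compat, read_only):
    """Decide whether a filesystem with these feature bitmaps may be
    mounted by a kernel with the SUPPORTED_* capabilities above."""
    if incompat & ~SUPPORTED_INCOMPAT:
        return False        # unknown incompat feature: refuse entirely
    if not read_only and (ro_compat & ~SUPPORTED_RO_COMPAT):
        return False        # unknown ro-compat feature: read-only mounts only
    return True
```

The design means old kernels degrade gracefully: a new feature only blocks mounting to the degree that ignoring it would corrupt data.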
Directory indexing replaces linear directory searches with a faster search using a fixed-depth tree and hashed keys. File system size can be dynamically increased, and the expanded inode, doubled from 128 to 256 bytes, allows for more extensions.
Other potential improvements were discussed as well, in particular pre-allocation for contiguous files, which allows for better performance in certain setups by pre-allocating contiguous blocks. Security-related modifications, extended attributes, and ACLs were mentioned. An implementation of these features already exists but has not yet been merged into the mainline Ext2/3 code.
RECENT FILESYSTEM OPTIMISATIONS ON FREEBSD
Ian Dowse, Corvil Networks; David Malone, CNRI, Dublin Institute of Technology
David Malone presented four important file system optimizations for the FreeBSD OS: soft updates, dirpref, vmiodir, and dirhash. It turns out that certain combinations of the optimizations (beautifully illustrated in the paper) may yield performance improvements of anywhere between 2 and 10 orders of magnitude for real-world applications.
All four of the optimizations deal with file system metadata. Soft updates allow for asynchronous metadata updates. Dirpref changes the way directories are organized, attempting to place child directories closer to their parents, thereby increasing locality of reference and reducing disk-seek times. Vmiodir trades some extra memory for better directory caching. Finally, dirhash, which was implemented by Ian Dowse, changes the way in which entries are searched in a directory. Specifically, FFS uses a linear search to find an entry by name; dirhash instead builds a hashtable of directory entries on the fly. For a directory of size n with a working set of m files, a search that in certain cases could have been O(n*m) has been reduced, thanks to dirhash, to effectively O(n + m).
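The complexity claim above can be made concrete with a toy version of the idea (this is not the FFS code; dirhash lives in the kernel and hashes on-disk directory blocks):

```python
# Toy illustration of dirhash: build a hash table over a directory's
# entries once, then answer each name lookup in O(1), so m lookups
# cost O(n + m) instead of the O(n * m) of repeated linear scans.

class Directory:
    def __init__(self, entries):
        self.entries = list(entries)   # on-disk order, scanned linearly
        self._index = None             # built lazily, like dirhash

    def lookup_linear(self, name):
        """What plain FFS does: O(n) scan per lookup."""
        for i, entry in enumerate(self.entries):
            if entry == name:
                return i
        return -1

    def lookup_hashed(self, name):
        """Dirhash-style lookup: one O(n) pass builds the table,
        then each lookup is O(1)."""
        if self._index is None:
            self._index = {e: i for i, e in enumerate(self.entries)}
        return self._index.get(name, -1)
```

Building the table "on the fly" on first access is what lets dirhash help exactly the workloads that hurt most: large directories hit repeatedly.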
FILESYSTEM PERFORMANCE AND SCALABILITY IN LINUX 2.4.17
Ray Bryant, SGI; Ruth Forester, IBM LTC; John Hawkes, SGI
This talk focused on performance evaluation of a number of file systems available and commonly deployed on Linux machines. Comparisons under various configurations of Ext2, Ext3, ReiserFS, XFS, and JFS were presented.
The benchmarks chosen for the data gathering were pgmeter, filemark, and the AIM Benchmark Suite VII. Pgmeter measures the rate of data transfer of reads/writes of a file under a synthetic workload. Filemark is similar to postmark in that it is an operation-intensive benchmark, although filemark is threaded and offers various other features that postmark lacks. AIM VII measures performance for various file-system-related functionalities; it offers various metrics under an imposed workload, thus stressing the performance of the file system not only under I/O load but also under significant CPU load.
Tests were run on three different setups: a small, a medium, and a large configuration. ReiserFS and Ext2 appear at the top of the pile for small and medium setups. Notably, XFS and JFS perform worse for smaller system configurations than the others, although XFS clearly appears to generate better numbers than JFS. It should be noted that XFS seems to scale well under a higher load; this was most evident in the large-system results, where XFS appears to offer the best overall results.
THINGS TO THINK ABOUT
Summarized by Bosko Milekic
SPEEDING UP KERNEL SCHEDULER BY REDUCING CACHE MISSES
Shuji Yamamura, Akira Hirai, Mitsuru Sato, Masao Yamamoto, Akira Naruse, and Kouichi Kumon, Fujitsu Labs
This was an interesting talk pertaining to the effects of cache coloring for task structures in the Linux kernel scheduler (Linux kernel 2.4.x). The speaker first presented some interesting benchmark numbers for the Linux scheduler, showing that as the number of processes on the task queue increased, performance decreased. The authors used some really nifty hardware to measure the number of bus transactions throughout their tests and were thus able to reasonably quantify the impact that cache misses had in the Linux scheduler.
Their experiments led them to implement a cache coloring scheme for task structures, which were previously aligned on 8KB boundaries and therefore were eventually mapped to the same cache lines. This unfortunate placement of task structures in memory induced a significant number of cache misses as the number of tasks grew in the scheduler.
The implemented solution consisted of aligning task structures to more evenly distribute cached entries across the L2 cache. The result was inevitably fewer cache misses in the scheduler. Some negative effects were observed in certain situations, primarily due to more cache slots being used by task-structure data in the scheduler, thus forcing data previously cached there to be pushed out.
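A back-of-the-envelope model shows why 8KB-aligned structures collide and how a per-task "color" offset spreads them out. The cache geometry below is invented for illustration (a 64KB direct-mapped cache, rather than the set-associative L2 of the paper's hardware), and the coloring scheme is a simplification of what the authors implemented.

```python
# Sketch: map 8KB-aligned task-structure addresses to cache sets,
# with and without a per-task color offset. Cache parameters invented.

CACHE_SIZE = 64 * 1024               # 64KB, direct-mapped for simplicity
LINE_SIZE = 64
NUM_SETS = CACHE_SIZE // LINE_SIZE

def cache_set(addr):
    return (addr // LINE_SIZE) % NUM_SETS

# Without coloring: every task structure starts on an 8KB boundary,
# so 64 tasks pile onto the same few cache sets.
plain = {cache_set(i * 8192) for i in range(64)}

# With coloring: offset task i by (i % COLORS) cache lines, so the
# start addresses spread over more distinct sets.
COLORS = 16
colored = {cache_set(i * 8192 + (i % COLORS) * LINE_SIZE)
           for i in range(64)}

print(len(plain), len(colored))  # colored covers more distinct sets
```

The same model also hints at the downside the speaker mentioned: spreading task structures over more sets means they evict more of whatever else was cached there.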
WORK-IN-PROGRESS REPORTS
Summarized by Brennan Reynolds
RESOURCE VIRTUALIZATION TECHNIQUES FOR WIDE-AREA OVERLAY NETWORKS
Kartik Gopalan, Stony Brook University
This work addressed the issue of provisioning a maximum number of virtual overlay networks (VONs) with diverse quality of service (QoS) requirements on a single physical network. Each of the VONs is logically isolated from the others to ensure the QoS requirements. Gopalan mentioned several critical research issues with this problem that are currently being investigated. Dealing with how to provision the network at various levels (link, route, or path) and then enforce the provisioning at run-time is one of the toughest challenges. Currently, Gopalan has developed several algorithms to handle admission control, end-to-end QoS, route selection, scheduling, and fault tolerance in the network.
For more information, visit http://www.ecsl.cs.sunysb.edu/.
VISUALIZING SOFTWARE INSTABILITY
Jennifer Bevan, University of California, Santa Cruz
Detection of instability in software has typically been an afterthought. The point in the development cycle when the software is reviewed for instability usually comes after it is difficult and costly to go back and perform major modifications to the code base. Bevan has developed a technique to allow the visualization of unstable regions of code that can be used much earlier in the development cycle. Her technique creates a time series of dependence graphs that include clusters and lines called fault lines. From the graphs, a developer is able to easily determine where the unstable sections of code are and proactively restructure them. She is currently working on a prototype implementation.
For more information, visit http://www.cse.ucsc.edu/~jbevan/evo_viz/.
RELIABLE AND SCALABLE PEER-TO-PEER WEB DOCUMENT SHARING
Li Xiao, College of William and Mary
The idea presented by Xiao would allow end users to share the contents of their Web browser caches with neighboring Internet users. The rationale for this is that today's end users are increasingly connected to the Internet over high-speed links, and browser caches are becoming large storage systems. Therefore, if an individual accesses a page which does not exist in their local cache, Xiao suggests that they first query other end users for the content before trying to access the machine hosting the original. This strategy does have some serious problems associated with it that still need to be addressed, including ensuring the integrity of the content and protecting the identity and privacy of the end users.
SEREL: FAST BOOTING FOR UNIX
Leni Mayo, Fast Boot Software
Serel is a tool that generates a visual representation of a UNIX machine's boot-up sequence. It can be used to identify the critical path and can show whether a particular service or process blocks for an extended period of time. This information could be used to determine where the largest performance gains could be realized by tuning the order of execution at boot-up. Serel creates a dependency graph, expressed in XML, during boot-up; this graph is then used to create the visual representation. Currently the tool only works on POSIX-compliant systems, but Mayo stated that he would be porting it to other platforms. Other extensions that were mentioned included having the metadata cover the use of shared libraries and monitoring the suspend/resume sequence of portable machines.
For more information, visit http://www.fastboot.org/.
BERKELEY DB XML
John Merrells, Sleepycat Software
Merrells gave a quick introduction and overview of the new XML library for Berkeley DB. The library specializes in storage and retrieval of XML content, queried via XPath. It allows for multiple containers per document and stores everything natively as XML. The user is also given a wide range of elements to create indices with, including edges, elements, text strings, or presence. The XPath component consists of a query parser, generator, optimizer, and execution engine. To conclude his presentation, Merrells gave a live demonstration of the software.
For more information, visit http://www.sleepycat.com/xml/.
CLUSTER-ON-DEMAND (COD)
Justin Moore, Duke University
Modern clusters are growing at a rapid rate; many have pushed beyond the 5000-machine mark, and deploying them results in large expenses as well as management and provisioning issues. Furthermore, if the cluster is "rented" out to various users, it is very time-consuming to configure it to a user's specs, regardless of how long they need to use it. The COD work presented creates dynamic virtual clusters within a given physical cluster. The goal was a provisioning tool that would automatically select a chunk of available nodes and install the operating system and middleware specified by the customer in a short period of time. This would allow greater use of resources, since multiple virtual clusters can exist at once, and the size of a virtual cluster can be changed dynamically. Moore stated that they have created a working prototype and are currently testing and benchmarking its performance.
For more information, visit http://www.cs.duke.edu/~justin/cod/.
CATACOMB
Elias Sinderson, University of California, Santa Cruz
Catacomb is a project to develop a database-backed DASL module for the Apache Web server, designed as a replacement for the existing WebDAV module's storage. The initial release of the module only contains support for a MySQL database but could be extended to others. Sinderson briefly touched on the performance of the module: it was comparable to the mod_dav Apache module for all query types but search. The presentation concluded with remarks about adding support for the LOCK method and including ACL specifications in the future.
For more information, visit http://ocean.cse.ucsc.edu/catacomb/.
SELF-ORGANIZING STORAGE
Dan Ellard, Harvard University
Ellard's presentation introduced a storage system that tunes itself based on the workload of the system, without requiring the intervention of the user. The intelligence is implemented as a virtual self-organizing disk that resides below any file system. The virtual disk observes the access patterns exhibited by the system and then attempts to predict what information will be accessed next. An experiment was done using an NFS trace from one of the email servers at a large ISP. Ellard's self-organizing storage system worked well, which he attributes to the fact that most of the files being requested were large email boxes. Areas of future work include exploration of the length and detail of the data-collection stage, as well as the CPU impact of running the virtual disk layer.
VISUAL DEBUGGING
John Costigan, Virginia Tech
Costigan feels that the state of current debugging facilities in the UNIX world is not as good as it should be. He proposes the addition of several elements to programs like ddd and gdb. The first addition is separate heap and stack components. Another is to display only certain fields of a data structure, with the ability to zoom in and out if needed. Finally, the debugger should provide the ability to visually present complex data structures, including linked lists, trees, etc. He said that a beta version of a debugger with these abilities is currently available.
For more information, visit http://infovis.cs.vt.edu/datastruct/.
IMPROVING APPLICATION PERFORMANCE THROUGH SYSTEM-CALL COMPOSITION
Amit Purohit, Stony Brook University
Web servers perform a huge number of context switches and internal data copies during normal operation. These two elements can drastically limit the performance of an application, regardless of the hardware platform it is run on. The Compound System Call (CoSy) framework is an attempt to reduce the performance penalty of context switches and data copies. It includes a user-level library and several kernel facilities that can be used via system calls; the library provides the programmer with a complete set of memory-management functions. Performance tests using the CoSy framework showed a large saving for context switches and data copies. The only area where the savings between conventional libraries and CoSy were negligible was for very large file copies.
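CoSy itself is not shown here, but the kind of win it targets can be illustrated with a composed primitive that already exists in standard Unix: `writev()` submits several buffers in one system call, where a naive loop would pay one user/kernel crossing per buffer.

```python
# Illustration of batching work into one system call (the general idea
# CoSy exploits), using os.writev rather than CoSy's own API.

import os
import tempfile

chunks = [b"header\n", b"body\n", b"trailer\n"]

fd, path = tempfile.mkstemp()
try:
    # One crossing into the kernel writes all three buffers:
    written = os.writev(fd, chunks)
    assert written == sum(len(c) for c in chunks)

    os.lseek(fd, 0, os.SEEK_SET)
    data = os.read(fd, 1024)
finally:
    os.close(fd)
    os.remove(path)

print(data)  # b'header\nbody\ntrailer\n'
```

CoSy generalizes this further by letting the programmer compose arbitrary sequences of calls that run in the kernel, which is why its gains shrink for very large copies: there, the data transfer itself, not the per-call overhead, dominates.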
For more information, visit http://www.cs.sunysb.edu/~purohit/.
ELASTIC QUOTAS
John Oscar, Columbia University
Oscar began by stating that most resources are flexible, but that to date disks have not been. While approximately 80 percent of files are short-lived, disk quotas are hard limits imposed by administrators. The idea behind elastic quotas is to have a non-permanent storage area that each user can temporarily use. Oscar suggested the creation of an ehome directory structure to be used in deploying elastic quotas. Currently he has implemented elastic quotas as a stackable file system.

There were also several trends apparent in the data. Many users would ssh to their remote host (good), but then launch an email reader on the local machine, which would connect via POP to the same host and send the password in clear text (bad). This situation can easily be remedied by tunneling protocols like POP through SSH, but it appeared that many people were not aware this could be done. While most of his comments on the use of the wireless network were negative, the list of passwords he had collected showed that people were indeed using strong passwords. His recommendations were to educate and encourage people to use protocols like IPsec, SSH, and SSL when conducting work over a wireless network, because you never know who else is listening.
SYSTRACE
Niels Provos, University of Michigan
How can one be sure the applications one is using actually do exactly what their developers said they do? Short answer: you can't. People today are using a large number of complex applications, which means it is impossible to check each application thoroughly for security vulnerabilities. There are tools out there that can help, though. Provos has developed a tool called systrace that allows a user to generate a policy of acceptable system calls a particular application can make. If the application attempts to make a call that is not defined in the policy, the user is notified and allowed to choose an action. Systrace includes an auto-policy generation mechanism, which uses a training phase to record the actions of all programs the user executes. If the user chooses not to be bothered by applications breaking policy, systrace allows default enforcement actions to be set. Currently systrace is implemented for FreeBSD and NetBSD, with a Linux version coming out shortly.
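The core decision systrace makes per system call can be sketched as follows. The policy format, names, and wildcard matching here are invented for illustration; real systrace policies are richer, and enforcement happens in the kernel, not in a Python function.

```python
# Minimal sketch of systrace-style policy checking (illustrative only).

policy = {
    ("fsread", "/etc/hosts"): "permit",   # specific path allowed
    ("connect", "*"): "deny",             # all network connects denied
}

def check(syscall, arg, default="deny"):
    """Return the enforcement decision for one system call: an exact
    (syscall, arg) match wins, then a (syscall, "*") wildcard, then the
    configured default action. Interactively, systrace would instead
    ask the user about unlisted calls during the training phase."""
    for pattern in ((syscall, arg), (syscall, "*")):
        if pattern in policy:
            return policy[pattern]
    return default
```

A deny-by-default policy plus a training phase to enumerate legitimate calls is what turns "I can't audit this application" into "at least it can't do anything I haven't seen it do."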
For more information, visit http://www.citi.umich.edu/u/provos/systrace/.
VERIFIABLE SECRET REDISTRIBUTION FOR SURVIVABLE STORAGE SYSTEMS
Ted Wong, Carnegie Mellon University
Wong presented a protocol that can be used to re-create a file distributed over a set of servers, even if one of the servers is damaged. In his scheme, the user must choose how many shares the file is split into and the number of servers it will be stored across. The goal of this work is to provide persistent, secure storage of information, even if it comes under attack. Wong stated that his design of the protocol is complete, and he is currently building a prototype implementation of it.
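Wong's verifiable redistribution protocol is not reproduced here; the sketch below only shows the underlying k-of-n threshold idea his work builds on (Shamir secret sharing over a prime field): any k shares reconstruct the secret, and fewer reveal nothing. The field size and API are chosen for the toy, not taken from the paper.

```python
# Toy Shamir secret sharing: split a secret into n shares such that
# any k of them reconstruct it via Lagrange interpolation at x = 0.

import random

P = 2**31 - 1  # a Mersenne prime; real systems use far larger fields

def split(secret, k, n):
    """Split `secret` into n shares, any k of which reconstruct it."""
    coeffs = [secret] + [random.randrange(P) for _ in range(k - 1)]
    def f(x):
        return sum(c * pow(x, i, P) for i, c in enumerate(coeffs)) % P
    return [(x, f(x)) for x in range(1, n + 1)]

def combine(shares):
    """Interpolate the degree-(k-1) polynomial at x = 0."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        secret = (secret + yi * num * pow(den, P - 2, P)) % P
    return secret
```

The survivability claim follows directly: losing (or having an attacker corrupt) up to n - k servers still leaves enough shares to rebuild the file, which is the property Wong's protocol then preserves across share redistribution.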
For more information, visit http://www.cs.cmu.edu/~tmwong/research/.
THE GURU IS IN SESSIONS
LINUX ON LAPTOP/PDA
Bdale Garbee, HP Linux Systems Operation
Summarized by Brennan Reynolds
This was a roundtable-like discussion, with Garbee acting as moderator. He is currently in charge of the Debian distribution of Linux and is involved in porting Linux to platforms other than i386. He has successfully helped port the kernel to the Alpha, SPARC, and ARM, and is currently working on a version to run on VAX machines.
A discussion was held on which file system is best for battery life. The ReiserFS and Ext3 file systems were discussed; people commented that when using Ext3 on their laptops, the disk would never spin down and thus was always consuming a large amount of power. One suggestion was to use a RAM-based file system for any volume that requires a large number of writes, and to only have the file system written to disk at infrequent intervals or when the machine is shut down or suspended.
A question was directed to Garbee about his knowledge of how the HP-Compaq merger would affect Linux. Garbee thought that the merger was good for Linux, both within the company and for the open source community as a whole. He stated that the new company would be the number one shipper of systems with Linux as the base operating system. He also fielded a question about the support of older HP servers and their ability to run Linux, saying that indeed Linux has been ported to them, and that he personally had several machines in his basement running it.
A question which sparked a large number of responses concerned problems people had with running Linux on mobile platforms. Most people actually did not have many problems at all. There were a few cases of a particular machine model not working, but there were only two widespread issues: docking stations and the new ACPI power management scheme. Docking stations still appear to be somewhat of a headache, but most people had developed ad hoc solutions for getting them to work. The ACPI power management scheme developed by Intel does not appear to have a quick solution. Ted Ts'o, one of the head kernel developers, stated that there are fundamental architectural problems with the 2.4 series kernel that do not easily allow ACPI to be added. However, he also stated that ACPI is supported in the newest development kernel, 2.5, and will be included in 2.6.
The final topic was the possibility of purchasing a laptop without paying any charge or tax for Microsoft Windows. The consensus was that currently this is not possible: even if the machine comes with Linux or without any operating system, the vendors have contracts in place which require them to pay Microsoft for each machine they ship if that machine is capable of running a Microsoft operating system. The only way this will change is if the demand for other operating systems increases to the point that vendors renegotiate their contracts with Microsoft, and no one saw this happening in the near future.
LARGE CLUSTERS
Andrew Hume, AT&T Labs-Research
Summarized by Teri Lampoudi
This was a session on clusters, big data, and resilient computing. Given that the audience was primarily interested in clusters, and not necessarily big data or beating nodes to a pulp, the conversation revolved mainly around what it takes to make a heterogeneous cluster resilient. Hume also presented a FREENIX paper on his current cluster project, which explained in more detail much of what was abstractly claimed in the guru session.
Hume's use of the word "cluster" referred not to what I had assumed would be a Beowulf-type system, but to a loosely coupled farm of machines in which concurrency was much less of an issue than it would be in a Beowulf. In fact, the architecture Hume described was designed to deal with large amounts of independent transaction processing (essentially the process of billing calls), which requires no interprocess message passing or anything of the sort a parallel scientific application might.
Where does the large data come in? Since the transactions in question consist of large numbers of flat files, a mechanism for getting the files onto nodes and off-loading the results is necessary. In this particular cluster, named Ningaui, this task is handled by the "replication manager," which generated a large amount of interest in the audience. All management functions in the cluster, as well as job allocation, constitute services that are to be bid for and leased out to nodes. Furthermore, failures are handled by simply repeating the failed task until it is completed properly, if that is possible, and, if it is not, by looking through detailed logs to discover what went wrong. Logging and testing for the successful completion of each task are what make this model resilient. Every management operation is logged at some appropriate level of detail, generating 120MB/day, and every single file transfer is md5 checksummed. This was characterized by Hume as a "patient" style of management, where no one controls the cluster: scheduling behavior is emergent rather than stringently planned, and nodes are free to drop in and out of the cluster. A downside to this model is the increase in latency, offset by the fact that the cluster continues to function unattended outside of 8-to-5 support hours.
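The retry-and-checksum discipline described above can be sketched in a few lines of Python. This is a toy model, not Ningaui's actual code (which is not yet available for download); the transfer callback and log format are invented for illustration.

```python
import hashlib

def md5sum(data: bytes) -> str:
    """Hex md5 digest, as used to verify every file transfer."""
    return hashlib.md5(data).hexdigest()

def patient_transfer(send, payload: bytes, max_attempts: int = 5):
    """Repeat a transfer until its checksum verifies, logging every attempt.
    `send` models an unreliable copy step that returns the bytes received."""
    want = md5sum(payload)
    log = []  # every management operation is logged
    for attempt in range(1, max_attempts + 1):
        received = send(payload)
        ok = md5sum(received) == want
        log.append((attempt, ok))
        if ok:
            return received, log
    raise RuntimeError("transfer failed after %d attempts" % max_attempts)
```

A flaky sender that corrupts the first two copies would simply show up as two failed entries in the log before the third attempt succeeds; nothing pages an operator.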
In response to questions about the projected scalability of such a scheme, Hume said the system would presumably scale from the present 60 nodes to about 100 without modifications, but that scaling higher would be a matter for further design. The guru session ended on a somewhat unrelated note regarding the problems of getting data off mainframes and onto Linux machines: issues of fixed- vs. variable-length encoding of data blocks resulting in useful information being stripped by FTP, and problems in converting COBOL copybooks to something useful on a C-based architecture. To reiterate Hume's claim, there are homemade solutions for some of these problems, and the reader should feel free to contact Mr. Hume for them.
WRITING PORTABLE APPLICATIONS
Nick Stoughton, MSB Consultants
Summarized by Josh Lothian
Nick Stoughton addressed a concern that developers are facing more and more: writing portable applications. He proposed that there is no such thing as a "portable application," only those that have been ported. Developers are still in search of the Holy Grail of applications programming: a write-once, compile-anywhere program. Languages such as Java are leading up to this, but virtual machines still have to be ported, paths will still be different, etc. Web-based applications are another area that could be promising for portable applications.
Nick went on to talk about the future of portability. The recently released POSIX 2001 standards are expected to help the situation. The 2001 revision expands the original POSIX spec into seven volumes. Even though POSIX 2001 is more specific than the previous release, Nick pointed out that even this standard allows for small differences where vendors may define their own behaviors. This can only hurt developers in the long run. Along these lines, it was pointed out that there may be room for automated tools that have the capability to check application code and determine the level of portability and standards compliance of the source.
NETWORK MANAGEMENT SYSTEM
PERFORMANCE TUNING
Jeff Allen, Tellme Networks, Inc.
Summarized by Matt Selsky
Jeff Allen, author of Cricket, discussed network tuning. The basic idea of tuning is: measure, twiddle, and measure again. Cricket can be used for large installations, but it's overkill for smaller installations, for which MRTG is better suited. You don't need Cricket to do measurement; doing something like a shell script that dumps data to Excel is good, but you need to get some sort of measurement. Cricket was not designed for billing; it was meant to help answer questions.
Useful tools and techniques covered included: looking for the difference between machines in order to identify the causes for the differences; measuring, but also thinking about what you're measuring and why; and remembering to use strace and tcpdump to identify problems. Problems repeat themselves; performance tuning requires an intricate understanding of all the layers involved to solve the problem and simplify the automation. (Having a computer science background helps but is not essential.) If you can reproduce the problem, you then should investigate each piece to find the bottleneck. If you can't reproduce the problem, then measure. You need to understand all the system interactions. Measurement can help determine whether things are actually slow or if the user is imagining it.
Troubleshooting begins with a hunch, but scientific processes are essential. You should be able to determine the event stream, or when each event occurs; having observation logs can help. Some hints provided include checking cron for periodic anomalies, slowing down /bin/rm to avoid I/O overload, and looking for unusual kernel CPU usage.
Also, try to use lightweight monitoring to reduce overhead. You don't need to monitor every resource on every system, but you should monitor those resources on those systems that are essential. Don't check things too often, since you can introduce overhead that way. Lots of information can be gathered from the /proc file system interface to the kernel.
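As an example of how lightweight /proc-based monitoring can be, a few lines of Python suffice to read load information; the parser below handles the standard /proc/loadavg format (three load averages, running/total task counts, and the last PID) and carries no agent overhead at all.

```python
def parse_loadavg(text: str) -> dict:
    """Parse the /proc/loadavg format, e.g. '0.12 0.30 0.25 1/123 4567':
    1/5/15-minute load averages, running/total tasks, and the last PID."""
    one, five, fifteen, tasks, lastpid = text.split()
    running, total = tasks.split("/")
    return {
        "1min": float(one), "5min": float(five), "15min": float(fifteen),
        "running": int(running), "total": int(total), "last_pid": int(lastpid),
    }

def read_loadavg(path: str = "/proc/loadavg") -> dict:
    """Read and parse the live file (Linux only)."""
    with open(path) as f:
        return parse_loadavg(f.read())
```

Polled at a modest interval (say, once a minute), this kind of check adds essentially no load of its own, in line with the advice above.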
BSD BOF
Host: Kirk McKusick, Author and Consultant
Summarized by Bosko Milekic
The BSD Birds of a Feather session at USENIX started off with five quick and informative talks and ended with an exciting discussion of licensing issues.
Christos Zoulas, for the NetBSD Project, went over the primary goals of the NetBSD Project (namely portability, a clean architecture, and security) and then proceeded to briefly discuss some of the challenges that the project has encountered. The successes and areas for improvement of 1.6 were then examined. NetBSD has recently seen various improvements in its cross-build system, packaging system, and third-party tool support. For what concerns the kernel, a zero-copy TCP/UDP implementation has been integrated, and a new pmap code for arm32 has been introduced, as have some other rather major changes to various subsystems. More information is available in Christos's slides, which are now available at http://www.netbsd.org/gallery/events/usenix2002.
Theo de Raadt, of the OpenBSD Project, presented a rundown of various new things that the OpenBSD and OpenSSH teams have been looking at. Theo's talk included an amusing description (and pictures) of some of the events that transpired during OpenBSD's hack-a-thon, which occurred the week before USENIX '02. Finally, Theo mentioned that, following complications with the IPFilter license, the OpenBSD team had done a fairly extensive license audit.
Robert Watson, of the FreeBSD Project, brought up various examples of FreeBSD being used in the real world. He went on to describe a wide variety of changes that will surface with the upcoming FreeBSD release 5.0. Notably, 5.0 will introduce a re-architecting of the kernel aimed at providing much more scalable support for SMP machines. 5.0 will also include an early implementation of KSE, FreeBSD's version of scheduler activations, large improvements to pccard, and significant framework bits from the TrustedBSD project.
Mike Karels, from Wind River Systems, presented an overview of how BSD/OS has been evolving following the Wind River Systems acquisition of BSDi's software assets. Ernie Prabhakar, from Apple, discussed Apple's OS X and its success on the desktop. He explained how OS X aims to bring BSD to the desktop while striving to maintain a strong and stable core, one that is heavily based on BSD technology.
The BoF finished with a discussion on licensing; specifically, some folks questioned the impact of Apple's licensing for code from their core OS (the Darwin Project) on the BSD community as a whole.
CLOSING SESSION
HOW FLIES FLY
Michael H. Dickinson, University of California, Berkeley
Summarized by JD Welch
In this lively and offbeat talk, Dickinson discussed his research into the mechanisms of flight in the fruit fly (Drosophila melanogaster) and related the autonomic behavior to that of a technological system. Insects are the most successful organisms on earth, due in no small part to their early adoption of flight. Flies travel through space on straight trajectories interrupted by saccades, jumpy turning motions somewhat analogous to the fast, jumpy movements of the human eye. Using a variety of monitoring techniques, including high-speed video in a "virtual reality flight arena," Dickinson and his colleagues have observed that flies respond to visual cues to decide when to saccade during flight. For example, more visual texture makes flies saccade earlier (i.e., further away from the arena wall), described by Dickinson as a "collision control algorithm."
Through a combination of sensory inputs, flies can make decisions about their flight path. Flies possess a visual "expansion detector" which, at a certain threshold, causes the animal to turn in a certain direction. However, expansion in front of the fly sometimes causes it to land. How does the fly decide? Using the virtual reality device to replicate various visual scenarios, Dickinson observed that the flies fixate on vertical objects that move back and forth across the field. Expansion on the left side of the animal causes it to turn right, and vice versa, while expansion directly in front of the animal triggers the legs to flare out in preparation for landing.
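The turn-vs-land rule just described can be written down as a tiny decision function. This is purely illustrative: the threshold, signal scaling, and function shape are invented, not taken from Dickinson's work.

```python
def steer(expansion_left: float, expansion_right: float,
          expansion_front: float, land_threshold: float = 0.8) -> str:
    """Toy version of the expansion-detector rule: strong expansion dead
    ahead triggers landing; otherwise the fly turns away from the side
    with more visual expansion. All numbers are hypothetical."""
    if expansion_front >= land_threshold:
        return "extend legs / land"
    if expansion_left > expansion_right:
        return "saccade right"   # expansion on the left -> turn right
    if expansion_right > expansion_left:
        return "saccade left"
    return "fly straight"
```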
If the eyes detect these changes, how are the responses implemented? The flies' wings have "power muscles," controlled by mechanical resonance (as opposed to the nervous system), which drive the wings, combined with neurally activated "steering muscles," which change the configuration of the wing joints. Subtle variations in the timing of impulses correspond to changes in wing movement, controlled by sensors in the "wing-pit." A small wing-like structure, the haltere, controls equilibrium by beating constantly.
Voluntary control is accomplished by the halteres, whose signals can interfere with the autonomic control of the stroke cycle. The halteres have "steering muscles" as well, and information derived from the visual system can turn off the haltere or trick it into a "virtual" problem requiring a response.
Dickinson has also studied the aerodynamics of insect flight using a device called the "Robo Fly," an oversized mechanical insect wing suspended in a large tank of mineral oil. Interestingly, Dickinson observed that the average lift generated by the flies' wings is under their body weight; the flies use three mechanisms to overcome this, including rotating the wings (rotational lift).
Insects are extraordinarily robust creatures, and because Dickinson analyzed the problem in a systems-oriented way, these observations and analysis are immediately applicable to technology; the response system can be used as an efficient search algorithm for control systems in autonomous vehicles, for example.
THE AFS WORKSHOP
Summarized by Garry Zacheiss, MIT
Love Hornquist-Astrand of the Arla development team presented the Arla status report. Arla 0.35.8 has been released. Scheduled for release soon are improved support for Tru64 UNIX, MacOS X, and FreeBSD, improved volume handling, and implementation of more of the vos/pts subcommands. It was stressed that MacOS X is considered an important platform, and that a GUI configuration manager for the cache manager, a GUI ACL manager for the Finder, and a graphical login that obtains Kerberos tickets and AFS tokens at login time were all under development.
Future goals planned for the underlying AFS protocols include GSSAPI/SPNEGO support for Rx, performance improvements to Rx, an enhanced disconnected mode, and IPv6 support for Rx; an experimental patch is already available for the latter. Future Arla-specific goals include improved performance, partial file reading, and increased stability on several platforms. Work is also in progress on the RXGSS protocol extensions (integrating GSSAPI into Rx); a partial implementation exists, and work continues as developers find time.
Derrick Brashear of the OpenAFS development team presented the OpenAFS status report. OpenAFS 1.2.5 was released immediately prior to the conference; it fixed a remotely exploitable denial-of-service attack on several OpenAFS platforms, most notably IRIX and AIX. Future work planned for OpenAFS includes better support for MacOS X, including working around Finder interaction issues. Better support for the BSDs is also planned: FreeBSD has a partially implemented client, while NetBSD and OpenBSD have only server programs available right now. AIX 5, Tru64 5.1A, and MacOS X 10.2 (a.k.a. Jaguar) are all planned for the future. Other planned enhancements include nested groups in the ptserver (code donated by the University of Michigan awaits integration), disconnected AFS, and further work on a native Windows client. Derrick stated that the guiding principles of OpenAFS were to maintain compatibility with IBM AFS, support new platforms, ease the administrative burden, and add new functionality.
AFS performance was discussed. The openafs.org cell consists of two file servers, one in Stockholm, Sweden, and one in Pittsburgh, PA. AFS works reasonably well over trans-Atlantic links, although OpenAFS clients don't determine which file server to talk to very effectively. Arla clients use RTTs to the server to determine the optimal file server to fetch replicated data from. Modifications to OpenAFS to support this behavior in the future are desired.
Jimmy Engelbrecht and Harald Barth of KTH discussed their AFSCrawler script, written to determine how many AFS cells and clients were in the world, what implementations/versions they were (Arla vs. IBM AFS vs. OpenAFS), and how much data was in AFS. The script unfortunately triggered a bug in IBM AFS 3.6–derived code, causing some clients to panic while handling a specific RPC. This has since been fixed in OpenAFS 1.2.5 and the most recent IBM AFS patch level, and all AFS users are strongly encouraged to upgrade. No release of Arla is vulnerable to this particular denial-of-service attack. There was an extended discussion of the usefulness of this exploration. Many sites believed this was useful information and such scanning should continue in the future, but only on an opt-in basis.
Many sites face the problem of managing Kerberos/AFS credentials for batch-scheduled jobs. Specifically, most batch processing software needs to be modified to forward tickets as part of the batch submission process, renew tickets and tokens while the job is in the queue and for the lifetime of the job, and properly destroy credentials when the job completes. Ken Hornstein of NRL was able to pay a commercial vendor to support Kerberos 4/5 credential management in their product, although they did not implement AFS token management. MIT has implemented some of the desired functionality in OpenPBS and might be able to make it available to other interested sites.
Tools to simplify AFS administration were discussed, including:
AFS Balancer: A tool written by CMU to automate the process of balancing disk usage across all servers in a cell. Available from ftp://ftp.andrew.cmu.edu/pub/AFS-Tools/balance-1.1b.tar.gz.
Themis: KTH's enhanced version of the AFS tool "package" for updating files on local disk from a central AFS image. KTH's enhancements include allowing the deletion of files, simplifying the process of adding a file, and allowing the merging of multiple rule sets for determining which files are updated. Themis is available from the Arla CVS repository.
Stanford was presented as an example of a large AFS site. Stanford's AFS usage consists of approximately 14TB of data in AFS, in the form of approximately 100,000 volumes; 33TB of storage is available in their primary cell, ir.stanford.edu. Their file servers consist entirely of Solaris machines, running a combination of Transarc 3.6 patch-level 3 and OpenAFS 1.2.x, while their database servers run OpenAFS 1.2.2. Their cell consists of 25 file servers, using a combination of EMC and Sun StorEdge hardware. Stanford continues to use the kaserver for their authentication infrastructure, with future plans to migrate entirely to an MIT Kerberos 5 KDC.
Stanford has approximately 3,400 clients on their campus, not including SLAC (Stanford Linear Accelerator); approximately 2,100 AFS clients from outside Stanford contact their cell every month. Their supported clients are almost entirely IBM AFS 3.6 clients, although they plan to release OpenAFS clients soon. Stanford currently supports only UNIX clients. There is some on-campus presence of Windows clients, but Stanford has never publicly released or supported it. They do intend to release and support the MacOS X client in the near future.
All Stanford students, faculty, and staff are assigned AFS home directories, with a default quota of 50MB, for a total of approximately 550GB of user home directories. Other significant uses of AFS storage are data volumes for workstation software (400GB) and volumes for course Web sites and assignments (100GB).
AFS usage at Intel was also presented. Intel has been an AFS site since 1994. They had bad experiences with the IBM 3.5 Linux client; their experience with OpenAFS on Linux 2.4.x kernels has been much better. They use and are satisfied with the OpenAFS IA64 Linux port. Intel has hundreds of OpenAFS 1.2.3 and 1.2.4 clients in many production cells, accessing data stored on IBM AFS file servers; they have not encountered any interoperability issues. Intel has some concerns about OpenAFS: they would like to purchase commercial support for OpenAFS, and to see OpenAFS support for HP-UX on both PA-RISC and Itanium hardware. HP-UX support is currently unavailable due to a specific HP-UX header file being unavailable from HP; this may be available soon. Intel has not yet committed to migrating their file servers to OpenAFS and is unsure if it will do so without commercial support.
Backups are a traditional topic of discussion at AFS workshops, and this time was no exception. Many users complain that the traditional AFS backup tools ("backup" and "butc") are complex and difficult to automate, requiring many home-grown scripts and much user intervention for error recovery. An additional complaint was that the traditional AFS tools do not support file- or directory-level backups and restores; data must be backed up and restored at the volume level.
Mitch Collinsworth of Cornell presented work to make AMANDA, the free backup software from the University of Maryland, suitable for AFS backups. Using AMANDA for AFS backups allows one to share AFS backup tapes with non-AFS backups, easily run multiple backups in parallel, automate error recovery, and provide a robust degraded mode that prevents tape errors from stopping backups altogether. Their implementation allows for full volume restores as well as individual directory and file restores. They have finished coding this work and are in the process of testing and documenting it.
Peter Honeyman of CITI at the University of Michigan spoke about work he has proposed to replace Rx with RPCSEC GSS in OpenAFS; this would allow AFS to use a TCP-based transport mechanism rather than the UDP-based Rx, and possibly gain better congestion control, dynamic adaptation, and fragmentation avoidance as a result. RPCSEC GSS uses the GSSAPI to authenticate SUN ONC RPC. RPCSEC GSS is transport-agnostic, provides strong security, is a developing Internet standard, and has multiple open source implementations. Backward compatibility with existing AFS servers and clients is an important goal of this project.
In a second test, where the rule-set size was continually increased, Iptables consistently had about twice the throughput of the other two (which evaluate the rule set on both the incoming and outgoing interfaces). A third test compared only pf and IPFilter, using a single rule that created state in the state table, with a fixed state-entry size; pf reached an overloaded state much later than IPFilter. The experiment was repeated with a variable state-entry size, and pf performed much better than IPFilter for a small number of states.
In general, rule-set evaluation is expensive, and benchmarks only reflect extreme cases, whereas in real life other behavior should be observed. Furthermore, the benchmarks show that stateful filtering can actually improve performance, due to cheap state-table lookups as compared to rule evaluation.
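The cost asymmetry behind that last observation can be sketched in a few lines: a linear scan over the rule list grows with the number of rules, while a state-table hit is a single hash lookup. The rule representation and packet fields below are invented for illustration, not taken from any of the benchmarked filters.

```python
def evaluate_rules(rules, packet) -> str:
    """Linear rule-set evaluation: cost grows with the number of rules.
    Rules are (match_predicate, action) pairs; last match wins."""
    verdict = "block"
    for match, action in rules:
        if match(packet):
            verdict = action
    return verdict

def stateful_filter(state_table, rules, packet) -> str:
    """Consult the state table first (a cheap dict lookup); run the full
    rule scan only for packets that start a new flow."""
    flow = (packet["src"], packet["dst"])
    if flow in state_table:
        return state_table[flow]
    verdict = evaluate_rules(rules, packet)
    if verdict == "pass":
        state_table[flow] = "pass"   # later packets in the flow skip rules
    return verdict
```

Once a flow is in the table, every subsequent packet avoids the O(rules) scan entirely, which is exactly why the stateful configurations benchmarked better.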
ENHANCING NFS CROSS-ADMINISTRATIVE
DOMAIN ACCESS
Joseph Spadavecchia and Erez Zadok, Stony Brook University
The speaker presented modifications to an NFS server that allow improved NFS access between administrative domains. The problem lies in the fact that NFS assumes a shared UID/GID space, which makes it unsafe to export files outside the administrative domain of the server. Design goals were to leave the protocol and existing clients unchanged, a minimal amount of server changes, flexibility, and increased performance.
To solve the problem, two techniques are utilized: "range-mapping" and "file-cloaking." Range-mapping maps IDs between client and server. The mapping is performed on a per-export basis and has to be manually set up in an export file; the mappings can be 1-1, N-N, or N-1. In file-cloaking, the server restricts file access based on UID/GID and special cloaking-mask bits. Here, users can only access their own file permissions. The policy on whether or not the file is visible to others is set by the cloaking mask, which is logically ANDed with the file's protection bits. File-cloaking only works, however, if the client doesn't hold cached copies of directory contents and file attributes. Because of this, the clients are forced to re-read directories, by incrementing the mtime value of the directory each time it is listed.
To test the performance of the modified server, five different NFS configurations were evaluated: an unmodified NFS server was compared against one server with the modified code included but not used, one with only range-mapping enabled, one with only file-cloaking enabled, and one version with all modifications enabled. For each setup, different file system benchmarks were run. The results show only a small overhead when the modifications are used, generally an increase of below 5%. Another experiment tested the performance of the system with an increasing number of mapped or cloaked entries. The results show that an increase from 10 to 1,000 entries resulted in a maximum of about a 14% cost in performance.
One member of the audience pointed out that if clients choose to ignore the changed mtimes from the server, and thus still hold caches of the directory entries, the file-cloaking mechanism could be defeated. After a rather lengthy debate, the speaker had to concede that the model doesn't add any extra security. Another question was asked about the scalability of the setup of range-mapping; the speaker referred to application-level tools that could be developed for that purpose.
The software is available at ftp://ftp.fsl.cs.sunysb.edu/pub/enf.
ENGINEERING OPEN SOURCE
SOFTWARE
Summarized by Teri Lampoudi
NINGAUI: A LINUX CLUSTER FOR BUSINESS
Andrew Hume, AT&T Labs – Research; Scott Daniels, Electronic Data Systems Corp.
Ningaui is a general-purpose, highly available, resilient architecture built from commodity software and hardware. Emphatically, however, Ningaui is not a Beowulf. Hume calls the cluster design the "Swiss canton model," in which there are a number of loosely affiliated, independent nodes, with data replicated among them. Jobs are assigned by bidding and leases, and cluster services, done as session-based servers, are sited via generic job assignment. The emphasis is on keeping the architecture end-to-end, checking all work via checksums, and logging everything. The resilience and high availability required by their goal of 8-5 maintenance (vs. the typical 24-7 model, where people get paged whenever the slightest thing goes wrong, regardless of the time of day) is achieved by job restartability. Finally, all computation is performed on local data, without the use of NFS or network-attached storage.
Hume's message is a hopeful one: despite the many problems encountered (things like kernel and service limits, auto-installation problems, TCP storms, and the like), the existence of source code and the paranoid practice of logging and checksumming everything has helped. The final product performs reasonably well, and it appears that the resilient techniques employed do make a difference. One drawback, however, is that the software mentioned in the paper is not yet available for download.
CPCMS: A CONFIGURATION MANAGEMENT
SYSTEM BASED ON CRYPTOGRAPHIC NAMES
Jonathan S. Shapiro and John Vanderburgh, Johns Hopkins University
This paper won the FREENIX Best Paper award.
The basic notion behind the project is the fact that everyone has a pet complaint about CVS, and yet it is currently the configuration manager in most widespread use. Shapiro has unleashed an alternative. Interestingly, he did not begin, but ended, with the usual host of reasons why CVS is bad. The talk instead began abruptly by characterizing the job of a software configuration manager, continued by stating the namespaces which it must handle, and wrapped up with the challenges faced.
X MEETS Z: VERIFYING CORRECTNESS IN THE
PRESENCE OF POSIX THREADS
Bart Massey, Portland State University; Robert T. Bauer, Rational Software Corp.
Massey delivered a humorous talk on the insight gained from applying Z formal specification notation to system software design, rather than the more informal analysis and design process normally used.
The story is told with respect to writing XCB, which replaces the Xlib protocol layer and is supposed to be thread-friendly. But where threads are concerned, deadlock avoidance becomes a hard problem that cannot be solved in an ad hoc manner, while full model checking is too hard. In this case, Massey resorted to Z specification to model the XCB lower layer, abstract away locking and data transfers, and locate fundamental issues. Essentially, the difficulties of searching the literature and locating information relevant to the problem at hand were overcome. As Massey put it, "the formal method saved the day."
FILE SYSTEMS
Summarized by Bosko Milekic
PLANNED EXTENSIONS TO THE LINUX
EXT2/EXT3 FILESYSTEM
Theodore Y. Ts'o, IBM; Stephen Tweedie, Red Hat
The speaker presented improvements to the Linux Ext2 file system, with the goal of allowing for various expansions while striving to maintain compatibility with older code. Improvements have been facilitated by a few extra superblock fields that were added to Ext2 not long before the Linux 2.0 kernel was released. The fields allow for file system features to be added without compromising the existing setup; this is done by providing bits indicating the impact of the added features with respect to compatibility. Namely, a file system with the "incompat" bit marked is not allowed to be mounted. Similarly, a "read-only" marking would only allow the file system to be mounted read-only.
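The mount-time check this describes can be sketched as follows. This is a model of the policy only: real Ext2 stores these as bitmask fields in the superblock, and the flag names here are invented.

```python
def mount_mode(known_incompat: set, known_ro_compat: set,
               sb_incompat: set, sb_ro_compat: set) -> str:
    """Decide how a kernel that understands the given feature sets may
    mount a file system whose superblock advertises sb_incompat and
    sb_ro_compat features (sets stand in for the real bitmasks)."""
    if sb_incompat - known_incompat:
        return "refuse"       # unknown "incompat" feature: don't mount
    if sb_ro_compat - known_ro_compat:
        return "read-only"    # unknown "ro_compat" feature: mount read-only
    return "read-write"       # everything understood: full access
```

An old kernel thus degrades gracefully when it can (read-only) and refuses outright only when writing or reading with an unknown incompatible feature would corrupt data.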
Directory indexing replaces linear directory searches with a faster search using a fixed-depth tree and hashed keys. File system size can be dynamically increased, and the expanded inode, doubled from 128 to 256 bytes, allows for more extensions.
Other potential improvements were discussed as well, in particular pre-allocation for contiguous files, which allows for better performance in certain setups by pre-allocating contiguous blocks. Security-related modifications, extended attributes, and ACLs were mentioned. An implementation of these features already exists but has not yet been merged into the mainline Ext2/3 code.
RECENT FILESYSTEM OPTIMISATIONS ON
FREEBSD
Ian Dowse, Corvil Networks; David Malone, CNRI, Dublin Institute of Technology
David Malone presented four important file system optimizations for the FreeBSD OS: soft updates, dirpref, vmiodir, and dirhash. It turns out that certain combinations of the optimizations (beautifully illustrated in the paper) may yield performance improvements of anywhere between 2 and 10 orders of magnitude for real-world applications.
All four of the optimizations deal with file system metadata. Soft updates allow for asynchronous metadata updates. Dirpref changes the way directories are organized, attempting to place child directories closer to their parents, thereby increasing locality of reference and reducing disk-seek times. Vmiodir trades some extra memory in order to achieve better directory caching. Finally, dirhash, which was implemented by Ian Dowse, changes the way in which entries are searched in a directory. Specifically, FFS uses a linear search to find an entry by name; dirhash builds a hashtable of directory entries on the fly. For a directory of size n, with a working set of m files, a search that in certain cases could have been O(n*m) has been reduced, due to dirhash, to effectively O(n + m).
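The complexity difference can be illustrated with a toy model (this is a sketch of the idea, not the FFS implementation): m repeated linear scans cost O(n*m), whereas building a hash table once costs O(n) and makes each of the m lookups O(1).

```python
def linear_lookup(entries, name):
    """FFS-style linear scan: O(n) per lookup, so O(n*m) for m lookups."""
    for i, entry in enumerate(entries):
        if entry == name:
            return i
    return -1

def dirhash_lookup(entries, names):
    """dirhash-style: build the table once (O(n)), then each of the m
    lookups is a constant-time hash probe, giving O(n + m) overall."""
    table = {entry: i for i, entry in enumerate(entries)}
    return [table.get(name, -1) for name in names]
```

The win is largest exactly where dirhash targets it: big directories hit repeatedly with name lookups.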
FILESYSTEM PERFORMANCE AND SCALABILITY
IN LINUX 2.4.17
Ray Bryant, SGI; Ruth Forester, IBM LTC; John Hawkes, SGI
This talk focused on performance evaluation of a number of file systems available and commonly deployed on Linux machines. Comparisons under various configurations of Ext2, Ext3, ReiserFS, XFS, and JFS were presented.
The benchmarks chosen for the data gathering were pgmeter, filemark, and the AIM Benchmark Suite VII. Pgmeter measures the rate of data transfer of reads/writes of a file under a synthetic workload. Filemark is similar to postmark in that it is an operation-intensive benchmark, although filemark is threaded and offers various other features that postmark lacks. AIM VII measures performance for various file-system-related functionalities; it offers various metrics under an imposed workload, thus stressing the performance of the file system not only under I/O load but also under significant CPU load.
Tests were run on three different setups: a small, a medium, and a large configuration. ReiserFS and Ext2 appear at the top of the pile for smaller and medium setups. Notably, XFS and JFS perform worse for smaller system configurations than the others, although XFS clearly appears to generate better numbers than JFS. It should be noted that XFS seems to scale well under a higher load; this was most evident in the large-system results, where XFS appears to offer the best overall results.
THINGS TO THINK ABOUT
Summarized by Bosko Milekic
SPEEDING UP KERNEL SCHEDULER BY REDUCING CACHE MISSES
Shuji Yamamura, Akira Hirai, Mitsuru Sato, Masao Yamamoto, Akira Naruse, and Kouichi Kumon, Fujitsu Labs
This was an interesting talk pertaining to the effects of cache coloring for task structures in the Linux kernel scheduler (Linux kernel 2.4.x). The speaker first presented some interesting benchmark numbers for the Linux scheduler, showing that as the number of processes on the task queue was increased, the performance decreased. The authors used some really nifty hardware to measure the number of bus transactions throughout their tests and were thus able to reasonably quantify the impact that cache misses had in the Linux scheduler.
Their experiments led them to implement a cache coloring scheme for task structures, which were previously aligned on 8KB boundaries and therefore were eventually being mapped to the same cache lines. This unfortunate placement of task structures in memory induced a significant number of cache misses as the number of tasks grew in the scheduler.
The implemented solution consisted of aligning task structures to more evenly distribute cached entries across the L2 cache. The result was inevitably fewer cache misses in the scheduler. Some negative effects were observed in certain situations; these were primarily due to more cache slots being used by task-structure data in the scheduler, thus forcing data previously cached there to be pushed out.
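Why strict 8KB alignment hurts can be shown with a toy cache model. The geometry below (64-byte lines, 512 sets, i.e., a 32KB direct-mapped cache) is assumed for illustration and does not correspond to the hardware in the paper.

```python
LINE = 64   # cache line size in bytes (assumed)
SETS = 512  # sets in a 32KB direct-mapped cache (assumed)

def cache_set(addr: int) -> int:
    """Cache set an address maps to in this toy geometry."""
    return (addr // LINE) % SETS

def sets_used(addresses) -> int:
    """Distinct cache sets touched by the first line of each task struct."""
    return len({cache_set(a) for a in addresses})

# Task structures on strict 8KB boundaries pile into just a few sets...
aligned = [i * 8192 for i in range(8)]
# ...while "colored" placement staggers each one by an extra cache line.
colored = [i * 8192 + i * LINE for i in range(8)]
```

In this model the eight aligned structures share only four sets, while the colored placement spreads all eight into distinct sets, which is the effect the authors measured as fewer cache misses (at the price of occupying more of the cache, matching the negative effects they observed).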
WORK-IN-PROGRESS REPORTS
Summarized by Brennan Reynolds
RESOURCE VIRTUALIZATION TECHNIQUES FOR
WIDE-AREA OVERLAY NETWORKS
Kartik Gopalan, Stony Brook University
This work addressed the issue of provisioning a maximum number of virtual overlay networks (VONs) with diverse quality-of-service (QoS) requirements on a single physical network. Each of the VONs is logically isolated from the others to ensure its QoS requirements. Gopalan mentioned several critical research issues with this problem that are currently being investigated. Dealing with how to provision the network at various levels (link, route, or path) and then enforce the provisioning at runtime is one of the toughest challenges. Currently, Gopalan has developed several algorithms to handle admission control, end-to-end QoS, route selection, scheduling, and fault tolerance in the network.
For more information, visit http://www.ecsl.cs.sunysb.edu.
VISUALIZING SOFTWARE INSTABILITY
Jennifer Bevan, University of California, Santa Cruz
Detection of instability in software has typically been an afterthought. The point in the development cycle when the software is reviewed for instability usually comes after it is difficult and costly to go back and perform major modifications to the code base. Bevan has developed a technique to allow the visualization of unstable regions of code that can be used much earlier in the development cycle. Her technique creates a time series of dependence graphs that include clusters and lines called fault lines. From the graphs, a developer is able to easily determine where the unstable sections of code are and proactively restructure them. She is currently working on a prototype implementation.
For more information, visit http://www.cse.ucsc.edu/~jbevan/evo_viz.
RELIABLE AND SCALABLE PEER-TO-PEER WEB
DOCUMENT SHARING
Li Xiao, College of William and Mary
The idea presented by Xiao would allow end users to share the content of their Web browser caches with neighboring Internet users. The rationale for this is that today's end users are increasingly connected to the Internet over high-speed links, and browser caches are becoming large storage systems. Therefore, if an individual accesses a page that does not exist in their local cache, Xiao suggests that they first query other end users for the content before trying to access the machine hosting the original. This strategy does have some serious problems associated with it that still need to be addressed, including ensuring the integrity of the content and protecting the identity and privacy of the end users.
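The proposed lookup order can be sketched as follows; everything here (dict-based caches, the function name, the URLs) is illustrative, and the hard parts the summary mentions — integrity, privacy, peer discovery — are deliberately omitted.

```python
def fetch(url, local_cache, peers, origin):
    """Return (source, content): local cache first, then peers, then origin."""
    if url in local_cache:
        return "local", local_cache[url]
    for peer in peers:                    # ask neighbors before the origin
        if url in peer:
            local_cache[url] = peer[url]  # keep a copy for next time
            return "peer", peer[url]
    content = origin[url]                 # last resort: the hosting machine
    local_cache[url] = content
    return "origin", content

local = {}
peers = [{"http://example.com/a": "page A"}]
origin = {"http://example.com/a": "page A", "http://example.com/b": "page B"}
print(fetch("http://example.com/a", local, peers, origin))  # ('peer', 'page A')
print(fetch("http://example.com/b", local, peers, origin))  # ('origin', 'page B')
```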
SEREL FAST BOOTING FOR UNIX
Leni Mayo, Fast Boot Software
Serel is a tool that generates a visual representation of a UNIX machine's boot-up sequence. It can be used to identify the critical path and can show whether a particular service or process blocks for an extended period of time. This information could be used to determine where the largest performance gains could be realized by tuning the order of execution at boot-up. Serel creates a dependency graph, expressed in XML, during boot-up. This graph is then used to create the visual representation. Currently the tool only works on POSIX-compliant systems, but Mayo stated that he would be porting it to other platforms. Other extensions that were mentioned included having the metadata include the use of shared libraries, and monitoring the suspend/resume sequence of portable machines.
For more information, visit http://www.fastboot.org.
BERKELEY DB XML
John Merrells, Sleepycat Software
Merrells gave a quick introduction and overview of the new XML library for Berkeley DB. The library specializes in storage and retrieval of XML content through a tool called XPath. The library allows for multiple containers per document and stores everything natively as XML. The user is also given a wide range of elements to create indices with, including edges, elements, text strings, or presence. The XPath tool consists of a query parser, generator, optimizer, and execution engine. To conclude his presentation, Merrells gave a live demonstration of the software.
For more information, visit http://www.sleepycat.com/xml.
CLUSTER-ON-DEMAND (COD)
Justin Moore, Duke University
Modern clusters are growing at a rapid rate. Many have pushed beyond the 5000-machine mark, and deploying them results in large expenses as well as management and provisioning issues. Furthermore, if the cluster is "rented" out to various users, it is very time-consuming to configure it to a user's specs, regardless of how long they need to use it. The COD work presented creates dynamic virtual clusters within a given physical cluster. The goal was to have a provisioning tool that would automatically select a chunk of available nodes and install the operating system and middleware specified by the customer in a short period of time. This would allow a greater use of resources, since multiple virtual clusters can exist at once. By using a virtual cluster, the size can be changed dynamically. Moore stated that they have created a working prototype and are currently testing and benchmarking its performance.
For more information, visit http://www.cs.duke.edu/~justin/cod.
CATACOMB
Elias Sinderson, University of California, Santa Cruz
Catacomb is a project to develop a database-backed DASL module for the Apache Web server. It was designed as a replacement for WebDAV. The initial release of the module only contains support for a MySQL database but could be extended to others. Sinderson briefly touched on the performance of the module: it was comparable to the mod_dav Apache module for all query types but search. The presentation was concluded with remarks about adding support for the lock method and including ACL specifications in the future.
For more information, visit http://ocean.cse.ucsc.edu/catacomb.
SELF-ORGANIZING STORAGE
Dan Ellard, Harvard University
Ellard's presentation introduced a storage system that tunes itself based on the workload of the system, without requiring the intervention of the user. The intelligence is implemented as a virtual self-organizing disk that resides below any file system. The virtual disk observes the access patterns exhibited by the system and then attempts to predict what information will be accessed next. An experiment was done using an NFS trace taken at a large ISP on one of their email servers. Ellard's self-organizing storage system worked well, which he attributes to the fact that most of the files being requested were large email boxes. Areas of future work include exploration of the length and detail of the data-collection stage, as well as the CPU impact of running the virtual disk layer.
VISUAL DEBUGGING
John Costigan, Virginia Tech
Costigan feels that the state of current debugging facilities in the UNIX world is not as good as it should be. He proposes the addition of several elements to programs like ddd and gdb. The first addition is including separate heap and stack components. Another is to display only certain fields of a data structure, with the ability to zoom in and out if needed. Finally, the debugger should provide the ability to visually present complex data structures, including linked lists, trees, etc. He said that a beta version of a debugger with these abilities is currently available.
For more information, visit http://infovis.cs.vt.edu/datastruct.
IMPROVING APPLICATION PERFORMANCE
THROUGH SYSTEM-CALL COMPOSITION
Amit Purohit, Stony Brook University
Web servers perform a huge number of context switches and internal data copies during normal operation. These two elements can drastically limit the performance of an application, regardless of the hardware platform it is run on. The Compound System Call (CoSy) framework is an attempt to reduce the performance penalty for context switches and data copies. It includes a user-level library and several kernel facilities that can be used via system calls. The library provides the programmer with a complete set of memory-management functions. Performance tests using the CoSy framework showed a large saving for context switches and data copies. The only area where the savings between conventional libraries and CoSy were negligible was for very large file copies.
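The CoSy API itself isn't given in the summary, so the sketch below only models the idea: issuing k dependent operations as k separate system calls costs k user/kernel boundary crossings (plus copies of intermediate data), while submitting them as one compound call costs a single crossing. The `compound` function here is a hypothetical stand-in, not the real CoSy interface.

```python
crossings = 0  # simulated user/kernel boundary crossings

def syscall(fn):
    """A conventional system call: one boundary crossing per call."""
    global crossings
    crossings += 1
    return fn()

def compound(fns):
    """A CoSy-style compound call (hypothetical API): the whole chain is
    submitted at once, crossing the boundary only once; each intermediate
    result feeds the next operation without a user-space round trip."""
    global crossings
    crossings += 1
    result = None
    for fn in fns:
        result = fn(result)
    return result

data = list(range(10))

buf = syscall(lambda: data[:])        # "read": first crossing
total = syscall(lambda: sum(buf))     # "process": second crossing
conventional_crossings = crossings

crossings = 0
total2 = compound([lambda _: data[:], lambda b: sum(b)])
compound_crossings = crossings

print(conventional_crossings, compound_crossings, total == total2)  # 2 1 True
```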
For more information, visit http://www.cs.sunysb.edu/~purohit.
ELASTIC QUOTAS
John Oscar, Columbia University
Oscar began by stating that most resources are flexible, but that to date disks have not been. While approximately 80 percent of files are short-lived, disk quotas are hard limits imposed by administrators. The idea behind elastic quotas is to have a non-permanent storage area that each user can temporarily use. Oscar suggested the creation of an ehome directory structure to be used in deploying elastic quotas. Currently, he has implemented elastic quotas as a stackable file system.

There were also several trends apparent in the data. Many users would ssh to their remote host (good) but then launch an email reader on the local machine, which would connect via POP to the same host and send the password in clear text (bad). This situation can easily be remedied by tunneling protocols like POP through SSH, but it appeared that many people were not aware this could be done. While most of his comments on the use of the wireless network were negative, the list of passwords he had collected showed that people were indeed using strong passwords. His recommendations were to educate and encourage people to use protocols like IPSec, SSH, and SSL when conducting work over a wireless network, because you never know who else is listening.
SYSTRACE
Niels Provos, University of Michigan
How can one be sure the applications one is using actually do exactly what their developers said they do? Short answer: you can't. People today are using a large number of complex applications, which means it is impossible to check each application thoroughly for security vulnerabilities. There are tools out there that can help, though. Provos has developed a tool called systrace that allows a user to generate a policy of acceptable system calls a particular application can make. If the application attempts to make a call that is not defined in the policy, the user is notified and allowed to choose an action. Systrace includes an auto-policy generation mechanism, which uses a training phase to record the actions of all programs the user executes. If the user chooses not to be bothered by applications breaking policy, systrace allows default enforcement actions to be set. Currently systrace is implemented for FreeBSD and NetBSD, with a Linux version coming out shortly.
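No policy file was shown in the talk summary; the fragment below is only a guess at the flavor of a systrace policy, with invented rules and approximate syntax (allow an application to read a couple of paths and stat files, and deny network connections):

```
Policy: /bin/ls, Emulation: native
	native-fsread: filename eq "/etc/ld.so.cache" then permit
	native-fsread: filename match "/usr/lib/*" then permit
	native-fstat: permit
	native-connect: deny
```

In training mode, systrace generates rules like these automatically from the calls it observes the program making.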
For more information, visit http://www.citi.umich.edu/u/provos/systrace.
VERIFIABLE SECRET REDISTRIBUTION FOR
SURVIVABLE STORAGE SYSTEMS
Ted Wong, Carnegie Mellon University
Wong presented a protocol that can be used to re-create a file distributed over a set of servers, even if one of the servers is damaged. In his scheme, the user must choose how many shares the file is split into and the number of servers it will be stored across. The goal of this work is to provide persistent, secure storage of information, even if it comes under attack. Wong stated that his design of the protocol is complete, and he is currently building a prototype implementation of it.
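Wong's protocol adds verifiability and share redistribution on top of a threshold scheme; the sketch below shows only the base idea of k-of-n secret sharing in the style of Shamir (an assumption on our part — the summary does not name the underlying scheme): split a secret so that any k shares reconstruct it, while fewer reveal nothing.

```python
# Shamir-style k-of-n secret sharing over a prime field. The prime and
# the parameters are illustrative, not from Wong's protocol.
import random

P = 2**127 - 1  # a Mersenne prime large enough for a small secret

def split(secret, n, k):
    """Evaluate a random degree-(k-1) polynomial whose constant term is
    the secret at the points x = 1..n; each (x, y) pair is one share."""
    coeffs = [secret] + [random.randrange(P) for _ in range(k - 1)]
    def poly(x):
        return sum(c * pow(x, i, P) for i, c in enumerate(coeffs)) % P
    return [(x, poly(x)) for x in range(1, n + 1)]

def reconstruct(shares):
    """Lagrange interpolation at x = 0 recovers the constant term."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        secret = (secret + yi * num * pow(den, P - 2, P)) % P
    return secret

shares = split(123456789, n=5, k=3)
print(reconstruct(shares[:3]) == 123456789)  # any 3 of the 5 shares suffice
```

Storing the n shares on n servers gives the damage tolerance described in the talk: losing up to n-k servers still leaves enough shares to rebuild the file key.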
For more information, visit http://www.cs.cmu.edu/~tmwong/research.
THE GURU IS IN SESSIONS
LINUX ON LAPTOP/PDA
Bdale Garbee, HP Linux Systems Operation
Summarized by Brennan Reynolds
This was a roundtable-like discussion with Garbee acting as moderator. He is currently in charge of the Debian distribution of Linux and is involved in porting Linux to platforms other than i386. He has successfully helped port the kernel to the Alpha, SPARC, and ARM, and is currently working on a version to run on VAX machines.
A discussion was held on which file system is best for battery life. The ReiserFS and Ext3 file systems were discussed; people commented that when using Ext3 on their laptops the disk would never spin down and thus was always consuming a large amount of power. One suggestion was to use a RAM-based file system for any volume that requires a large number of writes, and to have the file system written to disk only at infrequent intervals or when the machine is shut down or suspended.
A question was directed to Garbee about his knowledge of how the HP-Compaq merger would affect Linux. Garbee thought that the merger was good for Linux, both within the company and for the open source community as a whole. He stated that the new company would be the number one shipper of systems with Linux as the base operating system. He also fielded a question about the support of older HP servers and their ability to run Linux, saying that indeed Linux has been ported to them, and he personally had several machines in his basement running it.
A question which sparked a large number of responses concerned problems people had with running Linux on mobile platforms. Most people actually did not have many problems at all. There were a few cases of a particular machine model not working, but there were only two widespread issues: docking stations and the new ACPI power management scheme. Docking stations still appear to be somewhat of a headache, but most people had developed ad hoc solutions for getting them to work. The ACPI power management scheme developed by Intel does not appear to have a quick solution. Ted Ts'o, one of the head kernel developers, stated that there are fundamental architectural problems with the 2.4 series kernel that do not easily allow ACPI to be added. However, he also stated that ACPI is supported in the newest development kernel, 2.5, and will be included in 2.6.
The final topic was the possibility of purchasing a laptop without paying any charge/tax for Microsoft Windows. The consensus was that currently this is not possible. Even if the machine comes with Linux or without any operating system, the vendors have contracts in place which require them to pay Microsoft for each machine they ship if that machine is capable of running a Microsoft operating system. The only way this will change is if the demand for other operating systems increases to the point that vendors renegotiate their contracts with Microsoft, and no one saw this happening in the near future.
LARGE CLUSTERS
Andrew Hume, AT&T Labs – Research
Summarized by Teri Lampoudi
This was a session on clusters, big data, and resilient computing. Given that the audience was primarily interested in clusters, and not necessarily big data or beating nodes to a pulp, the conversation revolved mainly around what it takes to make a heterogeneous cluster resilient. Hume also presented a FREENIX paper on his current cluster project, which explained in more detail much of what was abstractly claimed in the guru session.
Hume's use of the word "cluster" referred not to what I had assumed would be a Beowulf-type system, but to a loosely coupled farm of machines in which concurrency was much less of an issue than it would be in a Beowulf. In fact, the architecture Hume described was designed to deal with large amounts of independent transaction processing — essentially the process of billing calls — which requires no interprocess message passing or anything of the sort a parallel scientific application might.
Where does the large data come in? Since the transactions in question consist of large numbers of flat files, a mechanism for getting the files onto nodes and off-loading the results is necessary. In this particular cluster, named Ningaui, this task is handled by the "replication manager," which generated a large amount of interest in the audience. All management functions in the cluster, as well as job allocation, constitute services that are to be bid for and leased out to nodes. Furthermore, failures are handled by simply repeating the failed task until it is completed properly, if that is possible, and if it is not, then by looking through detailed logs to discover what went wrong. Logging and testing for the successful completion of each task are what make this model resilient. Every management operation is logged at some appropriate level of detail, generating 120MB/day, and every single file transfer is md5-checksummed. This was characterized by Hume as a "patient" style of management, where no one controls the cluster: scheduling behavior is emergent rather than stringently planned, and nodes are free to drop in and out of the cluster. A downside to this model is the increase in latency, offset by the fact that the cluster continues to function unattended outside of 8-to-5 support hours.
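The checksum-and-retry discipline Hume describes can be approximated in a few lines of the verify-after-copy pattern; the file names and retry count below are invented, not Ningaui's.

```python
# Copy a file and verify the transfer with an md5 checksum, retrying on
# mismatch -- a crude version of "repeat the failed task until it is
# completed properly."
import hashlib, os, shutil, tempfile

def md5sum(path):
    """Checksum a file in chunks so large flat files don't fill memory."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def checked_copy(src, dst, retries=3):
    """Copy src to dst; succeed only when the checksums agree."""
    for _ in range(retries):
        shutil.copyfile(src, dst)
        if md5sum(dst) == md5sum(src):
            return True
    return False

tmp = tempfile.mkdtemp()
src = os.path.join(tmp, "src")
with open(src, "wb") as f:
    f.write(b"billing records\n" * 1000)
print(checked_copy(src, os.path.join(tmp, "dst")))  # True
```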
In response to questions about the projected scalability of such a scheme, Hume said the system would presumably scale from the present 60 nodes to about 100 without modifications, but that scaling higher would be a matter for further design. The guru session ended on a somewhat unrelated note regarding the problems of getting data off mainframes and onto Linux machines: issues of fixed- vs. variable-length encoding of data blocks resulting in useful information being stripped by FTP, and problems in converting COBOL copybooks to something useful on a C-based architecture. To reiterate Hume's claim, there are homemade solutions for some of these problems, and the reader should feel free to contact Mr. Hume for them.
WRITING PORTABLE APPLICATIONS
Nick Stoughton, MSB Consultants
Summarized by Josh Lothian
Nick Stoughton addressed a concern that developers are facing more and more: writing portable applications. He proposed that there is no such thing as a "portable application," only those that have been ported. Developers are still in search of the Holy Grail of applications programming: a write-once, compile-anywhere program. Languages such as Java are leading up to this, but virtual machines still have to be ported, paths will still be different, etc. Web-based applications are another area that could be promising for portable applications.
Nick went on to talk about the future of portability. The recently released POSIX 2001 standards are expected to help the situation. The 2001 revision expands the original POSIX spec into seven volumes. Even though POSIX 2001 is more specific than the previous release, Nick pointed out that even this standard allows for small differences where vendors may define their own behaviors. This can only hurt developers in the long run. Along these lines, it was pointed out that there may be room for automated tools that can check application code and determine the level of portability and standards compliance of the source.
NETWORK MANAGEMENT SYSTEM
PERFORMANCE TUNING
Jeff Allen, Tellme Networks, Inc.
Summarized by Matt Selsky
Jeff Allen, author of Cricket, discussed network tuning. The basic idea of tuning is: measure, twiddle, and measure again. Cricket can be used for large installations, but it's overkill for smaller installations, for which MRTG is better suited. You don't need Cricket to do measurement; doing something like a shell script that dumps data to Excel is good, but you need to get some sort of measurement. Cricket was not designed for billing; it was meant to help answer questions.
Useful tools and techniques covered included: looking for the difference between machines in order to identify the causes for the differences; measuring, but also thinking about what you're measuring and why; and remembering to use strace and tcpdump to identify problems. Problems repeat themselves; performance tuning requires an intricate understanding of all the layers involved to solve the problem and simplify the automation. (Having a computer science background helps, but is not essential.) If you can reproduce the problem, you then should investigate each piece to find the bottleneck. If you can't reproduce the problem, then measure. You need to understand all the system interactions. Measurement can help determine whether things are actually slow or if the user is imagining it.
Troubleshooting begins with a hunch, but scientific processes are essential. You should be able to determine the event stream, or when each event occurs; having observation logs can help. Some hints provided included checking cron for periodic anomalies, slowing down /bin/rm to avoid I/O overload, and looking for unusual kernel CPU usage.
Also, try to use lightweight monitoring to reduce overhead. You don't need to monitor every resource on every system, but you should monitor those resources on those systems that are essential. Don't check things too often, since you can introduce overhead that way. Lots of information can be gathered from the /proc file-system interface to the kernel.
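As a tiny example of cheap /proc-based monitoring, the function below computes the non-idle CPU fraction from a line in the format of the first "cpu" line of Linux's /proc/stat; the sample values are made up.

```python
# Lightweight CPU-busy sampling from a /proc/stat-style "cpu" line.
# Fields after the label are jiffy counters; field 4 (index 3) is idle time.
def cpu_busy_fraction(stat_line):
    """Fraction of CPU time not spent idle."""
    fields = [int(x) for x in stat_line.split()[1:]]
    idle = fields[3]
    return 1 - idle / sum(fields)

# On a real Linux box you would read open("/proc/stat").readline() instead.
sample = "cpu 4705 150 1120 16250 520 20 25 0 0 0"
print(round(cpu_busy_fraction(sample), 3))  # prints 0.287
```

A periodic sample of this one line costs almost nothing, which is the point of lightweight monitoring: watch the essential counters, not everything.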
BSD BOF
Host: Kirk McKusick, Author and Consultant
Summarized by Bosko Milekic
The BSD Birds of a Feather session at USENIX started off with five quick and informative talks and ended with an exciting discussion of licensing issues.
Christos Zoulas, for the NetBSD Project, went over the primary goals of the NetBSD Project (namely portability, a clean architecture, and security) and then proceeded to briefly discuss some of the challenges that the project has encountered. The successes of, and areas for improvement in, 1.6 were then examined. NetBSD has recently seen various improvements in its cross-build system, packaging system, and third-party tool support. As for the kernel, a zero-copy TCP/UDP implementation has been integrated, and new pmap code for arm32 has been introduced, as have some other rather major changes to various subsystems. More information is available in Christos's slides, which are now available at http://www.netbsd.org/gallery/events/usenix2002.
Theo de Raadt of the OpenBSD Project presented a rundown of various new things that the OpenBSD and OpenSSH teams have been looking at. Theo's talk included an amusing description (and pictures) of some of the events that transpired during OpenBSD's hack-a-thon, which occurred the week before USENIX '02. Finally, Theo mentioned that, following complications with the IPFilter license, the OpenBSD team had done a fairly extensive license audit.
Robert Watson of the FreeBSD Project brought up various examples of FreeBSD being used in the real world. He went on to describe a wide variety of changes that will surface with the upcoming FreeBSD release 5.0. Notably, 5.0 will introduce a re-architecting of the kernel aimed at providing much more scalable support for SMP machines. 5.0 will also include an early implementation of KSE, FreeBSD's version of scheduler activations, large improvements to pccard, and significant framework bits from the TrustedBSD project.
Mike Karels from Wind River Systems presented an overview of how BSD/OS has been evolving following the Wind River Systems acquisition of BSDi's software assets. Ernie Prabhakar from Apple discussed Apple's OS X and its success on the desktop. He explained how OS X aims to bring BSD to the desktop while striving to maintain a strong and stable core, one that is heavily based on BSD technology.
The BoF finished with a discussion on licensing; specifically, some folks questioned the impact of Apple's licensing for code from their core OS (the Darwin Project) on the BSD community as a whole.
CLOSING SESSION
HOW FLIES FLY
Michael H. Dickinson, University of California, Berkeley
Summarized by JD Welch
In this lively and offbeat talk, Dickinson discussed his research into the mechanisms of flight in the fruit fly (Drosophila melanogaster) and related this autonomic behavior to that of a technological system. Insects are the most successful organisms on earth, due in no small part to their early adoption of flight. Flies travel through space on straight trajectories interrupted by saccades, jumpy turning motions somewhat analogous to the fast, jumpy movements of the human eye. Using a variety of monitoring techniques, including high-speed video in a "virtual reality flight arena," Dickinson and his colleagues have observed that flies respond to visual cues to decide when to saccade during flight. For example, more visual texture makes flies saccade earlier (i.e., further away from the arena wall), described by Dickinson as a "collision-control algorithm."
Through a combination of sensory inputs, flies can make decisions about their flight path. Flies possess a visual "expansion detector" which, at a certain threshold, causes the animal to turn in a certain direction. However, expansion in front of the fly sometimes causes it to land. How does the fly decide? Using the virtual reality device to replicate various visual scenarios, Dickinson observed that the flies fixate on vertical objects that move back and forth across the field. Expansion on the left side of the animal causes it to turn right, and vice versa, while expansion directly in front of the animal triggers the legs to flare out in preparation for landing.
If the eyes detect these changes, how are the responses implemented? The flies' wings have "power muscles," controlled by mechanical resonance (as opposed to the nervous system), which drive the wings, combined with neurally activated "steering muscles," which change the configuration of the wing joints. Subtle variations in the timing of impulses correspond to changes in wing movement, controlled by sensors in the "wing-pit." A small wing-like structure, the haltere, controls equilibrium by beating constantly.
Voluntary control is accomplished by the halteres, whose signals can interfere with the autonomic control of the stroke cycle. The halteres have "steering muscles" as well, and information derived from the visual system can turn off the haltere or trick it into a "virtual" problem requiring a response.
Dickinson has also studied the aerodynamics of insect flight using a device called the "Robo Fly," an oversized mechanical insect wing suspended in a large tank of mineral oil. Interestingly, Dickinson observed that the average lift generated by the flies' wings is less than their body weight; the flies use three mechanisms to overcome this, including rotating the wings (rotational lift).
Insects are extraordinarily robust creatures, and because Dickinson analyzed the problem in a systems-oriented way, these observations and analyses are immediately applicable to technology: the response system can be used as an efficient search algorithm for control systems in autonomous vehicles, for example.
THE AFS WORKSHOP
Summarized by Garry Zacheiss, MIT
Love Hornquist-Astrand of the Arla development team presented the Arla status report. Arla 0.35.8 has been released. Scheduled for release soon are improved support for Tru64 UNIX, MacOS X, and FreeBSD; improved volume handling; and implementation of more of the vos/pts subcommands. It was stressed that MacOS X is considered an important platform, and that a GUI configuration manager for the cache manager, a GUI ACL manager for the Finder, and a graphical login that obtains Kerberos tickets and AFS tokens at login time were all under development.
Future goals planned for the underlying AFS protocols include GSSAPI/SPNEGO support for Rx, performance improvements to Rx, an enhanced disconnected mode, and IPv6 support for Rx; an experimental patch is already available for the latter. Future Arla-specific goals include improved performance, partial file reading, and increased stability for several platforms. Work is also in progress on the RXGSS protocol extensions (integrating GSSAPI into Rx). A partial implementation exists, and work continues as developers find time.
Derrick Brashear of the OpenAFS development team presented the OpenAFS status report. OpenAFS 1.2.5 was released immediately prior to the conference; it fixed a remotely exploitable denial-of-service attack in several OpenAFS platforms, most notably IRIX and AIX. Future work planned for OpenAFS includes better support for MacOS X, including working around Finder interaction issues. Better support for the BSDs is also planned: FreeBSD has a partially implemented client, while NetBSD and OpenBSD have only server programs available right now. AIX 5, Tru64 5.1A, and MacOS X 10.2 (a.k.a. Jaguar) are all planned for the future. Other planned enhancements include nested groups in the ptserver (code donated by the University of Michigan awaits integration), disconnected AFS, and further work on a native Windows client. Derrick stated that the guiding principles of OpenAFS were to maintain compatibility with IBM AFS, support new platforms, ease the administrative burden, and add new functionality.
AFS performance was discussed. The openafs.org cell consists of two file servers, one in Stockholm, Sweden, and one in Pittsburgh, PA. AFS works reasonably well over trans-Atlantic links, although OpenAFS clients don't determine which file server to talk to very effectively. Arla clients use RTTs to the server to determine the optimal file server to fetch replicated data from. Modifications to OpenAFS to support this behavior in the future are desired.
Jimmy Engelbrecht and Harald Barth of KTH discussed their AFSCrawler script, written to determine how many AFS cells and clients were in the world, what implementations/versions they were (Arla vs. IBM AFS vs. OpenAFS), and how much data was in AFS. The script unfortunately triggered a bug in IBM AFS 3.6–derived code, causing some clients to panic while handling a specific RPC. This has since been fixed in OpenAFS 1.2.5 and the most recent IBM AFS patch level, and all AFS users are strongly encouraged to upgrade. No release of Arla is vulnerable to this particular denial-of-service attack. There was an extended discussion of the usefulness of this exploration. Many sites believed this was useful information and that such scanning should continue in the future, but only on an opt-in basis.
Many sites face the problem of managing Kerberos/AFS credentials for batch-scheduled jobs. Specifically, most batch-processing software needs to be modified to forward tickets as part of the batch submission process, renew tickets and tokens while the job is in the queue and for the lifetime of the job, and properly destroy credentials when the job completes. Ken Hornstein of NRL was able to pay a commercial vendor to support Kerberos 4/5 credential management in their product, although they did not implement AFS token management. MIT has implemented some of the desired functionality in OpenPBS and might be able to make it available to other interested sites.
Tools to simplify AFS administration were discussed, including:
AFS Balancer: A tool written by CMU to automate the process of balancing disk usage across all servers in a cell. Available from ftp://ftp.andrew.cmu.edu/pub/AFS-Tools/balance-11b.tar.gz.
Themis: Themis is KTH's enhanced version of the AFS tool "package" for updating files on local disk from a central AFS image. KTH's enhancements include allowing the deletion of files, simplifying the process of adding a file, and allowing the merging of multiple rule sets for determining which files are updated. Themis is available from the Arla CVS repository.
Stanford was presented as an example of a large AFS site. Stanford's AFS usage consists of approximately 14TB of data in AFS, in the form of approximately 100,000 volumes; 33TB of storage is available in their primary cell, ir.stanford.edu. Their file servers consist entirely of Solaris machines, running a combination of Transarc 3.6 patch-level 3 and OpenAFS 1.2.x, while their database servers run OpenAFS 1.2.2. Their cell consists of 25 file servers, using a combination of EMC and Sun StorEdge hardware. Stanford continues to use the kaserver for their authentication infrastructure, with future plans to migrate entirely to an MIT Kerberos 5 KDC.
Stanford has approximately 3,400 clients on their campus, not including SLAC (Stanford Linear Accelerator); approximately 2,100 AFS clients from outside Stanford contact their cell every month. Their supported clients are almost entirely IBM AFS 3.6 clients, although they plan to release OpenAFS clients soon. Stanford currently supports only UNIX clients. There is some on-campus presence of Windows clients, but Stanford has never publicly released or supported it. They do intend to release and support the MacOS X client in the near future.
All Stanford students, faculty, and staff are assigned AFS home directories, with a default quota of 50MB, for a total of approximately 550GB of user home directories. Other significant uses of AFS storage are data volumes for workstation software (400GB) and volumes for course Web sites and assignments (100GB).
AFS usage at Intel was also presented. Intel has been an AFS site since 1994. They had bad experiences with the IBM 3.5 Linux client; their experience with OpenAFS on Linux 2.4.x kernels has been much better. They use and are satisfied with the OpenAFS IA64 Linux port. Intel has hundreds of OpenAFS 1.2.3 and 1.2.4 clients in many production cells, accessing data stored on IBM AFS file servers; they have not encountered any interoperability issues. Intel has some concerns about OpenAFS: they would like to purchase commercial support for OpenAFS, and to see OpenAFS support for HP-UX on both PA-RISC and Itanium hardware. HP-UX support is currently unavailable due to a specific HP-UX header file being unavailable from HP; this may be available soon. Intel has not yet committed to migrating their file servers to OpenAFS and are unsure if they will do so without commercial support.
Backups are a traditional topic of discussion at AFS workshops, and this time was no exception. Many users complain that the traditional AFS backup tools ("backup" and "butc") are complex and difficult to automate, requiring many home-grown scripts and much user intervention for error recovery. An additional complaint was that the traditional AFS tools do not support file- or directory-level backups and restores; data must be backed up and restored at the volume level.
Mitch Collinsworth of Cornell presented work to make AMANDA, the free backup software from the University of Maryland, suitable for AFS backups. Using AMANDA for AFS backups allows one to share AFS backup tapes with non-AFS backups, easily run multiple backups in parallel, automate error recovery, and provide a robust degraded mode that prevents tape errors from stopping backups altogether. Their implementation allows for full volume restores as well as individual directory and file restores. They have finished coding this work and are in the process of testing and documenting it.
Peter Honeyman of CITI at the University of Michigan spoke about work he has proposed to replace Rx with RPCSEC GSS in OpenAFS; this would allow AFS to use a TCP-based transport mechanism rather than the UDP-based Rx, and possibly gain better congestion control, dynamic adaptation, and fragmentation avoidance as a result. RPCSEC GSS uses the GSSAPI to authenticate Sun ONC RPC. RPCSEC GSS is transport agnostic, provides strong security, is a developing Internet standard, and has multiple open source implementations. Backward compatibility with existing AFS servers and clients is an important goal of this project.
of a software configuration manager, continued by stating the namespaces which it must handle, and wrapped up with the challenges faced.
X MEETS Z: VERIFYING CORRECTNESS IN THE
PRESENCE OF POSIX THREADS
Bart Massey, Portland State University; Robert T. Bauer, Rational Software Corp.
Massey delivered a humorous talk on the insight gained from applying Z formal specification notation to system software design, rather than the more informal analysis and design process normally used.
The story is told with respect to writing XCB, which replaces the Xlib protocol layer and is supposed to be thread-friendly. But where threads are concerned, deadlock avoidance becomes a hard problem that cannot be solved in an ad hoc manner, while full model checking is also too hard. In this case, Massey resorted to Z specification to model the XCB lower layer, abstract away locking and data transfers, and locate fundamental issues. Essentially, the difficulties of searching the literature and locating information relevant to the problem at hand were overcome. As Massey put it, "the formal method saved the day."
FILE SYSTEMS
Summarized by Bosko Milekic
PLANNED EXTENSIONS TO THE LINUX
EXT2/EXT3 FILESYSTEM
Theodore Y. Ts'o, IBM; Stephen Tweedie, Red Hat
The speaker presented improvements to the Linux Ext2 file system, with the goal of allowing for various expansions while striving to maintain compatibility with older code. Improvements have been facilitated by a few extra superblock fields that were added to Ext2 not long before the Linux 2.0 kernel was released. The fields allow for file system features to be added without compromising the existing setup; this is done by providing
bits indicating the impact of the added features with respect to compatibility. Namely, a file system with the "incompat" bit marked is not allowed to be mounted. Similarly, a "read-only" marking would only allow the file system to be mounted read-only.
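The mount-time decision those bits encode can be sketched in a few lines (a toy Python model; the constant names and values are invented for illustration, not the actual Ext2 feature flags):

```python
# Toy model of the Ext2 feature-bitmap policy described above.  The
# three superblock bitmaps and the mount decision are the real Ext2
# mechanism; the specific constants and names here are invented.

COMPAT_DIR_INDEX = 0x0001    # safe to ignore if not understood
RO_COMPAT_LARGE_FS = 0x0001  # unknown here -> mount read-only at most
INCOMPAT_NEW_INODE = 0x0001  # unknown here -> refuse to mount

SUPPORTED_COMPAT = COMPAT_DIR_INDEX
SUPPORTED_RO_COMPAT = 0x0000
SUPPORTED_INCOMPAT = 0x0000

def mount_decision(compat, ro_compat, incompat):
    """Return 'rw', 'ro', or 'refuse' for a filesystem's feature bitmaps."""
    if incompat & ~SUPPORTED_INCOMPAT:
        return "refuse"  # an "incompat" feature we don't know: no mount
    if ro_compat & ~SUPPORTED_RO_COMPAT:
        return "ro"      # a "read-only" feature we don't know: ro mount only
    return "rw"          # unknown "compat" bits are harmless
```

The asymmetry is the point of the design: old kernels can still mount new filesystems safely, degrading to read-only or refusing only when a feature actually demands it.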
Directory indexing replaces linear directory searches with a faster search using a fixed-depth tree and hashed keys. File system size can be dynamically increased, and the expanded inode, doubled from 128 to 256 bytes, allows for more extensions.
Other potential improvements were discussed as well, in particular pre-allocation for contiguous files, which allows for better performance in certain setups by pre-allocating contiguous blocks. Security-related modifications, extended attributes, and ACLs were mentioned. An implementation of these features already exists but has not yet been merged into the mainline Ext2/3 code.
RECENT FILESYSTEM OPTIMISATIONS ON
FREEBSD
Ian Dowse, Corvil Networks; David Malone, CNRI, Dublin Institute of Technology
David Malone presented four important file system optimizations for the FreeBSD OS: soft updates, dirpref, vmiodir, and dirhash. It turns out that certain combinations of the optimizations (beautifully illustrated in the paper) may yield performance improvements of anywhere between 2 and 10 orders of magnitude for real-world applications.
All four of the optimizations deal with file system metadata. Soft updates allow for asynchronous metadata updates. Dirpref changes the way directories are organized, attempting to place child directories closer to their parents, thereby increasing locality of reference and reducing disk-seek times. Vmiodir trades some extra memory in order to achieve better directory caching. Finally, dirhash, which was implemented by Ian Dowse, changes the way in which entries are searched in a directory. Specifically, whereas FFS uses a linear search to find an entry by name, dirhash builds a hashtable of directory entries on the fly. For a directory of size n with a working set of m files, a search that in certain cases could have been O(n*m) has been reduced, thanks to dirhash, to effectively O(n + m).
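The complexity claim can be illustrated with a toy model (a Python stand-in for the in-kernel C implementation):

```python
# Toy model of the dirhash idea.  FFS scans directory entries linearly
# on every lookup; dirhash builds a hash table over the entries once,
# after which each lookup is O(1) on average.

def linear_lookup(entries, name):
    """One O(n) scan per lookup: m lookups cost O(n*m)."""
    for i, e in enumerate(entries):
        if e == name:
            return i
    return -1

def build_dirhash(entries):
    """One O(n) pass builds the table..."""
    return {e: i for i, e in enumerate(entries)}

def hashed_lookup(table, name):
    """...after which each of the m lookups is O(1): O(n + m) overall."""
    return table.get(name, -1)

entries = ["file%d" % i for i in range(1000)]
table = build_dirhash(entries)
```

The real dirhash builds its table lazily, on the first lookup in a large directory, so directories that are never searched pay nothing.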
FILESYSTEM PERFORMANCE AND SCALABILITY
IN LINUX 2.4.17
Ray Bryant, SGI; Ruth Forester, IBM LTC; John Hawkes, SGI
This talk focused on performance evaluation of a number of file systems available and commonly deployed on Linux machines. Comparisons under various configurations of Ext2, Ext3, ReiserFS, XFS, and JFS were presented.
The benchmarks chosen for the data gathering were pgmeter, filemark, and the AIM Benchmark Suite VII. Pgmeter measures the rate of data transfer of reads/writes of a file under a synthetic workload. Filemark is similar to postmark in that it is an operation-intensive benchmark, although filemark is threaded and offers various other features that postmark lacks. AIM VII measures performance for various file-system-related functionalities; it offers various metrics under an imposed workload, thus stressing the performance of the file system not only under I/O load but also under significant CPU load.
Tests were run on three different setups: a small, a medium, and a large configuration. ReiserFS and Ext2 appear at the top of the pile for small and medium setups. Notably, XFS and JFS perform worse for smaller system configurations than the others, although XFS clearly appears to generate better numbers than JFS. It should be noted that XFS seems to scale well under a higher load. This was most evident in the large-system results, where XFS appears to offer the best overall results.
THINGS TO THINK ABOUT
Summarized by Bosko Milekic
SPEEDING UP KERNEL SCHEDULER BY REDUCING CACHE MISSES
Shuji Yamamura, Akira Hirai, Mitsuru Sato, Masao Yamamoto, Akira Naruse, and Kouichi Kumon, Fujitsu Labs
This was an interesting talk pertaining to the effects of cache coloring for task structures in the Linux kernel scheduler (Linux kernel 2.4.x). The speaker first presented some interesting benchmark numbers for the Linux scheduler, showing that as the number of processes on the task queue was increased, the performance decreased. The authors used some really nifty hardware to measure the number of bus transactions throughout their tests, and were thus able to reasonably quantify the impact that cache misses had in the Linux scheduler.
Their experiments led them to implement a cache coloring scheme for task structures, which were previously aligned on 8KB boundaries and therefore were eventually being mapped to the same cache lines. This unfortunate placement of task structures in memory induced a significant number of cache misses as the number of tasks grew in the scheduler.
The implemented solution consisted of aligning task structures to more evenly distribute cached entries across the L2 cache. The result was inevitably fewer cache misses in the scheduler. Some negative effects were observed in certain situations, primarily due to more cache slots being used by task structure data in the scheduler, thus forcing data previously cached there to be pushed out.
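The alignment problem and the coloring fix can be sketched numerically. The cache geometry below (direct-mapped, 256KB, 64-byte lines) and the coloring stride are invented for illustration; they are not the hardware measured in the paper:

```python
# Toy model of why 8KB-aligned task structures collide in the cache and
# how a per-task offset ("color") spreads them out.  Geometry invented
# for illustration; simplified to a direct-mapped cache.

CACHE_SIZE = 256 * 1024
LINE_SIZE = 64
NUM_SETS = CACHE_SIZE // LINE_SIZE  # 4096 sets

def cache_set(addr):
    # which cache set an address maps to
    return (addr // LINE_SIZE) % NUM_SETS

NUM_TASKS = 64

# Task structures on 8KB boundaries: the first cache line of each task
# lands on only every 128th set, so 64 tasks share just 32 sets and
# evict one another.
aligned_sets = {cache_set(i * 8192) for i in range(NUM_TASKS)}

# Cache coloring: offset each task by a small multiple of the line
# size, giving every task its own set.
colored_sets = {cache_set(i * 8192 + (i % 128) * LINE_SIZE)
                for i in range(NUM_TASKS)}
```

In this toy the 64 aligned tasks occupy only 32 distinct sets, while the colored layout gives all 64 their own set; the downside noted in the talk also follows, since coloring spreads task data over more of the cache, displacing whatever else lived there.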
WORK-IN-PROGRESS REPORTS
Summarized by Brennan Reynolds
RESOURCE VIRTUALIZATION TECHNIQUES FOR
WIDE-AREA OVERLAY NETWORKS
Kartik Gopalan, Stony Brook University
This work addressed the issue of provisioning a maximum number of virtual overlay networks (VONs) with diverse quality of service (QoS) requirements on a single physical network. Each of the VONs is logically isolated from the others to ensure the QoS requirements. Gopalan mentioned several critical research issues with this problem that are currently being investigated. Dealing with how to provision the network at various levels (link, route, or path) and then enforce the provisioning at run-time is one of the toughest challenges. Currently, Gopalan has developed several algorithms to handle admission control, end-to-end QoS, route selection, scheduling, and fault tolerance in the network.
For more information, visit http://www.ecsl.cs.sunysb.edu/.
VISUALIZING SOFTWARE INSTABILITY
Jennifer Bevan, University of California, Santa Cruz
Detection of instability in software has typically been an afterthought. The point in the development cycle when the software is reviewed for instability is usually after it is difficult and costly to go back and perform major modifications to the code base. Bevan has developed a technique to allow the visualization of unstable regions of code that can be used much earlier in the development cycle. Her technique creates a time series of dependent graphs that include clusters and lines called fault lines. From the graphs, a developer is able to easily determine where the unstable sections of code are and proactively restructure them. She is currently working on a prototype implementation.
For more information, visit http://www.cse.ucsc.edu/~jbevan/evo_viz/.
RELIABLE AND SCALABLE PEER-TO-PEER WEB
DOCUMENT SHARING
Li Xiao, College of William and Mary
The idea presented by Xiao would allow end users to share the content of their Web browser caches with neighboring Internet users. The rationale for this is that today's end users are increasingly connected to the Internet over high-speed links, and the browser caches are becoming large storage systems. Therefore, if an individual accesses a page which does not exist in their local cache, Xiao is suggesting that they first query other end users for the content before trying to access the machine hosting the original. This strategy does have some serious problems associated with it that still need to be addressed, including ensuring the integrity of the content and protecting the identity and privacy of the end users.
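The lookup order described can be sketched as follows (a minimal Python model; the classes and data are hypothetical stand-ins, and the sketch deliberately omits the content-integrity and privacy protections identified above as open problems):

```python
# Minimal model of the lookup order: local browser cache first, then
# neighboring peers, then the origin server.  Hypothetical stand-in,
# not the actual system.

class Peer:
    def __init__(self, name):
        self.name = name
        self.cache = {}  # url -> content (the browser cache)

    def fetch(self, url, neighbors, origin):
        """Return (content, source) for a URL."""
        if url in self.cache:            # 1. local browser cache
            return self.cache[url], "local"
        for peer in neighbors:           # 2. query neighboring end users
            if url in peer.cache:
                self.cache[url] = peer.cache[url]
                return peer.cache[url], peer.name
        content = origin[url]            # 3. fall back to the origin host
        self.cache[url] = content
        return content, "origin"

origin = {"http://example.com/": "<html>...</html>"}
a, b = Peer("a"), Peer("b")
b.cache["http://example.com/"] = origin["http://example.com/"]
content, source = a.fetch("http://example.com/", [b], origin)
```

Note that without the integrity checks the WiP calls for, a malicious peer in step 2 could serve arbitrary content for any URL.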
SEREL FAST BOOTING FOR UNIX
Leni Mayo, Fast Boot Software
Serel is a tool that generates a visual representation of a UNIX machine's boot-up sequence. It can be used to identify the critical path and can show if a particular service or process blocks for an extended period of time. This information could be used to determine where the largest performance gains could be realized by tuning the order of execution at boot-up. Serel creates a dependency graph, expressed in XML, during boot-up. This graph is then used to create the visual representation. Currently the tool only works on POSIX-compliant systems, but Mayo stated that he would be porting it to other platforms. Other extensions that were mentioned included having the metadata include the use of shared libraries and monitoring the suspend/resume sequence of portable machines.
For more information, visit http://www.fastboot.org/.
BERKELEY DB XML
John Merrells Sleepycat Software
Merrells gave a quick introduction and overview of the new XML library for Berkeley DB. The library specializes in storage and retrieval of XML content, queried via XPath. The library allows for multiple containers per document and stores everything natively as XML. The user is also given a wide range of elements to create indices with, including edges, elements, text strings, or presence. The XPath tool consists of a query parser, generator, optimizer, and execution engine. To conclude his presentation, Merrells gave a live demonstration of the software.
For more information, visit http://www.sleepycat.com/xml/.
CLUSTER-ON-DEMAND (COD)
Justin Moore Duke University
Modern clusters are growing at a rapid rate. Many have pushed beyond the 5,000-machine mark, and deploying them results in large expenses as well as management and provisioning issues. Furthermore, if the cluster is "rented" out to various users, it is very time-consuming to configure it to a user's specs, regardless of how long they need to use it. The COD work presented creates dynamic virtual clusters within a given physical cluster. The goal was to have a provisioning tool that would automatically select a chunk of available nodes and install the operating system and middleware specified by the customer in a short period of time. This would allow a greater use of resources, since multiple virtual clusters can exist at once. By using a virtual cluster, the size can be changed dynamically. Moore stated that they have created a working prototype and are currently testing and benchmarking its performance.
For more information, visit http://www.cs.duke.edu/~justin/cod/.
CATACOMB
Elias Sinderson, University of California, Santa Cruz
Catacomb is a project to develop a database-backed DASL module for the Apache Web server. It was designed as a replacement for the WebDAV module. The initial release of the module contains support only for a MySQL database, but could be extended to others. Sinderson briefly touched on the performance of the module: it was comparable to the mod_dav Apache module for all query types but search. The presentation was concluded with remarks about adding support for the LOCK method and including ACL specifications in the future.
For more information, visit http://ocean.cse.ucsc.edu/catacomb/.
SELF-ORGANIZING STORAGE
Dan Ellard, Harvard University
Ellard's presentation introduced a storage system that tuned itself based on the workload of the system, without requiring the intervention of the user. The intelligence was implemented as a virtual, self-organizing disk that resides below any file system. The virtual disk observes the access patterns exhibited by the system and then attempts to predict what information will be accessed next. An experiment was done using an NFS trace from one of the email servers at a large ISP. Ellard's self-organizing storage system worked well, which he attributes to the fact that most of the files being requested were large email boxes. Areas of future work include exploration of the length and detail of the data-collection stage, as well as the CPU impact of running the virtual disk layer.
VISUAL DEBUGGING
John Costigan, Virginia Tech
Costigan feels that the state of current debugging facilities in the UNIX world is not as good as it should be. He proposes the addition of several elements to programs like ddd and gdb. The first addition is including separate heap and stack components. Another is to display only certain fields of a data structure, with the ability to zoom in and out if needed. Finally, the debugger should provide the ability to visually present complex data structures, including linked lists, trees, etc. He said that a beta version of a debugger with these abilities is currently available.
For more information, visit http://infovis.cs.vt.edu/datastruct/.
IMPROVING APPLICATION PERFORMANCE
THROUGH SYSTEM-CALL COMPOSITION
Amit Purohit, Stony Brook University
Web servers perform a huge number of context switches and much internal data copying during normal operation. These two elements can drastically limit the performance of an application, regardless of the hardware platform it is run on. The Compound System Call (CoSy) framework is an attempt to reduce the performance penalty for context switches and data copies. It includes a user-level library and several kernel facilities that can be used via system calls. The library provides the programmer with a complete set of memory-management functions. Performance tests using the CoSy framework showed a large saving for context switches and data copies; the only area where the savings between conventional libraries and CoSy were negligible was for very large file copies.
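The core idea, amortizing one user/kernel boundary crossing over a batch of operations, can be simulated in user space (a hedged sketch; CoSy's real API and in-kernel execution are not shown here):

```python
# User-space simulation of the compound-system-call idea: each ordinary
# syscall costs one user/kernel boundary crossing, while a compound
# call batches many operations into a single crossing.  The real CoSy
# framework runs the batch inside the kernel; here we only count the
# crossings that would be saved.

crossings = 0

def syscall(op, *args):
    """One simulated kernel crossing per call."""
    global crossings
    crossings += 1
    return op(*args)

def compound_syscall(ops):
    """One simulated crossing for a whole batch of (op, args) pairs."""
    global crossings
    crossings += 1
    return [op(*args) for op, args in ops]

data = {}
ops = [(data.__setitem__, ("k%d" % i, i)) for i in range(10)]

for op, args in ops:          # conventional: one crossing per operation
    syscall(op, *args)
n_individual = crossings

crossings = 0                 # batched: one crossing for all ten
compound_syscall(ops)
n_batched = crossings
```

The same arithmetic explains the negligible gain for very large file copies noted above: when each operation is itself expensive, the fixed per-crossing cost being amortized no longer dominates.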
For more information, visit http://www.cs.sunysb.edu/~purohit/.
ELASTIC QUOTAS
John Oscar, Columbia University
Oscar began by stating that most resources are flexible but that, to date, disks have not been. While approximately 80 percent of files are short-lived, disk quotas are hard limits imposed by administrators. The idea behind elastic quotas is to have a non-permanent storage area that each user can temporarily use. Oscar suggested the creation of an ehome directory structure to be used in deploying elastic quotas. Currently he has implemented elastic quotas as a stackable file system.

There were also several trends apparent in the data. Many users would ssh to their remote host (good), but then launch an email reader on the local machine, which would connect via POP to the same host and send the password in clear text (bad). This situation can easily be remedied by tunneling protocols like POP through SSH, but it appeared that many people were not aware this could be done. While most of his comments on the use of the wireless network were negative, the list of passwords he had collected showed that people were indeed using strong passwords. His recommendations were to educate and encourage people to use protocols like IPSec, SSH, and SSL when conducting work over a wireless network, because you never know who else is listening.
SYSTRACE
Niels Provos, University of Michigan
How can one be sure the applications one is using actually do exactly what their developers said they do? Short answer: you can't. People today are using a large number of complex applications, which means it is impossible to check each application thoroughly for security vulnerabilities. There are tools out there that can help, though. Provos has developed a tool called systrace that allows a user to generate a policy of acceptable system calls a particular application can make. If the application attempts to make a call that is not defined in the policy, the user is notified and allowed to choose an action. Systrace includes an auto-policy generation mechanism, which uses a training phase to record the actions of all programs the user executes. If the user chooses not to be bothered by applications breaking policy, systrace allows default enforcement actions to be set. Currently systrace is implemented for FreeBSD and NetBSD, with a Linux version coming out shortly.
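The enforcement and training logic described can be sketched as follows (the policy format and function names here are invented for illustration; systrace's actual policy language and in-kernel interception differ):

```python
# Sketch of the enforcement and training logic described.  The policy
# format (syscall name -> action) is a hypothetical stand-in, not
# systrace's real policy language.

def enforce(policy, syscall_name, default="deny"):
    """Look up a syscall in the policy; fall back to the default action."""
    action = policy.get(syscall_name, default)
    if action == "ask":
        # real systrace notifies the user here and waits for a decision
        return "ask-user"
    return action

def train(observed_calls):
    """Auto-policy generation: permit exactly what was seen in training."""
    return {name: "permit" for name in observed_calls}

policy = {"open": "permit", "read": "permit", "write": "permit",
          "connect": "ask"}
```

The default action is what makes the tool usable unattended: anything the training phase never observed is simply denied rather than prompting the user.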
For more information, visit http://www.citi.umich.edu/u/provos/systrace/.
VERIFIABLE SECRET REDISTRIBUTION FOR
SURVIVABLE STORAGE SYSTEMS
Ted Wong, Carnegie Mellon University
Wong presented a protocol that can be used to re-create a file distributed over a set of servers, even if one of the servers is damaged. In his scheme, the user must choose how many shares the file is split into and the number of servers it will be stored across. The goal of this work is to provide persistent, secure storage of information even if it comes under attack. Wong stated that his design of the protocol was complete, and he is currently building a prototype implementation of it.
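The share-splitting idea underlying such protocols can be illustrated with a minimal (k, n) threshold scheme in the style of Shamir secret sharing (a toy sketch; Wong's protocol additionally makes shares verifiable and supports redistributing them to a new set of servers, both of which this omits):

```python
# Minimal (k, n) threshold secret sharing: split a secret into n
# shares so that any k of them reconstruct it, and fewer reveal
# nothing.  Toy illustration only.

import random

P = 2**31 - 1  # prime modulus; all arithmetic is over GF(P)

def split(secret, k, n):
    """Split secret into n shares, any k of which can reconstruct it."""
    # random polynomial of degree k-1 with constant term = secret
    coeffs = [secret] + [random.randrange(P) for _ in range(k - 1)]
    def f(x):
        return sum(c * pow(x, i, P) for i, c in enumerate(coeffs)) % P
    return [(x, f(x)) for x in range(1, n + 1)]

def reconstruct(shares):
    """Lagrange interpolation at x = 0 over GF(P)."""
    total = 0
    for i, (xi, yi) in enumerate(shares):
        num = den = 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * -xj % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, P - 2, P)) % P
    return total

shares = split(123456, k=3, n=5)  # e.g., 5 servers, any 3 suffice
```

This shows why a damaged server is survivable: any k of the n shares suffice, so losing up to n - k servers loses nothing.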
For more information, visit http://www.cs.cmu.edu/~tmwong/research/.
THE GURU IS IN SESSIONS
LINUX ON LAPTOP/PDA
Bdale Garbee, HP Linux Systems Operation
Summarized by Brennan Reynolds
This was a roundtable-like discussion, with Garbee acting as moderator. He is currently in charge of the Debian distribution of Linux and is involved in porting Linux to platforms other than i386. He has successfully helped port the kernel to the Alpha, SPARC, and ARM, and is currently working on a version to run on VAX machines.
A discussion was held on which file system is best for battery life. The Reiser and Ext3 file systems were discussed; people commented that when using Ext3 on their laptops the disk would never spin down and thus was always consuming a large amount of power. One suggestion was to use a RAM-based file system for any volume that requires a large number of writes, and to have the file system written to disk only at infrequent intervals or when the machine is shut down or suspended.
A question was directed to Garbee about his knowledge of how the HP-Compaq merger would affect Linux. Garbee thought that the merger was good for Linux, both within the company and for the open source community as a whole. He stated that the new company would be the number one shipper of systems with Linux as the base operating system. He also fielded a question about the support of older HP servers and their ability to run Linux, saying that indeed Linux has been ported to them, and that he personally had several machines in his basement running it.
A question which sparked a large number of responses concerned problems people had with running Linux on mobile platforms. Most people actually did not have many problems at all. There were a few cases of a particular machine model not working, but there were only two widespread issues: docking stations and the new ACPI power management scheme. Docking stations still appear to be somewhat of a headache, but most people had developed ad hoc solutions for getting them to work. The ACPI power management scheme developed by Intel does not appear to have a quick solution. Ted Ts'o, one of the head kernel developers, stated that there are fundamental architectural problems with the 2.4 series kernel that do not easily allow ACPI to be added. However, he also stated that ACPI is supported in the newest development kernel, 2.5, and will be included in 2.6.
The final topic was the possibility of purchasing a laptop without paying any charge/tax for Microsoft Windows. The consensus was that currently this is not possible. Even if the machine comes with Linux or without any operating system, the vendors have contracts in place which require them to pay Microsoft for each machine they ship, if that machine is capable of running a Microsoft operating system. The only way this will change is if the demand for
other operating systems increases to the point that vendors renegotiate their contracts with Microsoft, and no one saw this happening in the near future.
LARGE CLUSTERS
Andrew Hume, AT&T Labs – Research
Summarized by Teri Lampoudi
This was a session on clusters, big data, and resilient computing. Given that the audience was primarily interested in clusters, and not necessarily big data or beating nodes to a pulp, the conversation revolved mainly around what it takes to make a heterogeneous cluster resilient. Hume also presented a FREENIX paper on his current cluster project, which explained in more detail much of what was abstractly claimed in the guru session.
Hume's use of the word "cluster" referred not to what I had assumed would be a Beowulf-type system, but to a loosely coupled farm of machines in which concurrency was much less of an issue than it would be in a Beowulf. In fact, the architecture Hume described was designed to deal with large amounts of independent transaction processing, essentially the process of billing calls, which requires no interprocess message passing or anything of the sort a parallel scientific application might.
Where does the large data come in? Since the transactions in question consist of large numbers of flat files, a mechanism for getting the files onto nodes and off-loading the results is necessary. In this particular cluster, named Ningaui, this task is handled by the "replication manager," which generated a large amount of interest in the audience. All management functions in the cluster, as well as job allocation, constitute services that are to be bid for and leased out to nodes. Furthermore, failures are handled by simply repeating the failed task until it is completed properly, if that is possible, and if it is not, then by looking through detailed logs to discover
what went wrong. Logging and testing for the successful completion of each task are what make this model resilient. Every management operation is logged at some appropriate level of detail, generating 120MB/day, and every single file transfer is MD5 checksummed. This was characterized by Hume as a "patient" style of management, where no one controls the cluster: scheduling behavior is emergent rather than stringently planned, and nodes are free to drop in and out of the cluster. A downside to this model is the increase in latency, offset by the fact that the cluster continues to function unattended outside of 8-to-5 support hours.
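The checksum-and-retry discipline can be sketched as follows (a hypothetical Python model of the "patient" style described, not Ningaui's actual replication manager; the flaky channel is invented for the demonstration):

```python
# Sketch of the "patient" transfer discipline: checksum every file
# transfer and simply repeat a failed task until it completes
# properly, logging each attempt.  Hypothetical stand-in only.

import hashlib

def md5sum(data):
    return hashlib.md5(data).hexdigest()

def transfer_with_retry(data, send, max_attempts=10):
    """Send data until the receiver's MD5 matches the sender's."""
    log = []
    for attempt in range(1, max_attempts + 1):
        received = send(data)
        ok = md5sum(received) == md5sum(data)
        log.append((attempt, "ok" if ok else "checksum-mismatch"))
        if ok:
            return received, log
    # past this point a human consults the detailed logs
    raise RuntimeError("transfer kept failing; inspect the logs")

# A channel that corrupts the first two attempts, then behaves:
attempts = {"n": 0}
def flaky_send(data):
    attempts["n"] += 1
    return b"garbage" if attempts["n"] <= 2 else data

received, log = transfer_with_retry(b"call-detail-records", flaky_send)
```

The trade-off matches the one noted above: retrying until success adds latency, but the work eventually completes without anyone on call.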
In response to questions about the projected scalability of such a scheme, Hume said the system would presumably scale from the present 60 nodes to about 100 without modifications, but that scaling higher would be a matter for further design. The guru session ended on a somewhat unrelated note regarding the problems of getting data off mainframes and onto Linux machines: issues of fixed- vs. variable-length encoding of data blocks resulting in useful information being stripped by FTP, and problems in converting COBOL copybooks to something useful on a C-based architecture. To reiterate Hume's claim, there are homemade solutions for some of these problems, and the reader should feel free to contact Mr. Hume for them.
WRITING PORTABLE APPLICATIONS
Nick Stoughton, MSB Consultants
Summarized by Josh Lothian
Nick Stoughton addressed a concern that developers are facing more and more: writing portable applications. He proposed that there is no such thing as a "portable application," only those that have been ported. Developers are still in search of the Holy Grail of applications programming: a write-once, compile-anywhere program. Languages such as Java are leading up to this, but virtual machines still have to be ported, paths will still be different, etc. Web-based applications are another area that could be promising for portable applications.
Nick went on to talk about the future of portability. The recently released POSIX 2001 standards are expected to help the situation. The 2001 revision expands the original POSIX spec into seven volumes. Even though POSIX 2001 is more specific than the previous release, Nick pointed out that even this standard allows for small differences where vendors may define their own behaviors. This can only hurt developers in the long run. Along these lines, it was pointed out that there may be room for automated tools that have the capability to check application code and determine the level of portability and standards compliance of the source.
NETWORK MANAGEMENT SYSTEM
PERFORMANCE TUNING
Jeff Allen, Tellme Networks, Inc.
Summarized by Matt Selsky
Jeff Allen, author of Cricket, discussed network tuning. The basic idea of tuning is: measure, twiddle, and measure again. Cricket can be used for large installations, but it's overkill for smaller installations, for which MRTG is better suited. You don't need Cricket to do measurement; doing something like a shell script that dumps data to Excel is good, but you need to get some sort of measurement. Cricket was not designed for billing; it was meant to help answer questions.
Useful tools and techniques covered included looking for the difference between machines in order to identify the causes of the differences; measuring, but also thinking about what you're measuring and why; and remembering to use strace and tcpdump to identify problems. Problems repeat themselves; performance tuning requires an intricate understanding of all the layers involved to solve the problem and simplify the automation. (Having a computer science background helps, but is not essential.) If you can reproduce the problem, you then should investigate each piece to find the bottleneck. If you can't reproduce the problem, then measure. You need to understand all the system interactions. Measurement can help determine whether things are actually slow or if the user is imagining it.
Troubleshooting begins with a hunch, but scientific processes are essential. You should be able to determine the event stream, or when each event occurs; having observation logs can help. Some hints provided include checking cron for periodic anomalies, slowing down /bin/rm to avoid I/O overload, and looking for unusual kernel CPU usage.
Also, try to use lightweight monitoring to reduce overhead. You don't need to monitor every resource on every system, but you should monitor those resources on those systems that are essential. Don't check things too often, since you can introduce overhead that way. Lots of information can be gathered from the /proc file system interface to the kernel.
BSD BOF
Host: Kirk McKusick, Author and Consultant
Summarized by Bosko Milekic
The BSD Birds of a Feather session at USENIX started off with five quick and informative talks and ended with an exciting discussion of licensing issues.
Christos Zoulas, for the NetBSD Project, went over the primary goals of the NetBSD Project (namely portability, a clean architecture, and security) and then proceeded to briefly discuss some of the challenges that the project has encountered. The successes of and areas for improvement in 1.6 were then examined. NetBSD has recently seen various improvements in its cross-build system, packaging system, and third-party tool support. As for the kernel, a zero-copy TCP/UDP implementation has been integrated, and a new pmap
code for arm32 has been introduced, as have some other rather major changes to various subsystems. More information is available in Christos's slides, which are now available at http://www.netbsd.org/gallery/events/usenix2002.
Theo de Raadt of the OpenBSD Project presented a rundown of various new things that the OpenBSD and OpenSSH teams have been looking at. Theo's talk included an amusing description (and pictures) of some of the events that transpired during OpenBSD's hack-a-thon, which occurred the week before USENIX '02. Finally, Theo mentioned that, following complications with the IPFilter license, the OpenBSD team had done a fairly extensive license audit.
Robert Watson of the FreeBSD Project brought up various examples of FreeBSD being used in the real world. He went on to describe a wide variety of changes that will surface with the upcoming FreeBSD release 5.0. Notably, 5.0 will introduce a re-architecting of the kernel aimed at providing much more scalable support for SMP machines. 5.0 will also include an early implementation of KSE, FreeBSD's version of scheduler activations, large improvements to pccard, and significant framework bits from the TrustedBSD project.
Mike Karels from Wind River Systems presented an overview of how BSD/OS has been evolving following the Wind River Systems acquisition of BSDi's software assets. Ernie Prabhakar from Apple discussed Apple's OS X and its success on the desktop. He explained how OS X aims to bring BSD to the desktop while striving to maintain a strong and stable core, one that is heavily based on BSD technology.
The BoF finished with a discussion on licensing; specifically, some folks questioned the impact of Apple's licensing for code from their core OS (the Darwin Project) on the BSD community as a whole.
CLOSING SESSION
HOW FLIES FLY
Michael H. Dickinson, University of California, Berkeley
Summarized by JD Welch
In this lively and offbeat talk, Dickinson discussed his research into the mechanisms of flight in the fruit fly (Drosophila melanogaster) and related the autonomic behavior to that of a technological system. Insects are the most successful organisms on earth, due in no small part to their early adoption of flight. Flies travel through space on straight trajectories interrupted by saccades, jumpy turning motions somewhat analogous to the fast, jumpy movements of the human eye. Using a variety of monitoring techniques, including high-speed video in a "virtual reality flight arena," Dickinson and his colleagues have observed that flies respond to visual cues to decide when to saccade during flight. For example, more visual texture makes flies saccade earlier (i.e., further away from the arena wall), described by Dickinson as a "collision-control algorithm."
Through a combination of sensory inputs, flies can make decisions about their flight path. Flies possess a visual "expansion detector," which, at a certain threshold, causes the animal to turn in a certain direction. However, expansion in front of the fly sometimes causes it to land. How does the fly decide? Using the virtual reality device to replicate various visual scenarios, Dickinson observed that the flies fixate on vertical objects that move back and forth across the field. Expansion on the left side of the animal causes it to turn right, and vice versa, while expansion directly in front of the animal triggers the legs to flare out in preparation for landing.
If the eyes detect these changes, how are the responses implemented? The flies' wings have "power muscles," controlled by mechanical resonance (as opposed to the nervous system), which drive the
wings, combined with neurally activated "steering muscles," which change the configuration of the wing joints. Subtle variations in the timing of impulses correspond to changes in wing movement, controlled by sensors in the "wing-pit." A small wing-like structure, the haltere, controls equilibrium by beating constantly.
Voluntary control is accomplished by the halteres, whose signals can interfere with the autonomic control of the stroke cycle. The halteres have "steering muscles" as well, and information derived from the visual system can turn off the haltere or trick it into responding to a "virtual" problem.
Dickinson has also studied the aerodynamics of insect flight using a device called the "Robo Fly," an oversized mechanical insect wing suspended in a large tank of mineral oil. Interestingly, Dickinson observed that the average lift generated by a fly's wings is less than its body weight; flies use three mechanisms to overcome this, including rotating the wings (rotational lift).
Insects are extraordinarily robust creatures, and because Dickinson analyzed the problem in a systems-oriented way, these observations and analyses are immediately applicable to technology: the response system can be used as an efficient search algorithm for control systems in autonomous vehicles, for example.
THE AFS WORKSHOP
Summarized by Garry Zacheiss, MIT
Love Hornquist-Astrand of the Arla development team presented the Arla status report. Arla 0.35.8 has been released. Scheduled for release soon are improved support for Tru64 UNIX, MacOS X, and FreeBSD; improved volume handling; and implementation of more of the vos and pts subcommands. It was stressed that MacOS X is considered an important platform, and that a GUI configuration manager for the cache
manager, a GUI ACL manager for the Finder, and a graphical login that obtains Kerberos tickets and AFS tokens at login time were all under development.
Future goals planned for the underlying AFS protocols include GSSAPI/SPNEGO support for Rx, performance improvements to Rx, an enhanced disconnected mode, and IPv6 support for Rx; an experimental patch is already available for the latter. Future Arla-specific goals include improved performance, partial file reading, and increased stability on several platforms. Work is also in progress on the RXGSS protocol extensions (integrating GSSAPI into Rx); a partial implementation exists, and work continues as developers find time.
Derrick Brashear of the OpenAFS development team presented the OpenAFS status report. OpenAFS 1.2.5 was released immediately prior to the conference; it fixed a remotely exploitable denial-of-service vulnerability on several OpenAFS platforms, most notably IRIX and AIX. Future work planned for OpenAFS includes better support for MacOS X, including working around Finder interaction issues. Better support for the BSDs is also planned: FreeBSD has a partially implemented client, while NetBSD and OpenBSD have only server programs available right now. Support for AIX 5, Tru64 5.1A, and MacOS X 10.2 (a.k.a. Jaguar) is planned for the future. Other planned enhancements include nested groups in the ptserver (code donated by the University of Michigan awaits integration), disconnected AFS, and further work on a native Windows client. Derrick stated that the guiding principles of OpenAFS are to maintain compatibility with IBM AFS, support new platforms, ease the administrative burden, and add new functionality.
AFS performance was discussed. The openafs.org cell consists of two file servers, one in Stockholm, Sweden, and one in Pittsburgh, PA. AFS works reasonably well over trans-Atlantic links,
although OpenAFS clients don't determine which file server to talk to very effectively. Arla clients use RTTs to the servers to determine the optimal file server from which to fetch replicated data. Modifications to OpenAFS to support this behavior in the future are desired.
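The RTT-based selection Arla performs can be sketched in a few lines. Server names and RTT values below are hypothetical; the real client measures Rx round-trip times rather than consulting a static table.

```python
def pick_replica(servers, rtt_probe):
    """Choose the file server with the lowest measured round-trip time.

    `servers` is a list of server names; `rtt_probe` returns an RTT
    estimate in seconds for one server. Illustrative only -- this is
    not Arla's actual code.
    """
    return min(servers, key=rtt_probe)

# Hypothetical RTT table standing in for live measurements:
rtts = {"afs.stockholm.example": 0.095, "afs.pittsburgh.example": 0.012}
best = pick_replica(list(rtts), rtts.get)
print(best)  # afs.pittsburgh.example
```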
Jimmy Engelbrecht and Harald Barth of KTH discussed their AFSCrawler script, written to determine how many AFS cells and clients there are in the world, which implementations and versions they run (Arla vs. IBM AFS vs. OpenAFS), and how much data is stored in AFS. The script unfortunately triggered a bug in IBM AFS 3.6–derived code, causing some clients to panic while handling a specific RPC. This has since been fixed in OpenAFS 1.2.5 and the most recent IBM AFS patch level, and all AFS users are strongly encouraged to upgrade. No release of Arla is vulnerable to this particular denial-of-service attack. There was an extended discussion of the usefulness of this exploration; many sites believed this was useful information and that such scanning should continue in the future, but only on an opt-in basis.
Many sites face the problem of managing Kerberos/AFS credentials for batch-scheduled jobs. Specifically, most batch-processing software needs to be modified to forward tickets as part of the batch submission process, renew tickets and tokens while the job is in the queue and for the lifetime of the job, and properly destroy credentials when the job completes. Ken Hornstein of NRL was able to pay a commercial vendor to support Kerberos 4/5 credential management in their product, although they did not implement AFS token management. MIT has implemented some of the desired functionality in OpenPBS and might be able to make it available to other interested sites.
Tools to simplify AFS administration were discussed, including:
AFS Balancer: A tool written by CMU to automate the process of balancing disk usage across all
servers in a cell. Available from ftp://ftp.andrew.cmu.edu/pub/AFS-Tools/balance-1.1b.tar.gz.
Themis: Themis is KTH's enhanced version of the AFS tool "package" for updating files on local disk from a central AFS image. KTH's enhancements include allowing the deletion of files, simplifying the process of adding a file, and allowing the merging of multiple rule sets for determining which files are updated. Themis is available from the Arla CVS repository.
Stanford was presented as an example of a large AFS site. Stanford's AFS usage consists of approximately 14TB of data in approximately 100,000 volumes; 33TB of storage is available in their primary cell, ir.stanford.edu. Their file servers consist entirely of Solaris machines running a combination of Transarc 3.6 patch-level 3 and OpenAFS 1.2.x, while their database servers run OpenAFS 1.2.2. Their cell consists of 25 file servers using a combination of EMC and Sun StorEdge hardware. Stanford continues to use the kaserver for their authentication infrastructure, with future plans to migrate entirely to an MIT Kerberos 5 KDC.
Stanford has approximately 3,400 clients on their campus, not including SLAC (the Stanford Linear Accelerator); approximately 2,100 AFS clients from outside Stanford contact their cell every month. Their supported clients are almost entirely IBM AFS 3.6, although they plan to release OpenAFS clients soon. Stanford currently supports only UNIX clients. There is some on-campus presence of Windows clients, but Stanford has never publicly released or supported one. They do intend to release and support the MacOS X client in the near future.
All Stanford students, faculty, and staff are assigned AFS home directories, with a default quota of 50MB, for a total of approximately 550GB of user home directories. Other significant uses of AFS
storage are data volumes for workstation software (400GB) and volumes for course Web sites and assignments (100GB).
AFS usage at Intel was also presented. Intel has been an AFS site since 1994. They had bad experiences with the IBM 3.5 Linux client; their experience with OpenAFS on Linux 2.4.x kernels has been much better. They use, and are satisfied with, the OpenAFS IA64 Linux port. Intel has hundreds of OpenAFS 1.2.3 and 1.2.4 clients in many production cells accessing data stored on IBM AFS file servers, and they have not encountered any interoperability issues. Intel has some concerns about OpenAFS: they would like to purchase commercial support for OpenAFS, and to see OpenAFS support for HP-UX on both PA-RISC and Itanium hardware. HP-UX support is currently unavailable because a specific HP-UX header file is unavailable from HP; this may be available soon. Intel has not yet committed to migrating their file servers to OpenAFS and is unsure whether it will do so without commercial support.
Backups are a traditional topic of discussion at AFS workshops, and this time was no exception. Many users complain that the traditional AFS backup tools ("backup" and "butc") are complex and difficult to automate, requiring many home-grown scripts and much user intervention for error recovery. An additional complaint was that the traditional AFS tools do not support file- or directory-level backups and restores; data must be backed up and restored at the volume level.
Mitch Collinsworth of Cornell presented work to make AMANDA, the free backup software from the University of Maryland, suitable for AFS backups. Using AMANDA for AFS backups allows one to share AFS backup tapes with non-AFS backups, easily run multiple backups in parallel, automate error recovery, and provide a robust degraded mode that prevents tape errors from
stopping backups altogether. Their implementation allows for full volume restores as well as individual directory and file restores. They have finished coding this work and are in the process of testing and documenting it.
Peter Honeyman of CITI at the University of Michigan spoke about work he has proposed to replace Rx with RPCSEC_GSS in OpenAFS. This would allow AFS to use a TCP-based transport mechanism rather than the UDP-based Rx, and possibly gain better congestion control, dynamic adaptation, and fragmentation avoidance as a result. RPCSEC_GSS uses the GSSAPI to authenticate Sun ONC RPC. RPCSEC_GSS is transport-agnostic, provides strong security, is a developing Internet standard, and has multiple open source implementations. Backward compatibility with existing AFS servers and clients is an important goal of this project.
THINGS TO THINK ABOUT
Summarized by Bosko Milekic
SPEEDING UP KERNEL SCHEDULER BY REDUCING CACHE MISSES
Shuji Yamamura, Akira Hirai, Mitsuru Sato, Masao Yamamoto, Akira Naruse, and Kouichi Kumon, Fujitsu Labs
This was an interesting talk about the effects of cache coloring for task structures in the Linux kernel scheduler (Linux kernel 2.4.x). The speaker first presented some interesting benchmark numbers for the Linux scheduler, showing that as the number of processes on the task queue increased, performance decreased. The authors used some really nifty hardware to measure the number of bus transactions throughout their tests and were thus able to reasonably quantify the impact that cache misses had on the Linux scheduler.
Their experiments led them to implement a cache coloring scheme for task structures, which were previously aligned on 8KB boundaries and therefore eventually mapped to the same cache lines. This unfortunate placement of task structures in memory induced a significant number of cache misses as the number of tasks in the scheduler grew.
The implemented solution consisted of offsetting task structures to more evenly distribute cached entries across the L2 cache. The result was inevitably fewer cache misses in the scheduler. Some negative effects were observed in certain situations; these were primarily due to more cache slots being used by task-structure data in the scheduler, thus forcing out data previously cached there.
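The aliasing problem and the coloring fix can be made visible with a toy cache model. The geometry below (64-byte lines, a 128-set, 8KB way) is chosen purely so that an 8KB stride aliases; it is not Fujitsu's measured configuration, and the real patch works on physical task-structure placement, not arithmetic like this.

```python
LINE, SETS = 64, 128       # 64B cache lines, 128 sets -> an 8KB "way"
STRIDE = 8 * 1024          # task structures aligned on 8KB boundaries

def cache_set(addr, line=LINE, sets=SETS):
    """Which cache set an address maps to (direct-mapped view)."""
    return (addr // line) % sets

# Without coloring: every task structure's hot fields land in the SAME set.
plain = {cache_set(i * STRIDE) for i in range(32)}

# With coloring: offset each structure by one cache line per task,
# spreading the entries across different sets.
colored = {cache_set(i * STRIDE + (i % SETS) * LINE) for i in range(32)}

print(len(plain), len(colored))  # 1 32
```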
WORK-IN-PROGRESS REPORTS
Summarized by Brennan Reynolds
RESOURCE VIRTUALIZATION TECHNIQUES FOR
WIDE-AREA OVERLAY NETWORKS
Kartik Gopalan, Stony Brook University
This work addressed the issue of provisioning a maximum number of virtual overlay networks (VONs), with diverse quality-of-service (QoS) requirements, on a single physical network. Each VON is logically isolated from the others to ensure its QoS requirements. Gopalan mentioned several critical research issues with this problem that are currently being investigated. Dealing with how to provision the network at various levels (link, route, or path) and then enforce the provisioning at run time is one of the toughest challenges. Currently, Gopalan has developed several algorithms to handle admission control, end-to-end QoS, route selection, scheduling, and fault tolerance in the network.
For more information, visit http://www.ecsl.cs.sunysb.edu/.
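A minimal flavor of one of those pieces, link-level admission control, might look like the following sketch. The topology, capacities, and the all-or-nothing check are invented for illustration and are not Gopalan's algorithms.

```python
def admit(path, bandwidth, links, reservations):
    """Admit a virtual overlay network only if every link on its path
    retains enough residual capacity; reserve the bandwidth if so.

    links: link -> capacity; reservations: link -> bandwidth promised.
    Toy admission-control check, not the actual provisioning algorithm.
    """
    if all(reservations.get(l, 0) + bandwidth <= links[l] for l in path):
        for l in path:
            reservations[l] = reservations.get(l, 0) + bandwidth
        return True
    return False

links = {"A-B": 100, "B-C": 100}   # hypothetical two-link physical network
resv = {}
print(admit(["A-B", "B-C"], 60, links, resv))  # True
print(admit(["A-B"], 60, links, resv))         # False: only 40 left on A-B
```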
VISUALIZING SOFTWARE INSTABILITY
Jennifer Bevan, University of California, Santa Cruz
Detection of instability in software has typically been an afterthought. The point in the development cycle when the software is reviewed for instability usually comes after it is difficult and costly to go back and perform major modifications to the code base. Bevan has developed a technique that allows the visualization of unstable regions of code and can be used much earlier in the development cycle. Her technique creates a time series of dependence graphs that include clusters and lines called fault lines. From the graphs, a developer is able to easily determine where the unstable sections of code are and proactively restructure them. She is currently working on a prototype implementation.
For more information, visit http://www.cse.ucsc.edu/~jbevan/evo_viz/.
RELIABLE AND SCALABLE PEER-TO-PEER WEB
DOCUMENT SHARING
Li Xiao, College of William and Mary
The idea presented by Xiao would allow end users to share the content of their Web browser caches with neighboring Internet users. The rationale is that today's end users are increasingly connected to the Internet over high-speed links, and browser caches are becoming large storage systems. Therefore, if an individual accesses a page that does not exist in their local cache, Xiao suggests that they first query other end users for the content before trying to access the machine hosting the original. This strategy does have some serious problems associated with it that still need to be addressed, including ensuring the integrity of the content and protecting the identity and privacy of the end users.
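The lookup order Xiao proposes can be sketched as below. Caches are modeled as plain dictionaries, and the integrity and privacy checks noted as open problems are deliberately omitted.

```python
def fetch(url, local_cache, neighbors, origin_fetch):
    """Look in the local browser cache, then in neighboring users'
    caches, and only then go to the origin server. Illustrative sketch
    of the proposal, not a real protocol.
    """
    if url in local_cache:
        return local_cache[url], "local"
    for peer_cache in neighbors:
        if url in peer_cache:
            doc = peer_cache[url]
            local_cache[url] = doc          # keep a copy for next time
            return doc, "peer"
    doc = origin_fetch(url)                 # fall back to the origin host
    local_cache[url] = doc
    return doc, "origin"

peer = {"http://example.com/": "<html>cached copy</html>"}
doc, source = fetch("http://example.com/", {}, [peer],
                    lambda u: "<html>fresh</html>")
print(source)  # peer
```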
SEREL: FAST BOOTING FOR UNIX
Leni Mayo, Fast Boot Software
Serel is a tool that generates a visual representation of a UNIX machine's boot-up sequence. It can be used to identify the critical path and can show whether a particular service or process blocks for an extended period of time. This information could be used to determine where the largest performance gains could be realized by tuning the order of execution at boot-up. Serel creates a dependency graph, expressed in XML, during boot-up. This graph is then used to create the visual representation. Currently the tool works only on POSIX-compliant systems, but Mayo stated that he would be porting it to other platforms. Other extensions that were mentioned included having the metadata capture the use of shared libraries and monitoring the suspend/resume sequence of portable machines.
For more information, visit http://www.fastboot.org/.
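Given a dependency graph like the one Serel records, the critical path falls out of a longest-path computation. The boot graph below is entirely hypothetical (service names and durations are made up), and Serel stores its graph as XML rather than a Python dictionary.

```python
from functools import lru_cache

# Hypothetical boot dependency graph: service -> (duration_sec, prerequisites)
boot = {
    "kernel":  (2.0, []),
    "network": (5.0, ["kernel"]),
    "nfs":     (4.0, ["network"]),
    "syslog":  (1.0, ["kernel"]),
    "login":   (0.5, ["nfs", "syslog"]),
}

@lru_cache(maxsize=None)
def finish(svc):
    """Earliest time `svc` can finish: its duration plus the latest
    finish time among its prerequisites (the critical path)."""
    dur, prereqs = boot[svc]
    return dur + max((finish(p) for p in prereqs), default=0.0)

print(finish("login"))  # 11.5 -- the kernel -> network -> nfs chain dominates
```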
BERKELEY DB XML
John Merrells, Sleepycat Software
Merrells gave a quick introduction to and overview of the new XML library for Berkeley DB. The library specializes in storage and retrieval of XML content, queried via XPath. The library allows for multiple containers per document and stores everything natively as XML. The user is also given a wide range of elements on which to create indices, including edges, elements, text strings, and presence. The XPath implementation consists of a query parser, generator, optimizer, and execution engine. To conclude his presentation, Merrells gave a live demonstration of the software.
For more information, visit http://www.sleepycat.com/xml/.
CLUSTER-ON-DEMAND (COD)
Justin Moore, Duke University
Modern clusters are growing at a rapid rate; many have pushed beyond the 5,000-machine mark, and deploying them results in large expenses as well as management and provisioning issues. Furthermore, if the cluster is "rented" out to various users, it is very time-consuming to configure it to a user's specs, regardless of how long they need to use it. The COD work presented creates dynamic virtual clusters within a given physical cluster. The goal was to have a provisioning tool that would automatically select a chunk of available nodes and install the operating system and middleware specified by the customer in a short period of time. This would allow a greater use of resources, since multiple virtual clusters can exist at once, and a virtual cluster's size can be changed dynamically. Moore stated that they have created a working prototype and are currently testing and benchmarking its performance.
For more information, visit http://www.cs.duke.edu/~justin/cod/.
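The provisioning step can be caricatured as a simple allocator. Node names and image labels are made up, and the real system also performs the OS and middleware installation and dynamic resizing that this sketch elides.

```python
def carve_cluster(nodes, size, image):
    """Pick `size` free nodes and mark them as running `image`.

    nodes: node name -> installed image (None if free). Returns the
    chosen node list, or None if not enough nodes are free. A toy
    version of COD's node-selection step only.
    """
    free = [n for n, img in nodes.items() if img is None]
    if len(free) < size:
        return None
    chosen = free[:size]
    for n in chosen:
        nodes[n] = image     # real COD would install the OS here
    return chosen

nodes = {f"node{i:02d}": None for i in range(8)}
vc1 = carve_cluster(nodes, 3, "linux-mpi")
vc2 = carve_cluster(nodes, 4, "freebsd-web")
print(len(vc1), len(vc2), sum(img is None for img in nodes.values()))  # 3 4 1
```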
CATACOMB
Elias Sinderson, University of California, Santa Cruz
Catacomb is a project to develop a database-backed DASL module for the Apache Web server. It was designed as a replacement storage back end for WebDAV. The initial release of the module only contains support for a MySQL database but could be extended to others. Sinderson briefly touched on the performance of the module: it was comparable to the mod_dav Apache module for all query types but search. The presentation concluded with remarks about adding support for the LOCK method and including ACL specifications in the future.
For more information, visit http://ocean.cse.ucsc.edu/catacomb/.
SELF-ORGANIZING STORAGE
Dan Ellard, Harvard University
Ellard's presentation introduced a storage system that tunes itself based on the workload of the system, without requiring the intervention of the user. The intelligence is implemented as a virtual self-organizing disk that resides below any file system. The virtual disk observes the access patterns exhibited by the system and then attempts to predict what information will be accessed next. An experiment was done using an NFS trace from one of the email servers at a large ISP. Ellard's self-organizing storage system worked well, which he attributes to the fact that most of the files being requested were large email boxes. Areas of future work include exploring the length and detail of the data-collection stage, as well as the CPU impact of running the virtual disk layer.
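One of the simplest predictors such a virtual disk could use is a last-successor table; the sketch below only illustrates the observe-then-predict loop, and is not Ellard's algorithm (block labels are invented).

```python
class LastSuccessor:
    """Predict the next access as 'whatever last followed this block'."""

    def __init__(self):
        self.next_after = {}   # block -> block that last followed it
        self.prev = None

    def access(self, block):
        prediction = None
        if self.prev is not None:
            self.next_after[self.prev] = block        # learn from the stream
            prediction = self.next_after.get(block)   # guess what comes next
        self.prev = block
        return prediction

p = LastSuccessor()
p.access("inode")
p.access("mailbox")        # learn: "mailbox" follows "inode"
print(p.access("inode"))   # mailbox -- predicted next access
```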
VISUAL DEBUGGING
John Costigan, Virginia Tech
Costigan feels that the state of current debugging facilities in the UNIX world is not as good as it should be. He proposes the addition of several elements to programs like ddd and gdb. The first addition is separate heap and
stack components. Another is to display only certain fields of a data structure, with the ability to zoom in and out as needed. Finally, the debugger should provide the ability to visually present complex data structures, including linked lists, trees, etc. He said that a beta version of a debugger with these abilities is currently available.
For more information, visit http://infovis.cs.vt.edu/datastruct/.
IMPROVING APPLICATION PERFORMANCE
THROUGH SYSTEM-CALL COMPOSITION
Amit Purohit, Stony Brook University
Web servers perform a huge number of context switches and internal data copies during normal operation. These two elements can drastically limit the performance of an application, regardless of the hardware platform it is run on. The Compound System Call (CoSy) framework is an attempt to reduce the performance penalty of context switches and data copies. It includes a user-level library and several kernel facilities that can be used via system calls. The library provides the programmer with a complete set of memory-management functions. Performance tests using the CoSy framework showed a large saving in context switches and data copies; the only area where the savings between conventional libraries and CoSy were negligible was very large file copies.
For more information, visit http://www.cs.sunysb.edu/~purohit/.
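The overhead CoSy targets can be seen by contrast with an ordinary copy loop. In the sketch below, sendfile() is only a familiar stand-in for a compound call that keeps data inside the kernel; it is not the CoSy API, and the file sizes are arbitrary.

```python
import os
import tempfile

# Conventional copy: every block crosses the user/kernel boundary twice
# (read into a user buffer, then written back out), with a pair of
# system-call context switches per block.
def copy_read_write(src_fd, dst_fd, bufsize=64 * 1024):
    while True:
        buf = os.read(src_fd, bufsize)
        if not buf:
            break
        os.write(dst_fd, buf)

# Composed style: a single call that keeps the data in the kernel --
# the same kind of saving CoSy aims for by batching calls.
def copy_composed(src_fd, dst_fd, nbytes):
    sent = 0
    while sent < nbytes:
        sent += os.sendfile(dst_fd, src_fd, sent, nbytes - sent)

with tempfile.TemporaryFile() as src, tempfile.TemporaryFile() as dst:
    src.write(b"x" * 100_000)
    src.seek(0)
    copy_read_write(src.fileno(), dst.fileno())
    print(os.fstat(dst.fileno()).st_size)  # 100000
```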
ELASTIC QUOTAS
John Oscar, Columbia University
Oscar began by stating that most resources are flexible, but that to date disks have not been. While approximately 80 percent of files are short-lived, disk quotas are hard limits imposed by administrators. The idea behind elastic quotas is to have a non-permanent storage area that each user can temporarily use. Oscar suggested the creation of an ehome directory structure to be used in deploying elastic quotas. Currently he
has implemented elastic quotas as a stackable file system.

There were also several trends apparent in the data. Many users would ssh to their remote host (good) but then launch an email reader on the local machine, which would connect via POP to the same host and send the password in clear text (bad). This situation can easily be remedied by tunneling protocols like POP through SSH, but it appeared that many people were not aware this could be done. While most of his comments on the use of the wireless network were negative, the list of passwords he had collected showed that people were indeed using strong passwords. His recommendations were to educate and encourage people to use protocols like IPSec, SSH, and SSL when conducting work over a wireless network, because you never know who else is listening.
SYSTRACE
Niels Provos, University of Michigan
How can one be sure the applications one is using actually do exactly what their developers say they do? Short answer: you can't. People today use a large number of complex applications, which means it is impossible to check each application thoroughly for security vulnerabilities. There are tools that can help, though. Provos has developed a tool called systrace that allows a user to generate a policy of acceptable system calls a particular application can make. If the application attempts to make a call that is not defined in the policy, the user is notified and allowed to choose an action. Systrace includes an auto-policy generation mechanism, which uses a training phase to record the actions of all programs the user executes. If the user chooses not to be bothered by applications breaking policy, systrace allows default enforcement actions to be set. Currently systrace is implemented for FreeBSD and NetBSD, with a Linux version coming out shortly.
For more information, visit http://www.citi.umich.edu/u/provos/systrace/.
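The notify-and-remember control flow described above can be sketched as follows. Real systrace enforces per-argument policies in the kernel with its own policy syntax; the keys and answers here are simplified stand-ins.

```python
def check(policy, syscall, arg, ask_user):
    """Toy version of the systrace decision loop: consult the policy,
    fall back to asking the user, and remember the answer for next time.
    """
    key = (syscall, arg)
    if key not in policy:
        policy[key] = ask_user(syscall, arg)  # notify user, record choice
    return policy[key]

policy = {("open", "/etc/passwd"): "permit"}
always_deny = lambda syscall, arg: "deny"

print(check(policy, "open", "/etc/passwd", always_deny))  # permit
print(check(policy, "open", "/etc/shadow", always_deny))  # deny
print(policy[("open", "/etc/shadow")])                    # deny (remembered)
```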
VERIFIABLE SECRET REDISTRIBUTION FOR
SURVIVABLE STORAGE SYSTEMS
Ted Wong, Carnegie Mellon University
Wong presented a protocol that can be used to re-create a file distributed over a set of servers, even if one of the servers is damaged. In his scheme, the user chooses how many shares the file is split into and the number of servers it will be stored across. The goal of this work is to provide persistent, secure storage of information even if it comes under attack. Wong stated that his design of the protocol is complete, and he is currently building a prototype implementation.
For more information, visit http://www.cs.cmu.edu/~tmwong/research/.
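Threshold schemes of this kind are typically built on Shamir secret sharing, sketched below: any m of n shares reconstruct the secret, while fewer reveal nothing. The verifiable-redistribution machinery that is the actual contribution of Wong's protocol is not shown, and the prime and share counts are arbitrary.

```python
import random

P = 2**127 - 1  # a Mersenne prime; all arithmetic is mod P

def split(secret, n, m):
    """Split `secret` into n shares, any m of which can recover it:
    evaluate a random degree-(m-1) polynomial with constant term
    `secret` at x = 1..n."""
    coeffs = [secret] + [random.randrange(P) for _ in range(m - 1)]
    return [(x, sum(c * pow(x, i, P) for i, c in enumerate(coeffs)) % P)
            for x in range(1, n + 1)]

def recover(shares):
    """Lagrange interpolation at x = 0 over GF(P)."""
    total = 0
    for j, (xj, yj) in enumerate(shares):
        num = den = 1
        for k, (xk, _) in enumerate(shares):
            if k != j:
                num = num * (-xk) % P
                den = den * (xj - xk) % P
        total = (total + yj * num * pow(den, P - 2, P)) % P  # den^-1 mod P
    return total

shares = split(424242, n=5, m=3)
print(recover(shares[:3]) == 424242)  # True -- any 3 of the 5 servers suffice
```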
THE GURU IS IN SESSIONS
LINUX ON LAPTOP/PDA
Bdale Garbee, HP Linux Systems Operation
Summarized by Brennan Reynolds
This was a roundtable-like discussion, with Garbee acting as moderator. He is currently in charge of the Debian distribution of Linux and is involved in porting Linux to platforms other than i386. He has successfully helped port the kernel to the Alpha, SPARC, and ARM, and is currently working on a version to run on VAX machines.
A discussion was held on which file system is best for battery life. The ReiserFS and Ext3 file systems were discussed; people commented that when using Ext3 on their laptops, the disk would never spin down and thus was always consuming a large amount of power. One suggestion was to use a RAM-based file system for any volume that requires a large number of writes, and to write the file system to disk only at infrequent intervals or when the machine is shut down or suspended.
A question was directed to Garbee about his knowledge of how the HP-Compaq merger would affect Linux. Garbee thought that the merger was good for Linux, both within the company and for the open source community as a whole. He stated that the new company would be the number one shipper of systems with Linux as the base operating system. He also fielded a question about the support of older HP servers and their ability to run Linux, saying that Linux has indeed been ported to them and that he personally had several such machines in his basement running it.
A question which sparked a large number of responses concerned problems people had with running Linux on mobile platforms. Most people actually did not have many problems at all. There were a few cases of a particular machine model not working, but there were only two widespread issues: docking stations and the new ACPI power-management scheme. Docking stations still appear to be somewhat of a headache, but most people had developed ad hoc solutions for getting them to work. The ACPI power-management scheme developed by Intel does not appear to have a quick solution. Ted Ts'o, one of the head kernel developers, stated that there are fundamental architectural problems with the 2.4 series kernel that do not easily allow ACPI to be added. However, he also stated that ACPI is supported in the newest development kernel, 2.5, and will be included in 2.6.
The final topic was the possibility of purchasing a laptop without paying any charge/tax for Microsoft Windows. The consensus was that currently this is not possible. Even if the machine comes with Linux or without any operating system, the vendors have contracts in place which require them to pay Microsoft for each machine they ship if that machine is capable of running a Microsoft operating system. The only way this will change is if the demand for
other operating systems increases to the point that vendors renegotiate their contracts with Microsoft, and no one saw this happening in the near future.
LARGE CLUSTERS
Andrew Hume, AT&T Labs – Research
Summarized by Teri Lampoudi
This was a session on clusters, big data, and resilient computing. Given that the audience was primarily interested in clusters, and not necessarily big data or beating nodes to a pulp, the conversation revolved mainly around what it takes to make a heterogeneous cluster resilient. Hume also presented a FREENIX paper on his current cluster project, which explained in more detail much of what was abstractly claimed in the guru session.
Hume's use of the word "cluster" referred not to what I had assumed would be a Beowulf-type system, but to a loosely coupled farm of machines in which concurrency is much less of an issue than it would be in a Beowulf. In fact, the architecture Hume described was designed to deal with large amounts of independent transaction processing, essentially the process of billing calls, which requires no interprocess message passing or anything of the sort a parallel scientific application might.
Where does the large data come in? Since the transactions in question consist of large numbers of flat files, a mechanism for getting the files onto nodes and off-loading the results is necessary. In this particular cluster, named Ningaui, this task is handled by the "replication manager," which generated a large amount of interest in the audience. All management functions in the cluster, as well as job allocation, constitute services that are bid for and leased out to nodes. Furthermore, failures are handled by simply repeating the failed task until it is completed properly, if that is possible, and if it is not, by looking through detailed logs to discover
what went wrong. Logging and testing for the successful completion of each task are what make this model resilient. Every management operation is logged at some appropriate level of detail, generating 120MB/day, and every single file transfer is md5 checksummed. This was characterized by Hume as a "patient" style of management, where no one controls the cluster: scheduling behavior is emergent rather than stringently planned, and nodes are free to drop in and out of the cluster. A downside to this model is an increase in latency, offset by the fact that the cluster continues to function unattended outside of 8-to-5 support hours.
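The per-transfer verification Hume described reduces to checksumming both ends and repeating the task on a mismatch. A minimal sketch, with invented file contents and paths:

```python
import hashlib
import os
import shutil
import tempfile

def md5_of(path, chunk=1 << 20):
    """md5 of a file, read in chunks so large transfer files fit in memory."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()

src = tempfile.NamedTemporaryFile(delete=False)
src.write(b"call records\n" * 1000)
src.close()
dst = src.name + ".copy"
shutil.copy(src.name, dst)                # stands in for the file transfer

# A mismatch here would mean the task gets logged as failed and repeated.
print(md5_of(src.name) == md5_of(dst))    # True -> task logged as complete
os.unlink(src.name)
os.unlink(dst)
```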
In response to questions about the projected scalability of such a scheme, Hume said the system would presumably scale from the present 60 nodes to about 100 without modifications, but that scaling higher would be a matter for further design. The guru session ended on a somewhat unrelated note regarding the problems of getting data off mainframes and onto Linux machines: issues of fixed- vs. variable-length encoding of data blocks resulting in useful information being stripped by FTP, and problems in converting COBOL copybooks to something useful on a C-based architecture. To reiterate Hume's claim, there are homemade solutions for some of these problems, and the reader should feel free to contact Mr. Hume for them.
WRITING PORTABLE APPLICATIONS
Nick Stoughton, MSB Consultants
Summarized by Josh Lothian
Nick Stoughton addressed a concern that developers are facing more and more: writing portable applications. He proposed that there is no such thing as a "portable application," only those that have been ported. Developers are still in search of the Holy Grail of applications programming: a write-once, compile-anywhere program. Languages such as Java are leading up to this, but virtual
machines still have to be ported, paths will still be different, etc. Web-based applications are another area that could be promising for portable applications.
Nick went on to talk about the future of portability. The recently released POSIX 2001 standards are expected to help the situation. The 2001 revision expands the original POSIX spec into seven volumes. Even though POSIX 2001 is more specific than the previous release, Nick pointed out that even this standard allows for small differences where vendors may define their own behaviors. This can only hurt developers in the long run. Along these lines, it was pointed out that there may be room for automated tools that can check application code and determine the level of portability and standards compliance of the source.
NETWORK MANAGEMENT SYSTEM
PERFORMANCE TUNING
Jeff Allen, Tellme Networks, Inc.
Summarized by Matt Selsky
Jeff Allen, author of Cricket, discussed network tuning. The basic idea of tuning is: measure, twiddle, and measure again. Cricket can be used for large installations, but it's overkill for smaller installations, for which MRTG is better suited. You don't need Cricket to do measurement; doing something like a shell script that dumps data to Excel is fine, but you need to get some sort of measurement. Cricket was not designed for billing; it was meant to help answer questions.
Useful tools and techniques covered included looking for the differences between machines in order to identify their causes; measuring, but also thinking about what you're measuring and why; and remembering to use strace and tcpdump to identify problems. Problems repeat themselves; performance tuning requires an intricate understanding of all the layers involved to solve the problem and simplify the automation. (Having a computer science
background helps, but is not essential.) If you can reproduce the problem, you then should investigate each piece to find the bottleneck. If you can't reproduce the problem, then measure. You need to understand all the system interactions. Measurement can help determine whether things are actually slow or the user is imagining it.
Troubleshooting begins with a hunch, but scientific processes are essential. You should be able to determine the event stream, or when each event occurs; having observation logs can help. Some hints provided include checking cron for periodic anomalies, slowing down /bin/rm to avoid I/O overload, and looking for unusual kernel CPU usage.
Also, try to use lightweight monitoring to reduce overhead. You don't need to monitor every resource on every system, but you should monitor those resources on those systems that are essential. Don't check things too often, since you can introduce overhead that way. Lots of information can be gathered from the /proc file-system interface to the kernel.
BSD BOF
Host: Kirk McKusick, Author and Consultant
Summarized by Bosko Milekic
The BSD Birds of a Feather session at USENIX started off with five quick and informative talks and ended with an exciting discussion of licensing issues.
Christos Zoulas, for the NetBSD Project, went over the primary goals of the NetBSD Project (namely, portability, a clean architecture, and security) and then proceeded to briefly discuss some of the challenges the project has encountered. The successes of, and areas for improvement in, 1.6 were then examined. NetBSD has recently seen various improvements in its cross-build system, packaging system, and third-party tool support. As far as the kernel is concerned, a zero-copy TCP/UDP implementation has been integrated, and new pmap code for arm32 has been introduced, as have some other rather major changes to various subsystems. More information is available in Christos's slides, now available at http://www.netbsd.org/gallery/events/usenix2002.
Theo de Raadt of the OpenBSD Project presented a rundown of various new things that the OpenBSD and OpenSSH teams have been looking at. Theo's talk included an amusing description (and pictures) of some of the events that transpired during OpenBSD's hack-a-thon, which occurred the week before USENIX '02. Finally, Theo mentioned that, following complications with the IPFilter license, the OpenBSD team had done a fairly extensive license audit.
Robert Watson of the FreeBSD Project brought up various examples of FreeBSD being used in the real world. He went on to describe a wide variety of changes that will surface with the upcoming FreeBSD release 5.0. Notably, 5.0 will introduce a re-architecting of the kernel aimed at providing much more scalable support for SMP machines. 5.0 will also include an early implementation of KSE (FreeBSD's version of scheduler activations), large improvements to pccard, and significant framework bits from the TrustedBSD project.
Mike Karels from Wind River Systems presented an overview of how BSD/OS has been evolving following the Wind River Systems acquisition of BSDi's software assets. Ernie Prabhakar from Apple discussed Apple's OS X and its success on the desktop. He explained how OS X aims to bring BSD to the desktop while striving to maintain a strong and stable core, one that is heavily based on BSD technology.
The BoF finished with a discussion on licensing; specifically, some folks questioned the impact of Apple's licensing for code from their core OS (the Darwin Project) on the BSD community as a whole.
CLOSING SESSION
HOW FLIES FLY
Michael H. Dickinson, University of California, Berkeley
Summarized by J.D. Welch
In this lively and offbeat talk, Dickinson discussed his research into the mechanisms of flight in the fruit fly (Drosophila melanogaster) and related the autonomic behavior to that of a technological system. Insects are the most successful organisms on earth, due in no small part to their early adoption of flight. Flies travel through space on straight trajectories interrupted by saccades, jumpy turning motions somewhat analogous to the fast, jumpy movements of the human eye. Using a variety of monitoring techniques, including high-speed video in a "virtual reality flight arena," Dickinson and his colleagues have observed that flies respond to visual cues to decide when to saccade during flight. For example, more visual texture makes flies saccade earlier (i.e., further away from the arena wall), described by Dickinson as a "collision control algorithm."
Through a combination of sensory inputs, flies can make decisions about their flight path. Flies possess a visual "expansion detector" which, at a certain threshold, causes the animal to turn in a certain direction. However, expansion in front of the fly sometimes causes it to land. How does the fly decide? Using the virtual reality device to replicate various visual scenarios, Dickinson observed that the flies fixate on vertical objects that move back and forth across the field. Expansion on the left side of the animal causes it to turn right, and vice versa, while expansion directly in front of the animal triggers the legs to flare out in preparation for landing.
If the eyes detect these changes, how are the responses implemented? The flies' wings have "power muscles," controlled by mechanical resonance (as opposed to the nervous system), which drive the wings, combined with neurally activated "steering muscles," which change the configuration of the wing joints. Subtle variations in the timing of impulses correspond to changes in wing movement, controlled by sensors in the "wing-pit." A small wing-like structure, the haltere, controls equilibrium by beating constantly.
Voluntary control is accomplished by the halteres, whose signals can interfere with the autonomic control of the stroke cycle. The halteres have "steering muscles" as well, and information derived from the visual system can turn off the haltere or trick it into a "virtual" problem requiring a response.
Dickinson has also studied the aerodynamics of insect flight using a device called the "Robo Fly," an oversized mechanical insect wing suspended in a large tank of mineral oil. Interestingly, Dickinson observed that the average lift generated by a fly's wings is less than its body weight; flies use three mechanisms to overcome this, including rotating the wings (rotational lift).
Insects are extraordinarily robust creatures, and because Dickinson analyzed the problem in a systems-oriented way, these observations and analyses are immediately applicable to technology: the response system can be used as an efficient search algorithm for control systems in autonomous vehicles, for example.
THE AFS WORKSHOP
Summarized by Garry Zacheiss, MIT
Love Hornquist-Astrand of the Arla development team presented the Arla status report. Arla 0.35.8 has been released. Scheduled for release soon are improved support for Tru64 UNIX, Mac OS X, and FreeBSD; improved volume handling; and implementation of more of the vos/pts subcommands. It was stressed that Mac OS X is considered an important platform, and that a GUI configuration manager for the cache manager, a GUI ACL manager for the Finder, and a graphical login that obtains Kerberos tickets and AFS tokens at login time were all under development.
Future goals planned for the underlying AFS protocols include GSSAPI/SPNEGO support for Rx, performance improvements to Rx, an enhanced disconnected mode, and IPv6 support for Rx; an experimental patch is already available for the latter. Future Arla-specific goals include improved performance, partial file reading, and increased stability on several platforms. Work is also in progress on the RXGSS protocol extensions (integrating GSSAPI into Rx); a partial implementation exists, and work continues as developers find time.
Derrick Brashear of the OpenAFS development team presented the OpenAFS status report. OpenAFS 1.2.5 was released immediately prior to the conference; it fixed a remotely exploitable denial-of-service vulnerability on several OpenAFS platforms, most notably IRIX and AIX. Future work planned for OpenAFS includes better support for Mac OS X, including working around Finder interaction issues. Better support for the BSDs is also planned: FreeBSD has a partially implemented client, while NetBSD and OpenBSD have only server programs available right now. AIX 5, Tru64 5.1A, and Mac OS X 10.2 (aka Jaguar) are all planned for the future. Other planned enhancements include nested groups in the ptserver (code donated by the University of Michigan awaits integration), disconnected AFS, and further work on a native Windows client. Derrick stated that the guiding principles of OpenAFS were to maintain compatibility with IBM AFS, support new platforms, ease the administrative burden, and add new functionality.
AFS performance was discussed. The openafs.org cell consists of two file servers, one in Stockholm, Sweden, and one in Pittsburgh, PA. AFS works reasonably well over trans-Atlantic links, although OpenAFS clients don't determine which file server to talk to very effectively; Arla clients use RTTs to the server to determine the optimal file server to fetch replicated data from. Modifications to OpenAFS to support this behavior in the future are desired.
Jimmy Engelbrecht and Harald Barth of KTH discussed their AFSCrawler script, written to determine how many AFS cells and clients there were in the world, what implementations/versions they were (Arla vs. IBM AFS vs. OpenAFS), and how much data was in AFS. The script unfortunately triggered a bug in IBM AFS 3.6-derived code, causing some clients to panic while handling a specific RPC. This has since been fixed in OpenAFS 1.2.5 and the most recent IBM AFS patch level, and all AFS users are strongly encouraged to upgrade. No release of Arla is vulnerable to this particular denial-of-service attack. There was an extended discussion of the usefulness of this exploration. Many sites believed this was useful information and that such scanning should continue in the future, but only on an opt-in basis.
Many sites face the problem of managing Kerberos/AFS credentials for batch-scheduled jobs. Specifically, most batch processing software needs to be modified to forward tickets as part of the batch submission process, renew tickets and tokens while the job is in the queue and for the lifetime of the job, and properly destroy credentials when the job completes. Ken Hornstein of NRL was able to pay a commercial vendor to support Kerberos 4/5 credential management in their product, although they did not implement AFS token management. MIT has implemented some of the desired functionality in OpenPBS and might be able to make it available to other interested sites.
Tools to simplify AFS administration were discussed, including:
AFS Balancer: A tool written by CMU to automate the process of balancing disk usage across all servers in a cell. Available from ftp://ftp.andrew.cmu.edu/pub/AFS-Tools/balance-11b.tar.gz.
Themis: KTH's enhanced version of the AFS tool "package" for updating files on local disk from a central AFS image. KTH's enhancements include allowing the deletion of files, simplifying the process of adding a file, and allowing the merging of multiple rule sets for determining which files are updated. Themis is available from the Arla CVS repository.
Stanford was presented as an example of a large AFS site. Stanford's AFS usage consists of approximately 1.4TB of data in AFS, in the form of approximately 100,000 volumes; 3.3TB of storage is available in their primary cell, ir.stanford.edu. Their file servers consist entirely of Solaris machines running a combination of Transarc 3.6 patch level 3 and OpenAFS 1.2.x, while their database servers run OpenAFS 1.2.2. Their cell consists of 25 file servers, using a combination of EMC and Sun StorEdge hardware. Stanford continues to use the kaserver for their authentication infrastructure, with future plans to migrate entirely to an MIT Kerberos 5 KDC.
Stanford has approximately 3,400 clients on their campus, not including SLAC (Stanford Linear Accelerator); approximately 2,100 AFS clients from outside Stanford contact their cell every month. Their supported clients are almost entirely IBM AFS 3.6 clients, although they plan to release OpenAFS clients soon. Stanford currently supports only UNIX clients. There is some on-campus presence of Windows clients, but Stanford has never publicly released or supported them. They do intend to release and support the Mac OS X client in the near future.
All Stanford students, faculty, and staff are assigned AFS home directories, with a default quota of 50MB, for a total of approximately 550GB of user home directories. Other significant uses of AFS storage are data volumes for workstation software (400GB) and volumes for course Web sites and assignments (100GB).
AFS usage at Intel was also presented. Intel has been an AFS site since 1994. They had bad experiences with the IBM 3.5 Linux client; their experience with OpenAFS on Linux 2.4.x kernels has been much better. They use, and are satisfied with, the OpenAFS IA64 Linux port. Intel has hundreds of OpenAFS 1.2.3 and 1.2.4 clients in many production cells, accessing data stored on IBM AFS file servers, and they have not encountered any interoperability issues. Intel has some concerns about OpenAFS: they would like to purchase commercial support for OpenAFS, and to see OpenAFS support for HP-UX on both PA-RISC and Itanium hardware. HP-UX support is currently unavailable because a specific HP-UX header file is unavailable from HP; this may be resolved soon. Intel has not yet committed to migrating their file servers to OpenAFS and are unsure if they will do so without commercial support.
Backups are a traditional topic of discussion at AFS workshops, and this time was no exception. Many users complain that the traditional AFS backup tools ("backup" and "butc") are complex and difficult to automate, requiring many home-grown scripts and much user intervention for error recovery. An additional complaint was that the traditional AFS tools do not support file- or directory-level backups and restores; data must be backed up and restored at the volume level.
Mitch Collinsworth of Cornell presented work to make AMANDA, the free backup software from the University of Maryland, suitable for AFS backups. Using AMANDA for AFS backups allows one to share AFS backup tapes with non-AFS backups, easily run multiple backups in parallel, automate error recovery, and provide a robust degraded mode that prevents tape errors from stopping backups altogether. Their implementation allows for full volume restores as well as individual directory and file restores. They have finished coding this work and are in the process of testing and documenting it.
Peter Honeyman of CITI at the University of Michigan spoke about work he has proposed to replace Rx with RPCSEC GSS in OpenAFS. This would allow AFS to use a TCP-based transport mechanism rather than the UDP-based Rx, possibly gaining better congestion control, dynamic adaptation, and fragmentation avoidance as a result. RPCSEC GSS uses the GSSAPI to authenticate Sun ONC RPC. RPCSEC GSS is transport agnostic, provides strong security, is a developing Internet standard, and has multiple open source implementations. Backward compatibility with existing AFS servers and clients is an important goal of this project.
BERKELEY DB XML
John Merrells, Sleepycat Software
Merrells gave a quick introduction to, and overview of, the new XML library for Berkeley DB. The library specializes in the storage and retrieval of XML content, queried via XPath. It allows for multiple documents per container and stores everything natively as XML. The user is also given a wide range of elements to create indices with, including edges, elements, text strings, and presence. The XPath implementation consists of a query parser, generator, optimizer, and execution engine. To conclude his presentation, Merrells gave a live demonstration of the software.
For more information, visit http://www.sleepycat.com/xml.
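Berkeley DB XML's own API is not reproduced in the summary, so as an illustration of the kind of XPath query such a store resolves, here is a sketch using Python's standard library (xml.etree supports a subset of XPath); the document and element names are invented:

```python
import xml.etree.ElementTree as ET

# A toy XML document standing in for content stored in an XML database.
doc = """
<talks>
  <talk track="freenix"><title>Catacomb</title></talk>
  <talk track="freenix"><title>Systrace</title></talk>
  <talk track="guru"><title>Large Clusters</title></talk>
</talks>
"""

root = ET.fromstring(doc)
# XPath-style query: titles of all talks in the "freenix" track.
titles = [t.findtext("title") for t in root.findall(".//talk[@track='freenix']")]
```

A native XML store evaluates queries like this against indices (on elements, text, presence, and so on) rather than walking an in-memory tree, but the query language is the same idea.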
CLUSTER-ON-DEMAND (COD)
Justin Moore, Duke University
Modern clusters are growing at a rapid rate; many have pushed beyond the 5,000-machine mark, and deploying them results in large expenses as well as management and provisioning issues. Furthermore, if the cluster is "rented" out to various users, it is very time-consuming to configure it to a user's specs, regardless of how long they need to use it. The COD work presented creates dynamic virtual clusters within a given physical cluster. The goal was to have a provisioning tool that would automatically select a chunk of available nodes and install the operating system and middleware specified by the customer in a short period of time. This would allow greater use of resources, since multiple virtual clusters can exist at once, and by using a virtual cluster the size can be changed dynamically. Moore stated that they have created a working prototype and are currently testing and benchmarking its performance.
For more information, visit http://www.cs.duke.edu/~justin/cod.
CATACOMB
Elias Sinderson, University of California, Santa Cruz
Catacomb is a project to develop a database-backed DASL module for the Apache Web server, designed as a replacement for the standard WebDAV module. The initial release of the module only contains support for a MySQL database, but it could be extended to others. Sinderson briefly touched on the module's performance: it was comparable to the mod_dav Apache module for all query types but search. The presentation concluded with remarks about adding support for the LOCK method and including ACL specifications in the future.
For more information, visit http://ocean.cse.ucsc.edu/catacomb.
SELF-ORGANIZING STORAGE
Dan Ellard, Harvard University
Ellard's presentation introduced a storage system that tunes itself based on the workload of the system, without requiring the intervention of the user. The intelligence is implemented as a virtual self-organizing disk that resides below any file system. The virtual disk observes the access patterns exhibited by the system and then attempts to predict what information will be accessed next. An experiment was done using an NFS trace taken at a large ISP on one of their email servers. Ellard's self-organizing storage system worked well, which he attributes to the fact that most of the files being requested were large email boxes. Areas of future work include exploring the length and detail of the data-collection stage, as well as the CPU impact of running the virtual disk layer.
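Ellard's actual prediction policies are not detailed in the summary; as a toy illustration of the idea (observe accesses, then predict the next request), here is a first-order successor-frequency predictor over block IDs. Everything here, including the access sequence, is invented for illustration:

```python
from collections import Counter, defaultdict

class SuccessorPredictor:
    """Predict the next block as the most frequent follower of the current one."""

    def __init__(self):
        self.followers = defaultdict(Counter)  # block -> Counter of next blocks
        self.prev = None

    def observe(self, block):
        """Record one access in the observed stream."""
        if self.prev is not None:
            self.followers[self.prev][block] += 1
        self.prev = block

    def predict(self, block):
        """Return the most likely next block after `block`, or None if unseen."""
        counts = self.followers.get(block)
        if not counts:
            return None
        return counts.most_common(1)[0][0]
```

A virtual disk layer built on this sort of model can prefetch or relocate the predicted blocks; the real system must also weigh the CPU cost of the bookkeeping, which is exactly the future work mentioned above.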
VISUAL DEBUGGING
John Costigan, Virginia Tech
Costigan feels that the state of current debugging facilities in the UNIX world is not as good as it should be. He proposes the addition of several elements to programs like ddd and gdb. The first addition is separate heap and stack components. Another is displaying only certain fields of a data structure, with the ability to zoom in and out as needed. Finally, the debugger should provide the ability to visually present complex data structures, including linked lists, trees, etc. He said that a beta version of a debugger with these abilities is currently available.
For more information, visit http://infovis.cs.vt.edu/datastruct.
IMPROVING APPLICATION PERFORMANCE THROUGH SYSTEM-CALL COMPOSITION
Amit Purohit, Stony Brook University
Web servers perform a huge number of context switches and much internal data copying during normal operation. These two factors can drastically limit the performance of an application, regardless of the hardware platform it is run on. The Compound System Call (CoSy) framework is an attempt to reduce the performance penalty of context switches and data copies. It includes a user-level library and several kernel facilities that can be used via system calls. The library provides the programmer with a complete set of memory-management functions. Performance tests using the CoSy framework showed a large saving for context switches and data copies; the only area where the savings between conventional libraries and CoSy were negligible was very large file copies.
For more information, visit http://www.cs.sunysb.edu/~purohit.
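The CoSy API itself is not reproduced in the summary. For a flavor of the underlying idea, which is amortizing kernel crossings by handing the kernel several operations at once, standard POSIX vectored I/O batches multiple buffers into a single system call (this sketch uses plain os.writev, not anything from CoSy):

```python
import os

# One writev() submits several buffers in a single kernel crossing,
# instead of paying one write() system call per buffer.
r, w = os.pipe()
buffers = [b"one ", b"two ", b"three"]
written = os.writev(w, buffers)   # single syscall, three buffers
os.close(w)
data = os.read(r, written)
os.close(r)
```

CoSy generalizes this much further, composing arbitrary sequences of calls in the kernel, but the cost model being attacked (per-call context switches and copies) is the same.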
ELASTIC QUOTAS
John Oscar, Columbia University
Oscar began by stating that most resources are flexible, but that to date disks have not been. While approximately 80 percent of files are short-lived, disk quotas are hard limits imposed by administrators. The idea behind elastic quotas is to have a non-permanent storage area that each user can temporarily use. Oscar suggested the creation of an /ehome directory structure to be used in deploying elastic quotas. Currently he has implemented elastic quotas as a stackable file system.

There were also several trends apparent in the data. Many users would ssh to their remote host (good) but then launch an email reader on the local machine, which would connect via POP to the same host and send the password in clear text (bad). This situation can easily be remedied by tunneling protocols like POP through SSH, but it appeared that many people were not aware this could be done. While most of his comments on the use of the wireless network were negative, the list of passwords he had collected showed that people were indeed using strong passwords. His recommendations were to educate and encourage people to use protocols like IPsec, SSH, and SSL when conducting work over a wireless network, because you never know who else is listening.
SYSTRACE
Niels Provos, University of Michigan
How can one be sure the applications one is using actually do exactly what their developers said they do? Short answer: you can't. People today are using a large number of complex applications, which means it is impossible to check each application thoroughly for security vulnerabilities. There are tools that can help, though. Provos has developed a tool called systrace that allows a user to generate a policy of acceptable system calls a particular application can make. If the application attempts to make a call that is not defined in the policy, the user is notified and allowed to choose an action. Systrace includes an auto-policy generation mechanism, which uses a training phase to record the actions of all programs the user executes. If the user chooses not to be bothered by applications breaking policy, systrace allows default enforcement actions to be set. Currently systrace is implemented for FreeBSD and NetBSD, with a Linux version coming out shortly.
For more information, visit http://www.citi.umich.edu/u/provos/systrace.
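Policies are plain text, one rule per system call. As a rough illustration of the format (reconstructed from memory of OpenBSD's systrace, so treat the exact syntax as approximate), a policy permitting a program to read only selected paths might look like:

```
Policy: /bin/ls, Emulation: native
    native-fsread: filename match "/usr/lib/*" then permit
    native-fsread: filename eq "/etc/malloc.conf" then permit
    native-fsread: then deny
```

Rules are matched in order, so the trailing catch-all denies any fsread the earlier lines did not explicitly allow; the training phase described above generates an initial set of such rules automatically.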
VERIFIABLE SECRET REDISTRIBUTION FOR SURVIVABLE STORAGE SYSTEMS
Ted Wong, Carnegie Mellon University
Wong presented a protocol that can be used to re-create a file distributed over a set of servers, even if one of the servers is damaged. In his scheme, the user must choose how many shares the file is split into and the number of servers it will be stored across. The goal of this work is to provide persistent, secure storage of information even if it comes under attack. Wong stated that his design of the protocol is complete, and he is currently building a prototype implementation of it.
For more information, visit http://www.cs.cmu.edu/~tmwong/research.
THE GURU IS IN SESSIONS
LINUX ON LAPTOP/PDA
Bdale Garbee, HP Linux Systems Operation
Summarized by Brennan Reynolds
This was a roundtable-like discussion, with Garbee acting as moderator. He is currently in charge of the Debian distribution of Linux and is involved in porting Linux to platforms other than i386. He has successfully helped port the kernel to Alpha, SPARC, and ARM, and is currently working on a version to run on VAX machines.
A discussion was held on which file system is best for battery life. The ReiserFS and Ext3 file systems were discussed; people commented that when using Ext3 on their laptops, the disk would never spin down and thus was always consuming a large amount of power. One suggestion was to use a RAM-based file system for any volume that requires a large number of writes, and to have the file system written to disk only at infrequent intervals or when the machine is shut down or suspended.
A question was directed to Garbee about his knowledge of how the HP-Compaq merger would affect Linux. Garbee thought that the merger was good for Linux, both within the company and for the open source community as a whole. He stated that the new company would be the number one shipper of systems with Linux as the base operating system. He also fielded a question about the support of older HP servers and their ability to run Linux, saying that indeed Linux has been ported to them, and that he personally had several machines in his basement running it.
A question which sparked a large number of responses concerned problems people had with running Linux on mobile platforms. Most people actually did not have many problems at all. There were a few cases of a particular machine model not working, but there were only two widespread issues: docking stations and the new ACPI power management scheme. Docking stations still appear to be somewhat of a headache, but most people had developed ad hoc solutions for getting them to work. The ACPI power management scheme developed by Intel does not appear to have a quick solution. Ted Ts'o, one of the head kernel developers, stated that there are fundamental architectural problems with the 2.4 series kernel that do not easily allow ACPI to be added. However, he also stated that ACPI is supported in the newest development kernel, 2.5, and will be included in 2.6.
The final topic was the possibility of purchasing a laptop without paying any charge/tax for Microsoft Windows. The consensus was that currently this is not possible. Even if the machine comes with Linux or without any operating system, the vendors have contracts in place which require them to pay Microsoft for each machine they ship, if that machine is capable of running a Microsoft operating system. The only way this will change is if the demand for other operating systems increases to the point that vendors renegotiate their contracts with Microsoft, and no one saw this happening in the near future.
LARGE CLUSTERS
Andrew Hume, AT&T Labs – Research
Summarized by Teri Lampoudi
This was a session on clusters, big data, and resilient computing. Given that the audience was primarily interested in clusters, and not necessarily in big data or beating nodes to a pulp, the conversation revolved mainly around what it takes to make a heterogeneous cluster resilient. Hume also presented a FREENIX paper on his current cluster project, which explained in more detail much of what was abstractly claimed in the guru session.
Hume's use of the word "cluster" referred not to what I had assumed would be a Beowulf-type system, but to a loosely coupled farm of machines in which concurrency is much less of an issue than it would be in a Beowulf. In fact, the architecture Hume described was designed to deal with large amounts of independent transaction processing, essentially the process of billing calls, which requires no interprocess message passing or anything of the sort a parallel scientific application might.
Where does the large data come in? Since the transactions in question consist of large numbers of flat files, a mechanism for getting the files onto nodes and off-loading the results is necessary. In this particular cluster, named Ningaui, this task is handled by the "replication manager," which generated a large amount of interest in the audience. All management functions in the cluster, as well as job allocation, constitute services that are bid for and leased out to nodes. Furthermore, failures are handled by simply repeating the failed task until it is completed properly, if that is possible, and if it is not, then by looking through detailed logs to discover what went wrong. Logging and testing for the successful completion of each task are what make this model resilient. Every management operation is logged at some appropriate level of detail, generating 120MB/day, and every single file transfer is MD5 checksummed. This was characterized by Hume as a "patient" style of management, where no one controls the cluster: scheduling behavior is emergent rather than stringently planned, and nodes are free to drop in and out of the cluster. A downside to this model is the increase in latency, offset by the fact that the cluster continues to function unattended outside of 8-to-5 support hours.
In response to questions about the projected scalability of such a scheme, Hume said the system would presumably scale from the present 60 nodes to about 100 without modifications, but that scaling higher would be a matter for further design. The guru session ended on a somewhat unrelated note regarding the problems of getting data off mainframes and onto Linux machines: issues of fixed- vs. variable-length encoding of data blocks resulting in useful information being stripped by FTP, and problems in converting COBOL copybooks to something useful on a C-based architecture. To reiterate Hume's claim, there are homemade solutions for some of these problems, and the reader should feel free to contact Mr. Hume for them.
WRITING PORTABLE APPLICATIONS
Nick Stoughton, MSB Consultants
Summarized by Josh Lothian
Nick Stoughton addressed a concern that developers are facing more and more: writing portable applications. He proposed that there is no such thing as a "portable application," only those that have been ported. Developers are still in search of the Holy Grail of applications programming: a write-once, compile-anywhere program. Languages such as Java are leading up to this, but virtual machines still have to be ported, paths will still be different, etc. Web-based applications are another area that could be promising for portable applications.
Nick went on to talk about the future of portability. The recently released POSIX 2001 standards are expected to help the situation. The 2001 revision expands the original POSIX spec into seven volumes. Even though POSIX 2001 is more specific than the previous release, Nick pointed out that even this standard allows for small differences where vendors may define their own behaviors. This can only hurt developers in the long run. Along these lines, it was pointed out that there may be room for automated tools that have the capability to check application code and determine the level of portability and standards compliance of the source.
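One small, portable check such a tool might start with is asking the host which revision of the standard it claims to implement; POSIX exposes this through sysconf. A minimal sketch (not from the talk):

```python
import os

def posix_level():
    """Return the _POSIX_VERSION the host advertises (e.g. 200112 for
    POSIX.1-2001, 200809 for POSIX.1-2008), or None if undetermined."""
    try:
        return os.sysconf("SC_VERSION")
    except (ValueError, OSError):
        return None
```

Anything beyond this, such as flagging reliance on vendor-defined behavior in source code, requires the kind of static analysis tooling the session speculated about.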
NETWORK MANAGEMENT SYSTEM PERFORMANCE TUNING
Jeff Allen, Tellme Networks, Inc.
Summarized by Matt Selsky
Jeff Allen author of Cricket discussednetwork tuning The basic idea of tuningis measure twiddle and measure againCricket can be used for large installa-tions but itrsquos overkill for smaller instal-lations for which Mrtg is better suitedYou donrsquot need Cricket to do measure-ment Doing something like a shellscript that dumps data to Excel is goodbut you need to get some sort of mea-surement Cricket was not designed forbilling it was meant to help answerquestions
Useful tools and techniques coveredincluded looking for the differencebetween machines in order to identifythe causes for the differences measuringbut also thinking about what yoursquoremeasuring and why remembering touse strace and tcpdump to identifyproblems Problems repeat themselvesperformance tuning requires an intricateunderstanding of all the layers involvedto solve the problem and simplify theautomation (Having a computer science
background helps but is not essential) Ifyou can reproduce the problem youthen should investigate each piece tofind the bottleneck If you canrsquot repro-duce the problem then measure Youneed to understand all the system inter-actions Measurement can help deter-mine whether things are actually slow orif the user is imagining it
Troubleshooting begins with a hunchbut scientific processes are essential Youshould be able to determine the eventstream or when each event occurs hav-ing observation logs can help Somehints provided include checking cron forperiodic anomalies slowing downbinrm to avoid IO overload and look-ing for unusual kernel CPU usage
Also try to use lightweight monitoringto reduce overhead You donrsquot need tomonitor every resource on every systembut you should monitor those resourceson those systems that are essential Donrsquotcheck things too often since you canintroduce overhead that way Lots ofinformation can be gathered from theproc file system interface to the kernel
BSD BOF
Host Kirk McKusick Author and Consultant
Summarized by Bosko Milekic
The BSD Birds of a Feather session atUSENIX started off with five quick andinformative talks and ended with anexciting discussion of licensing issues
Christos Zoulas for the NetBSD Projectwent over the primary goals of theNetBSD Project (namely portability aclean architecture and security) andthen proceeded to briefly discuss someof the challenges that the project hasencountered The successes and areas forimprovement of 16 were then exam-ined NetBSD has recently seen variousimprovements in its cross-build systempackaging system and third-party toolsupport For what concerns the kernel azero-copy TCPUDP implementationhas been integrated and a new pmap
85October 2002 login
C
ON
FERE
NC
ERE
PORT
Scode for arm32 has been introduced ashave some other rather major changes tovarious subsystems More information isavailable in Christosrsquos slides which arenow available at httpwwwnetbsdorggalleryeventsusenix2002
Theo de Raadt of the OpenBSD Projectpresented a rundown of various newthings that the OpenBSD and OpenSSHteams have been looking at Theorsquos talkincluded an amusing description (andpictures) of some of the events thattranspired during OpenBSDrsquos hack-a-thon which occurred the week beforeUSENIX rsquo02 Finally Theo mentionedthat following complications with theIPFilter license the OpenBSD team haddone a fairly extensive license audit
Robert Watson of the FreeBSD Project brought up various examples of FreeBSD being used in the real world. He went on to describe a wide variety of changes that will surface with the upcoming FreeBSD release 5.0. Notably, 5.0 will introduce a re-architecting of the kernel aimed at providing much more scalable support for SMP machines. 5.0 will also include an early implementation of KSE, FreeBSD's version of scheduler activations; large improvements to pccard; and significant framework bits from the TrustedBSD project.
Mike Karels from Wind River Systems presented an overview of how BSD/OS has been evolving following the Wind River Systems acquisition of BSDi's software assets. Ernie Prabhakar from Apple discussed Apple's OS X and its success on the desktop. He explained how OS X aims to bring BSD to the desktop while striving to maintain a strong and stable core, one that is heavily based on BSD technology.
The BoF finished with a discussion on licensing; specifically, some folks questioned the impact of Apple's licensing for code from their core OS (the Darwin Project) on the BSD community as a whole.
CLOSING SESSION
HOW FLIES FLY
Michael H. Dickinson, University of California, Berkeley
Summarized by JD Welch
In this lively and offbeat talk, Dickinson discussed his research into the mechanisms of flight in the fruit fly (Drosophila melanogaster) and related this autonomic behavior to that of a technological system. Insects are the most successful organisms on earth, due in no small part to their early adoption of flight. Flies travel through space on straight trajectories interrupted by saccades, jumpy turning motions somewhat analogous to the fast, jumpy movements of the human eye. Using a variety of monitoring techniques, including high-speed video in a "virtual reality flight arena," Dickinson and his colleagues have observed that flies respond to visual cues to decide when to saccade during flight. For example, more visual texture makes flies saccade earlier (i.e., further away from the arena wall), described by Dickinson as a "collision-control algorithm."
Through a combination of sensory inputs, flies can make decisions about their flight path. Flies possess a visual "expansion detector" which, at a certain threshold, causes the animal to turn in a certain direction. However, expansion in front of the fly sometimes causes it to land. How does the fly decide? Using the virtual reality device to replicate various visual scenarios, Dickinson observed that the flies fixate on vertical objects that move back and forth across the field. Expansion on the left side of the animal causes it to turn right, and vice versa, while expansion directly in front of the animal triggers the legs to flare out in preparation for landing.
If the eyes detect these changes, how are the responses implemented? The flies' wings have "power muscles," controlled by mechanical resonance (as opposed to the nervous system), which drive the wings, combined with neurally activated "steering muscles," which change the configuration of the wing joints. Subtle variations in the timing of impulses correspond to changes in wing movement, controlled by sensors in the "wing-pit." A small wing-like structure, the haltere, controls equilibrium by beating constantly.
Voluntary control is accomplished by the halteres, whose signals can interfere with the autonomic control of the stroke cycle. The halteres have "steering muscles" as well, and information derived from the visual system can turn off the haltere or trick it into a "virtual" problem requiring a response.
Dickinson has also studied the aerodynamics of insect flight using a device called the "Robo Fly," an oversized mechanical insect wing suspended in a large tank of mineral oil. Interestingly, Dickinson observed that the average lift generated by a fly's wings is less than its body weight; the flies use three mechanisms to overcome this, including rotating the wings (rotational lift).
Insects are extraordinarily robust creatures, and because Dickinson analyzed the problem in a systems-oriented way, these observations and analyses are immediately applicable to technology: the response system can be used as an efficient search algorithm for control systems in autonomous vehicles, for example.
THE AFS WORKSHOP
Summarized by Garry Zacheiss, MIT
Love Hornquist-Astrand of the Arla development team presented the Arla status report. Arla 0.35.8 has been released. Scheduled for release soon are improved support for Tru64 UNIX, MacOS X, and FreeBSD; improved volume handling; and implementation of more of the vos/pts subcommands. It was stressed that MacOS X is considered an important platform, and that a GUI configuration manager for the cache manager, a GUI ACL manager for the Finder, and a graphical login that obtains Kerberos tickets and AFS tokens at login time were all under development.
Future goals planned for the underlying AFS protocols include GSSAPI/SPNEGO support for Rx, performance improvements to Rx, an enhanced disconnected mode, and IPv6 support for Rx; an experimental patch is already available for the latter. Future Arla-specific goals include improved performance, partial file reading, and increased stability on several platforms. Work is also in progress on the RXGSS protocol extensions (integrating GSSAPI into Rx); a partial implementation exists, and work continues as developers find time.
Derrick Brashear of the OpenAFS development team presented the OpenAFS status report. OpenAFS 1.2.5 was released immediately prior to the conference; it fixed a remotely exploitable denial-of-service vulnerability on several OpenAFS platforms, most notably IRIX and AIX. Future work planned for OpenAFS includes better support for MacOS X, including working around Finder interaction issues. Better support for the BSDs is also planned: FreeBSD has a partially implemented client, while NetBSD and OpenBSD have only server programs available right now. AIX 5, Tru64 5.1A, and MacOS X 10.2 (a.k.a. Jaguar) are all planned for the future. Other planned enhancements include nested groups in the ptserver (code donated by the University of Michigan awaits integration), disconnected AFS, and further work on a native Windows client. Derrick stated that the guiding principles of OpenAFS were to maintain compatibility with IBM AFS, support new platforms, ease the administrative burden, and add new functionality.
AFS performance was discussed. The openafs.org cell consists of two file servers, one in Stockholm, Sweden, and one in Pittsburgh, PA. AFS works reasonably well over trans-Atlantic links, although OpenAFS clients don't determine which file server to talk to very effectively. Arla clients use RTTs to the server to determine the optimal file server from which to fetch replicated data. Modifications to OpenAFS to support this behavior in the future are desired.
Jimmy Engelbrecht and Harald Barth of KTH discussed their AFSCrawler script, written to determine how many AFS cells and clients were in the world, which implementations and versions they were running (Arla vs. IBM AFS vs. OpenAFS), and how much data was in AFS. The script unfortunately triggered a bug in IBM AFS 3.6–derived code, causing some clients to panic while handling a specific RPC. This has since been fixed in OpenAFS 1.2.5 and the most recent IBM AFS patch level, and all AFS users are strongly encouraged to upgrade. No release of Arla is vulnerable to this particular denial-of-service attack. There was an extended discussion of the usefulness of this exploration; many sites believed the information was useful and that such scanning should continue in the future, but only on an opt-in basis.
Many sites face the problem of managing Kerberos/AFS credentials for batch-scheduled jobs. Specifically, most batch processing software needs to be modified to forward tickets as part of the batch submission process; renew tickets and tokens while the job is in the queue and for the lifetime of the job; and properly destroy credentials when the job completes. Ken Hornstein of NRL was able to pay a commercial vendor to support Kerberos 4/5 credential management in their product, although they did not implement AFS token management. MIT has implemented some of the desired functionality in OpenPBS and might be able to make it available to other interested sites.
Tools to simplify AFS administration were discussed, including:
AFS Balancer: A tool written by CMU to automate the process of balancing disk usage across all servers in a cell. Available from ftp://ftp.andrew.cmu.edu/pub/AFS-Tools/balance-1.1b.tar.gz
Themis: KTH's enhanced version of the AFS tool "package" for updating files on local disk from a central AFS image. KTH's enhancements include allowing the deletion of files, simplifying the process of adding a file, and allowing the merging of multiple rule sets for determining which files are updated. Themis is available from the Arla CVS repository.
Stanford was presented as an example of a large AFS site. Stanford's AFS usage consists of approximately 14TB of data in AFS, in the form of approximately 100,000 volumes; 33TB of storage is available in their primary cell, ir.stanford.edu. Their file servers consist entirely of Solaris machines running a combination of Transarc 3.6 patch-level 3 and OpenAFS 1.2.x, while their database servers run OpenAFS 1.2.2. Their cell consists of 25 file servers using a combination of EMC and Sun StorEdge hardware. Stanford continues to use the kaserver for their authentication infrastructure, with future plans to migrate entirely to an MIT Kerberos 5 KDC.
Stanford has approximately 3,400 clients on their campus, not including SLAC (Stanford Linear Accelerator); approximately 2,100 AFS clients from outside Stanford contact their cell every month. Their supported clients are almost entirely IBM AFS 3.6 clients, although they plan to release OpenAFS clients soon. Stanford currently supports only UNIX clients. There is some on-campus presence of Windows clients, but Stanford has never publicly released or supported one. They do intend to release and support the MacOS X client in the near future.
All Stanford students, faculty, and staff are assigned AFS home directories with a default quota of 50MB, for a total of approximately 550GB of user home directories. Other significant uses of AFS storage are data volumes for workstation software (400GB) and volumes for course Web sites and assignments (100GB).
AFS usage at Intel was also presented. Intel has been an AFS site since 1994. They had bad experiences with the IBM 3.5 Linux client; their experience with OpenAFS on Linux 2.4.x kernels has been much better. They use, and are satisfied with, the OpenAFS IA64 Linux port. Intel has hundreds of OpenAFS 1.2.3 and 1.2.4 clients in many production cells accessing data stored on IBM AFS file servers, and they have not encountered any interoperability issues. Intel has some concerns about OpenAFS: they would like to purchase commercial support for OpenAFS, and they would like to see OpenAFS support for HP-UX on both PA-RISC and Itanium hardware. HP-UX support is currently unavailable because a specific HP-UX header file is unavailable from HP, though it may become available soon. Intel has not yet committed to migrating their file servers to OpenAFS and is unsure whether it will do so without commercial support.
Backups are a traditional topic of discussion at AFS workshops, and this time was no exception. Many users complain that the traditional AFS backup tools ("backup" and "butc") are complex and difficult to automate, requiring many home-grown scripts and much user intervention for error recovery. An additional complaint was that the traditional AFS tools do not support file- or directory-level backups and restores; data must be backed up and restored at the volume level.
Mitch Collinsworth of Cornell presented work to make AMANDA, the free backup software from the University of Maryland, suitable for AFS backups. Using AMANDA for AFS backups allows one to share AFS backup tapes with non-AFS backups, easily run multiple backups in parallel, automate error recovery, and provide a robust degraded mode that prevents tape errors from stopping backups altogether. Their implementation allows for full volume restores as well as individual directory and file restores. They have finished coding this work and are in the process of testing and documenting it.
Peter Honeyman of CITI at the University of Michigan spoke about work he has proposed to replace Rx with RPCSEC GSS in OpenAFS. This would allow AFS to use a TCP-based transport mechanism rather than the UDP-based Rx, and possibly gain better congestion control, dynamic adaptation, and fragmentation avoidance as a result. RPCSEC GSS uses the GSSAPI to authenticate SUN ONC RPC. RPCSEC GSS is transport-agnostic, provides strong security, is a developing Internet standard, and has multiple open source implementations. Backward compatibility with existing AFS servers and clients is an important goal of this project.
has implemented elastic quotas as a stackable file system. There were also several trends apparent in the data. Many users would ssh to their remote host (good) but then launch an email reader on the local machine, which would connect via POP to the same host and send the password in clear text (bad). This situation can easily be remedied by tunneling protocols like POP through SSH, but it appeared that many people were not aware this could be done. While most of his comments on the use of the wireless network were negative, the list of passwords he had collected showed that people were indeed using strong passwords. His recommendations were to educate and encourage people to use protocols like IPSec, SSH, and SSL when conducting work over a wireless network, because you never know who else is listening.
SYSTRACE
Niels Provos, University of Michigan
How can one be sure the applications one is using actually do exactly what their developers said they do? Short answer: you can't. People today use a large number of complex applications, which makes it impossible to check each application thoroughly for security vulnerabilities. There are tools that can help, though. Provos has developed a tool called systrace that allows a user to generate a policy of acceptable system calls a particular application can make. If the application attempts to make a call that is not defined in the policy, the user is notified and allowed to choose an action. Systrace includes an auto-policy generation mechanism, which uses a training phase to record the actions of all programs the user executes. If the user chooses not to be bothered by applications breaking policy, systrace allows default enforcement actions to be set. Currently systrace is implemented for FreeBSD and NetBSD, with a Linux version coming out shortly.

For more information, visit http://www.citi.umich.edu/u/provos/systrace
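The talk did not show policy syntax, but a systrace policy is essentially a list of system-call patterns, each paired with an action. The fragment below is an illustrative sketch in the style of the OpenBSD policy format; the exact keywords, paths, and actions vary by version, so check the systrace documentation before relying on it:

```
Policy: /bin/ls, Emulation: native
    native-fsread: filename eq "/etc/malloc.conf" then permit
    native-fsread: filename match "/usr/lib/libc.so.*" then permit
    native-fswrite: filename eq "/dev/tty" then permit
    native-connect: sockaddr match "inet-*" then deny[eperm]
```

A call that matches no rule falls through to the user's interactive choice, or to the configured default enforcement action.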
VERIFIABLE SECRET REDISTRIBUTION FOR SURVIVABLE STORAGE SYSTEMS
Ted Wong, Carnegie Mellon University
Wong presented a protocol that can be used to re-create a file distributed over a set of servers, even if one of the servers is damaged. In his scheme, the user must choose how many shares the file is split into and the number of servers it will be stored across. The goal of this work is to provide persistent, secure storage of information even if it comes under attack. Wong stated that his design of the protocol was complete, and he is currently building a prototype implementation of it.
For more information, visit http://www.cs.cmu.edu/~tmwong/research
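Wong's redistribution protocol was not specified in the talk, but the share-and-threshold idea it builds on can be illustrated with classic Shamir-style secret sharing, where any k of n shares reconstruct the secret and fewer reveal nothing. This is a conceptual sketch of that building block only, not Wong's scheme:

```python
# (k, n) threshold secret sharing over GF(P): the secret is the constant
# term of a random degree-(k-1) polynomial; shares are points on it.
import random

P = 2**127 - 1  # a Mersenne prime, large enough for small integer secrets

def split(secret, k, n):
    """Return n points (x, f(x)) on a random polynomial with f(0) = secret."""
    coeffs = [secret] + [random.randrange(P) for _ in range(k - 1)]
    def f(x):
        return sum(c * pow(x, i, P) for i, c in enumerate(coeffs)) % P
    return [(x, f(x)) for x in range(1, n + 1)]

def recombine(shares):
    """Lagrange interpolation at x = 0 recovers the secret from any k shares."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num = den = 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        secret = (secret + yi * num * pow(den, P - 2, P)) % P  # den^-1 mod P
    return secret
```

With shares spread across five servers and a threshold of three, any damaged or compromised server can be tolerated, which is the survivability property the talk aimed at.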
THE GURU IS IN SESSIONS
LINUX ON LAPTOP/PDA
Bdale Garbee, HP Linux Systems Operation
Summarized by Brennan Reynolds
This was a roundtable-like discussion with Garbee acting as moderator. He is currently in charge of the Debian distribution of Linux and is involved in porting Linux to platforms other than i386. He has successfully helped port the kernel to the Alpha, SPARC, and ARM, and is currently working on a version to run on VAX machines.
A discussion was held on which file system is best for battery life. The ReiserFS and Ext3 file systems were discussed; people commented that when using Ext3 on their laptops, the disk would never spin down and thus was always consuming a large amount of power. One suggestion was to use a RAM-based file system for any volume that requires a large number of writes, and to have the file system written to disk only at infrequent intervals or when the machine is shut down or suspended.
A question was directed to Garbee about his knowledge of how the HP-Compaq merger would affect Linux. Garbee thought that the merger was good for Linux, both within the company and for the open source community as a whole. He stated that the new company would be the number one shipper of systems with Linux as the base operating system. He also fielded a question about the support of older HP servers and their ability to run Linux, saying that indeed Linux has been ported to them, and that he personally had several machines in his basement running it.
A question which sparked a large number of responses concerned problems people had with running Linux on mobile platforms. Most people actually did not have many problems at all. There were a few cases of a particular machine model not working, but there were only two widespread issues: docking stations and the new ACPI power management scheme. Docking stations still appear to be somewhat of a headache, but most people had developed ad hoc solutions for getting them to work. The ACPI power management scheme developed by Intel does not appear to have a quick solution. Ted Ts'o, one of the head kernel developers, stated that there are fundamental architectural problems with the 2.4 series kernel that do not easily allow ACPI to be added. However, he also stated that ACPI is supported in the newest development kernel, 2.5, and will be included in 2.6.
The final topic was the possibility of purchasing a laptop without paying any charge or tax for Microsoft Windows. The consensus was that currently this is not possible. Even if the machine comes with Linux or without any operating system, the vendors have contracts in place which require them to pay Microsoft for each machine they ship if that machine is capable of running a Microsoft operating system. The only way this will change is if the demand for other operating systems increases to the point that vendors renegotiate their contracts with Microsoft, and no one saw this happening in the near future.
LARGE CLUSTERS
Andrew Hume, AT&T Labs – Research
Summarized by Teri Lampoudi
This was a session on clusters, big data, and resilient computing. Given that the audience was primarily interested in clusters, and not necessarily in big data or in beating nodes to a pulp, the conversation revolved mainly around what it takes to make a heterogeneous cluster resilient. Hume also presented a FREENIX paper on his current cluster project, which explained in more detail much of what was abstractly claimed in the guru session.
Hume's use of the word "cluster" referred not to what I had assumed would be a Beowulf-type system, but to a loosely coupled farm of machines in which concurrency is much less of an issue than it would be in a Beowulf. In fact, the architecture Hume described was designed to deal with large amounts of independent transaction processing, essentially the process of billing calls, which requires no interprocess message passing or anything of the sort a parallel scientific application might.
Where does the large data come in? Since the transactions in question consist of large numbers of flat files, a mechanism for getting the files onto nodes and off-loading the results is necessary. In this particular cluster, named Ningaui, this task is handled by the "replication manager," which generated a large amount of interest in the audience. All management functions in the cluster, as well as job allocation, constitute services that are bid for and leased out to nodes. Furthermore, failures are handled by simply repeating the failed task until it is completed properly, if that is possible, and if it is not, by looking through detailed logs to discover what went wrong. Logging and testing for the successful completion of each task are what make this model resilient. Every management operation is logged at some appropriate level of detail, generating 120MB/day, and every single file transfer is MD5-checksummed. This was characterized by Hume as a "patient" style of management, where no one controls the cluster: scheduling behavior is emergent rather than stringently planned, and nodes are free to drop in and out of the cluster. A downside to this model is the increase in latency, offset by the fact that the cluster continues to function unattended outside of 8-to-5 support hours.
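The per-transfer checksum-and-retry discipline can be sketched as follows. This is a minimal illustration of the idea, not Ningaui's replication manager; the function names and retry count are invented for the example:

```python
# Verify every file transfer with an MD5 digest and simply repeat a failed
# task until it completes properly, in the "patient" style described above.
import hashlib
import shutil

def md5sum(path):
    """Checksum a file in chunks, suitable for large flat files."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()

def copy_verified(src, dst, retries=3):
    """Copy src to dst, accepting the transfer only when digests match."""
    want = md5sum(src)
    for _ in range(retries):
        shutil.copyfile(src, dst)
        if md5sum(dst) == want:
            return True   # in Ningaui, success would also be logged
    return False          # give up; the detailed logs show what went wrong
```

Testing for successful completion, rather than assuming it, is what lets the cluster run unattended.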
In response to questions about the projected scalability of such a scheme, Hume said the system would presumably scale from the present 60 nodes to about 100 without modifications, but that scaling higher would be a matter for further design. The guru session ended on a somewhat unrelated note regarding the problems of getting data off mainframes and onto Linux machines: issues of fixed- vs. variable-length encoding of data blocks resulting in useful information being stripped by FTP, and problems in converting COBOL copybooks to something useful on a C-based architecture. To reiterate Hume's claim, there are homemade solutions for some of these problems, and the reader should feel free to contact Mr. Hume for them.
WRITING PORTABLE APPLICATIONS
Nick Stoughton, MSB Consultants
Summarized by Josh Lothian
Nick Stoughton addressed a concern that developers are facing more and more: writing portable applications. He proposed that there is no such thing as a "portable application," only those that have been ported. Developers are still in search of the Holy Grail of applications programming: a write-once, compile-anywhere program. Languages such as Java are leading up to this, but virtual machines still have to be ported, paths will still be different, and so on. Web-based applications are another area that could be promising for portable applications.
Nick went on to talk about the future of portability. The recently released POSIX 2001 standards are expected to help the situation; the 2001 revision expands the original POSIX spec into seven volumes. Even though POSIX 2001 is more specific than the previous release, Nick pointed out that even this standard allows for small differences where vendors may define their own behaviors, which can only hurt developers in the long run. Along these lines, it was pointed out that there may be room for automated tools that can check application code and determine the level of portability and standards compliance of the source.
NETWORK MANAGEMENT SYSTEM PERFORMANCE TUNING
Jeff Allen, Tellme Networks, Inc.
Summarized by Matt Selsky
Jeff Allen, author of Cricket, discussed network tuning. The basic idea of tuning is: measure, twiddle, and measure again. Cricket can be used for large installations, but it's overkill for smaller installations, for which MRTG is better suited. You don't need Cricket to do measurement; doing something like a shell script that dumps data to Excel is good, but you need to get some sort of measurement. Cricket was not designed for billing; it was meant to help answer questions.
Useful tools and techniques covered included: looking for the differences between machines in order to identify their causes; measuring, but also thinking about what you're measuring and why; and remembering to use strace and tcpdump to identify problems. Problems repeat themselves; performance tuning requires an intricate understanding of all the layers involved to solve the problem and simplify the automation. (Having a computer science background helps but is not essential.) If you can reproduce the problem, you should then investigate each piece to find the bottleneck. If you can't reproduce the problem, then measure. You need to understand all the system interactions. Measurement can help determine whether things are actually slow or whether the user is imagining it.
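The "measure, twiddle, and measure again" loop can start as small as a best-of-N timer wrapped around the operation in question. This is a generic sketch, not a tool from the talk:

```python
# Time a callable several times and keep the best run; best-of-N damps
# scheduler and cache noise, which matters when comparing small twiddles.
import time

def measure(fn, runs=5):
    """Return the best-of-N wall-clock time for fn(), in seconds."""
    best = float("inf")
    for _ in range(runs):
        t0 = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - t0)
    return best

# Usage: twiddle one thing at a time, then compare
#   before = measure(old_version)
#   after  = measure(new_version)
```

Measuring before and after each change, rather than after a batch of changes, is what makes the attribution of an improvement trustworthy.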
servers in a cell Available fromftpftpandrewcmuedupubAFS-Toolsbalance-11btargz
Themis Themis is KTHrsquos enhancedversion of the AFS tool ldquopackagerdquofor updating files on local disk froma central AFS image KTHrsquosenhancements include allowing thedeletion of files simplifying theprocess of adding a file and allow-ing the merging of multiple rulesets for determining which files areupdated Themis is available fromthe Arla CVS repository
Stanford was presented as an example ofa large AFS site Stanfordrsquos AFS usageconsists of approximately 14TB of datain AFS in the form of approximately100000 volumes 33TB of storage isavailable in their primary cell irstan-fordedu Their file servers consistentirely of Solaris machines running acombination of Transarc 36 patch-level3 and OpenAFS 12x while their data-base servers run OpenAFS 122 Theircell consists of 25 file servers using acombination of EMC and Sun StorEdgehardware Stanford continues to use thekaserver for their authentication infra-structure with future plans to migrateentirely to an MIT Kerberos 5 KDC
Stanford has approximately 3400 clientson their campus not including SLAC(Stanford Linear Accelerator) approxi-mately 2100 AFS clients from outsideStanford contact their cell every monthTheir supported clients are almostentirely IBM AFS 36 clients althoughthey plan to release OpenAFS clientssoon Stanford currently supports onlyUNIX clients There is some on-campuspresence of Windows clients but Stan-ford has never publicly released or sup-ported it They do intend to release andsupport the MacOS X client in the nearfuture
All Stanford students faculty and staffare assigned AFS home directories witha default quota of 50MB for a total ofapproximately 550GB of user homedirectories Other significant uses of AFS
87October 2002 login
C
ON
FERE
NC
ERE
PORT
Sstorage are data volumes for workstationsoftware (400GB) and volumes forcourse Web sites and assignments(100GB)
AFS usage at Intel was also presentedIntel has been an AFS site since 1994They had bad experiences with the IBM35 Linux client their experience withOpenAFS on Linux 24x kernels hasbeen much better They use and are sat-isfied with the OpenAFS IA64 Linuxport Intel has hundreds of OpenAFS123 and 124 clients in many produc-tion cells accessing data stored on IBMAFS file servers They have not encoun-tered any interoperability issues Intelhas some concerns about OpenAFS theywould like to purchase commercial sup-port for OpenAFS and to see OpenAFSsupport for HP-UX on both PA-RISCand Itanium hardware HP-UX supportis currently unavailable due to a specificHP-UX header file being unavailablefrom HP this may be available soonIntel has not yet committed to migratingtheir file servers to OpenAFS and areunsure if they will do so without com-mercial support
Backups are a traditional topic of dis-cussion at AFS workshops and this timewas no exception Many users complainthat the traditional AFS backup tools(ldquobackuprdquo and ldquobutcrdquo) are complex anddifficult to automate requiring manyhome-grown scripts and much userintervention for error recovery An addi-tional complaint was that the traditionalAFS tools do not support file- or direc-tory-level backups and restores datamust be backed up and restored at thevolume level
Mitch Collinsworth of Cornell presentedwork to make AMANDA the freebackup software from the University ofMaryland suitable for AFS backupsUsing AMANDA for AFS backups allowsone to share AFS backup tapes withnon-AFS backups easily run multiplebackups in parallel automate errorrecovery and provide a robust degradedmode that prevents tape errors from
stopping backups altogether Theirimplementation allows for full volumerestores as well as individual directoryand file restores They have finished cod-ing this work and are in the process oftesting and documenting it
Peter Honeyman of CITI at the Univer-sity of Michigan spoke about work hehas proposed to replace Rx with RPC-SEC GSS in OpenAFS this would allowAFS to use a TCP-based transport mech-anism rather than the UDP-based Rxand possibly gain better congestion con-trol dynamic adaptation and fragmen-tation avoidance as a result RPCSECGSS uses the GSSAPI to authenticateSUN ONC RPC RPCSEC GSS is trans-port agnostic provides strong security isa developing Internet standard and hasmultiple open source implementationsBackward compatibility with existingAFS servers and clients is an importantgoal of this project
other operating systems increases to the point that vendors renegotiate their contracts with Microsoft, and no one saw this happening in the near future.
LARGE CLUSTERS
Andrew Hume, AT&T Labs – Research
Summarized by Teri Lampoudi
This was a session on clusters, big data, and resilient computing. Given that the audience was primarily interested in clusters, and not necessarily in big data or beating nodes to a pulp, the conversation revolved mainly around what it takes to make a heterogeneous cluster resilient. Hume also presented a FREENIX paper on his current cluster project, which explained in more detail much of what was abstractly claimed in the guru session.
Hume's use of the word "cluster" referred not to what I had assumed would be a Beowulf-type system, but to a loosely coupled farm of machines in which concurrency was much less of an issue than it would be in a Beowulf. In fact, the architecture Hume described was designed to deal with large amounts of independent transaction processing (essentially the process of billing calls), which requires no interprocess message passing or anything of the sort a parallel scientific application might.
Where does the large data come in? Since the transactions in question consist of large numbers of flat files, a mechanism for getting the files onto nodes and off-loading the results is necessary. In this particular cluster, named Ningaui, this task is handled by the "replication manager," which generated a large amount of interest in the audience. All management functions in the cluster, as well as job allocation, constitute services that are to be bid for and leased out to nodes. Furthermore, failures are handled by simply repeating the failed task until it is completed properly, if that is possible, and, if it is not, by looking through detailed logs to discover
what went wrong. Logging and testing for the successful completion of each task are what make this model resilient. Every management operation is logged at some appropriate level of detail, generating 120MB/day, and every single file transfer is MD5 checksummed. This was characterized by Hume as a "patient" style of management, where no one controls the cluster: scheduling behavior is emergent rather than stringently planned, and nodes are free to drop in and out of the cluster. A downside to this model is the increase in latency, offset by the fact that the cluster continues to function unattended outside of 8-to-5 support hours.
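The checksum-and-retry discipline described above can be sketched in a few lines. This is only an illustration (the helper names are hypothetical; Ningaui's actual replication manager is not described in detail here): verify every transfer by digest, repeat on failure, and log each attempt so a postmortem is possible.

```python
import hashlib

def md5sum(path, chunk=1 << 20):
    """Stream a file through MD5 so large transfer files don't exhaust memory."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()

def patient_transfer(copy, src, dst, log, max_tries=5):
    """Repeat a transfer until source and destination checksums agree,
    logging every attempt; a 'patient' loop rather than a fail-fast one."""
    for attempt in range(1, max_tries + 1):
        copy(src, dst)
        if md5sum(src) == md5sum(dst):
            log.append("attempt %d: ok" % attempt)
            return True
        log.append("attempt %d: checksum mismatch" % attempt)
    return False
```

Any copy callable (scp wrapper, local copy, etc.) can be dropped in as `copy`.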
In response to questions about the projected scalability of such a scheme, Hume said the system would presumably scale from the present 60 nodes to about 100 without modifications, but that scaling higher would be a matter for further design. The guru session ended on a somewhat unrelated note regarding the problems of getting data off mainframes and onto Linux machines: issues of fixed- vs. variable-length encoding of data blocks resulting in useful information being stripped by FTP, and problems in converting COBOL copybooks to something useful on a C-based architecture. To reiterate Hume's claim, there are homemade solutions for some of these problems, and the reader should feel free to contact Mr. Hume for them.
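The fixed-length-record problem mentioned above can be made concrete with a small sketch, under the assumption of a known record length (the 4-byte records below are purely illustrative; a real layout comes from the COBOL copybook). Transfer the data in binary mode and split it explicitly, rather than letting ASCII-mode FTP mangle it:

```python
def split_fixed_records(blob, record_length):
    """Split a byte stream of fixed-length mainframe-style records into
    individual records, refusing data whose length shows it was truncated
    or corrupted in transit."""
    if len(blob) % record_length != 0:
        raise ValueError("bad transfer: %d bytes is not a multiple of %d"
                         % (len(blob), record_length))
    return [blob[i:i + record_length]
            for i in range(0, len(blob), record_length)]
```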
WRITING PORTABLE APPLICATIONS
Nick Stoughton, MSB Consultants
Summarized by Josh Lothian
Nick Stoughton addressed a concern that developers are facing more and more: writing portable applications. He proposed that there is no such thing as a "portable application," only those that have been ported. Developers are still in search of the Holy Grail of applications programming: a write-once, compile-anywhere program. Languages such as Java are leading up to this, but virtual machines still have to be ported, paths will still be different, etc. Web-based applications are another area that could be promising for portable applications.
Nick went on to talk about the future of portability. The recently released POSIX 2001 standards are expected to help the situation. The 2001 revision expands the original POSIX spec into seven volumes. Even though POSIX 2001 is more specific than the previous release, Nick pointed out that even this standard allows for small differences where vendors may define their own behaviors; this can only hurt developers in the long run. Along these lines, it was pointed out that there may be room for automated tools that have the capability to check application code and determine the level of portability and standards compliance of the source.
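In spirit, such a checker could start as nothing more than a scan for calls outside a standard profile. A toy sketch follows; the deny-list is illustrative only (these are functions found on some platforms but not in POSIX), and a real tool would work from the standard's own interface tables rather than a hand-picked set:

```python
import re

# Illustrative, hand-picked examples of platform-specific calls;
# not any real checker's database.
NON_PORTABLE = {"strlcpy", "getprogname", "sendfile"}

def flag_non_portable(c_source):
    """Return the suspect function calls found in a piece of C source text."""
    calls = set(re.findall(r"\b(\w+)\s*\(", c_source))
    return sorted(calls & NON_PORTABLE)
```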
NETWORK MANAGEMENT SYSTEM PERFORMANCE TUNING
Jeff Allen, Tellme Networks, Inc.
Summarized by Matt Selsky
Jeff Allen, author of Cricket, discussed network tuning. The basic idea of tuning is: measure, twiddle, and measure again. Cricket can be used for large installations, but it's overkill for smaller installations, for which MRTG is better suited. You don't need Cricket to do measurement; doing something like a shell script that dumps data to Excel is fine, but you need to get some sort of measurement. Cricket was not designed for billing; it was meant to help answer questions.
Useful tools and techniques covered included: looking for the difference between machines in order to identify the causes for the differences; measuring, but also thinking about what you're measuring and why; and remembering to use strace and tcpdump to identify problems. Problems repeat themselves; performance tuning requires an intricate understanding of all the layers involved to solve the problem and simplify the automation. (Having a computer science background helps, but is not essential.) If you can reproduce the problem, you then should investigate each piece to find the bottleneck. If you can't reproduce the problem, then measure. You need to understand all the system interactions. Measurement can help determine whether things are actually slow or if the user is imagining it.
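The "investigate each piece" step can be sketched as a timing harness: wrap each suspect stage of the reproduced problem in a timer and compare totals. This is a minimal illustration (the stage names are hypothetical, and the sleeps stand in for real work):

```python
import time
from contextlib import contextmanager

timings = {}

@contextmanager
def timed(stage):
    """Accumulate wall-clock time spent in one stage of a reproduced problem."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[stage] = timings.get(stage, 0.0) + time.perf_counter() - start

# Wrap each candidate stage, then sort to find the bottleneck.
with timed("dns-lookup"):
    time.sleep(0.01)   # stand-in for the real work
with timed("tcp-connect"):
    time.sleep(0.02)

bottleneck = max(timings, key=timings.get)
```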
Troubleshooting begins with a hunch, but scientific processes are essential. You should be able to determine the event stream, or when each event occurs; having observation logs can help. Some hints provided include checking cron for periodic anomalies, slowing down /bin/rm to avoid I/O overload, and looking for unusual kernel CPU usage.
Also, try to use lightweight monitoring to reduce overhead. You don't need to monitor every resource on every system, but you should monitor those resources on those systems that are essential. Don't check things too often, since you can introduce overhead that way. Lots of information can be gathered from the /proc filesystem interface to the kernel.
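As a sketch of how lightweight /proc-based monitoring can be, the load averages cost one small file read, with no subprocess or SNMP agent involved. The parser is separated out here so the /proc/loadavg field layout is explicit:

```python
def parse_loadavg(text):
    """Parse the 1-, 5-, and 15-minute load averages from the first three
    whitespace-separated fields of /proc/loadavg."""
    one, five, fifteen = text.split()[:3]
    return float(one), float(five), float(fifteen)

def read_loadavg(path="/proc/loadavg"):
    """Cheap periodic check suitable for frequent polling on Linux."""
    with open(path) as f:
        return parse_loadavg(f.read())
```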
BSD BOF
Host: Kirk McKusick, Author and Consultant
Summarized by Bosko Milekic
The BSD Birds of a Feather session at USENIX started off with five quick and informative talks and ended with an exciting discussion of licensing issues.
Christos Zoulas, for the NetBSD Project, went over the primary goals of the NetBSD Project (namely portability, a clean architecture, and security) and then proceeded to briefly discuss some of the challenges that the project has encountered. The successes and areas for improvement of 1.6 were then examined. NetBSD has recently seen various improvements in its cross-build system, packaging system, and third-party tool support. As for the kernel, a zero-copy TCP/UDP implementation has been integrated, and a new pmap
code for arm32 has been introduced, as have some other rather major changes to various subsystems. More information is available in Christos's slides, which are now available at http://www.netbsd.org/gallery/events/usenix2002.
Theo de Raadt of the OpenBSD Project presented a rundown of various new things that the OpenBSD and OpenSSH teams have been looking at. Theo's talk included an amusing description (and pictures) of some of the events that transpired during OpenBSD's hack-a-thon, which occurred the week before USENIX '02. Finally, Theo mentioned that, following complications with the IPFilter license, the OpenBSD team had done a fairly extensive license audit.
Robert Watson of the FreeBSD Project brought up various examples of FreeBSD being used in the real world. He went on to describe a wide variety of changes that will surface with the upcoming FreeBSD release 5.0. Notably, 5.0 will introduce a re-architecting of the kernel aimed at providing much more scalable support for SMP machines. 5.0 will also include an early implementation of KSE (FreeBSD's version of scheduler activations), large improvements to pccard, and significant framework bits from the TrustedBSD project.
Mike Karels from Wind River Systems presented an overview of how BSD/OS has been evolving following the Wind River Systems acquisition of BSDi's software assets. Ernie Prabhakar from Apple discussed Apple's OS X and its success on the desktop. He explained how OS X aims to bring BSD to the desktop while striving to maintain a strong and stable core, one that is heavily based on BSD technology.
The BoF finished with a discussion on licensing; specifically, some folks questioned the impact of Apple's licensing for code from their core OS (the Darwin Project) on the BSD community as a whole.
CLOSING SESSION
HOW FLIES FLY
Michael H. Dickinson, University of California, Berkeley
Summarized by JD Welch
In this lively and offbeat talk, Dickinson discussed his research into the mechanisms of flight in the fruit fly (Drosophila melanogaster) and related the autonomic behavior to that of a technological system. Insects are the most successful organisms on earth, due in no small part to their early adoption of flight. Flies travel through space on straight trajectories interrupted by saccades, jumpy turning motions somewhat analogous to the fast, jumpy movements of the human eye. Using a variety of monitoring techniques, including high-speed video in a "virtual reality flight arena," Dickinson and his colleagues have observed that flies respond to visual cues to decide when to saccade during flight. For example, more visual texture makes flies saccade earlier (i.e., further away from the arena wall), described by Dickinson as a "collision control algorithm."
Through a combination of sensory inputs, flies can make decisions about their flight path. Flies possess a visual "expansion detector" which, at a certain threshold, causes the animal to turn in a certain direction. However, expansion in front of the fly sometimes causes it to land. How does the fly decide? Using the virtual reality device to replicate various visual scenarios, Dickinson observed that the flies fixate on vertical objects that move back and forth across the field. Expansion on the left side of the animal causes it to turn right, and vice versa, while expansion directly in front of the animal triggers the legs to flare out in preparation for landing.
If the eyes detect these changes, how are the responses implemented? The flies' wings have "power muscles" controlled by mechanical resonance (as opposed to the nervous system), which drive the
wings, combined with neurally activated "steering muscles," which change the configuration of the wing joints. Subtle variations in the timing of impulses correspond to changes in wing movement, controlled by sensors in the "wing-pit." A small wing-like structure, the haltere, controls equilibrium by beating constantly.
Voluntary control is accomplished by the halteres, whose signals can interfere with the autonomic control of the stroke cycle. The halteres have "steering muscles" as well, and information derived from the visual system can turn off the haltere or trick it into a "virtual" problem requiring a response.
Dickinson has also studied the aerodynamics of insect flight using a device called the "Robo Fly," an oversized mechanical insect wing suspended in a large tank of mineral oil. Interestingly, Dickinson observed that the average lift generated by the flies' wings is under their body weight; the flies use three mechanisms to overcome this, including rotating the wings (rotational lift).
Insects are extraordinarily robust creatures, and because Dickinson analyzed the problem in a systems-oriented way, these observations and analyses are immediately applicable to technology: the response system can be used as an efficient search algorithm for control systems in autonomous vehicles, for example.
THE AFS WORKSHOP
Summarized by Garry Zacheiss, MIT
Love Hornquist-Astrand of the Arla development team presented the Arla status report. Arla 0.35.8 has been released. Scheduled for release soon are improved support for Tru64 UNIX, MacOS X, and FreeBSD; improved volume handling; and implementation of more of the vos/pts subcommands. It was stressed that MacOS X is considered an important platform, and that a GUI configuration manager for the cache
manager, a GUI ACL manager for the Finder, and a graphical login that obtains Kerberos tickets and AFS tokens at login time were all under development.
Future goals planned for the underlying AFS protocols include GSSAPI/SPNEGO support for Rx, performance improvements to Rx, an enhanced disconnected mode, and IPv6 support for Rx; an experimental patch is already available for the latter. Future Arla-specific goals include improved performance, partial file reading, and increased stability on several platforms. Work is also in progress on the RXGSS protocol extensions (integrating GSSAPI into Rx); a partial implementation exists, and work continues as developers find time.
Derrick Brashear of the OpenAFS development team presented the OpenAFS status report. OpenAFS 1.2.5 was released immediately prior to the conference; it fixed a remotely exploitable denial-of-service vulnerability on several OpenAFS platforms, most notably IRIX and AIX. Future work planned for OpenAFS includes better support for MacOS X, including working around Finder interaction issues. Better support for the BSDs is also planned: FreeBSD has a partially implemented client, while NetBSD and OpenBSD have only server programs available right now. AIX 5, Tru64 5.1A, and MacOS X 10.2 (aka Jaguar) are all planned for the future. Other planned enhancements include nested groups in the ptserver (code donated by the University of Michigan awaits integration), disconnected AFS, and further work on a native Windows client. Derrick stated that the guiding principles of OpenAFS were to maintain compatibility with IBM AFS, support new platforms, ease the administrative burden, and add new functionality.
AFS performance was discussed. The openafs.org cell consists of two file servers, one in Stockholm, Sweden, and one in Pittsburgh, PA. AFS works reasonably well over trans-Atlantic links, although OpenAFS clients don't determine which file server to talk to very effectively. Arla clients use RTTs to the server to determine the optimal file server from which to fetch replicated data. Modifications to OpenAFS to support this behavior in the future are desired.
Jimmy Engelbrecht and Harald Barth of KTH discussed their AFSCrawler script, written to determine how many AFS cells and clients were in the world, what implementations/versions they were (Arla vs. IBM AFS vs. OpenAFS), and how much data was in AFS. The script unfortunately triggered a bug in IBM AFS 3.6-derived code, causing some clients to panic while handling a specific RPC. This has since been fixed in OpenAFS 1.2.5 and the most recent IBM AFS patch level, and all AFS users are strongly encouraged to upgrade. No release of Arla is vulnerable to this particular denial-of-service attack. There was an extended discussion of the usefulness of this exploration; many sites believed this was useful information and that such scanning should continue in the future, but only on an opt-in basis.
Many sites face the problem of managing Kerberos/AFS credentials for batch-scheduled jobs. Specifically, most batch processing software needs to be modified to forward tickets as part of the batch submission process, renew tickets and tokens while the job is in the queue and for the lifetime of the job, and properly destroy credentials when the job completes. Ken Hornstein of NRL was able to pay a commercial vendor to support Kerberos 4/5 credential management in their product, although they did not implement AFS token management. MIT has implemented some of the desired functionality in OpenPBS and might be able to make it available to other interested sites.
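The renew-while-queued-and-running requirement reduces to a simple schedule. A sketch of the arithmetic follows; the half-lifetime safety factor is an assumption, and at each offset the batch daemon would actually renew the renewable Kerberos ticket (e.g., with kinit -R) and refetch the AFS token (with aklog):

```python
def renewal_offsets(ticket_lifetime, job_runtime, safety=0.5):
    """Offsets in seconds from submission at which a batch system should
    renew Kerberos tickets and AFS tokens; renewing at half the ticket
    lifetime means one missed renewal still leaves valid credentials."""
    step = ticket_lifetime * safety
    offsets = []
    t = step
    while t < job_runtime:
        offsets.append(t)
        t += step
    return offsets
```

For a 10-hour ticket and a 25-hour job, this yields renewals every 5 hours until the job ends.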
Tools to simplify AFS administration were discussed, including:

AFS Balancer: A tool written by CMU to automate the process of balancing disk usage across all servers in a cell. Available from ftp://ftp.andrew.cmu.edu/pub/AFS-Tools/balance-1.1b.tar.gz.

Themis: KTH's enhanced version of the AFS tool "package" for updating files on local disk from a central AFS image. KTH's enhancements include allowing the deletion of files, simplifying the process of adding a file, and allowing the merging of multiple rule sets for determining which files are updated. Themis is available from the Arla CVS repository.
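The balancing problem such a tool automates is simple to state: repeatedly move volumes from the fullest server toward the emptiest. A toy greedy sketch follows; this is not the CMU tool's actual algorithm, and the data shapes (usage per server, volume sizes per server) are assumptions for illustration:

```python
def plan_moves(server_usage, volume_sizes, max_moves=10):
    """Greedy balancing plan: move the largest volume on the fullest server
    to the emptiest server, stopping when a move would overshoot or the
    move budget runs out. Returns (volume, from_server, to_server) tuples."""
    usage = dict(server_usage)
    vols = {s: sorted(v, key=lambda nv: nv[1]) for s, v in volume_sizes.items()}
    moves = []
    for _ in range(max_moves):
        full = max(usage, key=usage.get)
        empty = min(usage, key=usage.get)
        if not vols[full]:
            break
        name, size = vols[full].pop()   # largest volume on the fullest server
        if usage[full] - size < usage[empty]:
            break                        # close enough to balanced
        moves.append((name, full, empty))
        usage[full] -= size
        usage[empty] += size
        vols[empty].append((name, size))
        vols[empty].sort(key=lambda nv: nv[1])
    return moves
```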
Stanford was presented as an example of a large AFS site. Stanford's AFS usage consists of approximately 14TB of data in AFS, in the form of approximately 100,000 volumes; 33TB of storage is available in their primary cell, ir.stanford.edu. Their file servers consist entirely of Solaris machines, running a combination of Transarc 3.6 patch-level 3 and OpenAFS 1.2.x, while their database servers run OpenAFS 1.2.2. Their cell consists of 25 file servers, using a combination of EMC and Sun StorEdge hardware. Stanford continues to use the kaserver for their authentication infrastructure, with future plans to migrate entirely to an MIT Kerberos 5 KDC.
Stanford has approximately 3,400 clients on their campus, not including SLAC (Stanford Linear Accelerator); approximately 2,100 AFS clients from outside Stanford contact their cell every month. Their supported clients are almost entirely IBM AFS 3.6 clients, although they plan to release OpenAFS clients soon. Stanford currently supports only UNIX clients. There is some on-campus presence of Windows clients, but Stanford has never publicly released or supported them. They do intend to release and support the MacOS X client in the near future.
All Stanford students, faculty, and staff are assigned AFS home directories, with a default quota of 50MB, for a total of approximately 550GB of user home directories. Other significant uses of AFS
storage are data volumes for workstation software (400GB) and volumes for course Web sites and assignments (100GB).
AFS usage at Intel was also presented. Intel has been an AFS site since 1994. They had bad experiences with the IBM 3.5 Linux client; their experience with OpenAFS on Linux 2.4.x kernels has been much better. They use, and are satisfied with, the OpenAFS IA64 Linux port. Intel has hundreds of OpenAFS 1.2.3 and 1.2.4 clients in many production cells accessing data stored on IBM AFS file servers, and they have not encountered any interoperability issues. Intel has some concerns about OpenAFS: they would like to purchase commercial support for OpenAFS, and to see OpenAFS support for HP-UX on both PA-RISC and Itanium hardware. HP-UX support is currently unavailable due to a specific HP-UX header file being unavailable from HP; this may be available soon. Intel has not yet committed to migrating their file servers to OpenAFS and is unsure if they will do so without commercial support.
Backups are a traditional topic of discussion at AFS workshops, and this time was no exception. Many users complain that the traditional AFS backup tools ("backup" and "butc") are complex and difficult to automate, requiring many home-grown scripts and much user intervention for error recovery. An additional complaint was that the traditional AFS tools do not support file- or directory-level backups and restores; data must be backed up and restored at the volume level.
Mitch Collinsworth of Cornell presented work to make AMANDA, the free backup software from the University of Maryland, suitable for AFS backups. Using AMANDA for AFS backups allows one to share AFS backup tapes with non-AFS backups, easily run multiple backups in parallel, automate error recovery, and provide a robust degraded mode that prevents tape errors from stopping backups altogether. Their implementation allows for full volume restores as well as individual directory and file restores. They have finished coding this work and are in the process of testing and documenting it.
Peter Honeyman of CITI at the University of Michigan spoke about work he has proposed to replace Rx with RPCSEC GSS in OpenAFS. This would allow AFS to use a TCP-based transport mechanism rather than the UDP-based Rx, and possibly gain better congestion control, dynamic adaptation, and fragmentation avoidance as a result. RPCSEC GSS uses the GSSAPI to authenticate Sun ONC RPC. RPCSEC GSS is transport-agnostic, provides strong security, is a developing Internet standard, and has multiple open source implementations. Backward compatibility with existing AFS servers and clients is an important goal of this project.