Statement Stephen Kell 200603


    Personal Statement

    and Outline of Proposed Research

    to support a PhD application

    Stephen Kell

    March 2006

    1 Interests

    1.1 General Interests

    My primary research interests lie in the fields of operating systems, distributed systems and programming languages. I also have interests in software engineering, networks, continuous media applications, sentient environments and human-computer interaction. I intend to pursue a career in research, and during 2005-6 have been a Research Assistant at the Computer Laboratory of the University of Cambridge. Further details may be found in my CV.

    1.2 Immediate Interests

    I am particularly interested in the ways in which the design and implementation of operating systems and programming languages affect the development and use of applications. This includes, for example, supporting application-level quality-of-service, reliability and security guarantees, and facilitating the adaptation, re-use and redeployment of software components.

    In today's increasingly complex, mobile and distributed systems, these properties are ever more important. Their absence in real systems leads to recurrent problems in system development [14], deployment [15] and evolution; their importance in achieving reliability [16, 3], quality-of-service [4] and security [3] is also well-known.

    Modularity is a structural property which I will define loosely (for the moment) as the ability to adapt, combine, re-use and redeploy components of a software system. These abilities are not only useful in their own right, for instance in reducing development and administration costs, but can contribute to the provision of quality-of-service, reliability, verifiability and security, among others. A modular approach can enable the structural enforcement of these properties across the entire system.


    In contrast to traditional software engineering work, my focus is on dynamic modularity, by which I mean modularity among an open-ended set of independently-developed components. This kind of modularity is dependent on the support of the operating system's linker and system call interface, since these enforce the boundaries between components. For my PhD I intend to explore techniques for providing modularity within such systems.

    2 Problems

    Conventional programming models make it hard to write highly modular applications. For example, the Unix-like model espoused by most programming languages' standard libraries is inadequate in the following ways.

    There is insufficient indirection when accessing services external to a particular component. Here a component is a run-time instance of any unit of code: the state corresponding to a single source file, a library, an executable or some other grouping. Conventionally, a variety of low-level interfaces provide IPC, I/O and intra-process linkage; in the case of Unix, these are devices, files, sockets, processes and the linker. Each of these has its own namespace and is restricted to a particular set of implementations: Unix's heavyweight processes, and the sets of devices, VFS implementations and socket types known to the operating system. The application programmer must commit to these implementations at development time.

    The low-level nature of these interfaces forces applications to layer their own abstractions on top. This leads to many similar but mutually incompatible conventions for data encoding, type systems, procedure call, resource management, communication protocols and so on. This precludes direct interoperability between components which do not use the same conventions. For example, it is hard for an application written in one language to make use of a library written in another. Likewise, an application targeting one networking stack or file format cannot be made to use another, even when the application uses only those abstract features which are common to both (e.g. IPv6 versus IPv4 [15]).

    As a result of the above, software at higher layers of a system is tightly coupled to particular lower-layer implementations. This is witnessed in deployment problems (e.g. IPv6) and in code duplication. Even within the open-source community, where code is freely available, it is common for libraries to be duplicated in different languages, or for applications to be dependent on a particular windowing toolkit or run-time system. Overcoming these dense inter-module dependencies may require relinking, recompilation or ad-hoc coding effort [17].

    The implementation-specific nature of these interfaces leads programmers to make assumptions which subsequently limit re-use and scalability. For example, distributed filesystems on Unix are notoriously problematic because programmers conventionally assume access to a local file to be fast, and ignore partial failure modes [18]. Similarly, shared libraries frequently expose data structures directly, assuming that clients reside in the same address space.

    The only provision for access control, quality of service, reliability and other cross-cutting concerns is implementation-specific. For example, the Unix file system provides access control at the API level, but for network communication this must be implemented in the application. Use of a shared library precludes memory-based fault isolation since it implies a shared address space. There is frequently no means of propagating quality-of-service requirements to foreign modules. Lack of a pervasive type system limits the potential for static analysis across module boundaries, leading to undetected bugs and security exploits.

    Existing middlewares [19], virtual machines [20] and component systems [5] attempt to solve problems of dynamic modularity, with some success. However, they are typically mutually incompatible, have high compulsory overheads [21], and still carry dependencies on particular network protocols, programming paradigms or other implementation details. Moreover, since middlewares are perceived as (and often focussed towards) tackling only problems of distribution rather than the more general modularity, they are not popular among software developers, except where the need for both modularity and distribution is obvious from the design stages.

    Since modularity's primary goals of reusability, adaptability and replaceability all address problems caused by a lack of foresight among developers, there is a strong argument that modularity should be naturally provided by default, within the most basic programming models targeted by application developers, and without requiring pre-commitment to particular implementations.

    Accordingly, I intend to research ways of supporting modularity at the operating system level, by devising new programming models which lead naturally to modular applications. This should include demonstrable improvements to both internal (reusability, replaceability, adaptability) and one or more external (quality-of-service, security, reliability) characteristics, and be accessible to all kinds of applications and supporting user-space code.

    3 Ideas and Approach

    3.1 Foundations

    My approach will be based on the following principles, motivated by well-known existing research.

    1. Separation of interface from implementation: this principle, introduced by Parnas as information hiding [9], is now well-accepted. It is embodied in many programming languages [22] and operating systems [6, 23]. Nemesis provides a particularly strong separation in its programming model [1], and is a useful starting point. However, it does not solve the interface mismatch problem, since it employs direct linking to pre-defined interfaces specified using an IDL.

    2. Correct placement of abstraction: Engler et al. [10] argue that operating systems should not include compulsory abstractions, since they limit application-level flexibility and, ultimately, compromise performance and reliability. This can be inverted: applications should not build in abstractions themselves, since this hinders flexibility, reusability and portability. This argument motivates a three-layered approach, where a middle layer of abstraction implementations sits underneath applications. In this model, operating system services are directed towards the middle layer, rather than towards applications themselves.

    3. First-class consideration of connectivity, separate from the components themselves: this principle follows naturally from point 2. Shaw [8] approaches the same issue, albeit from a static closed-project perspective, and covers many directly relevant problems. In summary, she concludes that a software component should contain as little specification as possible of how it connects to others, and that these connections should instead be formalised in a separate domain, supporting abstractions analogous to (but distinct from) those in component languages. I add that to allow dynamic modularity, this formalisation must be supported by the run-time system, specifically the operating system.

    4. Unification of interfaces and namespaces: Unix [11] achieved much of its power and elegance by partially unifying the programming interfaces used to access files, devices and communication streams. Many later developments, including the VFS interface [12] and Plan 9 [7], extended this idea by noting that the abstract data type exposed by a Unix file is very general. Use of a single API also enforces a consistent interface to access control. However, extending this simple storage-oriented interface too far causes problems: programmers may make incorrect assumptions (e.g. about distribution-hiding interfaces [18]) and all interactions must somehow be characterised as read or write operations (a problem first acknowledged, but hardly solved, by Unix's ioctl()) [footnote 1]. It is worth exploring the trade-off here: increased unification of programming interfaces may offer better modularity, but the resulting interfaces may also be more difficult to use, since they allow fewer assumptions on the part of the programmer.

    Footnote 1: In fact, the Unix filesystem API is often found inadequate even for storage applications. Policroniades [24] presents one argument.

    5. The benefits of reflection: reflection [25] is a technique by which the internal workings of a system are rendered tractable from within that system's computational processes. This has been applied to programming languages [13] and middlewares [4] to enable run-time extension, adaptation and other aspects of dynamic modularity. Reflection is often realised as a unified programming interface (referring back to point 4): for example, consider Java's fixed set of interfaces for manipulation of the JVM's run-time type metadata.

    6. The importance of names: influential papers by Saltzer [2], Needham [26] and others [27] motivate the importance of naming within systems. Names are crucial to both sharing and protection, since any component may only access that which it can name. As the fundamental mechanism for indirection, names are also crucial to abstraction: using a more abstract namespace removes dependency on particular implementations. Delaying binding until link-, load- or run-time is a common technique for adding flexibility: examples include dynamic linking, virtual functions in C++ [22], environment variables in Unix [11], the Internet's domain name system and countless others. Naming is a particularly deep area, and there is a rich taxonomy of names: pure or impure; structured or flat; well-known or secret.

    7. The benefits of type systems: static type-checking is well known as a useful way to detect and avoid bugs during software development. However, retaining typing information at run-time is also useful in any application where logic dealing in higher-level semantic concerns may be replaced or extended at run-time. This includes security policies [3], application scripting, dynamic extensibility and adaptation [28, 4]. Additionally, a pervasive type system aids verification across module boundaries and at run-time: static analysis, perhaps augmented by trusted toolchains and proof-carrying code, can be used to guarantee correctness and reliability properties without the need for heavyweight run-time checks [28, 20, 29].
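    The delayed binding described in point 6 can be illustrated with a minimal sketch. It is not part of the proposal: the Context class, its bind() and resolve() methods and the "formatter" name are all hypothetical, chosen only to show a client that names a service abstractly and has the implementation chosen (and later replaced) at run time.

```python
from typing import Callable, Dict


class Context:
    """Hypothetical binding table: names are resolved at call time."""

    def __init__(self) -> None:
        self._bindings: Dict[str, Callable[[str], str]] = {}

    def bind(self, name: str, impl: Callable[[str], str]) -> None:
        self._bindings[name] = impl

    def resolve(self, name: str) -> Callable[[str], str]:
        return self._bindings[name]


def shout(text: str) -> str:
    return text.upper()


def whisper(text: str) -> str:
    return text.lower()


def client(ctx: Context, message: str) -> str:
    # The client commits only to the name, never to an implementation.
    return ctx.resolve("formatter")(message)


ctx = Context()
ctx.bind("formatter", shout)
first = client(ctx, "Hello")

# Rebinding the name swaps the implementation without touching the client.
ctx.bind("formatter", whisper)
second = client(ctx, "Hello")
```

    The client function is unchanged between the two calls; only the binding differs, which is the flexibility that point 6 attributes to names.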

    3.2 Novel Contributions

    I contribute two possibly-novel suggestions.

    Firstly, I propose that to counter interface mismatch problems, we begin by admitting defeat. Political solutions (i.e. standardisation) cannot succeed in ensuring interface matching when there is no common administration between component developers. Instead, I propose a technical solution which makes explicit provision for mismatched interfaces. Developers should not need, and in fact should not attempt, to target their code at existing concrete interfaces. Rather, they should devise their own abstract interface, and rely on the run-time support of the operating system to allow this to be joined together with the interfaces exposed by supporting components. Selection of these components should be left until run-time.

    Under such a system, each module exposes its own interface to higher-layer components, by exporting a namespace of typed objects. To these, the run-time system will glue the abstract interfaces targeted by client components. This embodies point 3 above: inter-module connectivity is given first-class consideration and run-time support. Making this sufficiently powerful and efficient will be a substantial part of the proposed research.

    Secondly, I present a generalisation of the familiar concept of name to a naming expression. Names are typically a subcategory of expressions in formal languages: while expressions are tree-structured entities evaluated against an environment by some well-defined reduction process, names are atomic or linear-structured objects resolved against a context by some well-defined resolution algorithm.

    By introducing an expression-like name, subscribing roughly to descriptivist theories of naming [footnote 2], the dynamism and flexibility provided by names (as outlined in point 6 above) can be applied to the problems of inter-module connectivity and, particularly, interface mismatch. One approach is described in the following section.

    3.3 Outline of Proposed System

    Consider, for example, embedding features of a functional language such as ML [30] within the name service of a file system. Instead of supplying the name of a pre-existing directory to a call such as opendir(), a program might supply a function application expression, whose evaluation yields the set of objects which to open. In this way, the program's logic may be applied to a set of objects which do not reside in a physical directory on disk. In other words, by increasing the expressivity of names, we have removed some implementation dependency and hence improved the program's flexibility.
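    As a rough illustration of the opendir() example, the sketch below models a naming expression as a small tree that is evaluated against an environment. Everything in it is an assumption made for illustration: the Name and Apply node types, the in-memory dirs table standing in for a file system, the "union" function, and the opendir() signature itself. The proposal leaves the design of the naming language open.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List, Tuple, Union


@dataclass(frozen=True)
class Name:
    """A conventional name, resolved directly against the environment."""
    ident: str


@dataclass(frozen=True)
class Apply:
    """Application of a named function to argument expressions."""
    func: str
    args: Tuple["Expr", ...]


Expr = Union[Name, Apply]
Objects = FrozenSet[str]


def evaluate(expr: Expr,
             dirs: Dict[str, Objects],
             funcs: Dict[str, Callable[..., Objects]]) -> Objects:
    """Reduce a naming expression to the set of objects it denotes."""
    if isinstance(expr, Name):
        return dirs[expr.ident]
    args = [evaluate(arg, dirs, funcs) for arg in expr.args]
    return funcs[expr.func](*args)


def opendir(expr: Expr,
            dirs: Dict[str, Objects],
            funcs: Dict[str, Callable[..., Objects]]) -> List[str]:
    """Hypothetical opendir(): accepts any naming expression."""
    return sorted(evaluate(expr, dirs, funcs))


# Toy environment standing in for a file system's name service.
dirs = {
    "/src": frozenset({"main.c", "util.c"}),
    "/include": frozenset({"util.h"}),
}
funcs = {"union": lambda a, b: a | b}

plain = opendir(Name("/src"), dirs, funcs)
virtual = opendir(Apply("union", (Name("/src"), Name("/include"))), dirs, funcs)
```

    The second call lists a "directory" that exists nowhere on disk; the calling code is the same as for a plain name.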

    With a suitable language design, this technique may be extended to provide arbitrary transformations of all kinds of interfaces, not just filesystems, albeit possibly requiring complex naming expressions. The key idea is that the naming expression supplied by the user specifies how to adapt the foreign module's exported interface into the abstract one targeted by the local module. Since adaptation and glue code is most easily specified in functional or scripting-oriented languages, this is a convenient approach. Crucially, the necessary connection logic can be supplied at run time, allowing components to be replaced without the need for recompilation or relinking [footnote 3].
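    The adaptation idea can be sketched as follows, under heavy simplification. Interfaces are modelled as dictionaries of callables, and the "naming expression" is an ordinary Python function that maps a foreign interface to the abstract one. The sensor modules, operation names and adapters are invented for this example; a real naming language would be interpreted by the run-time system rather than written in the component language.

```python
from typing import Callable, Dict

# An interface is modelled as a mapping from operation names to callables.
Interface = Dict[str, Callable[[], float]]

# Two hypothetical foreign modules, each exporting its own interface.
fahrenheit_sensor: Interface = {"read_fahrenheit": lambda: 212.0}
tenths_sensor: Interface = {"read_celsius_tenths": lambda: 250.0}


def from_fahrenheit(foreign: Interface) -> Interface:
    """Connection logic: adapt a Fahrenheit export to read_celsius."""
    return {
        "read_celsius":
            lambda: (foreign["read_fahrenheit"]() - 32.0) * 5.0 / 9.0
    }


def from_tenths(foreign: Interface) -> Interface:
    """Connection logic: adapt a tenths-of-a-degree export to read_celsius."""
    return {"read_celsius": lambda: foreign["read_celsius_tenths"]() / 10.0}


def client(sensor: Interface) -> str:
    # The client targets only its own abstract interface, read_celsius.
    return f"{sensor['read_celsius']():.1f} C"


# The adapter is selected at run time; the client is never modified.
report_a = client(from_fahrenheit(fahrenheit_sensor))
report_b = client(from_tenths(tenths_sensor))
```

    Replacing one sensor with the other requires only a different adapter, which corresponds to supplying different connection logic at run time.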

    Design of the naming language, including its type system and computational power (e.g. ability to express recursion), is a matter for research. Some other remaining questions concern how to integrate this system with the programming languages used to write components. These questions include the following.

    What does a naming expression denote? This question effectively asks what primitives the model should include. In the spirit of information hiding, it is expected that the model will be oriented around functions (or sets of functions) rather than plain data. There must also be some notion of environment, i.e. a function mapping from names to interfaces, where environment is itself an interface similar to Nemesis's Context.

    How does code get hold of a name? Some names will be explicitly represented in input data; others, for example command-line arguments or Unix's standard I/O streams, must be assumed by the program and are bound at run time, often implicitly. These implicit bindings effectively bootstrap a component, by defining the initial space of nameable interfaces. One implementation could involve some sort of hereditary environment, similar to the Unix environment or Nemesis's per-domain contexts.

    Footnote 2: For an example see Russell, On Denoting, Mind, 1905.

    Footnote 3: This does not preclude grouping of particularly useful transformations into libraries, where each would appear as a named function.

    How is a name resolved by a component's code? A call analogous to open() is a possibility; rather than specifying an access mode and (implicitly) calling identity, as in Unix, the caller should specify an authority and an interface type [footnote 4]. Dynamic type checking can confirm whether the name resolves to an object exporting the specified interface, hence allowing some level of type-safety guarantee.

    How are foreign objects accessed by the component code? An open() call must return some kind of reference to the foreign interface. This would probably include an unforgeable token identifying the interface reference to the system, i.e. a capability, or the analogue of a file handle. It must then be possible to invoke() named operations, and close() the interface. This raises many questions about how arguments and return values are represented, what other operations are required, and how component languages might abstract away from these basic operations.

    How does a component export its own interfaces? In general this is done by updating some naming environment which is accessible to potential clients. A writable environment might be a subtype of the basic environment, supporting bind() and (optionally) unbind() operations. These would allow a local object to be exported into some widely-accessible environment.

    How are the type systems of the naming language and component language resolved? Clearly, a correspondence must be known for each component language, and the naming language's types must have some run-time representation within the component language. Note that the run-time system need only be ported once per language, and from that point will allow free interoperability between that language and all other supported languages. By contrast, current systems typically involve binding effort per library as well as per language, although the necessary code can sometimes be autogenerated by tools.

    Footnote 4: Note that this is a natural generalisation of Unix's open(), where the programmer specifies an access mode but must assume other interface characteristics, such as support for particular ioctl() or seek() operations. These unstated assumptions force the programmer to handle additional error cases; making the assumptions explicit removes the need for this.


    How is resource management performed? The open() and close() pattern puts bounds on the period of an object's use by each client. These calls themselves may be hidden by the component language's usual resource management constructs, e.g. scope-based as in C++, or collector-based as in Java. The usual problems of partial failure and resource leakage will be subjects of research.

    How are access control and quality-of-service features integrated into the model? Use of interface references naturally suggests a capability-based approach to access control; possible implementations will be a subject of research. Quality-of-service information might be integrated into the notion of interface: when specifying a set of operations which the named entity must perform, a client may annotate these with service-level requirements, enabling admission control to be performed during the call to open(), analogously with type-checking.
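    Several of the questions above (resolution, access and export) can be made concrete with a toy model. The sketch below is one possible reading, not the proposed design: the Environment class, the use of a set of operation names as an "interface type", the per-name list of permitted authorities, and the random hex token standing in for a capability are all assumptions. Quality-of-service annotations and argument marshalling are omitted.

```python
import secrets
from typing import Any, Callable, Dict, FrozenSet, Iterable

Operations = Dict[str, Callable[..., Any]]


class Environment:
    """Hypothetical writable naming environment with a typed open()."""

    def __init__(self) -> None:
        self._exports: Dict[str, Operations] = {}
        self._acl: Dict[str, FrozenSet[str]] = {}
        self._handles: Dict[str, Operations] = {}

    def bind(self, name: str, operations: Operations,
             allowed: Iterable[str]) -> None:
        """Export a local object under a name, for the given authorities."""
        self._exports[name] = operations
        self._acl[name] = frozenset(allowed)

    def unbind(self, name: str) -> None:
        del self._exports[name]
        del self._acl[name]

    def open(self, name: str, authority: str,
             interface: Iterable[str]) -> str:
        """Resolve a name, checking authority and the requested interface."""
        if authority not in self._acl[name]:
            raise PermissionError(f"{authority} may not open {name}")
        operations = self._exports[name]
        wanted = set(interface)
        missing = wanted - set(operations)
        if missing:
            # Dynamic type check: the object must export every operation.
            raise TypeError(f"{name} lacks operations: {sorted(missing)}")
        handle = secrets.token_hex(16)  # stand-in for an unforgeable token
        self._handles[handle] = {op: operations[op] for op in wanted}
        return handle

    def invoke(self, handle: str, operation: str, *args: Any) -> Any:
        return self._handles[handle][operation](*args)

    def close(self, handle: str) -> None:
        del self._handles[handle]


env = Environment()
store: Dict[str, str] = {}
env.bind("kv", {"put": store.__setitem__, "get": store.__getitem__},
         allowed={"alice"})

h = env.open("kv", authority="alice", interface={"put", "get"})
env.invoke(h, "put", "colour", "blue")
value = env.invoke(h, "get", "colour")
env.close(h)
```

    In this model a request for an operation the object does not export fails at open() time rather than at first use, which is the sense in which unstated assumptions become explicit.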

    4 Implementation and Evaluation

    The discussion so far has identified the following requirements.

    The programming model must be naturally modular.

    Applications developed against the model should not be subject to significant performance penalties, relative to applications developed conventionally.

    There must be demonstrable, quantifiable improvements to modularity among applications developed using the model, compared to the nearest equivalent under conventional models.

    There should be demonstrable, quantifiable improvements to one or more externally-visible characteristics, i.e. reliability, security or provision for quality of service, among applications developed using the model.

    In addition to these, I suggest the following practical constraints.

    The model must be developed for an existing widely-used system, most likely GNU/Linux.

    It must be implemented so as to provide backwards compatibility and interoperability with legacy (i.e. conventionally-developed) applications on the same system.

    It must be possible to show a transition path towards the new model for existing applications.

    In order to achieve these, the following approaches may be helpful.


    To prototype the model, a mock-up could be produced in a high-level language, or perhaps as a C library. Toy bindings for a variety of languages could then be created as proof of concept. Note that this approach will not provide actual modularity until dependency on the underlying system call interface is removed.

    A new class of process could be added to Linux, with a new set of system calls. These calls should functionally (but not syntactically) subsume all previous interfaces, allowing communication with legacy processes but offering improved modularity. This may be achieved through elimination of implementation-specific interfaces, and unification of naming. However, it may not be possible to offer high performance without making extensive changes to the Linux kernel.

    A tool could be developed which splits an existing program, say a monolithic application written in C, into a set of modules. This could be done by static analysis on the dependencies between object files. Note that this does not address inter-process module boundaries, and reconstruction of abstract, strongly-typed interfaces between components would be extremely difficult. However, it may be feasible in some limited cases, and is worthy of research. Some existing work on modularising monolithic code may be helpful [33, 34].
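    One very simple form of the dependency analysis mentioned in the last approach is sketched below. It assumes that the defined and undefined symbols of each object file have already been extracted (for instance from symbol tables), and it groups together object files that depend on each other cyclically, on the assumption that such files cannot easily be separated. The grouping rule, the function names and the example symbol tables are all illustrative; the text does not prescribe a particular analysis.

```python
from typing import Dict, List, Set


def dependency_graph(defined: Dict[str, Set[str]],
                     undefined: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Map each object file to the files defining symbols it references."""
    owner = {sym: obj for obj, syms in defined.items() for sym in syms}
    graph: Dict[str, Set[str]] = {}
    for obj in defined:
        needs = undefined.get(obj, set())
        graph[obj] = {owner[s] for s in needs
                      if s in owner and owner[s] != obj}
    return graph


def reachable(graph: Dict[str, Set[str]], start: str) -> Set[str]:
    """All files reachable from start by following dependencies."""
    seen: Set[str] = set()
    stack = [start]
    while stack:
        node = stack.pop()
        for nxt in graph.get(node, set()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def candidate_modules(graph: Dict[str, Set[str]]) -> List[List[str]]:
    """Group files that are mutually reachable (dependency cycles)."""
    reach = {node: reachable(graph, node) for node in graph}
    modules: List[List[str]] = []
    assigned: Set[str] = set()
    for node in sorted(graph):
        if node in assigned:
            continue
        group = {node} | {m for m in reach[node] if node in reach[m]}
        modules.append(sorted(group))
        assigned |= group
    return modules


# Hypothetical symbol tables for four object files.
defined = {
    "ui.o": {"draw"},
    "model.o": {"update", "notify"},
    "view.o": {"refresh"},
    "log.o": {"log_msg"},
}
undefined = {
    "ui.o": {"update", "log_msg"},
    "model.o": {"refresh", "log_msg"},
    "view.o": {"notify"},
    "log.o": set(),
}

modules = candidate_modules(dependency_graph(defined, undefined))
```

    Here model.o and view.o reference each other and so fall into one candidate module, while ui.o and log.o stand alone. As the text notes, recovering abstract, strongly-typed interfaces between such groups is a much harder problem than finding the groups.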

    For evaluation purposes, some or all of the following will also be required.

    A rigorous definition of the kind of modularity under consideration, and one or more corresponding measures. This is remarkably difficult, and is not attempted here. Existing software measurement work, such as that of Fenton [31, 32], provides a useful starting point.

    Tools or methods to evaluate the measure on real software.

    Suitable measures for the chosen external characteristics, and the tools to measure them.

    Empirical data on the modularity (and other characteristics) of software developed using the new model, either from deliberate reimplementations of existing software or (preferably) experiments conducted on real programmers asked to develop a piece of software using the new model and a set of existing components.

    5 Afterword

    I am currently working on a more detailed proposal, entitled "Operating System Support for Application Modularity", which I will forward to supplement my application in due course.


    References

    [1] T. Roscoe, The Structure of a Multi-Service Operating System, PhD thesis, University of Cambridge Computer Laboratory, April 1995.

    [2] J.H. Saltzer, Naming and Binding of Objects, Lecture Notes in Computer Science, vol. 60, pp. 99-208, 1978.

    [3] T.A. Linden, Operating System Structures to Support Security and Reliable Software, ACM Computing Surveys, 8(4), pp. 409-445, 1976.

    [4] G.S. Blair, G. Coulson, P. Robin, M. Papathomas, An Architecture For Next Generation Middleware, Proceedings of the IFIP International Conference on Distributed Systems Platforms and Open Distributed Processing, 1998.

    [5] W. Emmerich, Distributed Component Technologies and their Software Engineering Implications, Proceedings of the 24th International Conference on Software Engineering, 2002.

    [6] I.M. Leslie, D. McAuley, R. Black, T. Roscoe, P. Barham, D. Evers, R. Fairbairns and E. Hyden, The Design and Implementation of an Operating System to Support Distributed Multimedia Applications, IEEE Journal on Selected Areas in Communications, 1996.

    [7] R. Pike, D. Presotto, S. Dorward, R. Flandrena, K. Thompson, H. Trickey, P. Winterbottom, Plan 9 From Bell Labs, Computing Systems, 1995.

    [8] M. Shaw, Procedure Calls Are the Assembly Language of Software Interconnection: Connectors Deserve First-Class Status, ICSE Workshop on Studies of Software Design, 1993.

    [9] D.L. Parnas, On the criteria to be used in decomposing systems into modules, Communications of the ACM, 15(12), pp. 1053-1058, December 1972.

    [10] D.R. Engler, M.F. Kaashoek, Exterminate All Operating System Abstractions, Proceedings of the 5th IEEE Workshop on Hot Topics in Operating Systems, 1995.

    [11] D.M. Ritchie, K. Thompson, The Unix Time-Sharing System, Communications of the ACM, 17(7), July 1974.

    [12] S.R. Kleiman, Vnodes: An Architecture for Multiple File System Types in Sun UNIX, USENIX Association Summer Conference Proceedings, Atlanta, 1986.

    [13] J. Gosling, B. Joy, G. Steele, G. Bracha, The Java Language Specification, second edition, Addison Wesley, 2000.


    [14] D. Garlan, R. Allen, J. Ockerbloom, Architectural Mismatch or Why It's Hard to Build Systems out of Existing Parts, Proceedings of the 17th International Conference on Software Engineering, pp. 179-185, Seattle, Washington, April 1995.

    [15] H. Afifi, L. Toutain, Methods for IPv4-IPv6 transition, Proceedings of the Fourth IEEE Symposium on Computers and Communications, p. 478, 1999.

    [16] M.M. Swift, B.N. Bershad, H.M. Levy, Improving the Reliability of Commodity Operating Systems, in Proc. 19th Symp. on Operating Systems Principles (SOSP), October 2003.

    [17] A. Fraser, Orion: Named Flows With Access Control, invited talk, University of Cambridge Computer Laboratory, November 2005.

    [18] J. Waldo, G. Wyant, A. Wollrath, S. Kendall, A Note On Distributed Computing, Sun Microsystems Technical Report SMLI TR-94-29, November 1994.

    [19] Object Management Group, The Common Object Request Broker: Architecture and Specification, OMG TC Document Number 91.12.1, Revision 1.1, December 1991.

    [20] T. Lindholm, F. Yellin, The Java virtual machine specification, Addison Wesley, September 1996.

    [21] S. Lakin, S. Mount, R.M. Newman, Communication in ad hoc networks or: CORBA considered harmful, Workshop on Building Software for Pervasive Computing at the 19th Annual ACM Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA'04), Vancouver, Canada, October 2004.

    [22] B. Stroustrup, The Design and Evolution of C++, Addison Wesley, 1994.

    [23] D.A. Solomon, H. Custer, Inside Windows NT, second edition, Microsoft Press, 1998.

    [24] C. Policroniades, Datom: A Proposal for an Alternative Storage System API, invited talk, University of Cambridge Computer Laboratory, August 2005.

    [25] B.C. Smith, Procedural Reflection in Programming Languages, PhD thesis, Mass. Inst. of Technology, January 1982.

    [26] R.M. Needham, Names, chapter in S. Mullender (ed.) Distributed Systems, pp. 315-327, Addison Wesley, 1993.

    [27] R. Pike, P. Weinberger, The Hideous Name, USENIX Summer Conference Proceedings 1985, pp. 563-568.


    [28] G. Hunt, J. Larus, M. Abadi, M. Aiken, P. Barham, M. Fahndrich, C. Hawblitzel, O. Hodson, S. Levi, N. Murphy, B. Steensgaard, D. Tarditi, T. Wobber, B. Zill, An Overview of the Singularity Project, Microsoft Research Technical Report MSR-TR-2005-135, October 2005.

    [29] B.N. Bershad, S. Savage, P. Pardyak, E.G. Sirer, M.E. Fiuczynski, D. Becker, C. Chambers, S. Eggers, Extensibility, safety and performance in the SPIN operating system, Proceedings of the fifteenth ACM Symposium on Operating Systems Principles, 1995.

    [30] R. Milner, M. Tofte, R. Harper, D. MacQueen, The Definition of Standard ML, MIT Press, revised 1997.

    [31] N. Fenton, Software Measurement: A Necessary Scientific Basis, IEEE Transactions on Software Engineering, vol. 20, issue 3, pp. 199-206, March 1994.

    [32] N. Fenton, A. Melton, Deriving structurally based software measures, Journal of Systems and Software, vol. 12, issue 3, pp. 177-187, July 1990.

    [33] L. Deri, Droplets: Breaking Monolithic Applications Apart, IBM Research Report RZ 2799, September 1995.

    [34] R. Schwanke, An intelligent tool for re-engineering software modularity, Proceedings of the 13th International Conference on Software Engineering, pp. 83-92, May 1991.