RECEIVED SEP 8 1980 - Stanford Universityzt002ty1231/zt002ty1231.pdf · MOLGEN Report RECEIVED 1...


MOLGEN Report

RECEIVED SEP 8 1980, E. A. FEIGENBAUM

Introduction

This report is the response to Dr. Elke Jordan's request for information concerning 1) the needs of the national community of molecular biologists for the analysis of nucleotide sequence information and 2) what the MOLGEN group of the SUMEX-AIM facility at Stanford would be willing to provide towards that goal. This letter and its contents and appendixes have been prepared by the MOLGEN collaborators Douglas Brutlag, Associate Professor of Biochemistry, Laurence Kedes, Associate Professor of Medicine, and Peter Friedland, Research Associate, Department of Computer Science. We have relied to a great extent for detailed projections, cost estimates and general experiential knowledge on the guidance of Tom Rindfleisch, Manager of the SUMEX system, and on Ed Feigenbaum, Chairman of the Computer Science Department and Principal Investigator of SUMEX.

Many individual groups have expressed a need for computer facilities to handle and analyze the exponentially increasing amounts of nucleotide sequence information appearing in the literature. However, it may not be clear that a central facility readily available to all would not only alleviate this current problem, but would markedly stimulate innovative research.

From our own experience and that of more than 60 collaborators nationwide who have used a guest account within the MOLGEN project, the availability of sophisticated nucleotide sequence analysis programs has catalyzed their research efforts and markedly increased their productivity. One good example involves the use of computer programs to analyze restriction fragment data in the construction of restriction maps. By using the computer, one needs to perform only the theoretical minimum number of digests (double enzyme digests in all pairwise combinations) rather than also performing the many triple enzyme digests that are used when such maps are constructed by hand.

More importantly, having powerful computer software has opened the way to many experiments which they would never have contemplated otherwise. We could cite the discovery of "TATA" boxes in front of eukaryotic genes; multiple pairing hypotheses for attenuation sequences; the development of "shot-gun" methods for nucleotide sequencing of very large genomes; as well as a host of other developments in molecular genetics that would not have been possible without sophisticated pattern recognition programs. In the "shot-gun" sequencing methods developed at Cold Spring Harbor and the MRC, the computer plays an essential role as a laboratory tool.

We would expect that any DNA sequence analysis facility should have available at the minimum:

1) The most recently verified nucleotide sequences on line for instant access.


2) A sequence analysis program capable of at least sequence printing, restriction site finding and pattern recognition for homologies between and within sequences and for regions of dyad symmetry.

3) Programs that would translate nucleotide sequences to amino acid sequences or vice versa and evaluate codon usage tables.

4) Tools for storing sequences produced by the users themselves and programs that facilitate the maintenance of personal data banks.

5) Programs that would evaluate restriction fragment lengths and prepare restriction maps for both linear and circular structures.

6) Programs that would help the investigators in the determination of their sequences by representing restriction maps, to help in the planning of sequence strategy.
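To make items 2 and 3 of the list above concrete, here is a minimal Python sketch of restriction site finding, a dyad-symmetry test and translation. It is an illustration only and not the MOLGEN software: the function names are hypothetical, the standard genetic code is assumed, and only the unambiguous bases A, C, G and T are handled.

```python
from itertools import product

# Standard genetic code, built in TCAG order (an assumption of this sketch).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def find_sites(seq, site):
    """Return 0-based start positions of every occurrence of `site` in `seq`."""
    seq, site = seq.upper(), site.upper()
    width = len(site)
    return [i for i in range(len(seq) - width + 1) if seq[i:i + width] == site]


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def has_dyad_symmetry(seq):
    """True when the sequence equals its own reverse complement."""
    return seq.upper() == reverse_complement(seq)


def translate(seq):
    """Translate complete codons from position 0; '*' marks a stop codon."""
    seq = seq.upper()
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


if __name__ == "__main__":
    dna = "AAGAATTCGGGAATTC"
    print(find_sites(dna, "GAATTC"))    # positions of the EcoRI recognition site
    print(has_dyad_symmetry("GAATTC"))  # the EcoRI site is its own reverse complement
    print(translate("ATGGCCTAA"))
```

A production program would also need to handle ambiguity codes, circular sequences and all reading frames, none of which this sketch attempts.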

Additional tools of a more specific nature that could be developed by the user community and could be added at a later date would include:

1) Alignment of evolutionarily related sequences in a minimum number of steps configuration in order to determine relationships quantitatively.

2) Determination of evolutionary trees from related nucleotide sequences from many organisms.

3) Programs for the prediction of RNA secondary structure based on nucleotide sequence.

4) Programs to aid in the construction of hybridization probes specific for coding sequences for proteins of known sequence.

5) Tools to help in the automatic or semi-automatic entry of nucleotide sequence data and restriction enzyme digestion data.


In the absence of any central facility, most of the groups working on nucleotide sequences have developed their own protocols and computer programs, each reinventing the wheel literally tens of times across the country and around the world. For instance, in the United States alone we know of at least three separate implementations of the Sellers algorithm for alignment of homologous sequences, none of which is readily exportable to a large number of other sites. Within the MOLGEN project we have consciously set out to obtain the best software from others and then improve upon that. By having one central facility the problem of distribution becomes moot.

A national resource can provide a point of instant exchange of programs and algorithms as soon as they are developed without having to wait for publication and reproduction. Even more importantly, we have found that having 50 other people using your computer programs is one of the very best ways of measuring their accuracy and utility.

Our initial and continuing aim is to ensure that the molecular biology/genetics community is served by the very best modern technology possible in the area of computer science applications to DNA research. Doug Brutlag and Larry Kedes, as molecular geneticists, have developed the very strong opinion that a computer network and a staff, developed along the lines that the Artificial Intelligence community has followed with SUMEX, is the important goal to attain. We believe Kedes made this point as forcefully as possible at the recent National Institutes of Health meeting in Bethesda on establishment of a national resource to deal with these matters.

Since the MOLGEN group has ongoing research and personal commitment in the application of computers and computer networks to molecular genetics, we feel a natural desire to assist our colleagues nationwide in pursuing this goal. Thus we are happy to offer whatever help we can in the interim and we will watch with interest the long term proposals of the Steering Committee.

This short report consists of six central parts: i) Analysis of current use of the MOLGEN guest account by the national molecular genetics community, ii) Projections of current use into the future with guesses as to ultimate needs, iii) Three possible interim computer hardware and staffing solutions to cover what we would consider as minimal, moderate and ideal configurations for the national resource, iv) The resources required in the interim operation, v) A plan for responsible and responsive organization and management of the DNA facility, vi) The nature and value to the facility of the contributions of MOLGEN and SUMEX. Specific budgets and some data regarding current usage by MOLGEN guests are contained in appendixes to this report. This document describes how the MOLGEN group at Stanford, in conjunction with the administrative staff at SUMEX, views our possible role in the interim solution to the formation of a national DNA analysis facility and a large user community.

We are working on the assumption that the DNA sequence data base will be managed and collected by others (presumably Dayhoff and Goad) but that the computer facility will be the storage and distribution point of such data in addition to answering the primary need for the analysis of the sequences themselves.

Current use of MOLGEN

Starting in January 1980, with the encouragement of SUMEX, the MOLGEN project made available, on a limited basis, several DNA sequence analysis programs on line at SUMEX via a prepaid telecommunications network. In addition, our database of 200,000 bases, representing 80% of the then known published nucleotide sequences, was also made available. By word of mouth, a small number of users nationwide began logging in to SUMEX and using our programs.

Following is a table of historical usage data for the MOLGEN guest users since the directory (GENET) was created in January:

Month    CPU Hrs    Conn Hrs    File Pgs
1/80       1.3        21.3         43
2/80       3.2        32.7         43
3/80       1.3        51.6         81
4/80       8.4       117.9        195
5/80       9.2       104.5        152
6/80      11.1       188.4        239
7/80      19.2       342.9        217

Usage has doubled every month during the past quarter (Figures 1, 2, 3). As of early August, the number of daily logons is in the high teens (see Appendix V). More than 60 individuals have accessed the system (Appendix IV) and 24 of these have done so five or more times (between 5 and 54 times). The number of connect and CPU hours logged in July by GENET users alone is more than that of any individual non-Stanford-based SUMEX project, and some of these projects are the full time research efforts of very large groups. In July, GENET's profile became "visible" on the overloaded SUMEX network as our trial users began to consume appreciable amounts of computer resources. This kind of usage is not compatible with the major focus of artificial intelligence for which SUMEX is funded. MOLGEN's guest activities will not be tolerated by SUMEX much longer.
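The growth claim can be checked against the usage table with a short calculation. The following Python sketch is illustrative only: the figures are copied from the table above, while the variable and function names are assumptions of the example.

```python
# GENET usage for 1980, copied from the table above:
# (month, CPU hours, connect hours, file pages)
USAGE = [
    ("1/80", 1.3, 21.3, 43),
    ("2/80", 3.2, 32.7, 43),
    ("3/80", 1.3, 51.6, 81),
    ("4/80", 8.4, 117.9, 195),
    ("5/80", 9.2, 104.5, 152),
    ("6/80", 11.1, 188.4, 239),
    ("7/80", 19.2, 342.9, 217),
]


def growth_ratios(values):
    """Return each month's value divided by the previous month's value."""
    return [later / earlier for earlier, later in zip(values, values[1:])]


cpu_ratios = growth_ratios([row[1] for row in USAGE])
connect_ratios = growth_ratios([row[2] for row in USAGE])

if __name__ == "__main__":
    months = [row[0] for row in USAGE][1:]
    for month, cpu, conn in zip(months, cpu_ratios, connect_ratios):
        print(f"{month}: CPU x{cpu:.2f}, connect x{conn:.2f}")
```

On these figures connect hours grew by a factor of about 1.8 in both June and July, which is close to the doubling described in the text; CPU hours grew more unevenly.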

It is very important to note that the guest account's current configuration has several serious impediments built into it that make interaction between MOLGEN/SUMEX and the users very inadequate. One would not know that GENET is inadequate from the unsolicited praises of the users. But we at MOLGEN are aware of these deficiencies and of the much greater utility of the full power of a modern, unrestricted computer network. Thus, in evaluating the projections of MOLGEN-like use against current experience, these impediments must be considered as having inhibited very severely the level of eventual use and user satisfaction. Despite this we believe the evidence supports the idea that users are enthusiastic and use the system in daily research efforts. The impediments include 1) a public account (GENET) shared by all users, which prohibits private data storage; 2) no ability to protect data or output from being deleted, destroyed or accidentally modified by others; 3) difficult messaging and communication between users due to the lack of personal electronic mailboxes; 4) a severe limitation on data storage capacity (150 disk pages); 5) an incomplete data base (200k DNA bases with 300k now known to exist [Brutlag and Kedes have never considered it an important goal of their own research to collect the DNA sequence data]); 6) no specifically identified staff to interact with new and naive users to help them overcome the initial trivial errors experienced in logging on to any computer network.

These data are hard to project reliably because the usage growth has really just begun in the last few months. We don't have a very good idea of what the saturation point will be. Thus estimates for the future will have a wide range depending on how optimistic one is and how nonlinear the growth pattern. File space projection is probably least reliable since GENET users have been severely constrained as to permanent file storage and some of the larger database work has not even begun.

Projections of Future Usage

We can only estimate the degree to which a facility like ours will be used in the future. We believe that the upward curve of usage over the past few months is the beginning of an exponential curve. There has been no formal published announcement or letter writing to solicit usage. If private accounts were given and an annotated database made available, we estimate that essentially every laboratory dealing with DNA sequence analysis and a substantial majority of laboratories involved with molecular genetics, evolution, and even protein sequencing would use the system. How many laboratories is this: 300? 1000? The National Institutes of Health and National Science Foundation are in a better position to estimate this number than we. What happens when additional analytical programs are developed and are made available on such a system? The current programs are invaluable but barely scratch the surface of the applications of computer technology to DNA sequence analysis and manipulation.

Alternative Interim Hardware Configurations

As we promised in phone conversations with Dr. Elke Jordan, we suggest here three alternative hardware configurations to serve as models for the interim national DNA facility: from a minimal to the best for the purpose. These are hardware costs. They are not full budgets: those are found in the Appendixes.

The three configurations are all Digital Equipment Corporation machines. No other hardware would be economically compatible with the telecommunications network and software at SUMEX with which we are familiar. As also requested, we provide some information on leasing of such hardware. As detailed below, true leases do not exist; the arrangements offered essentially consist of annual installment payments. DEC and Digital Leasing in San Francisco are the source of this information.

There is no such thing as "rental" of DEC equipment, i.e., where one gets the hardware installed, pays a monthly charge for as long as one wants it (some of which may accrue toward purchase), and then sends it back when funding lapses or one decides on something better. All third party "leases" we have been able to find are full payout leases, i.e., time payment arrangements. One agrees to pay for the equipment in a certain period of time. At the end of that time one has a 10% residual payment due for full ownership. If one decides to cancel payments anytime during that period, the full amount becomes due and payable (like a mortgage). In such an agreement, the difference between purchase price and total lease cost ends up being defined as interest. Leasing would also require a commitment by the National Institutes of Health for multi-year support.

For the record, based on current interest rates and expressed as % of purchase price, the lease charges are:

Payout Term (yrs)                     1        2        3        5

Monthly charge                      9.25%    4.73%    3.25%    2.20%
(% of purchase price)

Total cost at end of term           121%     124%     127%     142%
(includes 10% residual buyout)
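The totals in the lease table follow from the monthly charge and the residual: total cost is the monthly percentage times the number of months in the term, plus the 10% buyout. A minimal Python sketch of that arithmetic follows; the names are assumptions of the example and the percentages are those quoted above.

```python
# Payout term in years -> monthly charge as a percentage of purchase price.
MONTHLY_CHARGE_PCT = {1: 9.25, 2: 4.73, 3: 3.25, 5: 2.20}
RESIDUAL_PCT = 10.0  # residual buyout due at the end of the term


def total_lease_cost_pct(years, monthly_pct, residual_pct=RESIDUAL_PCT):
    """Total paid over the term, as a percentage of the purchase price."""
    return monthly_pct * 12 * years + residual_pct


def implied_interest_pct(years, monthly_pct):
    """Amount paid above the purchase price, which the text calls interest."""
    return total_lease_cost_pct(years, monthly_pct) - 100.0


if __name__ == "__main__":
    for years, monthly in MONTHLY_CHARGE_PCT.items():
        total = total_lease_cost_pct(years, monthly)
        print(f"{years} yr: total {total:.2f}% of purchase price")
```

Rounded to whole percentages, the results reproduce the last row of the table.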

Accordingly, we do not feel that leasing of equipment will be a viable alternative to a purchased hardware solution. The following configuration data and pricing are approximate and are not based on DEC quotations (for some reason DEC has a hard time computing purchase quotations these days). Thus the budget information based on them is not to be taken as a rigid commitment. Note that we have tried to be relatively conservative. For example, Stanford normally gets an 11% discount and sometimes more depending on the benevolence of DEC. We have assumed for these calculations that our discount will only offset the sales tax.

The three configurations are 1) a 2020 machine; 2) a stripped-down, expandable version of the ultimate machine, a 2060; and 3) the full 2060 computer. In every case we have considered the desirability if not necessity of open community access, private accounts for all users, and electronic mail and message service. The 2020 would handle the current GENET usage (based on July 1980) but cannot handle more. We have run an experiment to determine what happens to the load average (a numerical indication of computer response time dependent on how much the machine is being used) when an existing 2020 is loaded with 1 or more data analysis programs of the kind our current users run. When a fifth simultaneous user is added, the load average on the 2020 rises to unacceptable heights and the machine bogs down. Thus we consider the 2020 to be a poor solution and investment for the intended purpose. 2020s cannot be upgraded or piggy-backed for future expansion. If this machine proved to be the ultimate choice we would want to see rigid controls put on usage by the community: a first come first served basis, perhaps, with the fifth user just getting a "busy" signal. This is hardly the way to build a national community of communicating scientists, who could not even log on to read or send messages in such circumstances.

2020 Configuration:

2020 GKKS-10 processor, 512K memory, 1 RP06 disk,
LA36 console, 16 TTY lines, TAU77 tape                   225000
(assume 1 RP06 = 256K mem and TU45 = TAU77 in price)

The 2060 computers are the best currently available machines for the DNA national facility. The stripped down version will do the job in all respects for the next 2-3 years. Its only advantage over the full 2060 is initial cost, since the stripped version can be expanded to behave like the full version. The speed and capacity of these computers are such that 10-20 simultaneous heavy jobs can be run on them as well as 20-40 less taxing jobs (e.g. text editors, message programs etc.) without stretching the capacity or response time.

Stripped 2060 Configuration:

2060 PAKL-20 processor, 256K memory, 1 RP06 disk,
LA36 console, 16 TTY lines                               395000
TU77 tape                                                 30000

                                                         425000

Full 2060 Configuration:

2060 PAKL-20 processor, 256K memory, 1 RP06 disk,
LA36 console, 16 TTY lines                               395000
2 TU77 tape                                               60000
750K memory                                              126000
2 RP06 disks                                              68000
2 RH20 channels                                           22000
32 TTY lines                                              12000
Printer                                                   17000

                                                         700000
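The configuration totals can be reproduced by summing the line items. The Python sketch below is a check of that arithmetic only. The prices are those listed above; pairing each full-2060 price with an item by the order in which the items and prices appear is an assumption, and the dictionary and function names are hypothetical.

```python
# Hardware line items in dollars. The item/price pairing for the full 2060
# is assumed from the order in which items and prices are listed.
STRIPPED_2060 = {
    "2060 base system (processor, 256K, 1 RP06, LA36, 16 TTY)": 395000,
    "TU77 tape": 30000,
}

FULL_2060 = {
    "2060 base system (processor, 256K, 1 RP06, LA36, 16 TTY)": 395000,
    "2 TU77 tape": 60000,
    "750K memory": 126000,
    "2 RP06 disks": 68000,
    "2 RH20 channels": 22000,
    "32 TTY lines": 12000,
    "Printer": 17000,
}

PRICE_2020 = 225000


def configuration_total(items):
    """Sum the line-item prices of one configuration."""
    return sum(items.values())


if __name__ == "__main__":
    stripped = configuration_total(STRIPPED_2060)
    full = configuration_total(FULL_2060)
    print("2020:", PRICE_2020)
    print("Stripped 2060:", stripped)
    print("Full 2060:", full)
    print("Full minus stripped:", full - stripped)
```

The difference between the two 2060 totals gives a rough idea of what expanding the stripped machine to the full configuration would cost at these prices.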

We want to establish at the outset that a principal investigator's interest in taking on the full responsibility of running these computers and managing a national DNA computer facility includes the availability of significant access to the machines for his own research purposes. The current model at SUMEX, and the one we feel is appropriate, is that the local community should have access to 40% of the machine cycles (40% national community, 40% local and 20% staff, i.e. system programmers). Naturally this would not be possible if the 2020 is the machine decided upon. An investigator's only interest in the 2020 would lie in the fact that it will be fully amortized after two years and would revert to the investigator. We believe that for many of the reasons outlined in the section below, such an arrangement could be mutually advantageous.

Resources

Each budget in the appendixes has a small staff commitment. These staff members are the fewest we consider essential to operate and maintain the proposed systems and the purely DNA oriented programs and user community. However, one must consider the staffs of other computer centers or national facilities like SUMEX. An enormous expenditure of individual effort is required for any useful computer center to maintain the large amount of systems software required just to keep the computer and the telecommunications networks operational. An equally large effort goes into maintaining and developing software packages that add to the convenience, analytical and operational capability of the system. Consider the necessity for the DNA facility to have available statistical software packages, computational and mathematical modeling aids, text-editors for entering sequences and writing, search and compare programs, computer-language compilers and software for users involved in program development etc. None of these utility packages can exist for very long if simply imported from outside developers and "put up on the machine" in an unsupported manner.


An important advantage of a community like Stanford, and SUMEX in particular, is that there are staff and computer scientists in great number who do just those kinds of tasks and maintain (and debug) software. This resource would be available to the national DNA facility merely by its presence at Stanford. The SUMEX administrative staff, and in particular its director, Tom Rindfleisch, have dedicated their professional lives to the management of computer networks and facilities for the non-computer based scientific community. Were any configuration of facilities operated by us at MOLGEN, Rindfleisch would be our manager.


For the interim solution, our discussions with SUMEX PI Ed Feigenbaum and with Tom Rindfleisch suggest that the SUMEX telecommunications network could handle the national DNA facility's network users as well and shunt them to the appropriate machine. Obviously final approval of such an arrangement must pass between Feigenbaum, the SUMEX Management Committee and SUMEX's funding agency, the BRP. Furthermore, if MOLGEN were convinced that any one of the configurations proposed in this letter were going to be funded here as an interim solution, the SUMEX Advisory Committee has informed us that, subject to the approval of the BRP, the DNA Facility could begin IMMEDIATELY on borrowed Stanford resources while awaiting arrival and installation of equipment (a 90-120 day process at least). A letter of support from the AIM executives is appended to this report.

In any event, the direct cost savings to the National Institutes of Health and National Science Foundation from the contribution of staff input from SUMEX and the Stanford computer science community as a whole will prove to be invaluable. The scientific contributions that both MOLGEN and the staff of SUMEX would make to the national DNA facility are discussed below.

Interim Management of the DNA Facility

A national facility should have national management. The Principal Investigators of the national facility should be members of the Management Committee (MC) along with 4-5 other geneticists and computer scientists. Members of the current steering committee are obvious choices. We feel strongly that some senior member of the NIH/NSF staff responsible for representing government in this venture should also serve on the MC. The MC would be responsible for setting broad policy for usage and operations of the facility. The MC would review the management policies of the PIs.

The PIs would have responsibility for all the daily management and staff supervision of the facility. The PIs would serve as executive officers, formulate policy for the MC's consideration and carry out MC directives.

The PIs would be responsible for making an annual report to the MC and to the scientific community at large (as well as to funding agencies).


The Value of MOLGEN to the National Community and the DNA Facility

Why would we as scientists want to devote a substantial part of our time to run such a resource? What do we get out of it in terms of tangible benefits to help our research or accomplishments as scientists? Part of the formula for success of the SUMEX resource is a system located near the organizing scientists. This affords needed controls for system development and incentives for the investment of faculty time. The incentive for NIH to invest in an academic national DNA facility (rather than leasing time from a commercial service) is that NIH can capture the involvement of internationally known practicing scientists to guide the development and operation of a user community oriented resource. By putting it at Stanford, they could take advantage of an environment already committed to resource sharing and experienced in national resource operation. In this way, the goals of the resource are guided solely by what is needed to maximize its scientific benefit and not to show a profit for some parent company. In the developmental stages of the national DNA computer facility, this would seem to be a prerequisite in order to engage the cooperation and participation of the broadest and highest quality community.

These concepts, and the MOLGEN group's interest in organizing a national facility along the lines presented in this letter, have been enormously strengthened by the support MOLGEN has had from the SUMEX-AIM Executive Committee. In a letter appended to this document the nature of that support is spelled out clearly. Certainly the MOLGEN group would have little interest in undertaking such a venture if it did not have the promise of the backing of the expertise of the SUMEX technical staff.


Appendix I: DNA RESOURCE (2020 OPTION)

BUDGET FOR YEAR 1, FROM 1/1981 TO 1/1982

PERSONNEL                                    %      SALARY

  1 XXX  PRINCIPAL INVESTIGATOR             10.      5167.
  2 XXX  SYSTEM PROGRAMMER                  30.      9300.
  3 XXX  USER CONSULTANT                   100.     25833.
  4 XXX  ADMIN ASSIST                       20.      4133.
  5 XXX  STUD SYST PROG                     50.      4908.
  6 XXX  STUD SYST PROG                     50.      4908.

  ********** SUBTOTAL SALARIES                      54250.
  ********** STAFF BENEFITS                         11345.
  ********** TOTAL PERSONNEL                        65595.

EQUIPMENT

  101 DEC 2020 COMPUTER AND INSTALL
  102 ETHERNET, MODEMS, TERMINALS, ETC.

  ********** TOTAL OF EQUIPMENT

SUPPLIES

  103 COMPUTER OPERATIONS
  104 OFFICE SUPPLIES
  105 ENGINEERING PARTS

  ********** TOTAL OF SUPPLIES

TRAVEL

  107 DOMESTIC

  ********** TOTAL OF TRAVEL

OTHER

  108 MAINT [DEC 2020]
  109 OFFICE TELEPHONES
  110 LOCAL DATAPHONES
  112 TECHNICAL SERVICES/REPRO./BOOKS
  113 SYSTEM AND PROGRAM DOCUMENTATION
  114 NETWORK COMMUNICATIONS
  117 COLLABORATIVE LINKAGES

  ********** TOTAL OF OTHER

******************** TOTAL FOR YEAR 1              369595.
                     INDIRECT COSTS                 80964.
                     ADJUSTED TOTAL                450559.


3-YEAR SUMMARY, 1/1981 THROUGH 12/1983

ITEM             YEAR 1      YEAR 2      YEAR 3

PERSONNEL        54250.      59675.      65643.
STAFF BEN.       11345.      12658.      14320.
EQUIPMENT       230000.       5000.       5000.
SUPPLIES         15000.      11350.      11918.
TRAVEL            1000.       1050.       1103.
OTHER            58000.      73900.      86097.

SUBTOTAL        369595.     163633.     184080.
IND. COSTS       80964.      92008.     103862.

TOTAL           450559.     255641.     287942.

3-YEAR TOTAL DIRECT COSTS               717308.
3-YEAR TOTAL INDIRECT COSTS             276834.

TOTAL FOR ENTIRE PERIOD                 994142.
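The summary for the 2020 option is internally consistent, which can be confirmed by adding up its columns. The Python sketch below is one way to do that check. The figures are as read from the summary above, the names are assumptions of the example, and a one-dollar tolerance is allowed on the assumption that the printed figures were individually rounded.

```python
# Direct-cost rows (dollars) for years 1-3, as read from the summary above.
DIRECT = {
    "PERSONNEL": (54250, 59675, 65643),
    "STAFF BEN.": (11345, 12658, 14320),
    "EQUIPMENT": (230000, 5000, 5000),
    "SUPPLIES": (15000, 11350, 11918),
    "TRAVEL": (1000, 1050, 1103),
    "OTHER": (58000, 73900, 86097),
}
SUBTOTAL = (369595, 163633, 184080)
INDIRECT = (80964, 92008, 103862)
TOTAL = (450559, 255641, 287942)


def column_sums(rows):
    """Sum each year's column across all direct-cost rows."""
    return tuple(sum(column) for column in zip(*rows.values()))


def find_discrepancies(tolerance=1):
    """List every year whose printed figures disagree with the arithmetic."""
    problems = []
    computed = column_sums(DIRECT)
    for year in range(3):
        if abs(computed[year] - SUBTOTAL[year]) > tolerance:
            problems.append(f"year {year + 1}: subtotal")
        if abs(SUBTOTAL[year] + INDIRECT[year] - TOTAL[year]) > tolerance:
            problems.append(f"year {year + 1}: total")
    return problems


if __name__ == "__main__":
    print("Discrepancies:", find_discrepancies())
    print("3-year direct:", sum(SUBTOTAL))
    print("3-year indirect:", sum(INDIRECT))
    print("Entire period:", sum(TOTAL))
```

The same check can be applied to the other two options by substituting their figures.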


Appendix II: DNA RESOURCE (STRIPPED 2060 OPTION)

BUDGET FOR YEAR 1, FROM 1/1981 TO 1/1982

PERSONNEL                                    %      SALARY

  1 XXX  PRINCIPAL INVESTIGATOR             10.      5167.
  2 XXX  SYSTEM PROGRAMMER                  50.     15500.
  3 XXX  USER CONSULTANT                   100.     25833.
  4 XXX  ADMIN ASSIST                       30.      6200.
  5 XXX  STUD SYST PROG                     50.      4908.
  6 XXX  STUD SYST PROG                     50.      4908.

  ********** SUBTOTAL SALARIES                      62517.
  ********** STAFF BENEFITS                         13072.
  ********** TOTAL PERSONNEL                        75589.

EQUIPMENT

  101 DEC 2060 COMPUTER (MINIMAL) AND INSTALL
  102 ETHERNET, MODEMS, TERMINALS, ETC.

  ********** TOTAL OF EQUIPMENT

SUPPLIES

  103 COMPUTER OPERATIONS
  104 OFFICE SUPPLIES
  105 ENGINEERING PARTS

  ********** TOTAL OF SUPPLIES

TRAVEL

  107 DOMESTIC

  ********** TOTAL OF TRAVEL

OTHER

  108 MAINT [DEC 2060]
  109 OFFICE TELEPHONES
  110 LOCAL DATAPHONES
  112 TECHNICAL SERVICES/REPRO./BOOKS
  113 SYSTEM AND PROGRAM DOCUMENTATION
  114 NETWORK COMMUNICATIONS
  117 COLLABORATIVE LINKAGES

  ********** TOTAL OF OTHER

******************** TOTAL FOR YEAR 1              607589.
                     INDIRECT COSTS                103001.
                     ADJUSTED TOTAL                710590.


3-YEAR SUMMARY, 1/1981 THROUGH 12/1983

ITEM               YEAR 1      YEAR 2      YEAR 3

PERSONNEL          62517.      68768.      75645.
STAFF BEN.         13072.      14588.      16501.
EQUIPMENT         430000.       5000.       5000.
SUPPLIES           16000.      12400.      13020.
TRAVEL              1000.       1050.       1103.
OTHER              85000.     101750.     114840.

SUBTOTAL          607589.     203556.     226109.
IND. COSTS        103001.     115164.     128240.

TOTAL             710590.     318720.     354349.

3-YEAR TOTAL DIRECT COSTS       1037254.
3-YEAR TOTAL INDIRECT COSTS      346405.

TOTAL FOR ENTIRE PERIOD         1383659.


Appendix III: DNA RESOURCE (FULL 2060 OPTION)

BUDGET FOR YEAR 1 FROM 1/1981 TO 1/1982

PERSONNEL
                                        %      SALARY
1 XXX   PRINCIPAL INVESTIGATOR         10.      5167.
2 XXX   SYSTEM PROGRAMMER             100.     31000.
3 XXX   USER CONSULTANT               100.     25833.
4 XXX   ADMIN ASSIST                   50.     10333.
5 XXX   STUD SYST PROG                 50.      4908.
6 XXX   STUD SYST PROG                 50.      4908.

********** SUBTOTAL SALARIES                   82150.
********** STAFF BENEFITS                      17178.
********** TOTAL PERSONNEL                     99328.

EQUIPMENT

101 DEC 2060 COMPUTER AND INSTALL
102 ETHERNET, MODEMS, TERMINALS, ETC.

********** TOTAL OF EQUIPMENT

SUPPLIES

103 COMPUTER OPERATIONS
104 OFFICE SUPPLIES
105 ENGINEERING PARTS

********** TOTAL OF SUPPLIES

TRAVEL

107 DOMESTIC

********** TOTAL OF TRAVEL

OTHER

108 MAINT [DEC 2060]
109 OFFICE TELEPHONES
110 LOCAL DATAPHONES
112 TECHNICAL SERVICES/REPRO./BOOKS
113 SYSTEM AND PROGRAM DOCUMENTATION
114 NETWORK COMMUNICATIONS
117 COLLABORATIVE LINKAGES

********** TOTAL OF OTHER

********** TOTAL FOR YEAR 1                   952328.
INDIRECT COSTS                                143450.
ADJUSTED TOTAL                               1095778.


3-YEAR SUMMARY, 1/1981 THROUGH 12/1983

ITEM               YEAR 1      YEAR 2      YEAR 3

PERSONNEL          82150.      90365.      99402.
STAFF BEN.         17178.      19169.      21684.
EQUIPMENT         705000.       5000.       5000.
SUPPLIES           18000.      14500.      15226.
TRAVEL              1000.       1050.       1103.
OTHER             129000.     147450.     162324.

SUBTOTAL          952328.     277534.     304738.
IND. COSTS        143450.     158070.     173844.

TOTAL            1095778.     435604.     478582.

3-YEAR TOTAL DIRECT COSTS       1534600.
3-YEAR TOTAL INDIRECT COSTS      475364.

TOTAL FOR ENTIRE PERIOD         2009964.


Appendix IV: MOLGEN Guest Account Users

NUMBER OF
LOGONS  NAME                AFFILIATION                 TIME AND DATE OF FIRST LOGON

18      HOLMES              SUNYA                       Sunday, March 16, 1980 09:45:50
        bob sege            Yale university             Monday, March 17, 1980 07:27:33
        hugo martinez       ucsf                        Monday, March 24, 1980 12:24:44
        howard goodman      ucsf                        Monday, March 24, 1980 12:48:33
        Perry Nisen         Albert Einstein             Thursday, March 27, 1980 13:17:01
        JAKE MAIZEL         NIH                         Friday, April 11, 1980 12:52:26
        BILL PEARSON        JOHNS HOPKINS/MICROBIOLOGY  Friday, April 18, 1980 09:53:50
        brian fristensky    Cornell                     Monday, April 21, 1980 14:09:38
        HANS LEHRACH        EMBL HEIDELBERG, GERMANY    Saturday, April 26, 1980 13:19:59
        ALLEN PLACE         JOHNS HOPKINS UNIVERSITY    Friday, May 9, 1980 11:06:45
        SCHROEDER           UW GENETICS                 Wednesday, May 14, 1980 12:08:12
        FRED BLATTNER       UW GENETICS                 Wednesday, May 14, 1980 12:34:27
        ROD BROWN           UTAH STATE UNIVERSITY       Thursday, May 15, 1980 13:09:16
        DAVID W. MOUNT      UNIVERSITY OF ARIZONA       Monday, May 19, 1980 13:43:48
        FOTIS KAFATOS       HARVARD                     Friday, May 23, 1980 11:34:59
        JEAN-PIERRE DUMAS   CNRS, PARIS FRANCE          Wednesday, June 4, 1980 02:44:55
        DUSKO EHRLICH       UNIVERSITE DE PARIS         Thursday, June 5, 1980 13:36:18
        DAN DAVISON         STONY BROOK                 Tuesday, June 10, 1980 07:41:20
        Jeffrey s. haemer   mcdb, u of Colorado         Saturday, June 21, 1980 11:22:19
        TOM GINGERAS        COLD SPRING HARBOR          Sunday, June 22, 1980 08:08:06
        PAUL ROTHBERG       SUNY AT STONY BROOK         Tuesday, June 24, 1980 13:24:03
        Ron Reeder          MBL                         Wednesday, June 25, 1980 07:07:55
        TOM KELLY           JOHNS HOPKINS               Wednesday, June 25, 1980 13:42:20
        CLYDE HUTCHISON     U OF NORTH CAROLINA         Friday, June 27, 1980 17:43:10
        ALLAN MAXAM         Harvard Medical School      Tuesday, July 1, 1980 17:17:53
        benz                Albert Einstein             Thursday, July 3, 1980 06:56:50
        marians             Albert Einstein             Thursday, July 3, 1980 09:46:14
        WrayTing            baylor col of med           Thursday, July 3, 1980 14:00:46
        dirgen              A. Einstein                 Sunday, July 6, 1980 09:33:57
        ROBIN GUTELL        UCSC, HARRY NOLLER'S LAB    Monday, July 7, 1980 23:49:48
        MIKE KUEHN          SUNY AT STONY BROOK, NY     Wednesday, July 9, 1980 15:17:10
        John abelson        ucsd                        Thursday, July 10, 1980 11:24:05
7115461383122123117219411413730222812621
        Margaret dayhoff    nbrf                        Monday, July 14, 1980 18:10:45
        MARK PTASHNE        HARVARD                     Wednesday, July 16, 1980 07:22:52
        IRWIN TESSMAN       PURDUE UNIVERSITY           Saturday, July 19, 1980 13:05:39
        brad kosiba         brandeis                    Saturday, July 19, 1980 13:16:19
1
23125
        RICK FIRTEL         UCSD                        Monday, July 21, 1980 12:11:59
        walter goad         lasl                        Monday, July 21, 1980 12:15:23
14      STEFAN              GUEST                       Tuesday, July 22, 1980 06:57:41
        hung                brandeis                    Tuesday, July 22, 1980 17:04:24
        SCHWARTZ            NBRF                        Thursday, July 24, 1980 10:46:55
254
        HRCHEN              ATLAS OF PROTEINS           Friday, July 25, 1980 08:31:34
        STEPHEN BARNES      WASHINGTON UNIVERSITY       Monday, July 28, 1980 09:35:21
        ANDREW TAYLOR       UNIVERSITY OF OREGON        Monday, July 28, 1980 20:01:32
        GERSHENFELD         STANFORD-PATHOLOGY          Tuesday, July 29, 1980 17:43:01
        ANNETTE ROTH        JHU JHU                     Thursday, July 31, 1980 13:12:17
1122832
        mel simon           ucsd                        Saturday, August 2, 1980 15:21:36
5       M. BOGUSKI          WASH U MED SCHOOL BIOCHEM   Tuesday, August 5, 1980 14:43:14
        KATHLEEN TRIMAN     U OF OREGON                 Friday, August 8, 1980 17:45:34
1


14
        STONER              BRANDEIS BIOCHEM            Wednesday, August 13, 1980 06:50:42
        wydro               A. Einstein                 Thursday, August 14, 1980 10:18:50


Appendix V: Usage of MOLGEN Guest Account

Day of              Day of
Year      Logins    Week

 75         1       Sat
 76         2       Sun
 77         2       Mon
 78         1       Tue
 79         2       Wed
 80         0       Thu
 81         0       Fri
 82         0       Sat
 83         3       Sun
 84         0       Mon
 85         0       Tue
 86         2       Wed
 87         0       Thu
 88         0       Fri
 89         0       Sat
 90         0       Sun
 91         0       Mon
 92         0       Tue     April 1
 93         0       Wed
 94         0       Thu
 95         1       Fri
 96         1       Sat
 97         0       Sun
 98         0       Mon
 99         7       Tue
100        10       Wed
101         4       Thu
102         5       Fri
103         0       Sat
104         4       Sun
105         0       Mon
106         4       Tue
107         4       Wed
108         0       Thu
109         2       Fri
110         1       Sat
111         1       Sun
112         3       Mon
113         0       Tue
114         1       Wed
115         1       Thu
116         1       Fri
117         ?       Sat
118         3       Sun
119         2       Mon
120         ?       Tue
121         2       Wed
122         1       Thu     May 1
123         1       Fri


124         2       Sat
125         0       Sun
126         3       Mon
127         2       Tue
128         4       Wed
129         0       Thu
130         5       Fri
131         1       Sat
132         1       Sun
133         0       Mon
134         3       Tue
135         7       Wed
136         3       Thu
137         4       Fri
138         6       Sat
139         0       Sun
140         2       Mon
141         4       Tue
142         5       Wed
143         3       Thu
144         4       Fri
145         0       Sat
146         0       Sun
147         3       Mon
148        10       Tue
149         7       Wed
150         8       Thu
151        10       Fri
152         5       Sat
153         6       Sun     June 1
154         8       Mon
155         3       Tue
156         4       Wed
157         2       Thu
158        10       Fri
159         1       Sat
160         1       Sun
161         7       Mon
162         7       Tue
163         4       Wed
164         3       Thu
165         6       Fri
166         4       Sat
167         2       Sun
168         8       Mon
169         4       Tue
170         7       Wed
171         3       Thu
172         3       Fri
173         3       Sat
174         1       Sun
175         8       Mon
176         4       Tue
177         6       Wed


178        11       Thu
179         9       Fri
180         2       Sat
181         2       Sun
182         1       Mon
183         6       Tue     July 1
184        14       Wed
185        15       Thu
186         5       Fri
187         1       Sat
188         7       Sun
189         6       Mon
190         7       Tue
191        12       Wed
192         9       Thu
193        17       Fri
194         1       Sat
195         6       Sun
196         5       Mon
197         8       Tue
198         6       Wed
199         0       Thu     Down due to air conditioning
200         0       Fri     Down due to air conditioning
201        15       Sat
202         9       Sun
203        18       Mon
204        18       Tue
205         7       Wed
206         7       Thu
207        11       Fri
208         4       Sat
209         2       Sun
210        19       Mon
211        17       Tue
212        13       Wed
213        11       Thu
214        15       Fri     August 1
215         7       Sat
216         5       Sun
217        18       Mon
218        17       Tue
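
For readers who want calendar dates, the day-of-year numbers in the usage table above can be converted with a short Python sketch. It assumes that day 1 is January 1, 1980 (a leap year), which is consistent with the "April 1" label printed beside day 92; the function name is hypothetical.

```python
from datetime import date, timedelta


def day_of_year_to_date(day_number, year=1980):
    """Convert a 1-based day-of-year number to a calendar date.

    Assumes day 1 is January 1 of the given year (an assumption about
    how the report numbered its days, not something it states).
    """
    return date(year, 1, 1) + timedelta(days=day_number - 1)


# First and last rows of the table, plus the row labeled "April 1".
for day_number in (75, 92, 218):
    d = day_of_year_to_date(day_number)
    print(day_number, d.isoformat(), d.strftime("%a"))
```

Under this assumption the table runs from Saturday, March 15 through Tuesday, August 5, 1980, which agrees with the day-of-week column.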


STANFORD UNIVERSITY MEDICAL CENTER
STANFORD, CALIFORNIA 94305    (415) 497-5141

Stanford University School of Medicine
SUMEX Computer Project
Departments of Genetics and Computer Science
Professor Edward A. Feigenbaum
Principal Investigator

August 29, 1980

Laurence H. Kedes, M.D.
Department of Medicine
Stanford University School of Medicine
Stanford, California 94305

Dear Professor Kedes:

This letter is a summary of the action taken by the SUMEX-AIM Executive Committee during its meeting on August 14, 1980, in support of your proposed national DNA sequence analysis resource. The presentation you and Professor Brutlag gave on the rapid growth of a user community among molecular biologists based on the experimental availability of the MOLGEN computer programs through the SUMEX resource is a dramatic and gratifying demonstration of the utility of these programs and the effectiveness of a SUMEX-like resource for facilitating communication and collaboration among scientists.

The AIM Executive Committee has expressed strong support for your plan to establish the proposed DNA sequence analysis resource proximate to the existing SUMEX resource. It has approved the idea of devoting a portion of the talents and energies of the SUMEX management and staff to this new project subject to the conditions below. Such a cooperative effort would be of mutual advantage in assisting your proposed resource through sharing of administrative, system, and operational support and being a significant demonstration of the impact of SUMEX-AIM-developed artificial intelligence programs within the molecular genetics community.

However, as you know, the existing SUMEX-AIM resource has been overloaded for some time and is unable to provide more than initial experimental support for defining the needs of your community within its current capacity. The scope of the usage you envision is far beyond what SUMEX can provide and its more operational character is tangential to the SUMEX mandate as a research resource for the development of new biomedical AI applications. Thus, this endorsement is contingent upon the proposed DNA sequence analysis resource providing the needed additional hardware and incremental administrative, technical, and operations staff to serve its community. These costs will be minimized by its relationship to the existing SUMEX resource.

Very truly yours,

Edward A. Feigenbaum
Principal Investigator
SUMEX-AIM Computer Resource
AIM Executive Committee

The SUMEX-AIM project is a shared resource for research in artificial intelligence in medicine supported by the Biotechnology Resources Program, National Institutes of Health

cc: D.L. Brutlag