8/6/2019 Commercial Bio in for Ma Tics
1/20
1
BIOTECHNOLOGY REVIEW
13 MARCH 2000
JASON REED, PH.D. (212) 514-2341
JREED@OSCARGRUSS.COM
Trends in Commercial Bioinformatics
Genome of H. influenzae (TIGR)
KEY POINTS:
Roughly defined, bioinformatics technology comprises the backbone computational tools and databases that support genomic and related research.
The spectacular rise of the commercial genomics industry and the broadening application of genomic techniques in biology and medicine have created a commercial market for bioinformatics software, hardware and services.
By some estimates, the total market for bioinformatics tools and services, including custom databases, could exceed $2.0 billion within five years.
In our opinion, bioinformatics technology will become an increasingly important competitive differentiator for public and private life science companies going forward.
Bioinformatics is becoming a directly investible theme. By our estimation, there are now more than 50 companies that offer bioinformatics products and services of various kinds to external customers. Most of these are private companies, but we would not be surprised to see a number of the more mature players go public in the next 12 months.
COMMERCIAL BIOINFORMATICS?
Introduction. The purpose of this document is to provide an overview of the rapidly emerging field of commercial bioinformatics. We assume the reader has at least a baseline understanding of genomic technologies and of how they are now being implemented in commercial drug discovery. For purposes of this review, we define bioinformatics as the backbone computational tools and databases that support genomic and related research, which broadly encompasses the study of DNA structure/function, gene expression and protein production/structure/function. The spectacular rise of the commercial genomics industry and the broadening application of genomic techniques in biology and medicine have created a commercial market for bioinformatics software, hardware and services.
A Flood of DNA Sequence Data. The initiation of large-scale genomic research projects roughly a decade ago engendered an intensive effort to create related information management and analysis tools, largely driven by academic computer scientists associated with the institutions involved. One of the first and most important problems encountered was how to acquire, store and analyze massive amounts of DNA sequence information. Reliable, high-throughput sequencing methods perfected in the past few years are now churning out vast quantities of information, from complete genomes of several bacteria and archaea (bacteria-like organisms that live in extreme conditions: a third kingdom of life) up to a mostly complete sequence of human chromosome 22, finished in late 1999.
Partial List of Completely Sequenced Genomes

Genome                                Size (MM bp)  Est. Genes*  Completed  Relevance
Archaea
Aeropyrum pernix K1                   1.67          2,694        1999       Potential source of novel enzymes, etc.
Archaeoglobus fulgidus                2.18          2,407        1997       Potential source of novel enzymes, etc.
Methanobacterium thermoautotrophicum  1.75          1,869        1997       Potential source of novel enzymes, etc.
Pyrococcus abyssi                     1.77          1,765        1999       Potential source of novel enzymes, etc.
Pyrococcus horikoshii                 1.74          2,064        1998       Potential source of novel enzymes, etc.
Bacteria
Aquifex aeolicus                      1.55          1,522        1997       Potential source of novel enzymes, etc.
Bacillus subtilis                     4.21          4,100        1997       Represents sporulating Gram-positive bacteria
Campylobacter jejuni                  1.64          1,654        2000       Food-borne pathogen
Chlamydia trachomatis                 1.04          894          1998       Human pathogen
Chlamydia pneumoniae                  1.23          1,052        1998       Human pathogen
Escherichia coli                      4.64          4,289        1998       Key model organism; human pathogen
Haemophilus influenzae                1.83          1,709        1995       Human pathogen; first free-living organism to have genome completely sequenced
Helicobacter pylori                   1.67          1,553        1997       Major cause of stomach ulcers
Helicobacter pylori J99               1.64          1,491        1999       Another H. pylori strain
Mycobacterium tuberculosis            4.41          3,918        1998       Causes tuberculosis
Mycoplasma genitalium                 0.58          480          1995       Genome is interesting because it is very small
Mycoplasma pneumoniae                 0.82          677          1996       Leading cause of walking pneumonia
Rickettsia prowazekii                 1.11          834          1998       Causes epidemic typhus
Synechocystis PCC6803                 3.57          3,169        1996       Should help us understand photosynthesis
Treponema pallidum                    1.14          1,031        1998       Causes venereal syphilis
Thermotoga maritima                   1.86          1,846        1999       Potential source of novel enzymes, etc.
Ureaplasma urealyticum                0.75          611          2000       Sexually transmitted pathogen
Eukaryota
Caenorhabditis elegans                ~97.0         ~19,000      1998       Worm; a key model organism
Saccharomyces cerevisiae              12.07         5,885        1996       Yeast; a key model organism
Human Chromosome 22**                 33.46         600+         1999       First human chromosome to be fully sequenced

Source: NCBI; *excludes tRNA and rRNA genes; **euchromatic region
Growth in GenBank. GenBank, a major public repository of DNA sequence data, has grown to include roughly 4.86 million individual sequence records (representing about 3.86 billion base pairs), up from 0.56 million records in 1995 (0.38 billion base pairs). At the time of this writing, GenBank contained the full and partial genome sequences of over 670 different organisms, including 27 complete genomes (6 archaea, 19 bacteria and 2 eukaryotes).
[Figure: Growth of GenBank. Axes: year (1982 to 1998), sequences (millions, 0 to 6) and base pairs of DNA (millions, 0 to 4,000). Series: Sequences, Base Pairs. Source: NCBI]
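The GenBank figures quoted above can be put in perspective with some quick arithmetic. The sketch below is ours, not NCBI's: it uses only the four numbers given in the text, and the function names are illustrative.

```python
# Figures quoted in the text, all expressed in millions.
RECORDS_1995, RECORDS_NOW = 0.56, 4.86        # sequence records
BASES_1995, BASES_NOW = 380.0, 3860.0         # 0.38 and 3.86 billion base pairs


def fold_change(old, new):
    """Return how many times larger `new` is than `old`."""
    return new / old


def mean_record_length(bases_millions, records_millions):
    """Average number of base pairs per sequence record."""
    return bases_millions / records_millions


print(f"Records grew {fold_change(RECORDS_1995, RECORDS_NOW):.1f}-fold")
print(f"Base pairs grew {fold_change(BASES_1995, BASES_NOW):.1f}-fold")
print(f"Mean record length, 1995: {mean_record_length(BASES_1995, RECORDS_1995):.0f} bp")
print(f"Mean record length, now:  {mean_record_length(BASES_NOW, RECORDS_NOW):.0f} bp")
```

On these numbers, both records and base pairs grew by roughly an order of magnitude since 1995, and the average record is well under 1,000 base pairs, which fits a database dominated by short sequence reads and fragments.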
Organisms Represented in GenBank
Source: NCBI
This DNA sequence has been deposited in GenBank by a whole host of international academic and government research groups, as well as by commercial concerns. Almost all companies conducting genomic research, such as Incyte, Human Genome Sciences, Millennium Pharmaceuticals, Myriad Genetics and Genome Therapeutics, have sequenced stretches of DNA from human and other organisms. Some of this privately generated sequence data has been submitted to public databases like GenBank, while some remains proprietary.
Human Genome Sequence. A public consortium now plans to produce a draft version of the human genome sequence by mid-2000, and to completely sequence the genome by 2003. This effort will be spearheaded by research groups at Washington University (St. Louis), Baylor College of Medicine, the Whitehead Institute and the Sanger Centre in England. This Human Genome Project consortium has greatly accelerated its timeline from the original genome completion date of 2005 due to: (1) the development of robust high-throughput sequencing techniques, and (2) competition from Celera, a division of PE Corp that intends to complete a lower fidelity, but still very useful, copy of the human genome by 2001 (with a draft version expected out this year). In addition, both the public consortium and private concerns including Celera are sequencing all or parts of the genomes of model organisms, like the mouse, hoping to gain additional insights into genomic structure and function.
Milestones in Human Genome (~3,000 mm bp) Sequencing

Milestone                               Status
Genetic Map                             Complete
Physical Map                            Complete
High-Throughput Sequencing Technology   Largely Perfected
Chromosome 22 Sequence                  Finished Late 1999
Celera Draft Sequence                   Expected Mid 2000
Public Consortium Draft Sequence        90% by Mid 2000
Celera Final Sequence                   Expected 2001
Public Consortium Final Sequence        Expected 2003
Other:
Human Sequence Variation Data           In Progress
Gene Identification                     In Progress
Functional Analysis                     In Progress
Sequence of Key Model Organisms:
E. coli (4.6 mm bp)                     Complete
Yeast (12 mm bp)                        Complete
C. elegans (97 mm bp)                   Complete
Drosophila (160 mm bp)                  Raw Sequence Finished Late 1999 (Celera)
Mouse (2,600 mm bp)                     Expected 2002 (Celera)
Rice (400 mm bp)                        Expected 2001 (Celera)

Source: NIH, NCBI, Celera
[Chart data for Organisms Represented in GenBank, as extracted: values 485, 135, 54, 10; categories Viruses, Eukaryota, Bacteria, Archaea]
Data Generation is Accelerating. Data generation continues to accelerate because: (1) many genomes besides human are being completely sequenced, and (2) high-throughput methods are being perfected in other areas, such as gene expression assays, protein-protein interaction assays, in vitro or cell-based assays used in drug development and a host of clinically related genetic tests.
Data Source and Drivers

DNA Sequence
High-throughput techniques:
--shotgun sequencing
--hybrid shotgun/map-based methods
--automated capillary electrophoresis
--genome maps of various kinds
Lots of medically/biologically interesting organisms

Gene Expression Data
Researchers have now found lots of genes:
--cDNA sequencing, SAGE, etc.
--genomic sequencing w/ gene ID techniques
Microarrays can assay thousands of genes at one time (10,000+)
Very important for finding/validating drug targets
Often involves model organisms

Protein Data
High-throughput techniques:
--2-D gels
--mass spectrometry
--protein-protein interaction assays (yeast 2 & 3 hybrid assays; also can be on chips)
--various new in vitro and cell-based assays
Structure determination/prediction methods becoming more powerful
Very important for finding/validating drug targets
Often involves model organisms

Medical Genetics Data
High-throughput techniques:
--SNP/polymorphism chip-based assays
--SNP/polymorphism mass spec assays
--SNP/polymorphism electrophoresis assays
Enables tailoring drugs to patients via genetic profile (pharmacogenomics)
Enables more efficient patient selection for clinical trials
Enables disease predisposition testing

Source: Oscar Gruss Research
Bioinformatics is Becoming Critical to Life Science R&D. If we take the massive generation of biological data as a starting place, bioinformatics technology enables the extraction of information that can be used in commercial drug discovery, clinical diagnostics, agricultural biotechnology and other applications. Currently, this includes three areas: (1) tools that support laboratory experiments; (2) the design, implementation and integration of biological databases; and (3) various analytical tools that determine by computation, rather than by experiment, things like the location of genes within a chromosome, similar genes or proteins from other species and the 3-D structure and function of different proteins. These analyses can enable or greatly accelerate drug target identification efforts, drug lead validation and optimization, pharmacogenomic studies and many other biotech applications.
Bioinformatics Technology Involves:
Design, Implementation and Integration of Biological Databases
Aligning Protein and DNA Sequences
Tools That Support Laboratory Experiments
Assembling DNA Sequence Fragments and Creating Genomic Maps
Recognizing and Annotating DNA Sequence Features
Phylogenetic Comparisons
Predicting RNA Secondary Structure
Modeling Protein Structure and Dynamics
These Techniques are Very Useful in R&D Related to:
Commercial Drug Discovery
Improving Clinical Trials
Medical Diagnostics
Pharmacogenomics: Tailoring Medicines to Individuals
Industrial Biotech
Agbiotech
Source: Oscar Gruss Research; Durbin, et al; Altman
A Comment About the Current Limitations of Bioinformatics Technology: A comprehensive assessment of the strengths and weaknesses of the different bioinformatics technologies is beyond the scope of this report. Suffice it to say that many of these methods are still very much in development. However, these tools can be very powerful when applied in the correct analytical context, and in conjunction with the appropriate experimental validation. To further clarify how these tools are used, we provide a simplified example below.
BIOINFORMATICS TECHNOLOGY: AN EXAMPLE
Sequence an Interesting Genomic Region. You might start by finding the DNA sequence of a chromosomal region, speculating that it contains genes from an interesting biological pathway. Your ultimate goal might be to find an undiscovered drug target. To rapidly assemble a contiguous DNA sequence that might have one or more complete genes, you might use the "shotgun" technique. This technique relies on piecing together many small, electrophoretically determined stretches of DNA sequence, each say 500 base pairs in length, into a much larger continuous stretch, say 2 million base pairs in length. To do this in a (mostly) automated fashion, you will need special programs like PHRED to read the raw DNA sequence, and PHRAP to assemble the small pieces into a large stretch of sequence. You will probably also need to use a laboratory information management system (LIMS) to track your sequencing project, as the process involves many individual samples and pieces of data that need to be stored and organized.
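To make the idea of piecing fragments together concrete, here is a minimal sketch of greedy overlap assembly. It is a toy illustration only and is not how PHRAP works: it assumes error-free reads from a single strand, each longer than `min_len`, and the function names are our own. Real assemblers must also handle sequencing errors, repeats and reverse-complement reads.

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of `a` that is a prefix of `b`.

    Returns 0 if no overlap of at least `min_len` characters exists.
    """
    start = 0
    while True:
        # Look for the first min_len characters of b somewhere in a.
        start = a.find(b[:min_len], start)
        if start == -1:
            return 0
        # If the rest of a from that point matches the start of b, we are done.
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the longest overlap."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            return reads  # nothing left to merge; each item is a contig
        merged = reads[best_i] + reads[best_j][best_len:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)


# Three overlapping fragments of a short example sequence.
fragments = ["AAGGCTGAGT", "TGAGTGCTAAG", "CTAAGCGCGCG"]
print(greedy_assemble(fragments, min_len=4))
```

When no overlaps remain, the function returns whatever contigs it has, which mirrors the common outcome of ending up with a few separate stretches rather than one continuous sequence.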
Find Genes and Other Interesting Features in Your Genomic Sequence. You now have a DNA sequence that is a string of several million symbols (like ...AAGGCTGAGTGCTAAGCGCGCG), or a few strings of several hundred thousand symbols if you cannot put it all together (a common problem). You want to find regions that correspond to genes and perhaps regulatory sequences that control when the genes are turned on and off. You might start by using a program called BLAST (Basic Local Alignment Search Tool) to search the public or commercial DNA sequence databases to see if any stretches of your 2 million base pair sequence match previously identified gene sequences. To do this faster you might use special computer hardware known as an "accelerator", such as the DeCypher system from TimeLogic. You might use a more sophisticated software package like GENIE, GENSCAN or GRAIL to better identify where in your sequence the gene starts and stops, and where regulatory regions might be. These gene finding programs are not completely reliable in most cases, but are useful when used in conjunction with other methods. In this regard, you would also want to search public and private expressed sequence tag (EST) databases. ESTs are short sequences (several hundred base pairs) experimentally determined to correspond to real genes. If an EST matches part of your sequence, it is likely that that part contains a real gene. This is a very powerful technique, as you don't actually have to know what the gene does, and because available EST databases are now very comprehensive.
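The sketch below illustrates, in simplified form, how a short query such as an EST can be located within a longer genomic sequence by indexing short exact words (k-mers). This is only the seeding idea that fast search tools build on, not an implementation of BLAST: there is no scoring, extension or statistics, and all names here are hypothetical.

```python
from collections import Counter, defaultdict


def build_kmer_index(sequence, k):
    """Map every k-letter word in `sequence` to the positions where it occurs."""
    index = defaultdict(list)
    for pos in range(len(sequence) - k + 1):
        index[sequence[pos:pos + k]].append(pos)
    return index


def seed_hits(query, index, k):
    """Return (query_pos, target_pos) pairs where a k-mer of the query occurs in the target."""
    hits = []
    for qpos in range(len(query) - k + 1):
        for tpos in index.get(query[qpos:qpos + k], []):
            hits.append((qpos, tpos))
    return hits


def best_diagonal(hits):
    """Most common offset (target_pos - query_pos) and its hit count, or None.

    Many hits on one offset suggest the query matches the target starting there.
    """
    counts = Counter(tpos - qpos for qpos, tpos in hits)
    return counts.most_common(1)[0] if counts else None


genomic = "AAGGCTGAGTGCTAAGCGCGCG"   # example string from the text
est = "GAGTGCTA"                     # made-up query for illustration
index = build_kmer_index(genomic, k=4)
print(best_diagonal(seed_hits(est, index, k=4)))
```

Building the index once and reusing it for many queries is what makes this kind of lookup fast; a full search tool would then extend and score each candidate region.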
Compare Your Gene to Other Known Genes to Find More Information. As an extension of the process described above, you might want to keep comparing regions of your sequence that you now think correspond to a gene with other known genes from human and with genes from other organisms. If your gene is similar to a human gene of known function, say an enzyme of some kind, your gene might perform the same function and be structurally similar. To do this, you would continue to use pairwise alignment algorithms (like BLAST and Smith-Waterman) to search both public and private databases. Comparing your gene to similar genes in other organisms (say genes from a mouse and a fish, if they are known) can help you find important regulatory and functional regions, among other things, because these tend to be evolutionarily conserved. This is one reason that the Human Genome Project Consortium and Celera are sequencing model organisms like mouse and fruit fly, in addition to human. To compare the sequences of several genes, you can use multiple alignment algorithms such as CLUSTAL W or MSA. There are many other tools to compare the DNA and protein sequence of your new gene with other known genes and protein motifs.
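As an illustration of pairwise local alignment, here is a compact sketch of the Smith-Waterman dynamic programming method. The scoring values (match +2, mismatch -1, linear gap penalty -2) are arbitrary choices for the example; production tools use substitution matrices and affine gap penalties, and the function name is ours.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (best_score, aligned_a, aligned_b) for the best local alignment."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Fill the score matrix; scores never drop below zero (local alignment).
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(0, diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the best cell until a zero score is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and score[i][j] > 0:
        diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        if score[i][j] == diag:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


# Two made-up sequences that share a common internal region.
print(smith_waterman("TTTGAGTGCTAAA", "CCGAGTGCTACC"))
```

The method is exhaustive, with cost proportional to the product of the two sequence lengths, which is one reason heuristic tools and special hardware are used for large database searches.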
Displaying this Information is Critical to Understanding It. As you have collected all of this information about your DNA sequence (where the genes are, where the regulatory sequences might be, what corresponds to an EST sequence, etc.), you will need to display it graphically. One of the best ways to do this is in a "browser" fashion that lets you easily investigate each piece of information via mouse click or something similar. A good display can tell you what information might be lacking and where the different sources of information agree or disagree.
Next Steps. In the example above, we might have been able to find out a great deal about the function, structure and pathway of action of our gene via computer tools. This might tell us that the gene produces a protein that could be important in a disease process. Therefore we have gone from information about a chromosomal region to a potential drug target. This information can help us more efficiently design future experiments, or make some experiments unnecessary.
Going forward, we might want to use microarrays to investigate the expression of our gene under different conditions (in response to different chemicals, etc.). There is now great interest in making databases of this type of gene expression information, so you might not have to conduct the experiment yourself. Examples include the gene expression data available commercially from GeneLogic and Incyte, and the free gene expression databases now being developed by a host of academic and government research groups, such as Stanford University, the Whitehead Institute and the U.S. National Center for Biotechnology Information (NCBI, part of the NIH).
For the purposes of drug development, you'd definitely be interested in which other proteins interact with the protein from your gene. It may turn out that another, structurally dissimilar protein in the pathway would be a better drug target for some reason. (Myriad's ProNet technology and CuraGen's protein-protein interaction databases are powerful, commercially available tools to find this type of information.)
Finally, you might attempt to model the structure of your protein, and how it interacts with a drug molecule. This might tell you which chemical class of molecules would be the most promising drug candidates. It should be noted that modeling 3-D protein structures and protein-small molecule interactions are some of the toughest problems in computational biology. Companies like Structural Bioinformatics and IBM are working on these kinds of problems, as are many other commercial and academic groups.
With this admittedly oversimplified example, we hope to have demonstrated the following points about bioinformatics technology:
1. In one form or another, it is ubiquitous in genomics research.
2. It can involve lots of database searching. The more high-quality information available, the more powerful bioinformatics can become.
3. It can require many different algorithms and analyses.
4. Integrating and displaying the information is key.
5. Bioinformatics won't replace experiments, but can greatly streamline and enable the discovery process.
An integrated system comprising databases, analysis algorithms and display tools is described in the schematic below.
[Schematic: integrated bioinformatics system. Components as labeled: Data Viewer; Administration Interface; Data Analysis Functions; Internal Control Functions; Old Data, New Data and Public Data]
Integrating these elements in an easily navigable system, whether it be a desktop program or an enterprise-wide IT system, is highly desirable in most commercial and in many large-scale academic research efforts.
Commercial Applications of Bioinformatics Are Numerous. To understand the commercial relevance of these technologies, one need only consider that all of the public and private sector genomics research now being conducted relies heavily on bioinformatics tools of the kind described above. In the future, bioinformatics tools should see extensive use in all the key life science R&D markets, including the pharmaceutical and biotechnology industries, agricultural biotechnology, and government and academic research. Penetration of bioinformatics techniques into these markets should be driven by the following factors:
1. The pressure to rapidly organize, integrate and mine data is enormous: it costs a lot to produce, and competitive and patent concerns are an issue.
2. Maturation of tools should make them easier to use.
3. Life sciences R&D organizations are becoming more receptive to a paradigm shift in research techniques (i.e. genomics, R&D outsourcing, etc.), due in large measure to the insufficiency of current methods (product output per research $).
But partly offset by:
1. The fact that experienced bioinformatics people are relatively scarce.
2. Lack of universal compatibility standards for tools and databases.
3. Applications can be very complex and heterogeneous, thus the development time/cost is often high.
4. In some cases, the capital expenditure to support in-house capability is quite large, plus constant service expenditures of some type will probably be required.
THE COMMERCIAL BIOINFORMATICS MARKET
Market Structure
By our estimation, there are now more than 50 companies that offer bioinformatics products and services of various kinds to external customers. In surveying the industry, we find surging volume growth, particularly among industry leaders, and an acceleration in the number of corporate deals and other collaborations. We believe this reflects the explosive growth in genomic and related research techniques, plus the weaknesses of available analysis tools and databases. By some estimates, the total market for bioinformatics tools and services, including custom databases, could exceed $2.0 billion within five years.
There remain a number of significant challenges in this market, however. Over the past few years, the customer base willing to pay big dollars for a customized bioinformatics solution (large biopharma) has been relatively concentrated (perhaps fewer than 50 customers), and the largest players have mostly satisfied their own needs with in-house bioinformatics expertise. Further, publicly available tools and databases are ubiquitous and are becoming easier to use and more integrated. Commercial solutions that add substantial value tend to be complex, with longer development cycles than traditional software products. At the same time, the individual applications can be very heterogeneous, so it can be hard to leverage a specific product across many applications. The net result is that development time/cost can be high, but each individual market can be relatively narrow. Notably, the recent dissolution of the high-profile bioinformatics startup Molecular Applications Group (MAG) can be traced to these issues. Some bioinformatics companies have responded to these hurdles by reorienting their business models. For example, Pangea Systems recently changed from being an enterprise IT solution provider (a low-volume, high-price business) to being an e-bioinformatics portal (now called DoubleTwist.com), which is targeted mostly toward small- and mid-size customers (high-volume, low-price). Compugen, which originally produced special computer hardware for DNA and protein sequence analysis, now offers expanded services such as DNA microarray design and an e-bioinformatics portal (called LabOnWeb.com).
Because bioinformatics is becoming such a critical enabling technology in modern biological research, we strongly feel that commercial solutions will ultimately reach their multi-billion dollar sales potential. It is an open question as to how the industry will respond to the current problems of market heterogeneity and small customer base. It is our feeling that consolidation, driven by the larger players, and cross-platform standardization will be major themes going forward.
Below we outline the bioinformatics market structure and growth outlook in further detail:
Product Categories
There are several identifiable bioinformatics product categories: proprietary databases of various kinds, software and hardware analysis tools of varying comprehensiveness, complete enterprise IT systems that manage and integrate databases and analysis tools, and, finally, custom services. In time these distinctions should become blurred as tools, databases and information management systems become more integrated.
We see the following technical hurdles as important to bioinformatics product design, and the solutions that most effectively address them should have a competitive advantage:
1. The data to be organized/analyzed is very heterogeneous.
2. Analysis tools are rapidly evolving.
3. Seamlessly integrating public, legacy and new data is a must.
4. Many users are not software/computer experts.
Customer Base
1. Pharmaceutical and biotechnology companies will use bioinformatics technology in all stages of the drug discovery process, from drug target identification through lead validation and optimization to drug response profiling and clinical diagnostics.
Key driver: This is the most important customer base in terms of dollar value, due to competitive and patent expiry pressures and the fact that biopharma has traditionally spent heavily on R&D. Large pharmaceutical companies are already prodigious customers of outsourced genomics R&D that includes a lot of bioinformatics content. This includes partnerships like those between Millennium Pharmaceuticals and AstraZeneca, Bayer, Pfizer and Wyeth-Ayerst, for example, or Human Genome Sciences' deals with SmithKline Beecham, Schering-Plough, Merck KGaA, etc. There are many more examples. We believe that the middle market of smaller pharmas and mid- to small-size biotechs (perhaps 300+ companies, excluding genomics companies) is relatively underpenetrated for a variety of reasons, including smaller R&D budgets and a historical emphasis on more traditional drug discovery technologies.
Key constraint: As discussed above, leading pharma companies that have made a substantial commitment to genomics research have already developed a substantial bioinformatics infrastructure. This includes companies like SmithKline, Glaxo, Merck, Novartis and others. These types of customers are potentially the highest value segment, but displacing a big pharma's custom-tailored bioinformatics group with an external product is only practical in the case of niche or especially high-value applications. Lion Bioscience of Heidelberg, Germany, has been perhaps the most successful bioinformatics company to date in penetrating big pharma with a high-value infrastructure deal. In 1999, Lion entered into a five-year alliance with Bayer AG worth up to $100 million, in which Lion will provide and support bioinformatics IT systems to speed Bayer's drug discovery programs. The deal included the establishment of Lion's U.S. subsidiary, Lion Bioscience Research, in Cambridge, MA.
2. Agbiotech/Industrial Biotech companies have already started to use genomics research methods extensively in the study of crops and livestock, with the hope of improving crop/livestock yields, increasing pesticide/herbicide resistance, improving taste/nutritional content, etc.
Key driver: We expect that the widening use of gene expression assays and proteomics assays of various kinds in ag/industrial biotech will sharply increase the need for bioinformatics technology in this market. The increased pace of whole genome sequencing of thermophilic organisms and other extremophiles (like that of M. thermoautotrophicum by Genome Therapeutics), which may provide a novel source of enzymes for industrial processes, should support this trend.
Key constraint: This market segment has traditionally been slower than biopharma to embrace genomic techniques. We believe that the current negative public perception of genetically modified organisms (GMOs) will remain a factor, at least in the near future.
3. Academic research groups, particularly those associated with the international effort to sequence the human genome, have pioneered most of the genomic and bioinformatics techniques in use today and should continue to be heavy users.
Key driver: Cutting edge research into gene expression, proteomics and medical genetics will increasingly rely on the use of bioinformatics tools, in our opinion.
Key constraint: Outside of the large, government coordinated projects like the Human Genome Initiative, individual researchers tend to be less intensive data generators/users than commercial concerns. As a result, their bioinformatics needs are often satisfied by a combination of publicly available tools, commercial desktop solutions (like those available from InforMax or GCG) and home grown systems.
4. Other markets include government agencies like the U.S. Patent and Trademark Office, which recently purchased a Compugen DNA and protein sequence analysis computer system to aid in patent searches. We expect law enforcement agencies like the FBI and the armed services to compile and make increasing use of genetic profile databases in the future. However, in the near term, these non-commercial markets will probably remain small in terms of total dollar value.
Participants in the Field
1. Academic and government groups that produce publicly available tools and databases, some of which are quite comprehensive and sophisticated. Examples are the many tools and databases maintained by the NCBI, including GenBank. Appendix B at the end of this report contains a partial list of available biological databases, many of which are public free-access databases. Below is a schematic of NCBI's Entrez database browser system:
Source: NCBI
2. Genomic and pharmacogenomic companies that offer databases and services to outside customers, as well as for their own internal use. This includes companies like Incyte, Celera, CuraGen and GeneLogic. We would also include biotech instrumentation companies like PE Biosystems in this category. Instrumentation products usually include data management and analysis tools of varying utility.
3. Large pharma, biotech and agbio companies that develop their own in-house databases and bioinformatics expertise. As discussed above, some of the largest pharmaceutical companies have well-developed bioinformatics infrastructures, and thus are difficult for outside providers to penetrate. The situation is much more favorable in mid-size to smaller companies; however, these firms often cannot justify extremely large expenditures on infrastructure unless it addresses a core research focus.
4. Traditional computer, electronic technology and IT services companies that offer products and services for the bioinformatics market. This includes companies like Compaq, Sun Microsystems, Silicon Graphics, IBM and Agilent Technologies. For the most part, these companies have taken the complementary approach of providing infrastructure that supports various solutions by specialized bioinformatics providers. We expect these companies to be an increasingly important competitive force in genomics and bioinformatics. For instance, Compaq has a major strategic alliance with Celera to provide integrated bioinformatics hardware, software, networking and service solutions. IBM is conducting research into high-value data mining and protein structure determination methods. IBM offers a variety of enterprise-wide IT solutions for the life science market, and recently initiated a collaboration with NetGenics. Through its partnership with Rosetta Inpharmatics, Agilent offers an enterprise-wide gene expression analysis solution that includes software and hardware and is a rival to Affymetrix's GeneChip system.
5. More or less "pure play" bioinformatics companies that offer products and services to external customers. Some of these companies are trying to leverage their bioinformatics expertise toward in-house efforts like drug discovery, and are thus somewhat like traditional genomics companies (see category #2 in this list). Most of these are private companies, but we would not be surprised to see a number of the more mature players go public in the next 12 months. Some, but by no means all, of the prominent companies in this space are listed in Appendix A of this report, and the market outlook for this segment is discussed in more detail below.
More on Market Size and Growth Outlook
Given the nascent nature of this industry and the large number of private players in the field, the current market for external products and services is hard to determine. Surveys of the 50 or so bioinformatics tool and database companies by market research groups like Frontline and Frost & Sullivan, for example, put the current market for bioinformatics databases, products and services at roughly $300 million, with about half of the annual sales by data suppliers and half by tool/IT providers of various kinds. These groups and other industry observers believe that this market could grow to $1.5-2.0 billion over the next five years. These estimates exclude some significant internal spending on bioinformatics-related IT infrastructure by pharmaceutical and biotechnology companies, which could be as large as $2.0+ billion annually. As discussed above, also excluded are most of the project-based R&D collaborations between pharma/agbio companies and genomics companies that include bioinformatics content, and which total well over $1.0 billion on a cumulative basis over the past 3-5 years.
Without more publicly disclosed financials these market size estimates are hard to pin down. However, we find them reasonable if not conservative, in that they imply visible 25%-35% top-line growth over the next few years, which is consistent with our own survey of key industry players. Conceptually, as discussed above, we believe bioinformatics will become essential to many if not all life science R&D activities, and the market for commercial solutions of various kinds should increase in proportion.
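As a rough check on the arithmetic behind these estimates, the short Python sketch below computes the compound annual growth rate implied by moving from the roughly $300 million current market to the $1.5-2.0 billion projected over five years. The function name `implied_cagr` is hypothetical, and the assumption of smooth compounding over exactly five years is ours, not the market research groups'; actual growth could be uneven from year to year.

```python
def implied_cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate that takes start_value to end_value in `years` years."""
    if start_value <= 0 or end_value <= 0 or years <= 0:
        raise ValueError("values and years must be positive")
    return (end_value / start_value) ** (1.0 / years) - 1.0


# Figures quoted in the text, in millions of US dollars.
CURRENT_MARKET = 300.0
PROJECTED_LOW = 1500.0
PROJECTED_HIGH = 2000.0
HORIZON_YEARS = 5  # assumption: "over the next five years" read as exactly five

low = implied_cagr(CURRENT_MARKET, PROJECTED_LOW, HORIZON_YEARS)
high = implied_cagr(CURRENT_MARKET, PROJECTED_HIGH, HORIZON_YEARS)
print(f"Implied annual growth: {low:.1%} to {high:.1%}")
```

Under these assumptions the end-to-end compound rate works out to roughly 38% to 46% a year, somewhat above the 25%-35% near-term range. One possible reading is that the two figures cover different horizons, with the 25%-35% range describing the visible next few years rather than the full five-year projection.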
Most of today's sales come from the database providers and the software/hardware tool suppliers, with complete enterprise IT solutions just emerging (perhaps 20% or less of the current market). Over the next few years, enterprise IT solutions should garner a larger proportion of the industry's total sales, driven by a great need for integration of the various databases/tools with R&D efforts. Also, we expect growth in the sales of data providers to be supported by the emergence of new types of data, namely gene expression and proteomics data. However, commercial database sales are likely to be constrained by the increasing public availability of well-annotated genome sequence from human and other organisms, and by the increasing public availability of other types of data.
Appendix A -- Representative Bioinformatics Database, Software, Hardware and Service Providers
Concentrated Bioinformatics Plays | Ticker | Description
Compugen | Private | Originally specialized in computer hardware/software designed to accelerate bioinformatics algorithms. Business model now moving more toward an internet portal concept, plus proprietary and collaborative gene discovery.
DoubleTwist.com | Private | An internet portal business model, which includes on-line access to a variety of bioinformatics/biotech tools, databases and other products. DoubleTwist changed its name from Pangea Systems in 1999.
eBioinformatics | Private | Originally a spin-off from the Australian National Genomic Information Service. eBioinformatics provides a variety of web-based bioinformatics tools and databases.
Genomica | Private | Provides enterprise-wide bioinformatics systems and services. Relationships include AstraZeneca, Glaxo Wellcome, Parke Davis and PE Biosystems.
Informax | Private | Desktop and enterprise-wide bioinformatics products. Customer base of over 60 pharma companies, 250 biotechs and 500 universities.
Lion Bioscience | Private | Provides enterprise-wide bioinformatics systems and services. Lion has interest in leveraging technology for proprietary R&D. Lion's $100 MM alliance with Bayer AG is the largest bioinformatics deal to date.
Molecular Mining | Private | Molecular Mining produces high value-added data mining algorithms that can be used to filter gene expression and other types of data.
Neomorphic | Private | Bioinformatics tools to mine and visualize genomic information. Collaborations with key academic and commercial genomic technology leaders.
Netgenics | Private | Provides enterprise-wide bioinformatics systems and services. Relationships include Pfizer, Abbott, Wyeth Ayerst and IBM.
Oxford Molecular | OMG.LN | Comprehensive business model that includes bioinformatics and related fields of cheminformatics and computational chemistry. In 1997, acquired Genetics Computer Group, maker of the popular Wisconsin desktop bioinformatics product.
Paracel | Private | Specialized computer hardware/software designed to accelerate bioinformatics algorithms. Relationships with many academic and commercial research groups, including PE Corp.
Silicon Genetics | Private | Tools for gene expression analysis and visualization, plus other data-mining applications.
SpotFire | Private | SpotFire offers data visualization software for gene expression as well as products for non-life sciences industries.
Structural Bioinformatics | Private | Bioinformatics tools and databases with a special focus on protein structural information, a critical component of rational drug design.
TimeLogic | Private | Specialized computer hardware/software designed to accelerate bioinformatics algorithms. Configurable hardware architecture offers competitive advantage in some cases. Relationships with key academic and commercial research groups, including Stanford University, Roche, Bristol-Myers and Novartis.
Genomic/Biotechnology Companies with Bioinformatics Products | Ticker | Description
Celera | CRA | A division of PE Corp founded to rapidly sequence the human and other genomes, with the intent to supply high value-added genomic data to life science collaborators. Celera has the world's most powerful high-throughput DNA sequencing capability.
CuraGen | CRGN | CuraGen conducts project-driven genomic R&D for proprietary use and in collaboration with life science partners. CuraGen offers collaborators a variety of well-integrated databases, bioinformatics tools and services.
GeneLogic | GLGC | Offers GeneExpress gene expression database products, and other services to the life sciences industry.
Human Genome Sciences | HGSI | HGSI practically founded the commercial genomics industry with its landmark 1993 gene database deal with SmithKline Beecham. HGSI now has collaborations with more than ten commercial partners in areas including gene databases, antibodies, gene therapy and microbial genomics.
Incyte | INCY | A pioneer commercial bioinformatics database company. Provides high-value gene expression, proteomics and other data/analysis tools to pharmaceutical and academic subscribers.
Myriad Genetics | MYGN | Myriad's core competence is therapeutic and diagnostic product development via genomic and proteomic methods. Myriad offers a public version of its high-quality protein interaction database, ProNet, through DoubleTwist.com and through its own Myriad-ProNet.com website.
PE Biosystems | PEB | A division of PE Corp, the premier provider of DNA sequencers and other life science instrumentation. The PE Informatics division offers a variety of software products to life science and other customers.
Rosetta Inpharmatics | Private | Rosetta's core competence is obtaining gene expression and other data in a setting relevant to drug/product discovery for proprietary use and in collaboration with life science partners. Through its commercialization partner Agilent, Rosetta offers an enterprise-wide gene expression analysis solution that includes software and hardware.
Computer, Electronic Technology and IT Services Companies Offering Bioinformatics Products | Ticker | Description
Agilent Technologies | A | In 1999, Agilent entered into a strategic collaboration with Rosetta Inpharmatics to make and sell gene expression analysis systems, including hardware and software.
Compaq | CPQ | Compaq has a major strategic alliance with Celera to provide integrated bioinformatics hardware, software, networking and service solutions.
IBM | IBM | IBM is conducting research into high value-added data mining and protein structure determination methods. IBM offers a variety of enterprise-wide IT solutions for the life science market, and recently initiated a collaboration with NetGenics.
Silicon Graphics | SGI | SGI offers visual computing and high-performance computer systems. SGI systems support a wide variety of bioinformatics software applications.
Sun Microsystems | SUNW | Sun systems support a wide variety of bioinformatics software applications.
Appendix B -- Representative Molecular Biology Databases
(from A. Baxevanis in Nucleic Acids Research, 2000, V.28, No.1)
Major Sequence Repositories
GenBank | All known nucleotide and protein sequences; International Nucleotide Sequence Database Collaboration
EMBL Nucleotide Sequence Database | All known nucleotide and protein sequences; International Nucleotide Sequence Database Collaboration
DNA Data Bank of Japan (DDBJ) | All known nucleotide and protein sequences; International Nucleotide Sequence Database Collaboration
Genome Sequence Database (GSDB) | All known nucleotide and protein sequences
TIGR Gene Indices | Non-redundant, gene-oriented clusters
UniGene | Non-redundant, gene-oriented clusters

Comparative Genomics
Clusters of Orthologous Groups (COG) | Phylogenetic classification of proteins from 21 complete genomes
XREFdb | Cross-referencing of model organism genetics with mammalian phenotypes

Gene Expression
ASDB | Protein products and expression patterns of alternatively-spliced genes
Axeldb | Gene expression in Xenopus
BodyMap | Human and mouse gene expression data
EpoDB | Genes expressed in vertebrate RBC
FlyView | Drosophila development and genetics
Gene Expression Database (GXD) | Mouse gene expression and genomics
Kidney Development Database | Kidney development and gene expression
MAGEST | Ascidian (Halocynthia roretzi) gene expression patterns
Mouse Atlas and Gene Expression Database | Spatially-mapped gene expression data
PEDB | Normal and aberrant prostate gene expression
Tooth Development Database | Gene expression in dental tissue
TRIPLES | TRansposon-Insertion Phenotypes, Localization and Expression in Saccharomyces

Gene Identification and Structure
Ares Lab Intron Site | Yeast spliceosomal introns
COMPEL | Composite regulatory elements
CUTG | Codon usage tables
EID | Protein-coding, intron-containing genes
EPD | Eukaryotic POL II promoters
ExInt | Exon-intron structure of eukaryotic genes
IDB/IEDB | Intron sequence and evolution
PLACE | Plant cis-acting regulatory elements
PlantCARE | Plant cis-acting regulatory elements
TransTerm | Codon usage, start and stop signals
TRRD | Regulatory regions of eukaryotic genes
YIDB | Yeast nuclear and mitochondrial intron sequences

Genetic Maps
GeneMap '99 | International Radiation Mapping Consortium human gene map
G3-RH | Stanford G3 and TNG radiation hybrid maps
GB4-RH | Genebridge4 (GB4) human radiation hybrid maps
GDB | Human genes and genomic maps
DRESH | Human cDNA clones homologous to Drosophila mutant genes
GenAtlas | Human genes, markers and phenotypes
HuGeMap | Human genome genetic and physical map data
IXDB | Physical maps of human chromosome X
Radiation Hybrid Database | Radiation hybrid map data

Genomic Databases
AceDB | Caenorhabditis elegans, Schizosaccharomyces pombe and human sequences and genomic information
FlyBase | Drosophila sequences and genomic information
Mouse Genome Database (MGD) | Mouse genetics and genomics
Saccharomyces Genome Database (SGD) | Saccharomyces cerevisiae genome
AmmtDB | Metazoan mitochondrial DNA sequences
Arabidopsis Database (AtDB) | Arabidopsis thaliana genome
CropNet | Genome mapping in crop plants
CyanoBase | Synechocystis sp. genome
EcoGene | Escherichia coli K-12 sequences
EMGlib | Completely sequenced bacterial genomes and the yeast genome
GOBASE | Organelle genome database
HIV Sequence Database | HIV RNA sequences
Human BAC Ends Database | Non-redundant human BAC end sequences
INE | Rice genetic and physical maps and sequence data
Mendel Database | Database of plant EST and STS sequences annotated with gene family information
MitBASE | Mitochondrial genomes, intra-species variants, and mutants
MitoDat | Mitochondrial proteins (predominantly human)
MITOMAP | Human mitochondrial genome
MITONUC/MITOALN | Nuclear genes coding for mitochondrial proteins
MITOP | Mitochondrial proteins, genes and diseases
Munich Information Center for Protein Sequences (MIPS) | Protein and genomic sequences
NRSub | Bacillus subtilis genome
Phytophthora Genome Initiative Database | Oomycete sequences and genetic maps
RsGDB | Rhodobacter sphaeroides genome
TIGR Microbial Database | Microbial genomes and chromosomes
ZFIN | Zebrafish genetics and development; mutant and wild-type lines
ZmDB | Maize genome database

Intermolecular Interactions
Database of Ribosomal Crosslinks (DRC) | Ribosomal crosslinking data
DIP | Catalog of protein-protein interactions
DPInteract | Binding sites for Escherichia coli DNA-binding proteins

Metabolic Pathways and Cellular Regulation
Kyoto Encyclopedia of Genes and Genomes (KEGG) | Metabolic and regulatory pathways
EcoCyc | Escherichia coli K-12 genome, gene products and metabolic pathways
ENZYME | Enzyme nomenclature
EpoDB | Genes expressed during human erythropoiesis
FlyNets | Drosophila melanogaster molecular interactions
Klotho | Collection and categorization of biological compounds
LIGAND | Enzymatic ligands, substrates and reactions
RegulonDB | Escherichia coli pathways and regulation
UM-BBD | Microbial biocatalytic reactions and biodegradation pathways primarily for xenobiotic, chemical compounds
WIT2 | Integrated system for functional curation and development of metabolic models
Mutation Databases
Online Mendelian Inheritance in Man (OMIM) | Catalog of human genetic and genomic disorders
ALFRED | Allele frequencies and DNA polymorphisms
Androgen Receptor Gene Mutations Database | Mutations in the androgen receptor gene
Asthma and Allergy Database | Genetics of allergy and asthma, including linkage studies and mutation data
Asthma Gene Database | Linkage and mutation studies on the genetics of asthma and allergy
Atlas of Genetics and Cytogenetics in Oncology and Hematology | Chromosomal abnormalities in cancer
BTKbase | Mutation registry for X-linked agammaglobulinemia
Cytokine Gene Polymorphism Database | Cytokine gene polymorphisms, in vitro expression and disease-association studies
Database of Germline p53 Mutations | Mutations in human tumor and cell line p53 gene
DbSNP | Single nucleotide polymorphisms
GRAP Mutant Databases | Mutants of family A G-Protein Coupled Receptors (GRAP)
Haemophilia B Mutation Database | Point mutations, short additions and deletions in the Factor IX gene
HAMSTeRS | Hemophilia A mutation database
HGBASE | Intragenic sequence polymorphisms
HIV-RT | HIV reverse transcriptase and protease sequence variation
Human Gene Mutation Database (HGMD) | Known (published) gene lesions responsible for human inherited disease
Human PAX2 Allelic Variant Database | Mutations in human PAX2 gene
Human PAX6 Allelic Variant Database | Mutations in human PAX6 gene
Human Type I and Type III Collagen Mutation Database | Human type I and type III collagen gene mutations
HvrBase | Primate mtDNA control region sequences
IARC p53 Database | Missense mutations and small deletions in human p53 reported in peer-reviewed literature
KinMutBase | Disease-causing protein kinase mutations
KMDB | Mutations in human eye disease genes
MmtDB | Mutations and polymorphisms in metazoan mitochondrial DNA sequences
Mutation Spectra Database | Mutations in viral, bacterial, yeast and mammalian genes
NCL Mutations | Mutations and polymorphisms in neuronal ceroid lipofuscinoses (NCL) genes
p53 Databases | Human p53 and hprt mutations; transgenic lacZ and transgenic/bacterial lacI mutations
PAHdb | Mutations at the phenylalanine hydroxylase locus
PMD | Compilation of protein mutant data
RB1 Gene Mutation Database | Mutations in the human retinoblastoma (RB1) gene
Ribosomal RNA Mutational Database | 16S and 23S ribosomal RNA mutation database
SV40 Large T-Antigen Mutant Database | Mutations in SV40 large tumor antigen gene

Pathology
FIMM | Functional molecular immunology data (diseases, antigens, peptides and HLA binding sites)
Mouse Tumor Biology Database (MTB) | Mouse tumor names, classification, incidence, pathology, genetic factors
PEDB | Sequences from prostate tissue and cell type-specific cDNA libraries

Protein Databases
AARSDB | Aminoacyl-tRNA synthetase sequences
DatA | Annotated coding sequences from Arabidopsis
DExH/D Family Database | DEAD-box, DEAH-box and DExH-box proteins
Endogenous GPCR List | G protein-coupled receptors; expression in cell lines
ESTHER | Esterases and α/β hydrolase enzymes and relatives
FUNPEP | Low-complexity or compositionally-biased protein sequences
GenProtEC | Escherichia coli genes, gene products and homologs
GPCRDB | G protein-coupled receptors
Histone Sequence Database | Histone and histone-fold sequences and structures
HIV Molecular Immunology Database | HIV epitopes
Homeobox Page | Information relevant to homeobox proteins, classification and evolution
Homeodomain Resource | Homeodomain sequences, structures, and related genetic and genomic information
HUGE | Large (>50 kDa) human proteins and cDNA sequences
IMGT | Immunoglobulin, T cell receptor and MHC sequences
InBase | Intervening protein sequences (inteins) and motifs
Kabat Database | Sequences of proteins of immunological interest
LGIC | Ligand-gated ion channel sequences, alignments and phylogeny
Membrane Protein Database | Membrane protein sequences, transmembrane regions and structures
MEROPS | Peptidase sequences and structures
MHCPEP | MHC-binding peptides
NRR | Steroid and thyroid hormone receptor superfamily
Olfactory Receptor Database | Sequences for olfactory receptor-like molecules
OoTFD | Transcription factors and gene expression
Peptaibol | Peptaibol (antibiotic peptide) sequences
PhosphoBase | Protein phosphorylation sites
PKR | Protein kinase sequences, enzymology, genetics, and molecular and structural properties
PPMdb | Arabidopsis plasma membrane protein sequence and expression data
Prolysis | Proteases and natural and synthetic protease inhibitors
PROMISE | Prosthetic centers and metal ions in protein active sites
Protein Information Resource (PIR) | Non-redundant protein sequence database
Receptor Database (RDP) | Receptor protein sequences
Ribonuclease P Database | RNase P sequences, alignments and structures
SENTRA | Sensory signal transduction proteins
SWISS-PROT/TrEMBL | Curated protein sequences
TRANSFAC | Transcription factors and binding sites
Wnt Database | Wnt proteins and phenotypes

Protein Sequence Motifs
BLOCKS | Protein sequence motifs and alignments
PROSITE | Biologically-significant protein patterns and profiles
Pfam | Multiple sequence alignments and hidden Markov models of common protein domains
O-GLYCBASE | Glycoproteins and O-linked glycosylation sites
PIR-ALN | Protein sequence alignments
PRINTS | Protein sequence motifs and signatures
ProClass | Families defined by PROSITE patterns and PIR superfamilies
ProDom | Protein domain families
ProtoMap | Automated hierarchical classification of SWISS-PROT proteins
SBASE | Annotated protein domain sequences
SMART | Signalling domain sequences
SYSTERS | Protein clusters

Proteome Resources
Aaindex | Physicochemical properties of peptides
REBASE | Restriction enzymes and associated methylases
SWISS-2DPAGE | 2D-PAGE images and reference maps
Yeast Proteome Database (YPD) | Saccharomyces cerevisiae proteome
Retrieval Systems and Database Structure
KEYnet | Keywords extracted from EMBL and GenBank
Virgil | Database interconnectivity

RNA Sequences
5S Ribosomal RNA Databank | 5S rRNA sequences
ACTIVITY | Functional DNA/RNA site sequences
Collection of mRNA-like non-coding RNAs | Non-protein-coding RNA transcripts
Database on the Structure of Large Subunit Ribosomal RNA | Alignment of large subunit ribosomal RNA sequences
Database on the Structure of Small Subunit Ribosomal RNA | Alignment of small subunit ribosomal RNA sequences
Guide RNA Database | Guide RNA sequences
Intronerator | RNA splicing and gene structure in Caenorhabditis elegans
Non-canonical Base Pair Database | RNA structures containing rare base pairs
PLMItRNA | Plant mitochondrial tRNAs and tRNA genes
Pseudobase | Information on RNA pseudoknots
Ribosomal Database Project (RDP) | rRNA sequences, alignments, and phylogenies
RNA Modification Database | Naturally modified nucleosides in RNA
SELEX_DB | Selected DNA/RNA functional site sequences
Small RNA Database | Direct sequencing of small RNA sequences
SRPDB | Signal recognition particle RNA, protein, and receptor sequences
TmRDB | tmRNA (10Sa RNA) sequences
tmRNA Website | tmRNA (10Sa RNA) sequences
tRNA Sequences | tRNA and tRNA gene sequences
UTRdb | 5' and 3' UTRs of eukaryotic mRNAs
Viroid and Viroid-Like RNA Database | Viroid and viroid-like RNA and vHDV sequences
Yeast snoRNA Database | Yeast small nucleolar RNAs

Structure
PDB | Structure data determined by X-ray crystallography and NMR
CATH | Hierarchical classification of protein domain structures
SCOP | Familial and structural protein relationships
ASTRAL | Analysis of protein structures and their sequences
BioImage | Searchable database of multi-dimensional biological images
BioMagResBank | NMR spectroscopic data from proteins, peptides and nucleic acids
CSD | Crystal structure information for organic and metal organic compounds
Database of Macromolecular Movements | Descriptions of protein and macromolecular motions, including movies
Decoys 'R' Us | Computer-generated protein conformations based on sequence data
HIC-Up | Structures of small molecules ('hetero-compounds')
HSSP | Structural families and alignments; structurally-conserved regions and domain architecture
IMB Jena Image Library | Visualization and analysis of three-dimensional biopolymer structures
ISSD | Integrated sequence and structural information
LPFC | Library of protein family core structures
MMDB | All three-dimensional structures, linked to NCBI Entrez system
MODBASE | Comparative protein structure models
NDB | Nucleic acid-containing structures
PDB-REPRDB | Representative protein chains, based on PDB entries
PRESAGE | Protein structures with experimental and predictive annotations
Protein Motions Database | Motions of protein loops, domains and subunits
ProTherm | Thermodynamic data for wild-type and mutant proteins
RESID | Protein structure modifications
Transgenics
Cre Transgenic Database | Cre transgenic mouse lines
Transgenic/Targeted Mutation Database | Information on transgenic animals and targeted mutations

Varied Biomedical Content
CarbBank | Complex carbohydrate/polysaccharide sequences
Dbcat | Catalog of databases
DrugDB | Pharmacologically-active compounds; generic and trade names
HOX-PRO | Clustering of homeobox genes
LocusLink/RefSeq | Curated sequence and descriptive information about genetic loci
Molecular Probe Database | Synthetic oligonucleotides, probes and PCR primers
MPDB | Information on synthetic oligonucleotides
NCBI Taxonomy Browser | Names of all organisms that are represented in the genetic databases with at least one nucleotide or protein sequence
PubMed | MEDLINE and Pre-MEDLINE citations
Tree of Life | Information on phylogeny and biodiversity
Vectordb | Characterization and classification of nucleic acid vectors
INVESTMENT RESEARCH
HEALTHCARE TECHNOLOGY
Biotechnology Telecommunications Equipment
Akhtar Samad, M.D., Ph.D. (212) 514-2342 Ayelet Oron (212) 514-2305
Jason Reed, Ph.D. (212) 514-2341 Gilad Alper (212) 514-2356
Alan J. Tuchman, M.D. (212) 514-2345
John Tonkin (212) 514-2348 Diversified Technology
Rami Rosen (972) 3519-9004
Medical Technology
Alan J. Tuchman, M.D. (212) 514-2345 SPECIAL SITUATIONS
Alan J. Septimus (212) 514-2317
Peter H. Vogel (212) 514-2336
ASIA PACIFIC
Telecommunications and Special Situations
Sandia Shih (212) 514-2358
This report is based upon information which Oscar Gruss & Son Incorporated believes to be reliable but no representation is made by this Firm or any of its affiliates as to its completeness or accuracy. This report is not a complete analysis of every material fact concerning any company, industry or security, and more information is available upon request. Opinions expressed herein are subject to change without notice. Oscar Gruss & Son Incorporated makes a market in this security and may have a long or short position in this security in connection with this activity. This Firm and/or our employees and affiliates may own or have positions in any securities of companies mentioned in this study, which positions may change at any time, and may, from time to time, sell or buy such securities. This Firm or one of its affiliates may from time to time perform investment banking or other services for, or solicit investment banking or other business from a company mentioned in this report.
© 2000 Oscar Gruss & Son Incorporated. All rights reserved.