Bioinformatics Industry Australia · Morange, M. (1999). A History of Molecular Biology. Harvard...

12
96 APBN • Vol. 7 • No. 3 • 2003 www.asiabiotech.com B ioinformatics in Australia I ndustry Tim Littlejohn CEO, BioLateral Pty Ltd. Sydney, Australia. [email protected] http://www.biolateral.com.au 1 Introduction In the last couple of years, the role of bioinformatics in Australia has been recognized as a key ingredient in the development of high tech industries in the country (Littlejohn 2000, Littlejohn 2002a). This recognition is based on a strong and growing biotechnology sector (Littlejohn 2002b, Thorburn and Hopper 2002), which is underpinned, by vibrant life science research and ICT (information and communication technology) R&D. Recently an industry analysis of bioinformatics in Australia was conducted and the report (Littlejohn 2002a) recommended that four major areas be focused on: a. Education and skills development; b. Research; c. Infrastructure; d. Commercialization. The taskforce identified that skills development in bioinformatics is the most pressing issue for industry development. But what skills? This very much depends on the definition of bioinformatics. 2 Defining Bioinformatics Bioinformatics is one of the most frequently used terms in the modern molecular life sciences and bioindustries, and perhaps one of the least well defined. Common usage of bioinformatics defines it broadly as the intersection point of the life sciences and technologies (e.g. genomics) with the information sciences and technologies (e.g. computing). Further refinement of the term, however, typically leads to disagreements: Fig. 1 illustrates the level of semantic confusion that surrounds the term “bioinformatics”. There are four layers to bioinformatics (Littlejohn 1997) that have been used in various ways over the last five years. All are valid uses of the term bioinformatics but all are very distinct and their definitions are frequently the subject of heated debate. These layers are: Users = biologists. This refers to the application of bioinformatics technologies to the life science discovery process. However, much as molecular biology was a discipline and later became a technique and its practitioners were no longer considered molecular biologists (Morange 1999), those scientists who use bioinformatics tools should no longer considered bioinformatics practitioners. Product and service builders and deliverers = BioIT. The newest term used to describe bioinformatics, BioIT should be associated with the “IT” (technical) aspects of bioinformatics. Examples of this might be provision of an online service such as a WWW interface to a databank. Engineering = BioIT. Software, database and computing systems engineering should also now be considered BioIT. Examples include development of new software tools for biological data management. Research = bioinformation science. The term “bioinformation science” has recently been applied to describe the research activities in bioinformatics. In the past the term “computational biology” has also been used to describe “pure” bioinformatics research. Research also overlaps with biostatistics, especially in areas such as bioinformatics problems in functional genomic data such as microarray datasets. Examples of research in this area include gene detection algorithms in novel genome sequences. 3 Bioinformatics Consumers In Australia, the bioinformatics “industry” is best carved into two slices: “consumers” bioinformatics of products and services and “suppliers” of these products and services. Owing to the ambiguity in definition of the term “bioinformatics”, these two need to be addressed separately.

Transcript of Bioinformatics Industry Australia · Morange, M. (1999). A History of Molecular Biology. Harvard...

Page 1: Bioinformatics Industry Australia · Morange, M. (1999). A History of Molecular Biology. Harvard University Press. ISBN 0-674-39855-6. and Oracle Extensibility Bioinformatics Bruce

96 APBN • Vol. 7 • No. 3 • 2003

www.asiabiotech.com

Bioinformatics in AustraliaIndustry

Tim LittlejohnCEO, BioLateral Pty Ltd. Sydney, [email protected]://www.biolateral.com.au

1 Introduction

In the last couple of years, the role of bioinformaticsin Australia has been recognized as a key ingredient inthe development of high tech industries in the country(Littlejohn 2000, Littlejohn 2002a). This recognition isbased on a strong and growing biotechnology sector(Littlejohn 2002b, Thorburn and Hopper 2002), whichis underpinned, by vibrant life science research and ICT(information and communication technology) R&D.

Recently an industry analysis of bioinformatics inAustralia was conducted and the report (Littlejohn2002a) recommended that four major areas be focusedon:

a. Education and skills development;

b. Research;

c. Infrastructure;

d. Commercialization.

The taskforce identified that skills development inbioinformatics is the most pressing issue for industrydevelopment. But what skills? This very much dependson the definition of bioinformatics.

2 Defining Bioinformatics

Bioinformatics is one of the most frequently usedterms in the modern molecular life sciences andbioindustries, and perhaps one of the least well defined.

Common usage of bioinformatics defines it broadlyas the intersection point of the life sciences andtechnologies (e.g. genomics) with the informationsciences and technologies (e.g. computing). Furtherrefinement of the term, however, typically leads todisagreements: Fig. 1 illustrates the level of semanticconfusion that surrounds the term “bioinformatics”.

There are four layers to bioinformatics (Littlejohn1997) that have been used in various ways over the lastfive years. All are valid uses of the term bioinformaticsbut all are very distinct and their definitions arefrequently the subject of heated debate. These layersare:

• Users = biologists. This refers to the applicationof bioinformatics technologies to the life sciencediscovery process. However, much as molecular biologywas a discipline and later became a technique and itspractitioners were no longer considered molecularbiologists (Morange 1999), those scientists who usebioinformatics tools should no longer consideredbioinformatics practitioners.

• Product and service builders and deliverers =BioIT. The newest term used to describe bioinformatics,BioIT should be associated with the “IT” (technical)aspects of bioinformatics. Examples of this might beprovision of an online service such as a WWW interfaceto a databank.

• Engineering = BioIT. Software, database andcomputing systems engineering should also now beconsidered BioIT. Examples include development ofnew software tools for biological data management.

• Research = bioinformation science. The term“bioinformation science” has recently been applied todescribe the research activities in bioinformatics. In thepast the term “computational biology” has also beenused to describe “pure” bioinformatics research.Research also overlaps with biostatistics, especially inareas such as bioinformatics problems in functionalgenomic data such as microarray datasets. Examples ofresearch in this area include gene detection algorithmsin novel genome sequences.

3 Bioinformatics Consumers

In Australia, the bioinformatics “industry” is bestcarved into two slices: “consumers” bioinformatics ofproducts and services and “suppliers” of these productsand services. Owing to the ambiguity in definition ofthe term “bioinformatics”, these two need to beaddressed separately.

Page 2: Bioinformatics Industry Australia · Morange, M. (1999). A History of Molecular Biology. Harvard University Press. ISBN 0-674-39855-6. and Oracle Extensibility Bioinformatics Bruce

97APBN • Vol. 7 • No. 3 • 2003

A large proportion of publicly funded life scienceresearch in Australia in organizations such asuniversities, hospitals and the CSIRO are big consumersof bioinformatics. There are too many of these toconsider separately here.

On the commercial side, the biotechnologyindustry in Australia is a growing consumer ofbioinformatics technologies. In fact, the peakbiotechnology industry body, AusBiotech, has recentlylaunched a Bioinformatics Special Interest Group(ABSIG) in recognition of the growing importance ofbioinformatics to biotechnology industrial growth inAustralia (http://www.ausbiotech.org).

Biotechnology firms that heavily use bioinformatics(some even having their own bioinformaticsdepartments) include Bionomics in Adelaide(http://www.bionomics.com.au), GeneTraks inBrisbane (http://www.genetraks.com) andthe Brain Resource Company in Sydney(h t tp : / /www.bra inresource .com) .Instrument companies such as ProteomeSystems Limited (PSL) in Sydney(http://www.proteomesystems.com) arealso voracious consumers ofbioinformatics and producers ofbioinformatics tools.

Several publicly funded orsemi-commercial organizations are alsoconsumers of bioinformatics, includingthe Innovative Dairy ProductsCooperative Research Center in Sydney(http://www.dairycrc.com) and theIntelligent Island Center Of Excellencein Bioinformatics in Hobart(http://www.intelligentadvantage.net.au).In addition, Australia has three nationalfacilities that are consumers ofbioinformatics — ANGIS in Sydney isAustralia’s National Bioinformatics facility(http://www.angis.org.au) and also a supplier ofbioinformatics to many research and educationalorganizations in Australia. In addition, APAF in Sydneyas the National Proteomics Facility (http://www.proteome.org.au) and the AGRF in Melbourne andBrisbane, as the National genomics facility (http://www.agrf.org.au), are also heavy bioinformaticsconsumers.

4 Bioinformatics Suppliers

Australia has a number of pure bioinformaticscompanies offering products and services notjust to the consumers mentioned above but

also to organisations outside Australia. Thesecompanies include Sydney based BioLateral(http://www.biolateral.com.au) and Desert Scientific(http://www.desertsci.com), Canberra basedGeneticXchange (http://www.geneticxchange.com/v3/index.php), Brisbane based GeneXbase(http://www.genxbase.com) and GeneticSolutions(http://www.geneticsolutions.com.au) and Perth basedBiogene-E-Us (http://biogene-e-us.com).

In addition to these “pure” bioinformaticscompanies, some instrument companies are alsosuppliers of bioinformatics products. These companiesinclude Axon Instruments (http://www.axon.com) andPSL (http://www.proteomesystems.com). Australia hasa number of clinical trials software companies too,including Sydney based iLife (http://www.ilife.com.au)and Phase Forward (http://www.phaseforward.com).

The life science sectoris seen as one of the fastestgrowing markets for the ITindustry, and as a result, anumber of IT firms areactively selling equipment,software and services intothe sector. These firmsinclude IBM (http://www-3 . i b m . c o m / s o l u t i o n s /lifesciences), Apple (http://www.apple.com.au), Cray(http://www.cray.com), Sun(http://www.sun.com) andHP (http://www.hp.com).

Australia has a largenumber of publiclysupported research groupswith active researchprograms in bioinformatics

(see accompanying article). These groups producebioinformatics innovations that are in turn consumedby the life science research communities mentionedabove. Bioinformatics research groups include Canberrabased CBiS (Center for Bioinformation Sciences) at theANU (http://cbis.anu.edu.au) and the Universityof Canberra Medical Informatics Center(http://nhsc.org.au), Melbourne based WEHI(http: / /www.wehi.edu.au/bioweb/index.html)and Victorian Bioinformatics Consortium(http://www.vicbioinformatics.com), Sydney basedPeterWills Bioinformatics Center at the Garvan Institute(http://www.garvan.org.au) and the Centenary Institute

Fig. 1 Semantic confusion surrounding

bioinformatics.

Page 3: Bioinformatics Industry Australia · Morange, M. (1999). A History of Molecular Biology. Harvard University Press. ISBN 0-674-39855-6. and Oracle Extensibility Bioinformatics Bruce

98 APBN • Vol. 7 • No. 3 • 2003

www.asiabiotech.com

(http://www.centenary.usyd.edu.au) and Brisbane basedIMB (http://www.imb.uq.edu.au).

5 Future Industry Directions

The future of bioinformatics in Australia will verymuch be shaped by how rapidly the life science researchsector gains the skills in bioinformatics. As aconsequence of this recognized demand for skills,Australia has seen rapid growth in the range ofbioinformatics education opportunities available forboth life scientists and information scientists. BioLateral,a Sydney based bioinformatics company, offersprofessional training in a range of bioinformatics topics(http://www.biolateral.com.au), and many universitiesnow offer bioinformatics training at the undergraduatelevels. Undergraduate courses are available atuniversities around the country (summarized athttp://www.angis.org.au/new/education/study.html) andpost-graduate courses are available from the Universityof Sydney (http://www.scifac.usyd.edu.au/future/pg/pgc_bioinform.html) and Curtin University(http://www.curtin.edu.au/curtin/handbook/courses/30/30 1135.html). It will be interesting to watch the growthof the industry as skills levels rise over the comingyears.

References

Hopper, K. and Thorburn, L. (2002). 2002 BioIndustry Review:

Australia & New Zealand. ISBN 0-9580301-1-1

Littlejohn, T.G. (1997) Delivery Of Computational Resources-

Principles And Practice. pp 155 – 166. In Advances in Computational

Life Sciences, Volume. II, Visual Human to Molecular Structure of

Proteins, CSIRO Publishing. Edited by Marek T. Michalewicz.

Littlejohn, T.G. (2000). Bioinformatics in Australia. Bioinformatics

16, 849 – 850

Littlejohn, T.G. (2002a). Bioinformatics — Issues and Opportunities

for Australia. 2002. Australian Federal Government Emerging

Industries Occasional Paper 15. Taskforce lead by T. G. Littlejohn.

ISBN 0 642 721319

Littlejohn, T.G. (2002b) Sunbaked Biotechnology in Australia. Nature

Biotechnology 20, 873 – 877

Morange, M. (1999). A History of Molecular Biology. Harvard

University Press. ISBN 0-674-39855-6.

and OracleExtensibility

Bioinformatics

Bruce BlackwellOracle Corporation, New England

Development Center1 Oracle Drive, Nashua, New Hampshire, US.

[email protected]

Abstract

The Oracle relational database managementsystem, with object-oriented extensions and numerousapplication-driven enhancements, plays a critical roleworldwide in managing the exploding volumes ofbioinformatics data. There are many features of theOracle product which support the bioinformaticscommunity directly already and there are severalfeatures which could be exploited more thoroughly byusers, service vendors, and Oracle itself to extend thatlevel of support. This paper will present an overview ofOracle features which support storage of bioinformaticsdata and will discuss extensibility features which givethe product room to grow. Some attention will be givento Oracle’s own efforts to use that extensibility to exploitemerging standardization of many of the complex dataand computation requirements of the life sciences.

Keywords: Oracle; database; bioinformatics; extensibility.

1 Introduction

Huge volumes of bioinformatics data are emergingfrom DNA and protein sequencing efforts, geneexpression assays, X-ray and NMR protein structuredetermination, and many other activities. Almost everyimportant effort to understand life processes or todevelop therapeutic drugs is a producer or consumerof large data volumes. Database management systemshave a very important role to play in the storage andserving of this explosion of data. Relational technologyhas emerged as the standard for databases in many fieldsincluding bioinformatics.Oracle’s relational databasemanagement system (RDBMS) is now used for about85 percent of the bioinformatics database market. The

The life science sector is seen as one of

the fastest growing markets for the IT

industry, and as a result, a number of IT

firms are actively selling equipment,

software and services into the sector.

Page 4: Bioinformatics Industry Australia · Morange, M. (1999). A History of Molecular Biology. Harvard University Press. ISBN 0-674-39855-6. and Oracle Extensibility Bioinformatics Bruce

99APBN • Vol. 7 • No. 3 • 2003

Oracle RDBMS has evolved from a simple relationaldata store to one that enables complex data structuresto be explicitly represented in the database, and for thosedata structures to be provided with diverse functionalbehavior within the server. There are emerging standardsfor the content of bioinformatics data structures and fortheir associated functional behavior. It is Oracle’s planto provide implementations of many of these standardsso that users of bioinformatics data can concentrate onscience and not on data management. This paper willdiscuss features of the Oracle RDBMS that supportbioinformatics and will talk about plans for futureextensions.

2 The Oracle Object-Relational Database

Object-relational database management systems(ORDBMS) incorporate object-oriented capabilities ina database environment to better support specializedapplications. Starting with the 8i release, the Oracledatabase has been a fully functional object relationaldatabase which extends the relational database in threemain areas: (1) definition and storage of new datatypes;(2) incorporation of object behavior in the querylanguage; and (3) extensible indexing and the queryoptimizer.

In the Oracle database, a user can define new datatypes that closely represent the objects used in theapplication. With this approach, applications can focuson object manipulation logic (the object behavior)without worrying about the object storage and retrievallogic. The query language used in all the commercialrelational database systems is SQL, and it can beextended to integrate the new operations defined byapplications with standard SQL queries. This has theadvantage that all the operations on the database objectsare handled using a standard query model resulting insimplified application logic.

Indexes are optional structures associated withrelational tables. Indexes speed up the execution of SQLstatements in the database by providing a faster accesspath to the data, primarily by reducing disk I/O.Traditional databases provide an indexing mechanismwhich can work with simple data but this is not suitablefor use with application-defined complex data types.With Oracle Extensible Indexing, the application candefine and manage the structure and semantic contentof the index, and can base the index on properties ofcomplex abstract data types that are only understoodby the application. The database system then interactswith the application to build, maintain, and employ theindex. The main advantage of this extensible indexingframework is that the index is always in synchronization

with the data; once the index is built, all the updates onthe base table will automatically result in updates inthe index data. Thus the users are not required to worryabout the data integrity and index currency issues. Oncethe index is built, it is treated like a regular B-tree index.The database server knows the existence of the indexand manages all the index-related work using userdefined functions. In this extensible indexing model,the query optimizer also understands the application’sindex structure and operators. When the user-definedoperators are evaluated, the corresponding index canbe used to efficiently process the SQL query. This issimilar to using a B-tree index (built in to Oracle) whileevaluating an equality operator. The application canstore the index data either inside the Oracle database(in the form of tables) or outside the Oracle database(in the form of files) although it is highly desirable forthe database to handle the physical storage of the index.By doing this, the index data will have the same securityand reliability as the rest of the data in the database.

3 Current Applications of OracleExtensibility in Bioinformatics

Many independent service vendors (ISVs) andOracle customers have taken advantage of the databasefeatures for object definition and storage to createdatabase abstractions for sequences, structures,annotations, and other bio data. Furthermore Oracle’sobject-oriented extensibility permits the behaviors ofthe abstractions to be programmed as well. Thispowerful feature enables ISVs and in-house teams tooffer domain products based on Oracle that havefunctional capabilities that are tightly integrated withthe data. The externally programmed functionalcapabilities then become accessible to users and otherapplications via SQL or Java APIs.

Oracle extensibility also includes extensibleindexing, a very useful feature for querying overcomplex scientific data types in large stores. Severallife sciences vendors, including MDL and Daylight, havebuilt chemoinformatic cartridges which use extensibleindexing to do fast searching for structural subgraphs inlarge tables of small molecules. These proprietaryindexes, whether based on hash keys (Daylight) or digitalfingerprints (MDL), are fully registered and made activewithin the Oracle database via extensible indexing sothat standard SQL queries can implicitly make use ofthem.

Page 5: Bioinformatics Industry Australia · Morange, M. (1999). A History of Molecular Biology. Harvard University Press. ISBN 0-674-39855-6. and Oracle Extensibility Bioinformatics Bruce

100 APBN • Vol. 7 • No. 3 • 2003

www.asiabiotech.com

4 Future Oracle Plans to Extend theDatabase for Bioinformatics

When developing a major application-drivenextension to the DB server the issue of standardizationis very important. This applies to the data dictionariesor ontologies of structured data as well as to thespecifications for standard computations. Theburgeoning complexity and quantity of life sciencesscientific data present such an immense challenge thatuniform data standards (ontologies) for sequences,structures, gene-expression profiles, annotations, etc.are absent, with some exceptions. This is not the sameas saying there is no format for the data. There are lotsof formats, some with an old and awkward history. Thesame is true for standard computations.

In making its entry into this rich area, Oracle isfocused initially in two areas : sequence searching andalignment, which together we will call sequencematching; and macromolecular structures.

In the case of sequence matching, it is thecomputational requirements that have become moreor less standard. Although there is no single standardfor the underlying data type, the linear DNA orpolypeptide sequence, this is a fairly simple data typeas things go with biological data. Oracle will bedelivering a server-based implementation of the BasicLocal Alignment and Search Tool (BLAST) together withits variants BLASTP, etc. in a near release of the product.This sequence alignment capability will operate onsequences stored in the database as character largeobjects (CLOBs). It will be simple to move the serviceslater to a proper ontology-based representation ofsequences and annotations in the database, if suitablestandards emerge for sequence objects. In themeanwhile Oracle is investigating data compressiontechniques for sequences so that we may one day doan efficient storage implementation of an emergingstandard. Further on the horizon Oracle is looking intofast ways of solving the general pattern matchingproblem for sequences, so that rapid searches could bedone in the database for things like flexible promotersequences, intron/exon boundaries, and statisticalsecondary structure predictors in the case of proteinsequences. There have been no standard computationalrequirements to emerge for this generalized matching(although there are lots of software implementations forparticular require ments), but this does not precludeR&D on powerful, general methods for matchingsequence data that will be useful and efficient in thedatabase for a lot of typical problems.

In the case of macromolecular structures, it is thedata dictionary for the solved structure that has shownsigns of standardization. The Protein Data Bank (PDB)is now supporting a specification for a structured datatype for macromolecular structures based on a thoroughscientific ontology called the MacromolecularCrystallographic Information File (mmCIF) (Bourne et.al. 1997) This specification has received the blessing ofthe Object Management Group, Life Sciencescommittee, as a standard and has been served to usersas an option by the PDB since August 2002. Oracleintends to do an efficient schema and object data typeimplementation of mmCIF and make it part of the DBserver offering. Initially Oracle will provide a SQL andJava interface to the macromolecular structuresin the database consistent with the low levelaccessor/depositor functions of the OMG standard. Overtime Oracle will layer computations over the datastructures as standard computational requirementsemerge. Already Oracle has identified many commonoperators that are believed to be of wide and uniformapplicability to molecular biology problems. In otherwords, Oracle will implement standards about the thingsresearchers commonly want to ask about largemolecules, whether explicitly captured in the data orderived by computation. With such a smart databaseimplementation of mmCIF, researchers will be able toload and utilize not just public molecular structures butalso proprietary structures and those derived viahomology, or ab initio protein folding, if the latterproblem is ever solved.

ReferencesBourne, P.E., H.M. Berman, B. McMahon, K. Westbrook, P.M.D.

Fitzgerald (1997): The Macromolecular CIF Dictionary (mmCIF).

Methods in Enzymology, 277, 571 – 590. Conference on Management

of Data, Washington DC, USA, 22, 207 – 216, ACM Press.

In the Oracle database, a user can define

new data types that closely represent the

objects used in the application. With this

approach, applications can focus on object

manipulation logic (the object behavior)

without worrying about the object storage

and retrieval logic.

Page 6: Bioinformatics Industry Australia · Morange, M. (1999). A History of Molecular Biology. Harvard University Press. ISBN 0-674-39855-6. and Oracle Extensibility Bioinformatics Bruce

101APBN • Vol. 7 • No. 3 • 2003

Page 7: Bioinformatics Industry Australia · Morange, M. (1999). A History of Molecular Biology. Harvard University Press. ISBN 0-674-39855-6. and Oracle Extensibility Bioinformatics Bruce

102 APBN • Vol. 7 • No. 3 • 2003

www.asiabiotech.com

Ninepins of Asia Pacific

Bowling Ball‘Bioinformatics’ hits

by

Balaji K.(Healthcare Practice, Asia ) FROST & SULLIVAN

[email protected]

For many burgeoning start-ups, bioinformatics isconsidered to be “a pot of gold at the end of therainbow”. The key questions are: Is there a realopportunity for them? And has the bowling ballcalled bioinformatics truly hit the ninepins of AsiaPacific — Japan, Australia, India, Singapore, SouthKorea, China, Taiwan, and Indonesia?

1 Introduction

“Information is power” has been one of theoft-repeated clichés in corporate boardrooms. Withinformation originating from various processes,biotechnology has reworked the old cliché. “Informationcapture, storage, management, and analysis is power”.To put it in a nutshell, biotechnology is looking towardadvances in information technology, like for instance,bioinformatics, to get the job done faster and quicker.

To define, “Bioinformatics is the capture,management, analysis, and dissemination of informationrelated to the emerging drug-discovery paradigm usinggenomics, combinatorial chemistry, proteomics, andhigh throughput screening”.

With pharmaceutical and biotech companiesincreasingly relying on infotech to provide them withsolutions and tools to manage their growing data,bioinformatics is the most important part of the biotechwave to hit the Asia Pacific region.

2 The Global Snapshot

2.1 The opportunity

Drug research is data rich, but information poor.Genomics or gene-sequencing projects, highthroughput screening, combinatorial chemical synthesis,gene-expression investigations, pharmacogenomics,and proteomics studies have created massive volumesand multiple sources of biological and chemical data.This data is threatening to create a bottleneck that mighthamper drug discovery and development. The primarygoal of bioinformatics is to link and convert this complexdata into useful information and knowledge. Ascomputing and biology have converged, software toolsfor data capture, management, analysis, mining, anddissemination have also emerged. The convergence ofbiotech and infotech has become inevitable.

It has been estimated that about 20 percent of thecurrent novel discovery programs are based ongenomics, and this is fueling the growth ofbioinformatics. It is predicted that virtually all newdiscovery programs will be genomics-based in the nearfuture. Currently, there is an increased pressure todevelop breakthrough drugs and shorten the drugdiscovery time and costs involved. This presents anopportunity to bioinformatics companies as data

Page 8: Bioinformatics Industry Australia · Morange, M. (1999). A History of Molecular Biology. Harvard University Press. ISBN 0-674-39855-6. and Oracle Extensibility Bioinformatics Bruce

103APBN • Vol. 7 • No. 3 • 2003

capture, management, analysis, and disseminationcould play a vital role for drug discovery companies incontaining both cost and time.

2.2 The Market

The total market for bioinformatics is estimated tobe US$837.7 million in 2002 growing at a compoundannual growth rate (CAGR) of 8.64 percent.

The global bioinformatics market is directlyproportional to the investments taking place in thehuman healthcare R&D for new drug development.Pharmaceutical companies are under pressure todevelop block buster drugs. With costs to take a drugfrom “bench to bottle” ranging from US$400 to US$900million, and the percentage of drugs that make it tomarket, being low, the pharmaceutical companies areincreasingly looking at biotechnology to deliver results.With informatics being the backbone of biotechnology,informatics market today is piggybacking theinvestments made in drug discovery and developmentand encompasses every stage of the pharmaceutical andbiotech R&D process. Most of the data regarding the

previously discovered gene sequences is available inthe public domain, bringing down the cost of having toinvest in costly Intellectual Property Rights (IPRs), andthis has become a crucial driver for the growth ofbioinformatics globally.

Globally, the markets in US, Japan, Western Europe,Taiwan, and Singapore are in a high growth phase, whilethose in the remaining parts of Asia and Latin Americaare in the medium growth phase. In Asia — Japan,Australia, India, Singapore, Taiwan, and South Koreahave shown good potential as supplier of Bioinformaticstools. Software for data management and data analysisare on the higher end of value chain and contributemore revenues than other segments. Some of theimportant market players include Celera Genomics, LionBiosciences, Informax, Accelrys, AlphaGene, Base4Informatics, Gene Codes Corp., DNASTAR, Inc.,Genomica Crop., Incyte Genomics, MolecularInformatics, Inc., Cognia Corp., CuraGen and Textco,Inc.

2.3 Regional Attractiveness Index

The following chart gives the regional attractivenessfor bioinformatics. The index takes into accountmacro-economic variables such as GDP; market relatedvariables such as market age, government initiatives,growth rate, IT expenditure, and degree of competition;and micro variables such as technical expertise and R&Dexpenditure of biopharma companies.

2.4 The Market Segments

The global bioinformatics market can be segmentedinto two — the supply side and the demand side.

The supply side consists of vendors in the marketthat provide data-generation software, which is used tocontrol gene-sequencing hardware, data storage andmanagement that allows the raw sequence data to bestored so that it can be analyzed and sequenced later,data-analysis software applications which are used to

Page 9: Bioinformatics Industry Australia · Morange, M. (1999). A History of Molecular Biology. Harvard University Press. ISBN 0-674-39855-6. and Oracle Extensibility Bioinformatics Bruce

104 APBN • Vol. 7 • No. 3 • 2003

www.asiabiotech.com

increasing application of biotechnology is driving thegrowth of bioinformatics. An increase in the number ofteaching institutes, which provide systematic trainingin bioinformatics, has ensured that adequate skilledmanpower is available to take advantage of theopportunities presented. Huge volumes of data availablein public and commercial databases along with the datagenerated by pharmaceutical companies posesformidable challenges for data management and alsoaids the growth of bioinformatics. The global market isalso witnessing increased interest in bioinformatics frompharmaceutical, biotechnology, and IT companies, andthis has resulted in a spate of mergers, acquisitions, andjoint ventures, both among them and withbioinformatics start-ups as well.

3 Asia Pacific: Bioinformatics Bowling theNinepins

Bioinformatics in Asia Pacific is drivenpredominantly by the government-funded initiatives.This is all set to change in the near future. More andmore private informatics start-ups are being formed.Today, the informatics initiative is being driven by thelarge pharmaceutical companies in the region as wellas the IT companies entering the arena with theirtechnical skills. The Asia Pacific market is witnessing ashift to a large number of private sectors holdingconsiderable sequence information in the proprietarydomain. The biggest driver of bioinformatics continuesto be the availability of significant amount of data inthe public domain. This continues to play a significantrole in terms of training a whole new generation ofscientists, environmentalists, healthcare professionals,clinicians, computational biologists, bioethicist, and soon. The key countries in the region are Japan, Australia,India, Singapore, and South Korea. Taiwan, China, andIndonesia continue to lag behind these top fivebehemoths.

mine data from available databases, and then performvarious forms of analysis such as gene-sequencemanipulation, comparison, alignment and the datadissemination, which makes available the informationon discovered sequences to all concerned parties inside,and frequently outside, the organization.

The demand side consists of gene-basedInformatics, which comprises all the uses ofbioinformatics techniques for the analysis of geneticelements. This includes the discovery phase of genomics(structural genomics), the target validation phase(functional genomics), and the stratification of patientpopulations based on genotype for diagnostics andtherapeutics (pharmacogenomics). The othercomponent of demand side is the Cheminformatics,which is the characterization and optimization ofcombinatorial libraries with the intent of using theselibraries in the drug discovery process.

2.5 Trends

The global marketplace is currently witnessing theemergence of increased numbers of validatedtechnologies with increasing levels of integration. Opendata formats such as Java and XML are driving themarket. The market is also witnessing increasing marketconsolidation and use of legacy systems. The decodingof the human genome sequence has accelerated theneed for high throughput target identification andvalidation, driving the need for bioinformatics as a toolin both biotechnology and drug discovery. Today, ITcapabilities cover the entire R&D chain, which includesgene identification, target identification and validation,and patient monitoring, further fueling the growth ofbioinformatics. Companies are increasing theapplication of biotechnology in areas such astherapeutics, agriculture, industry, food, and so on. This

Page 10: Bioinformatics Industry Australia · Morange, M. (1999). A History of Molecular Biology. Harvard University Press. ISBN 0-674-39855-6. and Oracle Extensibility Bioinformatics Bruce

105APBN • Vol. 7 • No. 3 • 2003

○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○

3.1 Japan

The chief promoter of Japanese bioinformaticsindustry continues to be Japan Biological InformaticsConsortium (JBiC). JBiC is an organization of largechemical and pharmaceutical companies that aims toimprove the international competitiveness of theJapanese biotechnology industry by using bioinformaticsto the speed of R&D in all sectors of biotechnology.JBiC has spent approximately US$130 million till dateon various bioinformatics projects. JBiC focuses on SNP,protein analysis, and establishment of e-commercesystems for the biotech industry. It coordinates with thevarious ministries associated with Japanesebiotechnology such as Ministry of Education, Culture,Sports, Science and Technology (MECSST), the Ministryof Agriculture, Forestry, and Fisheries (MAFF), Ministryof Health, Labor, and Welfare (MHLW), and Ministry ofEconomy, Trade, and Industry (METI) with the variouspublic-funded research institutes and the private sectorcompanies.

One of the important private players in the Japanesebioinformatics arena is Hitachi Life Science. Hitachioffers a variety of research services, right from DNAsequencing and SNP discovery to genetic analysis,protein structure modeling, along with systemintegration services. A number of pharmaceutical majorsalso play a critical role in promoting bioinformatics inthe country. They include Ajinomoto Co. Ltd., OnoPharmaceutical Co. Ltd., Taisho Pharmaceutical Co. Ltd.and Yamanouchi Pharmaceutical Co. Ltd., to name afew.

3.2 Australia

The surge in bioinformatics field is fuelled by theadvances in genomics and proteomics generatingincreasing volumes of biological data that need to bemanaged and interpreted. The Australian bioinformaticsindustry is still in its infancy. A number of Australiancompanies, research institutes, Cooperative ResearchCenters (CRCs), universities, and other organizations areactive in bioinformatics and the development ofbioinformatics technologies. A large volume ofbioinformatics work happens in the universities andgovernment research institutes. Developments withinAustralia’s bioinformatics capability include uniquedatabases and libraries, innovative screening andanalysis technologies, and various bioinformaticsprograms.

The important centers of action are The AustralianNational University, Monash University, The Universityof Queensland (Biological Information Theory group(BITS), The Computational Biology and bioinformaticsEnvironment), Murdoch University (Center forBioinformatics and Biological Computing), SydneyUniversity (Australian National Genomic InformationService (ANGIS)), CSIRO (Bioinformatics ResearchGroup), and Macquarie University (Australian ProteomeAnalysis Facility (APAF)). The Walter and Eliza HallInstitute and the Garvan Institute of Medical Research(through the Peter Wills Center for Bioinformatics) bothhave dedicated bioinformatics research programs.Tasmania has announced an Intelligent Island Program,which supports the Tasmanian Center of Excellence forBioinformatics.

3.3 India

India’s Department of Biotechnology (DBT)initiated a bioinformatics program to create anetwork-based infrastructure that now extends across57 universities and public-funded institutions. Thisinitiative comprises research groups involved indatabase creation, molecular modeling, and algorithmdevelopment. The DBT has pledged US$65 million forgenomics research over the next five years, and hasannounced plans to enhance the infrastructure forbioinformatics research in public-funded institutions.The Indian bioinformatics market was worth aboutUS$25 million in 2001, contributing to about 51 percentof the overall Indian biotechnology market.

The most crucial advantage for India lies in thelow cost of development and the high success ratesenjoyed by bioinformatics companies in the country.

Page 11: Bioinformatics Industry Australia · Morange, M. (1999). A History of Molecular Biology. Harvard University Press. ISBN 0-674-39855-6. and Oracle Extensibility Bioinformatics Bruce

106 APBN • Vol. 7 • No. 3 • 2003

www.asiabiotech.com

The Indian advantage also stems from stronggovernment support for the biotech sector and theavailability of technical expertise, which gives animpetus to the research process and the demand forbioinformatics tools. The low number of competitorsmakes India attractive to new entrants. India has alreadymade a mark in the IT sector and has a strong ITinfrastructure, as well as skilled manpower, which wouldhelp it to become a strong player in the bioinformaticsmarket.

Growing volumes of genomics data and increasingnumbers of participants contracting work to Indiancompanies have encouraged many pharmaceutical, IT,and biotechnology companies to enter thebioinformatics sector. Indian IT companies such as TataConsultancy Services (TCS), Cognizant Technologies,Infosys, Wipro, and Satyam have already set up theirbioinformatics divisions. Not far behind arepharmaceutical companies such as Dr. Reddy’sLaboratories, Ranbaxy, Biological E, and NicholasPiramal that have either opted for the joint venture routeor have tied up with start-up companies. India is alsowitnessing the emergence of pure-play bioinformaticscompanies such as Strand Genomics. Government andprivate institutes, as well as participants such as theNational Institute of Information Technology (NIIT), arefostering the growth of man power skilled inbioinformatics.

3.4 Singapore

Bioinformatics is an area where Singapore canleverage its position as a country with a thriving ITindustry to its advantage. Unlike biological sciences,bioinformatics expertise in Singapore can be easier todevelop as Singapore has an “IT culture”. However,bioinformatics is not just IT; the “bio” component is asimportant as the latter. Singapore expects its efforts inbioinformatics education to pay off in the form of atalented pool of bioinformaticians.

Singapore houses a number of key bioinformaticsinstitutions. The Bioinformatics Institute (BII), a divisionwithin the Biomedical Research Council (BMRC), is onesuch institution. The BII, in partnership with theNational University of Singapore (NUS), has set up abioinformatics graduate program. This includes atwo-year Masters program in Bioinformatics (by coursework). The Singapore BioMedical ComputationResource (SBCR) is a national resource that will allowpublic (academic, research) institutions andorganizations access genomic data as well as runbiomedical and bioinformatics applications (BLAST,

FASTA, and CLUSTALW). The Bioinformatics Center(BIC) at the NUS was set up in 1996 with an initialfunding from the Economic DevelopmentBoard (EDB). Now, it receives funding from A*STAR,GlaxoSmithklineBeecham, and NUS. BIC aims to pursueleading-edge research in the field of bioinformatics andlife sciences, promote and develop state-of-the-artservices in bioinformatics, and to offer informaticssolutions to complex biological problems for Singaporeand international scientific users with possiblecommercialization.

BIC has research groups at Kent Ridge Digital Labs(now, I2R), the Center for Natural Product Research(CNPR), and the Institute of Molecular and Cell Biology(IMCB). New groups are emerging in other researchcenters and life sciences departments. BIC is amongthe first in the region to undertake research inbioinformatics.

The key bioinformatics initiatives of Singaporeinclude the BioMed Grid (BMG), another developmentin this area. It comprises a network of distributedcomputing resources in Singapore and collaborates withother R&D centers in the US, Europe, and Japan. Theother key initiative is the S* Life Science InformaticsAlliance. This alliance is an attempt by six topinstitutions from five continents to share their resourcesto establish global course modules in bioinformatics.The six institutions, led by the NUS BioinformaticsCenter as the Secretariat, include Stanford University,Karolinska Institutet and Uppsala University in Sweden,University of Sydney, and the South African NationalBioinformatics Institute (SANBI) at the University ofWestern Cape, South Africa.

3.5 South Korea

The South Korean Ministry of Science andTechnology (MOST) declared 2001 as the Year ofBiotechnology, and is setting out to promotedevelopments in this industry. The ministry plans tonurture 600 biotech companies by the end of 2002 tomake South Korea the seventh most developed countryfrom its current 14th place in terms of competitivenessin the industry. Some of the key factors responsible forthe country’s attractiveness include domestic potential,government policies and favorableness to multinationalcompanies, investment climate, infrastructure, attitudeof private sector toward cutting-edge technology, andSouth Korea’s position as an international hub.Bioinformatics plays a crucial role in Korean biotechplans.

Page 12: Bioinformatics Industry Australia · Morange, M. (1999). A History of Molecular Biology. Harvard University Press. ISBN 0-674-39855-6. and Oracle Extensibility Bioinformatics Bruce

107APBN • Vol. 7 • No. 3 • 2003

An important advantage of Korea in bioinformaticsis its mature IT industry. The bioinformatics onslaughtin Korea is led by the chaebols such as LG, SK, andSamsung. These companies have established newbioinformatics teams. The bioinformatics industry alsofinds support from pharmaceutical companies, such asChoungkeundang and Green Cross. Some of thestart-up bioinformatics companies in Korea areBioinfomatix, Inc., Pax GENETICA, Inc, Macrogen,IDRTech, Badasoft Co., SmallSoft Co., BITEK CHEMS,Inc., IStech Co., IDGENE, lnc, RNA, Inc., Genoprot Co.,and Proteogen.

Some of the public institutions involved inbioinformatics research include Korean Society ofBioinformatics (KSBI), Biological Research InformationCenter (BRIC), National Institutes of Health (NIH),Korean Institute of Science and Technology Information(KISTI), Korea Research Institute of Bioscience andBiotechnology (KRIBB) and the 21c Frontier the Centerfor Functional Analysis of Human Genome.

4 Conclusion

Asia Pacific definitely enjoys a distinct advantagein bioinformatics. The strength of its IT sector along withthe presence of a large base of man power skilled in ITwould enable a number of countries in the region tograb opportunities in bioinformatics. This large pool oftechnically-skilled personnel can be used to meet thedemand for bioinformatics in other regions as well.Growth in the volume of genomic data along withtechnological advances and increased automation ofthe R&D process present bright growth prospects forbioinformatics worldwide and Asia Pacific is well placedto harness the same.

If there are a couple of constraints, it would beAsia Pacific’s ability to hard and soft sell its skill sets toget a chunk of the global pie as well as the lowexpenditure on biopharma R&D locally. Only time canreveal whether countries in the region have theresourcefulness needed to play a winning game inbioinformatics.

Inferring an OriginalSequence from Erroneous

Copies: TwoApproaches

Jonathan M. Keith1, Peter Adams1,Darryn Bryant1,

Keith R. Mitchelson2,3,Duncan A. E. Cochran1,2,

Gita H. Lala1,2

1Department of MathematicsUniversity of Queensland

St. Lucia Qld. 4072, Australia.2Australian Genome Research Facility

University of QueenslandSt. Lucia Qld. 4072, Australia.

3Institute for Molecular BioscienceUniversity of Queensland

St. Lucia Qld. 4072, Australia.

[email protected]

The primary goal of bioinformatics is to link

and convert this complex data into useful

information and knowledge. The

convergence of biotech and infotech has

become inevitable. The total market for

bioinformatics is estimated to be US$837.7

million in 2002 growing at a compound

annual growth rate (CAGR) of 8.64 percent.

AbstractThis paper considers the problem of inferring anoriginal sequence from a number of erroneous copies.The problem arises in DNA sequencing, particularlyin the context of emerging technologies that providehigh throughput or other advantages at the cost of anincreased number of errors. We describe and comparetwo approaches that have recently been developedby the authors. The first approach searches for asequence known as a Steiner string; the secondsearches for the most probable original sequence withrespect to a simple Bayesian model of sequencingerrors. We present the results of extensive tests inwhich erroneous copies of real DNA sequences weresimulated and the algorithms were used to infer theoriginal sequences. The results are used to comparethe two approaches to each other and to a third, moreconventional, approach based on multiple sequencealignment. We find that the Bayesian approach issuperior to the Steiner approach, which in turn issuperior to the alignment approach.