BII 2010

Bioinformatics Institute

For Enquiries:

Bioinformatics Institute
(A member of A*STAR’s Biomedical Sciences Institutes)
30 Biopolis Street #07-01 Matrix
Singapore 138671
Tel: +65 6478 8298
Fax: +65 6478 9048
Website: www.bii.a-star.edu.sg
Company Registration Number: 19-9702109N

DIRECTOR’S MESSAGE

It is still premature to speak about life sciences as a theoretical scientific discipline. The extrapolation depth is small due to the fragmentary knowledge in a vast space of the unknown. Incremental accumulation of data as a result of hypothesis-driven experiments and observations is still the major source of new insight. Nevertheless, there are a few increasingly important research areas where the application of quantitative, mathematical concepts has become instrumental for the discovery of new biomolecular mechanisms and for progress in biological theory.

This development has been fuelled by the emergence of high-throughput experimental techniques (such as DNA sequencing, microarray techniques, etc.). As a result, researchers can, for the first time, generate so much data that, essentially, the aim of describing living organisms in their totality has become realistic. Yet the deluge of data often comes without an understanding of the biomolecular mechanisms that link genome information and phenotypes.

Computational biology has entered a new era characterized by the availability of fully sequenced genomes, as well as increasingly complete gene expression and proteomics datasets that await functional interpretation. In an increasing number of instances, hypotheses about biomolecular mechanisms can be derived from the data; thus, computational biology becomes instrumental in generating qualitatively new biological insight.

The Bioinformatics Institute, which was founded by Dr. Gunaretnam Rajagopal in 2001 and has been led by me since August 2007, is on its way to becoming a notable contributor of biologically relevant results and new, efficient computational biology methods to the world-wide scientific effort in the search for yet unknown biomolecular mechanisms, an effort with the goal of applications in medicine and biotechnology.

For BII, the years 2007-2009 have been the time for launching a new research program, for the start of new research teams and for re-equipping BII with new computer systems and wet lab equipment. At present, the Institute carries out research in the areas of

• biomolecular sequence analysis for the prediction of molecular and cellular functions (including the biochemical verification of hypotheses on function)

• biomolecular structure modelling and ligand design

• gene expression profile analysis at the transcript and proteome levels

• automated analysis of microscopic images from cellular systems (imaging informatics)

The Bioinformatics Institute has developed and deployed analytical tools and computational techniques for biology research in house and through close collaborations with experimental and clinical groups within and outside the Biopolis and Singapore. The ANNOTATOR suite, an efficient environment for protein sequence-based function prediction, is an example of this. To emphasize, experimental efforts also have a place in BII (i) for the verification of theoretically derived hypotheses (as a complement to interactions with experimental teams in collaborating institutions) as well as (ii) for the generation of datasets that are necessary for the development of theoretical methods. For this purpose, BII researchers rely heavily on cooperation with experimental teams affiliated with other A*STAR institutes and elsewhere in Singapore and the world. BII also has its own biochemical/cell-biological laboratory and a high-end microscopy unit.

The members of our institute are united in making this effort a success and I invite you to join us in this endeavour that will open new frontiers in biology.

Dr. Frank Eisenhaber

Director

Bioinformatics Institute

Contents

Director’s Message
Bioinformatics Institute
Scientific Advisory Board

RESEARCH DIVISIONS

Biomolecular Function Discovery Division
- Sebastian Maurer-Stroh & Frank Eisenhaber
- Georg Schneider
- Sharmila Adhikari

Biomolecular Modelling And Design Division
- Chandra Verma
- Ivana Mihalek
- Mallur Srivatsan Madhusudhan

Genome And Gene Expression Data Analysis Division
- Vladimir Kuznetsov
- Vivek Tanavde
- Igor Kurochkin

Imaging Informatics Division
- Lee Hwee Kuan
- Martin Wasser

IT SCIENTIFIC SERVICES
- Software Engineering
- Bio-Computing Centre

Adjunct Scientists
Visiting Scientists
Science Outreach Activities
Conferences and Visits
Recreation Club
Administrative Team
BII Location

Located in the Biopolis, the Bioinformatics Institute (BII) was set up by the Agency for Science, Technology and Research (A*STAR) in July 2001; it was re-launched in autumn 2007 as a research institute for biomolecular mechanism discovery guided by computational biology methods.

The spectrum of research activities in BII includes bioinformatics method development, experimental work for verification of hypotheses on gene function and collaborations with other experimental labs for biological data interpretation. Additionally, BII aims to provide postgraduate training as well as regional resource support in bioinformatics, especially for the institutes of the Biomedical Research Council (BMRC) of A*STAR.

Together with the BMRC, A*STAR research institutes and multinational R&D organizations in the Biopolis, BII is situated in an environment conducive to the exchange of scientific knowledge and to friendly interaction that will prompt greater collaboration and position the Biopolis as a notable biomedical R&D hub in Asia and in the world.

SCIENTIFIC ADVISORY BOARD

The Director of BII is advised by a Scientific Advisory Board consisting of eminent scientists in the fields of bioinformatics/computational biology and experimental life sciences, with respect to the institute’s research directions, recruitment of staff and international research collaborations.

The presiding members are:

Prof. Sir Tom Blundell (Chairman)
Chair of School of Biological Sciences
Sir William Dunn Professor of Biochemistry
Department of Biochemistry
University of Cambridge

Prof. Eytan Domany
The Henry J Leir Professorial Chair
Head, Kahn Family Research Center of Systems Biology of the Human Cell
Department of Physics of Complex Systems
Weizmann Institute of Science, Israel

Prof. Michael Levitt
Professor and Chair
Department of Structural Biology
Stanford University School of Medicine
Stanford

Prof. Tom Rapoport
Professor of Cell Biology
Howard Hughes Medical Institute Investigator
Department of Cell Biology
Harvard Medical School

Prof. Jason Swedlow
Wellcome Trust Senior Research Fellow and Reader
Wellcome Trust Centre for Gene Regulation and Expression
College of Life Sciences
University of Dundee

Matrix Building, Biopolis (Photo by Vivek Tanavde, BII)

Reception Counter (Photo by Vivek Tanavde, BII)

RESEARCH DIVISIONS

Bioinformatics is a multi-disciplinary approach combining computational and biological expertise to analyze biological data (both genomic and clinical), to advance biomedical research and development. Bioinformatics is both a science and an engineering art, concerned with the application of mathematics, physical/chemical principles and information technology to solve biological problems.

In the Bioinformatics Institute, there are four methodology-oriented research divisions comprising research groups led by independent Principal Investigators that focus on specific areas of computational biology. The common denominator is the goal of understanding biomolecular mechanisms underlying cellular phenomena, which is the basis for a rational understanding of pathogenesis or for planning biotechnological applications.

BIOMOLECULAR FUNCTION DISCOVERY DIVISION

Principal Investigators
- Frank Eisenhaber
- Sebastian Maurer-Stroh
- Georg Schneider
- Sharmila Adhikari

BIOMOLECULAR MODELLING AND DESIGN DIVISION

Principal Investigators
- Chandra Verma
- Ivana Mihalek
- Mallur Srivatsan Madhusudhan

GENOME AND GENE EXPRESSION DATA ANALYSIS DIVISION

Principal Investigators
- Vladimir Kuznetsov
- Vivek Tanavde
- Igor Kurochkin

IMAGING INFORMATICS DIVISION

Principal Investigators
- Lee Hwee Kuan
- Martin Wasser

Recent Publications

1. Van Damme P, Maurer-Stroh S, Plasman K, Van Durme J, Colaert N, Timmerman E, De Bock PJ, Goethals M, Rousseau F, Schymkowitz J, Vandekerckhove J, Gevaert K. Analysis of protein processing by N-terminal proteomics reveals novel species-specific substrate determinants of granzyme B orthologs. Mol Cell Proteomics. 2009 Feb;8(2):258-72.

2. Dhar PK, Thwin CS, Tun K, Tsumoto Y, Maurer-Stroh S, Eisenhaber F, Surana U. Synthesizing non-natural parts from natural genomic template. J Biol Eng. 2009 Feb 3;3:2.

3. Reumers J, Maurer-Stroh S, Schymkowitz J, Rousseau F. Protein sequences encode safeguards against aggregation. Hum Mutat. 2009 Mar;30(3):431-7.

4. Maurer-Stroh S, Ma J, Lee RT, Sirota FL, Eisenhaber F. Mapping the sequence mutations of the 2009 H1N1 influenza A virus neuraminidase relative to drug and antibody binding sites. Biol Direct. 2009 May 20;4(1):18.

5. Yamamoto Y, Ihara M, Tham C, Low RW, Slade JY, Moss T, Oakley AE, Polvikoski T, Kalaria RN. Neuropathological correlates of temporal pole white matter hyperintensities in CADASIL. Stroke. 2009 Jun;40(6):2004-11.

6. Ooi HS, Kwo CY, Wildpaner M, Sirota FL, Eisenhaber B, Maurer-Stroh S, Wong WC, Schleiffer A, Eisenhaber F, Schneider G. ANNIE: integrated de novo protein sequence annotation. Nucleic Acids Res. 2009 Jul 1;37(Web Server issue):W435-40.

7. Wong WC, Cho SY, Quek C. R-POPTVR: A novel reinforcement-based POPTVR fuzzy neural network for pattern classification. IEEE transactions on neural networks, 2009 Jul 5; v20, n11, pp1740-1755

8. Van Durme J, Maurer-Stroh S, Gallardo R, Wilkinson H, Rousseau F, Schymkowitz J. Accurate prediction of DnaK-peptide binding via homology modelling and experimental data. PLoS Comput Biol. 2009 Aug;5(8):e1000475.

9. Zhao C, Zhang H, Wong WC, Sem X, Han H, Ong SM, Tan YC, Yeap WH, Gan CS, Ng KQ, Koh MB, Kourilsky P, Sze SK, Wong SC. Identification of novel functional differences in monocyte subsets using proteomic and transcriptomic methods. J Proteome Res. 2009 Aug;8(8):4028-38.

10. Eisenhaber F, Kwoh CK, Ng SK, Sung WK, Wong L. Brief overview of bioinformatics activities in Singapore. PLoS Comput Biol. 2009 Sep;5(9):e1000508.

11. Papan C, Chen L. Metabolic fingerprinting reveals developmental regulation of metabolites during early zebrafish embryogenesis. OMICS. 2009 Oct;13(5):397-405.

12. Zhang G, Liu T, Wang Q, Chen L, Lei J, Luo J, Ma G, Su Z. Mass spectrometric detection of marker peptides in tryptic digests of gelatin: A new method to differentiate between bovine and porcine gelatin. Food Hydrocolloids Volume 23, Issue 7, 2009 Oct, Pages 2001-2007

13. Ranganathan S, Eisenhaber F, Tong JC, Tan TW. Extending Asia Pacific bioinformatics into new realms in the “-omics” era. BMC Genomics. 2009 Dec 3;10 Suppl 3:S1.


Dr Vachiranee Limviphuvadh et al. received the “Best Poster Award” for their poster entitled “Analysis of the molecular mechanisms of known and predicted disease mutations in LGI epilepsy genes” during the 8th International Conference on Bioinformatics held in Biopolis, Singapore from 7 to 11 September 2009.

The publication of Maurer-Stroh et al. “Mapping the sequence mutations of the 2009 H1N1 influenza A virus neuraminidase relative to drug and antibody binding sites” in Biology Direct (2009) was a highlight since it drew the attention of the scientific and lay public to BII for its scientific work. The finding that drug-resistant strains are almost absent from the circulating H1N1 virus population was of general medical importance. This paper has been downloaded by about 11,000 readers since May 2009; thus, it is the most accessed publication in Biology Direct during the last 12 months, and it was among the 10 most viewed articles in BioMed Central (which has more than 200 journals) during May 2009.

PROTEIN SEQUENCE ANALYSIS

Based primarily on protein sequence analysis and the analysis of other sequence-associated data (for example, from functional genomics and proteomics studies), the various aspects of molecular and cellular function (enzymatic activities, posttranslational modifications, cleavage, translocation signals, 3D structures, effects of mutations, pathway relationships, etc.) are predicted. This biological insight can then be used for planning validation experiments in cooperation with collaborators from other institutes or in the division’s own protein biochemical laboratory.

Our group has covered a wide range of projects during the last year, from a proteomic analysis of neural cortical stem cells in dicer knock-out mice, through the identification of an epilepsy candidate gene, to the prediction of amyloid fibre-forming peptides. As a characteristic example of our work, the analysis of the new swine-origin H1N1 influenza virus was among the first scientific works published during the pandemic outbreak. We concluded that, although the virus belongs to a new subtype variety, the mutations tend not to affect the potency of neuraminidase-inhibiting drugs but merely change the antigenic properties of the virus proteins.

We have been critically involved in the establishment of an efficient analysis pipeline of viral sequences in Singapore that was later also extended to partners in Mexico from the Instituto Nacional de Medicina Genomica, the leading genome institute studying the virus sequences close to the source of the outbreak. Our immediate collaborators include the local hospitals and Singapore’s Ministry of Health for samples and GIS A*STAR for the sequencing. The particular contribution of our group is the surveillance of the ongoing evolution of the 2009 H1N1 influenza A virus and the effect that mutations could have on the biology of the virus, the severity of infection and the applicability of available antiviral drugs.

Postdoctoral Fellows: CHEN Li; Vachiranee LIMVIPHUVADH; Roger LOW Wee Chuang; MA Jianmin; Dimitar KENANOV

Research Associates: HAN Hao; WONG Wing Cheong; Raphael LEE Tze Chuen; NEO Keng Hwee; Swe Swe THET PAING

Sebastian MAURER-STROH & Frank EISENHABER

14. Carugo O. and Eisenhaber F. (editors) Data Mining Techniques for the Life Sciences. Humana Press and Springer Business Media. New York 2009

15. Carugo,O., and Eisenhaber,F. 2009. Preface: Electronic databases in life science research. In Data Mining Techniques for the Life Sciences. O.Carugo, and Eisenhaber,F.E., editors. Humana Press and Springer Business Media. New York. v-viii.

16. Eisenhaber,B., and Eisenhaber,F. 2009. Prediction of Posttranslational Modification of Proteins from Their Amino Acid Sequence. In Data Mining Techniques for the Life Sciences. O.Carugo, and Eisenhaber,F.E., editors. Humana Press and Springer Business Media. New York. 365-384.

17. Ooi,H.S., Schneider,G., Lim,T.-T., Chan,Y.-L., Eisenhaber,B., and Eisenhaber,F. 2009. Biomolecular Pathway Databases. In Data Mining Techniques for the Life Sciences. O.Carugo, and Eisenhaber,F.E., editors. Humana Press and Springer Business Media. New York. 129-144.

18. Ooi,H.S., Schneider,G., Chan,Y.-L., Lim,T.-T., Eisenhaber,B., and Eisenhaber,F. 2009. Databases of Protein–Protein Interactions and Complexes. In Data Mining Techniques for the Life Sciences. O.Carugo, and Eisenhaber,F.E., editors. Humana Press and Springer Business Media. New York. 145-160.

19. Schneider,G., Wildpaner,M., Sirota,F.L., Maurer-Stroh,S., Eisenhaber,B., and Eisenhaber,F. 2009. Integrated Tools for Biomolecular Sequence-Based Function Prediction as Exemplified by the ANNOTATOR Software Environment. In Data Mining Techniques for the Life Sciences. O.Carugo, and Eisenhaber,F.E., editors. Humana Press and Springer Business Media. New York. 257-268.

20. Sohail A, Wenyu B, Lee RTC, Maurer-Stroh S and Wah IG. F-BAR domain proteins: families and function. Communicative & Integrative Biology, in press.

21. Sirota FL, Ooi HS, Gattermayer T, Schneider G, Eisenhaber F and Maurer-Stroh S. Parameterization of disorder predictors for large-scale applications requiring high specificity by using an extended benchmark dataset. BMC Genomics, in press.

22. Limviphuvadh V, Chua LL, Eisenhaber F, Adhikari S, Maurer-Stroh S. Is LGI2 the candidate gene for partial epilepsy with pericentral spikes? Journal of Bioinformatics and Computational Biology, in press.

23. Kawase-Koga Y, Low R, Otaegi G, Pollock A, Deng H, Eisenhaber F, Maurer-Stroh S, and Sun T. RNAase III enzyme Dicer maintains signaling pathways for differentiation and survival in mouse cortical neural stem cells. Journal of Cell Science, in press.

24. Maurer-Stroh S, Debulpaep M, Kuemmerer N, Lopez de la Paz M, Martins IC, Reumers J, Copland A, Serpell L, Serrano L, Rousseau F, Schymkowitz JWH. Exploring the sequence determinants of amyloid structure using position-specific scoring matrices. Nature Methods, accepted.

Figure 1: 3D model of the 2009 H1N1 neuraminidase. The bound antiviral drug is shown in green. Regions differing from the H5N1 avian flu and the 1918 H1N1 Spanish flu are shown in yellow. Mutations occurring among different patients within the first weeks of the 2009 outbreak appear red.

Principal Investigators’ Biographies

Sebastian Maurer-Stroh studied theoretical biochemistry in the group of Peter Schuster at the University of Vienna and wrote his master’s and PhD theses while working in Frank Eisenhaber’s lab at the Institute of Molecular Pathology (IMP) in Vienna. After a Marie Curie postdoctoral fellowship at the VIB-SWITCH lab in Brussels, he joined the A*STAR Bioinformatics Institute (BII) in Singapore, where he has headed the Protein Sequence Analysis Group in the Biomolecular Function Discovery Division since 2007. He has contributed widely used predictors for posttranslational modifications and catalyzed new biomolecular insights by sequence-based function predictions. (Details: http://www.bii.a-star.edu.sg/research/biography/sebastianms.php)

Frank Eisenhaber’s research interest is focused on the discovery of new biomolecular mechanisms with theoretical and biochemical approaches and the functional characterization of yet uncharacterized genes and pathways. Frank Eisenhaber is one of the scientists credited with the discovery of the SET domain methyltransferases, ATGL, kleisins, many new protein domain functions and with the development of accurate prediction tools for posttranslational modifications and subcellular localizations. He studied mathematics at the Humboldt-University in Berlin and biophysics and medicine at the Pirogov Medical University in Moscow. He received his PhD from the Engelhardt Institute of Molecular Biology in Moscow. After postdoctoral work at the Institute of Molecular Biology in Berlin-Buch (1989-1991) and at the EMBL in Heidelberg (1991-1999), he worked as a team leader at the Institute of Molecular Pathology (IMP) in Vienna (1999-2007). Since August 2007, he has been the Director of the Bioinformatics Institute, A*STAR Singapore. (Details: http://www.bii.a-star.edu.sg/research/biography/franke.php)


Postdoctoral Fellow: Fernanda SIROTA

Research Associates: OOI Hong Sain; Durga KUCHIBHATLA; Tobias GATTERMAYER; Wilson KWO Chia Yee; Nigel TAN Yeow Lam

Georg SCHNEIDER

ANNOTATOR SOFTWARE DEVELOPMENT

The ANNOTATOR group is developing software tools for sequence analytic tasks. To this end we integrate a large number of publicly available algorithms while at the same time implementing our own heuristics and workflows.

The ever increasing amount of data flowing into biological databases shows no signs of leveling off. Sequencing technology is improving at an unprecedented rate, bringing down the time it takes to decipher entire genomes to a matter of days. Making sense of this data by predicting molecular function is a time-consuming and tedious manual task. The number of new sequence analytic methods constantly being added to the toolbox of the computational biologist requires knowledge about a vast array of different interfaces, execution parameters and input formats.

The ANNOTATOR group is developing an advanced tool for functional characterization of sequences and strives to establish the ANNOTATOR software environment as the de-facto standard in this field. This is achieved by providing a number of public web-services based on the ANNOTATOR technology, with ANNIE (http://annie.bii.a-star.edu.sg) [1] being the most recent addition. The scope of work includes the integration of established algorithms as well as research into novel heuristics for tracing distant evolutionary relationships [2,3]. Due to the complex nature of sequence analytic algorithms, it is necessary to additionally consider aspects of high performance and distributed computing [4].

Recent Publications

1. Sirota FL, Ooi HS, Gattermayer T, Schneider G, Eisenhaber F, Maurer-Stroh S. Parameterization of disorder predictors for large-scale applications requiring high specificity by using an extended benchmark dataset. BMC Genomics 11, S15 (2010).

2. Mujezinovic, N., Schneider, G., Wildpaner, M., Mechtler, K. & Eisenhaber, F. Reducing the haystack to find the needle: improved protein identification after fast elimination of non-interpretable peptide MS/MS spectra and noise reduction. BMC Genomics 11, S13 (2010).

3. Ooi HS, Kwo CY, Wildpaner M, Sirota FL, Eisenhaber B, Maurer-Stroh S, Wong WC, Schleiffer A, Eisenhaber F, Schneider G. ANNIE: integrated de novo protein sequence annotation. Nucleic Acids Res 37, W435-440 (2009).

4. Ooi,H.S., Schneider,G., Lim,T.-T., Chan,Y.-L., Eisenhaber,B., and Eisenhaber,F. 2009. Biomolecular Pathway Databases. In Data Mining Techniques for the Life Sciences. O.Carugo, and Eisenhaber,F.E., editors. Humana Press and Springer Business Media. New York. 129-144.

5. Ooi,H.S., Schneider,G., Chan,Y.-L., Lim,T.-T., Eisenhaber,B., and Eisenhaber,F. 2009. Databases of Protein–Protein Interactions and Complexes. In Data Mining Techniques for the Life Sciences. O.Carugo, and Eisenhaber,F.E., editors. Humana Press and Springer Business Media. New York. 145-160.

6. Schneider,G., Wildpaner,M., Sirota,F.L., Maurer-Stroh,S., Eisenhaber,B., and Eisenhaber,F. 2009. Integrated Tools for Biomolecular Sequence-Based Function Prediction as Exemplified by the ANNOTATOR Software Environment. In Data Mining Techniques for the Life Sciences. O.Carugo, and Eisenhaber,F.E., editors. Humana Press and Springer Business Media. New York. 257-268.

7. Neuberger, G., Schneider, G. & Eisenhaber, F. pkaPS: prediction of protein kinase A phosphorylation sites with the simplified kinase-substrate binding model. Biol Direct 2, 1 (2007).

8. Maurer-Stroh S, Koranda M, Benetka W, Schneider G, Sirota FL, Eisenhaber F. Towards complete sets of farnesylated and geranylgeranylated proteins. PLoS Comput Biol 3, e66 (2007).

9. Schneider G, Neuberger G, Wildpaner M, Tian S, Berezovsky I, Eisenhaber F. Application of a sensitive collection heuristic for very large protein families: evolutionary relationship between adipose triglyceride lipase (ATGL) and classic mammalian lipases. BMC Bioinformatics 7, 164 (2006).

Principal Investigator’s Biography

Georg Schneider received his PhD from the University of Vienna, Austria. Prior to joining the Bioinformatics Institute he led the software development of sequence analytic projects at the Institute of Molecular Pathology in Vienna, Austria. (Details: http://www.bii.a-star.edu.sg/research/biography/georgs.php)

Figure 2: A query sequence is projected onto the pathway graph, which can then be interactively navigated.

A large number of external algorithms are plugged into the ANNOTATOR and can be used to analyze sequences [5]. Applicable external algorithms are presented in a way that closely follows the standard procedure for segment-based sequence analysis, which is based on the assumption that proteins are chains of functional units that can be analyzed independently, with the overall function of the protein arising from the synthesis of the functions predicted for each individual module. The procedure first uses algorithms for the detection of non-globular regions, which are segments with a compositional bias or repetitive patterns that often represent linker regions, fibrillar segments, flexible binding sites or points of post-translational modifications. The subsequent step is to run algorithms for the identification of known globular domains. As a final step, iterative heuristics have to be applied to uncover weak links in sequence space and collect a family of protein sequence segments that contain yet unknown globular domains.
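The three-step, segment-based procedure described above can be illustrated with a toy sketch. This is not the ANNOTATOR implementation: the window-entropy test merely stands in for real compositional-bias detectors, the `domain_finders` callables stand in for real domain-library searches, and all function names, the window size and the entropy cutoff are assumptions chosen for illustration. Segments that are neither non-globular nor matched by a known domain are reported as `unassigned`, which marks them as candidates for the iterative family-collection step.

```python
import math
from collections import Counter


def window_entropy(seq, start, size):
    """Shannon entropy (bits) of the residue composition in one window."""
    counts = Counter(seq[start:start + size])
    return -sum((n / size) * math.log2(n / size) for n in counts.values())


def low_complexity_mask(seq, size=12, max_entropy=2.2):
    """Step 1 (assumed stand-in): flag residues lying in any low-entropy window."""
    mask = [False] * len(seq)
    for start in range(len(seq) - size + 1):
        if window_entropy(seq, start, size) <= max_entropy:
            for i in range(start, start + size):
                mask[i] = True
    return mask


def split_segments(seq, mask):
    """Cut the sequence into maximal runs of flagged / unflagged residues."""
    segments = []
    start = 0
    for i in range(1, len(seq) + 1):
        if i == len(seq) or mask[i] != mask[start]:
            kind = "non-globular" if mask[start] else "candidate"
            segments.append((start, i, kind))
            start = i
    return segments


def annotate(seq, domain_finders, size=12, max_entropy=2.2):
    """Steps 2 and 3: label candidate segments with known domains, else 'unassigned'."""
    mask = low_complexity_mask(seq, size, max_entropy)
    report = []
    for start, end, kind in split_segments(seq, mask):
        if kind == "candidate":
            sub = seq[start:end]
            hits = [name for name, finder in domain_finders.items() if finder(sub)]
            kind = "known-domain:" + ",".join(hits) if hits else "unassigned"
        report.append((start, end, kind))
    return report


if __name__ == "__main__":
    # Hypothetical query: a poly-Q stretch followed by a compositionally diverse region.
    query = "Q" * 20 + "MKTLVWGHCDEFNPRSYIA" * 2
    for segment in annotate(query, {"toy_motif": lambda s: "WGH" in s}):
        print(segment)
```

Each segment is handled independently, mirroring the assumption that a protein's overall function arises from the functions of its individual modules.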

Since our aim is to develop sequence-analytic tools that can be used by biologists, special emphasis is put on the development of a user-friendly interface. An AJAX-based sequence-visualizer allows for the interactive display of function predictions (see Figure 1), while alternative views highlight different aspects of biological objects. As an example, it is possible to get immediate access to the distribution of predicted structural or functional features of a set of sequences using its associated histogram view.

Proteins don’t work in isolation, and gaining functional insights increasingly depends on understanding their roles within complexes of interacting partners and pathways. However, the relevant data is spread across several major databases, each of which uses different models and identifiers. We have created a data-model which allows us to integrate data from many different sources and navigate it in a unified manner. The pathway portion of the viewer has been implemented using an interface inspired by Google Maps (see Figure 2).
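One way to picture such a unified data model is a store that translates each source database's local identifiers into a shared canonical identifier before recording an interaction. The sketch below is purely illustrative and does not describe the ANNOTATOR data model: the class name `UnifiedStore`, the source names `dbA` and `dbB`, and the cross-reference table are all hypothetical.

```python
from collections import defaultdict


class UnifiedStore:
    """Toy integration layer: many source identifiers, one canonical namespace."""

    def __init__(self, xrefs):
        # xrefs maps (source, local_id) -> canonical_id; assumed to be given.
        self.xrefs = dict(xrefs)
        self.partners = defaultdict(set)   # canonical_id -> interacting canonical_ids
        self.evidence = defaultdict(set)   # unordered pair -> sources reporting it

    def canonical(self, source, local_id):
        """Translate a source-specific identifier, or return None if unmapped."""
        return self.xrefs.get((source, local_id))

    def add_interaction(self, source, id_a, id_b):
        """Record an interaction reported by one source; skip unmapped identifiers."""
        a = self.canonical(source, id_a)
        b = self.canonical(source, id_b)
        if a is None or b is None:
            return False  # do not guess a mapping
        self.partners[a].add(b)
        self.partners[b].add(a)
        self.evidence[frozenset((a, b))].add(source)
        return True

    def neighbours(self, canonical_id):
        """All partners of a protein, regardless of which source reported them."""
        return sorted(self.partners.get(canonical_id, ()))


if __name__ == "__main__":
    store = UnifiedStore({
        ("dbA", "P1"): "prot1", ("dbA", "P2"): "prot2",
        ("dbB", "x9"): "prot1", ("dbB", "x7"): "prot2", ("dbB", "x5"): "prot3",
    })
    store.add_interaction("dbA", "P1", "P2")
    store.add_interaction("dbB", "x9", "x7")
    store.add_interaction("dbB", "x9", "x5")
    print(store.neighbours("prot1"))
```

Keeping the list of reporting sources per interaction lets a viewer show which databases support each edge while navigating a single merged graph.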

The three-dimensional structure of proteins provides additional functional information and the ANNOTATOR framework has been enhanced with a number of methods for mapping sequence conservation to structural models. We are currently also working on incorporating algorithms for homology modeling which will seamlessly integrate with heuristics for the collection of protein families.

Sequence analysis requires the application of a wide range of algorithms, the results of which have to be interpreted and used to decide on further steps. This naturally leads to a view of these tasks as workflows, with decisions being made by the application of rules. We are currently extending the ANNOTATOR with the ability to design workflows and capture rules from biologists who are making sequence analytic decisions. The final implementation will feature an easy-to-use workflow editor that allows bioinformaticians to connect pre-defined sequence-analytic building blocks into higher-level tasks. For most sequence or structural features there are a number of distinct predictors, whose methods are based on different underlying principles. As an example, several algorithms for the prediction of protein disorder, which plays an important role in structural and functional genomics, are available. In order to use these in automated workflows, it is necessary to identify parameter sets and threshold selections under which the performance of the predictors becomes directly comparable. To this end we derived new benchmark sets and used them to identify settings in which the different predictors have the same false positive rate [6].
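The idea of making predictors comparable by fixing a common false positive rate can be sketched as follows. This is an assumed, simplified procedure rather than the published parameterization: it supposes that each predictor returns a numeric score, that higher scores mean "disordered", and that a benchmark of known negatives (ordered residues) is available. The function names and the example scores are hypothetical.

```python
def false_positive_rate(negative_scores, threshold):
    """Fraction of known negatives that would be called positive (score > threshold)."""
    return sum(score > threshold for score in negative_scores) / len(negative_scores)


def threshold_for_fpr(negative_scores, target_fpr):
    """Lowest observed score that keeps the false positive rate at or below target_fpr."""
    if not negative_scores:
        raise ValueError("need at least one negative benchmark score")
    if not 0.0 <= target_fpr <= 1.0:
        raise ValueError("target_fpr must lie between 0 and 1")
    ranked = sorted(negative_scores, reverse=True)
    allowed = int(target_fpr * len(ranked))  # negatives permitted above the threshold
    if allowed >= len(ranked):
        return float("-inf")  # every negative may be called positive
    return ranked[allowed]


def calibrate(negatives_by_predictor, target_fpr):
    """One threshold per predictor, all tuned to the same false positive rate."""
    return {
        name: threshold_for_fpr(scores, target_fpr)
        for name, scores in negatives_by_predictor.items()
    }


if __name__ == "__main__":
    # Two hypothetical predictors scoring the same negatives on different scales.
    benchmark = {
        "predictor_A": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8],
        "predictor_B": [5, 12, 7, 30, 22, 9, 15, 3],
    }
    for name, threshold in calibrate(benchmark, 0.25).items():
        print(name, threshold, false_positive_rate(benchmark[name], threshold))
```

Because each threshold is derived from the predictor's own score distribution on the negatives, predictors with very different score scales can be combined in one workflow rule.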

Figure 1: Sequence Analysis of LGI2 with highlighted EPTP domain


Postdoctoral Fellows: Neelagandan KAMARIAH; Subhashri RAMABADRAN; TOH Yew Kwang; XIN Hongyi

Research Associates: Michaela SAMMER; GUO Fusheng; Nicholas HO Rui Yuan; LUA Wai Heng; LIEW Lailing; Winnie TAY Yu Ling

EXPERIMENTAL VERIFICATION OF PREDICTED MOLECULAR AND CELLULAR FUNCTIONAL PROPERTIES OF UNCHARACTERIZED GENE PRODUCTS

The protein biochemistry/molecular biology group is involved in the verification of sequence-analytic hypotheses with regard to molecular and cellular functions of proteins. The targets are selected from proteome-wide screens or from gene sets provided by collaborators. This work will lead us to new biological insight and to the discovery of new biological processes and mechanisms. In the future, the lab will also generate experimental data that can be used for the development of new prediction tools. Among the ongoing projects, there are nuclear shuttling of transcription factors, parasite proteins of human pathogens, targets in development and differentiation in Drosophila melanogaster, human disease mutations and a structural biology target related to GPI lipid anchor biosynthesis.

Partial epilepsy with pericentral spikes (PEPS) is a familial epilepsy with the disease locus mapped to human chromosome region 4p15; yet, the causative gene is unknown. We analysed all 52 genes known to be located in the PEPS disease map locus (4p15). We found that only 14 of these genes are commonly deleted in patients with a similar epilepsy-related seizure phenotype. Based on functional characteristics derived from the sequences of the encoded proteins, we conclude that the gene LGI2 is the most likely candidate to be associated with PEPS. LGI2 has considerable similarity to LGI1, LGI4 and GPR98, for which mutations have been found to be directly associated with different forms of epilepsy. We experimentally investigated the effect of point mutations in LGI1 and LGI2. We observed a reproducible phenotype in terms of lack of protein secretion (resulting in loss of function) for both LGI1 and LGI2 if structurally homologous positions are mutated that are conserved throughout the LGI family and known to cause disease in LGI1. Hence, we suggest that the underlying cellular disease mechanism is similar for LGI1 and LGI2 and that each of the LGI family members may be responsible for phenotypically similar, mechanistically related but genotypically distinct forms of epilepsy.

The subcellular localization of the non-secreted mutant variants was studied by microscopy. COS7 cells were transfected with expression constructs for each mutation including a C-terminal GFP-tag based on the pEGFPN3 vector. COS7 cells with the empty vector were used as controls. We found that LGI1 WT and LGI1 K353E are localized predominantly in the Golgi and partially in the ER, whereas the other mutants are found predominantly in the ER (Fig. A). LGI2 and its mutants show similar localization in both ER and Golgi, with the exception of LGI2 V420E, which is localized mainly in the ER (Fig. B).

Subcellular localization of A) LGI1 and B) LGI2 GFP-tagged wild type and mutant protein. COS7 cells were transfected with GFP-fused protein (green) as indicated and treated with either anti-PDI followed by Alexa 546 (red) to stain the endoplasmic reticulum or TR Ceramide to detect the Golgi apparatus, and with DAPI (blue) to stain the nuclei, and then examined by laser fluorescence confocal microscopy. The fields shown were visualized independently at the appropriate wavelength for GFP (488 nm) and anti-PDI or TR Ceramide (546 nm), and then the two images were merged. Original magnification: 63x.

Left to Right: Dr. Sebastian Maurer-Stroh, Mr. Lim Chuan Poh (Chairman, A*STAR), Dr. Sharmila Adhikari, Dr. Vachiranee Limviphuvadh. Photo by Bernard Chan, A*STAR

Principal Investigator’s Biography

Sharmila Adhikari was appointed as a Principal Investigator at the Bioinformatics Institute, A*STAR Singapore, in August 2008. She leads a biochemistry and molecular biology lab that aims to bridge the gap between theoretical predictions of proteins with unknown functions and their cellular biology, which can subsequently be used to aid the identification of novel drug targets. She obtained her PhD degree at the National University of Singapore and worked as a postdoctoral research fellow at the Department of Pharmacology, Yong Loo Lin School of Medicine, NUS. (Details: http://www.bii.a-star.edu.sg/research/biography/sharmilaa.php)

A poster by the Biomolecular Function Discovery Division was one of the 3 recipients of the “Outstanding Poster Award” at the A*STAR Scientific Conference held in Singapore from 18 to 20 November 2008.

Recent Publications

1. Limviphuvadh, V., Chua, LL., Eisenhaber, F., Adhikari, S., and Maurer-Stroh, S. Is LGI2 the candidate gene for partial epilepsy with pericentral spikes? J Bioinform Comput Biol. (In press).

2. Limviphuvadh, V., Chua, LL., Eisenhaber, F., Maurer-Stroh, S., and Adhikari, S. Similarity of molecular phenotype between known epilepsy gene LGI1 and disease candidate gene LGI2. (submitted).

3. Grillari J, Löscher M, Denegri M, Lee K, Fortschegger K, Eisenhaber F, Ajuh P, Lamond AI, Katinger H, Grillari-Voglauer R. Blom7alpha is a novel heterogeneous nuclear ribonucleoprotein K homology domain protein involved in pre-mRNA splicing that interacts with SNEVPrp19-Pso4. J Biol Chem. 2009 284(42):29193-204.

4. Tse WK, Eisenhaber B, Ho SH, Ng Q, Eisenhaber F, Jiang YJ. Genome-wide loss-of-function analysis of deubiquitylating enzymes for zebrafish development. BMC Genomics. 2009 30; 10(1):637.

5. Adhikari S, Bhatia M. H2S-induced pancreatic acinar cell apoptosis is mediated via JNK and p38 MAP kinase. J Cell Mol Med. 2008 12(4):1374-83.

6. Benetka,W., Mehlmer,N., Maurer-Stroh,S., Sammer,M., Koranda,M., Neumuller,R., Betschinger,J., Knoblich,J.A., Teige,M., and Eisenhaber,F. Experimental testing of predicted myristoylation targets involved in asymmetric cell division and calcium-dependent signalling. Cell Cycle 2008 7: 3709-3719.

Although several genes responsible for some types of epilepsy have already been identified, clinical description of most epilepsy cases ends at the phenotypic level. Genetic testing is needed to increase our understanding of the underlying molecular basis. We aim to identify mutations in a set of selected genes already known to be associated with certain types of epilepsy as well as to screen reasonable new candidates from epilepsy hotspots. Screening for mutations in epilepsy genes is important to classify genotypically distinct forms of epilepsy which later can be applied for more detailed diagnosis. We also aim to elucidate the molecular mechanisms that trigger the respective epileptic attacks through cell and molecular biology experiments that could lead to the development of better or new drugs in the future.

Sharmila ADHIKARI


Postdoctoral Fellows: Amor A. SAN JUAN; Madhumalar AMURUMUGAM; Shubhra GHOSH DASTIDAR; Gloria FUENTES; Thomas Leonard JOSEPH; Devanathan RAGHUNATHAN; JAGADEESH M. Nanjegowda

Research Associates: LOW Soo Mei; QUAH Soo Tng; SIAU Jia Wei; TAN Yaw Sing

PhD Student: Suryani LUKMAN

Chandra VERMA

ATOMISTIC SIMULATIONS AND DESIGN IN BIOLOGY

Mechanisms underlying biological processes at a molecular level are explored through identifying and/or mapping the character of proteins and their interactions with other proteins, nucleic acids and ligands. The methods used are computational and combine representations at various levels, from the coarse-grained to the fully atomistic. The work builds upon foundations that are rooted in rigorous computational biochemistry benchmarked extensively against available experimental data. In particular, simulations are combined with detailed experimental information through extensive collaborations with experimental laboratories to provide incisive insights into biology at an atomic level. The group’s current research focus is on the p53 pathway, kinases, translation initiation, ATP-synthases, 14-3-3, defensins and basic structural/computational biophysical chemistry.

The usual arsenal of tools is utilised: construction of models based on “imagination with a whiff of hand-waving”, homology modelling, molecular dynamics, free energy calculations, normal modes, reaction paths to examine shape shifting in proteins, electrostatics, and ligand-protein/protein-protein docking including virtual screening (the docking program also includes the development of novel scoring methods or modifications of existing ones). On the one hand, the group examines how native and mutant forms of proteins may (mis)function in their interactions, while on the other, there is an extensive program directed towards ligand/drug discovery and protein/peptide design from both a therapeutic and a (bio)technological perspective.

The group has extensive links with a variety of experimental labs, including the group’s own attempts at “wetting their hands” so that the hypotheses are subject to rigorous testing and validations.

An extensive program investigating the relationship between the structural and functional aspects of the p53 family has revealed several insights, most notably of how the underlying dynamics critically controls interactions, both prior to and after binding. Movies of these processes

Recent Publications

1. Lane DP, Cheok CF, Brown CJ, Madhumalar A, Ghadessy FJ, Verma C. The Mdm2 and p53 genes are conserved in the Arachnids. Cell Cycle (2010 in press).

2. Brown CJ, Dastidar SG, See HY, Coomber DW, Ortiz-Lombardia M, Verma C, Lane DP. Rational design and biophysical characterization of Thioredoxin-based aptamers: insights into peptide grafting. J Mol Biol. 2010 (in press)

3. Lane DP, Cheok CF, Brown C, Madhumalar A, Ghadessy FJ, Verma C. Mdm2 and p53 are highly conserved from placozoans to man. Cell Cycle. 2010 9:1-8.

4. Dastidar SG, Madhumalar A, Fuentes G, Lane DP, Verma CS Forces mediating protein-protein interactions: a computational study of p53 “approaching” MDM2 Theoret Chem Accnts 2010 125:621-635.

5. Brown CJ, Lain S, Verma CS, Fersht AR, Lane DP. Awakening guardian angels: drugging the p53 pathway. Nat Rev Cancer 2009 9:862-873.

6. Dastidar SG, Lane DP, Verma CS Modulation of p53 binding to MDM2: computational studies reveal important roles of Tyr100. BMC Bioinf 2009 15:S6

7. Madhumalar A, Lee HJ, Brown CJ, Lane D, Verma C. Design of a novel MDM2 binding peptide based on the p53 family. Cell Cycle 2009 8:2828-2836.

8. Bai Y, Liu S, Jiang P, Zhou L, Li J, Tang C, Verma C, Mu Y, Beuerman RW, Pervushin K. Structure-dependent charge density as a determinant of antimicrobial activity of peptide analogues of defensin Biochemistry 2009 48:7229-7239.

9. Brown CJ, Verma CS, Walkinshaw MD, Lane DP Crystallization of eIF4E complexed with eIF4GI peptide and glycerol reveals distinct structural differences around the cap-binding site. Cell Cycle 2009 8:1905-1911.

10. Scaltriti M, Verma C, Guzman M, Jimenez J, Parra JL, Pederson K, Smith DJ, Landolfi S, Ramon y Cajal S, Arribas J, Baselga J. Lapatinib, a HER2 tyrosine kinase inhibitor, induces stabilization and accumulation of HER2 and potentiates trastuzumab-dependent cell cytotoxicity. Oncogene 2009 28:803-814.

11. Dastidar SG, Lane DP, Verma CS. Multiple peptide conformations give rise to similar binding affinities: molecular simulations of p53-MDM2. J Am Chem Soc 2008 130:13514-13515.

Principal Investigator’s Biography

Chandra Verma carried out his undergraduate studies at IIT, Kanpur, after which he studied for his D.Phil in York, UK. Subsequently he joined the York Structural Biology lab, where he remained until 2003, when he moved to the Bioinformatics Institute, Singapore. (Details: http://www.bii.a-star.edu.sg/research/biography/chandra.php)

A paper titled “Multiple Peptide Conformations Give Rise to Similar Binding Affinities: Molecular Simulations on p53-MDM2” was selected by the Journal of the American Chemical Society (JACS) as one of the best 23 papers published in the journal in the last two years. The selected papers were published in JACS Select #3 on 10 Dec 2008.

A review titled “Awakening guardian angels: drugging the p53 pathway” by Christopher J. Brown, Sonia Lain, Chandra S. Verma, Alan R. Fersht and David P. Lane, published in Nature Reviews Cancer 9, 862–873 (1 December 2009) | doi:10.1038/nrc2763, was featured on the Nature Reviews Cancer website.

This has led to the design of a unique entropically driven peptide and more recently of novel small molecules that have the potential to be exploited as leads for developing therapeutics.

In a related and highly successful effort that involves close collaboration with the group of Prof Beuerman and Dr Jagadeesh Mavinahalli at the Singapore Eye Research Institute and researchers at Nanyang Technological University and the National University of Singapore, the group has been designing and investigating novel antibiotics based on defensins that appear to show selective activity against certain bacteria with little or no activity against human cells. The success of this project, illustrated by the filing of two patents, has attracted seed money and the attention of several industries. The virtual screening efforts of the group have received a recent boost with the identification of a set of molecules that appear to show promise as potential lead compounds in the development of antibiotics targeted against bacterial enzymes. This work is carried out with the Experimental Therapeutics Centre, A*STAR, which is carrying out the experimental investigations and synthesis of compounds that are generated from the virtual screens.

In parallel, the group is also involved in establishing detailed mechanisms that underpin experimental observations in a variety of systems. An ongoing successful effort is with the laboratory of Prof Gruber at the Nanyang Technological University in developing models of the ATPase machinery. Together with detailed structural characterization of elements of this enzyme, we provide, using simulations, an understanding of how dynamics modulate the assembly and function of this complex machine.

We have been investigating mechanisms of signalling among PAK kinases and 14-3-3 proteins, together with Prof Manser of the Institute of Medical Biology, A*STAR. Recently, this has led to the finding that a phosphomimetic mutant commonly used to study signalling in PAK1, and assumed to be active, is in fact inactive; simulations validated by experiments have provided mechanistic details on the origin of this finding. This development is significant because the phosphomimetic mutation is widely used in kinase studies, and insights such as those provided by our studies can offer rapid screens for experimentalists to identify systems where these phosphomimetics are indeed active.

Figure 1: The MDM2/MDMX-p53 binding mechanism depends on the stability of the α-helical motif formed by the residues 19-25 of the transactivation domain of p53 (cyan cartoon), which displays three hydrophobic side chains F19, W23 and L26 (cyan sticks) appropriately for optimal interactions with MDM2. Peptides or small molecules such as nutlin can displace this region of p53 from MDM2 and induce apoptosis by stabilizing p53 in tumour cells. Computer simulations show how the dynamics of the surfaces modulate each other and how the narrower binding cleft in MDMX (green surface) is less amenable to inhibition by nutlin (the clash between the magenta molecule and the green surface is evident) compared to MDM2 (the magenta nutlin fits very well into the orange surface of MDM2). For the first time, simulations carried out by the group have shown how the dynamics of the interacting surfaces control the interactions and the observed discrimination towards p53 and nutlin by MDM2 and MDMX.

In a major translational effort, the group is engaged with experimentalists (Dr Scaltriti) and clinicians (Prof Baselga) at the Vall d’Hebron Hospital in Barcelona, studying the effects of small molecule and antibody-based therapies for breast cancer. A significant breakthrough has emerged in the understanding of the molecular mechanisms that underlie the observed synergism between kinase inhibitors and antibody-based therapy for breast cancers characterized by overexpressed HER2 receptors. This work is now being extended to understand the molecular basis of the cooperative interactions that are increasingly recognized as significant for improving the therapeutic potential of existing and new therapies that target the EGFR (and related) receptor families in oncology.

Figure 2: The interactions between the therapeutic antibodies pertuzumab (in yellow on left) and trastuzumab (in black on right), both bound to their receptor, the extracellular domain of HER2 (in green). Both these antibodies have had success in the clinic for breast cancer patients.

show for the first time how intermolecular interactions are orchestrated through multiple subtle and networked interactions of both partners and how these interactions are mutually modulated. Indeed, the thermodynamic basis underlying these interactions revealed for the first time the degeneracy that is inherent in these systems (this work, published in JACS, was selected for special mention).

This highly successful program is in close collaboration with the p53 laboratory of Prof Sir David Lane. More recently, the expertise of the group has led to the development of collaborations with the University of Edinburgh, the Beatson Institute, the Ludwig Institute and INSERM. In parallel, the effort with the Lane lab is also focused on understanding the mechanisms of the translational initiation cascade and on designing aptamers, peptides and small molecules to inhibit a key component in this pathway, eIF4E, which offers opportunities as a major target for therapeutic intervention in several cancers. We have recently begun detailed crystallographic, biophysical and simulation-based interrogation of the system.

The recent developments and the excitement generated in the p53 field have been outlined in a Nature Reviews Cancer article that was published recently through the joint efforts of teams from Singapore, Karolinska and Cambridge University.


Postdoctoral Fellows: Kavitha BHARATHAM; Westley A. SHERMAN

Research Associates: ZHANG Zong Hong; Sharon CHEE Min Qi

Ivana MIHALEK

EVOLUTION OF PROTEIN STRUCTURE AND FUNCTION

The aim of the Evolution of Protein Structure and Function (EPSF, not to be confused with Encapsulated PostScript Format) group is to reverse engineer the function of a protein through studying its evolution. Bioinformatics is used to get the first inkling of the layout and mechanism of these biological nanomachines, and computer simulation to test, to the extent it currently allows, the reasonableness of the interpretation of bioinformatics data. Ultimately, the goal is to build a straightforward hypothesis which can then be tested experimentally. Therefore, serious effort is invested into developing ways to present the group’s findings in the most useful and compact way to experimentalist colleagues.

How does evolution chisel out functionally relevant pieces of a protein?

In evolution, as in any statistical process, anything that can happen will happen. Compared, however, to the options open to a simple physical system, “can happen” is a somewhat more elaborate condition. While the physics of DNA stability may allow for a mutation, this mutation might severely degrade the stability of the protein it encodes, which in turn may kill the organism carrying the mutation. A mutation at a different position might be irrelevant to the protein stability, but it may adversely impact its interaction with another protein, thus disrupting a pathway in the hapless organism.

Keeping that scenario in mind, proteins performing the same function in living and thriving organisms can be compared to identify the regions in the protein in which mutations, or certain types of mutations, are conspicuously absent. Since it can reasonably be assumed that mutations do happen sporadically in those places, as they do in all underlying positions in the DNA, it is likely that the carriers were eliminated from the gene pool because the mutation resulted in some disadvantage for the organism, be it on the translational, folding, or protein-protein interaction level. Therefore, these are precisely the regions crucial for the protein function, the regions we should focus our attention on.

While in some cases detecting such “conserved” regions in a protein is not a very challenging task, our task as bioinformaticians is attending to the cases when the information about the protein from multiple species is scarce, or when the evolutionary correspondence is difficult to interpret. This is an active field of research, in which our group participates.
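The idea of locating positions where mutations are "conspicuously absent" can be illustrated with a minimal sketch. It assumes a toy multiple sequence alignment and scores each column by its Shannon entropy, a common conservation measure; it is not the group's own ranking method, and the function names and example sequences are hypothetical.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (in bits) of one alignment column, ignoring gaps."""
    residues = [c for c in column if c != "-"]
    if not residues:
        return 0.0
    counts = Counter(residues)
    total = len(residues)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def conservation_profile(alignment):
    """Per-column entropy for a list of equal-length aligned sequences."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    return [column_entropy([seq[i] for seq in alignment]) for i in range(length)]


# Hypothetical alignment of the "same" protein from four species.
alignment = [
    "MKVLH",
    "MKILH",
    "MRVLH",
    "MKALH",
]

profile = conservation_profile(alignment)
# Columns with zero entropy show no variation at all in this sample.
conserved = [i for i, h in enumerate(profile) if h == 0.0]
```

Low-entropy columns are candidates for functionally important positions. With only a handful of sequences, as in this toy case, the estimate is very noisy, which is exactly the scarce-data situation described above.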

Figure 1: The correlation between the degree of conservation of a protein region and the impact the mutation has on the organism is most readily observable in the case of enzymes, small chemical factories that are a very common type of protein. In the illustration in Figure 1 (from an enzyme, called HPPD, from the tyrosine degradation pathway), the most highly conserved regions (yellow) cannot be mutated without causing the organism's demise, while the slightly less conserved regions (red), if mutated, cause health problems of various degrees of severity.

Figure 2: Detecting a structure of unknown function (deposited in the Protein Data Bank under identifier 3dcx; blue) as a substructure of a protein involved in cell adhesion, called radixin (1gc6; red).

Recent Publications

1. Wang H, Chumnarnsilpa S, Loonchanta A, Li Q, Kuan YM, Robine S, Larsson M, Mihalek I, Burtnick LD, Robinson RC. Helix straightening as an activation mechanism in the gelsolin superfamily of actin regulatory proteins. J Biol Chem. 2009 Aug 7;284(32):21265-9.

2. Ribes-Zamora A, Mihalek I, Lichtarge O, Bertuch AA. “Distinct faces of the Ku heterodimer mediate DNA repair and telomeric functions.” Nat Struct Mol Biol. 2007.

3. Mihalek I, Res I, Lichtarge O. “On itinerant water molecules and detectability of protein-protein interfaces through comparative analysis of homologues.” J Mol Biol. 2007 Jun 1;369(2):584-95.

4. Mihalek I, Res I, Lichtarge O. “Evolutionary Trace Report Maker: a new type of service for comparative analysis of proteins.” Bioinformatics. 2006 Jan 15;22(2):149-56.

5. Mihalek I, Res I, Lichtarge O. “A Family of Evolution-Entropy Hybrid Methods for Ranking of Protein Residues by Importance.” J Mol Biol. 2004;336(5):1265-82.

Principal Investigator’s Biography

Ivana got her undergraduate degree in physics from the University of Zagreb, Croatia (1993), and her PhD in physics (2000) and MSc in computer science (2001) from the University of Kentucky, USA. She worked as a postdoctoral fellow in bioinformatics at Baylor College of Medicine, Houston, USA until 2007, when she joined BII. (Details: http://www.bii.a-star.edu.sg/research/biography/ivanam.php)

What is the function of a functionally relevant piece of protein structure?

Even in situations where the conserved regions on the protein are clearly discernible, their functional role may be difficult to interpret. Sometimes this task can be relegated to experiment. For example, in a study of a protein called Ku (from the large group of telomere-related proteins), the conservation map pointed to several regions which seemed quite mysterious. The results were handed straight over to experimental colleagues, who were able to establish, through site-directed mutagenesis, that several pathways were critically affected, with distinct pathways intriguingly assorting with distinct protein regions.

However, it would be desirable to elucidate the role of such regions in silico, thus focusing the experimental work even more narrowly. Several options are open to a computational biologist at this point, depending on the available information about the protein: if the structure is known, computer simulation might shed some light on its role in structural changes the protein undergoes, the small ligands it binds, and interaction with other proteins it engages in.

In our group we use the existing software as a tool to conduct computational experiments designed to test our and our collaborators’ hypotheses about protein function.

Similar Structure – Similar Function?

Very similar protein structures often signal a very similar function irrespective of the level of sequence similarity. The rule is not infallible, and contrary examples exist both of similar structures carrying a different function and of so-called convergent evolution, in which similar functions are performed by different structures. The latter case can be recognized by a certain degree of local similarity of the structure. We have recently started pursuing

the former question: How far can we get in our bioinformatical analysis by comparing pieces of similar structure, one of which has a known physiological role? In particular if the two (pieces of) structure appear to be conformationally rearranged, does that imply that both actually move in order to perform their job within a cell, and were just caught at different stages of their action?

A Piece of Structure in a Haystack

The first technical hurdle in answering the above questions is finding similar pieces of structure (as in Figure 2) in a database of the size of the modern Protein Data Bank. Since the onset of the Information Era, people have assembled and deposited information about tens of thousands of different protein structures, from all kingdoms of life, mostly coming from X-ray crystallography and nuclear magnetic resonance experiments. Sorting through this volume of data in order to find relevant biological answers is an interesting problem for a computer-inclined scientist. While computer science teaches us how to efficiently retrieve a well-defined entry in a large database, biologically interesting answers are often to be found in a twilight zone of loosely defined (from the computational perspective) hits. Designing algorithms that balance the conflicting requirements of efficiency and non-triviality of the search is currently the focus of independent research in our group.

We are in the process of developing a set of methods, together with the accompanying server (http://epsf.bmad.bii.a-star.edu.sg/struct_server.html) to handle precisely this type of a problem. The main theme of the approach is to deconstruct the protein structure from its gross structural features, such as the relative orientation of its elements of secondary structure (helices and strands) down to the minutiae of its atomic makeup, and use them in that order to retrieve, and sort by relevance, similar pieces of structure, appearing either as complete entries in Protein Data Bank, or as a substructure thereof. The trick, of course, is to explain to the computer how this is to be done, without losing the underlying ideas to the weak implementation. This is the point at which we put on our computer scientist’s hat. Next, biology comes into play: how do we keep the search general enough to give useful answers to molecular biologists specializing in different proteins and pathways? How do we use our findings as a bridge to more of the existing knowledge, and yet present the results in a way that is parseable both by a human and

by downstream computer applications? As the underlying quantity of data increases, processing them becomes an ever larger task, but the reward lies in the promise of giving ever more focused answers to scientific queries.
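The coarse end of the coarse-to-fine search described above can be sketched as follows. This is an illustrative toy under stated assumptions, not the method behind the group's server: each structure is reduced to a list of secondary structure elements with a direction vector, pairs of elements are summarized by their types and a binned inter-axis angle, and a query is scored by how much of its descriptor is contained in a target. All names, the 30-degree bin width and the example vectors are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def angle_deg(u, v):
    """Angle in degrees between two 3D direction vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(a * a for a in v))
    cos = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.degrees(math.acos(cos))


def coarse_descriptor(elements, bin_width=30):
    """Multiset of (element-type pair, angle bin) over all element pairs.

    elements: list of (kind, direction) with kind 'H' (helix) or 'E' (strand).
    """
    descriptor = Counter()
    for (kind1, dir1), (kind2, dir2) in combinations(elements, 2):
        kinds = "".join(sorted(kind1 + kind2))
        descriptor[(kinds, int(angle_deg(dir1, dir2) // bin_width))] += 1
    return descriptor


def containment(query_desc, target_desc):
    """Fraction of the query descriptor that also occurs in the target."""
    total = sum(query_desc.values())
    if total == 0:
        return 0.0
    shared = sum((query_desc & target_desc).values())
    return shared / total


# Hypothetical query: two helices and one strand.
query = [("H", (1, 0, 0)), ("H", (1, 1, 0)), ("E", (0, 0, 1))]
# Target A contains the query arrangement plus one extra helix.
target_a = query + [("H", (0, 1, 1))]
# Target B is a bundle of three parallel helices.
target_b = [("H", (1, 0, 0)), ("H", (1, 0, 0)), ("H", (1, 0, 0))]

q = coarse_descriptor(query)
score_a = containment(q, coarse_descriptor(target_a))
score_b = containment(q, coarse_descriptor(target_b))
```

A cheap filter of this kind would only shortlist candidates. A real pipeline would then need finer, atom-level comparison on the survivors, and it would have to handle the fact that matching descriptors do not guarantee a consistent geometric match.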

EPSF Group in the Scienceland

The main role of a bioinformatician is sieving through the existing (already daunting and relentlessly growing) amount of biological data and finding facts pertinent to the problem at hand. Working in protein science, we are lucky enough to be able to push that search a step further through explicit simulation of the physical systems in question. Immediate goals aside, as computational biologists we are trying to push forward the point at which experiment needs to be invoked. Ultimately, we are biologists, and we consider our contributions to experimentally testable work our most important accomplishments.


MODELING THE 3D STRUCTURES OF PROTEINS AND THEIR COMPLEXES

The broad aim of our group is to develop and apply computational tools to model the structural biology of molecular interactions in the living cell. To this end, we combine the laws of physics with experimental observation and statistics to develop computational methods in structural biology. The methods are tested, often in close collaboration with experimental biologists, on particular systems of interest. Our research results in detailed information about cellular processes and provides testable hypotheses that can then be verified experimentally. In particular, we are interested in the following problems:

1. Improving the accuracy of comparative protein structure modeling and functional annotation

We are developing methods to accurately align protein sequences to protein structures. Accurate alignments are key to building accurate 3D models of proteins. These efforts include using a structure-based, environment-dependent gap penalty function [Madhusudhan et al., 2006, Madhusudhan et al., 2009], and substituting single sequences with their profiles during the alignment process.

Methods that improve on modeling accuracy are often beneficial in improving the accuracy of functional annotation of proteins. Previously, we tested a decision tree based approach to predict the structural/functional stability of single point mutants [Bajaj et al., 2007], starting with the crystal structure of the native protein. We extended the stability predictions to be made using homology models instead of experimental structures. The stability prediction method was also improved upon, with additional branches to the decision tree that incorporated evolutionary information in the form of sequence conservation. The effect of alignment errors on the decision

Postdoctoral Fellows: Debashree BANDYOPADHYAY; NGUYEN Ngoc Minh

Research Associates: Rowena CHEONG Wai Sim; TAN Kuan Pern

Mallur Srivatsan MADHUSUDHAN

tree accuracy by using both sequence and structure alignments to build the models, as well as the effect of using single and multiple templates was studied. Our results showed that while small errors in alignment accuracy did not change the prediction of stability, the use of multiple templates improved upon the prediction accuracy of models built using single templates.

2. Aligning the 3D structures of proteins independent of their topology

We devised a new algorithm, CLICK, to align the 3D structures of proteins using their Cartesian coordinates, secondary structure content and residue-wise surface accessible areas. CLICK aligns pairs of protein structures independent of their topology. This is a powerful method to investigate structural similarity across protein folds and protein families. CLICK is effective not only in giving the optimal alignment between two protein structures but also in detecting conformational changes, such as domain motions and rigid body shifts. The method was extensively benchmarked on several datasets of pairs of structurally similar proteins, both topologically similar and topologically dissimilar. The method was also compared with other frequently used structure alignment algorithms. CLICK performs at the same level of accuracy as these other methods, if not statistically significantly better. We are now using CLICK to detect small molecule binding sites on proteins (Figure 1) and protein-protein interaction interfaces on proteins (Figure 2). The application of CLICK is not restricted to aligning protein structures. The algorithm can be readily generalized and used to align the 3D structures of any two molecules, such as DNA or RNA.

3. Predicting protein-protein interactions

Homology modeling is used to construct the models of target interacting protein complexes. The method predicts the protein constituents of the interacting complex and its 3D structure. The 3D structures of these complexes are modeled using the structural similarity of the target proteins to constituents of known (template) protein domain-domain interactions [Davis et al., 2006, Pieper et al., 2005]. The complexes predicted are not restricted to pairs of proteins. If multi-domain templates are present, multi-component interacting protein complexes are predicted. The complexes are assessed using a statistical potential constructed from residue contacts across known protein domain-domain interfaces. Prediction scores were calibrated for reliability. On a benchmark set of 100 interactions, the statistical potential accurately predicted interactions in 97 cases. The method is also capable of distinguishing between alternate modes of binding (Figure 3). Additional information, such as functional annotation and sub-cellular localization, can be used to enhance reliability. We are now developing methods that a) model protein interactions without the aid of full-length (entire domain coverage) templates and b) model all biological complexes, not restricted to protein interactions.
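To make the notion of a statistical potential built from residue contacts more concrete, here is a minimal sketch. It assumes a simple log-odds form (observed contact frequency against the frequency expected from residue composition, with a pseudocount) and toy training data; the functional form, the function names and the data are illustrative assumptions, not the potential actually used in the work cited above.

```python
import math
from collections import Counter


def pair_key(a, b):
    """Order-independent key for a residue-residue contact."""
    return tuple(sorted((a, b)))


def build_potential(known_interfaces, pseudocount=1.0):
    """Log-odds contact potential from contacts across known interfaces.

    known_interfaces: list of interfaces, each a list of (res_a, res_b)
    contacts given as one-letter residue codes. Lower scores are more
    favourable.
    """
    pair_counts = Counter()
    residue_counts = Counter()
    for contacts in known_interfaces:
        for a, b in contacts:
            pair_counts[pair_key(a, b)] += 1
            residue_counts[a] += 1
            residue_counts[b] += 1

    total_pairs = sum(pair_counts.values())
    total_residues = sum(residue_counts.values())
    types = sorted(residue_counts)
    n_pair_types = len(types) * (len(types) + 1) // 2

    potential = {}
    for i, a in enumerate(types):
        for b in types[i:]:
            observed = (pair_counts[(a, b)] + pseudocount) / (
                total_pairs + pseudocount * n_pair_types
            )
            freq_a = residue_counts[a] / total_residues
            freq_b = residue_counts[b] / total_residues
            expected = freq_a * freq_b if a == b else 2 * freq_a * freq_b
            potential[(a, b)] = -math.log(observed / expected)
    return potential


def score_interface(contacts, potential, unknown=0.0):
    """Sum of pair scores over the contacts of a candidate interface."""
    return sum(potential.get(pair_key(a, b), unknown) for a, b in contacts)


# Hypothetical training interfaces (toy data).
known = [
    [("L", "L"), ("L", "L"), ("K", "E"), ("K", "E"), ("L", "V")],
    [("K", "E"), ("L", "L"), ("E", "K")],
]
potential = build_potential(known)

native_like = score_interface([("K", "E"), ("L", "L")], potential)
decoy = score_interface([("K", "K"), ("E", "E")], potential)
```

In this toy setting the interface with salt-bridge and hydrophobic contacts scores lower (more favourably) than the decoy with like-charge contacts. Distinguishing alternative binding modes, as in Figure 3, amounts to comparing such scores across candidate interfaces.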

Figure 1: The 3D structures of A. fulgidus Rio2 kinase (PDB code 1zao; magenta) and purT-encoded glycinamide ribonucleotide transformylase (PDB code 1kj9; cyan) are superimposed using CLICK. The two proteins are unrelated by protein fold, function, or sequence. The regions of structural similarity between the two proteins lie in their ATP binding sites. The inset shows the similarities (dashed blue lines denote hydrogen bonds) in the interactions between the proteins and the ATP molecules bound to them. The conformation of the bound ATP is similar in both proteins. These similarities exist despite the aforementioned differences between the proteins.

Figure 2: On the left is the 3D structure of the assembly of two proteins that are part of the cellulosome from Clostridium thermocellum (PDB code 1aoh, chains A and B; grey and purple respectively). The interacting interfaces between the two proteins are shown in yellow and green. CLICK was used to detect regions on other proteins that are similar to these interacting interfaces. A region similar to the interface in 1aohA was found in 1g7kA (right, top), and a region similar to the interface in 1aohB was found in 1g7kB (right, bottom). The crystal structure of the complex of 1g7k, a red fluorescent protein from coral, shows a similar interface association to 1aoh. 1aoh and 1g7k belong to different protein families and their overall 3D folds are different. Though the interaction interface regions are topologically different, CLICK is able to detect their structural similarity.

Figure 3: Three camelid VHH domains, AMB7 (blue), AMD10 (orange) and AMD9 (green), bind to porcine pancreatic α-amylase (PPA, gray surface) through three distinct binding modes (PDB codes 1kxt, 1kxv, and 1kxq, respectively). All three interaction modes were evaluated for each VHH–PPA complex using the interface statistical potential. The statistical potential is sensitive enough to distinguish the native binding modes from the non-native modes.

Recent Publications

1. Madhusudhan MS, Webb BM, Marti-Renom MA, Eswar N, Sali A. Alignment of multiple protein structures based on sequence and structure features. Protein Eng Des Sel. 2009 22, 569-74.

2. Bajaj K, Madhusudhan MS, Adkar BV, Chakrabarti P, Ramakrishnan C, Sali A, Varadarajan R. Stereochemical criteria for prediction of the effects of proline mutations on protein stability. PLoS Comput Biol. 2007; 3(12):e241.

3. Davis FP, Braberg H, Shen MY, Pieper U, Sali A, Madhusudhan MS. Protein complex compositions predicted by structural similarity. Nucleic Acids Res. 2006 34(10):2943-52

4. Pieper U, Eswar N, Davis FP, Braberg H, Madhusudhan MS, Rossi A, Marti-Renom M, Karchin R, Webb BM, Eramian D, Shen MY, Kelly L, Melo F, Sali A. MODBASE: a database of annotated comparative protein structure models and associated resources. Nucleic Acids Res. 2006;34(Database issue):D291-5.

5. Madhusudhan MS, Marti-Renom MA, Sanchez R, Sali A. Variable gap penalty for protein sequence-structure alignment. Protein Eng Des Sel. 2006; 19(3):129-33.

Principal Investigator’s Biography

M. S. Madhusudhan joined the Bioinformatics Institute as a Principal Investigator in 2008. He received a Masters degree in Physics from the University of Pune and a PhD in Biophysics from the Molecular Biophysics Unit, Indian Institute of Science. This was followed by post-doctoral work in the lab of Andrej Sali at the Rockefeller University and the University of California, San Francisco. (Details: http://www.bii.a-star.edu.sg/research/biography/madhusudhan.php)

Postdoctoral Fellows: Oleg GRINCHUK; Efthimios MOTAKIS; Philip PRATHIPATI; Arsen BATAGOV; KYAW Tun; YENAMANDRA Surya Pavan

Research Associates: Piroon JENJAROENPUN; OW Ghim Siong; Zack TOH Swee Heng

PhD students: Thidathip WONGSURAWAT; Mikhail LUKYANOV

Vladimir KUZNETSOV

COMPUTATIONAL ANALYSIS OF GENOME COMPLEXITY, TRANSCRIPTION REGULATION AND CELLULAR PHENOTYPES


Principal Investigator’s Biography

Vladimir Kuznetsov was appointed Principal Scientist and Head of the Genome and Gene Expression Data Analysis Division at the Bioinformatics Institute (BII), A*STAR in November 2007. Since 2007, he has held adjunct professor appointments in the Mathematics Department of the NUS, and in the School of Computing Engineering of NTU, Singapore. He received a PhD in Biophysics at Moscow State University (Russia) in 1984. In 1992, he received a Doctor of Science degree in Mathematics and Physics at the Technical Union of Russian Academy of Sciences (St. Petersburg, Russian Federation). From 1992 to 1998, he established and led the laboratory of Mathematical Immuno-biophysics in the Institute of Chemical Physics (Moscow). In 1995, he was awarded a prestigious one-year scholar grant by the American Cancer Society/International Union against Cancer and then worked as a research scholar at the Laboratory of Molecular Tumor Biology, Centre for Biological Evaluation, FDA (Bethesda, MD, USA). He later worked as a scientist at the National Institutes of Health (MD, USA), where he was involved in the NIH Cancer Genome Anatomy Project. He also served as Chief Scientist at Civilized Software Inc. (Bethesda, MD) and as a Senior M staff at System Research Applications International Inc. (Fairfax, VA, USA). From 2004 to 2007, he developed several systems biology and computational genomics projects at GIS/A*STAR as a Senior Group Leader. In 1994, he was awarded the P.L. Kapitsa Silver Medal “To the Author of Scientific Discovery” and elected as a Corresponding Member of the Russian Academy of Natural Sciences. He has published two books and over 100 research papers and reviews. He is a member of the editorial boards of BMC Biology Direct, BMC Genomics and Journal of Integrative Bioinformatics. (Details: http://www.bii.a-star.edu.sg/research/biography/vladimirk.php)

Recent Publications

1. Kuznetsov VA, Singh O, Jenjaroenpun P. Statistics of protein-DNA binding avidity and estimating the total number of binding sites of a transcription factor in the mammalian genome. BMC Genomics 2010, 11 (Suppl 1):S12 (10 Feb 2010).

2. Kanapin AA, Mulder N, Kuznetsov VA. Projection of gene-protein networks to functional space of proteome and its application to analysis of organism complexity. BMC Genomics 2010, 11 (Suppl 1):S4 (10 Feb 2010).

3. Grinchuk OV, Motakis E, Kuznetsov VA. Complex sense-antisense architecture of TNFAIP1/POLDIP2 on 17q11.2 represents a novel transcriptional structural-functional gene module involved in breast cancer progression. BMC Genomics 2010, 11 (Suppl 2):S9 (10 Feb 2010).

4. Winston Koh, Chen Tian Sheng, Betty Tan, Vladimir Kuznetsov, Lim Sai Kiang, Vivek Tanavde. MicroRNA Expression Profile of Human Embryonic Stem Cells Derived Mesenchymal Stem Cells (hES-MSC) by Deep Sequencing Reveals Possible Role of Let-7 microRNA Family in Downstream Targeting of Hepatic Nuclear Factor 4 Alpha (HNF4A). BMC Genomics 2010, 11 (Suppl 1):S6 (10 Feb 2010).

5. Grinchuk OV, Jenjaroenpun P, Orlov YL, Zhou Jiangtao, Kuznetsov VA. Integrative analysis of the human cis-antisense gene pairs, miRNAs and their transcription regulation patterns. Nucleic Acids Res. 2009, Nov 11. [Epub ahead of print]

6. Kuznetsov VA: Relative avidity, specificity, and sensitivity of transcription factor-DNA binding in genome-scale experiments. Methods Mol Biol. 2009, 563:15-50.

7. Jenjaroenpun P, Kuznetsov VA. TTS Mapping: Integrative WEB Tool for Analysis of Triplex Formation Target DNA Sequences, G-quadruplets and Non-protein Coding Regulatory DNA Elements in the Human Genome. BMC Genomics 2009, 10 (Suppl 3):S9, doi: 10.1186/1471-2164-10-S3-S9.

8. Motakis E, Ivshina AV, Kuznetsov VA. Data-driven approach to predict survival of cancer patients. IEEE Engineering in Medicine and Biology 2009, 28, 58-66.

9. Motakis E, Kuznetsov VA. Genome-scale identification of survival significant genes and gene pairs. Lecture Notes in Engineering and Computer Science (Proc. of World Congress on Engineering and Computer Science, San Francisco, USA, 20-22 Oct., 2009; Eds: S.I. Ao, C. Douglas, W.S. Grundfelt & J. Burgstone), IAENG, Newswood Limited, vol. I, 41-46, 2009. ISBN: 978-988-17012-6-8.

10. Prathipati P, Ma NL, Manjunatha UH, Bender A. Fishing the target of antitubercular compounds: in silico target deconvolution model development and validation. J Proteome Res. 2009, 8, 2788-98.

11. Grinchuk O, Motakis E, Kuznetsov VA. Identification of complex sense-antisense gene’s module on 17q11.2 associated with breast cancer aggressiveness and patient’s survival. In: World Academy of Science, Engineering and Technology (WASET), (Editor-in-Chief: Cemal Ardil), vol. 58, pp. 1046-1056. Venice, Italy; October, 2009. ISSN: 2070-3724.

12. Kashuba E, Yenamandra SP, Darekar SD, Yurchenko M, Kashuba V, Klein G, Szekely L. MRPS18-2 protein immortalizes primary rat embryonic fibroblasts and endows them with stem cell-like properties. Proc Natl Acad Sci U S A. 2009 Nov 10. [Epub ahead of print] PMID: 19903879

13. Yenamandra SP, Sompallae R, Klein G, Kashuba E. Comparative analysis of the Epstein-Barr virus encoded nuclear proteins of EBNA-3 family. Comput Biol Med. 2009 Nov;39(11):1036-42. Epub 2009 Sep 16. PMID: 19762010

14. Tun K, Rao RK, Samavedham L, Tanaka H, Dhar P. Rich can get poor: conversion of hub to non-hub proteins. Syst Synth Biol. 2008 Dec;2(3-4):75-82. Epub 2009 Apr 28.

We develop integrative computational and statistical analyses of massive sequence datasets, matrices of DNA-transcription factor interactions, genome architectures, cis-antisense gene pairs, ncRNAs and genome regulatory signals. We study the functions of these sequences in transcriptional regulation and gene networks. We predict and study the genes and genome modules that are specifically associated with distinct phenotypes of cancerous cells and that could be essential for cancer patients’ survival. We also aim to develop novel theoretical and computational frameworks for functional and structural analyses of the co-regulation of protein-coding genes and ncRNAs. These frameworks integrate high-throughput sequencing and expression data sets relating to the mechanisms of transcriptional control of ncRNA processing and to ncRNA functions. In this context, we consider associations with sense-antisense gene pairing, G-quadruplets, triplex-forming oligonucleotide structures and RNA secondary structures. The in silico predictions are to be validated through wet-lab experiments.

Selected Projects:

1. TTS (Triplex Target DNA Site) mapping web tool and its applications

We have developed the TTS mapping method (Figure 1), which provides comprehensive visual and analytical tools to help users find TTSs and their co-localization with G-quadruplets, transcription factors (TFs), micro-RNA (miRNA) precursors, CpG islands and other regulatory DNA elements in human genome regions. Moreover, applications of this tool led us to suggest that some ncRNAs could provide specific control of transcription by forming natural triplexes and quadruplets with genomic DNA. In particular, TTS Mapping reveals that the ncRNA precursor of mir-483 is formed by a highly complementary and evolutionarily conserved pair of polypurine- and polypyrimidine-rich DNA tracks. We predict that such paired sequences can produce triplex-forming ncRNAs (miRNAs and siRNAs) and thus might be involved in mechanisms that silence or activate the expression of many essential genes, and might be used for antigene therapy.
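The kind of sequence scan that sits behind a search for polypurine and polypyrimidine tracts can be sketched in a few lines of Python. This is only an illustration: the function name `find_polypurine_tracts`, the minimum tract length of 15 and the rule that no interruptions are allowed are all assumptions made here, not the parameters or implementation of the TTS mapping tool.

```python
import re

def find_polypurine_tracts(seq, min_len=15):
    """Find uninterrupted purine (A/G) runs on the given strand ('+') and
    pyrimidine (C/T) runs, which read as purine runs on the complementary
    strand ('-'). Returns sorted (start, end, strand) tuples, 0-based,
    end-exclusive. min_len is an illustrative threshold, not a tool default."""
    seq = seq.upper()
    hits = []
    for bases, strand in (("AG", "+"), ("CT", "-")):
        pattern = "[%s]{%d,}" % (bases, min_len)
        for match in re.finditer(pattern, seq):
            hits.append((match.start(), match.end(), strand))
    return sorted(hits)
```

A real scan would also have to tolerate a small number of interruptions and report overlaps with other annotated elements, which this sketch leaves out.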

2. Statistical and computational analysis of protein-DNA binding in the mammalian genome

TF-DNA binding loci are explored by analyzing massive datasets generated by application of Chromatin Immuno-Precipitation (ChIP)-based high-throughput sequencing technologies. However, these datasets suffer from a bias in the information about binding loci availability, sample

Figure 3: A. USA GP Model: Integrative database and computational tool for comprehensive analysis of cis-antisense gene pairing and expression in the human genome. http://globalisland.bii.a-star.edu.sg/~jiangtao/sas/index3.php?link.

3. Cis-antisense gene pairing (CASGPs) phenomena

Comprehensive catalogue for cis-antisense gene pairs and complex genome architectures

Cis-antisense gene pairs (CASGPs) can transcribe mRNAs from opposite strands of a given locus. To classify and understand diverse CASGP phenomena in the human genome, we compiled a genome-wide catalog of CASGPs and integrated these sequences with microarray, SAGE and miRNA data in our United Sense Antisense Gene Pairs (USA GP) Database (http://globalisland.bii.a-star.edu.sg/~jiangtao/sas/index3.php?link=about). We identified up to 9,000 overlapping antisense loci; 4,374 of these CASGPs form 1,759 complex gene architectures. For the first time, we found a strong and significant overrepresentation of human miRNA genes in CASGP loci. Using USA GP, we found structural-functional modules of cis-antisense gene pairs with precursors of microRNAs. We developed a data-driven model of cross-talk between co-expressed CASGPs and the DICER1-mediated miRNA pathway in normal spermatogenesis and found that this cross-talk is switched off in severe teratozoospermia. We discovered a novel cancer amplicon (TMEM97, IFT20, TNFAIP1, POLDIP2 and TMEM199) organized in a complex sense-antisense architecture (CSAGA) on 17q11.2 and demonstrated its strong and reproducible co-regulatory transcription pattern in breast cancer tumours (Figure 3). Analysis of the expression profiles of 410 breast cancer patients revealed the survival significance of these genes and identified patients with low and high risk of disease recurrence.
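The core test for a cis-antisense pair, two genes on opposite strands whose genomic intervals overlap, can be sketched as follows. The `Gene` record, the function name `find_cis_antisense_pairs` and the half-open coordinate convention are assumptions made for illustration; this is not the pipeline used to build the USA GP database.

```python
from collections import namedtuple

# Hypothetical minimal gene record; coordinates are 0-based, end-exclusive.
Gene = namedtuple("Gene", "name chrom start end strand")

def find_cis_antisense_pairs(genes):
    """Return sorted (name_a, name_b) pairs of genes that lie on the same
    chromosome, on opposite strands, with overlapping intervals."""
    by_chrom = {}
    for gene in genes:
        by_chrom.setdefault(gene.chrom, []).append(gene)

    pairs = []
    for chrom_genes in by_chrom.values():
        chrom_genes.sort(key=lambda g: g.start)
        for i, a in enumerate(chrom_genes):
            for b in chrom_genes[i + 1:]:
                if b.start >= a.end:
                    break  # sorted by start: no later gene can overlap a
                if a.strand != b.strand:
                    pairs.append(tuple(sorted((a.name, b.name))))
    return sorted(pairs)
```

Genes that take part in more than one pair would then be grouped into larger structures, for example as connected components of the pair graph, which is one plausible reading of the "complex gene architectures" described above.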

The group received “The Best Paper Award” at The First International Conference on BioMedical Engineering and Informatics, 27-30 May 2008, Sanya, Hainan, China for a paper titled “Data-driven Networking Reveals 5-Genes Signature for Early Detection of Lung Cancer”.

incompleteness and diverse sources of technical and biological noise. We have developed an exploratory mixture probabilistic model for specific and non-specific transcription factor-DNA (TF-DNA) binding (Figure 2). Within ChIP-seq data sets, the statistics of specific and non-specific DNA-protein binding are described by a mixture of sample size-dependent skewed functions: the Kolmogorov-Waring (K-W) function (Kuznetsov, 2003) and an exponential function, respectively (Figure 2). Using available ChIP-seq data, we estimate (i) the specificity and sensitivity of ChIP-seq binding assays and (ii) the number of specific but not experimentally validated binding sites (BSs) in the genomes of cancer cells and embryonic stem cells. We conclude that estimation of the binding sensitivity of a TF cannot be technically resolved by current ChIP-seq, compared to former techniques. Our results suggest that low- and moderate-avidity TFBSs are highly abundant in the mouse and other mammalian genomes and can play biologically meaningful functional roles.
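How a two-component mixture yields sensitivity and specificity estimates at a given avidity threshold can be illustrated with a small sketch. The form of the K-W function is not given in this text, so the sketch swaps in a Lomax (Pareto type II) distribution as a generic skewed, heavy-tailed stand-in for the specific component; that substitution, the function names and all parameter values are assumptions for illustration only.

```python
import math

def exp_sf(x, rate):
    """P(X > x) for the exponential (non-specific) component."""
    return math.exp(-rate * x)

def lomax_sf(x, shape, scale=1.0):
    """P(X > x) for a Lomax distribution, used here only as a stand-in
    for the skewed specific-binding component."""
    return (1.0 + x / scale) ** (-shape)

def sensitivity_specificity(threshold, rate, shape, scale=1.0):
    """Sensitivity: fraction of specific loci above the threshold.
    Specificity: fraction of non-specific loci at or below it."""
    return lomax_sf(threshold, shape, scale), 1.0 - exp_sf(threshold, rate)

def posterior_specific(x, weight, rate, shape, scale=1.0):
    """Posterior probability that a locus with avidity x is specific,
    given the prior mixture weight of the specific component."""
    f_spec = (shape / scale) * (1.0 + x / scale) ** (-(shape + 1.0))
    f_nonspec = rate * math.exp(-rate * x)
    return weight * f_spec / (weight * f_spec + (1.0 - weight) * f_nonspec)
```

Because the exponential tail decays much faster than the heavy-tailed component, the posterior probability of specific binding rises with avidity, while many low-avidity loci remain ambiguous under such a model.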

Figure 2: TF-DNA binding in a ChIP-seq experiment and its modeling. Our modeling of the frequency distribution of relative avidity of TF (Nanog)-DNA binding loci in the genome of mouse embryonic stem cells (ChIP-seq data by Chen et al. 2008).

Figure 1: TTS mapping (http://ggeda.bii.a-star.edu.sg/~piroonj/TTS_mapping/TTS_mapping.php).


Adult stem cells have the potential to differentiate into a wide variety of tissue-specific cells. These cells can therefore be used to treat a variety of disorders ranging from myocardial infarction to osteoporosis. Mesenchymal stromal cells (MSC), the non-hematopoietic cells found in the marrow, have been used in many such therapies. Although these cells are already being used clinically, we know very little about the mechanisms they use to differentiate into different lineages. My group aims to understand the signaling pathways involved in mesenchymal stromal cell differentiation and how these pathways are regulated. For this purpose, we developed an approach to gather information about cellular signaling from gene expression data. Using this approach, we identified three pathways critical for MSC growth and differentiation. We are also using this approach to understand the differentiation of embryonic stem cells. Currently, my laboratory is also studying the role of micro RNAs (miRNAs) in MSC differentiation.

Projects:

1. Development of serum free medium for culturing MSC. This project was carried out in collaboration with Invitrogen. Using time course gene expression analyses of differentiating MSC, we identified three pathways critical for growth and differentiation of MSC (Ng et al., Blood 2008). This information was used to develop STEMPRO® MSC-SFM, the first commercially available serum free medium for culturing MSC.

2. Identification of miRNAs secreted by MSC and their role in indirect regulation of signaling networks. This project was carried out in collaboration with Sai Kiang Lim from the Institute of Medical Biology (IMB). Her group had sequenced small RNAs found intracellularly as well as those secreted by MSC in exosomes. We identified differentially secreted miRNAs from the next generation sequencing data and showed that the let-7 family of miRNAs regulated a network of genes with the transcription factor HNF4A

Research Associates: Betty TAN Bee Tee; LEE Qian Yi; Michelle KWAN Kah Yian

Vivek TANAVDE

as its hub (Koh et al., BMC Genomics 2010). The HNF4A gene has no predicted miRNA binding sites in its untranslated region. However, using next generation sequencing, we were able to predict and experimentally verify that the let-7 family of miRNAs indirectly regulates expression of HNF4A.

3. Identification of miRNAs important in differentiation of MSC into bone, cartilage and fat. miRNAs have been shown to be important regulators of differentiation in embryonic and hematopoietic stem cells. In this project we aim to identify differentially expressed miRNAs in fetal limb derived MSC as they differentiate into bone, cartilage and fat. The miRNA expression profile, coupled with the mRNA expression profile of the same cells, will enable us to identify miRNAs that are critical in trilineage differentiation of MSC and the genes and signaling pathways they target (directly as well as indirectly) to achieve this regulation.

4. Identification of translationally regulated genes in embryonic stem cell differentiation. In this study we successfully identified translationally regulated genes in differentiating human embryonic stem cells using microarrays. This project is being carried out in collaboration with Prabha Sampath at the Institute of Medical Biology, A*STAR.

5. Identification of differentially expressed signaling pathways in Lamin A mutants. Lamin A is a protein that controls movement of macromolecules across the nuclear membrane. Using microarrays, we identified differentially expressed genes in Lamin A mutant cells subjected to stress. This enabled us to predict the signaling pathway responsible for the Lamin A mutant phenotype. This project is being carried out in collaboration with Colin Stewart at the Institute of Medical Biology, A*STAR.

6. Identification of biomarkers for assessing the response of human and murine embryonic stem cells to toxic compounds. Embryonic stem cells have the potential to serve as valuable tools to test the toxicity of different compounds. As models of embryonic stem cell differentiation develop, our ability to use this information to screen compounds that affect these differentiation processes should also improve. In this project we aim to identify biomarkers for assessing the toxic effect of a drug on neuronal differentiation of MSC. This project is carried out in collaboration with Suzanne Kadereit, University of Konstanz.

Recent Publications

1. Winston Koh, Chen Tian Sheng, Betty Tan, Qian Yi Lee, Vladimir Kuznetsov, Lim Sai Kiang, Vivek Tanavde. (2010) Analysis of Deep sequencing microRNA expression profile from human embryonic stem cells derived mesenchymal stem cells reveals possible role of let-7 microRNA family in downstream targeting of Hepatic Nuclear Factor 4 Alpha. BMC Genomics (In Press)

2. Lai RC, Arslan F, Tan SS, Tan B, Choo A, Lee MM, Chen TS, Teh BJ, Eng JK, Sidik H, Tanavde V, Hwang WS, Lee CN, Oakley RM, Pasterkamp G, de Kleijn DP, Tan KH, Lim SK. (2010) Derivation and characterization of human fetal MSCs: An alternative cell source for large-scale production of cardioprotective microparticles. J Mol Cell Cardiol. (In Press)

3. Vivek M. Tanavde, Lailing Liew, Jiahao Lim and Felicia Ng (2009) Signaling Networks in Mesenchymal Stem Cells. In: Regulatory Networks in Stem Cells, V.K. Rajasekhar, M.C. Vemuri (eds.), Humana Press.

Principal Investigator’s Biography

Dr. Vivek Tanavde joined the Bioinformatics Institute, Singapore as a Research Scientist in the Genome & Gene Expression Data Analysis Division in 2006. Prior to joining BII, he headed the Hematopoietic Stem Cell Lab at Reliance Life Sciences, Mumbai, where his work focused on developing mesenchymal stromal cell based therapies for cardiac and neuronal disorders. From 1999 to 2002 he was a postdoctoral fellow with Dr. Curt Civin at the Sidney Kimmel Cancer Centre, Johns Hopkins University, working on expansion of hematopoietic stem cells from umbilical cord blood. Dr. Tanavde obtained his Ph.D. in Applied Biology from the Cancer Research Institute, Mumbai (1999). (Details: http://www.bii.a-star.edu.sg/research/biography/vivek.php)

4. Ng F, Boucher S, Koh S, Sastry KS, Chase L, Lakshmipathy U, Choong C, Yang Z, Vemuri MC, Rao MS, Tanavde V. (2008) PDGF, TGF-β and FGF signaling is important for differentiation and growth of mesenchymal stem cells (MSCs): transcriptional profiling can identify markers and signaling pathways important in differentiation of MSC into adipogenic, chondrogenic and osteogenic lineages. Blood. 112(2):295-307.

5. Shah VK, Desai AJ, Vasvani JB, Desai MM, Shah BP, Lall TK, Mashru MR, Shalia KK, Tanavde V, Desai SS, Jankharia BJ. (2007) Bone marrow cells for myocardial repair-a new therapeutic concept. Indian Heart J. 59(6):482-90.

6. P. Shetty, K. Bharucha, V. Tanavde (2007) Human umbilical cord blood serum can replace fetal bovine serum in the culture of mesenchymal stem cells. Cell Biol International 31:293-298.

Figure 1: HNF4A is a common hub for networks derived from alignment data and TargetScan predictions. Gene interaction network in this figure is derived from the dataset of genes with overlapping regions corresponding to peaks from previous mapping.

Figure 2: This figure shows the gene interaction network derived from computationally predicted gene targets from TargetScan. A similar topology was observed for gene interaction networks in Figures 1 and 2, with HNF4A as a node amongst the interactions suggesting HNF4A as a possible downstream target for let-7 family miRNAs.

EXPRESSION AND SIGNALING IN MESENCHYMAL AND HEMATOPOIETIC STEM CELLS

Igor KUROCHKIN


PREDICTIVE AND FUNCTIONAL ANALYSIS OF LONG NONCODING RNAS

Our group is focused on the discovery and functional analysis of transcripts expressed from the human genome that do not encode proteins. In particular, we are interested in the cellular roles of the so-called long noncoding RNAs (lncRNAs), defined as RNAs longer than 200 nucleotides. We pursue this goal using bioinformatics together with multiple molecular and biochemical approaches. Over the past decade, numerous cDNA cloning and sequencing projects and genome-tiling array analyses have revealed that mammalian genomes are almost entirely transcribed, generating tens of thousands of ncRNAs. Diverse ncRNA species include short miRNAs, piRNAs and much longer ncRNAs (lncRNAs). Although the involvement of miRNAs in various biological processes, including cellular proliferation and differentiation, is increasingly evident, the function of the much more abundant class of lncRNAs is largely unknown. The few studies performed so far on the biological role of lncRNAs suggest extremely diverse mechanisms of action for this class of molecules. Computational prediction of lncRNA function thus faces the serious challenge of decoding the information contained within the sequence of these molecules. The task is complicated by the fact that RNAs can not only encode sequence-specific interactions through base-pairing rules but also assume various secondary and tertiary structures that bind to proteins important for transcription or epigenetic modifications. We aim to develop a novel computational framework for functional and structural analyses of lncRNAs that integrates high-throughput data related to transcriptional control of lncRNAs, the secondary structure of these molecules, evolutionary conservation, and functional annotation of co-expressed protein-coding RNAs. The predictions are to be validated through wet-lab experimentation.

Information about when and where lncRNAs are expressed is useful for probing their function. Thus, understanding the function and mechanistic aspects of lncRNA action will be facilitated by analyses of the dynamic changes in their expression that occur

Postdoctoral Fellows: Antonis GIANNAKAKIS; Aliaksandr YARMISHYN

in the course of development, cell differentiation and response to environmental stress conditions. At the initial stage of this project we set out to identify those lncRNAs whose expression changes significantly during retinoic acid-induced neuronal differentiation of the human neuroblastoma cell line SH-SY5Y. Profiling with a custom-built oligonucleotide microarray revealed that a small fraction of lncRNAs was highly regulated. A large part of these transcripts mapped to the intronic regions of protein-coding genes. About 21% of the intragenic lncRNAs mapped to the annotated genes in the antisense direction, in line with previous reports that over 20% of human transcripts might form sense-antisense pairs. Most of these antisense lncRNAs correlated positively in their expression pattern with the sense strand of the genes, with a small minority showing negative correlation.
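The comparison of antisense lncRNA expression with that of the sense gene can be sketched as a per-pair Pearson correlation across time points. The function name `classify_pairs` and the input layout (one expression vector per transcript, in the same sample order) are assumptions for illustration; the text does not state which correlation measure or significance threshold was used.

```python
import numpy as np

def classify_pairs(names, sense, antisense):
    """For each sense/antisense pair, compute the Pearson correlation of
    their expression profiles and label its sign. A coefficient of exactly
    zero is labelled 'negative' here purely for simplicity."""
    results = {}
    for name, s, a in zip(names, sense, antisense):
        r = float(np.corrcoef(s, a)[0, 1])
        results[name] = (r, "positive" if r > 0 else "negative")
    return results
```

In practice each coefficient would also need a significance test and a correction for multiple comparisons before a pair is called positively or negatively correlated.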

While antisense lncRNAs are expected to regulate the expression or stability of their sense counterparts, the functional role of the intergenic lncRNAs remains a mystery. We specifically focus on this group of transcripts, hoping to discover novel functions performed by RNA molecules. To address the functionality of intergenic lncRNAs, we determined their conservation in different eukaryotic genomes. The analysis revealed that although regulated intergenic lncRNAs are not conserved over their entire sequence, most of them contain short patches of highly conserved sequence. In addition, very high conservation levels are observed in the promoters of the regulated lncRNAs. Interestingly, retinoic acid-induced intergenic lncRNAs were often found adjacent to genes encoding transcription factors. Among them, we identified a lncRNA located in the homeobox D gene cluster that is significantly up-regulated during neuronal differentiation (Figure 1). Hox genes are known to play a regulatory role in patterning of the CNS, as well as in cell specification. Whether intergenic lncRNAs can be a driving force behind activation of the neuronal-specific program is a major question, which we address by analyzing the effects of knockdown and overexpression of selected lncRNAs on the whole-genome expression pattern and cell phenotype.

Recently, integrative bioinformatics and experimental approaches enabled us to predict several novel proteins with possible roles in peroxisomal biochemistry and metabolism. The transport of most proteins into the peroxisomal matrix is mediated by two poorly defined peptide motifs, PTS1 and PTS2. In our approach, we combined computational searches for PTS1 and PTS2 motifs in protein sequence databases with the analysis of co-occurring motifs, expression patterns, secondary structure properties, orthologues and variants, literature search and manual curation. This approach predicted the long-sought peroxisomal processing protease encoded by the Tysnd1 gene. This protease was demonstrated to localize to peroxisomes and to process enzymes catalyzing all steps of the peroxisomal β-oxidation pathway of fatty acids, thus suggesting its involvement in the control of lipid metabolism (Figure 2). The U.S. Patent Office has granted us a patent for the method of screening for agents that modulate Tysnd1 levels or activity in cells. The issuance of this patent represents a step forward in the development of drugs for the treatment of peroxisomal disorders.
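The first step of such a search, scanning protein sequences for candidate targeting signals, can be sketched with regular expressions. The patterns below are commonly quoted simplified consensus forms (a C-terminal tripeptide for PTS1 and a nonapeptide near the N-terminus for PTS2); they, the 40-residue N-terminal window and the function name `predict_pts` are assumptions for illustration and are not the motif definitions used in the work described above.

```python
import re

# Simplified consensus patterns, assumed here for illustration only.
PTS1 = re.compile(r"[SAC][KRH][LM]$")         # C-terminal tripeptide, e.g. SKL
PTS2 = re.compile(r"[RK][LVI].{5}[HQ][LA]")   # nonapeptide near the N-terminus

def predict_pts(seq, nterm_window=40):
    """Return the list of candidate peroxisomal targeting signals found in
    a protein sequence given in one-letter amino acid code."""
    seq = seq.upper()
    hits = []
    if PTS1.search(seq):
        hits.append("PTS1")
    if PTS2.search(seq[:nterm_window]):
        hits.append("PTS2")
    return hits
```

Motif matches of this kind produce many false positives, which is why the approach described above adds co-occurring motifs, expression patterns, orthologue information and manual curation on top of the sequence scan.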

Principal Investigator’s Biography

Igor Kurochkin joined the Bioinformatics Institute in 2009. He received his B.S. from Kiev National University, majoring in biochemistry, and then earned a Ph.D. in molecular biology from the Institute of Molecular Biology and Genetics in Kiev. After postdoctoral work in the School of Pharmaceutical Sciences at Toho University, Japan (1990-1993), he joined the Holland laboratory of American Red Cross, MD as a visiting research fellow supported by the International Fellowship from the Fogarty International Center, NIH (1993-1995). During 1996-2002, he was a research scientist at Chugai Pharmaceutical Co., Ltd. (now a member of the Roche group). He returned to the academic sector as a research scientist in RIKEN Genomic Sciences Center, Japan (2002-2009). (Details: http://www.bii.a-star.edu.sg/research/biography/igork.php)

Recent Publications

1. Mizuno Y, Kurochkin IV, Herberth M, Okazaki Y, Schönbach C. Predicted mouse peroxisome-targeted proteins and their actual subcellular locations. BMC Bioinformatics. 2008 Dec 12;9 Suppl 12:S16.

2. Zhang L, Volinia S, Bonome T, Calin GA, Greshock J, Yang N, Liu CG, Giannakakis A, et al. Genomic and epigenetic alterations deregulate microRNA expression in human epithelial ovarian cancer. Proc Natl Acad Sci U S A. 2008 May 13;105(19):7004-9.

3. Kurochkin IV, Mizuno Y, Konagaya A, Sakaki Y, Schönbach C, Okazaki Y. Novel peroxisomal protease Tysnd1 processes PTS1- and PTS2-containing enzymes involved in beta-oxidation of fatty acids. EMBO J. 2007 Feb 7;26(3):835-45.

4. FANTOM Consortium; RIKEN Genome Exploration Research Group and Genome Science Group (Genome Network Project Core Group). The transcriptional landscape of the mammalian genome. Science. 2005 Sep 2;309(5740):1559-63.

5. Kurochkin IV, Nagashima T, Konagaya A, Schönbach C. Sequence-based discovery of the human and rodent peroxisomal proteome. Appl Bioinformatics. 2005;4(2):93-104.

Figure 1: Expression of a novel intergenic lncRNA associated with HOXD cluster genes is significantly induced during neuronal differentiation of human neuroblastoma cells.

Figure 2: Computationally predicted peroxisomal protein Tysnd1 is localized to peroxisomes, where it processes enzymes involved in the β-oxidation pathway of fatty acids.

Texture segmentation

Image segmentation is indispensable in many applications. It facilitates the extraction of useful information for subsequent high-level image analysis. For instance, in pathological research, digital microscopy has become increasingly popular since the introduction of high-throughput tissue microarrays (TMA) into bioimaging communities. It is therefore crucial to segment each tissue image into a meaningful partition in an accurate, fast, automated and robust manner. In particular, textures extracted from the image have higher discriminant power than intensity. Since a high-dimensional feature space is usually considered for texture segmentation, the data are very sparse, as the number of relevant features for any given pixel is usually small. Therefore, a practical approach is needed to filter out the irrelevant features and make the description of each segment by its features more compact.

We propose a novel image segmentation model, called the Subspace Mumford-Shah model, which incorporates subspace clustering techniques into a Mumford-Shah model to solve texture segmentation problems. To optimize the objective, our first attempt uses a supervised procedure to determine several optimal subspaces. These subspaces are then embedded into a Mumford-Shah objective function so that each segment of the optimal partition is homogeneous in its own subspace. The method outperforms standard Mumford-Shah models because it can separate textures that are poorly separated in the full feature space. The method also has increased robustness and convergence speed compared to existing subspace clustering methods. Experimental results are presented to confirm the usefulness of subspace clustering in texture segmentation. To make our work more practical, our next goal is to develop a fully unsupervised approach to optimize the objective.
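The idea that each segment should be homogeneous in its own subspace can be illustrated with a deliberately reduced sketch. Here a subspace is taken to be the set of features with the lowest variance in labelled samples of one texture, and pixels are assigned using only the data term of a piecewise-constant model; the boundary-length regularization of a Mumford-Shah functional is omitted. The selection rule and the function names `select_subspace` and `assign_pixels` are assumptions for illustration, not the published Subspace Mumford-Shah algorithm.

```python
import numpy as np

def select_subspace(samples, m):
    """Indices of the m features with the smallest variance in labelled
    samples (rows = pixels, columns = features) of a single texture."""
    variances = np.var(samples, axis=0)
    return np.sort(np.argsort(variances)[:m])

def assign_pixels(features, subspaces, centers):
    """Label each feature vector with the segment whose mean squared
    distance, measured only within that segment's subspace, is smallest."""
    costs = []
    for idx, center in zip(subspaces, centers):
        diff = features[:, idx] - center[idx]
        costs.append(np.mean(diff ** 2, axis=1))
    return np.argmin(np.stack(costs, axis=1), axis=1)
```

Restricting the distance to a per-segment subspace prevents features that are pure noise for one texture from dominating its cost, which is the intuition behind separating textures that overlap in the full feature space.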

Quantitative analysis of neural stem cells

Understanding the biology of neural cells is important in designing treatments for neural diseases. A unique trait of a neural cell is that it has neurites that connect it to other neural cells. Neurite outgrowth is a fundamental characteristic of neurons; neurites eventually form synapses, and proper functioning of the nervous system depends on the formation of proper connections.

In this study, a high throughput image screening system is developed. This system includes the microscopy setup as well as advanced


COMPUTER VISION AND PATTERN DISCOVERY FOR BIOIMAGES

The group focuses on developing advanced computer vision, machine learning and mathematical models to elucidate the complex behavior of biological systems. We analyze images from wide-field, confocal, single plane illumination microscopes, including data sets from high-throughput image screens.

Field theoretical image restoration

Microscopy has become a de facto tool for biology. However, it suffers from a fundamental problem of poor contrast with increasing depth, as the illuminating light is attenuated and scattered and hence cannot penetrate thick samples. The resulting decay of light intensity due to attenuation and scattering varies exponentially across the image. Classical space-invariant deconvolution approaches alone are not suitable for restoring uneven illumination in microscopy images. We developed a novel physics-based field theoretical approach to solve the contrast degradation problem of light microscopy images.

Our proposed formulation is radically different from all existing physics-based restoration techniques in that we do not assume a constant extinction coefficient in the attenuating medium. Moreover, in our formalism, we make no distinction between the image object and the attenuating medium. We derived a general set of equations to handle any geometrical setup in the image acquisition. To use our method, one only needs to specify details of the light source and the detection equipment, such as a camera.

In our formalism, we assume a volume of interest in which our biological sample resides. As in most field theories, the volume is divided into infinitesimal elements. The light intensity in each volume element is then calculated based on the physical principles of light attenuation and scattering. This allows us to calculate the amount of light emitted from each infinitesimal volume element. We then relate this to the light detected by the CCD/photomultiplier tube. In this way, from the information collected by the CCD/photomultiplier tube and the relationship between the amount of light emitted and the amount of light detected, we can restore the image and remove the problem of light attenuation and scattering. We apply our theory to confocal microscopy and show using controlled experiments that our restoration method works.
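A one-dimensional toy version of this idea, with a spatially varying extinction and no separation between object and medium, can be sketched as below. It assumes that the local extinction is proportional to the local emitter density (constant `k`), treats a single column of voxels along the optical axis, and ignores scattering and the detection path. These simplifications and the function names are assumptions for illustration; they are not the general field-theoretical equations described above.

```python
import numpy as np

def attenuate(emission, k, dz=1.0):
    """Forward model: the signal from depth i is reduced by
    exp(-sum over shallower voxels j < i of k * emission[j] * dz)."""
    emission = np.asarray(emission, dtype=float)
    optical_depth = np.concatenate(([0.0], np.cumsum(k * emission[:-1] * dz)))
    return emission * np.exp(-optical_depth)

def restore(detected, k, dz=1.0):
    """Invert the forward model layer by layer, from the surface down:
    each restored voxel fixes the attenuation applied to deeper voxels."""
    detected = np.asarray(detected, dtype=float)
    restored = np.empty_like(detected)
    optical_depth = 0.0
    for i, value in enumerate(detected):
        restored[i] = value * np.exp(optical_depth)
        optical_depth += k * restored[i] * dz
    return restored
```

Because the attenuation at each depth depends on what has already been restored above it, no constant extinction coefficient is imposed, which is the property this toy model shares with the formulation in the text.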

Postdoctoral Fellows: Ivy LAW Yan Nei; YU Weimiao; Patrick KOH Yang Wei; CELIK Turgay

Research Associate: YAP Choon Kong

LEE Hwee Kuan

Segmentation of semi-transparent objects

We consider the problem of segmenting two overlapping objects whose intensity level in the intersection is approximately the sum of the levels of the individual objects. This is a fundamental image processing task with many real-world applications, especially those that measure concentration using imaging techniques.

Examples include X-ray images, images of absorbent paper with mouse scent marks, and microscopy images recording protein expression levels. Although many applications of such a model can be found, there has been little, if any, study of this problem.

We propose a variant of the Mumford-Shah model for the segmentation of overlapping objects with additive intensity values. Unlike standard segmentation models, it not only determines the distinct objects in the image but also recovers the possibly multiple membership of the pixels. To accomplish this, some a priori knowledge about the smoothness of the object boundary is integrated into the model. Additivity is imposed through a soft constraint, which allows the user to control the degree of additivity and is more robust than a hard constraint. We also show analytically that the additivity parameter can be chosen to achieve some stability conditions. To solve the optimization problem involving geometric quantities efficiently, we apply a multi-phase level set method. Segmentation results on synthetic and real images validate the good performance of our model.
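To make the soft additivity constraint concrete, the sketch below evaluates a piecewise-constant energy for two overlapping objects. It is only an illustration under stated assumptions: the boundary smoothness term and the multi-phase level set optimisation are omitted, and the function name, the parameter `gamma` and the toy data are hypothetical rather than taken from the published model.

```python
def additive_energy(pixels, in_obj1, in_obj2, c1, c2, c12, gamma, background=0.0):
    """Data fidelity plus a soft additivity penalty for two overlapping objects.

    pixels:   flat list of intensities
    in_obj1:  per-pixel membership flags for object 1
    in_obj2:  per-pixel membership flags for object 2
    c1, c2:   intensities of the non-overlapping parts of each object
    c12:      intensity of the overlap region
    gamma:    weight of the soft constraint c12 ~ c1 + c2
    """
    energy = 0.0
    for value, a, b in zip(pixels, in_obj1, in_obj2):
        if a and b:
            model = c12          # pixel belongs to both objects
        elif a:
            model = c1
        elif b:
            model = c2
        else:
            model = background
        energy += (value - model) ** 2
    # Soft constraint: deviations from additivity are penalised, not forbidden.
    energy += gamma * (c12 - (c1 + c2)) ** 2
    return energy

# Toy 1-D "image": object 1 has level 2, object 2 has level 3, overlap is 5.
pixels = [0, 2, 2, 5, 3, 3, 0]
obj1 = [False, True, True, True, False, False, False]
obj2 = [False, False, False, True, True, True, False]

exact = additive_energy(pixels, obj1, obj2, c1=2, c2=3, c12=5, gamma=10)
off = additive_energy(pixels, obj1, obj2, c1=2, c2=3, c12=4, gamma=10)
```

With a large `gamma` the penalty behaves almost like a hard constraint, while a small `gamma` lets the overlap intensity follow the data when additivity holds only approximately.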

Recent Publications

1. W. Yu, H. K. Lee, S. Hariharan, W. Bu, S. Ahmed, “Evolving Generalized Voronoi Diagram of Active Contours for Accurate Cellular Image Segmentation”, Cytometry Part A, 2010 (accepted).

2. Y. N. Law, H. K. Lee, A. M. Yip, “Semi-supervised Subspace Learning for Mumford-Shah Model Based Texture Segmentation”, Optics Express, 2010 (accepted).

3. H. K. Lee, M. S. Uddin, S. Sankaran, S. Hariharan, S. Ahmed, “A field theoretical restoration method for images degraded by non-uniform light attenuation: an application for light microscopy”, Optics Express, 2009; 17(14): 11294-11308.

4. Y. N. Law, H. K. Lee, A. M. Yip, “Segmentation of Semi-transparent Objects Using a Variant of the Mumford-Shah Model”, Proceedings of the 2009 International Conference on Image Processing, Computer Vision & Pattern Recognition, Volume II.

5. Y. N. Law, H. K. Lee, A. M. Yip, “Supervised Texture Segmentation Using the Subspace Mumford-Shah Model”, Proceedings of the 2009 International Conference on Image Processing, Computer Vision & Pattern Recognition, Volume II.

6. W. Yu, H. K. Lee, S. Hariharan, S. Sankaran, P. Vallotton, S. Ahmed, “Segmentation of Neural Stem/Progenitor Cells Nuclei within 3-D Neurospheres”, Advances in Visual Computing, ISVC 2009, Lecture Notes in Computer Science, 2009; 5875: 531-543.

7. Q. Ho, W. Yu, H. K. Lee, “Region Graph Spectra as Geometric Global Image Features”, Advances in Visual Computing, ISVC 2009, Lecture Notes in Computer Science, 2009; 5875: 253-264.

8. W. Yu, H. K. Lee, S. Hariharan, W. Bu, S. Ahmed, “Detection and Quantitative Measurement of Neuronal Outgrowth in Fluorescence Microscopy Images”, Proceedings of the Medical Image Understanding and Analysis (MIUA) 2009.

Figure 3: Nucleus segmentation of the neurosphere.

Figure 1A: Image of skin cells with a mutant annotated by a polygon.

Figure 1B: Detection of this mutant automatically by our intelligent vision system, red representing high probability of occurrence of mutants.

Figure 2: Images of a neurosphere.

computer vision applications to process the images. Our task in the computer vision and pattern discovery group is to develop fast and accurate software to process thousands of images generated through the high throughput microscopy system.

Our new method combines the level-set and watershed methods in a specific way to achieve fast and accurate segmentation of the neural cells. Neural cells have outgrowths and cytoplasm that touch each other. Many algorithms in the literature could not separate cells that touch each other. Our method is designed specifically to overcome this difficulty.

Our method performs much better than currently available software: validated against a set of about 6000 cells, its error rate is 6.5%, while the error rate for METAMORPH on the same data set is 25.5%.

The paper titled “Level Set Segmentation of Cellular Images based on Topological Dependence” was awarded the “Mitsubishi Electric Research Laboratories Best Paper Award” at the 4th International Symposium on Visual Computing, 1-3 Dec 2008, Las Vegas, USA.

Principal Investigator’s Biography

Hwee Kuan obtained his PhD in Theoretical Condensed Matter Physics from Carnegie Mellon University in 2001. He then held a joint postdoctoral position with Oak Ridge National Laboratory (USA) and the University of Georgia, where he worked on advanced Monte Carlo methods and nano-magnetism. In 2003, with an award from the Japan Society for the Promotion of Science, Hwee Kuan moved to Tokyo Metropolitan University, where he developed solutions to problems with extremely long time scales and a reweighting method for nonequilibrium systems. In 2005 he returned home to join the Data Storage Institute, proposing a novel magnetic recording method using magnetic resonance. In 2006, he joined the Bioinformatics Institute as a Principal Investigator in the Imaging Informatics Division. (Details: http://www.bii.a-star.edu.sg/research/biography/leehk.php)

Geometric global image features

In quantitative biology studies such as drug and siRNA screens, robotic systems automatically acquire thousands of images from cell assays. Because these images are large in quantity and high in content, detecting specific patterns (phenotypes) in them requires accurate and fast computational methods. To this end, we have developed a geometric global image feature for pattern retrieval on large bio-image data sets. This feature is derived by applying spectral graph theory to local feature detectors such as the Scale Invariant Feature Transform, and is effective on patterns with as few as 20 keypoints. We demonstrate successful pattern detection on synthetic shape data and fluorescence microscopy images of GFP-Keratin-14-expressing human skin cells.
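As a rough illustration of how spectral graph theory can turn a set of keypoints into a global geometric descriptor, the sketch below builds a distance-weighted graph over keypoint coordinates and summarises its Laplacian by spectral moments (sums of powers of the eigenvalues, computed as traces of matrix powers). The graph construction, the Gaussian weighting, the choice of moments and all names are assumptions made for this example; they are not taken from the published feature, and the keypoints are made-up coordinates rather than detector output.

```python
import math

def graph_laplacian(points, sigma=1.0):
    """Laplacian of a complete graph with Gaussian distance weights."""
    n = len(points)
    lap = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
            w = math.exp(-d2 / (2.0 * sigma ** 2))
            lap[i][j] = lap[j][i] = -w
            lap[i][i] += w
            lap[j][j] += w
    return lap

def spectral_moments(lap, order=3):
    """Return [tr(L), tr(L^2), ...], i.e. sums of eigenvalue powers."""
    n = len(lap)
    power = [row[:] for row in lap]
    moments = []
    for _ in range(order):
        moments.append(sum(power[i][i] for i in range(n)))
        power = [[sum(power[i][k] * lap[k][j] for k in range(n))
                  for j in range(n)] for i in range(n)]
    return moments

def rotate(points, angle):
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y) for x, y in points]

# The descriptor depends only on pairwise distances, so a rotated and
# shifted copy of the same keypoint pattern yields the same feature.
keypoints = [(0.0, 0.0), (1.0, 0.0), (1.0, 2.0), (0.5, 3.0)]
moved = [(x + 5.0, y - 2.0) for x, y in rotate(keypoints, 0.7)]
feature_a = spectral_moments(graph_laplacian(keypoints))
feature_b = spectral_moments(graph_laplacian(moved))
```

Invariance to rotation and translation is what makes such a descriptor usable for retrieving the same pattern at arbitrary positions and orientations in a large image set.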

Principal Investigator’s Biography

Martin Wasser obtained his degree in Biology from the University of Cologne in Germany in 1993. In 1998, he received his PhD in Molecular and Cell Biology from the IMCB in Singapore. As a PhD student and postdoc, he conducted extensive research on the role of nuclear architecture and chromatin structure in cell proliferation and animal development. While doing postdoctoral wet-lab research, he obtained a Master’s degree in Knowledge Engineering from the Institute of Systems Science. Before joining BII, he worked as a research fellow at the Temasek Life Sciences Laboratory in Singapore. Since 2007 he has been heading a research team at the BII, focusing on live-cell imaging and the development of image analysis systems. (Details: http://www.bii.a-star.edu.sg/research/biography/martinw.php)


Postdoctoral Fellows: Rambabu CHINTA; Janos KRISTON-VIZI; DU Tiehua

Research Associates: PUAH Wee Choo; Gina PAN Jinghong; TAN Joo Huang; Rahul KUMAR

Martin WASSER

LIVE CELL IMAGING AND AUTOMATION OF IMAGE ANALYSIS

The group is interested in studying animal development using 3D time-lapse microscopy and computer vision. Their principal goal is to build image analysis systems that can recognize tissues, cells and organelles in multi-dimensional image data and measure their static and dynamic properties. The major research activities are directed at constructing the components of a computational pipeline and integrating them into semi-automated image analysis systems. Computational pipelines cover preprocessing, segmentation, feature extraction, classification and cell tracking. Currently, the efforts are directed at the phenotypic characterization of two biological processes in the model system Drosophila melanogaster: (1) cell cycle progression of embryonic cells and (2) apoptosis and remodeling of muscle cells during metamorphosis.

In 2009, BII acquired a Zeiss 5 Live high-speed confocal laser scanning microscope (Figure 1). This instrument will enhance the group’s ability to produce images of live cells in sufficient quality and quantity to support algorithm development as well as biological discovery.

Quantitative Microscopy of Cell Cycle Progression in Drosophila Embryogenesis

The study of cells in their natural tissue environment promises to uncover novel insights into the mechanics and regulation of cell proliferation that cannot be easily gained from observations of cultured cells in Petri dishes. Drosophila embryos are a powerful system in which the dynamics of synchronized nuclear as well as non-synchronized cell division can be easily monitored by tagging chromosomes with fluorescent fusion proteins. 3D movies acquired by time-lapse microscopy are not only pretty to look at but also provide a rich source of quantitative cellular features, such as DNA content, which, combined with derived categorical features such as the cell cycle phase, will be useful in characterizing the function of known and unknown genes. However, the task of analyzing colossal amounts of multi-dimensional image data is not trivial. To address this challenge, we have developed a collection of tools for image segmentation, feature extraction, tracking, classification, visualization, annotation, validation of computer vision algorithms and file conversion. Image measurements rely heavily on the accuracy of the chosen segmentation algorithm. We have developed a fast 3D nuclear segmentation method that adapts to inhomogeneous signal intensities, poor signal-to-noise ratios and histone-GFP localized to the cytoplasm (Figure 2). To improve the performance of computer vision methods and to support biological interpretation, we apply machine learning for cell cycle phase classification (Figure 3). To test our approach in the phenotypic characterization of gene function, we applied our image analysis pipeline to 3D live cell movies of diploid wildtype and haploid mutant embryos. Our analysis has provided new insights into the function of the maternal haploid gene and the control of the size of the nucleus.

Tissue Destruction and Remodeling in Metamorphosis

The second biological theme is the destruction and remodeling of tissues during metamorphosis. The group focuses on the muscular system and uses fluorescence live cell imaging to study the apoptosis of obsolete larval muscles and the remodeling of persistent larval muscles into adult muscles. The structural organization of muscles is accompanied by initially decreasing and later increasing thickness of the muscle fiber. Therefore, the study of muscle remodeling dynamics in flies might evolve into an animal model for muscle atrophy and hypertrophy. A challenge in studying development by 3D time-lapse microscopy is that rapid tissue movements can affect visualization and quantitative analysis. To overcome this problem, a non-rigid stack registration method was developed. In an Editorial of Cytometry A (75A:279-281, 2009), this study was highlighted as “a true masterpiece of cell analysis”.

Recent Publications

1. Ong SM, Zhao Z, Arooz T, Zhao D, Zhang S, Du T, Wasser M, van Noort D, Yu H (2009). Engineering a scaffold-free 3D tumor model for in vitro drug penetration studies. Biomaterials, [Epub ahead of print].

2. Rambabu Chinta, Wee Choo Puah, Martin Wasser (2009). 3D segmentation for the study of Cell Cycle Progression in Live Drosophila Embryos. International Conference on Biomedical Electronics and Devices, First International Workshop on Medical Image Analysis and Description for Diagnosis Systems - MIAD 2009, Porto, Portugal, 14-17 Jan 2009.

3. Du Tiehua and Wasser Martin (2009). 3D Image Stack Reconstruction in Live Cell Microscopy of Drosophila Muscles and its Validation. Cytometry A. 2009 Apr, 75(4): 329-43.

4. Wasser Martin, Zalina Bte Osman, Chia, William (2007). “EAST and Chromator control Muscle Destruction and Remodeling in Drosophila Metamorphosis”. Developmental Biology, Vol. 2, 380-393.

5. Wasser, Martin and Chia, William (2007). The extrachromosomal EAST Protein of Drosophila can associate with Polytene Chromosomes and regulate gene expression. PLoS ONE 2: e412.

Figure 1: BII’s Zeiss 5 Live confocal microscope is used for multi-dimensional imaging of live cells.

Figure 2: Automated 3D nuclear segmentation method.

Figure 3: Machine learning techniques are used for automatic phenotypic classification. Here cell cycle phases are assigned to segmented nuclei.

IT SCIENTIFIC SERVICES

BIO-COMPUTING CENTRE

Team Leader: YONG Tai Pang

Team Members: Caleb KHOR Ken Swee, Zahari Jeffrey, Johnny LIM Gek Wee, CHAN Ang Loon, Charlie TAN Chee Khiong, TOE Chin Siang, HARRON Hanafi and Violet LIN Liling.

The IT Services department provides all technical IT services for the Institute’s research and administrative needs. These services include web services and scientific, storage and networking infrastructure support, as detailed below:

Scientific Computing Team

The role of the Scientific Computing team is to provide technical support and expertise in the areas of compute, storage and networking, in a manner which suits the unique needs and requirements of the various BII scientific divisions. The team’s main focus is on providing highly customized IT resources on demand, at short notice, while seeking innovative and elegant architectures and solutions. Our areas of specialization include:

a. High throughput Linux clusters
b. Distributed, general purpose file systems
c. High volume IP networks for large data transfers
d. Data backup, replication and archival
e. Design, implementation and operation of corporate services

Enterprise Computing System Team

The Enterprise Computing System Team provides infrastructural support in the areas of desktop computing. Bearing in mind the unique needs and work culture of scientists, the team has provided commodity as well as custom built high-end workstations, to meet the technical needs of our users.

The team also provides ad-hoc server-side compute facilities on demand, on a project by project basis. For example, a group of scientists may require an entry-level server to host services including FTP, MySQL, Apache, or SVN. Thus the team will design, deploy and maintain the necessary server hardware, storage and networking based on the specific requirements laid down by our users.

Web Services Team

The Web Services team is mainly responsible for the design and maintenance of the BII home pages on the Internet and our Intranet web sites. They are also in charge of developing, upgrading and maintaining BII e-Services such as the room-booking system and the e-Calendar.

Other services provided by the Web Services team include graphic and multimedia design, web site design and layout, scanning and photography services, web publishing support and web and database hosting. It is important to ensure that the electronic information published by BII is of high accuracy and quality, as the BII home page and its associated web pages are the first point of contact between the institute and the public. Hence, the quality of the content published is fundamental in upholding the strong reputation and image of the Institute. The team ensures that the information published electronically is accurate, visually appealing, clearly presented and compliant with the Singapore Government Web Interface standards set by IDA.


Two major projects focused on increasing the scientific computing resources of BII were completed in 2009.

1. Construction of new BII Data Centre

In 2009, BII set up its own data center to house its scientific computing facilities and corporate servers. This data center at Matrix has a total capacity of 33 racks, comprising high (7 kW) and medium (3 kW) density racks and network racks. Additionally, 6 racks at 2 kW/rack were catered for at the new BII development room.

Considerable redundancy (N+1) was installed for the equipment serving the data center where possible, including the UPS (uninterruptible power supply) and CRAC (computer room air conditioning) units, which are critical to the operation of the data center.

2. Computational Clusters

BII currently owns and operates 2 clusters to meet its needs:

a. Annotator Cluster

This consists of 24 compute nodes, 2 job schedulers and a pair of high-end production and development servers. The cluster was put together and tuned specifically to meet the unique workflow characteristics of the Annotator project.

b. Cluster for Molecular Dynamic Simulations and Generic Computation

This consists of 46 compute nodes, each fitted with 8 CPUs (3 GHz) and 32 GB of memory, sharing 14 TB of storage capacity. 24 nodes are on InfiniBand interconnect while the rest are on Gigabit Ethernet.

There are plans to expand the cluster by a batch of latest generation (Nehalem) compute nodes in 2010.
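The aggregate capacity of cluster (b) follows directly from the per-node figures quoted above. The short calculation below is only a convenience sketch; the variable names are hypothetical and the totals are derived, not separately reported.

```python
# Per-node figures as stated for the molecular dynamics / generic cluster.
nodes = 46
cpus_per_node = 8
memory_gb_per_node = 32
infiniband_nodes = 24

total_cpus = nodes * cpus_per_node              # CPUs across the cluster
total_memory_gb = nodes * memory_gb_per_node    # aggregate memory in GB
ethernet_nodes = nodes - infiniband_nodes       # nodes on Gigabit Ethernet

print(total_cpus, total_memory_gb, ethernet_nodes)  # 368 1472 22
```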

2. Screensaver

This is a collaboration project with the Institute of Molecular and Cell Biology, A*STAR (IMCB), Harvard Medical School (HMS) and The Netherlands Cancer Institute (NKI). The objective is to assist our collaborator from IMCB to set up and customize Screensaver, an open source LIMS for high-throughput screening developed by HMS. Work completed to date includes customizing the Screensaver user interface, tailoring the upload/export files to the IMCB format, and integrating analysis tools, e.g. Annotator and cellHTS2, into Screensaver. The collaboration with HMS and NKI involves code development, particularly in integrating Screensaver with cellHTS2. The team is also working together with the Annotator Group on integrating Screensaver and the Annotator. A local production version of Screensaver is currently hosted at BII.

CHEOK Leong Poh Photo by Vivek Tanavde, BII YONG Tai Pang Photo by Vivek Tanavde, BII

SOFTWARE ENGINEERING

Team Leader: CHEOK Leong Poh

Team Members: LUA Seow Chin, Mohamed HANIFA, NG Wee Thong, VOON Kian Loon

The software engineering team is made up of software engineers who work in close collaboration with scientists from BII and A*STAR’s research institutes to address their needs in scientific software solutions. At present, the team has a number of collaboration projects with BII’s scientific groups, as well as with other A*STAR research institutes. The majority of the projects undertaken by the software engineering team are Java web-based applications backed by relational databases, and they frequently build on other open source projects. The team’s current focus is on Laboratory Information Management System (LIMS) related software projects, with Genopolis/BASE and Screensaver as prominent examples.

Roles and Responsibilities:

• Translate user requirements into design and technical specifications, develop and deliver software solutions to meet scientific objectives.

• Provide expertise in software engineering to assist research groups in BII and other institutes.

• Work closely with BII’s other IT teams to leverage their IT infrastructure and services to provide comprehensive IT solutions to the collaborators.

• Bridge BII with other A*STAR research institutes through scientific software collaborations.

Project Highlights

1. Genopolis/BASE

This is a collaboration project with the Singapore Immunology Network, A*STAR (SIgN). The objective is to assist SIgN in replacing its existing LIMS software, Genopolis, with BASE, which can support both Affymetrix and Illumina platforms. BASE is an open source web-based database solution for microarray experiments supported by Lund University. Initial development work completed includes helping SIgN to set up BASE, migrating data from Genopolis to BASE, and re-implementing Genopolis analysis features in BASE. The current phase focuses on simplifying user data entry, storing additional annotation information, and improving the user interface and usability. A local production version of BASE was launched in April 2009 and is currently hosted at BII.

3. BioImage Informatics on the WWW

This is an internal collaboration project with the Computer Vision and Pattern Discovery Group. The objective is to create a web-based application that integrates several image processing algorithms. The application consists of a Java applet as the web front-end, which allows users to upload and display images; the uploaded images are then sent to the server for server-side image processing. The software is currently in development.


Prof. Roger Beuerman’s team at the Singapore Eye Research Institute (SERI) has, together with Dr. Chandra Verma’s group at BII, developed novel antimicrobials against some of the most resistant forms of bacteria and fungi. The combination of ocular chemo-molecular abilities with the computational design efforts of Dr. Verma’s group has been very productive. Over the last 5 years, they have successfully developed unique molecules which formed the basis of two patents. They have received around SGD$3M in grants from the Singapore government in support and have generated considerable commercial interest. Most recently, they have designed a molecule that has shown spectacular activity against bacteria from patients that have shown resistance to commonly used antibiotics. This is opening up a new frontier in efforts to tackle the growing problem of bacterial resistance, both in the clinic and, worryingly, more recently outside the clinic. A promising anti-fungal agent has also recently been identified. The problem of resistance requires the urgent development of new antibiotics to prevent it from assuming epidemic proportions, and the teams of Prof. Beuerman, Dr. Verma and associates at Nanyang Technological University have formed a synergistic collaboration for this purpose.

Dr. Nathan Andrew Baker, Associate Professor, Dept. of Biochemistry and Molecular Biophysics, Center for Computational Biology, Washington University, St. Louis, USA. Visit Period: 6 - 8 April 2009

Dr. Gary McMaster, Chief Scientific Officer, Affymetrix, Inc., Fremont, California, USA. Visit Period: 23 April 2009

Prof. Patrice Koehl, Professor, Computer Science; Associate Director of Bioinformatics, Genome Center, University of California, USA. Visit Period: 18 May 2009

Dr. Marc A. Marti-Renom, Head of the Structural Genomics Unit, Bioinformatics & Genomics Department, Prince Felipe Research Center, Valencia, Spain. Visit Period: 12 - 17 July 2009

Prof. Philippe Derreumaux, Director, UPR9080 CNRS, IBPC at CNRS; Professor at University Paris Diderot - Paris 7, Paris, France. Visit Period: 22 July 2009

Dr. M. Michael Gromiha, Senior Research Scientist, Computational Biology Research Center, National Institute of Advanced Industrial Science and Technology (AIST), Tokyo, Japan. Visit Period: 4 - 5 October 2009

Prof. Alexander Lyubartsev, Professor, Division of Physical Chemistry, Stockholm University, Sweden. Visit Period: 22 October 2009

VISITING SCIENTISTS FOR FY2009

ADJUNCT SCIENTISTS AT BII

A/Prof. Gerhard Grüber has longstanding experience in the structure-function analysis of multi-subunit complexes such as the classes of ATP synthases (A1AO ATP synthase, F1FO ATP synthases) and hydrolases (V-ATPase, Helicase and AAA-ATPases). To gain insight into the structure of these macromolecular complexes, techniques such as solution X-ray scattering, X-ray crystallography, and NMR and fluorescence spectroscopy are used in his laboratory. In a collaborative project with Dr. Frank Eisenhaber (BII, A*STAR), the 45 kDa subunit PIG-K of the glycosylphosphatidylinositol transamidase complex was generated and purified, and the first low resolution solution structure of this protein was determined. Over two successful years of collaboration with Dr. Chandra Verma (BII, A*STAR), a platform has been established to integrate the experimental structural data into docking and molecular dynamics simulations in order to provide atomic level insight into the structure, dynamics and energetics of the coupling subunits in the biological motor proteins.

A/Prof. Gerhard Grüber, Associate Professor and Deputy Head, Division of Structural and Computational Biology, School of Biological Sciences, Nanyang Technological University

Dr. Lim Yoon Pin’s laboratory is interested in the discovery of 1) novel oncogenes in breast cancer; 2) novel tyrosine kinase substrates in oncogenic EGFR signaling and 3) biomarkers in gastric cancer. He has served as an advisor for the gastric cancer knowledgebase created by BII. In collaboration with BII, an online interactive biological interaction network (BIN) of EGFR signaling has also been generated and this is hosted within BII’s webpage. The BIN, which is an important resource for researchers in the field of EGFR research, is constantly being updated as new data are being produced in Dr. Lim’s laboratory.

Dr. Lim Yoon Pin, Senior Scientist, Cancer Science Institute of Singapore; Assistant Professor, Department of Biological Sciences, National University of Singapore

Dr. Birgit Eisenhaber, Research Scientist, Mass Spectrometry Group, Experimental Therapeutics Centre, A*STAR

Prof. Roger Beuerman, Senior Scientific Director, Singapore Eye Research Institute; Professor, SRP in Neuroscience and Behavioral Disorders, Duke-NUS Graduate Medical School

With a strong background in protein sequence analysis and function prediction, Dr. Birgit Eisenhaber, currently affiliated with the Experimental Therapeutics Centre, A*STAR (ETC), provides her expertise in collaboration projects within ETC and with other A*STAR units in biomolecular mechanism-focused research. On the one hand, the link with BII allows her to leverage the bioinformatics infrastructure, especially the ANNOTATOR suite, in her research; on the other hand, BII benefits from methodical developments and from her supervision of interns and new incoming staff.

Dr. Kristian Vlahovicek, Head of the Division of Biology Bioinformatics Group, Department of Molecular Biology, Division of Biology, Faculty of Science, University of Zagreb, Croatia. Visit Period: 22 October 2009

Prof. Constantino Tsallis, Professor, Brazilian Center for Physics Research, Brazil. Visit Period: 14 - 22 November 2009

Prof. Frederic Rousseau, Group Leader, Flanders Institute for Biotechnology (VIB), Free University of Brussels, Belgium. Visit Period: 13 - 18 December 2009

Prof. Joost Schymkowitz, Group Leader, Flanders Institute for Biotechnology (VIB), Free University of Brussels, Belgium. Visit Period: 13 - 18 December 2009

Dr. Remo Rohs, Associate Research Scientist, Columbia University, New York, USA. Visit Period: 21 December 2009

Dr. Vasily V. Kuvichkin, Senior Staff Scientist, Laboratory of Mechanisms of Reception, Institute of Cell Biophysics of the Russian Academy of Sciences, Moscow, Russia. Visit Period: 15 February - 8 March 2010

SCIENCE OUTREACH ACTIVITIES

BII’s contributions go beyond scientific development: the institute helped organize two major events that were highlights of Science.09. This annual national event promoting scientific awareness to the general public is jointly organized by the Agency for Science, Technology and Research (A*STAR) and the Singapore Science Centre.

Biopolis Flu Forum
4 September 2009
Auditorium, Matrix @ Biopolis

The Science Centre and the A*STAR Research Institutes took part in the Biopolis Flu Forum as part of Science.09. At the forum, a panel of scientists and doctors gathered to discuss, debate and answer the audience’s general and controversial questions concerning the 2009 swine flu outbreak. A wide range of questions, from “what sort of steps are public health officials taking to safeguard the public?” to “have news organizations played a responsible role in educating and explaining events to the public?”, were addressed during the forum.

Photo by Tobias Gattermayer, BII

Photos by Fernanda Sirota and Christine Low, BII

The Bioinformatics Institute organised and participated in international conferences and symposiums and conducted training workshops, as follows:

Conferences and Symposiums

• International Conference on Bioinformatics 2009 (InCoB 2009), 7 – 11 September 2009

• UK-Singapore Partners in Science Symposium: UK-Singapore Symposium on Current Strategies in Antimicrobial Therapies, 16 – 17 March 2009. Organised by the BII, A*STAR, the Singapore Eye Research Institute (SERI) and the British High Commission Singapore

• School of Biological Sciences, Nanyang Technological University (SBS)-Bioinformatics Institute Joint Symposium, 21 October 2009. Co-organised by SBS, NTU and BII, A*STAR

• UK-Singapore Symposium on p53: The Next 30 Years, 25 – 26 November 2009. Organised by the A*STAR, BII and British High Commission Singapore

• 1st Singapore-Italy Joint Symposium on Biomedical Sciences, 10 – 11 December 2009. Organised by the A*STAR BMRC RIs with support from Regione Lombardia

CONFERENCES AND VISITS

UK-Singapore Partners in Science Symposium: UK-Singapore Symposium on Current Strategies in Antimicrobial Therapies, 16 – 17 March 2009. Photo by Vivek Tanavde, BII

UK-Singapore Symposium on p53: The Next 30 Years, 25 – 26 November 2009. Photo by British High Commission Singapore

Training Workshops

• The Advanced Flow Cytometry Workshop, 23 – 24 July 2009. Organised by Biopolis Shared Facilities, A*STAR and BII, A*STAR

• Joint BII - Department of Biological Sciences, National University of Singapore (DBS) Workshop - Modern Approaches to Biological Problems, 3 – 4 August 2009. Organised by DBS, NUS and BII, A*STAR

• Joint School of Computer Engineering, Nanyang Technological University (SCE)-BII Workshop on Bioinformatics and Computational Biology, 28 September 2009. Organised by SCE, NTU and BII, A*STAR

Visits to Bioinformatics Institute

The Bioinformatics Institute has also hosted a number of delegations, as follows:

• The Hungarian delegation from the 2nd Singapore-Hungarian Symposium on Biomedical Devices and Computational Sciences, 24 April 2009

• The NUS-Zhejiang delegation, 6 July 2009

• The Max Planck delegation, 29 July 2009

• The Italian delegation from the 1st Singapore-Italy Joint Symposium on Biomedical Sciences, 11 December 2009

Visit by the Italian Delegation on 11 December 2009. Photo by Vivek Tanavde, BII

X-periment! Science Carnival
14 – 16 August 2009
Marina Square Central Atrium

BII’s participation in the X-periment! science carnival allowed the audience to explore “The World of Proteins”, from their building blocks, the amino acids, to their complex structures and evolutionary relationships. Visitors were able to build an amino acid and fold a small protein motif in the palm of their hands. In addition, they interacted with different protein molecules with the help of a game controller and saw how humans can be related to flies by playing the evolutionary protein sequence game.

Acknowledgements

Fernanda Sirota played a key role as BII’s representative in both events. She was a member of the Biopolis Flu Forum working committee and was assisted by Tobias Gattermayer, Sebastian Maurer-Stroh and BII’s web team in some aspects.

For the X-periment! Science Carnival, Fernanda invigorated the carnival with many innovative ideas. The success of the 3-day weekend event was achieved with valuable participation and contributions from colleagues including Aliaksandr Yarmishyn, Devanathan Raghunathan, Ivana Mihalek, Janos Kriston-Vizi, Lee Tze Chuen, Lua Wai Heng, Madhumalar Arumugam, Oleg Grinchuk, Rowena Cheong, Sebastian Maurer-Stroh, Zhang Zong Hong, Charlie Tan, Violet Lin and Hanafi Harron, among others, whose team efforts made the event exciting, fun and educational.


The Administrative Team supports the institute’s leadership in creating the conditions for scientific work at BII. It also serves as a link to the BMSI Business Centre (BBC), the centralized administrative body of A*STAR’s biomedical science institutes. Within this setting, the Administrative Team provides BII scientists with auxiliary services such as administration, procurement, finance and human resource management, so that they can concentrate on their research.

BII LOCATION

ADMINISTRATIVE TEAM


Map courtesy of JTC Corporation

(Left to Right) Noraini SULAIMAN, Christine LOW, Betty KEE, FONG Chew Peng

Photo by Vivek Tanavde, BII

RECREATION CLUB

Christmas Celebration with APSN Centre for Adults

To play our part in giving back to the community, we visited the Association for Persons with Special Needs (APSN) Centre for Adults, a voluntary welfare organization that caters to people with mild intellectual disability. The centre provides skills training to people who are intellectually challenged (IQ 50-70), so that its students can live independent and fulfilling lives in society. Members of BII brought cheer to the students by celebrating Christmas with them through games, songs and gifts.

Address

Bioinformatics Institute
30 Biopolis Street
#07-01, Matrix
Singapore 138671

By Car

Visitors who drive should park at B3 (basement 3) and follow the signage “To Matrix Lift Lobby” to locate lift D. Take lift D to level 1 and approach the receptionist for a visitor’s pass.

By Bus

The following Singapore Bus Service numbers stop along North Buona Vista Road: 74, 91, 92, 95, 191, 196, 198, 200

By MRT

Board the East-West line heading towards Boon Lay and alight at Buona Vista MRT Station. From there, you may take the free one-north shuttle bus service to Biopolis, which operates from 7:30 am to 7:30 pm, Monday to Friday, and from 7:30 am to 1:30 pm on Saturday.

The Bioinformatics Institute Recreation Club (BII Rec Club) is a voluntary group of staff from various divisions who come together to organize activities that foster cohesiveness and make working in our institute fun. We organize team bonding and social events where staff can interact with one another and build greater rapport in the workplace. Because our work community consists of people of different nationalities, we also seek to promote appreciation and understanding of various cultures through a “Festivals of the World” link on our intranet, where staff and students can share the festivals celebrated in their countries.

Highlights of Events:

Chek Jawa, Pulau Ubin

A trip to the last kampong (village) of Singapore! It was a great getaway from the hustle and bustle of the main island of Singapore to lush nature, fresh air and tranquility. We took a tour of Chek Jawa, where several ecosystems, with plants and animals that are fast disappearing from other parts of the world, can be seen. Besides the nature walk, we played group games, which fostered interaction and team bonding among BII staff. We also organized a photography contest to capture the happenings during the trip. Photography enthusiasts shared their best works in our online Best Photo and Best Caption voting contest, in which every BII staff member took part.

BII Movie Screening – Kung Fu Panda

We brought home movie screening to the workplace, topped with great snacks and, most importantly, wonderful working partners. It was simply a time of relaxation as we enjoyed a good laugh over an entertaining movie.

All set to go

The show is on... ssshhh

Best Caption - “If you really love your bike, you push it”

Group photo - “Say Cheese”

Creative Artwork by Talented Members of APSN

Handicrafts for Sale

Future Plans

The BII Rec Club seeks to make BII a fun place to work. We hope that the regular activities we organize provide an avenue for members of BII to mingle with one another, strengthening working relationships and fostering deeper friendships.

Committee Members 2009/2010

The BII Rec Club is led by its Chairlady, Betty Tan, and the committee comprises Janos Kriston-Vizi, Kavitha Bharatnam, Fala Atkha, Aliaksandr Yarmishyn, Mohamed Hanifa, Zack Toh, Piroon Jenjaroenpun, Lua Wai Heng and Vachiranee Limvipavadh.

Image Source for Back Cover Design: Computer Model of MDM2 Interacting with Inhibitors by Chandra Verma and Team
