UniView

Post on 10-May-2015


Web-based application to survey properties of homologous proteins.

Candidate: Diego Poggioli

Supervisor: Prof. Rita Casadio

Co-supervisor: Dr. Brigitte Boeckmann

• Bio-problem: visualizing and interacting with biological data, and performing a comparative protein analysis

• Info-solution: Web application – CGI

The portal gives access to four web pages: 1) function-related annotation derived from UniProtKB/Swiss-Prot; 2) features of the protein group; 3) conservation scores; 4) tree.

Members of a protein family normally share a general biochemical function, but one or more subgroups may evolve a slightly different function, such as a different substrate specificity.

By comparing groups and subgroups of proteins it is possible to identify or estimate:

• similarities and differences between the protein sequences, as well as the information available for the given protein group;

• the ranges within which functional information can be transferred from experimentally characterized proteins to their homologs from poorly studied organisms;

• errors in the annotation of proteins.

Visualization of and interaction with biological data

HTML, JavaScript, PHP, Perl, Python, Ajax, ASP, Ruby…

CGI, PHP

System and browser independent

Dynamic page

Available from any PC

P02701

P56732

P56734

O13153

P56733

P56735

P56736

AVID_CHICK

AVR2_CHICK

AVR4_CHICK

AVR1_CHICK

AVR3_CHICK

AVR6_CHICK

AVR7_CHICK

ID AVID_CHICK Reviewed; 152 AA.

AC P02701; Q91958; Q98SH4;

DT 21-JUL-1986, integrated into

DT 11-SEP-2007, sequence version 3.

DT 10-JUN-2008, entry version 87.

DE Avidin precursor.

GN Name=AVD;

OS Gallus gallus (Chicken).

OC Eukaryota; Metazoa; Chordata

OC Archosauria; Dinosauria

OC Neognathae; Galliformes

OX NCBI_TaxID=9031;

RN [1]

RP NUCLEOTIDE SEQUENCE [MRNA].

RX MEDLINE=87203384; PubMed

RA Gope M.L., Keinaenen R.A.,

RA Zarucki-Schulz T., O'Malley B.W.,

RT "Molecular cloning of the chicken

RL Nucleic Acids Res. 15:3595

RN [2]

RP NUCLEOTIDE SEQUENCE [MRNA].

RX MEDLINE=90355928; PubMed

RA Chandra G., Gray J.G.;

RT "Cloning and expression of

RL Methods Enzymol. 184:70
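The entry above uses the UniProtKB flat-file layout, where each line starts with a two-letter line code (ID, AC, DE, OS, ...). As a minimal sketch of how such an entry could be read, the Python below groups lines by their code. The function names and the sample text are illustrative assumptions, not the application's own CGI code.

```python
from collections import defaultdict


def group_by_line_code(flat_text):
    """Group flat-file lines by their two-letter line code (ID, AC, DE, ...)."""
    fields = defaultdict(list)
    for line in flat_text.splitlines():
        # Skip blank lines and the '//' entry terminator.
        if len(line) < 3 or line.startswith("//"):
            continue
        code, value = line[:2], line[2:].strip()
        fields[code].append(value)
    return dict(fields)


def accessions(fields):
    """Return all accession numbers listed on the AC lines."""
    result = []
    for value in fields.get("AC", []):
        result.extend(a.strip() for a in value.split(";") if a.strip())
    return result


# Sample built from the lines shown on the slide (illustrative only).
sample = """ID   AVID_CHICK     Reviewed;   152 AA.
AC   P02701; Q91958; Q98SH4;
DE   Avidin precursor.
GN   Name=AVD;
OS   Gallus gallus (Chicken).
OX   NCBI_TaxID=9031;
"""

fields = group_by_line_code(sample)
entry_name = fields["ID"][0].split()[0]
print(entry_name, accessions(fields))
```

Keeping every line code as a list makes it easy to handle codes that repeat, such as the DT and OC lines.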

Form filling and data type

BioView

• Overview of biological information

• Taxonomic descriptive statistics

A compact summary view of the biological information of a protein group is important, especially for a large dataset. It makes it possible to observe, compare and count all common and dissimilar characteristics, and to analyze in detail every member that shares a given feature.

- gene name, functional (catalytic activity, enzyme regulation, pathway…) and general descriptive information;

- organism classification (OC) and organism species (OS);

- non-experimental qualifiers (by similarity, putative or probable).

ID, AC, DE, CC: 'FUNCTION', 'PATHWAY', 'CATALYTIC ACTIVITY', 'ENZYME REGULATION', 'SUBUNIT', 'SIMILARITY', 'COFACTOR', 'DEVELOPMENTAL STAGE', 'INDUCTION', 'PTM', 'SUBCELLULAR LOCALIZATION', 'TISSUE SPECIFICITY'

OS, OC

Taxon            Parent taxon
Eukaryota        -
Viridiplantae    Eukaryota
Streptophyta     Viridiplantae
Embryophyta      Streptophyta
Tracheophyta     Embryophyta
...              ...
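The taxon/parent pairs above can be derived from the OC lineage of each entry. The following Python is a minimal sketch under that assumption. The function names and the second, shorter lineage are made up for illustration, and the real BioView pipeline may build its hierarchy differently.

```python
def parent_table(lineages):
    """Map each taxon to its parent; the root gets '-' as on the slide."""
    parents = {}
    for lineage in lineages:
        previous = "-"
        for taxon in lineage:
            # Keep the first parent seen for a taxon.
            parents.setdefault(taxon, previous)
            previous = taxon
    return parents


def entries_per_taxon(lineages):
    """Count how many entries fall under each taxon of the hierarchy."""
    counts = {}
    for lineage in lineages:
        for taxon in set(lineage):
            counts[taxon] = counts.get(taxon, 0) + 1
    return counts


# One lineage per entry, as it would appear on the OC lines (illustrative data).
lineages = [
    ["Eukaryota", "Viridiplantae", "Streptophyta", "Embryophyta", "Tracheophyta"],
    ["Eukaryota", "Viridiplantae", "Streptophyta"],
]

parents = parent_table(lineages)
counts = entries_per_taxon(lineages)
print(parents["Tracheophyta"], counts["Eukaryota"])
```

The per-taxon counts are the kind of number a collapsible hierarchy can show next to each node.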

Pipeline BioView page

Number of entries

Non-redundant annotation

Number of entries with non-experimental qualifier

Number of entries with annotated experimental qualifier

Expand the whole hierarchy

On mouse-click the relevant entry names are listed

FeatureView

• Interactive interface for visualizing function-related features on the protein sequence and 3D structure

• This page should allow the user to analyze sequence and structure together on a broad set of data, showing as much of the available information as possible in a clear and intuitive way.

Function-related features derived from the FT lines of UniProtKB (active sites, binding sites, domains, transmembrane regions, DNA-binding domains…) are mapped onto the alignment and highlighted to give a clear and compact presentation of the relevant information. The features are mapped onto the structure in the same way, which makes it possible to identify conserved regions and sites.

Sequence → FT → Structure

FeatureView

• Choose the best structure

• Alignment

• Mapping the feature on the alignment and on the structure

F.P.A. David and Y.L. Yip. SSMap*: a new UniProt-PDB mapping resource for the curation of structural-related information in the UniProt/Swiss-Prot Knowledgebase. Submitted.

...

'91 ' => '91',
'25 ' => '25',
'92 ' => '92',
'81 ' => '82',
'71 ' => '71',
'21 ' => '23',
'-' => 'x',
'61 ' => '61',
'37 ' => '37',
'68 ' => '68',
'50 ' => '50',
'18 ' => '15',

...
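The excerpt above pairs residue numbers in one numbering scheme with residue numbers in another. As a hedged sketch of how such a table could be used to carry feature positions from the sequence to the structure, the Python below cleans the keys and looks positions up. Which side is the UniProt numbering and which is the PDB numbering is an assumption here, and the helper names are hypothetical.

```python
# A few pairs copied from the excerpt; keys carry a trailing space there.
raw_map = {
    "91 ": "91",
    "25 ": "25",
    "81 ": "82",
    "21 ": "23",
    "-": "x",
    "18 ": "15",
}


def normalise(raw):
    """Keep only numeric pairs and convert them to integers."""
    clean = {}
    for key, value in raw.items():
        key = key.strip()
        if key.isdigit() and value.isdigit():
            clean[int(key)] = int(value)
    return clean


def translate(positions, mapping):
    """Pair each position with its mapped number, or None when unmapped."""
    return [(p, mapping.get(p)) for p in positions]


mapping = normalise(raw_map)
print(translate([81, 21, 99], mapping))
```

Positions without a counterpart come back as None, so the caller can decide whether to skip them or flag them.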

Choose the best structure

Jmol: an open-source Java viewer for chemical structures in 3D. http://www.jmol.org/

FeatureView

• Choose the best structure

• Alignment

• Mapping the feature on the alignment and on the structure

Edgar, Robert C. (2004), MUSCLE: multiple sequence alignment with high accuracy and high throughput, Nucleic Acids Research 32(5), 1792-97.

Input file

Alignment

FeatureView

• Choose the best structure

• Alignment

• Mapping the feature on the alignment and on the structure

I group: ('CA_BIND', 'NP_BIND', 'MOTIF', 'ACT_SITE', 'METAL', 'BINDING', 'SITE', 'NON_STD', 'MOD_RES', 'LIPID', 'CARBOHYD', 'DISULFID', 'CROSSLINK');

II group: ('PEPTIDE', 'TOPO_DOM', 'TRANSMEM', 'DOMAIN', 'REPEAT', 'ZN_FING', 'DNA_BIND', 'REGION', 'COILED');

Input file

Alignment

FT (Feature Table) lines

I group: ('CA_BIND', 'NP_BIND', 'MOTIF', 'ACT_SITE', 'METAL', 'BINDING', 'SITE', 'NON_STD', 'MOD_RES', 'LIPID', 'CARBOHYD', 'DISULFID', 'CROSSLINK'): shown in a distinct font color, with a toolbox containing the description of the feature (entry name, feature key, sequence position, description).

II group: ('PEPTIDE', 'TOPO_DOM', 'TRANSMEM', 'DOMAIN', 'REPEAT', 'ZN_FING', 'DNA_BIND', 'REGION', 'COILED'): shown with a different background colour and a toolbox with the content described above.

- Features overlapping within the first group are represented in the toolbox.

- Features overlapping within the second group get a different background color.
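Mapping a feature onto the alignment means converting its sequence positions into alignment columns, because gaps shift the residues. The Python below is a minimal sketch of that step together with the two display groups. The style labels "font" and "background", the function names and the toy sequence are assumptions for illustration, not the application's own code.

```python
GROUP_I = {"CA_BIND", "NP_BIND", "MOTIF", "ACT_SITE", "METAL", "BINDING", "SITE",
           "NON_STD", "MOD_RES", "LIPID", "CARBOHYD", "DISULFID", "CROSSLINK"}
GROUP_II = {"PEPTIDE", "TOPO_DOM", "TRANSMEM", "DOMAIN", "REPEAT", "ZN_FING",
            "DNA_BIND", "REGION", "COILED"}


def column_index(aligned_seq):
    """Map 1-based residue positions to 0-based alignment columns."""
    columns = {}
    position = 0
    for col, symbol in enumerate(aligned_seq):
        if symbol != "-":  # gaps do not consume a residue position
            position += 1
            columns[position] = col
    return columns


def style_columns(aligned_seq, features):
    """Return {alignment column: set of styles} for (key, start, end) features."""
    columns = column_index(aligned_seq)
    styles = {}
    for key, start, end in features:
        if key in GROUP_I:
            style = "font"
        elif key in GROUP_II:
            style = "background"
        else:
            continue  # feature keys outside both groups are not displayed
        for pos in range(start, end + 1):
            if pos in columns:
                styles.setdefault(columns[pos], set()).add(style)
    return styles


# Toy aligned sequence and features (illustrative only).
styles = style_columns("--MK-LV", [("ACT_SITE", 2, 2), ("DOMAIN", 3, 4)])
print(styles)
```

Collecting the styles in a set per column keeps overlapping features visible, so a column can carry both a font color and a background color.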

ATOM 1817 N MET B 3 -31.380 87.126 39.296 1.0 100.00

ATOM 1818 CA MET B 3 -30.684 88.400 39.176 1.0 100.00

ATOM 1819 C MET B 3 -30.858 88.967 37.771 1.0 100.00

ATOM 1820 O MET B 3 -30.195 88.514 36.832 1.0 100.00

ATOM 1821 CB MET B 3 -29.190 88.285 39.498 1.0 100.00

ATOM 1822 CG MET B 3 -28.465 89.628 39.501 1.0 100.00

ATOM 1823 SD MET B 3 -26.671 89.415 39.661 1.0 100.00

ATOM 1824 CE MET B 3 -26.312 90.705 40.863 1.0 100.00

ATOM 1825 N GLU B 4 -31.750 89.938 37.638 1.0 50.00

ATOM 1826 CA GLU B 4 -31.927 90.498 36.300 1.0 50.00

… … … … … … … … … … …

50.00

100.00

00.00

Alignment position
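The ATOM records above carry values such as 50.00 and 100.00 in their last column. Assuming these are per-residue values written into the temperature-factor field so that a viewer such as Jmol can color the structure by them (an assumption; the slides do not spell this out), the step could be sketched in Python as follows. The slide's records appear whitespace-collapsed, so the sample line is rebuilt in the fixed-column PDB layout, and all function names are hypothetical.

```python
def set_bfactor(atom_line, value):
    """Overwrite the temperature-factor field (PDB columns 61-66)."""
    padded = atom_line.rstrip("\n").ljust(80)
    return padded[:60] + f"{value:6.2f}" + padded[66:]


def residue_number(atom_line):
    """Read the residue sequence number (PDB columns 23-26)."""
    return int(atom_line[22:26])


def recolour(pdb_lines, value_by_residue, chain):
    """Write one value per residue into the ATOM records of one chain."""
    out = []
    for line in pdb_lines:
        if line.startswith("ATOM") and line[21] == chain:
            value = value_by_residue.get(residue_number(line), 0.0)
            line = set_bfactor(line, value)
        out.append(line)
    return out


# Fixed-column ATOM record built from the first record on the slide.
line = (
    "ATOM  " + "1817".rjust(5) + " " + " N  " + " " + "MET" + " " + "B"
    + "3".rjust(4) + " " + "   "
    + f"{-31.380:8.3f}{87.126:8.3f}{39.296:8.3f}"
    + f"{1.0:6.2f}{0.0:6.2f}"
)

recoloured = recolour([line], {3: 100.0}, "B")
print(recoloured[0].rstrip())
```

Residues without a value fall back to 0.00 here; a real pipeline might prefer to leave such records unchanged.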

On mouse-click run blastp on UniProt web page

On mouse-click start Jalview applet

Conservation

• Interactive interface for visualizing the structural conservation of protein groups on the protein sequence and 3D structure

• Highlight positions and regions conserved in the group of proteins

• Conservation scores are mapped onto the multiple sequence alignment (MSA) and onto the 3D structure

Input file

Scoring residue conservation

0.000 # ---S--------

0.000 # ---T--------

0.000 # ---S--------

0.000 # ---T--------

0.000 # ---S--------

0.024 # ---TM-M-----

0.320 # MMMSV-VVMM--

0.278 # VVVDHMHHGGG-

0.500 # LLLYLLWWLLL-

0.603 # SSSSTTTSSSS-

0.391 # PAAAPAAEDDD-

0.424 # AAAAEEEVGGQT

0.809 # DDDDEEEEEEEE
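Each line of the output above holds a conservation score, a `#` separator and the residues of one alignment column. A minimal Python sketch for reading such output and picking the columns above a cut-off could look like this. The function names and the threshold of 0.5 are illustrative assumptions.

```python
def parse_score_lines(text):
    """Return a list of (score, column residues) from 'score # column' lines."""
    rows = []
    for line in text.splitlines():
        if "#" not in line:
            continue
        score_text, column = line.split("#", 1)
        rows.append((float(score_text), column.strip()))
    return rows


def conserved_columns(rows, threshold):
    """Return 1-based indices of the columns scoring at or above the threshold."""
    return [i for i, (score, _) in enumerate(rows, start=1) if score >= threshold]


# Three lines copied from the output above.
sample = """0.000 # ---S--------
0.500 # LLLYLLWWLLL-
0.809 # DDDDEEEEEEEE
"""

rows = parse_score_lines(sample)
print(conserved_columns(rows, 0.5))
```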

Scoring methods

Method name    Type of score                              Description
basicmdm       Sum-of-Pairs (SP), matrix score            Simplest SP score possible
entropynorm7   Entropic                                   Normalized Shannon entropy with 7 symbol types
entropynorm21  Entropic                                   Normalized Shannon entropy with 21 symbol types
trident        Entropic, matrix score, sequence weighted  Mixed model score
valdar01       SP, matrix score, sequence weighted        Score used in Valdar & Thornton 2001
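To make the entropic scores more concrete, here is a minimal Python sketch of a normalized Shannon entropy score for one alignment column. It assumes 21 symbol types (20 amino acids plus the gap) and the normalization 1 - H / ln(21); the scoring program used by the application may treat gaps and normalization differently, so this does not attempt to reproduce the scores shown earlier.

```python
import math
from collections import Counter


def entropy_conservation(column, alphabet_size=21):
    """Return 1 - H/ln(K): 1.0 for an invariant column, lower when more diverse."""
    if not column:
        raise ValueError("column must contain at least one symbol")
    counts = Counter(column)
    total = len(column)
    entropy = -sum((n / total) * math.log(n / total) for n in counts.values())
    return 1.0 - entropy / math.log(alphabet_size)


for column in ("DDDDEEEEEEEE", "AAAAEEEVGGQT"):
    print(column, round(entropy_conservation(column), 3))
```

Passing `alphabet_size=7` would give the variant with seven symbol types, provided the residues are first mapped to their seven classes, a step that is not shown here.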

• develop a method to compare two or more protein subgroups

• profile

At the moment it is an integrated framework for developing the visualization of information such as annotation, and of sites that differ in conservation between protein subgroups.

Input file

Tree

The phylogenetic tree of the protein group is shown on this page.

Software for phylogenetic tree visualization and manipulations

http://bioinfo.unice.fr/biodiv/Tree_editors.html

- TreeDyn: works on a local machine but not server-side (graphical applet needed)

- Phylodendron: trouble with the CGI script

- phyfi: private program that cannot be installed on one's own server; possibly usable via URL request

- nexplorer: NEXUS format needed, and it cannot be installed on one's own server

- dnd2svg.pl: strict sequence number; output only in SVG format

- TreeFam: private program only

→ ATV 1.92

http://www.phylosoft.org/atv/

Zmasek C.M. and Eddy S.R. (2001) ATV: display and manipulation of annotated phylogenetic trees. Bioinformatics, 17, 383-384.

Gascuel O. (1997) BIONJ: an improved version of the NJ algorithm based on a simple model of sequence data. Molecular Biology and Evolution, 14, 685-695.

Input file

Tree in Newick format

((((ACADM_HUMAN:0.000925,ACADM_PANTR:0.003941):0.014922,ACADM_MACFA:0.021579):0.041621,((ACADM_MOUSE:0.015113,ACADM_RAT:0.029420):0.051559,(ACADM_DROME:0.187088,((ACAD8_MOUSE:0.049728,ACAD8_HUMAN:0.052753):0.013706,ACAD8_BOVIN:0.104627):1.146493):0.149078):0.010918):0.015504,ACADM_PIG:0.057735,ACADM_BOVIN:0.023577);
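A Newick string stores each leaf as `name:branch_length` inside nested parentheses. As a small, hedged sketch of how the entry names and their terminal branch lengths could be pulled out of such a tree (for example to link leaves to annotation), the Python below uses a regular expression instead of a full parser. The shortened tree is an illustrative excerpt, and the function names are assumptions.

```python
import re

# A leaf is a name that follows '(' or ',' and precedes ':length'.
LEAF = re.compile(r"[(,]([^(),:;]+):([0-9.eE+-]+)")


def leaf_lengths(newick):
    """Return {leaf name: terminal branch length} for a Newick string."""
    compact = "".join(newick.split())  # tolerate line wraps inside the string
    return {name: float(length) for name, length in LEAF.findall(compact)}


def is_balanced(newick):
    """Check that every opening parenthesis has a matching close."""
    depth = 0
    for char in newick:
        if char == "(":
            depth += 1
        elif char == ")":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0


# Shortened tree built from leaves of the tree above (illustrative only).
tree = "((ACADM_MOUSE:0.015113,ACADM_RAT:0.029420):0.051559,ACADM_PIG:0.057735);"
leaves = leaf_lengths(tree)
print(sorted(leaves))
```

Internal branch lengths follow a closing parenthesis, so the pattern skips them and only leaves are returned.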

http://www.jalview.org/

Clamp, M., Cuff, J., Searle, S. M. and Barton, G. J. (2004). The Jalview Java Alignment Editor. Bioinformatics, 20, 426-7.

Future plans

• Normalize HTML pages according to the W3C standard

• Improve the use of CSS

• Test the application on different web browsers

• Write the application in a server-side language

• Integrate the application with other databases

• Ensure multiple access to the application and an analysis history

• Develop a view of the phylogenetic tree that shows additional information and lets the user interact with it

• Hierarchical phylogeny-based classification in UniProtKB

Following the hierarchical phylogeny-based classification in UniProtKB

Acknowledgements

• Brigitte Boeckmann & Rita Casadio

• Swiss-Prot lab, Biocomputing group

• Fabrice David & Marco Vassura

• All my friends and Fra

• Dolores and Davide

And now?

- identify similarities and differences between the protein sequences, as well as the information available for the given protein group;

- estimate the ranges within which functional information can be transferred from experimentally characterized proteins to their homologs from poorly studied organisms;

- identify errors in the annotation of proteins.

Practical examples

A compact summary view of the biological information of a protein group is important, especially for a large dataset. It makes it possible to observe, compare and count all common and dissimilar characteristics, and to analyze in detail every member that shares a given feature.

Acetylglutamate kinase family

Acyl-CoA dehydrogenase family

gatB/gatE family

IPP transferase family