Concept Modeling in Bio-informatics Sanida Omerovic*, Saso Tomazic*, Mateo Valero**, Milos...
-
Upload
julie-gilbert -
Category
Documents
-
view
222 -
download
1
Transcript of Concept Modeling in Bio-informatics Sanida Omerovic*, Saso Tomazic*, Mateo Valero**, Milos...
![Page 1: Concept Modeling in Bio-informatics Sanida Omerovic*, Saso Tomazic*, Mateo Valero**, Milos Milovanovic**, David Torrents** *University of Ljubljana, Slovenia.](https://reader035.fdocuments.in/reader035/viewer/2022062515/56649ce65503460f949b48d9/html5/thumbnails/1.jpg)
Concept Modeling in Bio-informatics
Sanida Omerovic*, Saso Tomazic*, Mateo Valero**, Milos Milovanovic**, David Torrents**
*University of Ljubljana, Slovenia** UPC, Barcelona, Catalonia
IPSI Firence-2007
![Page 2: Concept Modeling in Bio-informatics Sanida Omerovic*, Saso Tomazic*, Mateo Valero**, Milos Milovanovic**, David Torrents** *University of Ljubljana, Slovenia.](https://reader035.fdocuments.in/reader035/viewer/2022062515/56649ce65503460f949b48d9/html5/thumbnails/2.jpg)
2/24
WHAT IS CONCEPT
?
![Page 3: Concept Modeling in Bio-informatics Sanida Omerovic*, Saso Tomazic*, Mateo Valero**, Milos Milovanovic**, David Torrents** *University of Ljubljana, Slovenia.](https://reader035.fdocuments.in/reader035/viewer/2022062515/56649ce65503460f949b48d9/html5/thumbnails/3.jpg)
3/24
Decision Making Algorithm
![Page 4: Concept Modeling in Bio-informatics Sanida Omerovic*, Saso Tomazic*, Mateo Valero**, Milos Milovanovic**, David Torrents** *University of Ljubljana, Slovenia.](https://reader035.fdocuments.in/reader035/viewer/2022062515/56649ce65503460f949b48d9/html5/thumbnails/4.jpg)
4/24
Concept Modeling Layer
What is concept? How is it modeled? How is it built? How is it exploited? How is it updated?
![Page 5: Concept Modeling in Bio-informatics Sanida Omerovic*, Saso Tomazic*, Mateo Valero**, Milos Milovanovic**, David Torrents** *University of Ljubljana, Slovenia.](https://reader035.fdocuments.in/reader035/viewer/2022062515/56649ce65503460f949b48d9/html5/thumbnails/5.jpg)
5/24
Classification of concept modeling (CM) and decision making systems (DMS)
This classification is made based on the following assumption:
Any decision making system, regardless if the process is performed entirely by humans, supported by machines or totally automated, is a layered process, with one layer
(explicit or implicit) which can be called
Concept Modeling Layer
![Page 6: Concept Modeling in Bio-informatics Sanida Omerovic*, Saso Tomazic*, Mateo Valero**, Milos Milovanovic**, David Torrents** *University of Ljubljana, Slovenia.](https://reader035.fdocuments.in/reader035/viewer/2022062515/56649ce65503460f949b48d9/html5/thumbnails/6.jpg)
6/24
Purpose (DMS)
General Specialized (Bio-informatics)
![Page 7: Concept Modeling in Bio-informatics Sanida Omerovic*, Saso Tomazic*, Mateo Valero**, Milos Milovanovic**, David Torrents** *University of Ljubljana, Slovenia.](https://reader035.fdocuments.in/reader035/viewer/2022062515/56649ce65503460f949b48d9/html5/thumbnails/7.jpg)
7/24
Bio-informatics
Genomic researchers mostly deal with similarity issues between genomic sequences. Genomic sequences are treated as long sequences of letters:
A (adenine) G (Guanine) C (Cytosine) T (Thymine)
which represents nitrogenous bases in protein structure.
![Page 8: Concept Modeling in Bio-informatics Sanida Omerovic*, Saso Tomazic*, Mateo Valero**, Milos Milovanovic**, David Torrents** *University of Ljubljana, Slovenia.](https://reader035.fdocuments.in/reader035/viewer/2022062515/56649ce65503460f949b48d9/html5/thumbnails/8.jpg)
8/24
DNA sequence
(A)
(T)
(G) (C)
DNA chemistry compound DNA sequence
DNA sequence is presented as an array of letters which are mapping the nucleotides in DNA (consisted of one of four types of nitrogenousbases A/G/C/T, a five-carbon sugar, and molecule of phosphoric acid).
![Page 9: Concept Modeling in Bio-informatics Sanida Omerovic*, Saso Tomazic*, Mateo Valero**, Milos Milovanovic**, David Torrents** *University of Ljubljana, Slovenia.](https://reader035.fdocuments.in/reader035/viewer/2022062515/56649ce65503460f949b48d9/html5/thumbnails/9.jpg)
9/24
DNA sequence analysis
GATTCATCGA CCATCAAAT GATT
Start sequence
End sequence
Start sequence
Useful dataNoisy data
![Page 10: Concept Modeling in Bio-informatics Sanida Omerovic*, Saso Tomazic*, Mateo Valero**, Milos Milovanovic**, David Torrents** *University of Ljubljana, Slovenia.](https://reader035.fdocuments.in/reader035/viewer/2022062515/56649ce65503460f949b48d9/html5/thumbnails/10.jpg)
10/24
Bio-informatics in DMS Sequence concept (still impossible/there is no protein
conceptual model)
Sequence analysis (software BLAST, Smith Waterman, FASTA, etc)
Sequence retrieval (easy/ available for free on the WEB: ENSEMBL.ORG, NCBI, UCSC, etc.)
Sequencing (hard/laboratory work on the level of chemical reactions to conclude weather C/T/G/A is in question in DNA chain)
![Page 11: Concept Modeling in Bio-informatics Sanida Omerovic*, Saso Tomazic*, Mateo Valero**, Milos Milovanovic**, David Torrents** *University of Ljubljana, Slovenia.](https://reader035.fdocuments.in/reader035/viewer/2022062515/56649ce65503460f949b48d9/html5/thumbnails/11.jpg)
11/24
Sequence analysis
In the example shown at next two figures, one can see a fraction of the results obtained from a BLAST comparison of protein SLC7A7 (human) against a SwissProt database of proteins.
We selected two illustrative examples that show from a perfect (word) mach to a similar mach.
![Page 12: Concept Modeling in Bio-informatics Sanida Omerovic*, Saso Tomazic*, Mateo Valero**, Milos Milovanovic**, David Torrents** *University of Ljubljana, Slovenia.](https://reader035.fdocuments.in/reader035/viewer/2022062515/56649ce65503460f949b48d9/html5/thumbnails/12.jpg)
12/24
BLAST Sample session, perfect match
>gi|12643348|sp|Q9UHI5|LAT2_HUMAN <http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=Protein&list_uids=12643348&dopt=GenPept> Gene info <http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=gene&cmd=search&term=12643348%5BPUID%5D> Large neutral amino acids transporter small subunit 2 (L-type
amino acid transporter 2) (hLAT2) Length=535 Score = 665 bits (1717), Expect = 0.0, Method: Composition-based stats. Identities = 332/332 (100%), Positives = 332/332 (100%), Gaps = 0/332 (0%) Query 1 MGIVQICKGEYFWLEPKNAFENFQEPDIGLVALAFLQGSFAYGGWNFLNYVTEELVDPYK 60 MGIVQICKGEYFWLEPKNAFENFQEPDIGLVALAFLQGSFAYGGWNFLNYVTEELVDPYK Sbjct 204 MGIVQICKGEYFWLEPKNAFENFQEPDIGLVALAFLQGSFAYGGWNFLNYVTEELVDPYK 263 Query 61 NLPRAIFISIPLVTFVYVFANVAYVTAMSPQELLASNAVAVTFGEKLLGVMAWIMPISVA 120 NLPRAIFISIPLVTFVYVFANVAYVTAMSPQELLASNAVAVTFGEKLLGVMAWIMPISVA Sbjct 264 NLPRAIFISIPLVTFVYVFANVAYVTAMSPQELLASNAVAVTFGEKLLGVMAWIMPISVA 323 Query 121 LSTFGGVNGSLFTSSRLFFAGAREGHLPSVLAMIHVKRCTPIPALLFTCISTLLMLVTSD 180 LSTFGGVNGSLFTSSRLFFAGAREGHLPSVLAMIHVKRCTPIPALLFTCISTLLMLVTSD Sbjct 324 LSTFGGVNGSLFTSSRLFFAGAREGHLPSVLAMIHVKRCTPIPALLFTCISTLLMLVTSD 383 Query 181 MYTLINYVGFINYLFYGVTVAGQIVLRWKKPDIPRPIKINLLFPIIYLLFWAFLLVFSLW 240 MYTLINYVGFINYLFYGVTVAGQIVLRWKKPDIPRPIKINLLFPIIYLLFWAFLLVFSLW Sbjct 384 MYTLINYVGFINYLFYGVTVAGQIVLRWKKPDIPRPIKINLLFPIIYLLFWAFLLVFSLW 443 Query 241 SEPVVCGIGLAIMLTGVPVYFLGVYWQHKPKCFSDFIELLTLVSQKMCVVVYPEVERGSG 300 SEPVVCGIGLAIMLTGVPVYFLGVYWQHKPKCFSDFIELLTLVSQKMCVVVYPEVERGSG Sbjct 444 SEPVVCGIGLAIMLTGVPVYFLGVYWQHKPKCFSDFIELLTLVSQKMCVVVYPEVERGSG 503 Query 301 TEEANEDMEEQQQPMYQPTPTKDKDVAGQPQP 332 TEEANEDMEEQQQPMYQPTPTKDKDVAGQPQP Sbjct 504 TEEANEDMEEQQQPMYQPTPTKDKDVAGQPQP 535
![Page 13: Concept Modeling in Bio-informatics Sanida Omerovic*, Saso Tomazic*, Mateo Valero**, Milos Milovanovic**, David Torrents** *University of Ljubljana, Slovenia.](https://reader035.fdocuments.in/reader035/viewer/2022062515/56649ce65503460f949b48d9/html5/thumbnails/13.jpg)
13/24
BLAST Sample session, similar match
>gi|12643378|sp|Q9UM01|YLA1_HUMAN <http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=Protein&list_uids=12643378&dopt=GenPept> Gene info <http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=gene&cmd=search&term=12643378%5BPUID%5D> Y+L amino acid transporter 1 (y(+)L-type amino acid transporter
1) (y+LAT-1) (Y+LAT1) (Monocyte amino acid permease 2) (MOP-2) Length=511 Score = 257 bits (656), Expect = 4e-68, Method: Composition-based stats. Identities = 138/315 (43%), Positives = 203/315 (64%), Gaps = 10/315 (3%) Query 2 GIVQICKGEYFWLEPKNAFENFQEPDIGLVALAFLQGSFAYGGWNFLNYVTEELVDPYKN 61 GIV++ +G E N+FE +G +ALA F+Y GW+ LNYVTEE+ +P +N Sbjct 202 GIVRLGQGASTHFE--NSFEG-SSFAVGDIALALYSALFSYSGWDTLNYVTEEIKNPERN 258 Query 62 LPRAIFISIPLVTFVYVFANVAYVTAMSPQELLASNAVAVTFGEKLLGVMAWIMPISVAL 121 LP +I IS+P+VT +Y+ NVAY T + +++LAS+AVAVTF +++ G+ WI+P+SVAL Sbjct 259 LPLSIGISMPIVTIIYILTNVAYYTVLDMRDILASDAVAVTFADQIFGIFNWIIPLSVAL 318 Query 122 STFGGVNGSLFTSSRLFFAGAREGHLPSVLAMIHVKRCTPIPALLFTCISTLLMLVTSDM 181 S FGG+N S+ +SRLFF G+REGHLP + MIHV+R TP+P+LLF I L+ L D+ Sbjct 319 SCFGGLNASIVAASRLFFVGSREGHLPDAICMIHVERFTPVPSLLFNGIMALIYLCVEDI 378 Query 182 YTLINYVGFINYLFYGVTVAGQIVLRWKKPDIPRPIKINLLFPIIYLLFWAFLLVFSLWS 241 + LINY F + F G+++ GQ+ LRWK+PD PRP+K+++ FPI++ L FL+ L+S Sbjct 379 FQLINYYSFSYWFFVGLSIVGQLYLRWKEPDRPRPLKLSVFFPIVFCLCTIFLVAVPLYS 438 Query 242 EPVVCGIGLAIMLTGVPVYFL--GVYWQHKPKCFSDFIELLTLVSQKMCVVVYPEVERGS 299 + + IG+AI L+G+P YFL V +P + T Q +C+ V E++ Sbjct 439 DTINSLIGIAIALSGLPFYFLIIRVPEHKRPLYLRRIVGSATRYLQVLCMSVAAEMDLED 498 Query 300 GTEEANEDMEEQQQP 314 G E M +Q+ P Sbjct 499 GGE-----MPKQRDP 508
![Page 14: Concept Modeling in Bio-informatics Sanida Omerovic*, Saso Tomazic*, Mateo Valero**, Milos Milovanovic**, David Torrents** *University of Ljubljana, Slovenia.](https://reader035.fdocuments.in/reader035/viewer/2022062515/56649ce65503460f949b48d9/html5/thumbnails/14.jpg)
14/24
BLAST output:Score = 257 bits (656), Expect = 4e-68, Method: Composition-based stats.Identities = 138/315 (43%), Positives = 203/315 (64%), Gaps = 10/315 (3%)
BLAST expresses the level of similarity between query sequence and database sequence in terms of: Score, Expectations, Method, Identities, Positives, and Gaps. Here is where our DMA layer is finishing, and from this point inferring need to be done by researchers on the bases of software (ex. BLAST) output, and knowledge gathered elsewhere (book, computers, brains…).
Also, a forthcoming challenge in the field of comparative genomic analysis is to compare large amounts of genomic data (letters).For example, if one wants to compare one mammalian genomic sequence against all existing mammalian sequences, one would need a database with memory storage of 60 GB (Saragasso Sea project).
![Page 15: Concept Modeling in Bio-informatics Sanida Omerovic*, Saso Tomazic*, Mateo Valero**, Milos Milovanovic**, David Torrents** *University of Ljubljana, Slovenia.](https://reader035.fdocuments.in/reader035/viewer/2022062515/56649ce65503460f949b48d9/html5/thumbnails/15.jpg)
15/24
Application for text analysis: Frequency (number of occurrences) Distance--------------------------
Exclude stop word lists (and, if, or etc) Stemming (traveling => travel; traveled => travel) Synonyms (sick = ill)
Visual Basic
![Page 16: Concept Modeling in Bio-informatics Sanida Omerovic*, Saso Tomazic*, Mateo Valero**, Milos Milovanovic**, David Torrents** *University of Ljubljana, Slovenia.](https://reader035.fdocuments.in/reader035/viewer/2022062515/56649ce65503460f949b48d9/html5/thumbnails/16.jpg)
16/24
Home-made Brandy Production Grape-gathering is the first phase in the
production of brandy, through it might be made also from plums, figs, pears or cornel berries. The gathered grapes are crushed and then poured into wooden barrels. They are mixed several times a day, the more often the better. The obtained mass is called wine-marc. The process of alcoholic fermentation usually lasts fifteen or thirty days. When it is finished, or when, as usually people say the marc is still, distillation begins i.e. the making of brandy, which is done in special copper cauldrons. Hand made copper cauldrons can still be found in Tuscany households…
![Page 17: Concept Modeling in Bio-informatics Sanida Omerovic*, Saso Tomazic*, Mateo Valero**, Milos Milovanovic**, David Torrents** *University of Ljubljana, Slovenia.](https://reader035.fdocuments.in/reader035/viewer/2022062515/56649ce65503460f949b48d9/html5/thumbnails/17.jpg)
17/24
word word frequency distance
brandy grape 10 0
brandy alcohol 4 1
brandy distillation 3 3
brandy strength 3 5
brandy making 2 5
…
![Page 18: Concept Modeling in Bio-informatics Sanida Omerovic*, Saso Tomazic*, Mateo Valero**, Milos Milovanovic**, David Torrents** *University of Ljubljana, Slovenia.](https://reader035.fdocuments.in/reader035/viewer/2022062515/56649ce65503460f949b48d9/html5/thumbnails/18.jpg)
18/24
Concept criteria: Frequency > 5 Distance < 2
Concepts:
brandy grape
brandy alcohol Transcription:
brandy - made of - grapes
brandy - kind of - alcohol
![Page 19: Concept Modeling in Bio-informatics Sanida Omerovic*, Saso Tomazic*, Mateo Valero**, Milos Milovanovic**, David Torrents** *University of Ljubljana, Slovenia.](https://reader035.fdocuments.in/reader035/viewer/2022062515/56649ce65503460f949b48d9/html5/thumbnails/19.jpg)
19/24
Concept Modeling layer
Implicit (concepts are not explicitly mentioned):
Protein conceptual model
Explicit (concepts are explicitly mentioned and/or defined):
Frequency > 5
Distance < 2
![Page 20: Concept Modeling in Bio-informatics Sanida Omerovic*, Saso Tomazic*, Mateo Valero**, Milos Milovanovic**, David Torrents** *University of Ljubljana, Slovenia.](https://reader035.fdocuments.in/reader035/viewer/2022062515/56649ce65503460f949b48d9/html5/thumbnails/20.jpg)
20/24
Concept definition (CM)
Node in concept network (semantic web)
Node in concept web
![Page 21: Concept Modeling in Bio-informatics Sanida Omerovic*, Saso Tomazic*, Mateo Valero**, Milos Milovanovic**, David Torrents** *University of Ljubljana, Slovenia.](https://reader035.fdocuments.in/reader035/viewer/2022062515/56649ce65503460f949b48d9/html5/thumbnails/21.jpg)
21/24
Concept definitons Structure that carries meaning. Needs other concepts and relations among them to be defined. Without
relations concept can not exist.
Relations between concepts can also be observed as concepts. All concepts can be related among each other, forming whether:
1. concept web (where relations are concepts also)
2. concept network (where relations are not concepts)
![Page 22: Concept Modeling in Bio-informatics Sanida Omerovic*, Saso Tomazic*, Mateo Valero**, Milos Milovanovic**, David Torrents** *University of Ljubljana, Slovenia.](https://reader035.fdocuments.in/reader035/viewer/2022062515/56649ce65503460f949b48d9/html5/thumbnails/22.jpg)
22/24
Concept Network Concept Web
![Page 23: Concept Modeling in Bio-informatics Sanida Omerovic*, Saso Tomazic*, Mateo Valero**, Milos Milovanovic**, David Torrents** *University of Ljubljana, Slovenia.](https://reader035.fdocuments.in/reader035/viewer/2022062515/56649ce65503460f949b48d9/html5/thumbnails/23.jpg)
23/24
Concept Modeling Learning Module
![Page 24: Concept Modeling in Bio-informatics Sanida Omerovic*, Saso Tomazic*, Mateo Valero**, Milos Milovanovic**, David Torrents** *University of Ljubljana, Slovenia.](https://reader035.fdocuments.in/reader035/viewer/2022062515/56649ce65503460f949b48d9/html5/thumbnails/24.jpg)
24/24
Thank you for your attention!Questions?