Data mining ppt
Uploaded by sai-krishna
![Page 1: Data mining ppt](https://reader030.fdocuments.in/reader030/viewer/2022020718/55504504b4c905b2788b4c53/html5/thumbnails/1.jpg)
Applications and Trends in Data Mining
Data Mining for Biological Data Analysis
![Page 2: Data mining ppt](https://reader030.fdocuments.in/reader030/viewer/2022020718/55504504b4c905b2788b4c53/html5/thumbnails/2.jpg)
Factors that led to the development
• The past decade has seen explosive growth in: 1. Genomics 2. Proteomics 3. Functional genomics 4. Biomedical research
• Identification and comparative analysis of genomes of humans and other species for investigation of genetic networks.
• Development of new pharmaceuticals and advances in cancer therapies.
![Page 3: Data mining ppt](https://reader030.fdocuments.in/reader030/viewer/2022020718/55504504b4c905b2788b4c53/html5/thumbnails/3.jpg)
• DNA sequences form the foundation of genetic codes of all living organisms.
• DNA sequences are composed of four basic building blocks called nucleotides:
1. adenine (A) 2. cytosine (C) 3. guanine (G) 4. thymine (T)
• These four nucleotides (or bases) are combined to form long chains that resemble a twisted ladder.
![Page 4: Data mining ppt](https://reader030.fdocuments.in/reader030/viewer/2022020718/55504504b4c905b2788b4c53/html5/thumbnails/4.jpg)
![Page 5: Data mining ppt](https://reader030.fdocuments.in/reader030/viewer/2022020718/55504504b4c905b2788b4c53/html5/thumbnails/5.jpg)
• DNA sequence … CTA CAC ACG TGT AAC …
• A gene usually comprises hundreds of individual nucleotides arranged in a particular order.
• A genome is the complete set of genes of an organism.
• Genomics is the analysis of genome sequences.
• A proteome is the complete set of protein molecules present in a cell, tissue, or organism.
• Proteomics is the study of proteome sequences.
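The definitions above can be illustrated with a minimal Python sketch that treats the slide's DNA fragment as a plain string, splits it into the three-base groups shown on the slide, and counts each nucleotide. The function names `split_into_triplets` and `base_counts` are hypothetical helpers introduced here for illustration only.

```python
from collections import Counter


def _clean(sequence: str) -> str:
    """Remove spaces and upper-case the sequence (illustrative helper)."""
    return sequence.replace(" ", "").upper()


def split_into_triplets(sequence: str) -> list[str]:
    """Split a DNA string into consecutive groups of three bases."""
    cleaned = _clean(sequence)
    return [cleaned[i:i + 3] for i in range(0, len(cleaned), 3)]


def base_counts(sequence: str) -> Counter:
    """Count how often each of the four nucleotides occurs."""
    cleaned = _clean(sequence)
    unknown = set(cleaned) - set("ACGT")
    if unknown:
        raise ValueError(f"unexpected symbols: {sorted(unknown)}")
    return Counter(cleaned)


# The fragment shown on the slide, without the leading and trailing ellipses.
fragment = "CTA CAC ACG TGT AAC"
triplets = split_into_triplets(fragment)
counts = base_counts(fragment)
print(triplets)
print(dict(counts))
```

Real genomic data is usually read from formats such as FASTA rather than typed in as string literals; the string here only keeps the sketch self-contained.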
![Page 6: Data mining ppt](https://reader030.fdocuments.in/reader030/viewer/2022020718/55504504b4c905b2788b4c53/html5/thumbnails/6.jpg)
Data mining may contribute to biological data analysis in the following aspects.
![Page 7: Data mining ppt](https://reader030.fdocuments.in/reader030/viewer/2022020718/55504504b4c905b2788b4c53/html5/thumbnails/7.jpg)
Biological data mining has become an essential part of a new research field called bioinformatics.
![Page 8: Data mining ppt](https://reader030.fdocuments.in/reader030/viewer/2022020718/55504504b4c905b2788b4c53/html5/thumbnails/8.jpg)
1) Semantic integration of heterogeneous, distributed genomic and proteomic databases.
• Genomic and proteomic data sets are often generated at different labs and by different methods.
• They are distributed, heterogeneous, and of wide variety.
• Integration of such data is essential to cross-site analysis of biological data.
• Such integration and linkage analysis would facilitate the systematic and coordinated analysis of genome and biological data.
![Page 9: Data mining ppt](https://reader030.fdocuments.in/reader030/viewer/2022020718/55504504b4c905b2788b4c53/html5/thumbnails/9.jpg)
• This has promoted the development of integrated data warehouses to store and manage derived biological data.
• Data cleaning, data integration, reference reconciliation, classification, and clustering methods will facilitate the integration of biological data and the construction of data warehouses for biological data analysis.
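As a rough illustration of data cleaning and reference reconciliation, the sketch below merges records from two labs that use different field names and identifier spellings. Everything in it is an assumption made for the example: the field names (`gene`, `expression`, `GeneSymbol`, `expr_level`), the normalization rule, and the measurement values are invented toy data, not taken from the slides or from any real database.

```python
def normalize_gene_id(raw_id: str) -> str:
    """Reduce lab-specific spellings of an identifier to one canonical form.

    Toy rule (an assumption): trim, upper-case, drop '-' and '_'.
    """
    return raw_id.strip().upper().replace("-", "").replace("_", "")


def integrate(lab_a: list[dict], lab_b: list[dict]) -> dict[str, dict]:
    """Merge two heterogeneous record lists into one keyed collection."""
    merged: dict[str, dict] = {}
    # Each source declares which of its fields hold the identifier and value.
    sources = (
        ("lab_a", lab_a, "gene", "expression"),
        ("lab_b", lab_b, "GeneSymbol", "expr_level"),
    )
    for source, records, id_field, value_field in sources:
        for record in records:
            key = normalize_gene_id(record[id_field])
            entry = merged.setdefault(key, {"gene": key, "measurements": {}})
            entry["measurements"][source] = float(record[value_field])
    return merged


# Invented toy records; values are placeholders, not real measurements.
lab_a_records = [
    {"gene": "brca-1", "expression": "2.5"},
    {"gene": "TP53", "expression": "1.1"},
]
lab_b_records = [
    {"GeneSymbol": "BRCA1 ", "expr_level": 2.7},
    {"GeneSymbol": "egfr", "expr_level": 0.4},
]

warehouse = integrate(lab_a_records, lab_b_records)
for gene, entry in sorted(warehouse.items()):
    print(gene, entry["measurements"])
```

In practice, reconciling identifiers usually relies on curated cross-reference tables rather than string normalization alone; the rule here only shows where such a step would sit.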
![Page 10: Data mining ppt](https://reader030.fdocuments.in/reader030/viewer/2022020718/55504504b4c905b2788b4c53/html5/thumbnails/10.jpg)
2) Alignment, indexing, similarity search, and comparative analysis of multiple nucleotide/protein sequences.
• BLAST and FASTA, in particular, are widely used tools for the systematic analysis of genomic and proteomic data.
• Biological sequence analysis methods differ from many sequential pattern analysis algorithms proposed in data mining.
• For protein sequences, two amino acids should also be considered a “match” if one can be derived from the other by substitutions that are likely to occur in nature.
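The idea of counting a likely substitution as a partial "match" can be sketched with a small global alignment scorer (the classic dynamic programming approach, not the heuristics used by BLAST or FASTA). The scoring values and the `SIMILAR_GROUPS` table are toy assumptions made for this example; real tools use empirically derived substitution matrices instead.

```python
# Toy groups of amino acids treated as "likely substitutions" (an assumption
# for illustration, not a real substitution matrix).
SIMILAR_GROUPS = [set("ILV"), set("DE"), set("KR")]


def substitution_score(a: str, b: str) -> int:
    """Score a pair of residues: identical, similar, or mismatched."""
    if a == b:
        return 2
    if any(a in group and b in group for group in SIMILAR_GROUPS):
        return 1
    return -1


def global_alignment_score(s: str, t: str, gap: int = -2) -> int:
    """Best global alignment score of s and t under a linear gap penalty."""
    # prev[j] holds the best score for aligning the processed prefix of s
    # with the first j residues of t.
    prev = [j * gap for j in range(len(t) + 1)]
    for i in range(1, len(s) + 1):
        curr = [i * gap]
        for j in range(1, len(t) + 1):
            curr.append(max(
                prev[j - 1] + substitution_score(s[i - 1], t[j - 1]),  # align pair
                prev[j] + gap,      # gap in t
                curr[j - 1] + gap,  # gap in s
            ))
        prev = curr
    return prev[-1]


print(global_alignment_score("KLD", "KLD"))  # identical sequences
print(global_alignment_score("KLD", "RIE"))  # every position is a similar pair
print(global_alignment_score("KLD", "GGG"))  # every position is a mismatch
```

Only the score is computed here, using two rows of the table; recovering the alignment itself would require keeping the full table and tracing back through it.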
![Page 11: Data mining ppt](https://reader030.fdocuments.in/reader030/viewer/2022020718/55504504b4c905b2788b4c53/html5/thumbnails/11.jpg)
• There is a combinatorial number of ways to approximately align multiple sequences, so methods have been developed to reduce the cost:
1) reducing a multiple alignment to a series of pairwise alignments and then combining the results.
2) using Hidden Markov Models (HMMs).
• Multiple alignments can be used to identify highly conserved residues among genomes, and these residues can in turn be used to build phylogenetic trees to infer evolutionary relationships among species.
• Genomic and proteomic sequences isolated from diseased and healthy tissues can be compared to identify critical differences between them.
• Sequences occurring more frequently in the diseased samples may indicate genetic factors of the disease.
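One simple way to picture the diseased-versus-healthy comparison is to count short subsequences (k-mers) per sample and keep those found in a clearly larger fraction of diseased samples. This is only a sketch under stated assumptions: the function names, the `min_gap` threshold, and the sequences are invented for illustration, and a real study would need proper statistical testing.

```python
from collections import Counter


def kmers(sequence: str, k: int) -> set[str]:
    """Return the set of distinct length-k substrings of a sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def sample_support(samples: list[str], k: int) -> Counter:
    """For each k-mer, count how many samples contain it at least once."""
    support = Counter()
    for sequence in samples:
        support.update(kmers(sequence, k))
    return support


def enriched_kmers(diseased: list[str], healthy: list[str],
                   k: int, min_gap: float = 0.75) -> dict[str, tuple[float, float]]:
    """k-mers whose diseased fraction exceeds the healthy one by min_gap."""
    if not diseased or not healthy:
        raise ValueError("both sample groups must be non-empty")
    diseased_support = sample_support(diseased, k)
    healthy_support = sample_support(healthy, k)
    result = {}
    for kmer, count in diseased_support.items():
        diseased_fraction = count / len(diseased)
        healthy_fraction = healthy_support[kmer] / len(healthy)
        if diseased_fraction - healthy_fraction >= min_gap:
            result[kmer] = (diseased_fraction, healthy_fraction)
    return result


# Invented toy sequences, not real biological data.
diseased_samples = ["AATTGC", "CTTGCA"]
healthy_samples = ["AATTCC", "CCAATT"]
print(enriched_kmers(diseased_samples, healthy_samples, k=4))
```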
![Page 12: Data mining ppt](https://reader030.fdocuments.in/reader030/viewer/2022020718/55504504b4c905b2788b4c53/html5/thumbnails/12.jpg)
3) Discovery of structural patterns and analysis of genetic networks and protein pathways.
• Protein sequences are folded into 3D structures, and such structures interact with each other based on the relative positions and distances between them.
• Such complex interactions lead to the formation of genetic networks and protein pathways.
• It is important to develop powerful and scalable data mining methods to discover patterns and to study regularities and irregularities in such complex biological networks.
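As a heavily simplified picture of how distance-based interactions give rise to a network, the sketch below treats each structure as a single 3D point and links any two points closer than a cutoff. The point representation, the `cutoff` parameter, the labels `P1` to `P3`, and the coordinates are all assumptions made for illustration; real interaction models are far richer.

```python
from itertools import combinations
from math import dist


def contact_network(positions: dict[str, tuple[float, float, float]],
                    cutoff: float) -> dict[str, set[str]]:
    """Build an undirected network linking items within `cutoff` of each other."""
    network: dict[str, set[str]] = {name: set() for name in positions}
    for a, b in combinations(positions, 2):
        if dist(positions[a], positions[b]) <= cutoff:
            network[a].add(b)
            network[b].add(a)
    return network


# Hypothetical labels and coordinates, invented for this example.
toy_positions = {
    "P1": (0.0, 0.0, 0.0),
    "P2": (3.0, 4.0, 0.0),
    "P3": (10.0, 0.0, 0.0),
}
toy_network = contact_network(toy_positions, cutoff=6.0)
for name, neighbours in toy_network.items():
    print(name, sorted(neighbours))
```

Checking every pair is quadratic in the number of items, which is one reason scalable methods matter for large biological networks.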
![Page 13: Data mining ppt](https://reader030.fdocuments.in/reader030/viewer/2022020718/55504504b4c905b2788b4c53/html5/thumbnails/13.jpg)
4) Association and path analysis: identifying co-occurring gene sequences and linking genes to different stages of disease development.
• Many studies have focused on comparing one gene with another.
• Most diseases are not triggered by a single gene but by a combination of genes acting together.
• Association analysis methods can be used to determine the kinds of genes that are likely to co-occur in target samples.
• A group of genes may contribute to a disease process, with different genes becoming active at different stages of the disease; here path analysis is expected to play an important role.
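The co-occurrence idea behind association analysis can be sketched by counting how often each pair of genes appears together across samples and keeping pairs above a minimum support. The gene labels `g1` to `g4`, the samples, and the threshold are hypothetical; this shows only the pair-counting step, not a full association rule mining algorithm.

```python
from collections import Counter
from itertools import combinations


def frequent_gene_pairs(samples: list[set[str]],
                        min_support: float) -> dict[tuple[str, str], float]:
    """Return gene pairs whose co-occurrence fraction meets min_support."""
    pair_counts = Counter()
    for genes in samples:
        # Sort so that each unordered pair has one canonical tuple form.
        pair_counts.update(combinations(sorted(genes), 2))
    total = len(samples)
    return {
        pair: count / total
        for pair, count in pair_counts.items()
        if count / total >= min_support
    }


# Each set lists the (hypothetical) genes observed in one target sample.
toy_samples = [
    {"g1", "g2", "g3"},
    {"g1", "g2"},
    {"g2", "g3"},
    {"g1", "g2", "g4"},
]
pairs = frequent_gene_pairs(toy_samples, min_support=0.5)
for pair, support in sorted(pairs.items()):
    print(pair, support)
```

Larger gene combinations grow combinatorially, which is why practical association mining prunes candidates instead of enumerating every subset.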
![Page 14: Data mining ppt](https://reader030.fdocuments.in/reader030/viewer/2022020718/55504504b4c905b2788b4c53/html5/thumbnails/14.jpg)
5) Visualization tools in genetic data analysis.
• Alignments among genomic or proteomic sequences and the interactions between them can be:
1) expressed in graphic forms.
2) transformed into various kinds of easy-to-understand visual displays.
• Such displays facilitate pattern understanding, knowledge discovery, and interactive data exploration.
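A very small example of turning sequence similarity into a visual display is a text dot plot: one sequence runs along the rows, the other along the columns, and a mark is placed wherever the two symbols agree. The function name `dot_plot` and the sequences are made up for this sketch; real visualization tools are graphical and interactive.

```python
def dot_plot(seq_a: str, seq_b: str) -> str:
    """Render a text dot plot: '*' where symbols match, '.' elsewhere."""
    lines = ["  " + seq_b]  # header row showing the column sequence
    for a in seq_a:
        row = "".join("*" if a == b else "." for b in seq_b)
        lines.append(a + " " + row)
    return "\n".join(lines)


# Two short invented sequences; shared stretches appear as diagonal runs.
print(dot_plot("ACGTAC", "CGTACG"))
```

Stretches that the two sequences share show up as diagonal runs of marks, which is what makes this kind of display easy to read at a glance.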
![Page 15: Data mining ppt](https://reader030.fdocuments.in/reader030/viewer/2022020718/55504504b4c905b2788b4c53/html5/thumbnails/15.jpg)
Thank you