Introduction to Bioinformatics - WestGrid to Bioinformatics.pdf · •Bioinformatics is (among...
Transcript of Introduction to Bioinformatics - WestGrid to Bioinformatics.pdf · •Bioinformatics is (among...
![Page 1: Introduction to Bioinformatics - WestGrid to Bioinformatics.pdf · •Bioinformatics is (among other things) the process through which the interpretation of billions of sequence observations](https://reader030.fdocuments.in/reader030/viewer/2022040215/5edadf1709ac2c67fa68718e/html5/thumbnails/1.jpg)
Introduction to Bioinformatics
Richard Corbett
Canada’s Michael Smith Genome Sciences Centre Vancouver, British Columbia
June 28, 2017
![Page 2: Introduction to Bioinformatics - WestGrid to Bioinformatics.pdf · •Bioinformatics is (among other things) the process through which the interpretation of billions of sequence observations](https://reader030.fdocuments.in/reader030/viewer/2022040215/5edadf1709ac2c67fa68718e/html5/thumbnails/2.jpg)
As a Process Development Coordinator I help ensure our laboratory and analytical approaches are providing the best possible results.
Our mandate is to advance knowledge about cancer and other diseases and to use our technologies to improve health through disease prevention, diagnosis, and therapeutic approaches.
2
![Page 3: Introduction to Bioinformatics - WestGrid to Bioinformatics.pdf · •Bioinformatics is (among other things) the process through which the interpretation of billions of sequence observations](https://reader030.fdocuments.in/reader030/viewer/2022040215/5edadf1709ac2c67fa68718e/html5/thumbnails/3.jpg)
3
![Page 4: Introduction to Bioinformatics - WestGrid to Bioinformatics.pdf · •Bioinformatics is (among other things) the process through which the interpretation of billions of sequence observations](https://reader030.fdocuments.in/reader030/viewer/2022040215/5edadf1709ac2c67fa68718e/html5/thumbnails/4.jpg)
Bioinformatics in Practice
GATCCTGGAGGATCCTGGGAG GATCCTGGAGGATCCTGGGAG GATCCTGGAGAATCCTGGGAG GATCCTGGAGAATCCTGGGAG
Sequence
Sequence data
Computational models
Bioinformatics
Environmental discoveries
Improved disease understanding and treatment
Scientific publications
Other experiments
4
![Page 5: Introduction to Bioinformatics - WestGrid to Bioinformatics.pdf · •Bioinformatics is (among other things) the process through which the interpretation of billions of sequence observations](https://reader030.fdocuments.in/reader030/viewer/2022040215/5edadf1709ac2c67fa68718e/html5/thumbnails/5.jpg)
Bioinformatics in Practice
GATCCTGGAGGATCCTGGGAG GATCCTGGAGGATCCTGGGAG GATCCTGGAGAATCCTGGGAG GATCCTGGAGAATCCTGGGAG
Sequence
Sequence data
Computational models
Bioinformatics
Environmental discoveries
Improved disease understanding and treatment
Scientific publications
Other experiments
5
![Page 6: Introduction to Bioinformatics - WestGrid to Bioinformatics.pdf · •Bioinformatics is (among other things) the process through which the interpretation of billions of sequence observations](https://reader030.fdocuments.in/reader030/viewer/2022040215/5edadf1709ac2c67fa68718e/html5/thumbnails/6.jpg)
Illumina Sequencing
1.) Cells
2.) DNA
3.) Sheared DNA, with sequencing adapters
AAAAAAAAAAAAAAAAACCCAAA CCCTTTTTGGGGAAGGGGGGGTT TCCCCCCCCCCCCCCCAAAAAAAT AAAGGGAAAGGGGTTTCCCAAA
4.) Sequencing
5.) Ready for bioinformatics
6
![Page 7: Introduction to Bioinformatics - WestGrid to Bioinformatics.pdf · •Bioinformatics is (among other things) the process through which the interpretation of billions of sequence observations](https://reader030.fdocuments.in/reader030/viewer/2022040215/5edadf1709ac2c67fa68718e/html5/thumbnails/7.jpg)
Sequencers at the Genome Sciences Centre
Bases Per Second # Machines Total Bases / Sec.
HiSeq X 8,700,000 5 43.5 million
HiSeq 2500 3,100,000 4 12.4 million
NextSeq 1,300,000 2 2.6 million
MiSeq 50,000 3 150 thousand
~55 million bases per second
7
![Page 8: Introduction to Bioinformatics - WestGrid to Bioinformatics.pdf · •Bioinformatics is (among other things) the process through which the interpretation of billions of sequence observations](https://reader030.fdocuments.in/reader030/viewer/2022040215/5edadf1709ac2c67fa68718e/html5/thumbnails/8.jpg)
How much sequence is that?
● Human Genome : 3,000,000,000 bases (approx.)
● At the Genome Sciences Centre, we can sequence the number of bases in 1 human genome every:
● 3 billion bases / 55 million bases per sec = 54.5 sec
● The first human genome draft sequence took roughly 10 years to sequence and assemble
8
![Page 9: Introduction to Bioinformatics - WestGrid to Bioinformatics.pdf · •Bioinformatics is (among other things) the process through which the interpretation of billions of sequence observations](https://reader030.fdocuments.in/reader030/viewer/2022040215/5edadf1709ac2c67fa68718e/html5/thumbnails/9.jpg)
How do we extract meaning from the sequence data?
2,000,000,000 reads per sample
150 bases per read
3,000,000,000 base reference genome
9
![Page 10: Introduction to Bioinformatics - WestGrid to Bioinformatics.pdf · •Bioinformatics is (among other things) the process through which the interpretation of billions of sequence observations](https://reader030.fdocuments.in/reader030/viewer/2022040215/5edadf1709ac2c67fa68718e/html5/thumbnails/10.jpg)
Data Interpretation
• For efficiency and to help interpretation, we often describe a sample by how it differs from a reference sample
• To compare samples, we:
– align sequence reads to a reference genome
– find locations where our sample differs from the reference
10
![Page 11: Introduction to Bioinformatics - WestGrid to Bioinformatics.pdf · •Bioinformatics is (among other things) the process through which the interpretation of billions of sequence observations](https://reader030.fdocuments.in/reader030/viewer/2022040215/5edadf1709ac2c67fa68718e/html5/thumbnails/11.jpg)
AGCTAGCCCTAGGATAG
Alignment
GCTAGCTAGCCCTAGGA
...ACTCGCTAGCTAGCCCTAGGATAGCTTAGAGACCCTCGCGAAATAGACCCTCGAT...
CTAGCTAGCCCTAGGAT GCTAGCTAGCCCTAGGA GCTAGCTAGCCCTAGGA GCTAGCTAGCCCTAGGA
TAGCTAGCCCTAGGATA AGCTAGCCCTAGGATAG
AGCCCTAGGATAGCTTA
CTTAGAGACCCTCGCGA CTTAGAGACCCTCGCGA
TAGAGACCCTCGCGAAA AGACCCTCGCGAAATAG
GACCCTCGCGAAATAGA ACCCTCGCGAAATAGAC ACCCTCGCGAAATAGAC
Reference Genome
11
![Page 12: Introduction to Bioinformatics - WestGrid to Bioinformatics.pdf · •Bioinformatics is (among other things) the process through which the interpretation of billions of sequence observations](https://reader030.fdocuments.in/reader030/viewer/2022040215/5edadf1709ac2c67fa68718e/html5/thumbnails/12.jpg)
This SNP, in combination with two others (not shown) give this individual a 90% chance of having blue/green eyes*
*Duffy, David L., et al. "A three–single-nucleotide polymorphism haplotype in intron 1 of OCA2 explains most human eye-color variation." The American Journal of Human Genetics 80.2 (2007): 241-252.
Real Alignments
12
![Page 13: Introduction to Bioinformatics - WestGrid to Bioinformatics.pdf · •Bioinformatics is (among other things) the process through which the interpretation of billions of sequence observations](https://reader030.fdocuments.in/reader030/viewer/2022040215/5edadf1709ac2c67fa68718e/html5/thumbnails/13.jpg)
Assembly
...ACTCGCTAGCTAGCCCTAGGATAGCTTAAAAAAAGAGACCCTCGCGAAATAGACCCTCGAT...
Alignment
AGCTAGCCCTAGGATAG
...GCTAGCTAGCCCTAGGATAGCTTAGAGACCCTCGCGAAATAGAC...
CTAGCTAGCCCTAGGAT GCTAGCTAGCCCTAGGA
TAGCTAGCCCTAGGATA AGCTAGCCCTAGGATAG
AGCCCTAGGATAGCTTA
TAGAGACCCTCGCGAAA AGACCCTCGCGAAATAG
GACCCTCGCGAAATAGA ACCCTCGCGAAATAGAC ACCCTCGCGAAATAGAC
13
Contig
ACTCGCTAGCTAGCCCTAGGATAGCTTA GAGACCCTCGCGAAATAGAC Reference Genome
Contig
-------
![Page 14: Introduction to Bioinformatics - WestGrid to Bioinformatics.pdf · •Bioinformatics is (among other things) the process through which the interpretation of billions of sequence observations](https://reader030.fdocuments.in/reader030/viewer/2022040215/5edadf1709ac2c67fa68718e/html5/thumbnails/14.jpg)
Summary So Far
• We sequence billions of reads per genome sample
• Useful / actionable results are identified via:
– Read alignment
– Read assembly
• We describe samples by how they differ from a reference sample
14
![Page 15: Introduction to Bioinformatics - WestGrid to Bioinformatics.pdf · •Bioinformatics is (among other things) the process through which the interpretation of billions of sequence observations](https://reader030.fdocuments.in/reader030/viewer/2022040215/5edadf1709ac2c67fa68718e/html5/thumbnails/15.jpg)
Other Data Types
• Epigenomics – investigates the chromatin and methylation status of regions of the genome
• Proteomics – measures protein expression and modifications
• Exome/Capture – queries specific targeted regions of the genome
• RNA sequencing – measures RNA expression and variation 15
![Page 16: Introduction to Bioinformatics - WestGrid to Bioinformatics.pdf · •Bioinformatics is (among other things) the process through which the interpretation of billions of sequence observations](https://reader030.fdocuments.in/reader030/viewer/2022040215/5edadf1709ac2c67fa68718e/html5/thumbnails/16.jpg)
Genome and Transcriptome
16
![Page 17: Introduction to Bioinformatics - WestGrid to Bioinformatics.pdf · •Bioinformatics is (among other things) the process through which the interpretation of billions of sequence observations](https://reader030.fdocuments.in/reader030/viewer/2022040215/5edadf1709ac2c67fa68718e/html5/thumbnails/17.jpg)
Genome and Transcriptome
17
Genes encoded in the genome (DNA)
Expressed genes (RNA)
![Page 18: Introduction to Bioinformatics - WestGrid to Bioinformatics.pdf · •Bioinformatics is (among other things) the process through which the interpretation of billions of sequence observations](https://reader030.fdocuments.in/reader030/viewer/2022040215/5edadf1709ac2c67fa68718e/html5/thumbnails/18.jpg)
Genome and Transcriptome
Genome sequencing allow us to find: • SNVs (single nucleotide variants)
•CNVs (copy number variants)
•SVs (structural variants)
Gene1 Gene2
The transcriptome can be sequenced to find: •Gene expression estimates •Gene fusions •SNVs in expressed genes
CCCTTTTGGGGAA
Gene1a Gene2b
18
![Page 19: Introduction to Bioinformatics - WestGrid to Bioinformatics.pdf · •Bioinformatics is (among other things) the process through which the interpretation of billions of sequence observations](https://reader030.fdocuments.in/reader030/viewer/2022040215/5edadf1709ac2c67fa68718e/html5/thumbnails/19.jpg)
Personalized Medicine Normal
80X genome coverage and
transcriptome library
Tumour (melanoma)
Bioinformatics
19
40X genome coverage
![Page 20: Introduction to Bioinformatics - WestGrid to Bioinformatics.pdf · •Bioinformatics is (among other things) the process through which the interpretation of billions of sequence observations](https://reader030.fdocuments.in/reader030/viewer/2022040215/5edadf1709ac2c67fa68718e/html5/thumbnails/20.jpg)
z
Personalized Medicine Intermediate Results
Somatic SNV calling RNA expression correlation
Gene fusion analysis
Somatic copy number
20
![Page 21: Introduction to Bioinformatics - WestGrid to Bioinformatics.pdf · •Bioinformatics is (among other things) the process through which the interpretation of billions of sequence observations](https://reader030.fdocuments.in/reader030/viewer/2022040215/5edadf1709ac2c67fa68718e/html5/thumbnails/21.jpg)
Report
21
A detailed report of results for
a patient’s sample is provided
to the clinician allowing them
to view the genetic landscape
of a patient’s disease.
This approach has enabled
treating clinicians to make
informed clinical decisions
based on the genomic
information integrated with
other clinical features.
![Page 22: Introduction to Bioinformatics - WestGrid to Bioinformatics.pdf · •Bioinformatics is (among other things) the process through which the interpretation of billions of sequence observations](https://reader030.fdocuments.in/reader030/viewer/2022040215/5edadf1709ac2c67fa68718e/html5/thumbnails/22.jpg)
Summary
• Bioinformatics is (among other things) the process through which the interpretation of billions of sequence observations yields a distilled list of actionable findings
• This is accomplished in equal parts by:
– Powerful computing infrastructure
– Advanced algorithms
– Trained individuals from a diverse set of fields 22
![Page 23: Introduction to Bioinformatics - WestGrid to Bioinformatics.pdf · •Bioinformatics is (among other things) the process through which the interpretation of billions of sequence observations](https://reader030.fdocuments.in/reader030/viewer/2022040215/5edadf1709ac2c67fa68718e/html5/thumbnails/23.jpg)
23
Upcoming webinars & training events can be found on the
WestGrid website:
https://www.westgrid.ca/events
If attendees would like to learn more about Compute Canada /
WestGrid high performance computing resources and services,
please visit:
https://docs.computecanada.ca