Lecture1 1 Perl for bioinformatics Davide Pisani & James Cotton

Page 1:

Bioinformatics Programming (Perl Programming)

2010

Davide Pisani

Page 2:

Bioinformatics

• Using computers to store, organise and interpret biological data

• In particular, data from high-throughput technologies (-omics)

Page 3:

High-throughput technologies

• DNA & protein sequences and structure (genomics & proteomics)

• Yeast two-hybrid screens (interactomics)

• Microarrays (transcriptomics)

• Metabolic networks (metabolomics)

Page 4:

How much sequence data is there?

1371 published complete genomes

188 ongoing archaeal genomes

4941 ongoing bacterial genomes

1599 ongoing eukaryotic genomes

242 metagenomes

Page 5:

How much data in each genome?

ftp://ftp.ncbi.nih.gov/refseq/release/

Page 6:

The human genome

ftp://ftp.ncbi.nih.gov/refseq/release/

Page 7:

The human genome

ftp://ftp.ncbi.nih.gov/refseq/release/

Page 8:

The human genome

ftp://ftp.ncbi.nih.gov/refseq/release/

etc.

70 base pairs per line, 57 lines per page = 3,990 bases/page

Chromosome 1 is (about) 247,249,719 bases long, i.e. about 62,000 pages

Whole genome (3.2 x 10^9 bases) = about 802,000 pages
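
The page counts above are simple division and can be checked with a few lines of code. The following is a minimal sketch in Python, using only the figures given on the slide. The names `BASES_PER_PAGE` and `pages_needed` are made up for illustration, and the whole-genome size is the rounded 3.2 x 10^9 value, so the results are estimates.

```python
# Figures taken from the slide: 70 bases per line, 57 lines per page.
BASES_PER_LINE = 70
LINES_PER_PAGE = 57
BASES_PER_PAGE = BASES_PER_LINE * LINES_PER_PAGE  # 3,990 bases per page


def pages_needed(n_bases: int) -> int:
    """Return the number of printed pages needed to hold n_bases.

    Integer ceiling division is used, so a partly filled
    last page still counts as a page.
    """
    return (n_bases + BASES_PER_PAGE - 1) // BASES_PER_PAGE


CHROMOSOME_1 = 247_249_719    # approximate length quoted on the slide
WHOLE_GENOME = 3_200_000_000  # rounded figure (3.2 x 10^9) from the slide

if __name__ == "__main__":
    print("Bases per page:     ", BASES_PER_PAGE)
    print("Chromosome 1 pages: ", pages_needed(CHROMOSOME_1))
    print("Whole genome pages: ", pages_needed(WHOLE_GENOME))
```

Rounded to the nearest thousand, the results agree with the slide: roughly 62,000 pages for chromosome 1 and roughly 802,000 pages for the whole genome.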

Page 9:

Genome                     Base pairs       No. of genes
Phi-X 174                  5,386            10
Nanoarchaeum equitans      490,885          552
E. coli                    4,639,221        4,377
Saccharomyces cerevisiae   12,495,682       5,800
Drosophila melanogaster    122,653,977      13,379
Homo sapiens               3.2 x 10^9       30,000
Protopterus aethiopicus    1.3 x 10^11      ?
Psilotum nudum             2.5 x 10^11      ?20-25,000
Amoeba dubia               6.7 x 10^11      ?
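
One way to read this table is to ask how many base pairs there are per gene in each genome. The sketch below, in Python, does that for the rows that have a definite gene count. The values are copied from the table, the human genome size is the rounded 3.2 x 10^9 figure, and the names `GENOMES` and `bp_per_gene` are hypothetical. Rows marked "?" are left out rather than guessed.

```python
# (genome, base pairs, number of genes), copied from the table above.
# Only rows with a definite gene count are included.
GENOMES = [
    ("Phi-X 174", 5_386, 10),
    ("Nanoarchaeum equitans", 490_885, 552),
    ("E. coli", 4_639_221, 4_377),
    ("Saccharomyces cerevisiae", 12_495_682, 5_800),
    ("Drosophila melanogaster", 122_653_977, 13_379),
    ("Homo sapiens", 3_200_000_000, 30_000),  # rounded size from the table
]


def bp_per_gene(base_pairs: int, genes: int) -> float:
    """Average number of base pairs per gene (genome size / gene count)."""
    return base_pairs / genes


if __name__ == "__main__":
    for name, size, genes in GENOMES:
        print(f"{name:<26} {bp_per_gene(size, genes):>12,.1f} bp per gene")
```

On these figures the ratio climbs from a few hundred base pairs per gene in Phi-X 174 to over 100,000 in Homo sapiens, so genome size grows much faster than gene number.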

Page 10:

GenBank contains much more than just sequence data

Information on the organism, the gene, where it is expressed and so forth.

Page 11:

Protein Structure

Page 12:

PDB: Protein Structure