Problem Set I review BIOL221T: Advanced Bioinformatics for Biotechnology Irene Gabashvili, PhD.


Tissue Specificity & Top Tissues

Life is a complex orchestration of genes that must be expressed at the right time, place, and level. Basic cellular functions require the expression of certain genes in all cells and tissues (that is, in a ubiquitous manner), while specialized functions require restricted expression of other genes in a single cell type or a small number of cells and tissues (that is, in a tissue-specific manner).

Tissue Specificity vs Tissues with Most Frequent Expression

Not always the same

Tissue specificity: tissues expressing the gene above the median value.

OMIM: just lists a few tissues where the gene was found.

Microarray-based expression data: see e.g. http://symatlas.gnf.org/SymAtlas/ and http://expression.gnf.org/

Expressed sequence tag (EST)-based expression data: see Stanford Source, UniGene.

RT-PCR data: literature, commercial software; no good databases.
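The median rule can be sketched in a few lines of Python, assuming one expression value per tissue for a single gene. The tissue names and values below are hypothetical, not taken from SymAtlas or any other database.

```python
from statistics import median


def tissues_above_median(expression: dict[str, float]) -> list[str]:
    """Return the tissues whose expression value for one gene exceeds the
    median of that gene's values across all tissues."""
    cutoff = median(expression.values())
    return sorted(t for t, v in expression.items() if v > cutoff)


# Hypothetical microarray values for one gene (illustrative only, not real data)
example = {"liver": 850.0, "pancreas": 920.0, "heart": 300.0, "lung": 280.0, "brain": 150.0}
print(tissues_above_median(example))
```

With five tissues the median is the middle value (300 here), so only the two tissues above it are reported; a tissue exactly at the median is not counted.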

http://symatlas.gnf.org/SymAtlas/

MTHFR: lymphoma; cardiac muscle (next probe)

http://expression.gnf.org/

MTHFR: pancreas, liver

Stanford Source: http://smd.stanford.edu/

MTHFR: lymph

MTHFR: heart, lung

GeneCards: www.genecards.org

Stanford Source Calculation: EST Example

Clones for a gene were isolated from skeletal muscle (8 unique clones) and cardiac muscle (2 unique clones).

The total number of clones isolated from skeletal muscle is 16000, so the frequency is 8/16000 = 0.0005.

The total number of cardiac muscle clones is 10000, so the frequency is 2/10000 = 0.0002.

0.0005 + 0.0002 = 0.0007

Normalized gene expression is calculated by dividing each frequency by 0.0007:

Skeletal muscle = 0.0005 / 0.0007 = 71%

Cardiac muscle = 0.0002 / 0.0007 = 29%
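The same arithmetic as a short Python sketch, using the clone counts from the example. The function name is illustrative; this is not Stanford Source's own code.

```python
def normalized_expression(unique_clones: dict[str, int],
                          library_sizes: dict[str, int]) -> dict[str, float]:
    # Frequency per tissue = unique clones for the gene / all clones in that tissue's library
    freqs = {t: unique_clones[t] / library_sizes[t] for t in unique_clones}
    total = sum(freqs.values())
    # Each tissue's share of the summed frequencies (the shares add up to 1)
    return {t: f / total for t, f in freqs.items()}


shares = normalized_expression(
    {"skeletal muscle": 8, "cardiac muscle": 2},
    {"skeletal muscle": 16000, "cardiac muscle": 10000},
)
for tissue, share in shares.items():
    print(f"{tissue}: {share:.0%}")  # skeletal muscle: 71%, cardiac muscle: 29%
```

Dividing by library size first matters: raw clone counts (8 vs 2) would suggest 80% vs 20%, but the skeletal muscle library is larger, so its share drops to about 71%.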

Tissue-Specificity Calculation

2 unique clones for gene X were isolated from cardiac muscle. Of the 10000 clones isolated from cardiac muscle, 2 belong to gene X and each of the remaining 9998 clones represents a different gene (9998 genes represented by only one clone, one gene represented by 2 clones). Gene X is tissue-specific.
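A minimal sketch of this example, assuming the library's 10000 clones split into 9998 genes with one clone each plus gene X with two, and assuming (this is an interpretation, not stated on the slide) that "tissue-specific" here means gene X's clone count exceeds the median clone count per gene in the same library. Both function names are hypothetical.

```python
from statistics import median


def clone_counts_consistent(counts_per_gene: list[int], library_size: int) -> bool:
    # The clones attributed to individual genes must add up to the library size
    return sum(counts_per_gene) == library_size


def above_library_median(gene_clones: int, counts_per_gene: list[int]) -> bool:
    # Assumed reading of the rule: the gene is enriched in this tissue if its
    # clone count exceeds the median clone count per gene in the library
    return gene_clones > median(counts_per_gene)


library = [1] * 9998 + [2]  # 9998 single-clone genes plus gene X with 2 clones
print(clone_counts_consistent(library, 10000))
print(above_library_median(2, library))
```

The first check is a cheap sanity test on the counts themselves: 9998 single-clone genes plus one two-clone gene give exactly 10000 clones, whereas 9999 single-clone genes would give 10001.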

dbSNP queries

SLC19A1[gene] AND human[orgn]

SLC19A1[gene] AND human[orgn] AND snp[snp_class]

SLC19A1[gene] AND human[orgn] AND "coding nonsynonymous"[FUNC]

SLC19A1[gene] AND human[orgn] AND "coding synonymous"[FUNC]
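These queries can also be sent programmatically through NCBI's E-utilities ESearch endpoint. The sketch below only builds the request URLs (it does not contact NCBI). The field tags are copied from the slide and are an assumption for today's dbSNP, whose class and function values may have changed; check them against current dbSNP search documentation before use.

```python
from urllib.parse import urlencode

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def dbsnp_search_url(term: str, retmax: int = 0) -> str:
    # retmax=0 asks ESearch for the hit count only, without the list of IDs
    return f"{ESEARCH}?{urlencode({'db': 'snp', 'term': term, 'retmax': retmax})}"


base = "SLC19A1[gene] AND human[orgn]"
queries = [
    base,
    f"{base} AND snp[snp_class]",
    f'{base} AND "coding nonsynonymous"[FUNC]',
    f'{base} AND "coding synonymous"[FUNC]',
]
for q in queries:
    print(dbsnp_search_url(q))
```

Building each narrower query by appending one restriction to the same base term keeps the counts directly comparable, which is how the per-class breakdowns are read.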

dbSNP queries

ADRB1[gene] AND human[orgn] = 48

ADRB1[gene] AND human[orgn] AND "snp"[SNP_CLASS] = 40

ADRB1[gene] AND human[orgn] AND "in-del"[snp_class] = 5

ADRB1[gene] AND human[orgn] AND heterozygous[snp_class] = 0

ADRB1[gene] AND human[orgn] AND mixed[snp_class] = 0

ADRB1[gene] AND human[orgn] AND microsatellite[snp_class] = 3

ADRB1[gene] AND human[orgn] AND "multinucleotide polymorphism"[snp_class] = 0

ADRB1[gene] AND human[orgn] AND "named locus"[snp_class] = 0

ADRB1[gene] AND human[orgn] AND "no variation"[snp_class] = 0

ADRB1 SNP summary

48 variants in total: 40 true SNPs, 5 insertion-deletions (in-dels), 3 microsatellites; no other types
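The summary is a partition check: the per-class hit counts for ADRB1 should add up to the 48 hits of the unrestricted query. A minimal Python sketch with the counts copied from the slide (current dbSNP will likely return different numbers):

```python
# Hit counts for ADRB1[gene] AND human[orgn], broken down by [snp_class] (values from the slide)
class_counts = {
    "snp": 40, "in-del": 5, "heterozygous": 0, "mixed": 0, "microsatellite": 3,
    "multinucleotide polymorphism": 0, "named locus": 0, "no variation": 0,
}
TOTAL = 48  # hits for the unrestricted query

# The classes are mutually exclusive, so their counts should sum to the total
assert sum(class_counts.values()) == TOTAL, "class counts should partition the total"

nonzero = {cls: n for cls, n in class_counts.items() if n > 0}
for cls, n in nonzero.items():
    print(f"{cls}: {n} ({n / TOTAL:.0%})")
```

If the sum came out lower than the total, some entries would belong to a class that was not queried; checking this is a quick way to confirm that the list of classes is complete.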

Type of variation

SNP[snp_class]: true single nucleotide polymorphism

in-del: insertion/deletion polymorphism ('-'/'+')

Heterozygous: variation has unknown sequence composition but is observed to be heterozygous

Microsatellite: simple sequence repeat

Named: allele sequences defined by a name tag instead of raw sequence, e.g., (Alu)/-

No-variation: invariant region in surveyed sequence

Multinucleotide polymorphism: all alleles have the same length, where length > 1
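To make the sequence-based definitions concrete, here is a simplified, hypothetical classifier that applies them to a list of observed alleles. It is not dbSNP's actual classification algorithm: "heterozygous", "named", and "microsatellite" need information beyond the allele strings and are not handled, and any remaining case is reported as "other".

```python
def classify_variation(alleles: list[str]) -> str:
    """Hypothetical, simplified variation class for alleles such as ["A", "G"]
    or ["-", "CT"], where "-" marks the deleted (empty) allele."""
    if len(set(alleles)) < 2:
        return "no-variation"  # nothing differs among the surveyed alleles
    if "-" in alleles:
        return "in-del"        # one allele is absent relative to another
    lengths = {len(a) for a in alleles}
    if lengths == {1}:
        return "snp"           # all alleles are single nucleotides
    if len(lengths) == 1:
        return "mnp"           # all alleles the same length, length > 1
    return "other"             # e.g., alleles of different lengths without "-"


for example in (["A", "G"], ["-", "CT"], ["AC", "GT"], ["A", "A"]):
    print(example, classify_variation(example))
```

The order of the checks matters: the in-del test comes before the length tests so that "-" is never treated as a one-letter allele.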

Definitions

Homozygote - has two identical alleles at a particular locus (for a given gene)

Heterozygote - has two different alleles at a particular locus

Hemizygote - has only one of a pair of genes for a specific trait. Example: a male is hemizygous for the X chromosome

Definitions

Heterozygous genotype = occurs when the two alleles at a particular gene locus are different. A heterozygous genotype may include one normal allele and one mutation, or two different mutations. The latter is called a compound heterozygote.
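The definitions above can be expressed as a small labeling function. This is a sketch under stated assumptions: the reference allele stands in for the "normal" allele (a real locus may have more than one normal allele), and a missing second allele (`None`) represents a single copy, as for an X-linked locus in a male. The function name is hypothetical.

```python
from typing import Optional


def zygosity(allele1: str, allele2: Optional[str], reference: str) -> str:
    """Label a genotype at one locus. allele2=None means only one copy is present."""
    if allele2 is None:
        return "hemizygous"             # only one of the pair is present
    if allele1 == allele2:
        return "homozygous"             # two identical alleles
    if reference not in (allele1, allele2):
        return "compound heterozygous"  # two different non-reference (mutant) alleles
    return "heterozygous"               # one normal allele and one mutation


print(zygosity("C", "C", reference="C"))
print(zygosity("C", "T", reference="C"))
print(zygosity("T", "A", reference="C"))
print(zygosity("T", None, reference="C"))
```

Note that a genotype with two identical mutant alleles (e.g. T/T against reference C) is homozygous, not compound heterozygous: the compound case requires two different mutations.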

Heterozygous SNP vs. Avg. Heterozygosity
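The slide title contrasts the "heterozygous" variation class (unknown sequence, observed heterozygous) with the average heterozygosity statistic reported for a SNP. A common way to estimate heterozygosity at a locus from allele frequencies is H = 1 - Σ pᵢ², the probability that two randomly drawn alleles differ. That dbSNP averages this value across populations with a simple unweighted mean is an assumption made only for this sketch, and the frequencies below are hypothetical.

```python
def expected_heterozygosity(freqs: list[float]) -> float:
    """H = 1 - sum(p_i^2): probability that two randomly drawn alleles differ."""
    if abs(sum(freqs) - 1.0) > 1e-6:
        raise ValueError("allele frequencies must sum to 1")
    return 1.0 - sum(p * p for p in freqs)


def average_heterozygosity(populations: list[list[float]]) -> float:
    # Assumption: unweighted mean of per-population heterozygosity values
    values = [expected_heterozygosity(f) for f in populations]
    return sum(values) / len(values)


# Hypothetical allele frequencies for a biallelic SNP in two populations
print(expected_heterozygosity([0.5, 0.5]))
print(average_heterozygosity([[0.5, 0.5], [0.9, 0.1]]))
```

For a biallelic SNP the maximum is 0.5 (equal allele frequencies), which is why average heterozygosity values for true SNPs never exceed 0.5.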

More on dbSNP

An ss number is the unique ID number assigned to each submitted SNP. Once aligned and processed, submissions are clustered, and a “reference SNP cluster” (“refSNP”) is created and given a unique rs ID number.
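The ss-to-rs relationship can be pictured as grouping submissions that align to the same place. The sketch below is a deliberate simplification, not dbSNP's clustering procedure: it assumes each submission is already reduced to a (chromosome, position) alignment and groups by exact position. The ss IDs, coordinates, and `rs_hypothetical_N` labels are all made up; real rs numbers are not assigned this way.

```python
from collections import defaultdict


def cluster_submissions(submissions: list[tuple[str, str, int]]) -> dict[str, list[str]]:
    """Group submitted SNPs (ss IDs) that align to the same position into one
    hypothetical refSNP cluster. Input tuples: (ss_id, chromosome, position)."""
    by_position = defaultdict(list)
    for ss_id, chrom, pos in submissions:
        by_position[(chrom, pos)].append(ss_id)
    clusters = {}
    # Number clusters in genomic order; labels are illustrative only
    for i, (_, ss_ids) in enumerate(sorted(by_position.items()), start=1):
        clusters[f"rs_hypothetical_{i}"] = sorted(ss_ids)
    return clusters


subs = [("ss1", "7", 100), ("ss2", "7", 100), ("ss3", "7", 250)]
print(cluster_submissions(subs))
```

The key point the sketch illustrates is the many-to-one mapping: several independent ss submissions of the same variant end up under a single rs ID.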

Drugs

Some proteins are drug targets. Example: glimepiride (antidiabetic) targets KCNJ11 as a blocker; other types of target action include antagonist and agonist.

Some drugs regulate the activity of proteins indirectly:

Diazoxide activates KCNJ11

Glucocorticoid decreases expression of Kcnj11 mRNA

Regulates binding of KCNJ11

Some drugs are even more indirectly associated with SNPs in proteins that cause drug sensitivities.

Haplotypes

Haplotypes are groups of linked SNPs that tend to be inherited together.

Haplotype blocks are regions of closely located SNPs that are inherited as blocks.

A haplotype is a set of closely linked genes that tends to be inherited together as a unit. A haplotype may refer to only one locus or to an entire genome.

http://www.hapmap.org/ - the HapMap project

http://cgi.uc.edu/cgi-bin/kzhang/haploBlockFinder.cgi
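In practice a haplotype is read off phased data as the string of alleles one chromosome carries at a fixed, ordered set of linked SNPs. A minimal sketch of counting haplotype frequencies from such strings follows; the allele strings are hypothetical, and the sketch assumes phasing has already been done (it does not infer haplotypes from unphased genotypes or find block boundaries, which is what tools like HaploBlockFinder address).

```python
from collections import Counter


def haplotype_frequencies(chromosomes: list[str]) -> dict[str, float]:
    """Each string lists one chromosome's alleles at the same ordered set of
    linked SNPs (phased data), e.g. "CGC". Returns haplotype -> frequency,
    most common first."""
    counts = Counter(chromosomes)
    n = len(chromosomes)
    return {hap: c / n for hap, c in counts.most_common()}


# Hypothetical phased chromosomes typed at three linked SNPs
sample = ["CGC", "CGC", "TTT", "CGC", "TTT", "CGT"]
print(haplotype_frequencies(sample))
```

When SNPs are tightly linked, most chromosomes carry one of a few haplotypes (here two dominate), which is the pattern that defines a haplotype block.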

Haplotype block names

Sometimes different for different populations/families.

Still “in progress”.

Sometimes linked via dbSNP (haplotype-tagged); also available in other variation sites.

Haplotype analysis of ABCB1 revealed 2 major haplotypes, ABCB1*1 and ABCB1*13. ABCB1*13 contains T1236, T2677, T3435, and 3 intronic variants.