Problem Set I review BIOL221T: Advanced Bioinformatics for Biotechnology Irene Gabashvili, PhD.
Tissue Specificity & Top Tissues
Life is a complex orchestration of genes that must be expressed at the right time, place, and level. Basic cellular functions require the expression of certain genes in all cells and tissues (that is, in a ubiquitous manner), while specialized functions require restricted expression of other genes in a single or small number of cells and tissues (that is, tissue-specific expression).
Tissue Specificity vs Tissues with Most Frequent Expression
Not always the same
Tissue specificity: tissues expressing the gene above the median value.
OMIM: lists only a few tissues where the gene was found
Microarray-based expression data
  See e.g. http://symatlas.gnf.org/SymAtlas/ and http://expression.gnf.org/
Expressed sequence tag (EST)-based expression data
  See Stanford Source, UniGene
RT-PCR data
  Literature, commercial software, no good databases
Stanford Source Calculation: EST example
Clones for a gene were isolated from skeletal muscle (8 unique clones) and cardiac muscle (2 unique clones).
  Number of all clones isolated from skeletal muscle: 16000, so frequency is 8/16000 = 0.0005
  Number of all clones isolated from cardiac muscle: 10000, so frequency is 2/10000 = 0.0002
  0.0005 + 0.0002 = 0.0007
Normalized gene expression is calculated by dividing each frequency by 0.0007
  Skeletal muscle = 0.0005 / 0.0007 = 71%
  Cardiac muscle = 0.0002 / 0.0007 = 29%
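The EST arithmetic above can be sketched in a few lines of Python. This is only an illustration of the calculation on the slide, not Stanford Source's actual code; the function name normalized_expression and the dictionary layout are assumptions made for the example.

```python
def normalized_expression(gene_clones, library_sizes):
    """Return each tissue's share of the gene's summed clone frequency."""
    # Frequency = clones for the gene / all clones sequenced from that tissue.
    freqs = {t: gene_clones[t] / library_sizes[t] for t in gene_clones}
    total = sum(freqs.values())
    # Dividing by the summed frequency makes the shares add up to 1.
    return {t: f / total for t, f in freqs.items()}


# Numbers taken from the worked example above.
gene_clones = {"skeletal muscle": 8, "cardiac muscle": 2}
library_sizes = {"skeletal muscle": 16000, "cardiac muscle": 10000}

shares = normalized_expression(gene_clones, library_sizes)
for tissue, share in shares.items():
    print(f"{tissue}: {share:.0%}")
```

Dividing by the library size first matters: skeletal muscle has four times as many clones for the gene as cardiac muscle, but its share is only about 2.5 times larger because more clones were sequenced from it overall.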
Tissue-Specificity Calculation
2 unique clones for gene X were isolated from cardiac muscle. Out of 10000 clones isolated from cardiac muscle, there are 9998 genes represented by only one clone and one gene (gene X) represented by 2 clones.
The median is 1 clone per gene, and gene X is above it, so this gene is tissue-specific.
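A minimal sketch of the above-median test for this cardiac muscle library follows. It assumes 9,998 single-clone genes plus gene X with 2 clones, so that the library totals 10,000 clones; the helper name is_above_median is made up for the example.

```python
from statistics import median


def is_above_median(gene_count, all_counts):
    """True if the gene's clone count exceeds the library's median count."""
    return gene_count > median(all_counts)


# Assumed library: 9,998 genes with one clone each, plus gene X with two.
clone_counts = [1] * 9998 + [2]
total_clones = sum(clone_counts)  # 10,000 clones in the library

gene_x_is_specific = is_above_median(2, clone_counts)
print(total_clones, median(clone_counts), gene_x_is_specific)
```

With almost every gene seen exactly once, the median is 1, so even a single extra clone puts a gene above it. That is why such a low count is enough to flag gene X here.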
dbSNP queries
SLC19A1[gene] AND human[orgn]
SLC19A1[gene] AND human[orgn] AND snp[snp_class]
SLC19A1[gene] AND human[orgn] AND "coding nonsynonymous"[FUNC]
SLC19A1[gene] AND human[orgn] AND "coding synonymous"[FUNC]
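The queries above all follow one pattern: a gene term, an organism term, and optional field-tagged filters joined with AND. The sketch below only assembles the query strings; it does not contact NCBI. The helper name dbsnp_query is hypothetical, and the field tags are copied from the slide, so they may not match the current dbSNP search interface.

```python
def dbsnp_query(gene, organism="human", filters=()):
    """Join Entrez-style terms with AND.

    filters is a sequence of (value, field) pairs, e.g. ("snp", "snp_class").
    """
    terms = [f"{gene}[gene]", f"{organism}[orgn]"]
    for value, field in filters:
        # Quote multi-word or hyphenated values, as the queries above do.
        if " " in value or "-" in value:
            value = f'"{value}"'
        terms.append(f"{value}[{field}]")
    return " AND ".join(terms)


queries = [
    dbsnp_query("SLC19A1"),
    dbsnp_query("SLC19A1", filters=[("snp", "snp_class")]),
    dbsnp_query("SLC19A1", filters=[("coding nonsynonymous", "FUNC")]),
    dbsnp_query("SLC19A1", filters=[("coding synonymous", "FUNC")]),
]
for q in queries:
    print(q)
```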
dbSNP queries
ADRB1[gene] AND human[orgn] = 48
ADRB1[gene] AND human[orgn] AND "snp"[SNP_CLASS] = 40
ADRB1[gene] AND human[orgn] AND "in-del"[snp_class] = 5
ADRB1[gene] AND human[orgn] AND heterozygous[snp_class] = 0
ADRB1[gene] AND human[orgn] AND mixed[snp_class] = 0
ADRB1[gene] AND human[orgn] AND microsatellite[snp_class] = 3
ADRB1[gene] AND human[orgn] AND "multinucleotide polymorphism"[snp_class] = 0
ADRB1[gene] AND human[orgn] AND "named locus"[snp_class] = 0
ADRB1[gene] AND human[orgn] AND "no variation"[snp_class] = 0
ADRB1 SNP summary
48 SNPs
  40 true SNPs
  5 insertion-deletions (in-dels)
  3 microsatellites
  no other types
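A quick consistency check on the ADRB1 numbers: the per-class hit counts should add up to the unfiltered total of 48. The counts below are the ones reported on the slides; the variable names are made up for the example, and the counts reflect dbSNP at the time of the course, not today.

```python
# Hit counts per snp_class for ADRB1, as reported on the slides.
class_counts = {
    "snp": 40,
    "in-del": 5,
    "heterozygous": 0,
    "mixed": 0,
    "microsatellite": 3,
    "multinucleotide polymorphism": 0,
    "named locus": 0,
    "no variation": 0,
}
total_hits = 48  # ADRB1[gene] AND human[orgn], no class filter

# Keep only the classes that were actually observed.
observed = {name: n for name, n in class_counts.items() if n > 0}
classes_sum_to_total = sum(class_counts.values()) == total_hits

print(observed)
print("classes account for all hits:", classes_sum_to_total)
```

If the sum fell short of the total, some records would belong to a class that was not queried, so this check is a cheap way to confirm the class list is complete.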
Type of variation
SNP[snp_class]: true single nucleotide polymorphism
in-del: insertion deletion polymorphism ('-'/'+')
Heterozygous: variation has unknown sequence composition but is observed to be heterozygous
Microsatellite: simple sequence repeat
Named: allele sequences defined by name tag instead of raw sequence, e.g., (Alu)/
no-variation: invariant region in surveyed sequence
Multiple nucleotide polymorphism: all alleles same length, where length >1
Definitions
Homozygote: has two identical alleles at a particular locus (for a given gene)
Heterozygote: has two different alleles at a particular locus
Hemizygote: has only one copy of a gene instead of a pair. Example: a male is hemizygous for genes on the X chromosome
Definitions
Heterozygous genotype: occurs when the two alleles at a particular gene locus are different. A heterozygous genotype may include one normal allele and one mutation, or two different mutations. The latter is called a compound heterozygote.
More on dbSNP
An ss number is the unique ID number assigned to each submitted SNP. Once aligned and processed, submissions are clustered and a "reference SNP cluster", or "refSNP", is created and given a unique rs ID number.
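The relationship between ss and rs numbers is many-to-one: several submissions can collapse into one refSNP. The toy model below assumes submissions are clustered purely by mapped position, which is a simplification of what dbSNP does. All IDs and coordinates are made up for illustration and are not real dbSNP records.

```python
from collections import defaultdict


def cluster_submissions(submissions):
    """Group ss IDs by mapped position and give each group one rs label."""
    by_position = defaultdict(list)
    for ss_id, position in submissions:
        by_position[position].append(ss_id)
    clusters = {}
    # Number the clusters in position order so the labels are reproducible.
    for n, position in enumerate(sorted(by_position), start=1):
        clusters[f"rs_demo{n}"] = by_position[position]
    return clusters


# Hypothetical submissions: (ss ID, (chromosome, position)).
submissions = [
    ("ss_demoA", ("chr1", 1000)),
    ("ss_demoB", ("chr1", 1000)),  # same site as ss_demoA, other submitter
    ("ss_demoC", ("chr1", 2500)),
]
clusters = cluster_submissions(submissions)
print(clusters)
```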
Drugs
Some proteins are drug targets.
  Example: glimepiride (antidiabetic) targets KCNJ11 as a blocker (other modes of action: antagonist, agonist)
Some drugs regulate the activity of drug targets indirectly.
  Diazoxide activates KCNJ11
  Glucocorticoid decreases expression of Kcnj11 mRNA
  Regulates binding of KCNJ11
Some drugs are even more indirectly associated, through SNPs in proteins that cause sensitivities
Haplotypes
Haplotypes are groups of linked SNPs that tend to be inherited together
Haplotype blocks are regions of closely located SNPs that are inherited as blocks
More generally, a haplotype is a set of closely linked genes that tends to be inherited together as a unit. A haplotype may refer to only one locus or to an entire genome
http://www.hapmap.org/ - the HapMap project
http://cgi.uc.edu/cgi-bin/kzhang/haploBlockFinder.cgi
Haplotype block names
Sometimes different for different populations/families.
Still "in progress"
Sometimes linked via dbSNP (haplotype-tagged), available in other variation sites
Haplotype analysis of ABCB1 revealed 2 major haplotypes, ABCB1*1 and ABCB1*13. ABCB1*13 contains T1236, T2677, T3435, and 3 intronic variants.
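A haplotype can be modelled as the set of alleles carried together on one chromosome. The sketch below checks only the three coding positions listed for ABCB1*13 (T at 1236, 2677, and 3435). The three intronic variants are not named on the slide, so they are left out, which means a match here does not fully establish ABCB1*13. The function name and the example chromosomes are hypothetical.

```python
# Coding alleles listed for ABCB1*13; intronic variants omitted (not named).
STAR13_CODING_ALLELES = {1236: "T", 2677: "T", 3435: "T"}


def carries_star13_coding_alleles(chromosome_alleles):
    """True if one chromosome carries T at all three coding positions."""
    return all(
        chromosome_alleles.get(position) == allele
        for position, allele in STAR13_CODING_ALLELES.items()
    )


# Hypothetical chromosomes, keyed by position.
chrom_a = {1236: "T", 2677: "T", 3435: "T"}
chrom_b = {1236: "T", 2677: "G", 3435: "T"}

print(carries_star13_coding_alleles(chrom_a))
print(carries_star13_coding_alleles(chrom_b))
```

The check is per chromosome, not per genotype: a person heterozygous at all three sites could carry the T alleles on different chromosomes, which is why haplotypes need phased data.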