Post on 17-Jan-2018
1
Discovery of Structural and Functional Features in RNA Pseudoknots
Qingfeng Chen and Yi-Ping Phoebe Chen, Senior Member, IEEE
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 21, NO. 7, JULY 2009
Adviser: Yu-Chiang Li
Speaker: Shao-Hsiang Hung
Date: 2009/12/10
2
Outline
  Introduction
  Material and Methods
  Results
  Conclusion and Discussion
3
I. Introduction
4
I. Introduction (1/6)
Accurately predicting the functions of biological macromolecules is one of the biggest challenges in functional genomics.
RNA molecules play a central role in a number of biological functions within cells, from the transfer of genetic information from DNA to protein, to enzymatic catalysis.
5
I. Introduction (2/6)
To fulfill this range of functions, a simple linear nucleotide string of RNA, built from uracil, guanine, cytosine, and adenine, forms a variety of complex three-dimensional structures.
Pseudoknot
  an RNA structure
  base pairing between a loop formed by an orthodox secondary structure and a single-stranded region outside that loop
6
I. Introduction (3/6)
7
I. Introduction (4/6)
PseudoBase is the only online database containing:
  Structural, functional, and sequence data of RNA pseudoknots
Unfortunately, the analysis of this valuable data set is underdeveloped
  Difficulty in modeling
  Complexity in computing structural
8
I. Introduction (5/6)
Association rule mining has been successfully used to discover valuable information in a larger data set.
Limitations with multivalued variables
  Categorical multivalued variables (such as color {red, blue, green})
  Quantitative multivalued variables (such as weight {[40, 50], [50, 75]})
The relationships are captured by using
  Conditional probability matrix $M_{Y|X}$
9
I. Introduction (6/6)
We develop a framework to identify potential top-k covering rule groups in RNA pseudoknots
Relationships
  Structure-function
  Structure-category
  Significant ratios of stems and loops.
Allows users to regulate k and the minsupp threshold and compare between rules in the same group.
Handling high dimensional data
Enhances the understanding of structure-function relationships
10
II. Material and Methods
11
II. Material and Methods (1/20)
Pseudoknot Data.
S1, S2, L1, L2, and L3
  stem 1, stem 2, loop 1, loop 2, and loop 3
A, G, C, and U
  adenine, guanine, cytosine, and uracil
vr, vt, vf, v3, v5, vo, rr, mr, tm, ri, ap, ot, and ar
  viral ribosomal readthrough signals, viral tRNA-like structures, viral ribosomal frameshifting signals, other viral 3'-UTR, other viral 5'-UTR, viral others, rRNA, mRNA, tmRNA, ribozymes, aptamers, others, and artificial molecules
ss, tc, and fs
  self-splicing, translation control, and viral frameshifting
12
II. Material and Methods (2/20)
Let
  X and Y be multivalued attribute variables
  x and y be items
  p(X)
  p(Y|X)
  minsupp be the minimum support in the context
13
II. Material and Methods (3/20)
The data here is collected from PseudoBase
  Organism
  RNA type
  Bracket view of structure
    Classified by two stems and three loops
  Nucleotide sequence
  Size
14
II. Material and Methods (4/20)
A data set consisting of 225 H-pseudoknots is obtained
15
II. Material and Methods (5/20)
16
II. Material and Methods (6/20)
Partition of Attributes
  {class, function, stem, loop, base, ratio, length}
  the last one is a quantitative attribute.
Propose a novel partition in conjunction with the properties of pseudoknot data and top-k rule groups.
17
II. Material and Methods (7/20)
18
II. Material and Methods (8/20)
The domain of a quantitative attribute has to be partitioned into intervals
  1) The number of intervals
  2) The size of each interval
For example
  (14, 15] is included in stem 1, stem 2, and loop 1 but not in loop 3
19
II. Material and Methods (9/20)
Definition 1.
  A quantitative attribute $y$ is divided into a set of intervals $\{y_1, \ldots, y_n\}$ using the categorical item $x_i$ such that, for any base interval $y_j$, $y_j$ consists of a single value for $1 \le j \le n$.
  The partition using $x_i$ is defined as $\{(y_{1i}, \max(y_{2i})]; \ldots; (\max(y_{(m-1)i}), \max(y_{mi})]\}$.
  Table 2 presents the distribution of sizes of stem 1 and stem 2 of pseudoknots in PseudoBase.
20
II. Material and Methods (10/20)
Definition 1.
For example
  $Y_1$ = {0, (0, 1], (1, 2], (2, 3], (3, 4], (4, 5], (5, 6], (6, 7], (7, 8], (8, 9], (9, 10], (10, 11], (11, 12], (12, 13], (13, 14], (14, 15], (15, 16], (16, 17], (17, 18], (18, 19], (19, 20], (20, 21], (21, 22]}
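The base partition in this example can be sketched in a few lines of Python. This is only an illustration of the listing above, not the authors' implementation; the helper names `base_intervals` and `interval_of` are hypothetical, and integer sizes are assumed.

```python
def base_intervals(max_size):
    """Labels of the base partition {0, (0, 1], ..., (max_size - 1, max_size]}."""
    labels = ["0"]
    for upper in range(1, max_size + 1):
        labels.append(f"({upper - 1}, {upper}]")
    return labels


def interval_of(size):
    """Map an integer size to the unit-width base interval that contains it."""
    return "0" if size == 0 else f"({size - 1}, {size}]"


# Stem 1 in the example ranges up to 22 nucleotides.
y1 = base_intervals(22)
print(len(y1))         # 23 labels: "0" plus 22 unit intervals
print(y1[:3], y1[-1])  # ['0', '(0, 1]', '(1, 2]'] (21, 22]
print(interval_of(4))  # (3, 4]
```

Each base interval holds exactly one integer value, which is the single-value condition stated in Definition 1.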
21
II. Material and Methods (11/20)
Definition 2.
  Suppose $Y_i = \{y_{1i}, \ldots, y_{mi}\}$ and $Y_{i+1} = \{y_{1(i+1)}, \ldots, y_{n(i+1)}\}$ are two adjacent partitions. Let $Y = \emptyset$.
  The integration of them is defined as
22
II. Material and Methods (12/20)
Definition 2.
For example
  stem 1 as $Y_1$ = {0, (0, 1], (1, 2], (2, 3], (3, 4], (4, 5], (5, 6], (6, 7], (7, 8], (8, 9], (9, 10], (10, 11], (11, 12], (12, 13], (13, 14], (14, 15], (15, 16], (16, 17], (17, 18], (18, 19], (19, 20], (20, 21], (21, 22]}.
  stem 2 as $Y_2$ = {0, (0, 1], (1, 2], (2, 3], (3, 4], (4, 5], (5, 6], (6, 7], (7, 8], (8, 9], (9, 10], (10, 11], (11, 12], (12, 13], (13, 14], (14, 15], . . . , (31, 32], (32, 33]}
23
II. Material and Methods (13/20)
the integrated partition of $Y_1$ and $Y_2$
  {0, (0, 1], (1, 2], (2, 3], (3, 4], (4, 5], (5, 6], (6, 7], (7, 8], (8, 9], (9, 10], (10, 11], (11, 12], (12, 13], (13, 14], (14, 15], (15, 16], (16, 17], (17, 18], (18, 19], (19, 22], (22, 33]}.
24
II. Material and Methods (14/20)
In comparison, the values of ratio attributes are positive real numbers rather than integers.
  $|y_i| = 1$ in Definition 3.1 needs to be changed to $|y_i| = 1$ or $|y_i| = 0.5$.
  $|x| = 1$ and $|x_c| = 1$ in Definition 3.2 are changed to $|x| = 1$ and $|x_c| = 1$, or $|x| = 0.5$ and $|x_c| = 0.5$.
  Avoid missing interesting knowledge.
25
II. Material and Methods (15/20)
Generation of rule groups.
  Work out the conditional probabilities for X and Y in the probability matrix below.
  The conditional probability of $Y = y_i$, given $X = x_i$, is $p(y_i \mid x_i) = p(x_i \mid y_i)\, p(y_i) / p(x_i)$
26
II. Material and Methods (16/20)
For example:
  Let x be stem 1 and y the size interval (3, 4] of stem 1.
  By Table 2, n = 225 and p(x = stem1) = 225/225 = 1.
  In addition, by Table 2, the number of stem 1 instances in (3, 4], that is, with four nucleotides, is 42.
  And
  $$p(y = (3,4] \wedge x = \mathrm{stem1}) = 42/225 \approx 0.19$$
  So
  $$p(y = (3,4] \mid x = \mathrm{stem1}) = \frac{p(y = (3,4] \wedge x = \mathrm{stem1})}{p(x = \mathrm{stem1})} = 0.19$$
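The arithmetic of this example can be checked with a short Python sketch. It uses the 225 H-pseudoknots of the data set and the 42 stem 1 instances with four nucleotides quoted from Table 2; the variable names are illustrative only.

```python
n_pseudoknots = 225   # size of the H-pseudoknot data set
n_stem1 = 225         # every H-pseudoknot has a stem 1, so p(x = stem1) = 1
n_stem1_size4 = 42    # stem 1 instances falling in (3, 4], quoted from Table 2

p_x = n_stem1 / n_pseudoknots        # p(x = stem1)
p_xy = n_stem1_size4 / n_pseudoknots  # p(y = (3, 4] and x = stem1)
p_y_given_x = p_xy / p_x              # p(y = (3, 4] | x = stem1)

print(round(p_y_given_x, 2))  # 0.19
```

Because p(x = stem1) = 1, the conditional probability equals the joint probability here; that would not hold for an antecedent that occurs in only part of the data set.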
27
II. Material and Methods (17/20)
Compute the entire conditional probabilities of stem 1, namely $[p(y_1 \mid \mathrm{stem1})\; p(y_2 \mid \mathrm{stem1}) \ldots p(y_n \mid \mathrm{stem1})]$
Those of stem 2, loop 1, and loop 3 can be computed in the same way
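One row of the conditional probability matrix could be computed along the following lines. This is a minimal sketch, assuming each observation is an integer size and that the antecedent (for example stem 1) is present in every record, so p(x) = 1. The function name `conditional_row` and the sample sizes are hypothetical and are not taken from Table 2.

```python
from collections import Counter


def conditional_row(sizes, max_size):
    """Return {interval label: p(interval | antecedent)} for one antecedent.

    Assumes the antecedent occurs in every record, so the conditional
    probability reduces to the relative frequency of each size.
    """
    counts = Counter(sizes)
    total = len(sizes)
    row = {}
    for size in range(max_size + 1):
        label = "0" if size == 0 else f"({size - 1}, {size}]"
        row[label] = counts[size] / total
    return row


# Hypothetical stem 1 sizes for eight pseudoknots (illustration only).
stem1_sizes = [3, 3, 3, 4, 4, 5, 6, 3]
row = conditional_row(stem1_sizes, max_size=6)
print(row["(2, 3]"])      # 0.5
print(sum(row.values()))  # 1.0
```

Repeating the call with the sizes of stem 2, loop 1, and loop 3 would give the remaining rows.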
28
II. Material and Methods (18/20)
Suppose $M_{Y|X}$ corresponding to an association AS consists of a set of rows $\{r_1, \ldots, r_n\}$.
  $A = \{A_1, \ldots, A_m\}$ be the complete set of antecedent items of AS
  $C = \{C_1, \ldots, C_k\}$ be the complete set of consequent items of AS
Namely
$$PS(x) = \{(x, y_j) \mid y_j \in C,\; p(y_j \mid x) > 0\}$$
29
II. Material and Methods (19/20)
Definition 3 (Rule group)
  Let
  $$G_x = \{x \to C_j \mid (x, C_j) \in PS(x)\}$$
  be a rule group with an antecedent item x and consequent support set C.
Definition 4
  Let
  $$R_i : X_i \to Y_i \quad \text{and} \quad R_j : X_j \to Y_j$$
  $R_i$ is ranked higher than $R_j$ if $p(Y_i \mid X_i) > p(Y_j \mid X_j)$
  $$1 \le k \le k_{\max}$$
30
II. Material and Methods (20/20)
For example
  In Table 2, $k_{\max}$ = 21
  top-1 covering rule group = {stem1→(2,3], stem2→(5,6]}.
  top-2 covering rule group = {stem1→(2,3], stem1→(3,4], stem2→(5,6], stem2→(4,5]}.
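The selection in this example can be sketched as follows, under a simplified reading of the definitions: for each antecedent, keep the k consequents with the highest conditional probability, skipping zero entries and entries below minsupp. The function name `top_k_rule_groups` is hypothetical. Apart from the 0.19 for stem 1 and (3, 4], every probability below is invented and chosen only so that the ranking matches the slide; none of them are values from Table 2.

```python
def top_k_rule_groups(matrix, k, minsupp=0.0):
    """Return (antecedent, consequent) rules, the k best per antecedent.

    matrix maps each antecedent to {consequent: p(consequent | antecedent)}.
    Rules are ranked by conditional probability, highest first.
    """
    rules = []
    for antecedent, row in matrix.items():
        candidates = [(y, p) for y, p in row.items() if p > 0 and p >= minsupp]
        candidates.sort(key=lambda item: item[1], reverse=True)
        rules.extend((antecedent, y) for y, _ in candidates[:k])
    return rules


# Illustrative matrix: only 0.19 comes from the worked example.
matrix = {
    "stem1": {"(2, 3]": 0.30, "(3, 4]": 0.19, "(4, 5]": 0.10},
    "stem2": {"(5, 6]": 0.35, "(4, 5]": 0.20, "(6, 7]": 0.15},
}

print(top_k_rule_groups(matrix, k=1))
# [('stem1', '(2, 3]'), ('stem2', '(5, 6]')]
print(top_k_rule_groups(matrix, k=2))
# [('stem1', '(2, 3]'), ('stem1', '(3, 4]'), ('stem2', '(5, 6]'), ('stem2', '(4, 5]')]
```

Raising k or lowering minsupp admits more rules per antecedent, which corresponds to the user-adjustable parameters mentioned in the introduction.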
31
III. Results
32
III. Results (1/4)
33
III. Results (2/4)
34
III. Results (3/4)
35
III. Results (4/4)
36
IV. Conclusion and Discussion
37
IV. Conclusion and Discussion (1/2)
If more rules are considered together, a further understanding of the structure and function of pseudoknots can be achieved.
This paper analyzes increasingly available RNA pseudoknot data and identifies interesting patterns from PseudoBase.
38
IV. Conclusion and Discussion (2/2)
The obtained rule groups reveal the structural properties of pseudoknots and imply potential structure-function and structure-class relationships in RNA molecules.
Moreover, the interpretation of rules demonstrates their significance in the sense of biology.