Comp. Genomics Recitation 12 Bayesian networks Taken from Artificial Intelligence course, MIT, 6.034...

Post on 01-Jan-2016

Comp. Genomics

Recitation 12: Bayesian networks

Taken from Artificial Intelligence course, MIT, 6.034

http://courses.csail.mit.edu/6.034s/handouts/6034-review-sol.pdf

Question 1.1

• Draw a Bayesian network among the following binary variables that model the outcome of an election:

  • I: candidate is Incumbent

  • M: has lots of Money for advertising

  • A: uses advertisements that focus on Attacking the candidate’s opponent

  • Q: uses advertisements that focus on the candidate’s Qualifications

  • L: candidate is Liked

  • D: opponent is Distrusted

  • E: candidate is Elected

Question 1.1 – cont’d

• Your network should encode the following beliefs:

  • Incumbents tend to raise lots of money.

  • Money can be used to buy advertising that either focuses on the candidate’s qualifications or that attacks the candidate’s opponent. But if one does the first, there is less money to do the latter.

  • Attack advertisements tend to make voters distrust the opponent, but they also tend to make voters not like the candidate.

  • Advertisements focusing on qualifications tend to make the voters like the candidate.

  • Candidates that people like tend to get elected.

  • Candidates whose opponent people distrust tend to get elected.

Question 1.1 - solution

Question 1.2

• For each of the following, say whether it is or is not asserted by the network structure you drew (without assuming anything about the numerical entries in the CPTs).

1. P(L | A,Q,D) = P(L | A,Q)

2. P(A | M,Q) = P(A | M)

3. P(L,D | A,Q) = P(L | A,Q) P(D | A,Q)

Question 1.2 - solution

1. P(L | A,Q,D) = P(L | A,Q)

Asserted

2. P(A | M,Q) = P(A | M)

Not asserted

3. P(L,D | A,Q) = P(L | A,Q) P(D | A,Q)

Asserted
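The solution figure for Question 1.1 did not survive in this transcript. As a sketch only, the code below assumes one structure that is consistent with the listed beliefs and with the three answers above: I → M, M → A, M → Q, A → Q, A → L, Q → L, A → D, L → E, D → E. The A → Q edge is an assumption, chosen because it models "if one does the first, there is less money to do the latter" and because answer 2 is "not asserted". The helper names (`ancestors`, `d_separated`) are hypothetical. The check uses the standard moralized-ancestral-graph test for d-separation.

```python
from itertools import combinations

# Assumed structure (the original figure is missing): node -> list of parents.
parents = {
    "I": [],
    "M": ["I"],
    "A": ["M"],
    "Q": ["M", "A"],   # the A -> Q edge is an assumption, see the text above
    "L": ["A", "Q"],
    "D": ["A"],
    "E": ["L", "D"],
}

def ancestors(parents, nodes):
    """Return the given nodes together with all of their ancestors."""
    seen = set()
    stack = list(nodes)
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(parents.get(node, ()))
    return seen

def d_separated(parents, xs, ys, zs):
    """True if xs and ys are d-separated given zs in the DAG `parents`."""
    xs, ys, zs = set(xs), set(ys), set(zs)
    keep = ancestors(parents, xs | ys | zs)
    # Moralize the ancestral subgraph: link each child to its parents
    # and link every pair of parents that share a child.
    adj = {node: set() for node in keep}
    for child in keep:
        ps = list(parents.get(child, ()))
        for p in ps:
            adj[child].add(p)
            adj[p].add(child)
        for p, q in combinations(ps, 2):
            adj[p].add(q)
            adj[q].add(p)
    # Delete the conditioning nodes and test reachability from xs to ys.
    frontier = list(xs - zs)
    reached = set(frontier)
    while frontier:
        node = frontier.pop()
        for nb in adj[node]:
            if nb not in zs and nb not in reached:
                reached.add(nb)
                frontier.append(nb)
    return not (reached & ys)

# Statements 1 and 3 both reduce to "L is independent of D given {A, Q}".
print("L indep. of D given A,Q:", d_separated(parents, {"L"}, {"D"}, {"A", "Q"}))
# Statement 2 is "A is independent of Q given M".
print("A indep. of Q given M:  ", d_separated(parents, {"A"}, {"Q"}, {"M"}))
```

Under the assumed structure, the first query is d-separated (asserted) and the second is not (not asserted), which matches the answers above. Dropping the assumed A → Q edge would make the second query d-separated as well.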

Question 2

• Show a Bayesian network structure that encodes the following relationships:

  • A is independent of B

  • A is dependent on B given C

  • A is dependent on D

  • A is independent of D given C

Question 2 - solution

• Nodes A and B have no parents

• Node C has two parents: A and B

• Node D has one parent: C
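The four relationships can be checked numerically on this structure by enumerating the joint distribution. The sketch below uses made-up CPT values (all numbers are assumptions for illustration, not from the course material). The two independences hold for any choice of CPTs, whereas the two dependences hold for generic CPTs such as these but could vanish for specially chosen values.

```python
from itertools import product

# Hypothetical CPTs for the structure A -> C <- B, C -> D (binary variables).
p_a = 0.3                                   # P(A = 1)
p_b = 0.6                                   # P(B = 1)
p_c_given_ab = {(0, 0): 0.1, (0, 1): 0.5,   # P(C = 1 | A, B)
                (1, 0): 0.6, (1, 1): 0.9}
p_d_given_c = {0: 0.2, 1: 0.8}              # P(D = 1 | C)

def bern(p, value):
    """Probability that a Bernoulli(p) variable takes `value` (0 or 1)."""
    return p if value == 1 else 1.0 - p

# Full joint table over (A, B, C, D), built from the network factorization.
joint = {}
for a, b, c, d in product((0, 1), repeat=4):
    joint[(a, b, c, d)] = (bern(p_a, a) * bern(p_b, b)
                           * bern(p_c_given_ab[(a, b)], c)
                           * bern(p_d_given_c[c], d))

INDEX = {"A": 0, "B": 1, "C": 2, "D": 3}

def prob(event):
    """Marginal probability of a partial assignment, e.g. {"A": 1, "C": 0}."""
    return sum(p for assign, p in joint.items()
               if all(assign[INDEX[v]] == val for v, val in event.items()))

def independent(x, y, given=(), tol=1e-9):
    """Check P(x, y | given) == P(x | given) P(y | given) for all values."""
    for vals in product((0, 1), repeat=len(given)):
        z = dict(zip(given, vals))
        pz = prob(z)
        if pz == 0:
            continue
        for xv, yv in product((0, 1), repeat=2):
            pxy = prob({x: xv, y: yv, **z}) / pz
            px = prob({x: xv, **z}) / pz
            py = prob({y: yv, **z}) / pz
            if abs(pxy - px * py) > tol:
                return False
    return True

print("A indep. of B:        ", independent("A", "B"))
print("A indep. of B given C:", independent("A", "B", ("C",)))
print("A indep. of D:        ", independent("A", "D"))
print("A indep. of D given C:", independent("A", "D", ("C",)))
```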

Question 3

• Which of the following conditional independence assumptions are true?

1. A and E are independent

2. A and E are independent given D

3. B and C are independent

4. B and C are independent given A

5. B and C are independent given D

6. A and E are independent given B

7. A and E are independent given F

8. B and C are independent given E

Question 3 - solution

• A and E are independent: False

• A and E are independent given D: True

• B and C are independent: False

• B and C are independent given A: True

• B and C are independent given D: False

• A and E are independent given B: False

• A and E are independent given F: False

• B and C are independent given E: False

Question 4

• For each statement, name all of the graph structures, G1-G4, or “none” that imply it.

Question 4 – cont’d

1. A is conditionally independent of B given C

2. A is conditionally independent of B given D

3. B is conditionally independent of D given A

4. B is conditionally independent of D given C

5. B is independent of C

6. B is conditionally independent of C given A

Question 4 - solution

• A is conditionally independent of B given C

• G2

• A is conditionally independent of B given D

• none

• B is conditionally independent of D given A

• G3,G4

• B is conditionally independent of D given C

• none

• B is independent of C

• G2,G3

• B is conditionally independent of C given A

• G1,G2,G4

HW solution – ass. 2, q. 5

• Let G = (G1, … , Gn) be n contiguous DNA regions representing genes. For each Gi we define the mRNA concentration of the gene as Pi, s.t. their sum is equal to 1. P = (P1, … , Pn) can be interpreted as the normalized expression levels for the regions in G.

HW solution – q. 5 – cont’d

• Our model assumes that reads are generated by randomly picking a region R from G according to the distribution P, and then copying this region. The copying process is error-prone. This process is repeated until we have a set of m reads R = r1, … , rm generated according to the model described above.

HW solution – q. 5 – cont’d

• For each region Gi and read rj, we have a probability pij = P(rj | Gi), the probability of observing rj given that the locus of the read was gene Gi. In practice, for each read rj, this probability will be close to zero for all but a few regions.

Likelihood function

• Write the likelihood of observing the m reads.
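The slide's own answer is not in this transcript. Under the model as stated (each read is generated independently by picking a region according to P and then copying it with errors), the likelihood is the product over reads of a mixture over regions, L(P) = Π_j Σ_i P_i · p_ij. A minimal sketch follows, working in log space to avoid underflow; the function name and the toy numbers are hypothetical.

```python
import math

def log_likelihood(P, p):
    """Log-likelihood of the reads under expression levels P.

    P[i]    : expression level of region G_i (the entries sum to 1)
    p[i][j] : P(r_j | G_i), probability of read r_j given origin G_i
    """
    n = len(P)
    m = len(p[0])
    total = 0.0
    for j in range(m):
        # Probability of read r_j, marginalized over its unknown origin.
        read_prob = sum(P[i] * p[i][j] for i in range(n))
        total += math.log(read_prob)
    return total

# Toy example with two regions and two reads (numbers are made up).
P = [0.5, 0.5]
p = [[0.9, 0.1],
     [0.1, 0.9]]
print(log_likelihood(P, p))
```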

Q function

• Write the Q(P | P(t)) term.
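The slide's answer is missing here as well. As a sketch of the standard EM derivation for this mixture model, treat the region that produced each read as the hidden variable. The symbol w_ij^(t) is introduced here for illustration (it is not from the original slides) and denotes the posterior probability, under the current estimate P^(t), that read r_j came from region G_i.

```latex
\begin{align*}
w_{ij}^{(t)} &= \Pr\left(\text{origin of } r_j \text{ is } G_i \mid r_j, P^{(t)}\right)
  = \frac{P_i^{(t)}\, p_{ij}}{\sum_{k=1}^{n} P_k^{(t)}\, p_{kj}}, \\[4pt]
Q\left(P \mid P^{(t)}\right) &= \sum_{j=1}^{m} \sum_{i=1}^{n}
  w_{ij}^{(t)} \left( \log P_i + \log p_{ij} \right).
\end{align*}
```

The term with log p_ij does not depend on P, so only the weighted sum of log P_i matters when maximizing over P.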

M-step

• Write the M-step term using argmax function.

Update rule

• Infer from part (c) the update step for P.
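The slide's derivation is not in the transcript. In the standard treatment, maximizing the expected complete-data log-likelihood over P subject to Σ_i P_i = 1 (for example with a Lagrange multiplier) gives the new P_i as the average, over the m reads, of the posterior probability that each read came from G_i. The sketch below implements that update under those assumptions; the function names, the uniform starting point, and the toy matrix are hypothetical.

```python
def em_step(P, p):
    """One EM iteration for the expression levels.

    P[i]    : current estimate of the expression level of region G_i
    p[i][j] : P(r_j | G_i), probability of read r_j given origin G_i
    Returns the updated list of expression levels.
    """
    n = len(P)
    m = len(p[0])
    expected_counts = [0.0] * n
    for j in range(m):
        # E-step: posterior probability that read r_j came from each region.
        denom = sum(P[i] * p[i][j] for i in range(n))
        if denom == 0.0:
            raise ValueError(f"read {j} has zero probability under P")
        for i in range(n):
            expected_counts[i] += P[i] * p[i][j] / denom
    # M-step: normalize the expected read counts by the number of reads.
    return [count / m for count in expected_counts]

def run_em(p, iterations=100):
    """Run EM from a uniform start (an assumed initialization)."""
    n = len(p)
    P = [1.0 / n] * n
    for _ in range(iterations):
        P = em_step(P, p)
    return P

# Toy example: reads 1 and 2 can only come from G_1, read 3 only from G_2.
p = [[1.0, 1.0, 0.0],
     [0.0, 0.0, 1.0]]
print(run_em(p))
```

Each iteration keeps the entries of P non-negative and summing to 1, because the posteriors for every read sum to 1 across regions. A fixed iteration count is used here for brevity; a convergence test on the change in P or in the log-likelihood would be a natural alternative.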