
Feedback About 2017’s Coursework in MATH20902: Discrete Mathematics

I made the notes below while marking the coursework. I also made remarks on individual papers and would be happy to discuss these. The marked work is available at the Reception desk near the entrance to the Alan Turing Building.

[Figure 1: five histograms of Count against Marks. Panels: CW1 (153 students), CW2 (133 students), CW3 (152 students) and CW4 (149 students), each with marks from 0 to 5, and Total (Max. 20) with marks from 0 to 20.]

Figure 1: Histograms for the individual questions and for the total score (right). There were 154 pieces of coursework and the average mark was ≈ 13.9 out of 20, or just under 70%.

Overall Remarks

• Generally speaking, people did well: I saw a lot of really good arguments and examples while marking this work.

• On the other hand, the term “degree” was often used imprecisely. In directed graphs it doesn’t usually make sense to talk about the “degree” of a vertex: one should speak of in-degree and out-degree. This imprecision led to claims (in the 4th problem) such as “If all vertices have even degree then the graph must contain a cycle”, which seemed to mean “If $\deg_{\text{in}}(v) + \deg_{\text{out}}(v)$ is even for all $v \in V$ then the directed graph contains a cycle.” This is certainly false, as the example in Figure 2 shows.

• I marked these papers partly with an eye to good mathematical style (see the question-by-question notes below for details). To get full marks you needed to make an argument that was technically correct, clear and concise. The last bit is especially important: long answers are not necessarily better ones. Some of the top-scoring papers used only three or four sides of A4. On the other hand, many weak students handed in pages and pages of work, much of which I neither needed nor wanted to see.

Figure 2: A digraph G(V,E) in which $\deg_{\text{in}}(v) + \deg_{\text{out}}(v) = 2$ for all $v \in V$, but that does not contain any cycles.
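A property like the one in the caption above is easy to check mechanically. The sketch below uses a hypothetical four-vertex digraph (chosen for illustration and not necessarily the one drawn in Figure 2) in which every vertex has in-degree plus out-degree equal to 2, and tests for directed cycles by repeatedly deleting vertices of in-degree 0. The function names `degree_sums` and `has_cycle` are made up for this example.

```python
from collections import defaultdict


def degree_sums(edges):
    """Map each vertex to in-degree + out-degree."""
    total = defaultdict(int)
    for tail, head in edges:
        total[tail] += 1  # contributes to the out-degree of tail
        total[head] += 1  # contributes to the in-degree of head
    return dict(total)


def has_cycle(edges):
    """Return True if the digraph contains a directed cycle.

    Repeatedly delete vertices of in-degree 0; a directed cycle
    exists exactly when some vertex can never be deleted.
    """
    vertices = {v for edge in edges for v in edge}
    indeg = {v: 0 for v in vertices}
    succ = {v: [] for v in vertices}
    for tail, head in edges:
        indeg[head] += 1
        succ[tail].append(head)
    ready = [v for v in vertices if indeg[v] == 0]
    deleted = 0
    while ready:
        v = ready.pop()
        deleted += 1
        for w in succ[v]:
            indeg[w] -= 1
            if indeg[w] == 0:
                ready.append(w)
    return deleted < len(vertices)


# Hypothetical example: a and c are sources, b and d are sinks.
example = [("a", "b"), ("c", "b"), ("c", "d"), ("a", "d")]
print(degree_sums(example))
print(has_cycle(example))
```

Every vertex here has an even "total degree", yet no directed walk can leave b or d, so no cycle exists.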

Figure 3: All the digraphs that have a vertex with $\deg_{\text{out}}(v) = 2$, $\deg_{\text{in}}(v) = 1$ and that could be generated by the methods of the coursework. The leftmost and middle ones have only a single Eulerian trail and, though the one at right has two trails, they do not correspond to distinct sequences unless one of the edges is labelled.

Remarks about individual problems

(1) (Constructing graphs: 5 marks).

I marked this problem generously. The main things for which students lost marks are:

• missing one of the two sequences in the second example or failing to explain why any valid sequence for these fragments must end (and so also begin) in AG;

• incorrect graphs, especially for the fourth example, where many students left off the self-loop connecting U to itself. Such loops can only arise from an anomalous fragment;

• failure to take the anomalous fragments into account, producing far too many sequences in the second and fourth examples.

A few students dealt with the cases of multiple tours (part b) or multiple trails (part d) by checking whether their proposed sequences could have generated the fragments given in the question: I awarded full marks for this too, provided it gave the correct list of sequences.

(2) (Shortest ambiguous chain: 5 marks).

This was the hardest problem and brought out some of the best work. Most people who attempted this question began with a discussion of properties that a minimal-length ambiguous chain must have, then attempted exhaustive enumeration of all possibilities until they discovered a minimal-length example. In particular, several students presented lovely arguments showing that the shortest possible ambiguous chain must contain at least two G’s and at least two bases drawn (perhaps with repetition) from the set {U, C}. Several others found a minimal-length example by editing down one of the examples from Table 1 in the coursework.

Figure 4: Graphs and subgraphs relevant to the discussion of Problem (2). (a) Subgraphs that can arise only from anomalous fragments: no graph derived from a single RNA chain can contain more than one of these. (b) Graphs producing minimal-length ambiguous RNA chains: there are several other examples similar to the one at left.

There was a fair amount of exhaustive (and, for the reader, often exhausting) enumeration: a handful of people wrote computer programs to test all $4^k$ chains of length $k$ against each other (Python seemed especially well-suited to the task, requiring only around a single, readable page of code). Another group of students seem to have typed all possible chains of lengths 3 and 4 into a spreadsheet and then digested them by hand. All of this struck me as much more work than necessary, as I had hoped you would find one or both of the examples in Figure 4b by trial-and-error, then use the following observations

• if one chain has the same digests as another, then both lead to the same digraph;

• such a digraph must contain two or more Eulerian tours or trails;

• a digraph with two tours or trails must have a vertex $v$ with $\deg_{\text{out}}(v) \ge 2$;

• in a digraph that has Eulerian tours or trails, $|\deg_{\text{in}}(v) - \deg_{\text{out}}(v)| \le 1$;

• a digraph produced by the methods of the coursework that has |E| = m edges generates RNA chains that are at least m + 1 bases long

to conclude that the only way to find a chain shorter than those from Figure 4b would be to construct a digraph that

• has exactly three, unlabelled edges;

• has a vertex $v$ with $\deg_{\text{out}}(v) = 2$, $\deg_{\text{in}}(v) = 1$;

• contains two Eulerian tours or trails.

In light of the restrictions on edges illustrated in Figure 4a (see also the discussion below), there are only three candidates for such a graph. They are shown in Figure 3, and none of them can be labelled in such a way that it can generate two distinct sequences.
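The distinction between "two Eulerian trails" and "two distinct sequences" can be seen by brute force on very small digraphs. The sketch below enumerates Eulerian trails by backtracking over unused edges. The two three-edge digraphs are hypothetical examples chosen to have a vertex v with out-degree 2 and in-degree 1; they are not claimed to be the graphs drawn in Figure 3, and the function names are invented for this illustration.

```python
def eulerian_trails(edges):
    """Return every Eulerian trail of the digraph as a tuple of edge indices.

    A closed trail (tour) is listed once for each of its rotations.
    """
    found = []
    used = [False] * len(edges)

    def extend(trail, at):
        if len(trail) == len(edges):
            found.append(tuple(trail))
            return
        for i, (tail, head) in enumerate(edges):
            if not used[i] and tail == at:
                used[i] = True
                extend(trail + [i], head)
                used[i] = False

    for start in sorted({tail for tail, _ in edges}):
        extend([], start)
    return found


def vertex_sequence(edges, trail):
    """Vertices visited by a non-empty trail, in order."""
    return (edges[trail[0]][0],) + tuple(edges[i][1] for i in trail)


# Hypothetical graph with two parallel edges v -> w and one edge w -> v.
parallel = [("v", "w"), ("v", "w"), ("w", "v")]
trails = eulerian_trails(parallel)
print(len(trails))
print({vertex_sequence(parallel, t) for t in trails})

# Hypothetical graph with a 2-cycle through v and a pendant edge v -> x.
pendant = [("v", "w"), ("w", "v"), ("v", "x")]
print(eulerian_trails(pendant))
```

In the first graph the two trails differ only in which of the parallel edges is used first, so they visit the same vertices in the same order; without a label on one of the edges they cannot spell different sequences.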

Things that I wanted to see included an example of a pair of distinct chains that had the proposed minimal length and gave the same digest. For the two examples in Figure 4b the relevant pairs are CUGCG & CGCUG for the graph at left and GCGUG & GUGCG for the one at right.

Common mistakes included:

• Failure to realise that an example provides only an upper bound on the length of the shortest ambiguous chain.

• Claims that the shortest possible ambiguous chain had to have an Eulerian trail (as opposed to a tour): see the right part of Figure 4b for a counterexample.

• Examples of graphs, presented while enumerating possibilities, that can’t actually arise from RNA digests. Figure 4a shows several subgraphs that can only arise from anomalous fragments. This means, for example, that no graph arising from RNA digests can have more than one self-loop or more than one edge running from U to U, U to C, C to C or G to G.

• Arguments claiming something like “U and C play the same role, so we need only consider chains that include one of them.” This is clearly false, as it rules out the example at the right of Figure 4b.

Finally, lots of students claimed that a minimal-length ambiguous chain couldn’t include any A’s, as A’s could always be removed to produce a shorter ambiguous example, but I found this somewhat unconvincing in that it’s not immediately obvious (at least to me) that an A can’t play a useful role in an edge label. All the same, this claim turns out to be true: after I’d constructed the examples above I wrote a small program to test all chains of length 5 or less and, although I found several other ambiguous pairs (all of length 5) that correspond to differently-labelled versions of the graph at left in Figure 4b, none of them involved A’s.
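An exhaustive check of this kind can be sketched in a few lines. The code below is not the program mentioned above; it is a minimal illustration that assumes the usual digest rules (a G-enzyme that cuts after every G and a U,C-enzyme that cuts after every U or C) and treats the digest of a chain as the two unordered collections of fragments. Those rules, and the names `digest` and `ambiguous_groups`, are assumptions made for this example and should be checked against the coursework sheet.

```python
from collections import defaultdict
from itertools import product


def digest(chain, cut_after):
    """Cut the chain after every base in cut_after; return the sorted fragments."""
    fragments, current = [], ""
    for base in chain:
        current += base
        if base in cut_after:
            fragments.append(current)
            current = ""
    if current:  # a final fragment that does not end at a cut site
        fragments.append(current)
    return tuple(sorted(fragments))


def ambiguous_groups(k):
    """Group all 4**k chains of length k by their pair of digests.

    Returns only the groups containing two or more chains.
    """
    groups = defaultdict(list)
    for bases in product("ACGU", repeat=k):
        chain = "".join(bases)
        # Assumed rules: G-enzyme cuts after G; U,C-enzyme cuts after U or C.
        key = (digest(chain, "G"), digest(chain, "UC"))
        groups[key].append(chain)
    return [group for group in groups.values() if len(group) > 1]


def shortest_ambiguous_length(max_k):
    """Smallest k <= max_k with an ambiguous group, or None if there is none."""
    for k in range(1, max_k + 1):
        if ambiguous_groups(k):
            return k
    return None


if __name__ == "__main__":
    print(shortest_ambiguous_length(5))
    for group in ambiguous_groups(5):
        print(group)
```

Under these assumed rules the pairs CUGCG & CGCUG and GCGUG & GUGCG each fall into a single group, which is consistent with the examples quoted for Figure 4b. Grouping chains by digest avoids comparing every chain against every other.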

Figure 5: Examples mentioned in the feedback for Problem (4). (a) Removing the cycle formed by the dotted edges breaks the graph into multiple strongly connected components. (b) Although vertices a and b are strongly connected, there is no cycle that contains both of them.

(3) (Counting possibilities: 5 marks).

Most people did well on this question, though some had trouble with part (d), where the only correct answer was the multinomial expression

$$\frac{n!}{m_1!\, m_2! \cdots m_k!}$$

which applies in the case where there are $n$ ordinary (that is, non-anomalous) fragments of $k$ distinct types with $m_j$ fragments of type $j$.
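For concreteness, the expression can be evaluated as follows. This is a small sketch; the function name `multinomial` and the sample counts are illustrative and do not come from the coursework.

```python
from math import factorial


def multinomial(counts):
    """Return n! / (m_1! m_2! ... m_k!) where n is the sum of the counts m_j."""
    result = factorial(sum(counts))
    for m in counts:
        # Each division is exact because the running quotient is itself
        # a product of binomial coefficients.
        result //= factorial(m)
    return result


# Illustrative case: n = 4 ordinary fragments of k = 3 types,
# with two fragments of the first type and one of each of the others.
print(multinomial([2, 1, 1]))  # 4! / (2! 1! 1!) = 12
```

When all fragments are distinct every m_j equals 1 and the expression reduces to n!.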

Also, a few people wrote extremely cryptic explanations of what the various factors in their expression meant. An example is

The number of distinct RNA sequences is the division of the factorial of the total number of fragments divide the factorial of the number of the same fragments with same ends.

One can see that the person who wrote this might have had the right idea, but that’s only if one already knows what the right answer is. That’s not good enough for mathematical writing: a good mathematical description should be clear even to people who don’t already know what you’re writing about.

(4) (Graph-theoretic foundations: 5 marks).

Two of the arguments required here are essentially the same as those used for the related theorem for undirected graphs, which we proved in lecture. But the argument used in lecture to show (ii) ⟹ (iii) doesn’t apply here, as it relies on a lemma saying that a graph G(V,E) with |V| = n and |E| ≥ n must contain a cycle, but this lemma doesn’t apply to directed graphs. My notes include the following:

(i) ⟹ (ii) Most people followed the proof of Theorem 11.4 fairly closely (even slavishly in some cases), but I accepted any correct argument that worked in digraphs.

(ii) ⟹ (iii) As I mentioned above, this was the step that differed most substantially from the corresponding argument in lecture. Successful answers were typically proofs by induction on m = |E| that worked by establishing two main things:

(a) A strongly connected graph in which $\deg_{\text{in}}(v) = \deg_{\text{out}}(v)$ for all vertices $v \in V$ contains a cycle. This is a new result, different from the lemma we had in lecture, though many students tried to invent a notion of “even degree” for digraphs: such attempts always failed because of the sort of example illustrated in Figure 2.

(b) Although the removal of such a cycle may break the graph into multiple strongly connected components (see Figure 5a), the resulting components all still have the property that $\deg_{\text{in}}(v) = \deg_{\text{out}}(v)$ for all vertices.

A few people used the strong-connectedness of the graph to establish the existence of the cycle, but then failed to explain why removing such a cycle produced another strongly connected graph (or union of strongly connected components) that still had property (ii). Also, many students tried to argue that it was always possible to take a strongly connected graph satisfying (ii) and then choose a vertex and find a cycle containing that vertex. This is true (it is always possible), but some of the arguments offered weren’t correct. These flawed proofs typically tried to establish a more general claim similar to the following

In a strongly connected graph G(V,E) with |V| > 1 we can choose an arbitrary vertex a, and then find a second, distinct vertex b that is strongly connected to a. Strong connectedness means that there exist walks from a to b and from b to a, which implies that there must be a cycle that contains both a and b.

This is false, as the example in Figure 5b shows.
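The failure of the quoted claim can be checked on a small example. The sketch below uses a hypothetical digraph consisting of two 2-cycles that share a vertex c (an illustration only, not necessarily the graph drawn in Figure 5b) and confirms that a and b are mutually reachable although no cycle contains both. It assumes a digraph without parallel edges, and the helper names are invented for this example.

```python
def successors(edges):
    """Adjacency lists for a digraph given as (tail, head) pairs."""
    succ = {}
    for tail, head in edges:
        succ.setdefault(tail, []).append(head)
        succ.setdefault(head, [])
    return succ


def reachable(edges, source):
    """Set of vertices reachable from source by a directed walk."""
    succ = successors(edges)
    seen, stack = {source}, [source]
    while stack:
        for w in succ[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen


def simple_cycles(edges):
    """List each cycle (no repeated vertex) once, starting at its smallest vertex."""
    succ = successors(edges)
    cycles = []

    def extend(path):
        for nxt in succ[path[-1]]:
            if nxt == path[0]:
                cycles.append(tuple(path))
            elif nxt > path[0] and nxt not in path:
                extend(path + [nxt])

    for start in sorted(succ):
        extend([start])
    return cycles


# Hypothetical example: 2-cycles a <-> c and b <-> c sharing the vertex c.
figure_eight = [("a", "c"), ("c", "a"), ("b", "c"), ("c", "b")]
print("b" in reachable(figure_eight, "a"), "a" in reachable(figure_eight, "b"))
print(simple_cycles(figure_eight))
```

Any closed walk through both a and b in this graph has to pass through c twice, so it is a closed trail but not a cycle.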

One final common mistake was to claim, in the inductive step of (ii) ⟹ (iii), that if we assume the result is true for $|E| \le m_0$, then the next case we need to consider is $|E| = m_0 + 2$, as there is no way to pass from a graph with $|E| = m_0$ to one with $|E| = m_0 + 1$ without spoiling the property that

$$\deg_{\text{in}}(v) = \deg_{\text{out}}(v) \quad \forall v \in V.$$

First off, this isn’t even true: in the context of the coursework assignment, it’s natural to allow self-loops and thus one can add a single loop and still preserve the property above. Further, this introduces the possibility of cycles of length 1 (I read claims that a cycle must involve at least two, or sometimes even three, edges).

But the problem mentioned in the previous paragraph (assuming the inductive step involves $|E| = m_0 + 2$) appears to be an example of a common conceptual mistake about how proofs by induction in graphs work: one doesn’t “build up” from graphs that satisfy the inductive hypothesis, but rather starts with a graph that (a) satisfies the hypotheses of the desired theorem and (b) is one step bigger than those covered by the inductive hypothesis. One then looks for a way to “cut down” the new graph in some way that makes the inductive hypothesis applicable.

(iii) ⟹ (i) Here I had hoped that students would expand on the sketch in my lecture notes and give an inductive proof based on the number of cycles in the partition. This is a bit subtle in that, in the inductive step, one can’t simply merge two cycles (reducing the number of elements in the partition by one) and then invoke the inductive hypothesis, because the result of merging two cycles is no longer a cycle. Instead, one should prove the following:

If G is a strongly connected graph whose edge set can be partitioned into closed trails, then G is Eulerian.

The point is that if one merges two closed trails, the result is again a closed trail. Further, the proposition above has (iii) ⟹ (i) as an easy corollary because any cycle is also, automatically, a closed trail and so a partition of E into cycles is just a special case of a partition into closed trails.

Finally, a few people forgot to mention the (trivial) case where G consists of a single large cycle, but that’s more a matter of completeness than muddle.
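The "cut down" idea behind (ii) ⟹ (iii) can also be illustrated computationally. The sketch below is not a proof and is not taken from the lecture notes. It assumes only that in-degree equals out-degree at every vertex, and repeatedly walks along unused edges until a vertex repeats, at which point the cycle just closed is split off. The function names and the six-edge example, which includes a self-loop, are hypothetical.

```python
from collections import defaultdict


def cycle_partition(edges):
    """Partition the edges of a balanced digraph into cycles.

    Each cycle is a list of vertices [v0, ..., vk-1] standing for the edges
    v0 -> v1, ..., vk-1 -> v0. Raises ValueError if some vertex has
    in-degree different from out-degree.
    """
    out_edges = defaultdict(list)
    balance = defaultdict(int)
    for tail, head in edges:
        out_edges[tail].append(head)
        balance[tail] += 1
        balance[head] -= 1
    if any(balance.values()):
        raise ValueError("in-degree and out-degree differ at some vertex")

    cycles = []
    for start in list(out_edges):
        path = [start]
        # Balance guarantees an unused out-edge whenever the walk sits at a
        # vertex other than start, so this loop ends with path == [start].
        while out_edges[path[-1]]:
            nxt = out_edges[path[-1]].pop()
            if nxt in path:
                i = path.index(nxt)
                cycles.append(path[i:])  # split off the cycle just closed
                del path[i + 1:]
            else:
                path.append(nxt)
    return cycles


def cycle_edges(cycle):
    """Edges of a cycle given as a list of vertices."""
    return [(cycle[i], cycle[(i + 1) % len(cycle)]) for i in range(len(cycle))]


# Hypothetical balanced digraph: a triangle, a 2-cycle and a self-loop.
balanced = [("a", "b"), ("b", "c"), ("c", "a"),
            ("c", "d"), ("d", "c"), ("a", "a")]
for cycle in cycle_partition(balanced):
    print(cycle)
```

Removing a cycle lowers the in-degree and the out-degree of each of its vertices by one, so the balance property survives every step; this is the computational counterpart of point (b) above, and the self-loop shows up as a cycle of length 1.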