CS 336 March 19, 2012 Tandy Warnow. Basic Graph Terminology Nodes, vertices, edges, degrees, paths,...

CS 336 March 19, 2012

Tandy Warnow

Basic Graph Terminology

• Nodes, vertices, edges, degrees, paths, cycles, connected components, adjacency, isolated vertices, trees, forests

• Directed graphs: indegree, outdegree, trees

Advanced terminology

• Cliques

• Independent sets

• Chromatic number and vertex colorings

• Eulerian cycles and Eulerian paths

• Hamiltonian paths

• Matchings

• Dominating Set

• Vertex Cover

Paths, Connected Components, etc.

• A path is a sequence of vertices v1, v2, …, vn so that vi is adjacent to vi+1 for i=1,2,…,n-1. A simple path is one that does not have repeated vertices.

• A graph is connected if every pair of vertices in the graph is connected by some path.

• A connected component is a maximal subset of the vertices that is connected.
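
The definitions above can be sketched in code. This is a minimal illustration, assuming an undirected graph stored as a dictionary that maps each vertex to the set of its neighbors; the function names `connected_components` and `is_connected` are hypothetical and not part of the lecture.

```python
from collections import deque


def connected_components(adj):
    """Return the connected components of an undirected graph.

    adj maps each vertex to the set of its neighbors (an assumed representation).
    """
    seen = set()
    components = []
    for start in adj:
        if start in seen:
            continue
        # Breadth-first search collects every vertex reachable from start.
        component = {start}
        seen.add(start)
        queue = deque([start])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    component.add(w)
                    queue.append(w)
        components.append(component)
    return components


def is_connected(adj):
    # A graph is connected when all of its vertices form a single component.
    # (The empty graph is treated as connected here by convention.)
    return len(connected_components(adj)) <= 1


# Three components: {1, 2, 3}, {4, 5}, and the isolated vertex 6.
example = {1: {2}, 2: {1, 3}, 3: {2}, 4: {5}, 5: {4}, 6: set()}
print(connected_components(example))
```

Each component is maximal because the search only stops once no unvisited neighbor remains.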

Cycles

• A cycle in a graph is a path that starts and ends at the same vertex.

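• A simple cycle is a cycle that does not have any repeated vertices (other than the start and end vertex). In an undirected graph it must also contain at least three distinct vertices, so that going back and forth along a single edge does not count.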

• A graph is acyclic if it has no simple cycles.

Trees

• Two types: rooted and unrooted

• Unrooted (simplest): acyclic connected graph

• Rooted: take an unrooted tree, pick one node to be the root, and direct all edges away from the root. Voila!
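
The rooting step can be sketched as follows. This is only an illustration, assuming the unrooted tree is given as a dictionary from each node to the set of its neighbors; `root_tree` is a hypothetical name.

```python
def root_tree(adj, root):
    """Direct every edge of an unrooted tree away from root.

    Returns a dict mapping each node to the list of its children.
    Assumes adj really is a tree (connected and acyclic).
    """
    children = {v: [] for v in adj}
    parent = {root: None}
    stack = [root]
    while stack:
        v = stack.pop()
        for w in sorted(adj[v]):
            # In a tree, the only neighbor already visited is the parent.
            if w != parent[v]:
                parent[w] = v
                children[v].append(w)
                stack.append(w)
    return children


tree = {"a": {"b", "c"}, "b": {"a", "d"}, "c": {"a"}, "d": {"b"}}
print(root_tree(tree, "b"))
```

Picking a different root gives a different rooted tree on the same underlying unrooted tree.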

Theorems about trees

Let T be a connected acyclic graph (i.e., a tree) with n vertices (n>0). Then:

• T has at least one leaf (node with degree 0 or 1).

• T has n-1 edges.

• Every edge in T is a cut-edge.

• Every tree can be 2-colored.

Theorem: Every tree has at least one leaf

Theorem: For any tree T with at least one vertex, T has at least one leaf (node with degree 0 or 1).

Proof:

• If n=1, then T is a single vertex, which is a leaf.

• Else, n>1. Let P be a longest simple path in T, so P=v1,v2,…,vk. Since T is connected and n>1, P contains at least one edge, so k≥2 and vk-1 is a neighbor of vk.

• If vk has degree 1, we are done. Otherwise, vk has at least two neighbors, and so some neighbor w other than vk-1. If w is in P, then we have a simple cycle in T, contradicting that T is a tree. If w is not in P, then we can extend P and get a longer path, contradicting that P is a longest simple path in T.

• Hence, vk has degree 1, and we are done.

Theorem: Any tree with n>0 nodes has n-1 edges

• Proof: by induction on n.

• Base case: n=1 (trivial)

• Inductive hypothesis: for some positive n, any tree on n nodes has exactly n-1 edges.

• Let T be a tree on n+1 nodes. We want to show T has exactly n edges.

Proof (cont’d)

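• Let v be a node in T with degree 1. (Such a node exists by the previous theorem: T is connected and has at least 2 nodes, so its leaf cannot have degree 0.)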

• Remove v from T. The result is a tree T’ with n nodes, and hence n-1 edges (by the inductive hypothesis)

• T’ contains one fewer edge and one fewer vertex (node) than T, and so T has n edges.
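
The induction can be mirrored in code by peeling off one degree-1 node at a time and counting the edges removed. This is a sketch under the assumption that the input is a tree stored as a dictionary from each node to the set of its neighbors; `count_edges_by_leaf_removal` is a hypothetical name.

```python
def count_edges_by_leaf_removal(adj):
    """Count the edges of a tree by repeatedly deleting a degree-1 node.

    Assumes adj is a tree; on other graphs the search for a leaf may fail.
    """
    # Work on a copy so the caller's graph is not modified.
    adj = {v: set(nbrs) for v, nbrs in adj.items()}
    edges = 0
    while len(adj) > 1:
        # A tree with at least 2 nodes always has a node of degree 1.
        leaf = next(v for v in adj if len(adj[v]) == 1)
        (neighbor,) = adj[leaf]
        adj[neighbor].discard(leaf)
        del adj[leaf]
        edges += 1  # each step removes exactly one node and one edge
    return edges


path_tree = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
star_tree = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
print(count_edges_by_leaf_removal(path_tree), count_edges_by_leaf_removal(star_tree))
```

The loop runs once per node except the last one, which is where the count n-1 comes from.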

Theorem: every edge in a tree is a cut-edge

Proof (by contradiction).

• Suppose T is a tree, and e=(v,w) is an edge in T that is not a cut-edge; that is, removing e does not disconnect T.

• Then G=T-{e} (but keeping v and w) is connected. Hence there is a simple path P from v to w in G. Since e is not in G, P does not include edge e.

• Therefore, we can form a simple cycle C by adding edge e to P.

• Since every edge in C is in T, this means that T is not acyclic, contradicting the assumption that T is a tree (connected acyclic graph).
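
A brute-force check of this theorem on small graphs: delete each edge in turn and test whether the graph stays connected. This is a sketch assuming a simple undirected graph (no self-loops) stored as a dictionary of neighbor sets; `is_connected` and `cut_edges` are hypothetical names.

```python
def is_connected(adj):
    """Return True if every vertex is reachable from an arbitrary start vertex."""
    if not adj:
        return True
    start = next(iter(adj))
    seen = {start}
    stack = [start]
    while stack:
        v = stack.pop()
        for w in adj[v]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == len(adj)


def cut_edges(adj):
    """Return the edges of a connected graph whose removal disconnects it."""
    edges = {frozenset((v, w)) for v in adj for w in adj[v]}
    result = set()
    for e in edges:
        v, w = tuple(e)
        # Copy the graph without edge e, keeping both endpoints.
        pruned = {x: set(nbrs) for x, nbrs in adj.items()}
        pruned[v].discard(w)
        pruned[w].discard(v)
        if not is_connected(pruned):
            result.add(e)
    return result


tree = {1: {2, 3}, 2: {1, 4}, 3: {1}, 4: {2}}
triangle = {1: {2, 3}, 2: {1, 3}, 3: {1, 2}}
print(len(cut_edges(tree)), len(cut_edges(triangle)))
```

In the tree all three edges are cut-edges, while the triangle (a simple cycle) has none, matching the argument in the proof.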

Vertex Coloring

• A (proper) vertex coloring of a graph is a function c: V -> {1,2,…,k}, s.t. no two adjacent vertices are mapped to the same color.

• The chromatic number of a graph is the minimum number of colors needed to properly color the graph.

• How many colors does a tree need?

2-coloring a tree

• Theorem: every connected acyclic graph (i.e., tree) can be 2-colored.

• Proof: by induction on the number of vertices.

Proof that every tree can be 2-colored

• Let G be a tree on n vertices. The base case is n=1. Clearly every tree on 1 vertex can be 2-colored.

• The Inductive Hypothesis is that for some positive integer n, any tree on n vertices can be 2-colored.

• Let G be a tree with n+1 vertices. We want to show that G can be 2-colored.

Proof (cont’d)

• Let v be a node in G that has degree 1, and let w be its unique neighbor in G.

• Consider the graph G’ formed by deleting v (and its incident edge but not w) from G.

• G’ is also connected and acyclic (why?) and has n vertices.

• Therefore, by the inductive hypothesis, G’ can be 2-colored.

• We extend the coloring from G’ to G, by letting c(v) be 1 if c(w)=2, and c(v)=2 if c(w)=1.

• Note that this coloring is proper for G.

• Hence G can be 2-colored.
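
The coloring rule from the proof (give v the color that its neighbor w does not have) can be applied outward from any starting node. The sketch below does this with a breadth-first search rather than by deleting leaves, so it is an illustration of the same rule, not a transcription of the induction. It assumes a tree stored as a dictionary of neighbor sets; `two_color_tree` and `is_proper` are hypothetical names.

```python
from collections import deque


def two_color_tree(adj):
    """Return a coloring c: V -> {1, 2} of a tree given as {node: neighbors}."""
    root = next(iter(adj))
    color = {root: 1}
    queue = deque([root])
    while queue:
        w = queue.popleft()
        for v in adj[w]:
            if v not in color:
                # Same rule as in the proof: c(v) is the color c(w) is not.
                color[v] = 2 if color[w] == 1 else 1
                queue.append(v)
    return color


def is_proper(adj, color):
    """Check that no two adjacent vertices share a color."""
    return all(color[v] != color[w] for v in adj for w in adj[v])


tree = {1: {2, 3}, 2: {1, 4}, 3: {1}, 4: {2}}
coloring = two_color_tree(tree)
print(coloring, is_proper(tree, coloring))
```

On a graph with an odd cycle the same procedure would still assign colors, but `is_proper` would report that the result is not a proper coloring.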

Structural Induction

• This was a proof by structural induction.

• Proofs by structural induction can be applied more generally!

Theorem about rooted trees

• A rooted tree in which every node has 0 or 2 children is called a “binary tree”

• Theorem: every binary tree with n nodes has (n-1)/2 internal nodes (defined to be nodes with more than 0 children).

• Proof: by strong induction on n.

• Base case: n=1. Such a tree has no internal nodes, and (n-1)/2 = 0, so the claim is true.

Proof, cont’d.

• Strong Inductive hypothesis: for some n>0, and for all positive integers k up to n, all rooted binary trees with k nodes have (k-1)/2 internal nodes.

• Let T have n+1 nodes, and let the children of the root be A and B. (We know the root has two children, since if it had no children, T would have 1 node, contradicting our hypothesis.)

We want to show Int(T) = n/2

• TA, the subtree of T rooted at A, is a binary tree; let nA be the number of nodes in TA

• TB, the subtree of T rooted at B, is a binary tree; let nB be the number of nodes in TB

• Let Int(T) be the number of internal nodes of T, and Int(TA) and Int(TB) be similarly defined.

• Then nA and nB are both at most n, and by the inductive hypothesis

Int(TA) = (nA-1)/2

Int(TB ) = (nB-1)/2

• Therefore, since the internal nodes of T are the root together with the internal nodes of TA and TB,

Int(T) = (nA-1)/2 + (nB-1)/2 + 1

We have established that

Int(T) = (nA-1)/2 + (nB-1)/2 + 1

Simplifying this, we get

Int(T) = (nA-1 + nB -1 + 2)/2 = (nA + nB)/2

Note that nT = nA + nB + 1, where nT is the number of nodes in T (the root plus the nodes of TA and TB)

Therefore,

Int(T) = (nT - 1)/2

Recall that nT = n+1. Therefore,

Int(T) = n/2

Q.E.D.
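
The two recursive counts used in the proof can be written out directly. This is a sketch assuming a rooted binary tree is encoded as nested tuples, where `()` is a node with no children and `(A, B)` is a node whose two subtrees are A and B; the encoding and the names `count_nodes` and `count_internal` are hypothetical.

```python
def count_nodes(tree):
    """Number of nodes: nT = nA + nB + 1, or 1 for a node with no children."""
    if tree == ():
        return 1
    a, b = tree
    return 1 + count_nodes(a) + count_nodes(b)


def count_internal(tree):
    """Number of internal nodes: Int(T) = Int(TA) + Int(TB) + 1, or 0 for a leaf."""
    if tree == ():
        return 0
    a, b = tree
    return 1 + count_internal(a) + count_internal(b)


leaf = ()
t = ((leaf, leaf), ((leaf, leaf), leaf))
print(count_nodes(t), count_internal(t))
```

For this tree the counts are 9 nodes and 4 internal nodes, and (9-1)/2 = 4, as the theorem states.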

Genome Assembly

• Given a DNA sequence, technology can allow you to get a collection of k-mers (substrings of length k) that come from analyses of the sequence.

• From these k-mers, your objective is to come up with the sequence.

Genome Assembly

• Let X be a very long DNA sequence

• Consider all k-mers in X, with k big enough so that no k-mer appears two or more times

• Goal: reconstruct X from its set of k-mers

Genome Assembly, attempt #1

Approach 1:

• Make a node for each k-mer, and put a directed edge from v to w if the k-1 suffix of v is the k-1 prefix of w.

• Create the graph for the following string, using k=5: ACATAGGATTCAC

• Every such graph has a Hamiltonian Path, as long as no k-mer appears more than once!
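
Approach 1 can be sketched for the example string. This is a minimal illustration in which the graph is a dictionary from each k-mer to the sorted list of k-mers it points to; the function names `kmers` and `overlap_graph` are hypothetical.

```python
def kmers(sequence, k):
    """Return the set of all substrings of length k."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def overlap_graph(kmer_set):
    """Put an edge v -> w when the (k-1)-suffix of v equals the (k-1)-prefix of w."""
    return {
        v: sorted(w for w in kmer_set if v[1:] == w[:-1])
        for v in kmer_set
    }


graph = overlap_graph(kmers("ACATAGGATTCAC", 5))
for v in sorted(graph):
    print(v, "->", graph[v])
```

For this string there are 9 distinct 5-mers, and consecutive 5-mers of the sequence are joined by an edge, which is why following the sequence from left to right visits every node once.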

Hamiltonian Path

• A Hamiltonian Path in a graph visits every node exactly once

Genome Assembly, attempt #1

• Create the graph for the following string, using k=5: ACATAGGATTCAC

• Does the graph have a Hamiltonian Path?

• Is it unique?

• Can you reconstruct the sequence from the path?

Hamiltonian Path

• Determining if a graph has a Hamiltonian Path is NP-Complete

• So this approach to Genome Assembly is computationally intensive (infeasible)
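
To make the cost concrete, here is a plain backtracking search for a Hamiltonian path. It is a sketch, not an efficient algorithm: in the worst case it tries a number of partial paths that grows exponentially with the number of nodes. The graph is assumed to be a dictionary from each node to a list of successors, and `hamiltonian_path` and `spell` are hypothetical names.

```python
def hamiltonian_path(graph):
    """Return a path visiting every node exactly once, or None if there is none."""
    nodes = list(graph)

    def extend(path, visited):
        if len(path) == len(nodes):
            return path
        for w in graph[path[-1]]:
            if w not in visited:
                result = extend(path + [w], visited | {w})
                if result is not None:
                    return result
        return None  # dead end: backtrack

    # Try every node as a possible starting point.
    for start in nodes:
        result = extend([start], {start})
        if result is not None:
            return result
    return None


def spell(path):
    """Rebuild the sequence: the first k-mer plus the last letter of each later k-mer."""
    return path[0] + "".join(v[-1] for v in path[1:])


sequence = "ACATAGGATTCAC"
kmer_set = {sequence[i:i + 5] for i in range(len(sequence) - 4)}
graph = {v: [w for w in kmer_set if v[1:] == w[:-1]] for v in kmer_set}
print(spell(hamiltonian_path(graph)))
```

The small example finishes instantly because every node has at most one outgoing edge; the search only becomes expensive when nodes have many successors, as they do for real genomes with repeats.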

Eulerian Cycles

• An Eulerian cycle is one that goes through every edge exactly once

• It is easy to see that if a graph has an Eulerian cycle, then every node has even degree. The converse is also true for connected graphs, but a bit harder to prove.

• For directed graphs, the cycle will need to follow the direction of the edges (also called “arcs”). In this case, a connected graph has an Eulerian cycle if and only if the indegree is equal to the outdegree for every node.

Eulerian Paths

• An Eulerian path is one that goes through every edge exactly once

• It is easy to see that if a graph has an Eulerian path, then all but at most 2 nodes have even degree. The converse is also true for connected graphs, but a bit harder to prove.

• For directed graphs, the path will need to follow the direction of the edges (also called “arcs”). In this case, a connected graph has an Eulerian path that is not a cycle if and only if indegree(v)=outdegree(v) for all but 2 nodes (x and y), where indegree(x)=outdegree(x)+1, and indegree(y)=outdegree(y)-1. The path then starts at y and ends at x.
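
The degree conditions for directed graphs are easy to test mechanically. The sketch below checks only the indegree and outdegree counts and does not check connectivity, so a result of "cycle" or "path" means the degree condition holds, not that the Eulerian cycle or path is guaranteed to exist. The edge-list input and the name `degree_condition` are assumptions for illustration.

```python
from collections import Counter


def degree_condition(edges):
    """Classify a directed graph, given as a list of (v, w) arcs, by its degrees.

    Returns "cycle" if indegree equals outdegree everywhere, "path" if exactly
    two nodes are unbalanced (one by +1, one by -1), and None otherwise.
    """
    indeg, outdeg = Counter(), Counter()
    for v, w in edges:
        outdeg[v] += 1
        indeg[w] += 1
    nodes = set(indeg) | set(outdeg)
    # Record indegree minus outdegree for every unbalanced node.
    diffs = {v: indeg[v] - outdeg[v] for v in nodes if indeg[v] != outdeg[v]}
    if not diffs:
        return "cycle"
    if sorted(diffs.values()) == [-1, 1]:
        return "path"
    return None


print(degree_condition([(1, 2), (2, 3), (3, 1)]))
print(degree_condition([(1, 2), (2, 3)]))
print(degree_condition([(1, 2), (1, 3)]))
```

The three calls illustrate a balanced directed triangle, a two-arc path with one start and one end node, and a graph whose degrees rule out both.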

de Bruijn Graph

Input: the set of k-mers for the DNA sequence

Output: the de Bruijn Graph

• Vertices: the (k-1)-mers

• Directed edges: from v->w if the (k-2)-suffix of v is the (k-2)-prefix of w, and the k-mer formed by starting with v and ending with w is one of the k-mers in the input
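
Because each input k-mer contributes exactly one edge, from its (k-1)-prefix to its (k-1)-suffix, the construction is a single pass over the k-mers. This is a sketch with a dictionary-of-lists representation; `kmers` and `de_bruijn_graph` are hypothetical names.

```python
def kmers(sequence, k):
    """Return the set of all substrings of length k."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def de_bruijn_graph(kmer_set):
    """Build {(k-1)-mer: [successor (k-1)-mers]} with one edge per input k-mer."""
    graph = {}
    for kmer in sorted(kmer_set):
        v, w = kmer[:-1], kmer[1:]  # (k-1)-prefix and (k-1)-suffix of the k-mer
        graph.setdefault(v, []).append(w)
        graph.setdefault(w, [])  # make sure w is a vertex even with no out-edges
    return graph


graph = de_bruijn_graph(kmers("ACATAGGATTCAC", 5))
for v in sorted(graph):
    print(v, "->", graph[v])
```

For the example string this gives 10 vertices (the 4-mers) and 9 edges (the 5-mers).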

de Bruijn Graph

• If the k-mer set comes from a sequence and no k-mer appears more than once in the sequence, then the de Bruijn graph has an Eulerian path!

Using de Bruijn Graphs

Given: set of k-mers from a DNA sequence

Algorithm:

• Construct the de Bruijn graph

• Find an Eulerian path in the graph

• The path defines a sequence with the same set of k-mers as the original
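
The three steps can be sketched end to end. The lecture does not name a method for finding the Eulerian path; the sketch below uses Hierholzer's algorithm, which runs in time linear in the number of edges. It assumes the graph is non-empty and that an Eulerian path exists, and all function names are hypothetical.

```python
def de_bruijn_graph(kmer_set):
    """One edge per k-mer, from its (k-1)-prefix to its (k-1)-suffix."""
    graph = {}
    for kmer in sorted(kmer_set):
        graph.setdefault(kmer[:-1], []).append(kmer[1:])
        graph.setdefault(kmer[1:], [])
    return graph


def eulerian_path(graph):
    """Hierholzer's algorithm; assumes a non-empty graph with an Eulerian path."""
    remaining = {v: list(ws) for v, ws in graph.items()}
    indeg = {v: 0 for v in graph}
    for ws in graph.values():
        for w in ws:
            indeg[w] += 1
    # Start where outdegree exceeds indegree by 1; otherwise any node will do.
    start = next(
        (v for v in graph if len(graph[v]) - indeg[v] == 1),
        next(iter(graph)),
    )
    stack, path = [start], []
    while stack:
        v = stack[-1]
        if remaining[v]:
            stack.append(remaining[v].pop())  # follow an unused edge
        else:
            path.append(stack.pop())  # no unused edges left: v is finished
    return path[::-1]


def spell(path):
    """The first (k-1)-mer plus the last letter of every later vertex."""
    return path[0] + "".join(v[-1] for v in path[1:])


sequence = "ACATAGGATTCAC"
kmer_set = {sequence[i:i + 5] for i in range(len(sequence) - 4)}
print(spell(eulerian_path(de_bruijn_graph(kmer_set))))
```

Unlike the Hamiltonian path search, this stays fast on large inputs because every edge is pushed and popped only once.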

de Bruijn Graph

• Create the de Bruijn graph for the following string, using k=5: ACATAGGATTCAC

• Find the Eulerian path

• Is the Eulerian path unique?

• Reconstruct the sequence from this path