Longest Common Subsequence Algorithm on ASC Processors using Coterie Network Sabegh Singh Virdi ASC Processor Group Computer Science Department Kent State University
Date posted: 18-Dec-2015

Longest Common Subsequence Algorithm on ASC Processors using Coterie Network

Sabegh Singh Virdi

ASC Processor Group

Computer Science Department

Kent State University

Presentation Outline

Introduction to String matching and its variations

Role of LCS in Molecular Biology

Overview of LCS

Brief introduction on Coterie Network

Longest Common Subsequence on Coterie Network

  Exact match

  Approximate match

Summary and Future work

String Matching

One of the most fundamental operations in computing.

Compares two linear arrays of characters.

Applications in bioinformatics, such as searching genetic databases.

The strings involved are, however, enormous, so efficient string processing is a requirement.

String Matching Variations

Is exact match the only solution?

What if the pattern does not occur in the text?

It still makes sense to find the longest subsequence that occurs both in the pattern and in the text. This is the longest common subsequence problem.

Longest Common Subsequence, Longest Common Substring, Sequence Alignment, and Edit Distance are all variations of the string matching (SM) problem.
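The difference between a common substring and a common subsequence can be illustrated with a minimal Python sketch. The helper names and the test string "ACTAG" are illustrative assumptions, not part of the slides; the text string is the one used later in the deck.

```python
def is_subsequence(candidate: str, text: str) -> bool:
    """True if candidate's symbols appear in text in order, gaps allowed."""
    remaining = iter(text)
    # `symbol in remaining` advances the iterator until it finds symbol,
    # so later symbols can only match further to the right.
    return all(symbol in remaining for symbol in candidate)


def is_substring(candidate: str, text: str) -> bool:
    """True if candidate appears in text as one contiguous block."""
    return candidate in text


text = "AGACTGAGGTA"
print(is_substring("ACTGAG", text))    # contiguous occurrence
print(is_subsequence("ACTAG", text))   # in order, but with a gap
print(is_substring("ACTAG", text))     # not contiguous
```

Every substring is also a subsequence, but not the other way round, which is why the subsequence variant still gives an answer when exact matching fails.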


Role of LCS in Molecular biology

DNA sequences (genes) can be represented as sequences of four letters A, C, G, and T corresponding to the four submolecules forming DNA

When biologists find a new sequence, they typically want to know what other sequences it is most similar to.

One way of computing how similar (homologous) two sequences are is to find the length of their longest common subsequence.

Role of LCS in Molecular biology

This is a simplification, since in the biological situation one would typically take into account not only the length of the LCS, but also e.g. how gaps occur when the LCS is embedded in the two original sequences.

An obvious measure of the closeness of two strings is the maximum number of identical symbols they share, preserving symbol order.

This, by definition, is the length of the longest common subsequence of the strings.

Overview of LCS Algorithm

Given two strings, find the longest subsequence common to both. Example:

String 1: AGACTGAGGTA
String 2: ACTGAG

List of possible alignments:

AGACTGAGGTA
--ACTGAG---
--ACTGA-G--
A--CTGA-G--
A--CTGAG---

The time complexity of the standard dynamic-programming algorithm is O(nm), where n and m are the lengths of the two strings.
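As a point of reference, the O(nm) bound corresponds to the textbook dynamic-programming table. The following is a minimal sequential sketch in Python; the function name is an assumption made here, and it is not the parallel algorithm presented in the slides.

```python
def lcs_length(u: str, v: str) -> int:
    """Length of the longest common subsequence of u and v."""
    n, m = len(u), len(v)
    # table[i][j] holds the LCS length of u[:i] and v[:j];
    # row 0 and column 0 stay 0 (empty prefix).
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if u[i - 1] == v[j - 1]:
                # A match extends the LCS of the two shorter prefixes.
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                # Otherwise drop one symbol from either string.
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[n][m]


print(lcs_length("AGACTGAGGTA", "ACTGAG"))  # 6: String 2 occurs in String 1
```

The two nested loops visit each of the nm cells once, which is where the bound comes from.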

Overview of LCS Algorithm

This running time does not depend on the sequences u and v themselves, only on their lengths.

The bottleneck in efficiently parallelizing the LCS problem is calculating the values of the diagonal elements, as shown.

As seen, the value of element {i,j} depends on the previous diagonal element {i-1,j-1} when a match is found.
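One standard way to expose the parallelism that this dependency allows is to fill the table one anti-diagonal at a time. The sketch below is sequential Python and only marks where a parallel step could go; it is a general technique, not the method of the slides, and the function name is an assumption.

```python
def lcs_length_wavefront(u: str, v: str) -> int:
    """LCS length computed anti-diagonal by anti-diagonal."""
    n, m = len(u), len(v)
    table = [[0] * (m + 1) for _ in range(n + 1)]
    # Anti-diagonal d contains every cell {i,j} with i + j == d.
    for d in range(2, n + m + 1):
        # Cells on diagonal d read only diagonals d-1 and d-2,
        # so all iterations of this inner loop are independent.
        for i in range(max(1, d - m), min(n, d - 1) + 1):
            j = d - i
            if u[i - 1] == v[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[n][m]


print(lcs_length_wavefront("AGACTGAGGTA", "ACTGAG"))
```

Even with every anti-diagonal processed in one step, n + m - 1 steps remain, which is the bottleneck the slide refers to.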

Overview of LCS Algorithm

There may be more than one LCS.

Associate some parameters.

The Smith-Waterman algorithm uses the same concept as the LCS algorithm, but gives the optimal local alignment under a scoring scheme.
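To make the link to Smith-Waterman concrete, here is a minimal Python sketch of its scoring recurrence. The score values (match +2, mismatch -1, linear gap -1) and the function name are illustrative assumptions; the slides do not state which parameters were used.

```python
def smith_waterman_score(u: str, v: str,
                         match: int = 2,
                         mismatch: int = -1,
                         gap: int = -1) -> int:
    """Best local-alignment score of u and v (scores are assumed values)."""
    n, m = len(u), len(v)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diagonal = score[i - 1][j - 1] + (
                match if u[i - 1] == v[j - 1] else mismatch)
            # The 0 lets an alignment restart anywhere (local alignment).
            score[i][j] = max(0,
                              diagonal,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
            best = max(best, score[i][j])
    return best


print(smith_waterman_score("AGACTGAGGTA", "ACTGAG"))
```

The table has the same shape and the same three neighbours as the LCS table; only the cell update changes.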

Overview of LCS Algorithm

         A  G  A  C  T  G  A  G  G  T  A
      0  0  0  0  0  0  0  0  0  0  0  0
   A  0  1  1  1  1  1  1  1  1  1  1  1
   C  0  1  1  1  2  2  2  2  2  2  2  2
   T  0  1  1  1  2  3  3  3  3  3  3  3
   G  0  1  2  2  2  3  4  4  4  4  4  4
   A  0  1  2  3  3  3  4  5  5  5  5  5
   G  0  1  2  3  3  3  4  5  6  6  6  6

Communication between PEs

In a 2D mesh network, communication between the PEs themselves takes place in two different ways:

By using the nearest-neighbor mesh interconnection network.

By using a powerful variation on the nearest-neighbor mesh called the “Coterie network”, developed in response to the requirement for nonlocal communication.

The Coterie network has properties significantly different from those of the usual mesh.


Coteries [Weems & Herbordt]

“A small often selected group of persons who associate with one another frequently”

Features:

  Related to other reconfigurable broadcast networks

  Describable using hypergraphs

  Dynamic in nature

Advantages:

  Propagation of information quickly over long distances at electrical speed

  Support of one-to-many communication within a coterie, and reconfigurability of the coterie

PEs form Coteries

5 x 5 coterie network with switches shown in “arbitrary” settings. Shaded areas denote coteries (sets of PEs sharing the same circuit).
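The idea that switch settings partition the PEs into coteries can be sketched as a connected-components computation. The Python model below is a deliberate simplification and an assumption: each PE has only an east and a south switch, given as boolean grids, and the function and parameter names are hypothetical.

```python
def find_coteries(rows, cols, east_closed, south_closed):
    """Group PEs into coteries under a simplified two-switch model.

    east_closed[r][c]  -> PE (r, c) is connected to PE (r, c + 1)
    south_closed[r][c] -> PE (r, c) is connected to PE (r + 1, c)
    """
    parent = list(range(rows * cols))

    def find(x):
        # Follow parent links to the representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols and east_closed[r][c]:
                union(r * cols + c, r * cols + c + 1)
            if r + 1 < rows and south_closed[r][c]:
                union(r * cols + c, (r + 1) * cols + c)

    groups = {}
    for r in range(rows):
        for c in range(cols):
            groups.setdefault(find(r * cols + c), []).append((r, c))
    return sorted(groups.values())


east = [[True, False, False],
        [False, True, False]]
south = [[False, False, True],
         [False, False, False]]
for coterie in find_coteries(2, 3, east, south):
    print(coterie)
```

Changing a single switch value and rerunning the function shows how the coteries are reconfigured.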

Coterie’s Physical Structure

In the physical implementation, each PE controls a set of switches.

Four of these switches control access in the different directions (N, S, E, W).

Two switches, H and V, are used to emulate horizontal and vertical buses.

The last two switches, NE and NW, are used to create eight-way connected regions.

[Figure: switch layout of a single PE; switches labeled N, S, E, W, H, V, NE, NW]


LCS Algorithm on Coterie Network

A G A C T G A G G T A

LCS Algorithm on Coterie Network

A G A C T G A G G T A
A G A C T G A G G T A
A G A C T G A G G T A
A G A C T G A G G T A
A G A C T G A G G T A
A G A C T G A G G T A

Content of each PE after the MULTICAST operation

LCS Algorithm on Coterie Network

A

C

T

G

A

G

LCS Algorithm on Coterie Network

A A A A A A A A A A A
C C C C C C C C C C C
T T T T T T T T T T T
G G G G G G G G G G G
A A A A A A A A A A A
G G G G G G G G G G G

Content of each PE after the MULTICAST operation

LCS Algorithm on Coterie Network

    A G A C T G A G G T A
A   1 0 1 0 0 0 1 0 0 0 1
C   0 0 0 1 0 0 0 0 0 0 0
T   0 0 0 0 1 0 0 0 0 1 0
G   0 1 0 0 0 1 0 1 1 0 0
A   1 0 1 0 0 0 1 0 0 0 1
G   0 1 0 0 0 1 0 1 1 0 0
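The match matrix can be reproduced with a short sequential simulation of the two MULTICAST steps followed by a local comparison in every PE. This Python sketch only mimics the data movement; the function name and the list-of-lists layout are assumptions made for illustration.

```python
def match_matrix(text: str, pattern: str):
    """Simulate both multicasts, then compare the symbols in each PE."""
    rows, cols = len(pattern), len(text)
    # Multicast 1: the text in the top row is copied down every column.
    text_copy = [[text[c] for c in range(cols)] for _ in range(rows)]
    # Multicast 2: the pattern in the first column is copied along every row.
    pattern_copy = [[pattern[r]] * cols for r in range(rows)]
    # Local step: each PE sets its bit to 1 when its two symbols agree.
    return [[int(text_copy[r][c] == pattern_copy[r][c]) for c in range(cols)]
            for r in range(rows)]


for symbol, row in zip("ACTGAG", match_matrix("AGACTGAGGTA", "ACTGAG")):
    print(symbol, " ", " ".join(map(str, row)))
```

On the hardware every PE performs the comparison at once, so the loop over rows and columns here stands in for a single parallel step.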

LCS Algorithm on Coterie Network

    A G A C T G A G G T A
A   1 0 1 0 0 0 1 0 0 0 1
C   0 0 0 1 0 0 0 0 0 0 0
T   0 0 0 0 1 0 0 0 0 1 0
G   0 1 0 0 0 1 0 1 1 0 0
A   1 0 1 0 0 0 1 0 0 0 1
G   0 1 0 0 0 1 0 1 1 0 0

Inject unique token
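The slides do not spell out the switch settings used for the exact match, so the following Python sketch is only one plausible reading: a token injected in the top row can travel diagonally through PEs whose match bit is 1, and reaching the last row signals an exact occurrence. The function name and the diagonal-path assumption are hypothetical.

```python
def exact_match_columns(text: str, pattern: str):
    """Columns where a diagonal of matches runs from first row to last row."""
    rows, cols = len(pattern), len(text)
    match = [[text[c] == pattern[r] for c in range(cols)]
             for r in range(rows)]
    hits = []
    for start in range(cols - rows + 1):
        # Assumed token path: PE (0, start), (1, start + 1), and so on.
        if all(match[r][start + r] for r in range(rows)):
            hits.append(start)
    return hits


print(exact_match_columns("AGACTGAGGTA", "ACTGAG"))
print(exact_match_columns("AGACTGAGGTA", "ACTAAG"))
```

Under this reading, the second call finds no unbroken diagonal, which is the situation the approximate-match refinement addresses.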

LCS Algorithm on Coterie Network

We try to refine the algorithm to support approximate matching.

We make use of tokens.

The next example demonstrates this problem for the strings:

Text:    AGACTGAGGTA
Pattern: ACTAAG


LCS Algorithm on Coterie Network

    A G A C T G A G G T A
A   1 0 1 0 0 0 1 0 0 0 1
C   0 0 0 1 0 0 0 0 0 0 0
T   0 0 0 0 1 0 0 0 0 1 0
A   1 0 1 0 0 0 1 0 0 0 1
A   1 0 1 0 0 0 1 0 0 0 1
G   0 1 0 0 0 1 0 1 1 0 0

Inject unique token

Token method

In this method, we explicitly close the W-S switch based on some condition.

We inject unique token symbols as shown in the next slide.

Where these two symbols intersect within a PE, we close the W-S switch as shown.

Thus we get a path from the first row to the last row as shown.

LCS Algorithm on Coterie Network

    A G A C T G A G G T A
A   1 0 1 0 0 0 1 0 0 0 1
C   0 0 0 1 0 0 0 0 0 0 0
T   0 0 0 0 1 0 0 0 0 1 0
A   1 0 1 0 0 0 1 0 0 0 1
A   1 0 1 0 0 0 1 0 0 0 1
G   0 1 0 0 0 1 0 1 1 0 0

Inject unique token


Summary and Future work

We have presented two variations of the LCS algorithm.

We have explored a new network for this problem.

The algorithm for exact match runs in constant time.

The running time of the approximate-match algorithm depends upon the diameter of the network.

Summary and Future work

Future Work:

  Optimize the algorithm for approximate match

  Implement the algorithm on an FPGA model

  Incorporate the don’t-care symbol

  Extend the idea to support sequence alignment

  Conserve memory by using an encoding scheme

  Use virtual simulation of PEs in case we run out of PEs

Acknowledgements

Professor Walker

Professor Baker

Professor Weems

Professor Herbordt

Professor Piontkivska

Committee members for their time

Kevin Schaffer, Hong Wang, Shannon Steinfadt, Jalpesh Chitalia, and Michael Scherger

THANK YOU

Questions….