Dynamic programming
It is used when the solution can be described recursively in terms of solutions to subproblems (optimal substructure).
The algorithm finds solutions to subproblems and stores them in memory for later use.
It is more efficient than "brute-force" methods, which solve the same subproblems over and over again.

Dynamic programming: Longest Common Subsequence

Longest Common Subsequence (LCS)
Application: comparison of two DNA strings
Ex: X = {A B C B D A B}, Y = {B D C A B A}
A longest common subsequence, e.g. B C B A, appears in the same order in both:
X = A B C B D A B
Y = B D C A B A

LCS Algorithm
If |X| = m and |Y| = n, then there are 2^m subsequences of X; we must compare each with Y (n comparisons).
So the running time of the brute-force algorithm is O(n 2^m).
Notice that the LCS problem has optimal substructure: solutions of subproblems are parts of the final solution.
Subproblems: "find LCS of pairs of prefixes of X and Y"

Longest Common Subsequence
Definition 1: Subsequence
Given a sequence
X = <x_1, x_2, ..., x_m>
then another sequence
Z = <z_1, z_2, ..., z_k>
is a subsequence of X if there exists a strictly increasing sequence <i_1, i_2, ..., i_k> of indices of X such that for all j = 1, 2, ..., k we have x_{i_j} = z_j.

Example
X = <A,B,D,F,M,Q>
Z = <B,F,M>
Z is a subsequence of X with index sequence <2,4,5>.
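
Definition 1 can be checked mechanically. The Python sketch below is only an illustration; the function name subsequence_indices is an assumption and does not come from the slides. It scans X from left to right and returns the leftmost 1-based index sequence if Z is a subsequence of X, or None otherwise.

```python
def subsequence_indices(x, z):
    """Return 1-based indices i_1 < ... < i_k with x[i_j] == z[j], or None."""
    indices = []
    pos = 0  # next position of x to examine (0-based)
    for symbol in z:
        # Scan forward for the next occurrence of this symbol in x.
        while pos < len(x) and x[pos] != symbol:
            pos += 1
        if pos == len(x):
            return None  # x is exhausted, so z is not a subsequence
        indices.append(pos + 1)  # store 1-based, as in the definition
        pos += 1  # the index sequence must be strictly increasing
    return indices


print(subsequence_indices("ABDFMQ", "BFM"))  # [2, 4, 5]
print(subsequence_indices("ABDFMQ", "MB"))   # None
```

The first call reproduces the index sequence <2,4,5> from the example; the second fails because B does not occur after M in X.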

More Definitions
Definition 2: Common subsequence
– Given 2 sequences X and Y, we say Z is a common subsequence of X and Y if Z is a subsequence of X and a subsequence of Y
Definition 3: Longest common subsequence problem
– Given X = <x_1, x_2, ..., x_m> and Y = <y_1, y_2, ..., y_n>, find a maximum-length common subsequence of X and Y

Example
X = <A,B,C,B,D,A,B>
Y = <B,D,C,A,B,A>
Then, what is/are LCS?
1. BCBA length 4
2. BDAB length 4
3. BCAB length 4

Brute Force Algorithm
A brute-force algorithm would compare each subsequence of X with the symbols in Y:
for every subsequence of X
    is it also a subsequence of Y?
    if yes, is it longer than the longest common subsequence found so far?
What about complexity?
If |X| = m and |Y| = n, then there are 2^m subsequences of X; we must compare each with Y (n comparisons).
So the running time of the brute-force algorithm is O(n 2^m).

Yet More Definitions
Definition 4: Prefix of a sequence
If X = <x_1, x_2, ..., x_m>, the ith prefix of X for i = 0, 1, ..., m is X_i = <x_1, x_2, ..., x_i>
Example
– if X = <A,B,C,D,E,F,H,I,J,L> then X_4 = <A,B,C,D> and X_0 = <>
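
The brute-force steps above can be sketched in Python as follows. This is a minimal illustration, not an implementation from the slides; the names is_subsequence and lcs_brute_force are assumptions. It enumerates all 2^m index subsets of X and checks each candidate against Y in O(n) time.

```python
from itertools import combinations


def is_subsequence(z, y):
    """True if z can be obtained from y by deleting symbols (one O(len(y)) scan)."""
    it = iter(y)
    # "symbol in it" advances the iterator, so matches must appear in order.
    return all(symbol in it for symbol in z)


def lcs_brute_force(x, y):
    """Try every subsequence of x and keep the longest one that also occurs in y."""
    best = ""
    for k in range(len(x) + 1):
        for idx in combinations(range(len(x)), k):  # all index subsets of size k
            candidate = "".join(x[i] for i in idx)
            if len(candidate) > len(best) and is_subsequence(candidate, y):
                best = candidate
    return best


print(len(lcs_brute_force("ABCBDAB", "BDCABA")))  # 4
```

Which of the length-4 subsequences is returned depends on the enumeration order, so only the length is printed here.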

Optimal Substructure
Theorem: Optimal Substructure of LCS
Let X = <x_1, x_2, ..., x_m> and Y = <y_1, y_2, ..., y_n> be sequences and let Z = <z_1, z_2, ..., z_k> be any LCS of X and Y
1. if x_m = y_n then z_k = x_m = y_n and Z_{k-1} is an LCS of X_{m-1} and Y_{n-1}
2. if x_m ≠ y_n and z_k ≠ x_m then Z is an LCS of X_{m-1} and Y
3. if x_m ≠ y_n and z_k ≠ y_n then Z is an LCS of X and Y_{n-1}

Sub-problem structure
Case 1
– if x_m = y_n then there is one sub-problem to solve: find an LCS of X_{m-1} and Y_{n-1} and append x_m
Case 2
– if x_m ≠ y_n then there are two sub-problems
  • find an LCS of X_m and Y_{n-1}
  • find an LCS of X_{m-1} and Y_n
  • pick the longer of the two
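
The two cases translate directly into a recursive procedure. The Python sketch below is an illustration under that reading; the name lcs_recursive is an assumption. It re-solves the same prefix pairs many times and therefore takes exponential time in the worst case, which is exactly what the tables of dynamic programming avoid.

```python
def lcs_recursive(x, y):
    """Return one LCS of x and y by following Case 1 and Case 2 literally."""
    if not x or not y:
        return ""  # an empty prefix has only the empty common subsequence
    if x[-1] == y[-1]:
        # Case 1: solve the single sub-problem and append the matching symbol.
        return lcs_recursive(x[:-1], y[:-1]) + x[-1]
    # Case 2: solve both sub-problems and pick the longer result.
    drop_y = lcs_recursive(x, y[:-1])
    drop_x = lcs_recursive(x[:-1], y)
    return drop_y if len(drop_y) >= len(drop_x) else drop_x


print(lcs_recursive("ABCB", "BDCAB"))  # BCB
```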

Cost of Optimal Solution
Cost is the length of the common subsequence
We want to pick the longest one
Let c[i,j] be the length of an LCS of the sequences X_i and Y_j
Base case is an empty sequence (i = 0 or j = 0): then c[i,j] = 0 because the only common subsequence is empty

The Recurrence
\[
c[i,j] =
\begin{cases}
0 & \text{if } i = 0 \text{ or } j = 0,\\
c[i-1,j-1] + 1 & \text{if } i,j > 0 \text{ and } x_i = y_j,\\
\max(c[i,j-1],\, c[i-1,j]) & \text{if } i,j > 0 \text{ and } x_i \ne y_j.
\end{cases}
\]
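
As an illustration of the recurrence, here is a minimal top-down Python sketch that stores each c[i,j] after it is first computed. The function name lcs_length_memo and the use of functools.lru_cache as the memory are assumptions made for this example, not something the slides prescribe. For long inputs the recursion depth may exceed Python's default limit; the bottom-up table avoids that.

```python
from functools import lru_cache


def lcs_length_memo(x, y):
    """Length of an LCS of x and y, computed top-down with memoization."""

    @lru_cache(maxsize=None)
    def c(i, j):
        if i == 0 or j == 0:
            return 0  # base case: one prefix is empty
        if x[i - 1] == y[j - 1]:  # x_i = y_j (Python strings are 0-based)
            return c(i - 1, j - 1) + 1
        return max(c(i, j - 1), c(i - 1, j))

    return c(len(x), len(y))


print(lcs_length_memo("ABCBDAB", "BDCABA"))  # 4
```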

Tables used for LCS
Table c[0..m, 0..n] stores in c[i,j] the length of an LCS of X_i and Y_j
Table b[1..m, 1..n] stores pointers to optimal sub-problem solutions

LCS recursive solution
\[
c[i,j] =
\begin{cases}
c[i-1,j-1] + 1 & \text{if } x[i] = y[j],\\
\max(c[i,j-1],\, c[i-1,j]) & \text{otherwise.}
\end{cases}
\]
We start with i = j = 0 (empty prefixes of X and Y)
Since X_0 and Y_0 are empty strings, their LCS is always empty (i.e. c[0,0] = 0)
The LCS of the empty string and any other string is empty, so for every i and j: c[0,j] = c[i,0] = 0

LCS recursive solution
\[
c[i,j] =
\begin{cases}
c[i-1,j-1] + 1 & \text{if } x[i] = y[j],\\
\max(c[i,j-1],\, c[i-1,j]) & \text{otherwise.}
\end{cases}
\]
When we calculate c[i,j], we consider two cases:
First case: x[i] = y[j]. One more symbol in strings X and Y matches, so the length of the LCS of X_i and Y_j equals the length of the LCS of the smaller strings X_{i-1} and Y_{j-1}, plus 1
Second case: x[i] != y[j]. As the symbols don't match, our solution is not improved, and the length of LCS(X_i, Y_j) is the same as before (i.e. the maximum of LCS(X_i, Y_{j-1}) and LCS(X_{i-1}, Y_j))

LCS Length Algorithm
LCS-LENGTH(X, Y)
m ← length[X]
n ← length[Y]
for i ← 1 to m
    do c[i,0] ← 0
for j ← 0 to n
    do c[0,j] ← 0
for i ← 1 to m
    do for j ← 1 to n
        do if x_i = y_j
            then c[i,j] ← c[i-1,j-1] + 1
                 b[i,j] ← "↖"
            else if c[i-1,j] ≥ c[i,j-1]
                then c[i,j] ← c[i-1,j]
                     b[i,j] ← "↑"
                else c[i,j] ← c[i,j-1]
                     b[i,j] ← "←"
return c and b

LCS Example
We'll see how the LCS algorithm works on the following example:
X = ABCB
Y = BDCAB
What is the Longest Common Subsequence of X and Y?
LCS(X, Y) = BCB
X = A B C B
Y = B D C A B
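
A Python rendering of LCS-LENGTH may help when experimenting with the example. This is a sketch under a few assumptions: the function name lcs_length, the nested-list representation of c and b, and the arrow strings used as pointer values are all choices made for this illustration. Python strings are 0-based, so x_i is x[i - 1].

```python
def lcs_length(x, y):
    """Fill the tables c (lengths) and b (pointers) bottom-up, row by row."""
    m, n = len(x), len(y)
    # Row 0 and column 0 stay 0: the LCS with an empty prefix is empty.
    c = [[0] * (n + 1) for _ in range(m + 1)]
    b = [[None] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                c[i][j] = c[i - 1][j - 1] + 1
                b[i][j] = "↖"  # came from the diagonal neighbour
            elif c[i - 1][j] >= c[i][j - 1]:
                c[i][j] = c[i - 1][j]
                b[i][j] = "↑"  # came from the entry above
            else:
                c[i][j] = c[i][j - 1]
                b[i][j] = "←"  # came from the entry to the left
    return c, b


c, b = lcs_length("ABCB", "BDCAB")
for row in c:
    print(row)
```

The last printed row is [0, 1, 1, 2, 2, 3], so the LCS of ABCB and BDCAB has length c[4][5] = 3.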

LCS Example (0)   (X = ABCB, Y = BDCAB)
     j    0   1   2   3   4   5
 i        Yj  B   D   C   A   B
 0   Xi
 1   A
 2   B
 3   C
 4   B
X = ABCB; m = |X| = 4
Y = BDCAB; n = |Y| = 5
Allocate array c[0..4, 0..5] (5 rows, 6 columns)

LCS Example (1)   (X = ABCB, Y = BDCAB)
     j    0   1   2   3   4   5
 i        Yj  B   D   C   A   B
 0   Xi   0   0   0   0   0   0
 1   A    0
 2   B    0
 3   C    0
 4   B    0
for i = 1 to m    c[i,0] = 0
for j = 0 to n    c[0,j] = 0

LCS Example (15)   (X = ABCB, Y = BDCAB)
     j    0   1   2   3   4   5
 i        Yj  B   D   C   A   B
 0   Xi   0   0   0   0   0   0
 1   A    0   0   0   0   1   1
 2   B    0   1   1   1   1   2
 3   C    0   1   1   2   2   2
 4   B    0   1   1   2   2   3
if ( Xi == Yj )
    c[i,j] = c[i-1,j-1] + 1
else c[i,j] = max( c[i-1,j], c[i,j-1] )

LCS Algorithm Running Time
The LCS algorithm calculates the value of each entry of the array c[m,n]
So what is the running time?
O(m*n)
since each c[i,j] is calculated in constant time, and there are m*n elements in the array

How to find actual LCS
So far, we have just found the length of the LCS, but not the LCS itself.
We want to modify this algorithm to make it output the Longest Common Subsequence of X and Y
Each c[i,j] depends on c[i-1,j] and c[i,j-1], or on c[i-1,j-1]
For each c[i,j] we can say how it was acquired:
    2   2
    2   3
For example, here c[i,j] = c[i-1,j-1] + 1 = 2 + 1 = 3

How to find actual LCS - continued
Remember that
\[
c[i,j] =
\begin{cases}
c[i-1,j-1] + 1 & \text{if } x[i] = y[j],\\
\max(c[i,j-1],\, c[i-1,j]) & \text{otherwise.}
\end{cases}
\]
So we can start from c[m,n] and go backwards
Whenever x[i] = y[j], so that c[i,j] = c[i-1,j-1] + 1, remember x[i] (because x[i] is a part of the LCS)
When i = 0 or j = 0 (i.e. we reached the beginning), output the remembered letters in reverse order
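
The backward walk described above needs only the table c. The following Python sketch is an illustration; the function name lcs_by_traceback is an assumption, and ties between the upper and left neighbours are broken in favour of the upper one, which is one possible choice among several.

```python
def lcs_by_traceback(x, y):
    """Build c bottom-up, then walk back from c[m][n] collecting matched symbols."""
    m, n = len(x), len(y)
    c = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                c[i][j] = c[i - 1][j - 1] + 1
            else:
                c[i][j] = max(c[i][j - 1], c[i - 1][j])

    remembered = []
    i, j = m, n
    while i > 0 and j > 0:
        if x[i - 1] == y[j - 1]:
            remembered.append(x[i - 1])  # this symbol is part of the LCS
            i, j = i - 1, j - 1
        elif c[i - 1][j] >= c[i][j - 1]:
            i -= 1  # the value was inherited from the entry above
        else:
            j -= 1  # the value was inherited from the entry to the left
    # Letters were collected from the end, so reverse them for output.
    return "".join(reversed(remembered))


print(lcs_by_traceback("ABCB", "BDCAB"))  # BCB
```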

Algorithm to print LCS
PRINT-LCS(b, X, i, j)
if i = 0 or j = 0
    then return
if b[i,j] = "↖"
    then PRINT-LCS(b, X, i-1, j-1)
         print x_i
else if b[i,j] = "↑"
    then PRINT-LCS(b, X, i-1, j)
else PRINT-LCS(b, X, i, j-1)

Finding LCS
     j    0   1   2   3   4   5
 i        Yj  B   D   C   A   B
 0   Xi   0   0   0   0   0   0
 1   A    0   0   0   0   1   1
 2   B    0   1   1   1   1   2
 3   C    0   1   1   2   2   2
 4   B    0   1   1   2   2   3
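
PRINT-LCS can be sketched in Python as a recursion over the pointer table b. The names lcs_arrows and collect_lcs are assumptions for this illustration, and the symbols are appended to a list rather than printed so that the result can be inspected. The table construction is included only so that the sketch runs on its own.

```python
def lcs_arrows(x, y):
    """Fill c and b as in LCS-LENGTH and return the pointer table b."""
    m, n = len(x), len(y)
    c = [[0] * (n + 1) for _ in range(m + 1)]
    b = [[""] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                c[i][j], b[i][j] = c[i - 1][j - 1] + 1, "↖"
            elif c[i - 1][j] >= c[i][j - 1]:
                c[i][j], b[i][j] = c[i - 1][j], "↑"
            else:
                c[i][j], b[i][j] = c[i][j - 1], "←"
    return b


def collect_lcs(b, x, i, j, out):
    """Follow the pointers from b[i][j]; append LCS symbols to out in order."""
    if i == 0 or j == 0:
        return
    if b[i][j] == "↖":
        collect_lcs(b, x, i - 1, j - 1, out)
        out.append(x[i - 1])  # emitted after the recursive call: left to right
    elif b[i][j] == "↑":
        collect_lcs(b, x, i - 1, j, out)
    else:
        collect_lcs(b, x, i, j - 1, out)


x, y = "ABCB", "BDCAB"
out = []
collect_lcs(lcs_arrows(x, y), x, len(x), len(y), out)
print("".join(out))  # BCB
```

Because each symbol is emitted after the recursive call returns, the output is already in straight order and no reversal is needed.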
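
Finding LCS (2)
     j    0   1   2   3   4   5
 i        Yj  B   D   C   A   B
 0   Xi   0   0   0   0   0   0
 1   A    0   0   0   0   1   1
 2   B    0   1   1   1   1   2
 3   C    0   1   1   2   2   2
 4   B    0   1   1   2   2   3
Walking back from c[4,5]: c[4,5] (match B), c[3,4], c[3,3] (match C), c[2,2], c[2,1] (match B), then c[1,0], where j = 0 and the walk stops
LCS (reversed order): B C B
LCS (straight order): B C B
(this string turned out to be a palindrome)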