Vladimir Dancik:
Expected Length of Longest Common Subsequences
PhD Thesis, University of Warwick , 1994

Summary.

A longest common subsequence of two sequences is a sequence that is a subsequence of both the given sequences and has largest possible length. It is known that the expected length of a longest common subsequence is proportional to the length of the given sequences. The proportion, denoted by $\gamma_k$, is dependent on the alphabet size $k$ and the exact value of this proportion is not known even for a binary alphabet.

To obtain lower bounds for the constants $\gamma_k$, finite state machines computing a common subsequence of the inputs are built. Analysing the behaviour of the machines for random inputs we get lower bounds for the constants $\gamma_k$. The analysis of the machines is based on the theory of Markov chains. An algorithm for automated production of lower bounds is described.

To obtain upper bounds for the constants $\gamma_k$, collations -- pairs of sequences with a marked common subsequence -- are defined. Upper bounds for the number of collations of `small size' can be easily transformed to upper bounds for the constants $\gamma_k$. Combinatorial analysis is used to bound the number of collations.

The methods used for producing bounds on the expected length of a common subsequence of two sequences are also used for other problems, namely a longest common subsequence of several sequences, a shortest common supersequence and a maximal adaptability.

Note.

My PhD thesis (515K) is available in compressed postscript format. Note that it is in 600dpi resolution. The high resolution can cause problems on some printers, lower resolution version is here (400K). If you have a doble-sided printer, you can prefer to download double sided version in either 600dpi (517K) or 300dpi (402K) resolution.

Another copy of my thesis is kept by Computer Science department, Warwick Universty, United Kingdom on their ftp server.


Bibtex entry:

@PhdThesis{DanPhD,
  author = 	 "Vladimir Dan{\v c}{\'\i}k",
  title = 	 "Expected Length of Longest Common Subsequences",
  school = 	 "University of Warwick",
  year = 	 "1994",
}


Return to Previous Level

USC Computational Biology Home Page
http://www-hto.usc.edu/people/dancik/thesis/index.html, webmaster@hto.usc.edu, 25 October 1996