We propose to design efficient computer algorithms that provide novel or improved methods and software for a number of computational problems in molecular biology. The emphasis will be on blending theoretical results with practical concerns. All software emerging from the project will be made available free of charge. The proposal and the principal investigator's current research efforts are divided into three projects.
The first project centers on computational problems in the physical mapping and sequencing of DNA. We propose to continue refining a software system for the fragment assembly problem, i.e., determining the most likely complete DNA sequence consistent with electrophoresis data gathered from cloned fragments. The refinements consist of improved algorithms for several phases of the computation, expanded functionality to support user interaction, and a complete environment to support megabase sequencing projects. The methods we developed for fragment assembly also apply in large part to the problem of determining physical maps via various fingerprinting techniques. We have formulated a generalized assembly problem and plan to build a system capable of solving such problems for any combination of restriction map, digest, and hybridization information about the clones.
The second project is to design algorithms for a number of computational problems arising in molecular biology. Progress in this arena tends to be inspired rather than calculated. We demonstrate our track record of producing interesting results and then describe the following problems, for which we have a number of ideas and preliminary results: sublinear similarity searches, restriction map comparison, super pattern matching (gene recognition), determining restriction maps from digest data, designing oligonucleotide probes, and RNA secondary structure prediction.
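To make the fragment assembly problem of the first project concrete, the following is a minimal sketch of the textbook greedy overlap heuristic, not the assembly system described above. It assumes error-free fragments all read from the same strand, whereas real electrophoresis data contain base-calling errors and fragments of unknown orientation. The names `overlap` and `greedy_assemble` and the `min_overlap` threshold are hypothetical.

```python
def overlap(a: str, b: str, min_len: int = 1) -> int:
    """Length of the longest suffix of a equal to a prefix of b (at least min_len), else 0."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(fragments: list[str], min_overlap: int = 3) -> list[str]:
    """Greedily merge the pair of fragments with the largest overlap until none remain.

    Hypothetical simplification: assumes error-free reads from a single strand.
    Returns the resulting contigs.
    """
    # Remove duplicates (keeping order), then drop fragments contained in another one.
    frags = list(dict.fromkeys(fragments))
    frags = [f for f in frags if not any(f != g and f in g for g in frags)]
    while True:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = overlap(a, b, min_overlap)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return frags  # no overlap of at least min_overlap is left
        merged = frags[best_i] + frags[best_j][best_k:]
        frags = [f for idx, f in enumerate(frags) if idx not in (best_i, best_j)]
        frags.append(merged)


print(greedy_assemble(["ACGTTGCA", "TTGCATGG", "CATGGACC"]))
```

Here the three overlapping fragments collapse into the single contig `ACGTTGCATGGACC`. Greedy merging is fast but can misassemble repeated regions, which is one reason practical assemblers weigh alternative layouts rather than committing to the first large overlap.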
The objective of the last project is to develop a pattern matching system that permits the expression of complex patterns and their reduction to efficient search strategies. The pattern specification language is simple yet powerful enough to express succinctly the most complex patterns of biological interest. An "expert system" compiler for the language will examine a pattern and choose a search strategy, or a combination of strategies, from a built-in library of basic search techniques. The built-in library will contain implementations of the best available search algorithms for exact and approximate matches to keywords, repeats, and regular expressions. Using a dynamic-programming-style calculation, the expert compiler chooses the optimal backtracking strategy over the basic library searches. We have demonstrated the efficacy of this approach on a small prototype for a subset of the language that is useful for specifying protein motifs. We now propose to embark on the construction of a complete system.
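The approximate keyword searches such a library would contain can be illustrated with a minimal sketch of Sellers' classical dynamic-programming algorithm, which reports every text position where some substring ends within `k` edit operations (substitutions, insertions, deletions) of the pattern. This O(mn) baseline is chosen only for illustration; it is not the proposal's library or its expert compiler, and the function name `approx_matches` is hypothetical.

```python
def approx_matches(pattern: str, text: str, k: int) -> list[int]:
    """Return 1-based end positions in text where pattern matches with at most k edits.

    Column-wise dynamic programming (Sellers): col[i] holds the minimum edit
    distance between pattern[:i] and any substring of text ending at the
    current position. Hypothetical helper for illustration only.
    """
    m = len(pattern)
    col = list(range(m + 1))  # column for the empty text prefix
    ends = []
    for j, c in enumerate(text, start=1):
        diag = col[0]  # D[i-1][j-1] for i = 1
        col[0] = 0     # a match may begin anywhere in the text
        for i in range(1, m + 1):
            cur = min(
                col[i] + 1,                      # D[i][j-1] + 1: extra text character
                col[i - 1] + 1,                  # D[i-1][j] + 1: pattern character skipped
                diag + (pattern[i - 1] != c),    # D[i-1][j-1] + substitution cost
            )
            diag = col[i]  # old D[i][j-1] becomes the diagonal for row i + 1
            col[i] = cur
        if col[m] <= k:
            ends.append(j)
    return ends


print(approx_matches("GATT", "CAGATTACA", 0))
print(approx_matches("GATT", "CAGATTACA", 1))
```

For the pattern `GATT` in `CAGATTACA`, an exact search (`k = 0`) reports end position 6, and allowing one difference also reports positions 5 and 7.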