We will design algorithms for searching and analyzing biosequence data. The algorithms will be subjected to both theoretical analysis and empirical testing. Those that perform best in practice will be implemented as portable software and distributed free of charge. Our research will focus on two areas: analysis of restriction maps and efficient searching of biosequence libraries.

Our work on restriction maps is aimed at developing software to perform tasks currently done by hand. Such software will be useful during several phases of genetic-sequence analysis. Initially, it will aid assembly of a complete restriction map from partial maps. For mapped but unsequenced DNA, these tools can locate conserved or repeated regions and support tasks such as constructing phylogenetic trees. Eventual determination of the DNA sequence will be facilitated by the software's ability to position a given clone or sequence fragment on a mapped genome. Moreover, map-analysis programs are potentially useful whenever restriction fragment length polymorphisms are analyzed, such as for diagnosis of genetic disorders or for paternity testing. Effective computer analysis of restriction maps will require research to extend classic sequence-comparison techniques. These advances, in turn, will lead to tools for handling other types of data that interest biologists, such as gel readings and chromosome banding patterns. During the next three years, our map-analysis software will be improved to model all characteristics of restriction data more accurately and to execute more efficiently. In addition, we will develop new programs that assemble a complete map from partial data and that locate regions common to several restriction maps. The potential use of parallel computers will be evaluated.

Our second research project is to develop efficient software for searching biosequence libraries.
Libraries of DNA and protein sequences are growing at an ever-increasing rate, and huge amounts of money will soon be spent to gather even more sequence data. Improved methods for searching and analyzing sequence data must be developed before this tremendous investment can pay dividends. Specifically, we propose to extend several recent improvements in the sensitivity or efficiency of library-search routines to a wider class of biosequence patterns. For one class of patterns, namely regular expressions, we will develop methods that improve efficiency and increase analytical capability. Finally, we will investigate computer tools that assist biologists in analyzing sequences to discern patterns in the data.