I will develop improved methods for analyzing several types of genome mapping and sequencing data, incorporate them into my existing software packages CRIMAP, SEGMAP and GENEFINDER, and distribute them to the genome analysis community.
1. I have begun work on a unified approach to analyzing data from a variety of mapping techniques and integrating them into a single physical map. These techniques include STS-content mapping of YAC contigs, linkage mapping, radiation hybrid mapping, in situ hybridization, and pulsed-field gel restriction mapping. This approach, called genomic segment analysis, is based on the observation that all of these methods can be viewed as providing information about relationships between genomic segments of various types. A statistical approach to analyzing genomic segment data, inspired by linkage analysis methods, has been implemented in the program CRIMAP, and a combinatorial (deterministic) approach, inspired by methods for constructing STS-content maps of YAC contigs, has been implemented in the program SEGMAP. I will develop these methods further, with primary emphasis on efficient joint analysis of large amounts of data of different types, on characterizing and representing ambiguities in map order and distance, and on detecting data errors. I will also carry out simulation studies to assess the accuracy of maps constructed with these approaches, examining in particular the effects of data errors.
2. I will improve CRIMAP's ability to perform multilocus linkage analysis with disease loci by extending its current efficient methods for computing and maximizing likelihoods to handle incomplete pedigree information and more general disease locus models.
3. The program GENEFINDER uses a systematic statistical approach to identify and display probable exons in C. elegans genomic sequence. I will extend it to analyze other genomes, including the human genome.
Other planned improvements include the automated construction of candidate genes from their component exons, automatic identification of regions likely to contain sequencing errors, and extension of the display capabilities to other types of genomic features, such as repeats, promoter sequences, and protein motifs. In addition, I will systematically compare the power of this approach with that of recent "neural net" approaches to gene identification.
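The abstract does not describe SEGMAP's algorithm in detail. As a hypothetical illustration of the combinatorial idea behind STS-content mapping of YAC contigs (item 1), the Python sketch below assumes that each YAC is a single contiguous genomic segment, so that under a correct order and error-free data every clone's STS hits fall at consecutive positions. It scores a candidate order by counting breaks in each clone's run of hits and finds all minimum-break orders by exhaustive search. The names `count_gaps` and `best_orders` and the input format are illustrative assumptions, not part of SEGMAP, and exhaustive search is only feasible for a handful of markers.

```python
from itertools import permutations
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical input format: each YAC clone maps to the set of STSs it was scored positive for.
Content = Dict[str, Set[str]]


def count_gaps(order: List[str], content: Content) -> Dict[str, int]:
    """Return, for each clone, the number of breaks in its run of STS hits along `order`.

    If a clone is a single contiguous genomic segment and the order is correct,
    its hits occupy consecutive positions and it scores 0.
    """
    position = {sts: i for i, sts in enumerate(order)}
    gaps: Dict[str, int] = {}
    for clone, hits in content.items():
        idx = sorted(position[s] for s in hits if s in position)
        # Each jump of more than one position between successive hits is a break.
        gaps[clone] = sum(1 for a, b in zip(idx, idx[1:]) if b - a > 1)
    return gaps


def best_orders(stss: Set[str], content: Content) -> Tuple[List[Tuple[str, ...]], int]:
    """Exhaustively find all STS orders with the fewest total breaks.

    Only practical for a few STSs (n! orders); tied orders expose order ambiguity.
    """
    best: List[Tuple[str, ...]] = []
    best_score: Optional[int] = None
    for perm in permutations(sorted(stss)):
        score = sum(count_gaps(list(perm), content).values())
        if best_score is None or score < best_score:
            best, best_score = [perm], score
        elif score == best_score:
            best.append(perm)
    return best, (best_score if best_score is not None else 0)


if __name__ == "__main__":
    data = {"y1": {"A", "B"}, "y2": {"B", "C"}, "y3": {"C", "D"}}
    print(best_orders({"A", "B", "C", "D"}, data))
    # A clone that also hits both A and D (e.g. a chimeric YAC) cannot be made contiguous.
    data["y4"] = {"A", "D"}
    print(best_orders({"A", "B", "C", "D"}, data))
```

In this example, the first data set admits exactly two best orders (A-B-C-D and its reverse), each with no breaks. Adding clone `y4` leaves eight tied orders with one unavoidable break each. Ties of this kind are one simple way to represent ambiguity in map order, and a clone that cannot be made contiguous under any best order is a candidate data error (for example, a chimeric clone or a false-positive hit).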