The Human Genome Project (HGP) promises to deliver a complete human genome sequence by 2005. The applicant estimates that about half of the current sequencing effort is devoted to manual sequence editing and to directed sequencing of "gap regions." Gap regions arise from a variety of sequencing artifacts, including mispriming of directed oligonucleotide primers, band compressions in gel electrophoresis, and the formation of stable structures that inhibit DNA polymerase. DNA oligonucleotide hybridization arrays and other technologies being developed for genome sequence verification and gene diagnostics also show a high incidence of artifacts, caused by the formation of mismatched duplexes and of secondary structure in the template. To solve these problems, the thermodynamic rules governing DNA hybridization and secondary structure formation will be determined. The Watson-Crick pairing rules alone are not sufficient to design robust DNA primers or hybridization arrays, because real hybridization reactions involve competition between "matched" and "mismatched" binding sites as well as intramolecular folding of the target and probe DNAs. Hybridization equilibria are also sensitive to solution conditions. In addition, DNA folding alters diffusion properties, resulting in poor separation of fragments (e.g., band compression in electrophoresis). A complete set of thermodynamic parameters for Watson-Crick pairs and all internal single mismatches, together with an empirical Na+ concentration dependence, has already been determined, and these data significantly improve hybridization predictions. To improve the predictions further, thermodynamic parameters will be measured for various commonly occurring loop motifs under a variety of solution conditions. The resulting database will be incorporated into new algorithms for primer design, base calling, and the design of hybridization arrays.
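The abstract does not spell out how such parameters are used. A common approach is the nearest-neighbor model, in which the free energy of a duplex is the sum of the stacking free energies of adjacent base pairs, plus initiation and symmetry terms and a salt correction. The sketch below assumes that model and covers Watson-Crick duplexes only (no mismatches or loops). The ΔG°37 values are the published unified nearest-neighbor values for 1 M NaCl (SantaLucia, 1998) and should be checked against that table before any real use. The function names and the choice N = (length - 1) phosphates for the salt term are illustrative assumptions, not part of the proposal.

```python
import math

# Unified nearest-neighbor ΔG°37 stacks (kcal/mol, 1 M NaCl), keyed by the top-strand
# 5'->3' dinucleotide. Verify these against the published table before use.
NN_DG37 = {
    "AA": -1.00, "AT": -0.88, "TA": -0.58, "CA": -1.45, "GT": -1.44,
    "CT": -1.28, "GA": -1.30, "CG": -2.17, "GC": -2.24, "GG": -1.84,
}
INIT_TERMINAL_GC = 0.98   # initiation penalty per terminal G·C pair
INIT_TERMINAL_AT = 1.03   # initiation penalty per terminal A·T pair
SYMMETRY_CORRECTION = 0.43  # applied only to self-complementary duplexes

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def stack_dg(dinucleotide: str) -> float:
    # A stack read on the top strand is the same stack as its reverse complement
    # read on the bottom strand, so 10 table entries cover all 16 dinucleotides.
    if dinucleotide in NN_DG37:
        return NN_DG37[dinucleotide]
    return NN_DG37[reverse_complement(dinucleotide)]


def duplex_dg37(seq: str, na_molar: float = 1.0) -> float:
    """Hypothetical helper: ΔG°37 (kcal/mol) of a perfectly matched duplex."""
    seq = seq.upper()
    if len(seq) < 2:
        raise ValueError("need at least two bases to form a stack")
    dg = sum(stack_dg(seq[i:i + 2]) for i in range(len(seq) - 1))
    for end in (seq[0], seq[-1]):
        dg += INIT_TERMINAL_GC if end in "GC" else INIT_TERMINAL_AT
    if seq == reverse_complement(seq):
        dg += SYMMETRY_CORRECTION
    # Empirical Na+ correction; assumes N = len - 1 (phosphates per strand, no 5' phosphate).
    n_phosphates = len(seq) - 1
    dg -= 0.114 * n_phosphates * math.log(na_molar)
    return dg


print(round(duplex_dg37("CGTTGA"), 2))  # -5.35
```

Lowering the Na+ concentration makes ΔG°37 less negative, that is, the duplex is less stable. This is the solution-condition sensitivity the abstract refers to. Extending the sketch to mismatches or loop motifs would require additional parameter tables of the kind the proposal sets out to measure.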
Novel modified oligonucleotides, called "structure breakers," are proposed to disrupt stable DNA secondary structures that inhibit the processivity of DNA polymerases, thereby facilitating the sequencing of difficult genome gap regions. The investigator predicts that the proposed studies will increase the throughput of genome sequencing centers and reduce the cost of sequencing, and that the resulting information will find wide application in biotechnology and bioinformatics.