With the completion and public availability of the human genome sequence, it is now possible to perform large-scale, comprehensive genome analyses that were not feasible even a few years ago. As the sequence has progressed from a working draft to a finished state, many groups have developed tools to annotate it, making it even more useful to the scientific community. My research focuses on developing methodologies to integrate, in an automated manner, these diverse sequence and annotation data with experimentally generated data so that bench biologists can quickly and easily obtain results for their own large-scale, genome-wide experiments.

The goal of one of my research projects is to take advantage of the publicly available sequence and annotations to develop automated tools for the computational characterization of experimentally identified genomic sequences. We align each sequence to the reference human genome assembly to determine its genomic location, and then compare the coordinates of the sequence with the coordinates of a variety of genome annotations. Using this approach, we can assign putative functions to the experimentally identified sequences based on their proximity to known sequence features. To provide statistical rigor for the analysis, we have developed a pipeline that characterizes sequences picked at random from the genome in the same way, giving a background against which the experimental results can be compared.

We are applying this method to two types of research projects. Although fundamentally different on a biological level, they are identical from a computational perspective: both involve determining the chromosomal location of a genomic sequence fragment and then analyzing the genomic context of the region. Dr. Gregory Crawford, a postdoctoral fellow in Dr. Francis Collins' lab, is developing an experimental strategy to identify regulatory regions in the human genome, based on the identification of DNase I hypersensitive (DNase HS) sites. He is pioneering a new technique, DNase-chip, to identify DNase HS sites using tiled microarrays. Our analysis of DNase HS sites found in a representative 1% of the human genome showed that the locations of these sites correlate well with other annotated regions of the genome known to mark gene regulatory elements, such as 5' ends of genes, CpG islands, and highly conserved sequences.

We have applied similar techniques in collaborations with NIH researchers to determine whether retroviruses and retroviral vectors integrate randomly into the host genome during retroviral gene therapy. With Dr. Fabio Candotti's lab at NHGRI, we have determined the integration sites in a patient treated in a retroviral gene therapy trial. We are in the process of determining whether any of these integrations could disrupt gene function and thereby affect the patient's health, as well as whether the pattern of integration sites changes in the years following gene therapy. We are also collaborating on similar projects with Dr. John Tisdale of NIDDK and Dr. Cynthia Dunbar of NHLBI. Both labs are pursuing retroviral gene therapy in rhesus macaques (Macaca mulatta), with the eventual goal of improving techniques for retroviral gene therapy in humans.

The completion of the human and other genome sequencing projects also makes it possible to perform comprehensive analyses of gene structure. With Dr. Lawrence Brody of NHGRI, we are exploring the role of exon size in protein evolution. We have expanded our analysis to include exons from eight organisms with complete or near-complete genome sequence: human, mouse, chicken, zebrafish, nematode, fruit fly, rice, and thale cress.
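
As an illustration of the coordinate comparison described earlier for experimentally identified sequences, the following Python fragment is a minimal sketch only and is not the pipeline itself. It assumes that alignment has already produced a (chromosome, start, end) location for each sequence, that coordinates are zero-based and half-open, and that random control sequences are drawn uniformly across the genome; the summary states only that they are picked at random. All function names, the window size, and the toy coordinates are hypothetical.

```python
import random
from collections import defaultdict

def build_index(features):
    """Group (chrom, start, end, label) annotations by chromosome."""
    index = defaultdict(list)
    for chrom, start, end, label in features:
        index[chrom].append((start, end, label))
    return index

def features_near(index, chrom, start, end, window=0):
    """Labels of annotations overlapping [start - window, end + window).

    A linear scan per chromosome keeps the sketch short; a real analysis
    would more likely use a sorted index or an interval tree.
    """
    lo, hi = start - window, end + window
    return [label for f_start, f_end, label in index.get(chrom, [])
            if f_start < hi and f_end > lo]

def random_sites(chrom_sizes, n, length, rng):
    """Draw n intervals of a fixed length uniformly over the genome.

    Assumes every chromosome is longer than `length`.
    """
    chroms = list(chrom_sizes)
    # Weight each chromosome by its number of valid start positions.
    weights = [chrom_sizes[c] - length + 1 for c in chroms]
    sites = []
    for _ in range(n):
        chrom = rng.choices(chroms, weights=weights)[0]
        start = rng.randrange(chrom_sizes[chrom] - length + 1)
        sites.append((chrom, start, start + length))
    return sites

def fraction_near(index, sites, window):
    """Fraction of sites with at least one annotation within the window."""
    if not sites:
        return 0.0
    hits = sum(1 for chrom, start, end in sites
               if features_near(index, chrom, start, end, window))
    return hits / len(sites)

# Toy data, invented purely for illustration.
chrom_sizes = {"chr1": 10000, "chr2": 2000}
features = [
    ("chr1", 1000, 1200, "CpG island"),
    ("chr1", 5000, 5001, "5' end of gene"),
    ("chr2", 300, 400, "conserved sequence"),
]
index = build_index(features)

# Hypothetical aligned locations of experimentally identified sequences.
observed = [("chr1", 1150, 1300), ("chr1", 4900, 4950), ("chr2", 900, 950)]

observed_fraction = fraction_near(index, observed, window=100)
control = random_sites(chrom_sizes, n=1000, length=50, rng=random.Random(0))
control_fraction = fraction_near(index, control, window=100)
```

Comparing `observed_fraction` with `control_fraction` gives a rough sense of whether the experimental sequences lie near annotated features more often than randomly placed sequences of the same length do. A formal test of significance would need many more control draws and a choice of statistic, neither of which is specified here.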