With the completion and public availability of the human genome sequence, it is now possible to perform large-scale, comprehensive genome analyses that were not possible even a few years ago. As the sequence has progressed from a working draft to a finished state, many groups have developed tools to annotate this sequence, thereby making it even more useful to the scientific community. My research focuses on developing methodologies to integrate, in an automated manner, these diverse sequence and annotation data with experimentally generated data so that bench biologists can quickly and easily obtain results for their own large-scale, genome-wide experiments.

The goal of one of my research projects is to take advantage of the publicly available set of sequence and annotations to develop automated tools for the computational characterization of experimentally identified genomic sequences. We align each sequence to the reference human genome assembly to determine its genomic location, and then compare the coordinates of this sequence to the coordinates of a variety of genome annotations. Using this approach, we can assign putative functions to the experimentally identified sequences based on their proximity to known sequence features. To provide statistical rigor for the analysis, we have developed a pipeline that characterizes sequences picked at random from the genome in the same way, giving a baseline against which the experimental sequences can be compared. We are applying this method to two types of research projects, which, although fundamentally different on a biological level, are identical from a computational perspective: both involve determining the chromosomal location of a genomic sequence fragment and then analyzing the genomic context of the region.

Dr. Gregory Crawford, a postdoctoral fellow in Dr. Francis Collins' lab, is developing an experimental strategy to identify regulatory regions in the human genome. To achieve this goal, he clones and sequences DNase I hypersensitive (DNase HS) sites.
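The coordinate-comparison step described above, together with its random-sequence control, might be sketched roughly as follows. This is a minimal illustration rather than the actual pipeline: the function names, the choice of a single point feature (such as a transcription start site), the window size, and the toy coordinates are all assumptions made for the example, and the alignment step that produces the site coordinates is taken as already done.

```python
import bisect
import random
from collections import defaultdict


def build_index(features):
    """Group (chrom, pos) feature coordinates by chromosome, sorted for binary search."""
    index = defaultdict(list)
    for chrom, pos in features:
        index[chrom].append(pos)
    for positions in index.values():
        positions.sort()
    return index


def distance_to_nearest(index, chrom, pos):
    """Distance in bp from pos to the closest feature on chrom, or None if there is none."""
    positions = index.get(chrom)
    if not positions:
        return None
    i = bisect.bisect_left(positions, pos)
    # Only the features immediately left and right of pos can be the nearest.
    candidates = positions[max(i - 1, 0):i + 1]
    return min(abs(pos - p) for p in candidates)


def fraction_near(index, sites, window):
    """Fraction of sites that lie within `window` bp of any indexed feature."""
    if not sites:
        return 0.0
    hits = 0
    for chrom, pos in sites:
        d = distance_to_nearest(index, chrom, pos)
        if d is not None and d <= window:
            hits += 1
    return hits / len(sites)


def random_sites(chrom_lengths, n, rng):
    """Draw n control positions uniformly over the genome (chromosomes weighted by length)."""
    chroms = list(chrom_lengths)
    weights = [chrom_lengths[c] for c in chroms]
    picks = rng.choices(chroms, weights=weights, k=n)
    return [(c, rng.randrange(1, chrom_lengths[c] + 1)) for c in picks]


if __name__ == "__main__":
    # Toy, made-up coordinates purely for illustration (hypothetical data).
    tss = build_index([("chr1", 1_000), ("chr1", 50_000), ("chr2", 20_000)])
    observed = [("chr1", 1_200), ("chr1", 49_500), ("chr2", 90_000)]
    control = random_sites({"chr1": 100_000, "chr2": 100_000}, 1_000, random.Random(0))
    print("observed fraction near a feature:", fraction_near(tss, observed, 2_000))
    print("random-control fraction:", fraction_near(tss, control, 2_000))
```

Comparing the observed fraction with the fraction obtained from randomly placed sites is what indicates whether proximity to a feature is more frequent than chance would predict. A realistic analysis would repeat the random draw many times and would also handle interval features such as CpG islands or conserved regions, which this sketch does not attempt.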
Our analysis of 5600 hypersensitive sites from quiescent primary human CD4+ T cells suggests that the sites occur frequently in regions thought to be involved in gene regulation, including regions upstream of genes, CpG islands, and regions of human/mouse conservation. We are now adapting our tools to analyze ~230,000 sites from CD4+ T cells.

We have applied similar techniques during collaborations with NIH researchers to determine whether retroviruses and retroviral vectors integrate randomly into the host genome during the process of retroviral gene therapy. With Dr. Cynthia Dunbar's group at NHLBI, we have determined the integration sites of murine leukemia virus (MLV) and simian immunodeficiency virus (SIV) in primate hematopoietic stem and progenitor cells. Our finding that MLV integrants are located predominantly around transcription start sites, while SIV integrants strongly favor transcription units and gene-dense regions of the genome, suggests distinct safety implications for MLV and SIV vectors. We also collaborate with Jaya Jagadeesh in Dr. Fabio Candotti's lab at NHGRI, who is cloning and sequencing retroviral integration sites in a patient treated in a retroviral gene therapy trial. We are in the process of determining whether any of these integrations could disrupt gene function and thereby affect the patient's health, as well as whether the pattern of integration sites changes in the years following gene therapy. Finally, trainees in Dr. Jennifer Puck's lab at NHGRI have transduced CD34 cells with an X-linked severe combined immunodeficiency (XSCID) gene therapy retroviral vector. We are carrying out a computational characterization of the integration sites in different sources of CD34 cells.

The completion of human genome sequencing also makes it possible to perform comprehensive analyses on small-scale projects. Previously, I discovered a novel gene family termed ADAM, for membrane proteins containing A Disintegrin and Metalloprotease domain.
A total of 39 members of the ADAM family have been identified to date, and they are involved in many events including fertilization, neurogenesis and myogenesis, as well as in the process of ectodomain shedding. I have carried out a comprehensive search for ADAM and ADAM-like genes in the completely sequenced genomes of human and mouse, and plan to extend this analysis to other organisms as their genomes near completion. This work will allow a more thorough understanding of the complex roles that the ADAM proteins play in these different organisms, as well as the evolutionary events that gave rise to this large gene family.