The recent discovery that copy number variations (CNVs), the loss or gain of small genomic segments, are common in all normal individuals and play a major role in human phenotypic variation and disease has had a profound impact on our understanding of human genetic variation. However, our ability to predict which CNVs have biological or health significance is severely limited; the acquisition of more comprehensive and accurate CNV data from multiple normal and patient populations is an urgent research and public health priority. Powerful technologies for high-resolution CNV assessment (termed "molecular karyotypes" or cytogenomic arrays) are now available and have moved into clinical diagnostic use to evaluate children with unexplained intellectual disabilities, autism, or multiple birth defects. Multicenter trials are also underway to determine the efficacy of these technologies for prenatal diagnosis. It is very likely that cytogenomic arrays will become the method of choice for both pediatric and prenatal cytogenetic analysis within the next few years. The large number of cytogenomic array tests now being performed by clinical cytogenetics laboratories presents an unusual and timely opportunity to capture large datasets from patient populations and to improve our understanding of the consequences of CNVs in clinical populations compared with normal populations. The overall goal of this project is to leverage the large dataset generated in the course of clinical care to create a research resource for gene discovery related to human developmental disorders, as well as to build an invaluable clinical resource for learning about the clinical and public health impact of CNVs. This project will encompass four specific aims: 1) Collect very large standardized datasets from clinical array testing in pediatric and prenatal populations. 
Using a novel "opt-out" consent mechanism in addition to a traditional full informed consent model, we will develop methods for a large consortium of clinical sites and clinical genetics testing laboratories to collect and submit CNV and clinical data to a central, public data repository. 2) Standardize array design and genotype (CNV) data formats for clinical laboratories. In partnership with the International Standard Cytogenomic Array (ISCA) Consortium, we are developing standards for array design, resolution, and format, as well as guidelines for interpretation of benign versus pathogenic CNVs. 3) Develop standardized clinical (phenotype) data. A Phenotype Workgroup will develop standard vocabularies and data dictionaries for phenotypic information using current international recommendations. 4) Develop the data repository, curation methods, and visualization tools. A Database Workgroup will oversee development of software bridges and adaptors to automate data de-identification, reformatting, and transfer to a central public repository (dbGaP, NCBI). Methods for automated and expert data curation will be developed prior to public release of the data to the research community and clinicians. User-friendly tools for data visualization and analysis will be developed in partnership with academic groups and commercial vendors. PUBLIC HEALTH RELEVANCE: The loss or gain of small regions in our genome, known as copy number variations (CNVs), has recently been recognized as a major cause of both common and rare human diseases, with enormous potential public health impact. Our current understanding of CNVs is limited: some are truly benign, some are thought to be benign but may in the future be shown to be associated with common diseases, and others are associated with disease and can provide important information for patients and families. 
To better understand CNVs, we propose a unique strategy to obtain, at minimal cost, high-quality, standardized CNV data linked to clinical information from very large populations (hundreds of thousands) during the course of routine clinical care in prenatal and pediatric settings.