The objective of this project is to develop software for analyzing data from large-scale genotyping and sequencing studies, building on the existing software package PLINK. PLINK is a software tool for manipulating and analyzing whole-genome SNP datasets; it has been actively developed over the past four years and has a broad user base. The specific aims are to significantly upgrade its core capacities, interface, auxiliary resources and user support:

Core capacities: adapt and upgrade data-storage capacities to handle a) datasets an order of magnitude larger than can fit into memory and b) a more generic, unified representation of different types of genetic variation data and meta-information.

Interface: extend the existing interface to provide a) looser coupling between the data-storage and analysis components, via multiple interfaces in external languages, including languages widely used in bioinformatics such as R and Perl, and b) features that facilitate reproducible research and parallel processing.

Auxiliary resources: package standard existing resources, including functional annotation of variants, reference genome sequences, gene assemblies, pathways and ontologies, so that they integrate seamlessly with user data.

Support: create a high-quality collection of user-support resources, including online documentation and tutorials, user-generated wiki pages, e-mail support and an annual training course.

Particular attention will be paid to ensuring interoperability with other major software, file formats and resources generated by the broader genetics community.

PUBLIC HEALTH RELEVANCE: This project will develop software for analyzing large datasets from modern genetic studies. New high-throughput genotyping and sequencing technologies can produce vast amounts of data, but there is a need for analytic tools that biomedical researchers can readily use.
These studies have the potential to uncover genetic determinants of many diseases and traits; such findings can inform risk prediction and reveal novel targets for treatment.