The genetics and genomics communities are advancing rapidly in the next-generation sequencing (NGS) era. The identification of both common and rare genetic variants from large-cohort and Mendelian studies provides new opportunities to elucidate disease etiologies and their underlying molecular mechanisms. This will ultimately lead to novel, personalized diagnostics, prognostics, and therapeutic treatments. However, significant analytical challenges remain: (1) the discovery and haplotype phasing of rare variants remain difficult; (2) data analysis is fragmented when multiple datasets [SNP arrays, whole-exome sequencing (WES), and/or low-coverage whole-genome sequencing (WGS)] are available; and (3) bioinformatics methods and software are difficult for non-specialist users, because there is no unified bioinformatics framework and many different tool sets are needed for an end-to-end analysis. Advanced computational and statistical methods and user-friendly software are urgently needed to meet the demands of the community. The overall goal of this application is to develop a novel, integrative analytical framework that significantly improves the sensitivity and accuracy of rare variant discovery and haplotype phasing and harmonizes multiple datasets in genomics studies. To this end, the following specific aims will be pursued: (1) develop a framework for improving rare variant discovery and haplotype phasing using read information; (2) develop a framework for integrating multiple genetic variation datasets; (3) validate genotyping and phasing of rare variants, using simulated and experimental data, for pipeline optimization and cross-evaluation of different methods; and (4) develop software packages with cloud deployment for the community. The approaches are innovative because they utilize novel concepts and methods to improve the accuracy of genotype calling and haplotype phasing from NGS data and to integrate multiple types of genotype data.
Successful accomplishment of our proposed aims will dramatically improve the sensitivity and accuracy of rare variant discovery and phasing, expediting the understanding of the genetic architecture of human diseases.