Abstract
This proposal is largely motivated by our involvement with the Botswana Combination Prevention Project (BCPP), an ongoing large-scale cluster-randomized human immunodeficiency virus (HIV) prevention trial conducted in 30 communities across Botswana. As in most HIV prevention studies, incomplete data on HIV status and nonresponse to queries about sexual behavior are important challenges the study currently faces, with data likely missing not at random and in complex patterns across individuals. Recognizing that existing statistical methods for missing data are largely ill-suited to fully address this important problem in HIV research, we propose to develop the next generation of missing data methods, going well beyond the current theory of identification and inference. Specifically, we propose (1) to develop a unified theory of identification, bringing together recent developments in the theory of identification based on causal graphs with recent identification results from the statistics literature.
This will allow us to establish conditions under which, in complex missing data settings such as the BCPP, one can untangle features of the underlying population that may be of scientific interest from features of the nonresponse process that are not necessarily of scientific interest; (2) to build on (1) to develop corresponding inverse-probability-weighted and doubly robust methods for statistical inference in the BCPP, where data are likely to be missing not at random and in complex patterns; (3) to develop novel semiparametric imputation methods that rely solely on assumptions encoded in the nonresponse process, thus allowing the complete data distribution in the BCPP to remain unscathed by the imputation process; and (4) to develop user-friendly software to facilitate widespread use of the methods developed in Aims 1-3, and to apply these methods and demonstrate their good performance in extensive simulation studies as well as in answering scientific queries of primary interest in the BCPP.