PROJECT SUMMARY We do not have a broadly efficacious vaccine against HIV, a virus that causes approximately 2 million new infections each year. Current proof-of-concept studies using broadly neutralizing antibodies (bnAbs) against HIV aim to understand how prevention varies with genotypic characteristics of the virus. Because an exhaustive search over all genotypic characteristics has low statistical power to detect effects after adjusting for multiple comparisons, researchers typically pre-specify a small number of features to focus on. There is growing interest in using machine learning-based methods both to corroborate prior understanding and to suggest new genotypic characteristics that are important in predicting the sensitivity of HIV to bnAbs. While machine learning-based methods have the potential to yield valid predictive models, issues remain in using these methods to estimate importance. The proposed research will address three such issues: developing a model-free variable importance measure, incorporating information from complex sampling designs, and providing valid statistical inference both when a genotypic feature is truly important and when it is not. First, the main classical tool for evaluating the importance of characteristics is the ANOVA decomposition, which makes strong modeling assumptions. Machine learning-based methods make minimal assumptions; however, these methods do not generally admit valid statistical inference, and their importance estimates are intimately tied to the technique employed. We will use ideas from the theory of semiparametric estimation and inference to develop a model-free measure of variable importance with valid confidence intervals for the true importance. Second, many HIV vaccine trials incorporate a nested case-control study, in which additional information is measured on a subset of the trial participants.
Estimating importance using only the subset ignores information from the remaining participants, resulting in a loss of efficiency and potentially introducing bias into the variable importance estimates. The proposed research will develop methods that properly account for the sampling design. Finally, to determine whether a set of features can be excluded from further analyses, we need a procedure for testing whether the feature set truly has no importance. Hypothesis testing using machine learning-based methods is challenging, but we will build on recent advances in semiparametric inference to develop valid procedures for hypothesis testing in the context of variable importance. By combining advances in machine learning technology with ideas from semiparametric estimation and inference, we will determine which feature sets are important in predicting the sensitivity of HIV to bnAbs. In addition to yielding a deeper understanding of HIV neutralization, this information will allow researchers to make the best possible use of data from current clinical trials. This, in turn, could lead to either a shorter time to an HIV vaccine or new bnAbs in the research pipeline that are more broadly efficacious or potent. Any of these outcomes will transform preventative care for patients at risk of HIV infection.