Better data about the risk behaviors and disease prevalence within high-risk groups are needed for understanding and controlling the spread of HIV/AIDS. Unfortunately, this information is difficult to collect with standard sampling methods. The goal of this research is to improve respondent-driven sampling, a promising new statistical method for collecting such information. Respondent-driven sampling (RDS) is a form of snowball sampling that allows researchers to study "hidden" or "hard-to-reach" populations that are difficult to study with standard sampling methods (e.g., men who have sex with men, injection drug users, and sex workers). RDS data are collected through a peer-referral process in which current sample members recruit future sample members. This process results in a sample that, while not directly representative of the hidden population, can yield unbiased estimates of, for example, HIV prevalence, if certain conditions are met. Because of the pressing need to understand the hidden populations at high risk for HIV/AIDS and the limitations of previous methods for collecting this information, RDS has already been used in more than 120 studies around the world, including the Centers for Disease Control and Prevention's (CDC) National HIV Behavioral Surveillance System. Despite this widespread adoption, improvements to RDS are urgently needed because the statistical foundations of the method are still poorly understood and key implementation questions remain unanswered. To improve RDS, we propose to: 1) develop guidelines for RDS sample size calculation to ensure that studies have the desired level of statistical power; 2) develop multivariate analysis procedures for RDS data; and 3) develop diagnostics to assess whether the assumptions behind RDS have been met. This research will achieve these specific aims through a combination of mathematical modeling, computer simulation, and the analysis of existing RDS data sets.
Once complete, this research will help to establish statistical best practices for collecting and analyzing RDS data. Improvements to RDS will result in more accurate information about hidden populations, which will facilitate research in the social sciences and public health.

PUBLIC HEALTH RELEVANCE: The goal of this research is to improve respondent-driven sampling, a statistical method for studying "hidden" or "hard-to-reach" populations, including groups at high risk for HIV/AIDS (e.g., men who have sex with men, injection drug users, and sex workers). Improved information about risk behaviors and disease prevalence within these groups can be used to design and evaluate prevention programs, target resources where they are most needed, and ultimately help stop the spread of disease.