Introduction: The low fidelity of the HIV reverse transcriptase enzyme leads to significant sequence diversity across infected individuals. Although this diversity complicates antiretroviral therapy, it also allows researchers to characterize relationships between sampled viruses using phylogenetic tools. Once the phylogenetic relationships among viruses sampled from infected individuals are characterized, socio-demographic data from those individuals can be overlaid onto the phylogeny. This yields an inferred reconstruction of social networks that may improve understanding of the socio-demographic patterns and drivers of the sampled epidemic. Previous work has characterized some of the risk factors for HIV infection in both San Diego, CA, and Tijuana, Mexico. We propose to combine objective sequence data with clinical and socio-demographic data to better understand how the HIV-1 sub-epidemics along the San Diego-Tijuana border are related.

Methods: In this proposal, sequence, clinical, and socio-demographic data gathered from HIV-positive individuals enrolled in seven collaborating cohorts along the San Diego-Tijuana border will be combined for analysis. HIV pol sequence data will be obtained from the collaborating cohorts or generated from blood samples they provide. Socio-demographic and clinical data will be abstracted from the cohorts' databases in de-identified form, except for location of residence. Maximum-likelihood methods will then be used to determine the phylogenetic structure (i.e., clustering) of the sampled sequences, and associations between this clustering and socio-demographic variables will be assessed, including location of residence, HIV risk factors, drug resistance, duration of infection, cross-border movement, and others.
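The clustering-then-association step can be illustrated with a small sketch. The proposal specifies maximum-likelihood phylogenetics; this example does not build a tree. It assumes pairwise genetic distances between sequences are already available (for example, patristic distances taken from an ML tree), groups sequences by single-linkage clustering at a distance threshold, and then applies a one-sided Fisher exact test to ask whether a binary attribute is concentrated in a cluster. The function names, the 0.015 threshold, the distances, and the attribute are all hypothetical and chosen only for illustration.

```python
from itertools import combinations
from math import comb


def threshold_clusters(ids, dist, threshold):
    """Single-linkage clustering: any pair with distance <= threshold is linked.

    ids: sequence identifiers.
    dist: dict mapping frozenset({a, b}) to a pairwise genetic distance
          (hypothetical input, e.g. patristic distances from an ML tree).
    Returns a dict mapping each id to a cluster label (a representative id).
    """
    parent = {i: i for i in ids}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(ids, 2):
        if dist[frozenset((a, b))] <= threshold:
            parent[find(a)] = find(b)
    return {i: find(i) for i in ids}


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher exact test on the 2x2 table [[a, b], [c, d]].

    Returns P(X >= a), where X is hypergeometric with the table's margins fixed.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    top = min(row1, col1)
    tail = sum(comb(row1, k) * comb(n - row1, col1 - k) for k in range(a, top + 1))
    return tail / comb(n, col1)


# Toy example with made-up distances and a made-up binary attribute.
ids = ["A", "B", "C", "D", "E", "F"]
close_pairs = {("A", "B"): 0.010, ("B", "C"): 0.012, ("D", "E"): 0.014}
dist = {frozenset(p): close_pairs.get(p, 0.050) for p in combinations(ids, 2)}
labels = threshold_clusters(ids, dist, threshold=0.015)

# Is the attribute concentrated in the cluster that contains "A"?
attribute = {"A": True, "B": True, "C": True, "D": False, "E": False, "F": False}
in_cluster = {i for i in ids if labels[i] == labels["A"]}
a = sum(attribute[i] for i in in_cluster)
b = len(in_cluster) - a
c = sum(attribute[i] for i in ids if i not in in_cluster)
d = len(ids) - len(in_cluster) - c
p_value = fisher_exact_greater(a, b, c, d)
```

In a real analysis, many clusters and several covariates would typically be handled with a regression model and multiple-testing control rather than one 2x2 test; the sketch only shows the basic logic of linking cluster membership to an individual-level variable.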
We will next determine the spatial and temporal dynamics of the HIV epidemics across the border region using geographic information systems and Bayesian phylogeographic analyses based on coalescent theory. These analyses can incorporate temporal and geographic data into the priors used to estimate phylogenetic structure, so that the final results may identify temporal and spatial transmission 'hot spots'. To address privacy concerns, all geographic data will be smoothed before presentation so that individuals cannot be identified. We will also work closely with our bioinformatics colleagues to understand the limitations of our convenience sample and to develop statistical techniques that make our findings generalizable.

Conclusions: We anticipate that our findings will improve understanding of HIV transmission dynamics in this region so that prevention strategies can be designed and targeted more effectively. This research project has two major aims: 1) to identify risk factors related to HIV transmission, and 2) to identify temporal and spatial 'hot spots' of transmission in the San Diego-Tijuana border region. These results will help public health agencies in Mexico and the US develop appropriate, targeted intervention strategies more efficiently to curb ongoing HIV transmission.
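The plan to smooth geographic data before presentation could be carried out in several ways, and the proposal does not say which will be used. One common option, sketched here only as an assumption, is to snap each residence location to a coarse grid and suppress any cell containing fewer than a minimum number of individuals. The function name, the 0.1-degree cell size, the minimum count of 2, and the coordinates are hypothetical.

```python
import math
from collections import Counter


def aggregate_to_grid(points, cell_deg, min_count):
    """Snap (lat, lon) points to a coarse grid and suppress sparse cells.

    points: iterable of (lat, lon) pairs in decimal degrees.
    cell_deg: grid cell size in degrees (hypothetical choice).
    min_count: cells with fewer individuals than this are dropped (hypothetical).
    Returns a dict mapping each retained cell's center (lat, lon) to its count.
    """
    counts = Counter(
        (math.floor(lat / cell_deg), math.floor(lon / cell_deg)) for lat, lon in points
    )
    return {
        ((i + 0.5) * cell_deg, (j + 0.5) * cell_deg): n
        for (i, j), n in counts.items()
        if n >= min_count
    }


# Made-up coordinates: three points in one cell, two in another, one isolated.
points = [
    (32.71, -117.16), (32.72, -117.15), (32.74, -117.13),
    (32.53, -117.03), (32.52, -117.02),
    (32.95, -116.95),
]
grid = aggregate_to_grid(points, cell_deg=0.1, min_count=2)
```

Only cell centers and counts are reported, so exact residences never appear, and the isolated point is suppressed because its cell falls below the minimum count. Kernel density smoothing or random displacement are alternatives with different trade-offs between privacy and spatial resolution.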