Adeno-associated virus type 2 (AAV2) preferentially integrates its genome into the AAVS1 locus on human chromosome 19. Preferential integration requires the AAV2 Rep68 or Rep78 protein (Rep68/78) and a Rep68/78 binding site (RBS) and nicking site within AAVS1, and may also require an RBS within the virus genome. To help elucidate the mechanism and preferred substrate configurations of preferential integration, we amplified and sequenced junctions between AAV2 DNA and AAVS1 from AAV2-infected HeLaJW cells and from human cell lines with defective Artemis or xeroderma pigmentosum group A (XPA) genes. The integration junction sequences show the three classical types of non-homologous end joining joints: microhomology at junctions (57%), insertion of sequences that are not normally contiguous with either the AAV2 or AAVS1 sequences at the junction (31%), and direct joining (11%). These junctions were spread over 750 bases and were all downstream of the Rep68/78 nicking site within AAVS1. Two-thirds of the junctions map to 350 bases of AAVS1 that are rich in polypyrimidine (pPy) tracts on the nicked strand. Further evidence for a correlation between pPy tracts and integration junctions is seen in a 120-base-pair region of AAVS1 that lies between two clusters of junctions and contains no integration junctions and no pPy tracts of greater than 3 bases on the nicked strand. Computer-assisted analysis of the junction sites revealed that several were near potential binding sites for the human purine-rich element binding protein alpha (PUR alpha), a single-stranded DNA-binding protein that binds preferentially to the purine-rich element termed PUR, which is present at origins of replication and in gene flanking regions in a variety of eukaryotes from yeast through humans.
PUR alpha is believed to be involved in the control of both DNA replication and transcription, and can promote the unwinding of double-stranded DNA containing polypurine/pPy tracts. The majority of AAV2 breakpoints at junctions were within the inverted terminal repeat sequences (ITRs), which contain RBSs. We never detected a complete ITR at a junction. Residual ITRs at junctions never contained more than one RBS, suggesting that the hairpin form, rather than the linear ITR, is the more frequent integration substrate. Our data are consistent with a model in which a cellular protein other than Artemis cleaves AAV2 hairpins to produce free ends for integration. Based on our observations, we propose a novel predominant mechanism for Rep68/78-mediated integration into AAVS1 in which a hairpinned ITR is first processed by a cellular endonuclease to produce a 3-prime overhang. This overhang then anneals to its complementary sequence within the intact strand of AAVS1, which has been exposed by DNA unwinding ahead of an active or stalled replication complex assembled at the Rep68/78 nicking site. The annealed 3-prime end is then extended by a DNA polymerase to create the first de facto joining of AAV2 to AAVS1 DNA sequences. Additional analysis of AAV2 integration in cells with known defects in DNA damage repair/signaling should help to further constrain models of preferential integration.