Most mammalian genes have been characterized at the nucleotide level, but little is known about their function. To facilitate genetic and biochemical studies of mammalian gene function, new retrovirus gene trap vectors will be developed for large-scale mutagenesis of murine embryonic stem (ES) cells. The vectors are designed to function as 3' gene (or polyA) traps and to target most genes regardless of whether they are expressed in ES cells. Disrupted genes will be identified by sequencing cDNAs, cloned by 3' RACE, from virus-cell fusion transcripts. The 3' RACE products are suitable for preparing high-density DNA microarrays, providing a resource to identify genes in the mutant library that are differentially expressed in specific cell types. An enhanced green fluorescent protein (EGFP) reporter can be used to monitor expression of the occupied genes in the intact animal. Finally, the vectors contain heterotypic recombination sequences (loxP) for the Cre site-specific recombinase. These sequences will permit rapid replacement of the targeting vector with DNA sequences introduced transiently into the original ES cell clones. Libraries of sequenced mutations offer the immediate potential of constructing mice with germline mutations in specific genes of interest. The ability to engineer tagged loci by Cre-mediated gene replacement has a number of applications that extend the utility of the mutant ES cell resource. These could include: (i) production of fusion proteins for studies of protein-protein interactions in vivo; (ii) construction of modified loci for conditional gene knockouts; (iii) conversion of the original loss-of-function mutations into an allelic series of missense mutations; and (iv) expression of heterologous transgenes under the control of selected cell-type-specific promoters in their normal chromosomal locations.
Methods will be developed to engineer genes disrupted in ES cells so that they express affinity-tagged fusion proteins for studies of protein-protein interactions. To this end, the retrovirus vector in selected clones will be replaced by cassettes encoding a protein tag that is designed to splice in-frame with upstream or downstream coding exons of the occupied cellular gene. The protein tag contains two separate affinity domains, allowing the fusion proteins and any associated cellular proteins to be purified by tandem affinity chromatography. Co-purifying cellular proteins will be identified by mass spectrometry. Finally, transmission of tagged fusion genes into the germline will be used to analyze protein-protein interactions in normal mouse tissues.
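
The in-frame splicing requirement can be illustrated with simple reading-frame arithmetic. The following is a minimal sketch, not the procedure used in this project: it assumes the number of coding nucleotides upstream of the splice junction is known (for example, from the 3' RACE sequence aligned to an annotated transcript), and the function names `upstream_phase`, `filler_needed`, and `is_in_frame` are hypothetical.

```python
def upstream_phase(coding_nt_before_junction: int) -> int:
    """Return how many nucleotides of an incomplete codon (0, 1, or 2)
    remain at the end of the upstream coding exon."""
    if coding_nt_before_junction < 0:
        raise ValueError("coding length must be non-negative")
    return coding_nt_before_junction % 3


def filler_needed(phase: int) -> int:
    """Return how many nucleotides the cassette exon must supply ahead of
    the tag's first codon so that the tag is read in frame."""
    if phase not in (0, 1, 2):
        raise ValueError("phase must be 0, 1, or 2")
    return (3 - phase) % 3


def is_in_frame(coding_nt_before_junction: int, cassette_leader_nt: int) -> bool:
    """Check whether a cassette whose tag is preceded by `cassette_leader_nt`
    nucleotides continues the upstream reading frame."""
    return (coding_nt_before_junction + cassette_leader_nt) % 3 == 0


if __name__ == "__main__":
    # Hypothetical example: 100 coding nucleotides precede the junction.
    phase = upstream_phase(100)
    print(phase, filler_needed(phase), is_in_frame(100, filler_needed(phase)))
```

Under these assumptions, three cassette variants (one per phase) would be enough to tag any upstream exon; whether the project uses that design is not stated in the text above.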