Most mammalian genes will soon be characterized as cDNA sequences with little information as to their function. To utilize this sequence information for large-scale functional studies, a process of tagged-sequence mutagenesis has been developed to disrupt genes expressed in mouse embryo-derived stem (ES) cells. Libraries of ES cell clones will be isolated in which expressed cellular genes have been disrupted by a gene trap retrovirus shuttle vector. Regions flanking the integrated provirus in each clone will be isolated by plasmid rescue and sequenced. The flanking sequences, designated PSTs, will be compared with the nucleic acid databases to identify instances in which the provirus has disrupted known genes or previously characterized cDNAs. In practice, this requires only 300 nucleotides of flanking sequence, since gene entrapment selects for proviruses inserted in or near transcribed exons. Presently, approximately 15 percent of inserts (65 of 400) have disrupted previously characterized genes or cDNAs. Insertion mutations will be characterized at a rate of 300 per month, allowing a significant fraction of the estimated 10,000 to 20,000 genes expressed in ES cells to be disrupted and characterized in the next five years. ES cell clones containing specific mutations will be made available to interested investigators. Each clone will be stored in liquid nitrogen, providing an immediate source of mutated genes for transmission into the mouse germline. The appropriately annotated PST sequences will be submitted to GenBank, so that investigators worldwide can later learn of the existence of mutations affecting specific genes and cDNA sequences by standard database searches. The appropriate ES cell clones will then be provided on request.
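The throughput figures above can be checked with a short calculation. The sketch below uses only the numbers quoted in the abstract (65 of 400 inserts, 300 insertions per month, five years, 10,000 to 20,000 expressed genes). The function `fraction_hit_uniform` is a hypothetical illustration, not part of the proposed method: it assumes every insertion lands in a uniformly random expressed gene, which real gene trap vectors do not do, so its output is an idealized estimate rather than a prediction.

```python
import math

# Figures quoted in the abstract
known_hits = 65             # inserts that disrupted known genes or cDNAs
inserts_sequenced = 400     # inserts analyzed so far
rate_per_month = 300        # planned characterization rate
years = 5
genes_low, genes_high = 10_000, 20_000  # estimated genes expressed in ES cells

# Observed fraction of inserts matching database entries
hit_fraction = known_hits / inserts_sequenced

# Total insertions characterized over the project period
total_insertions = rate_per_month * 12 * years


def fraction_hit_uniform(n_insertions: int, n_genes: int) -> float:
    """Expected fraction of genes disrupted at least once.

    ASSUMPTION (not from the abstract): insertions are independent and
    uniformly distributed over n_genes targets, so the count per gene is
    approximately Poisson with mean n_insertions / n_genes.
    """
    return 1.0 - math.exp(-n_insertions / n_genes)


if __name__ == "__main__":
    print(f"database hit fraction: {hit_fraction:.4f}")
    print(f"insertions in {years} years: {total_insertions}")
    for n_genes in (genes_low, genes_high):
        frac = fraction_hit_uniform(total_insertions, n_genes)
        print(f"{n_genes} genes: idealized fraction disrupted {frac:.2f}")
```

Because insertion sites are not uniformly distributed in practice, the idealized fractions should be read as an optimistic bound on what the stated rate could achieve.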
The ability to induce, characterize, and maintain mutations in ES cells circumvents many limitations associated with conventional mammalian genetics, and will greatly increase the number of mutant alleles (typically loss-of-function mutations) by which gene functions can be studied in mice and in cell lines derived from such mice. The process will facilitate functional analysis of the mammalian genome and will provide animal models for human genetic diseases.