The primary goal of this proposal is to collect high-resolution information on the distribution of proteins within mammalian cells and to link that information to nucleotide and protein sequences. It builds on extensive prior work by the co-PIs on protein tagging methods and by the PI on software systems for automated analysis of subcellular patterns in fluorescence microscope images. Using high-throughput CD-tagging (protein-trapping) methods, 25,000 independent cell lines expressing GFP protein fusions will be created in NIH 3T3 cells. As the cell lines are created, high-resolution fluorescence microscope images will be collected, and the gene and protein tagged in each cell line will be determined by high-throughput molecular analysis methods. The images will be subjected to automated image analysis to group proteins whose patterns are statistically indistinguishable. The location determined for each protein will be compared with the information available from protein databases, journal articles, and location predictors. Each assigned location will be accompanied by a confidence estimate derived from combining these sources. In addition, the images for each protein group will be used to build generative models that can synthesize new protein distributions statistically equivalent to those in the original images. The ability to synthesize distributions will provide an important structural framework for systems biology modeling of cell behavior in normal and disease states.