Studies on sequences: POTE is one of the genes we found some time ago through our gene discovery project. The human genome contains 13 closely similar paralogs of this gene dispersed across 8 different chromosomes. Different paralogs are expressed in only a few normal tissues (prostate, ovary, testis and embryonic stem cells) but in numerous cancer cells and tissues. The POTE gene family is primate-specific, but we do not know the function of these genes. This year we found the following about these genes:

(1) We identified what is likely to be the single ancestral gene, which has an ortholog in the mouse. Our experimental colleagues in Pastan's laboratory are conducting functional studies of this gene in the mouse.

(2) Some POTE gene paralogs acquired an actin transposon, which inserted in-frame into an exon of the parental POTE gene. Our experimental colleagues showed that this POTE-actin chimeric gene produces the expected fusion protein.

In an unrelated study, we compared the genomes of human, chimpanzee and other species to find human-specific frameshift or nonsense mutations. We confidently identified 18 genes carrying such mutations. For example, one of the genes that contains a human-specific nonsense mutation is NPPA, which codes for the precursor protein of the peptide ANP, which plays a central role in the regulation of blood pressure. The nonsense mutation deletes the last two arginine residues from the peptide. The human population is polymorphic with respect to this nonsense mutation. It has been reported that the monkey-form allele is significantly associated with an increased risk of stroke recurrence in humans.

Studies on protein structure: In collaboration with Dr. Peter Munson's group at the Center for Information Technology (CIT) of NIH and with Drs.
Jean Garnier and Jean-Francois Gibrat of the Institut National de la Recherche Agronomique, Jouy-en-Josas, France, we studied the world of protein structures through the well-known, manually curated protein structure classification database SCOP. We found that protein structure space is continuous and that it is often artificial to draw a line in this space to classify the protein structures into mutually exclusive groups. The manually curated SCOP database is based on recognition of particular structural elements, called the 'core', that are common to a group of proteins. However, we found that different groups of protein structures may share different core elements and that one core can transform into another through a set of successive structures that contain some features of both cores. We are currently studying automatic, as opposed to manual, clustering procedures.
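
To make the continuity argument concrete, the following is a minimal sketch and not the procedure used in this project. It assumes a hypothetical set of five structures, A to E, placed on a line so that each neighbour differs slightly from the next; the positions, the distance function and the threshold of 1.5 are all invented for illustration and are not derived from SCOP. It runs a simple agglomerative clustering with two standard linkage rules to show how the answer depends on where the line is drawn.

```python
from itertools import combinations

# Hypothetical toy data: five structures spaced 1.0 apart on a line.
# The numbers are invented for illustration, not measured from SCOP.
positions = {"A": 0.0, "B": 1.0, "C": 2.0, "D": 3.0, "E": 4.0}


def distance(x, y):
    """Stand-in for a structural dissimilarity score between two structures."""
    return abs(positions[x] - positions[y])


def cluster(names, threshold, linkage):
    """Agglomerative clustering.

    Repeatedly merge the closest pair of clusters and stop once the
    closest pair is farther apart than `threshold`.
    `linkage` is `min` (single linkage) or `max` (complete linkage).
    """
    clusters = [frozenset([n]) for n in names]

    def gap(pair):
        a, b = pair
        return linkage(distance(x, y) for x in a for y in b)

    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=gap)
        if gap((a, b)) > threshold:
            break
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return sorted(sorted(c) for c in clusters)


names = sorted(positions)
single = cluster(names, 1.5, min)    # chains neighbours together
complete = cluster(names, 1.5, max)  # forces compact groups

print(single)    # [['A', 'B', 'C', 'D', 'E']]
print(complete)  # [['A', 'B'], ['C', 'D'], ['E']]
```

With single linkage, the chain of intermediates joins A to E in one group even though the two ends are quite dissimilar. With complete linkage, the same data are cut into compact groups, but the boundaries are arbitrary: in this toy data B is exactly as close to C as it is to A. Under these assumptions, the sketch illustrates why mutually exclusive groups can be artificial in a continuous space.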