Transport systems provide essential functions for all living cells. Their importance to medicine cannot be overemphasized because (1) they provide a basis for multidrug resistance in pathogenic bacteria, fungi and protozoans as well as in tumor cells, (2) they are defective in numerous human genetic diseases, and (3) they are essential elements of the toxin secretion systems of virtually all pathogenic organisms. We have developed a classification system for transport proteins called the Transporter Classification (TC) system. The TC system is a functional/phylogenetic system designed for the classification of all transmembrane transport proteins found in living organisms on Earth. It parallels but differs from the strictly functional EC system, developed decades ago by the Enzyme Commission of the International Union of Biochemistry and Molecular Biology (IUBMB) for the classification of enzymes. The TC system has recently been adopted by the IUBMB as the internationally acclaimed system for the classification of transporters. The Transporter Classification Database (TCDB) is a curated repository for factual information compiled from more than 10,000 references, encompassing approximately 3,000 representative transport proteins and putative transporters, classified into about 400 families. The primary goal of this proposal is to facilitate the updating, expansion and improvement of TCDB for use by the international scientific community. To this end, we have two objectives: first, to develop automated text classification and information extraction software so as to facilitate the continual updating of TCDB as newly published information about transporters becomes available (Specific Aims 1-3), and second, to devise software for the semiautomatic prediction of transport functions for putative transporters for which complete sequence data but little or no experimental data are available (Specific Aims 4-6).
Our specific aims are to: (1) Incorporate into TCDB a text classification system for automatically identifying published literature describing newly characterized transport systems, or previously characterized systems for which new information is provided or old data are corrected. (2) Design information extraction software for incorporating pertinent information from the publications identified in Specific Aim 1 into TCDB. (3) Automate links to the primary literature from which the information in Specific Aim 2 was extracted, as well as links to protein and operon analysis tools. (4) Incorporate software that will allow identification of motifs, signature sequences, protein domains, phylogenetic relationships and structural features for the proteins in TCDB and their homologues for the purpose of functional prediction. (5) Incorporate software that will facilitate the extraction of operon analysis data that can similarly be used for functional prediction. (6) Develop approaches that will allow integration of the different lines of functional information revealed in Specific Aims 4 and 5.