Computer methods were developed to define, classify and analyze all segments of protein sequences that have improbably low compositional complexity. These segments include residue clusters dominated by one or a few amino acid types, which commonly contain homopolymeric tracts or mosaics of such tracts, aperiodic patterns, and sections of low-period repeats. The abundance of these segments in sequence databases was determined, and their properties were related to evidence of biological function.

A. Methods: Several formal definitions of local compositional complexity were developed so that low-complexity segments could be identified without bias, irrespective of their particular residue clustering or repeat pattern. Algorithms were developed and refined for optimally partitioning sequences into segments of low and high complexity. Various statistical properties of the segments were used as optimization and classification heuristics, tuned to (a) select segments for further study and (b) filter out non-informative segments before database searches.

B. Abundance and biological properties: With a relatively stringent complexity threshold, approximately 15% of the residues in protein databases lie in low-complexity segments, typically 15-50 amino acids long, and approximately 40% of proteins contain one or more such segments. These segments are highly abundant in many eukaryotic proteins that are crucial in morphogenesis and embryonic development, transcriptional regulation, signal transduction, and aspects of cellular and extracellular structural integrity and interactions. They take part in diverse molecular interactions and tend to evolve rapidly.

Significance of the project: The project has highlighted the high abundance and biological importance of low-complexity protein segments and emphasized how little is known about their molecular structure and dynamics. The new computer methods are improving sequence database searches and analysis.
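The idea of scoring local compositional complexity and partitioning a sequence into low- and high-complexity segments can be illustrated with a minimal sketch. It assumes one possible complexity measure, the Shannon entropy (in bits) of residue counts within a fixed-length sliding window, and uses a hypothetical window length of 12 and a hypothetical threshold of 2.2 bits chosen only for illustration; these are not claimed to be the definitions, parameters or optimal partitioning algorithms developed in the project. Windows scoring below the threshold are merged into segments, which can then be masked (here with the hypothetical mask character `X`) before a database search.

```python
import math
from collections import Counter


def shannon_entropy(window: str) -> float:
    """Complexity of a window as Shannon entropy (bits) of its residue counts."""
    length = len(window)
    counts = Counter(window)
    return -sum((n / length) * math.log2(n / length) for n in counts.values())


def low_complexity_segments(seq: str, window: int = 12, threshold: float = 2.2):
    """Return half-open (start, end) intervals covered by low-entropy windows.

    `window` and `threshold` are illustrative assumptions, not published values.
    Entropy is recomputed for every window, which is simple but O(len(seq) * window).
    """
    seq = seq.upper()
    segments = []
    for start in range(len(seq) - window + 1):
        if shannon_entropy(seq[start:start + window]) < threshold:
            end = start + window
            if segments and start <= segments[-1][1]:
                # Overlapping or touching low-complexity window: extend the current segment.
                segments[-1] = (segments[-1][0], max(segments[-1][1], end))
            else:
                segments.append((start, end))
    return segments


def mask_segments(seq: str, segments, mask_char: str = "X") -> str:
    """Replace residues inside each segment with `mask_char` (hypothetical filter step)."""
    chars = list(seq)
    for start, end in segments:
        chars[start:end] = mask_char * (end - start)
    return "".join(chars)


if __name__ == "__main__":
    # Twenty distinct residues: every 12-residue window has entropy log2(12), about 3.58 bits.
    print(low_complexity_segments("ACDEFGHIKLMNPQRSTVWY"))  # → []

    # A made-up sequence with a 15-residue polyglutamine tract at positions 22-36.
    protein = "MKTAYIAKQRQISFVKSHFSRQ" + "Q" * 15 + "LEDGRTLSDYNIQKESTLHLVLRLRGG"
    segs = low_complexity_segments(protein)
    print(segs)
    print(mask_segments(protein, segs))
```

Merging every overlapping sub-threshold window is the crudest possible partitioning: it tends to over-extend segment boundaries into flanking high-complexity residues, which is the kind of problem that a refined boundary-optimization step is meant to address.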