The number of classes of proteins that recognize specific DNA sequences and structures continues to grow rapidly. Several are large complexes composed of multiple types of subunit. In this project, computational strategies are being developed to recognize some of the functional amino acid sequence patterns involved, many of which are subtle and variable. Improved methods are being applied to the analysis of new sequences determined in other NIH laboratories, resulting in new insights into the molecular structures and functions of important gene products. For well-established DNA-binding patterns such as helix-turn-helix and basic regions, new methods for evaluating the diagnostic power of pattern discriminators are being developed and then applied to more novel patterns. Low-complexity sequences are frequent in DNA-binding proteins and require analysis and filtering before database searches. The Gibbs method of automated local multiple alignment by iterative sampling is also included in these strategies. Test sets of various types of DNA-binding motifs have been judiciously constructed for evaluation of this and related methods. Specific sequences analyzed include (1) the 127 kDa component of a UV-damaged-DNA binding complex, which is defective in some xeroderma pigmentosum group E patients and shows sequence similarities to proteins of unknown function from a wide range of eukaryotes; (2) six subunits of the transcription factor TFIID complex, which is central to transcriptional regulation because it facilitates promoter responses to various activators; and (3) various families of DNA repair enzymes.
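The low-complexity filtering step mentioned above can be illustrated with a minimal sketch. The abstract does not say which filtering algorithm is used, so the code below is only an assumption-laden stand-in: it slides a fixed window along a protein sequence, computes the Shannon entropy of the residue composition in each window, and masks residues in windows that fall below a cutoff. The function names, the window length of 12, the 2.2-bit threshold, and the "X" mask character are all hypothetical choices for illustration, not parameters taken from the project.

```python
import math
from collections import Counter


def window_entropy(segment: str) -> float:
    """Shannon entropy, in bits, of the residue composition of a segment."""
    counts = Counter(segment)
    n = len(segment)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def mask_low_complexity(seq: str, window: int = 12, threshold: float = 2.2,
                        mask_char: str = "X") -> str:
    """Mask every residue covered by at least one low-entropy window.

    Assumed parameters (illustrative only): window length, entropy
    threshold in bits, and the masking character.
    """
    seq = seq.upper()
    masked = list(seq)
    # Sequences shorter than the window yield an empty range and are
    # returned unchanged.
    for start in range(0, len(seq) - window + 1):
        if window_entropy(seq[start:start + window]) < threshold:
            for i in range(start, start + window):
                masked[i] = mask_char
    return "".join(masked)


if __name__ == "__main__":
    # Hypothetical sequence with a 15-residue glutamine run in the middle.
    example = "MKTAYIAKQR" + "Q" * 15 + "LSDERFVNHW"
    print(mask_low_complexity(example))
```

Masking rather than deleting the flagged residues keeps sequence coordinates intact, so hits from a subsequent database search still map back to the original sequence. Residues flanking a repeat may also be masked when a window that overlaps the repeat falls below the cutoff.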