Several families of proteins are under investigation to determine the relationships between structure and function. Most of these families are eukaryotic nuclear proteins that bind DNA or RNA, for example, the histone proteins, the HMG-1 box proteins, heat shock factors, ets proteins, HMG-I proteins, and ribosomal proteins. As an example, the HMG-1 box family of proteins is not highly conserved and is represented by over 100 examples in the sequence databases. We have identified a signature for this family of proteins and have classified them into several groups according to their sequence and functional relationships. In the past few years, numerous proteins have been identified as containing a stretch of about 75 amino acids that is homologous to the abundant non-histone chromosomal protein HMG-1. These proteins bind DNA and bend it on binding, or bind preferentially to bent DNA. Several of these proteins have been implicated in numerous nuclear functions, including transcription, replication, and chromatin structure, as well as in transcription regulation in mitochondria, resulting, in some cases, in such phenotypes as sex and mating type determination. We have compiled a database of the HMG-1 box family of proteins and are analyzing the sequences to determine the phylogenetic relationships among these functionally diverse proteins. The 3D structures of several HMG-1 box domains have been determined by multi-dimensional NMR. We have used one of these structures to model the other members of the family by a threading method, and have successfully produced models for the complete family of HMG-1 box domains. The conclusion drawn from these calculations is that these proteins are most likely to fold into similar conformations. Iterative motif search algorithms are being used to detect new and as yet unidentified motifs in several families of DNA-binding and RNA-binding proteins.
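The general idea behind an iterative motif search can be illustrated with a minimal sketch. This is not the algorithm used in the project, which is not specified here; it assumes a simple log-odds profile with a uniform background and a fixed score threshold. The function names, the seed motif, and the toy sequences are all hypothetical and chosen only for illustration. Each round scores every sequence against the current profile, collects the windows that pass the threshold, and rebuilds the profile from the seed plus those windows until the hit set stops changing.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(instances, pseudocount=1.0):
    """Build a log-odds profile (one dict per column) from equal-length motif instances.

    Assumes a uniform background frequency, which is a simplification.
    """
    width = len(instances[0])
    background = 1.0 / len(AMINO_ACIDS)
    total = len(instances) + pseudocount * len(AMINO_ACIDS)
    profile = []
    for pos in range(width):
        counts = Counter(inst[pos] for inst in instances)
        profile.append({
            aa: math.log(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return profile


def best_window(profile, sequence):
    """Return (score, start) of the highest-scoring window in the sequence."""
    width = len(profile)
    best_score, best_start = float("-inf"), -1
    for start in range(len(sequence) - width + 1):
        score = sum(profile[i][sequence[start + i]] for i in range(width))
        if score > best_score:
            best_score, best_start = score, start
    return best_score, best_start


def iterative_search(seed, sequences, threshold, max_rounds=10):
    """Repeat scan-and-rebuild rounds until the set of hits no longer changes."""
    width = len(seed)
    hits = {}
    for _ in range(max_rounds):
        # The profile is rebuilt from the seed plus every window accepted so far.
        profile = build_profile([seed] + [window for _start, window in hits.values()])
        new_hits = {}
        for name, seq in sequences.items():
            score, start = best_window(profile, seq)
            if score >= threshold:
                new_hits[name] = (start, seq[start:start + width])
        if new_hits == hits:
            break  # converged: no hits gained or lost
        hits = new_hits
    return hits


# Hypothetical toy data, not real protein sequences.
toy_sequences = {
    "toy1": "GGGKRPMSAYGGG",  # contains the seed exactly
    "toy2": "TTKRPLSAYTT",    # contains the seed with one substitution
    "toy3": "GGGGGGGGGGGG",   # contains no motif
}
hits = iterative_search("KRPMSAY", toy_sequences, threshold=3.0)
for name, (start, window) in sorted(hits.items()):
    print(name, start, window)
```

In practice, the threshold and background model largely determine whether such a search converges on a real family or drifts toward unrelated sequences, so the values above should be read as placeholders.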
In a separate project, we have identified several proteins containing domains that could be related to the histone fold in the nucleosome octamer. We have identified conserved residues in all the histone proteins and related this conservation to the preservation of protein-protein and protein-DNA contacts in the histone folds of nucleosomes. We recently updated the histone database (HistoneDB 2.0 with Variants) so that researchers can obtain curated sequence alignments of the histone proteins, and we have previously identified histone fold-containing proteins, including some bacterial and archaeal proteins which contain the histone fold. In addition to these studies of nuclear protein families, we have also collaborated with several groups on the nomenclature of protein families and of the genes that encode those proteins. For example, the HMGN, HMGB, and HMGA proteins were renamed several years ago, and more recently, the names of the variant histone proteins were unified using a phylogeny-based nomenclature. In addition, the human histone genes have been renamed following a more logical naming scheme by the HUGO Gene Nomenclature Committee, which consulted me on the appropriateness of the new gene assignments. More recently, we have been performing molecular modelling experiments on nucleosomes. We performed long, unconstrained, microsecond-scale molecular dynamics simulations of nucleosomes, including linker DNA segments and full-length histone tails, at the all-atom level in explicit solvent. For the first time we were able to characterize the dynamic rearrangements in nucleosome structure, including the coupling between the conformation of histone tails and the DNA geometry. We found that certain histone tail conformations promoted DNA bulging near its entry/exit sites, caused the formation of twist-defects within DNA, and led to rearrangement of histone-DNA interactions, suggestive of the formation of initial nucleosome sliding intermediates.
We characterized the dynamics of histone tails upon their condensation on the core and linker DNA at the atomistic level and showed that tails may adopt conformationally constrained positions accompanied by the insertion of anchoring lysines and arginines into the DNA minor grooves. An additional product of this project was the development of two resources that supported the identification of histone proteins and their variants. The HistoneDB 2.0 and MS_HistoneDB resources are freely available to query, classify, and identify histone proteins.
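As a rough illustration of how tail-DNA contacts of the kind described above might be quantified from a trajectory, the following sketch computes the fraction of frames in which any atom of a chosen tail residue lies within a distance cutoff of any DNA atom. It is a crude proxy under stated assumptions, not the analysis used in the project: the function name, the 4.0 cutoff, and the toy coordinates are hypothetical, and a simple distance test does not by itself establish insertion into the minor groove.

```python
import math


def contact_occupancy(tail_frames, dna_frames, cutoff=4.0):
    """Fraction of frames in which any tail atom is within `cutoff` of any DNA atom.

    Each argument is a list of frames; each frame is a list of (x, y, z) tuples.
    The cutoff is in the same length units as the coordinates.
    """
    if len(tail_frames) != len(dna_frames):
        raise ValueError("tail and DNA trajectories must have the same number of frames")
    if not tail_frames:
        return 0.0
    frames_in_contact = 0
    for tail_atoms, dna_atoms in zip(tail_frames, dna_frames):
        closest = min(math.dist(t, d) for t in tail_atoms for d in dna_atoms)
        if closest <= cutoff:
            frames_in_contact += 1
    return frames_in_contact / len(tail_frames)


# Hypothetical toy trajectory: one tail atom moving along z, two fixed DNA atoms.
dna = [(0.0, 0.0, 5.0), (10.0, 0.0, 5.0)]
tail_frames = [[(0.0, 0.0, 0.0)], [(0.0, 0.0, 3.0)], [(0.0, 0.0, 10.0)], [(0.0, 0.0, 3.5)]]
dna_frames = [dna] * len(tail_frames)
occupancy = contact_occupancy(tail_frames, dna_frames)
print(occupancy)
```

A real analysis would read coordinates from the simulation output and would likely restrict the DNA selection to minor-groove atoms; both steps are omitted here.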