With the Conserved Domain Database (CDD) resource we are producing a database of expert-curated protein domain alignments. Such alignment models describe the sequence and 3D-structure conservation within protein families, facilitating the annotation of conserved functional features. The alignment models also describe the variability present in a domain family, facilitating the depiction of its functional diversity.

This project describes curation of CDD alignments by human experts. The role of the CDD curators is multifaceted. First, they must survey the relevant scientific literature to produce concise summaries of the known functions of each domain family, to study existing subfamily classifications, and to choose citations useful to users of NCBI's web-based classification resources. Curators must also examine the results of automated sequence and structure comparison to infer the location of conserved core blocks, an iterative process that requires judgment in eliminating incomplete or erroneous sequence and structure data. Finally, curators must identify apparent orthology groups, based on the consensus of results from alternative molecular evolution and clustering methods. The curator group has so far produced about 1500 curated CDD families. Both curated and uncurated multiple sequence alignments are used to generate position-specific scoring matrices (PSSMs), which may in turn be used in NCBI's web-based protein classification resources.

A number of NCBI information services use CDD to identify conserved domains within protein sequences.
Links to CDD are made by default from, for example:

1) NCBI's protein-BLAST resource, http://www.ncbi.nlm.nih.gov/BLAST/
2) proteins in NCBI's Entrez browser, http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=Protein
3) records in NCBI's HomoloGene system, http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=homologene

Further information about CDD and these search services is available at http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml.

Curated domain models summarize the known functions of family members, using relevant citations from PubMed when possible, and may link to resources on the NCBI Bookshelf for further information. They also provide site-specific functional annotation, via sequence and structure alignments and via pre-recorded evidence-based features, such as interaction or active sites. The CDD alignment curation project differs from comparable efforts, upon which it builds, in two fundamental ways: (i) 3D-structure information is used in a quantitative way, whenever possible, to guide the alignments, and (ii) an explicit hierarchy of families and subfamilies, related by descent from a common ancestor, reflects the evolutionary history of each domain superfamily.

When at least one 3D structure is known within a domain family, this information is used to define the conserved homologous core structure, a set of un-gapped blocks that must be identified in all representative sequences included in the alignment. Representative sequences are aligned to this core structure using structure-informed alignment algorithms or, when multiple 3D structures are known, alignments obtained from structure superposition. These procedures ensure high alignment accuracy, as needed for accurate transfer of annotation to new family members identified by searching.
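
As a rough illustration of what "a set of un-gapped blocks that must be identified in all representative sequences" means, the Python sketch below scans a toy multiple alignment and reports the column ranges in which no sequence contains a gap. It is not the CDD procedure, which relies on 3D-structure information and curator judgment. The function name `ungapped_blocks`, the `min_len` threshold, and the example sequences are all assumptions made for illustration.

```python
from typing import List, Tuple


def ungapped_blocks(
    alignment: List[str], min_len: int = 3, gap: str = "-"
) -> List[Tuple[int, int]]:
    """Return half-open column ranges [start, end) in which no row has a gap.

    Hypothetical helper for illustration only; it uses gap positions alone,
    whereas the curated core described in the text is structure-informed.
    """
    if not alignment:
        return []
    width = len(alignment[0])
    if any(len(row) != width for row in alignment):
        raise ValueError("all aligned rows must have the same length")

    blocks: List[Tuple[int, int]] = []
    start = None  # first column of the gap-free run currently being extended
    for col in range(width):
        gap_free = all(row[col] != gap for row in alignment)
        if gap_free and start is None:
            start = col
        elif not gap_free and start is not None:
            if col - start >= min_len:
                blocks.append((start, col))
            start = None
    # Close a run that extends to the last column.
    if start is not None and width - start >= min_len:
        blocks.append((start, width))
    return blocks


# Toy alignment with made-up sequences (not CDD data).
toy = [
    "MKV-LAGGT--PEWL",
    "MKVALAGGTQ-PEWL",
    "MRV-LSGGT--PDWL",
]
print(ungapped_blocks(toy, min_len=3))
```

Raising `min_len` drops short gap-free runs, a crude stand-in for the judgment a curator applies when deciding which blocks belong to the conserved core.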
Representative sequences are picked from a set of "preferred taxonomy nodes", so that the domain alignments represent the taxonomic span of a family, which in turn indicates its apparent evolutionary age.

Explicit hierarchies identify major gene duplication events in the molecular evolution of each family. Our basic strategy is to use domain-sequence clustering methods together with known domain architecture and phylogeny to identify what appear to be ancient orthology groups. These define explicitly annotated "children" of the overall "parent" alignment, and in turn provide more specific functional annotation. The CDD project employs a high level of automation to produce structure-based alignments, to identify candidate orthology groups, to update CDD alignments with new sequences and structures, and to "publish" the results to web servers. These algorithms and the associated software are described under another project, "Alignment methods for a conserved domain database".