This research program focuses primarily on the computational analysis of proteins that play a fundamental role in metazoan development. A variety of bioinformatic approaches are used to understand the evolution and function of these proteins and their ultimate role in human disease. Hox genes, a subfamily of the homeobox genes, are organized in conserved genomic clusters across a range of phylogenetic taxa. Over evolutionary time, the functional diversification of these Hox genes has contributed to the diversification of animal body plans. Building upon prior work on the origin and early evolution of these Hox genes, our focus has turned to analyzing the genomes of early-branching metazoan phyla to better understand the relationship between genomic complexity and morphological complexity, as well as the molecular basis for the evolution of novel cell types. To address the lack of high-quality, genome-scale sequence data for this part of the evolutionary tree, we have, in collaboration with our colleagues at the University of Hawaii and at the NIH Intramural Sequencing Center (NISC), sequenced, annotated, and performed a preliminary analysis of the 150-megabase genome of the ctenophore Mnemiopsis leidyi, with roughly 12x coverage of the genome. Briefly, a combination of Roche 454 sequencing and Illumina GA-II mate-pair sequencing was performed, yielding a final assembly of 5,100 scaffolds with 160-fold physical coverage and an N50 of 187 kb. In addition, RNAseq data from mixed-stage Mnemiopsis embryos, generated using Illumina GA-II sequencing, were assembled using our genome sequence. These RNAseq transcript fragments, along with 15,775 publicly available EST sequences and 138 publicly available cDNA sequences, were used as the basis for transcriptome annotation. Based on this work, the Mnemiopsis genome is predicted to contain 16,545 genes and 91,482 exons.
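For readers less familiar with the N50 statistic quoted for the assembly, the following minimal Python sketch shows how it is conventionally computed: scaffolds are sorted from longest to shortest, and the N50 is the length of the scaffold at which the running total first reaches half of the total assembly length. The function name and the toy scaffold lengths are illustrative assumptions only; they are not the Mnemiopsis data, and this is not necessarily the tool used to produce the figures above.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold lengths.

    The N50 is the length L such that scaffolds of length >= L
    together contain at least half of all assembled bases.
    """
    if not lengths:
        raise ValueError("need at least one scaffold length")
    ordered = sorted(lengths, reverse=True)  # longest scaffold first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy scaffold lengths in kb (invented for illustration, not real data).
toy_scaffolds_kb = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_scaffolds_kb))  # prints 70
```

In this toy case the total length is 300 kb, and the two longest scaffolds (80 + 70 = 150 kb) already account for half of it, so the N50 is 70 kb.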
The availability of these sequence data has already begun to benefit multiple scientific communities (i.e., marine, evolutionary, and developmental biologists) and has enabled us to answer some important questions regarding phylogenetic diversity and the evolution of proteins that play a fundamental role in metazoan development. Our initial analysis of the genome shows that many of the transcription factors and signaling pathway components present in other animal genomes are also present in the Mnemiopsis genome. However, several important developmental gene families present in bilaterians, cnidarians, and placozoans are conspicuously absent, arguing that these gene families likely originated after the ctenophore lineage diverged from the lineage leading to bilaterians, cnidarians, and placozoans. This relationship is consistent with most molecular-based phylogenies and supports the notion that ctenophores and sponges are the two earliest-branching animal lineages. Analysis of the gene content of these earliest metazoan groups is helping to redefine which components were required for the origin of animal morphological complexity. Using the Mnemiopsis sequence data, we were able to identify a set of 76 homeobox-containing genes and then phylogenetically categorize this set into established gene families and classes. There is strong evidence that Mnemiopsis has homeodomains belonging to six of the 11 defined homeodomain classes, while missing five classes and several subclasses. Given that Trichoplax, Nematostella, and the bilaterians examined clearly possess all of these classes and subclasses, the most parsimonious animal tree would involve Ctenophora and Porifera branching off the main animal trunk prior to the Placozoa, Cnidaria, and Bilateria. This led us to propose a new name for the latter group: the ParaHoxozoa.
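The parsimony argument above can be illustrated with a small Python sketch that counts the minimum number of state changes needed to explain the presence or absence of a single homeodomain class on two candidate trees, using Fitch's algorithm for unordered characters. Everything in it is a simplifying assumption for illustration: the five-taxon trees, the alternative topology, and the coding of the class as absent in Porifera are hypothetical, and this is not the phylogenetic analysis actually performed.

```python
def fitch_changes(tree, states):
    """Fitch parsimony for one character on a rooted binary tree.

    tree   : a taxon name (str) or a 2-tuple of subtrees
    states : dict mapping taxon name -> observed state (here 0 or 1)
    Returns (candidate state set at this node, minimum changes below it).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch_changes(left, states)
    right_set, right_cost = fitch_changes(right, states)
    shared = left_set & right_set
    if shared:
        # Children agree on at least one state: no extra change needed.
        return shared, left_cost + right_cost
    # Children disagree: one change is required on a branch below this node.
    return left_set | right_set, left_cost + right_cost + 1


# Hypothetical coding of one homeodomain class: 1 = present, 0 = absent.
class_present = {
    "Ctenophora": 0, "Porifera": 0,
    "Placozoa": 1, "Cnidaria": 1, "Bilateria": 1,
}

# Ctenophora and Porifera branch off before the other three groups.
early_branching = ("Ctenophora", ("Porifera", ("Placozoa", ("Cnidaria", "Bilateria"))))
# A hypothetical alternative that places Ctenophora next to Cnidaria.
alternative = ("Porifera", ("Placozoa", (("Ctenophora", "Cnidaria"), "Bilateria")))

early_cost = fitch_changes(early_branching, class_present)[1]
alt_cost = fitch_changes(alternative, class_present)[1]
print(early_cost, alt_cost)  # prints: 1 2
```

Under these assumptions, the early-branching tree needs a single change (one gain on the stem of the three remaining groups), whereas the alternative needs two. Fitch parsimony weights gains and losses equally; a model that penalizes them differently could score the trees differently.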
We also determined that, based on lineage-specific paralog retention and average branch lengths, it is unlikely that these missing classes and subclasses are due to extensive gene loss or unusually high rates of evolution in Mnemiopsis. We have taken advantage of having these high-quality sequence data in hand to investigate the evolution of additional protein families that play a critical role in human metabolic processes and development. First, with our collaborators at the University of Hawaii, we examined the Wnt/beta-catenin signaling pathway. Molecular phylogenetic analysis shows four distinct Wnt ligands, and most (but not all) components of the receptor and intracellular signaling pathways were detected. Notably absent in the Mnemiopsis genome are most major secreted antagonists, which suggests that complex regulation of this secreted signaling pathway likely evolved later in animal evolution. With our collaborators at the Woods Hole Oceanographic Institution, we focused on nuclear receptors (NRs), which play key roles in the regulation of reproduction, development, and energetic homeostasis. Using phylogenomic approaches, we found that all ctenophore NRs lacked the highly conserved DNA-binding domain that has heretofore been characteristic of nuclear receptors. This may reflect an ancestral NR domain structure or a lineage-specific loss of this domain from an ancestral NR that contained the DNA-binding domain. Phylogenetic analyses of NRs support the idea that expansion of the NR superfamily occurred in a stepwise fashion. As an outgrowth of our studies on the homeodomain class of proteins, we have developed and continue to maintain the Homeodomain Resource, a curated collection of sequence, structure, interaction, genomic, and functional information on the homeodomain family (Moreland et al., 2009). The Resource is organized in a compact form and provides user-friendly interfaces for both querying and assembling customized datasets.
The current release contains 1,536 full-length homeodomain-containing sequences from 31 distinct organisms, 107 experimentally derived three-dimensional structures, 101 homeodomain protein-protein interactions, 122 homeodomain binding sites, 53 homeodomain proteins with documented allelic variants, and 186 homeodomain proteins implicated in human genetic disorders. The Homeodomain Resource is freely available at http://research.nhgri.nih.gov/homeodomain/.