This research program focuses on the use of phylogenetic and comparative genomic techniques to study developmental proteins that play a fundamental role in the specification of body plan, pattern formation, and cell fate determination during metazoan development. Our group uses a variety of bioinformatic approaches to understand the evolution and function of these proteins and their ultimate role in human disease. Our focus is on the analysis of the genomes of early branching metazoan phyla in an effort to better understand the relationship between genomic and morphological complexity, as well as the molecular basis for the evolution of novel cell types. Thematically, our current research interests are centered on probing the interface between genomics and developmental biology and conducting comparative, genomics-based research with an evolutionary point of view, themes articulated in NHGRI's most recent document outlining a vision for the future of genomic research. Until recently, only three of the four non-bilaterian metazoan lineages (Porifera, Placozoa, and Cnidaria) had at least one species whose genome had been sequenced. Ctenophora (the comb jellies) remained the last non-bilaterian animal phylum without a sequenced genome, and its phylogenetic position remained uncertain. With the goal of understanding the molecular innovations that drove the burst of diversity and increasing complexity in the early evolution of animals, we sequenced, assembled, annotated, and analyzed the 150-megabase genome of the ctenophore Mnemiopsis leidyi (Ryan et al., 2013). By filling this gap in high-quality, genome-scale sequence data in a critical part of the evolutionary tree, we were able to bring resolution to the question of the phylogenetic position of the ctenophores, with the results of our phylogenomic analyses strongly suggesting that ctenophores are the sister group to all other animals.
Based on analyses of gene content, our results also suggest either that neural and mesodermal cell types were lost in Porifera and Placozoa or that (to some extent) these cell types evolved independently in the ctenophore lineage. These findings challenge long-held ideas regarding not only the phylogenetic position of the ctenophores but also the evolution of the aforementioned cell types. The sequence data generated in the course of this project are available through GenBank, and we continue to add to the comprehensive genomic information available through our Mnemiopsis Genome Project Portal (http://research.nhgri.nih.gov/mnemiopsis; Moreland et al., 2014) as new data become available. The availability of these sequence data has already begun to benefit multiple scientific communities (i.e., marine, evolutionary, and developmental biologists) and has enabled us to answer some important questions regarding phylogenetic diversity and the evolution of regulatory mechanisms that play a fundamental role in metazoan development. Previously, given the vital role that microRNAs play in the regulation of gene expression, we analyzed short RNA sequencing data and additional data from the assembled Mnemiopsis genome; in that study, we were able to show that this species appears to lack any recognizable microRNAs, as well as the nuclear proteins Drosha and Pasha, which are critical to canonical microRNA biogenesis (Maxwell et al., 2012). Building on this knowledge base, we sought to better understand how genomic variants alter miRNA regulation by modifying miRNA target sites; this question is of particular importance since multiple human disease phenotypes have been linked to such miRNA target site variants (miR-TSVs).
However, systematic genome-wide identification of functional miR-TSVs is difficult because predictions suffer from high false positive rates: functional miRNA recognition sequences can be as short as six nucleotides, and the human genome encodes thousands of miRNAs. Furthermore, while large-scale clinical genomic data sets are becoming increasingly commonplace, existing miR-TSV prediction methods are not designed to analyze these data. To fill this gap, we developed and released an open-source tool called SubmiRine that is designed to perform efficient miR-TSV prediction systematically on variants identified in novel clinical genomic data sets (Maxwell et al., 2015). Most importantly, SubmiRine allows for the prioritization of predicted miR-TSVs according to their relative probability of being functional. We tested SubmiRine using integrated clinical genomic data from a large-scale cohort study on chronic obstructive pulmonary disease (COPD), identifying a number of high-scoring, novel miR-TSV predictions. We also demonstrated SubmiRine's ability to predict and prioritize known miR-TSVs that have undergone experimental validation in previous studies. Since non-bilaterians contain a surprisingly high number of human disease gene homologs within their genomes despite their evolutionarily distant position with respect to humans, some of our most recent work has focused on the enticing proposition that these early branching animals be used as 'emerging model organisms' in the context of human disease research. We used a comparative genomics approach encompassing a broad phylogenetic range of animals with sequenced genomes to determine the evolutionary patterns exhibited by human genes associated with different disease classes (Maxwell et al., 2014).
Our results support previous claims that most human disease genes are of ancient origin but, more importantly, we also demonstrate that several specific disease classes have a significantly large proportion of genes that emerged relatively recently within the metazoans and/or the vertebrates. An independent assessment of the synonymous to non-synonymous substitution rates of human disease genes found in mammals reveals that disease classes that arose more recently also display unexpected rates of purifying selection between their mammalian and human counterparts. Our results reveal the heterogeneity underlying the evolutionary origins of (and selective pressures on) different classes of human disease genes. For example, some disease gene classes appear to be of uncommonly recent origin (specifically, vertebrate-specific genes) and, as a whole, have been evolving at a faster rate within mammals than the majority of disease classes having more ancient origins. The novel patterns that we have identified may provide new insight into cases where studies using traditional animal models were unable to produce results that translated to humans. Conversely, we note that the larger set of disease classes does have ancient origins, supporting the proposition that non-bilaterian animals have the potential to serve as viable models for studying various important classes of human diseases. Taken together, these findings emphasize why model organisms should be selected on a disease-by-disease basis, with evolutionary profiles in mind. Our current work continues to focus on how these early branching animals can be used in the context of human disease research. We are now leading an international effort to sequence two Hydractinia species.
The regenerative abilities of these hydrozoan cnidarians make them excellent models for the study of key questions related to pluripotency, allorecognition, and stem cell biology, work that will be significantly advanced by the availability of high-quality whole-genome sequencing data from these organisms. As with the Mnemiopsis whole-genome sequencing project described above, we intend to create a new Web portal allowing for easy access to sequencing and annotation data generated in the course of this project.