The development of the heart and body wall muscles of the Drosophila embryo represents a powerful model for analyzing transcriptional regulatory networks and the molecular mechanisms of organ formation. The Laboratory of Developmental Systems Biology investigates these problems using an interdisciplinary approach that combines genetic, genomic, biochemical, molecular, cellular and computational methods, thereby providing a systems-level understanding of organogenesis. Prior work from our own and other laboratories established that various combinations of tissue-restricted and signal-activated transcription factors (TFs) function to direct unique patterns of gene expression during cellular specification and differentiation. In particular, we showed that a specific three-way combination of TFs serves as a code for a subset of genes expressed in muscle founder cells (FCs). However, since the diversity of FC genetic programs cannot be fully accounted for by this TF code, additional TFs must contribute to the complexity of gene expression that occurs in individual FCs. We are employing multiple strategies to uncover such TFs in FCs and other mesodermal cell types. One candidate family of cellular identity-conferring TFs binds to DNA through the highly conserved homeodomain (HD). A key challenge in understanding the distinct functions of different HD proteins in a variety of cell types lies in the relatively generic sequence specificity of this DNA binding domain. Two mechanisms have previously been identified for how such specificity can be achieved: cooperative binding between HD proteins and cofactors that alter sequence selectivity, and interaction between HDs and collaborating TFs that influence gene expression outputs. Together with Dr. Martha Bulyk's laboratory at Harvard Medical School, we have recently discovered that the autonomous binding of different HDs to preferred sequence motifs provides a third, previously unrecognized mechanism underlying HD functional specificity. We first determined the binding preferences of a set of muscle HD factors using protein binding microarray technology. Next, we identified occurrences of such preferred binding sites in known muscle FC enhancers. Finally, by targeted mutagenesis of the preferred motifs followed by transgenic reporter assays of the resulting constructs, we validated that these sites can mediate context-dependent transcriptional activation or repression by the same HD TF. Given the nature of the target genes, such regulation appears to occur at both proximal and distal nodes in the myogenic regulatory network. Moreover, the observed cis effects are mirrored in trans when a Drosophila strain with a loss-of-function mutation in the HD TF of interest is examined. Since HD proteins have widespread and evolutionarily diverse roles, our novel finding that discrete HD binding preferences can confer regulatory specificity in Drosophila myoblasts should be highly informative for further studies of cellular programming by HDs, and possibly additional TFs, in other developmental contexts. To extend our combinatorial model for FC-specific gene regulation, we have collaborated with Dr. Ivan Ovcharenko's group at NCBI to construct a computational classifier that characterizes known FC cis-regulatory modules (CRMs) based on the presence of various sequence features, including validated and putative TF binding sites, and that is capable of distinguishing CRMs of interest from unrelated noncoding regions. The first step in implementing this machine learning approach was to increase the size of the training set of FC CRMs by incorporating related elements from other Drosophila species based on defined levels of sequence similarity.
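To illustrate how the presence of TF binding sites can be turned into sequence features of the kind a CRM classifier might use, the following Python sketch counts matches to consensus motifs on both strands of a sequence. It is a simplified illustration and not the method used in this work: the two motifs (a TAAT-containing HD-like site and an E-box), the toy sequence, and all function names are assumptions, and binding preferences measured experimentally are typically richer than a consensus string.

```python
# IUPAC nucleotide codes mapped to the bases each code allows.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "N": "ACGT",
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def matches_at(seq, motif, start):
    """True if the IUPAC motif matches seq beginning at index start."""
    window = seq[start:start + len(motif)]
    return len(window) == len(motif) and all(
        base in IUPAC[code] for base, code in zip(window, motif)
    )


def count_sites(seq, motif):
    """Count windows matching the motif on either strand.

    A window that matches on both strands (a palindromic site)
    is counted once.
    """
    seq = seq.upper()
    rc = reverse_complement(seq)
    k = len(motif)
    hits = 0
    for i in range(max(len(seq) - k + 1, 0)):
        forward = matches_at(seq, motif, i)
        # The same window, read on the opposite strand.
        reverse = matches_at(rc, motif, len(seq) - k - i)
        hits += forward or reverse
    return hits


def feature_vector(seq, motifs):
    """One motif count per motif, in the order given."""
    return [count_sites(seq, m) for m in motifs]


# Hypothetical motifs and sequence, for illustration only.
MOTIFS = ["TAATTA", "CANNTG"]
toy_enhancer = "GGTAATTAGGCAGCTGCC"
print(feature_vector(toy_enhancer, MOTIFS))
```

Each candidate region is reduced to a short vector of motif counts, which is one simple way to present sequence features to a classifier; other feature types could be appended to the same vector.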
In vivo reporter assays established that many of the predicted orthologous CRMs are indeed functional when tested in D. melanogaster. By training and testing the classifier on the available FC CRMs using a cross-validation strategy, we determined that it performs with high sensitivity and specificity. When the FC CRM classifier was run genome-wide, the top-scoring CRM predictions were highly enriched for regions associated with both known and subsequently validated FC genes. Many predicted FC CRMs also had the expected enhancer activity when tested in transgenic reporter assays. Finally, a number of the sequence motifs that were identified by the classifier are recognized by TFs having known FC regulatory activity. Work is in progress to assess the functions of novel motifs uncovered by the classifier as candidate myogenic regulators. We have applied a similar machine learning strategy to analyze CRMs that are active in subsets of cardiac cells. Experimental validation of predicted enhancers and TF binding motifs is also being undertaken in this system. One sequence motif discovered in this manner is capable of binding forkhead (Fkh) domain TFs. In one enhancer, this Fkh motif functions as a repressor binding site in both cardial cells (CCs) and pericardial cells (PCs) of the heart. Interestingly, the same motif also binds a repressor of this enhancer in somatic mesoderm-derived fusion-competent myoblasts (FCMs), whereas in visceral muscle cells this motif binds an activator belonging to the same TF family. These results suggest that different tissue-specific Fkh TFs mediate distinct gene expression responses through the same binding sites in a single enhancer, and support a role for Fkh TFs in determining the unique genetic programs that characterize different subtypes of mesodermal cells. The transcriptional codes responsible for FCM gene regulation are less well-characterized than those governing FC gene expression. 
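The cross-validation strategy mentioned above, in which a classifier is repeatedly trained on part of the known CRMs and scored on the held-out remainder, can be sketched as follows. This is a minimal illustration under stated assumptions: the source does not specify the classifier type, so a nearest-centroid rule stands in for it, and the motif-count feature vectors, labels, and function names are hypothetical toy values.

```python
def centroid(rows):
    """Mean feature vector of a list of equal-length rows."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train(X, y):
    """Stand-in model: one centroid per class (1 = CRM, 0 = background)."""
    pos = centroid([x for x, label in zip(X, y) if label == 1])
    neg = centroid([x for x, label in zip(X, y) if label == 0])
    return pos, neg


def predict(model, x):
    """Assign x to the class with the nearer centroid."""
    pos, neg = model
    return 1 if sq_dist(x, pos) < sq_dist(x, neg) else 0


def stratified_folds(y, k):
    """Deal examples of each class round-robin into k folds."""
    folds = [[] for _ in range(k)]
    for label in (0, 1):
        indices = [i for i, v in enumerate(y) if v == label]
        for j, i in enumerate(indices):
            folds[j % k].append(i)
    return folds


def cross_validate(X, y, k=3):
    """Return (sensitivity, specificity) pooled over k held-out folds."""
    tp = fn = tn = fp = 0
    for held_out in stratified_folds(y, k):
        held = set(held_out)
        train_X = [x for i, x in enumerate(X) if i not in held]
        train_y = [v for i, v in enumerate(y) if i not in held]
        model = train(train_X, train_y)
        for i in held_out:
            pred = predict(model, X[i])
            if y[i] == 1:
                tp += pred == 1
                fn += pred == 0
            else:
                tn += pred == 0
                fp += pred == 1
    sensitivity = tp / (tp + fn)  # fraction of true CRMs recovered
    specificity = tn / (tn + fp)  # fraction of background rejected
    return sensitivity, specificity


# Toy motif-count vectors: three "CRMs" and three background regions.
X = [[3, 2], [4, 1], [3, 3], [0, 1], [1, 0], [0, 0]]
y = [1, 1, 1, 0, 0, 0]
print(cross_validate(X, y, k=3))
```

Because every example is scored only by a model that never saw it during training, the pooled sensitivity and specificity estimate how the classifier would behave on new sequences rather than how well it memorizes the training set.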
We are searching for biologically relevant combinations of FCM regulators by determining the DNA binding specificities of TFs known to be expressed in FCMs, scanning validated and candidate FCM enhancers for occurrences of these sequence motifs, and then testing their in vivo cis-regulatory functions. In this way, we have identified two classes of TFs whose binding sites are critical for the full activity of at least one FCM enhancer. The generalizability of these findings is currently under investigation. In addition to studying how TFs orchestrate gene regulatory networks, we are characterizing the developmental functions of downstream pathways that are initiated by these proteins. Using RNAi and classical genetic methods, we found that two Fkh TFs are essential for proper morphogenesis of the Drosophila heart. In the absence of either of these TFs, the heart exhibits altered numbers and misalignment of CCs. This phenotype is due to abnormalities of both symmetric and asymmetric cell division, as well as to defective cytokinesis. Preliminary results suggest that the Fkh TFs may exert their effects on heart development through a kinase that is a known regulator of mitosis. Additional experiments are in progress to test this hypothesis. Collectively, the above studies provide new insights into the transcriptional codes that control muscle and heart gene expression, and into the specific developmental roles played by individual TFs that specify cellular identity and that control subsequent differentiation. These investigations also serve as an instructive experimental paradigm for addressing related questions in other biological systems where such knowledge can have a direct impact on the development of effective strategies for cell-based therapies and related approaches to regenerative medicine. Our findings also have implications for understanding the molecular pathways that are perturbed in human congenital heart disease.