Significant resources have been developed that include large amounts of microarray data, representing studies on both model organisms and humans. Many early studies incorporating microarray methods focused on identifying genes that are expressed at different levels in two conditions, ignoring potential confounding transcription arising from multiple regulation. This is a logical focus if the goal of the analysis is the identification of biomarkers. However, in order to detect biological activity, it is necessary to obtain transcriptional signatures linked to processes rather than to conditions. Because the majority of genes are subject to multiple regulation, and information concerning such multiple regulation is limited, transcriptional coregulation cannot be identified without significant mathematical modeling. The work outlined here will lead to an open-source, statistically powerful, and flexible algorithm for identification of transcriptional signatures that leverages existing biological knowledge available through pathway databases, gene ontology, and databases of gene regulation. The proposal consists of two specific aims. First, we will create a novel Markov chain Monte Carlo algorithm that can directly infer the activity of biological processes through the use of enrichment analysis. The algorithm will include swappable error models whose parameters are estimated during sampling. To the best of our knowledge, we are the first group to propose direct inference on biological processes within a mathematical framework allowing for multiple regulation. Second, we will encode the algorithm in a user-friendly open-source tool, available both within the R language and as a GenePattern module. This work will provide an algorithm specifically designed to identify transcriptional signatures and changes in biological processes from noisy data using prior biological knowledge.
While such data is now typical in microarray studies, it will soon exist in genotyping and proteomic studies as well. Our inclusion of a flexible, parameterized error model will make this algorithm useful in these emerging fields too. In the future, we intend to focus our work on models of signaling networks in mammalian systems, relying on the results of this work to provide transcriptional signatures to guide inference on these networks. This work has significant implications for the development of systems capable of using the growing body of functional genomics data to infer the activity of specific biological processes, such as signaling networks and metabolic pathways. Such information is vital to understanding human disease and the response to therapy, especially with new molecularly targeted therapeutics.