Transcriptional modules (TMs) are groups of co-regulated genes together with the transcription factors that regulate their expression. Identifying TMs from experimental data and genomic regulatory sequences is an important and difficult problem in biomedicine. The data that can be used to reconstruct TMs come from genome-wide gene expression profiling experiments, whole-genome transcription factor binding experiments, sequences of experimentally established DNA regulatory motifs, and sequences of gene regulatory regions. The benefits of using all available types of data in identifying and characterizing TMs have been demonstrated in numerous studies. While precise probabilistic models generally exist for analyzing different data types separately, unifying models for all available data types are scarce. Computational methods currently available to biomedical researchers are inadequate, either because appropriate computational tools are lacking or because the underlying mathematical framework is inadequate. Furthermore, protocols for establishing the relative benefits of different strategies for joint modeling of different data types are non-existent. This leaves biomedical researchers without the means to make an informed decision when choosing the optimal data analysis approach. We propose to develop the Infinite Transcriptional Modules (ITM) framework, consisting of a novel probabilistic model and related computational tools for identifying transcriptional modules by jointly modeling gene expression and regulatory data. The unifying probabilistic model will use the Infinite Mixture Model mechanism to average over models with different numbers of modules, and thus circumvent the problem of estimating the "correct" number of modules. Each data type will be modeled separately within a different context of a Context Specific Infinite Mixture Model.
Such a modular approach will facilitate the use of the most appropriate probabilistic models for representing different types of data. Our intention is not to develop new models and analytical approaches for individual data types. Instead, we will focus on developing a principled probabilistic framework for integrating currently available state-of-the-art models for individual data types. We hypothesize that our unifying modeling approach will result in significantly higher precision of identified transcriptional modules than would be achieved either by analyzing different data types separately or by applying currently available algorithms for joint analysis. We also expect that the posterior distribution of co-membership in a TM, based on our model, will offer a credible assessment of the statistical significance of identified TMs. Using real-world data, we will construct datasets and protocols for objectively comparing key performance aspects of different methods for TM reconstruction.