My group continued to develop and apply computational methods that utilize and integrate large data sets to study gene regulation and disease, with a main focus on cancer. We also develop methods to study newly emerging data types, including HT-SELEX data and single-cell expression data. Large data sets provide an important window on human diseases (1). Within this general area, the main focus of my group is on developing new computational methods that utilize large datasets (e.g., TCGA and ICGC) to obtain insights into the etiology of cancer. In particular, we hypothesized that exploring the interplay between co-occurrence, mutual exclusivity, and functional interactions between genes will further improve our understanding of the disease and help to uncover new relations between cancer driving genes and pathways. Following our previous method to identify mutual exclusivity (2), we designed a general framework, BeWith, for identifying gene modules with different combinations of mutation and interaction patterns (3). We formulated the BeWith framework using Integer Linear Programming (ILP), enabling us to find optimally scoring sets of modules. Our results demonstrate the utility of BeWith in providing novel information about mutational patterns, driver genes, and pathways. In particular, our approach identified functionally coherent modules that might be relevant for cancer progression. In addition to recovering well-known drivers, the identified modules pointed to novel findings such as the interaction between NCOR2 and NCOA3 in breast cancer. Additionally, an application of the method revealed that gene groups differ with respect to their vulnerability to different mutagenic processes, and helped us to uncover pairs of genes with potentially synergistic effects, including a potential synergy between mutations in TP53 and the metastasis-related DCC gene.
Overall, BeWith not only helped us uncover relations between potential driver genes and pathways, but also provided additional insights into patterns of the mutational landscape, going beyond cancer driving mutations. These results were selected for a podium presentation at RECOMB 2017, the premier conference in computational biology, and the final results of this study have recently been published in PLoS Comput Biol (3). Along the same line of research, we recognized that, in addition to the mutations that confer a growth advantage, cancer genomes accumulate a large number of somatic mutations resulting from normal DNA damage and repair processes as well as from carcinogenic exposures or cancer-related aberrations of the DNA maintenance machinery. We therefore developed software to decompose a cancer genome's mutation catalog into mutational signatures. In our recent paper (4), we proposed two complementary ways of measuring the confidence and stability of decomposition results and applied them to analyze mutational signatures in breast cancer genomes. We identified both very stable and highly unstable signatures, as well as signatures that had not previously been associated with breast cancer. We also provided additional support for the novel signatures. Our results emphasize the importance of assessing the confidence and stability of inferred signature contributions (4). With respect to our research on gene regulation, we continued our long-standing collaboration with Brian Oliver's group on gene regulation in Drosophila. We worked on further development of our novel network inference method, NetREX, which, given a context-agnostic network as a prior and context-specific expression data (e.g., expression in an adult fly), constructs a context-specific GRN by rewiring the prior network (the paper reporting preliminary results received the best paper award at the prestigious RECOMB 2017 conference).
We are preparing for publication a paper describing further improvements and applications of this method. As a part of the same collaboration, we studied the impact of Drosophila melanogaster deletions on gene expression profiles. Specifically, we asked whether increased expression variability owing to reduced gene dose might underlie phenotypic heterogeneity. Indeed, we found that one-dose genes have higher gene expression variability relative to two-dose genes (5). These results can help explain why DNA copy number variation is associated with many disorders of high phenotypic heterogeneity. We also continued developing computational methods for the analysis of HT-SELEX-derived aptamers. Among our recent contributions is AptaSuite, a full-featured, open source, and platform-independent software collection for the comprehensive analysis of HT-SELEX experiments (6). In stark contrast to previous methods, each implementing its own, frequently rudimentary, data workflow, AptaSuite provides a unified and robust framework for managing aptamer-related data and leverages this framework to serve the required data in a standardized manner to any particular algorithm built with the software. At its core, AptaSuite consists of a collection of carefully designed APIs (application programming interfaces) and corresponding reference implementations for facilitating input, output, and manipulation of aptamer data (such as sequences, aptamer counts in individual selection cycles, structure information, and more). We have recently applied our tools in a collaborative project to identify aptamers for Ebolavirus secreted protein (7), as well as in other applications (8). One of the important applications of aptamers is related to RNA-based therapeutics. The main idea is to utilize synthetic RNA molecules to alter cellular functions. The success of this approach hinges on the ability to selectively deliver and internalize therapeutic RNAs into cells of interest.
Cell-internalizing RNA aptamers selected against surface receptors that are discriminatively expressed on target cells hold particular promise as candidates for such delivery agents. Specifically, these aptamers can be combined with a therapeutic cargo and facilitate internalization of the cargo into the cell of interest. This requires developing an approach to design such aptamer-cargo complexes. To address this challenge, we have developed AptaBlocks, a computational method to design RNA complexes that hybridize via sticky bridges. This result was selected for a podium presentation at RECOMB 2018 and published in Nucleic Acids Research (9). The approach has been very successful in practical applications (10). Finally, we are also continuing our long-standing collaboration with David Levens's group, focusing on the role of DNA conformational dynamics in gene regulation (11). Our recent interest is in understanding the impact of energy source on cells' ability to perform DNA transactions; however, these studies are still at an early stage. We are also involved in other collaborative research (12).