I. Putative genomic characteristics of BRAF V600K versus V600E cutaneous melanoma

Approximately 50% of all cutaneous melanomas harbor activating BRAF V600 mutations; among these, 10-30% carry the V600K mutation. Clinically, patients with V600K tumors experience distant metastases sooner and have an increased risk of relapse and shorter survival than patients with V600E tumors. Despite clinical and other histopathological differences between these BRAF tumor subtypes, little is known about them at the genomic level. We systematically compared BRAF V600E and V600K skin cutaneous melanoma (SKCM) samples from The Cancer Genome Atlas (TCGA) for differential protein, gene, and microRNA expression genome-wide using the Mann-Whitney U-test.

Our analyses showed that elements of energy-metabolism and protein-translation pathways were up-regulated and that pro-apoptotic pathways were down-regulated in V600K tumors compared to V600E tumors. We found that c-Kit protein and KIT gene expression were significantly higher in V600K tumors than in V600E tumors, concurrent with significant down-regulation of several KIT-targeting microRNAs (mir), including mir-222, in V600K tumors, suggesting that KIT and mir-222 might be key genomic contributors to the observed clinical differences. The relationship that we uncovered among KIT/c-Kit expression, mir-222 expression, and growth and pro-survival signals in V600 tumors is intriguing. We believe that the clinical aggressiveness of V600K tumors relative to V600E tumors may be attributable to their increased energy-metabolism, protein-translation, and pro-survival signals. If confirmed using larger numbers of V600K tumors, our results may prove useful for designing clinical management and targeted chemotherapeutic interventions for BRAF V600K-positive melanomas. Lastly, the small number of V600K tumors is a major limitation of our study.

II.
Learning about tumor microenvironment using tumor sample gene expression and purity data

The tumor microenvironment consists of the non-cancerous stromal cells present in and around a tumor; these include immune cells, fibroblasts, and cells that make up supporting blood vessels, among others. The tumor microenvironment plays an important role in tumor initiation, progression, and metastasis. Most genomic and genetic studies of cancer are carried out on tumor tissue samples that are heterogeneous in nature. The Cancer Genome Atlas (TCGA) provided comprehensive datasets for gene expression, DNA methylation, protein expression, and clinical characteristics for more than 10,000 samples in more than 30 tumor types. Those studies provide valuable information about genomic changes in tumor samples compared to normal samples; however, teasing out cell-type-specific information from those heterogeneous samples remains a challenge.

Computational methods directed at deconvolving cell-type-specific signals in heterogeneous tissue samples have also been developed. Ideally, one would want to deconvolve genomic signals (e.g., gene expression) from tumor samples into tumor-cell-specific and stromal-cell-specific signals. To our knowledge, however, this effort has not yet been fully successful. We believe that the difficulty is in part due to heterogeneities within and between tumor samples. Currently, many computational methods try to tackle a simpler issue: estimating the proportion of tumor cells in a tumor sample (often referred to as tumor purity). Perhaps the best-known algorithm is ABSOLUTE, which uses copy number variation in tumor samples compared to normal samples to infer tumor purity and ploidy. ABSOLUTE, which is often considered the gold standard for performance comparison, provided tumor purity values for many samples from 11 TCGA tumor types.
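
To make the notion of tumor purity concrete, one common simplifying assumption (ours, for illustration; it is not stated in the abstract and is not how ABSOLUTE works) is a two-component linear mixture on the linear expression scale. The symbols below are hypothetical: x_{gs} is the bulk expression of gene g in sample s, p_s is the purity of sample s, and t_g and n_g are the average expression of gene g in tumor cells and in stromal cells.

```latex
x_{gs} \;\approx\; p_s \, t_g \;+\; (1 - p_s) \, n_g ,
\qquad 0 \le p_s \le 1 .
```

Under this sketch, purity estimation asks only for p_s, whereas full deconvolution also asks for t_g and n_g. The assumption that t_g and n_g are shared across samples is exactly what within- and between-sample heterogeneity violates.
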
Besides copy number variation, methods that use DNA methylation data or expression data for a set of pre-selected stromal genes have also been developed to infer the composition of cell mixtures. Purity estimates from those methods appear to be reasonably concordant.

We have three projects related to understanding the tumor microenvironment. First, we used a supervised machine learning tool, XGBoost (eXtreme Gradient Boosting), to select a subset of genes whose expression levels can predict tumor purity. XGBoost uses an optimized and distributed gradient boosting algorithm. We applied XGBoost to 11 TCGA tumor types for which both RNA-seq gene expression and ABSOLUTE tumor purity estimates were available. We carried out two separate analyses, first for each tumor type separately and then for all tumor types combined. Across the 11 tumor types, the average correlation between observed and predicted tumor purity ranged from 0.70 to 0.84 with small root mean square deviations, suggesting that tumor purity can be accurately predicted using expression levels of certain sets of genes. We further identified an eight-gene set (CSF2RB, CYTIP, GGT5, GLIPR1, IL16, IRF4, PECAM1, and RHOH) whose expression levels alone were predictive of tumor purity regardless of tumor type, suggesting that those genes might serve as biomarkers for tumor purity. Unlike existing methods, our method considered all genes as potential predictors. The selected genes included those expressed not only by stromal cells but also, potentially, by cancer cells. Our analyses identified commonalities and differences in tumor microenvironments among the 11 different tumor types.

In the second project, we developed a deconvolution method, CDSeq, designed to estimate both sample-specific cell-type proportions and cell-type-specific expression profiles simultaneously using bulk RNA-seq data only.
We modeled the observed expression data using multinomial distributions whose parameters reflect the unknown cell-type-specific expression profiles and sample-specific proportions. We also incorporated Poisson random variables to model the varying amounts of RNA from cell types of different sizes. Integrating these components, we built a Bayesian model that fully captures the stochastic nature of RNA-seq data. We used a Gibbs sampler for estimation and developed a strategy to automatically determine the number of cell types present.

In the third project, we study the functional role of a long non-coding RNA (LINC00152) in tumorigenesis using an in vitro cell culture model.
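
The multinomial view of bulk RNA-seq described for the second project can be illustrated with a small forward simulation. This is a minimal sketch under our own assumptions, not the CDSeq implementation: it does not include the Gibbs sampler or the estimation of the number of cell types, and it replaces the Poisson component with a fixed, hypothetical RNA-yield factor per cell type (`rna_per_cell`). The function name and all numbers are invented for illustration.

```python
import random


def simulate_bulk_counts(profiles, cell_props, rna_per_cell, n_reads, rng):
    """Forward-simulate read counts per gene for one bulk sample.

    profiles[k][g] : probability that a read from cell type k maps to gene g
                     (each row sums to 1)
    cell_props[k]  : fraction of cells of type k in the sample (sums to 1)
    rna_per_cell[k]: relative RNA yield of one cell of type k; a fixed,
                     hypothetical stand-in for the Poisson component
    """
    # Convert cell fractions into read fractions: larger cells yield more RNA.
    weights = [p * r for p, r in zip(cell_props, rna_per_cell)]
    total = sum(weights)
    read_fracs = [w / total for w in weights]

    # Gene probabilities of the mixture: weighted sum of cell-type profiles.
    n_genes = len(profiles[0])
    gene_probs = [
        sum(f * prof[g] for f, prof in zip(read_fracs, profiles))
        for g in range(n_genes)
    ]

    # One multinomial draw of n_reads reads over the genes.
    counts = [0] * n_genes
    for g in rng.choices(range(n_genes), weights=gene_probs, k=n_reads):
        counts[g] += 1
    return counts, read_fracs, gene_probs


# Toy example: two cell types, three genes (all values invented).
profiles = [[0.7, 0.2, 0.1],
            [0.1, 0.3, 0.6]]
counts, read_fracs, gene_probs = simulate_bulk_counts(
    profiles, cell_props=[0.5, 0.5], rna_per_cell=[1.0, 3.0],
    n_reads=10_000, rng=random.Random(0))
print(counts, read_fracs)
```

In this toy setting the two cell types are equally abundant, yet the second type contributes 75% of the reads because it yields three times as much RNA per cell. This is why a model of this kind needs to distinguish cell proportions from read proportions.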