My long-term goal is to become an independent researcher who investigates the functional impact and roles of genetic alterations in the development of human cancer, in order to discover novel drug targets and to guide drug development, clinical trials, and treatment planning. In the next few years, I will focus on somatic genome rearrangements in human cancer. With decreasing sequencing costs, profiling the entire genome of a cancer patient may become routine in the near future for clinical diagnosis and personalized treatment planning. Identification of disease-causing single nucleotide mutations and copy number alterations from whole-genome and exome sequencing has become a relatively mature process. However, structural variations (SVs), which constitute another important type of variation in the human genome and an important driver of cancer, have not been extensively studied because of our inability to identify them with high sensitivity and specificity, and their roles in tumor initiation, progression, and metastasis are largely unknown. With the large-scale cancer genome sequencing data generated by The Cancer Genome Atlas (TCGA) and the International Cancer Genome Consortium (ICGC), it is possible to identify SVs with greater confidence and to better understand the functional impact of those events. Here, we propose to develop a set of sophisticated computational methods to analyze the functional impact of somatic SVs in human cancer and to infer tumor evolution based on their allele frequency, recurrence level, breakpoint distribution, clonality, and associations with other types of alterations. TCGA and ICGC have produced 2,500 high-coverage whole-genome sequencing tumor/normal pairs, the largest cancer genome sequencing dataset to date. We will use the high-confidence somatic genome rearrangements produced by the consortium for the research in this application.
In Aim 1, we will identify driver fusions based on recurrence and develop a novel method to detect activating fusions using RNA-seq data. We will also model the distribution of SV breakpoints as a function of genomic features such as GC content, replication timing, and chromatin state to distinguish driver events. In Aim 2, we will focus on rearrangements involving enhancers and identify oncogenes that are activated by the relocation of distal enhancers. SV breakpoint distribution, gene expression levels, and chromatin interactions will be used to identify cancer-driving enhancer-gene pairs. Finally, in Aim 3, we will predict primary and secondary driver genes by differentiating clonal and sub-clonal SVs, and study their roles in tumor evolution through an integrative analysis that combines single nucleotide mutations and copy number alterations with somatic SVs. I have extensive training in genetics, genomics, bioinformatics, and statistics, with a Ph.D. in genetics and a Master's degree in statistics from the University of Georgia. For the past five years, I have been studying genomic rearrangements in cancer at Harvard Medical School (HMS). In addition to its scientific aims, this application proposes a comprehensive training program, including further training in the clinical aspects of human cancer and in leadership, communication, and grant-writing skills, to prepare the candidate to become an independent investigator in the fields of bioinformatics and cancer genomics.