Computational methods have become integral to biomedical research. The overall goal is to provide Dr. Ka Yee Yeung (Ph.D. in Computer Science) with the mentored training and research experience needed to transition into an independent, multi-disciplinary investigator in biomedical research. A program comprising mentored research, academic coursework, and a research plan has been designed for this purpose. The mentored research component is guided by mentors and an advisory committee of leading experts in molecular biology, proteomics, medical research, bioinformatics and statistics. The academic coursework component will provide Dr. Yeung with a solid background in molecular biology, cancer biology and statistics. The underlying theme of the research plan is the development of methods and software tools that facilitate the extraction of biological meaning from high-throughput data in the investigation of cancer and other diseases. The major goals of our research plan are the following:

Specific Aim 1: Development of improved algorithms for class prediction and identification of gene markers on microarray data related to hepatocellular carcinoma (HCC) and hepatitis C virus (HCV)-associated liver disease. Two problems have received considerable attention: predicting the diagnostic or prognostic category of a given tissue sample (class prediction), and identifying potential gene markers from microarray data. We will develop improved algorithms for both by taking advantage of the variability over repeated measurements in microarray data.

Specific Aim 2: Development of class prediction and class discovery algorithms for heterogeneous data. We will build on our previous work in cluster analysis and class prediction to develop algorithms that handle data from multiple sources, including microarray data, proteomics data and clinical data.

Specific Aim 3: Development of improved visualization tools. We will develop visualization software that helps biologists apply their biological knowledge and interpret computational results from high-throughput data.

Specific Aim 4: Development of practical guidelines for cluster analysis on microarray data. We will use our in-house database of thousands of microarray experiments to conduct empirical studies from which to develop practical guidelines for cluster analysis.