Most cancers present as complex phenotypes that arise through gene-gene and/or gene-environment interactions. Primary hepatocellular carcinoma (HCC) is an ideal paradigm for investigating complex cancer phenotypes in humans. Molecular studies of genetic alterations in tumors have identified p53 as a tumor suppressor gene commonly altered in HCC. Epidemiologic studies have firmly established chronic hepatitis B virus (HBV) infection and aflatoxin B1 (AFB1) exposure as environmental risk factors. However, the majority of individuals exposed to HBV and AFB1 do not develop HCC. Genetic analysis is being used to assess the role of genes in well-described pathways in the development of HCC. This approach merges gene mapping and candidate locus studies by including as candidates all the members of a pathway. Each gene of interest is "tagged" with multiple polymorphic sites, in or near it, to identify genetic factors modulating the risk of developing HCC among populations exposed to AFB1. The individual members of each family (GSTA1, GSTM1, GSTM3, GSTP, GSTT1, GST12, EPHX1, EPHX2, GSTA4, GSTT2, GSTZ1, STP, COMT, ESD, DTD, CYP, MGST1) have been tagged with new or published polymorphisms, and their role in HCC risk has been examined in a nested case-control population. The loci GSTM1, GSTP, GSTT1, and EPHX1 showed significant association with HCC risk, while the EPHX2 locus was associated with age of onset. When results were stratified by the HBV status of the case, GSTM1 and GSTT1 were associated only in the HBV(+) cases, while GSTP was associated in the HBV(-) cases. These results indicate that these genes are candidates for more detailed functional and genetic analysis. Variation at the 15 candidate cancer susceptibility loci is currently being examined in a large case-control study (n=550 cases and 550 controls).
Genetic information in complex trait analysis may be accessible from the joint study of heritable variation and somatic (tumor) variation in cancer. HCC tumor/normal pairs were examined using a collection of genome-wide simple tandem repeat polymorphism (STRP) markers, candidate loci, and the 1,300 single nucleotide polymorphisms (SNPs) present on the Affymetrix HuSNP chip. These data were analyzed to identify regions of loss of heterozygosity (LOH) and were correlated with gene expression data collected from the same samples using Affymetrix HG-U95A chips containing 12,000 characterized genes. More than 16 LOH signatures of HCC were generated across 22 chromosomes. We found that the number of cancer genes (tumor genes and tumor suppressor genes) was significantly higher in regions of LOH than in regions without LOH. In addition, through phylogeny reconstruction studies, we demonstrated that these LOH signatures correlate significantly with gene expression results, and we identified two LOH signatures, 4q13.3 and 17q11.2, that may be important in generating the HCC LOH signature. This study has now been expanded to include expression data from the Affymetrix HG-U133 chips (45,000 probe sets) and SNP data from the Affymetrix Mapping 10K Array (10,000 SNPs) for refining the regions of LOH. Data have also been generated to investigate the relationship between chromosome copy number and loss of heterozygosity using in-house algorithms and the Affymetrix CCNT tool. Additional experiments are being carried out using the latest Affymetrix SNP6.0 arrays. Each SNP Array 6.0 has over 1.8 million markers for genetic variation (including more than 900,000 SNPs and more than 940,000 copy number probes). Using Affymetrix SNP6.0 arrays, we generated genotyping data from 550 cases and 550 controls. In addition, 20 pairs of tumor/normal liver tissues were analyzed on the same platform. The estimated total number of genotypes is 1.1 billion.
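The idea behind calling LOH from a tumor/normal pair can be sketched as follows. This is a minimal illustration, not the in-house algorithm referred to above: it assumes genotypes are already called as two-letter strings ("AA", "AB", "BB") or None for a no-call, treats a marker as informative only when the normal tissue is heterozygous, and calls LOH when the matched tumor is homozygous at that marker. The function names and marker identifiers are hypothetical.

```python
def loh_calls(normal, tumor):
    """Classify each marker in one tumor/normal pair.

    normal, tumor: dicts mapping a marker id to a genotype string
    ("AA", "AB", "BB") or None for a no-call (assumed encoding).
    Returns a dict mapping marker id to "LOH", "retained",
    or "uninformative".
    """
    calls = {}
    for marker, n_gt in normal.items():
        t_gt = tumor.get(marker)
        # Only markers heterozygous in normal tissue can reveal allele loss.
        if n_gt is None or t_gt is None or n_gt[0] == n_gt[1]:
            calls[marker] = "uninformative"
        elif t_gt[0] == t_gt[1]:
            calls[marker] = "LOH"
        else:
            calls[marker] = "retained"
    return calls


def loh_fraction(calls):
    """Fraction of informative markers showing LOH, or None if none are informative."""
    informative = [c for c in calls.values() if c != "uninformative"]
    if not informative:
        return None
    return sum(c == "LOH" for c in informative) / len(informative)


# Hypothetical marker ids and genotypes, for illustration only.
normal = {"m1": "AB", "m2": "AA", "m3": "AB", "m4": "AB", "m5": None}
tumor = {"m1": "AA", "m2": "AA", "m3": "AB", "m4": "BB", "m5": "AB"}
calls = loh_calls(normal, tumor)
fraction = loh_fraction(calls)
```

In practice, contiguous runs of LOH calls along a chromosome would be merged into regions, and tumor purity and copy-number state would need to be taken into account; neither is attempted here.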
We are currently planning the following analyses, which involve an iterative process of querying the database, storing the results, and refining the query based on the analytical results. These analyses include: a) Odds ratios between cases and controls to identify disease-associated SNPs. We will probably include only high-quality SNPs in this query (for example, SNPs with genotype calls in at least 85% of the samples, a minor allele frequency above 10%, and genotype quality above a set threshold). This query is likely to be performed across all 1.1 billion genotype rows. b) Identification of the genes underlying the most strongly associated SNPs, followed by retrieval of genotypes in these genes to construct haplotypes and LD bins for more structured analyses such as haplotype clade analysis. c) Identification of allelic interactions across SNPs. This will require evaluating genetic risk using multiple SNPs as a single unit for risk assessment. d) Test and validation analysis. The initial association study will be performed using 2/3 of the samples as training data to identify disease-associated variation. The results will be tested in the remaining 1/3 of the samples. The training and test samples should have comparable genetic profiles and clinical features. Test and validation can be performed using a single SNP or multiple SNPs as the disease predictor. e) For tumor/normal paired liver tissues, identification of genetic abnormalities including loss of heterozygosity and copy-number variation. These data will be compared with the expression data that we have generated to evaluate the correlation between genetic alteration and expression change. Data collected on gene expression, candidate loci, and somatic allele loss will be integrated via hierarchical clustering of the expression data and correlation of the resulting clusters with variation at candidate susceptibility loci.
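Steps (a) and (d) above can be sketched for a single SNP as follows. This is an illustrative outline under stated assumptions, not the project's actual pipeline: genotypes are assumed to be coded as the number of copies of one allele (0, 1, or 2) with None for a no-call, the odds ratio is computed on allele counts with a Woolf 95% confidence interval, and the 2/3 versus 1/3 split is a simple seeded shuffle that does not yet balance clinical features. The 85% call rate and 10% allele frequency thresholds come from the text; all function names are hypothetical.

```python
import math
import random


def call_rate(genotypes):
    """Fraction of samples with a genotype call."""
    if not genotypes:
        return 0.0
    return sum(g is not None for g in genotypes) / len(genotypes)


def allele_counts(genotypes):
    """Return (count of allele A, count of allele B) over called samples."""
    called = [g for g in genotypes if g is not None]
    b = sum(called)
    return 2 * len(called) - b, b


def passes_qc(cases, controls, min_call_rate=0.85, min_maf=0.10):
    """Apply the call-rate and minor-allele-frequency filters to one SNP."""
    combined = list(cases) + list(controls)
    if call_rate(combined) < min_call_rate:
        return False
    a, b = allele_counts(combined)
    if a + b == 0:
        return False
    return min(a, b) / (a + b) > min_maf


def allelic_odds_ratio(cases, controls):
    """Allelic odds ratio for allele B with a Woolf 95% confidence interval."""
    case_a, case_b = allele_counts(cases)
    ctrl_a, ctrl_b = allele_counts(controls)
    cells = [case_b, case_a, ctrl_b, ctrl_a]
    if 0 in cells:
        # Haldane-Anscombe correction to avoid division by zero.
        cells = [c + 0.5 for c in cells]
    cb, ca, tb, ta = cells
    odds_ratio = (cb * ta) / (ca * tb)
    se = math.sqrt(1 / cb + 1 / ca + 1 / tb + 1 / ta)
    lower = math.exp(math.log(odds_ratio) - 1.96 * se)
    upper = math.exp(math.log(odds_ratio) + 1.96 * se)
    return odds_ratio, lower, upper


def split_train_test(sample_ids, train_fraction=2 / 3, seed=0):
    """Shuffle sample ids reproducibly and split into training and test sets."""
    shuffled = list(sample_ids)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


# Made-up genotypes for one SNP, for illustration only.
cases = [2, 2, 1, 1, 1, 0, 1, 2, 1, 1]
controls = [0, 0, 1, 0, 1, 0, 0, 1, 0, 1]
keep = passes_qc(cases, controls)
odds_ratio, lower, upper = allelic_odds_ratio(cases, controls)
train_ids, test_ids = split_train_test(range(550))
```

Across hundreds of thousands of SNPs, the same filter and test would be applied per marker with a correction for multiple testing, and the split would be applied separately to cases and controls so that both sets keep the same case-to-control ratio.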
This information will be used to develop, test, and validate laboratory strategies for pathway models of the cancer/normal cell. To validate the pathway models, siRNA experiments will be carried out to knock down targeted genes involved in these pathways.