Mass spectrometry (MS) holds promise as a noninvasive screening tool for easily accessible fluids such as plasma, serum, and urine. Characterizing the peptides in these biological fluids is a promising strategy for biomarker discovery. However, peptide profiles obtained with current mass spectrometric methods are high-dimensional and show complex patterns with a substantial amount of noise. Biological variability and disease heterogeneity in human samples from diverse populations add to the complexity of the problem. Thus, in addition to innovative analytical methods for sample preparation, peptide identification, and validation, robust computational methods are needed to select useful peptide markers. This collaborative project brings together experts in bioinformatics, biostatistics, proteomics, and mass spectrometry to develop analytical tools that address these challenges. The specific aims are: (1) To develop fuzzy logic-based methods to detect and calibrate MS peaks. Our peak detection method will identify peaks consistently with those detected manually by MS experts. Peaks will be calibrated to account for isotopic distributions and instrument drift. (2) To investigate machine learning-based peak selection methods that take into account the biological variability and disease heterogeneity of the human population. Spike-in and simulation studies will be conducted to obtain spectra whose true inputs are known. The spectra from these studies will be used to optimize our peak detection, calibration, and selection methods and to compare them with existing solutions. The optimized analytical tools will be applied to find and validate markers that detect hepatocellular carcinoma (HCC) at a treatable stage.
Serum samples collected from cirrhotic and HCC patients, as well as healthy controls, in Egypt, the United States, and Thailand will be used in this study. Mass spectra will be generated by matrix-assisted laser desorption/ionization time-of-flight (MALDI-TOF) MS of enriched low-molecular-weight (LMW) serum fractions of the samples. From these spectra, the most useful panel of peaks will be identified using the proposed peak detection, calibration, and selection methods. The selected peaks will be sequenced to identify the peptides they represent. Finally, the identity of the peptides and their ability to detect HCC will be examined by isotope dilution with synthesized 13C-labeled peptide standards. The synergistic interaction of diverse disciplines contributes to the intellectual merit of this project, leading to analytical tools that will make scientific knowledge discovery more efficient. The analytical tools developed in this project will be useful for other biomarker discovery studies that require the analysis of high-dimensional mass spectral data. The tools will be freely available (open source) to mass spectrometry users.
PUBLIC HEALTH RELEVANCE: A diagnostic test that detects hepatocellular carcinoma (HCC) at a treatable stage would be of great benefit. In particular, defining clinically applicable biomarkers that detect early-stage HCC in a high-risk population of cirrhotic patients has potentially far-reaching consequences for disease management and patient health. This project is important because most HCC patients present with advanced-stage disease and a poor prognosis. There is a pressing need to identify biomarkers of HCC that could be used for early detection and more accurate classification of disease. This project will lead to the development of analytical tools to find and validate candidate peptide biomarkers for early diagnosis from high-dimensional MALDI-TOF spectra of low-molecular-weight serum fractions.
In addition to screening high-risk populations for early signs of disease, the resulting biomarkers could be used to design and test improved treatment strategies.