C5. PROTEOMICS CORE (Core #5)

ISB is a world leader in the development and application of proteomics technologies, which include tools to perform quantitative mass spectrometry (MS) on proteomes [1] [2] and subproteomes [3-7], tools for analysis of protein post-translational modifications [8, 9], and software for rapid evaluation and validation of proteomic datasets [10-12]. The Proteomics Facility has provided state-of-the-art proteomics analysis to the ISB and to collaborators worldwide, as evidenced by over 140 proteomics papers published in collaboration with multiple investigators since 2000. Many of these studies were made possible by the Proteomics Facility. Indeed, the development of some emerging proteomics technologies, such as glycocapture (see below), was driven by the need to analyze serum samples from normal individuals and cancer patients (Aebersold, personal communication).

The NSBCC will include a Proteomics Core for implementing mass spectrometry-based technologies for obtaining accurate, comprehensive, and dynamic information about proteomes, subproteomes, and individual proteins. The Core will play a particularly important role in Projects 1, 2, and 4. The Core will be housed in the ISB Proteomics Facility. The Facility is currently well equipped with state-of-the-art instrumentation (see Resources) and is managed by Dr. Dan Martin, who oversees three full-time employees. The tight integration of this facility with cancer-driven research projects will continue to promote high-quality data acquisition and rapid implementation of new technologies. The following section describes mature proteomics technologies that have been integrated into the Proteomics Facility.

C5.1 Mature Proteomics Infrastructure and Technologies implemented at the Proteomics Core

C5.1.1. Multi-dimensional separation of proteins/peptides for increased proteome coverage.
To increase cellular and blood proteome coverage from normal and cancer tissues, we have developed a high-resolution two-dimensional (2D) method that exploits free flow electrophoresis-isoelectric focusing and reversed-phase high pressure liquid chromatography (FFE-IEF/RP-HPLC) to fractionate complex mixtures of proteins and peptides based on isoelectric point (pI) and hydrophobicity, respectively [13]. It is well recognized that extensive fractionation of protein digests can increase the number of peptides that can be identified in complex samples by MS [14] [15]. In particular, fractionation can help to overcome duty cycle limitations of electrospray ionization mass spectrometers during real-time analyses. It can also aid in the identification of low-abundance peptides, or of peptides that ionize with poor efficiency, by minimizing ion suppression effects of co-purifying peptides [16-18]. The separation capabilities of the 2D FFE-IEF/RP-HPLC system for cellular protein extracts and protein digests are complementary to those of many 2D gel-based systems but, unlike these systems, the method is not limited in sample loadability and is ideally suited for basic proteins. Further, since peptides fractionated by FFE-IEF are in a liquid medium, they can be readily analyzed by MS for accurate mass determination or identified by MS/MS. We will use this system to create Annotated Peptide Databases for the rapid and accurate identification of peptides (see Creation of Annotated Peptide Databases below).

C5.1.2. Qualitative protein complex characterization by mass spectrometry.

One of the primary services of the Proteomics Facility is to determine the composition of affinity-purified protein complexes and to identify sites of post-translational modification (PTM), analytic techniques relevant to comparing normal and neoplastic samples. We routinely use a number of affinity isolation strategies, including epitope tagging/immunopurification [19-24], for isolation of protein complexes.
Samples prepared by in-gel protease digestion or by solution digestion are analyzed in a high-throughput mode by automated microcapillary reversed-phase electrospray ionization tandem MS (μLC-ESI-MS/MS) with ion trap and quadrupole time-of-flight (QTOF) instruments. MS data are analyzed by a suite of software tools, such as SEQUEST [25] for identifying proteins and PTMs, and the ISB-developed PeptideProphet [11] and ProteinProphet [26] for statistical analysis and validation of protein identifications (see Open source software below). In collaboration with the Computational Core at ISB, results from the iterative analysis of purified protein complexes are visualized as protein interaction maps using novel plugins for Cytoscape, a graphical network software package (see Computation Core).

C5.1.3. Protein expression profiling by mass spectrometry-based quantitative proteomics.

The utility of standard MS datasets is limited by the fact that they are not quantitative and, importantly, the reproducibility of shotgun proteomics experiments can be as low as thirty percent [5]. The development of isotope-coded affinity tag (ICAT) reagents [1] ushered in a new era of mass spectrometry. Combined with the resolving power of multi-dimensional chromatography and tandem MS, the use of ICAT reagents made it possible to routinely perform quantitative comparisons of the relative abundances of proteins in two or more complex samples. This technology is essential for performing global protein expression profiling experiments, and these datasets are indispensable for comprehensive network analysis and modeling. As an exemplar of ISB industrial partnerships, the ICAT technology has been commercialized by Applied Biosystems, Inc. (ABI). In fact, ABI has built upon the ICAT technology to develop amine-reactive iTRAQ reagents.
Since amine groups are more abundant than cysteine sulphydryl groups, iTRAQ enables quantitative analysis of more peptides than ICAT, thereby improving the accuracy of quantification. Importantly, the reagents come in four varieties that enable analysis of up to four differently labeled samples in a single MS experiment. Finally, the reagents are isobaric, and quantification is based on detection of reporter ions that are produced during MS/MS. This may improve detection and quantification of peptides because 1) the signal for a peptide may be increased, since it is a combination of the signals from each sample being analyzed, 2) the MS spectrum is simplified due to the isobaric nature of the reagents, and 3) quantification does not rely on the reconstruction of ion chromatograms. Recent improvements to ICAT technology include a solid-phase approach for capturing and isotopically labeling cysteinyl peptides from complex mixtures [8]. The solid-phase reagent contains a sulphydryl-reactive group connected via an isotopic tag to a glass bead-attached photocleavable linker. Compared with the original ICAT method, the solid-phase method for stable isotope tagging of peptides is relatively economical, simpler, more efficient, and more sensitive [8]. The solid-phase reagent is being used for applications requiring greater sensitivity, such as protein complex analysis, and we will be exploring its utility for Projects 1 and 4, as well as other internal ISB projects.

C5.1.4. Identification of post-translationally modified proteins.

Post-translational modification (PTM) of proteins plays a key role in the control of a wide range of biological functions and activities, including those involved in cancer.
Thus, development of technologies for the rapid and conclusive identification of PTMs is essential, and we have made significant contributions to the field with the development of technologies for the isolation and identification of phosphopeptides [4] and glycopeptides [9]. Direct in vivo determination of individual phosphorylation sites in proteins is difficult, typically requiring purification to homogeneity of the phosphoprotein of interest. Our approach consists of three steps: (1) selective phosphopeptide isolation from a peptide mixture via a series of chemical reactions, (2) phosphopeptide analysis by μLC-MS/MS, and (3) identification of the phosphoprotein and the phosphorylated residue(s) by sequence database searching. By incorporating stable isotope tags into this approach, we can also detect quantitative changes in the phosphorylation state of proteins. We will examine key diagnostic markers in the blood for changes in PTM as well as in expression levels. The method for identifying N-linked glycosylation sites in proteins is based on the conjugation of glycoproteins to a solid support using hydrazide chemistry, stable isotope labeling of glycopeptides, and the specific release of formerly N-linked glycopeptides by peptide N-glycosidase F. The recovered peptides are then identified and quantified by MS/MS. This technology is also being used for reducing sample complexity before profiling peptide mixtures. Application of these technologies to detect changes in the modification state of proteins during biological processes will greatly enhance our understanding of these events. The glycopeptide capture technology in particular is potentially very useful for blood proteomics and early cancer detection (see Research Plan).

C5.1.5. Open source software development.

A significant bottleneck in proteomics experiments is the analysis and validation of results.
At ISB we have significantly reduced this bottleneck through the development of a suite of computational tools for rapid analysis and validation of proteomics datasets. These tools include PeptideProphet [26], ProteinProphet [11], Xpress [4], and ASAPratio [12]. PeptideProphet permits statistical validation of peptide assignments to MS/MS spectra made by database search tools such as SEQUEST, and ProteinProphet uses a statistical model to infer the identities of sample proteins given the list of identified peptides. Together, these methods allow filtering of large-scale proteomics datasets with increased sensitivity and reduced false-positive identification error rates. In addition, they provide a standard for publishing large-scale protein identification datasets in the literature and for comparing the results obtained from different experiments. Xpress and ASAPratio automatically calculate protein abundance ratios from data generated by stable isotope tagging and tandem MS. ASAPratio also provides a statistical assessment of significant changes in protein abundance. Collectively, these tools have proven to be essential for rapidly and accurately analyzing large-scale quantitative proteomics datasets. A complete list of ISB software can be accessed at: http://www.systemsbiology.org/Default.aspx?pagename=proteomicssoftware.

We recently described the mapping of peptides derived from accurate interpretations of protein tandem MS data to eukaryotic genomes and the generation of an expandable resource for integration of data from many diverse proteomics experiments [27]. This publicly accessible resource, called PeptideAtlas (http://www.peptideatlas.org/), will prove useful for verification and functional annotation of predicted genes and their protein products.
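
To illustrate how probability-based validation supports the filtering of large datasets described above, the following minimal Python sketch assumes that each peptide assignment carries a probability of being correct, of the kind a statistical validation tool such as PeptideProphet reports. It is an illustration under that assumption, not ISB code: the names PeptideId and filter_by_probability are hypothetical, and the example values are invented. For a chosen probability cutoff, the expected error rate of the accepted set is estimated as the mean of (1 - p), and sensitivity as the accepted share of the summed probabilities.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class PeptideId:
    """Hypothetical record for one peptide-to-spectrum assignment."""
    peptide: str
    probability: float  # assumed probability that the assignment is correct


def filter_by_probability(
    ids: List[PeptideId], min_probability: float
) -> Tuple[List[PeptideId], float, float]:
    """Keep assignments at or above the cutoff and estimate error and sensitivity.

    Returns (accepted, estimated_error_rate, estimated_sensitivity).
    """
    accepted = [i for i in ids if i.probability >= min_probability]
    if not accepted:
        return accepted, 0.0, 0.0

    # Each accepted assignment is wrong with probability (1 - p), so the
    # expected number of incorrect assignments is the sum of (1 - p).
    expected_false = sum(1.0 - i.probability for i in accepted)
    error_rate = expected_false / len(accepted)

    # The expected number of correct assignments in the whole dataset is the
    # sum of all probabilities; sensitivity is the fraction that is retained.
    expected_correct_total = sum(i.probability for i in ids)
    expected_correct_kept = sum(i.probability for i in accepted)
    sensitivity = (
        expected_correct_kept / expected_correct_total
        if expected_correct_total > 0
        else 0.0
    )
    return accepted, error_rate, sensitivity


if __name__ == "__main__":
    # Invented example values, for illustration only.
    example = [
        PeptideId("PEPTIDEA", 0.99),
        PeptideId("PEPTIDEB", 0.95),
        PeptideId("PEPTIDEC", 0.60),
        PeptideId("PEPTIDED", 0.10),
    ]
    kept, err, sens = filter_by_probability(example, 0.90)
    print([k.peptide for k in kept], round(err, 3), round(sens, 3))
```

Raising the cutoff lowers the estimated error rate at the cost of sensitivity, which is the trade-off an investigator would weigh when choosing a threshold for a published dataset.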