We are working on several proteomics bioinformatics tools: an open-source tool set for mining proteomics expression data (Open2Dprot), which handles n-dimensional protein data from various sources (2D-PAGE gels, 2D LC-MS, etc.); an improved method for visually comparing 2D gel protein samples and making putative spot identifications (Flicker); and a virtual protein visualization tool for investigating CGAP expression data (ProtPlot).

Open2Dprot - the open 2D proteomics expression n-dimensional data analysis project

This is a new effort. Open2Dprot is a fully open-source community proteomics project and Web site for developing tools for analyzing protein expression from a variety of protein separation methods. Data is analyzed through an analysis pipeline that includes: accession of sample data, spot detection in samples (using image segmentation, peak clustering or other methods), spot pairing with respect to a reference sample, assembly of paired spot data into a composite spot database stored in a relational database, and data mining of subsets of that database using both Java and R language analyses. Data sources will include 2D-PAGE (polyacrylamide gel electrophoresis), 2D LC-MS, protein arrays, etc. Because different separation sources require different analysis methods, the pipeline is being designed so that particular steps can be assigned dynamically depending on the data source type. We are working with international groups that are developing proteomics bioinformatics database schema standards for experiment information and protein expression for these types of sample separations (MIAPE). Our goal is to develop tools compatible with these emerging proteomics database standards so that data and analysis software can be exchanged.
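
The idea of assigning pipeline steps according to the data source type can be sketched in Java, the language the released tools are written in. This is only an illustration and not the Open2Dprot code: the class, enum and step names are hypothetical, and the pairing of image segmentation with 2D-PAGE and peak clustering with 2D LC-MS is an assumption made for the example.

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not the Open2Dprot API: every name here is invented.
class PipelineSketch {

    // Separation methods mentioned in the text.
    enum SourceType { PAGE_2D, LC_MS_2D, PROTEIN_ARRAY }

    // Assumed mapping from data source type to a spot detection method.
    private static final Map<SourceType, String> DETECTORS =
            new EnumMap<>(SourceType.class);
    static {
        DETECTORS.put(SourceType.PAGE_2D, "image-segmentation");
        DETECTORS.put(SourceType.LC_MS_2D, "peak-clustering");
    }

    // Build the ordered list of step names for one data source type.
    static List<String> stepsFor(SourceType type) {
        List<String> steps = new ArrayList<>();
        steps.add("accession");
        // Only the detection step varies in this sketch; a source type with
        // no registered detector falls back to the placeholder "other".
        steps.add("spot-detection:" + DETECTORS.getOrDefault(type, "other"));
        steps.add("spot-pairing");
        steps.add("composite-database");
        steps.add("data-mining");
        return steps;
    }

    public static void main(String[] args) {
        // Print the step list chosen for each source type.
        for (SourceType type : SourceType.values()) {
            System.out.println(type + " -> " + stepsFor(type));
        }
    }
}
```

Keeping the choice in a lookup table means that a new separation method could be supported by registering one more entry, without changing the code that runs the other steps.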
As an open-source tool set, it will be able to incorporate new academic (and other) advances in quantitative and qualitative protein expression analysis, including interaction with various Internet proteomic databases and systems biology databases. A Web site with more details has been set up at http://open2dprot.sourceforge.net/. As parts of the pipeline become usable, they are made available on the Web site (see Subprojects). Currently the spot segmenter and spot pairing programs are available. These programs are written in Java and run on most computers.

Flicker interactive comparison of biological samples across the Internet

This effort is a minor update to a previous subproject, the Flicker program (Lemkin PF, Thornwall G (1999) Mol Biotechnol. 12(2): 159-172). Scientists around the world often work on similar data, so the need to share results and compare data arises periodically. Flicker is a Java implementation of a flicker-comparison method for comparing 2D gel images of two similar samples created in different laboratories, to help putatively identify protein spots in users' 2D gels. This is done indirectly: the user's gel is matched against a reference map gel, the selected spots are looked up at federated reference gel servers such as SWISS-2DPAGE, and that data is then used to access the PIR UniProt, iProClass, and iProLink servers. Users may simply add directories of their own images to compare. The Flicker program includes zoom, brightness-contrast, image warping and other image transforms to help make the images more comparable. Users may create lists of spots on each of two gels and then manually pair the spots after flicker comparison. If one of these gels is a Web reference gel (such as one from SWISS-2DPAGE), Flicker can automatically look up the protein IDs and names for the selected spots. The program is available at http://open2dprot.sourceforge.net/Flicker. Flicker is written in Java and runs on most computers.

ProtPlot data mining tool for virtual protein expression patterns

ProtPlot is an open-source Java-based data-mining software tool for virtual 2D gels, developed as part of the TMAP (Tissue Molecular Anatomy Program) (Medjahed D, et al. (2003) Proteomics 3(8): 1445-1453). It may be downloaded from http://tmap.sourceforge.net/ and run as a stand-alone application on your computer. Its exploratory data analysis environment provides tools for data-mining quantified virtual 2D gel (pI, Mw, expression) data, where expression is estimated from the CGAP EST mRNA tissue expression database. This lets you look at the aggregated data in new ways. For example, which estimated "proteins" are in a specified range of (pI, Mw)? Which sets of estimated "proteins" are up or down regulated or missing between cancer samples and normal samples? Which sets of "proteins" cluster together across different types of cancers or normals? Here, one may aggregate several different normal and several different cancer samples as well as specify other filtering criteria.

As is well known, mRNA expression generally does not correlate well with protein expression as seen in 2D-PAGE gels (Ideker et al., Science 292: 929-934, 2001). However, some new insights may occur by viewing the transcription data in the protein domain. If actual protein expression data is available for some of these tissues, it might be useful to compare mRNA estimated expression and actual protein expression. This tool may help find those proteins with similar expression and those that have quite different expression.
This might be useful in thinking about new hypotheses for protein post-translational modifications or mRNA post-transcriptional processing.

ProtPlot generates an interactive virtual protein 2D-gel map scatterplot based on a database of derived maximum EST expression over a variety of tissue types. The data was obtained from the NCI-NCBI CGAP EST database of human normal, precancer and cancer mRNA expression. (CGAP is the NCI's Cancer Genome Anatomy Project, http://cgap.nci.nih.gov/. An EST is an Expressed Sequence Tag of an mRNA found in particular tissues.) The EST hit rate is a rough estimate of gene expression. These ESTs were mapped to SWISS-PROT (expasy.org) accession numbers and IDs. The Mw and pI estimates were then computed and used as estimates for the corresponding proteins in a pseudo 2D-gel.
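
Two of the questions posed above, which estimated "proteins" fall in a given (pI, Mw) range and which are up or down regulated or missing between cancer and normal samples, could be expressed roughly as follows. This is a sketch under stated assumptions and not ProtPlot code: the record layout, the fold-change rule, the spot identifiers and all numeric values are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not ProtPlot code: layout and thresholds are assumptions.
class VirtualGelSketch {

    // One estimated "protein": an identifier, estimated pI and Mw, and
    // EST-derived expression estimates for aggregated cancer and normal samples.
    static final class Spot {
        final String id;
        final double pI;
        final double mw;
        final double cancerExpr;
        final double normalExpr;

        Spot(String id, double pI, double mw, double cancerExpr, double normalExpr) {
            this.id = id;
            this.pI = pI;
            this.mw = mw;
            this.cancerExpr = cancerExpr;
            this.normalExpr = normalExpr;
        }
    }

    // Spots whose (pI, Mw) lies inside the given inclusive rectangle.
    static List<Spot> inRange(List<Spot> spots, double pIMin, double pIMax,
                              double mwMin, double mwMax) {
        List<Spot> hits = new ArrayList<>();
        for (Spot s : spots) {
            if (s.pI >= pIMin && s.pI <= pIMax && s.mw >= mwMin && s.mw <= mwMax) {
                hits.add(s);
            }
        }
        return hits;
    }

    // Assumed rule: a spot counts as changed when it is missing in exactly one
    // condition, or when cancer/normal is >= minFold (up) or <= 1/minFold (down).
    static boolean isChanged(Spot s, double minFold) {
        if (s.cancerExpr == 0.0 && s.normalExpr == 0.0) {
            return false; // no expression estimate in either condition
        }
        if (s.cancerExpr == 0.0 || s.normalExpr == 0.0) {
            return true; // missing in one condition
        }
        double ratio = s.cancerExpr / s.normalExpr;
        return ratio >= minFold || ratio <= 1.0 / minFold;
    }

    // Spots that satisfy isChanged, in input order.
    static List<Spot> changed(List<Spot> spots, double minFold) {
        List<Spot> hits = new ArrayList<>();
        for (Spot s : spots) {
            if (isChanged(s, minFold)) {
                hits.add(s);
            }
        }
        return hits;
    }

    // Made-up example data; the identifiers and values are not real.
    static List<Spot> demoSpots() {
        List<Spot> spots = new ArrayList<>();
        spots.add(new Spot("A", 5.0, 30000.0, 8.0, 2.0));
        spots.add(new Spot("B", 6.5, 45000.0, 3.0, 3.0));
        spots.add(new Spot("C", 9.0, 12000.0, 0.0, 5.0));
        return spots;
    }

    public static void main(String[] args) {
        List<Spot> spots = demoSpots();
        for (Spot s : inRange(spots, 4.0, 7.0, 20000.0, 50000.0)) {
            System.out.println("in range: " + s.id);
        }
        for (Spot s : changed(spots, 2.0)) {
            System.out.println("changed: " + s.id);
        }
    }
}
```

A real analysis would also have to decide how to aggregate several cancer or normal tissues into the two expression values used here; that choice is left out of the sketch.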