We have been working on several proteomics bioinformatics tools: an open-source proteomics expression data mining tool set (Open2Dprot) for n-dimensional protein data from various sources (2D PAGE gels, 2D LC-MS, protein arrays, etc.); an improved method for visually comparing 2D gel protein samples and making putative spot identifications (the Flicker program), with enhancements for characterizing nanoparticles; and a recently started investigation of subtle properties of 2D DIGE images (the CmpDIGE program) that may help us characterize changes between similar proteomes, and then model and identify post-translational protein modifications of relatively small size. <H3>Open2Dprot - the open 2D proteomics expression n-dimensional data analysis project</H3> Open2Dprot is a fully open-source community proteomics project and Web site for developing tools for analyzing protein expression data from a variety of protein separation methods. Data are analyzed through a pipeline that includes: accession of sample data, spot detection in samples (using image segmentation, peak clustering or other methods), registration between samples where required, spot pairing with respect to a reference sample where required, assembly of paired spot data into a composite spot database stored in a relational database, and data mining of subsets using analyses written in both Java and R. All data files are now fully XML compatible. We have completed the initial design of the Open2Dprot pipeline data analyzer and pipeline process scheduler and have implemented a major portion of the code using dynamic methods. Data sources will include 2D-PAGE (polyacrylamide gel electrophoresis), 2D LC-MS, protein arrays, etc. Because different separation sources require different analysis methods, the pipeline is designed so that particular steps in the pipeline can be assigned dynamically depending on the data source type.
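As a rough illustration of assigning pipeline steps by data source type, the following Java sketch builds a step list for each source. It is not the Open2Dprot code: the class, enum and method names are hypothetical, and both the mapping of source types to spot detection methods and the choice to skip registration and pairing for protein arrays are assumptions made only for this example.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of assigning pipeline steps by data source type. */
final class PipelineSketch {

    /** Source types named in the text; this enum is an assumption. */
    enum SourceType { GEL_2D_PAGE, LC_MS_2D, PROTEIN_ARRAY }

    /** A named step; a real step would also transform the sample data. */
    static final class Step {
        final String name;
        Step(String name) { this.name = name; }
    }

    /** Picks a spot detection step for the source (the mapping is assumed). */
    static Step detectorFor(SourceType type) {
        switch (type) {
            case GEL_2D_PAGE:
                return new Step("spot detection (image segmentation)");
            case LC_MS_2D:
                return new Step("spot detection (peak clustering)");
            default:
                return new Step("spot detection (other method)");
        }
    }

    /** Assembles the ordered steps, adding optional ones only where required. */
    static List<Step> assemble(SourceType type) {
        List<Step> steps = new ArrayList<>();
        steps.add(new Step("accession"));
        steps.add(detectorFor(type));
        // Assumption: arrays have a fixed layout and need no registration or pairing.
        if (type != SourceType.PROTEIN_ARRAY) {
            steps.add(new Step("registration"));
            steps.add(new Step("spot pairing"));
        }
        steps.add(new Step("composite database assembly"));
        steps.add(new Step("data mining"));
        return steps;
    }

    /** Returns the step names in execution order. */
    static List<String> stepNames(SourceType type) {
        List<String> names = new ArrayList<>();
        for (Step step : assemble(type)) {
            names.add(step.name);
        }
        return names;
    }

    public static void main(String[] args) {
        for (SourceType type : SourceType.values()) {
            System.out.println(type + ": " + stepNames(type));
        }
    }
}
```

Keeping the source-dependent choice in one place (here `detectorFor`) means a new separation method could be supported by adding one case rather than changing the rest of the pipeline.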
We are working with international groups developing proteomics bioinformatics database schema standards for experiment information and protein expression for these types of sample separations (MIAPE/GelML XML standards). Our goal is to develop tools compatible with these emerging proteomics database standards so that data and analysis software can be exchanged. As an open-source tool set, it will be able to incorporate new academic (and other) advances in quantitative and qualitative protein expression analysis - including interaction with various Internet proteomic databases and systems biology databases. A Web site has been set up at <U>http://open2dprot.sourceforge.net/</U> that goes into more detail. As parts of the pipeline software become usable, they are made available on the Web site (see Subprojects). These programs are written in Java and run on most computers. <H3>Flicker interactive comparison of biological samples across the Internet</H3> Flicker, <U>http://open2dprot.sourceforge.net/Flicker</U>, is a stand-alone software program to facilitate the interactive comparison of biological samples across the Internet. It is a major update of the previous subproject, the Flicker applet program: Lemkin PF, et al. (1999) <I>Mol Biotechnology</I> <B>12</B>(2): 159-172. The new program is described in Lemkin PF, et al. (2005) (12 MB PDF) Comparing 2-D Electrophoretic Gels Across Internet Databases. In <U>The Proteomics Handbook</U>, JM Walker (Ed), Humana Press Inc, Totowa, NJ, pp 279-305. Scientists around the world often work on similar data, so the need to share results and compare data arises periodically. Flicker is a Java implementation of a flicker-comparison method for comparing 2D gel image data of two similar samples created in different laboratories, to help putatively identify protein spots in users' 2D gels.
This is done indirectly by matching the user's gels against reference map gels on federated reference gel servers such as SWISS-2DPAGE, looking up the selected spots there, and then using that data to access the PIR UniProt, iProClass, and iProLink servers. Users may simply add directories of their own images to compare. The Flicker program includes zoom, brightness-contrast, image warping and other image transforms to help make the images more comparable. An image guard region was added so that the corners of 2D gels can be compared, as may be needed for some types of gels with sparse spot populations, including those used to characterize nanoparticles. Users may create lists of spots on each of two gels and then manually pair the gels after flicker comparison. If one of these gels is a Web reference gel (such as one from SWISS-2DPAGE), the program can automatically look up the protein IDs and names for the spots the user selected. We are adding databasing capabilities to allow users to build their own local database of known proteins in gel data for which no Web reference gels exist. This can facilitate using the program to make quick putative identifications within a laboratory. The program is available at the Web site. Flicker is written in Java and runs on most computers. <H3>CmpDIGE tool for characterizing DIGE images</H3> We are developing the CmpDIGE Java application tool for investigating subtle properties of 2D DIGE images from different (but similar) proteomes with the goal of modeling these changes and possibly identifying post-translational protein modifications of relatively small size. <H3>ProtPlot data mining tool for virtual protein expression patterns</H3> ProtPlot is an open-source Java-based data-mining software tool for virtual 2D gels, developed as part of the TMAP (Tissue Molecular Anatomy Program): Medjahed D, et al. (2003) <I>Proteomics</I> <B>3</B>(8): 1445-1453.
It may be downloaded from <U>http://tmap.sourceforge.net/</U> and run as a stand-alone application on your computer. An exploratory data analysis environment provides tools (scatter plots, expression profiles, clustering) for the data-mining of quantified virtual 2D gel (pIe, Mw, expression) data of estimated expression from the CGAP EST mRNA tissue expression database. Combined with various data filtering criteria, these tools let you look at the aggregated data in new ways. As is well known, mRNA expression generally does not correlate well with protein expression as seen in 2D-PAGE gels (Ideker et al. <I>Science</I> 292: 929-934, 2001). However, some new insights may occur by viewing the transcription data in the protein domain. If actual protein expression data is available for some of these tissues, it might be useful to compare mRNA estimated expression and actual protein expression. This tool may help find those proteins with similar expression and those that have quite different expression. This might be useful in thinking about new hypotheses for post-translational protein modifications or mRNA post-transcription processing. ProtPlot Java software is available at the Web site and could be used with other types of protein expression data.
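The comparison of mRNA estimated expression with actual protein expression described above could be sketched in Java as follows. This is a minimal illustration, not ProtPlot code: the class and method names are hypothetical, the protein identifiers and values in `main` are made up for the example, and the sketch assumes both data sets have already been scaled to comparable units. Proteins are flagged as similar when the absolute log2 ratio of the two values is within a user-chosen threshold.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: flag proteins whose mRNA-estimated and measured expression agree. */
final class ExpressionCompareSketch {

    /** Log base 2 of the mRNA-estimated to measured protein expression ratio. */
    static double log2Ratio(double mrna, double protein) {
        return Math.log(mrna / protein) / Math.log(2.0);
    }

    /**
     * Maps each protein id present in both data sets to true when the two
     * values are within maxAbsLog2 of each other on a log2 scale.
     * Assumes the inputs are already on comparable scales.
     */
    static Map<String, Boolean> similarExpression(Map<String, Double> mrna,
                                                  Map<String, Double> protein,
                                                  double maxAbsLog2) {
        Map<String, Boolean> result = new LinkedHashMap<>();
        for (Map.Entry<String, Double> entry : mrna.entrySet()) {
            Double measured = protein.get(entry.getKey());
            // Skip ids without a protein measurement or with non-positive values.
            if (measured == null || measured <= 0.0 || entry.getValue() <= 0.0) {
                continue;
            }
            double ratio = log2Ratio(entry.getValue(), measured);
            result.put(entry.getKey(), Math.abs(ratio) <= maxAbsLog2);
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up example values; the ids are placeholders, not real accessions.
        Map<String, Double> mrna = new LinkedHashMap<>();
        mrna.put("protA", 10.0);
        mrna.put("protB", 80.0);
        Map<String, Double> protein = new LinkedHashMap<>();
        protein.put("protA", 12.0);
        protein.put("protB", 10.0);
        // A threshold of 1.0 treats values within two-fold of each other as similar.
        System.out.println(similarExpression(mrna, protein, 1.0));
    }
}
```

The proteins flagged as different are the ones that might prompt a closer look for post-translational or post-transcriptional effects, although the two-fold threshold used here is arbitrary.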