The development of genomics, proteomics and advanced imaging technology has resulted in the accumulation of vast amounts of biological data. As large-scale data sets become predominant in biomedical research, we are approaching a paradigm shift in which the process of discovery is data-driven, and in which data are the source of hypotheses as well as the means for testing them. These masses of data are rich sources of information; however, extracting meaningful information can be a daunting challenge, and often presents a bottleneck for the discovery process. Thus, there is a pressing need for interdisciplinary training of scientists who understand the data, how they are generated, and what they are used for. In addition, these scientists must become developers and highly skilled users of the new computational tools necessary to analyze large data sets. The goal of this program is to train students to become proficient in the following areas: 1. Data acquisition. This will include knowledge of the methods of genomics, proteomics and imaging. 2. Computation. This will include knowledge of mathematical and statistical algorithms and the implementation of effective computer code, as well as an emphasis on methods of data warehousing in relational, deductive and other databases. 3. Data integration. This is a critical area that involves extracting useful information from heterogeneous data sets at various spatial and temporal scales. It will include knowledge of methods of modeling and simulation of systems from the molecular to the organismal level. There will also be an emphasis on computational data mining methods. The core of the program will be research-based training in interdisciplinary teams under the guidance of at least two mentors from disparate disciplines (i.e., computational/mathematical and biomedical sciences). Training activities will consist of specialized didactic coursework as well as seminars, journal clubs and a student-faculty retreat.
This will be a cross-institutional training program with faculty drawn from departments ranging from computer science and statistics to genetics and medicine at five participating institutions of the Gulf Coast Consortia in the Houston area. The training of scientists equipped to manage and extract information from large data sets will greatly facilitate biological discovery in areas such as infectious disease and cancer; this training program will therefore have a direct, positive impact on public health.