The function of millions of proteins remains unknown, and automated protein function prediction systems have a poor performance record. We will test hypotheses about protein functional sites by validating high-throughput computational predictions with a novel automated system that mines the literature for information relevant to each prediction. The impact of our work will be to enable large-scale, validated annotation of protein function and, in turn, to facilitate progress in drug discovery for the treatment of disease.

High-throughput experiments and bioinformatics techniques are generating a rapidly growing volume of data with which we hope to transcribe the genetic blueprints of life. Targeted experiments are required to validate biomedical discoveries from these sources. Fortunately, the information needed to confirm or refute a prediction is often already available in a published paper, and biologists can use this supporting evidence for validation. However, the sheer volume of predictions from high-throughput methods exceeds researchers' capacity to perform even the necessary literature searches. This gap must be closed with automated literature mining methods that perform comparably to a human expert; indeed, developing such methods is a grand challenge of modern biology.

We will mine the full-text literature to validate computational predictions of functional sites in proteins. The innovations in our approach include: (1) using computational predictions as the context for a literature search; (2) extracting information about protein functional sites from full-text journal publications; (3) high-throughput text mining; and (4) using primary information in protein databases to evaluate the methods. Understanding protein function remains a critical bottleneck in biomedical research.
It is time to truly integrate the biological literature into the protein function prediction problem. By doing so, we will enable a critical advance in high-throughput protein function prediction.