We propose to further develop, test, evaluate, and support caTIES, an existing software system for developing networked repositories of sharable de-identified surgical pathology reports. The caTIES system creates a repository of de-identified, structured, and concept-coded clinical reports derived from large corpora of clinical free text. Documents are automatically coded against a controlled terminology such as the Unified Medical Language System (UMLS), SNOMED-CT, or the NCI Metathesaurus. Users construct queries to identify specific kinds of documents and tissue specimens based on the associated clinical report. For example, a researcher studying genetic variation in metastatic breast cancers can identify cases of invasive ductal carcinoma of the breast followed by metastatic ductal cancer in bone three or more years after the original diagnosis. The caTIES system also supports acquisition and ordering of tissues, using an honest broker model. Through this mechanism, de-identified data and access to tissue can be shared among institutions, enabling multi-center collaborative research. The caTIES system has already been implemented at seven US Cancer Centers and is being considered for adoption by numerous other institutions, including cancer centers, university hospitals, and private hospitals. Initial development of caTIES was funded by the Cancer Biomedical Informatics Grid (caBIG). However, interest in the application has far exceeded both our expectations and the scope of caBIG. This grant will allow us to extend the capabilities of the system by (a) improving the portability of the system and extending the types of documents that can be processed, (b) evaluating the system's NLP performance and usability, (c) building a user community to support this open-source application, and (d) piloting interoperability of caTIES with other enterprise and research systems.
This work will preserve and extend a highly novel platform for developing massive repositories of de-identified clinical data that can be used for research within and across institutions.

PUBLIC HEALTH RELEVANCE: This grant will fund the further development and evaluation of a system that takes identified clinical documents and converts them into de-identified, concept-coded, structured data. The system enables researchers to access remainder tissues and clinical report data for research purposes within and across institutions. This project is important because it will greatly increase researchers' access to important data and materials while maintaining patient privacy.