This research plan describes our proposed approach to capturing, through scanning, the existing chemical carcinogenesis research data contained in the PHS-149 series of publications entitled "Survey of Compounds Which Have Been Tested for Carcinogenic Activity", and to converting these data into an electronic database using optical character recognition (OCR), automated data field recognition, and data modelling techniques. This database will be made available for distribution via CD-ROM and the World Wide Web. Phase II objectives include refinements to the user interface, database, and OCR software developed during Phase I; large-scale data capture of approximately 20,000 pages of text; and conversion of this text into a database. Because of the large scope of the overall task, innovative means have been identified to complete the conversion effort during SBIR Phase III, with funding provided through commercial sale of the database software. There is significant commercial potential both for the database product itself and for the processes and techniques developed to create it. Successful completion of this task will provide researchers with unprecedented access to previously unavailable PHS-149 data, and with a set of tools to navigate these data efficiently.

PROPOSED COMMERCIAL APPLICATION

Significant commercial application of this research exists in the sale and distribution of the database containing PHS-149 information via CD-ROM and by on-line subscription. Technologies developed to assist in our task can also be licensed to OCR product providers.