A system of C++ programs has been developed to find closely related documents in Medline and to perform machine learning on sets of documents. The system has several distinctive features: 1) It is built from a set of C++ classes and is highly modular, so the system is relatively easy to modify. 2) The system currently processes PubMed data by extracting it from the Sybase repositories through a C++ interface to Sybase. However, changing only the interface portion of the system would allow it to be applied to any large database of discrete textual records. 3) Data processed by the system are stored as compressed file structures, etc. These structures are updatable, so new data can be added to the system continually as they become available. 4) Documents are compared with one another using a Bayesian form of analysis. 5) The most recent work on this system has been a study of the optimal form of weighting in the retrieval algorithm. This work is based in part on new algorithms for isotonic regression that speed up the regression calculations.
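Isotonic regression fits a non-decreasing sequence to observed values, which is useful when a weight is expected to rise monotonically with some score. The new algorithms mentioned above are not described here. The following is therefore only a minimal sketch of the standard weighted pool-adjacent-violators algorithm (PAVA), a common baseline for this problem, and not the authors' method. The names `isotonicFit` and `Block` are hypothetical and chosen for illustration. The sketch assumes strictly positive weights and input values already ordered by the score they depend on.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: weighted isotonic (non-decreasing) regression using the
// standard pool-adjacent-violators algorithm (PAVA). Assumes w[i] > 0.
std::vector<double> isotonicFit(const std::vector<double>& y,
                                const std::vector<double>& w) {
    assert(y.size() == w.size());

    // Each block pools consecutive points whose fitted value is shared.
    struct Block {
        double sum;         // weighted sum of y over the block
        double weight;      // total weight of the block
        std::size_t count;  // number of original points in the block
    };
    std::vector<Block> blocks;
    blocks.reserve(y.size());

    for (std::size_t i = 0; i < y.size(); ++i) {
        blocks.push_back({w[i] * y[i], w[i], 1});
        // While the previous block's mean exceeds the newest block's mean,
        // the monotonicity constraint is violated: pool the two blocks.
        while (blocks.size() > 1) {
            const Block& last = blocks[blocks.size() - 1];
            const Block& prev = blocks[blocks.size() - 2];
            if (prev.sum / prev.weight <= last.sum / last.weight) break;
            Block merged{prev.sum + last.sum, prev.weight + last.weight,
                         prev.count + last.count};
            blocks.pop_back();       // drop the newest block
            blocks.back() = merged;  // replace the previous one with the pool
        }
    }

    // Expand each block's weighted mean back to one value per input point.
    std::vector<double> fit;
    fit.reserve(y.size());
    for (const Block& b : blocks) {
        fit.insert(fit.end(), b.count, b.sum / b.weight);
    }
    return fit;
}
```

For example, with equal weights, the input `{1, 3, 2, 4}` yields `{1, 2.5, 2.5, 4}`: the adjacent violators 3 and 2 are pooled into their mean. Each point is merged into a block at most once, so the pass runs in linear time after sorting.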