The current inability to determine which papers bearing the same author name (last name, first initial) were written by different individuals is an impediment to user retrieval of health-related information, as well as to research devoted to understanding the publication and collaboration behavior of biomedical scientists. Disambiguating author names will aid scientometric and health policy studies, as well as everyday scientific tasks of many kinds, for example choosing referees and conference attendees. We have created a probabilistic model of how the attributes of Medline articles vary across authors, and we hypothesize that this model can serve as the basis for disambiguating author names in Medline. In this exploratory two-year study, we propose: 1. To create and evaluate a database of "author-individuals" that lists all of the papers in Medline and assigns the great majority of them to one or more specific author-individuals with high confidence. A probabilistic model based on Medline record fields will be refined to estimate, for any two papers bearing the same name, the probability that they were written by the same individual; the refined model will incorporate supplementary information such as first names and affiliations for all authors. Clustering algorithms will then be optimized and applied to form author-individual clusters for all names in Medline. 2. To update the author-individual database (weekly) and the underlying probabilistic model (yearly), and to create and evaluate a free, public, multi-user query interface. The database will also be made available to academic researchers for bibliometric, scientometric and policy studies. This research will set the stage for more in-depth future studies of publication and collaboration behavior, which should give valuable insights into ways to increase scientific productivity in the biomedical sciences.
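
The two steps named in aim 1 (a pairwise same-author probability, followed by clustering) can be illustrated with a minimal sketch. This is not the project's model: the record fields, the logistic form, the weights in `WEIGHTS`, and the 0.5 threshold are all assumptions chosen for illustration, and the clustering shown is simple single-link grouping via union-find rather than any optimized algorithm.

```python
import math
from itertools import combinations

# Hypothetical records that all bear the same name (e.g. "Smith J").
# Field names and values are illustrative, not actual Medline data.
PAPERS = [
    {"id": 1, "coauthors": {"Lee K", "Patel R"}, "affiliation": "univ illinois chicago", "journal": "J Neurosci"},
    {"id": 2, "coauthors": {"Lee K"}, "affiliation": "univ illinois chicago", "journal": "J Neurosci"},
    {"id": 3, "coauthors": {"Garcia M"}, "affiliation": "kyoto univ", "journal": "Plant Cell"},
    {"id": 4, "coauthors": {"Garcia M", "Ito S"}, "affiliation": "kyoto univ", "journal": "Plant Physiol"},
]

# Assumed weights for a toy logistic score; a real model would be fitted to data.
WEIGHTS = {"bias": -3.0, "coauthor": 3.0, "affiliation": 2.0, "journal": 1.5}


def same_author_probability(a, b):
    """Toy estimate of P(same individual) for two papers with the same name."""
    shared = len(a["coauthors"] & b["coauthors"])
    score = WEIGHTS["bias"]
    score += WEIGHTS["coauthor"] * min(shared, 2)  # cap the co-author evidence
    score += WEIGHTS["affiliation"] * (a["affiliation"] == b["affiliation"])
    score += WEIGHTS["journal"] * (a["journal"] == b["journal"])
    return 1.0 / (1.0 + math.exp(-score))


def cluster(papers, threshold=0.5):
    """Single-link grouping: merge any pair whose probability meets the threshold."""
    parent = {p["id"]: p["id"] for p in papers}

    def find(x):
        # Follow parent links to the cluster root, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(papers, 2):
        if same_author_probability(a, b) >= threshold:
            parent[find(a["id"])] = find(b["id"])

    groups = {}
    for p in papers:
        groups.setdefault(find(p["id"]), []).append(p["id"])
    return sorted(sorted(g) for g in groups.values())


if __name__ == "__main__":
    print(cluster(PAPERS))
```

With these assumed weights, papers 1 and 2 share a co-author, an affiliation and a journal, so they are grouped together, as are papers 3 and 4; the cross pairs share nothing and stay apart. Single-link merging is sensitive to a single spurious high-probability pair, which is one reason the choice of clustering algorithm and threshold would need evaluation.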