The purpose of this research project is to develop experimental designs and methods of statistical analysis for determining the degree of interrater agreement on various kinds of measurement scales. The kinds of evaluations made in survey research in public health, and in clinical research in medicine, are subject to many sources of variation irrelevant to the characteristics being evaluated. It is sound scientific practice to have the raters in a project participate in a reliability study prior to the beginning of the main study. If the raters are found to be sufficiently reliable, the main study can begin as planned. If they are not, retraining of the raters or revision of the rating forms would be in order. Problems remain to be solved both in the efficient design of reliability studies, so that an answer can be reached after the study of as few subjects as possible, and in the measurement of interrater reliability for certain kinds of data. The specific areas of research we propose to investigate include: the design and analysis of sequential studies of interrater reliability; procedures for periodically checking that the degree of reliability characterizing a group of raters at the start of a study is maintained throughout the course of the study (i.e., procedures like those used to maintain quality control); the effects of rater-by-subject interactions on the intraclass correlation coefficient of reliability for quantitative ratings; and the continued development of kappa-like measures of interrater agreement for categorical ratings.
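
For readers unfamiliar with kappa-type measures, the following is a minimal sketch of the standard unweighted two-rater kappa for categorical ratings, which corrects the observed proportion of agreement for the agreement expected by chance. It is offered only as an illustration of the general idea, not as one of the measures this project proposes to develop. The function name `cohen_kappa` and the ratings data are hypothetical and invented for the example.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Unweighted kappa for two raters who each assign one category per subject."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two equal-length, non-empty rating lists")
    n = len(ratings_a)

    # Observed proportion of subjects on which the two raters agree.
    p_o = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n

    # Agreement expected by chance, from each rater's marginal category counts.
    freq_a = Counter(ratings_a)
    freq_b = Counter(ratings_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)

    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical data for 50 subjects (illustration only, not study results):
# 20 rated "present" by both raters, 15 rated "absent" by both,
# 5 rated "present" by rater A only, 10 rated "present" by rater B only.
rater_a = ["present"] * 20 + ["present"] * 5 + ["absent"] * 10 + ["absent"] * 15
rater_b = ["present"] * 20 + ["absent"] * 5 + ["present"] * 10 + ["absent"] * 15

kappa = cohen_kappa(rater_a, rater_b)
print(f"kappa = {kappa:.2f}")
```

In this made-up example the raters agree on 70% of subjects, but their marginal rates alone would produce 50% agreement by chance, so kappa is approximately 0.40. The gap between raw agreement and kappa illustrates why chance-corrected measures are preferred in a reliability study.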