This research project is developing experimental designs and methods of statistical analysis for determining the degree of interrater agreement on various kinds of measurement scales. The kinds of evaluations made in survey research in public health, and in clinical research in medicine, are subject to many sources of variation irrelevant to the characteristics being evaluated. It is sound scientific practice to have the raters in a project participate in a reliability study before the main study begins, in order to determine whether the ratings are sufficiently reliable for the main study to proceed as planned. If they are not, retraining of the raters or revision of the rating forms would be in order. Problems remain to be solved both in the efficient design of reliability studies, so that an answer can be reached after studying as few subjects as possible, and in the measurement of interrater reliability for certain kinds of data. Specific areas we are investigating include the design and analysis of sequential studies of interrater reliability; the effects of rater-by-subject interactions on the intraclass correlation coefficient of reliability for the case of quantitative ratings; and the continued development of kappa-like measures of interrater agreement for the case of categorical ratings.
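To make the categorical case concrete, the following is a minimal illustrative sketch (not the project's own method) of Cohen's kappa, the basic chance-corrected agreement measure that the kappa-like statistics mentioned above generalize. The rater labels and data are hypothetical.

```python
from collections import Counter

def cohens_kappa(ratings_a, ratings_b):
    """Chance-corrected agreement between two raters on categorical ratings.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed proportion
    of agreement and p_e is the agreement expected by chance from each
    rater's marginal category frequencies.
    """
    n = len(ratings_a)
    # Observed proportion of subjects on which the two raters agree.
    p_o = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Expected chance agreement: product of the raters' marginal proportions,
    # summed over categories.
    marg_a = Counter(ratings_a)
    marg_b = Counter(ratings_b)
    p_e = sum(marg_a[c] * marg_b[c] for c in marg_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical example: two raters classifying 10 subjects.
rater1 = ["pos", "pos", "neg", "neg", "pos", "neg", "pos", "neg", "pos", "neg"]
rater2 = ["pos", "neg", "neg", "neg", "pos", "neg", "pos", "pos", "pos", "neg"]
print(round(cohens_kappa(rater1, rater2), 3))  # → 0.6
```

Here the raters agree on 8 of 10 subjects (p_o = 0.8), but with balanced marginals half that agreement is expected by chance (p_e = 0.5), giving kappa = 0.6; kappa is 1 for perfect agreement and 0 when agreement is no better than chance.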