This project develops and refines statistical procedures for evaluating and selecting regression models. Two problems are under investigation: multi-step variable selection in multiple regression analysis, and discrimination among alternative model specifications, including non-nested classes of models.

Although variable selection procedures are widely applied in biostatistics and epidemiology, e.g., in the analysis of case-control and cohort studies with many risk factors, their statistical properties are poorly understood. In particular, statistical inference is problematic both at intermediate steps and for the finally chosen model. Many stepwise procedures are based on repeated tests of significance. Theory has been developed for deriving the distribution of the F-ratio at each step of sequential forward selection. It is shown that, beginning with the second step, the distribution of the F-ratio involves nuisance parameters, but that appropriate conditioning leads to an exact conservative test. It is also shown that the conventional cut-off values based on the central F-distribution lead to a "liberal" test that does not control the Type I error. A FORTRAN program has been developed jointly with D. Midthune to calculate the correct cut-off values at each step of the procedure.

Many alternative model specifications in applied biostatistical studies contain non-nested classes, and methodology is being developed for discriminating among non-nested models. The approach is based on a nonparametric relevancy criterion that evaluates each model by comparing its performance on the observed data with its performance on generated pseudo-data in which there is no relationship between the response and the explanatory variables. Computer simulations demonstrate that this criterion has better statistical properties than many known procedures. Current research is being conducted jointly with D. Midthune and C. Heuer and includes application of this approach to discriminating among different "age-period-cohort" log-linear models.
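
The statement that conventional central F cut-offs give a liberal test can be illustrated with a small simulation. The sketch below is not the conditional test or the FORTRAN program described in this project. It only looks at the entry test at the first step of forward selection when no candidate variable is related to the response. The function names, the sample size of 32, and the hard-coded value 4.17 (the upper 5% point of the central F-distribution with 1 and 30 degrees of freedom) are assumptions made for the illustration.

```python
import random


def f_ratio(x, y):
    """F-ratio for the slope in a simple linear regression of y on x."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    r2 = sxy * sxy / (sxx * syy)
    # With one predictor and an intercept, F has 1 and n - 2 degrees of freedom.
    return r2 * (n - 2) / (1.0 - r2)


def first_step_rejection_rate(n_candidates, n_obs=32, n_sims=2000,
                              cutoff=4.17, seed=0):
    """Share of null data sets in which the best candidate passes the cut-off.

    Assumption: cutoff=4.17 is the nominal 5% point of F(1, 30), which
    matches the default n_obs=32. Response and candidates are independent
    standard normal draws, so every rejection is a Type I error.
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        y = [rng.gauss(0, 1) for _ in range(n_obs)]
        # Forward selection enters the candidate with the largest F-ratio.
        best = max(
            f_ratio([rng.gauss(0, 1) for _ in range(n_obs)], y)
            for _ in range(n_candidates)
        )
        if best > cutoff:
            rejections += 1
    return rejections / n_sims


if __name__ == "__main__":
    for k in (1, 5, 10):
        print(k, first_step_rejection_rate(k))
```

With a single candidate the rejection rate should stay near the nominal 5%, while with several candidates the largest F-ratio exceeds the same cut-off much more often, which is the sense in which the conventional test is liberal.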
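
The idea of comparing a model's performance on observed data with its performance on pseudo-data can be sketched as follows. The abstract does not define the relevancy criterion, the performance measure, or how pseudo-data are generated, so everything specific here is an assumption: performance is taken to be R-squared from a one-predictor least-squares fit, pseudo-data are produced by randomly permuting the response, and the score is the observed R-squared minus the average pseudo-data R-squared. The names `r_squared` and `relevancy_score` and the simulated data are hypothetical.

```python
import math
import random


def r_squared(x, y):
    """R-squared of a least-squares line of y on a single predictor x."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy * sxy / (sxx * syy)


def relevancy_score(x, y, n_pseudo=500, seed=0):
    """Observed R-squared minus mean R-squared over permuted responses.

    Assumption: permuting y breaks any relationship with x, so the
    permuted sets play the role of pseudo-data with no signal.
    """
    rng = random.Random(seed)
    observed = r_squared(x, y)
    y_perm = list(y)
    pseudo_total = 0.0
    for _ in range(n_pseudo):
        rng.shuffle(y_perm)
        pseudo_total += r_squared(x, y_perm)
    return observed - pseudo_total / n_pseudo


# Hypothetical data: the response is linear in log(z), not in z.
data_rng = random.Random(1)
z = [data_rng.uniform(1, 50) for _ in range(60)]
y = [2.0 * math.log(v) + data_rng.gauss(0, 0.3) for v in z]

# Two non-nested candidate specifications for the same response.
candidates = {
    "linear in z": z,
    "linear in log z": [math.log(v) for v in z],
}
scores = {name: relevancy_score(x, y) for name, x in candidates.items()}
best_model = max(scores, key=scores.get)
```

Under these assumptions the specification with the larger score is preferred. Neither candidate is a special case of the other, so a nested F-test would not apply, which is why a comparison against a no-relationship baseline is used in the sketch.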