Analytics (Objective 1)

The SSA adjudication process is dynamic, involving a complex sequence of decisions by several offices within SSA as well as the decisions and resources of the claimants themselves. NIH undertook an overarching project, the Adjudication 1 project, to comprehensively model this process. Project goals are to:

1. Develop analytical tools to analyze various aspects of the adjudication process in terms of accuracy, consistency, and timeliness;
2. Develop tools to predict how the system responds to external shocks;
3. Develop methods to analyze data taking into account the multi-stage application process in which data are collected;
4. Develop tools to assist with the disability determination process;
5. Quantify the extent to which SSA can adjust the system to respond to changes;
6. Derive useful statistics to monitor and adjust program performance, based on important outcome measures (accuracy, timeliness, consistency).

Additional details by project:

Case Status Change Model and Queuing Theory: This project aims to develop methods to analyze system timeliness, measure processing times, and derive optimal flow characteristics. Our work on system timeliness took two complementary directions. First, to study system delays, we built a queuing model of the adjudication system that allows the user to obtain system performance statistics. Our second direction of inquiry addresses the separation between wait times and processing times; for this purpose, we developed a batch processing model. To date, we have scripted code to obtain the needed transition probabilities, determined the rate at which jobs enter the system, completed the queue code, and estimated the distributions of processing and waiting times using three techniques.

Case review nominator: SSA evaluates adjudicated claims for the benefits it administers.
Because of the wide variety of language that might be used, reviews focused on particular issues are typically limited to samples drawn from entire populations, with no ability to screen out unlikely cases. With assistance from SSA's staff, NIH developed an automated document classification tool to identify cases with specific issues. The tool uses a set of labeled legal documents to build predictive models for identifying new cases of interest. We implemented the tool in Python, using NLTK and scikit-learn for text processing, normalization, and feature extraction, and scikit-learn or Weka for feature selection and classification. The document classification algorithm was implemented successfully in June 2014. Since then, we have expanded the classification algorithms available in the tool and implemented a parameter selection methodology to optimize the document classification model parameters. For the SSA Case Review Nominator, we optimized the model parameters by running over 10,000 combinations per model to find the best parameters. We also expanded the performance metrics available for reporting and optimization to include F-measure, precision, recall, and overall accuracy. Because SSA was interested in optimizing model precision, we delivered a model with 80% accuracy, 94% precision, and 73% recall. SSA started testing the model in production in June 2015.

Data Mining Feasibility Study: This project placed a high-performance server inside the SSA firewall to determine the software and hardware configuration that will allow us to extract information from decision files and medical data.
With this server, we are pursuing three areas of inquiry: methods of information retrieval (IR) for measuring relevance of documents, methods in natural language processing (NLP) for extracting medical records, and partial NLP methods for analyzing decision files. Since the last report, we have performed optical character recognition (OCR) on over 500,000 decision files for the Case Review Nominator project. We developed an optical mark recognition tool to extract check box information from SSA RFC forms and applied it to the 500,000 downloaded forms with an estimated accuracy of 91%. We are in the process of implementing a methodology to extract text box fields from the RFC forms. We also investigated the different types of medical evidence available in SSA's electronic claimant folders.

Listings nominator: Under current SSA rules, the presence of a condition that meets criteria in the Listings of Impairments (or that is of equal severity) is considered sufficient to establish medical considerations for allowances. We are currently building a tool for SSA to quickly find relevant listings from the Listings of Impairments. We are using natural language processing to extract information from the medical records and to create a more formal representation of the listings. To date, we have parsed most of the information in the Listings downloaded from SSA's website and are devising information retrieval methods to automatically identify ICD and CPT codes relevant to the listings inside the medical documents.

Multidimensional classification: The objective of this project is to develop methods to use and interpret multiple scores representing different content areas consistently, accurately, and collectively. Specifically, we are developing methods to be applied to the FAB suite of scores (i.e., domain and sub-domain scores) to permit multiple subscale scores to be considered collectively.
We are working on four methods to address this objective: a) the Maximum Coverage for Neyman-Pearson Multidimensional Classification with Monotonicity Constraint method; b) a clustering method called Heavy Hitters, inspired by the study of web traffic to a particular server; c) a semi-parametric model for classification and clustering; and d) multidimensional classification using norm quantile.

Medical Continuing Disability Reviews: The objective of this project is to find ways to improve the CDR process and reduce the backlog of cases for review. This year, we began learning about the current CDR process and how we may improve its different components. We identified the lack of structured medical and functional evidence as the main limitation of this process. We identified key areas that data analytics could improve, provided the right data can be acquired or extracted, and have prepared a final paper on our findings, which will be published and disseminated by the SSDI Solutions Initiative.

CAT development (Objective 2)

The NIH/RMD awarded a contract to the Boston University Health and Disability Research Institute (BU-HDRI) to develop a comprehensive set of tools to characterize the full continuum of individual capabilities (i.e., human function) relevant to work. This method uses Computer Adaptive Testing (CAT) coupled with Item Response Theory (IRT). The development of CAT tools (also known as the Functional Assessment Battery or FAB) is a sequential process; one step must be completed before advancing to the next step. In FY15, a calibration study was conducted to refine the item pools for the Daily Activities and Learning and Applying Knowledge domains. Additionally, Westat, a research survey firm, conducted replenishment testing of items for the first two domains, physical function and behavioral health.
Boston University has embarked on a number of additional post-development studies to enhance the functionality, utility, and comprehensiveness of the instruments.

Publications generated by this year's research:

A. Constantin, J. Porcino, J. Collins, C. Zhou. Data-Driven Solutions for Improving the Continuing Disability Review Process. SSDI Solutions Initiative, 2015.
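
Illustrative sketch (Case Review Nominator metrics and parameter selection): The parameter selection and performance metrics described for the Case Review Nominator can be illustrated with a minimal example. This is not the project's code. The actual tool was built on NLTK and scikit-learn (or Weka) and searched over 10,000 parameter combinations per model, whereas this standard-library sketch assumes a hypothetical classifier that has already assigned each document a score, and it searches a single hypothetical parameter, the decision threshold. The scores, labels, thresholds, and function names below are invented for illustration only.

```python
# Toy data (assumed, not from the report): a hypothetical classifier score per
# document and the reviewer's label (1 = case of interest, 0 = not of interest).
SCORES = [0.95, 0.90, 0.80, 0.70, 0.60, 0.40, 0.30, 0.20]
LABELS = [1, 1, 1, 0, 1, 0, 1, 0]
THRESHOLDS = [0.25, 0.50, 0.75]  # hypothetical parameter grid


def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == 1 and truth == 1:
            tp += 1
        elif pred == 1 and truth == 0:
            fp += 1
        elif pred == 0 and truth == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def compute_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, and F-measure (F1)."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    accuracy = (tp + tn) / total if total else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


def predict(scores, threshold):
    """Nominate a document when its score reaches the threshold."""
    return [1 if score >= threshold else 0 for score in scores]


def select_threshold(scores, y_true, thresholds, optimize="precision"):
    """Exhaustively try each threshold and keep the one that maximizes
    the chosen metric (ties go to the earliest threshold in the grid)."""
    results = [
        (t, compute_metrics(y_true, predict(scores, t))) for t in thresholds
    ]
    return max(results, key=lambda item: item[1][optimize])


if __name__ == "__main__":
    best_threshold, best_metrics = select_threshold(
        SCORES, LABELS, THRESHOLDS, optimize="precision"
    )
    print("best threshold:", best_threshold)
    for name, value in best_metrics.items():
        print(f"{name}: {value:.2f}")
```

On this toy data, optimizing for precision selects the highest threshold and gives up some recall, the same kind of trade-off seen in the delivered model (94% precision, 73% recall). Optimizing for F-measure instead would select the lowest threshold.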