To intervene before psychosis onset and prevent morbidity, schizophrenia research has recently focused on identifying young people during a putative prodromal period, with the aim of developing safe and effective interventions that modify disease course. Over the past decades, studies at Columbia and elsewhere have evaluated clinical high-risk (CHR) individuals across a range of cognitive processes in an effort to identify core deficits of schizophrenia that are evident before psychosis onset. Subtle thought disorder, manifest in disturbance of language production, is a feature that predates, rather than follows, psychosis onset in CHR individuals, and may therefore be an indicator of schizophrenia liability. Subtle thought disorder in schizophrenia and its risk states has typically been evaluated using clinical rating scales, and occasionally using labor-intensive manual methods of linguistic analysis. Here, we propose instead to use a novel automated machine-learning approach to speech analysis informed by artificial intelligence. The method derives the semantic meaning of words and phrases by drawing on a large corpus of text, similar to how humans assign meaning to language. It also evaluates syntax through part-of-speech tagging. These analyses yield fine-grained indices of speech semantics and syntax that may more accurately capture subtle thought disorder and discriminate psychosis outcome among CHR individuals. Using these automated methods of speech analysis, in collaboration with computer scientists from IBM, we identified a classifier with high accuracy for psychosis onset in a small CHR cohort at Columbia. The classifier included semantic coherence from phrase to phrase, shortened phrase length, and decreased use of determiner pronouns (which, what, that). These features were correlated with prodromal symptoms but outperformed them in classification accuracy. They also discriminated the speech of individuals with schizophrenia from normal speech.
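As a purely illustrative sketch of the kinds of indices described above, the following Python fragment computes phrase-to-phrase semantic coherence, maximum phrase length, and the rate of the determiner pronouns named in the text. It is not the pipeline used in the Columbia analysis. The word vectors in `TOY_VECTORS`, the function names, and the use of a fixed word list in place of a part-of-speech tagger are all assumptions made for illustration; a real analysis would derive word vectors from a large text corpus and tag parts of speech with a trained tagger.

```python
import math

# Hypothetical toy word vectors (assumption): stand-ins for vectors that a
# real analysis would derive from a large corpus of text.
TOY_VECTORS = {
    "dog": (1.0, 0.0, 0.0),
    "cat": (0.9, 0.1, 0.0),
    "barked": (0.8, 0.2, 0.0),
    "slept": (0.7, 0.3, 0.0),
    "stock": (0.0, 0.0, 1.0),
    "market": (0.0, 0.1, 0.9),
}

# The determiner pronouns listed in the text. A fixed word list is a
# simplification (assumption); it stands in for part-of-speech tagging.
DETERMINERS = {"which", "what", "that"}


def phrase_vector(phrase):
    """Average the vectors of the known words in a phrase (None if none are known)."""
    vecs = [TOY_VECTORS[w] for w in phrase.lower().split() if w in TOY_VECTORS]
    if not vecs:
        return None
    return tuple(sum(component) / len(vecs) for component in zip(*vecs))


def cosine(u, v):
    """Cosine similarity between two vectors; 0.0 if either has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def speech_features(phrases):
    """Compute illustrative semantic and syntactic indices for a list of phrases."""
    vectors = [phrase_vector(p) for p in phrases]
    # Coherence between each pair of consecutive phrases that both have a vector.
    coherence = [
        cosine(u, v)
        for u, v in zip(vectors, vectors[1:])
        if u is not None and v is not None
    ]
    words = [w for p in phrases for w in p.lower().split()]
    return {
        "min_coherence": min(coherence) if coherence else None,
        "max_phrase_length": max((len(p.split()) for p in phrases), default=0),
        "determiner_rate": (
            sum(w in DETERMINERS for w in words) / len(words) if words else 0.0
        ),
    }


if __name__ == "__main__":
    sample = ["the dog barked", "the cat slept", "stock market"]
    print(speech_features(sample))
```

In this sketch, an abrupt topic shift between consecutive phrases lowers the minimum coherence, which is the sort of fine-grained index that could then be passed to a classifier alongside phrase length and determiner use.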
While promising, these automated methods of analysis require validation in a second CHR cohort. In this proposal, in collaboration with IBM, we will validate these automated methods using a large archive of speech data from the UCLA CHR cohort. This dataset has several advantages. First, the UCLA CHR cohort has a high prevalence of psychosis transition, which is important because machine learning is sensitive to group size. Second, it has undergone prior manual linguistic analysis, which identified features of language production that predicted psychosis outcome; hence, automated and manual methods can be directly compared. Third, speech data are available from healthy controls and recent-onset psychosis patients (for validation). Fourth, several participants have multiple speech assays, so that the stability of the classifier can be examined. Beyond validation of methods, we will combine speech data from Columbia and UCLA to maximize group size and characterize a common classifier of psychosis outcome. Automated methods for language analysis may improve prediction of psychosis onset and inform remediation strategies for its prevention.