The Specific Aim of this Phase I proposal is to test the feasibility of a bioinformatics technology for correcting amplification bias in T cell receptor (TCR) repertoire sequencing (REP-SEQ), thereby laying the groundwork for its use in clinical diagnostics. The T cell repertoire is the foundation of human adaptive immunity, and deep T cell repertoire sequencing is now commonly used in research settings to quantify immune responses (Robins et al., 2009; Wang et al., 2010; Robins et al., 2012). Clinical immune repertoire sequencing has a large untapped market because multiplexing all possible V(D)J combinations into a single assay substantially reduces material and labor costs compared with current diagnostic methods. For example, conventional leukemia minimal residual disease (MRD) work-ups demand laborious customization, cost up to ~$5000 per patient, and have a turnaround time of several weeks. We estimate that for the MRD market alone, our technology would save diagnostics labs ~$140 million annually and would produce more sensitive data in less than half the time of standard MRD work-ups. The technical innovation of the product is to use bioinformatics to correct the non-representative amplification that plagues multiplexed repertoire amplification (Robins et al., 2012). First, we will build a large control library of TCR plasmid clones. Next, we will build REP-SEQ libraries using the control clones as templates and generate a large training data set from these libraries by next-generation sequencing (NGS). Finally, we will build a linear model for correcting raw data using this training set, and test the feasibility of the linear model on a second set of TCR clones.
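The Aim specifies a linear model trained on control clones but not the model's form. As one illustration, the minimal sketch below assumes that amplification bias is multiplicative and depends only on the V and J primers. Under that assumption, log(observed reads / expected input copies) for each control clone is additive in a V-gene effect and a J-gene effect, and those effects are then divided out of a new sample's read counts. The function names, the (V, J, expected, observed) input layout, and the backfitting solver are hypothetical choices, not the proposed implementation.

```python
import math
from collections import defaultdict


def fit_vj_bias(control_clones, n_iter=50):
    """Fit log(observed / expected) ~ b_V[v] + b_J[j] by least squares.

    control_clones: list of (v_gene, j_gene, expected_copies, observed_reads).
    Hypothetical model: bias is multiplicative and depends only on V and J.
    Backfitting (alternating group means) converges toward the least-squares
    fit of this additive two-factor model. Assumes observed_reads > 0.
    """
    data = [(v, j, math.log(obs / exp_copies))
            for v, j, exp_copies, obs in control_clones]
    b_v = defaultdict(float)
    b_j = defaultdict(float)
    for _ in range(n_iter):
        # Update each V-gene effect from log ratios after removing J effects.
        sums, counts = defaultdict(float), defaultdict(int)
        for v, j, r in data:
            sums[v] += r - b_j[j]
            counts[v] += 1
        b_v = defaultdict(float, {v: sums[v] / counts[v] for v in sums})
        # Update each J-gene effect from log ratios after removing V effects.
        sums, counts = defaultdict(float), defaultdict(int)
        for v, j, r in data:
            sums[j] += r - b_v[v]
            counts[j] += 1
        b_j = defaultdict(float, {j: sums[j] / counts[j] for j in sums})
    return dict(b_v), dict(b_j)


def correct_sample(sample, b_v, b_j):
    """Return bias-corrected clonotype frequencies.

    sample: list of (v_gene, j_gene, observed_reads).
    Genes absent from the training set get a bias of 0 (no correction).
    """
    weights = [obs * math.exp(-(b_v.get(v, 0.0) + b_j.get(j, 0.0)))
               for v, j, obs in sample]
    total = sum(weights)
    return [w / total for w in weights]


# Synthetic control library: 100 input copies of each V/J combination,
# amplified with V bias {V1: 2, V2: 0.5} and J bias {J1: 1, J2: 3}.
controls = [("V1", "J1", 100, 200), ("V1", "J2", 100, 600),
            ("V2", "J1", 100, 50), ("V2", "J2", 100, 150)]
b_v, b_j = fit_vj_bias(controls)

# Test sample with true frequencies 0.4 (V1/J1) and 0.6 (V2/J2);
# raw read fractions are skewed to roughly 0.47 and 0.53 by the bias.
print(correct_sample([("V1", "J1", 800), ("V2", "J2", 900)], b_v, b_j))
```

In this synthetic example the corrected frequencies recover approximately 0.4 and 0.6. Fitting on log ratios turns a multiplicative bias into an additive one, which is what makes an ordinary linear model applicable. Real data would also need handling of zero counts and of biases that depend on more than the V and J primers.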
We will require that the bioinformatics method consistently correct biased REP-SEQ measurements such that regression of observed versus expected clone counts achieves an average R² of >0.95 and an average slope of >0.9 (power = 0.8, α = 0.05), and such that clonotypes present at frequencies as low as 0.01% have an average coefficient of variation (CV) of <10% across hundreds of measurements (power = 0.8, α = 0.05). Additionally, the technology must be sensitive enough to reliably detect clonotypes present at as few as 1 copy in 1 million, such that the area under the receiver operating characteristic curve (AUC) exceeds 0.8 across hundreds of measurements (α = 0.05). The methods that we develop in Phase I will enable us to perform a large 510(k) validation study in Phase II to obtain FDA clearance of a molecular kit for clinical REP-SEQ. The final product will be priced at <$1000 per sample and will enable diagnostics labs throughout the US to streamline their operations without having to ship samples to a reference lab.

PUBLIC HEALTH RELEVANCE: Diagnostics laboratories often analyze T cells to help characterize disease. We are building a streamlined, cheaper, and more comprehensive system for T cell analysis in clinical laboratories.
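The Phase I acceptance criteria (slope and R² of observed versus expected counts, CV of a low-frequency clonotype, and AUC for detecting rare clonotypes) can each be computed with standard statistics. The minimal sketch below assumes paired lists of expected and observed clone values, a list of replicate measurements for one clonotype, and detection scores for samples with and without a spiked-in clonotype. The helper names are hypothetical, and the power and α calculations for averaging over hundreds of measurements are not shown.

```python
import statistics


def slope_and_r2(expected, observed):
    """Ordinary least-squares fit of observed on expected; returns (slope, R^2)."""
    mx, my = statistics.fmean(expected), statistics.fmean(observed)
    sxx = sum((x - mx) ** 2 for x in expected)
    syy = sum((y - my) ** 2 for y in observed)
    sxy = sum((x - mx) * (y - my) for x, y in zip(expected, observed))
    slope = sxy / sxx
    r2 = sxy ** 2 / (sxx * syy)  # for simple linear regression, R^2 = r^2
    return slope, r2


def coefficient_of_variation(replicates):
    """Sample standard deviation divided by the mean."""
    return statistics.stdev(replicates) / statistics.fmean(replicates)


def roc_auc(scores_present, scores_absent):
    """ROC AUC as the Mann-Whitney probability P(present > absent); ties count 1/2."""
    wins = 0.0
    for p in scores_present:
        for a in scores_absent:
            if p > a:
                wins += 1.0
            elif p == a:
                wins += 0.5
    return wins / (len(scores_present) * len(scores_absent))


def meets_thresholds(slope, r2, cv, auc):
    """Thresholds from the Specific Aim, applied to single values rather than averages."""
    return slope > 0.9 and r2 > 0.95 and cv < 0.10 and auc > 0.8


slope, r2 = slope_and_r2([1, 2, 3, 4], [1.1, 1.9, 3.0, 4.1])
cv = coefficient_of_variation([9.5, 10.0, 10.5])
auc = roc_auc([0.9, 0.8, 0.7], [0.2, 0.75, 0.1])
print(slope, r2, cv, auc, meets_thresholds(slope, r2, cv, auc))
```

The AUC is computed from its rank interpretation (the probability that a sample containing the clonotype scores higher than one without it), which avoids constructing the full ROC curve.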