Gene expression measurement using microarrays or next-generation sequencing is a popular and useful technology for genomic analysis. The large volume of data generated in these experiments creates challenging problems. Quality control and experimental design remain important, fundamental issues, and analytical techniques are needed that account for complex experimental designs and minimize artifacts. Bioinformaticians must be able to handle large-scale data projects while also processing the data into a format to which statistical procedures can be applied. Several statistical and bioinformatics issues remain open, and this project attempts to address some of them. Next-generation sequencing is now a popular means of measuring RNA expression (RNAseq). As with microarrays, a host of technical and quality control issues remain challenges, in addition to the new statistical problems introduced by the change of scale from continuous measurements (microarray fluorescence) to discrete ones (read counts). The availability of affordable, high-quality software has been one of the bottlenecks in the analysis of microarray data. To address this need, we have further developed the MSCL Analyst's Toolbox, written in the JMP software package. The Toolbox allows investigators to download Affymetrix microarray data from a central database, normalize and transform the data, inspect it for a variety of outliers and defects, perform a variety of statistical tests to select genes affected in the experiment, and then visualize and classify patterns of gene expression. In collaboration with more than forty investigators at NCI, CC, NHLBI, NINDS, NIAID, NHGRI, NICHD, NIA, NIDDK, and NIDA, the Toolbox has been applied to dozens of microarray studies. It has now been extended to handle the analysis of RNAseq and metagenomic (microbiome) data.
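The Toolbox itself is implemented in JMP, and its specific normalization options are not described here. Purely as an illustration of what "normalize and transform" typically involves for microarray intensities, the sketch below shows one widely used combination, a log2 transform followed by quantile normalization, in plain Python. The function names, the offset of 1, and the toy intensities are assumptions for this example; this is not the Toolbox's interface or its exact procedure.

```python
import math


def log2_transform(values, offset=1.0):
    """Log2-transform raw intensities; the offset (an assumed default) avoids log2(0)."""
    return [math.log2(v + offset) for v in values]


def quantile_normalize(samples):
    """Quantile-normalize a list of equal-length sample columns (one list per array).

    Each value is replaced by the mean, across arrays, of the values that share
    its rank. Ties are broken by position, which is a simplification.
    """
    n = len(samples[0])
    if any(len(s) != n for s in samples):
        raise ValueError("all samples must have the same number of probes")
    # The row means of the column-wise sorted matrix form the reference distribution.
    sorted_cols = [sorted(s) for s in samples]
    reference = [sum(col[i] for col in sorted_cols) / len(samples) for i in range(n)]
    normalized = []
    for s in samples:
        # Probe indices ordered from smallest to largest intensity in this array.
        order = sorted(range(n), key=lambda i: s[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]
        normalized.append(out)
    return normalized


# Hypothetical toy intensities for three arrays (illustrative only, not study data).
raw = [[5.0, 2.0, 3.0, 4.0], [4.0, 1.0, 4.0, 2.0], [3.0, 4.0, 6.0, 8.0]]
normalized = quantile_normalize([log2_transform(s) for s in raw])
```

After normalization every array shares the same intensity distribution, which removes array-wide technical shifts but also assumes that most genes are not differentially expressed between arrays.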
The Toolbox has also recently gained the ability to link data on the user's workstation to online databases. In a collaboration with NHGRI and NHLBI, we are investigating transcriptomic differences in coronary artery calcification using a case-control design based on ClinSeq study samples. We are studying a selected set of genes using NanoString technology and are currently working on an analysis that takes more than 100 clinical parameters into account. The same experiment has been extended to investigate a possible novel transcript identified in patients with coronary artery calcification; this finding is being further researched within our lab and at NHGRI. An analytical pipeline for 16S rRNA sequencing data generated with the Ion Metagenomics Kit (Life Technologies) was developed and published in PLOS ONE in February 2016. This publication analyzed and evaluated the results from four different mock samples from BEI Laboratories. The results of this work will be used in other collaborations with the NIAAA and the Clinical Center Nursing Department. In a collaboration with the Clinical Center Nursing Department, we are finalizing the analysis of 16S rRNA sequencing of patients with severe aplastic anemia (SAA). This is a longitudinal study in which samples were collected at baseline and at three and six months after treatment, with 24 patients enrolled. Because antibiotic use could be a major confounding factor in studying the patient microbiome, we are analyzing each patient's antibiotic use and correlating it with changes in overall microbial diversity at the different time points. Because the patients were very ill and were prescribed many different antibiotics, this analysis will be crucial for interpreting the data.
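The report does not state which diversity measure or correlation method is used. Assuming the Shannon index for diversity and a Spearman rank correlation (both chosen here only for illustration), relating antibiotic exposure to the change in diversity from baseline could be sketched as follows. The patient identifiers, read counts, and antibiotic tallies are hypothetical toy values, not study data.

```python
import math


def shannon_index(counts):
    """Shannon diversity H' = -sum(p_i * ln p_i) over taxa with nonzero counts."""
    total = sum(counts)
    if total == 0:
        raise ValueError("sample has no reads")
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: the Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)  # undefined if either variable is constant


# Hypothetical per-patient taxon read counts (toy values, NOT study data).
baseline = {"p1": [50, 30, 20], "p2": [40, 40, 20], "p3": [33, 33, 34]}
month3 = {"p1": [90, 5, 5], "p2": [60, 30, 10], "p3": [33, 33, 34]}
antibiotic_courses = {"p1": 4, "p2": 2, "p3": 0}  # hypothetical exposure tallies

patients = sorted(baseline)
delta = [shannon_index(month3[p]) - shannon_index(baseline[p]) for p in patients]
rho = spearman([antibiotic_courses[p] for p in patients], delta)
```

For this toy input, more antibiotic courses go with a larger loss of diversity, so `rho` is -1 up to floating-point rounding. A rank correlation is used because antibiotic tallies are small integers with ties and diversity changes need not be normally distributed; with 24 patients and repeated time points, a real analysis would likely also need to account for within-patient correlation.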
In addition to the above collaboration with the Clinical Center, a collaboration with NIAAA studying changes in the oral and gut microbiome during alcohol detoxification is underway. The pipeline published in February 2016 will be used to analyze these data, generated with the 16S Ion Metagenomics Kit (Life Technologies). Libraries from this kit include amplicons that target seven hypervariable regions of the 16S rRNA gene. Enrollment is nearly complete, and DNA extraction has begun for almost all gut samples. Two samples have been sequenced in a pilot so far. Analysis of these pilot data shows that the sample collection method and bioinformatics processing yield many gut bacteria. To avoid sampling bias, whole-stool homogenization was used for sample collection in this project, and a manuscript documenting this novel gut sampling approach is in preparation. In a different collaboration with the NIAAA, patients with alcohol use disorder (AUD) are being investigated to identify important variables that distinguish treatment-seeking (Tx) from non-treatment-seeking (Non-Tx) individuals. More than 200 variables were collected for each patient in this cohort, covering drug, alcohol, and smoking use, clinical diagnostics, personal characteristics, family history, and psychological, emotional, and social traits. Using Alternating Decision Tree (ADTree) analysis in the Weka classification software (Frank E 2016), we found a subset of important variables that predicts Tx versus Non-Tx status with approximately 85% accuracy or higher. The goal of this work is to devise patient-specific treatment plans based on whether or not a patient seeks treatment. Reducing the data to a small number of highly predictive variables will greatly improve patient classification for treatment planning and will be an important contribution to the field of mental health and drug abuse disorders.