The Biodata Mining and Discovery section has been actively involved in a large number of NIAMS research projects, the following in particular:

- Developmental acquisition of regulomes underlies innate lymphoid cell functionality
- Microbial dysbiosis implicated in Spondyloarthritis (SpA) pathogenesis
- The molecular anatomy of human oral and cutaneous wound healing
- Effects of IFN-g and TNF-alpha on induced iPSC-derived MSCs from SpA patients
- International multi-site assessment of genetics and inflammation in early onset and familial Systemic Lupus Erythematosus
- Generation of a B27 knock-out in iPSCs from an Ankylosing Spondylitis (AS) patient using CRISPR/Cas9 gene editing
- Retinoic acid inhibits expression of IL-9 by altering enhancer accessibility
- Essential nuclear functions of the FUBP family of DNA/RNA binding proteins
- Early-Onset Severe Arthritis associated with a gain-of-function mutation in MYD88
- BACH2-immunodeficiency shows an association between super-enhancers and haplo-insufficiency
- RNU4ATAC regulates T cell activation by impairing IL2 induced STAT5 phosphorylation
- Resequencing of SJIA candidate genes identifies rare variant associations
- Investigation of genetic causes for Juvenile-onset Dermatomyositis
- Immune dysregulation in patients with TRNT1 deficiency
- An active enhancer signature defines regulatory identity in the absence of Foxp3
- Investigation on the role of STAT1 and STAT3 in IL6 and IL-27 signaling in helper T cells
- Regulation of bone mass by DLX3
- Identification of a disease-causing homozygous deletion in the SAMHD1 gene
- STAT5 paralog dose governs T cell effector and regulatory function

Major computational accomplishments are highlighted below.
Disease-causing mutation identification in the RNU4ATAC gene: Applied whole genome sequencing and linkage analysis in a large family with three patients affected by an immune dysregulation syndrome characterized by primary immunodeficiency, short stature and polyglandular endocrinopathy.
Analysis pipeline development for somatic mutation detection and filtering: Multiple somatic mutation callers were combined and optimal filtering developed for somatic mutation detection in non-malignant diseases.
Identification of somatic mutations in a cohort of melorheostosis patients: Analyzed whole exome sequencing data from matched affected and unaffected tissues from a number of melorheostosis patients. Discovered a likely disease-causing gene shared by multiple patients.
Evaluation of experimental and computational methods for detecting low frequency mutations: Examples are MDS (maximum-depth sequencing), BotSeqS (bottleneck sequencing), and amplicon-based deep target sequencing.
Identification of a disease-causing homozygous deletion in the SAMHD1 gene: Analyzed whole exome sequencing and whole genome sequencing data in a family with a patient affected by undifferentiated autoinflammatory disease.
Tofacitinib ameliorates murine Lupus and its associated vascular dysfunction: Contributed to analysis of the effect of Tofacitinib treatment on disease activity in a murine model of lupus. Gene expression changes in lupus mice were characterized, with and without tofacitinib treatment, using data generated from Nanostring.
STAT5 paralog dose governs T cell effector and regulatory functions: The contrasting roles of STAT5 paralogs STAT5A and STAT5B in T cell subsets were analyzed. STAT5A and STAT5B binding and chromatin remodeling were analyzed using ChIP-Seq. Downstream changes in gene expression were analyzed using RNA-Seq.
Also designed and implemented a data analysis approach to address critical issues involved in the sample-specific FDR calculations as well as a method to quantitatively assess sample replicability and data reproducibility.
Developmental Acquisition of Regulomes Underlies Innate Lymphoid Cell Functionality: The epigenome and transcriptome of innate lymphoid cells and their Th cell counterparts were analyzed by ChIP-Seq, ATAC-Seq, and RNA-Seq. It was determined that regulatory circuitry is eventually shared between these two divergent classes of immune cells. A considerable number of computing scripts have been developed and combined into an efficient data processing and analysis pipeline.
Asymmetric Action of STAT Transcription Factors Drives Transcriptional Outputs and Cytokine Specificity: STAT1 and STAT3 are both involved in signaling in response to IL6 and IL27, although these two cytokines elicit different responses. The relative contributions of these STATs were examined using knockouts and analyzed using ChIP-Seq and RNA-Seq.
Super-enhancers and disease genes: More publicly available datasets have been analyzed for this study, including a total of 27 cell lines and tissues and data from GWAS. In addition to super-enhancers (SEs), typical enhancers have also been identified to facilitate the statistical assessment of the SE results. The computational investigation has also been expanded to include specific known groups of disease genes. The analysis results support the hypothesis that, in general, super-enhancers play more significant roles than typical enhancers in the regulation of disease-associated genes.
Intron expression measurement: A method has been developed to measure genome-wide intron expression. Combined with the individual exon-based RNA-Seq results, it has been used to investigate the function of the RNA-binding protein KHSRP and to identify potentially disease-associated RNA processing patterns.
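The intron expression measurement described above can be illustrated with a small sketch. It derives intron intervals from the exon coordinates of one gene and computes a length-normalized read density for each intron. This is not the section's actual method: the function names, the 0-based half-open coordinate convention, the use of read start positions, and the reads-per-kilobase measure are all assumptions made for the example.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # assumed convention: 0-based, half-open [start, end)


def introns_from_exons(exons: List[Interval]) -> List[Interval]:
    """Return the uncovered gaps between the exons of one gene."""
    if not exons:
        return []
    ordered = sorted(exons)
    introns = []
    covered_end = ordered[0][1]
    for start, end in ordered[1:]:
        if start > covered_end:  # a gap not covered by any earlier exon
            introns.append((covered_end, start))
        covered_end = max(covered_end, end)
    return introns


def count_reads(region: Interval, read_starts: List[int]) -> int:
    """Count reads whose start position falls inside the region."""
    start, end = region
    return sum(1 for pos in read_starts if start <= pos < end)


def intron_density(exons: List[Interval], read_starts: List[int]) -> Dict[Interval, float]:
    """Reads per kilobase for each intron (hypothetical expression measure)."""
    result = {}
    for intron in introns_from_exons(exons):
        length = intron[1] - intron[0]
        result[intron] = count_reads(intron, read_starts) * 1000.0 / length
    return result


# Hypothetical toy data, for illustration only
exons = [(100, 200), (500, 600), (900, 1000)]
read_starts = [150, 250, 300, 550, 700, 950]
densities = intron_density(exons, read_starts)
```

A real analysis would read alignments from BAM files and exon models from an annotation file; the toy lists here only stand in for those inputs.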
Non-coding RNA data analysis: A comprehensive method has been developed for long non-coding RNA (lncRNA) data analysis, which includes a) strand-specific expression measurement; b) genomic location-based classification; c) generation of strand-specific genome browser viewable files; d) bi-directional lncRNA identification; and e) comparison of genome-wide lncRNA reads and density profiles associated with different biological conditions.
Gene-based chromatin openness assessment: A gene-based chromatin openness assessment method has been developed to identify potentially disease-associated openness changes at the gene level in low and normal density granulocytes (LDG and NDG). Current results identified 77 genes with a high number of associated ATAC-Seq peaks in the control samples; these genes appear to gradually lose that class status going from normal controls to NDG patients to LDG patients.
Potential enhancer-binding transcription factors: A method has been implemented in which a nucleosome-free region (NFR) is determined within each enhancer and used for TF binding site enrichment analysis. This approach, together with Venn diagram analysis, has generated interesting results that would have been missed by simple peak overlapping analysis.
Mutant TRNT1-associated codon usage analysis: A special method has been developed to study codon usage in mutant TRNT1-associated differentially regulated genes identified by RNA-Seq, coupled with results from the tRNA expression data analysis. This method uses the EMBOSS program suite together with several in-house scripts, and it performs codon usage analysis based on the results from both RNA-Seq and tRNA-seq (down-regulated tRNAs for relevant anti-codons). The method is currently being used to study whether down-regulated tRNAs cause codon usage change in differentially regulated genes.
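The idea behind the codon usage comparison can be sketched in a few lines. The example below counts in-frame codons in coding sequences and compares, between two gene sets, the fraction of codons decoded by a set of down-regulated tRNAs. It is a plain-Python illustration under stated assumptions, not the EMBOSS-based method itself: the function names and toy sequences are hypothetical, anticodons are written in the DNA alphabet, and each anticodon is mapped only to its exact Watson-Crick codon, ignoring wobble pairing.

```python
from collections import Counter
from typing import Iterable, Set

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def codon_for_anticodon(anticodon: str) -> str:
    """Watson-Crick cognate codon (reverse complement); wobble is ignored."""
    return anticodon.upper().replace("U", "T").translate(COMPLEMENT)[::-1]


def codon_counts(cds: str) -> Counter:
    """Count in-frame codons, dropping any trailing partial codon."""
    seq = cds.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3
    return Counter(seq[i:i + 3] for i in range(0, usable, 3))


def fraction_of_codons(cds_list: Iterable[str], codons: Set[str]) -> float:
    """Fraction of all codons in the gene set that belong to `codons`."""
    total = Counter()
    for cds in cds_list:
        total.update(codon_counts(cds))
    n = sum(total.values())
    return sum(total[c] for c in codons) / n if n else 0.0


# Hypothetical toy input, for illustration only
down_anticodons = ["TTC"]  # anticodon TTC pairs with codon GAA
affected = {codon_for_anticodon(a) for a in down_anticodons}
regulated_genes = ["ATGGAAGAATAA"]
background_genes = ["ATGGAGGAATAA", "ATGAAATAA"]
frac_regulated = fraction_of_codons(regulated_genes, affected)
frac_background = fraction_of_codons(background_genes, affected)
```

Comparing `frac_regulated` with `frac_background` gives a first indication of whether the affected codons are over-represented in the differentially regulated genes; a real analysis would add a statistical test and a more careful anticodon-to-codon mapping.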
Investigation on the function of RNA-binding protein KHSRP: A highly customized, comprehensive approach has been designed to study the functions of the RNA-binding protein KHSRP. It has been applied to both RNA-seq data and PAR-CLIP binding data of KHSRP and includes a total of 19 specific analyses of varying types.