Summary: The Biodata Mining and Discovery section has been involved in several NIAMS research projects, including:
- Investigation of the genetic causes of juvenile-onset dermatomyositis
- Identification of de novo mutations in the IKBKG (NEMO) gene in a patient cohort
- Examination of the effect of a NEMO splicing mutant on host defense and inflammation
- Development of a computational method to detect de novo mutations in duplicated genomic regions
- Lung fibrosis in SAVI, a monogenic autoinflammatory interferonopathy, is induced by an endothelial-mesenchymal transition-like differentiation process
- An interferon gene expression signature score to screen pediatric patients with early-onset autoinflammatory disease
- Dosage study of baricitinib to treat the interferon-driven autoinflammatory diseases CANDLE and SAVI
- The effect of T-bet on the transcriptional response to type I interferon signaling
- Investigation of the causes of osteoproliferation and aberrant bone formation
- Implication of microbial dysbiosis in spondyloarthritis (SpA) pathogenesis
- Regulation of bone mass by DLX3
- Retinoic acid inhibits expression of IL-9 by altering enhancer accessibility
- Essential nuclear functions of the FUBP family of DNA/RNA-binding proteins
- BACH2 immunodeficiency shows an association between super-enhancers and haploinsufficiency
- Automation of an ATAC-Seq data analysis pipeline
- STAT5 paralog dose governs T cell effector and regulatory function
- Gene expression analysis of lung cancer and Merkel cell carcinoma
- Neutrophil subsets and gene signatures are associated with subclinical cardiovascular disease in systemic lupus erythematosus
- The effects of shared information on semantic calculations in the Gene Ontology
- Characterization of distinct transcriptional profiles of neutrophil subsets in systemic lupus erythematosus through RNA-Seq and ATAC-Seq
- Defining the role of IL-22 in the development and metabolic activity of osteoblasts as it relates to ankylosing spondylitis
- Proteostasis dysregulation and autoinflammation in patients with TRNT1 deficiency
- Investigation of the influence of ribosomal RNA on RNA-Seq results

Major accomplishments are highlighted below.

Disease-causing mutation identification in the IKBKG (NEMO) gene: Screened a cohort of 69 patients with suspected monogenic autoinflammatory disease, together with 123 unaffected parental controls, and identified two de novo mutations in the IKBKG (NEMO) gene in two unrelated patients.

Analysis pipeline development for de novo mutation detection in duplicated genomic regions: A number of clinically important human genes are located in segmental duplications with near-perfect sequence identity. Standard variant callers are ill-suited to calling mutations in these regions because reads cannot be mapped uniquely to one copy. A computational pipeline was developed to pinpoint candidate mutations in these regions for further investigation.

Identification of somatic mutations in a cohort of patients suspected of having somatic diseases: Analyzed whole exome sequencing and high-depth targeted sequencing data and discovered potential somatic mutations in the KRAS and DNMT3A genes.

Evaluation of single cell sequencing data from 10x Genomics: Both 3′ RNA sequencing data and T-cell V(D)J sequencing data have been evaluated, and variant calling from single cell 3′ RNA sequencing data has been tested.

Identification of disease-causing compound heterozygous mutations in the PRG4 gene: Analyzed whole exome sequencing data from a family affected by camptodactyly-arthropathy-coxa vara-pericarditis (CACP) syndrome.

Gene expression changes in spondyloarthritis patient-derived mesenchymal stem cells and osteoblasts: iPSCs were generated from patients and controls and then differentiated into disease-relevant cell types. Their transcriptomic responses to cytokine stimulation were assessed to identify pathways that may be dysregulated in the disease.
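Both the de novo screen and the compound heterozygous analysis rest on comparing a child's genotype with the parents'. The sketch below illustrates that trio logic under stated assumptions; it is not the section's pipeline. Genotypes are assumed to be already called and encoded as alternate-allele counts (0, 1, 2), and the `TrioCall` record, gene names, and variant IDs are hypothetical. A real analysis would also apply depth, genotype-quality, and population-frequency filters.

```python
from collections import defaultdict
from dataclasses import dataclass


# Hypothetical record: genotypes encoded as alternate-allele counts
# (0 = hom-ref, 1 = het, 2 = hom-alt); no quality fields are modeled.
@dataclass(frozen=True)
class TrioCall:
    gene: str
    variant_id: str
    child: int
    mother: int
    father: int


def is_de_novo(call: TrioCall) -> bool:
    """Alt allele present in the child but absent from both parents."""
    return call.child > 0 and call.mother == 0 and call.father == 0


def compound_het_genes(calls):
    """Genes where the child carries at least one het variant inherited only
    from the mother and at least one het variant inherited only from the father."""
    maternal = defaultdict(list)
    paternal = defaultdict(list)
    for c in calls:
        if c.child != 1:
            continue
        if c.mother >= 1 and c.father == 0:
            maternal[c.gene].append(c.variant_id)
        elif c.father >= 1 and c.mother == 0:
            paternal[c.gene].append(c.variant_id)
        # Variants carried by both parents cannot be phased this way and are skipped.
    return {g: (maternal[g], paternal[g]) for g in maternal if g in paternal}


calls = [
    TrioCall("GENE_A", "var1", child=1, mother=0, father=0),  # de novo
    TrioCall("GENE_B", "var2", child=1, mother=1, father=0),  # maternal allele
    TrioCall("GENE_B", "var3", child=1, mother=0, father=1),  # paternal allele
    TrioCall("GENE_C", "var4", child=1, mother=1, father=1),  # unphaseable
]
print([c.variant_id for c in calls if is_de_novo(c)])  # ['var1']
print(compound_het_genes(calls))  # {'GENE_B': (['var2'], ['var3'])}
```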
Interplay between gut microbial dysbiosis and host gut transcriptomic changes in an animal model of spondyloarthritis: Progressive gut inflammation in an animal model of spondyloarthritis is accompanied by progressive dysbiosis of the gut microbiome and by changes in host tissue gene expression. Although the gut microbiomes of healthy animals of different strains are quite similar, on induction of disease the strains develop distinct patterns of microbial dysbiosis while showing similar transcriptomic changes indicative of inflammation.

Examination of the effect of a NEMO splicing mutant on host defense and inflammation: The NEMO splicing mutant lacks exon 5, which encodes a TBK1-binding domain. Gene expression analysis shows that in fibroblasts the TNF pathway, but not the TLR3 and RLR pathways, was dysregulated. In contrast, NF-κB signaling and IFN production were increased in T cells and monocytes.

An interferon gene expression score is proving useful in identifying type I interferonopathies: Developed a NanoString gene expression assay that quantifies type I interferon-regulated gene expression in whole blood. The assay has proven useful in screening new pediatric autoinflammatory patients to determine whether the type I interferon pathway, as opposed to other inflammatory pathways, is involved.

Effects of T-bet on the transcriptional response to type I interferon: T-bet is a transcription factor that promotes Th1 cells, which preferentially produce interferon-gamma, the type II interferon. RNA-Seq was used to show that T-bet represses type I interferon-responsive genes, thereby preventing an aberrant type I IFN amplification loop.

Investigation of the function of the RNA-binding protein AGO2: A customized, comprehensive approach has been designed to study the functions of the RNA-binding protein AGO2 and has been applied to integrate data from RNA-Seq, miRNA-Seq, PAR-CLIP, and ChIP-Seq.
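One common way to turn a panel of interferon-stimulated genes into a single score is to standardize each gene against healthy controls and sum the z-scores. The sketch below assumes that form; the actual gene panel, NanoString normalization, and cutoff used in the assay are not described here, so the gene names and values are placeholders.

```python
from statistics import mean, stdev


def interferon_score(sample, controls, genes):
    """Sum of per-gene z-scores of one sample relative to healthy controls.

    sample:   dict gene -> normalized expression (e.g. log2 normalized counts)
    controls: list of dicts with the same keys (at least two, with non-zero variance)
    genes:    the interferon-stimulated gene panel (hypothetical here)
    """
    score = 0.0
    for g in genes:
        ref = [c[g] for c in controls]
        mu, sd = mean(ref), stdev(ref)  # sample standard deviation of controls
        score += (sample[g] - mu) / sd
    return score


genes = ["ISG_1", "ISG_2"]  # placeholder gene names
controls = [
    {"ISG_1": 5.0, "ISG_2": 7.0},
    {"ISG_1": 6.0, "ISG_2": 8.0},
    {"ISG_1": 7.0, "ISG_2": 9.0},
]
print(interferon_score({"ISG_1": 9.0, "ISG_2": 10.0}, controls, genes))  # 5.0
print(interferon_score({"ISG_1": 6.0, "ISG_2": 8.0}, controls, genes))   # 0.0
```

Summing rather than averaging makes the score scale with panel size; either choice works as long as the screening cutoff is derived from controls with the same definition.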
Super-enhancers and disease genes: After establishing a significant association of super-enhancers with haploinsufficient (HI) genes compared with haplosufficient (HS) genes, we expanded the analysis to include publicly available GWAS data and showed that more HI genes than HS genes harbor coding-region GWAS SNPs. An approach has also been developed to assess the significance of this observation against 10,000 randomly selected gene sets with similar size distributions.

Integrated analysis of lupus patient samples: Following genome-wide profiling of chromatin accessibility in low-density granulocyte (LDG) and normal-density neutrophil (NDN) patient samples, and the corresponding gene expression studies, the ATAC-Seq and RNA-Seq data have been integrated. Gene groups of interest were selected based on up- or down-regulation, and pathway analysis results and ATAC-Seq read density profiles were generated for each group. An interesting pattern has emerged and is under further investigation.

Automation of the ATAC-Seq data analysis pipeline: The pipeline has been expanded to include initial read counting, adapter trimming, removal of duplicate reads from paired-end (PE) FASTQ files, genome mapping, calculation and plotting of the fragment size distribution, fragment size-based sub-grouping, generation of normalized visualization files (bigWig), peak calling, and peak annotation. The complete pipeline has been automated so that bench scientists can run these analyses themselves.

A method to solve a false discovery rate issue in ChIP-Seq data analysis: This method was developed to address a statistical issue associated with a popular ChIP-Seq peak-calling program (MACS 1.4.2). It involves a separate, additional calculation of FDRs based on the original results. The method was applied successfully to address a specific reviewer concern about a manuscript submitted to eLife.
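A size-matched permutation test of the kind described for the super-enhancer analysis might look like the following. It is a sketch under assumptions, not the section's implementation: "similar size distributions" is interpreted here as matching on gene length via quantile bins, `has_feature` stands for whatever property is being tested (for example, carrying a coding-region GWAS SNP), and the empirical p-value uses the common (r + 1)/(n + 1) form. All gene names and lengths in the usage example are hypothetical.

```python
import random
from collections import Counter, defaultdict


def length_bins(gene_lengths, n_bins=10):
    """Map each gene to a quantile bin of gene length (0 = shortest)."""
    ordered = sorted(gene_lengths, key=gene_lengths.get)
    per_bin = -(-len(ordered) // n_bins)  # ceiling division
    return {g: i // per_bin for i, g in enumerate(ordered)}


def size_matched_permutation_test(test_genes, gene_lengths, has_feature,
                                  n_perm=10_000, n_bins=10, seed=0):
    """Empirical p-value for the number of test genes carrying a feature,
    against random gene sets with the same length-bin composition.
    Assumes test_genes are unique and all present in gene_lengths."""
    rng = random.Random(seed)
    bins = length_bins(gene_lengths, n_bins)
    pool = defaultdict(list)
    for gene, b in bins.items():
        pool[b].append(gene)
    needed = Counter(bins[g] for g in test_genes)  # genes to draw per bin
    observed = sum(g in has_feature for g in test_genes)
    at_least_as_extreme = 0
    for _ in range(n_perm):
        null = sum(
            g in has_feature
            for b, k in needed.items()
            for g in rng.sample(pool[b], k)  # without replacement within a bin
        )
        if null >= observed:
            at_least_as_extreme += 1
    return observed, (at_least_as_extreme + 1) / (n_perm + 1)


lengths = {f"g{i}": i + 1 for i in range(100)}  # hypothetical gene lengths
test = [f"g{i}" for i in range(0, 100, 10)]     # one gene per length bin
observed, p = size_matched_permutation_test(test, lengths, set(test), n_perm=1000)
print(observed, p)
```

Drawing null genes from the same length bins controls for the fact that longer genes tend to harbor more SNPs, which would otherwise inflate the apparent enrichment.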
Single cell RNA-Seq data analysis pipeline: Our latest data analysis pipeline is based on Cell Ranger and includes FASTQ file generation, per-sample RNA-Seq data generation, and aggregation of data from multiple samples for comparison.
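The per-sample and aggregation steps can be driven from a small script. The sketch below only builds command lines and the aggregation CSV; it assumes FASTQ files named with the sample prefix, placeholder sample and reference names, and the `sample_id,molecule_h5` CSV header used by recent Cell Ranger releases (older releases used `library_id`). Newer versions may require additional arguments, so check `cellranger count --help` for the installed version.

```python
import csv
import io
import shlex


def count_command(sample, fastq_dir, transcriptome):
    """argv for one per-sample `cellranger count` run (output dir = sample)."""
    return [
        "cellranger", "count",
        f"--id={sample}",
        f"--transcriptome={transcriptome}",
        f"--fastqs={fastq_dir}",
        f"--sample={sample}",  # assumes FASTQ files are prefixed with the sample name
    ]


def aggr_csv(samples, id_column="sample_id"):
    """CSV text pointing `cellranger aggr` at each sample's molecule_info.h5."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow([id_column, "molecule_h5"])
    for s in samples:
        writer.writerow([s, f"{s}/outs/molecule_info.h5"])
    return buf.getvalue()


def aggr_command(aggr_id, csv_path):
    """argv for combining per-sample outputs with `cellranger aggr`."""
    return ["cellranger", "aggr", f"--id={aggr_id}", f"--csv={csv_path}"]


samples = ["patient1", "control1"]  # hypothetical sample names
for s in samples:
    print(shlex.join(count_command(s, "fastqs/", "refdata-gex-GRCh38")))
print(aggr_csv(samples), end="")
print(shlex.join(aggr_command("aggregated", "aggr.csv")))
```

Generating the commands rather than running them keeps the sketch testable and lets the same argv lists be handed to `subprocess.run` or a cluster scheduler.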