The Biodata Mining and Discovery Section has been involved in multiple NIAMS research projects, including:
- Investigation of genetic causes for Systemic Lupus Erythematosus through whole genome sequencing
- Identification of a disease-causing de novo mutation in the FBN1 gene
- Somatic mutation detection in ClinSeq and Inova cohorts
- Analysis of WES data from multiple NIAMS patients
- WGS analysis workflow implementation in the NIAMS cluster and the NIH Biowulf cluster
- Study of Spondyloarthritis and microbiome using a rat model
- Analysis of the role of Mitomycin C in corneal sensory nerve repair after corneal wounding
- Analysis of corneal epithelial and stromal tissue before and after wounding
- Distinctive wound healing signatures in oral mucosa vs skin
- The effect of Jakinibs on gene expression in Lupus patient whole blood
- IRF8 mutation and its effect on osteoproliferation and bone formation
- Investigation of the influence of ribosomal RNA on RNA-Seq results
- Investigation of the open chromatin state in Merkel cells during mouse embryonic development through scRNA-Seq and ATAC-Seq
- A Muscle-Specific Enhancer RNA Mediates Cohesin Recruitment and Regulates Transcription in Trans, via ATAC-Seq and ChIP-Seq
- Dissecting the distinct roles of peptidylarginine deiminases 2 and 4 in TLR-7 dependent lupus autoimmunity and vasculopathy, via RNA-Seq
- Retinoic acid inhibits expression of IL-9 by altering enhancer accessibility
- Neutrophil subsets and gene signatures are associated with cardiovascular risk in systemic lupus erythematosus
- Epigenetic regulation of osteoclast differentiation and osteoclast-specific super-enhancers
- Argonaute-miRNA complexes silence target mRNAs in the nucleus of mammalian stem cells
- Impaired NET degradation and IL-1-beta-mediated NET formation contributes to inflammation in PAPA syndrome
- MicroRNA-221/222 regulate gut homeostasis via tuning tissue Th17 cells and their transcriptomic phenotype

Major accomplishments are highlighted below.

Investigation of genetic causes for Systemic Lupus Erythematosus through whole genome sequencing: Established data transfer (Globus) and backup infrastructure for about 200 WGS samples (40+ TB of sequence data). Assessed the WGS quality of the lupus cohort. Developed an analysis strategy for rare variants and common variants. Generated rare variant lists for about half of the trio families. Performed family-based variant association tests such as TDT and PBAT.

Identification of a disease-causing de novo mutation in the FBN1 gene: Analyzed whole exome sequencing data from a family affected by severe arthritis and geleophysic dysplasia.

Somatic mutation detection in ClinSeq and Inova cohorts: Collaborated with NHGRI ClinSeq and the Inova Translational Medicine Institute to screen for somatic mutations in genes involved in hematopoiesis. Identified JAK2 V617F mutations in multiple individuals.

Analysis of WES data from multiple NIAMS patients: Analyzed whole exome data and performed mutation searches in families affected by sJIA, SpA, and RP, as well as in a cohort of primary Sjögren's syndrome patients.

Study of the overlapping and distinctive transcriptional programs of skin and oral healing: Transcriptomes of skin and oral tissue were analyzed at various timepoints after wounding and compared with unwounded samples. Oral wounds heal faster and with less scarring than skin wounds. Understanding why may suggest ways to speed and improve skin healing.

Interplay between gut microbial dysbiosis and host gut transcriptomic changes in an animal model of spondyloarthritis: Progressive dysbiosis of the gut microbiome and progressive changes in host tissue gene expression accompany progressive gut inflammation in an animal model of spondyloarthritis.
Although the gut microbiomes of healthy animals of different strains are quite similar, on induction of disease the strains develop distinct patterns of microbial dysbiosis while showing similar transcriptomic changes indicative of inflammation.

Analyzing the transcriptomic effect of Mitomycin C in cornea healing: Mitomycin C has been shown to enhance corneal nerve repair in a mouse corneal wounding system. We are analyzing the gene expression changes that could explain this effect.

Investigation of the function of enhancer RNA (DRReRNA): To investigate the roles of the enhancer RNA DRReRNA during myogenic differentiation, ATAC-Seq data and ChIP-Seq data have been integrated. The distribution of the SMC3 ChIP-Seq signals has been generated. ATAC-Seq signal densities for regions affected by DRReRNA (DRRsiRNA) or SMC3siRNA in C2C12 cells were also generated. This work has been published in Molecular Cell.

Examination of the distinct roles of PAD2 and PAD4 in the progression of murine SLE: To explore the differential roles of PAD proteins in SLE, RNA-Seq was performed, revealing distinct modulation of immune-related pathways in the lymphoid organs of PAD knockout (KO) mice.

Investigation of the function of the RNA-binding protein AGO2: Several highly customized, comprehensive approaches have been designed and implemented to study the functions of the RNA-binding protein AGO2. They have been applied to integrate data from RNA-Seq, microRNA-Seq, PAR-CLIP, and ChIP-Seq. Among them is a computational approach developed specifically to study the conservation of short AGO2 binding sequences against background sequences matched for both nucleotide composition and gene expression profile. An in-house Python program developed previously has also been extended with additional functions for the conservation study.

Integrated analysis of Lupus patient samples: ATAC-Seq data and RNA-Seq data have been further integrated to identify patterns of biological significance.
The computational methods designed and implemented make it possible to investigate the differences between the ATAC-Seq read density profiles of gene groups defined by both gene expression profiles and biological pathways.

Single Cell RNA-Seq data analysis: A special method has been developed to enable single cell gene expression analysis of several novel sequences. These sequences are not present in the databases used for regular single cell RNA-Seq data analysis.

Data analysis pipeline implementation with Snakemake: The ATAC-Seq and ChIP-Seq data analysis pipelines have been implemented on a more advanced automation platform, Snakemake. The Snakemake-based pipelines include more comprehensive data analysis steps. They are more efficient and more powerful than the earlier pipelines, which were largely Bash-based. They also allow better process monitoring and more efficient troubleshooting.

Discovery of a potential new LDG patient subgroup: RNA-Seq data from Lupus patients have been analyzed, and a potential LDG (low density granulocyte) based lupus patient subgroup has been identified. Further analysis has shown that this potential subgroup is characterized by low expression of the biomarker CD10.

New motif enrichment analysis pipeline: A new transcription factor motif enrichment analysis pipeline has been developed to facilitate ATAC-Seq data analysis across multiple samples. The pipeline automates the identification of ATAC-Seq peaks present in all samples in a given sample group, the classification of peaks into group-unique and group-common ones, and the motif enrichment analysis for each of the sample groups.
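
The peak identification and classification steps of the motif enrichment pipeline can be illustrated with a minimal sketch. This is not the Section's implementation, which is not described in detail here. It assumes that a peak counts as "present" in a sample when it overlaps a peak in that sample by at least one base, that peaks are given as half-open (chromosome, start, end) intervals, and that a group's peaks are anchored on its first sample. The function names `overlaps`, `consensus_peaks`, and `classify` and the example coordinates are hypothetical.

```python
from typing import Dict, List, Tuple

Peak = Tuple[str, int, int]  # (chromosome, start, end), half-open coordinates


def overlaps(a: Peak, b: Peak) -> bool:
    """True if two peaks are on the same chromosome and share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def consensus_peaks(samples: List[List[Peak]]) -> List[Peak]:
    """Peaks of the first sample that overlap a peak in every other sample.

    Assumption: the first sample serves as the anchor for the group.
    """
    first, rest = samples[0], samples[1:]
    return [p for p in first
            if all(any(overlaps(p, q) for q in other) for other in rest)]


def classify(groups: Dict[str, List[List[Peak]]]) -> Dict[str, Dict[str, List[Peak]]]:
    """Split each group's consensus peaks into group-common and group-unique sets."""
    consensus = {name: consensus_peaks(samples) for name, samples in groups.items()}
    result: Dict[str, Dict[str, List[Peak]]] = {}
    for name, peaks in consensus.items():
        others = [consensus[o] for o in consensus if o != name]
        common, unique = [], []
        for p in peaks:
            # One flag per other group: does this peak overlap that group's consensus?
            hits = [any(overlaps(p, q) for q in other) for other in others]
            if hits and all(hits):
                common.append(p)   # found in every other group
            elif not any(hits):
                unique.append(p)   # found in no other group
        result[name] = {"common": common, "unique": unique}
    return result


# Hypothetical example: two groups with two samples each.
groups = {
    "control": [
        [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 50, 150)],
        [("chr1", 120, 220), ("chr1", 520, 580)],
    ],
    "treated": [
        [("chr1", 150, 250), ("chr2", 300, 400)],
        [("chr1", 160, 240), ("chr2", 310, 390)],
    ],
}
result = classify(groups)
```

With more than two groups, a peak found in some but not all of the other groups falls into neither set in this sketch. Each resulting peak set would then be passed to a motif enrichment tool, one run per sample group.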