The Biodata Mining and Discovery Section projects are:
- Investigation of genetic causes for Systemic Lupus Erythematosus through whole genome sequencing
- Investigation of mutation spectrum in CACP Syndrome patients
- Somatic mutation analysis in Merkel Cell Carcinoma
- Analysis of WES data from multiple NIAMS patients
- Analysis of rare coding variant burden in WGS and WES patients
- Investigation of the pathogenic heterogeneity of low-density neutrophils isolated from systemic lupus erythematosus patients
- Examination of the distinct roles of PAD2 and PAD4 in the progression of murine SLE
- Study of chromatin dynamics and regulation during neutrophil extracellular trap formation
- Investigation of the mechanism of the age-resistant genuine quiescent stem-cell state
- The regulation of skeletal muscle stem cell function by EZH1
- Exploration of lineage specification through epigenetic regulation of Pax7
- Investigation of epigenetic modules governing epidermal homeostasis
- Study of the epigenetic regulation orchestrated by IRF8 deficiency during osteoclast differentiation
- Study of Spondyloarthritis and microbiome using a rat model
- Analysis of the role of Mitomycin C in corneal sensory nerve repair after corneal wounding
- Analysis of corneal epithelial and stromal tissue before and after wounding
- The effect of Jakinibs on gene expression in Lupus patient whole blood
- Effect of JAK inhibition on NK and ILC1 homeostasis in mice
- IRF8 mutation and its effect on osteoproliferation and bone formation
- Effect of pericardial cavity macrophages in heart injury and cardiac fibrosis
- Examining the skin transcriptome in hidradenitis suppurativa
- Role of epidermal SOX2 overexpression in cutaneous wound healing
- Investigation of the distinct roles of peptidylarginine deiminases 2 and 4 in TLR-7 dependent lupus autoimmunity and vasculopathy via RNA-Seq
- Analysis of a muscle-specific enhancer RNA via ATAC-Seq and ChIP-Seq
- Role of retinoic acid in expression of IL-9
- Identification of neutrophil subsets and gene signatures associated with cardiovascular risk in systemic lupus erythematosus
- Epigenetic regulation of osteoclast differentiation and osteoclast-specific super-enhancers
- Investigation of Argonaute-miRNA complexes in silencing target mRNAs in the nucleus of mammalian stem cells
- Effect of impaired NET degradation and IL-1-beta-mediated NET formation in PAPA syndrome
- Function of microRNA-221/222 in regulating gut homeostasis

Major accomplishments are highlighted below.

Investigation of genetic causes for Systemic Lupus Erythematosus through whole genome sequencing: Compared variant call quality between the GATK pipeline and the TAGC pipeline for the lupus cohort. Completed trio analysis using GATK calls and uncovered potential leads for functional study. Generated variants for gene burden testing and polygenic risk scores. Analyzed variants in genes involved in HLH.
Investigation of mutation spectrum in CACP Syndrome patients: Analyzed whole exome and whole genome sequencing data from multiple families affected by CACP syndrome. Discovered multiple mutations in the PRG4 gene. Helped develop a strategy to uncover mutations in difficult-to-sequence regions of the PRG4 gene.
Somatic mutation analysis in Merkel Cell Carcinoma: Analyzed WES data from Merkel Cell Carcinoma patients and controls. Identified somatic mutations and mutation signatures. Identified driver genes such as NOTCH1 and HRAS.
Analysis of WES data from multiple NIAMS patients: Analyzed whole exome data and performed mutation searches in families affected by sJIA, SpA, and RP, as well as in a cohort of lupus patients. Identified a CASP4 de novo mutation and an IRAK2 homozygous mutation. Improved the kinship detection pipeline for homozygous mutation discovery.
Analysis of rare coding variant burden in WES patients: Developed scripts to count rare coding variants in genes and performed gene burden tests for a cohort of 54 RP patients.
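
The counting and gene burden testing described above can be illustrated with a minimal sketch. The report does not say which statistical test, allele frequency cutoff, or data layout the section's scripts used. Everything here is therefore an assumption made for illustration: the carrier-based collapsing, the one-sided Fisher's exact test, the 1% cutoff, the record fields (`gene`, `sample`, `af`, `coding`), the function names, and the toy sample and gene names.

```python
from collections import defaultdict
from math import comb


def rare_coding_carriers(variants, max_af=0.01):
    """Map each gene to the set of samples carrying at least one rare coding variant.

    `variants` is a list of dicts with hypothetical keys:
    gene, sample, af (population allele frequency), coding (bool).
    """
    carriers = defaultdict(set)
    for v in variants:
        if v["coding"] and v["af"] <= max_af:
            carriers[v["gene"]].add(v["sample"])
    return carriers


def fisher_greater(a, b, c, d):
    """One-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    a/b = case carriers/non-carriers, c/d = control carriers/non-carriers.
    Returns the probability of seeing >= a case carriers by chance.
    """
    n_cases, n_controls, n_carriers = a + b, c + d, a + c
    denom = comb(n_cases + n_controls, n_carriers)
    upper = min(n_cases, n_carriers)
    tail = sum(
        comb(n_cases, k) * comb(n_controls, n_carriers - k)
        for k in range(a, upper + 1)
    )
    return tail / denom


def gene_burden(variants, cases, controls, max_af=0.01):
    """Return {gene: (case_carriers, control_carriers, p_value)}."""
    results = {}
    for gene, samples in rare_coding_carriers(variants, max_af).items():
        a = len(samples & cases)
        c = len(samples & controls)
        p = fisher_greater(a, len(cases) - a, c, len(controls) - c)
        results[gene] = (a, c, p)
    return results


# Toy input only; not data from the report.
cases = {"case1", "case2", "case3", "case4"}
controls = {"ctrl1", "ctrl2", "ctrl3", "ctrl4"}
variants = [
    {"gene": "GENE_A", "sample": "case1", "af": 0.001, "coding": True},
    {"gene": "GENE_A", "sample": "case2", "af": 0.002, "coding": True},
    {"gene": "GENE_A", "sample": "case3", "af": 0.0005, "coding": True},
    {"gene": "GENE_A", "sample": "ctrl1", "af": 0.2, "coding": True},    # common, skipped
    {"gene": "GENE_B", "sample": "case1", "af": 0.001, "coding": False},  # non-coding, skipped
    {"gene": "GENE_B", "sample": "ctrl2", "af": 0.004, "coding": True},
]

for gene, (n_case, n_ctrl, p) in sorted(gene_burden(variants, cases, controls).items()):
    print(gene, n_case, n_ctrl, round(p, 4))
```

Collapsing to one carrier flag per sample keeps a single individual with several variants in the same gene from being counted more than once. A real analysis would usually also adjust for covariates and correct for multiple testing across genes, which this sketch omits.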
Combined analysis of transcriptome and microbiome: Performed an inter-omic analysis of the rat gut transcriptome and microbiome in an animal model of spondyloarthritis. Elucidated the host-bacterial interplay that helps explain how the microbiome can influence the outcome of a systemic inflammatory rheumatic disease.
Transcriptome of healing: Furthered the understanding of the transcriptome of healing by examining SOX2 overexpression in the epidermis, looking at the effect of mitomycin C on corneal wound healing, and comparing diabetic foot ulcers with normal diabetic skin.
Clinical studies of JAK inhibitors: Examined the effect of JAK inhibitors on patients with Systemic Lupus Erythematosus and on the homeostasis of NK and ILC1 cells in mice.
Integrated analysis of Lupus patient samples: ATAC-Seq data and RNA-Seq data have been further integrated to identify patterns of biological significance. The computational methods designed and implemented have made it possible to investigate the differences between the ATAC-Seq read density profiles of gene groups defined by both gene expression profiles and biological pathways.
Discovery of a potential new LDG patient subgroup: RNA-Seq data from Lupus patients have been analyzed and a potential LDG (low density granulocyte) based lupus patient subgroup has been identified. Further analysis has shown that this potential subgroup is characterized by low expression of the biomarker CD10.
Examination of the distinct roles of PAD2 and PAD4 in the progression of murine SLE: To explore the differential roles of PAD proteins in SLE, RNA-Seq was performed to reveal distinct modulation of immune-related pathways in the lymphoid organs of PAD KOs. Differentially expressed gene sets and related functional pathways were identified.
Exploration of the global chromatin accessibility during neutrophil extracellular trap formation: ATAC-Seq was performed to define the open chromatin regions in response to ionophore stimulation.
In parallel, RNA-Seq data were integrated as well.
Data analysis pipeline implementation with Snakemake: Snakemake-based data analysis pipelines have been created, implemented, and improved for ATAC-Seq, ChIP-Seq, and RNA-Seq. These pipelines include more comprehensive data analysis steps. They are more efficient and more powerful than the previously built pipelines, which are largely bash-based. The Snakemake-based pipelines also allow better process monitoring and more efficient troubleshooting.
3D genome conformation analysis with Hi-C: The 3D genome conformation analysis tools Juicer and HiC-Pro have been implemented to analyze Hi-C data at various resolutions. Compartments, contact domains, and locally enriched peaks can be defined. The metadata generated with the pipeline may be visualized in 2D together with 1D data generated from the ATAC-Seq, ChIP-Seq, and RNA-Seq pipelines.
Quantitative ChIP-spike analysis: Introduced the Drosophila chromatin ChIP spike-in as an internal control so that signal strength can be compared across samples using a spike-in adjusted ratio. Results suggest improved ChIP-Seq data quantification.
Motif enrichment analysis for heterogeneous data: This approach has been specifically developed for ATAC-Seq data in which group members show a high level of heterogeneity that makes motif enrichment analysis difficult. Instead of using overlapping peak regions, the analysis is done for each individual sample. The enriched motifs for a given group are then identified as those that are enriched in a majority of the group members.
A pathway analysis tool: This R-based software tool automates pathway analysis for multiple gene lists against the pathway databases in Enrichr. Results can be visualized to show pathways that are shared among the gene lists or uniquely enriched in one of them.
Expanding single cell RNA-Seq data analysis to includ