The Biodata Mining and Discovery section has been actively involved in a variety of NIAMS research projects, in particular:

- A study that shows generation of pathogenic Th17 cells in the absence of TGF-beta signaling
- A deep sequencing analysis that identifies the genomic targets of the cytidine deaminase AID and its cofactor RPA in B lymphocytes
- A systems biology analysis of PFAPA syndrome
- A copy number variation study that identifies a LEPREL1 (P3H2) intron 1 deletion associated with protection from multiple inflammatory diseases
- Homeostatic tissue responses in skin biopsies from NOMID patients with constitutive overproduction of IL-1-beta
- Opposing regulation of the locus encoding IL-17 through direct, reciprocal actions of STAT3 and STAT5
- IL-27 priming of T cells controls IL-17 production in trans via induction of PD-L1
- Neural crest deletion of Dlx3 recapitulates features of Tricho-Dento-Osseous syndrome
- Combining microarray and ChIP-Seq data to screen for key transcription factors associated with folliculin interacting protein 1 in B lymphocytes
- Identifying molecular targets of heterotopic ossification following war trauma using RNA-Seq and microRNA-Seq
- Applying RNA-Seq to SLE: identifying distinct gene expression profiles associated with high levels of auto-reactive IgE antibodies in systemic lupus erythematosus

Major computational approaches and methods developed are highlighted below.

The development of a Peak Assignment and Profile Search Tool (PAPST)

Based on our extensive experience in analyzing ChIP-Seq data, PAPST was developed to combine several of the most useful previously developed data analysis methods with a unique feature of its own: an easy-to-use, fast profile search of ChIP-Seq data for genes with specific transcription factor binding and epigenetic modifications.
Systematically analyzing post-peak-calling ChIP-Seq data is a great challenge, not only because software tools for the task are currently lacking, but equally because the few existing tools are largely inaccessible to the lab scientists who are ultimately responsible for making sense of the peak-calling results. PAPST was developed for post-peak-calling ChIP-Seq data analysis in response to this challenge. With a few mouse clicks and within seconds, PAPST allows a user to identify genes with specific transcription factor (TF) binding and/or epigenetic modification co-localization profiles. This novel and unique feature answers questions such as: which genes have TF1 and TF2 binding and epigenetic mark A in their promoters, and epigenetic marks B and C in their gene bodies? Other quick PAPST analysis results include peak distribution statistics among gene-centered genomic regions and the number of overlapping peaks for all pairwise sample comparisons. PAPST can also generate microarray-style, gene-centered quantitative ChIP-Seq data with a single mouse click, which may then be combined with RNA-Seq or microarray data, if available, to facilitate further downstream analysis.

A Java-based, platform-independent desktop application, PAPST is user friendly and requires no special computational expertise. Advanced users may also use PAPST as a general genomic-interval-based search tool to quickly screen any set of genomic features with coordinates, such as genes or a set of TF binding peaks, against any other such features in any combination.
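The kind of co-localization query described above can be illustrated with a short sketch. This is not PAPST's own code (PAPST is a Java desktop application); it is a minimal Python illustration under stated assumptions: the names `Interval`, `overlaps`, and `profile_search`, the half-open coordinate convention, and all gene and peak coordinates are hypothetical, and the search is a plain linear scan rather than whatever indexing PAPST may use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    chrom: str
    start: int  # 0-based, inclusive (assumed convention)
    end: int    # exclusive


def overlaps(a, b):
    """True if two intervals on the same chromosome share at least one base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def profile_search(genes, peaks, profile):
    """Return names of genes whose regions match the requested profile.

    genes:   gene name -> {region name -> Interval}
    peaks:   sample name -> list of peak Intervals
    profile: region name -> set of sample names that must have a peak there
    """
    hits = []
    for name, regions in genes.items():
        if all(
            any(overlaps(regions[region], peak) for peak in peaks[sample])
            for region, samples in profile.items()
            for sample in samples
        ):
            hits.append(name)
    return hits


# Made-up example data (hypothetical genes, samples, and coordinates).
genes = {
    "G1": {"promoter": Interval("chr1", 1000, 3000),
           "body": Interval("chr1", 3000, 8000)},
    "G2": {"promoter": Interval("chr1", 20000, 22000),
           "body": Interval("chr1", 22000, 30000)},
}
peaks = {
    "TF1": [Interval("chr1", 1500, 1700), Interval("chr1", 20500, 20600)],
    "TF2": [Interval("chr1", 2500, 2600)],
    "markB": [Interval("chr1", 4000, 5000), Interval("chr1", 25000, 26000)],
}
# "Which genes have TF1 and TF2 in the promoter and mark B in the gene body?"
query = {"promoter": {"TF1", "TF2"}, "body": {"markB"}}
matching_genes = profile_search(genes, peaks, query)
```

In this toy data only G1 satisfies the whole profile, because G2 lacks a TF2 peak in its promoter. For genome-scale peak sets, a sorted or indexed interval structure would be a more realistic choice than the nested scan shown here.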
A method that combines microarray data and ChIP-Seq data to screen for key transcription factors

This is a computational strategy in which selected ChIP-Seq data sets from GEO (Gene Expression Omnibus), after peak calling and peak assignment to genes, are combined with microarray data generated in-house to screen for potentially important transcription factors. The approach counts the genes with binding of a given TF among all expressed genes; it then does the same among the differentially expressed genes. A Fisher exact test is applied to determine whether the difference between the two results (the number of TF-occupied genes among all expressed genes vs. the number of TF-occupied genes among differentially expressed genes, given the two totals) is statistically significant. TFs with a significantly higher percentage of occupied genes among the differentially expressed genes than among all expressed genes are the potential key TFs for further downstream analysis.

A strategy to identify potential transcription factors that may regulate microRNA targeted genes

This strategy involves the following general steps in analyzing RNA-Seq and microRNA-Seq data:

a) identify differentially expressed genes;
b) identify differentially expressed microRNAs;
c) identify computationally predicted mRNA targets of the differentially expressed microRNAs with multiple methods such as TargetScan, PicTar, miRanda, and mirSVR;
d) identify a reliable set of predicted microRNA targets;
e) identify the overlap between the differentially expressed genes and the predicted microRNA targets; these are the genes carried into the next step;
f) perform motif enrichment analysis on the promoter sequences of the genes identified in e), using tools such as MEME and DREME.

Transcription factors whose binding sites are enriched in this analysis are the potential regulators of microRNA targeted genes, and they may be subjected to ChIP-Seq in follow-up studies.
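Both strategies above reduce to small counting and set computations, which the following minimal Python sketch illustrates. Several points are assumptions rather than details given in the text: the Fisher exact test is computed as a one-sided (enrichment) test that treats the differentially expressed genes as a subset of the expressed genes; the "reliable set" in step d) is taken to mean targets predicted by at least `min_tools` prediction methods; and all function names, gene names, and counts are hypothetical.

```python
from collections import Counter
from math import comb


def tf_enrichment_p(bound_de, total_de, bound_expressed, total_expressed):
    """One-sided Fisher exact p-value that TF-bound genes are enriched among
    differentially expressed (DE) genes.

    Assumption: DE genes are a subset of the expressed genes, so the p-value
    is the upper tail of a hypergeometric distribution.
    """
    unbound_expressed = total_expressed - bound_expressed
    if not (0 <= bound_de <= min(total_de, bound_expressed)
            and 0 <= total_de - bound_de <= unbound_expressed):
        raise ValueError("counts are not consistent with a 2x2 table")
    upper = min(total_de, bound_expressed)
    tail = sum(
        comb(bound_expressed, k) * comb(unbound_expressed, total_de - k)
        for k in range(bound_de, upper + 1)
    )
    return tail / comb(total_expressed, total_de)


def consensus_targets(predictions, min_tools=2):
    """Step d) under the stated assumption: keep genes predicted as targets
    by at least `min_tools` of the prediction methods.

    predictions: method name -> set of predicted target gene names
    """
    votes = Counter(
        gene for targets in predictions.values() for gene in set(targets)
    )
    return {gene for gene, n in votes.items() if n >= min_tools}


def candidate_genes(de_genes, predictions, min_tools=2):
    """Step e): DE genes that are also consensus predicted microRNA targets."""
    return set(de_genes) & consensus_targets(predictions, min_tools)


# Toy counts (made up): 4 of 8 expressed genes are TF-bound,
# and 3 of the 4 DE genes are TF-bound.
p_value = tf_enrichment_p(bound_de=3, total_de=4,
                          bound_expressed=4, total_expressed=8)

# Toy target predictions (made-up gene names).
predictions = {
    "TargetScan": {"A", "B", "C"},
    "PicTar": {"B", "C"},
    "miRanda": {"C", "D"},
}
genes_for_motif_analysis = candidate_genes({"C", "D", "E"}, predictions)
```

The resulting gene set would then supply the promoter sequences for step f). In practice, a TF screen over many ChIP-Seq data sets would also call for a multiple-testing correction on the p-values, which this sketch omits.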