Identify serological marker-defined subgroups in CD patients using Big Data tools

Crohn's Disease (CD) is a complex disorder with wide phenotypic heterogeneity in disease onset, symptoms, location, and severity, as well as in response to treatment. Serological markers such as CBir1, Anti-OmpC, and Anti-I2 are linked to the heterogeneity of clinical features in CD. These markers offer an opportunity to further understand the contributions of the multitude of risk factors, the innate and adaptive immune responses, and the underlying CD phenotypic subgroups and heterogeneity in clinical features. Here we propose to apply the Self-Organizing Map (SOM), a Big Data tool that can identify latent subgroups in complex data in a non-linear and non-parametric way, to a large cohort of 3,812 CD patients to define subgroups in CD. We will examine the clinical features of the serologically defined subgroups to evaluate whether those subgroups reflect the heterogeneity in CD clinical phenotypes. We will also evaluate the effects of known CD risk factors, including smoking and previously identified genetic variants, in those subgroups to assess the relationship between the subgroups and heterogeneity in CD etiology. Furthermore, we will use the genetic, microbiome/metabolome, and glycome data in this repository to screen for novel risk factors specific to a given subgroup, and will explore the underlying pathways and networks to better understand the distinct and overlapping pathogeneses of the subgroups. This proposal represents the first attempt to identify CD subgroups using comprehensive serological panels in a large cohort. It is also one of the first studies to use Big Data tools to understand the complex disease structure in CD. Moreover, it is the first study to systematically integrate serological markers, genetics, microbiome/metabolome, and glycome data to identify the driving factors underlying CD heterogeneity.
In particular, this is the largest study in the world with both CD serological markers and glycome data, and the first to integrate the glycome, which is known to play an important role in shaping human immunity, with serological markers in CD. The subgroups identified in the proposed analysis, which are expected to be more homogeneous, can help in developing individualized or tailored treatment strategies for CD patients. In addition, through the comprehensive integration of genetic, microbiome/metabolome, and glycome data, those subgroups can help identify novel risk factors and might lead to a better understanding of CD etiology.
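
As a rough illustration of how a Self-Organizing Map can group patients by serological profile, the following minimal sketch trains a small SOM on synthetic data. It is not the proposal's actual pipeline: the 2x2 grid, the learning-rate and neighborhood schedules, the function names (train_som, best_matching_unit, make_synthetic_patients), and the two simulated groups of standardized three-marker profiles are all assumptions made for illustration, and no patient data are involved.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return the grid coordinate of the node whose weights are closest to x."""
    return min(
        weights,
        key=lambda node: sum((w - v) ** 2 for w, v in zip(weights[node], x)),
    )


def train_som(data, rows=2, cols=2, epochs=30, lr0=0.5, sigma0=1.0, seed=0):
    """Train a small rectangular SOM; returns {(row, col): weight vector}.

    All hyperparameters here are illustrative assumptions.
    """
    rng = random.Random(seed)
    # Initialise each node from a randomly chosen sample.
    weights = {
        (r, c): list(rng.choice(data)) for r in range(rows) for c in range(cols)
    }
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = list(data)
        rng.shuffle(order)
        for x in order:
            # Learning rate and neighborhood width shrink linearly over training.
            decay = 1.0 - step / total_steps
            lr = lr0 * decay
            sigma = max(sigma0 * decay, 0.05)
            bmu = best_matching_unit(weights, x)
            for node, w in weights.items():
                # Gaussian neighborhood on the grid, centered on the winner.
                grid_d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                h = math.exp(-grid_d2 / (2.0 * sigma ** 2))
                for i, v in enumerate(x):
                    w[i] += lr * h * (v - w[i])
            step += 1
    return weights


def make_synthetic_patients(n_per_group=20, seed=1):
    """Simulate two groups of standardized three-marker profiles (synthetic)."""
    rng = random.Random(seed)
    centers = [(1.0, 1.0, -1.0), (-1.0, -1.0, 1.0)]  # hypothetical group means
    return [
        tuple(rng.gauss(mu, 0.2) for mu in center)
        for center in centers
        for _ in range(n_per_group)
    ]


data = make_synthetic_patients()
som = train_som(data)
# Each patient is assigned to the map node that best matches their profile.
labels = [best_matching_unit(som, x) for x in data]
```

In a sketch like this, the node assigned to each patient serves as a candidate subgroup label, which could then be compared against clinical features or risk factors. A real analysis would need far more care with marker scaling, grid size selection, and validation of the resulting subgroups.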