The correct functioning of many proteins depends on glycosylation, the addition of sugar molecules (glycans) to selected amino acids in the protein. For example, cancer cells have glycosylation patterns that differ from those of ordinary cells, and there is strong evidence that glycoproteins on the surface of egg cells play an essential role in sperm binding. Despite the importance of glycosylation, there are as yet no reliable, high-throughput methods for determining the identity and location of glycans. Glycan identification is currently a manual procedure performed by experts, involving a combination of chemical assays and mass spectrometry. Automating this procedure would significantly advance our understanding of glycosylation. The proposed project aims to invent chemical procedures, algorithms, and software for high-throughput analysis of glycan mass spectrometry data. The goal is to bring glycan analysis up to the level of peptide analysis within three years. In contrast to peptide analysis, which can leverage genomics data, glycan analysis requires the incorporation of expert knowledge of synthetic pathways in order to limit the huge number of theoretical combinations of monosaccharides to the much smaller number that are actually synthesized in nature. The project will have to develop novel representations for this evolving expert knowledge, because an exhaustive list, analogous to the human genome, is not currently known. Along with expert knowledge, the project will develop and validate machine learning and statistical techniques for glycan identification. In particular, the project will develop methods for internally calibrating spectra, and will learn fragmentation patterns that can statistically distinguish different types of glycosidic linkages.
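
To make the combinatorial problem concrete, the following is a minimal illustrative sketch, not the project's method. It enumerates every monosaccharide composition whose mass matches an observed value, then applies a single pathway-derived rule to discard implausible candidates. The function names, the tolerance, the count limit, and the toy rule (an N-glycan must contain at least the Hex3HexNAc2 core) are all assumptions introduced here for illustration; only the monoisotopic residue masses are standard values.

```python
from itertools import product

# Standard monoisotopic residue masses in Da (monosaccharide minus water).
RESIDUE_MASS = {
    "Hex": 162.0528,     # hexose, e.g. mannose or galactose
    "HexNAc": 203.0794,  # N-acetylhexosamine, e.g. GlcNAc
    "dHex": 146.0579,    # deoxyhexose, e.g. fucose
    "NeuAc": 291.0954,   # N-acetylneuraminic acid
}
WATER = 18.0106  # one water is added back for the free reducing end


def composition_mass(comp):
    """Neutral monoisotopic mass of a glycan with the given residue counts."""
    return WATER + sum(RESIDUE_MASS[name] * n for name, n in comp.items())


def candidate_compositions(target_mass, tol=0.02, max_count=12):
    """Brute-force every composition within tol Da of target_mass.

    tol and max_count are hypothetical settings chosen for this sketch.
    """
    names = list(RESIDUE_MASS)
    hits = []
    for counts in product(range(max_count + 1), repeat=len(names)):
        comp = dict(zip(names, counts))
        if abs(composition_mass(comp) - target_mass) <= tol:
            hits.append(comp)
    return hits


def plausible_n_glycan(comp):
    """Toy stand-in for expert pathway knowledge (an assumption, not a
    complete rule set): require the Hex3HexNAc2 N-glycan core."""
    return comp["Hex"] >= 3 and comp["HexNAc"] >= 2


# Example: the mass of a high-mannose glycan, Hex5HexNAc2.
target = composition_mass({"Hex": 5, "HexNAc": 2, "dHex": 0, "NeuAc": 0})
hits = candidate_compositions(target)
kept = [c for c in hits if plausible_n_glycan(c)]
print(len(hits), "mass matches,", len(kept), "after the pathway rule")
```

A mass match alone fixes only the composition, not the branching or the linkages, which is why the fragmentation patterns mentioned above are still needed to tell isomeric structures apart.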