PROJECT SUMMARY - HGNC

The core objective of the HUGO Gene Nomenclature Committee (HGNC) is to enable effective communication and data exchange between everyone with an interest in human genes, by providing a unique, concise and standardized gene symbol and name for every human gene. As usage of genomic data increases in healthcare, the need for a consistent language for genes becomes ever more vital. Aim 1 of this proposal will ensure the continuity of standardized naming of novel human protein-coding genes, long and small non-protein-coding RNA (ncRNA) genes, and pseudogenes. In particular, all putative novel protein-coding loci will require careful consideration; as active members of the Consensus Coding Sequence (CCDS) consortium, HGNC can ensure that named loci accurately reflect the current expert manual annotations provided by consortium members. Aim 2 concentrates on renaming newly characterized human genes that are currently assigned a placeholder symbol; some genes with a phenotype-based nomenclature will also be considered for reassignment. These updates will reflect functional information, aid the transfer of names to orthologous genes, and avoid confusion. This round of revision will be a precursor to a nomenclature freeze for the majority of protein-coding genes. Aim 3 focuses on coordinating gene naming across vertebrates, collaborating with the six existing vertebrate gene nomenclature committees (mouse, rat, chicken, Anolis, Xenopus, zebrafish), which base their naming on human gene nomenclature. As the Vertebrate Gene Nomenclature Committee (VGNC), HGNC will extend this effort to naming genes in vertebrates lacking a nomenclature authority, such as cow, dog and horse. Recent gene naming in chimpanzee enabled creation of the tools, pipelines and databases required for naming in other species. Assignments will reflect homologous relationships to human genes.
Aim 4 will investigate the utility of machine learning in automated assignment of systematic nomenclature to long ncRNA genes and vertebrate orthologs, and potentially to pseudogenes. Models will be trained on manually curated datasets to learn both explicit and implicit practices, and then applied at scale. Aim 5 will ensure consistent naming across vertebrates in complex and divergent gene families. These families can reveal some of the most interesting evolutionary biology and account for key diversity between species, but they require careful manual curation. HGNC will collaborate with experts to expand the successful gene naming in the pharmacologically relevant cytochrome P450 (CYP) family to further species, and will also work across species on two other key complex families involved in detoxification, the glutathione S-transferases (GSTs) and UDP-glucuronosyltransferases (UGTs). HGNC will also work with experts to expand gene naming of olfactory receptors, the largest vertebrate gene family, from mammals to other well-studied model organisms such as zebrafish, Xenopus and chicken. HGNC is the only group worldwide providing systematic and user-friendly names for genes that are used throughout biomedical science and clinical practice; this is a crucial facility, with increasing importance and direct benefits for public health.