SUMMARY

Annotation and curation of large-scale spatial gene expression data for sea urchins

Spatial gene expression data are an invaluable source of information, as they provide a simultaneous assessment of transcript distribution over a large field of cells or even throughout entire embryos. In most species, the acquisition of spatial expression data remains a slow process, which is why large-scale collections of publicly accessible spatial gene expression data are available for only a few species. However, given the importance of gene expression data for developmental biology, in particular for the analysis of gene regulatory networks, these collections are widely used and benefit larger communities. The extensive analysis of gene regulatory networks in the sea urchin embryo has over the last two decades produced a large set of spatial gene expression data that unfortunately remains accessible only through individual publications. In addition, our lab has in the last few years conducted a systematic analysis of the spatial expression of regulatory genes during the first three days of sea urchin development. This analysis includes almost all genes encoding known transcription factors, approximately 350 regulatory genes in total. For every gene, spatial expression was analyzed at five stages during the first 72 h of sea urchin development by whole-mount in situ hybridization, and dozens to hundreds of microscopy images were acquired for each sample to capture different embryo orientations and different focal depths. The result is a collection of >220,000 images that, without proper curation and annotation, will remain difficult to access for the broader community, which includes scientists working with echinoderms and also an increasing number of scientists interested in comparative developmental biology.
In this project, we will curate the existing set of expression data by collecting images of stained embryos from the newly generated dataset, and by selecting and processing, for each gene and stage, a small number of representative images for inclusion in a database. We will also include data from past research projects that focused mainly on the expression of regulatory genes during pre-gastrular development. We will complete the ongoing annotation of observed spatial gene expression patterns in order to enhance the accessibility of the spatial expression data. Furthermore, this project will develop a controlled vocabulary that will facilitate the consistent description of spatial gene expression domains that so far are characterized only at the molecular level, by expression of regulatory genes. Finally, we will use these data to generate a publicly accessible database of spatial expression data for sea urchins, which will display microscopy images of stained embryos along with detailed annotations of expression patterns, searchable by gene name and developmental stage.
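As a rough illustration of how annotated records might be organized and searched by gene name and developmental stage, the following minimal sketch may help. It is not the project's actual schema: the `ExpressionRecord` type, the `search` helper, the vocabulary terms (`domain_a`, ...), the gene and stage labels, and the image file names are all hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple

# Placeholder controlled vocabulary (assumption); the real terms would be
# defined by the curated vocabulary of expression domains.
CONTROLLED_VOCABULARY = frozenset({"domain_a", "domain_b", "domain_c"})


@dataclass(frozen=True)
class ExpressionRecord:
    """One annotated gene/stage entry with its representative images."""
    gene: str                 # gene name (placeholder values below)
    stage: str                # developmental stage label (placeholder)
    domains: Tuple[str, ...]  # expression domains, restricted to the vocabulary
    images: Tuple[str, ...]   # representative image files (hypothetical names)

    def __post_init__(self) -> None:
        # Reject annotations that use terms outside the controlled vocabulary.
        unknown = set(self.domains) - CONTROLLED_VOCABULARY
        if unknown:
            raise ValueError(f"terms not in controlled vocabulary: {sorted(unknown)}")


def search(records: Iterable[ExpressionRecord],
           gene: Optional[str] = None,
           stage: Optional[str] = None) -> List[ExpressionRecord]:
    """Return records matching gene name and/or stage; None matches everything."""
    return [
        r for r in records
        if (gene is None or r.gene.lower() == gene.lower())
        and (stage is None or r.stage == stage)
    ]


RECORDS = [
    ExpressionRecord("gene_x", "stage_1", ("domain_a",), ("gene_x_stage_1_view1.png",)),
    ExpressionRecord("gene_x", "stage_2", ("domain_a", "domain_b"), ("gene_x_stage_2_view1.png",)),
    ExpressionRecord("gene_y", "stage_1", ("domain_c",), ("gene_y_stage_1_view1.png",)),
]
```

For example, `search(RECORDS, gene="gene_x")` would return both `gene_x` entries, while adding `stage="stage_2"` would narrow the result to one. Validating annotations against a fixed vocabulary at record creation is one simple way to keep descriptions consistent across curators.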