Repetitive DNA, especially that derived from transposable elements (TEs), makes up a large fraction of many genomes. Thorough and accurate annotation of repetitive content in genomes depends on a comprehensive database of known TEs, along with robust statistical and procedural methods for recognizing decayed instances of elements and disentangling their complex relationships. Annotation of TE instances is usually performed using our RepeatMasker software, which compares a genome to a database containing representations of known repeat families. These representations have historically been consensus sequences, which generally approximate the sequences of the original TEs. The largest repository of such consensus sequences is Repbase, whose restrictive license and limited interface for curators have led to a lack of input from third parties and to the creation of many unaffiliated, often organism-specific open databases. The parallel existence of these many databases has in turn led to divergence in nomenclature and in repeat definitions. Our Dfam database is an open access collection of repetitive DNA families, in which each family is represented by a multiple sequence alignment and a profile hidden Markov model (HMM). We have demonstrated that profile HMMs support improved annotation sensitivity, and Dfam provides numerous aids both to curators of TE families and to those who make use of the resulting annotations. In this proposal, we describe a plan to develop the infrastructure of Dfam so that it can expand to thousands of genomes, and to establish a self-sustaining TE Data Commons that requires only limited centralized curation. We further describe plans to improve the quality of repeat annotation by developing methods for more reliable alignment adjudication, to expand approaches to visualizing this complex data type, and to improve the modeling of TE subfamilies.
By further developing this open access database, we will provide a strong disincentive for the proliferation of unaffiliated, non-standard repeat datasets and ease the burden of data management for those developing TE libraries.