Most eukaryotic genomes include vast numbers of interspersed repeats (IRs), which are the remnants of mostly selfishly amplified transposable elements. Transposable elements have an exceptionally wide-ranging mutagenic effect on genomes, while recognition of IRs provides unparalleled information on genome evolution and is crucial in many aspects of bioinformatics. This grant would continue support for the maintenance and further development of RepeatMasker, a computational tool that has become the de facto standard for identification and characterization of IRs; RepeatModeler, a program designed to derive RepeatMasker-grade databases of IR consensus sequences; and related software. The source code for these tools is freely available to the public. We have recently co-created Dfam, a database of profile hidden Markov models for repeat families found in the human genome. RepeatMasker can use this database, and we have found dramatically increased sensitivity over previous results. Our research and development plans include the following: a) We propose several ways in which sensitivity of detection can still be improved, including the creation of better profiles and the exploitation of our false-positive and false-negative benchmarks. b) To prepare for the onslaught of the 10,000 vertebrate genome project, we propose significant speed-up strategies for both library creation and repeat analysis, and plan to improve repeat analysis for NextGen-generated genomes. c) Dfam is meant to eventually comprise repeats for all genomes. In collaboration with our colleagues at the Howard Hughes Medical Institute, who house Dfam, we aim to develop tools to simplify and enhance submission and curation of entries. d) We plan to optimize our method of superimposing the RepeatMasker annotation of reconstructed ancestral genomes on extant genomes. We also propose several strategies involving IRs that could improve the construction of ancestral genomes.