During the past year, work on this project focused on four loosely related topics in biological sequence analysis: 1) Eric Nawrocki completed a project with the lab of David Spector (Cold Spring Harbor) characterizing which vertebrate genomes contain orthologs of the human noncoding RNA MALAT1 (Metastasis-Associated Lung Adenocarcinoma Transcript 1). A paper about this work was published in Cell Reports. 2) In collaboration with the group of J. Rodney Brister in our center, we added features to our software for annotating all genes and other features in virus sequences. A paper that touches in part on this work was published in Nucleic Acids Research. 3) We continued development of algorithms and software tools to improve the identification of nucleotide sequences that are contaminated by cloning vectors. These tools are currently being applied to correct thousands of contaminated sequences stored in the non-redundant (nr) database of sequences used by researchers worldwide. As of August 2017, we have corrected 8,303 sequences. The tools were made publicly available at https://github.com/aaschaffer/vecscreen_plus_taxonomy. A paper has been submitted. 4) We extended our prototype software tool for recognizing bacterial 16S rRNA sequences into a much more sophisticated and general tool that recognizes 10 classes of structural RNAs. The expanded version of the tool is called ribosensor, and it is being used within our center (National Center for Biotechnology Information) to evaluate the validity of batch submissions to GenBank that claim to contain at least 5,000 structural RNA sequences within any of the 10 categories covered by ribosensor.