The specific objective of this proposal is to create a method for quickly identifying the species composition of the organisms present in high-throughput DNA sequencing data. The main hypothesis is that every organism has a unique k-mer frequency vector (the relative frequencies of all length-k substrings in its genome), and that these vectors, constructed from reference genomes, can be used with methods from linear algebra and statistics to quickly identify organisms in heterogeneous DNA sequencing samples. The goals of this research are to build a computational framework for storing and manipulating k-mer frequency vectors, to develop a regression model for identifying organisms and estimating their abundance in heterogeneous, short-read DNA sequencing samples, and to apply this method to city-wide pathogen detection as part of the New York City PathoMap project.
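To make the hypothesis concrete, the following is a minimal sketch of how k-mer frequency vectors and a regression step could fit together. It assumes NumPy, a plain ACGT alphabet, and ordinary least squares with negative weights clipped to zero as the regression model; the proposal does not specify any of these choices. The function names (`kmer_counts`, `frequency_vector`, `sample_vector`, `estimate_abundance`, `simulate_reads`) and the synthetic genomes are hypothetical illustrations, not the project's framework or PathoMap data.

```python
import random

import numpy as np

# Assumption: plain DNA alphabet; each k-mer is indexed as a base-4 number.
BASE_INDEX = {"A": 0, "C": 1, "G": 2, "T": 3}


def kmer_counts(seq: str, k: int) -> np.ndarray:
    """Count every overlapping k-mer of `seq` into a vector of length 4**k.

    K-mers containing a character outside ACGT (e.g. the ambiguous base N)
    are skipped.
    """
    counts = np.zeros(4 ** k)
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        idx = 0
        for base in seq[i:i + k]:
            digit = BASE_INDEX.get(base)
            if digit is None:
                break  # ambiguous base: drop this k-mer
            idx = idx * 4 + digit
        else:  # runs only if the inner loop did not break
            counts[idx] += 1
    return counts


def frequency_vector(counts: np.ndarray) -> np.ndarray:
    """Normalize raw k-mer counts to relative frequencies that sum to 1."""
    total = counts.sum()
    return counts / total if total > 0 else counts


def sample_vector(reads: list[str], k: int) -> np.ndarray:
    """Pool k-mer counts over all reads in a sample, then normalize."""
    pooled = sum((kmer_counts(r, k) for r in reads), np.zeros(4 ** k))
    return frequency_vector(pooled)


def estimate_abundance(references: list[np.ndarray], sample: np.ndarray) -> np.ndarray:
    """Fit sample ~= A @ w, where column j of A is reference organism j's vector.

    Hypothetical regression choice: ordinary least squares, with negative
    weights clipped to zero and the rest renormalized to sum to 1.
    """
    A = np.column_stack(references)
    w, *_ = np.linalg.lstsq(A, sample, rcond=None)
    w = np.clip(w, 0.0, None)
    total = w.sum()
    return w / total if total > 0 else w


def simulate_reads(genome: str, n: int, length: int, rng: random.Random) -> list[str]:
    """Draw n fixed-length reads from uniformly random positions (toy data)."""
    return [
        genome[s:s + length]
        for s in (rng.randrange(len(genome) - length + 1) for _ in range(n))
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    k = 4
    # Two synthetic "genomes" with different base composition (not real data).
    gc_rich = "".join(rng.choices("ACGT", weights=[1, 3, 3, 1], k=5000))
    at_rich = "".join(rng.choices("ACGT", weights=[3, 1, 1, 3], k=5000))
    refs = [frequency_vector(kmer_counts(g, k)) for g in (gc_rich, at_rich)]

    # A mixed sample: 700 reads from one organism, 300 from the other.
    reads = simulate_reads(gc_rich, 700, 150, rng) + simulate_reads(at_rich, 300, 150, rng)
    print(estimate_abundance(refs, sample_vector(reads, k)))
```

Each column of the design matrix is one organism's normalized k-mer frequency vector, so the fitted weights estimate the fraction of sample k-mers attributable to each organism; because every simulated read here contributes the same number of k-mers, those fractions should land near the 70/30 read split. Clipping negative least-squares coefficients is a shortcut; a constrained solver such as `scipy.optimize.nnls` would be a more principled way to enforce non-negative abundances.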