The human microbiome contributes essential and complementary genetic and metabolic components to the human host. Until recently, microbiologists mainly studied individual culturable species of microbes, even though the vast majority (approximately 95%-98%) of microorganisms cannot be grown in pure culture. Facilitated by rapid advances in DNA sequencing techniques, metagenomics attempts to directly determine the whole collection of genes within an environmental sample. For studying the human microbiome at a global level, metagenomics has become the methodology of choice for the Human Microbiome Project (HMP). We propose to develop computational methods addressing three challenges in the metagenomic analysis of HMP data: the assembly of short reads from pyrosequencing, the functional annotation of protein-coding genes through database searching, and the characterization of biodiversity in samples. We start with a novel approach to assembling short metagenomic reads, called ORFome Assembly, which assembles putative ORFs from homologous proteins in the same family into a protein family graph (an Eulerian path approach). We then propose a network matching approach for similarity searching that uses the protein family graphs as queries. We anticipate that searching with protein family graphs will yield higher sensitivity and specificity than searching with unassembled sequencing reads. Finally, we propose to develop computational tools that simultaneously assess the biodiversity and the biological functions in samples by identifying, from the similarity search results, the most likely set of coherent pathway variants covering the annotated gene functions within the metagenomic data.
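As a rough illustration of the Eulerian path idea mentioned above, the following Python sketch assembles overlapping peptide fragments by building a de Bruijn graph over their k-mers and walking an Eulerian path with Hierholzer's algorithm. It is a generic toy under stated assumptions, not the proposed ORFome Assembly method: the function names, the fragment strings, and the choice of k are all hypothetical, and a real protein family graph would also need to represent sequence variation among homologs, which this sketch ignores.

```python
from collections import defaultdict


def build_de_bruijn(fragments, k):
    """Build a de Bruijn graph: nodes are (k-1)-mers, each distinct k-mer is one edge."""
    kmers = {frag[i:i + k] for frag in fragments for i in range(len(frag) - k + 1)}
    graph = defaultdict(list)
    for kmer in sorted(kmers):
        graph[kmer[:-1]].append(kmer[1:])
    return graph


def eulerian_path(graph):
    """Hierholzer's algorithm. Assumes the graph actually has an Eulerian path."""
    if not graph:
        return []
    in_deg = defaultdict(int)
    for targets in graph.values():
        for v in targets:
            in_deg[v] += 1
    # Start at the node with one more outgoing than incoming edge, if there is one.
    start = next(
        (u for u in graph if len(graph[u]) - in_deg[u] == 1),
        next(iter(graph)),
    )
    remaining = {u: list(vs) for u, vs in graph.items()}
    stack, path = [start], []
    while stack:
        u = stack[-1]
        if remaining.get(u):
            stack.append(remaining[u].pop())  # follow an unused edge
        else:
            path.append(stack.pop())  # dead end: emit node and backtrack
    return path[::-1]


def spell(path):
    """Turn a walk over (k-1)-mer nodes back into a sequence."""
    if not path:
        return ""
    return path[0] + "".join(node[-1] for node in path[1:])


def assemble(fragments, k):
    return spell(eulerian_path(build_de_bruijn(fragments, k)))


# Hypothetical overlapping peptide fragments (toy data, not from HMP).
fragments = ["MKTAYI", "TAYIAK", "YIAKQR"]
print(assemble(fragments, k=4))
```

In this toy case every (k-1)-mer occurs only once, so the Eulerian path is unique; with repeats or divergent homologs the graph branches, and the path found is only one of several consistent reconstructions.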
These software tools will enable researchers to analyze HMP data efficiently and effectively, which will enhance the understanding of the relationship between the human microbiota (i.e., the microbes living on the surface of and inside the human body) and human diseases, and hasten the development of new or improved therapies. PUBLIC HEALTH RELEVANCE: We propose to develop computational methods addressing several challenges in the metagenomic analysis of Human Microbiome Project (HMP) data. These software tools will enable researchers to analyze HMP data efficiently and effectively, which will enhance the understanding of the relationship between the human microbiota and human diseases, and hasten the development of new or improved therapies.