The goal of the proposed research is the analysis of biological sequence data to address the molecular mechanisms of evolution and the origin(s) of all viruses and related genetic elements. Phylogenetic trees will provide a framework for the mapping of cell and tissue tropism, pathogenicity and virulence, modes of transmission and geographical distributions, and many other higher-order characteristics of viruses. The specific aims of the proposed analytical studies are: 1) determining functionally equivalent networks and frequency of exchange among and between retroid elements, and their potential cellular homologues, including new studies on 300 retroviral env proteins; 2) inferring functionally important regions of all proteins of paramyxo-, rhabdo- and filoviruses (with privileged access to new Ebola sequences), and Borna Disease virus (including potential BDV sequences from schizophrenic patients); and 3) the analysis of the dUTPase gene, as a model system, to address issues relevant to the structure, function and evolution of duplicated sequences, and potential horizontal transfer among and between host and viral genomes. The specific aims of the technical studies are: 1) evaluation of stochastic production model approaches for generation of multiple alignments, detection of recombination, and calculation of evolutionary distances; and 2) development and testing of new and existing methods for historical reconstruction of functionally equivalent networks. RNA viruses (e.g., HIV or Ebola) are the major causative agents of human, animal and plant viral diseases worldwide. The heterogeneous nature of RNA populations makes it difficult to develop effective antiviral agents. The sequence database is now large enough to conduct comparative studies on natural variants versus chemotherapeutically induced mutants for several retroviral proteins. 
This model study will provide new information on the nature of selected mutations, which will be useful in future antiviral drug development. Computational analysis of primary sequence data is an area of intense interest in biology, mathematics, statistics and systems science. In the last few years, new approaches to problem solving and classification, such as machine learning, neural networks, genetic algorithms, and stochastic production models, collectively referred to as "intelligent systems," have become available. Unfortunately, most biologists are unaware of these developments, and the application of these methods to real data remains unexplored. The proposed studies will go a long way toward closing this gap in technological utilization. These studies will continue to define important evolutionary relationships and events, provide biologically informative sequence relationships for benchmarking new software, and contribute new information relevant to the structure and function of viral proteins, suggesting new directions in laboratory experimentation. Strategies and techniques developed for the analysis of highly divergent genomes can also be applied to the wealth of sequence information generated under the auspices of the Human Genome Project.