Recently, scale-invariant long-range correlations extending across thousands of base pairs, and sometimes over the length of an entire chromosome, have been demonstrated in DNA sequences that contain non-coding material. While the mechanism of protein coding is well understood, little is known about the role of the non-coding parts of the genome (introns and intergenic sequences), which in some eukaryotic species make up more than two-thirds of the total genome length. The study of long-range correlations may provide a tool for better distinguishing coding from non-coding regions of genomic sequences. It may also answer a number of questions about the organization, evolution, spatial structure, and function of non-coding genomic material. We are requesting NIH support to carry out investigations with the following specific aims:
1) Systematically test the hypothesis, based on our preliminary observations, that messenger RNA and exon sequences usually have only short-range correlations, while introns and intergenic sequences have scale-invariant long-range correlations. Also test the universality of this observation using new statistical methods that we have designed specifically for non-stationary systems such as DNA sequences.
2) Quantitatively characterize the role of known dynamic phenomena in DNA evolution, such as duplication and shuffling of fragments within and between chromosomes, insertion of repetitive elements, and intron deletion. Also develop an evolutionary model of DNA genome organization, test it on a large number of DNA sequences, and test the hypothesis that long-range correlations are related to the three-dimensional spatial structure of the DNA molecule in the chromosome.
3) Develop a new statistical "Coding Sequence Finder" (CSF) algorithm that automatically identifies coding regions, based on our finding that long-range correlations exist in non-coding regions but not in coding regions of DNA sequences.
These aims address both the practical need to detect coding regions in DNA sequences and the long-term goal of understanding the role of non-coding DNA in global genomic structure, organization, and evolution.
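The abstract does not name the statistical methods it refers to. Purely as an illustration, here is a minimal sketch of detrended fluctuation analysis (DFA), a standard technique for measuring long-range correlations in non-stationary sequences, applied to a "DNA walk" in which purines (A, G) step +1 and pyrimidines (C, T) step -1. The mapping rule, the box sizes, and the function names (`purine_pyrimidine_steps`, `dfa_fluctuation`, `dfa_exponent`) are assumptions made for this sketch, not the proposal's method. The fitted exponent alpha is the slope of log F(n) versus log n: alpha near 0.5 indicates no long-range correlation, while alpha above 0.5 indicates scale-invariant positive long-range correlations.

```python
import numpy as np


def purine_pyrimidine_steps(seq: str) -> np.ndarray:
    # Assumed mapping: purines (A, G) -> +1, pyrimidines (C, T) -> -1.
    # Any other symbol (e.g. N for an unknown base) is dropped.
    mapping = {"A": 1.0, "G": 1.0, "C": -1.0, "T": -1.0}
    return np.array([mapping[b] for b in seq.upper() if b in mapping])


def dfa_fluctuation(u: np.ndarray, n: int) -> float:
    # Integrate the mean-subtracted step series into a profile y(k).
    y = np.cumsum(u - u.mean())
    n_boxes = len(y) // n
    if n_boxes < 1:
        raise ValueError("box size is larger than the sequence")
    x = np.arange(n)
    mean_sq = []
    for b in range(n_boxes):
        seg = y[b * n:(b + 1) * n]
        coef = np.polyfit(x, seg, 1)          # least-squares linear trend in this box
        resid = seg - np.polyval(coef, x)     # detrended profile
        mean_sq.append(np.mean(resid ** 2))
    # Root-mean-square fluctuation F(n) over all non-overlapping boxes.
    return float(np.sqrt(np.mean(mean_sq)))


def dfa_exponent(u: np.ndarray, box_sizes) -> float:
    # Slope of log F(n) against log n estimates the scaling exponent alpha.
    sizes = np.asarray(box_sizes)
    F = [dfa_fluctuation(u, int(n)) for n in sizes]
    slope, _ = np.polyfit(np.log(sizes), np.log(F), 1)
    return float(slope)


# Usage: an i.i.d. random sequence has no long-range correlations,
# so its exponent should come out close to 0.5.
rng = np.random.default_rng(0)
random_seq = "".join(rng.choice(list("ACGT"), size=20000))
u = purine_pyrimidine_steps(random_seq)
sizes = np.unique(np.logspace(np.log10(8), np.log10(1000), 12).astype(int))
alpha = dfa_exponent(u, sizes)
print(f"alpha = {alpha:.2f}")
```

Removing a linear trend in each box is what lets DFA tolerate the patchy, non-stationary composition typical of DNA. A coding-region finder in the spirit of Aim 3 could, hypothetically, slide a window along a sequence and flag windows whose exponent stays near 0.5; the window length and decision threshold are not given in the abstract and would have to be determined empirically.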