Our long-term goal is to identify the factors that determine the compositional patterns in various regions of the human genome, using a combination of molecular and statistical methods. In particular, we want to know: (1) Do the GC-rich isochores (compositionally homogeneous DNA segments) have any functional significance? These isochores represent about one-third of the genome of humans and other warm-blooded vertebrates, but are nearly absent in cold-blooded vertebrates. They are usually located in Giemsa light (R) bands of metaphase chromosomes and replicate early in the cell cycle, whereas most GC-poor isochores are located in Giemsa dark (G) bands and replicate late in the cell cycle. The origin of the GC-rich isochores remains unclear. One hypothesis is that they arose because of functional (selective) advantages, whereas another postulates that they arose from different mutational biases in different DNA regions during the replication of germline DNA. (2) Is "GC-richness" a good gene marker? And is the GC level at the third position of codons generally higher than the level in noncoding regions, so that this feature is useful for identifying coding regions? (3) What is the role of CpG islands in gene regulation? (4) Does the rate of mutation in a DNA region depend on its GC content? (5) Are the selective constraints on the usage of synonymous codons in vertebrate genes strong enough to affect the GC level and the rate of substitution at degenerate sites of codons? To investigate these questions we propose to (a) clone and sequence several high, intermediate and low GC regions from Homo sapiens and Galago crassicaudatus (a primate). We will use the resulting data, together with data from other laboratories, to study: (b) Whether most GC-rich regions occur in or near coding regions,
(c) Rates of nucleotide substitution in regions with different GC levels, (d) Evolutionary change of GC content and CpG dinucleotides in vertebrate and invertebrate genomes, (e) Variation of GC content among the various regions of a gene, and (f) Rates of silent substitution in coding and noncoding regions. A comparison of these rates can reveal the strength of selective constraints on the usage of synonymous codons.
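The compositional quantities named in questions (2) and (3) and in aim (e) can be made concrete with a minimal sketch. This is an illustration only, not the analysis proposed here: the function names are hypothetical, the coding sequence is assumed to be in frame starting at its first base, and the CpG observed/expected ratio uses a commonly applied convention (observed CpG count times sequence length, divided by the product of the C and G counts) that this abstract does not itself specify.

```python
def gc_content(seq: str) -> float:
    """Fraction of G+C among unambiguous bases (A, C, G, T) in seq."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0  # empty or fully ambiguous sequence
    return (seq.count("G") + seq.count("C")) / acgt


def gc3(cds: str) -> float:
    """GC level at third codon positions.

    Assumes cds is in frame from its first base (an assumption of this
    sketch); a trailing incomplete codon contributes nothing.
    """
    third_positions = cds[2::3]  # indices 2, 5, 8, ...
    return gc_content(third_positions)


def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio: (#CG * N) / (#C * #G).

    One commonly used convention, chosen here for illustration.
    """
    seq = seq.upper()
    c_count = seq.count("C")
    g_count = seq.count("G")
    if c_count == 0 or g_count == 0:
        return 0.0  # ratio undefined; report 0 by convention in this sketch
    return seq.count("CG") * len(seq) / (c_count * g_count)


if __name__ == "__main__":
    coding = "ATGGCCTAA"      # toy in-frame coding sequence (hypothetical)
    noncoding = "ATTTACGTTA"  # toy flanking sequence (hypothetical)
    print("GC3 (coding):", gc3(coding))
    print("GC (noncoding):", gc_content(noncoding))
    print("CpG o/e (noncoding):", cpg_obs_exp(noncoding))
```

Comparing `gc3` for a gene with `gc_content` for its flanking noncoding sequence is the kind of contrast question (2) asks about. A CpG ratio well below 1 in bulk DNA, against values near 1 in CpG islands, is the pattern relevant to question (3).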