The long-range goal of this research is to exploit fully the favorable properties of a very complex satellite DNA in analyzing the relationships between primary structure and secondary structure, and between primary structure and susceptibility to spontaneous mutagenesis and evolutionary divergence. The satellite, isolated from a land crab, is G+C-rich (63%). The average repeat unit is 2.07 kb, and there are 16,000 copies per genome. The satellite is a family of closely related sequences comprising hundreds, perhaps thousands, of variants, which makes possible a systematic investigation of the relationships between primary DNA structure and points of major sequence divergence. Sequences of the satellite have been conserved among the Crustacea. A number of satellite variants have already been cloned; three have been sequenced. Principal aims are: (1) By restriction mapping of a relatively small number (approximately 300) of the variants, obtain a statistical estimate of the total number of variants (possibly each of the 16,000 copies differs in detail from all of the others). (2) Identify and characterize "hotspots" for major sequence divergences such as deletions, insertions, scramblings, or amplifications. Candidates for "hotspots" are (a) simple repeating sequences that permit strand slippage and consequent looping out of vulnerable single-stranded segments, (b) inverted repeats that permit formation of possibly vulnerable cruciform stem-loops, (c) segments rich in alternating purines and pyrimidines that permit formation of Z-DNA and vulnerable single-stranded regions at the B-Z junctions, (d) stretches of only purines or only pyrimidines on one strand, which in other organisms as well have been shown to impart S1 sensitivity (i.e., single-stranded character) to such domains, and (e) segments bordering amplified sequences that have already been identified in some of the variants.
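The first four hotspot candidate classes, (a) through (d), are defined by sequence patterns, so a sequenced variant can be screened for them computationally. The following is a minimal sketch of such a screen, not a method described in this project. All function names, length thresholds, and loop-size limits are illustrative assumptions, and the input is assumed to be a plain string of A, C, G, and T.

```python
import re

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_simple_repeats(seq, unit_len=2, min_copies=4):
    """(a) Tandem repeats of a unit_len-base unit, at least min_copies copies."""
    pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_copies - 1))
    return [(m.start(), m.group()) for m in pattern.finditer(seq.upper())]


def find_inverted_repeats(seq, arm_len=6, min_loop=3, max_loop=20):
    """(b) Arms of arm_len bases followed, after a loop, by their reverse complement.

    Returns (left_arm_start, right_arm_start, left_arm) tuples.
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - arm_len + 1):
        arm = seq[i:i + arm_len]
        target = reverse_complement(arm)
        for loop in range(min_loop, max_loop + 1):
            j = i + arm_len + loop
            if j + arm_len > len(seq):
                break
            if seq[j:j + arm_len] == target:
                hits.append((i, j, arm))
    return hits


def find_alternating_pur_pyr(seq, min_len=8):
    """(c) Runs of strictly alternating purines (A/G) and pyrimidines (C/T)."""
    pattern = re.compile(r"(?:[AG][CT])+[AG]?|(?:[CT][AG])+[CT]?")
    return [(m.start(), m.group()) for m in pattern.finditer(seq.upper())
            if len(m.group()) >= min_len]


def find_pur_or_pyr_tracts(seq, min_len=8):
    """(d) Stretches of only purines or only pyrimidines on the given strand."""
    pattern = re.compile(r"[AG]{%d,}|[CT]{%d,}" % (min_len, min_len))
    return [(m.start(), m.group()) for m in pattern.finditer(seq.upper())]
```

Each function returns 0-based start positions, so hits from different classes can be compared against mapped points of divergence between variants. The thresholds are arbitrary starting values and would need tuning against the sequenced variants.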
Other aims are: (3) Using a cosmid library to be constructed from total land crab DNA, localize the junctions of the satellite in the genome. (4) Search for RNA transcripts of the satellite, which contains consensus sequences for splice junctions, a number of open reading frames, CCAAT boxes, and abbreviated polyadenylation sites.
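Two of the sequence features named in aim (4), CCAAT boxes and open reading frames, can be located in a sequenced variant with a simple scan. The sketch below is illustrative only and is not the project's procedure. The function names and the minimum ORF length are assumptions, an ORF is taken to run from an ATG to the first in-frame stop codon, and only the strand supplied is scanned.

```python
import re

STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_ccaat_boxes(seq):
    """Return 0-based start positions of every CCAAT motif on the given strand."""
    return [m.start() for m in re.finditer("CCAAT", seq.upper())]


def find_orfs(seq, min_codons=3):
    """Return (start, end, frame) for ATG-to-stop ORFs in the three forward frames.

    `end` is exclusive and includes the stop codon; min_codons counts
    the codons before the stop, including the ATG.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame ATG opens the ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # look for the next ORF in this frame
    return orfs
```

To cover the opposite strand, the same functions can be applied to the reverse complement of the sequence, with positions converted back to the original coordinates.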