We have shown that rare restriction endonuclease sites containing CpG are much more abundant in cloned human DNA than expected from the average dinucleotide frequencies of total genomic DNA, and that they tend to be clustered. These observations could be explained if the human genome contains a minor fraction of G+C-rich DNA in which CpG frequency is not suppressed as it is in bulk DNA. Independent evidence for such a compartment has come from analysis of HpaII-generated tiny fragments (HTFs) of genomic DNA, less than 500 bp long, by Cooper, Bird, O.J. Miller and co-workers. HTFs tend to be clustered in CpG-rich islands about 500–2000 bp in size. Our study of rare restriction site clusters suggests that these clusters can be considerably larger than HTF islands and are themselves frequently clustered.

The present experiments are designed to provide further information on the size and distribution of rare restriction site clusters, their relation to HTF islands, their methylation state in various tissues, and their functional significance. The starting material is a set of five cosmid clones containing two or more clusters of CpG-containing rare restriction enzyme sites. The distribution of HpaII and HhaI sites within these cosmids will be mapped to determine whether the rare restriction site clusters fall within HTF islands or are distinct from them. The region (up to 2000 kb) around each of the five cosmids will be examined with overlapping cosmids and field inversion gel electrophoresis to determine the size of the higher-order groupings formed by clusters of rare restriction sites. Hybridization probes within, and adjacent to, each cluster will be developed and used to examine the methylation state of particular sites and regions in DNA from a variety of tissues. We will determine whether DNA sequences within, or adjacent to, clusters lie in nuclease-sensitive domains, contain open reading frames, and are transcribed in tissues where they are unmethylated.
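The opening claim compares observed rare-site counts against counts expected from base composition and CpG suppression. A minimal sketch of that arithmetic follows. It assumes an independent-base model in which each CpG in a recognition site is down-weighted by the CpG observed/expected ratio, computed here as (#CpG × length) / (#C × #G). It also finds HpaII (C^CGG) fragments so that tiny fragments can be counted. The function names are hypothetical, and the GC fraction and suppression values in the example are illustrative assumptions, not measurements from this study.

```python
def cpg_obs_exp(seq: str) -> float:
    """CpG observed/expected ratio: (#CpG * length) / (#C * #G)."""
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    # "CG" cannot overlap itself, so str.count gives the full dinucleotide count.
    return seq.count("CG") * len(seq) / (c * g)


def expected_site_frequency(site: str, gc_fraction: float, cpg_oe: float) -> float:
    """Expected occurrences per bp of `site` under an independent-base model
    (an assumption), with each CpG in the site scaled by the CpG obs/exp ratio."""
    p = {"G": gc_fraction / 2, "C": gc_fraction / 2,
         "A": (1 - gc_fraction) / 2, "T": (1 - gc_fraction) / 2}
    site = site.upper()
    prob = 1.0
    for base in site:
        prob *= p[base]
    return prob * cpg_oe ** site.count("CG")


def hpaii_fragment_lengths(seq: str) -> list[int]:
    """Lengths of internal fragments between successive HpaII sites (cut C^CGG)."""
    seq = seq.upper()
    cuts = [i + 1 for i in range(len(seq) - 3) if seq[i:i + 4] == "CCGG"]
    return [b - a for a, b in zip(cuts, cuts[1:])]


if __name__ == "__main__":
    # Illustrative assumptions: 40% G+C and CpG at one quarter of expectation.
    notI = "GCGGCCGC"  # NotI site, which contains two CpGs
    for oe in (1.0, 0.25):
        spacing = 1 / expected_site_frequency(notI, 0.4, oe)
        print(f"CpG obs/exp {oe}: one NotI site per ~{spacing:,.0f} bp")
    lengths = hpaii_fragment_lengths("CCGG" + "A" * 10 + "CCGG" + "A" * 5 + "CCGG")
    print("tiny fragments (<500 bp):", [n for n in lengths if n < 500])
```

Because a site with k CpGs is scaled by (obs/exp)^k, sites rich in CpG are the most strongly depleted in bulk DNA. For that reason, a local excess of them points to a compartment in which CpG is not suppressed.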
We will use gel retardation, chromatography and footprinting to search for proteins that bind specifically to sites within CpG-rich rare restriction site clusters, and to determine whether methylation of these sequences affects which proteins bind. We will also determine whether these sequences have enhancer or promoter activity, when they replicate, and whether their replication timing is tissue-specific and related to methylation, nuclease sensitivity, or both.