We report on several specific projects related to DNA sequence-dependent structural characteristics that are important for interactions with proteins (including p53), higher-order self-organization of genomic DNA, and gene regulation.

1. Novel mechanism of protein-nucleic acid recognition in the minor groove
Information readout in the DNA minor groove is stereochemically challenging, because the base specificity is obscured there, and hydrogen bonds per se are insufficient for rigorous selection of the DNA sequence. Still, minor groove interactions are rather specific and important for complex formation with many proteins, from enzymes to transcription factors to architectural proteins. Characteristic examples are the eukaryotic regulatory proteins TBP, LEF-1, HMG1, and hSRY. In all these complexes, the sequence-dependent deformability of DNA plays a critical role in increasing the selectivity of recognition. It has long been known that interactions with proteins are accompanied by a "partial B-to-A transformation" in the duplex, but there was no clear understanding of the general principles of DNA mechanics underlying such an indirect readout of the sequence. To deduce the molecular origins of minor groove recognition through DNA sequence-dependent deformability, we analyzed the structural role of A-like nucleotides (with C3'-endo sugars) in protein-DNA complexes solved by X-ray crystallography and NMR. We found a striking difference between the sets of amino acids interacting with B-like and A-like nucleotides in the minor groove. Polar amino acids mostly interact with B-nucleotides, while hydrophobic amino acids interact extensively with A-nucleotides. This tendency is consistent with the larger exposure of hydrophobic surfaces in the case of A-like sugars. Overall, the A-like nucleotides aid in achieving protein-induced fit in two major ways.
First, hydrophobic clusters formed by several consecutive A-like sugars interact cooperatively with the non-polar surfaces in proteins. Second, the sugar switching occurs in large kinks promoted by direct protein contact, predominantly at the pyrimidine-purine dimeric steps. The sequence preference for the B-to-A sugar repuckering, observed for pyrimidines, suggests that the described DNA deformations contribute significantly to protein-DNA recognition. In particular, this previously unappreciated specificity in the minor groove could be operative for fast "pre-recognition" of DNA sites by proteins.

2. A-tract distribution and DNA packaging in pro- and eukaryotes
Periodic positioning of A-tracts in DNA causes DNA curvature in solution and facilitates its bending in complexes with proteins. Here, we analyzed the distribution of these sequences in pro- and eukaryotic genomes (E. coli, S. typhimurium, B. subtilis, and H. sapiens). We found that the distribution of the strongly bent A-tracts (4-7 bp) in the prokaryotic genomes demonstrates a remarkable periodicity of 10-11 bp. Such a periodicity may reflect the intrinsic propensity of prokaryotic DNA to form loop-shaped structures. Based on these data and by analogy with the "gene repression" gal and lac loops in E. coli, we hypothesize that loop folds with a structural period of ~100 bp may be elementary units of prokaryotic nucleoid packaging. This hypothesis was tested by micrococcal nuclease digestion of bacterial nucleoids (in collaboration with S. Adhya, NCI). The results show that ~100 bp DNA fragments are highly overrepresented in the digestion products, thereby implying a highly specific nucleoid packaging, with a DNA structural period of ~100 bp. In contrast, A-tracts of all lengths are highly overrepresented in eukaryotic genomes. At the same time, the "optimal" A-tracts (4-7 bp) do not reveal the 10-11 bp periodicity.
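The kind of periodicity analysis described above can be illustrated with a minimal Python sketch. It is not the procedure used in this project: the A-tract definition (an uninterrupted run of 4-7 A's or 4-7 T's), the use of start-to-start distances, and the function names `a_tract_starts` and `distance_histogram` are all assumptions made for illustration. A 10-11 bp periodicity would appear as peaks in the histogram at distances near multiples of 10-11 bp.

```python
import re
from collections import Counter

# Assumed, simplified A-tract definition: an uninterrupted run of exactly
# 4-7 A's or 4-7 T's. The text only specifies the 4-7 bp length range.
# Lookbehind/lookahead exclude pieces of longer runs (8 bp or more).
A_TRACT = re.compile(r"(?<!A)A{4,7}(?!A)|(?<!T)T{4,7}(?!T)")


def a_tract_starts(seq):
    """Return the start position of every 4-7 bp A-tract in seq."""
    return [m.start() for m in A_TRACT.finditer(seq.upper())]


def distance_histogram(starts, max_dist=50):
    """Count start-to-start distances (in bp) between all pairs of
    A-tracts that lie no more than max_dist apart.

    starts must be sorted in ascending order (finditer guarantees this).
    """
    hist = Counter()
    for i, first in enumerate(starts):
        for second in starts[i + 1:]:
            dist = second - first
            if dist > max_dist:
                break  # later tracts are even farther away
            hist[dist] += 1
    return hist
```

For a real genome the histogram would be compared against a background expectation (for example, from a shuffled sequence) before calling any peak significant; that step is omitted here.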
Apparently, the intrinsic curvature of DNA, caused by the A-tracts, is not a necessary prerequisite for the formation of nucleosomes. Rather, the overabundant long purine runs observed in eukaryotic genomes may serve as "chromatin organizers," decreasing the DNA propensity for the formation of nucleosomes.

3. DNA looping in prokaryotes
Stabilization of multi-subunit protein-DNA complexes is facilitated by DNA looping. One of the best characterized examples is the gal loop in E. coli, involved in regulation of the gal operon. To determine the optimal trajectory of the DNA loop in such a complex structure, we used a knowledge-based elastic model suitable for large-scale DNA simulations (developed by us earlier). We found that the "antiparallel" gal loop is energetically more favorable than the "parallel" one. The same trend was found for the DNA loop formed upon binding of the lac repressors to DNA. Based on these computations, we designed detailed experiments to visualize the 3D organization of the DNA loops in bacteria. Atomic force microscopy supports the "antiparallel" DNA looping, both with the gal and lac repressors. These results imply that "antiparallel" DNA looping may be a general feature of the condensed bacterial nucleoid, as opposed to the parallel DNA "wrapping" around histones in eukaryotic chromatin. Importantly, the regular DNA folding in prokaryotes is consistent with the periodic distribution of the curved A-tracts in bacterial genomes, described above.

4. Genome-wide distribution of p53 sites in human DNA
Binding of the tetrameric p53 to DNA plays a key role in tumor suppression. In response to DNA damage and other types of cellular stress, the p53 protein becomes activated and binds DNA sequence-specifically, functioning as a transcription factor or cell cycle regulator. p53 is unique in regulating a wide spectrum of genes: thousands of human genes are either activated or repressed by p53.
Normally, the p53 tetramer binds to DNA response elements consisting of two decamers RRRCWWGYYY (half-sites) separated by a spacer. (The length of the spacer, S, varies from 0 to 14 bp in the known functional binding sites, but in most cases S=0 or 1.) How many putative p53 binding sites, consistent with this scheme, are there in the human genome? What is the distribution of the spacer lengths? With the human genome sequence we can directly answer these questions. The distribution of spacers proves to be extremely nonuniform in all human chromosomes, with strong peaks in the profile, exceeding the average background 3-4 fold. The peaks at S=0 and 10 bp, and the gap at 4-5 bp, are consistent with our earlier computer modeling and electrophoresis measurements, indicating the lateral positioning of the p53 core domains on the outer side of the DNA loop. In general, these data agree with the idea that the p53 tetramer can bind DNA specifically without unwrapping nucleosomes in the course of transcriptional activation of chromatin-assembled genes. Currently, we are exploring the localization of the putative p53 sites with respect to the starts of transcription. Our data indicate a strong difference between the up- and down-regulated genes in terms of the distribution of p53 sites in the vicinity of genes. The up-regulated genes are characterized by a twofold higher occurrence of p53 sites within 1 kbp of the start of transcription, compared to the down-regulated genes. This is an extremely important observation, as it can be used for prediction of p53-activated genes. In summary, the distribution of p53 sites in the human genome reflects the versatility of p53 binding and its tumor suppressor functions.
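
The counting of putative sites and spacer lengths described above can be sketched in Python as follows. This is an illustrative sketch, not the code used in this project: the function names `half_site_starts` and `spacer_distribution` are hypothetical, and exact matching to the consensus with no mismatches is an assumption. The half-site pattern RRRCWWGYYY (R = A or G, W = A or T, Y = C or T) is its own reverse complement, so scanning a single strand finds the half-sites on both strands.

```python
import re
from collections import Counter

# Half-site consensus RRRCWWGYYY written with explicit base classes:
# R = [AG], W = [AT], Y = [CT]. The zero-width lookahead lets finditer
# report overlapping occurrences as well.
HALF_SITE = re.compile(r"(?=[AG]{3}C[AT]{2}G[CT]{3})")


def half_site_starts(seq):
    """Return start positions of all exact matches to the half-site."""
    return [m.start() for m in HALF_SITE.finditer(seq.upper())]


def spacer_distribution(seq, max_spacer=14, half_len=10):
    """Count pairs of half-sites separated by a spacer of 0..max_spacer bp.

    Assumption: a putative full site is any two exact half-sites whose
    spacer S (end of the first to start of the second) is 0-14 bp.
    """
    starts = set(half_site_starts(seq))
    counts = Counter()
    for p in starts:
        for s in range(max_spacer + 1):
            if p + half_len + s in starts:
                counts[s] += 1
    return counts
```

Applied chromosome by chromosome, the resulting counts per spacer length give a profile of the kind discussed above; for instance, two copies of AGACATGCCT placed back to back contribute one count at S=0. Restricting the scan to windows around annotated transcription starts would be a separate step that needs gene coordinates not shown here.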