We have developed a method to clone synthetic DNAs that contain regions of random sequence, and we are using it to create millions of different ribosome binding sites in an isogenic background. The cloning vehicle contains the lacZ gene, positioned so that initiation from each binding site can be quantitated, and the f1 origin, which allows rapid DNA sequencing. From a large set of sequences whose relative efficiencies of initiation are known, we can build linear models of the E. coli ribosome binding site. To study the more complex effects of mRNA structure and context on translational initiation, we will construct plasmids whose sequences on the 5' and 3' sides of the random region differ from those of our original plasmid. We propose to extend this technique into a general tool for the study of any binding site.

We are also studying the sequence-specific recognition of DNA by proteins. We have adapted the Shannon measure of information to quantitate the information content of several recognition systems. This analysis has led to the prediction that the T7 late promoters contain overlapping repressor sites, a prediction we are now testing. We have shown that the information content of the sites bound by recognizer proteins is related to their sequence-specific binding energies, and we have devised a method to determine the quantitative sequence preferences of proteins through in vitro binding to random DNA.
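The idea of quantitating the information content of a set of binding sites with the Shannon measure can be illustrated with a short sketch. It assumes the sites are already aligned and of equal length, that all four bases are equally likely in the background (so a position carries at most 2 bits), and it omits the small-sample correction a real analysis of a limited set of sites would need. The function names are hypothetical, and this is a minimal illustration, not necessarily the exact formulation used in this work.

```python
import math
from collections import Counter


def position_information(sites):
    """Return the information (in bits) at each position of aligned sites.

    Assumption: equiprobable background (2 bits maximum per position);
    no small-sample correction is applied.
    """
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("sites must be aligned and of equal length")
    n = len(sites)
    info = []
    for pos in range(length):
        # Count each base observed at this position across all sites.
        counts = Counter(s[pos].upper() for s in sites)
        # Shannon uncertainty H(l) = -sum_b f(b,l) * log2 f(b,l)
        h = -sum((c / n) * math.log2(c / n) for c in counts.values())
        # Information = maximum uncertainty minus remaining uncertainty.
        info.append(2.0 - h)
    return info


def total_information(sites):
    """Sum the per-position information over the whole site."""
    return sum(position_information(sites))


if __name__ == "__main__":
    example = ["ACGT", "ACGA", "ACGC", "ACGG"]
    print(position_information(example))
    print(total_information(example))
```

For four sites that agree at the first three positions and use all four bases at the last, the sketch gives 2 bits at each conserved position and 0 bits at the variable one, 6 bits in total. Relating such totals to binding energies, as described above, is outside the scope of this sketch.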