The availability of complete genome sequences is beginning to have a profound impact on structural biology. For the first time, it is possible to select targets from among many thousands of open reading frames, any of which can easily be retrieved from its genome through the power of the polymerase chain reaction (PCR). Understandably, this prospect has conjured up visions of structural biology on a grand scale, creating a new field that has come to be known as "structural genomics." Yet transforming this dream into reality will require technical advances that increase the speed with which the three-dimensional structures of biological macromolecules can be determined. The goal of this project is to develop an integrated strategy for protein expression and purification that will be suitable for large-scale structural biology initiatives. The term "high-throughput" is often used in connection with structural genomics. However, we believe it is more appropriate to think in terms of "high-output." After all, what matters in the end is not how many experiments one does but how many successful outcomes one has to show for them. Output is a function of two variables: input and efficiency, the latter being a measure of the fraction of targets entering the pipeline that successfully emerge from the other end. In principle, output can be augmented by increasing either of these variables. Our research focuses on trying to maximize the efficiency of protein production. This would be an uphill battle if protein expression were a completely haphazard venture, but we do not believe this is the case. While we recognize that no single method will succeed all of the time, experience has taught us that some approaches are, on average, consistently more productive than others.
Our goal is to merge these high-probability approaches into a coherent strategy for "maximum likelihood" protein expression and purification that will enable us to increase output by improving efficiency, irrespective of the scale of the project.

Central to our approach is the liberal use of genetically engineered affinity tags. The advantages of affinity tags are compelling: they can improve the yield of a recombinant protein, help protect it from intracellular proteolysis, enhance its solubility, and facilitate its purification. We believe that the collective attributes of affinity tags far outweigh their potential disadvantages. Moreover, it is impossible to imagine a generic process for the production of recombinant proteins that does not involve affinity tags. Having committed ourselves to the tagging approach, we aim to derive the maximum possible benefit from affinity tags. Affinity tags are not all the same; some perform certain tasks better than others do, and so we need to be cognizant of the advantages and disadvantages of the various tags in order to use them to best effect.

We have been particularly interested in exploring the influence of affinity tags on the solubility of recombinant proteins, because insolubility is a major problem in protein expression and purification, and refolding of proteins seems incompatible with high-output applications. Through trial and error, we have discovered that one tag in particular, E. coli maltose-binding protein (MBP), has a remarkable ability to improve the solubility and promote the proper folding of its fusion partners. Because we believe this to be a rare and valuable attribute, we have adopted MBP as the cornerstone of our affinity-tagging strategy. One of our ongoing objectives is to understand the underlying mechanism of the solubilizing effect so that we can better manipulate it to our advantage.
We also hope to learn something of a more fundamental and general nature about the process of assisted protein folding by studying this model system in detail. Although MBP has some powerful advantages, it is not a particularly good affinity tag for protein purification.