The goal of this project is to create a unified technological infrastructure for protein expression and purification that will be suitable for large-scale structural biology initiatives. Central to our approach is the use of multiple genetically engineered affinity tags. We are currently trying to determine which combination of affinity tags is most effective and how to use them with maximal efficiency. At the same time, because most affinity tags have the potential to interfere with structural studies, we are also striving to develop more reliable methods for removing them. One of the greatest technical obstacles that we face is the "inclusion body problem," i.e., the tendency of proteins to accumulate in an insoluble, inactive form. Because refolding of proteins can be an arduous and time-consuming undertaking, some way to circumvent the formation of inclusion bodies would be advantageous. Sometimes this can be accomplished by fusing an aggregation-prone polypeptide to a highly soluble partner. We have demonstrated that Escherichia coli maltose-binding protein (MBP) is a remarkably effective solubility enhancer, and that in many cases MBP can promote the proper folding of its fusion partners as well. This chaperone-like quality distinguishes MBP from other affinity tags and greatly enhances its value as a fusion partner. Accordingly, MBP fusion proteins have become the cornerstone of our strategy for protein expression. Additional tags are used within the framework of an MBP fusion protein to facilitate purification of the target protein. Affinity tags would probably be used more often if it were not so difficult to remove them. Tag removal is usually accomplished by endoproteolysis of the fusion protein at a designed site. The main difficulty with this approach stems from the intrinsic promiscuity of the proteases that are commonly used to cleave fusion proteins.
The problem of protease promiscuity is compounded by the prohibitive cost of purchasing enough of any of these proteases to cleave fusion proteins on the scale required for structural studies. To overcome these problems, we produce our own supply of TEV protease, the catalytic domain of the nuclear inclusion protease from tobacco etch virus. TEV protease cleaves the amino acid sequence ENLYFQ(G/S) between Q and G or between Q and S with high specificity. In contrast to factor Xa, enteropeptidase, and thrombin, TEV protease has never been reported to cleave fusion proteins at noncanonical sites. The production of TEV protease in Escherichia coli has been hampered in the past by low yield and poor solubility, but we have been able to solve both problems by making synonymous codon replacements and producing the protease in the form of an MBP fusion protein. A more troublesome shortcoming of TEV protease is that it readily cleaves itself at a specific site, generating a truncated protease with greatly diminished activity. We have been able to rectify this problem as well by introducing amino acid substitutions that prevent autoinactivation without impeding the ability of the protease to cleave canonical target sequences. A systematic analysis of the enzyme's P1' specificity revealed that, in addition to G and S, many different amino acids can be accommodated in this position with relatively little impact on the efficiency of processing. The crystal structure of catalytically inactive TEV protease in complex with a peptide substrate illuminated the structural basis of its stringent substrate specificity. The homologous protease from tobacco vein mottling virus (TVMV), a close relative of TEV protease with a distinct sequence specificity, is currently being developed as an alternative reagent.
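The recognition rule described above (cleavage of ENLYFQ(G/S) between Q and the P1' residue) can be illustrated with a short Python sketch. This is only an in silico pattern search under stated assumptions, not a predictor of cleavage efficiency: the function names, the placeholder sequences, and the example set of tolerated P1' residues are hypothetical and are not taken from the project itself.

```python
import re

# Canonical P1' residues named in the text: G or S after ENLYFQ.
CANONICAL_P1_PRIME = "GS"


def find_tev_sites(sequence, p1_prime=CANONICAL_P1_PRIME):
    """Return 0-based cut positions (the index of each P1' residue).

    `p1_prime` is the set of residues accepted immediately after ENLYFQ.
    Widening it is a hypothetical way to model relaxed P1' specificity.
    """
    if not p1_prime:
        raise ValueError("p1_prime must contain at least one residue")
    # The lookahead checks the P1' residue without consuming it.
    pattern = re.compile("ENLYFQ(?=[" + re.escape(p1_prime) + "])")
    return [m.end() for m in pattern.finditer(sequence.upper())]


def cleave(sequence, p1_prime=CANONICAL_P1_PRIME):
    """Split a sequence at every site; P1' stays on the C-terminal fragment."""
    seq = sequence.upper()
    fragments, start = [], 0
    for cut in find_tev_sites(seq, p1_prime):
        fragments.append(seq[start:cut])
        start = cut
    fragments.append(seq[start:])
    return fragments


if __name__ == "__main__":
    # Placeholder fusion: "AAAA" stands in for a tag, "GHMKK" for a target.
    fusion = "AAAAENLYFQGHMKK"
    print(find_tev_sites(fusion))
    print(cleave(fusion))
    # A non-canonical P1' residue is matched only if the accepted set is widened.
    print(cleave("AAAAENLYFQMHK", p1_prime="GSM"))
```

Because the P1' residue remains on the C-terminal fragment, the released target in this sketch begins with that residue; this is why the choice of accepted P1' residues is exposed as a parameter rather than fixed.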