We have developed a general strategy for de novo protein design. The strategy is based on the premise that the sequence locations of hydrophobic and hydrophilic residues must be specified explicitly, but the precise identities of the side chains need not be constrained and can be varied extensively. We have tested this premise by constructing and characterizing a large collection of novel proteins designed to fold into four-helix bundles. The collection of proteins was generated in vivo from a degenerate family of synthetic genes. Each member of this family encodes a different amino acid sequence, but all sequences share the same pattern of polar and non-polar residues. This sequence degeneracy was made possible by using degenerate codons: wherever a non-polar residue was required, the degenerate codon NTN was used (giving Phe, Leu, Ile, Met, or Val); wherever a polar residue was required, the degenerate codon NAN was used (giving Glu, Gln, Asp, Asn, Lys, or His). The synthetic genes were then expressed in E. coli. Characterization of the expressed proteins indicates that a majority of the novel amino acid sequences fold into stable structures that are both compact and α-helical. Thus, a simple binary code of polar and non-polar residues arranged in the appropriate order appears sufficient for de novo protein design. (Summarized in: Kamtekar, Schiffer, Xiong, Babik, and Hecht (1993) Science 262, 1680.)

The sequences of the novel proteins were determined by sequencing the DNA of the corresponding genes. In some cases, however, we have reason to question the integrity of the purified protein. Does the sequence of the purified protein exactly match the sequence encoded by the sequenced gene? Or has the protein been modified by proteolysis, either in vivo or during purification? In particular, does the bacterial machinery remove the N-terminal methionine, or is it retained? These questions are most readily addressed by determining the masses of the purified proteins by electrospray mass spectrometry.
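
The two calculations implied above can be sketched in a few lines of Python: which residues a degenerate codon encodes, and what average mass to expect with and without the N-terminal methionine. This is an illustrative sketch, not the procedure used in the work. The function names and the example sequence are hypothetical, and the residue masses are standard average values. Note that under the standard genetic code, NAN with all four bases at the first position also includes TAT/TAC (Tyr) and TAA/TAG (stop). The six polar residues listed correspond to a first position restricted to A, C, or G (IUPAC symbol V), so the sketch treats VAN as an assumption for comparison.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code; codons ordered TTT, TTC, TTA, TTG, TCT, ..., GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# IUPAC ambiguity symbols: N = any base, V = A, C, or G (not T).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "ACGT", "V": "ACG"}


def encoded_residues(degenerate_codon):
    """Return the set of one-letter residues ('*' = stop) a degenerate codon encodes."""
    choices = [IUPAC[symbol] for symbol in degenerate_codon]
    return {CODON_TABLE["".join(c)] for c in product(*choices)}


# Average residue masses in Da (free amino acid minus one water).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528  # one water is added back for the free N- and C-termini


def average_mass(sequence):
    """Average mass (Da) of an unmodified linear polypeptide."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER


def candidate_masses(gene_encoded_sequence):
    """Expected masses if the initiator Met is retained or removed."""
    return {
        "Met retained": average_mass(gene_encoded_sequence),
        "Met removed": average_mass(gene_encoded_sequence[1:]),
    }


if __name__ == "__main__":
    print("NTN:", sorted(encoded_residues("NTN")))
    print("NAN:", sorted(encoded_residues("NAN")))
    print("VAN:", sorted(encoded_residues("VAN")))
    # Hypothetical sequence, for illustration only (not one of the designed proteins).
    for label, mass in candidate_masses("MGSPEELLKKHEELIKK").items():
        print(f"{label}: {mass:.1f} Da")
```

In this sketch, a measured mass about 131.2 Da below the full-length prediction would be consistent with removal of the N-terminal methionine, while a larger deficit would suggest proteolysis elsewhere in the chain.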