(a) Mathematical modeling of the immunotoxin delivery process
We have published the initial mathematical model of the immunotoxin delivery process (Chen et al., Annals of Biomedical Engineering, 36: 486-512, 2008). For a number of immunotoxins, the model reproduces reasonably well the dose-dependent in vitro cytotoxic activity and the reduction in volume of xenograft tumors in mice. More importantly, the modeling process identified some two dozen factors that are involved in the delivery process. Using the model, we determined how sensitive the tumor-volume-reducing capacity of different immunotoxins is to each of these factors. For example, the tumor volume was most sensitive to changes in the normal growth rate of the tumor and in the blood vessel density of the tumor tissue. The model gives a detailed space-time distribution of the immunotoxin in the tumor tissue. It also reproduces the "active site barrier" effect, which indicates that strong binders are not necessarily the most potent and that there is an optimal binding constant for a given diffusion rate and number of binding sites on the tumor cell surface.
This model, however, is incomplete in that it does not account for all the immunotoxin that is administered: if one adds up all the immunotoxin that is lost by the various mechanisms incorporated in the model, the sum does not equal the total administered dose. This accounting error prevents one from identifying the possible sources of waste with confidence. We are currently working to devise an alternative model that correctly accounts for all the losses and still works as well as the initial model. Additionally, Ira Pastan's group found that shed antigen in the tumor tissue is probably a major factor that reduces the efficacy of the immunotoxins. Accordingly, we will incorporate shed antigen in any new model that we produce.
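The accounting requirement described above (every unit of administered immunotoxin must end up either remaining in a compartment or in one of the loss terms) can be illustrated with a minimal sketch. This is not the published model: the two compartments, the three loss routes, all rate constants, and the function names are hypothetical and chosen only to show how a mass-balance check can be kept alongside a simulation.

```python
# Toy two-compartment bookkeeping example (NOT the published model).
# All compartments, loss routes, and rate constants are hypothetical.

def simulate_mass_balance(dose=1.0, k_clear=0.5, k_extravasate=0.1,
                          k_internalize=0.3, k_degrade=0.05,
                          dt=0.001, t_end=20.0):
    """Explicit Euler integration that tracks every loss term separately."""
    blood = dose          # immunotoxin still circulating
    tumor = 0.0           # immunotoxin free in the tumor tissue
    sinks = {"cleared": 0.0, "internalized": 0.0, "degraded": 0.0}

    steps = int(round(t_end / dt))
    for _ in range(steps):
        # Amount moved along each route during this time step.
        cleared = k_clear * blood * dt
        moved = k_extravasate * blood * dt
        internalized = k_internalize * tumor * dt
        degraded = k_degrade * tumor * dt

        # Each amount leaves one pool and enters exactly one other pool.
        blood -= cleared + moved
        tumor += moved - internalized - degraded
        sinks["cleared"] += cleared
        sinks["internalized"] += internalized
        sinks["degraded"] += degraded

    return blood, tumor, sinks


def mass_balance_error(dose, blood, tumor, sinks):
    """Administered dose minus everything that is accounted for."""
    return dose - (blood + tumor + sum(sinks.values()))


if __name__ == "__main__":
    blood, tumor, sinks = simulate_mass_balance()
    print("remaining in blood:", blood)
    print("remaining in tumor:", tumor)
    print("losses by route:", sinks)
    print("mass-balance error:", mass_balance_error(1.0, blood, tumor, sinks))
```

Because every flux is subtracted from one pool and added to exactly one other, the error stays at the level of floating-point round-off. In a model that passes this check, the entries of `sinks` can be compared directly to see which route wastes the most material.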
(b) Investigation of the origin of new genes
The existence of ORFan genes raises the question of how they arose. Assuming that viable ORFan genes do exist, and there is convincing evidence that they do, there seem to be only three possible explanations. One is that they arose de novo from previously non-coding sequences. The second is that they arose by duplication of an existing gene, but mutated so fast that the ancestor gene can no longer be detected. The third is that they came from foreign sources, possibly viruses and phages. This third source, however, seems unlikely in view of a recent survey by Yin and Fischer, which found that the number of ORFan gene homologs in the public viral gene database is small, substantially smaller than that for normal genes. It is well known that the proteins of an organism have a composition bias characteristic of that organism. Therefore, the simple idea was to compute and compare the composition biases of the protein products of the ORFan genes, of the normal genes, and of artificial protein sequences produced by translating random nucleic acid sequences in which the stop codons were replaced by another codon.
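The comparison described above can be sketched as follows. The report does not say which codon replaced the stop codons or how the composition bias was quantified, so both choices here are assumptions made for illustration: stop codons are replaced by a hypothetical fixed codon (`GCT`), and bias is measured as the Euclidean distance between an amino-acid composition and a reference composition. The function names are likewise hypothetical.

```python
import random
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}
AMINO_ACIDS = sorted(set(AMINO) - {"*"})   # the 20 amino acids


def random_dna(length, rng):
    """Random nucleotide sequence with equal base frequencies (assumption)."""
    return "".join(rng.choice(BASES) for _ in range(length))


def translate_without_stops(dna, replacement_codon="GCT"):
    """Translate in frame 0, substituting a fixed codon for each stop codon.

    The replacement codon is an assumption; the source only says that stop
    codons were replaced by another codon.
    """
    protein = []
    for i in range(0, len(dna) - len(dna) % 3, 3):
        codon = dna[i:i + 3]
        if CODON_TABLE[codon] == "*":
            codon = replacement_codon
        protein.append(CODON_TABLE[codon])
    return "".join(protein)


def composition(protein):
    """Fraction of each of the 20 amino acids in a protein sequence."""
    counts = Counter(protein)
    total = len(protein)
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


def composition_bias(comp, reference):
    """Euclidean distance from a reference composition (assumed metric)."""
    return sum((comp[aa] - reference[aa]) ** 2 for aa in AMINO_ACIDS) ** 0.5


if __name__ == "__main__":
    rng = random.Random(0)
    artificial = translate_without_stops(random_dna(3000, rng))
    uniform = {aa: 1 / 20 for aa in AMINO_ACIDS}   # placeholder reference
    print(composition_bias(composition(artificial), uniform))
```

In an actual analysis the reference would be the average composition of the natural proteins of the organism rather than the uniform placeholder used here, and the same `composition_bias` call would be applied to ORFan, normal, and translated non-coding sequences.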
Using microbial genome sequences, we found (1) that there is a pattern of compositional bias among the natural protein sequences when compared to their average composition, (2) that random nucleic acid sequences, when translated into proteins, show a different, larger compositional bias when compared to the same average composition, (3) that non-coding and intergenic sequences, when translated into proteins, show an even larger compositional bias when compared to the same average composition, (4) that the ORFan sequences have compositional biases that vary between that of the natural protein sequences and that of the intergenic and non-coding sequences, and (5) that this variation correlates with the age of the organism in which the ORFan gene is found, the bias being closer to that of the natural protein sequences the older the organism. This result is consistent with the idea that new genes arose de novo from non-coding sequences, which initially mutated rapidly under positive selection and then changed more gradually toward the composition typical of the host organism. This work has been published recently.
(c) Investigation of the yeast protein interaction network
The Neighbor Overlap (NO) between two proteins in a protein-protein interaction network (PPIN) is defined as the number of neighbors that are common to both proteins. One expects that high NO pairs would have similar functions and could therefore serve as backup copies of one another. We showed that the yeast PPIN is enriched in high NO pairs compared with random PPINs that are carefully constructed to preserve the single-protein properties of the network (degree distribution and clustering coefficient). This is the expected result if high NO pairs indeed serve as backup copies, since many of the proteins in the yeast system must have backup mates.
The high NO pairs also tend to share the same (high level) Gene Ontology (GO) annotation, indicating that they have similar functions, and they show stronger genetic interactions than the low NO pairs. Some, but by no means all, of the high NO pairs arise from the existence of protein complexes. Examination of many individual cases indicates that others provide functional variation in addition to a backup function. This work is essentially finished and a manuscript has been submitted for publication.
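The Neighbor Overlap defined above (the number of neighbors common to both proteins) can be computed directly from an edge list. The following is a minimal sketch, not the code used in the study: the function names, the protein labels in the toy network, and the decision to ignore self-interactions are assumptions, and the comparison against property-preserving random networks is not shown.

```python
from itertools import combinations


def build_adjacency(edges):
    """Undirected adjacency sets from (protein, protein) interaction pairs."""
    adjacency = {}
    for a, b in edges:
        if a == b:
            continue  # self-interactions are ignored (assumption)
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def neighbor_overlap(adjacency, p, q):
    """Number of proteins that interact with both p and q."""
    shared = adjacency.get(p, set()) & adjacency.get(q, set())
    return len(shared - {p, q})


def high_no_pairs(adjacency, threshold):
    """All protein pairs whose Neighbor Overlap is at least `threshold`."""
    pairs = []
    for p, q in combinations(sorted(adjacency), 2):
        overlap = neighbor_overlap(adjacency, p, q)
        if overlap >= threshold:
            pairs.append((p, q, overlap))
    return pairs


if __name__ == "__main__":
    # Hypothetical toy network; the labels are not real yeast proteins.
    edges = [("A", "C"), ("A", "D"), ("B", "C"),
             ("B", "D"), ("A", "B"), ("D", "E")]
    adjacency = build_adjacency(edges)
    print(neighbor_overlap(adjacency, "A", "B"))
    print(high_no_pairs(adjacency, threshold=2))
```

In this toy network, A and B share the neighbors C and D, so their NO is 2, and the direct A-B interaction does not count toward the overlap. The all-pairs loop is quadratic in the number of proteins, which is adequate for a sketch but may need a neighbor-of-neighbor enumeration on a genome-scale network.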