From 10-01-02 to 9-01-03, our understanding of the significance of CTCF as a remarkable multifaceted protein with many important functions has increased dramatically. By September 2003, the total number of original publications identified by searching NLM PubMed for titles containing CTCF as the keyword reached 90. Among these, every third paper was (co)authored by V. Lobanenkov with one or more current or former members of the MPS LIP staff, and/or with collaborators listed in this report. Three reviews on CTCF published during the same time (see refs) established CTCF as a true multivalent multifunctional protein that uses different subsets of its 11 Zn-fingers (ZF) to form structurally distinct complexes with varying ~50 bp CTCF-target sites (CTS) that mediate distinct functions in the regulation of gene expression. These functions include context-dependent promoter repression or activation, creation of hormone-responsive gene silencers, and formation of all enhancer-blocking activities (EhBA) known in vertebrates, often also referred to as chromatin insulators or boundaries. We, and others, demonstrated that a subset of EhBA and silencers driven by methylation-sensitive CTCF binding plays a critical role in the reading, in mammalian somatic cells, of allele-specific regulatory marks that control X-chromosome inactivation (Xi) and parent-of-origin-dependent monoallelic gene expression of imprinted gene clusters. The LIP/MPS remains the only lab to have demonstrated that CTCF in vivo is capable of discriminating paternal vs. maternal alleles of the imprinting control region (ICR) of the H19 gene, which regulates allelic Igf2 expression by means of a methylation-sensitive EhBA. 
Moreover, we were the first and the only lab to explain the very nature of the sequence-specific mechanism for CpG-sensitive CTCF-ICR interactions mediated by the H19 CTS, by showing that it can occur only for those CTS that display a perfect match between the positions of the dinucleotides bearing me-dC residues and the ZF-contacting nucleotides necessary for CTCF to recognize, and bind to, any given CTS. Taken together with the combinatorial principles of ZF usage by which CTCF achieves multiple sequence-specificity, the latter finding allowed us to predict that the same general mechanism of creating methylation-sensitive (M-S) CTCF-driven EhBA is employed universally throughout mammalian genomes. Besides our joint work with the Ohlsson lab described below, perhaps the best evidence that the idea was both correct and productive came from this year's HMG paper by Paul Sadowski, which showed the presence of CTCF sites in EACH of the more than a DOZEN differentially-methylated EhBAs and silencers identified in a very long (for this type of study) imprinted region with 16 genes in human chr 11p15.5. Moreover, the authors also suggested that various deletions in this region, associated with the congenital overgrowth disorder called BWS and with various malignancies, may be explained by the loss of CTCF functions mediated by at least some of these numerous CTS. Interestingly, a similar proposal linking the loss of a CTS-EhBA with pathology was previously made by V. Lobanenkov and his former co-workers, who demonstrated (Nat. Genetics, 2001) a functional association between the loss of M-S CTCF binding in the DM1 locus and severe congenital myotonic dystrophy. To address whether the role of M-S CTS in epigenetic regulation could extend beyond the regulation of gene imprinting, we mapped - in collaborations with Bill Paul (on the IL-4 locus) and with Barbara Birnstein (on the IgH) - constitutive CTS-dependent EhBA (i.e.
putative boundaries that define independent regulation of these 2 loci) and identified a number of M-S EhBA driven by a number of varying CTS, often organized into clusters, in the differentially-methylated regulatory regions. To assess whether these findings may indeed reflect specific cases of a universal EPIGENETIC role for CTCF targets, we collaborated with the Ohlsson lab to identify more than 400 new CTS by generating DNA microarrays of clones derived from DNA of sheared chromatin fractions immunopurified with CTCF antibodies. These CTS, both single copy and repetitive, are found in loci involved in multiple cellular functions, such as metabolism, neurogenesis, growth and signaling. Using a toxin-based EhBA-trapping vector, we also showed that the majority of these targets mediate chromatin-insulator functions. As the majority of the sites are represented by the M-S class of CTS, we feel confident that a CTCF-based network is emerging as a major determinant of epigenetic states. To develop in mammals a targeting mechanism required for the re-setting and reading of epigenetic marks at certain genome regions - which without exception turned out to harbor a subset of M-S CTCF sites necessary for parent-of-origin-dependent gene expression - mammalian CTCF has COOPTED a new function in gene imprinting by means of an unprecedented self-duplication and divergence. This created a pair of male and female germ-cell-specific CTCF counterparts with the same 11 ZF domain: the testis-specific BORIS gene (cloned, patented and published by us last year) and the oocyte-specific Natasha gene (which we partially cloned in August this year), respectively. The work on these two genes is reviewed in the other project report. 
Only by having these three mammalian genes working together on the SAME DNA SEQUENCES, recognized by the same 11 ZF domain shared by all three factors, has it become possible to achieve both germline transmission (resetting) of maternal (Natasha-directed) and of paternal (BORIS-directed) methylation marks at CTS, and reading of these marks by somatic CTCF. This implies that the CTCF homologue that we identified in Drosophila may encode THE ONLY TRUE ANCESTOR of the universal vertebrate insulator factor, CTCF. This year, thanks to the outstanding work of Dr. Hanlim Moon (who finished her training with V. Lobanenkov and accepted a senior position in her home country) and the establishment of a collaboration with Dr. Jumin Jhou, who performed insulator assays in flies, we obtained direct evidence for this view. The evidence is based on a combination of 3 major arguments and experimental results: [1] All DNA sequences (including scs, scs', FAB-8, globin HS4, etc.) that are capable of binding to BOTH Drosophila CTCF (drCTCF) and mammalian CTCF manifested equally efficient EhBA in BOTH Drosophila and mammalian systems. [2] In contrast, DNA sequences that bind perfectly well to mammalian CTCF but do NOT BIND to drCTCF (exemplified in our studies by the negative outcome of screening 2 kb of the H19 ICR with recombinant drCTCF) manifested insulator function only in mammalian cells BUT NOT in Drosophila. [3] Except for drCTCF, all 3 OTHER genes known to be involved in insulator function in flies, namely Drosophila Su(Hw), BEAF-32 and Zw5, have NO vertebrate HOMOLOGUES, which one would expect to exist for a FOUNDER INSULATOR gene. Thus, we concluded that drCTCF is indeed the true ANCESTOR of the universal vertebrate insulator gene. 
We made a library of drCTS (with a view to creating a genome-wide map of gene clusters separated by drCTS), but noticed that, in addition to the intergenic sites (which are clearly the main drCTCF targets throughout the fly genome), drCTS with EhBA were also found in Notch, scs' and other elements, which in addition to EhBA also have known PROMOTER ACTIVITY. This observation suggested that the EhBA of drCTCF is closely associated with the ability to function as a decoy promoter. This year, we finalized a joint project with Dr. Elena Klenova to confirm that CTCF directly interacts with the Pol II holoenzyme to initiate transcription from intergenic CTS.