The signal elements in promoter sequences are not well characterized. We developed statistical tests to find nucleotide words (generally of length 8) whose occurrences are localized relative to transcription start sites (TSSs). These words serve as "seeds" that are expanded into position-specific scoring matrices (PSSMs) characterizing systems of co-regulated genes. To this end, Dr. Marino-Ramirez collected a database of about 4700 sequences around the TSSs of human genes. The database is exceptionally well characterized, which makes it well suited to our statistical study. We used a Poisson scan statistic to determine whether the occurrences of a given 8-letter DNA word cluster at particular positions relative to the TSS more strongly than chance would predict. The same statistic also identified clusters of significant words. We have developed a database of positionally significant clusters and a Gibbs sampling program, A-GLAM, to further our exploration of transcriptional regulatory elements using anchored alignments. Our next step is to add Bayesian sampling methods that incorporate positional information into A-GLAM's analysis. We are also validating our results against microarray data and gene ontology information.
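To make the idea of scanning for positional clustering concrete, here is a minimal Python sketch. It is not the statistic or code used in this project: it pools the offsets of one word across TSS-anchored sequences, finds the fixed-width window with the most occurrences, and bounds its significance with a Bonferroni-corrected Poisson tail, which is a simple conservative stand-in for a true scan-statistic p-value. The function names, the window width, and the assumption of a uniform background rate are all hypothetical choices made for illustration.

```python
import math
from collections import Counter


def poisson_tail(k, mean):
    """Return P(X >= k) for X ~ Poisson(mean)."""
    if k <= 0:
        return 1.0
    if mean <= 0.0:
        return 0.0
    if k > mean:
        # Sum the upper tail directly; terms shrink because mean / i < 1.
        term = math.exp(-mean + k * math.log(mean) - math.lgamma(k + 1))
        total, i = 0.0, k
        while term > total * 1e-17:
            total += term
            i += 1
            term *= mean / i
        return min(1.0, total)
    # Otherwise take the complement of the lower tail.
    # Assumes a moderate mean (exp(-mean) underflows above roughly 700).
    term, cdf = math.exp(-mean), 0.0
    for i in range(k):
        cdf += term
        term *= mean / (i + 1)
    return max(0.0, 1.0 - cdf)


def scan_word_positions(positions, region_length, window):
    """Find the densest window of word occurrences and bound its p-value.

    positions: 0-based offsets (all in [0, region_length)) of one word,
               pooled over every sequence in the TSS-anchored alignment.
    Returns (window start, count in window, Bonferroni upper bound on p).
    """
    counts = Counter(positions)
    n_windows = region_length - window + 1

    # Slide the window one position at a time, updating the count in O(1).
    current = sum(counts[p] for p in range(window))
    best_start, best_count = 0, current
    for start in range(1, n_windows):
        current += counts[start + window - 1] - counts[start - 1]
        if current > best_count:
            best_start, best_count = start, current

    # Assumed null model: occurrences fall uniformly over the region, so a
    # window's count is approximately Poisson with this mean.
    expected = len(positions) * window / region_length
    p_bound = min(1.0, n_windows * poisson_tail(best_count, expected))
    return best_start, best_count, p_bound
```

Calling `scan_word_positions(offsets, region_length=1000, window=10)` on a pooled list of offsets returns the start of the densest window, its count, and an upper bound on the p-value. The Bonferroni correction treats overlapping windows as independent tests, so the bound is conservative; a dedicated scan-statistic approximation would give a sharper value.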