The goal of the "Genome sequence variation" project is to understand the genomic structure of sequence variation, to identify the fundamental forces that have shaped this structure, and to use this knowledge to understand the genetic causes of disease. First, we analyzed the overlapping portions of large-insert (BAC) clones sequenced for the construction of a public human reference sequence, and found 500,000 computational candidate SNPs. Independent laboratory experiments verified that these candidates were high-quality predictions (Marth et al., Nature Genetics 2001). We analyzed the genomic distribution of these SNPs and found that nucleotide diversity correlated with structural and functional features such as G+C content, CpG dinucleotide content, repeat content, recombination frequency, and coding features. However, the variance around these correlations is so large that even all these features together are only very poor predictors of local nucleotide diversity. This showed that random forces (genetic drift) are likely the main component in the description of nucleotide diversity. We studied genetic drift under realistic recombination and mutation values and dynamic models of population history, and found that a bottleneck-shaped model accounts for the data at all length scales we analyzed (Marth et al., PNAS, in press). Among other population-genetic conclusions, this model predicts a lower level of linkage disequilibrium than previously expected in the genome of the population (or populations) represented in the public genome sequence. This prediction has since been confirmed in various studies from other laboratories. In the BAC overlaps, we also found over 100,000 deletion/insertion polymorphisms (DIPs). Thousands of these were analyzed by our collaborator, and the results have been reported (Weber et al., AJHG 2002).
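
As a rough illustration of the kind of correlation analysis summarized above, the following Python sketch estimates nucleotide diversity in a clone overlap as SNPs per aligned base pair and correlates it with one sequence feature (G+C fraction). It is not the project's actual pipeline: the function names, the window layout, and every number in `windows` are hypothetical placeholders chosen only to make the example run. The squared correlation coefficient is the fraction of variance in diversity that the feature explains, which is the quantity that stays small when a feature is correlated with diversity but is a poor predictor of its local value.

```python
from math import sqrt


def nucleotide_diversity(snp_count, aligned_bases):
    """Return SNPs per aligned base pair for one pairwise overlap comparison."""
    if aligned_bases <= 0:
        raise ValueError("aligned_bases must be positive")
    return snp_count / aligned_bases


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


# Hypothetical overlap windows: (SNP count, aligned bases, G+C fraction).
# These values are invented for illustration and are NOT project data.
windows = [
    (12, 20000, 0.38),
    (25, 20000, 0.41),
    (9, 20000, 0.44),
    (31, 20000, 0.47),
    (14, 20000, 0.52),
    (22, 20000, 0.55),
]

diversity = [nucleotide_diversity(snps, bases) for snps, bases, _ in windows]
gc_fraction = [gc for _, _, gc in windows]

r = pearson_r(gc_fraction, diversity)
r_squared = r * r  # fraction of variance in diversity explained by G+C alone

print(f"r = {r:.3f}, r^2 = {r_squared:.3f}")
```

A single feature is used here to keep the sketch short; extending it to several features at once would call for a multiple regression, which is outside the scope of this illustration.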