The task ahead is to produce a finished sequence, by closing all gaps and resolving all ambiguities.
Much work remains to be done to produce a complete finished sequence, but the vast trove of information that has become available through this collaborative effort allows a global perspective on the human genome.
Although the details will change as the sequence is finished, many points are already clear.
• The genomic landscape shows marked variation in the distribution of a number of features, including genes, transposable elements, GC content, CpG islands and recombination rate. For example, the developmentally important HOX gene clusters are the most repeat-poor regions of the human genome, probably reflecting the very complex coordinate regulation of the genes in the clusters.
• There appear to be about 30,000–40,000 protein-coding genes in the human genome, only about twice as many as in worm or fly. However, the genes are more complex, with more alternative splicing generating a larger number of protein products.
• The full set of proteins (the ‘proteome’) encoded by the human genome is more complex than those of invertebrates. This is due in part to the presence of vertebrate-specific protein domains and motifs (an estimated 7% of the total), but more to the fact that vertebrates appear to have arranged pre-existing components into a richer collection of domain architectures.
• Hundreds of human genes appear likely to have resulted from horizontal transfer from bacteria at some point in the vertebrate lineage.
The human genome is the first vertebrate genome to be extensively sequenced.
And, uniquely, it is the genome of our own species.
The scientific progress made falls naturally into four main phases, corresponding roughly to the four quarters of the century. The first established the cellular basis of heredity: the chromosomes.
The sequence was produced over a relatively short period, with coverage rising from about 10% to more than 90% over roughly fifteen months. The sequence data have been made available without restriction and updated daily throughout the project.
In this paper, we start by presenting background information on the project and describing the generation, assembly and evaluation of the draft genome sequence.