If you want to delve into the staggering depths of the human genome, you can now peruse our genetic blueprint in its entirety. On Thursday, researchers officially published the full 3.055-billion-base-pair (bp) sequence in the journal Science.
Racing against the private company Celera Genomics, the publicly funded Human Genome Project announced a first draft of the sequence in June 2000. This effort yielded about 92 percent of the human genome sequence, allowing scientists to examine the 23 pairs of chromosomes (46 in total) that carry tens of thousands of genes. However, gaps remained in parts of the genome that could not be resolved with the technology of the time, which relied on cloning DNA fragments in bacterial artificial chromosomes (BACs).
In this approach, scientists used bacteria to copy (clone) each large fragment of the genome and then sequenced those fragments in smaller pieces. But this process inherently misses some portions of the genome, particularly heterochromatin regions (typically found near the centromeres and telomeres, where DNA is highly repetitive).
DNA is built from four bases, adenine (A), thymine (T), guanine (G), and cytosine (C), which pair up across the double helix (A with T, G with C) to form base pairs. There are billions of these base pairs in the human genome. Heterochromatin regions are tightly packed blocks of genetic material filled with repeated sequences of DNA. Because a short fragment read from inside a repeat looks the same no matter where in the repeat it came from, these regions are much harder to piece together than other parts of the genome.
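A toy Python sketch can make the repeat problem concrete. The sequences below are invented for illustration, not real genome data, and `complement_strand` and `read_positions` are hypothetical helper names rather than part of any sequencing toolkit. A short read taken from unique DNA matches in exactly one place, while a read of the same length from a tandem repeat matches in several places, so its true position is ambiguous.

```python
# Toy illustration of base pairing and of why repeats confuse short reads.
# All sequences are made up for illustration; they are not real human DNA.

# Watson-Crick pairing: A pairs with T, G pairs with C.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement_strand(seq: str) -> str:
    """Return the base-paired partner of each base, in the same order."""
    return "".join(COMPLEMENT[base] for base in seq)


def read_positions(genome: str, read: str) -> list[int]:
    """Return every start position where a short read matches the genome exactly."""
    return [
        i
        for i in range(len(genome) - len(read) + 1)
        if genome[i:i + len(read)] == read
    ]


unique_region = "ATGCGTACCTGAAGTC"  # no repeated units
repeat_region = "CATTC" * 4  # a tandem repeat: the same 5-base unit four times

print(complement_strand("ATGC"))  # TACG
print(read_positions(unique_region, "CCTGA"))  # [7]: one unambiguous placement
print(read_positions(repeat_region, "CATTC"))  # [0, 5, 10, 15]: four equally good placements
```

Real repeats can span thousands or millions of bases, so the same ambiguity applies to much longer reads; only a read long enough to run past both ends of a repeat can be placed with confidence.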
Sequencing technologies from two companies recently made it easier to crack the code. California-based Pacific Biosciences used a method called HiFi sequencing, which produces long reads with high accuracy. HiFi sequencing circularizes a DNA fragment so the instrument can read the same molecule several times, then combines those passes into a single, highly accurate read. U.K.-based Oxford Nanopore Technologies threaded single DNA strands through a microscopic pore, one molecule at a time. As a strand passes through, it disturbs an electrical current flowing through the pore, and the pattern of changes in that current reveals which bases are passing through. Decoding the full signal lets scientists read very long stretches of a strand in one go.
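Why reading the same circle many times improves accuracy can be sketched with a simple per-position majority vote. This is a deliberately simplified Python model, not Pacific Biosciences' actual consensus software: it assumes only substitution errors (real reads also contain insertions and deletions, which require alignment before voting), and the 10 percent per-pass error rate, nine passes, and the names `noisy_pass` and `consensus` are all illustrative assumptions.

```python
# Simplified model of circular consensus ("HiFi") reads.
# Assumptions: substitution errors only, a made-up 10% error rate per pass,
# and 9 passes, so every pass lines up with the others position by position.
import random
from collections import Counter


def noisy_pass(seq: str, error_rate: float, rng: random.Random) -> str:
    """Simulate one pass around the circle: each base is swapped for a
    different random base with probability error_rate."""
    out = []
    for base in seq:
        if rng.random() < error_rate:
            out.append(rng.choice([b for b in "ACGT" if b != base]))
        else:
            out.append(base)
    return "".join(out)


def consensus(passes: list[str]) -> str:
    """Take the most common base at each position across all passes."""
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*passes))


rng = random.Random(42)  # fixed seed so the simulation is repeatable
true_seq = "".join(rng.choice("ACGT") for _ in range(1000))
passes = [noisy_pass(true_seq, error_rate=0.1, rng=rng) for _ in range(9)]


def errors(read: str) -> int:
    """Count positions where a read disagrees with the true sequence."""
    return sum(a != b for a, b in zip(read, true_seq))


print("errors in a single pass:", errors(passes[0]))
print("errors after consensus:", errors(consensus(passes)))
```

A single pass is expected to contain roughly 100 errors in 1,000 bases, while the consensus usually contains only a handful or none, because independent random errors rarely land on the same position in most of the passes.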
The newly sequenced sections add up to nearly 200 million base pairs, and scientists are now working out what they do. Knowing the entire genomic sequence could help researchers develop more precisely targeted medicines and better understand how to vaccinate against viruses like Covid-19.
The published sequence represents the genetic material of only one person, however. One of the next steps, led by the Human Pangenome Reference Consortium, is to sequence the genomes of people from around the world, giving scientists a more robust and diverse source of genetic material to study.