Comprehensive mapping of long-range interactions reveals folding principles of the human genome
Entry by Leon Furchtgott, APP 225 Fall 2010.
Erez Lieberman-Aiden*, Nynke L. van Berkum*, et al. Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science 326 (2009).
This paper introduces Hi-C, a method that probes the three-dimensional architecture of whole genomes. The authors use Hi-C to construct a spatial proximity map of the human genome at a resolution of 1 megabase. The map shows that the genome is spatially segregated into two genome-wide compartments corresponding to open and closed chromatin. At the megabase scale, the chromatin conformation is consistent with a fractal globule polymer conformation rather than an equilibrium globule conformation.
Understanding how chromosomes fold can provide insight into the complex relationships between chromatin structure, gene activity, and the functional state of the cell. Yet beyond the scale of nucleosomes, little is known about chromatin organization. Until now, long-range interactions could only be probed for specific pairs of loci, not for every combination of loci across the genome. This paper introduces a method that provides this genome-wide coverage and draws out some implications from analysis of the resulting data.
Hi-C method: cells are cross-linked with formaldehyde; the DNA is digested with a restriction enzyme that leaves a 5′ overhang; the 5′ overhang is filled in, including a biotinylated residue; and the resulting blunt-end fragments are ligated under dilute conditions that favor ligation events between cross-linked DNA fragments. The resulting DNA sample contains ligation products consisting of fragments that were originally in close spatial proximity in the nucleus, marked with biotin at the junction. A Hi-C library is created by shearing the DNA and selecting the biotin-containing fragments with streptavidin beads. The library is then analyzed by massively parallel DNA sequencing, producing a catalog of interacting fragments (Fig. 1).
Analysis of these data identifies long-range contacts between segments more than 20 kb apart. The authors construct a genome-wide contact matrix by dividing the genome into 1-Mb regions and defining the matrix entry m(i,j) to be the number of ligation products between locus i and locus j. (Because a ligation product between i and j is also one between j and i, the contact matrices are symmetric.) The contact matrices are highly reproducible (Fig. 1B-D).
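The binning step described above can be sketched in a few lines of Python. This is only an illustration, not the authors' pipeline: the function name `contact_matrix`, the input format (a list of base-pair coordinate pairs on a single chromosome), and the toy data are all assumptions made for the example.

```python
BIN_SIZE = 1_000_000  # 1-Mb bins, matching the resolution quoted in the entry


def contact_matrix(pairs, chrom_length, bin_size=BIN_SIZE):
    """Count ligation products between bins i and j of one chromosome.

    pairs: iterable of (pos_a, pos_b) base-pair coordinates, one tuple per
    ligation product (hypothetical input format, chosen for illustration).
    """
    n_bins = (chrom_length + bin_size - 1) // bin_size  # ceiling division
    m = [[0] * n_bins for _ in range(n_bins)]
    for pos_a, pos_b in pairs:
        i, j = pos_a // bin_size, pos_b // bin_size
        m[i][j] += 1
        if i != j:
            m[j][i] += 1  # mirror the count so that m(i,j) == m(j,i)
    return m


# Toy data: three made-up ligation products on a 3-Mb chromosome.
toy_pairs = [(100_000, 2_500_000), (2_700_000, 300_000), (1_200_000, 1_900_000)]
m = contact_matrix(toy_pairs, chrom_length=3_000_000)
```

The order in which the two ends of a ligation product are reported carries no information, so each off-diagonal count is mirrored; this is what makes the matrix symmetric by construction.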
Using these matrices, the authors can also compute statistics about the 3-D structure of DNA, such as the contact probability <math>I_n(s)</math> for pairs of loci separated by a genomic distance s on chromosome n (Fig. 3A). <math>I_n(s)</math> decreases monotonically on every chromosome, suggesting polymer-like behavior. However, even at distances greater than 200 Mb, <math>I_n(s)</math> is always much greater than the average contact probability between different chromosomes; this confirms the existence of chromosome territories (chromosomes are not all intertwined).
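As a rough sketch of how a distance-dependent contact probability can be read off a single chromosome's contact matrix, the Python below averages the counts along each off-diagonal. The normalization (dividing by the number of bin pairs at that separation and by the total number of contacts) is an assumption made for the example and may differ from the one used in the paper; the function name and the toy matrix are likewise hypothetical.

```python
def contact_probability(m):
    """Estimate I(s) for separations s = 1 .. n-1, measured in bins.

    m: symmetric intrachromosomal contact matrix as a list of lists.
    Returns {s: mean contacts per bin pair at separation s / total contacts}.
    The normalization is an illustrative assumption, not the paper's.
    """
    n = len(m)
    # Count each contact once: upper triangle including the diagonal.
    total = sum(m[i][j] for i in range(n) for j in range(i, n))
    if total == 0:
        return {s: 0.0 for s in range(1, n)}
    probs = {}
    for s in range(1, n):
        counts = sum(m[i][i + s] for i in range(n - s))  # s-th off-diagonal
        n_pairs = n - s  # number of bin pairs at separation s
        probs[s] = counts / (n_pairs * total)
    return probs


# Made-up 4-bin matrix in which contacts fall off with distance.
toy = [
    [0, 4, 2, 1],
    [4, 0, 4, 2],
    [2, 4, 0, 4],
    [1, 2, 4, 0],
]
probs = contact_probability(toy)
```

For this toy matrix the estimate decreases as s grows, which is the monotonic decay the entry describes for real chromosomes.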