Comprehensive mapping of long-range interactions reveals folding principles of the human genome

From Soft-Matter
Revision as of 18:50, 5 December 2010 by Furchtg (Talk | contribs)


Entry by Leon Furchtgott, APP 225 Fall 2010.

Erez Lieberman-Aiden*, Nynke L. van Berkum*, et al. Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science 326 (2009).


This paper is about Hi-C, a method that probes the three-dimensional architecture of whole genomes. The authors use Hi-C to construct a spatial proximity map of the human genome at a resolution of 1 megabase. The map shows that the genome is spatially segregated into two genome-wide compartments corresponding to open and closed chromatin. The chromatin conformation is consistent with a fractal globule polymer conformation, as opposed to an equilibrium globule conformation.


Understanding how chromosomes fold can provide insight into the complex relationships between chromatin structure, gene activity, and the functional state of the cell. Yet beyond the scale of nucleosomes, little is known about chromatin organization. Until now, long-range interactions could be probed only for specific pairs of loci, not for every combination of loci at once. This paper introduces a new method that achieves this genome-wide coverage and draws some implications from analysis of the resulting data.


Hi-C method: Cells are cross-linked with formaldehyde; the DNA is digested with a restriction enzyme that leaves a 5′ overhang; the 5′ overhang is filled in, incorporating a biotinylated residue; and the resulting blunt-end fragments are ligated under dilute conditions that favor ligation events between cross-linked DNA fragments. The resulting DNA sample contains ligation products consisting of fragments that were originally in close spatial proximity in the nucleus, marked with biotin at the junction. A Hi-C library is created by shearing the DNA and selecting the biotin-containing fragments with streptavidin beads. The library is then analyzed by massively parallel DNA sequencing, producing a catalog of interacting fragments (Fig. 1).

Fig. 1. Overview of Hi-C. (A) Cells are cross-linked with formaldehyde, resulting in covalent links between spatially adjacent chromatin segments (DNA fragments shown in dark blue, red; proteins, which can mediate such interactions, are shown in light blue and cyan). Chromatin is digested with a restriction enzyme (here, HindIII; restriction site marked by dashed line; see inset), and the resulting sticky ends are filled in with nucleotides, one of which is biotinylated (purple dot). Ligation is performed under extremely dilute conditions to create chimeric molecules; the HindIII site is lost and an NheI site is created (inset). DNA is purified and sheared. Biotinylated junctions are isolated with streptavidin beads and identified by paired-end sequencing. (B) Hi-C produces a genome-wide contact matrix. The submatrix shown here corresponds to intrachromosomal interactions on chromosome 14. (Chromosome 14 is acrocentric; the short arm is not shown.) Each pixel represents all interactions between a 1-Mb locus and another 1-Mb locus; intensity corresponds to the total number of reads (0 to 50). Tick marks appear every 10 Mb. (C and D) We compared the original experiment with results from a biological repeat using the same restriction enzyme [(C), range from 0 to 50 reads] and with results using a different restriction enzyme [(D), NcoI, range from 0 to 100 reads].

Analysis of these data yields long-range contacts between segments more than 20 kb apart. The authors then construct a genome-wide contact matrix by dividing the genome into 1-Mb regions (loci) and defining the matrix entry m(i,j) as the number of ligation products between loci i and j. (Because a ligation product between i and j is equally one between j and i, the contact matrices are symmetric.) The contact matrices are highly reproducible (Fig. 1B, C, D).
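The binning step can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function name contact_matrix, the input format (a list of base-pair coordinate pairs on a single chromosome), and the toy read pairs are assumptions made for illustration only.

```python
import numpy as np

BIN_SIZE = 1_000_000  # 1-Mb loci, as in the text


def contact_matrix(pairs, chrom_length, bin_size=BIN_SIZE):
    """Count ligation products between 1-Mb loci of a single chromosome.

    pairs: iterable of (pos_a, pos_b) base-pair coordinates, one tuple per
    sequenced ligation product (hypothetical input format).
    """
    n_bins = -(-chrom_length // bin_size)  # ceiling division
    m = np.zeros((n_bins, n_bins), dtype=int)
    for pos_a, pos_b in pairs:
        i, j = pos_a // bin_size, pos_b // bin_size
        m[i, j] += 1
        if i != j:
            m[j, i] += 1  # mirror the count so that m(i,j) == m(j,i)
    return m


# Toy example: three read pairs on a 3-Mb chromosome (made-up coordinates).
toy_pairs = [(100, 2_500_000), (2_600_000, 900_000), (1_200_000, 1_300_000)]
m = contact_matrix(toy_pairs, chrom_length=3_000_000)
```

Mirroring each off-diagonal count makes the symmetry of the matrix explicit; a pair whose two ends fall in the same locus is counted once, on the diagonal.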

Using these matrices, the authors can also compute statistics about the 3-D structure of DNA, such as the contact probability <math>I_n(s)</math> for pairs of loci separated by a genomic distance s on chromosome n (Fig. 3A). <math>I_n(s)</math> decreases monotonically on every chromosome, suggesting polymer-like behavior. However, even at distances greater than 200 Mb, <math>I_n(s)</math> is always much greater than the average contact probability between different chromosomes; this confirms the existence of chromosome territories (chromosomes are not all intertwined).
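As a rough illustration of how a distance-dependent contact statistic can be read off a contact matrix, the sketch below averages the matrix along each off-diagonal, i.e. it divides the observed contacts at separation s by the number of locus pairs at that separation. The function name contact_vs_distance and the toy matrix are assumptions, and the result is an unnormalized mean count per locus pair rather than the exact <math>I_n(s)</math> normalization used in the paper.

```python
import numpy as np


def contact_vs_distance(m, bin_size=1_000_000):
    """Mean contact count per locus pair as a function of genomic separation.

    m: symmetric intrachromosomal contact matrix with square bins.
    Returns (s, mean_counts), where s[k-1] = k * bin_size and
    mean_counts[k-1] is the average of the k-th off-diagonal of m.
    """
    n = m.shape[0]
    offsets = np.arange(1, n)
    s = offsets * bin_size
    # np.trace with an offset sums the k-th diagonal, which holds n - k entries.
    mean_counts = np.array([np.trace(m, offset=k) / (n - k) for k in offsets])
    return s, mean_counts


# Toy symmetric matrix for a 3-locus chromosome (made-up counts).
toy = np.array([[0, 4, 1],
                [4, 0, 2],
                [1, 2, 0]])
s, mean_counts = contact_vs_distance(toy)
```

On real data this curve would typically be inspected on log-log axes, where polymer-like behavior appears as an approximately straight segment.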

Fig. 2
Fig. 3

Discussion / Relation to Soft Matter