Comprehensive mapping of long-range interactions reveals folding principles of the human genome


Entry by Leon Furchtgott, APP 225 Fall 2010.

Erez Lieberman-Aiden*, Nynke L. van Berkum*, et al. Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science 326 (2009).


This paper introduces Hi-C, a method that probes the three-dimensional architecture of whole genomes. The authors use Hi-C to construct a spatial proximity map of the human genome at a resolution of 1 megabase. The map shows that the genome is spatially segregated into two genome-wide compartments corresponding to open and closed chromatin. At the megabase scale, the chromatin conformation is consistent with a fractal globule polymer conformation rather than an equilibrium globule conformation.


Understanding how chromosomes fold can provide insight into the complex relationships between chromatin structure, gene activity, and the functional state of the cell. Yet beyond the scale of nucleosomes, little is known about chromatin organization. Until now, long-range interactions could only be probed for specific pairs of loci, not for every combination of loci across the genome. This paper introduces a method that provides this genome-wide coverage and discusses some implications of the resulting data.


Hi-C method: Cells are first cross-linked with formaldehyde, which covalently links chromatin segments that are spatially adjacent. The DNA is then digested with a restriction enzyme that leaves a 5′ overhang; the 5′ overhang is filled in, incorporating a biotinylated residue; and the resulting blunt-end fragments are ligated under dilute conditions that favor ligation between cross-linked DNA fragments. The resulting DNA sample contains ligation products consisting of fragments that were originally in close spatial proximity in the nucleus, marked with biotin at the junction. A Hi-C library is created by shearing the DNA and selecting the biotin-containing fragments with streptavidin beads. The library is then analyzed by massively parallel DNA sequencing, producing a catalog of interacting fragments (Fig. 1).

Fig. 1. Overview of Hi-C. (A) Cells are cross-linked with formaldehyde, resulting in covalent links between spatially adjacent chromatin segments (DNA fragments shown in dark blue, red; proteins, which can mediate such interactions, are shown in light blue and cyan). Chromatin is digested with a restriction enzyme (here, HindIII; restriction site marked by dashed line; see inset), and the resulting sticky ends are filled in with nucleotides, one of which is biotinylated (purple dot). Ligation is performed under extremely dilute conditions to create chimeric molecules; the HindIII site is lost and an NheI site is created (inset). DNA is purified and sheared. Biotinylated junctions are isolated with streptavidin beads and identified by paired-end sequencing. (B) Hi-C produces a genome-wide contact matrix. The submatrix shown here corresponds to intrachromosomal interactions on chromosome 14. (Chromosome 14 is acrocentric; the short arm is not shown.) Each pixel represents all interactions between a 1-Mb locus and another 1-Mb locus; intensity corresponds to the total number of reads (0 to 50). Tick marks appear every 10 Mb. (C and D) We compared the original experiment with results from a biological repeat using the same restriction enzyme [(C), range from 0 to 50 reads] and with results using a different restriction enzyme [(D), NcoI, range from 0 to 100 reads].

Analysis of these data yields long-range contacts between segments more than 20 kb apart. The authors construct a genome-wide contact matrix by dividing the genome into 1-Mb regions and defining the matrix entry m(i,j) as the number of ligation products between loci i and j. (Because a ligation product between i and j is also one between j and i, the contact matrices are symmetric.) The contact matrices are highly reproducible (Fig. 1B, C, D).
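The binning step described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the input format (a list of base-pair coordinate pairs on a single chromosome) and the function name `build_contact_matrix` are assumptions made for the example.

```python
import numpy as np

BIN_SIZE = 1_000_000  # 1-Mb bins, matching the resolution of the contact map


def build_contact_matrix(read_pairs, chrom_length, bin_size=BIN_SIZE):
    """Count ligation products between every pair of bins on one chromosome.

    read_pairs: iterable of (pos_a, pos_b) base-pair coordinates
    (a hypothetical input format chosen for this sketch).
    """
    n_bins = int(np.ceil(chrom_length / bin_size))
    m = np.zeros((n_bins, n_bins), dtype=int)
    for pos_a, pos_b in read_pairs:
        i, j = pos_a // bin_size, pos_b // bin_size
        m[i, j] += 1
        if i != j:
            m[j, i] += 1  # mirror the count so the matrix stays symmetric
    return m


# Three made-up read pairs on a 3-Mb toy chromosome.
pairs = [(150_000, 2_400_000), (2_900_000, 300_000), (1_200_000, 1_700_000)]
matrix = build_contact_matrix(pairs, chrom_length=3_000_000)
```

Here the first two pairs both link bin 0 and bin 2, regardless of read order, and the third falls within bin 1, so it is counted once on the diagonal.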

Using these matrices, the authors compute statistics about the 3-D structure of DNA, such as the contact probability <math>I_n(s)</math> for pairs of loci separated by a genomic distance s on chromosome n (Fig. 3A). <math>I_n(s)</math> decreases monotonically on every chromosome, suggesting polymer-like behavior. However, even at distances greater than 200 Mb, <math>I_n(s)</math> is always much greater than the average contact probability between different chromosomes; this confirms the existence of chromosome territories (chromosomes are not all intertwined).
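A rough sketch of how a contact probability curve like <math>I_n(s)</math> could be estimated from one chromosome's contact matrix: average the counts along each off-diagonal, which holds every pair of bins at the same separation. The normalisation (dividing so that the values sum to 1) and the toy matrix are assumptions for illustration; the summary does not state how the authors normalised.

```python
import numpy as np


def contact_probability(matrix):
    """Mean contact count at each bin separation s = 1..n-1, normalised to sum to 1."""
    n = matrix.shape[0]
    mean_counts = np.array(
        [np.diagonal(matrix, offset=s).mean() for s in range(1, n)]
    )
    return mean_counts / mean_counts.sum()


# Toy 5-bin matrix whose counts fall off with separation (12, 6, 4, 3).
idx = np.arange(5)
sep = np.abs(idx[:, None] - idx[None, :])
toy = np.where(sep > 0, 12 // np.maximum(sep, 1), 0)

prob = contact_probability(toy)  # prob[0] is separation 1, prob[1] is separation 2, ...
```

Averaging along each diagonal, rather than summing, accounts for the fact that there are fewer bin pairs at large separations than at small ones.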

Fig. 2. Correlation matrix illustrates the correlation [range from –1 (blue) to +1 (red)] between the intrachromosomal interaction profiles of every pair of 1-Mb loci along chromosome 14. The plaid pattern indicates the presence of two compartments within the chromosome.

When the authors zoomed in on a single chromosome, they found large blocks of enriched and depleted interactions, generating a plaid pattern (Fig. 2). This suggests that each chromosome can be decomposed into two sets of loci such that contacts within each set are enriched and contacts between sets are depleted. The authors find that one compartment is associated with open, accessible, actively transcribed chromatin, whereas the other corresponds to closed chromatin containing inactive genes.
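One common way to turn a plaid correlation matrix into a two-set partition is to take the sign of its leading eigenvector. The sketch below assumes the input has already been normalised for genomic distance (observed counts divided by the expected count at each separation); that preprocessing, the function name `compartment_labels`, and the toy data are assumptions for illustration, since the summary above does not specify the procedure.

```python
import numpy as np


def compartment_labels(normalised):
    """Split loci into two groups from the correlation of their contact profiles.

    normalised: square matrix of distance-normalised contacts (assumed input).
    Returns the Pearson correlation matrix and a boolean group label per locus.
    """
    corr = np.corrcoef(normalised)  # correlation between rows (contact profiles)
    eigvals, eigvecs = np.linalg.eigh(corr)  # eigenvalues in ascending order
    leading = eigvecs[:, -1]  # eigenvector of the largest eigenvalue
    return corr, leading > 0


# Toy example: 8 loci in two made-up compartments; same-compartment
# contacts are enriched (2.0), cross-compartment contacts depleted (0.5).
labels = np.array([0, 0, 0, 1, 1, 0, 1, 1])
same = labels[:, None] == labels[None, :]
toy = np.where(same, 2.0, 0.5)

corr, groups = compartment_labels(toy)
```

The overall sign of an eigenvector is arbitrary, so which group comes out as `True` carries no meaning by itself; assigning "open" or "closed" requires additional data such as gene density or chromatin accessibility.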

Fig. 3 The local packing of chromatin is consistent with the behavior of a fractal globule. (A) Contact probability as a function of genomic distance averaged across the genome (blue) shows a power law scaling between 500 kb and 7 Mb (shaded region) with a slope of –1.08 (fit shown in cyan). (B) Simulation results for contact probability as a function of distance (1 monomer ~ 6 nucleosomes ~ 1200 base pairs) (10) for equilibrium (red) and fractal (blue) globules. The slope for a fractal globule is very nearly –1 (cyan), confirming our prediction (10). The slope for an equilibrium globule is –3/2, matching prior theoretical expectations. The slope for the fractal globule closely resembles the slope we observed in the genome. (C) (Top) An unfolded polymer chain, 4000 monomers (4.8 Mb) long. Coloration corresponds to distance from one endpoint, ranging from blue to cyan, green, yellow, orange, and red. (Middle) An equilibrium globule. The structure is highly entangled; loci that are nearby along the contour (similar color) need not be nearby in 3D. (Bottom) A fractal globule. Nearby loci along the contour tend to be nearby in 3D, leading to monochromatic blocks both on the surface and in cross section. The structure lacks knots. (D) Genome architecture at three scales. (Top) Two compartments, corresponding to open and closed chromatin, spatially partition the genome. Chromosomes (blue, cyan, green) occupy distinct territories. (Middle) Individual chromosomes weave back and forth between the open and closed chromatin compartments. (Bottom) At the scale of single megabases, the chromosome consists of a series of fractal globules.

Finally, the authors examine chromatin structure within compartments. They observe power-law scaling of the intrachromosomal contact probability between 500 kb and 7 Mb: the contact probability scales approximately as the inverse of genomic distance (Fig. 3A).

Various authors have proposed that chromosomal regions can be modeled as an “equilibrium globule”: a compact, densely knotted configuration originally used to describe a polymer in a poor solvent at equilibrium. Grosberg et al. proposed an alternative model, theorizing that polymers, including interphase DNA, can self-organize into a long-lived, nonequilibrium conformation that they described as a “fractal globule.” This highly compact state is formed by an unentangled polymer when it crumples into a series of small globules in a “beads-on-a-string” configuration. These beads serve as monomers in subsequent rounds of spontaneous crumpling until only a single globule-of-globules-of-globules remains. In a fractal globule, contiguous regions of the genome tend to form spatial sectors whose size corresponds to the length of the original region (Fig. 3C). In contrast, an equilibrium globule is highly knotted and lacks such sectors; instead, linear and spatial positions are largely decorrelated after, at most, a few megabases (Fig. 3C). The fractal globule has not previously been observed.

The authors perform Monte Carlo simulations to generate ensembles of fractal globules and equilibrium globules. They find that the properties of the fractal globules match those of the Hi-C data. Whereas the contact probability scales with genomic distance s as <math>s^{-3/2}</math> in equilibrium globules, it scales as <math>s^{-1}</math> in fractal globules, in close agreement with the observed slope of –1.08 (Fig. 3A, B).
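The comparison of exponents can be illustrated with a straight-line fit in log-log space, which is a standard way to estimate a power-law slope. The sketch below uses synthetic, noise-free curves with the two theoretical exponents over the 500 kb to 7 Mb range; the data, prefactor, and function name are made up for the example, and this is not the authors' fitting procedure.

```python
import numpy as np


def fit_power_law_exponent(s, p):
    """Least-squares slope of log(p) against log(s)."""
    slope, _intercept = np.polyfit(np.log(s), np.log(p), 1)
    return slope


# Genomic separations from 500 kb to 7 Mb, evenly spaced on a log scale.
s = np.logspace(np.log10(5e5), np.log10(7e6), 30)

# Synthetic curves with the two theoretical exponents (arbitrary prefactor).
fractal_like = 1e3 * s ** -1.0
equilibrium_like = 1e3 * s ** -1.5

alpha_fractal = fit_power_law_exponent(s, fractal_like)
alpha_equilibrium = fit_power_law_exponent(s, equilibrium_like)
```

On real Hi-C data the fitted slope depends on the range of s chosen for the fit, which is why the figure caption specifies the window over which the power law holds.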

Discussion / Relation to Soft Matter

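This is an interesting paper. The Hi-C experimental method is powerful and novel. The discussion of polymer scaling is particularly relevant to the course: it shows that determining how a real polymer scales can be quite complicated, since the same compact polymer can follow different scaling laws depending on whether it is in an equilibrium or a fractal globule state.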