Chromatin assortativity: Integrating epigenomic data and ... · 1 Chromatin assortativity:...

36
1 Chromatin assortativity: Integrating epigenomic data and 3D genomic structure Vera Pancaldi 1 , Enrique Carrillo de Santa Pau 1 , Biola Maria Javierre 2 , Peter Fraser 2 , Mikhail Spivakov 2 , Alfonso Valencia 1 , Daniel Rico 1 1 Spanish National Cancer Research Centre (CNIO), Madrid, Spain; 2 The Babraham Institute, Cambridge, United Kingdom Abstract Background The field of 3D chromatin interaction mapping is changing our point of view on the genome. Despite the increase in the number of studies, there are surprisingly few network analyses performed on these datasets and the network topology is rarely considered. Assortativity is a network property that has been widely used in the social sciences to measure the probability of nodes with similar values of a specific feature to interact preferentially. We propose a new approach, the Chromatin feature Assortativity Score (ChAS), to integrate the epigenomic landscape of a specific cell type with its chromatin interaction network. Results We analyse two chromatin interaction datasets for embryonic stem cells, which were generated with two very recent promoter capture HiC methods. These datasets define networks of interactions amongst promoters and between promoters and other genomic loci. We calculate the presence of a collection of 78 chromatin features in the chromatin fragments that constitute the nodes of the network. Looking at the ChAS of these epigenomic features in the interaction networks, we find Polycomb Group proteins and associated histone marks to be the most assortative factors. Remarkably, we observe higher ChAS of the actively elongating form of RNA Polymerase 2 compared to the inactive forms in interactions between promoters and other distal elements, suggesting an important role for active elongation in promoter-enhancer contacts. Conclusions

Transcript of Chromatin assortativity: Integrating epigenomic data and ... · 1 Chromatin assortativity:...

Page 1: Chromatin assortativity: Integrating epigenomic data and ... · 1 Chromatin assortativity: Integrating epigenomic data and 3D genomic structure Vera Pancaldi1, Enrique Carrillo de

1

Chromatin assortativity: Integrating epigenomic data

and 3D genomic structure

Vera Pancaldi1, Enrique Carrillo de Santa Pau1, Biola Maria Javierre2, Peter Fraser2,

Mikhail Spivakov2, Alfonso Valencia1, Daniel Rico1

1Spanish National Cancer Research Centre (CNIO), Madrid, Spain; 2The Babraham Institute,

Cambridge, United Kingdom

Abstract Background

The field of 3D chromatin interaction mapping is changing our point of view on the genome.

Despite the increase in the number of studies, there are surprisingly few network analyses

performed on these datasets and the network topology is rarely considered. Assortativity is a

network property that has been widely used in the social sciences to measure the probability

of nodes with similar values of a specific feature to interact preferentially. We propose a new

approach, the Chromatin feature Assortativity Score (ChAS), to integrate the epigenomic

landscape of a specific cell type with its chromatin interaction network.

Results

We analyse two chromatin interaction datasets for embryonic stem cells, which were

generated with two very recent promoter capture HiC methods. These datasets define

networks of interactions amongst promoters and between promoters and other genomic loci.

We calculate the presence of a collection of 78 chromatin features in the chromatin

fragments that constitute the nodes of the network. Looking at the ChAS of these

epigenomic features in the interaction networks, we find Polycomb Group proteins and

associated histone marks to be the most assortative factors. Remarkably, we observe higher

ChAS of the actively elongating form of RNA Polymerase 2 compared to the inactive forms

in interactions between promoters and other distal elements, suggesting an important role

for active elongation in promoter-enhancer contacts.

Conclusions

Page 2: Chromatin assortativity: Integrating epigenomic data and ... · 1 Chromatin assortativity: Integrating epigenomic data and 3D genomic structure Vera Pancaldi1, Enrique Carrillo de

2

Our method can be used to compare the association of epigenomic features to different type

of genomic contacts. Furthermore, it facilitates the comparison of any number of chromatin

interaction datasets in the context of the corresponding epigenomic landscape.

Introduction Advances in chromatin interaction mapping have dealt a final blow to the linear vision of the

genome, leading us to a more realistic well organized fractal globule picture (Mirny, 2011)

with resolution increasing from a megabase to less than a kilobase in just 5 years

(Lieberman-Aiden et al., 2009; Mifsud et al., 2015; Pombo & Dillon, 2015; Sahlén et al.,

2015; Schoenfelder, Furlan-Magaril, et al., 2015; Schoenfelder, Sugar, et al., 2015).

However, our understanding of what determines this three-dimensional structure and of its

functional importance remains limited. Starting from the first papers modelling DNA as a

polymer and the genome as a fractal globule (Mirny, 2011), scientists have been looking for

a connection between the chromatin contact configuration and the regulation of gene

expression. It is now accepted that gene regulation happens as much through distal

enhancer elements as through proximal promoters and the distinction between promoters

and enhancers has itself been put to test (Andersson, 2015; Feuerborn & Cook, 2015).

The combination of chromatin capture experiments with next-generation sequencing (NGS)

has enabled the characterization of chromatin contacts at an unprecedented level of detail.

Different techniques yield different views of the genome. High-throughput conformation

capture (HiC) is an unbiased approach that allows to investigate the three-dimensional

structure of the genome of a given cell type (Lieberman-Aiden et al., 2009), even in single

cells (Nagano et al., 2013) and across species (Dixon et al., 2012).

The HiC technique assays, in principle, all versus all chromosomal contacts, requiring very

high sequencing coverage and making it very costly. Alternative approaches allow

exploration of the contacts of a subset of genomic regions, with higher resolution at the

same cost. For example, Chromatin Interaction Analysis by Paired-End Tag sequencing

(ChIA-PET) (Fullwood et al., 2009) analyzes only those interactions that are mediated by a

protein of interest by pulling down only the interacting fragments that include this protein.

Page 3: Chromatin assortativity: Integrating epigenomic data and ... · 1 Chromatin assortativity: Integrating epigenomic data and 3D genomic structure Vera Pancaldi1, Enrique Carrillo de

3

Recently, other capture approaches were developed that enable to selectively analyse only

interactions involving specific classes of genomic fragments genome-wide. For example,

Capture HiC was recently used to identify the chromatin interactions involving colorectal

cancer risk loci (Jäger et al., 2015). A similar approach is used in promoter capture HiC

(PCHi-C) (Schoenfelder, Furlan-Magaril, et al., 2015) which detects both promoter-promoter

interactions and interactions of promoters with any other non-promoter regions. These

interactions are therefore identified irrespective of target promoter activity and across the

whole range of linear genomic distances between fragments. HiCap (Sahlén et al., 2015) is

a similar approach to detect promoter-centered chromatin interactions. The two methods

provide a complementary view of chromatin interactions as PCHi-C yields larger fragments

(average fragment size 5Kb) and longer interaction ranges (on average 250Kb), whereas

HiCap has better resolution (average fragment size < 1Kb) but less coverage of long range

interactions. Thanks to these new techniques, we can now use interactions between non-

coding parts of the genome and genes to interpret the wealth of disease associated genomic

variation data which were so far unexplained (Jaeger et al. 2015).

The increasing availability of 3D interaction datasets for multiple cell types and organisms

has prompted the development of different data processing approaches. There are still

important hurdles in these analyses. One is the detection of biologically significant

interactions from the background noise of interactions purely due to linear proximity of the

two fragments on the genome; another is the averaging effect that is produced by

heterogeneity of contacts in different cells (Lajoie, Dekker, & Kaplan, 2015). Moreover, no

single unified standards are available for these types of data, hindering the direct

comparison between the contact maps in different cell types, species or conditions (Ay &

Noble, 2015).

Given the complexity of these datasets, it is intuitive and useful to represent them as

networks, in which each chromatin fragment is a node and each edge (link) represents a

significant interaction between two chromatin fragments. This framework allows us to study

the properties of the 3D chromatin structure using tools from network theory. The booming

field of network science provides a useful toolbox and different metrics that can be used to

compare and interpret chromatin contact network datasets. For example, one can identify

the most connected nodes or look for functional relationships between nodes that interact

more than expected by chance (Hoang & Bekiranov, 2013).

Page 4: Chromatin assortativity: Integrating epigenomic data and ... · 1 Chromatin assortativity: Integrating epigenomic data and 3D genomic structure Vera Pancaldi1, Enrique Carrillo de

4

Surprisingly few papers have dealt with network analysis approaches applied to chromatin

interaction networks (CINs) (Babaei et al., 2015; Hoang & Bekiranov, 2013; Kruse, Sandhu

et al., 2012; Sewitz, & Babu, 2012), with the aim of unravelling general principles of 3D

chromatin organization. For example, In the pioneering work by Sandhu et al. (Sandhu et al.

2012), the CIN is constructed starting from RNAPII ChIA pET performed in mouse

embryonic stem cells (mESCs) to obtain a single large connected component. An accurate

network analysis revealed the functional organization of different chromatin communities. A

similar analysis, performed on the budding yeast CIN, showed that cohesin mediates highly

interconnected interchromosomal subnetworks (cliques) which are stable and have similar

replication timing (Hoang & Bekiranov, 2013).

Contrary to the techniques that filter for chromatin interactions mediated by specific proteins,

for example by pull down of a protein as in ChIA-PET, the newer HiC methods can in

principle detect all possible interactions genome-wide. It thus becomes possible to identify

the different epigenetic modifications and proteins that associate to the different types of

interactions. More specifically, PCHi-C is able to detect sufficient numbers of promoter-

promoter interactions such that we can address the questions of which chromatin features

are present in interactions involving only promoters versus interactions of promoters with

other distal elements. We can thus go beyond the topological characterization of the network

to gain insight into the organization of epigenetic marks in the nucleus.

To this end, we project the linear chromatin context information directly onto the 3D network,

preserving its topology. We focus our analysis on mouse embryonic stem cells (mESCs) as

chromatin interactions for this cell type have been assayed by multiple techniques and a

very comprehensive epigenetic characterization is available. We analyze interaction

networks derived by state-of-the-art promoter capture HiC where we quantify the

assortativity of 78 chromatin features (3 cytosine modifications, 13 histone modifications and

62 chromatin-related proteins (CrPs) binding peaks (Juan et al., 2014).

In social sciences, assortativity is used to measure the extent to which similar people tend to

connect with similar people. Whereas in society it is easy to imagine which principles might

lead people from the same ethnic origin or cultural background to establish social ties, we

are still investigating principles that organize chromatin in the nucleus. We borrow the

concept of assortativity to introduce the Chromatin feature Assortativity Score (ChAS), which

Page 5: Chromatin assortativity: Integrating epigenomic data and ... · 1 Chromatin assortativity: Integrating epigenomic data and 3D genomic structure Vera Pancaldi1, Enrique Carrillo de

5

identifies features shared by fragments that interact preferentially. Identifying features with

high ChAS can lead us to candidates for factors that might mediate these interactions.

Through this novel analysis, we gain interesting insight regarding different RNA Polymerase

II (RNAPII) variants and 3D chromatin structure. More specifically we note a different

configuration of actively elongating RNAPII forms in promoter-other end contacts compared

to other RNAPII variants. This finding is confirmed in three independent datasets and it

suggests that actively elongating RNAPII is involved in the contact between regulatory

elements and their targets.

Results The Chromatin Interaction Network (CIN) To build the CIN, we used the recent PCHi-C dataset of mESCs from Schoenfelder et al.

(Schoenfelder, Furlan-Magaril, et al., 2015), including interactions amongst promoters and

between promoters and other genomic elements. The PCHi-C data was processed using the

CHiCAGO algorithm, a capture HiC specific method that filters out contacts that are

expected by chance given the linear proximity of the interacting fragments on the genome

(Cairns et al., 2015). The resulting network has 55,845 nodes and 69,987 connections

(Figure 1). Of these interactions, 20,523 interactions connect a promoter fragment with

another promoter fragment (PP edges) and 49,464 interactions connect promoters with non-

promoter “other end” fragments (PO edges).

As in many networks, we can observe a main large connected component that consists of

35,293 nodes (63% of total nodes) joined by 52,984 edges (76% of total edges). There are

264 disconnected components with more than 10 nodes and about 4,000 additional small

components. Each chromatin fragment has an average of 2.5 neighbours.

Constructing another network with only PP edges we find a largest connected component

which encompasses only 27% of the total P nodes and 48% of the total PP edges, implying

that the interconnectedness of the total network relies heavily on the PO interactions (See

supplementary Figure 1). Promoter fragments (P-fragments) interact with an average of 2

other P-fragments and 3 Other element fragments (O-fragments).

Page 6: Chromatin assortativity: Integrating epigenomic data and ... · 1 Chromatin assortativity: Integrating epigenomic data and 3D genomic structure Vera Pancaldi1, Enrique Carrillo de

6

Epigenomic features associated to chromatin fragments participating in 3D contacts

For each fragment in the PCHi-C network, we mapped a large set of 78 epigenomic features

(Juan et al., 2014) including cytosine modifications, histone marks and ChIP-seq peaks of

chromatin-related proteins (CrPs) such as transcription factors and members of chromatin

complexes including cohesin, CTCF, Polycomb Group (PcG) and different RNAPII variants.

For each chromatin fragment we calculate the fraction covered by peaks of a specific feature

and we define the abundance of each feature as the average of this value over all fragments

in the network (see methods). Figure 2a shows the fraction of fragments covered by EZH2

binding sites. We noticed the strong accumulation of the nodes that have binding sites for

this factor in specific regions of the network. Strikingly, this localization of the signal is

observed despite the low overall prevalence of EZH2 binding in the fragments. We therefore

set out to investigate and quantify the relation between abundance and strength of

localization (measured by the ChAS) of different features on the network.

Definition of the Chromatin feature Assortativity Score (ChAS) To measure the extent to which features tend to localize in defined neighbourhoods of the

CIN network, we calculate the Chromatin feature Assortativity Score (ChAS). Since ChAS is

a Pearson correlation it will take values between -1 and 1. More specifically, we can have

three cases (Figure 2b):

1) Fragments that have a certain value of the feature (that is a certain proportion of the

fragment has peaks of that feature) always interact with other fragments which have similar

values of the same feature, but not with other fragments. In this case the ChAS for that

feature will be positive (ChAS>0). This situation would indicate that this feature is potentially

associated to chromatin contacts.

2) Alternatively, there might be no relationship between the values of the feature on

fragments and values on their neighbours. In this case we will have ChAS=0. This can

happen either when the feature values do not have anything to do with the contacts, or when

the feature values are very homogeneous in the network: either the feature is low on all

Page 7: Chromatin assortativity: Integrating epigenomic data and ... · 1 Chromatin assortativity: Integrating epigenomic data and 3D genomic structure Vera Pancaldi1, Enrique Carrillo de

7

fragments (as would be the case for a very rare chromatin mark, or high on all fragments (as

would be the case for ubiquitous chromatin marks).

3) Finally it could be that fragments that have high values of the a given feature always

interact with fragments with low values of that same feature. In this case we will have

negative ChAS (ChAS<0). This suggests that there are two classes of fragments in the

network, characterized by different features, which tend to interact with each other.

For this reason, it is important to consider the abundance of a feature (defined above as the

fraction of fragment covered by the feature averaged over fragments) together with the

ChAS value.

To summarize, firstly we are interested in features that have high positive ChAS and

especially the ones with low abundance, as this signifies that the mark is rare but appears to

be localized in specific areas of the network. These features are thus very probably involved

in the chromatin contacts. Secondly, we are looking for features with negative ChAS, which

should be typical of one subclass of fragments that always interact with a different subclass

of fragments. These scenarios are represented in Figure 2c, which is a schematic scatter

plot of ChAS vs abundance.

To judge whether the ChASs obtained in a specific network are significantly different from

random, we perform randomizations of the assignment of features to the fragments and

check whether the observed ChAS is different from the randomized version (see Methods

and supp.Figure 2).

ChAS of chromatin features in promoter-capture mESC chromatin interaction networksWe

calculated the ChAS for the 78 chromatin features in the entire PCHi-C network and

compared these values with the corresponding abundance values (Figure 3A). The PcG

proteins (EZH2, PHF19, RING1B, SUZ12, CBX7) and histone marks associated with them

(H3K27me3, H2Aub1) have the highest ChAS values (ranging from 0.2 to 0.35, Figure 3A),

suggesting that this complex might be involved in establishing the 3D structure of chromatin

in mESC cells, as was already observed for the Hox gene cluster (Schoenfelder, Sugar, et

al., 2015, Denholtz et al 2013). RNA polymerase II (RNAPII) also has high ChAS, especially

the variant implicated in transcriptional elongation (RNAPII_S2P, Figure 3A). On the other

Page 8: Chromatin assortativity: Integrating epigenomic data and ... · 1 Chromatin assortativity: Integrating epigenomic data and 3D genomic structure Vera Pancaldi1, Enrique Carrillo de

8

hand, H3K4me3, a modification associated with active promoters, is a very abundant mark

(fourth most abundant, Abundance=0.12), but it has low ChAS (0.03).

The properties of the CIN are clearly very dependent on the experimental technique and the

data processing algorithms used. We therefore proceeded to test whether the biological

findings presented above are confirmed in other CINs constructed for the same cell type. To

this end, we analysed an alternative recently published mESC promoter-centred chromatin

interaction dataset (Sahlén et al., 2015). Sahlen et al. applied HiCap (another promoter

capture method simi to PCHi-C) also to mESCs, identifying ineractions of promoters. We

filtered out the contacts between non-promoter fragments from the Sahlen et al. Dataset,

and generated a network of 87,823 nodes with 173,801 interactions (of which 19,309

promoter nodes and 82,659 PP interactions). The HiCap technique is complementary to

PCHi-C, since a different enzyme is used for the restriction step, generating shorter

interaction fragments compared to PCHi-C (median size 599 bp vs close to 4,000 bp for

PCHi-C). The shorter fragments produce a higher resolution picture of contacts between

nearby fragments, at the expense of reduced coverage of large-scale interactions.

Visualizing the network shows that most of the HiCap subnetworks unite fragments

belonging to the same chromosome (Supp Figure 3) but the largest connected component

is comparatively smaller than in PCHi-C. We analysed the HiCap network in combination

with the 78 chromatin features previously introduced.

We repeated the calculation of ChAS of the chromatin features using the HiCap network as

described above for the PCHi-C network. The plot displaying ChAS vs abundance of our

chromatin features in the HiCap network (Figure 3B) shows that many of the high ChAS

features in PCHi-C also have high ChAS in HiCap. For example, the PcG components are

confirmed amongst the features with the highest ChAS, as was observed in the PCHi-C

analysis, together with RNAPII, especially the S2P variant. In summary, we have shown that

ChAS is a useful metric to detect those epigenomic features that might be more influential in

the promoter-centered CINs.

ChAS of chromatin features in RNAPII and cohesin ChIA-PET-derived interaction networks

Page 9: Chromatin assortativity: Integrating epigenomic data and ... · 1 Chromatin assortativity: Integrating epigenomic data and 3D genomic structure Vera Pancaldi1, Enrique Carrillo de

9

Having successfully reproduced our results in two different datasets obtained with very

different capture HiC networks already gives us confidence that our results are general and

biologically founded. However, to better interpret our results, we decided to test our method

on CINs for which we knew what features dominate the fragments and mediate the contacts.

We therefore repeated the analysis on two mESC ChIA-PET datasets, detecting the

interactions involving RNAPII (8WG16 antibody, recognizing all variants) (Zhang et al., 2013)

and cohesin (antibody recognizing the SMC1 subunit) respectively (Dowen et al., 2014).

As seen in Figure 3D, the ChAS of RNAPII_8WG16 is quite low (0.08) in the RNAPII ChIA-

PET network. This is expected given that fragments in this network are highly enriched for

presence of this feature (84% of fragments have an RNAPII_8WG16 peak). This leads to

uniform levels of RNAPII abundance on the nodes and hence we do not observe any

localization of the mark in specific areas of the contact network. Interestingly, we do observe

higher ChAS for the elongating variant of RNAPII, suggesting that regions of the genome in

which elongation takes place interact preferentially. In this CIN, PcG proteins and associated

histone marks show considerably high ChAS, but lower than H3K4me1 (an enhancer

specific mark) and H4K20me3 a mark generally associated with transcriptional repression at

promoters as well as silencing of repetitive and transposable elements (Jorgensen et al.

2013).

Finally, in Figure 3D we investigate ChAS in the cohesin-mediated interactions, observing a

low value for ChAS of cohesin subunit SMC1 (0.09), as expected due to the strong

enrichment of fragments for presence of this protein (98% of fragments have an SMC1 peak)

and only slightly higher ChAS for the cohesin subunit SMC3. CTCF shows 3-fold increased

ChAS as compared to SMC1, suggesting that the subset of cohesin-bound fragments that

have in addition CTCF binding tend to interact preferentially with each other. As in the

previous three networks, PcG proteins and associated histone marks (H3K27me3 and

H2Aub1) as well as RNAPII_8WG16 maintain high ChAS, but we see higher values for

H4K20me3, H3K27ac, Mediator components (MED1 and MED12), LSD1 and laminB.

To conclude, the high ChAS values of PcG and RNAPII are confirmed in different datasets

but different features acquire different levels of ChAS and, potentially, relevance in the

different contact maps. Although PCHi-C, HiCap and RNAPII ChIA-PET are all enriching for

interactions involving promoters, there are clear differences in the resulting networks. The

Page 10: Chromatin assortativity: Integrating epigenomic data and ... · 1 Chromatin assortativity: Integrating epigenomic data and 3D genomic structure Vera Pancaldi1, Enrique Carrillo de

10

RNAPII ChIA-PET network is obviously enriched in promoter interactions (58% of the

RNAPII ChIA-PET fragments overlap PCHi-C promoter fragments, see Supp. Table 1) but

the PCHi-C and HiCap promoter-capture networks contain promoter interactions irrespective

of the presence of RNAPII. The cohesin ChIA-PET network is not enriched for promoters

(only 20% of the SMC1 ChIA-PET fragments overlap the PCHi-C promoter fragments, see

Supp. Table 1) but it confirms once again the assortativity of PcG and active RNAPII.

Comparing the results of our approach using these four different networks we conclude that

the methodology is able to identify the specific importance of epigenomic features in

mediating different types of contacts. We therefore proceed to use it to analyse the

difference between various types of promoter-centred contacts in promoter-capture CINs.

Characterizing contacts amongst promoters and contacts between promoters and other elements

As mentioned above, the experimental design of promoter-capture HiC produces chromatin

fragments of two kinds: Promoter (P) fragments are the ones that are captured in the

experiment because they match a library of promoters, therefore identified as baits; Other-

end (O) fragments are chromatin fragments found to interact with the promoter baits.

We next addressed the question of what are the differences in chromatin features

associated to contacts involving two promoters (PP) and contacts involving a promoter and

an Other-end fragment element (PO). We calculated feature abundance and ChAS values

for two subnetworks: the PP network and the PO network. Figure 4A and 4B represent the

ChAS and abundance for the PP and PO networks in PCHi-C respectively. We combine

these data in a comparative ChAS plot to directly investigate the relationship between the

ChAS of chromatin features measured in the two different subnetworks in PCHi-C (Figure

4C).

Strikingly, we find a number of features with very different values of ChAS in these two

subnetworks. For example, in Figure 4C we see a group of features with positive ChAS in

the PP interactions, implying that these epigenomic features are found in promoters that

contact each other, and negative ChAS in the PO interaction network, implying that they are

usually not present on the O fragments that contact promoters.

Page 11: Chromatin assortativity: Integrating epigenomic data and ... · 1 Chromatin assortativity: Integrating epigenomic data and 3D genomic structure Vera Pancaldi1, Enrique Carrillo de

11

These features that have discordant signs of ChAS in the two subnetworks include many

promoter-specific histone modifications and chromatin factors, specifically H3K4me3

(typically denoting active promoters), HCFC1 (transcription activator complex), SIN3A

(transcriptional repressor complex), KDM2A (H3K26 demethylase), MAX (H3K4methylase,

MLL1 complex with HCFC1), NMYC, OGT (histone acetyl transferase complex), H3K4me2

and H3K9ac (denoting active promoters) (Juan et al., 2014). Features that have slightly

higher positive ChAS in the PO interactions include CBX3 (recently implicated in elongation

(Smallwood et al. 2012)), P300 (typical enhancer marker), and RNAPII_S2P. These findings

reinforce the assumption that at least a subset of O fragments is distal regulatory elements.

PCHi-C can only reliably detect interactions involving at least one promoter and the

epigenetic features considered are mostly found in promoters. Therefore, we are unlikely to

find features with higher ChAS in PO vs PP contacts, which would lie above the diagonal in

Figure 4c. However, the features closer to the diagonal are features that are present in both

PP and PO contacts. The PcG proteins and their associated histone marks are found very

close to the diagonal on the comparative ChAS plot of Figure 4C, suggesting that they are

found at both PP and PO contacts, together with H3K36me3 and, interestingly, the cytosine

modifications 5hmC and 5fC.

The comparative ChAS plots for the HiCap datasets are very consistent with the PCHi-C

ones (Supp. figure 5) as shown clearly in a correlation plot of the differentce of ChAS

between PO and PP subnetworks (Figure 4d). We observe substantially different ChAS

scores for different RNA polymerase variants and we explore this finding more in detail in the

following section.

RNAPII_S2P has higher ChAS in promoter-other end contacts compared to other RNAPII variants Our collection of genome-wide features includes 5 different ChIP-seq datasets for RNA

polymerase II (RNAPII) obtained using 5 different antibodies. Of these, three of them

recognize different phosphorylated forms of RNAPII involved in the different stages of

transcription (Brookes et al., 2012) (Figure 5A). We can therefore distinguish between ChIP-

seq peaks of RNAPII in its initiating or repressed form (S5p, S7p), in its actively elongating

form (S2p), or in any of its forms (RNAPII, RNAPII_8WG16).

Page 12: Chromatin assortativity: Integrating epigenomic data and ... · 1 Chromatin assortativity: Integrating epigenomic data and 3D genomic structure Vera Pancaldi1, Enrique Carrillo de

12

We compared the ChAS of the different RNAPII variants in the whole PCHi-C and HiCap

networks. As already noted, RNAPII_S2p, which denotes elongation of actively transcribed

genes, shows higher ChAS than the other RNAPII variants in both datasets (Figure 5B).

Figure 5B shows the corresponding abundance values, which are very comparable between

different RNAPII variants within each dataset.

Next, we compared the ChAS of the different RNAPII variants in the RNAPII ChIA-PET

network (Figure 5B). In principle, the RNAPII ChIA-PET dataset provides us with the

network of chromatin contacts in mESC mediated by any RNAPII, as the antibody used in

this experiment (8WG16) recognizes all RNAPII variants. Interestingly, there is an increase

of ChAS from repressed to actively elongating RNAPII in all three networks (Figure 5B).

These results suggest that, whereas all interacting fragments in these promoter-rich

networks do contain some form of polymerase, the presence of active forms of RNAPII

distinguishes different network neighbourhoods in which active elongation is taking place.

Finally, we used the ChIA-PET network of contacts mediated by cohesin in mESC cells as a

negative control. In this dataset we see many contacts that do not involve any promoters or

genes, in which we do not expect to find any RNAPII bound (61% of fragments in the SMC1

ChIA-PET dataset have no signal for RNAPII_8WG16). Indeed, the different variants of

RNAPII in this cohesion-mediated network have very high ChAS, (Figure 5B). The presence

of any form of RNAPII clearly separates regions of the network where transcription is active

from regions where it is not. These trends cannot be explained by changes in abundance

(Figure 5C).

We further compared the ChAS of different RNAPII variants between PP and PO contacts

(Figure 5D). In the PCHi-C network we observe the ChAS for different phosphorylation

states of RNAPII to vary widely in the PO contacts (from close to 0 to 0.22, the third highest

value overall) while all states have similar ChAS in the PP contacts (Figure 5D). To

understand this trend better, we also look at abundance of the different RNAPII variants in

the different subnetworks (Figure Supp. 6). Whereas in the PP subnetwork the abundance

decreases from inactive forms of RNAPII to the elongating form, in the PO subnetwork the

elongating form is equally abundant compared to the other forms. We can therefore

conclude that the different ChAS observed for different forms of RNAPII is related to the

topological distribution of RNAPII binding on the network rather than simply to changes in

Page 13: Chromatin assortativity: Integrating epigenomic data and ... · 1 Chromatin assortativity: Integrating epigenomic data and 3D genomic structure Vera Pancaldi1, Enrique Carrillo de

13

average abundance in the network. This trend is even more marked in the HiCap dataset

where the ChAS value of non-elongating variants of RNAPII is negative in PO contacts

(Figure 5E). This suggests that when O fragments contact P fragments, only the elongating

form of RNAPII is present on both fragments.

We investigated further to determine whether the patterns of ChAS of different RNAPII

variants change depending on the type of fragments contacted by the promoter. We defined

2 types of O-fragments: we divided enhancers (fragments with H3K4me1>0) into active

enhancers (H3K4me1>0 and H3K27ac>0) or poised enhancers (H3K4me1>0 and

H3K27me3>0). We can thus separately compare ChAS values between PP contacts and

contacts of P-fragments with each type of O-fragment. As shown in Figure 5F, RNAPII_S2p

has higher ChAS than the other RNAPII variants in contacts between promoters and active

enhancers but not in contacts with poised enhancers. This suggests that the presence of

elongating RNAPII at the PO contact is involved in the activity of the enhancer.

Strikingly, we also observe a considerable number of contacts between promoters and

fragments that do not have enhancer marks (H3K4me1=0), which we found to be strongly

enriched for H3K36me3 and that in some cases, overlap protein-coding regions. This is not

due to a change in the abundance of different forms of RNAPII, as shown in (Figure Suppl.

6) and these results are largely confirmed in HiCap (Figure 5g). This result suggests that

promoters contact with gene bodies, not only their own.

Discussion

We have presented a novel approach inspired by social network science which enables the

powerful integration of epigenomic features with maps of 3D contacts of chromatin

fragments in the nucleus, taking into account its exact topology.

We demonstrated the capabilities of our approach by recapitulating the importance of PcG

factors and associated histone marks, which are found on chromatin fragments that

preferentially interact and localize in specific areas of the contact network. Given the small

proportion of fragments that are covered by these marks in the whole genome, the values

we observe for their ChAS are highly significant, as shown by our randomization procedure.

Page 14: Chromatin assortativity: Integrating epigenomic data and ... · 1 Chromatin assortativity: Integrating epigenomic data and 3D genomic structure Vera Pancaldi1, Enrique Carrillo de

14

These results suggest a strong relationship between certain chromatin features and the

topology of the contact network. Our method allows taking into account the exact topology of

the network without selecting only abundant features and without relying on arbitrary

subdivisions of the network into clusters (which would be different depending on the

clustering algorithms).

So far integrated analyses also report correlations between genomic information and

characteristics of genes in the 3D contact network (Akdemir & Chin, 2015; Merelli, Liò, &

Milanesi, 2013; Mifsud et al., 2015), but the exact network topology itself is rarely taken into

account. This confirms the importance of the network topology in the definition of the ChAS.

We further define two different subnetworks a priori and then compare the ChAS for all the

features in the two subnetworks. These comparative plots show the specificity of the role of

certain chromatin factors in one or both of the two subnetworks formed by PP and PO

interactions. For example, this subdivision allows us to recover the importance of H3K4me3,

marking active promoters, which has high ChAS in the PP subnetwork and negative ChAS in

the PO contacts. This discordant sign in the two subnetworks explains why when looking at

H3K4me3 on the whole network we had seen its ChAS to be very close to 0.

The subdivision of contacts into different subnetworks can also be used to calculate, for

each feature, the ChAS difference between the two. This number can be potentially used to

summarise the relationship between each feature and the network topology and to compare

different datasets.

We performed this comparison using PCHi-C and HiCap networks to exclude the possibility

that our findings are artifacts of one specific dataset. We find a strong correlation of the

difference between ChAS in PO and PP subnetworks in the two datasets giving us

confidence in the biological relevance of our results. The reproducibility between the two

datasets is remarkable, especially considered the differences in the experimental techniques

and the interaction filtering methods used. Whereas PCHi-C is enriched for long-range

contacts, HiCap has a higher coverage of short-distance interactions, likely constituting

connections between promoters and regulatory elements that are relatively close. These

types of interactions are likely lost in PCHi-C, due to the larger fragment size (which means

a single fragment might contain both sites of interaction), and due to the strict distance

Page 15: Chromatin assortativity: Integrating epigenomic data and ... · 1 Chromatin assortativity: Integrating epigenomic data and 3D genomic structure Vera Pancaldi1, Enrique Carrillo de

15

correction algorithms applied. Given these differences, the good correspondence of ChAS in

the two datasets points to the generality of the role of many chromatin factors, which seem

to play similar roles in short and long-range contacts.

There are however very interesting differences between the ChAS values in PO contacts in

PCHi-C and HiCap, which can be seen by comparing ChAS value directly in the two

datasets. For example, more features have negative PO ChAS values in HiCap. The

distribution of ChAS values in HiCap PO contacts almost completely symmetric around 0 (-

0.3 to 0.3). This is different from the positively skewed values of PCHi-C PO ChAS (-0.1 to

0.3). The reason for this is that in PCHi-C the larger fragments will include promoters and

also nearby regulatory regions, decreasing the difference between P- and O-fragment

associated chromatin features (Figure supp. 4). The comparison between ChAS scores

between these two datasets and the ChIA-PET datasets shows how our method can identify

very specific characteristics of the CINs used and exposes the biases that each

experimental technique carries, providing different points of view on the 3D structure of the

genome. Potentially, the analysis presented could also be used to identify particularly low

quality ChIP-seq datasets, which would fail to show the expected localization on the plot of

ChAS versus abundance.

Looking at the PP and PO subnetworks separately has also allowed us to notice a marked

difference between the different variants of RNAPII. The elongating variant appears more

strongly associated with contacts between promoters and active enhancers, compared to

inactive forms. This is observed in all promoter centred interaction datasets, including PCH-

iC, HiCap and RNAPII ChIA-PET. In fact this tendency is given by a decrease in assortativity

of the unspecific RNAPII forms in the contacts between promoters and active enhancers.

This suggests that the presence of unspecific forms of RNAPII is not associated to

preferential contacts of promoters and active distal regulatory elements whereas the

elongating form is. This is also consistent with the negative ChAS of non-elongating forms of

RNAPII in HiCap PO contacts. It is possible that the RNAPII that is found at active

enhancers is entirely in its elongating form, as also suggested by looking at the abundance

of RNAPII variants in different fragment types ( Supp. Figure 7). This result is stronger in

HiCap contacts as the large size of PCHi-C fragments probably signals peaks of RNAPII in

O-fragments where in reality the peak is in a nearby promoter.

Page 16: Chromatin assortativity: Integrating epigenomic data and ... · 1 Chromatin assortativity: Integrating epigenomic data and 3D genomic structure Vera Pancaldi1, Enrique Carrillo de

16

Recent literature is suggesting a picture in which enhancer activity is mediated by the

formation of loops connecting the gene promoter, the distal enhancer and the body of the

gene (Andersson, 2015; Feuerborn & Cook, 2015). 3C experiments have shown that these

gene-body contacts are often dynamic and they keep a connection between the gene

promoter and the gene body at the exact location of active elongation (Lee, Hsiung, Huang,

Raj, & Blobel, 2015). This model would also be useful to explain the many contacts we have

observed between promoters and gene body fragments without obvious enhancer marks. It

could be speculated that these contacts are joining two promoters to the body of one of the

two genes, which would be consistent with the concomitant localized transcription of multiple

genes.

Conclusions We have demonstrated the use of assortativity of chromatin features in interpreting

chromatin interaction datasets in the context of available epigenomic data. We have

achieved this through the definition of the ChAS, a measure of how much the value of a

specific chromatin feature is correlated between a chromatin fragment and others that

interact preferentially with it.

The difference of ChAS between the PP and PO subnetworks can be used to compare two

or more chromatin interaction datasets. Thus the method we present provides a quick and

efficient comparison of different CINs and integration of complementary epigenomic and

functional information.

Comparing two different networks obtained with two variations of promoter capture HiC on

mESC cells, we find excellent reproducibility of the following observations: 1) Members of

the PcG and associated marks show high ChAS despite the low abundance in the

interacting fragments, suggesting that they mediate the 3D contacts; 2) The ChAS values of

different variants of RNAPII are consistent with a picture in which contacts happen between

enhancers, their target promoters and along the gene body, with the actively elongating form

of RNAPII being determining for these interactions.

Our results lend support to the presented ChAS approach as a useful tool in the quest for

organizing principles in chromatin contact networks.

Page 17: Chromatin assortativity: Integrating epigenomic data and ... · 1 Chromatin assortativity: Integrating epigenomic data and 3D genomic structure Vera Pancaldi1, Enrique Carrillo de

17

Methods Generating the PCHi-C network PCHiC Interactions measured in in mESCs in (Schoenfelder, Furlan-Magaril, et al., 2015)�

were processed using ChiCAGO (Cairns et al. 2015). The publicly available HiCUP pipeline

(Wingett et al., submitted was used to process the raw sequencing reads. This pipeline was

used to map the read pairs against the mouse (mm9) genome, to filter experimental

artefacts (such as circularized reads and re-ligations), and to remove duplicate reads. The

resulting BAM files were processed into CHiCAGO input files, retaining only those read pairs

that mapped, at least on one end, to a captured bait.

CHiCAGO is a method to detect significant HiC interactions specifically adapted to promoter

capture experiments. In brief, it uses a noise convolution model in which two noise terms

account independently for noise sources that dominate at different scale: 1) Brownian motion

which leads to probabilities of interactions decreasing with distance and is modelled using a

negative binomial distribution and 2) sequence artifacts which are modelled using a Poisson

distribution. Once the ChiCAGO scores had been obtained only interactions with score>= 5

were considered.

The network was then built considering each fragment as a node (therefore having two types

of nodes, namely promoters and other ends, and two types of edges, namely promoter-other

end and promoter-promoter. Multiple edges connecting the same 2 nodes were eliminated.

Generating the HiCap network: The HiCap data was downloaded from the supplementary material of (Sahlén et al., 2015)�

which provides coordinates of the promoter and other end fragments that show significant

interaction as well as a list of gene promoters that interact based on assignation of promoter

fragments to the closest TSS. The HiCap networks was then build exactly as described for

the PCHi-C network.

Generating the the ChIA-PET networks: The fragment coordinates of the RNAPII ChIA-PET dataset were downloaded from the

supplementary material of (Zhang et al., 2013)��. The fragments coordinates and

Page 18: Chromatin assortativity: Integrating epigenomic data and ... · 1 Chromatin assortativity: Integrating epigenomic data and 3D genomic structure Vera Pancaldi1, Enrique Carrillo de

18

interactions of the SMC1 ChiA-PET dataset were downloaded from the supplementary

material of (Dowen et al., 2014)�.

Chromatin feature selection: The chromatin features were taken from Juan et al. (Juan et al., 2014).

Calculation of feature Abundance in the chromatin fragments: The histone modification and chromatin related protein peaks were binarized in 200 bp

windows. For each fragment the overlapping windows of chromatin peaks were identified

and their values averaged to give a percentage of presence of any feature in each fragment.

Thus for each feature a value between 0 and 1 is associated to each fragment (which has an

average length of 4.9Kb in PCHiC and 600bp in HiCap) generating a fragment by feature

matrix. The value of Abundance of a feature is defined as the average of that feature value

across all fragments in the network considered.

ChAS calculation: We define the ChAS of a specific epigenomic feature in a contact network as the Pearson

correlation coefficient of the value of that feature across all pairs of nodes that are connected

with each other (Newman, 2002; Noldus & Van Mieghem, 2015). ChAS is therefore the

assortativity of a the Abundance value of a feature on a network. We used the igraph

(Version 0.7.1 ) package in R and its function 'assortativity' to calculate the ChAS of each

feature separately on the network of choice (either the full network, the PP or the PO

networks in PCHi-C and HiCap. The assortativity measure used was that for continuous

variables. A more intuitive definition of assortativity is simply the Pearson correlation

between two vectors: vector 1 contains the feature values of the source nodes and vector 2

contains the feature values of the sink nodes, once all edges in the network are enumerated.

There is no appreciable difference in the value of assortativity obtained by listing all edges in

an arbitrary direction or first adding all edges in the opposite direction and calculating

assortativity on this extended network.

Assessment of ChAS significance: We tested whether the ChAS of the chromatin features we measured was significantly

higher than would be expected at random. To do this we produced 100 randomized

chromatin feature matrices (fragment by feature) in which for each fragment the values of

the chromatin features were shuffled 100 times column wise, thus preserving the overall

Page 19: Chromatin assortativity: Integrating epigenomic data and ... · 1 Chromatin assortativity: Integrating epigenomic data and 3D genomic structure Vera Pancaldi1, Enrique Carrillo de

19

Abundance of the feature. We then measured assortativity of each feature in these 100

random realizations and compared to the value of assortativity in the initial chromatin feature

matrix (See supp. Figure 2). All but the following features had p-val=0 … MAX had negative

assortativity contrary to what was seen in the randomized networks.

Definition of different types of O fragments Active enhancers were defined as O fragments with the H3K4me1>0 and H3K27ac>0.

Poised enhancers were defined as other-end fragments with the abundance of H3K4me1>0

and H3K27me3>0. For example, given our definition of feature abundance, this will identify

an active enhancer in any fragment that has at least 1 200bp segment covered by a

H3K4me1 peak and at least one (not necessarily the same) 200bp segment covered by a

H3K27ac peak. We have identified Non-enhancers as O fragments that have H3K4me1=0.

Statistical analyses All analyses were performed using R version 3.1.0 (x86_64-pc-linux-gnu) [R. A Language

and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna;

2008].

Page 20: Chromatin assortativity: Integrating epigenomic data and ... · 1 Chromatin assortativity: Integrating epigenomic data and 3D genomic structure Vera Pancaldi1, Enrique Carrillo de

20

References Akdemir, K. C., & Chin, L. (2015). HiCPlotter integrates genomic data with interaction

matrices. Genome Biology, 16(1), 198. doi:10.1186/s13059-015-0767-1

Andersson, R. (2015). Promoter or enhancer, what’s the difference? Deconstruction of

established distinctions and presentation of a unifying model. BioEssays : News and

Reviews in Molecular, Cellular and Developmental Biology, 37(3), 314–23.

doi:10.1002/bies.201400162

Ay, F., & Noble, W. S. (2015). Analysis methods for studying the 3D architecture of the

genome. Genome Biology, 16(1), 183. doi:10.1186/s13059-015-0745-7

Babaei, S., Mahfouz, A., Hulsman, M., Lelieveldt, B. P. F., de Ridder, J., & Reinders, M.

(2015). Hi-C Chromatin Interaction Networks Predict Co-expression in the Mouse

Cortex. PLoS Computational Biology, 11(5), e1004221.

doi:10.1371/journal.pcbi.1004221

Brookes, E., de Santiago, I., Hebenstreit, D., Morris, K. J., Carroll, T., Xie, S. Q., … Pombo,

A. (2012). Polycomb associates genome-wide with a specific RNA polymerase II

variant, and regulates metabolic genes in ESCs. Cell Stem Cell, 10(2), 157–70.

doi:10.1016/j.stem.2011.12.017

Cairns, J., Freire-Pritchett, P., Wingett, S. W., Dimond, A., Plagnol, V., Zerbino, D., …

Spivakov, M. (2015). CHiCAGO: Robust Detection of DNA Looping Interactions in

Capture Hi-C data. bioRxiv. Cold Spring Harbor Labs Journals. doi:10.1101/028068

Dixon, J. R., Selvaraj, S., Yue, F., Kim, A., Li, Y., Shen, Y., … Ren, B. (2012). Topological

domains in mammalian genomes identified by analysis of chromatin interactions.

Nature, 485(7398), 376–80. doi:10.1038/nature11082

Dowen, J. M., Fan, Z. P., Hnisz, D., Ren, G., Abraham, B. J., Zhang, L. N., … Young, R. A.

(2014). Control of Cell Identity Genes Occurs in Insulated Neighborhoods in

Mammalian Chromosomes. Cell, 159(2), 374–387. doi:10.1016/j.cell.2014.09.030

Feuerborn, A., & Cook, P. R. (2015). Why the activity of a gene depends on its neighbors.

Trends in Genetics : TIG, 31(9), 483–90. doi:10.1016/j.tig.2015.07.001

Fullwood, M. J., Liu, M. H., Pan, Y. F., Liu, J., Xu, H., Mohamed, Y. Bin, … Ruan, Y. (2009).

An oestrogen-receptor-alpha-bound human chromatin interactome. Nature, 462(7269),

58–64. doi:10.1038/nature08497

Hoang, S. A., & Bekiranov, S. (2013). The network architecture of the Saccharomyces

cerevisiae genome. PloS One, 8(12), e81972. doi:10.1371/journal.pone.0081972

Page 21: Chromatin assortativity: Integrating epigenomic data and ... · 1 Chromatin assortativity: Integrating epigenomic data and 3D genomic structure Vera Pancaldi1, Enrique Carrillo de

21

Huang, J., Marco, E., Pinello, L., & Yuan, G.-C. (2015). Predicting chromatin organization

using histone marks. Genome Biology, 16(1), 162. doi:10.1186/s13059-015-0740-z

Jäger, R., Migliorini, G., Henrion, M., Kandaswamy, R., Speedy, H. E., Heindl, A., …

Houlston, R. S. (2015). Capture Hi-C identifies the chromatin interactome of colorectal

cancer risk loci. Nature Communications, 6, 6178. doi:10.1038/ncomms7178

Juan, D., Perner, J., Carrillo de Santa Pau, E., Marsili, S., Ochoa, D., Chung, H.-R., …

Valencia, A. (2014). Epigenomic co-localization and co-evolution reveal a key role for

5hmC as a communication hub in the chromatin network of ESCs. bioRxiv. Cold Spring

Harbor Labs Journals. doi:10.1101/008821

Kruse, K., Sewitz, S., & Babu, M. M. (2012). A complex network framework for unbiased

statistical analyses of DNA-DNA contact maps. Nucleic Acids Research, 41(2), 701–

710. doi:10.1093/nar/gks1096

Lajoie, B. R., Dekker, J., & Kaplan, N. (2015). The Hitchhiker’s guide to Hi-C analysis:

practical guidelines. Methods (San Diego, Calif.), 72, 65–75.

doi:10.1016/j.ymeth.2014.10.031

Lee, K., Hsiung, C. C.-S., Huang, P., Raj, A., & Blobel, G. A. (2015). Dynamic enhancer-

gene body contacts during transcription elongation. Genes & Development, 29(19),

1992–7. doi:10.1101/gad.255265.114

Lieberman-Aiden, E., van Berkum, N. L., Williams, L., Imakaev, M., Ragoczy, T., Telling, A.,

… Dekker, J. (2009). Comprehensive mapping of long-range interactions reveals

folding principles of the human genome. Science (New York, N.Y.), 326(5950), 289–93.

doi:10.1126/science.1181369

Merelli, I., Liò, P., & Milanesi, L. (2013). NuChart: an R package to study gene spatial

neighbourhoods with multi-omics annotations. PloS One, 8(9), e75146.

doi:10.1371/journal.pone.0075146

Mifsud, B., Tavares-Cadete, F., Young, A. N., Sugar, R., Schoenfelder, S., Ferreira, L., …

Osborne, C. S. (2015). Mapping long-range promoter contacts in human cells with high-

resolution capture Hi-C. Nature Genetics, 47(6), 598–606. doi:10.1038/ng.3286

Mirny, L. A. (2011). The fractal globule as a model of chromatin architecture in the cell.

Chromosome Research : An International Journal on the Molecular, Supramolecular

and Evolutionary Aspects of Chromosome Biology, 19(1), 37–51. doi:10.1007/s10577-

010-9177-0

Page 22: Chromatin assortativity: Integrating epigenomic data and ... · 1 Chromatin assortativity: Integrating epigenomic data and 3D genomic structure Vera Pancaldi1, Enrique Carrillo de

22

Nagano, T., Lubling, Y., Stevens, T. J., Schoenfelder, S., Yaffe, E., Dean, W., … Fraser, P.

(2013). Single-cell Hi-C reveals cell-to-cell variability in chromosome structure. Nature,

502(7469), 59–64. doi:10.1038/nature12593

Newman, M. E. J. (2002). Assortative Mixing in Networks. Physical Review Letters, 89(20),

208701. doi:10.1103/PhysRevLett.89.208701

Noldus, R., & Van Mieghem, P. (2015). Assortativity in complex networks. Journal of

Complex Networks, cnv005. doi:10.1093/comnet/cnv005

Pombo, A., & Dillon, N. (2015). Three-dimensional genome architecture: players and

mechanisms. Nature Reviews. Molecular Cell Biology, 16(4), 245–257.

doi:10.1038/nrm3965

Sahlén, P., Abdullayev, I., Ramsköld, D., Matskova, L., Rilakovic, N., Lötstedt, B., …

Sandberg, R. (2015). Genome-wide mapping of promoter-anchored interactions with

close to single-enhancer resolution. Genome Biology, 16(1), 156. doi:10.1186/s13059-

015-0727-9

Sandhu, K. S., Li, G., Poh, H. M., Quek, Y. L. K., Sia, Y. Y., Peh, S. Q., … Ruan, Y. (2012).

Large-scale functional organization of long-range chromatin interaction networks. Cell

Reports, 2(5), 1207–19. doi:10.1016/j.celrep.2012.09.022

Schoenfelder, S., Furlan-Magaril, M., Mifsud, B., Tavares-Cadete, F., Sugar, R., Javierre, B.-

M., … Fraser, P. (2015). The pluripotent regulatory circuitry connecting promoters to

their long-range interacting elements. Genome Research, 25(4), 582–597.

doi:10.1101/gr.185272.114

Schoenfelder, S., Sugar, R., Dimond, A., Javierre, B.-M., Armstrong, H., Mifsud, B., …

Elderkin, S. (2015). Polycomb repressive complex PRC1 spatially constrains the mouse

embryonic stem cell genome. Nature Genetics, advance on. doi:10.1038/ng.3393

Smallwood, A., Hon, G. C., Jin, F., Henry, R. E., Espinosa, J. M., & Ren, B. (2012). CBX3

regulates efficient RNA processing genome-wide. Genome Research, 22(8), 1426–36.

doi:10.1101/gr.124818.111

Zhang, Y., Wong, C.-H., Birnbaum, R. Y., Li, G., Favaro, R., Ngan, C. Y., … Wei, C.-L.

(2013). Chromatin connectivity maps reveal dynamic promoter-enhancer long-range

associations. Nature, 504(7479), 306–10. doi:10.1038/nature12716

Page 23: Chromatin assortativity: Integrating epigenomic data and ... · 1 Chromatin assortativity: Integrating epigenomic data and 3D genomic structure Vera Pancaldi1, Enrique Carrillo de

23

Figure Legends Figure 1. Promoter-centred chromatin network in mESCs. Promoters (P red nodes),

Other ends (O blue nodes). PP edges (red) PO edges (blue).

Figure 2. Illustration of assortativity of epigenomic features in a network of chromatin

contacts. A) PCHi-C CIN in mESCs. Nodes are coloured by proportion of fragment covered

by EZH2 binding, highlighting the neighborhoods in which the protein is abundant. B)

Cartoon illustrating what ChAS measures. Each node of the network is a chromatin

fragment, blue nodes represent nodes in which a peak of a specific chromatin mark is found.

Edges represent significant 3D interactions. C) Cartoon of ChAS versus abundance plot.

Figure 3. Assortativity of the 78 chromatin features in 4 different Chromatin

Interaction Networks: A) PCHi-C; B)HiCap; C) PolII ChIA-PET; D) SMC1 ChIA-PET.

Figure 4. Comparing the assortativity of Promoter-Promoter (PP) and Promoter-Other

end (PO) contacts. A) PCHi-C PP subnetwork ChAS versus Abundance. B) PCHi-C PO

subnetwork ChAS versus Abundance. C) PCHi-C Compartive ChAS in PO versus PP

subnetwork. D) Differential subnetwork ChAS in HiCap versus PCHi-C.

Figure 5. ChAS of different RNAPII forms in promoter-capture and ChIA-pet networks.

a) Different forms of RNAPII in our chromatin feature set. b) Comparison of ChAS of RNAPII

in PCHi-C, HiCap, RNAPII and SMC1 ChIA-PET subnetworks. c) Comparison of Abundance

of RNAPII in PCHi-C, HiCap, RNAPII and SMC1 ChIA-PET subnetworks. d) PCHi-C ChAS

in PP and PO subnetworks. e) HiCap ChAS in PP and PO subnetworks. f) PCHi-C ChAS

compared in PP and different types of PO subnetworks. g) HiCap ChAS compared in PP

and different types of PO subnetworks.

Page 24: Chromatin assortativity: Integrating epigenomic data and ... · 1 Chromatin assortativity: Integrating epigenomic data and 3D genomic structure Vera Pancaldi1, Enrique Carrillo de

Figure 1

Page 25: Chromatin assortativity: Integrating epigenomic data and ... · 1 Chromatin assortativity: Integrating epigenomic data and 3D genomic structure Vera Pancaldi1, Enrique Carrillo de

AB

AC

No

t inte

restin

gW

ide

-sp

rea

d b

ut

no

t invo

lve

d

in 3

D s

tructu

re

Lo

w a

bu

nd

an

ce

Bu

t imp

orta

nt

in 3

D s

tructu

re

Hig

h a

bu

nd

an

ce

a

nd

imp

orta

nt

in 3

D s

tructu

re

Dis

asso

rtativ

e

Ca

n d

istin

gu

ish

two

typ

es o

f frag

me

nts

Hig

h C

hA

S >

0

No C

hA

S =

0

Ch

AS

< 0

ChAS

Ab

und

an

ce

Figure 2

Page 26: Chromatin assortativity: Integrating epigenomic data and ... · 1 Chromatin assortativity: Integrating epigenomic data and 3D genomic structure Vera Pancaldi1, Enrique Carrillo de

AB

ACD

Figure 3

Page 27: Chromatin assortativity: Integrating epigenomic data and ... · 1 Chromatin assortativity: Integrating epigenomic data and 3D genomic structure Vera Pancaldi1, Enrique Carrillo de

AB

ACD

Figure 4

Page 28: Chromatin assortativity: Integrating epigenomic data and ... · 1 Chromatin assortativity: Integrating epigenomic data and 3D genomic structure Vera Pancaldi1, Enrique Carrillo de

A BA

Repressed:POLII_S5p

Active Initiating:POLII_S5pPOLII_S7p

Active elongating:POLII_S5pPOLII_S7pPOLII_S2p

Transcription

All variants: POLII

Mostly unphosphorylated: POLII_8WG16

Different variants of Polymerase II in our features

C

FG

D E

Figure 5

Page 29: Chromatin assortativity: Integrating epigenomic data and ... · 1 Chromatin assortativity: Integrating epigenomic data and 3D genomic structure Vera Pancaldi1, Enrique Carrillo de

Chromatin assortativity: Integrating epigenomic data and 3D genomic structure

Vera Pancaldi1, Enrique Carrillo de Santa Pau1, Biola Maria Javierre2, Peter Fraser2, Mikhail

Spivakov2, Alfonso Valencia1, Daniel Rico1

1Spanish National Cancer Research Centre (CNIO), Madrid, Spain; 2The Babraham Institute,

Cambridge, United Kingdom

Supplementary Material

Supplementary Figure 1: PCHi-C networks. Promoter-promoter (A), Promoter-Other end (B).

A B

Page 30: Chromatin assortativity: Integrating epigenomic data and ... · 1 Chromatin assortativity: Integrating epigenomic data and 3D genomic structure Vera Pancaldi1, Enrique Carrillo de

Supplementary Figure 2Randomization of ChAS superimposed onto original ChAS values (top). andomized ChAS values (bottom).

Page 31: Chromatin assortativity: Integrating epigenomic data and ... · 1 Chromatin assortativity: Integrating epigenomic data and 3D genomic structure Vera Pancaldi1, Enrique Carrillo de

Supplementary Figure 3 A) HiCap network, B) HiCap PP subnetwork

B

A

Page 32: Chromatin assortativity: Integrating epigenomic data and ... · 1 Chromatin assortativity: Integrating epigenomic data and 3D genomic structure Vera Pancaldi1, Enrique Carrillo de

Supplementary Table 1: Overlap of fragments from different datasets. Proportion of overlap over total in % in the brackets.

SMC1 RNAPII PCHi-C P PCHi-C O HiCap P HiCap O

SMC1

RNAPII 4748(20)

PCHi-C P 4720(20) 19150(58)

PCHi-C O 6031(25) 6436(19)

HiCap P 2261(9) 16022(48) 8131(43) 674(2)

HiCap O 3799(16) 8009(24) 5594(30) 7464(21)

Total 24052 33126 18906 3644 15897 72025

Page 33: Chromatin assortativity: Integrating epigenomic data and ... · 1 Chromatin assortativity: Integrating epigenomic data and 3D genomic structure Vera Pancaldi1, Enrique Carrillo de

Supplementary Figure 4 HiCap subetworks analysis. ChAS vs Abundance in the PP (A) and PO (B) subnetwork. Comparison of abundances in subnetworks (C) and of ChAs values in subnetworks (D).

Page 34: Chromatin assortativity: Integrating epigenomic data and ... · 1 Chromatin assortativity: Integrating epigenomic data and 3D genomic structure Vera Pancaldi1, Enrique Carrillo de

Supplementary Figure 5: Comparison of PCHi-C and HiCap data. Abundance (A); ChAS (B) PP subnetwork abundance (C) and ChAS (D); PO subnetwork abundance (E) and ChAS (F).

Page 35: Chromatin assortativity: Integrating epigenomic data and ... · 1 Chromatin assortativity: Integrating epigenomic data and 3D genomic structure Vera Pancaldi1, Enrique Carrillo de

Supplementary Figure 6 ChAS (A,C) and abundance (B,D) for PP and different types of PO contacts.

Page 36: Chromatin assortativity: Integrating epigenomic data and ... · 1 Chromatin assortativity: Integrating epigenomic data and 3D genomic structure Vera Pancaldi1, Enrique Carrillo de

Supplementary Figure 7Abundance of different variants of RNAPII in P and O fragments, PCHi-C and HiCap.