Network-based exploration and visualisation of

ecological data

Ben Raymond and Graham Hosie

Australian Antarctic Division

Channel Highway, Kingston 7050 AUSTRALIA

Corresponding author:

Dr Ben Raymond

Tel: +61 (3) 6232 3336

Fax: +61 (3) 6283 2336

Email: [email protected]

Correspondence address:

Dr Ben Raymond

Australian Antarctic Division

Channel Highway, Kingston 7050 AUSTRALIA

Abstract

Networks – structured graphs consisting of sets of nodes connected by edges –

provide a rich framework for data visualisation and exploratory analyses. Although

rarely used for the visualisation of ecological data, networks are well suited to this

purpose, including data that one might not normally think of as a network. We present

a simple method for transforming a data matrix into network format, and show how

this can be used as the basis for interactive exploratory analyses of ecological data.

The method is demonstrated using a database of marine zooplankton samples acquired

in the Southern Ocean. The network analyses revealed zooplankton community

structures that are in good agreement with previously published results. Variations in

community structure were observed to be related to the temporal and spatial pattern of

sampling, as well as to physical environmental factors such as sea ice cover. The

analyses also revealed a number of errors in the data, including taxon identification

errors and instrument failures.

The method allows the analyst to generate networks from different combinations of

variables in the data set, and to examine the effects of varying parameters such as the

scales of spatial, temporal, and taxonomic aggregation. This flexibility allows the

analyst to rapidly gain a number of perspectives on the data and provides a powerful

mechanism for exploration.

Keywords:

exploratory analyses; data visualisation; networks; zooplankton; Southern Ocean;

community structure

Introduction

Exploratory analyses and data visualisation are often used during the early phases of

scientific investigations. Such analyses recognise and explore patterns and structures

in data, and enable investigators to form hypotheses and conceptual models for further

investigation and experimentation.

Networks provide a rich framework for data visualisation and exploration. The term

“network” is used here to denote a structured graph consisting of a set of nodes (also

termed “vertices”) connected by edges. Each node in a network represents an entity

or concept of interest — for ecological data, the entities of interest are commonly

individual species or sample sites, although any choice of entity could potentially be

used. Relationships between the entities of interest are indicated by edges between

nodes in the network. A network is most commonly diagrammatically represented using

circles or other shapes for the nodes, with lines between nodes showing the edges.

The edges can be weighted to indicate the strengths of the relationships, and can also

be directed, indicating that the relationships have an inherent direction (e.g. predation

or temporal succession). Networks are well suited to the analysis of many types of

ecological data. Complex structures of inter-related elements are pervasive in natural

systems (Aloy and Russell, 2004; Green et al., 2005; Proulx et al., 2005) and network-

based methods can provide an intuitive framework for understanding those systems.

The importance of considering the overall ecosystem context when investigating

elements of that ecosystem has long been recognised in the ecological sciences, but

has been given recent re-emphasis (e.g. Jordán and Scheuring, 2004).

Network-based methods have a long history as a general analytical framework in the

ecological sciences. The most common application has been to food web studies

(Pimm, 1982; Berlow et al., 2004), including non-trophic interactions (Paine, 1984;

Memmott, 1999; Brose et al., 2005). Networks with directed edges (termed “plexus

diagrams”) have long been used as a basis for investigating relationships amongst

species and between species and their environment (Whittaker and Warren Fairbanks,

1958; McIntosh, 1973; Gillison, 1978; Matthews, 1978; Dale, 2000) but their use for

this purpose is relatively uncommon, and classification and ordination techniques are

generally favoured. Networks in which edges represent quantified flows (often of

energy or matter) have also long been studied under the monikers of “ecological

network analysis” and “network environ analysis” (Ulanowicz, 1986; Christensen and

Pauly, 1992; Fath and Patten, 1999). Overviews of this field have been given by Fath

(2004) and Fath and Patten (1999). Networks have also been used to investigate

landscape connectivity (Fahrig and Merriam, 1985; Urban and Keitt, 2001;

Starzomski and Srivastava, 2007).

Network applications in ecology have received a recent surge of popularity (Green et

al., 2005; Proulx et al., 2005), riding on the wave of interest in “networks science” –

research that has examined network structures and processes across a number of

different disciplines (see e.g. Watts and Strogatz, 1998; Albert and Barabási, 2002;

Newman, 2003). The recent resurgence of interest in network theory in the ecological

sciences has been driven at least in part by the wider interest in networks science, but

probably more so by the recognition that network-based methods can facilitate the

analytical integration of an overall ecosystem with the dynamics of its individual

elements (Jordán and Scheuring, 2004). Networks offer insights into system-level

properties that arise from the structure of the network, and which are not evident from

the properties of the entities alone. Topology can inform how networks function and

respond to change (Berlow, 1999; Jordán, 2001; Dambacher et al., 2003; Proulx et al.,

2005), and these ideas have been applied to examples such as the propagation of

disease (Newman, 2002; Shirley and Rushton, 2005; Jeger et al., 2007), dispersal of

seeds (Lázaro et al., 2005), contaminant effects (Rohr et al., 2006), and the selection

of habitat for conservation (Rhodes et al., 2006).

Despite their application to these more formal types of analyses, networks are rarely

used as tools for the visualisation and exploration of ecological data. Network-based

approaches can be used for visualisation and exploration of a variety of data – not just

datasets that one might traditionally think of in terms of networks. Network-based

methods can provide insights that complement those obtained from more conventional

exploratory methods.

Networks are commonly represented graphically as a connected structure in two- or

three-dimensional space. The nodes can be positioned according to a variety of layout

algorithms (see e.g. Herman et al., 2000). Such algorithms take diverse approaches to

the problem, and a review is beyond the scope of this paper. In general, however, their

aim is to find a geometric arrangement of the nodes and edges of the network that best

conveys the network structure to the user. Ecologists familiar with multidimensional

scaling (MDS) might expect that a network would be laid out so that the geometric

distance between a pair of nodes matches the corresponding pairwise dissimilarity, as

is the case in MDS. It is true that a number of graph layout algorithms are based on

concepts very similar to MDS (e.g. Kamada and Kawai, 1989). However, network

layouts are generally not constrained in this manner, but tend to be more concerned

with visual criteria such as minimising the overlap of nodes and crossing of edges.

Regardless of the details of the layout, the visual appearance of a network can be

altered in a number of ways, including the size, shape, and colour of the nodes and

edges, and the colours of the background. Large networks can be simplified by

merging nodes or edges. Changes can be interactive, for example simplifying those

parts of the network outside of the user’s immediate region of interest. The

visualisation can also include dynamic elements, such as popups that display

information about nodes or edges. Exploring data in a dynamic, interactive manner is

a powerful mechanism for gaining a conceptual understanding of the structures and

patterns present in the data. Interactivity can be particularly useful with very large

networks, which can be overwhelming in their visual complexity and therefore

difficult to comprehend.

A given set of entities of interest can be represented as a network simply by

connecting with edges those entities that are in some way related, or interact. Some

data may have notions of connectivity that are self-evident (e.g. trophic or

genealogical data), or have well-established methods for defining relationships

between entities. An example of the latter is species observation data: methods for

calculating dissimilarities between species form the foundation for much of modern

numerical ecology. Creating a network from such data can be done from the species

dissimilarity matrix (Dale, 2000). In the general case, however, a given data set might

not have a natural notion of connectivity. In the next section, we describe an

algorithm that can create networks from a wide range of data.

Methods

Scientific data sets commonly take the form of observations of a set of variables,

structured in the form of a matrix in which rows correspond to observations and

columns to variables. A simple algorithm to create a network structure from such data

is:

1. Decide on the entities of interest to the analysis. Each individual entity will be

represented by a single node in the network. Choose the variable in the data set (i.e.

column of the data matrix) that best delineates those entities. The nodes in the

network are then formed from this variable, with one node for each distinct value of the

variable.

2. Choose a variable that defines the relationships of interest between those

entities defined in step 1. Any pair of nodes that share a common value for this

variable are then connected by an edge.

The implementation of this algorithm is straightforward. The rows of the matrix are

first sorted by the edge variable, and then a single pass over the rows of the matrix is

required, connecting by edges all nodes that have an equal edge variable value. For

matrices with $n$ rows and $m$ distinct values of the edge variable, the average

time-complexity is $O(n + (n/m)^2)$. In the worst case of a fully-connected network, $m = 1$ and

the complexity becomes $O(n^2)$. The complexity of the initial sorting step is neglected

here as it will be less than that of the remainder of the algorithm with an efficient

sorting algorithm (e.g. Williams, 1964).
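
The two steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' Matlab implementation: the function name `build_network`, the dictionary-based row format, and the example column names `site` and `taxon` are all assumptions made here. It also groups rows by edge value with a dictionary instead of the sort-and-scan pass described above; the resulting edges are the same.

```python
from collections import defaultdict
from itertools import combinations


def build_network(rows, node_var, edge_var):
    """Build a network from a list of row dictionaries (hypothetical format).

    Returns (nodes, edges): nodes is the set of distinct node-variable values;
    edges maps each sorted node pair to the set of edge-variable values shared.
    """
    nodes = set()
    groups = defaultdict(set)  # edge-variable value -> nodes observed with it
    for row in rows:
        nodes.add(row[node_var])
        groups[row[edge_var]].add(row[node_var])

    edges = defaultdict(set)
    for value, members in groups.items():
        # Every pair of nodes sharing this edge value is connected.
        for a, b in combinations(sorted(members), 2):
            edges[(a, b)].add(value)
    return nodes, dict(edges)


# Made-up example data: one row per (site, taxon) observation.
rows = [
    {"site": "S1", "taxon": "Oithona similis"},
    {"site": "S1", "taxon": "Fritillaria sp."},
    {"site": "S2", "taxon": "Oithona similis"},
    {"site": "S3", "taxon": "Fritillaria sp."},
    {"site": "S3", "taxon": "Euphausia superba"},
]
nodes, edges = build_network(rows, node_var="site", edge_var="taxon")
```

Here the sites are the nodes and a shared taxon creates an edge, so S1 is linked to S2 and to S3, while S2 and S3 remain unlinked.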

The basic algorithm can be extended in a number of useful ways. Multiple node

variables can be defined, in which case a node would be formed for each distinct

combination of those variables. Alternatively, the multiple node variables could be

treated as alternate node definitions, allowing more than one type of entity (type of

node) in the one network. Similarly, multiple edge variables could either denote a

compound edge condition (i.e. an edge is only formed between nodes that share

values of all variables) or a set of alternative conditions (an edge is formed between a

pair of nodes that share a value for any one of the edge variables). The edges can also

be weighted. Some possible weighting schemes are:

$w_{ij} = 2N_{ij}/(N_i + N_j)$ (1)

$w_{ij} = N_{ij}/(N_i + N_j - N_{ij})$ (2)

$w_{ij} = 1 - (N_i + N_j - 2N_{ij})/K$ (3)

where $w_{ij}$ is the weight of the edge between nodes $i$ and $j$; $N_i$ is the number of times

the $i$th value of the node variable was observed; $N_{ij}$ is the number of times that the $i$th

and $j$th nodes were linked by a common value of the edge variable; and $K$ is the

number of rows in the matrix. The example weighting schemes above correspond to

some well-known ecological functions: (1) is the Sørensen index (equivalent to the

Bray-Curtis similarity applied to binary data); (2) is the Jaccard similarity coefficient;

and (3) is $1 - D_G$, where $D_G$ is the Gower dissimilarity metric.
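
As a worked illustration of the three weighting schemes, the following Python sketch computes them for a single pair of nodes. The function name `edge_weights` and the example sets are hypothetical. The sketch assumes that each combination of node value and edge value occurs on exactly one row, so that the counts reduce to set sizes.

```python
def edge_weights(values_i, values_j, k):
    """Weights for the edge between two nodes under schemes (1) to (3).

    values_i, values_j: sets of edge-variable values observed at each node
    (assumption: one row per node/edge-value combination).
    k: number of rows in the data matrix, as defined in the text.
    """
    n_i, n_j = len(values_i), len(values_j)
    n_ij = len(values_i & values_j)  # edge values common to both nodes
    return {
        "sorensen": 2 * n_ij / (n_i + n_j),            # equation (1)
        "jaccard": n_ij / (n_i + n_j - n_ij),          # equation (2)
        "gower": 1 - (n_i + n_j - 2 * n_ij) / k,       # equation (3)
    }


# Made-up example: two sites with 4 and 6 taxa, 3 of them shared, 20 rows in total.
site_a = {"t1", "t2", "t3", "t4"}
site_b = {"t2", "t3", "t4", "t5", "t6", "t7"}
weights = edge_weights(site_a, site_b, k=20)
```

With these counts ($N_i = 4$, $N_j = 6$, $N_{ij} = 3$, $K = 20$) the three weights are 0.6, 3/7, and 0.8 respectively.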

Each node and edge is naturally associated with a set of rows in the data matrix. For

example, any given node is associated with a specific value of the node variable

chosen in step 1 of the algorithm. That node is therefore associated with all rows in

the data matrix for which the node variable takes that value. Each node and edge can

be assigned “attributes” – data drawn from the relevant rows of the data matrix.

Attribute data are not used to form the network itself, but can be used to help interpret

network structures and patterns. Consider a matrix of species observation data, in

which each row holds the time, location, and taxon observed, as well as physical

environmental data relating to that time and location. A network might be created

from these data in which the nodes represent observation locations. The attributes of

those nodes would be the dates, taxa, and physical environmental data associated with

each location.
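
Collecting node attributes in this way might look like the following Python sketch. The function name `node_attributes`, the row format, and the column names `site`, `taxon`, and `sst` are assumptions made for illustration only.

```python
from collections import defaultdict


def node_attributes(rows, node_var, attribute_vars):
    """Map each node to {attribute name: list of values from that node's rows}."""
    collected = defaultdict(lambda: defaultdict(list))
    for row in rows:
        for var in attribute_vars:
            collected[row[node_var]][var].append(row[var])
    return {node: dict(values) for node, values in collected.items()}


# Made-up example rows: observation location, taxon, and sea surface temperature.
rows = [
    {"site": "S1", "taxon": "Oithona similis", "sst": 9.5},
    {"site": "S1", "taxon": "Fritillaria sp.", "sst": 9.5},
    {"site": "S2", "taxon": "Euphausia superba", "sst": -1.2},
]
attributes = node_attributes(rows, "site", ["taxon", "sst"])
```

The attribute lists play no part in forming the network; they are simply carried along so that they can later drive node colour, size, or popup text.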

Given a network G which uses variables v1 and v2 as node and edge variables, it is

straightforward to transform this into its “alternate” network H, which swaps the

variables and uses v2 for nodes and v1 for edges. Edges that connect to the same node

in G correspond to nodes that are connected by edges in H. The transformation can be

done by visiting each edge in G, creating a node in H for each unique value of the

edge variable. During that process, a list is created for each node in G, recording all of

the edge values in G associated with that node. Each list then provides a list of nodes

in H that are to be connected by edges. For example, consider a network in which

nodes represent observation locations and each edge indicates a pair of locations that

have at least one taxon in common (i.e. a network of sites, linked by taxa). The

alternate of this network would be one in which nodes represent individual taxa, and

each edge indicates a pair of taxa that have been observed at the same location (a

network of taxa, linked by sites). There is a clear analogy between this example and

the choice in community ecology studies to analyse a data matrix by its rows (sample

sites) or columns (taxa).
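
The transformation to the alternate network can be sketched in Python as follows. This is a sketch under assumptions: the function name `alternate_network` and the edge representation (a dictionary from node pairs to the set of shared edge values) are choices made here, not part of the original implementation. One caveat of working only from the edges of G is that an edge value occurring at a single node (for example, a taxon seen at only one site) never appears on an edge of G and so is absent from H; rebuilding H from the data matrix with the two variables swapped avoids this.

```python
from collections import defaultdict
from itertools import combinations


def alternate_network(g_edges):
    """Swap the roles of nodes and edges.

    g_edges maps each node pair in G to the set of edge values the pair shares.
    Returns (h_nodes, h_edges), where h_edges maps each pair of H nodes to the
    set of G nodes at which both occur.
    """
    values_at_node = defaultdict(set)  # G node -> edge values recorded for it
    h_nodes = set()
    for (a, b), shared in g_edges.items():
        h_nodes |= shared
        values_at_node[a] |= shared
        values_at_node[b] |= shared

    h_edges = defaultdict(set)
    for g_node, values in values_at_node.items():
        # All edge values met at one G node become mutually connected H nodes.
        for u, v in combinations(sorted(values), 2):
            h_edges[(u, v)].add(g_node)
    return h_nodes, dict(h_edges)


# Made-up network of sites linked by taxa.
g_edges = {
    ("S1", "S2"): {"Oithona", "Fritillaria"},
    ("S2", "S3"): {"Fritillaria", "Euphausia"},
    ("S1", "S3"): {"Fritillaria"},
}
h_nodes, h_edges = alternate_network(g_edges)
```

The result is a network of taxa linked by sites: Fritillaria and Oithona are connected through sites S1 and S2, whereas Euphausia and Oithona co-occur only at S2.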

Results

We illustrate our approach with data from the Southern Ocean continuous plankton

recorder (CPR) survey (Hosie et al., 2003). The CPR survey is an international

collaborative project with seven participating nations at the time of writing. The

survey uses a towed device to take contiguous transect samples of the zooplankton

present in the surface waters of the Southern Ocean.

[Insert Figure 1 about here]

We used CPR data collected aboard the RV Tangaroa during the 43rd Japanese

Antarctic Research Expedition, which occurred between 7 February and 3 March

2002. A sequence of tows was completed along the 140°E meridian using a CPR

towed 100 m behind the vessel. The resulting track ran from approximately 47°S to

66°S and back again (Figure 1). Each tow was divided into segments with nominal

length 5 nautical miles; these segments represent the sampling units of the survey.

The taxa present in each segment were identified to species level wherever possible;

ostracods, and gelatinous plankton (hydromedusae, ctenophores, siphonophores)

were not identified to species level. Appendicularians were identified as either

Oikopleura sp. or Fritillaria sp. Physical environmental data (sea surface temperature

(SST) and salinity (SSS), photosynthetically active radiation (PAR), and fluorometry)

were recorded during the CPR tows at one-minute sample intervals. The mean value

of each of these variables was calculated for each segment. Further details of the data

acquisition are given by Hunt and Hosie (2005). For each segment we also calculated

the number of days since the sea ice cover had melted, using remotely-sensed passive

microwave estimates of sea ice cover (Cavalieri et al., 1996, updated 2006). Only

samples collected at night were included in the analyses, to avoid the effects of the

diurnal vertical migration of many Southern Ocean zooplankton taxa (Hunt and

Hosie, 2003). Night was defined as PAR < 100 μmol s$^{-1}$ m$^{-2}$.
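
The per-segment averaging and night-time selection could be expressed along the following lines. This Python sketch is hypothetical: the record layout and the names `segment_means` and `night_segments` are invented here, and it assumes that the PAR threshold is applied to the segment mean, which the text does not state explicitly.

```python
from collections import defaultdict
from statistics import mean

NIGHT_PAR_THRESHOLD = 100.0  # umol s^-1 m^-2, the threshold given in the text


def segment_means(readings, variables):
    """Average each environmental variable over the readings in each segment."""
    by_segment = defaultdict(list)
    for reading in readings:
        by_segment[reading["segment"]].append(reading)
    return {
        segment: {var: mean(r[var] for r in group) for var in variables}
        for segment, group in by_segment.items()
    }


def night_segments(means):
    """Segments whose mean PAR is below the threshold (assumed criterion)."""
    return sorted(s for s, m in means.items() if m["par"] < NIGHT_PAR_THRESHOLD)


# Made-up one-minute readings for two segments.
readings = [
    {"segment": 1, "par": 0.0, "sst": 9.4},
    {"segment": 1, "par": 10.0, "sst": 9.6},
    {"segment": 2, "par": 300.0, "sst": 8.0},
    {"segment": 2, "par": 500.0, "sst": 8.2},
]
means = segment_means(readings, ["par", "sst"])
night = night_segments(means)
```

Only segment 1 passes the filter in this example, since its mean PAR is 5 whereas that of segment 2 is 400.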

The data were collated into a matrix in which each row corresponded to an observation

of a single taxon, and included the taxon identifier number and taxon abundance,

sample site identifier (the tow and segment number), latitude, longitude, time and

date, and values of the physical environmental data of the observation. We used the

algorithm described in the previous section to generate various network structures,

implemented in the Matlab package (Mathworks, MA, 2008). The networks generated

using this algorithm were then passed to the GUESS package (Adar, 2005) for

visualisation and exploration. GUESS provides a number of common graph layout,

clustering, and other algorithms, and a Python-based interactive interface through

which the user can dynamically alter the graph, allowing rapid exploration and testing

of ideas.

We developed extensions to GUESS for the analyses described here, including a

graphical user interface for easily altering the node and edge colours and sizes, a filter

for removing edges based on their weights (or other properties), and a facility for

transforming a network into its alternate form. Our code and an interactive version of

the examples shown here are available at http://data.aad.gov.au/graphvis/.

The geometric layout of each network shown here was calculated using the GEM

algorithm (Frick et al., 1995) implemented in GUESS. Briefly, this algorithm treats

the nodes as mutually repulsive particles, and edges as springs that attempt to draw

the nodes together. A simulated annealing algorithm with various heuristics is used to

find the placement of nodes that results in an equilibrium between the forces in this

spring/particle system.
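
To give a feel for the spring/particle idea, the Python sketch below implements a very basic spring-embedder layout. It is not the GEM algorithm and does not use simulated annealing: it simply moves each node a small step along its net force for a fixed number of iterations. The function name `spring_layout` and all parameter values are assumptions chosen for illustration.

```python
import math


def spring_layout(nodes, edges, iterations=200, step=0.05, rest_length=1.0):
    """Return {node: (x, y)} from a basic force-directed iteration.

    nodes is a list of node names; edges is a list of (node, node) pairs.
    """
    count = len(nodes)
    # Deterministic start: nodes evenly spaced on the unit circle.
    pos = {
        node: [math.cos(2 * math.pi * i / count), math.sin(2 * math.pi * i / count)]
        for i, node in enumerate(nodes)
    }
    for _ in range(iterations):
        force = {node: [0.0, 0.0] for node in nodes}
        # Every pair of nodes repels, more strongly at short range.
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                dx, dy = pos[a][0] - pos[b][0], pos[a][1] - pos[b][1]
                dist = math.hypot(dx, dy) or 1e-9
                push = rest_length ** 2 / dist ** 2
                force[a][0] += push * dx / dist
                force[a][1] += push * dy / dist
                force[b][0] -= push * dx / dist
                force[b][1] -= push * dy / dist
        # Each edge acts as a spring with natural length rest_length.
        for a, b in edges:
            dx, dy = pos[b][0] - pos[a][0], pos[b][1] - pos[a][1]
            dist = math.hypot(dx, dy) or 1e-9
            pull = dist - rest_length
            force[a][0] += pull * dx / dist
            force[a][1] += pull * dy / dist
            force[b][0] -= pull * dx / dist
            force[b][1] -= pull * dy / dist
        # Move each node a small step along its net force.
        for node in nodes:
            pos[node][0] += step * force[node][0]
            pos[node][1] += step * force[node][1]
    return {node: tuple(xy) for node, xy in pos.items()}


# Made-up network of two connected pairs.
layout = spring_layout(["A", "B", "C", "D"], [("A", "B"), ("C", "D")])
```

In this example the connected pairs stay close together while the two pairs drift apart, which is the qualitative behaviour that makes clusters visible in the laid-out network.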

[Insert Figure 2 about here]

We first examine patterns in species composition of the sample sites. Figure 2 shows a

community network in which the nodes represent tow segments (sample sites). The

taxon compositions of the segments were used to form the edges in the network: two

segments were linked if the same taxon occurred in both. Edges were weighted using

equation 1 (Sørensen, equivalent to Bray-Curtis on binary data), so that segments with more similar taxon compositions are

more strongly linked. Note that the similarities between tow segments do not

incorporate taxon abundance information, only presence or absence on a particular

tow segment. Weak edges in Figure 2 (those with weight less than 0.75) have been

removed for visual clarity. The pruning of weak edges resulted in a small number (23)

of nodes that became disconnected from the main component of the network, and

which are not shown in Figure 2. These disconnected nodes were all from northerly

latitudes close to Tasmania. The attributes of the nodes have been used to provide

additional visual cues: the nodes have been coloured according to the latitude of the

tow segment. White represents the most northerly latitudes just south of Tasmania,

and black represents high latitudes near Antarctica. The node size shows the number

of taxa, with larger nodes representing segments with higher species richness.
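
The pruning step and the identification of nodes cut off from the main component can be sketched as follows. This Python example is illustrative only: the function name `prune_and_split` and the edge format are assumptions, and the analyses described here were carried out interactively in GUESS, not with this code.

```python
def prune_and_split(nodes, weighted_edges, threshold=0.75):
    """Drop edges with weight below threshold, then find the main component.

    weighted_edges maps node pairs to weights. Returns (kept_edges, main,
    disconnected), where main is the largest connected component.
    """
    kept = {pair: w for pair, w in weighted_edges.items() if w >= threshold}
    adjacency = {node: set() for node in nodes}
    for a, b in kept:
        adjacency[a].add(b)
        adjacency[b].add(a)

    # Depth-first search to collect the connected components.
    components, seen = [], set()
    for start in sorted(nodes):
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        components.append(component)

    components.sort(key=len, reverse=True)
    main = components[0] if components else set()
    return kept, main, set(nodes) - main


# Made-up weighted network of five tow segments.
nodes = ["S1", "S2", "S3", "S4", "S5"]
weighted_edges = {
    ("S1", "S2"): 0.9,
    ("S2", "S3"): 0.8,
    ("S3", "S4"): 0.5,
    ("S4", "S5"): 0.95,
}
kept, main, disconnected = prune_and_split(nodes, weighted_edges)
```

Removing the weak S3 to S4 edge separates S4 and S5 from the main component, in the same way that the northerly segments became disconnected in Figure 2.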

Five clusters (A-E) were defined by visual inspection of the network. The node

colours suggest that the clustering is related to latitude. The high-latitude segments

appear to be divided into two or three clusters (A and B; top left of Figure 2). The tow

segments in cluster A were acquired at a mean latitude of 63.3°S (range 64.5°S –

62°S), and for cluster B, 64.5°S (65.5°S – 63.3°S). There are two clusters at

intermediate latitudes (C and D; latitudes 52.6°S, range 60.8°S – 47.9°S; and 56.6°S,

range 59.1°S – 54.0°S), and a cluster E of segments from the more northerly latitudes

(50.1°S, range 52.5°S – 47.1°S). The temporal progression of the ship track is

overlaid with a dashed line on Figure 2, starting and ending in cluster E.

The segments in cluster C have smaller node size, indicating lower numbers of

taxa on these segments. This low species richness (uncharacteristic for a mid-latitude

community) was puzzling and prompted us to re-examine the data. We discovered

that the PAR data were erroneous (all zeros) for the first two days of the voyage.

Cluster C in fact comprises daytime samples; the low species richness is a result of the

downward migration of many zooplankton taxa during daylight hours (Hunt and

Hosie, 2003).

The patterns of other attribute variables can be explored by altering node and edge

characteristics such as colour and shape. If we change the node colours to represent

the date of sampling (not shown), we see that the temperate segments (E; bottom-right

of Figure 2) were acquired at the extremes of the sample period – i.e. as the ship was

both leaving and returning to Tasmania (this can also be seen in the dashed line

representing the ship track on Figure 2). That this cluster comprises a mixture of

sample dates suggests that the species community remained relatively stable over the

timespan of the voyage. Cluster A also shows a similar bimodal distribution of sample

dates (the ship completed the southward leg of the transect on the 11th of February and

commenced the northward leg on the 27th). The two intermediate-latitude clusters (C

and D), while relatively similar in latitude, are distinct in sample dates. One (C)

comprises segments on the southward leg of the voyage, the other (D), the return leg.

The high-latitude clusters (A and B) also show some separation by sample date

(ranging from the 11th to the 27th of February), suggesting that either the regional

species composition changed over that time, or the ship’s movement caused the

sampling to traverse local variations in ecosystem structure.

Figure 3 shows the network with node colours changed to show the number of days

since the sea ice cover melted. The tow segments in cluster B and about half of those

in A were taken in waters previously covered by sea ice. Sea ice is known to have a

role in structuring ecosystems in Antarctic waters (Eicken, 1992; Lizotte, 2001).

While the species compositions of segments in cluster A are all relatively similar (and

thus the cluster is relatively tight), there are variations in sea ice cover and sample

date that probably give rise to subtle variations in composition within the cluster.

[Insert Figure 3 about here]

[Insert Figure 4 about here]

The “clusters” referred to in the results above were determined by visual inspection,

but it is possible to determine clusters more formally. Figure 4 shows the result of

applying a clustering algorithm (Newman, 2004) to the network of Figure 2. The sizes

of the nodes in Figure 4b are proportional to the total taxon counts of the constituent

tow segments. The edge thicknesses are proportional to the weighted fraction of edges

between node clusters, so that clusters connected with a heavy edge are more closely

related than clusters connected with a thin edge. The clusters of Figure 4 correspond

quite closely with the visually-determined clusters in Figure 2, with the exception that

the high-latitude segments in cluster A have been broken into two clusters I and III in

Figure 4. This split of cluster A into I and III closely follows the pattern of sea

ice/open water segments shown in Figure 3. The high-latitude ice-zone segments in

cluster I appear to be more closely related to the high-latitude open-water segments

(cluster III) than to the other high-latitude ice-zone segments (cluster II).
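
Once a cluster has been assigned to each tow segment, the strength of the link between two clusters can be summarised from the segment-level edges. The Python sketch below shows one possible reading of the "weighted fraction of edges" between clusters: the summed weight of the edges running between two clusters, divided by the number of segment pairs that could have been linked. That normalisation, the function name `cluster_links`, and the data format are assumptions made here; the clustering itself (Newman, 2004) is not reproduced.

```python
from collections import defaultdict


def cluster_links(membership, weighted_edges):
    """Summarise edge weight between clusters.

    membership maps each node to its cluster label; weighted_edges maps node
    pairs to weights. Returns {(cluster, cluster): weighted fraction}.
    """
    sizes = defaultdict(int)
    for cluster in membership.values():
        sizes[cluster] += 1

    totals = defaultdict(float)
    for (a, b), weight in weighted_edges.items():
        ca, cb = membership[a], membership[b]
        if ca != cb:  # only edges that run between different clusters
            totals[tuple(sorted((ca, cb)))] += weight

    # Normalise by the number of possible node pairs between the two clusters.
    return {
        pair: total / (sizes[pair[0]] * sizes[pair[1]])
        for pair, total in totals.items()
    }


# Made-up example: four segments in two clusters.
membership = {"S1": "I", "S2": "I", "S3": "II", "S4": "II"}
weighted_edges = {
    ("S1", "S2"): 0.9,
    ("S1", "S3"): 0.8,
    ("S2", "S3"): 0.8,
    ("S3", "S4"): 0.95,
}
links = cluster_links(membership, weighted_edges)
```

Here two of the four possible pairs between clusters I and II are linked, each with weight 0.8, giving a weighted fraction of 0.4.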

The clustering of the tow segments suggests that the species compositions should be

further analysed. One approach could collate a list of segments in each cluster, and

then examine the taxa, as is often done with a conventional cluster analysis of sample

sites. However, recall that we have the notion of the “alternate” network at our

disposal. The alternate of a sites-by-species network is a species-by-sites network –

one in which the nodes represent taxa and the edges indicate which taxa have similar

distributions across sample sites. We can interactively transform our sites-by-species

network into a series of species-by-sites networks representing the species

compositions of some or all of the tow segments.

[Insert Figure 5 about here]

[Insert Figure 6 about here]

Figures 5 and 6 show a series of such species-by-sites networks. The network shown

in Figure 5 represents the full set of tow segments. This network shows a core

structure of commonly observed taxa (Oithona similis, Neogloboquadrina

pachyderma, Fritillaria sp., Ctenocalanus citer, small calanoid copepods, Limacina

sp., and Thysanoessa macrura), with a periphery of those less frequently observed.

The core taxa are more highly connected (have more edges connecting them to other

nodes) than the peripheral taxa. The node colour shows the mean latitude of the taxon.

The peripheral taxa are members of high- and low-latitude communities while the

core taxa have intermediate mean latitudes. The structure of this network is

ambiguous: the core nodes could represent generalist species that are widely

distributed and therefore both commonly observed and highly connected.

Alternatively, the core nodes could represent an extensively sampled (and so these

taxa appear to be common) intermediate community that overlaps with the warm- and

cold-water extreme communities (thus giving the high connectivity). This ambiguity

arises because the use of the complete set of tow segments has effectively created a

juxtaposition of the different communities found along the latitudinal transect. Figures

6a to 6e show the species-by-sites network of each of the clusters I–VI; Figure 6f

shows that of those disconnected nodes that were not shown in Figure 2. This

sequence of networks resolves the ambiguity: clearly, the central taxa in Figure 5 are

relatively common at all latitudes, and the remainder of the community composition

varies with latitude. The community compositions of the clusters are similar to those

described by Hunt and Hosie (2005; 2006a; 2006b) and are not described in detail

here.

The structural characteristics of this network raise a number of questions and so

suggest potentially interesting avenues for further exploration. The clusters are not

completely disjoint, but have a few edges connecting them. Is there anything

interesting about the edges that provide this connectivity between the different

communities? In Figure 2, the edges represent shared taxa, and so the edge attributes

could be used to examine which taxa are common to two adjacent communities (the

taxa that link the communities). Alternatively, we might ask which taxa are absent

from inter-cluster edges, as these would be taxa that are present in one community but

not its neighbour. This latter question can be answered using the node and edge

attributes, by counting the number of edges that do not include a particular taxon, and

normalising by the maximum number of edges in which this taxon could potentially

have been found (the number of times that taxon appeared in either or both of the end-

nodes of the inter-cluster edges). We divided the segments from clusters I – III

(Figure 4) into those that were acquired in previously ice-covered waters, and those

that were acquired in areas of open water, and then examined the patterns of taxon

absence on the edges between those two sets of segments (Table 1). The ice-

associated taxa include Antarctic krill Euphausia superba, the association of which

with sea ice is well known and has been the subject of a great deal of research (see

e.g. Nicol, 2006). Although the genus Oikopleura, most likely O. gaussica in the

Southern Ocean (Tokioka, 1961), has a wide circumpolar distribution,

appendicularians in general were more abundant in the open ocean zone on this

transect (Hunt and Hosie 2005).
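
As a minimal sketch of this absence measure, the calculation might look as follows in Python. It assumes that each segment (node) carries the set of taxa recorded in it, and that the shared taxa on an edge are the intersection of the sets at its two end-nodes. The function name `taxon_absence`, the data structures, and the toy taxa are hypothetical and are not taken from the analysis behind Table 1. Edges on which a taxon occurs at neither end-node are ignored for that taxon, which is one reading of the normalisation described above.

```python
def taxon_absence(node_taxa, inter_edges):
    """Fraction of inter-group edges from which each taxon is absent.

    node_taxa: dict mapping node id -> set of taxa recorded at that node
    inter_edges: iterable of (node_a, node_b) pairs linking the two groups
    """
    absent = {}    # edges where the taxon occurs at only one end-node
    possible = {}  # edges where the taxon occurs at either or both end-nodes
    for a, b in inter_edges:
        shared = node_taxa[a] & node_taxa[b]  # assumed edge attribute
        for taxon in node_taxa[a] | node_taxa[b]:
            possible[taxon] = possible.get(taxon, 0) + 1
            if taxon not in shared:
                absent[taxon] = absent.get(taxon, 0) + 1
    return {t: absent.get(t, 0) / n for t, n in possible.items()}


# Toy data, invented purely for illustration.
node_taxa = {
    "ice1": {"krill", "copepod"},
    "ice2": {"krill", "copepod", "salp"},
    "open1": {"copepod", "salp"},
    "open2": {"copepod"},
}
inter_edges = [("ice1", "open1"), ("ice2", "open1"), ("ice2", "open2")]
scores = taxon_absence(node_taxa, inter_edges)
```

A score near 1 flags a taxon that is present on one side of the boundary but rarely shared across it, while a score near 0 flags a taxon common to both sides.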

[Insert Table 1 about here]

Other single-taxon visualisations offer useful exploratory insights. Figures 7a–d show

the network of Figure 2, but with emphasis on those nodes and edges associated with

the taxa Salpa thompsoni, Pelagobia longicirrata, Themisto gaudichaudii, and

Metridia lucens. Salpa thompsoni (Figure 7a) can be seen to be associated principally

with clusters I – III (the high-latitude segments), but also with a few segments in

cluster VI (samples acquired in the vicinity of the subantarctic front). Salpa thompsoni

is known to be associated with waters to the north of the seasonal ice zone (Pakhomov

et al., 2002), which casts doubt on the veracity of the association with samples from

cluster VI. Salp specimens captured by the CPR are commonly damaged in the

process and can be difficult to identify. We suggest that the samples from cluster VI

are likely to be misidentified specimens, probably of Salpa fusiformis. Pelagobia

longicirrata (Figure 7b) is associated with distinct subsets of clusters II, IV, V and VI.

This is consistent with the known geographic distribution of this polychaete, which is

widespread in the Southern Ocean (Hopkins, 1985; 1987) but fairly uncommon in the

CPR records (a total of 464 specimens have been counted in 19,796 samples). The

locally clustered nature of the distribution of P. longicirrata across the network is

interesting, and suggests that the spatial distribution of this taxon might be similarly

patchy. Themisto gaudichaudii (Figure 7c) is predominantly associated with the

warmer-water clusters V and VI, but with a small number of records in the high-

latitude, ice-zone clusters I and II. Themisto gaudichaudii is notably very abundant in

the polar frontal and subantarctic zones (Bernard and Froneman, 2002; 2003; Donelly

et al., 2006). The distribution of T. gaudichaudii here shows very little overlap with

that of S. thompsoni (Figure 7a). This is in contrast to observations in the north

Atlantic, where juvenile T. gaudichaudii have been observed to be associated with the

salps Pegea bicaudata and Iasis zonaria (Madin and Harbison, 1977). Metridia lucens

(Figure 7d) exhibits a substantial degree of seasonality in its pattern. The species was

absent in most of the southward (outgoing) leg of the voyage (samples shown as

circles) except for a few sites in the sea ice zone and in the far north group E in Figure

2. Metridia lucens was more common in the samples on the return leg north (squares)

taken approximately 2 weeks later. At this stage we can only speculate on the cause of

the seasonality, but this does demonstrate the value of the methodology in probing the

data and finding unique patterns warranting further investigation.

[Insert Figure 7 about here]

Discussion

The methods described here have strong parallels with other visualisation and

dimension-reduction techniques commonly used with ecological data. Generally, such

methods aim to reduce the dimensionality of high-dimensional data down to two or

three dimensions for graphical display, while preserving as much of the structure of

the data as possible. The application of such methods in ecology is commonly to

species abundance data (i.e. the observed abundances of species at a set of sample

sites), usually with accompanying environmental data relating to the sample sites. The

aim is to visualise the relationships between species assemblage patterns, and relate

these to environmental variables. Methods can be broadly divided into indirect and

direct gradient analysis methods. The former construct a visualisation on the

basis of the patterns in the species data alone, without reference to the environmental

data. Relating these patterns to the environmental variables is a subsequent step in the

analyses. Such methods include principal components analysis (Hotelling, 1933) and

multidimensional scaling (Kruskal, 1964a; b). Direct (also known as constrained)

methods — such as canonical correspondence analysis (Ter Braak, 1986) — differ in

that the visualisation is constrained to show only the variation in the species data that

can be explained by an a priori selected set of environmental variables. The network

approach that we have described is an indirect method, since the network structure is

determined by a subset of all variables: those variables chosen as “edge” variables.

The remaining variables in the data set form the attributes of the nodes and edges, and

are used in subsequent exploratory phases of the analysis.

The network approach is perhaps most closely related to multidimensional scaling

(MDS) and its variants. MDS produces a low-dimensional representation of a

multidimensional data set such that the geometric proximity of any two points in the

low-dimensional representation conveys the degree of similarity between the two

associated data points in the original high-dimensional space. Points that are close

together on the MDS diagram are likely to be similar in terms of their associated data.

MDS uses an objective function (called the “stress”) that relates the geometric

configuration of the points in the diagram to the pairwise dissimilarities in high

dimensional space. High stress values indicate a poor overall match between the two.

Networks, in contrast, do not attempt to show inter-point similarity by geometric

proximity, but rather use edges to explicitly show the relationships between points

(nodes) in the network. There is no stress value associated with a network.
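
To make the role of the stress function concrete, the following Python sketch computes a Kruskal-type stress (stress-1) for a given configuration of points. It assumes, for simplicity, that the raw dissimilarities serve as the target distances; non-metric MDS would instead use disparities obtained by monotone regression. The function name and the toy data are illustrative only.

```python
import math


def kruskal_stress(dissim, coords):
    """Stress-1 between dissimilarities and distances in a configuration.

    dissim: dict mapping (i, j) index pairs -> dissimilarity in data space
    coords: list of (x, y) positions of the points in the diagram
    Assumption: raw dissimilarities are used in place of the disparities
    that non-metric MDS would derive by monotone regression.
    """
    num = 0.0
    den = 0.0
    for (i, j), delta in dissim.items():
        d = math.dist(coords[i], coords[j])  # distance in the diagram
        num += (d - delta) ** 2
        den += d ** 2
    return math.sqrt(num / den)


# Four mutually equidistant objects cannot be drawn exactly in the plane:
# placed on a unit square, the two diagonals are too long.
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
equal = {(i, j): 1.0 for i in range(4) for j in range(i + 1, 4)}
square_stress = kruskal_stress(equal, square)
```

The stress here is a single summary of the whole configuration: it reports that the fit is imperfect, but not which pairs of points are responsible.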

As mentioned previously, the positioning of the nodes in a network can be done

according to a variety of layout algorithms. It is often the case that related entities

(nodes) will tend to be placed near to each other in a layout by virtue of the edges that

connect them. However, in general, the geometric proximity of two nodes in a

network cannot be directly interpreted as an indicator of the degree of relatedness of

the associated data points. It is the edges that convey this information.

The network-based approach is superficially similar to the idea of drawing a

minimum spanning tree on an MDS plot or other ordination diagram, to assist with

interpretation (Gower and Ross, 1969; Gillison, 1978). A spanning tree is a graph,

with no closed loops, that connects all objects in the MDS plot. The cost of a given

spanning tree is the sum of its edge weights (i.e. the sum of the pairwise

dissimilarities represented by the tree), and so a minimum spanning tree (MST) is one

that has the minimum cost of all possible spanning trees for a given plot. Thus, an

MST will tend to connect closely-related points, and can be used to gain an

impression of local ordering of points along a gradient (Austin, 1976). An MST is a

special case of a network, showing only a specific, minimal subset of pairwise

dissimilarities. It can be used as a component of a visualisation algorithm, but not as a

visualisation technique in its own right.
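
The construction of a minimum spanning tree from a matrix of pairwise dissimilarities can be sketched with Prim's algorithm. This is a generic illustration, not code from any of the cited works; the function name and the toy gradient are assumptions made for the example.

```python
def minimum_spanning_tree(dissim):
    """Prim's algorithm on a symmetric dissimilarity matrix (list of lists).

    Returns a list of (i, j, weight) edges forming a minimum spanning tree.
    """
    n = len(dissim)
    in_tree = {0}  # start from an arbitrary object
    edges = []
    while len(in_tree) < n:
        # cheapest edge joining the tree to an object not yet in it
        i, j = min(
            ((a, b) for a in in_tree for b in range(n) if b not in in_tree),
            key=lambda e: dissim[e[0]][e[1]],
        )
        edges.append((i, j, dissim[i][j]))
        in_tree.add(j)
    return edges


# Four objects placed along a one-dimensional gradient (toy positions).
positions = [0, 1, 3, 6]
dissim = [[abs(p - q) for q in positions] for p in positions]
tree = minimum_spanning_tree(dissim)
cost = sum(w for _, _, w in tree)
```

For objects lying along a single gradient, as in this toy case, the tree simply chains neighbouring objects together, which is the local ordering referred to above.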

The explicit visual representation of relationships by edges in a network can be useful

with data that are difficult to represent clearly in an MDS diagram — for example,

four or more points in a two-dimensional MDS plot that are equally dissimilar from

each other. While the stress value of an MDS plot gives an indication of the overall fit

to the data, it does not identify individual points or areas of the diagram that are

poorly represented. Assessing the fit of individual points in an MDS plot requires an

additional step, such as calculating the contribution of individual points to the overall

stress. In a network, however, relationships that are poorly represented by the overall

geometric configuration of the nodes can still be identified by the edges in the

network. The drawback is that a network with a large number of edges can be

cluttered and difficult to interpret.

The edges in a network show direct relationships between entities, and so a network

explicitly depicts the local structure within the data, allowing the topology of the

system as a whole to emerge from these individual relationships. This global-from-

local approach has also been used in a number of visualisation algorithms (e.g.

Sammon, 1969; Demartines and Hérault, 1997; Roweis and Saul, 2000). It is also

used in various ecological analytical techniques. One of the long-standing difficulties

in numerical ecology is the robust estimation of large ecological distances (Faith et

al., 1987). The difficulty arises when two objects have little or nothing in common.

These distant relationships can be estimated as a function of intermediate local

relationships (e.g. De'ath, 1999), or given special consideration in other ways (e.g.

Belbin, 1991). Local methods are generally also more robust to outliers in the data.

Outliers tend to have relatively large dissimilarities to other data points and so can

have a disproportionately large effect on the configuration of an MDS. This is often

addressed by selective removal of the outliers or by data transformation. Local

methods, including networks, are more robust against outliers because only the

immediate structure around an outlier is likely to be affected. One of the particular

disadvantages of local approaches is that the data manifold must be sufficiently

densely sampled, in order that neighbouring points in data space really do represent

closely-related entities. Sparse data sets might thus be poorly suited to these

approaches. In the approach presented here, sparse data will tend to form

disconnected networks in which relatively dense subsets of the data form individual

sub-networks.
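
The sub-networks of a disconnected network can be identified with a simple graph traversal. The sketch below uses a depth-first search over an edge list; the function name, node labels, and edges are hypothetical and serve only to illustrate the idea.

```python
from collections import defaultdict


def connected_components(nodes, edges):
    """Group nodes into sub-networks using a depth-first traversal."""
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    seen = set()
    components = []
    for start in nodes:
        if start in seen:
            continue
        component = set()
        stack = [start]
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.add(node)
            stack.extend(neighbours[node] - seen)
        components.append(component)
    return components


# Toy network: two dense subsets and one isolated sample.
nodes = ["s1", "s2", "s3", "s4", "s5", "s6"]
edges = [("s1", "s2"), ("s2", "s3"), ("s4", "s5")]
parts = connected_components(nodes, edges)
```

The number and sizes of the components give a quick indication of how sparsely the data cover the space being explored.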

The explicit characterisation of relationships by edges in a network has potential

advantages for the visual representation of complex data, as noted above, but perhaps

more interestingly, allows subsequent exploration and analyses to focus on the

relationships themselves. Whereas a typical MDS analysis might examine the

attributes of the objects in relation to their geometric configuration (e.g. how the

sample site environmental variables vary with respect to the MDS configuration), a

network additionally allows the attributes of the edges to be examined. The edges in

the network also define its structure: how the different elements of the system relate to

one another. Many graph-theoretic algorithms have been developed for the analysis of

network structures, including clustering, outlier detection, and network traversal.

Visual exploratory analyses are often complemented by the use of algorithms such as

clustering and outlier detection. Applying graph-theoretic versions of such algorithms

can complement the information obtained by applying similar, but non-network

algorithms, to the same data. For example, clustering algorithms attempt to partition a

set of entities into a relatively small number of groups in such a way that entities

within a given group are more similar to each other than they are to entities from other

groups. The groups can then be used as a more succinct representation of the set as a

whole. Measures of similarity are conventionally based on the properties of the

entities of interest – for example, a cluster analysis of sediment samples might be

based on their chemical properties. In a network context, clustering similarly attempts

to create a partitioning of the entities (nodes), but the partitioning is based on the

topology of the network rather than the properties of the individual entities. Clustering

can simplify a complex network, in the same way that it can reduce the cardinality of

large data sets, facilitating the description and understanding of the various

components of the network. Clustering can also assist with visualisation, by reducing

the visual complexity of very large networks (Newman and Girvan, 2004), and by

obtaining more relevant representations of systems for which conventional networks

are insufficient (Estrada and Rodríguez-Velázquez, 2005). There is a diversity of

network clustering algorithms and the reader is referred to e.g. Newman (2003) for an

overview of the field. Some are analogous to clustering methods used with non-

networked data, such as mixture model clustering (Newman and Leicht, 2007), or

hierarchical algorithms that successively merge nodes according to various criteria

(Clauset et al., 2004; Newman, 2004). Others have roots in network theory, including

methods based on graph partitioning (Wu and Leahy, 1993; Shi and Malik, 2000) and

edge betweenness (Newman and Girvan, 2004). Many of the latter have no clear

analogy to non-network algorithms. It is not yet well understood how structural

measures of a network (like betweenness) might relate to ecologically relevant

measures (Proulx et al., 2005), and so it is not clear how the clusterings produced by

such algorithms relate to the processes of the underlying ecosystem. However,

because network-based clustering methods utilise topological characteristics, it seems

likely that network-based methods can provide insights to complement those obtained

from more conventional methods of analysis. We have discussed clustering

algorithms in some detail here, and note that other exploratory algorithms, such as

outlier detection (Shekhar et al., 2001; Noble and Cook, 2003; Rattigan and Jensen,

2005), can be carried out in a network context, and so might similarly offer

complementary insights to conventional analyses.

Recent applications of network theory to ecosystem analyses suggest that

many natural networks share certain structural characteristics. For example, the node

degree distribution (the distribution of the number of edges per node) of many natural

networks follows an asymmetric distribution, such that there are many nodes with

only a few edges, and only a few nodes with many edges (Allesina and Bodini, 2005).

The node degree distribution of the sites-by-species network of Figure 2 bears little

resemblance to this: it roughly follows a skewed normal distribution with a long left tail

(not shown). Is this a genuine difference in network topology, with implications for

the underlying ecology? The interpretation is complicated by the nature of the edges

in our network. In the majority of the well-studied types of ecological networks (e.g.

food webs), the edges represent tangible interactions between entities (in the case of a

food web, trophic interactions). The edges in networks of species by sites, or sites by

species, such as those shown above, are subtly different. These edges represent

species (or site) similarities. Two species with similar distributions across sites thus

might be linked by an edge, but this does not imply that these two species necessarily

directly interact. Edges in such networks should perhaps be interpreted as indicators

only of potential interactions. It is not clear how to interpret topological descriptors

such as node degree distribution that are based on potential – rather than actual –

interactions.
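
For reference, a node degree distribution can be tabulated directly from an edge list. The following Python sketch assumes an undirected network without duplicate edges; the function name and the toy network are illustrative only.

```python
from collections import Counter


def degree_distribution(nodes, edges):
    """Return {degree: number of nodes with that degree}."""
    degree = {node: 0 for node in nodes}  # keep isolated nodes at degree 0
    for a, b in edges:
        degree[a] = degree.get(a, 0) + 1
        degree[b] = degree.get(b, 0) + 1
    return dict(Counter(degree.values()))


# Toy network: one hub linked to three nodes, plus one isolated node.
nodes = ["A", "B", "C", "D", "E"]
edges = [("A", "B"), ("A", "C"), ("A", "D")]
dist = degree_distribution(nodes, edges)
```

In this toy case most nodes have a single edge and one node has many, which is the asymmetric pattern reported for many natural networks. Whether such a tabulation is ecologically meaningful depends on what the edges represent, as discussed above.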

Network approaches offer several possibilities for data integration during exploratory

phases of analysis. Modern ecological studies increasingly use online databases for

data management and delivery, with OBIS (http://iobis.org/) and GBIF

(http://www.gbif.org/) being early and widely-known examples. Our algorithm can be

applied directly to data delivered through these services, offering a potential means

for exploring and visualising a user’s own data in the context of other data available

through such databases. Other online resources provide analytical services rather than

data delivery – for example, the food web constructor (http://spire.umbc.edu/fwc/),

and OBIS-SEAMAP habitat modelling services (Best et al., 2007). Many of these

analytical resources fall within the fields of knowledge representation and other

semantic technologies, and so provide information about relationships between

entities. This relationship information could be used to form edges in a network;

indeed, directed graphs are already commonly used to visualise the relationships

described by semantic-web RDF documents (e.g. IsaViz,

http://www.w3.org/2001/11/IsaViz/). The extraction of relationship information from

semantic sources is in development for biological data (Köhler et al., 2006). Networks

and MDS can visually distinguish different types of entity (e.g. by colour or shape).

By varying the visual characteristics of edges, a network can also distinguish different

types of relationship. Edges are sometimes drawn on MDS plots for this reason (e.g.

Starzomski and Srivastava, 2007). Networks can be used to integrate ecological with

other types of data such as economic (Janssen et al., 2006), not merely in visual

syntheses but in analytical integrations that allow the effects of human impacts on

ecosystems to be explored (Dambacher et al., 2003; Fath, 2004; Rooney et al., 2006;

Dambacher and Ramos-Jiliberto, 2007).

Conclusions

The idea of using networks for data visualisation and exploratory analyses is not new,

but its practical application in the ecological sciences seems to be uncommon.

We speculate that this limited uptake might be due to both a general lack of

appreciation of the wide applicability of such methods, and a historical lack of

readily-available software for such analyses. The latter has been addressed by

developments in both general network-analysis software such as GUESS and

ecology-specific network-analysis software (e.g. Allesina and Bondavalli, 2004;

Fath and Borrett, 2006). We hope that the work presented here might go some way

toward addressing the former. While some data sets or applications may have a

natural definition of connectivity that provides an intuitive view of the system, the

algorithm presented here allows the nodes and edges of a network to be formulated

from an arbitrary matrix of data. The analyst can generate networks from different

combinations of variables, and examine the effects of varying parameters such as the

scales of spatial, temporal, and taxonomic aggregation. This flexibility allows the

analyst to rapidly gain a number of perspectives on the data, and provides a powerful

mechanism for exploration. Using network approaches in exploratory phases of

analysis may reveal hitherto unsuspected structural properties and so prompt the

analyst to consider the use of network-based approaches in later, more formal phases

of analysis. The fundamental structures and processes of ecological and biological

networks are still being discovered, and network-based methods of visualisation and

exploration both facilitate these discoveries and provide insights that can

complement other methods.

Acknowledgements

This work received support and critical input from a number of people, including Lee

Belbin, Eric Woehler, Jonny Stark, and Victoria Wadley. Comments from John

Leathwick and an anonymous referee improved the manuscript considerably. We

would like to thank K. Takahashi and the Master and crew of the RV Tangaroa for

collecting the CPR samples, and T. Odate for making available the underway data.

References

Adar, E., 2005. GUESS: The Graph Exploration System.

Albert, R. and Barabási, A.L., 2002. Statistical mechanics of complex networks.

Reviews of Modern Physics, 74:47-97.

Allesina, S. and Bodini, A., 2005. Food web networks: scaling relation revisited.

Ecological Complexity, 2:323-338.

Allesina, S. and Bondavalli, C., 2004. WAND: an ecological network analysis user-

friendly tool. Environmental Modelling and Software, 19:337-340.

Aloy, P. and Russell, R.B., 2004. Taking the mystery out of biological networks.

European Molecular Biology Organization Reports, 5:349-350.

Austin, M., 1976. Performance of four ordination techniques assuming three different

non-linear species response models. Vegetatio, 33:43-49.

Belbin, L., 1991. Semi-strong hybrid scaling, a new ordination algorithm. Journal of

Vegetation Science, 2:491-496.

Berlow, E.L., 1999. Strong effects of weak interactions in ecological communities.

Nature, 398:330-334.

Berlow, E.L., Neutel, A.-M., Cohen, J.E., De Ruiter, P.C., Ebenman, B., Emmerson,

M., Fox, J.W., Jansen, V.A.A., Iwan Jones, J., Kokkoris, G.D., Logofet, D.O.,

McKane, A.J., Montoya, J.M. and Petchey, O., 2004. Interaction strengths in food

webs: issues and opportunities. Journal of Animal Ecology, 73:585-598.

Bernard, K.S. and Froneman, P.W., 2002. Mesozooplankton community structure in

the Southern Ocean upstream of the Prince Edward Islands. Polar Biology, 25:597-

604.

Bernard, K.S. and Froneman, P.W., 2003. Mesozooplankton community structure and

grazing impact in the Polar Frontal Zone of the south Indian Ocean during the austral

autumn 2002. Polar Biology, 26:268-275.

Best, B.D., Halpin, P.N., Fujioka, E., Read, A.J., Qian, S.S., Hazen, L.J. and Schick,

R.S., 2007. Geospatial web services within a scientific workflow: predicting marine

mammal habitats in a dynamic environment. Ecological Informatics, 2:210-223.

Brose, U., Berlow, E.L. and Martinez, N.D., 2005. From food webs to ecological

networks: linking non-linear trophic interactions with nutrient competition. In: P.C. de

Ruiter, V. Wolters and J.C. Moore (Editors), Dynamic Food Webs: Multispecies

Assemblages, Ecosystem Development and Environmental Change. Academic Press,

pp. 27-36.

Cavalieri, D., Parkinson, C., Gloersen, P. and Zwally, H.J., 1996, updated 2006. Sea

ice concentrations from Nimbus-7 SMMR and DMSP SSM/I passive microwave data.

Boulder, Colorado USA: National Snow and Ice Data Center. Digital media.

Christensen, V. and Pauly, D., 1992. Ecopath II: a software for balancing steady-state

ecosystem models and calculating network characteristics. Ecological Modelling,

61:169-185.

Clauset, A., Newman, M.E.J. and Moore, C., 2004. Finding community structure in

very large networks. Physical Review E, 70.

Dale, M.B., 2000. On plexus representation of dissimilarities. Community Ecology,

1:43-56.

Dambacher, J.M., Luh, H.-K., Li, H.W. and Rossignol, P.A., 2003. Qualitative

Stability and Ambiguity in Model Ecosystems. The American Naturalist, 161:876-

888.

Dambacher, J.M. and Ramos-Jiliberto, R., 2007. Understanding and predicting effects

of modified interactions through a qualitative analysis of community structure.

Quarterly Review of Biology, in press.

De'ath, G., 1999. Extended dissimilarity: a method of robust estimation of ecological

distances from high beta-diversity data. Plant Ecology, 144:191-199.

Demartines, P. and Hérault, J., 1997. Curvilinear components analysis: a self-

organizing neural network for nonlinear mapping of data sets. IEEE Transactions on

Neural Networks, 8:148-154.

Donelly, J., Sutton, T.T. and Torres, J.J., 2006. Distribution and abundance of

micronekton and macrozooplankton in the NW Weddell Sea: relation to a spring ice-

edge bloom. Polar Biology, 29:280-293.

Eicken, H., 1992. The role of sea ice in structuring Antarctic ecosystems. Polar

Biology, 12:3-13.

Estrada, E. and Rodríguez-Velázquez, J.A., 2005. Complex Networks as

Hypergraphs. eprint arXiv:physics/0505137.

Fahrig, L. and Merriam, G., 1985. Habitat patch connectivity and population survival.

Ecology, 66:1762-1768.

Faith, D.P., Minchin, P.R. and Belbin, L., 1987. Compositional dissimilarity as a

robust measure of ecological distance. Vegetatio, 69:57-68.

Fath, B.D., 2004. Network analysis in perspective: comments on "WAND: an

ecological network analysis user-friendly tool". Environmental Modelling &

Software, 19:341-343.

Fath, B.D. and Borrett, S.R., 2006. A MATLAB function for network environ

analysis. Environmental Modelling & Software, 21:375-405.

Fath, B.D. and Patten, B.C., 1999. Review of the foundations of network environ

analysis. Ecosystems, 2:167-179.

Frick, A., Ludwig, A. and Mehldau, H., 1995. A Fast Adaptive Layout Algorithm for

Undirected Graphs. Lecture Notes in Computer Science, Proceedings of the Graph

Drawing Conference 1994, 894:388-403.

Gillison, A.N., 1978. Minimum spanning ordination — a graphic-analytical technique

for three-dimensional ordination display. Austral Ecology, 3:233-238.

Gower, J.C. and Ross, G.J.S., 1969. Minimum spanning trees and single linkage

cluster analysis. Applied Statistics, 18:54-64.

Green, J.L., Hastings, A., Arzberger, P., Ayala, F.J., Cottingham, K.L. et al., 2005.

Complexity in ecology and conservation: mathematical, statistical, and computational

challenges. BioScience, 55:501-510.

Herman, I., Melançon, G. and Marshall, S.M., 2000. Graph Visualization and

Navigation in Information Visualization: a Survey. IEEE Transactions on

Visualization and Computer Graphics, 6:24-44.

Hopkins, T.L., 1985. Food web of an Antarctic midwater ecosystem. Marine Biology,

89:197-212.

Hopkins, T.L., 1987. Midwater food web in McMurdo Sound, Ross Sea, Antarctica.

Marine Biology, 96:93-106.

Hosie, G.W., Fukuchi, M. and Kawaguchi, S., 2003. Development of the southern

ocean continuous plankton recorder survey. Progress in Oceanography, 58:263-283.

Hotelling, H., 1933. Analysis of a complex of statistical variables into principal

components. Journal of Educational Psychology, 24:417-441,498-520.

Hunt, B.P.V. and Hosie, G.W., 2003. The continuous plankton recorder in the

Southern Ocean: a comparative analysis of zooplankton communities sampled by the

CPR and vertical net hauls along 140°E. Journal of Plankton Research, 25:1561-1579.

Hunt, B.P.V. and Hosie, G.W., 2005. Zonal structure of zooplankton communities in

the Southern Ocean south of Australia: results from a 2150km continuous plankton

recorder transect. Deep-Sea Research I, 52:1241-1271.

Hunt, B.P.V. and Hosie, G.W., 2006a. The seasonal succession of zooplankton in the

Southern Ocean south of Australia, part I: The seasonal ice zone. Deep-Sea Research

I, 53:1182-1202.

Hunt, B.P.V. and Hosie, G.W., 2006b. The seasonal succession of zooplankton in the

Southern Ocean south of Australia, part II: The Sub-Antarctic to Polar Frontal Zones.

Deep-Sea Research I, 53:1203-1223.

Janssen, M.A., Bodin, Ö., Anderies, J.M., Elmqvist, T., Ernstson, H., McAllister,

R.R.J., Olsson, P. and Ryan, P., 2006. A network perspective on the resilience of

social-ecological systems. Ecology and Society, 11:15.

Jeger, M.J., Pautasso, M., Holdenrieder, O. and Shaw, M.W., 2007. Modelling disease

spread and control in networks: implications for plant sciences. New Phytologist,

174:279-297.

Jordán, F., 2001. Strong threads and weak chains? - a graph theoretical estimation of

the power of indirect effects. Community Ecology, 2:17-20.

Jordán, F. and Scheuring, I., 2004. Network ecology: topological constraints on

ecosystem dynamics. Physics of Life Reviews, 1:139-172.

Kamada, T. and Kawai, S., 1989. An algorithm for drawing general undirected

graphs. Information Processing Letters, 31:7-15.

Köhler, J., Philippi, S., Specht, M. and Rüegg, A., 2006. Ontology based text indexing

and querying for the semantic web. Knowledge-Based Systems, 19:744-754.

Kruskal, J.B., 1964a. Multidimensional scaling by optimizing goodness of fit to a

nonmetric hypothesis. Psychometrika, 29:1-27.

Kruskal, J.B., 1964b. Nonmetric multidimensional scaling: a numerical method.

Psychometrika, 29:115-129.

Lázaro, A., Mark, S. and Olesen, J.M., 2005. Bird-made fruit orchards in northern

Europe: nestedness and network properties. Oikos, 110:321-329.

Lizotte, M.P., 2001. The contributions of sea ice algae to Antarctic marine primary

production. American Zoologist, 41:57-73.

Madin, L.P. and Harbison, G.R., 1977. The associations of Amphipoda

Hyperiidea with gelatinous zooplankton. I. Associations

with Salpidae. Deep Sea Research, 24:449-463.

Matthews, J.A., 1978. An Application of Non-Metric Multidimensional Scaling to the

Construction of an Improved Species Plexus. Journal of Ecology, 66:157-173.

McIntosh, R.P., 1973. Matrix and plexus techniques. In: R.H. Whittaker (Editor),

Ordination and Classification of Communities. Junk, The Hague, pp. 159-191.

Memmott, J., 1999. The structure of a plant-pollinator food web. Ecology Letters,

2:276-280.

Newman, M.E.J., 2002. The spread of epidemic disease on networks. Physical Review E,

66:016128.

Newman, M.E.J., 2003. The structure and function of complex networks. SIAM Review,

45:167-256.

Newman, M.E.J., 2004. Fast algorithm for detecting community structure in networks.

Physical Review E, 69.

Newman, M.E.J. and Girvan, M., 2004. Finding and evaluating community structure

in networks. Physical Review E, 69:026113.

Newman, M.E.J. and Leicht, E.A., 2007. Mixture models and exploratory data

analysis in networks. Proc Nat Acad Sci USA, in press.

Nicol, S., 2006. Krill, currents, and sea ice: Euphausia superba and its changing

environment. BioScience, 56:111-120.

Noble, C.C. and Cook, D.J., 2003. Graph-based anomaly detection. In: L. Getoor,

T.E. Senator, P. Domingos and C. Faloutsos (Editors). ACM, Washington, DC, USA,

pp. 631-636.

Orsi, A., Whitworth, T., III and Nowlin, W.D., Jr, 1995. On the meridional extent and

fronts of the Antarctic Circumpolar Current. Deep-Sea Research, 42:641-673.

Paine, R.T., 1984. Ecological determinism in the competition for space. Ecology,

65:1339-1348.

Pakhomov, E.A., Froneman, P.W. and Perissinotto, R., 2002. Salp/krill interactions in

the Southern Ocean: spatial segregation and implications for the carbon flux.

Deep-Sea Research I, 49:1881-1907.

Pimm, S.L., 1982. Food Webs. Chapman and Hall, London.

Proulx, S.R., Promislow, D.E.L. and Phillips, P.C., 2005. Network thinking in

ecology and evolution. TRENDS in Ecology and Evolution, 20:345-353.

Rattigan, M.J. and Jensen, D., 2005. The Case for Anomalous Link Detection. In: S.

Džeroski and H. Blockeel (Editors), Chicago.

Rhodes, M., Wardell-Johnson, G.W., Rhodes, M.P. and Raymond, B., 2006. Applying

network theory to the conservation of habitat trees in urban environments: a case

study from Brisbane, Australia. Conservation Biology, 20:861-870.

Rohr, J.R., Kerby, J.L. and Sih, A., 2006. Community ecology as a framework for

predicting contaminant effects. TRENDS in Ecology and Evolution, 21:606-613.

Rooney, N., McCann, K., Gellner, G. and Moore, J.C., 2006. Structural asymmetry

and the stability of diverse food webs. Nature, 442:265-269.

Roweis, S.T. and Saul, L.K., 2000. Nonlinear dimensionality reduction by locally

linear embedding. Science, 290:2323-2326.

Sammon, J.W., 1969. A nonlinear mapping for data structure analysis. IEEE

Transactions on Computers, 18:401-409.

Shekhar, S., Lu, C.T. and Zhang, P., 2001. Detecting graph-based spatial outliers:

algorithms and applications (a summary of results). In: F. Provost and R. Srikant

(Editors), Proceedings of the Seventh ACM SIGKDD International Conference on

Knowledge Discovery and Data Mining, pp. 371-376.

Shi, J. and Malik, J., 2000. Normalized cuts and image segmentation. IEEE

Transactions on Pattern Analysis and Machine Intelligence, 22:888-905.

Shirley, M.D.F. and Rushton, S.P., 2005. The impacts of network topology on disease

spread. Ecological Complexity, 2:287-299.

Starzomski, B.M. and Srivastava, D.S., 2007. Landscape geometry determines

community response to disturbance. Oikos, 116:690-699.

Ter Braak, C.J.F., 1986. Canonical Correspondence Analysis: a new eigenvector

technique for multivariate direct gradient analysis. Ecology, 67:1167-1179.

Tokioka, T., 1961. Appendicularians of the Japanese Antarctic Research Expedition.

Bulletin of Marine Biological Station of Asamushi, 5:241-245.

Ulanowicz, R.E., 1986. Growth and development: ecosystem phenomenology.

Springer-Verlag, New York, New York, USA.

Urban, D. and Keitt, T., 2001. Landscape connectivity: a graph-theoretic perspective.

Ecology, 82:1205-1218.

Watts, D.J. and Strogatz, S.H., 1998. Collective dynamics of 'small-world' networks.

Nature, 393:440-442.

Whittaker, R.H. and Warren Fairbanks, C., 1958. A Study of Plankton Copepod

Communities in the Columbia Basin, Southeastern Washington. Ecology, 39:46-65.

Williams, J.W.J., 1964. Algorithm 232 - Heapsort. Communications of the ACM,

7:347-348.

Wu, Z. and Leahy, R., 1993. An optimal graph theoretic approach to data clustering:

theory and application to image segmentation. IEEE Transactions on Pattern Analysis

and Machine Intelligence, 15:1101-1113.

Figure and table captions

Figure 1. The continuous plankton recorder transect (solid line) in relation to land

masses, oceanic fronts (as defined by Orsi et al., 1995), and sea ice extent. The

outward and return legs of the voyage overlap. STF=subtropical front;

SAF=subantarctic front; PF=polar front; SACCF=southern Antarctic circumpolar

current front; SB=southern boundary of the Antarctic circumpolar current; Ice=mean

maximum October sea ice extent.

Figure 2. A network of the continuous plankton recorder data, in which nodes

represent tow segments (sample sites) and the edges indicate sites with common

species. The dashed line indicates the temporal progression of the ship track. The end

of the southward leg is indicated by the white star. The colours of the nodes indicate

the latitude of the segment (see scale). The labels A–E provide references to features

that are discussed in the text.

Figure 3. The network of Figure 2, with node colours changed to reflect the number of

days since sea ice melt. White nodes indicate segments taken in open ocean (no sea

ice present over the preceding winter).

Figure 4. (a) The network of Figure 2, after clustering. Segments within the same

cluster are shown with the same colour and node shape; (b) a schematic representation

of the same network, in which the nodes within each cluster have been merged.

Figure 5. Plankton community network, in which nodes represent taxa and edges

indicate taxa with similar spatio-temporal distributions. The network has been

constructed from the full set of tow segments.

Figure 6. Plankton community networks. Each is similar to that shown in Figure 5,

but generated from a subset of tow segments. The networks (a)-(f) correspond to

clusters I-VI in Figure 4; network (g) corresponds to disconnected nodes not visible in

Figure 2. The layout of the nodes is the same as in Figure 4. The taxa in the centre of

the network shown in Figure 5 are common to all clusters, and the remainder of the

community composition varies across the clusters.

Figure 7. The network of Figure 2, showing with dark grey those nodes and edges

specifically associated with the taxa (a) Salpa thompsoni, (b) Pelagobia longicirrata,

(c) Themisto gaudichaudii, and (d) Metridia lucens.

Table 1. Taxa from the segments of clusters I–III of Figure 4. These taxa were

typically absent from edges that paired an ice-zone segment with an open-water

segment, suggesting that these taxa might distinguish the ice-zone community from

the open-water community in this region.

Table 1

Taxa in ice-zone segments but not open water segments | Fraction of edges from which taxon was absent | Total count | Taxa in open water segments but not ice-zone segments | Fraction of edges from which taxon was absent | Total count

Pelagobia longicirrata 1.0 11 Oikopleura sp. 0.90 32

Copepod indet. 0.90 36

Euphausia superba 0.73 33

Rhincalanus gigas nauplius 0.64 52