Current Biology, Volume 26

Supplemental Information

Biophysical and Population Genetic Models Predict the Presence of "Phantom" Stepping Stones Connecting Mid-Atlantic Ridge Vent Ecosystems

Corinna Breusing, Arne Biastoch, Annika Drews, Anna Metaxas, Didier Jollivet, Robert C. Vrijenhoek, Till Bayer, Frank Melzner, Lizbeth Sayavedra, Jillian M. Petersen, Nicole Dubilier, Markus B. Schilhabel, Philip Rosenstiel, and Thorsten B.H. Reusch


Supplemental Figures

Figure S1.

Figure S2.

Supplemental Figure Legends

Figure S1. Related to Experimental Procedures (Larval dispersal modeling) and Supplemental Experimental Procedures (Biophysical model). (A) Vertical profile of current speed (mean and standard deviation, based on 5-daily data) above Lucky Strike. (B) Stick plot of bottom currents at the Lucky Strike vent field. The vector plot shows 5-day average horizontal velocities (i.e., including directions) directly above the seafloor. Mean current maps of VIKING20 at 1850 m depth (model level 29) for (C) northwestward and (D) southeastward bottom flow (identified from B). Shading shows the model bathymetry at horizontal model resolution and with intervals of vertical model resolution.

Figure S2. Related to Experimental Procedures (Larval dispersal modeling) and Supplemental Experimental Procedures (Biophysical model). Comparison of sea surface height (SSH) mean (A and C, in cm) and variance (B and D, in cm²) between AVISO satellite (A, B) and VIKING20 model (C, D) data. SSH is a dynamic proxy for upper-ocean current velocity. White stars indicate the locations of the three modeled vent sites Menez Gwen, Lucky Strike and Rainbow. The altimeter products were produced by Ssalto/Duacs and distributed by AVISO, with support from CNES (http://www.aviso.altimetry.fr/duacs/).

Supplemental Tables

Table S1. Related to Figure 1. Bathymodiolus sampling localities along the Mid-Atlantic Ridge.

Locality            Abbr.    Latitude    Longitude   Depth [m]   Cruise*              Dive                   Samples
Menez Gwen          MG‡,1    37°50.7'N   31°31.2'W   813–860     M:82/3; PP:BIOBAZ    690/27–761/6; N/A      50
Lucky Strike        LS       37°17.0'N   32°15.0'W   1710        AT:03/3              3120                   30
Rainbow             RB       36°14.0'N   33°54.0'W   2251        AT:03/3              3121–3122              30
Broken Spur         BS       29°10.0'N   43°10.0'W   3350        AT:03/3; AT:05/3     3125; 3676             30
Snake Pit           SP       23°22.0'N   44°56.0'W   3480        AT:03/3; AT:05/3     3129; 3672–3674        30
Irina (Logatchev)   IR‡,2    14°45.2'N   44°58.8'W   3020–3034   MSM:04/3             244/9–267/1; 282/3     48
Quest (Logatchev)   QS       14°45.2'N   44°58.8'W   3024–3047   MSM:04/3             267/7–271/5; 313/2     30
Semenov             SM       13°30.8'N   44°57.8'W   2432        PP:ODEMAR            541                    40
Clueless            CL‡,3    04°48.2'S   12°22.3'W   2992–2995   M:78/2; ATA          302/15; 52/11          30
Lilliput            LP‡,4    09°32.8'S   13°12.6'W   1489–1491   M:78/2               319/8–335/5            30

*Research vessels: M = Meteor, PP = Pourquoi Pas?, AT = Atlantis, MSM = Maria S. Merian, ATA = L'Atalante
‡5–6 samples from these sites were used for RNA sequencing to obtain reference transcriptomes for the putative species: 1) B. azoricus, 2) B. puteoserpentis, 3) B. sp. 5°S, 4) B. sp. 9°S.

Table S2. Related to Experimental Procedures (SNP marker design and DNA extraction and genotyping). Fluidigm primer information (5' to 3'), putative gene functions and population allele frequencies of the 94 SNP markers designed in this study. The comment column indicates in which analyses each locus was used (if only a subset of SNPs was investigated). ASP1 = SNP allele detected with allele-specific primer 1, ASP2 = SNP allele detected with allele-specific primer 2, SNP_SEQ = sequence of the amplified fragment containing the SNP, ASP1_SEQ = sequence of allele-specific primer 1, ASP2_SEQ = sequence of allele-specific primer 2, LSP_SEQ = sequence of the locus-specific reverse primer, STA_SEQ = sequence of the forward primer for specific target amplification, AMP_GC = proportional GC content, REFSEQ = BLAST annotation in the RefSeq protein database, SWISSPROT = BLAST annotation in the Swiss-Prot database, TREMBL = BLAST annotation in the TrEMBL database, FREQ_ASP1 = frequency of the SNP allele amplified with ASP1 in the respective population. Please refer to the separate Excel sheet.

Table S3. Related to Experimental Procedures (MSAT marker design and DNA extraction and genotyping). Primer pools and sequences for the 9 microsatellites that were used in this study. c64270_g3_i1 was excluded from the analysis of contemporary migration rates due to low polymorphism.

Pool   Locus          Primer    5'-label   Sequence                  Fragment [bp]   Period
1      c27696_g2_i1   BMAR1F    HEX        GCGTTATAACACCAAAACTCT     127–159         4
                      BMAR1R    –          AACCACAGTCATTCACAAGG
       c63755_g5_i1   BMAR4F    HEX        TGGAGGCCAGGTTGTTTT        180–252         4
                      BMAR4R    –          TGTTGCAAAGGGACATAAACAG
       c60427_g1_i2   BMAR14F   6-FAM      CCACATCAACACAAGTAGAAAGC   183–210         3
                      BMAR14R   –          AACCTGTTGTGTTCACCGTC
2      c61565_g2_i3   BMAR5F    6-FAM      GAAAAGTCAGCACCATGGCT      192–213         3
                      BMAR5R    –          TGCTTTGTCCTGTAAACGCT
       c55227_g3_i3   BMAR10F   HEX        GACGTTTGACCAGAATAGGGG     131–200         3
                      BMAR10R   –          GGGTTCTGGACAAATTCTCTGT
       c43280_g1_i1   BMAR12F   NED        GCTTCGCTTCTTCCTTTCTTCT    135–204         3
                      BMAR12R   –          GGCAAGACAATAATTCCAGACGA
3      c42547_g1_i1   BMAR8F    6-FAM      AAAAGCTGGGTTATATACTGCA    241–273         4
                      BMAR8R    –          AGGGGTTGTATGACTAGGAAC
       c50564_g1_i1   BMAR16F   HEX        CACTCTTACAGGCTAGGATCCA    246–267         3
                      BMAR16R   –          CGTCTTTCTTCCCCACAACA
       c64270_g3_i1   BMAR21F   NED        TGGTCGAATGAAGAGGAGCT      159–189         3
                      BMAR21R   –          ACTTGTATCCATGGCCTCCT

Table S4. Related to Experimental Procedures (FST-outlier test). FST-outlier test using BayeScan. Only loci under putative natural selection (consistently q < 0.1) are shown. Positive α values indicate positive selection, while negative ones imply balancing or purifying selection. Prob = posterior probability of the model, log10(PO) = base-10 logarithm of the posterior odds for the model, Type of selection = positive (+) or negative/balancing (‒) selection between pairs of species. No significant outliers were found within species.

Locus          Prob     log10(PO)   α         FST      Type of selection
c54079_g1_i1   0.8302   0.6891       0.9354   0.6406   +  B. azoricus vs. B. puteoserpentis
c44266_g1_i1   0.9754   1.5982      -1.7525   0.1975   ‒  B. azoricus vs. B. puteoserpentis
c54250_g1_i1   0.8524   0.7615      -1.6606   0.2203   ‒  B. azoricus vs. B. puteoserpentis
c14746_g1_i1   0.8396   0.7188      -1.6874   0.2187   ‒  B. azoricus vs. B. puteoserpentis
c61080_g8_i1   0.8816   0.8718       1.0554   0.6608   +  B. azoricus vs. B. puteoserpentis
c62359_g8_i4   0.9078   0.9932       1.0744   0.6643   +  B. azoricus vs. B. puteoserpentis
c63335_g5_i2   0.9508   1.2860      -1.2576   0.2618   ‒  B. azoricus vs. B. puteoserpentis
c34434_g1_i1   0.9848   1.8114       1.2899   0.7004   +  B. azoricus vs. B. puteoserpentis
c54783_g1_i2   0.9970   2.5215       2.0027   0.7980   +  B. sp. 5°S vs. B. puteoserpentis
c35009_g1_i1   1.0000   1000.0000   -2.7882   0.0936   ‒  B. azoricus vs. all other species
c63790_g3_i1   0.9998   3.6988      -1.9354   0.1708   ‒  B. azoricus vs. B. puteoserpentis
c58708_g7_i1   0.9996   3.3977       1.7073   0.7621   +  B. azoricus vs. B. puteoserpentis
c42320_g1_i1   0.8720   0.8332      -1.1034   0.2867   ‒  B. azoricus vs. B. puteoserpentis
c60277_g2_i2   0.9336   1.1479      -1.1985   0.2708   ‒  B. azoricus vs. B. puteoserpentis
c42562_g1_i1   0.9732   1.5600       1.3626   0.7110   +  B. azoricus vs. B. puteoserpentis

Table S5. Related to Figure 3. Results of the BayesAss analysis for the pooled data set using 44 neutral molecular markers. The table shows mean immigration rates (fraction of individuals in the destination population that are derived from the source population per generation) with 95% confidence intervals (lower; upper bound). Numbers in bold denote rates that were outside the 95% confidence interval for uninformative data. NMAR = Menez Gwen, Lucky Strike, Rainbow; BS = Broken Spur; MMAR = Snake Pit, Irina, Quest, Semenov; CL = Clueless; LP = Lilliput.

Destination \ Source   NMAR                      BS                        MMAR                      CL                        LP
NMAR                   0.9884 (0.9772; 0.9996)   0.0029 (-0.0028; 0.0086)  0.0029 (-0.0028; 0.0086)  0.0029 (-0.0028; 0.0086)  0.0029 (-0.0028; 0.0086)
BS                     0.0259 (-0.0029; 0.0547)  0.7084 (0.6672; 0.7496)   0.2488 (0.1957; 0.3019)   0.0085 (-0.0080; 0.0250)  0.0085 (-0.0080; 0.0250)
MMAR                   0.0022 (-0.0021; 0.0065)  0.0024 (-0.0021; 0.0069)  0.9894 (0.9800; 0.9988)   0.0039 (-0.0020; 0.0098)  0.0022 (-0.0021; 0.0065)
CL                     0.0104 (-0.0092; 0.0300)  0.0104 (-0.0092; 0.0300)  0.0104 (-0.0092; 0.0300)  0.9532 (0.9115; 0.9949)   0.0156 (-0.0120; 0.0432)
LP                     0.0083 (-0.0076; 0.0242)  0.0083 (-0.0076; 0.0242)  0.0084 (-0.0075; 0.0243)  0.0090 (-0.0081; 0.0261)  0.9660 (0.9350; 0.9970)
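In a BayesAss migration matrix each destination row partitions the ancestry of that population among all sources, so the posterior means in a row should sum to about 1. The sketch below re-enters the Table S5 values as transcribed here, checks the row sums, and lists off-diagonal rates whose 95% interval excludes zero. The names RATES, row_sums and migration_excluding_zero are illustrative only, and the zero-exclusion test is a simpler criterion than the comparison with uninformative data used for the bold entries.

```python
# Table S5 as transcribed: rows = destination, columns = source,
# each entry is (posterior mean, lower 95% bound, upper 95% bound).
POPS = ["NMAR", "BS", "MMAR", "CL", "LP"]

RATES = {
    "NMAR": [(0.9884, 0.9772, 0.9996), (0.0029, -0.0028, 0.0086),
             (0.0029, -0.0028, 0.0086), (0.0029, -0.0028, 0.0086),
             (0.0029, -0.0028, 0.0086)],
    "BS":   [(0.0259, -0.0029, 0.0547), (0.7084, 0.6672, 0.7496),
             (0.2488, 0.1957, 0.3019), (0.0085, -0.0080, 0.0250),
             (0.0085, -0.0080, 0.0250)],
    "MMAR": [(0.0022, -0.0021, 0.0065), (0.0024, -0.0021, 0.0069),
             (0.9894, 0.9800, 0.9988), (0.0039, -0.0020, 0.0098),
             (0.0022, -0.0021, 0.0065)],
    "CL":   [(0.0104, -0.0092, 0.0300), (0.0104, -0.0092, 0.0300),
             (0.0104, -0.0092, 0.0300), (0.9532, 0.9115, 0.9949),
             (0.0156, -0.0120, 0.0432)],
    "LP":   [(0.0083, -0.0076, 0.0242), (0.0083, -0.0076, 0.0242),
             (0.0084, -0.0075, 0.0243), (0.0090, -0.0081, 0.0261),
             (0.9660, 0.9350, 0.9970)],
}


def row_sums(rates):
    """Sum of posterior mean rates per destination (expected to be close to 1)."""
    return {dest: sum(mean for mean, _, _ in row) for dest, row in rates.items()}


def migration_excluding_zero(rates, pops=POPS):
    """Off-diagonal (destination, source, mean) triples whose lower bound is > 0."""
    hits = []
    for dest, row in rates.items():
        for src, (mean, lower, _upper) in zip(pops, row):
            if src != dest and lower > 0:
                hits.append((dest, src, mean))
    return hits


if __name__ == "__main__":
    for dest, total in row_sums(RATES).items():
        print(f"{dest}: row sum = {total:.4f}")
    print(migration_excluding_zero(RATES))
```

For the transcribed values every row sums to within 0.001 of 1, and the only off-diagonal rate whose interval excludes zero is migration into BS from MMAR (0.2488).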

Table S6. Related to Experimental Procedures (De novo transcriptome assembly). Basic assembly statistics for the raw and filtered de novo transcriptomes of the four Bathymodiolus (sub-)species. Contigs = number of all contigs including splice variants, Loci = number of different "genes", CEGs = 248 low-copy core genes that are ultra-conserved in eukaryotes and are used for assessing the completeness and quality of assemblies, ORFs = unique contigs with an open reading frame.

                            B. azoricus   B. puteoserpentis   B. sp. 5°S   B. sp. 9°S
Raw
Contigs                     173300        177587              225279       345535
Loci                        122630        124196              158654       293714
Median contig length [bp]   439           442                 412          279
Mean contig length [bp]     882.22        887.30              826.36       464.49
Assembled bases             152889522     157572715           186162113    160498529
GC content [%]              34.42         34.26               34.18        33.87
Filtered
Contigs                     108448        134384              132632       319766
Loci                        71371         91462               84326        274115
Median contig length [bp]   616           515                 589          286
Mean contig length [bp]     1081.89       970.14              1037.03      457.37
Assembled bases             117328861     130370673           137543597    146252403
GC content [%]              34.77         34.44               34.54        33.74
CEGs [%]
Partial                     100           99.60               100          95.56
Complete                    99.19         97.58               97.18        78.23
Proteins
BLAST                       36946         35870               39906        36863
Swiss-Prot                  23707         24052               26581        22531
TrEMBL                      36562         35528               39496        36316
RefSeq                      32360         31415               34830        31434
ORFs                        49268         49441               55137        48776
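Mean contig length is simply the number of assembled bases divided by the number of contigs, so those rows of Table S6 can be cross-checked. The sketch below does this for the transcribed values; TABLE_S6, mean_contig_length and check_table are hypothetical names, and the 0.005 bp tolerance reflects rounding to two decimals.

```python
# Transcribed from Table S6: (assembled bases, contigs, reported mean length [bp]).
TABLE_S6 = {
    "raw": {
        "B. azoricus": (152889522, 173300, 882.22),
        "B. puteoserpentis": (157572715, 177587, 887.30),
        "B. sp. 5°S": (186162113, 225279, 826.36),
        "B. sp. 9°S": (160498529, 345535, 464.49),
    },
    "filtered": {
        "B. azoricus": (117328861, 108448, 1081.89),
        "B. puteoserpentis": (130370673, 134384, 970.14),
        "B. sp. 5°S": (137543597, 132632, 1037.03),
        "B. sp. 9°S": (146252403, 319766, 457.37),
    },
}


def mean_contig_length(assembled_bases: int, contigs: int) -> float:
    """Mean contig length = total assembled bases / number of contigs."""
    return assembled_bases / contigs


def check_table(table, tol=0.005):
    """Return entries whose recomputed mean differs from the reported one by >= tol."""
    mismatches = []
    for assembly, rows in table.items():
        for species, (bases, contigs, reported) in rows.items():
            recomputed = mean_contig_length(bases, contigs)
            if abs(recomputed - reported) >= tol:
                mismatches.append((assembly, species, recomputed, reported))
    return mismatches


print(check_table(TABLE_S6))  # an empty list means every mean is consistent
```

For the values above, all eight recomputed means agree with the reported ones within rounding.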

Supplemental Experimental Procedures

RNA extraction and high-throughput transcriptome sequencing

Samples were separately ground in liquid nitrogen and incubated overnight at 4°C in RNAlater to prepare them for RNA purification. In our lab, this step has improved the efficiency of RNA isolation from mussel tissues. RNA was subsequently isolated using a modified version of the RNeasy Mini Kit spin-column protocol (Qiagen, Germany). To ensure complete tissue homogenization, we prolonged the incubation time in RLT-β-mercaptoethanol buffer to 10 min and disrupted the samples with VWR VDI 12 and QIAshredder homogenizers (Qiagen, Germany) before adding the lysates to the extraction columns. Prior to RNA elution, samples were placed under a fume hood for 3 min to let the remaining ethanol evaporate and were then incubated for 10 min on ice in RNase-free water (elution solution) to improve diffusion into the column membrane. Potential DNA contamination was removed with the DNA-free DNase Treatment & Removal Kit (Invitrogen/Ambion, Germany). RNA integrity and concentration were measured on the Experion Automated Electrophoresis Station using the Experion RNA StdSens Analysis Kit (Bio-Rad, Germany) before samples were submitted to the Institute of Clinical Molecular Biology (Kiel, Germany) for 2×101 bp paired-end RNA sequencing on an Illumina HiSeq2000 machine (Illumina, USA). Forty-five indexed sequencing libraries were constructed from poly(A)-selected RNA with the TruSeq RNA Sample Preparation Kit v2 (350–550 bp insert size), and cluster generation was performed with the TruSeq PE Cluster Kit v3-cBot-HS (Illumina, USA). As the six LP samples were also needed for RNA expression analyses of the bacterial endosymbionts [S1], RNA-Seq libraries for these samples were not poly(A)-enriched, and three of them were sequenced in a strand-specific way. To ensure sufficient read coverage [S2], six unstranded libraries were multiplexed on each lane of an Illumina flow cell, whereas the three stranded LP libraries were sequenced on a separate lane.

De novo transcriptome assembly

Raw reads from multiplexed paired-end sequencing were first quality-checked with FASTQC v0.10.1 (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/) and then processed with FASTQ_ILLUMINA_FILTER v0.1 (http://cancan.cshl.edu/labmembers/gordon/fastq_illumina_filter/) to remove Y-tagged reads that failed to pass the internal quality filter of the Casava 1.8+ pipeline. As transcriptome accuracy was more important than completeness for our purpose, we opted for a stringent trimming strategy [S3, S4]. Adapter removal and read clipping were performed with Flexbar v2.4 [S5], allowing adapter removal in any part of the read, a minimum overlap of 10 bp between adapter and read, and at most one mismatch or gap per 10 bases of overlap. The custom adapter file contained all Illumina oligonucleotide sequences used by the TruSeq kits and their reverse complements. To account for an increased error rate near the 5' end [S6], the first 15 bp of each read were removed (as suggested by the FASTQC output), while the 3' end was trimmed until a minimum base quality of 20 (Illumina 1.8+ Phred score encoding) was reached. Reads shorter than 50 bp after processing or with more than 5 uncalled bases were discarded. Ribosomal sequences were removed from non-poly(A)-selected libraries with riboPicker v0.4.3 [S7], using the following settings: -i 95 -c 90 -l 45 -dbs rrnadb. Filtered sequences were subsequently aligned against unpublished symbiont genomes from B. azoricus (MG) and B. sp. 9°S (LP) with Bowtie2 v2.1.0 [S8] to separate out bacterial reads (parameters: -I 0 -X 600 --very-sensitive). All paired-end sequences that did not align concordantly were kept for assembly. To obtain comprehensive reference transcriptomes, we merged reads from all individuals and tissues per (sub-)species. De novo assemblies were done with Trinity r20140413 [S9] using the PasaFly algorithm with the jaccard-clip option to reduce the creation of chimeric transcripts. To reduce computational requirements, an in silico normalization to 30–50X coverage was performed with an allowed k-mer coverage deviation of 100% (exception: LP reads were not normalized due to their lower coverage). The group pairs distance was set to 600 bp, while the minimum contig length was left at its default of 200 bp. Potential assembly artifacts with FPKM values < 1 were removed with Trinity's support scripts "align_and_estimate_abundance.pl" and "filter_fasta_by_rsem_values.pl", using Bowtie2 v2.1.0 and RSEM v1.2.12 [S10] for read mapping and expression estimation, respectively (parameters: --no-mixed --no-discordant --gbar 99999999 --dpad 0 --very-sensitive). To avoid excluding real but lowly expressed transcripts, we re-integrated contigs from the discarded data set into the cleaned assembly if they had a significant BLASTx hit in the TrEMBL, Swiss-Prot or RefSeq protein databases (e-value threshold: 1e-20; releases from April 2014) or contained an ORF of at least 100 aa based on maximum likelihood scores and Pfam AB homologies as determined by TransDecoder rel16JAN2014 (https://transdecoder.github.io/). The final assembly was clustered at an identity threshold of 99% with CD-HIT-EST v4.6.1 [S11], using the additional settings -n 9 -r 1 -T 0 -M 50000 -g 1 -aS 0.99 -uS 0.01. This approach appeared to give the best results for our data sets in terms of assembly quality and completeness, as evaluated with the CEGMA pipeline ([S12]; Table S6).
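The read-level trimming rules and the contig re-integration rule described above can be summarized as two small predicates. This is a minimal sketch under stated assumptions, not the Flexbar, Trinity or TransDecoder code: adapter clipping is omitted, the e-value comparison is assumed to be inclusive, and trim_read, Contig and keep_contig are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional

HEAD_CROP = 15   # bases removed from the 5' end of every read
MIN_QUAL = 20    # 3' trimming stops at the first base with Phred >= 20
MIN_LEN = 50     # reads shorter than this after trimming are discarded
MAX_N = 5        # reads with more uncalled bases (N) than this are discarded


def trim_read(seq: str, quals: List[int]) -> Optional[str]:
    """Hypothetical restatement of the crop/trim/length/N rules.

    Adapter clipping (done with Flexbar in the study) is not modelled.
    Returns the trimmed sequence, or None if the read is discarded.
    """
    seq, quals = seq[HEAD_CROP:], quals[HEAD_CROP:]
    end = len(seq)
    # Walk back from the 3' end until a base reaches the quality threshold.
    while end > 0 and quals[end - 1] < MIN_QUAL:
        end -= 1
    seq = seq[:end]
    if len(seq) < MIN_LEN or seq.upper().count("N") > MAX_N:
        return None
    return seq


@dataclass
class Contig:
    fpkm: float
    best_blastx_evalue: Optional[float]  # None = no hit in TrEMBL/Swiss-Prot/RefSeq
    longest_orf_aa: int


def keep_contig(c: Contig, evalue_cutoff: float = 1e-20) -> bool:
    """Keep contigs with FPKM >= 1; rescue others by a BLASTx hit or an ORF >= 100 aa."""
    if c.fpkm >= 1.0:
        return True
    has_hit = c.best_blastx_evalue is not None and c.best_blastx_evalue <= evalue_cutoff
    return has_hit or c.longest_orf_aa >= 100
```

Keeping the FPKM filter and the rescue criteria in one predicate mirrors the two-step procedure in the text (filter, then re-integrate) without having to materialize the discarded set.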

SNP marker design

The B. azoricus transcriptome was chosen as the reference for inter- and intra-specific SNP detection. Cleaned tissue-specific paired-end reads of each individual were first aligned with Bowtie2 v2.1.0 in "very-sensitive" mode. Subsequently, Picard Tools v1.93 (http://picard.sourceforge.net) were used to mark PCR duplicates and merge the tissue-specific BAM files of each specimen. To prevent false mismatches caused by misalignment around indels, reads were locally realigned with the IndelRealigner tool in GATK v3.0.0 [S13]. SNP calling was performed using the mpileup2snp command in VarScan v2.3.7 [S14, S15]. MPILEUP files were created from each realigned BAM file with SAMtools v1.1 [S16], applying the following settings: -C 50 -E -d 700 -Q 20 -q 10. SNPs were called if the respective position had a minimum read depth and base quality of 30, if each variant was supported by at least 15 reads and if the variant allele had a minimum frequency of 5%. The p-value threshold for variant calling was set to 0.05, while the minimum frequency for homozygote calls was fixed at 0.80. Variants with more than 90% support on one strand were ignored. Raw SNP outputs for each individual were checked for false-positive calls with VarScan's "fpfilter.pl" script, and all unique variants that passed the filter were saved in a list. The SNP calling procedure was then repeated on the whole data set containing all individuals, and the consensus between this approach and the pass list was taken as the true SNP set. This set was further filtered with Reads2SNP v2.0 [S17, S18] to remove potential paralogous variants (parameters: -min 15 -th1 0.95 -par 1 -th2 0.05 -nbth 16 -aeb -tol -spa -fis 0.0 -opt newton -rlg 50 -bqt 30 -rgt 10 -rleading 10 -rtrailing 20). Putative species-diagnostic SNPs were identified with the vcf-contrast command in VCFtools v0.1.12a [S19], based on the NOVELAL description in the resulting VCF files. The final marker design focused only on bi-allelic SNPs with a BLAST annotation, which yielded 80160 SNP variants. A pre-selection of likely diagnostic and non-diagnostic contigs containing these variants was blasted against a draft genome of B. azoricus (T. Takeshi, unpublished dataset, Bathymodiolus genome JST project, coord. N. Satoh and A. Tanguy) that had been made accessible to us, in order to validate the uniqueness of the SNP regions and define exon-intron boundaries for primer design. Cross-species amplifiability was evaluated based on the presence of conserved regions using IGV v2.3 [S20]. Primers were designed in the Fluidigm D3 system (San Francisco, USA; Table S2).
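The per-site calling thresholds listed above can be expressed as a single filter. The following sketch is a hypothetical restatement for readers, not VarScan's implementation; the Site record, the reading of strand bias as the forward-strand fraction of variant-supporting reads, and the inclusive comparisons are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Site:
    depth: int               # reads covering the position
    mean_base_qual: float    # mean base quality at the position
    alt_reads: int           # reads supporting the variant allele
    alt_freq: float          # variant allele frequency (0-1)
    p_value: float           # significance of the variant call
    alt_fwd_fraction: float  # fraction of variant reads on the forward strand


def passes_thresholds(s: Site) -> bool:
    """Depth and base quality >= 30, >= 15 variant reads, VAF >= 5%, p <= 0.05,
    and no more than 90% of variant support on a single strand."""
    strand_ok = 0.10 <= s.alt_fwd_fraction <= 0.90
    return (s.depth >= 30 and s.mean_base_qual >= 30 and s.alt_reads >= 15
            and s.alt_freq >= 0.05 and s.p_value <= 0.05 and strand_ok)


def genotype(alt_freq: float, hom_min_freq: float = 0.80,
             min_var_freq: float = 0.05) -> str:
    """Homozygous-variant call once the variant frequency reaches 0.80."""
    if alt_freq >= hom_min_freq:
        return "hom-alt"
    return "het" if alt_freq >= min_var_freq else "hom-ref"
```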

MSAT marker design

MSATs were identified in the B. azoricus de novo transcriptome using TRF v4.07b [S21]. Weights for match, mismatch and indels were set to 2, 7 and 7, respectively, while detection parameters were left at their defaults. All detected MSATs with period sizes > 4 were discarded. To ensure amplifiability across species, we compared the MSAT contigs to all other Bathymodiolus transcriptomes using NCBI BLAST v2.2.29+ [S22] with an e-value threshold of 1e-20 and kept only those that had a significant hit in at least 2 out of 4 transcriptomes. This approach yielded 83 potential MSAT candidates. We used the Geneious alignment algorithm in Geneious v8.1.3 (http://www.geneious.com; [S23]) to align the MSAT contigs and search for conserved regions within 150 bp upstream and downstream of the repeat motif. Primer design was done in Primer3Web v4.0.0 [S24] with default settings, except that the minimum annealing temperature was lowered to 54°C and the maximum temperature difference between forward and reverse primers was set to 1.5°C. Positions that contained SNPs or indels or were outside the 150 bp flanking regions were excluded. Under these conditions, primers could be designed for 21 gene regions. Primer pairs were preliminarily triplexed into seven pools based on expected fragment lengths and annealing temperatures. According to these theoretical groupings, unlabeled reverse primers and 5'-fluorescently labeled forward primers (6-FAM, HEX or NED) were ordered from Metabion (Planegg, Germany) and Applied Biosystems (Warrington, Cheshire, UK). After testing primers in single reactions and checking for polymorphism in about 24 individuals per species, we kept 9 primer pairs, which were arranged in three pools (Table S3).
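The two candidate filters applied before primer design (a period size of at most 4 and significant BLAST hits in at least 2 of the 4 transcriptomes) can be sketched as follows. MsatCandidate, keep_candidate and filter_candidates are hypothetical names; the per-transcriptome best e-values are assumed to come from a prior BLAST search, and whether the B. azoricus self-hit counts toward the 2-of-4 criterion simply depends on which transcriptomes are supplied.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class MsatCandidate:
    contig: str
    period: int  # repeat unit length reported by TRF
    best_evalue: Dict[str, float] = field(default_factory=dict)  # per transcriptome


def keep_candidate(c: MsatCandidate, max_period: int = 4,
                   evalue_cutoff: float = 1e-20, min_hits: int = 2) -> bool:
    """Keep repeats with period <= 4 that hit at least 2 transcriptomes."""
    if c.period > max_period:
        return False
    hits = sum(1 for e in c.best_evalue.values() if e <= evalue_cutoff)
    return hits >= min_hits


def filter_candidates(cands: List[MsatCandidate]) -> List[str]:
    """Contig IDs of candidates that pass both filters, in input order."""
    return [c.contig for c in cands if keep_candidate(c)]
```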

DNA extraction

DNA was extracted with the DNeasy Blood & Tissue Plate Kit (Qiagen, Germany) according to the manufacturer's instructions, with one modification: to increase DNA yield, we performed a second elution step with the previously isolated DNA. DNA concentration and purity were measured on a NanoDrop ND-1000 spectrophotometer (Peqlab, Germany).

SNP genotyping

SNPs were genotyped following the Fluidigm 96×96 SNP Type Genotyping protocol with optimized reagent volumes. Prior to end-point fluorescence detection on the Biomark HD system (Fluidigm, USA), we performed a pre-amplification step with 16 cycles to ensure that sufficient DNA was available for each reaction. In each chip run, we included two negative controls to enable data normalization. SNP genotypes were called in the Fluidigm SNP Genotyping Analysis software using the recommended SNP Type Normalization method. All genotype calls were inspected visually and manually adjusted if necessary.

MSAT genotyping

MSATs were amplified in 10 µl triplex reactions containing 2–4 pmol of each primer, 5 µl Multiplex PCR Master Mix (Qiagen, Germany), 1 µl template DNA and a variable amount of HPLC-grade H2O. A hot start at 95°C for 15 min was followed by 26 cycles of 94°C for 30 s, 56°C for 30 s and 72°C for 1 min on Veriti Thermal Cyclers (Applied Biosystems, Germany). The final extension was done at 72°C for 20 min. One microliter of each PCR product was mixed with 8.75 µl Hi-Di formamide and 0.25 µl GeneScan 350 ROX Size Standard (Applied Biosystems, Germany) and then denatured for 2 min at 95°C. Capillary electrophoresis was run on an ABI 3130xl Genetic Analyzer (Applied Biosystems, Germany). MSAT genotypes were scored with GeneMarker v1.91 [S25] using default settings in the run wizard. Size standards for each sample were checked for correctness and adjusted by hand if necessary. Bins were constructed for each MSAT, taking into account period size, peak frequency and peak height.
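Allele bins for MSATs are commonly spaced by the repeat period. The sketch below snaps raw fragment sizes to the nearest bin of the form anchor + k × period, using the smallest fragment size of a locus in Table S3 as a hypothetical anchor. It does not reproduce GeneMarker's binning, ignores off-ladder and imperfect-repeat alleles, and relies on Python's round(), which sends exact halves to the nearest even integer. It also checks that each size range in Table S3 spans a whole number of repeat units.

```python
# Size ranges [bp] and repeat periods transcribed from Table S3.
TABLE_S3 = {
    "c27696_g2_i1": (127, 159, 4),
    "c63755_g5_i1": (180, 252, 4),
    "c60427_g1_i2": (183, 210, 3),
    "c61565_g2_i3": (192, 213, 3),
    "c55227_g3_i3": (131, 200, 3),
    "c43280_g1_i1": (135, 204, 3),
    "c42547_g1_i1": (241, 273, 4),
    "c50564_g1_i1": (246, 267, 3),
    "c64270_g3_i1": (159, 189, 3),
}


def assign_bin(size_bp: float, anchor_bp: int, period: int) -> int:
    """Snap a raw fragment size to the nearest bin anchor_bp + k * period."""
    k = round((size_bp - anchor_bp) / period)
    return anchor_bp + k * period


def ranges_consistent(table=TABLE_S3) -> bool:
    """True if every reported size range spans a whole number of repeat units."""
    return all((hi - lo) % period == 0 for lo, hi, period in table.values())
```

For all nine loci in Table S3 the size range is an exact multiple of the period; for example, 127–159 bp at c27696_g2_i1 spans 8 repeats of 4 bp.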

Analysis of mitochondrial ND4

Mitochondrial ND4 was amplified in 15 µl PCR reactions using universal primers [S26, S27] and DreamTaq DNA Polymerase (Fermentas, Germany) as recommended by the manufacturer. PCRs were run following the recommended thermal cycling protocol with 28 cycles and an annealing temperature of 55°C. PCR products were submitted to the Institute of Clinical Molecular Biology (Kiel, Germany) for bi-directional Sanger sequencing on an ABI 3730xl Genetic Analyzer (Applied Biosystems, Germany). Sequence analysis and haplotype encoding were done as described previously [S28].

Genetic structure and differentiation

STRUCTURE v2.3.4 [S29, S30] was used to determine the degree of genetic subdivision along the MAR. The admixture model with correlated allele frequencies was run for $10^{7}$ iterations after a burn-in period of $10^{6}$ iterations. To identify the number of genetic groups K (1–12), we evaluated the posterior probabilities with the ΔK method of Evanno et al. [S31] and inspected all bar plots visually, as suggested by Meirmans [S32]. STRUCTURE plots were converted to vector graphics using "bar_plotter.rb" (http://evolution.unibas.ch/salzburger/software.htm). Pairwise FST values between all sampled localities were computed in Arlequin v3.5.1.2 [S33] using a non-parametric permutation procedure (10000 permutations). Post-hoc corrections were done with the Benjamini-Yekutieli false discovery rate method [S34].
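The Benjamini-Yekutieli step-up procedure controls the false discovery rate under arbitrary dependence by dividing the Benjamini-Hochberg thresholds by the harmonic sum c(m) = 1 + 1/2 + ... + 1/m. A minimal sketch follows; alpha = 0.05 is an assumed default, and with ten sampled localities m would be the 45 pairwise FST tests. The same correction is available in statsmodels as multipletests(..., method="fdr_by").

```python
from typing import List, Sequence


def benjamini_yekutieli(pvals: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """Step-up BY procedure; True marks tests still significant after correction."""
    m = len(pvals)
    if m == 0:
        return []
    c_m = sum(1.0 / i for i in range(1, m + 1))        # harmonic correction factor
    order = sorted(range(m), key=lambda i: pvals[i])   # indices by ascending p
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvals[idx] <= rank * alpha / (m * c_m):
            k_max = rank                               # largest rank meeting its bound
    significant = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= k_max:                              # reject all hypotheses up to k_max
            significant[idx] = True
    return significant
```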

FST-outlier test

We used BayeScan v2.1 [S35] to identify FST-outlier loci under selection, which would bias the estimation of contemporary migration rates. To investigate inter- and intra-specific patterns of selection and to assess the robustness of the estimates, the analysis was run with (1) the combined data set of all sampled sites, (2) data subsets including populations from only two of the previously described species (for all pairwise comparisons), and (3) data subsets including populations from only one species (if multiple sites had been sampled). Outliers were considered significant if they had a q value < 0.1 and were identified both in the combined data set (1) and in at least one of the between-species (2) or within-species (3) data subsets. Parameter settings for each run followed the suggestions in the BayeScan manual.
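The significance rule above (q < 0.1 in the combined run and in at least one between- or within-species subset) is a set intersection. The sketch assumes the q values of each BayeScan run have already been read into dictionaries keyed by locus; the function name is hypothetical and no particular BayeScan output format is assumed.

```python
from typing import Dict, Iterable, Set


def significant_outliers(q_combined: Dict[str, float],
                         q_subsets: Iterable[Dict[str, float]],
                         q_max: float = 0.1) -> Set[str]:
    """Loci with q < q_max in the combined run and in at least one subset run."""
    subset_hits: Set[str] = set()
    for run in q_subsets:
        subset_hits |= {locus for locus, q in run.items() if q < q_max}
    combined_hits = {locus for locus, q in q_combined.items() if q < q_max}
    return combined_hits & subset_hits
```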

Biophysical model

We used the particle tracking tool Ariane [S36] with velocity data from the ocean general circulation model (OGCM) NEMO [S37] to model dispersal between the three known vent sites MG, LS and RB. The configuration contains an eddy-resolving 1/20° grid of the (sub-)polar North Atlantic region, VIKING20 (~30°N to 85°N; [S38, S39]), which is nested within the global 1/4° ORCA025 grid [S40] using a two-way nesting approach [S41]. In the region of interest, the horizontal resolution of the grid is about 4.4 km. In the vertical, 46 depth layers are specified, with a spacing that increases from 6 m at the surface to 250 m in the deep ocean. Characteristic layer thicknesses at the depths of the simulated vents were between 134 m and 237 m. The model seafloor is defined by the ETOPO2 bathymetric database [S42], with a partial-cell representation of the bottom topography [S40]. Vertical mixing is simulated with a 1.5-level turbulent kinetic energy scheme [S43], while discrete bi-Laplacian and isoneutral Laplacian operators are used to model viscosity and diffusion, respectively. The model is initialized with climatological salinity and temperature fields from Levitus et al. [S44]. Following a 30-year spin-up, simulations are forced with atmospheric fluxes based on the CORE2 hindcast [S45, S46] at 6-hourly (wind speed, humidity, and atmospheric temperature), daily (short- and long-wave radiation), and monthly (rain and snow) resolution, including interannual variability for the period 1948–2007. Simulated velocities are stored as 5-daily three-dimensional averages. The VIKING20 model itself has been validated in previous studies [S39, S47] and has been used for dispersal studies in the upper ocean [S48]. To demonstrate the reliability of our model simulations in the area of interest, we compared sea surface height for the investigated years between AVISO satellite data and VIKING20. Both mean and variance fields show that the model reproduces the current structure of the subtropical gyre in sufficient detail, including the Azores Current, which appears as a variance maximum at 34°N (Figure S2). In addition, we checked for the presence of small-scale current variation around the model vent locations. The depths and specific bathymetric settings of the individual vents are clearly crucial. For example, LS lies deep in the rift valley, with connections to the northern and southern flanks of the MAR (Figure S1). As a result, currents and their associated variability are bottom-intensified. The bathymetric setting produces bi-directional flow, with current reversals on short timescales, which matches the characteristics of observations [S49]. Coupling between deep and surface flow is most likely indirect and appears to result from a complex interplay of oceanographic conditions northwest and southeast of the MAR. A more thorough investigation of deep-flow dynamics was beyond the scope of this study. The southernmost vent, RB, might occasionally be enveloped by the Azores Current. In some years, its mesoscale variability seems to be connected to the deep flow and might influence the retention rates of dispersing larvae. However, we see no obvious connection between surface conditions and deep flow.
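The quoted horizontal resolution of about 4.4 km follows from the 1/20° grid spacing at the vent latitudes, since one degree of longitude spans about 111.2 km times cos(latitude). The sketch assumes a spherical Earth (R = 6371 km) and a near-isotropic, Mercator-type grid, so that the zonal spacing is representative of both horizontal directions; it is not derived from the VIKING20 configuration files, and the function name is hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical-Earth assumption


def zonal_spacing_km(lat_deg: float, resolution_deg: float = 1 / 20) -> float:
    """East-west distance spanned by one grid cell of resolution_deg at lat_deg."""
    km_per_degree = math.pi * EARTH_RADIUS_KM / 180.0
    return km_per_degree * resolution_deg * math.cos(math.radians(lat_deg))


# Latitudes of the three modeled vents from Table S1, in decimal degrees.
VENTS = {"MG": 37.845, "LS": 37.283, "RB": 36.233}

for name, lat in VENTS.items():
    print(f"{name}: {zonal_spacing_km(lat):.2f} km")
```

At the latitudes of MG, LS and RB this gives roughly 4.39 to 4.48 km, consistent with the value stated above.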

Calculation of particle starting positions and dispersal probabilities

At the East Pacific Rise, larvae can be entrained in rising hydrothermal plumes, which results in an inverse relationship between larval abundance and distance from the seafloor [S50]. The observations for the bivalve Bathymodiolus thermophilus (Figure 3 in [S50]; on-vent pattern represented by black bars) were best described by a power function, $y = 0.1619\,x^{-0.377}$ ($R^2 = 0.79$), where y denotes the number of larvae per m³ at height x in m above the bottom. In our study, larvae were released at positions within 400 m above the seafloor, the maximum height of hydrothermal plumes in the Atlantic [S51]. After release, larvae were allowed to drift at any depth with the three-dimensional ocean currents, and no assumptions about mortality were made at any point in the simulations. Larval dispersal probabilities were computed with the FORTRAN script "tcdfprob.f90" [S52] using a horizontal resolution of 0.1° and a vertical resolution of 10 m.
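The text does not specify how the fitted power law was turned into particle release positions. One possibility, sketched below under that assumption, is to draw release heights with probability proportional to y(x) on (0, 400] m by inverse-transform sampling: taking the exponent as b = −0.377 (the inverse relationship described above), the density is proportional to x^b, its CDF is (x/H)^(1+b), and therefore x = H·u^(1/(1+b)) for u uniform on (0, 1]. The gridding function only mimics the 0.1° × 10 m resolution quoted for "tcdfprob.f90" by counting particles per cell; it is not that script, and all names are hypothetical.

```python
import random
from collections import Counter
from typing import Dict, Iterable, List, Tuple

A, B = 0.1619, -0.377  # assumed fit y = A * x**B (larvae per m^3, x in m above bottom)
H_MAX = 400.0          # maximum release height above the seafloor [m]


def larval_density(x_m: float) -> float:
    """Fitted larval abundance at height x_m above the bottom (x_m > 0)."""
    return A * x_m ** B


def release_height(u: float) -> float:
    """Inverse CDF of a density proportional to x**B on (0, H_MAX]; u in (0, 1]."""
    return H_MAX * u ** (1.0 / (1.0 + B))


def sample_heights(n: int, seed: int = 1) -> List[float]:
    rng = random.Random(seed)
    # 1 - random() lies in (0, 1], so a height of exactly 0 m is never drawn.
    return [release_height(1.0 - rng.random()) for _ in range(n)]


def gridded_probability(positions: Iterable[Tuple[float, float, float]],
                        dlon: float = 0.1, dlat: float = 0.1,
                        dz: float = 10.0) -> Dict[Tuple[int, int, int], float]:
    """Fraction of particles in each (lon, lat, depth) cell of dlon x dlat x dz."""
    counts = Counter((int(lon // dlon), int(lat // dlat), int(z // dz))
                     for lon, lat, z in positions)
    total = sum(counts.values())
    return {cell: n / total for cell, n in counts.items()}
```

Under this assumption, half of the particles would be released within about 131 m of the seafloor, because release_height(0.5) = 400 × 0.5^(1/0.623).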

Supplemental References

S1. Sayavedra, L., Kleiner, M., Ponnudurai, R., Wetzel, S., Pelletier, E., Barbe, V., Satoh, N., Shoguchi, E., Fink,

D., Breusing, C., et al. (2015). Abundant toxin-related genes in the genomes of beneficial symbionts from

deep-sea hydrothermal vent mussels. eLife 4, e07966.

S2. Francis, W.R., Christianson, L.M., Kiko, R., Powers, M.L., Shaner, N.C., and Haddock, S.H.D. (2013). A

comparison across non-model animals suggests an optimal sequencing depth for de novo transcriptome

assembly. BMC Genomics 14, 167.

S3. Del Fabbro, C., Scalabrin, S., Morgante, M., and Giorgi, F.M. (2013). An extensive evaluation of read

trimming effects on Illumina NGS data analysis. PLOS ONE 8, e85024.

S4. MacManes, M.D. (2014). On the optimal trimming of high-throughput mRNA sequence data. Front. Genet. 5,

13.

S5. Dodt, M., Roehr, J.T., Ahmed, R., and Dieterich, C. (2012). FLEXBAR‒Flexible barcode and adapter

processing for next-generation sequencing platforms. Biology 1, 895–905.

S6. Van Gurp, T.P., McIntyre, L.M., and Verhoeven, K.J.F. (2013). Consistent errors in first strand cDNA due to

random hexamer mispriming. PLOS ONE 8, e85583.

S7. Schmieder, R., Lim, Y.W., and Edwards, R. (2012). Identification and removal of ribosomal RNA sequences

from metatranscriptomes. Bioinformatics 28, 433–435.

S8. Langmead, B., and Salzberg, S.L. (2012). Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–

359.

S9. Grabherr, M.G., Haas, B.J., Yassour, M., Levin, J.Z., Thompson, D.A., Amit, I., Adiconis, X., Fan, L.,

Raychowdhury, R., Zeng, Q., et al. (2011). Full-length transcriptome assembly from RNA-Seq data without a

reference genome. Nat. Biotechnol. 29, 644–652.

S10. Li, B., and Dewey, C.N. (2011). RSEM: accurate transcript quantification from RNA-Seq data with or

without a reference genome. BMC Bioinformatics 12, 323.

S11. Li, W., and Godzik, A. (2006). Cd-hit: a fast program for clustering and comparing large sets of protein or

nucleotide sequences. Bioinformatics 22, 1658–1659.

S12. Parra, G., Bradnam, K., and Korf, I. (2007). CEGMA: a pipeline to accurately annotate core genes in

eukaryotic genomes. Bioinformatics 23, 1061–1067.

S13. DePristo, M.A., Banks, E., Poplin, R., Garimella, K.V., Maguire, J.R., Hartl, C., Philippakis, A.A., del Angel,

G., Rivas, M.A., Hanna, M., et al. (2011). A framework for variation discovery and genotyping using next-

generation DNA sequencing data. Nat. Genet. 43, 491–498.

S14. Koboldt, D.C., Chen, K., Wylie, T., Larson, D.E., McLellan, M.D., Mardis, E.R., Weinstock, G.M., Wilson,

R.K., and Ding, L. (2009). VarScan: variant detection in massively parallel sequencing of individual and

pooled samples. Bioinformatics 25, 2283–2285.

S15. Koboldt, D.C., Zhang, Q., Larson, D.E., Shen, D., McLellan, M.D., Lin, L., Miller, C.A., Mardis, E.R., Ding,

L., and Wilson, R.K. (2012). VarScan 2: somatic mutation and copy number alteration discovery in cancer by

exome sequencing. Genome Res. 22, 568–576.

S16. Li, H., Handsaker, B., Wysoker, A., Fennell, T., Ruan, J., Homer, N., Marth, G., Abecasis, G., Durbin, R.,

and 1000 Genome Project Data Processing Subgroup (2009). The Sequence Alignment/Map format and

SAMtools. Bioinformatics 25, 2078–2079.

S17. Tsagkogeorga, G., Cahais, V., and Galtier, N. (2012). The population genomics of a fast evolver: high levels of diversity, functional constraint, and molecular adaptation in the tunicate Ciona intestinalis. Genome Biol. Evol. 4, 740–749.

S18. Gayral, P., Melo-Ferreira, J., Glémin, S., Bierne, N., Carneiro, M., Nabholz, B., Lourenco, J.M., Alves, P.C., Ballenghien, M., Faivre, N., et al. (2013). Reference-free population genomics from next-generation transcriptome data and the vertebrate-invertebrate gap. PLOS Genet. 9, e1003457.

S19. Danecek, P., Auton, A., Abecasis, G., Albers, C.A., Banks, E., DePristo, M.A., Handsaker, R.E., Lunter, G., Marth, G.T., Sherry, S.T., et al. (2011). The variant call format and VCFtools. Bioinformatics 27, 2156–2158.

S20. Robinson, J.T., Thorvaldsdóttir, H., Winckler, W., Guttman, M., Lander, E.S., Getz, G., and Mesirov, J.P. (2011). Integrative genomics viewer. Nat. Biotechnol. 29, 24–26.

S21. Benson, G. (1999). Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res. 27, 573–580.

S22. Altschul, S.F., Gish, W., Miller, W., Myers, E.W., and Lipman, D.J. (1990). Basic local alignment search tool. J. Mol. Biol. 215, 403–410.

S23. Kearse, M., Moir, R., Wilson, A., Stones-Havas, S., Cheung, M., Sturrock, S., Buxton, S., Cooper, A., Markowitz, S., Duran, C., et al. (2012). Geneious Basic: an integrated and extendable desktop software platform for the organization and analysis of sequence data. Bioinformatics 28, 1647–1649.

S24. Untergasser, A., Cutcutache, I., Koressaar, T., Ye, J., Faircloth, B.C., Remm, M., and Rozen, S.G. (2012). Primer3‒new capabilities and interfaces. Nucleic Acids Res. 40, e115.

S25. Hulce, D., Li, X., Snyder-Leiby, T., and Liu, C.S.J. (2011). GeneMarker® Genotyping Software: tools to increase the statistical power of DNA fragment analysis. J. Biomol. Tech. 22 (Suppl), S35–S36.

S26. Arévalo, E., Davis, S.K., and Sites, J.W. (1994). Mitochondrial DNA sequence divergence and phylogenetic relationships among eight chromosome races of the Sceloporus grammicus complex (Phrynosomatidae) in central Mexico. Syst. Biol. 43, 387–418.

S27. Bielawski, J.P., and Gold, J.R. (1996). Unequal synonymous substitution rates within and between two protein-coding mitochondrial genes. Mol. Biol. Evol. 13, 889–892.

S28. Breusing, C., Johnson, S.B., Tunnicliffe, V., and Vrijenhoek, R.C. (2015). Population structure and connectivity in Indo-Pacific deep-sea mussels of the Bathymodiolus septemdierum complex. Conserv. Genet. 16, 1415–1430.

S29. Pritchard, J.K., Stephens, M., and Donnelly, P. (2000). Inference of population structure using multilocus genotype data. Genetics 155, 945–959.

S30. Falush, D., Stephens, M., and Pritchard, J.K. (2003). Inference of population structure using multilocus genotype data: linked loci and correlated allele frequencies. Genetics 164, 1567–1587.

S31. Evanno, G., Regnaut, S., and Goudet, J. (2005). Detecting the number of clusters of individuals using the software STRUCTURE: a simulation study. Mol. Ecol. 14, 2611–2620.

S32. Meirmans, P.G. (2015). Seven common mistakes in population genetics and how to avoid them. Mol. Ecol. 24, 3223–3231.

S33. Excoffier, L., and Lischer, H.E.L. (2010). Arlequin suite ver 3.5: a new series of programs to perform population genetics analyses under Linux and Windows. Mol. Ecol. Resour. 10, 564–567.

S34. Benjamini, Y., and Yekutieli, D. (2001). The control of the false discovery rate in multiple testing under dependency. Ann. Stat. 29, 1165–1188.

S35. Foll, M., and Gaggiotti, O. (2008). A genome-scan method to identify selected loci appropriate for both dominant and codominant markers: a Bayesian perspective. Genetics 180, 977–993.

S36. Blanke, B., Arhan, M., Madec, G., and Roche, S. (1999). Warm water paths in the equatorial Atlantic as diagnosed with a general circulation model. J. Phys. Oceanogr. 29, 2753–2768.

S37. Madec, G. (2008). NEMO ocean engine. Technical Report 27. Note du Pôle de modélisation (France: Institut Pierre-Simon Laplace), ISSN No 1288-1619.

S38. Behrens, E. (2013). The oceanic response to Greenland melting: the effect of increasing model resolution. Dissertation, Christian-Albrechts-Universität zu Kiel, 166 pp.

S39. Böning, C.W., Behrens, E., Biastoch, A., and Bamber, J.L. (2016). Emerging impact of Greenland meltwater on deepwater formation in the North Atlantic Ocean. Nat. Geosci. doi: 10.1038/NGEO2740.

S40. Barnier, B., Madec, G., Penduff, T., Molines, J.-M., Treguier, A.-M., Le Sommer, J., Beckmann, A., Biastoch, A., Böning, C., Dengg, J., et al. (2006). Impact of partial steps and momentum advection schemes in a global ocean circulation model at eddy-permitting resolution. Ocean Dyn. 56, 543–567.

S41. Debreu, L., Vouland, C., and Blayo, E. (2008). AGRIF: Adaptive grid refinement in Fortran. Comput. Geosci. 34, 8–13.

S42. National Geophysical Data Center (2006). 2-minute Gridded Global Relief Data (ETOPO2) v2. National Geophysical Data Center, NOAA. doi:10.7289/V5J1012Q.

S43. Blanke, B., and Delecluse, P. (1993). Variability of the tropical Atlantic Ocean simulated by a general circulation model with two different mixed-layer physics. J. Phys. Oceanogr. 23, 1363–1388.

S44. Levitus, S., Boyer, T.P., Conkright, M.E., Johnson, D., O'Brien, T., Antonov, J., Stephens, C., and Gelfeld, R. (1998). NOAA Atlas NESDIS 18, World Ocean Database 1998. Volume 1: Introduction. U.S. Gov. Printing Office, Washington, D.C., 346 pp.

S45. Large, W.G., and Yeager, S. (2009). The global climatology of an interannually varying air-sea flux data set. Climate Dyn. 33, 341–364.

S46. Griffies, S.M., Biastoch, A., Böning, C., Bryan, F., Danabasoglu, G., Chassignet, E.P., England, M.H., Gerdes, R., Haak, H., Hallberg, R.W., et al. (2009). Coordinated Ocean-ice Reference Experiments (COREs). Ocean Model. 26, 1–46.

S47. Mertens, C., Rhein, M., Walter, M., Böning, C.W., Behrens, E., Kieke, D., Steinfeldt, R., and Stöber, U. (2014). Circulation and transports in the Newfoundland Basin, western subpolar North Atlantic. J. Geophys. Res. 119, 7772–7793.

S48. Baltazar-Soares, M., Biastoch, A., Harrod, C., Hanel, R., Marohn, L., Prigge, E., Evans, D., Bodles, K., Behrens, E., Böning, C.W., and Eizaguirre, C. (2014). Recruitment collapse and population structure of the European eel shaped by local ocean current dynamics. Curr. Biol. 24, 104–108.

S49. Khripounoff, A., Comtet, T., Vangriesheim, A., and Crassous, P. (2000). Near-bottom biological and mineral particle flux in the Lucky Strike hydrothermal vent area (Mid-Atlantic Ridge). J. Mar. Syst. 25, 101–118.

S50. Mullineaux, L.S., Mills, S.W., Sweetman, A.K., Beaudreau, A.H., Metaxas, A., and Hunt, H.L. (2005). Vertical, lateral and temporal structure in larval distributions at hydrothermal vents. Mar. Ecol. Prog. Ser. 293, 1–16.

S51. Speer, K.G., and Rona, P.A. (1989). A model of an Atlantic and Pacific hydrothermal plume. J. Geophys. Res. 94, 6213–6220.

S52. Gary, S.F., Lozier, M.S., Biastoch, A., and Böning, C.W. (2012). Reconciling tracer and float observations of the export pathways of Labrador Sea Water. Geophys. Res. Lett. 39, L24606.