Mining cyanobacterial genomes for genes encoding complex biosynthetic pathways
REVIEW www.rsc.org/npr | Natural Product Reports
Downloaded by University of Wisconsin - Madison on 20/05/2013 01:52:54.
Published on 14 September 2009 on http://pubs.rsc.org | doi:10.1039/B817074F
Mining cyanobacterial genomes for genes encoding complex biosynthetic pathways†
John A. Kalaitzis, Federico M. Lauro and Brett A. Neilan*
Received 19th June 2009
First published as an Advance Article on the web 14th September 2009
DOI: 10.1039/b817074f
Covering: up to February 2009
This review describes the mining of cyanobacterial genomes for the discovery of natural products and their biosynthetic pathways. An overview of the genetic basis of natural product biosynthesis in cyanobacteria is also presented. It includes 143 references.
1 Introduction to cyanobacteria
2 Cyanobacteria-derived natural product diversity
3 Biosynthesis of cyanobacteria-derived natural products
4 Other characteristics of cyanobacterial natural product biosynthesis
4.1 Epibiosynthetic tailoring
4.2 Natural product transporters
4.3 Transposition and recombination of biosynthesis genes
4.4 Molecular regulation of toxin biosynthesis in cyanobacteria
5 Cloning and characterisation of biosynthesis gene clusters in cyanobacteria
5.1 Natural product biosynthesis gene clusters from brackish/freshwater strains
5.1.1 Macrocycles
5.1.2 Toxic alkaloids
5.2 Natural product biosynthesis gene clusters from marine Lyngbya spp.
5.2.1 Barbamide and curacin A
5.2.2 Jamaicamides
5.2.3 Lyngbyatoxins
5.2.4 Hectochlorin
6 Genome mining
6.1 Introduction to genome mining
6.2 Genome mining techniques
6.3 Illustrative examples of genome mining in cyanobacteria
6.3.1 Patellamides
6.3.2 Trichamide
6.3.3 Microcyclamides
6.3.4 Scytonemin
7 Mining sequenced cyanobacterial genomes for biosynthesis gene clusters
7.1 Our approach
7.2 Summary of our findings
7.2.1 Microcystis aeruginosa NIES-843
7.2.2 Nostoc sp. PCC7120
7.2.3 Gloeobacter violaceus PCC7421
7.2.4 Nostoc punctiforme PCC73102
7.2.5 Anabaena variabilis ATCC29413
8 Discussion
9 Concluding remarks
10 Acknowledgements
11 References
School of Biotechnology and Biomolecular Sciences, The University of New South Wales, Sydney, NSW, 2052, Australia. E-mail: [email protected]; Fax: +61 2 93851483; Tel: +61 2 93853235
† This article is part of a themed issue on genomics.
This journal is © The Royal Society of Chemistry 2009
1 Introduction to cyanobacteria
Cyanobacteria are among Earth’s oldest life forms, and stromatolites are the fossilised evidence of their metabolism.1 Stromatolites are layered carbonate deposits, either branching or dome-shaped. They have formed extensive reef-like structures that date back to the Precambrian, the same period that yields the earliest evidence of atmospheric oxygen.2 The ancestors of cyanobacteria are considered the inventors of oxygenic photosynthesis, and through this process cyanobacteria today contribute up to 30% of Earth’s oxygen. As primary producers and nitrogen fixers, they also strongly influence Earth’s carbon and nitrogen cycles. These prokaryotic photoautotrophs are therefore essential for all higher life.3–6
Cyanobacteria are broadly described as photosynthetic bacteria containing chlorophyll a and accessory pigments. They are Gram-negative, oxygenic and facultatively aerobic. They exist as single cells, colonies or filaments, and they lack membrane-bound organelles. Cyanobacterial cells may differentiate into a number of metabolic or reproductive structures, and actively growing samples may also possess a sheath, which may be pigmented.7 Growth is typically seen under conditions of neutral to alkaline pH and moderate levels of light and warmth.7 However, a long evolutionary history has allowed cyanobacteria to adapt to, and inhabit, many extreme and diverse environments,8 such as arid desert soils, thermal springs, rocks, ice, marine, brackish and fresh waters, and plants and animals. The immense diversity within this group of microorganisms is reflected not only in their variable morphology and range of habitats but also in the range of natural products they produce. Cyanobacteria have evolved to produce a diverse array of secondary metabolites that have aided species survival in these varied and highly competitive ecological niches. Cyanobacteria are commonly associated with the toxic blooms encountered in many eutrophic fresh and brackish waters. They are widely known for their potential to produce a range of neurotoxic, hepatotoxic and tumour-promoting secondary metabolites.9
Nat. Prod. Rep., 2009, 26, 1447–1465 | 1447
2 Cyanobacteria-derived natural product diversity
The structural diversity and biological activities of cyanobacteria-derived natural products (secondary metabolites) have been reviewed in great depth elsewhere.10–13 Recent reviews covering specific topics, such as “Biogenetic diversity of cyanobacterial metabolites”14 and “Bioactive natural products from marine cyanobacteria for drug discovery”,15 should be consulted for a complete overview of the chemistry of cyanobacteria-derived natural products. Natural products isolated specifically from marine cyanobacteria have been reviewed annually in this journal, by Faulkner from 1984 (ref. 16) to 2002 (ref. 17), and by Blunt and co-authors from 2003 (ref. 18) onwards.19
John A. Kalaitzis
John Kalaitzis obtained a Bachelor of Applied Science degree with First Class Honours in Chemistry from the University of Western Sydney. He then obtained a PhD from Griffith University (Brisbane, Australia), where he conducted natural products research under the supervision of Prof. R. J. Quinn. During his postdoctoral studies with Prof. B. S. Moore at the University of Arizona and Scripps Institution of Oceanography, he investigated biosynthetic pathways in marine actinomycetes and was introduced to genome mining. In 2007 he moved to the University of New South Wales as a Research Fellow of the Environmental Microbiology Initiative. There he works closely with Prof. Brett Neilan and Dr. Federico Lauro, exploring biosynthetic pathways in microorganisms from extreme environments.
Federico M. Lauro
Federico M. Lauro received his undergraduate education in Molecular Biology and Microbiology from the University of Padova, Italy, where he obtained a ‘Laurea in Biologia’. He then continued working as a project scientist at the Department of Microbiology and Immunology of the University of Padova in the group of Prof. Giulio Bertoloni. In 2000 he moved to San Diego, California, to study at the Scripps Institution of Oceanography. He received his PhD in oceanography in 2007 for characterizing genetic adaptations to elevated hydrostatic pressure in deep-sea bacteria. The same year he joined the Environmental Microbiology Initiative at the University of New South Wales. His research focuses on developing bioinformatics approaches to characterize microbial genomes in the context of evolution, biochemical adaptations, and ecology.
Because this review focuses on biosynthesis and genome mining, a detailed account of cyanobacteria-derived natural products is beyond its scope. Nevertheless, a brief overview of some aspects of cyanobacterial products is appropriate.
Most cyanobacterial natural products belong to the nitrogen-containing lipopeptide, cyclic peptide and alkaloid classes. Cyanobacteria also produce polyketides, terpenoids and hybrids of all the aforementioned classes, and these unusual hybrid molecules are most often the focus of biosynthetic studies. Owing to the immense chemical diversity of cyanobacterial products, further classifications based on specific structural variants have been proposed.
In general, bacteria are considered prolific producers of novel and bioactive chemical entities, and they are a primary source of compounds with potential therapeutic benefits, such as antibiotics.20,21 Cyanobacteria, however, are better known as producers of highly toxic compounds (cyanotoxins). Most of these are commonly grouped according to their physiological effects:
- cytotoxins (e.g. cryptophycins, dolastatins, symplostatins)
- neurotoxins (e.g. anatoxins, saxitoxins)
- hepatotoxins (e.g. microcystins, nodularins)
- irritants and gastrointestinal toxins (e.g. aplysiatoxins and lyngbyatoxin)
Some of these are discussed in more detail in the following sections.22,23
Brett A. Neilan
Brett Neilan is a molecular biologist and an expert in the study of toxic cyanobacteria. He obtained a Bachelor of Applied Science degree in Biomedical Science (1985) at the University of Technology, Sydney, and then worked as a medical researcher, hospital scientist and forensic biologist. He obtained his PhD in microbial and molecular biology from UNSW in 1995 and was awarded an Alexander von Humboldt Fellowship, which allowed him to conduct postdoctoral studies (Berlin) on non-ribosomal peptide biosynthesis genetics. The continuation of this early work has become the basis for current studies regarding the search for microbial natural products in novel environments, including Antarctica, the hypersaline lagoon of Shark Bay, Western Australia, and Indonesian volcanoes. In 2008 he was awarded a prestigious Australian Research Council Federation Fellowship.
3 Biosynthesis of cyanobacteria-derived natural products
To date, biosynthetic studies of cyanobacterial natural products have largely focussed on two groups: compounds that are toxic to humans, and compounds thought to derive from complex biosynthetic pathways that give rise to unique chemical structures. Novel compounds continue to be reported from various cyanobacteria, but biosynthetic studies are far less common, mainly because of the technical difficulties of working with these organisms.
Historically, complex biosynthetic pathways in cyanobacteria were elucidated by feeding isotopically labelled precursors to laboratory-cultured microorganisms. Feeding studies do not always allow the entire biosynthetic route to be elucidated or proved conclusively, but they were the standard approach in biosynthesis-targeted research. One of the earliest reported examples (1971) of a feeding study in cyanobacteria was by Botham and Pennock, who investigated the biosynthesis of tocopherols in Anabaena variabilis using radiolabelled precursors.24 The use of NMR spectroscopy as the preferred analytical tool for deciphering the biosynthesis of 13C-labelled cyanobacterial metabolites was pioneered by Moore and co-workers in the early 1990s. They used it to investigate the biosynthesis of anatoxin A 1 in Anabaena flos-aquae NRC525-17 (Scheme 1), and of the [7.7]paracyclophanes nostocyclophane D 2 and cylindrocyclophane D 3 in Nostoc linckia and Cylindrospermum licheniforme, respectively (Scheme 2).25,26
Today, molecular genetic techniques coupled with bioinformatic analyses have greatly aided the elucidation of natural product biosynthesis pathways and have dramatically changed the way this research is conducted. Together with isotope feeding studies, these techniques have enabled not only the elucidation of novel pathways but also the discovery of novel products of orphan pathways (gene clusters for which no product is yet known).27
Before discussing the modern techniques used to elucidate biosynthesis pathways, a brief introduction to the genetic machinery that encodes complex biosynthetic pathways is necessary. This review largely focuses on pathways derived from thiotemplate modular systems, such as nonribosomal peptide synthetases (NRPS) and polyketide synthases (PKS). Other biosynthetic pathways are not discussed in depth.
Scheme 1
Scheme 2
Nonribosomal peptides, polyketides and hybrids thereof are biosynthesised by multifunctional enzyme complexes. These complexes sequentially assemble small carboxylate- and amino acid-derived building blocks into their products, much like an assembly line.28 These megasynthases generally follow a co-linearity rule: the order of modules along the synthase corresponds to the order of building blocks in the product. As a result, the chemical structures of unknown natural products can be predicted from gene sequences.
NRPSs and PKSs share similar architectures, and each module contains a minimum of three enzyme domains. NRPS modules contain:
- an ATP-dependent adenylation (A) domain, which activates a specific or preferred amino acid;
- a peptidyl carrier protein (PCP), which tethers substrates during assembly;
- a condensation (C) domain, which catalyses amide bond formation between PCP-bound substrates.
Likewise, PKS modules contain:
- an acyltransferase (AT) domain, which selects a preferred acyl-CoA thioester substrate;
- an acyl carrier protein (ACP);
- a ketosynthase (KS) domain, which catalyses the condensation of two ACP-bound substrates.
In both systems, each module typically extends the backbone of the molecule by one unit. Modules often also contain additional catalytic domains that tailor the assembling molecule and thereby generate further structural diversity. The assembled molecule is ultimately released from the enzyme complex, usually by a thioesterase (TE) domain, which may also direct cyclisation of the final product.29–31
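Because the co-linearity rule maps modules onto building blocks, it can be applied computationally to an annotated cluster. The Python sketch below is a hypothetical illustration, not a published tool. It assumes that each module has already been annotated with its domains (in N- to C-terminal order) and with a predicted A- or AT-domain substrate; the `Module` record and the example assembly line are invented for this purpose. The sketch checks that each module carries the minimal domain set described above, allowing a loading module to lack its C or KS domain. It then returns the predicted building-block order and whether the last domain is a TE.

```python
from __future__ import annotations

from dataclasses import dataclass

# Minimal domain sets for an elongation module, as described in the text.
NRPS_CORE = {"C", "A", "PCP"}
PKS_CORE = {"KS", "AT", "ACP"}


@dataclass
class Module:
    """One synthase module (hypothetical annotation format)."""
    domains: list[str]  # domain labels in N- to C-terminal order
    substrate: str      # predicted A- or AT-domain specificity


def module_type(module: Module) -> str:
    """Classify a module by its substrate-selecting and carrier domains."""
    domains = set(module.domains)
    if {"A", "PCP"} <= domains:
        return "NRPS"
    if {"AT", "ACP"} <= domains:
        return "PKS"
    return "unknown"


def predict_backbone(modules: list[Module]) -> tuple[list[str], bool]:
    """Apply the co-linearity rule: one building block per module, in module order.

    Returns the predicted building blocks and whether the final domain of the
    last module is a thioesterase (TE) that could release the product.
    """
    backbone: list[str] = []
    for index, module in enumerate(modules):
        kind = module_type(module)
        if kind == "unknown":
            raise ValueError(f"module {index} has no A/PCP or AT/ACP pair")
        core = NRPS_CORE if kind == "NRPS" else PKS_CORE
        missing = core - set(module.domains)
        # Only a loading (first) module may lack its C or KS domain.
        allowed_missing = {"C", "KS"} if index == 0 else set()
        if missing - allowed_missing:
            raise ValueError(f"module {index} ({kind}) lacks {sorted(missing)}")
        backbone.append(module.substrate)
    released_by_te = bool(modules) and modules[-1].domains[-1] == "TE"
    return backbone, released_by_te


if __name__ == "__main__":
    # Hypothetical three-module hybrid assembly line, for illustration only.
    line = [
        Module(["A", "PCP"], "Phe"),
        Module(["KS", "AT", "KR", "ACP"], "malonyl-CoA"),
        Module(["C", "A", "PCP", "TE"], "Leu"),
    ]
    print(predict_backbone(line))
```

Real clusters often break these simplifying assumptions. For example, a module may be split across two proteins, or a cluster may contain modules whose products do not appear in the final structure. A prediction of this kind should therefore be treated as a starting hypothesis.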
Hybrid NRPS/PKS systems, such as those that assemble the cyanobacterial toxins microcystin and nodularin, combine the two types of module. In these systems, the PKS modules extend the structural diversity of the NRPS products, and vice versa.32,33 Hybrid PKS/NRPSs that assemble lipopeptides incorporate short carboxylic acids into the peptide by transferring a β-hydroxy acyl group and condensing it with the NRPS-bound amino acid.34 Alternatively, a β-amino fatty acid can be converted to an ACP-bound β-amino acid via an aminotransferase (AMT) domain, as in microcystin biosynthesis.33 Specific examples of hybrid systems are discussed in more detail below. Recent reviews by Walsh and co-workers should be consulted for an in-depth discussion of the assembly of NRPS- and PKS-derived products.28,31,35
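In genome mining, a simple first step is to decide whether a cluster is NRPS-only, PKS-only or a hybrid, and to flag proteins with an AMT domain of the kind described above. The sketch below is a minimal, hypothetical classifier that assumes domain annotations are already available per protein. The protein names and domain lists in the example are invented and do not correspond to any real synthase. As a design choice, PKS modules are detected by KS plus ACP rather than by an AT domain, so that trans-AT PKSs (which lack embedded AT domains) are not missed.

```python
from __future__ import annotations


def classify_cluster(proteins: dict[str, list[str]]) -> tuple[str, list[str]]:
    """Classify a cluster from per-protein domain annotations.

    Returns the cluster type and the sorted names of proteins carrying an
    aminotransferase (AMT) domain.
    """
    has_nrps = any({"A", "PCP"} <= set(domains) for domains in proteins.values())
    has_pks = any({"KS", "ACP"} <= set(domains) for domains in proteins.values())
    if has_nrps and has_pks:
        kind = "hybrid NRPS/PKS"
    elif has_nrps:
        kind = "NRPS"
    elif has_pks:
        kind = "PKS"
    else:
        kind = "no thiotemplate modules"
    amt_carriers = sorted(name for name, domains in proteins.items() if "AMT" in domains)
    return kind, amt_carriers


if __name__ == "__main__":
    # Invented annotations, for illustration only.
    cluster = {
        "ProtA": ["A", "PCP", "KS", "AT", "ACP"],
        "ProtB": ["KS", "AT", "AMT", "ACP", "C", "A", "PCP"],
        "ProtC": ["C", "A", "PCP", "TE"],
        "ProtD": ["OMT"],
    }
    print(classify_cluster(cluster))
```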
4 Other characteristics of cyanobacterial natural product biosynthesis
4.1 Epibiosynthetic tailoring
Apart from the assembly of the carbon backbone, further
structural and functional diversity observed in cyanobacterial
secondary metabolites is largely due to pre- or post-synthetic
modifications introduced by tailoring enzymes.
As in other microorganisms, common tailoring enzymes include oxygenases, ketoreductases, group transferases (such as glycosyl transferases and methyl transferases), cyclases, and halogenases.36 Such enzymes may be embedded within PKS/NRPS modules and act in cis during chain elongation. Alternatively, they may function as separate subunits or stand-alone enzymes and act in trans. The following is a brief overview of some important tailoring enzymes involved in the complex biosynthesis pathways of cyanobacterial products.
Oxidoreductase reactions are among the most commonly observed tailoring reactions in NRPS and PKS biosynthesis pathways. They are catalysed by a diverse range of enzymes, including oxidases, oxygenases, peroxidases, reductases and dehydrogenases. These reactions can have a dramatic impact on the stereo-electronic and physical properties of molecules: they generate or remove chiral centres, introduce highly reactive functional groups, and interconvert H-bond donor/acceptor sites.36 Monooxygenases typically generate hydroxy groups and epoxides by transferring one atom of molecular oxygen, whereas dioxygenases usually transfer both atoms. Ketoreductases are commonly integrated within PKS modules, although several stand-alone enzymes have also been identified. Post-PKS ketoreductases are quite rare, but they are excellent candidates for the combinatorial biosynthesis of novel compounds.36
D-Amino acids are a common feature of many nonribosomal peptides. These non-proteinogenic building blocks create structural diversity, provide resistance to proteolysis and, in some cases, impose stereochemical constraints.37,38 D-Amino acids are most often generated from L-amino acids that are epimerised during peptide elongation by an embedded epimerisation (E) domain. Such domains are approximately 50 kDa in size and are located immediately downstream of the PCP domain, at the C-terminal end of NRPS modules. E domains can produce a mixture of D- and L-isomers, but there is evidence that C domains immediately downstream of epimerases are D-specific.39 Less commonly, L-amino acids are converted to their D-isomers by stand-alone enzymes, also referred to as racemases, as in cyclosporin production. Notable cyanobacterial examples are the racemases of the microcystin biosynthetic pathway.40
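The rule of thumb above (an embedded E domain implies a D-residue, supported by a D-specific C domain in the next module) can be written as a short check over annotated modules. The sketch below is a hypothetical illustration that assumes each module is given as a list of domain labels plus a predicted residue; the example line is invented. It cannot detect stand-alone racemases such as those in the microcystin pathway, because they do not appear in module domain strings. Where no downstream C domain follows an E domain, it reports only that a D/L mixture is possible.

```python
from __future__ import annotations


def predict_configurations(
    modules: list[tuple[list[str], str]],
) -> list[tuple[str, str]]:
    """Predict residue configurations from NRPS module domain annotations.

    Each module is (domain labels in N- to C-terminal order, predicted residue).
    """
    predictions: list[tuple[str, str]] = []
    for index, (domains, residue) in enumerate(modules):
        if residue == "Gly":
            predictions.append((residue, "achiral"))  # glycine has no stereocentre
            continue
        if "E" not in domains:
            predictions.append((residue, "L"))  # no embedded epimerase
            continue
        e_pos = domains.index("E")
        if e_pos == 0 or domains[e_pos - 1] != "PCP":
            raise ValueError(f"module {index}: E domain does not follow a PCP domain")
        # A C domain opening the next module is expected to be D-specific,
        # selecting the D-isomer from the D/L mixture the E domain can produce.
        next_starts_with_c = (
            index + 1 < len(modules) and modules[index + 1][0][:1] == ["C"]
        )
        predictions.append((residue, "D" if next_starts_with_c else "D/L mixture possible"))
    return predictions


if __name__ == "__main__":
    # Invented five-module NRPS, for illustration only.
    line = [
        (["A", "PCP", "E"], "Phe"),
        (["C", "A", "PCP"], "Pro"),
        (["C", "A", "PCP", "E"], "Ala"),
        (["C", "A", "PCP"], "Gly"),
        (["C", "A", "PCP", "E", "TE"], "Leu"),
    ]
    for residue, config in predict_configurations(line):
        print(residue, config)
```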
A variety of nonribosomally produced cyanobacterial compounds contain methylated amino acid residues. Methylation can greatly alter the structure and biological activity of secondary metabolites: it increases lipophilicity, influences stereochemistry and can introduce chiral centres.36 N-, C- and O-methyltransferases are all cofactor-dependent, meaning that they require a methyl donor for catalysis. The cofactor is usually the highly reactive sulfonium compound S-adenosylmethionine (SAM). Methyltransferases associated with cyanobacterial secondary metabolite pathways are often encoded within the modules of PKS and NRPS genes.
Methylation (Scheme 3) is important to the toxicity of the cyanobacterial hepatotoxins microcystin 4 (the LR form is shown) and nodularin 5. The O-methyltransferase (OMT) domain is thought to transfer a methyl group to the hydroxyl group of the unusual amino acid 3-amino-9-methoxy-2,6,8-trimethyl-10-phenyl-4,6-decadienoic acid (Adda).33,41,42 Previous studies have shown that natural microcystin variants lacking the Adda O-methyl group are weaker inhibitors of protein phosphatases and hence less toxic.43 Furthermore, the recent disruption of the O-methyltransferase gene mcyJ in P. agardhii resulted in the production of a demethylated microcystin variant with similarly reduced toxicity.41 An N-methylated residue in microcystin is also further modified in trans: dehydration of the L-Ser substrate yields N-methyl-dehydroalanine.
Scheme 3
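Tailoring reactions such as these change a metabolite's mass by fixed amounts. For example, loss of the Adda O-methyl group removes CH2, about 14.016 Da. These predictable shifts are what allow tailored variants to be recognised in mass spectrometric data. The sketch below is a minimal illustration under stated assumptions:
- microcystin-LR is taken to have the molecular formula C49H74N10O12;
- the `TAILORING_DELTAS` table is a small hypothetical selection, not an exhaustive list;
- the formula parser handles only simple formulas without brackets, charges or isotope labels.

```python
from __future__ import annotations

import re

# Monoisotopic masses (Da) of the most abundant isotope of each element.
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503207,
    "N": 14.0030740048,
    "O": 15.99491461956,
    "S": 31.97207100,
    "Cl": 34.96885268,
}

# Net elemental change for a few tailoring reactions (illustrative selection).
TAILORING_DELTAS = {
    "methylation": {"C": 1, "H": 2},     # X-H -> X-CH3, net +CH2
    "hydroxylation": {"O": 1},           # C-H -> C-OH, net +O
    "chlorination": {"Cl": 1, "H": -1},  # C-H -> C-Cl
    "dehydration": {"H": -2, "O": -1},   # e.g. Ser -> dehydroalanine, net -H2O
}


def parse_formula(formula: str) -> dict[str, int]:
    """Parse a simple molecular formula such as 'C49H74N10O12' into element counts."""
    counts: dict[str, int] = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts


def monoisotopic_mass(formula: str) -> float:
    """Monoisotopic mass (Da) of a neutral molecule."""
    return sum(MONOISOTOPIC[el] * n for el, n in parse_formula(formula).items())


def mass_shift(reaction: str) -> float:
    """Monoisotopic mass change (Da) caused by one tailoring reaction."""
    return sum(MONOISOTOPIC[el] * n for el, n in TAILORING_DELTAS[reaction].items())


if __name__ == "__main__":
    mc_lr = monoisotopic_mass("C49H74N10O12")  # assumed formula of microcystin-LR
    demethyl = mc_lr - mass_shift("methylation")
    print(f"MC-LR {mc_lr:.4f} Da; O-demethylated variant {demethyl:.4f} Da")
```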
Glycosylation is also a common feature of many NRPS and PKS products. Adding glycosyl residues to nonribosomal peptide and polyketide aglycones can dramatically influence their biological activity. It is critical to the activity of many drugs in clinical use, such as the glycopeptide and macrolide antibiotics.44,45 Glycosyl transferases are relatively rare in cyanobacterial NRPS and PKS pathways, and very few glycosylated cyanobacteria-derived natural products have been reported.
4.2 Natural product transporters
ABC transporters represent one of the largest and most highly conserved protein superfamilies known.46 These proteins are found in bacteria, eukaryotes and archaea.47 They are responsible for the ATP-dependent transport of a vast range of molecules and substrates (allocrites) across intracellular and cell-surface biological membranes.48 ABC transporter genes are encoded within many NRPS and PKS biosynthesis gene clusters, including those found in cyanobacterial genomes. Few bacterial secondary metabolite transporters have been functionally characterised. However, those that have been studied have been shown to confer self-resistance on the producing organisms.49,50
Members of the superfamily also share many structural features, including a highly conserved ABC-ATPase and at least one cognate, but much less conserved, membrane domain. Most ABC-ATPases have a molecular mass of approximately 27 kDa and share more than 30% overall amino acid sequence identity.51 This identity is concentrated in several key regions. As a result, ABC-ATPases are readily recognisable by at least two highly conserved motifs: the Walker A site and the hydrophobic Walker B site.52 Beyond the ATPase, conservation is limited. Prokaryotic import systems possess only a short conserved “EAA” motif (EAA-G-I-LP) in their cytoplasmic loop,53 and the exporters display no significant sequence conservation.54
The most distinctive feature of the ABC transporter phylogenetic tree is its division into two major branches, which represent the export (ABC-A) and import (ABC-B) systems.46 Phylogenetic analysis also reveals that transporters in both the import and export subdivisions tend to cluster according to allocrite specificity. Thus, knowing only the primary structure (peptide sequence) of a newly discovered ABC protein, one can predict the type of allocrite specified by the system and the direction of allocrite transport.
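The Walker A and Walker B motifs can be located in a candidate ATPase sequence with simple pattern matching. The sketch below assumes two commonly used consensus definitions, which vary between sources and are not taken from this review:
- Walker A (P-loop): G-x(4)-G-K-[S/T];
- Walker B: four hydrophobic residues followed by D, where the hydrophobic set used here is an assumption.

A hit indicates only a likely P-loop ATPase. Assigning allocrite specificity or transport direction would still require the phylogenetic comparison described above. The test sequence is invented.

```python
from __future__ import annotations

import re

# Assumed consensus patterns (definitions differ between sources).
HYDROPHOBIC = "[AILMFVW]"
MOTIFS = {
    "Walker A": re.compile(r"G.{4}GK[ST]"),
    "Walker B": re.compile(HYDROPHOBIC + r"{4}D"),
}


def find_walker_motifs(protein: str) -> dict[str, list[tuple[int, str]]]:
    """Return (1-based start position, matched residues) for each motif."""
    protein = protein.upper()
    return {
        name: [(m.start() + 1, m.group()) for m in pattern.finditer(protein)]
        for name, pattern in MOTIFS.items()
    }


if __name__ == "__main__":
    # Invented fragment, for illustration only.
    print(find_walker_motifs("MKLEVRGLTKSFGGPSGCGKSTLLRQQEPRILLLDEPTS"))
```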
ABC transporters of modified cyclic peptides, such as microcystin and nodularin, cluster phylogenetically with peroxisomal membrane proteins (PMPs).55 This relationship may seem unusual. However, the long-chain fatty acid substrates of PMPs are structurally similar to polyketides and are produced in an analogous fashion by modular synthase enzymes. This suggests that transporters of NRPs, PKs and structurally similar compounds are evolutionarily and functionally related, although such transporters require further characterisation.
4.3 Transposition and recombination of biosynthesis genes
Putative transposases have also been identified in association with secondary metabolite pathways. They lie adjacent to the microcystin synthetase gene clusters (Fig. 1) in Microcystis aeruginosa PCC7806,33 Planktothrix agardhii CYA126/8 (GenBank accession AJ441056) and Anabaena sp. 90 (AY212249). This suggests that transposition may have been involved in transferring microcystin synthetase genes between the various microcystin-producing cyanobacterial genera. A truncated, and hence inactive, putative transposase lies downstream of the related nodularin synthetase, which supports this hypothesis, as only Nodularia spumigena NSOR10 is capable of producing nodularin.32 The theory is also supported by short sequences within a Microcystis aeruginosa cyanophage that share identity with regions flanking the putative microcystin synthetase-associated transposases. These sequences suggest that transposition may be facilitated by phage-mediated transduction.56 The variable architecture of the microcystin gene clusters in different cyanobacterial genera may also result from transposition-mediated gene rearrangements. Consequently, these complex hybrid polyketide/nonribosomal peptides are elegant examples of natural combinatorial biosynthesis that have evolved to give the cyanobacteria a competitive advantage.57
Fig. 1 Microcystin biosynthesis clusters in cyanobacteria.
Several cyanobacterial genomes have been sequenced, including those of Synechocystis sp. PCC6803,58 Nostoc punctiforme ATCC29133 (also known as N. punctiforme PCC73102),59 Thermosynechococcus elongatus BP-1 (ref. 60) and Microcystis aeruginosa NIES-843 (ref. 61). These genomes contain an abundance of transposases, insertion sequences and short sequence repeats that may have contributed to the rapid evolution of these species. Insertion sequences were also found to disrupt the microcystin synthetase gene cluster in various Planktothrix species, resulting in strains deficient in toxin production.62
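In a sequenced genome, the association between biosynthesis clusters and mobile elements can be surveyed by listing transposase-like genes that lie inside a cluster or within a fixed distance of its boundaries. The sketch below is a hypothetical illustration. The `Gene` record, the keyword list, the default 5 kb window and the example coordinates are all assumptions. Matching on annotation text is crude: it will report truncated, inactive transposases (like the one near the nodularin cluster) just as readily as intact ones.

```python
from __future__ import annotations

from dataclasses import dataclass

# Assumed annotation keywords for mobile-element genes.
MOBILE_KEYWORDS = ("transposase", "insertion sequence", "integrase")


@dataclass
class Gene:
    name: str
    start: int    # leftmost genome coordinate (bp)
    end: int      # rightmost genome coordinate (bp)
    product: str  # annotated product description


def flanking_mobile_elements(
    genes: list[Gene], cluster_start: int, cluster_end: int, window: int = 5000
) -> list[tuple[str, str]]:
    """List mobile-element genes inside a cluster or within `window` bp of it."""
    hits: list[tuple[str, str]] = []
    for gene in genes:
        near = gene.end >= cluster_start - window and gene.start <= cluster_end + window
        if near and any(k in gene.product.lower() for k in MOBILE_KEYWORDS):
            inside = gene.start >= cluster_start and gene.end <= cluster_end
            hits.append((gene.name, "inside" if inside else "flanking"))
    return hits


if __name__ == "__main__":
    # Invented annotation around a cluster spanning 10,000-65,000 bp.
    genes = [
        Gene("orf1", 2_000, 3_200, "hypothetical protein"),
        Gene("orf2", 6_500, 7_800, "putative transposase"),
        Gene("orf3", 30_000, 31_000, "IS element transposase"),
        Gene("orf4", 66_000, 67_500, "ABC transporter"),
        Gene("orf5", 69_000, 70_200, "transposase, truncated"),
        Gene("orf6", 90_000, 91_000, "transposase"),
    ]
    print(flanking_mobile_elements(genes, 10_000, 65_000))
```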
The characterisation of the microcystin biosynthesis gene cluster (mcy) in the genomes of M. aeruginosa, P. agardhii and Anabaena sp. has enabled the study of the origins and evolution of hepatotoxin biosynthesis in cyanobacteria. Transposases have been identified in association with the mcy and nodularin biosynthesis (nda) gene clusters. These findings, together with subsequent phylogenetic analysis, have led to the theory that horizontal gene transfer and recombination events are responsible for two observations: the sporadic distribution of the mcy gene cluster throughout the cyanobacteria, and the various microcystin isoforms identified to date.33,63,64 Recent genetic studies suggest that the nda cluster evolved from the mcy cluster through the deletion of two NRPS modules.32,63,65
Likewise, the structural differences between the nostocyclopeptides and the nostopeptolides can be predicted by comparing the architectures of their respective biosynthesis gene clusters (ncp and nos) in Nostoc spp.66,67 Compared with the nos cluster, the ncp cluster lacks a PKS module, a dimodular NRPS and a TE domain. This suggests that a genetic rearrangement gave rise to two distinct biosynthesis pathways.
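Comparisons of this kind, such as nda against mcy or ncp against nos, amount to aligning two ordered lists of modules and reporting the segments that differ. The sketch below uses Python's standard `difflib.SequenceMatcher` for that alignment. It assumes that homologous modules have already been matched (for example by sequence similarity) and given identical labels. The labels `M1` to `M7` are hypothetical placeholders, not the real module architectures of any cluster, and such an alignment is no substitute for the phylogenetic analyses cited above.

```python
from __future__ import annotations

from difflib import SequenceMatcher


def module_differences(
    parent: list[str], derived: list[str]
) -> list[tuple[str, list[str], list[str]]]:
    """Align two ordered module lists and report the segments that differ.

    Each result is (operation, parent modules, derived modules), where the
    operation is 'delete', 'insert' or 'replace'.
    """
    matcher = SequenceMatcher(a=parent, b=derived, autojunk=False)
    return [
        (tag, parent[i1:i2], derived[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


if __name__ == "__main__":
    # Hypothetical module labels: the derived cluster has lost two modules.
    parent = ["M1", "M2", "M3", "M4", "M5", "M6", "M7"]
    derived = ["M1", "M2", "M3", "M4", "M7"]
    print(module_differences(parent, derived))
```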
4.4 Molecular regulation of toxin biosynthesis in cyanobacteria
Most toxin regulation studies have focused on direct measurements of cellular toxin content. The description of the mcy gene cluster,33 however, enabled a closer examination of microcystin regulation at the molecular level. The mcy genes are transcribed as two polycistronic operons from a central bidirectional promoter located between mcyA and mcyD. Alternative transcriptional start sites were identified for both operons when cells were cultured under high or low light intensities.68 Kaebernick et al.69 also proposed that microcystin is constitutively produced under low and medium light intensities, and exported once a higher threshold intensity is reached. The ABC transporter McyH, identified more recently, was later proposed to be responsible for this export.55
Comparative analysis of the proteomes of the wild-type and the
mutant cultures of toxic Microcystis resulted in the identification
of a protein, MrpA, which was expressed only in the wild-type.70
The genomic locus encoding this protein is homologous to RhiA
and RhiB from Rhizobium leguminosarum, which are regulated
via a quorum-sensing mechanism and are up-regulated in
response to blue light.
As observed for the mcy cluster, the nda gene cluster is transcriptionally regulated by a bidirectional promoter region. Transcriptional analysis revealed that the nda cluster is transcribed as two polycistronic mRNAs, ndaAB/ORF1/ORF2 and ndaC.71 The two genes downstream of ndaAB, ORF1 and ORF2, encode a putative transposase and a putative high light-inducible chlorophyll-binding protein homologue, respectively. It is not clear why these proteins are co-transcribed with the nda gene cluster. ORF2 has been identified in all strains of toxic Nodularia, and its association with nodularin biosynthesis may suggest a physiological function related to high-light stress. A putative heat shock repressor protein, encoded by ORF3, was also identified downstream of ORF2. It may be involved in regulating transcription of the nda genes in response to heat stress.32
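Both the mcy and nda clusters are described as having two operons transcribed outward from a shared promoter region. In an annotated genome, such candidate bidirectional promoters can be located by finding adjacent genes that point away from each other (minus strand followed by plus strand in coordinate order). The sketch below is a hypothetical illustration with an invented `Gene` record and invented coordinates. It identifies only this gene geometry and says nothing about whether a functional promoter is actually present. If the adjacent genes overlap, the reported interval is inverted (start greater than end).

```python
from __future__ import annotations

from typing import NamedTuple


class Gene(NamedTuple):
    name: str
    start: int   # leftmost coordinate (bp)
    end: int     # rightmost coordinate (bp)
    strand: str  # "+" or "-"


def divergent_pairs(genes: list[Gene]) -> list[tuple[str, str, int, int]]:
    """Find adjacent genes transcribed away from each other (<- ->).

    Returns (left gene, right gene, intergenic start, intergenic end); the
    intergenic region is a candidate bidirectional promoter.
    """
    ordered = sorted(genes, key=lambda g: g.start)
    pairs: list[tuple[str, str, int, int]] = []
    for left, right in zip(ordered, ordered[1:]):
        if left.strand == "-" and right.strand == "+":
            pairs.append((left.name, right.name, left.end + 1, right.start - 1))
    return pairs


if __name__ == "__main__":
    # Invented gene layout: two operons diverging from a central region.
    genes = [
        Gene("orfA", 100, 1_200, "-"),
        Gene("orfB", 1_300, 2_500, "-"),
        Gene("orfC", 2_800, 4_000, "+"),
        Gene("orfD", 4_100, 5_200, "+"),
    ]
    print(divergent_pairs(genes))
```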
A genomic region adjacent to the sxt gene cluster in C. raciborskii T3 has been identified and characterised. It putatively encodes a regulatory two-component system. This system appears to be involved in sensing environmental signals, in particular phosphate depletion, and in activating the transcription of genes involved in phosphate uptake and transport.72
Table 1 Selected cyanobacteria-derived biosynthesis gene clusters
Gene cluster Source organism
Barbamide (bar) Lyngbya majuscula 19LCryptophycin (crp) Nostoc sp. ATCC53789Cylindrospermopsin (cyr) Cylindrospermopsis raciborskii AWT20Curacin (cur) Lyngbya majuscula 19LHectochlorin (hct) Lyngbya majuscula JHBJamaicamide (jam) Lyngbya majuscula JHBMicrocystin (mcy) Microcystis aeruginosa PCC7806Nodularin (nda) Nodularia. spumigena NSOR10Nostopeptolide (nos) Nostoc sp. GSV224Aeruginosin (aer) Planktothrix agardhii CYA126/8Anabaenopeptilide (apd) Anabaena sp 90Anatoxin A (ana) Oscillatoria PCC 6506Cyanopeptolin (mcn) Microcystis N–C 172/5Nostocyclopeptide (ncp) Nostoc sp. ATCC53789Lyngbyatoxin (ltx) Lyngbya majusculaSaxitoxin (sxt) Cylindrospermopsis raciborskii T3Scytonemin Nostoc punctiforme ATCC29133Microcyclamide (mca) Microcystis aeruginosa NIES298
and PCC7806Microviridin (mdn/mvd) Microcystis aeruginosa NIES298
and P. agardhii CYA126/8Patellamide (pat) Prochloron didemniTrichamide (tri) Trichodesmium erythraeum ISM101Trunkamide (tru) Prochloron sp.
1452 | Nat. Prod. Rep., 2009, 26, 1447–1465
5 Cloning and characterisation of biosynthesis geneclusters in cyanobacteria
The number of characterised biosynthesis gene clusters from
cyanobacteria is limited compared with the numbers reported
from other microorganisms such as the streptomycetes. This is
more a reflection of the number of researchers pursuing such
clusters rather than a lack of interest in the natural products
derived from them. The following sections serve to highlight
some cyanobacteria-derived clusters and the unique biosynthesis
pathways in operation. These examples are included to illustrate
features of biosynthesis gene clusters useful for genome mining.
Other notable cyanobacterial biosynthesis gene clusters not
detailed in this review are presented in Table 1. These include the
cryptophycin (crp),73,74 aeruginosin (aer),75,76 anabaenapeptilide
(apd),77 anatoxin A (ana),78 cyanopeptolin (mcn),79,80 micro-
viridin (mdn/mvd)81,82 and trunkamide (tru)83 biosynthesis gene
clusters.
5.1 Natural product biosynthesis gene clusters from brackish/
freshwater strains
5.1.1 Macrocycles. As highlighted in section 4.3, these complex
molecules serve as elegant examples of natural combinatorial
biosynthesis. The gene clusters encoding their biosynthesis are
associated with prototypical thiotemplate modular systems, and
provide the basis for our genome mining analyses in section 7.2.
Microcystins. The microcystins are potent inhibitors of
eukaryotic protein phosphatases 1 and 2A. The biosynthetic gene
cluster of these cyclic heptapeptides was first characterised in
Microcystis aeruginosa PCC7806.33,84,85 Prior feeding studies
revealed the origin of the carbons in the unusual (2S,3S,8S,9S)-3-
amino-9-methoxy-2,6,8-trimethyl-10-phenyl-4,6-decadienoic acid
(Adda) and (2R,3S)-3-methylaspartic acid (Masp) residues.
Feeding studies confirmed that Adda was derived from
L-phenylalanine, and suggested phenylacetate as the primer;86,87
however, recent studies have revealed that it is in fact derived
from phenyllactate.88 The microcystin biosynthesis gene cluster
(mcy) spans 55 kb and is composed of ten bidirectionally
transcribed ORFs. The PKS McyG contains an N-terminal
adenylation-peptidyl carrier protein loading didomain that is
predicted to load the starter unit, which then undergoes
subsequent malonate and amino acid extensions (encoded by
mcyGJDEABCL). The C-terminal TE is thought to catalyse
hydrolysis and cyclisation to yield the final product. Tailoring
enzymes associated with the cluster include an O-methyltransferase
(McyJ) and a stand-alone dehydrogenase (McyI).
Another tailoring enzyme of interest is McyF, which shows
greatest similarity to an aspartate racemase.40
Nodularin. The cyclic pentapeptide nodularin 5 is assembled,
as expected, in a manner similar to the structurally related
heptapeptide microcystin. The nodularin synthetase gene cluster
in Nodularia spumigena NSOR10 spans 48 kb and consists of
nine ORFs. The nodularin biosynthesis gene cluster (nda)
encodes proteins in an order broadly co-linear with their
catalytic roles in the assembly of nodularin.32 Interestingly,
compared with the mcy cluster, the nda cluster lacks the two NRPS
modules that activate D-Ala and L-Leu in microcystin, residues that
are absent from the final nodularin structure. Cases such as these
need to be considered when predicting compound structures from gene
sequence data in any genome mining project. The deletion of the two
NRPS modules raises many questions regarding recombination and
shuffling of genes and also the transposition of gene clusters
between organisms. The ndaI gene encodes a protein most similar to
the ABC transporters McyH and NosG associated with
microcystin and nostopeptolide biosynthesis.32 Conserved
sequences such as these could be targeted with the aim
of identifying biosynthesis gene clusters of potentially toxic
molecules.
Nostopeptolides. The cryptophycin-producing74 strain Nostoc
sp. GSV224 also produces other cyclic peptide polyketide
hybrid natural products known as nostopeptolide A1 6 and A2
7. The nos gene cluster includes eight ORFs, spans 40 kb, and
contains most genes required for biosynthesis and transport.
The domain organisation is co-linear with the proposed order
of biosynthetic assembly, with nosA encoding a tetramodular
NRPS, nosC a trimodular NRPS, and nosD a dimodular
NRPS. Located between nosA and nosC is nosB, which encodes
a single PKS module.67 A putative thioesterase is located at the
C-terminus of NosD. NosA3, the adenylation domain of the third NRPS
module encoded by nosA, is proposed to adenylate the rare
non-proteinogenic amino acid L-4-methylproline.89 At first glance,
sequence analysis of the NosA3 adenylation domain would suggest that
it activates proline; however, a single amino acid difference in its
substrate-binding pocket relative to that of NosD2 (which adenylates
proline), together with the chemical structure of the nostopeptolides,
supported the assignment of NosA3. The genes nosE and nosF encode
enzymes involved in the biosynthesis of L-4-methylproline from
L-leucine. Other ORFs associated with the cluster include orf5,
located between nosD and nosE and coding for a 265 amino acid protein
of unknown function, and nosG, which encodes an ATP-binding cassette
(ABC) transporter.67
Nostocyclopeptides. The 33 kb nostocyclopeptide (ncp)
biosynthesis gene cluster in Nostoc sp. ATCC53789 has been
sequenced and characterised.66 As with many cyanobacterial NRPS-derived
natural products, the ncp cluster is co-linear with the proposed order
of assembly of nostocyclopeptides A1 8, A2 9 and A3 10. The cluster
encodes two proteins, NcpA and NcpB, containing three and four NRPS
modules, respectively. As in the nos cluster, genes encoding
L-4-methylproline biosynthesis and transport enzymes are also present.
The cluster architecture mirrors that of the nos synthetase and,
interestingly, it encodes a 265 amino acid protein, NcpC, of unknown
function. In both the ncp and nos clusters, the encoding gene (ncpC
and orf5, respectively) is located between the NRPS genes and the
L-4-methylproline biosynthesis genes.66 A recent BLAST search has shed
no further light on its role, if any, in peptide biosynthesis. The most
striking feature of this cluster is the reductase domain encoded at the
C-terminal end of NcpB, which is responsible for the reductive
offloading of the peptide.90 This reductase facilitates both the
release and the cyclisation of the linear peptide to give the unusual
imine-linked macrocyclic peptide product. NcpB contains a putative
NAD(P)H-binding domain, suggesting that offloading of the peptide is
reductive, thereby generating a linear peptide with a terminal
aldehyde. The aldehyde is then captured intramolecularly by the amino
group of the N-terminal tyrosine to form a stable imine bond.
Identification of similar reductase domains through genome mining
could ultimately lead to the characterisation of other clusters
encoding imine-linked macrocycles, providing further tools to assist
in the complete characterisation of this unusual biosynthetic pathway.
A recent review by Kopp and Marahiel describes ‘Macrocyclization
strategies in polyketide and nonribosomal peptide biosynthesis’ in
great depth.91
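The ncp reductase and the TE domains described above show that the C-terminal domain of the last module hints at how a product is released. That makes it a useful flag when mining for further imine-linked macrocycles. The following minimal sketch assumes domain architectures have already been called by some annotation tool and are supplied as hyphen-separated strings. The `RELEASE_HINTS` mapping and the example architectures are hypothetical simplifications, not a definitive classifier.

```python
from typing import Dict, List

# Hypothetical, simplified mapping from the C-terminal domain label of the
# final module to the release chemistry it most often implies.
RELEASE_HINTS: Dict[str, str] = {
    "TE": "thioesterase: hydrolysis or macrocyclisation",
    "R": "reductase: reductive release (aldehyde or alcohol)",
}


def parse_architecture(arch: str) -> List[str]:
    """Split a string such as 'C-A-PCP-R' into its domain labels."""
    return [d.strip() for d in arch.split("-") if d.strip()]


def release_hint(arch: str) -> str:
    """Return a release-mechanism hint based on the last (C-terminal) domain."""
    domains = parse_architecture(arch)
    if not domains:
        return "no domains supplied"
    terminal = domains[-1]
    return RELEASE_HINTS.get(terminal, f"unassigned terminal domain: {terminal}")


def flag_reductive(clusters: Dict[str, str]) -> List[str]:
    """Names of clusters whose final domain is a reductase (R)."""
    return [name for name, arch in clusters.items()
            if parse_architecture(arch)[-1:] == ["R"]]


if __name__ == "__main__":
    # Hypothetical architectures, not taken from real annotations.
    demo = {
        "cluster_1": "C-A-PCP-C-A-PCP-C-A-PCP-R",
        "cluster_2": "A-PCP-C-A-PCP-TE",
    }
    for name, arch in demo.items():
        print(name, "->", release_hint(arch))
    print("candidate reductive-release clusters:", flag_reductive(demo))
```

Such a rule is only a heuristic. In ncp, for example, the reductive release is followed by intramolecular imine formation, which the terminal-domain label alone cannot reveal.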
5.1.2 Toxic alkaloids. The biosynthesis gene clusters of these
toxic alkaloids code for the incorporation of unusual substrates,
rare pathways, and numerous tailoring reactions. Also of
interest in genome mining terms are the clustered genes
encoding toxin efflux and transport proteins. These could prove
useful targets for locating biosynthesis clusters encoding such
toxins.
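Using transporter genes as beacons, such as the ABC transporters associated with the mcy, nda and nos clusters, amounts to a co-localisation scan over an annotated genome. The following minimal sketch assumes a list of genes with start coordinates and free-text annotations. The keyword sets, the 20 kb window and the `Gene` records are hypothetical choices, and a real pipeline would use domain or HMM hits rather than text matching.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class Gene:
    name: str
    start: int        # start coordinate (bp)
    annotation: str   # free-text product annotation


# Hypothetical keyword sets used only for illustration.
CORE_KEYWORDS = ("nonribosomal peptide synthetase", "polyketide synthase")
TRANSPORTER_KEYWORDS = (
    "abc transporter",
    "multidrug and toxic compound extrusion",
    "multidrug efflux",
)


def matches(gene: Gene, keywords: Iterable[str]) -> bool:
    """True if any keyword occurs in the gene's lower-cased annotation."""
    text = gene.annotation.lower()
    return any(k in text for k in keywords)


def transporter_beacons(genes: List[Gene], window_bp: int = 20_000) -> List[Tuple[str, str]]:
    """Pair each transporter gene with NRPS/PKS genes lying within window_bp."""
    core = [g for g in genes if matches(g, CORE_KEYWORDS)]
    pairs = []
    for t in genes:
        if not matches(t, TRANSPORTER_KEYWORDS):
            continue
        for c in core:
            if t is not c and abs(t.start - c.start) <= window_bp:
                pairs.append((t.name, c.name))
    return pairs
```

A transporter hit far from any NRPS or PKS gene is not reported, which mirrors the idea that clustered efflux genes, rather than isolated ones, point to a biosynthesis locus.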
Saxitoxins. The somewhat cryptic saxitoxin 11 biosynthesis
gene cluster (sxt) in Cylindrospermopsis raciborskii T3 has been
sequenced and characterised.72 Close examination of the cluster
has shed light on the evolution of paralytic shellfish poisoning
(PSP) toxins and has enabled the development of an early warning
system for the detection of potentially fatal paralytic shellfish
toxin (PST)-producing algal blooms. The 35 kb cluster contains genes
encoding enzymes that incorporate the substrates anticipated from
feeding studies: arginine, acetate and a SAM-derived methyl
group.92 Salient features of the cluster include a rare GCN5-related
N-acetyltransferase (GNAT)-like93 domain, SxtA2, which catalyses the
transfer of an acetyl group from acetyl-CoA to the thiol of the ACP
(S-acetyl transfer) in a similar manner to the curacin
biosynthesis enzyme CurA. The sxt gene cluster also encodes an
aminotransferase and an amidinotransferase, which facilitate the
incorporation of arginine and an arginine-derived amidino
group, respectively.72 For the purposes of genome mining, the less
common tailoring enzymes, such as the sulfotransferase and
carbamoyltransferase encoded by sxtN and sxtI, respectively, are
notable. The extremely potent saxitoxin is exported from the cell in
C. raciborskii T3, and the sxtM and sxtF gene products are proposed
to mediate this export. SxtM and SxtF show high similarity to
sodium-driven multidrug and toxic compound extrusion (MATE) proteins
and are the likely saxitoxin transporters in this cyanobacterium.72
Cylindrospermopsin. A biosynthetic pathway for cylin-
drospermopsin 12 was proposed based on a feeding study con-
ducted by Moore and co-workers in Cylindrospermopsis
raciborskii in which guanidinoacetate was proposed as the PKS
starter unit.94 This rare starter is generated from glycine and
an amidino group likely derived from arginine. The 43 kb
cylindrospermopsin (cyr) gene cluster in C. raciborskii AWT205
harbours a centrally located gene (cyrA) whose product is most
similar to the human arginine:glycine amidinotransferase.95 This
observation supported the feeding experiment results, which
suggested that the uracil ring in cylindrospermopsin is not
derived from primary metabolism but is instead synthesised during
product assembly. The majority of the cluster encodes the NRPS and
PKS modules responsible for formation of the carbon skeleton, and
genes associated with uracil ring formation were also identified.
Genes encoding tailoring enzymes in the pathway include cyrJ, which
encodes a 3′-phosphoadenylyl sulfate (PAPS)-dependent
sulfotransferase-like protein, and the associated cyrN, which codes
for an adenylsulfate kinase protein
(CyrN) that catalyses the formation of PAPS. Aside from
biosynthesis genes, the cluster also encodes proteins involved in
the regulation and export of the toxin. As in the case of saxitoxin,
a sodium-driven multidrug efflux pump type protein (CyrK) is
proposed to be responsible for cylindrospermopsin transport.95
5.2 Natural product biosynthesis gene clusters from marine
Lyngbya spp.
Strains of Lyngbya majuscula are noted producers of chemically
diverse and structurally unusual metabolites possessing a broad range of
biological activities.96 The biosyntheses of L. majuscula natural
products are keenly pursued due to their rarely encountered
chemical features. To date the biosynthetic gene clusters of
several Lyngbya-derived metabolites have been functionally
characterised. These examples illustrate the diversity of biosyn-
thesis pathways in operation in different strains of the same
species.
5.2.1 Barbamide and curacin A. The cyanobacterium
L. majuscula strain 19L collected from Curaçao produces,
amongst other molecules, both barbamide 13 and curacin A 14.
From a biosynthesis point of view, the molluscicidal chlorinated
lipopeptide barbamide is of great interest, mainly owing to its
unusual 5,5,5-trichloroleucine-derived moiety.97 The barbamide
(bar) biosynthetic gene cluster was the first hybrid PKS/NRPS gene
cluster reported from a marine cyanobacterium.97,98 Precursor
incorporation studies revealed that barbamide is derived from
L-leucine, L-phenylalanine, L-cysteine, acetate and two
SAM-derived methyl groups.97,99 Sequence analysis of the
biosynthetically co-linear 12-ORF, 26 kb gene cluster revealed
the presence of two genes, barB1 and barB2, whose products
catalyse the unprecedented chlorination of the pro-R methyl
of leucine to form the intermediate trichloroleucine.98,99 As
exemplified by barbamide, a common characteristic of many
L. majuscula metabolites, and indeed of other cyanobacterial
natural products, is that they are halogenated, and these have
inspired the discovery of novel, naturally occurring halogenation
reactions. The usefulness of characterising biosynthesis gene
clusters was further exemplified by the use of bar gene
probes to provide direct evidence that barbaleucamide B 15, which
was originally reported from a Philippine marine sponge Dysidea
sp., is of cyanobacterial origin.100 This supported the long-held
notion that symbiotic cyanobacteria are the true producers of many
natural products reported to be sponge-derived.
Curacin A 14 is a potent antitubulin natural product derived
from cysteine, ten acetates and two SAM-derived methyls. Its
biosynthetic gene cluster, like that of barbamide, encodes a hybrid
NRPS/PKS that is highly co-linear with the predicted biochem-
ical steps required for its biosynthesis. The curacin A (cur) gene
cluster spans 64 kb over 14 ORFs and harbours an interesting
terminating motif that is predicted to facilitate product release
from the megasynthetase and catalyse a dehydrative decarboxylation.101
The mechanism of this unique decarboxylation was deduced using
deuterium labelling experiments and NMR spectroscopy. Another
striking feature of curacin A biosynthesis is the incorporation of a
unit derived from 3-hydroxy-3-methylglutaryl-CoA (HMG-CoA). The
HMG-CoA synthase plays a role in the formation of the rare
cyclopropyl ring, which is derived from an isopentenyl-ACP
intermediate.102,103 This rare pathway is encoded by
curA–curF.93,101–103 CurF also contains the only NRPS domains
(C, A, PCP) associated with the biosynthesis of curacin A, and
downstream of curF lie seven PKS modules responsible for seven
rounds of condensation with malonyl-CoA extenders.101 CurM, the final
PKS monomodule, contains a sulfotransferase domain and a C-terminal
TE domain. Neither the role of the sulfotransferase nor that of CurN,
the product of the proposed hydrolase gene curN, has been completely
elucidated. CurN is suspected to play a role in both product release
and dehydrative decarboxylation, leading to the formation of the
terminal olefin.101
5.2.2 Jamaicamides. The cytotoxic and neurotoxic jamai-
camides A–C were isolated from a culture of L. majuscula JHB.
The rare alkynyl bromide and vinyl chloride groups along with
a pyrrolinone ring warranted an in-depth study of the bio-
synthesis of this family of molecules. A biosynthesis pathway to
jamaicamide A 16 was proposed using 13C-labelled acetates,
alanines and SAM, with incorporation measured by
NMR.104 These precursors accounted for all of the carbon atoms in
the molecule. The methyl pyrrolinone was proposed to be
derived from Claisen-like condensation and cyclisation of an
acetate and alanine.104 The halogenation pathways remain
unclear. The jam gene cluster was identified by using HMG-CoA
synthase-based probes to screen PKS-containing fosmids by
Southern hybridisation. From three fosmids, a 70 kb region of
L. majuscula JHB was sequenced, 58 kb of which was assigned
to the jam cluster. The cluster comprises 17 ORFs, arranged
in a highly co-linear fashion with respect to its proposed
assembly.104 The cluster is notable chiefly for the mixed nature of
the NRPS/PKS it encodes, which contains two switch points from PKS
to NRPS segments and another two ‘reverse’ switch points from NRPS
to PKS segments. Sequencing the cluster revealed no real clues to
the nature of the cryptic halogenation pathways in operation. The
size of the jam cluster complicates its heterologous expression, and
thus the authors could not unequivocally prove that all of the
components required for jamaicamide biosynthesis were contained
within the sequenced cluster.104 Nevertheless, the highly
ordered nature of the jam cluster allows strong correlations
between genetic and structural features to be made, and it is these
direct correlations that form the basis of structure prediction
from genome mining.
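Once module types have been annotated in biosynthetic order, the switch points in a mixed megasynthase can be counted mechanically. The following minimal sketch assumes the module order is already known. The module list below is a hypothetical simplification chosen only to contain two switches in each direction, like the jam megasynthase. It is not the actual jamaicamide domain map.

```python
from collections import Counter
from typing import List


def switch_points(modules: List[str]) -> Counter:
    """Count transitions between consecutive module types, keyed (from, to)."""
    transitions: Counter = Counter()
    for prev, curr in zip(modules, modules[1:]):
        if prev != curr:
            transitions[(prev, curr)] += 1
    return transitions


# Hypothetical module order (not the real jam domain map) chosen to contain
# two PKS->NRPS and two NRPS->PKS switches.
example = ["PKS", "PKS", "NRPS", "PKS", "PKS", "NRPS", "PKS"]

if __name__ == "__main__":
    for (src, dst), n in sorted(switch_points(example).items()):
        print(f"{src} -> {dst}: {n}")
```

Counting transitions between consecutive modules, rather than module totals, matters because the interfaces between PKS and NRPS segments are where compatible docking and carrier-protein handover must occur.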
5.2.3 Lyngbyatoxins. The lyngbyatoxins A–C 17–19 (Scheme
4) represent yet another structural class of toxic metabolites
biosynthesised by L. majuscula. Although their assembly is less
cryptic than that of other Lyngbya-derived metabolites, the 11.3 kb
lyngbyatoxin biosynthesis gene cluster (ltx), which consists of four
ORFs (ltxA–ltxD) all transcribed in the same direction, harbours
a novel aromatic prenyltransferase encoded by ltxC.105 As could
be predicted from the structure of the molecule, the cluster
encodes an NRPS (LtxA) responsible for the assembly of the
N-methyl-L-valine-L-tryptophan-derived dipeptide. LtxA also
facilitates the reductive offloading of the dipeptide from the
complex. The released product is then proposed to undergo
cytochrome P450 (LtxB)-catalysed oxidation and cyclisation
reactions to yield the intermediate (−)-indolactam V 20,
which is the substrate for prenylation.105 Although 20 itself has
not been isolated from L. majuscula, it has been isolated from
Streptomyces spp.106 LtxC shows little sequence similarity to other
aromatic prenyltransferases aside from the cyclomarin
N-prenyltransferase CymD, with which it shares 24% identity.107 To
confirm the function of the proposed prenyltransferase, purified
LtxC was incubated with 20 in the presence of geranyl
pyrophosphate (GPP), yielding the expected product lyngbyatoxin
A. LtxD belongs to a diverse family of oxidase/reductase-type
proteins and is proposed to play a role in the conversion of the
major metabolite 17 to 18 and 19 (Scheme 4).105 In this instance,
the conversion to the minor metabolites provides some insight
into the substrate specificity of such enzymes. Recent in vitro
studies have established an alternative chain termination and
release mechanism for NRPSs: probing of LtxA revealed that the
PCP-bound peptidyl thioester is reduced to a primary alcohol via an
aldehyde before prenylation.108
5.2.4 Hectochlorin. The cytotoxic and antifungal agent hec-
tochlorin 21 is another L. majuscula JHB metabolite whose
biosynthesis gene cluster is remarkably co-linear with its
product.109 The hectochlorin gene cluster (hct) consists of eight
ORFs and spans 38 kb, and encodes NRPS, PKS, cytochrome
P450, and halogenase enzymes. The hct gene cluster shares
significant sequence similarities with NRPS and PKS elements
from both the jam and bar clusters. The hctB gene encodes a putative
halogenase (together with an ACP) whose N-terminal region shows
sequence similarity to BarB1 and BarB2, and the authors
speculate that HctB is involved in the formation of hectochlorin’s
gem-dichloro group. Other notable features of the hct gene
cluster include a C-methyltransferase encoded within the PKS
module HctD and rarely observed NRPS-embedded ketoreductases
adjacent to two 2-oxo-isovaleric acid adenylation
domains. These unusual dual-function KRs110 reduce
2-oxo-isovaleric acid-derived moieties to the corresponding
2-hydroxyisovaleric acid moieties, which are then further
oxidised to generate the 2,3-dihydroxyisovaleric acid-derived
moieties in hectochlorin. The sequence of the putative
transposase-encoding gene hctT is similar to those of insertion
sequence (IS) elements from other bacteria, and these elements are
thought to play a role in the plasticity of bacterial genomes.109 IS
elements are of interest for genome mining as they are often
involved in the assembly of gene clusters with specialised functions.
As can be gleaned from the above discussion of characterised
biosynthesis gene clusters, a common theme of ‘co-linearity’ is
apparent. The importance of the co-linearity rule with respect to
gene cluster characterisation and natural product structure
prediction will become more apparent in the following sections.
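The co-linearity rule can be expressed as a naive structure predictor that reads modules in order and concatenates the predicted building blocks. The following minimal sketch assumes each module has already been annotated with a predicted substrate and any in-module tailoring domains. The `Module` record and the example assembly line are hypothetical. As the nda case shows, deleted or skipped modules and stand-alone tailoring steps mean that the output is only a starting hypothesis.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Module:
    kind: str                        # "NRPS" or "PKS"
    substrate: str                   # predicted building block, e.g. "Leu", "malonyl"
    tailoring: Tuple[str, ...] = ()  # optional in-module domains, e.g. ("KR", "MT")


def predict_backbone(modules: List[Module]) -> str:
    """Naive co-linear prediction: building blocks joined in module order."""
    parts = []
    for m in modules:
        label = m.substrate
        if m.tailoring:
            label += "[" + ",".join(m.tailoring) + "]"
        parts.append(label)
    return "-".join(parts)


if __name__ == "__main__":
    # Hypothetical three-module assembly line.
    line = [
        Module("NRPS", "Tyr"),
        Module("NRPS", "Gly"),
        Module("PKS", "malonyl", ("KR",)),
    ]
    print(predict_backbone(line))
```

The string output keeps in-module tailoring visible next to each building block. This is where a structure prediction from a co-linear cluster usually starts to diverge from a plain substrate list.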
Fig. 2 Completed bacterial genomes to 2008.
6 Genome mining
6.1 Introduction to genome mining
Genome mining is a broad term used to describe several
processes that exploit information which is genetically encoded
within biosynthesis gene clusters, with the ultimate aim of
isolating a novel compound or novel biosynthetic pathway. The
genetically encoded sequence of events that governs a molecule’s
assembly allows not only the targeted reprogramming of PKS and NRPS
systems but also their annotation to predict the chemical
structures of unknown products.
Up until recently, a natural product’s biosynthesis gene cluster
was usually characterised after the structure of the natural
product had been determined, and used mainly for the purpose
of investigating the molecule’s biosynthesis. Though this is
still the case today, dramatic breakthroughs in DNA sequencing
technologies have allowed entire bacterial genomes to be
sequenced both quickly and cost-effectively, concomitantly
providing many fully sequenced biosynthesis gene clusters, the
products of which are often unknown. These gene clusters are
referred to as ‘orphans’. Streptomyces coelicolor has been
the focus of biosynthetic studies for many years due mainly to the
diverse array of natural products it produces, and is considered
the model organism for genomics-based investigations. Genome
sequencing of the model streptomycete S. coelicolor A3(2)
revealed many more orphan biosynthesis gene clusters than
anticipated, demonstrating that even well-studied taxa have the
potential to yield novel products.111 These observations sparked
greater interest in genome mining as an alternative route to
discovering novel and potentially bioactive natural products.
Upwards of one thousand bacterial genomes have been
sequenced (of which only a small percentage are cyanobacterial –
Fig. 2) thus providing massive amounts of freely available gene
sequence data for genome mining. Bacterial genomes have been
sequenced in an effort to understand their roles in processes such
as CO2 and N2 fixation, iron acquisition, symbioses and envi-
ronmental adaptation, while some genomes, in particular some
actinomycete genomes, have been sequenced for the sole purpose of
identifying natural product biosynthesis gene clusters. The utility of
sequencing is exemplified by the recent report of the genome of the
marine obligate actinomycete Salinispora tropica.112 Careful annotation
of the genome revealed that approximately 10% of the strain’s
5.2 Mb genome is dedicated to natural product assembly. Of the 17 gene
clusters described, approximately two-thirds could be classed as
orphans. A bioinformatics approach allowed, at the very least, the
structure types encoded by these orphan clusters to be assigned, while
other clusters were assigned to compounds already identified from
S. tropica. As with all prediction-based methods, rigorous analytical
techniques are required to confirm these predictions. The vital
interplay between genomics-driven structure prediction based on
biosynthetic logic and the isolation of novel natural products
was demonstrated by the discovery of a polyene macrolactam,
salinilactam A.112 A bioinformatics analysis of three ketoreductase
domains in the encoding gene cluster allowed the tentative
assignment of absolute stereochemistry to three hydroxyl
groups. Conversely, the repetitive structure of the polyene helped
to order a highly repetitive region of the encoding gene cluster,
which ultimately led to the closure of the genome sequence.112
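The proportions quoted for S. tropica translate into absolute numbers with simple arithmetic. The following sketch only makes that conversion explicit. It uses the approximate figures given in the text (about 10% of 5.2 Mb, 17 clusters, roughly two-thirds orphans), so its outputs are equally approximate. The helper names are illustrative.

```python
def fraction_to_kb(genome_mb: float, percent: float) -> float:
    """Convert a percentage of a genome (size in Mb) to kilobases."""
    return genome_mb * 1000 * percent / 100


def approx_count(total: int, fraction: float) -> int:
    """Round total * fraction to the nearest whole number."""
    return round(total * fraction)


natural_product_kb = fraction_to_kb(5.2, 10)   # roughly 520 kb of the genome
orphan_clusters = approx_count(17, 2 / 3)      # roughly 11 of the 17 clusters
```

On this rough estimate, about half a megabase of S. tropica DNA is devoted to natural product assembly, and around eleven clusters had no known product at the time.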
6.2 Genome mining techniques
Discovery of new natural products derived from orphan gene
clusters using combinations of analytical and molecular
biology techniques is now a reality, and several short reviews
outlining these techniques have been published
recently.113,114 It is beyond the scope of this paper to present
each technique in great detail; however, it is worth
highlighting those that could be used specifically for genome
mining of cyanobacteria.
The relatively common technique of biosynthesis gene inactivation
followed by comparative metabolic profiling of the wild-type and
mutant strains by analytical HPLC is used to link a gene cluster to
its product. Even when the strain is culturable, the technical
difficulties associated with inactivating genes in cyanobacteria
make this method less than desirable. Heterologous gene expression represents an alternative
technique, but given that many cyanobacterial gene clusters of
interest are of the larger NRPS type, the usual hurdles associated
with expressing these, such as determining the optimal expression
system, first need to be overcome. Small cyanobacterial NRPS/
PKS gene clusters have also been cloned using fosmids, such as in
the cloning of the lyngbyatoxin gene cluster from the cyano-
bacterium Lyngbya majuscula.105 The small size of the lyngbya-
toxin biosynthetic gene cluster permitted this approach; however,
fosmid cloning and other phagemid-based systems have been
largely superseded by BAC cloning owing to their limited insert size.
Promising candidates for the heterologous expression of cyano-
bacterial compounds include the well-characterised cyano-
bacterial genera Synechocystis and Synechococcus. These strains
do not produce secondary metabolites but are relatively fast-
growing and easy to manipulate.115,116 However, mechanisms for
large gene cluster transfer, such as BACs and cosmids, have not
yet been developed for cyanobacteria. Stepwise homologous
recombination for the chromosomal integration of NRPS/PKS
genes, such as in the transfer of the 49 kb bacitracin synthetase
gene cluster into B. subtilis117 and the 65 kb epothilone NRPS/PKS
gene cluster into Myxococcus xanthus,118 may be the only
currently feasible approach for cyanobacterial species. The
technique likely to provide the most direct avenue toward iden-
tification of orphan gene cluster products in cyanobacteria
appears to be the ‘Genomisotopic Approach’. This approach was
first used to identify the novel NRPS-derived molecule orfamide
A 22 from the bacterium Pseudomonas fluorescens Pf-5.27 The
two-pronged approach relies initially upon a bioinformatics-
based prediction of natural product precursors from the gene
cluster sequence. The appropriate 15N or 13C–15N labelled
precursors are fed to the culture under optimised conditions, and
the isolation of the natural product is guided by selective NMR
experiments. This technique is particularly useful for decoding
the products of clusters encoding NRPS adenylation domains.
Models used to predict amino acid recognition by NRPS ade-
nylation domains have been developed based on critical binding-
pocket residues.119 Because few cyanobacterial sequences were used to
develop these models, they should be used with caution when mining
cyanobacterial genomes. Although the predictive tools may provide
accurate data, the relaxed substrate specificity of adenylation
domains encoded by cyanobacterial genomes means that such tools
should be used only as a guide and not considered definitive.
Predictions could be verified using the traditional
ATP-[32P]pyrophosphate (PPi) exchange assay or the recently
described non-radioactive colorimetric assay, which measures
activity by quantifying the orthophosphate (Pi) released from
degraded PPi.120
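Binding-pocket-based prediction boils down to comparing the residues lining a query adenylation domain's pocket with those of domains of known specificity. The following minimal sketch assumes the pocket residues have already been extracted, for example by alignment to a reference A domain. The reference signatures are hypothetical placeholders, not published specificity codes. The simple identity score is a stand-in for the published models rather than a reimplementation of them. As the NosA3/NosD2 case shows, one residue can change the substrate, so even a 90% match remains only a hypothesis.

```python
from typing import Dict, List, Tuple

# Hypothetical 10-residue binding-pocket signatures (placeholders, NOT
# published specificity codes).
REFERENCE_SIGNATURES: Dict[str, str] = {
    "Pro": "DVQYAAHVMK",
    "Leu": "DAWMLGAVCK",
    "Tyr": "DGSTIAEVCK",
}


def identity(a: str, b: str) -> float:
    """Fraction of identical positions between two equal-length signatures."""
    if len(a) != len(b):
        raise ValueError("signatures must have equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def rank_substrates(query: str) -> List[Tuple[str, float]]:
    """Rank reference substrates by pocket-signature identity to the query."""
    scores = [(aa, identity(query, sig)) for aa, sig in REFERENCE_SIGNATURES.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # A hypothetical query differing from the 'Pro' signature at one position,
    # cf. the single-residue difference between NosA3 and NosD2.
    for aa, score in rank_substrates("DVQYAAHVLK"):
        print(f"{aa}: {score:.1f}")
```

A top-ranked hit like this is best treated as the substrate to test first in an ATP-PPi exchange or colorimetric assay, not as an assignment.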
6.3 Illustrative examples of genome mining in cyanobacteria
To date there have been very few examples of genome mining in
cyanobacteria. The examples that do exist serve to highlight the
utility of genome mining, not only for new natural products but
also for the identification and characterisation of elusive
biosynthesis gene clusters. Aside from the scytonemin bio-
synthesis gene cluster, the gene clusters highlighted below all
code for the biosynthesis of cyclic peptides containing hetero-
cyclized residues. Collectively, these peptides are known as
cyanobactins.83
6.3.1 Patellamides. The patellamides are cyclic peptides often
isolated from didemnid ascidians and are thought to be bio-
synthesised by obligate cyanobacterial symbionts of the genus
Prochloron. Attempts to culture Prochloron spp. have so far been
unsuccessful, and thus biosynthetic studies have proven difficult.
To obtain metabolite production without culturing the symbiont, an
approach employing shotgun cloning and heterologous expression of
Prochloron sp. DNA in E. coli was used to confirm Prochloron sp.
as a patellamide producer.121 Independently, and as part of the
Prochloron didemni sequencing project, the draft genome
sequence was mined for patellamide biosynthesis genes.122 The
patellamide biosynthesis gene cluster (pat) was identified;
however, rather than being assembled by an NRPS as first
anticipated,121,123 the patellamides were found to be assembled
ribosomally.83,122 To unequivocally prove that the 11-kb 7-ORF
(patA–patG) gene cluster was indeed responsible for patellamide
biosynthesis, the cluster was heterologously expressed in E. coli.
Analysis of the resulting extract by LC-MS revealed production
of patellamide A 23, thus providing conclusive evidence for
ribosomal assembly.83,122
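Because cyanobactins are ribosomally assembled, their core sequences can in principle be read directly from the precursor peptide. Within a core, Cys, Ser and Thr residues flag candidate heterocyclisation sites. The following minimal sketch assumes the precursor sequence and its flanking recognition motifs are known. The motifs "LEAS" and "AYDG" and the precursor below are hypothetical placeholders, not the actual PatE sequences. Which Ser or Thr residues are actually heterocyclised has to be established experimentally.

```python
import re
from typing import List


def extract_cores(precursor: str, n_motif: str, c_motif: str) -> List[str]:
    """Return each segment lying between an N-terminal and a C-terminal motif."""
    pattern = re.escape(n_motif) + r"(.+?)" + re.escape(c_motif)
    return re.findall(pattern, precursor)


def heterocycle_candidates(core: str) -> List[int]:
    """0-based positions of Cys, Ser and Thr (potential azoline precursors)."""
    return [i for i, aa in enumerate(core) if aa in "CST"]


# Hypothetical precursor containing two cassettes flanked by made-up motifs.
precursor = "MSLEASAAACTVCAYDGESLEASIGCSFCAYDGE"

if __name__ == "__main__":
    for core in extract_cores(precursor, "LEAS", "AYDG"):
        print(core, heterocycle_candidates(core))
```

The non-greedy pattern keeps each cassette separate, so a precursor carrying several cores, as patellamide precursors do, yields one candidate per cassette.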
6.3.2 Trichamide. Searching the genome of Trichodesmium
erythraeum IMS101 for homologs of patellamide biosynthesis genes
revealed the strikingly similar trichamide biosynthesis gene cluster
(tri), which encompasses eleven ORFs and spans 12.5 kb. Structure
prediction followed by rigorous
chemical analysis of the culture extract revealed trichamide 24,
a novel natural product and the first reported metabolite from
Trichodesmium erythraeum, thus illustrating the power of
genome mining for natural product discovery.124
6.3.3 Microcyclamides. The discovery of the pat biosynthesis
gene cluster led to the identification of the ribosomal biosynthesis
genes responsible for microcyclamide production in Microcystis
aeruginosa NIES-298, using a PCR-directed approach employing
degenerate primers based on pat and tri gene sequences. Scanning of
the M. aeruginosa PCC7806 genome for similar clusters ultimately led
to the isolation and structure elucidation of two new
microcyclamides, 7806A 25 and 7806B 26.81,125 A recent review
comparing biosynthesis gene clusters yielding similar peptides in
other cyanobacteria revealed a conserved assembly line responsible
for cyanobactins. After PKSs and NRPSs, the cyanobactin biosynthesis
assembly line represents another major route to small molecules in
cyanobacteria.126
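Degenerate primers use IUPAC ambiguity codes so that a single oligonucleotide can anneal to many related sequences. The following minimal sketch shows how such a primer expands into concrete sequences and how its degeneracy is calculated. The primer "GAYCCNTGG" used in the check is a hypothetical example, not one of the pat/tri-based primers used in the study.

```python
from itertools import product
from math import prod
from typing import List

# IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(primer: str) -> int:
    """Number of distinct sequences encoded by a degenerate primer."""
    return prod(len(IUPAC[base]) for base in primer.upper())


def expand(primer: str) -> List[str]:
    """Enumerate every concrete sequence represented by the primer."""
    return ["".join(p) for p in product(*(IUPAC[b] for b in primer.upper()))]


if __name__ == "__main__":
    primer = "GAYCCNTGG"  # hypothetical primer
    print(primer, "degeneracy:", degeneracy(primer))
    print(expand(primer))
```

Degeneracy grows multiplicatively with each ambiguous position, which is why primer design for homologous clusters usually targets the most conserved motifs.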
6.3.4 Scytonemin. Scytonemin 27 is a UV-absorbing pigment
that plays an important role in protecting cyanobacteria from
harmful UV exposure.127 Transposon mutagenesis of the scytonemin-producing
strain Nostoc punctiforme ATCC 29133 resulted in
the generation of a mutant strain, SCY 59, unable to produce
scytonemin.128 Genomic analyses of the mutated region sup-
ported a biosynthetic role for the mutated gene, and allowed the
putative assignment of the 18-ORF 28-kb scytonemin biosyn-
thesis gene cluster. A biosynthetic route to scytonemin, however,
was not proposed.128 Subsequent gene expression studies
confirmed that the putative cluster was indeed involved in scy-
tonemin biosynthesis,129 and the proposed initial steps of its
biosynthesis from L-tryptophan and prephenate have been vali-
dated.130 Comparison of scytonemin biosynthesis gene clusters in
six cyanobacterial genomes revealed two major architectures,
and these appear to have evolved through genetic rearrange-
ments and insertions.131
7 Mining sequenced cyanobacterial genomes for biosynthesis gene clusters
Literature concerning the sequenced genomes of cyanobacteria
lacks any real discussion of genes or gene clusters encoding
proteins involved in natural product biosynthesis. Unlike the papers111,112,132 reporting the genomes of the actinomycetes S. avermitilis, S. coelicolor and Salinispora tropica, which present detailed analyses of their biosynthetic potential, cyanobacterial genome reports focus on topics such as evolutionary adaptation to environmental niches and proposed roles in complex microbial communities. Some reports describe the production of vital primary metabolites and the genes responsible for their biosynthesis, as well as genes involved in the biosynthesis of cofactors and carrier proteins, which may indeed be linked to secondary metabolite production. None of them, however, offers a global view of an organism's biosynthetic potential.
In a recent study, the biosynthetic potential of cyanobacterial strains was assessed by PCR amplification of NRPS and PKS genes with degenerate primers. Both NRPS and PKS genes were amplified from seventeen of the twenty-four strains tested, while only three strains were negative for both.133 These data may be biased, because many of the cyanobacteria screened are known toxin producers. They nonetheless reveal the widespread occurrence of PKS and NRPS genes in cyanobacteria and so support the notion that these genes are useful targets for genome mining.
A genome mining study of 223 bacterial genome sequences
deposited up to 2005 included eight cyanobacteria. The focus of
the study was to identify genes associated with clustered PKSs
and/or NRPSs, or as the authors termed them, thiotemplate
modular systems (TMS).134 TMS genes were detected in only
two of the strains, Anabaena sp. PCC7120 and Gloeobacter
violaceus PCC7421, and they constitute ~1.5% and 1% of their
respective genomes.134 In comparison, the same study revealed
that TMS genes constituted ~3.9% and 1.5% of the 9.0 Mb
S. avermitilis MA-4680 and 8.7 Mb S. coelicolor A3(2)
genomes, respectively.134 S. avermitilis and S. coelicolor A3(2)
are considered prolific producers of secondary metabolites with
~6.6%132 and 8.0%111 of their respective genomes dedicated to
secondary metabolism. The recently sequenced S. tropica
dedicates ~10% of its 5.2 Mb genome to natural product
assembly, with at least 6% dedicated to genes associated
with clusters (not only TMS genes) encoding the biosynthesis of
type I PKS, NRPS, or hybrid PKS/NRPS-derived products,
known and unknown.112 Notably, the biosynthesis of secondary metabolites is linked to cyanobacteria whose genomes are generally larger than 4 Mb, so the production of such metabolites can be considered a luxury. To date (as was the case up to 2005), the majority of published cyanobacterial genome sequences are from Synechococcus and Prochlorococcus spp. All of these have small genomes (<4 Mb) and appear to lack secondary metabolite biosynthesis genes.134 The same observation holds true for most bacterial taxa.
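The 4 Mb observation can be checked directly against the genome sizes listed in Table 2. The Python sketch below transcribes those sizes, using the plasmid-inclusive values where the table gives them, and splits the strains at 4 Mb. The dictionary, threshold constant and helper name are illustrative and are not part of the original analysis.

```python
"""Split the Table 2 genomes at the ~4 Mb boundary discussed in the text."""

# Genome sizes (bp) transcribed from Table 2.
GENOME_SIZES_BP = {
    "Acaryochloris marina MBIC11017": 8_361_599,
    "Anabaena variabilis ATCC29413": 7_068_601,
    "Crocosphaera watsonii WH8501": 6_238_156,
    "Gloeobacter violaceus PCC7421": 4_659_019,
    "Lyngbya aestuarii CCY9616": 7_087_904,
    "Microcystis aeruginosa NIES-843": 5_842_795,
    "Nodularia spumigena CCY9414": 5_357_061,
    "Nostoc punctiforme PCC73102": 9_059_191,
    "Nostoc sp. PCC7120": 7_211_789,
    "Prochlorococcus marinus AS9601": 1_669_886,
    "Prochlorococcus marinus CCMP1375": 1_751_080,
    "Prochlorococcus marinus MED4": 1_657_990,
    "Prochlorococcus marinus MIT9211": 1_688_963,
    "Prochlorococcus marinus MIT9215": 1_738_790,
    "Prochlorococcus marinus MIT9301": 1_641_879,
    "Prochlorococcus marinus MIT9303": 2_682_675,
    "Prochlorococcus marinus MIT9312": 1_709_204,
    "Prochlorococcus marinus MIT9313": 2_410_873,
    "Prochlorococcus marinus MIT9515": 1_704_176,
    "Prochlorococcus marinus NATL1A": 1_864_731,
    "Prochlorococcus marinus NATL2A": 1_842_899,
    "Synechococcus sp. CC9311": 2_606_748,
    "Synechococcus sp. CC9605": 2_510_659,
    "Synechococcus sp. CC9902": 2_234_828,
    "Synechococcus elongatus PCC6301": 2_696_255,
    "Synechococcus elongatus PCC7942": 2_742_269,
    "Synechococcus sp. RCC307": 2_224_914,
    "Synechococcus sp. WH8102": 2_434_428,
    "Synechococcus sp. WH7803": 2_366_980,
    "Synechococcus sp. JA-3-3A'b": 2_932_766,
    "Synechococcus sp. JA-2-3B'a": 3_046_682,
    "Synechocystis sp. PCC6803": 3_947_019,
    "Thermosynechococcus elongatus BP-1": 2_593_857,
    "Trichodesmium erythraeum IMS101": 7_750_108,
}

THRESHOLD_BP = 4_000_000


def split_by_size(sizes, threshold=THRESHOLD_BP):
    """Return (large, small) strain lists, each sorted by descending genome size."""
    ordered = sorted(sizes, key=sizes.get, reverse=True)
    large = [name for name in ordered if sizes[name] > threshold]
    small = [name for name in ordered if sizes[name] <= threshold]
    return large, small


if __name__ == "__main__":
    large, small = split_by_size(GENOME_SIZES_BP)
    print(f"{len(large)} genomes > 4 Mb, {len(small)} genomes <= 4 Mb")
    # prints: 10 genomes > 4 Mb, 24 genomes <= 4 Mb
    # Check that every Synechococcus and Prochlorococcus genome falls below the cut-off.
    syn_pro = [n for n in GENOME_SIZES_BP if n.startswith(("Synechococcus", "Prochlorococcus"))]
    print(all(GENOME_SIZES_BP[n] <= THRESHOLD_BP for n in syn_pro))  # True
```

On these figures, 10 of the 34 genomes exceed 4 Mb. The largest Synechococcus or Prochlorococcus genome (Synechococcus sp. JA-2-3B′a, ~3.0 Mb) sits well below the cut-off.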
7.1 Our approach
Up until August 2008, 34 cyanobacterial genomes (Table 2) had
been completely sequenced, and many of these have never
been investigated for their biosynthetic potential using a purely
bioinformatics-based approach. Like previous studies, we
focussed upon genes encoding NRPSs and PKSs, as these are
responsible for the biosynthesis of the biologically active mole-
cules commonly associated with cyanobacteria and, as described
earlier, the clustered gene architecture makes them amenable to
automated bioinformatics analyses.
Rather than searching for genes and gene clusters, our
approach involved identifying replicated genomic features
across sequenced genomes in our dataset, specifically common
catalytic protein domains essential to NRPSs and PKSs. This
methodology is an extension of the method developed by Pasek
and co-workers.135,136 It adopts the technique of targeting protein
domains such as those found in the Protein Family database
(Pfam),137 and allows for the detection of strings of domains
(termed ‘domain teams’) that are conserved in their content but
not necessarily their order. In designating domain teams, we
essentially decomposed NRPS and PKS genes into the domains
of the proteins they code for. To the best of our knowledge this is
the first report using such methodology to mine sequenced
genomes for conserved protein domains, as domain teams, for
the discovery of natural product biosynthesis gene clusters. As
seen from the gene clusters presented earlier, the highly ordered
and repetitive nature of such NRPS and PKS systems makes
them ideal candidates for domain team analysis.
Pfam is a collection of protein families and domains. Pfam (version 22.0) contains a total of 9318 protein families, including those involved in natural product assembly, and covers 73% of the sequences in UniProtKB version 9.7.137 In generating our data, NRPS modules were detected using the Pfam identifiers PF00501 (AMP-binding enzyme), PF00668 (condensation domain) and PF00550 (phosphopantetheine attachment site). PF00109 and PF02801 (beta-ketoacyl synthase N- and C-terminal domains) were used to identify ketosynthase modules.
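A minimal Python sketch of how these Pfam accessions might be used to label individual genes follows. It uses PF00668, the Pfam accession of the condensation domain, and a deliberately crude rule: an adenylation or condensation hit marks a gene as NRPS-like, and a ketosynthase hit marks it as PKS-like. The function names and the rule are illustrative assumptions and do not reproduce the authors' code.

```python
from typing import Iterable, Optional

# Pfam accessions named in the text, mapped to short domain labels.
PFAM_LABELS = {
    "PF00501": "A",    # AMP-binding enzyme (adenylation)
    "PF00668": "C",    # condensation domain
    "PF00550": "PCP",  # phosphopantetheine attachment site (carrier protein)
    "PF00109": "KS",   # beta-ketoacyl synthase, N-terminal domain
    "PF02801": "KS",   # beta-ketoacyl synthase, C-terminal domain
}


def strip_version(accession: str) -> str:
    """Drop a Pfam version suffix, e.g. 'PF00501.23' -> 'PF00501'."""
    return accession.split(".", 1)[0]


def classify_gene(pfam_accessions: Iterable[str]) -> Optional[str]:
    """Crude NRPS/PKS/hybrid call for one gene from the Pfam hits on its product."""
    labels = {PFAM_LABELS.get(strip_version(a)) for a in pfam_accessions}
    nrps = bool(labels & {"A", "C"})
    pks = "KS" in labels
    if nrps and pks:
        return "hybrid"
    if nrps:
        return "NRPS"
    if pks:
        return "PKS"
    return None  # carrier protein alone, or no targeted domain


if __name__ == "__main__":
    print(classify_gene(["PF00668.19", "PF00501.27", "PF00550.24"]))  # NRPS
    print(classify_gene(["PF00109", "PF02801", "PF00550", "PF00501"]))  # hybrid
```

Read a lone ketosynthase call with caution, because fatty acid synthases also carry KS domains. This is one reason to look at clustered domains (domain teams) rather than single genes.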
All protein-coding gene sequences and their genome coordinates were extracted from the GenBank entries of the available closed cyanobacterial genomes and searched against the Pfam database using an automated HMMER wrapper.138 The search results were tabulated as an input file for Domain Teams using a custom Perl script. Each chromosome was assigned its own identification code. Where an organism carried more than one replicon (i.e. plasmids in addition to the main chromosome), the chromosome and each plasmid were assigned individual identification codes. The final dataset comprised 59 chromosomes (plasmids included) and a total of 154 279 domains.
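The extraction and tabulation steps can be sketched in Python as follows. The sketch assumes HMMER3 `hmmscan --domtblout` output, with Pfam models as targets and proteins as queries. It also assumes a hypothetical gene table that maps each protein to its replicon code and start co-ordinate. The E-value cut-off and all function names are illustrative, because the original wrapper and Perl script are not described in detail.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Hit = Tuple[str, str, int]  # (protein id, Pfam accession without version, envelope start)


def parse_domtblout(text: str, max_ievalue: float = 1e-5) -> List[Hit]:
    """Read per-domain hits from HMMER3 hmmscan --domtblout text.

    Columns used (0-based): 1 target accession, 3 query name,
    12 independent E-value, 19 envelope start.
    """
    hits: List[Hit] = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment/header lines
        cols = line.split()
        accession = cols[1].split(".", 1)[0]  # PF00501.23 -> PF00501
        if float(cols[12]) <= max_ievalue:
            hits.append((cols[3], accession, int(cols[19])))
    return hits


def tabulate(hits: List[Hit],
             genes: Dict[str, Tuple[str, int]]) -> Dict[str, List[Tuple[str, str]]]:
    """Order domains along each replicon.

    `genes` maps protein id -> (replicon code, gene start). Simplification:
    domains are sorted by gene start, then by position within the protein,
    ignoring strand.
    """
    rows = defaultdict(list)
    for protein, accession, env_from in hits:
        replicon, start = genes[protein]
        rows[replicon].append((start, env_from, protein, accession))
    return {rep: [(p, a) for _, _, p, a in sorted(r)] for rep, r in rows.items()}
```

The output of `tabulate` is one ordered list of (protein, Pfam accession) pairs per replicon. That is the kind of per-chromosome domain string that a domain teams analysis takes as input.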
Domain teams were computed using a d-value of 3. In defining a domain team, therefore, we stipulated that no more than three domains may lie between any two consecutive targeted protein domains on the same chromosome. Domain teams were sorted in descending order of $10N_{\mathrm{chromosomes}} + 5N_{\mathrm{domains}} + N_{\mathrm{proteins}}$, where $N_{\mathrm{chromosomes}}$, $N_{\mathrm{domains}}$ and $N_{\mathrm{proteins}}$ are the numbers of chromosomes, domains and proteins, respectively, in a domain team. Sorting allows the teams containing the most domains and the greatest variety of syntenic chromosomes to be examined first. In order to predict the total number of clusters
present in each genome, and because each putative cluster can appear in multiple domain teams, we also developed a custom Perl script to dereplicate them into a set of unique PKS/NRPS clusters. The major limitation of this method is that the computational time required to determine domain teams grows exponentially with the size of the input dataset.

Table 2 Sequenced cyanobacterial genomes
Strain | Genome size (bp) | Institution (submission date)
Acaryochloris marina MBIC11017 | 8 361 599ᵈ | Translational Genomics Research Institute (27 Aug 2007)
Anabaena variabilis ATCC29413 | 7 068 601ᵈ | US DOE Joint Genome Institute (12 Sep 2005)
Crocosphaera watsonii WH8501 | 6 238 156 | US DOE Joint Genome Institute (13 Jun 2005)
Gloeobacter violaceus PCC7421 | 4 659 019 | Kazusa DNA Research Institute, Chiba, Japan (15 Aug 2003)
Lyngbya aestuarii CCY9616 | 7 087 904 | J. Craig Venter Institute (14 Dec 2006)
Microcystis aeruginosa NIES-843 | 5 842 795 | Kazusa DNA Research Institute, Chiba, Japan (22 Nov 2007)
Nodularia spumigena CCY9414 | 5 357 061 | J. Craig Venter Institute (25 Oct 2005)
Nostoc punctiforme PCC73102 | 9 059 191ᵈ | US DOE Joint Genome Institute (07 Apr 2008)
Nostoc sp. PCC7120ᵃ | 7 211 789ᵈ | Kazusa DNA Research Institute, Chiba, Japan (02 May 2001)
Prochlorococcus marinus AS9601 | 1 669 886 | J. Craig Venter Institute (06 Nov 2006)
Prochlorococcus marinus CCMP1375 | 1 751 080 | Genoscope – Centre National de Sequencage, Evry, France (28 May 2003)
Prochlorococcus marinus MED4 | 1 657 990 | DOE Joint Genome Institute (03 Jul 2003)
Prochlorococcus marinus MIT9211 | 1 688 963 | Massachusetts Institute of Technology (24 Oct 2007)
Prochlorococcus marinus MIT9215 | 1 738 790 | US DOE Joint Genome Institute (07 Sep 2007)
Prochlorococcus marinus MIT9301 | 1 641 879 | J. Craig Venter Institute (16 Feb 2007)
Prochlorococcus marinus MIT9303 | 2 682 675 | J. Craig Venter Institute (10 Jan 2007)
Prochlorococcus marinus MIT9312 | 1 709 204 | US DOE Joint Genome Institute (27 Jul 2005)
Prochlorococcus marinus MIT9313 | 2 410 873 | US DOE Joint Genome Institute (03 Jul 2003)
Prochlorococcus marinus MIT9515 | 1 704 176 | J. Craig Venter Institute (06 Nov 2006)
Prochlorococcus marinus NATL1A | 1 864 731 | J. Craig Venter Institute (06 Nov 2006)
Prochlorococcus marinus NATL2A | 1 842 899 | US DOE Joint Genome Institute (08 Aug 2005)
Synechococcus sp. CC9311 | 2 606 748 | The Institute for Genomic Research (04 Aug 2006)
Synechococcus sp. CC9605 | 2 510 659 | US DOE Joint Genome Institute (27 Jul 2005)
Synechococcus sp. CC9902 | 2 234 828 | US DOE Joint Genome Institute (08 Aug 2005)
Synechococcus elongatus PCC6301 | 2 696 255ᵈ | Center for Gene Research, Nagoya University, Japan (10 Dec 2004)
Synechococcus elongatus PCC7942 | 2 742 269 | US DOE Joint Genome Institute (08 Aug 2005)
Synechococcus sp. RCC307 | 2 224 914 | Genoscope – Centre National de Sequencage, Evry, France (19 May 2006)
Synechococcus sp. WH8102 | 2 434 428 | US DOE Joint Genome Institute (25 Jun 2001)
Synechococcus sp. WH7803 | 2 366 980 | Genoscope – Centre National de Sequencage, Evry, France (19 May 2006)
Synechococcus sp. JA-3-3A′bᵇ | 2 932 766 | The Institute for Genomic Research (10 Mar 2006)
Synechococcus sp. JA-2-3B′aᶜ | 3 046 682 | The Institute for Genomic Research (21 Mar 2006)
Synechocystis sp. PCC6803 | 3 947 019ᵈ | Kazusa DNA Research Institute, Chiba, Japan (01 Nov 2001)
Thermosynechococcus elongatus BP-1 | 2 593 857 | Kazusa DNA Research Institute, Chiba, Japan (05 Jun 2002)
Trichodesmium erythraeum IMS101 | 7 750 108 | US DOE Joint Genome Institute (21 Jun 2006)
ᵃ Also known as Anabaena sp. PCC7120. ᵇ Also known as Cyanobacteria Yellowstone A-Prime. ᶜ Also known as Cyanobacteria Yellowstone B-Prime. ᵈ Plasmid inclusive.
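Two rules described earlier in this section can be illustrated with the Python sketch below: the d-value gap constraint, and the sort key 10 N_chromosomes + 5 N_domains + N_proteins. The sketch is deliberately simplified. The Domain Teams method of Pasek and co-workers compares domain content across chromosomes without requiring conserved order. This sketch, by contrast, only chains targeted domains along a single ordered domain list. `Team`, `chain_targets` and `rank` are hypothetical names, and "X", "Y" and "Z" stand for non-targeted domains.

```python
from dataclasses import dataclass
from typing import List, Sequence, Set


def chain_targets(domains: Sequence[str], targets: Set[str], d: int = 3,
                  min_size: int = 2) -> List[List[int]]:
    """Group positions of targeted domains along one ordered domain list.

    Consecutive targeted domains stay in the same run when at most `d` other
    domains lie between them. Runs with fewer than `min_size` members are dropped.
    """
    runs: List[List[int]] = []
    current: List[int] = []
    for pos, dom in enumerate(domains):
        if dom not in targets:
            continue
        if current and pos - current[-1] - 1 > d:
            runs.append(current)
            current = []
        current.append(pos)
    if current:
        runs.append(current)
    return [run for run in runs if len(run) >= min_size]


@dataclass
class Team:
    name: str
    n_chromosomes: int  # chromosomes on which the team is found
    n_domains: int      # domains in the team
    n_proteins: int     # proteins carrying those domains

    @property
    def score(self) -> int:
        # Sort key from the text: 10 N_chromosomes + 5 N_domains + N_proteins
        return 10 * self.n_chromosomes + 5 * self.n_domains + self.n_proteins


def rank(teams: List[Team]) -> List[Team]:
    """Highest-scoring teams first."""
    return sorted(teams, key=lambda t: t.score, reverse=True)


if __name__ == "__main__":
    targets = {"AMP-binding", "Condensation", "PP-binding", "KS_N"}
    # "X", "Y", "Z" are placeholder non-targeted domains.
    doms = ["AMP-binding", "PP-binding", "Condensation", "X", "Y", "Z",
            "AMP-binding", "X", "X", "X", "X", "KS_N"]
    print(chain_targets(doms, targets))  # [[0, 1, 2, 6]]
    print([(t.name, t.score) for t in rank([Team("A", 3, 4, 2), Team("B", 2, 8, 3)])])
    # [('B', 63), ('A', 52)]
```

With these weights, one extra chromosome is worth as much as two extra domains. Team B (fewer chromosomes, four more domains) therefore outranks team A. In the chaining example, the final KS_N is excluded because four domains separate it from the preceding targeted domain, one more than d = 3 allows.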
7.2 Summary of our findings
Our preliminary genome mining data are displayed in Table 3. The table shows the numbers of domain teams of PKS, NRPS and hybrid PKS/NRPS protein domains detected in sequenced cyanobacterial genomes (and associated plasmids) using the approach described above. To simplify the table, biosynthesis clusters detected on an organism's plasmids are given in brackets. Here "biosynthesis clusters" means clusters of protein domains rather than classical gene clusters. An in-depth discussion of each biosynthesis cluster (or even each genome) is beyond the scope of this review.
As in any genome mining exercise, some of the output could be considered false positives. Any set of PKS and/or NRPS elements (Pfam identifiers) was scored as a cluster if the same arrangement of domains was detected in a larger, or for that matter smaller, cluster on any other chromosome in the dataset. The data included in the table
are the raw, dereplicated output of our genome mining analysis.
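One simple way to dereplicate putative clusters that turn up in several domain teams is to reduce each cluster to a (chromosome, start, end) interval and merge overlapping intervals on the same chromosome. The Python sketch below does this. It is an illustrative stand-in for the authors' custom Perl script, whose exact rules are not described, and `dereplicate` and `max_gap` are hypothetical names.

```python
from typing import Dict, Iterable, List, Tuple

Interval = Tuple[str, int, int]  # (chromosome code, start bp, end bp), inclusive


def dereplicate(clusters: Iterable[Interval], max_gap: int = 0) -> List[Interval]:
    """Collapse putative clusters reported by several domain teams.

    Intervals on the same chromosome are merged when they overlap, or when an
    interval starts no more than `max_gap` bp beyond the end of the preceding one.
    """
    by_chrom: Dict[str, List[Tuple[int, int]]] = {}
    for chrom, start, end in clusters:
        by_chrom.setdefault(chrom, []).append((min(start, end), max(start, end)))
    merged: List[Interval] = []
    for chrom in sorted(by_chrom):
        spans = sorted(by_chrom[chrom])
        cur_start, cur_end = spans[0]
        for start, end in spans[1:]:
            if start <= cur_end + max_gap:
                cur_end = max(cur_end, end)  # extend the current cluster
            else:
                merged.append((chrom, cur_start, cur_end))
                cur_start, cur_end = start, end
        merged.append((chrom, cur_start, cur_end))
    return merged
```

Coordinate merging only removes redundancy within a genome. Recognising highly similar clusters across different genomes still requires sequence comparison.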
In analysing the data generated, the logical first step is to determine whether an output can truly be considered a bona fide biosynthesis cluster. This step, which can be considered selective mining, can be achieved simply by scanning the data, a task simplified by the graphical output of the Domain Teams analysis. An example is shown in Fig. 3. Another consideration, one that still causes some consternation, is determining gene cluster boundaries and the linker regions between genes. Our domain teams based approach does not allow these to be determined unequivocally. Automated bioinformatics tools such as UMA (Udwary–Merski algorithm)139 and CLUSEAN (CLUster SEquence ANalyzer)140 can, however, be applied to multimodular systems such as PKSs and NRPSs and assist with the complete annotation of a biosynthesis gene cluster.
Table 3 Numbers of domain teams identified in sequenced cyanobacterial strainsᵈ
Strain | PKS | NRPS | Hybrid
Acaryochloris marina MBIC11017 | 3ᵃ | 2ᵇ (2)ᵇ | 1 (1)
Anabaena variabilis ATCC29413 | 5 | 1 (1) | 4 (1)
Gloeobacter violaceus PCC7421 | 4 | 3 | 2
Microcystis aeruginosa NIES-843 | 4 | 3 | 3
Nostoc punctiforme PCC73102 | 7 | 6 (1) | 8 (1)
Nostoc sp. PCC7120 | 6 | 1 | 5
Prochlorococcus marinus AS9601 | 2ᵃ | 1ᵇ | 1ᶜ
Prochlorococcus marinus CCMP1375 | 2ᵃ | 1ᵇ | 1ᶜ
Prochlorococcus marinus MED4 | 2ᵃ | 1ᵇ | 1ᶜ
Prochlorococcus marinus MIT9211 | 2ᵃ | 1ᵇ | 1ᶜ
Prochlorococcus marinus MIT9215 | 2ᵃ | 1ᵇ | 1ᶜ
Prochlorococcus marinus MIT9301 | 2ᵃ | 1ᵇ | 1ᶜ
Prochlorococcus marinus MIT9303 | 2ᵃ | 1 | 2ᶜ
Prochlorococcus marinus MIT9312 | 2ᵃ | 1ᵇ | 1ᶜ
Prochlorococcus marinus MIT9313 | 1 | 1ᵇ | 1ᶜ
Prochlorococcus marinus MIT9515 | 2ᵃ | 1ᵇ | 1ᶜ
Prochlorococcus marinus NATL1A | 2ᵃ | 1ᵇ | 1ᶜ
Prochlorococcus marinus NATL2A | 2ᵃ | 1ᵇ | 1ᶜ
Synechococcus sp. CC9311 | 3ᵃ | 1ᵇ | 2ᶜ
Synechococcus sp. CC9605 | 3ᵃ | 1 | 1ᶜ
Synechococcus sp. CC9902 | 2 | 1ᵇ | 1ᶜ
Synechococcus elongatus PCC6301 | 2ᵃ | 1ᵇ | 1ᶜ
Synechococcus elongatus PCC7942 | 2ᵃ | 1ᵇ | 1ᶜ
Synechococcus sp. RCC307 | 3ᵃ | 2ᵇ | 1ᶜ
Synechococcus sp. WH7803 | 2ᵃ | 1ᵇ | 1ᶜ
Synechococcus sp. WH8102 | 3ᵃ | 1ᵇ | 1ᶜ
Synechococcus sp. JA-3-3A′b | 3ᵃ | 1ᵇ | 1
Synechococcus sp. JA-2-3B′a | 2ᵃ | 1 | 1ᶜ
Synechocystis sp. PCC6803 | 3 | 1ᵇ | 1ᶜ
Thermosynechococcus elongatus BP-1 | 3ᵃ | 1ᵇ | 1ᶜ
Trichodesmium erythraeum IMS101 | 3 | 2 | 1
ᵃ Chromosomes share a common PKS domain team. ᵇ Chromosomes share a common NRPS domain team. ᶜ Chromosomes share a common hybrid domain team. ᵈ Numbers in brackets refer to domain teams detected on plasmids.

The graphical output of Domain Teams is highly interactive. All identified protein domains are clickable links to the Pfam website, where a summary of the protein family and its function can be found. Domains that were not included in the initial genome mining search can therefore be identified quickly, such as PF00975 (thioesterase domain) and PF08242 (methyltransferase domain), as shown in Fig. 3. Protein domains are grouped according to their respective coding genes, and the clickable NCBI-linked locus tag becomes a query
term for an Entrez cross-database search, revealing a wealth of useful information, particularly gene sequence data and locations. This provides an efficient means of dereplicating previously sequenced gene clusters and highly similar clusters across sequenced genomes. With a unique (orphan) cluster at hand, bioinformatics tools such as the NRPS/PKS predictive BLAST141 can be used as a first step toward predicting the backbone of the molecule. This precedes the rigorous chemical and molecular techniques required to elucidate the natural product's structure. Full details of this new method and the derived results will be published elsewhere.

Fig. 3 Domain Teams graphical output of the microcystin biosynthesis cluster in M. aeruginosa NIES-843.
The following sections highlight some of our findings by way
of comparison of our data with those reported from previous
cyanobacterial genome mining studies.
7.2.1 Microcystis aeruginosa NIES-843. The original report and annotation of the Microcystis aeruginosa NIES-843 genome revealed, as expected, gene clusters for the biosynthesis of the microcystins (mcyA–J) and the cyanopeptolins (cyanopeptolin A 28; mcnA–C and mcnE–G). It also revealed one other orphan NRPS spanning 17 kb, reportedly located between co-ordinates 5202708 and 5219745.61 Using the domain teams approach, all three of these clusters were detected. We also detected an additional hybrid NRPS/PKS spanning at least 17 kb over five ORFs (MAE_27800–MAE_27840), located between co-ordinates 2508556 and 2525761.
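As a quick consistency check, the ~17 kb sizes quoted here follow from the co-ordinates. The Python sketch below assumes inclusive, 1-based co-ordinates (as in GenBank), and `span_kb` is an illustrative helper.

```python
def span_kb(start: int, end: int) -> float:
    """Length in kb of an inclusive genome interval given in either orientation."""
    return (abs(end - start) + 1) / 1000


if __name__ == "__main__":
    # Co-ordinates quoted above for M. aeruginosa NIES-843
    orphan_nrps = span_kb(5_202_708, 5_219_745)  # orphan NRPS from the genome report
    hybrid = span_kb(2_508_556, 2_525_761)       # MAE_27800–MAE_27840 hybrid NRPS/PKS
    print(f"orphan NRPS: {orphan_nrps:.1f} kb; hybrid NRPS/PKS: {hybrid:.1f} kb")
    # prints: orphan NRPS: 17.0 kb; hybrid NRPS/PKS: 17.2 kb
```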
7.2.2 Nostoc sp. PCC7120. The domain teams analysis sup-
ported Donadio’s findings in Nostoc sp. PCC7120.134 Major
clusters include a small PKS likely responsible for heterocyst
glycolipid synthesis, a hybrid biosynthesis cluster encoding four
PKSs and one NRPS, a monomodular NRPS, and a large cluster
encoding two PKSs and ten NRPSs. This cluster spans at least
55 kb over fourteen ORFs and codes for adenylation domains
activating serine (×2), glycine (×4), cysteine and valine, as well as
two others whose substrates could not be unequivocally determined
using the NRPS/PKS BLAST tool. Adjacent to the
terminal NRPS module is a gene encoding a putative ABC
transporter. The domain teams approach revealed a small cluster
coding for two PKS modules, each of which includes a methyltransferase domain. Interestingly, a ~3 kb gene located near this PKS on the genome was annotated as an ABC transporter linked to toxin secretion.
7.2.3 Gloeobacter violaceus PCC7421. Domain teams analysis revealed a ~30 kb region of the genome containing three thioesterase domains. Two of these could be assigned to a trimodular PKS likely involved in glycolipid biosynthesis134 and a bimodular PKS. The remaining thioesterase domain is located on a gene that does not appear to be directly linked to any biosynthesis cluster. Two small modular PKSs (~11 and 14 kb) and numerous isolated ketosynthase domains were also detected.
7.2.4 Nostoc punctiforme PCC73102. A bioinformatics-based overview of the ~9 Mb Nostoc punctiforme PCC73102 genome (also known as ATCC 29133) revealed 62 ORFs encoding proteins involved in secondary metabolite biosynthesis.59 The authors state that these are involved in microcystin biosynthesis. They are more likely involved in the assembly of other, non-microcystin NRPS- and PKS-derived products, however, because to date there is no chemical evidence of microcystin production by terrestrial Nostoc spp. Two clusters of twelve and fourteen biosynthesis genes, spanning ~47 and 49 kb respectively, were identified along with a host of smaller gene sets. A subsequent BLAST-based genomic survey of N. punctiforme ATCC29133 revealed 17 genes encoding NRPSs (encompassing 42 A-domains) and 10 genes encoding PKSs (encompassing 22 KS-domains).142 These data suggest that the biosynthetic potential of N. punctiforme is greater than that of any other sequenced cyanobacterium. They also support the notion that organisms with larger genomes encode more secondary metabolite pathways than organisms with smaller genomes. Our analyses revealed several noteworthy biosynthesis gene clusters, many of which encode archetypal PKSs and/or NRPSs. The seven major biosynthesis clusters (the smallest being 16 kb) together encode 45 A-domains and 24 KS-domains. In total, these seven gene clusters span at least 250 kb, or 2.5 times the length of TMS gene sequence determined by Donadio for Anabaena sp. PCC7120.134 One of these clusters, located between co-ordinates 2667205 and 2708401 (Npun_F2181–Npun_F2188), codes for nostopeptolide biosynthesis, as previously reported.67 An NRPS/PKS hybrid cluster spanning at least 57 kb over five genes includes one 16 kb gene encoding three NRPSs and one PKS. An even larger cluster (62 kb) spanning 27 genes encodes four NRPSs and five PKSs, as well as two methyltransferases and an aminotransferase. Of the seven major clusters, six are hybrids, and the other encodes an NRPS with six adenylation domains.
7.2.5 Anabaena variabilis ATCC29413. The ~7.1 Mb Anabaena variabilis ATCC29413 genome (Table 2) codes for several small to mid-sized hybrid NRPS/PKSs. These include a ~36 kb cluster encoding seven adenylation domains and a single PKS, which is the largest cluster we identified in this organism. A 27 kb locus coding for four ketosynthase domains and two adenylation domains was the only other cluster larger than 20 kb. The remainder of the ~130 kb TMS sequence from A. variabilis is accounted for by a small PKS/NRPS (16 kb) containing four adenylation domains and one ketosynthase, a trimodular NRPS
(13 kb), a small PKS (16 kb) and two 11 kb fragments in close proximity to each other, each encoding two NRPSs.134 An unanticipated finding came from the analysis of a plasmid from this strain: we discovered an NRPS containing seven adenylation domains and a terminal thioesterase, encoded by a 25 kb gene cluster. This discovery illustrates the power of the domain teams approach for genome mining.
These selected examples of genome mining using the ‘Domain
Teams’ approach reveal a rich array of potential non-ribosomal
peptide and polyketide biosynthesis gene clusters in cyanobac-
teria. Here, we have found that TMS genes account for at least
127 kb (or 2.2%) of the Microcystis aeruginosa NIES-843
genome, 92 kb (1.3%) of the Nostoc sp. PCC7120 genome, 56 kb
(1.2%) of the Gloeobacter violaceus PCC7421 genome, 283 kb
(3.1%) of the Nostoc punctiforme PCC73102 genome and 157 kb
(2.2%) of the Anabaena variabilis ATCC29413 genome. It is
tempting to speculate on the structures of the natural products
these gene clusters encode, or even assign these clusters to known
chemical entities. However, it is beyond the scope of this review to
detail all the different permutations, and our efforts in assigning
these clusters to either new or known natural products are
ongoing.
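The percentages above are the TMS totals divided by the genome sizes listed in Table 2. The Python sketch below reproduces them. It assumes the plasmid-inclusive sizes where Table 2 gives them, and it uses the rounded kb totals from the text, so the last digit could shift in borderline cases. The dictionary and helper names are illustrative.

```python
# TMS totals (kb) quoted in the text, paired with genome sizes (bp) from Table 2.
TMS_DATA = {
    "Microcystis aeruginosa NIES-843": (127, 5_842_795),
    "Nostoc sp. PCC7120": (92, 7_211_789),
    "Gloeobacter violaceus PCC7421": (56, 4_659_019),
    "Nostoc punctiforme PCC73102": (283, 9_059_191),
    "Anabaena variabilis ATCC29413": (157, 7_068_601),
}


def tms_percent(tms_kb: float, genome_bp: int) -> float:
    """Share of the genome (in %) occupied by TMS genes."""
    return 100 * tms_kb * 1000 / genome_bp


if __name__ == "__main__":
    for strain, (kb, size) in TMS_DATA.items():
        print(f"{strain}: {tms_percent(kb, size):.1f}%")
```

All five values (2.2, 1.3, 1.2, 3.1 and 2.2%) match the text. The A. variabilis figure is reproduced with the 7.07 Mb size from Table 2.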
8 Discussion
We have clearly shown that the ‘Domain Teams’ approach that
we have developed for genome mining is indeed a valid and
robust method for identifying, in this case, biosynthesis clusters
encoding PKSs, NRPSs, and hybrids thereof. Our data support
Donadio’s findings that in general small genomes are devoid of
TMS genes,134 though we did detect protein domains commonly
associated with PKSs (likely involved in fatty acid biosynthesis)
and NRPSs (AMP-binding domains), agreeing with Webb’s
observation that there is a significant correlation between
numbers of A and KS domains and genome size.142 In general,
cyanobacteria from the order Nostocales, which includes
Anabaena, Nodularia and Nostoc species, are amongst the more
prolific of the small-molecule-producing microorganisms, and it
is these that possess larger genomes. We successfully mined
genomes for clusters not previously identified from sequenced
cyanobacteria or reported from other genome mining surveys;
however, we are yet to determine whether these are novel. Some
of these clusters may code for biosynthesis pathways in operation
in other yet-to-be-sequenced cyanobacterial strains, and some
clusters we identified here may have already been sequenced in
other organisms, where the focus has been the biosynthesis of
a particular molecule itself. In order to clarify this, and support
any bioinformatics-based genome mining exercise, rigorous
analytical methods need to be employed, especially in those cases
where clusters are reorganised. A major advantage of this
approach is that by changing the search keys (i.e. Pfam identifiers)
it is possible to quickly identify other gene clusters such as
those encoding toxin assembly without the need to regenerate the
data matrix. For example, mining for biosynthesis clusters
encoding terpenoid pathways is possible, assuming of course that
the biosynthesis genes (and hence protein domains) are clustered.
This is achievable through searching specifically for catalytic
protein domains associated with terpenoid biosynthesis. We
could potentially take this analysis one step further by, for
example, searching specifically for meroterpenoids (e.g. napyra-
diomycin143) using queries based on both terpene biosynthesis
and polyketide biosynthesis. A general drawback of using
‘Domain Teams’ is that the method can really only be applied to
closed genomes, as it requires knowledge of the relative position
of individual genes (and associated domains) on a chromosome.
Therefore, it is inapplicable to large (and mostly unassembled)
metagenomic data. Given the relatively small number of
cyanobacterial genomes sequenced compared with other micro-
organisms, and the relative abundance of biosynthesis gene
clusters in larger cyanobacterial genomes, it is clear that the
yet-to-be-discovered biosynthetic potential of cyanobacteria is
immense.
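The advantage noted above, that new search keys can be run against the existing domain table without regenerating it, can be sketched in Python as a simple filter over per-chromosome domain lists. The table layout is hypothetical. The terpene accessions (PF00494, squalene/phytoene synthase; PF03936, terpene synthase metal-binding domain; PF00348, polyprenyl synthetase) are an illustrative choice that should be checked against the current Pfam release, and they are not a set used in this review.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical precomputed table: chromosome code -> ordered (protein, Pfam accession).
DomainTable = Dict[str, List[Tuple[str, str]]]

# Search keys; the terpene set is an illustrative choice, not taken from the review.
SEARCH_KEYS: Dict[str, Set[str]] = {
    "NRPS": {"PF00501", "PF00668", "PF00550"},
    "PKS": {"PF00109", "PF02801"},
    "terpene": {"PF00494", "PF03936", "PF00348"},
}
# A meroterpenoid query combines terpene and polyketide keys.
SEARCH_KEYS["meroterpenoid"] = SEARCH_KEYS["terpene"] | SEARCH_KEYS["PKS"]


def query(table: DomainTable, keys: Set[str]) -> Dict[str, List[int]]:
    """Positions of matching domains on each chromosome; the table is not rebuilt."""
    return {chrom: [i for i, (_, acc) in enumerate(doms) if acc in keys]
            for chrom, doms in table.items()}


def chromosomes_with_all(table: DomainTable, key_sets: List[Set[str]]) -> List[str]:
    """Chromosomes carrying at least one hit from every key set."""
    found = []
    for chrom, doms in table.items():
        accessions = {acc for _, acc in doms}
        if all(accessions & keys for keys in key_sets):
            found.append(chrom)
    return sorted(found)
```

A realistic meroterpenoid search would also require the terpene and polyketide hits to be co-located, for example within the same d-value window, rather than merely present on the same chromosome.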
9 Concluding remarks
Nature has evolved discrete strategies for the assembly of
structurally diverse natural products derived from complex
biosynthetic pathways. Mining these pathways in cyanobacterial
genomes has revealed a greater biosynthetic potential for novel
natural products than anticipated. The challenge now is to
exploit such pathways for the discovery of novel natural prod-
ucts, and at the same time use genomic information to assist in
the development of modern technologies relevant to the phar-
maceutical, agricultural, and biotechnology industries.
10 Acknowledgements
Research on toxin biosynthesis in the author’s laboratory is
funded by the Australian Research Council (ARC), and B.A.N.
is a Federation Fellow of the ARC. J.A.K. and F.M.L. are
supported by the UNSW Environmental Microbiology Initia-
tive. F.M.L. is a UNSW Vice-Chancellor’s Postdoctoral
Fellowship recipient.
11 References
1 J. W. Schopf and B. M. Packer, Science, 1987, 237, 70–73.
2 A. C. Allwood, M. R. Walter, B. S. Kamber, C. P. Marshall and I. W. Burch, Nature, 2006, 441, 714–718.
3 S. M. Awramik, Photosynth. Res., 1992, 33, 75–89.
4 R. Buick, Philos. Trans. R. Soc. London, Ser. B, 2008, 363, 2731–2743.
5 A. Lazcano and S. L. Miller, J. Mol. Evol., 1994, 39, 546–554.
6 L. J. Stal, in The Ecology of Cyanobacteria, eds. B. A. Whitton and M. Potts, Kluwer Academic Publishers, Dordrecht, 2000, pp. 61–120.
7 R. Castenholz and J. B. Waterbury, in Bergey’s Manual of Systematic Bacteriology, ed. J. T. Staley, Williams & Wilkins, Sydney, 1989, pp. 1710–1727.
8 H. W. Paerl, J. L. Pinckney and T. F. Steppe, Environ. Microbiol., 2000, 2, 11–26.
9 G. A. Codd, S. G. Bell, K. Kaya, C. J. Ward, K. A. Beattie and J. S. Metcalf, Eur. J. Phycol., 1999, 34, 405–415.
10 A. M. Burja, B. Banaigs, E. Abou-Mansour, J. G. Burgess and P. C. Wright, Tetrahedron, 2001, 57, 9347–9377.
11 R. E. Moore, J. Ind. Microbiol., 1996, 16, 134–143.
12 R. E. Moore, I. Ohtani, B. S. Moore, C. B. Dekoning, W. Y. Yoshida, M. T. C. Runnegar and W. W. Carmichael, Gazz. Chim. Ital., 1993, 123, 329–336.
13 M. Namikoshi and K. L. Rinehart, J. Ind. Microbiol. Biotechnol., 1996, 17, 373–384.
14 R. M. Van Wagoner, A. K. Drummond and J. L. C. Wright, Adv. Appl. Microbiol., 2007, 61, 89–217.
15 L. T. Tan, Phytochemistry, 2007, 68, 954–979.
16 D. J. Faulkner, Nat. Prod. Rep., 1984, 1, 551–598.
17 D. J. Faulkner, Nat. Prod. Rep., 2002, 19, 1–48.
18 J. W. Blunt, B. R. Copp, M. H. G. Munro, P. T. Northcote and M. R. Prinsep, Nat. Prod. Rep., 2003, 20, 1–48.
19 J. W. Blunt, B. R. Copp, W. P. Hu, M. H. G. Munro, P. T. Northcote and M. R. Prinsep, Nat. Prod. Rep., 2009, 26, 170–244.
20 J. Berdy, J. Antibiot., 2005, 58, 1–26.
21 A. L. Demain and S. Sanchez, J. Antibiot., 2009, 62, 5–16.
22 G. A. Codd, L. F. Morrison and J. S. Metcalf, Toxicol. Appl. Pharmacol., 2005, 203, 264–272.
23 K. Harada, Chem. Pharm. Bull., 2004, 52, 889–899.
24 K. M. Botham and J. F. Pennock, Biochem. J., 1971, 122, 127.
25 S. C. Bobzin and R. E. Moore, Tetrahedron, 1993, 49, 7615–7626.
26 B. S. Moore, I. Ohtani, C. B. Dekoning, R. E. Moore and W. W. Carmichael, Tetrahedron Lett., 1992, 33, 6595–6598.
27 H. Gross, V. O. Stockwell, M. D. Henkels, B. Nowak-Thompson, J. E. Loper and W. H. Gerwick, Chem. Biol., 2007, 14, 53–63.
28 M. A. Fischbach and C. T. Walsh, Chem. Rev., 2006, 106, 3468–3496.
29 S. E. O’Connor, Nat. Chem. Biol., 2006, 2, 511–512.
30 J. W. Trauger, R. M. Kohli, H. D. Mootz, M. A. Marahiel and C. T. Walsh, Nature, 2000, 407, 215–218.
31 C. T. Walsh, Science, 2004, 303, 1805–1810.
32 M. C. Moffitt and B. A. Neilan, Appl. Environ. Microbiol., 2004, 70, 6353–6362.
33 D. Tillett, E. Dittmann, M. Erhard, H. von Dohren, T. Borner and B. A. Neilan, Chem. Biol., 2000, 7, 753–764.
34 L. H. Du, C. Sanchez and B. Shen, Metab. Eng., 2001, 3, 78–95.
35 E. S. Sattely, M. A. Fischbach and C. T. Walsh, Nat. Prod. Rep., 2008, 25, 757–793.
36 U. Rix, C. Fischer, L. L. Remsing and J. Rohr, Nat. Prod. Rep., 2002, 19, 542–580.
37 L. S. Luo, M. D. Burkart, T. Stachelhaus and C. T. Walsh, J. Am. Chem. Soc., 2001, 123, 11208–11218.
38 D. B. Stein, U. Linne, M. Hahn and M. A. Marahiel, ChemBioChem, 2006, 7, 1807–1814.
39 T. Stachelhaus and C. T. Walsh, Biochemistry, 2000, 39, 5775–5787.
40 H. Sielaff, E. Dittmann, N. T. de Marsac, C. Bouchier, H. von Dohren, T. Borner and T. Schwecke, Biochem. J., 2003, 373, 909–916.
41 G. Christiansen, J. Fastner, M. Erhard, T. Borner and E. Dittmann, J. Bacteriol., 2003, 185, 564–572.
42 L. Rouhiainen, T. Vakkilainen, B. L. Siemer, W. Buikema, R. Haselkorn and K. Sivonen, Appl. Environ. Microbiol., 2004, 70, 686–692.
43 M. Namikoshi, K. L. Rinehart, R. Sakai, R. R. Stotts, A. M. Dahlem, V. R. Beasley, W. W. Carmichael and W. R. Evans, J. Org. Chem., 1992, 57, 866–872.
44 H. C. Losey, M. W. Peczuh, Z. Chen, U. S. Eggert, S. D. Dong, I. Pelczer, D. Kahne and C. T. Walsh, Biochemistry, 2001, 40, 4745–4755.
45 C. T. Walsh, H. C. Losey and C. L. Freel Meyers, Biochem. Soc. Trans., 2003, 31, 487–492.
46 W. Saurin, M. Hofnung and E. Dassa, J. Mol. Evol., 1999, 48, 22–41.
47 R. J. Jovell, A. J. L. Macario and E. C. de Macario, Gene, 1996, 174, 281–284.
48 P. M. Jones and A. M. George, FEMS Microbiol. Lett., 1999, 179, 187–202.
49 D. M. Gardiner, R. S. Jarvis and B. J. Howlett, Fungal Genet. Biol., 2005, 42, 257–263.
50 R. H. Proctor, D. W. Brown, R. D. Plattner and A. E. Desjardins, Fungal Genet. Biol., 2003, 38, 237–249.
51 I. B. Holland and M. A. Blight, J. Mol. Biol., 1999, 293, 381–399.
52 J. E. Walker, A. Eberle, N. J. Gay, M. J. Runswick and M. Saraste, Biochem. Soc. Trans., 1982, 10, 203–206.
53 E. Dassa and M. Hofnung, EMBO J., 1985, 4, 2287–2293.
54 Y. Quentin, G. Fichant and F. Denizot, J. Mol. Biol., 1999, 287, 467–484.
55 L. A. Pearson, M. Hisbergues, T. Borner, E. Dittmann and B. A. Neilan, Appl. Environ. Microbiol., 2004, 70, 6370–6378.
56 M. Yoshida, T. Yoshida, A. Kashima, Y. Takashima, N. Hosoda, K. Nagasaki and S. Hiroishi, Appl. Environ. Microbiol., 2008, 74, 3269–3273.
57 M. Welker and H. von Döhren, FEMS Microbiol. Rev., 2006, 30, 530–563.
58 T. Kaneko, A. Tanaka, S. Sato, H. Kotani, T. Sazuka, N. Miyajima, M. Sugiura and S. Tabata, DNA Res., 1995, 2, 153–166.
59 J. C. Meeks, J. Elhai, T. Thiel, M. Potts, F. Larimer, J. Lamerdin, P. Predki and R. Atlas, Photosynth. Res., 2001, 70, 85–106.
60 Y. Nakamura, T. Kaneko, S. Sato, M. Ikeuchi, H. Katoh, S. Sasamoto, A. Watanabe, M. Iriguchi, K. Kawashima, T. Kimura, Y. Kishida, C. Kiyokawa, M. Kohara, M. Matsumoto, A. Matsuno, N. Nakazaki, S. Shimpo, M. Sugimoto, C. Takeuchi, M. Yamada and S. Tabata, DNA Res., 2002, 9, 123–130.
61 T. Kaneko, N. Nakajima, S. Okamoto, I. Suzuki, Y. Tanabe, M. Tamaoki, Y. Nakamura, F. Kasai, A. Watanabe, K. Kawashima, Y. Kishida, A. Ono, Y. Shimizu, C. Takahashi, C. Minami, T. Fujishiro, M. Kohara, M. Katoh, N. Nakazaki, S. Nakayama, M. Yamada, S. Tabata and M. M. Watanabe, DNA Res., 2007, 14, 247–256.
62 G. Christiansen, R. Kurmayer, Q. Liu and T. Börner, Appl. Environ. Microbiol., 2006, 72, 117–123.
63 B. Mikalsen, G. Boison, O. M. Skulberg, J. Fastner, W. Davies, T. M. Gabrielsen, K. Rudi and K. S. Jakobsen, J. Bacteriol., 2003, 185, 2774–2785.
64 Y. Tanabe, K. Kaya and M. M. Watanabe, J. Mol. Evol., 2004, 58, 633–641.
65 A. Rantala, D. P. Fewer, M. Hisbergues, L. Rouhiainen, J. Vaitomaa, T. Börner and K. Sivonen, Proc. Natl. Acad. Sci. U. S. A., 2004, 101, 568–573.
66 J. E. Becker, R. E. Moore and B. S. Moore, Gene, 2004, 325, 35–42.
67 D. Hoffmann, J. M. Hevel, R. E. Moore and B. S. Moore, Gene, 2003, 311, 171–180.
68 M. Kaebernick, E. Dittmann, T. Börner and B. A. Neilan, Appl. Environ. Microbiol., 2002, 68, 449–455.
69 M. Kaebernick and B. A. Neilan, FEMS Microbiol. Ecol., 2001, 35, 1–9.
70 E. Dittmann, M. Erhard, M. Kaebernick, C. Scheler, B. A. Neilan, H. von Döhren and T. Börner, Microbiology, 2001, 147, 3113–3119.
71 M. C. Moffitt and B. A. Neilan, J. Mol. Evol., 2003, 56, 446–457.
72 R. Kellmann, T. K. Mihali, Y. J. Jeon, R. Pickford, F. Pomati and B. A. Neilan, Appl. Environ. Microbiol., 2008, 74, 4044–4053.
73 Z. Q. Beck, C. C. Aldrich, N. A. Magarvey, G. I. Georg and D. H. Sherman, Biochemistry, 2005, 44, 13457–13466.
74 N. A. Magarvey, Z. Q. Beck, T. Golakoti, Y. S. Ding, U. Huber, T. K. Hemscheidt, D. Abelson, R. E. Moore and D. H. Sherman, ACS Chem. Biol., 2006, 1, 766–779.
75 K. Ishida, G. Christiansen, W. Y. Yoshida, R. Kurmayer, M. Welker, N. Valls, J. Bonjoch, C. Hertweck, T. Börner, T. Hemscheidt and E. Dittmann, Chem. Biol., 2007, 14, 565–576.
76 K. Ishida, M. Welker, G. Christiansen, S. Cadel-Six, C. Bouchier, E. Dittmann, C. Hertweck and N. T. de Marsac, Appl. Environ. Microbiol., 2009, 75, 2017–2026.
77 L. Rouhiainen, L. Paulin, S. Suomalainen, H. Hyytiäinen, W. Buikema, R. Haselkorn and K. Sivonen, Mol. Microbiol., 2000, 37, 156–167.
78 A. Méjean, S. Mann, T. Maldiney, G. Vassiliadis, O. Lequin and O. Ploux, J. Am. Chem. Soc., 2009, 131, 7512–7513.
79 T. B. Rounge, T. Rohrlack, A. Tooming-Klunderud, T. Kristensen and K. S. Jakobsen, Appl. Environ. Microbiol., 2007, 73, 7322–7330.
80 A. Tooming-Klunderud, T. Rohrlack, K. Shalchian-Tabrizi, T. Kristensen and K. S. Jakobsen, Microbiology, 2007, 153, 1382–1393.
81 N. Ziemert, K. Ishida, A. Liaimer, C. Hertweck and E. Dittmann, Angew. Chem., Int. Ed., 2008, 47, 7756–7759.
82 B. Philmus, G. Christiansen, W. Y. Yoshida and T. K. Hemscheidt, ChemBioChem, 2008, 9, 3066–3073.
83 M. S. Donia, J. Ravel and E. W. Schmidt, Nat. Chem. Biol., 2008, 4, 341–343.
84 T. Nishizawa, M. Asayama, K. Fujii, K. Harada and M. Shirai, J. Biochem., 1999, 126, 520–529.
85 T. Nishizawa, A. Ueda, M. Asayama, K. Fujii, K. Harada, K. Ochi and M. Shirai, J. Biochem., 2000, 127, 779–789.
86 R. E. Moore, J. L. Chen, B. S. Moore, G. M. L. Patterson and W. W. Carmichael, J. Am. Chem. Soc., 1991, 113, 5083–5084.
87 K. L. Rinehart, M. Namikoshi and B. W. Choi, J. Appl. Phycol., 1994, 6, 159–176.
88 L. M. Hicks, M. C. Moffitt, L. L. Beer, B. S. Moore and N. L. Kelleher, ACS Chem. Biol., 2006, 1, 93–102.
89 H. Luesch, D. Hoffmann, J. M. Hevel, J. E. Becker, T. Golakoti and R. E. Moore, J. Org. Chem., 2003, 68, 83–91.
90 F. Kopp, C. Mahlert, J. Grünewald and M. A. Marahiel, J. Am. Chem. Soc., 2006, 128, 16478–16479.
91 F. Kopp and M. A. Marahiel, Nat. Prod. Rep., 2007, 24, 735–749.
92 Y. Shimizu, M. Norte, A. Hori, A. Genenah and M. Kobayashi, J. Am. Chem. Soc., 1984, 106, 6433–6434.
93 L. C. Gu, T. W. Geders, B. Wang, W. H. Gerwick, K. Håkansson, J. L. Smith and D. H. Sherman, Science, 2007, 318, 970–974.
94 D. L. Burgoyne, T. K. Hemscheidt, R. E. Moore and M. T. C. Runnegar, J. Org. Chem., 2000, 65, 152–156.
95 T. K. Mihali, R. Kellmann, J. Muenchhoff, K. D. Barrow and B. A. Neilan, Appl. Environ. Microbiol., 2008, 74, 716–722.
96 A. C. Jones, L. C. Gu, C. M. Sorrels, D. H. Sherman and W. H. Gerwick, Curr. Opin. Chem. Biol., 2009, 13, 216–223.
97 N. Sitachitta, B. L. Marquez, R. T. Williamson, J. Rossi, M. A. Roberts, W. H. Gerwick, V. A. Nguyen and C. L. Willis, Tetrahedron, 2000, 56, 9103–9113.
98 Z. X. Chang, P. Flatt, W. H. Gerwick, V. A. Nguyen, C. L. Willis and D. H. Sherman, Gene, 2002, 296, 235–247.
99 D. P. Galonić, F. H. Vaillancourt and C. T. Walsh, J. Am. Chem. Soc., 2006, 128, 3900–3901.
100 P. Flatt, J. Gautschi, R. Thacker, M. Musafija-Girt, P. Crews and W. Gerwick, Mar. Biol., 2005, 147, 761–774.
101 Z. X. Chang, N. Sitachitta, J. V. Rossi, M. A. Roberts, P. M. Flatt, J. Y. Jia, D. H. Sherman and W. H. Gerwick, J. Nat. Prod., 2004, 67, 1356–1367.
102 T. W. Geders, L. C. Gu, J. C. Mowers, H. C. Liu, W. H. Gerwick, K. Håkansson, D. H. Sherman and J. L. Smith, J. Biol. Chem., 2007, 282, 35954–35963.
103 L. C. Gu, B. Wang, A. Kulkarni, T. W. Geders, R. V. Grindberg, L. Gerwick, K. Håkansson, P. Wipf, J. L. Smith, W. H. Gerwick and D. H. Sherman, Nature, 2009, 459, 731–735.
104 D. J. Edwards, B. L. Marquez, L. M. Nogle, K. McPhail, D. E. Goeger, M. A. Roberts and W. H. Gerwick, Chem. Biol., 2004, 11, 817–833.
105 D. J. Edwards and W. H. Gerwick, J. Am. Chem. Soc., 2004, 126, 11432–11433.
106 K. Irie, S. Tomimatsu, Y. Nakagawa, H. Ohigashi and H. Hayashi, Biosci., Biotechnol., Biochem., 1999, 63, 1669–1670.
107 A. W. Schultz, D. C. Oh, J. R. Carney, R. T. Williamson, D. W. Udwary, P. R. Jensen, S. J. Gould, W. Fenical and B. S. Moore, J. Am. Chem. Soc., 2008, 130, 4507–4516.
108 J. A. Read and C. T. Walsh, J. Am. Chem. Soc., 2007, 129, 15762–15763.
109 A. V. Ramaswamy, C. M. Sorrels and W. H. Gerwick, J. Nat. Prod., 2007, 70, 1977–1986.
110 C. T. Calderone, S. B. Bumpus, N. L. Kelleher, C. T. Walsh and N. A. Magarvey, Proc. Natl. Acad. Sci. U. S. A., 2008, 105, 12809–12814.
111 S. D. Bentley, K. F. Chater, A. M. Cerdeño-Tárraga, G. L. Challis, N. R. Thomson, K. D. James, D. E. Harris, M. A. Quail, H. Kieser, D. Harper, A. Bateman, S. Brown, G. Chandra, C. W. Chen, M. Collins, A. Cronin, A. Fraser, A. Goble, J. Hidalgo, T. Hornsby, S. Howarth, C. H. Huang, T. Kieser, L. Larke, L. Murphy, K. Oliver, S. O’Neil, E. Rabbinowitsch, M. A. Rajandream, K. Rutherford, S. Rutter, K. Seeger, D. Saunders, S. Sharp, R. Squares, S. Squares, K. Taylor, T. Warren, A. Wietzorrek, J. Woodward, B. G. Barrell, J. Parkhill and D. A. Hopwood, Nature, 2002, 417, 141–147.
112 D. W. Udwary, L. Zeigler, R. N. Asolkar, V. Singan, A. Lapidus, W. Fenical, P. R. Jensen and B. S. Moore, Proc. Natl. Acad. Sci. U. S. A., 2007, 104, 10376–10381.
113 G. L. Challis, J. Med. Chem., 2008, 51, 2618–2628.
114 M. Zerikly and G. L. Challis, ChemBioChem, 2009, 10, 625–633.
115 J. N. Copp, A. A. Roberts, M. A. Marahiel and B. A. Neilan, J. Bacteriol., 2007, 189, 3133–3139.
116 O. A. Koksharova and C. P. Wolk, Appl. Microbiol. Biotechnol., 2002, 58, 123–137.
117 K. Eppelmann, S. Doekel and M. A. Marahiel, J. Biol. Chem., 2001, 276, 34824–34831.
118 B. Julien and S. Shah, Antimicrob. Agents Chemother., 2002, 46, 2772–2778.
119 T. Stachelhaus, H. D. Mootz and M. A. Marahiel, Chem. Biol., 1999, 6, 493–505.
120 T. J. McQuade, A. D. Shallop, A. Sheoran, J. E. DelProposto, O. V. Tsodikov and S. Garneau-Tsodikova, Anal. Biochem., 2009, 386, 244–250.
121 P. F. Long, W. C. Dunlap, C. N. Battershill and M. Jaspars, ChemBioChem, 2005, 6, 1760–1765.
122 E. W. Schmidt, J. T. Nelson, D. A. Rasko, S. Sudek, J. A. Eisen, M. G. Haygood and J. Ravel, Proc. Natl. Acad. Sci. U. S. A., 2005, 102, 7315–7320.
123 E. W. Schmidt, S. Sudek and M. G. Haygood, J. Nat. Prod., 2004, 67, 1341–1345.
124 S. Sudek, M. G. Haygood, D. T. A. Youssef and E. W. Schmidt, Appl. Environ. Microbiol., 2006, 72, 4382–4387.
125 N. Ziemert, K. Ishida, P. Quillardet, C. Bouchier, C. Hertweck, N. T. de Marsac and E. Dittmann, Appl. Environ. Microbiol., 2008, 74, 1791–1797.
126 J. A. McIntosh, M. S. Donia and E. W. Schmidt, Nat. Prod. Rep., 2009, 26, 537–559.
127 P. J. Proteau, W. H. Gerwick, F. Garcia-Pichel and R. Castenholz, Experientia, 1993, 49, 825–829.
128 T. Soule, V. Stout, W. D. Swingley, J. C. Meeks and F. Garcia-Pichel, J. Bacteriol., 2007, 189, 4465–4472.
129 T. Soule, F. Garcia-Pichel and V. Stout, J. Bacteriol., 2009, 191, 4639–4646.
130 E. P. Balskus and C. T. Walsh, J. Am. Chem. Soc., 2008, 130, 15260.
131 C. M. Sorrels, P. J. Proteau and W. H. Gerwick, Appl. Environ. Microbiol., 2009, 75, 4861–4869.
132 S. Omura, H. Ikeda, J. Ishikawa, A. Hanamoto, C. Takahashi, M. Shinose, Y. Takahashi, H. Horikawa, H. Nakazawa, T. Osonoe, H. Kikuchi, T. Shiba, Y. Sakaki and M. Hattori, Proc. Natl. Acad. Sci. U. S. A., 2001, 98, 12215–12220.
133 M. E. Barrios-Llerena, A. M. Burja and P. C. Wright, J. Ind. Microbiol. Biotechnol., 2007, 34, 443–456.
134 S. Donadio, P. Monciardini and M. Sosio, Nat. Prod. Rep., 2007, 24, 1073–1109.
135 S. Pasek, in Methods in Molecular Biology, ed. N. H. Bergman, Humana Press, Totowa, 2007, pp. 17–29.
136 S. Pasek, A. Bergeron, J. L. Risler, A. Louis, E. Ollivier and M. Raffinot, Genome Res., 2005, 15, 867–874.
137 R. D. Finn, J. Tate, J. Mistry, P. C. Coggill, S. J. Sammut, H. R. Hotz, G. Ceric, K. Forslund, S. R. Eddy, E. L. L. Sonnhammer and A. Bateman, Nucleic Acids Res., 2008, 36, D281–D288.
138 R. Durbin, S. R. Eddy, A. Krogh and G. J. Mitchison, Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids, Cambridge University Press, Cambridge, 1998.
139 D. W. Udwary, M. Merski and C. A. Townsend, J. Mol. Biol., 2002, 323, 585–598.
140 T. Weber, C. Rausch, P. Lopez, I. Hoof, V. Gaykova, D. H. Huson and W. Wohlleben, J. Biotechnol., 2009, 140, 13–17.
141 G. L. Challis, J. Ravel and C. A. Townsend, Chem. Biol., 2000, 7, 211–224.
142 I. M. Ehrenreich, J. B. Waterbury and E. A. Webb, Appl. Environ. Microbiol., 2005, 71, 7401–7413.
143 J. M. Winter, M. C. Moffitt, E. Zazopoulos, J. B. McAlpine, P. C. Dorrestein and B. S. Moore, J. Biol. Chem., 2007, 282, 16362–16368.