Mining cyanobacterial genomes for genes encoding complex biosynthetic pathways

REVIEW www.rsc.org/npr | Natural Product Reports

Downloaded by University of Wisconsin - Madison on 20/05/2013 01:52:54. Published on 14 September 2009 on http://pubs.rsc.org | doi:10.1039/B817074F

Mining cyanobacterial genomes for genes encoding complex biosynthetic pathways†

John A. Kalaitzis, Federico M. Lauro and Brett A. Neilan*

Received 19th June 2009

First published as an Advance Article on the web 14th September 2009

DOI: 10.1039/b817074f

Covering: up to February 2009

This review describes genome mining of cyanobacteria for natural product discovery and biosynthesis

pathways. Also presented is an overview of the genetic basis of natural product biosynthesis in

cyanobacteria. It includes 143 references.

1 Introduction to cyanobacteria
2 Cyanobacteria-derived natural product diversity
3 Biosynthesis of cyanobacteria-derived natural products
4 Other characteristics of cyanobacterial natural product biosynthesis
4.1 Epibiosynthetic tailoring
4.2 Natural product transporters
4.3 Transposition and recombination of biosynthesis genes
4.4 Molecular regulation of toxin biosynthesis in cyanobacteria
5 Cloning and characterisation of biosynthesis gene clusters in cyanobacteria
5.1 Natural product biosynthesis gene clusters from brackish/freshwater strains
5.1.1 Macrocycles
5.1.2 Toxic alkaloids
5.2 Natural product biosynthesis gene clusters from marine Lyngbya spp.
5.2.1 Barbamide and curacin A
5.2.2 Jamaicamides
5.2.3 Lyngbyatoxins
5.2.4 Hectochlorin
6 Genome mining
6.1 Introduction to genome mining
6.2 Genome mining techniques
6.3 Illustrative examples of genome mining in cyanobacteria
6.3.1 Patellamides
6.3.2 Trichamide
6.3.3 Microcyclamides
6.3.4 Scytonemin
7 Mining sequenced cyanobacterial genomes for biosynthesis gene clusters
7.1 Our approach
7.2 Summary of our findings

School of Biotechnology and Biomolecular Sciences, The University of New South Wales, Sydney, NSW, 2052, Australia. E-mail: [email protected]; Fax: +61 2 93851483; Tel: +61 2 93853235

† This article is part of a themed issue on genomics.

This journal is © The Royal Society of Chemistry 2009

7.2.1 Microcystis aeruginosa NIES-843
7.2.2 Nostoc sp. PCC7120
7.2.3 Gloeobacter violaceus PCC7421
7.2.4 Nostoc punctiforme PCC73102
7.2.5 Anabaena variabilis ATCC29413
8 Discussion
9 Concluding remarks
10 Acknowledgements
11 References

1 Introduction to cyanobacteria

Cyanobacteria are among Earth’s oldest life forms and the

stromatolites are the fossilised evidence of cyanobacterial

metabolism.1 Stromatolites are layered deposits of carbonate,

either branching or dome-shaped, which have formed extensive

reef-like structures that date back to the Precambrian period,

the same time as the earliest evidence of atmospheric oxygen

was found.2 The ancestors of cyanobacteria are considered the

inventors of oxygenic photosynthesis, and today cyanobacteria

contribute, through the process of oxygenic photosynthesis, up

to 30% of Earth’s oxygen. Coupled with their photosynthetic

abilities, their roles as primary producers and nitrogen fixers

greatly influence Earth’s carbon and nitrogen cycles and thus the

prokaryotic photoautotrophic cyanobacteria are extremely

important (or essential) for all higher life.3–6

The cyanobacteria are broadly described as being photosyn-

thetic bacteria, containing chlorophyll a and accessory pigments.

They are facultative aerobes, Gram-negative, oxygenic, exist as

single cells, colonies or as filaments, and lack membrane-bound

organelles. Cyanobacterial cells may differentiate into a number

of metabolic or reproductive structures, and actively growing

samples may also possess a sheath that may be pigmented.7

Growth is typically seen in conditions of neutral to alkaline pH

and moderate levels of light and warmth,7 although a long

evolutionary history has allowed cyanobacteria to adapt to, and

inhabit, many extreme and diverse environments,8 such as

arid desert soils, thermal springs, rocks, marine, brackish

and fresh waters, ice, plants and animals. The immense diversity

within this group of microorganisms, apart from the variability

of morphology and range of habitats, is also reflected in the

Nat. Prod. Rep., 2009, 26, 1447–1465 | 1447


extent of their natural production. Cyanobacteria have evolved

to produce a diverse array of secondary metabolites that have

aided species survival in these varied and highly competitive

ecological niches. Cyanobacteria are commonly associated with

the toxic blooms encountered in many eutrophic fresh and

brackish waters and are widely known for their potential to

produce a range of neurotoxic, hepatotoxic, and tumour-promoting

secondary metabolites.9

2 Cyanobacteria-derived natural product diversity

The structural diversity and biological activities of cyano-

bacteria-derived natural products (secondary metabolites) have

been reviewed in great depth elsewhere.10–13 Recent reviews

John A. Kalaitzis

John Kalaitzis obtained a Bachelor of Applied Science degree with First Class Honours in Chemistry from the University of Western Sydney, and a PhD from Griffith University (Brisbane, Australia) where he conducted natural products research under the supervision of Prof. R. J. Quinn. During his postdoctoral studies with Prof. B. S. Moore at the University of Arizona and Scripps Institution of Oceanography, he investigated biosynthetic pathways in marine actinomycetes and was introduced to genome mining. In 2007 he moved to the University of New South Wales as a Research Fellow of the Environmental Microbiology Initiative, where he works closely with Prof. Brett Neilan and Dr. Federico Lauro exploring biosynthetic pathways in microorganisms from extreme environments.

Federico M. Lauro

Federico M. Lauro received his undergraduate education in Molecular Biology and Microbiology from the University of Padova, Italy, where he obtained a ‘Laurea in Biologia’. He then continued working as a project scientist at the Department of Microbiology and Immunology of the University of Padova in the group of Prof. Giulio Bertoloni. In 2000 he moved to San Diego, California, to study at the Scripps Institution of Oceanography. He received his PhD in oceanography in 2007 characterizing genetic adaptations to elevated hydrostatic pressure in deep-sea bacteria. The same year he joined the Environmental Microbiology Initiative at the University of New South Wales. His research is focussed on developing bioinformatics approaches to characterize microbial genomes in the context of evolution, biochemical adaptations, and ecology.


covering specific topics such as ‘‘Biogenetic diversity of cyano-

bacterial metabolites’’,14 and ‘‘Bioactive natural products from

marine cyanobacteria for drug discovery’’15 should be consulted

for a complete overview of the chemistry of cyanobacteria-

derived natural product discovery. Natural products isolated

specifically from marine cyanobacteria have been reviewed

annually in reports in this journal by Faulkner between 1984 (ref. 16)

and 2002,17 and by Blunt and co-authors from 2003 (ref. 18)

onwards.19

As the focus here is biosynthesis and genome mining, it is out

of the scope of this review to detail cyanobacteria-derived natural

products, though it is appropriate to highlight some aspects of

cyanobacterial products by way of a brief overview.

The majority of cyanobacterial natural products fall into

the nitrogen-containing lipopeptide, cyclic peptide and

alkaloid classes. Polyketides, terpenoids, and hybrids of all the

aforementioned classes are also produced by cyanobacteria,

and in general it is these unusual hybrid molecules that

are often the focus of biosynthetic studies. Due to the

immense chemical diversity of cyanobacterial products, further

classifications have been proposed based on specific structural

variants.

In general, bacteria are considered prolific producers of

novel and bioactive chemical entities and a primary source

of compounds with potential therapeutic benefits such as

antibiotics.20,21 Cyanobacteria, though, are better known as

producers of highly toxic compounds (cyanotoxins), and

the majority are commonly grouped according to their physi-

ological effects as either cytotoxins (e.g. cryptophycins, dolas-

tatins, symplostatins), neurotoxins (e.g. anatoxins, saxitoxins),

hepatotoxins (e.g. microcystins, nodularins), or as irritants and

gastrointestinal toxins (e.g. aplysiatoxins and lyngbyatoxin).

Some of these are discussed in more detail in the following

sections.22,23

Brett A. Neilan

Brett Neilan is a molecular biologist and an expert in the study of toxic cyanobacteria. He obtained a Bachelor of Applied Science degree in Biomedical Science (1985) at the University of Technology, Sydney, and then worked as a medical researcher, hospital scientist and forensic biologist. He obtained his PhD in microbial and molecular biology from UNSW in 1995 and was awarded an Alexander von Humboldt Fellowship, which allowed him to conduct postdoctoral studies (Berlin) on non-ribosomal peptide biosynthesis genetics. The continuation of this early work has become the basis for current studies regarding the search for microbial natural products in novel environments, including Antarctica, the hypersaline lagoon of Shark Bay, Western Australia, and Indonesian volcanoes. In 2008 he was awarded a prestigious Australian Research Council Federation Fellowship.




3 Biosynthesis of cyanobacteria-derived natural products

To date, biosynthetic studies of cyanobacterial natural products

have largely focussed upon those compounds that are toxic to

humans and those proposed to be derived from complex

biosynthetic pathways that give rise to their unique

chemical structures. While novel compounds continue to be

reported from various cyanobacteria, reports of biosynthetic

studies are not as forthcoming, due mainly to technical difficul-

ties associated with working with such organisms.

Historically, complex biosynthetic pathways in cyanobacteria

were elucidated by feeding isotopically labelled precursors to

laboratory cultured micro-organisms. While feeding studies do

not always allow for the entire biosynthetic route to be elucidated

or proved conclusively, they were considered standard by those

undertaking biosynthesis-targeted research. One of the earliest

reported (1971) examples of a biosynthesis feeding study

undertaken in cyanobacteria was by Botham and Pennock, who

investigated the biosynthesis of tocopherols in Anabaena variabilis

using radiolabelled precursors.24 The use of NMR spectroscopy as

the preferred analytical tool to assist in deciphering biosyntheses of 13C-labelled cyanobacteria-derived metabolites was pioneered by

Moore and co-workers in the early 1990s when investigating the

biosyntheses of anatoxin A 1 in Anabaena flos-aquae NRC525-17

(Scheme 1) and the [7.7]paracyclophanes nostocyclophane D 2 and

cylindrocyclophane D 3 in Nostoc linckia and Cylindrospermum

licheniforme, respectively (Scheme 2).25,26

Nowadays, molecular genetic techniques coupled with bio-

informatics analyses have greatly aided the elucidation of natural

product biosynthesis pathways and dramatically influenced the

way research is conducted. These techniques along with isotope

feeding studies have allowed not only the elucidation of novel

pathways but also the discovery of novel products of orphan

pathways.27

Before proceeding to a discussion of the modern-day tech-

niques used to elaborate biosynthesis pathways, a brief

Scheme 1

Scheme 2


introduction to the genetic machinery encoding complex

biosynthetic pathways is necessary. This review will largely focus

on pathways derived from thiotemplate modular systems such as

nonribosomal peptide synthetases (NRPS) and polyketide

synthases (PKS), and will not include in-depth discussion of

other biosynthetic pathways.

Nonribosomal peptides, polyketides and hybrids thereof, are

biosynthesised by multifunctional enzyme complexes that

sequentially assemble small carboxylate and amino acid derived

precursor building blocks into their products in an assembly-line-

like fashion.28 These megasynthases generally follow a co-line-

arity rule whereby chemical structures for unknown natural

products can be predicted. Both NRPSs and PKSs share similar

architectures with their respective modules containing

a minimum of three enzyme domains. NRPS modules contain an

ATP-dependent adenylation (A) domain which activates

a specific or preferred amino acid, a peptidyl carrier protein

(PCP) for tethering substrates during the assembly, and

a condensation (C) domain which catalyses the formation of

amide bonds between PCP-bound substrates. Likewise, the PKS

modules contain an acyl transferase (AT) which selects

a preferred acyl-CoA thioester substrate, acyl carrier protein

(ACP) and a ketosynthase (KS) which catalyses the condensation

of two ACP-bound substrates. In both systems, each module

typically extends the backbone of the molecule by one unit, and

often these modules also contain a number of additional catalytic

domains that function to tailor the assembling molecule and

therefore generate further structural diversity. The assembled

molecule is ultimately released from the enzyme complex, usually

by a thioesterase (TE) domain, which may also function to direct

cyclisation of the final product.29–31
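The co-linearity rule described above can be illustrated with a minimal Python sketch. It assumes a module is represented simply as an ordered list of domain abbreviations plus a predicted building block; the `Module` class, the `predict_backbone` helper, the `KR` and `MT` tailoring abbreviations and the three-module toy line are all hypothetical and do not correspond to any cluster in this review. Loading modules that lack a C or KS domain are reported as "incomplete" and left out of the prediction, which a real annotation tool would need to handle.

```python
from dataclasses import dataclass

# Core domain sets as named in the text; everything else counts as tailoring.
NRPS_CORE = {"C", "A", "PCP"}
PKS_CORE = {"KS", "AT", "ACP"}
NON_TAILORING = NRPS_CORE | PKS_CORE | {"TE"}


@dataclass
class Module:
    domains: list    # domain abbreviations, N- to C-terminal order
    substrate: str   # hypothetical label for the predicted building block

    def kind(self):
        present = set(self.domains)
        if NRPS_CORE <= present:
            return "NRPS"
        if PKS_CORE <= present:
            return "PKS"
        return "incomplete"

    def tailoring(self):
        # Domains beyond the minimal set and the releasing TE domain.
        return [d for d in self.domains if d not in NON_TAILORING]


def predict_backbone(modules):
    """One backbone unit per complete module, in assembly-line order."""
    return [(m.kind(), m.substrate) for m in modules if m.kind() != "incomplete"]


# Toy three-module hybrid line (illustrative only, not a real cluster).
toy_line = [
    Module(["C", "A", "PCP"], "Ala"),
    Module(["KS", "AT", "KR", "ACP"], "malonyl-CoA"),
    Module(["C", "A", "MT", "PCP", "TE"], "Ser"),
]
print(predict_backbone(toy_line))
```

Keeping the tailoring domains separate from the core set mirrors the distinction drawn in the text between backbone extension and the additional catalytic domains that generate further structural diversity.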

PKSs increase the diversity of NRPS products, and vice versa

when combined in hybrid NRPS/PKS systems such as those

assembling the cyanobacterial toxins microcystin and nodularin.32,33

Hybrid PKS/NRPSs assembling lipopeptides incorporate short

carboxylic acids into the peptidyl compound by transfer and

subsequent condensation of the β-hydroxy acyl group to the

NRPS-bound amino acid.34 Alternatively, a β-amino fatty acid

can be converted to an ACP-bound β-amino acid via an aminotransferase

(AMT) domain, as in microcystin synthesis.33 Specific

examples of hybrid systems are discussed in more detail below.

Recent reviews by Walsh and co-workers should be consulted for

an in-depth discussion of the assembly of NRPS and PKS-derived

products.28,31,35

4 Other characteristics of cyanobacterial natural product biosynthesis

4.1 Epibiosynthetic tailoring

Apart from the assembly of the carbon backbone, further

structural and functional diversity observed in cyanobacterial

secondary metabolites is largely due to pre- or post-synthetic

modifications introduced by tailoring enzymes.

As in other orders of microorganisms, common tailoring

enzymes include oxygenases, ketoreductases, group transferases

such as glycosyl transferases and methyl transferases, cyclases,

and halogenases.36 Such enzymes may be embedded within PKS/

NRPS modules and act in cis during chain elongation, or may


alternatively function as separate subunits or stand-alone

enzymes and act in trans. The following is a brief overview of

some important tailoring enzymes involved in complex bio-

synthesis pathways of cyanobacterial products.

Oxidoreductase reactions are among the most commonly

observed tailoring reactions in NRPS and PKS biosynthesis

pathways. A diverse range of enzymes catalyse these reactions

and include oxidases, oxygenases, peroxidases, reductases and

dehydrogenases. Oxidoreductase reactions can have a dramatic

impact on the stereo-electronic and physical properties of

molecules. They generate or remove chiral centres, introduce

highly reactive functional groups, and interconvert H-bond

donor/acceptor sites.36 Monooxygenases typically generate

hydroxy groups and epoxides by transferring one oxygen atom,

while dioxygenases usually transfer both atoms of molecular

oxygen. Ketoreductases are commonly integrated within the

modules of PKSs; however, several stand-alone enzymes have

also been identified. While post-PKS ketoreductases are quite

rare, they are excellent candidates for combinatorial biosynthesis

of novel compounds.36

D-Amino acids are a common feature of many non-ribosomal

peptides. These non-proteinogenic building blocks are important

for creating structural diversity, providing resistance to

proteolysis, and in some cases imposing stereochemical

constraints.37,38 Occasionally, stand-alone enzymes generate the

D-amino acids in non-ribosomal pathways; however, they are

often generated from L-amino acids that are epimerised during

the peptide elongation process by an embedded epimerisation (E)

domain. Such domains are approximately 50 kDa in size and are

located adjacent to A-domains at the C-terminal end of NRPS

modules. While E-domains are capable of producing a mixture of

D- and L-isomers, there is evidence that C-domains immediately

downstream of epimerases are D-specific.39 The conversion of

L-amino acids to D-isomers is also achieved by stand-alone

enzymes, as in cyclosporin production; such enzymes are alternatively

referred to as racemases. Notable examples in cyanobacteria are the

racemases in the microcystin biosynthetic pathway.40
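As a rough illustration of how an embedded E-domain informs a structure prediction, the following Python sketch labels each incorporated residue as D or L according to whether its module carries an E-domain. The function name, the input layout and the toy modules are assumptions made for illustration. The sketch deliberately ignores stand-alone racemases, which, as noted above, can also supply D-amino acids and would not be visible from the module architecture alone.

```python
def predict_configuration(modules):
    """Label each residue D- or L- from the presence of an E domain.

    modules: list of (residue, domains) pairs in assembly-line order.
    """
    labelled = []
    for residue, domains in modules:
        prefix = "D" if "E" in domains else "L"
        labelled.append(f"{prefix}-{residue}")
    return labelled


# Toy input (illustrative only): the first module carries an E domain.
toy_modules = [
    ("Ala", ["C", "A", "PCP", "E"]),
    ("Leu", ["C", "A", "PCP"]),
]
print(predict_configuration(toy_modules))
```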

A variety of non-ribosomally produced cyanobacterial

compounds contain methylated amino acid residues. Methylation

can greatly alter the structure and biological activity of secondary

metabolites by increasing molecular lipophilicity, influencing

stereochemistry and introducing chiral centres.36 N-, C- and

O-methyltransferases are all cofactor-dependent, meaning they

Scheme 3

require the presence of a methyl donor group for catalysis. The

requisite cofactor usually takes the form of the highly reactive

sulfonium ion, S-adenosyl methionine (SAM). Methyltransferases

associated with cyanobacterial secondary metabolite pathways are

encoded within the modules of PKS and NRPS genes. Methylation

(Scheme 3) is important to the toxicity of the cyanobacterial hep-

atotoxins microcystin 4 (the LR-form is shown) and nodularin 5.

The O-methyl transferase (OMT)-domain is thought to be involved

in the transfer of a methyl group to the hydroxyl group on the

unusual amino acid 3-amino-9-methoxy-2,6,8-trimethyl-10-phenyl-

4,6-decadienoic acid (Adda).33,41,42 Previous studies have shown that

natural microcystin variants lacking the Adda O-methylation have

reduced inhibition of protein phosphatases and hence lower

toxicity.43 Furthermore, the recent disruption of the mcyJ O-methyl

transferase in P. agardhii resulted in the production of a demethylated

microcystin variant with similarly reduced toxicity.41 Further

modification of an N-methylated residue in microcystin also occurs

in trans, via dehydration, to incorporate N-methyl-dehydroalanine

from the L-Ser substrate.

Glycosylation is also a common feature of many NRPS and

PKS products. The addition of glycosyl residues to the aglycones

of PKs can dramatically influence their biological activity and is

critically important to the biological activity of many drugs in

clinical use such as the glycopeptide and macrolide anti-

biotics.44,45 Glycosyl transferases are relatively rare in cyano-

bacterial NRPS and PKS pathways, and there are very few

examples of glycosylated cyanobacteria-derived natural products

in the literature.

4.2 Natural product transporters

ABC transporters represent one of the largest, most highly

conserved protein superfamilies in existence.46 These proteins,

found in bacteria, eukaryotes and archaea,47 are responsible for

the ATP-dependent transport of a vast range of molecules and

substrates (allocrites) across intracellular and cell surface bio-

logical membranes.48 ABC transporter genes are encoded within

many NRPS and PKS biosynthesis gene clusters, including those

found in cyanobacterial genomes. Few of the bacterial secondary

metabolite transporters have been functionally characterised;

however, they have been shown to confer self-resistance to the

producing organisms.49,50


The superfamily also shares many common structural features,

including a highly conserved ABC-ATPase and at least one

cognate, but much less conserved, membrane domain. Most

ABC-ATPases have an approximate molecular mass of 27 kDa

and share an overall amino acid sequence identity in excess of

30%.51 This identity is concentrated in several key regions,

making ABC-ATPases readily recognisable by at least two

unique, highly conserved motifs, the Walker A site and the

hydrophobic Walker B site.52 Whilst the prokaryotic import

systems do possess a short conserved ‘‘EAA’’ motif (EAA-G-I-

LP) in their cytoplasmic loop,53 the exporters display no signifi-

cant sequence conservation.54 The most distinguishing feature of

the ABC transporter phylogenetic tree is its division into two

major branches representing the export (ABC-A) and import

(ABC-B) systems.46 Phylogenetic analysis reveals a tendency for

transporters in both import and export subdivisions to cluster

according to allocrite specificity. Thus by simply knowing the

primary structure (peptide sequence) of a newly discovered ABC

protein, it is possible to predict the type of allocrite specified by

the system, and the direction of allocrite transport.
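Motif-based recognition of candidate ABC-ATPases can be sketched with two regular expressions. The patterns below are the commonly cited consensus forms (Walker A as G-x(4)-G-K-[ST], Walker B as four hydrophobic residues followed by Asp) and are not taken from this review; the choice of hydrophobic residues, the function name and the toy sequence are likewise assumptions. Hits should be read only as candidates for closer inspection, since short motifs of this kind also occur by chance and in other nucleotide-binding proteins.

```python
import re

# Commonly cited consensus patterns (assumed here, not from the review).
WALKER_A = re.compile(r"G.{4}GK[ST]")
WALKER_B = re.compile(r"[AVILMFYW]{4}D")


def find_motifs(seq):
    """Return 0-based start positions of Walker A and Walker B matches."""
    seq = seq.upper()
    return {
        "walker_a": [m.start() for m in WALKER_A.finditer(seq)],
        "walker_b": [m.start() for m in WALKER_B.finditer(seq)],
    }


# Toy peptide (illustrative only, not a real transporter sequence).
toy_seq = "MKTGPSGSGKSTLLRAAAAILLLDEPT"
print(find_motifs(toy_seq))
```

A sequence carrying both motifs in the expected order would then be passed to a phylogenetic comparison of the kind described above to suggest allocrite type and transport direction.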

ABC transporters of modified cyclic peptides such as micro-

cystin and nodularin phylogenetically cluster with peroxisomal

membrane proteins (PMPs).55 While this phylogenetic relation-

ship may seem unusual, the long-chain fatty acid substrates of

PMPs are structurally similar to polyketides, and are produced

in an analogous fashion by modular synthase enzymes. This

suggests that transporters of NRPs, PKs, and structurally similar

compounds are evolutionarily and functionally related; however,

further characterization of such transporters is required.

4.3 Transposition and recombination of biosynthesis genes

Putative transposases have also been identified in association

with secondary metabolite pathways and are adjacent to the

microcystin synthetase gene clusters (Fig. 1) in Microcystis

Fig. 1 Microcystin biosynthesis clusters in cyanobacteria.


aeruginosa PCC7806,33 Planktothrix agardhii CYA126/8

(GenBank accession AJ441056) and Anabaena sp. 90

(AY212249). This suggests that transposition may have been

involved in transfer of microcystin synthetase genes between the

various microcystin-producing cyanobacterial genera. The

presence of a truncated, and hence inactive, putative transposase

downstream of the related nodularin synthetase supports this

hypothesis, as only Nodularia spumigena NSOR10 is capable of

producing nodularin.32 The identification of short sequences

within a Microcystis aeruginosa cyanophage, with identity to

regions flanking the putative microcystin synthetase associated

transposases, also supports this theory and suggests a mechanism

for transposition facilitated by transduction.56 The variable

architecture of the microcystin gene clusters in different cyano-

bacterial genera may also be a result of transposition-mediated

gene rearrangement events. Consequently, these complex hybrid

polyketide/nonribosomal peptides serve as elegant examples of

natural combinatorial biosynthesis that have evolved as

a competitive advantage for the cyanobacteria.57

Sequencing of the Synechocystis sp. PCC6803,58 Nostoc

punctiforme ATCC29133 (also known as N. punctiforme

PCC73102),59 Thermosynechococcus elongatus BP-1 (ref. 60) and

Microcystis aeruginosa NIES-843 (ref. 61) genomes, amongst others,

has revealed an abundance of transposases, insertion sequences

and short sequence repeats that may have contributed to the

rapid evolution of these species. Insertion sequences were also

found to disrupt the microcystin synthetase gene cluster in

various Planktothrix species, resulting in strains that were

deficient in toxin production.62

The characterisation of the microcystin biosynthesis gene cluster

(mcy) in the genomes of M. aeruginosa, P. agardhii, and

Anabaena sp. has enabled the study of the origins and evolution

of hepatotoxin biosynthesis in cyanobacteria. Identification of

transposases associated with the mcy and nodularin biosynthesis

gene clusters (nda) and subsequent phylogenetic analysis has led

to the theory that horizontal gene transfer and recombination

events are responsible for the sporadic distribution of the mcy

gene cluster throughout the cyanobacteria and the various

microcystin isoforms that have been identified to date.33,63,64

Recent genetic studies suggest that the nda cluster evolved from

the mcy cluster through the deletion of two NRPS modules.32,63,65

Likewise, chemical structure differences between the nosto-

cyclopeptides and the nostopeptolides are predictable based on

comparisons of the architectures of their respective biosynthesis

gene clusters (ncp and nos) in Nostoc spp.66,67 The absence of

a PKS module, a dimodular NRPS, and a TE domain in the

ncp cluster, as compared with the nos cluster, is suggestive of

a genetic rearrangement resulting in two distinct biosynthesis

pathways.

4.4 Molecular regulation of toxin biosynthesis in cyanobacteria

While most toxin regulation studies have focused on direct

measurements of cellular toxin, the description of the mcy gene

cluster33 enabled a closer examination of microcystin regulation

at the molecular level. Transcription of the mcy genes occurs via

two polycistronic operons from a central bi-directional promoter

between mcyA and mcyD. Alternate transcriptional start sites

were identified for both operons when cells were cultured under


high or low light intensities.68 Kaebernick et al.69 also proposed

that microcystin is constitutively produced under low and

medium light intensities, and exported when a higher threshold

intensity is reached. The recently identified ABC transporter

McyH was later proposed to be responsible for this export.55

Comparative analysis of the proteomes of the wild-type and the

mutant cultures of toxic Microcystis resulted in the identification

of a protein, MrpA, which was expressed only in the wild-type.70

The genomic locus encoding this protein is homologous to RhiA

and RhiB from Rhizobium leguminosarum, which are regulated

via a quorum-sensing mechanism and are up-regulated in

response to blue light.

As observed for the mcy cluster, the nda gene cluster is

transcriptionally regulated by a bidirectional promoter region.

Analysis of transcription of the nda cluster revealed that it is

transcribed as two polycistronic mRNAs, ndaAB/ORF1/ORF2,

and ndaC.71 The two genes downstream of ndaAB, ORF1 and

ORF2 encode a putative transposase and a putative high light-

inducible chlorophyll-binding protein homolog, respectively;

however, it is not clear why these genes are co-transcribed

with the nda gene cluster. ORF2 has been identified in

all strains of toxic Nodularia, and the association between

ORF2 and nodularin biosynthesis may suggest a physiological

function associated with high-light stress. A putative heat shock

repressor protein, encoded by the gene ORF3, was also identi-

fied downstream of ORF2, which may be involved in the

transcriptional regulation of the nda genes in response to heat

stress.32

A genomic region adjacent to the sxt gene cluster in C. raci-

borskii T3 was identified and characterised, putatively encoding

a regulatory two-component system. This system appears to be

involved in the sensing of environmental signals, in particular

depleted phosphate, while activating the transcription of genes

involved in its uptake and transport.72

Table 1 Selected cyanobacteria-derived biosynthesis gene clusters

| Gene cluster | Source organism | Approx. size (kb) | Biosynthetic origin |
| --- | --- | --- | --- |
| Barbamide (bar) | Lyngbya majuscula 19L | 26 | Polyketide/peptide |
| Cryptophycin (crp) | Nostoc sp. ATCC53789 | 40 | Polyketide/peptide |
| Cylindrospermopsin (cyr) | Cylindrospermopsis raciborskii AWT205 | 43 | Polyketide/peptide |
| Curacin (cur) | Lyngbya majuscula 19L | 64 | Polyketide/peptide |
| Hectochlorin (hct) | Lyngbya majuscula JHB | 38 | Polyketide/peptide |
| Jamaicamide (jam) | Lyngbya majuscula JHB | 58 | Polyketide/peptide |
| Microcystin (mcy) | Microcystis aeruginosa PCC7806 | 55 | Polyketide/peptide |
| Nodularin (nda) | Nodularia spumigena NSOR10 | 48 | Polyketide/peptide |
| Nostopeptolide (nos) | Nostoc sp. GSV224 | 40 | Polyketide/peptide |
| Aeruginosin (aer) | Planktothrix agardhii CYA126/8 | 34 | Peptide |
| Anabaenopeptilide (apd) | Anabaena sp. 90 | 28 | Peptide |
| Anatoxin A (ana) | Oscillatoria PCC 6506 | 29 | Polyketide |
| Cyanopeptolin (mcn) | Microcystis N–C 172/5 | 30 | Peptide |
| Nostocyclopeptide (ncp) | Nostoc sp. ATCC53789 | 33 | Peptide |
| Lyngbyatoxin (ltx) | Lyngbya majuscula | 11 | Peptide/terpenoid |
| Saxitoxin (sxt) | Cylindrospermopsis raciborskii T3 | 35 | Polyketide/amino acid |
| Scytonemin | Nostoc punctiforme ATCC29133 | 28 | Shikimate |
| Microcyclamide (mca) | Microcystis aeruginosa NIES298 and PCC7806 | 11 | Ribosomal peptide |
| Microviridin (mdn/mvd) | Microcystis aeruginosa NIES298 and P. agardhii CYA126/8 | 7 | Ribosomal peptide |
| Patellamide (pat) | Prochloron didemni | 11 | Ribosomal peptide |
| Trichamide (tri) | Trichodesmium erythraeum ISM101 | 13 | Ribosomal peptide |
| Trunkamide (tru) | Prochloron sp. | 11 | Ribosomal peptide/terpenoid |

5 Cloning and characterisation of biosynthesis gene clusters in cyanobacteria

The number of characterised biosynthesis gene clusters from cyanobacteria is limited compared with the numbers reported from other microorganisms such as the streptomycetes. This is more a reflection of the number of researchers pursuing such clusters than of a lack of interest in the natural products derived from them. The following sections serve to highlight some cyanobacteria-derived clusters and the unique biosynthesis pathways in operation. These examples are included to illustrate features of biosynthesis gene clusters useful for genome mining. Other notable cyanobacterial biosynthesis gene clusters not detailed in this review are presented in Table 1. These include the cryptophycin (crp),73,74 aeruginosin (aer),75,76 anabaenopeptilide (apd),77 anatoxin A (ana),78 cyanopeptolin (mcn),79,80 microviridin (mdn/mvd)81,82 and trunkamide (tru)83 biosynthesis gene clusters.

5.1 Natural product biosynthesis gene clusters from brackish/freshwater strains

5.1.1 Macrocycles. As highlighted in section 4.3, these complex molecules serve as elegant examples of natural combinatorial biosynthesis. The gene clusters encoding their biosynthesis are associated with prototypical thiotemplate modular systems, and provide the basis for our genome mining analyses in section 7.2.

Microcystins. The microcystins are potent inhibitors of eukaryotic protein phosphatases 1 and 2A. The biosynthetic gene cluster of these cyclic heptapeptides was first characterised in Microcystis aeruginosa PCC7806.33,84,85 Prior feeding studies revealed the origin of the carbons in the unusual (2S,3S,8S,9S)-3-amino-9-methoxy-2,6,8-trimethyl-10-phenyl-4,6-decadienoic acid



Dow

nloa

ded

by U

nive

rsity

of

Wis

cons

in -

Mad

ison

on

20/0

5/20

13 0

1:52

:54.

Pu

blis

hed

on 1

4 Se

ptem

ber

2009

on

http

://pu

bs.r

sc.o

rg |

doi:1

0.10

39/B

8170

74F

View Article Online

(Adda) and (2R,3S)-3-methylaspartic acid (Masp) residues.

Feeding studies confirmed that Adda was derived from

L-phenylalanine, and suggested phenylacetate as the primer;86,87

however, recent studies have revealed that it is in fact derived

from phenyllactate.88 The microcystin biosynthesis gene cluster

(mcy) spans 55 kb and is composed of ten bidirectionally

transcribed ORFS. The PKS McyG contains an N-terminal

adenylation-peptidyl carrier protein loading didomain that is

predicted to load the starter unit, which then undergoes

subsequent malonate and amino acid extensions (encoded by

mcyGJDEABCL). The C-terminal TE is suspected to facilitate

hydrolysis and cyclisation to yield the final product. Tailoring

enzymes associated with the cluster include an O-methyl-

transferase (McyJ) and a stand-alone dehydratase (McyI).

Another tailoring enzyme of interest is McyF which showed

greatest similarity to an aspartate racemase.40

Nodularin. The cyclic pentapeptide nodularin 5 is assembled, as expected, in a manner similar to the structurally related heptapeptide microcystin. The nodularin synthetase gene cluster in Nodularia spumigena NSOR10 spans 48 kb and consists of nine ORFs. The nodularin biosynthesis gene cluster (nda) encodes proteins more or less co-linearly with their respective catalytic function in the assembly of nodularin.32 Interestingly, the nda cluster encodes two NRPS modules that activate amino acids (D-Ala and L-Leu) that are not present in the final structure. Cases such as these need to be considered when predicting compound structures from gene sequence data in any genome mining project. The deletion of the two NRPS modules raises many questions regarding recombination and shuffling of genes and also the transposition of gene clusters between organisms. ndaI encodes a protein most similar to the ABC transporters McyH and NosG associated with microcystin and nostopeptolide biosynthesis.32 Conserved coding sequences such as these could be targeted with the aim of identifying biosynthesis gene clusters of potentially toxic molecules.
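The idea of using conserved transporter genes as signposts for nearby biosynthesis genes can be sketched in a few lines of Python. This is a minimal illustration only: the function name `flag_marker_orfs`, the keyword lists, the window size and the toy annotation list are all assumptions made for the example, not data or methods from the studies cited above.

```python
def flag_marker_orfs(annotations, marker_terms, core_terms, window=5):
    """Return indices of ORFs whose annotation contains a marker term
    (e.g. a transporter) and that lie within `window` ORFs of an ORF
    whose annotation contains a core biosynthetic term (NRPS/PKS)."""
    def matches(text, terms):
        text = text.lower()
        return any(term.lower() in text for term in terms)

    core_idx = [i for i, a in enumerate(annotations) if matches(a, core_terms)]
    hits = []
    for i, a in enumerate(annotations):
        if matches(a, marker_terms) and any(abs(i - j) <= window for j in core_idx):
            hits.append(i)
    return hits


# Hypothetical annotations in genomic order (toy data, not a real genome).
TOY_ORFS = [
    "hypothetical protein",
    "ABC transporter, ATP-binding protein",  # index 1, far from NRPS/PKS genes
    "ribosomal protein L2",
    "DNA gyrase subunit B",
    "tRNA synthetase",
    "hypothetical protein",
    "nonribosomal peptide synthetase",       # index 6
    "polyketide synthase",                   # index 7
    "ABC transporter, permease",             # index 8, adjacent to the PKS
]
CORE_TERMS = ["nonribosomal peptide synthetase", "polyketide synthase"]

hits = flag_marker_orfs(TOY_ORFS, ["ABC transporter"], CORE_TERMS, window=2)
print(hits)
```

With a window of two ORFs only the transporter adjacent to the synthase genes is flagged; widening the window trades specificity for sensitivity, and any hit would still need manual inspection of the surrounding genes.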

Nostopeptolides. The cryptophycin-producing74 strain Nostoc sp. GSV224 also produces other cyclic peptide–polyketide hybrid natural products known as nostopeptolides A1 6 and A2 7. The nos gene cluster includes eight ORFs, spans 40 kb, and contains most genes required for biosynthesis and transport. The domain organisation is co-linear with the proposed order of biosynthetic assembly, with nosA encoding a tetramodular NRPS, nosC a trimodular NRPS, and nosD a dimodular NRPS. Located between nosA and nosC is nosB, which encodes a single PKS module.67 A putative thioesterase is located at the C-terminus of NosD. NosA3, within the third NRPS module encoded by nosA, is proposed to adenylate the rare non-proteinogenic amino acid residue L-4-methylproline.89 Sequence analysis of the NosA3 adenylation domain would suggest at first glance that it activates proline; however, a single amino acid difference in its substrate-binding pocket relative to that of NosD2 (which adenylates proline), together with the chemical structure of the nostopeptolides, supported the assignment of NosA3 as the L-4-methylproline-activating domain. nosE and nosF encode enzymes involved in the biosynthesis of L-4-methylproline from L-leucine. Other ORFs associated with the cluster include orf5, located between nosD and nosE and coding for a 265 amino acid protein of unknown function, and nosG, which encodes an ATP binding cassette (ABC) transporter.67
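The co-linearity described for the nos cluster lends itself to a simple illustration: if one building block is assumed per module, the module counts given in the text translate directly into a predicted order and number of units. The sketch below is illustrative only; the data structure and the labels "amino acid" and "ketide extension" are assumptions of the example, and real assembly lines can deviate from strict co-linearity (skipped, iterated or inactive modules).

```python
# Gene, synthetase type and module count, in the order of assembly given in
# the text for the nos cluster (nosA, nosB, nosC, nosD).
NOS_CLUSTER = [
    ("nosA", "NRPS", 4),
    ("nosB", "PKS", 1),
    ("nosC", "NRPS", 3),
    ("nosD", "NRPS", 2),
]


def predicted_units(cluster):
    """List one (gene, unit type) pair per module, assuming strict co-linearity."""
    units = []
    for gene, kind, n_modules in cluster:
        unit = "amino acid" if kind == "NRPS" else "ketide extension"
        units.extend((gene, unit) for _ in range(n_modules))
    return units


units = predicted_units(NOS_CLUSTER)
n_aa = sum(1 for _, unit in units if unit == "amino acid")
n_ket = len(units) - n_aa
print(f"{len(units)} modules: {n_aa} amino acid units, {n_ket} ketide extension")
```

Under these assumptions the four genes account for ten modules. Which amino acid each NRPS module selects is a separate question that depends on the adenylation domain, as the NosA3 example shows.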

Nostocyclopeptides. The 33 kb nostocyclopeptide (ncp) biosynthesis gene cluster in Nostoc sp. ATCC53789 has been sequenced and characterised.66 As with many cyanobacterial NRPS-derived natural products, the cluster is co-linear with the proposed order of nostocyclopeptide A1 8, A2 9, and A3 10 assembly. The cluster encodes two proteins, NcpA and NcpB, containing three and four NRPS modules, respectively. As in the nos cluster, genes encoding L-4-methylproline biosynthesis and transport enzymes are also present. The cluster architecture mirrors that of the nos synthetase and, interestingly, it encodes a 265 amino acid protein, NcpC, of unknown function. In both the ncp and nos clusters, the encoding gene (ncpC and orf5, respectively) is located between the NRPS modules and the L-4-methylproline biosynthesis genes.66 A recent BLAST search has shed no further light on its role, if any, in peptide biosynthesis. The most striking feature of this cluster is the encoded reductase domain at the C-terminal end of NcpB, which is responsible for the reductive offloading of the peptide.90 This reductase facilitates both the release and cyclisation of the linear peptide to the unusual imine-linked macrocyclic peptide product. NcpB contains a putative NAD(P)H binding domain, suggesting that the offloading of the peptide is reductive in nature, therefore generating a linear peptide with a terminal aldehyde. The aldehyde is then captured intramolecularly by the amino group of the N-terminal tyrosine to form a stable imine bond. Identification of similar reductase domains through genome mining could ultimately lead to the characterisation of other clusters encoding imine-linked macrocycles, thus providing further tools to assist in the complete characterisation of this unusual biosynthetic pathway. A recent review by Kopp and Marahiel describes in great depth ‘Macrocyclization strategies in polyketide and nonribosomal peptide biosynthesis’.91

5.1.2 Toxic alkaloids. The biosynthesis gene clusters of these toxic alkaloids code for the incorporation of unusual substrates, rare pathways, and numerous tailoring reactions. Also of interest in genome mining terms are the clustered genes encoding toxin efflux and transport proteins. These could prove useful targets for locating biosynthesis clusters encoding such toxins.


Saxitoxins. The somewhat cryptic saxitoxin 11 biosynthesis gene cluster (sxt) in Cylindrospermopsis raciborskii T3 has been sequenced and characterised.72 Close examination of the cluster has shed light on the evolution of paralytic shellfish poisoning (PSP) toxins and has enabled an early warning system for the detection of potentially fatal paralytic shellfish toxin (PST)-producing algal blooms. The 35 kb cluster contains genes encoding enzymes that incorporate the substrates anticipated from feeding studies: arginine, acetate, and a SAM-derived methyl group.92 Salient features of the cluster include a rare N-acetyltransferase-like (GNAT)93 enzyme encoded by sxtA2, which catalyses the transfer of acetate from acetyl-CoA to the thiol on the ACP (S-acetyl transfer) in a similar manner to the curacin biosynthesis enzyme CurA. The sxt gene cluster also encodes an aminotransferase and an amidinotransferase, which facilitate the incorporation of arginine and an arginine-derived amidino group, respectively.72 For the purposes of an analysis based on genome mining, tailoring enzymes such as the less common carbamoyltransferase and sulfotransferase, encoded by sxtI and sxtN respectively, are notable. The extremely potent saxitoxin is exported from the cell in C. raciborskii T3, and the sxtM and sxtF gene products are proposed to play a role in that function. SxtM and SxtF have high similarities to sodium-driven multidrug and toxic compound extrusion (MATE) proteins, and in this cyanobacterium are the likely saxitoxin transporters.72

Cylindrospermopsin. A biosynthetic pathway for cylindrospermopsin 12 was proposed based on a feeding study conducted by Moore and co-workers in Cylindrospermopsis raciborskii, in which guanidinoacetate was proposed as the PKS starter unit.94 This rare starter is generated from glycine and a guanidine moiety likely derived from arginine. The 43 kb cylindrospermopsin (cyr) gene cluster in C. raciborskii AWT205 harbours a centrally located gene (cyrA) whose product is most similar to the human arginine:glycine amidinotransferase.95 This observation supported the feeding experiment results, which suggested that the uracil ring in cylindrospermopsin was not derived from primary metabolism but rather synthesised during product assembly. The majority of the cluster encodes NRPS and PKS modules responsible for the formation of the carbon skeleton, and genes associated with uracil ring formation were also identified. Genes encoding tailoring enzymes in the pathway include cyrJ, which encodes a 3′-phosphoadenylyl sulfate (PAPS)-dependent sulfotransferase-like protein, and the associated cyrN, which codes for an adenylylsulfate kinase (CyrN) that catalyses the formation of PAPS. Aside from biosynthesis genes, the cluster also encodes proteins involved in the regulation and export of the toxin. As in the case of saxitoxin, a sodium-driven multidrug efflux pump-type protein (CyrK) is proposed to be responsible for cylindrospermopsin transport.95

5.2 Natural product biosynthesis gene clusters from marine Lyngbya spp.

Strains of Lyngbya majuscula are noted producers of chemically diverse and unique metabolites possessing broad ranges of biological activities.96 The biosyntheses of L. majuscula natural products are keenly pursued owing to their rarely encountered chemical features. To date, the biosynthetic gene clusters of several Lyngbya-derived metabolites have been functionally characterised. These examples illustrate the diversity of biosynthesis pathways in operation in different strains of the same species.

5.2.1 Barbamide and curacin A. The cyanobacterium L. majuscula strain 19L, collected from Curaçao, produces, amongst other molecules, both barbamide 13 and curacin A 14. From a biosynthesis point of view, the molluscicidal chlorinated lipopeptide barbamide is of great interest due mainly to its unusual 5,5,5-trichloroleucine-derived moiety.97 The barbamide (bar) biosynthetic gene cluster was the first reported PKS/NRPS from a marine cyanobacterium.97,98 Precursor incorporation studies revealed that barbamide is derived from the amino acids L-leucine, L-phenylalanine and L-cysteine, together with acetate and two SAM-derived methyl groups.97,99 Sequence analysis of the biosynthetically co-linear 12 ORF, 26 kb gene cluster revealed the presence of two genes, barB1 and barB2, whose products catalyse the unprecedented chlorination of the pro-R methyl of leucine to form the intermediate trichloroleucine.98,99 As exemplified by barbamide, a common characteristic of many L. majuscula metabolites – and indeed other cyanobacterial natural products – is that they are halogenated, and these provide the inspiration for discovery of novel, naturally occurring halogenation reactions. The usefulness of characterising biosynthesis gene clusters was further exemplified by the use of bar gene probes to provide direct evidence that barbaleucamide B 15, which was originally reported from a Philippine marine sponge Dysidea sp., is of cyanobacterial origin.100 This supported the long-held notion that symbiont cyanobacteria are the true producers of many natural products reported to be sponge-derived.

Curacin A 14 is a potent antitubulin natural product derived from cysteine, ten acetates and two SAM-derived methyls. Its biosynthetic gene cluster, like that of barbamide, is a hybrid NRPS/PKS that is highly co-linear with the predicted biochemical steps required for its biosynthesis. The curacin A (cur) gene cluster spans 64 kb over 14 ORFs and harbours an interesting terminating motif that is predicted to facilitate product release from the megasynthetase and catalyse a dehydrative decarboxylation.101 The mechanism of this unique decarboxylation was deduced using deuterium labelling experiments and NMR spectroscopy. Another striking feature of curacin A biosynthesis is the incorporation of a unit derived from 3-hydroxy-3-methylglutaryl-CoA (HMG-CoA). The HMG-CoA synthase plays a role in the formation of the rare cyclopropyl ring, which is derived from an isopentenyl-ACP intermediate.102,103 This rare pathway is encoded by curA–curF.93,101–103 The protein CurF also contains the only NRPS domains (C, A, PCP) associated with the biosynthesis of curacin A, and downstream of curF are seven PKS modules responsible for seven rounds of condensation with malonyl-CoA extenders.101 CurM contains a sulfotransferase domain, and a TE domain at the C-terminus of the final PKS monomodule. The role of the sulfotransferase has not been completely elucidated, nor has that of the product of the proposed hydrolase gene curN. It is suspected that CurN plays a role in both product release and dehydrative decarboxylation, leading to the formation of the terminal olefin.101

5.2.2 Jamaicamides. The cytotoxic and neurotoxic jamaicamides A–C were isolated from a culture of L. majuscula JHB. The rare alkynyl bromide and vinyl chloride groups, along with a pyrrolinone ring, warranted an in-depth study of the biosynthesis of this family of molecules. A biosynthesis pathway to jamaicamide A 16 was proposed using 13C-labelled acetates, alanines, and SAM, and incorporation was measured using NMR.104 These precursors accounted for all the carbon atoms in the molecule. The methyl pyrrolinone was proposed to be derived from Claisen-like condensation and cyclisation of an acetate and alanine.104 The halogenation pathways are still unclear. The jam gene cluster was identified using HMG-CoA synthase-based probes to screen PKS-containing fosmids by Southern hybridisation. From three fosmids, a 70 kb region of L. majuscula JHB was sequenced, 58 kb of which were assigned to the jam cluster. The cluster comprises 17 ORFs, arranged in a highly co-linear fashion with respect to its proposed assembly.104 The uniqueness of the cluster centres on the mixed nature of this NRPS/PKS. The megasynthase contains two switch points between NRPS and PKS segments and another two ‘reverse’ switch points between PKS and NRPS segments. No real clues as to the nature of the cryptic halogenation pathways in operation were revealed by sequencing the cluster. The size of the jam cluster complicates its heterologous expression, and thus the authors could not unequivocally prove that all of the components required for jamaicamide biosynthesis were contained within the sequenced cluster.104 Nevertheless, the highly ordered nature of the jam cluster allows strong correlations between genetic and structural features to be made, and it is these direct correlations that form the basis of structure prediction from genome mining.

5.2.3 Lyngbyatoxins. The lyngbyatoxins A–C 17–19 (Scheme 4) represent yet another structural class of toxic metabolites biosynthesised by L. majuscula. Though their assembly is less cryptic than that of other Lyngbya-derived metabolites, the 11.3 kb lyngbyatoxin biosynthesis gene cluster (ltx), consisting of four ORFs (ltxA–ltxD) all transcribed in the same direction, harbours a novel aromatic prenyl transferase encoded by ltxC.105 As could be predicted from the structure of the molecule, the cluster encodes an NRPS (LtxA) responsible for the assembly of the N-methyl-L-valine-L-tryptophan-derived dipeptide. LtxA also facilitates the reductive offloading of the dipeptide from the complex. The cleaved product is then proposed to undergo subsequent P450 oxygenase (LtxB)-catalysed oxidation and cyclisation reactions to yield the intermediate (−)-indolactam V 20, which is the substrate for prenylation.105 Although 20 itself has not been isolated from L. majuscula, it has been isolated from Streptomyces spp.106 LtxC shows little sequence similarity to other aromatic prenyl transferases aside from the cyclomarin N-prenyltransferase CymD, with which it shares 24% identity.107 In order to confirm the function of the proposed prenyl transferase, purified LtxC was incubated with 20 in the presence of geranyl pyrophosphate (GPP) to yield the expected product lyngbyatoxin A. LtxD is part of a diverse family of oxidase/reductase-type proteins and is proposed to play a role in the conversion of the major metabolite 17 to 18 and 19 (Scheme 4).105 In this instance, the conversion to the minor metabolites provides some insight into the substrate specificity of such enzymes. Recent in vitro studies have established an alternative chain termination and release mechanism in the context of NRPSs. Probing of LtxA revealed that the PCP-peptidyl thioester is reduced to a primary alcohol via an aldehyde prior to substrate prenylation.108

Scheme 4

5.2.4 Hectochlorin. The cytotoxic and antifungal agent hectochlorin 21 is another L. majuscula JHB metabolite whose biosynthesis gene cluster is remarkably co-linear with its product.109 The hectochlorin gene cluster (hct) consists of eight ORFs and spans 38 kb, and encodes NRPS, PKS, cytochrome P450, and halogenase enzymes. The hct gene cluster shares significant sequence similarities with NRPS and PKS elements from both the jam and bar clusters. hctB codes for a putative halogenase (as well as an ACP) which shows sequence similarity, at the N-terminal region, to BarB1 and BarB2. The authors speculate that HctB is involved in the formation of hectochlorin's gem-dichloro group. Other notable features of the hct gene cluster include a C-methyl transferase encoded within the PKS module HctD and rarely observed NRPS-embedded ketoreductases adjacent to two 2-oxo-isovaleric acid adenylation domains. These unusual dual function KRs110 reduce 2-oxo-isovaleric acid-derived moieties to their corresponding 2-hydroxyisovaleric acid moieties, and these are then further oxidised to generate the 2,3-dihydroxyisovaleric acid-derived moieties in hectochlorin. The sequence of the putative transposase-encoding gene hctT is similar to insertion sequence (IS) elements from other bacteria, and these elements are thought to play a role in the plasticity of bacterial genomes.109 IS elements are of interest in terms of genome mining as they are often involved in the assembly of gene clusters with specialised functions.

As can be gleaned from the above discussion of characterised biosynthesis gene clusters, a common theme of ‘co-linearity’ is apparent. The importance of the co-linearity rule with respect to gene cluster characterisation and natural product structure prediction will become more apparent in the following sections.

Fig. 2 Completed bacterial genomes to 2008.

6 Genome mining

6.1 Introduction to genome mining

Genome mining is a broad term used to describe several processes that exploit information which is genetically encoded within biosynthesis gene clusters, with the ultimate aim of isolating a novel compound or novel biosynthetic pathway. The genetically encoded sequence of events that governs a molecule’s assembly not only allows for precise reprogramming of PKS and NRPS systems, but also for their annotation to provide predictive chemical structures for unknown products.

Until recently, a natural product’s biosynthesis gene cluster was usually characterised after the structure of the natural product had been determined, and used mainly for the purpose of investigating the molecule’s biosynthesis. Though this is still the case today, dramatic breakthroughs in DNA sequencing technologies have allowed entire bacterial genomes to be sequenced both quickly and cost-effectively, concomitantly providing many fully sequenced biosynthesis gene clusters, the products of which are often unknown. These gene clusters are referred to as ‘orphans’. Streptomyces coelicolor has been the focus of biosynthetic studies for many years due mainly to the diverse array of natural products it produces, and is considered the model organism for genomics-based investigations. Genome sequencing of the model streptomycete S. coelicolor A3(2) revealed many more orphan biosynthesis gene clusters than anticipated, demonstrating that even well-studied taxa have the potential to yield novel products.111 These observations sparked greater interest in genome mining as an alternative route to discovering novel and potentially bioactive natural products.

Upwards of one thousand bacterial genomes have been sequenced (of which only a small percentage are cyanobacterial – Fig. 2), thus providing massive amounts of freely available gene sequence data for genome mining. Bacterial genomes have been sequenced in an effort to understand the roles of the organisms in processes such as CO2 and N2 fixation, iron acquisition, symbioses and environmental adaptation, while some genomes, in particular some actinomycete genomes, have been sequenced for the sole purpose of identifying natural product biosynthesis gene clusters. The utility of sequencing is exemplified by the recent report of the genome of the obligate marine actinomycete Salinispora tropica.112 Careful annotation of the genome revealed that approximately 10% of the strain’s 5.2 Mb are dedicated to natural product assembly. Of the 17 gene clusters described, approximately two-thirds could be classed as orphans. A bioinformatics approach allowed, at the very least, the structure types encoded by these orphan clusters to be assigned, while other clusters were assigned to compounds already identified from S. tropica. As with all prediction-based methods, rigorous analytical techniques are required to confirm these predictions. The vital interplay between genomics-driven structure prediction based on biosynthetic logic and the isolation of novel natural products was demonstrated by the discovery of a polyene macrolactam, salinilactam A.112 A bioinformatics analysis of three ketoreductase domains associated with the encoding gene cluster allowed the tentative assignment of absolute stereochemistries to three hydroxyl groups. Conversely, the repetitive nature of the polyene helped organise a highly repetitive region of the encoding gene cluster, which ultimately led to the closure of the genome sequence.112
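The approximate figures quoted for S. tropica can be turned into rough absolute numbers with a few lines of arithmetic. The sketch below uses only the rounded values given in the text (10%, 5.2 Mb, 17 clusters, two-thirds orphans), so the outputs are order-of-magnitude estimates; the mean cluster size in particular is an implied average, not a reported measurement.

```python
GENOME_MB = 5.2          # genome size quoted in the text
NP_FRACTION = 0.10       # "approximately 10%" devoted to natural products
N_CLUSTERS = 17          # gene clusters described
ORPHAN_FRACTION = 2 / 3  # "approximately two-thirds" classed as orphans

np_mb = GENOME_MB * NP_FRACTION                 # DNA devoted to natural products
n_orphans = round(N_CLUSTERS * ORPHAN_FRACTION) # approximate number of orphans
mean_cluster_kb = np_mb * 1000 / N_CLUSTERS     # implied average cluster size

print(f"~{np_mb:.2f} Mb devoted to natural product assembly")
print(f"~{n_orphans} of {N_CLUSTERS} clusters are orphans")
print(f"implied mean cluster size ~{mean_cluster_kb:.0f} kb")
```

The implied average of roughly 30 kb per cluster is of the same order as the cyanobacterial cluster sizes listed in Table 1.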

6.2 Genome mining techniques

Discovery of new natural products derived from orphan gene clusters using combinations of analytical and molecular biology techniques is now a reality, and several short reviews outlining these techniques have been published recently.113,114 It is beyond the scope of this paper to present each technique in great detail; however, it is necessary to highlight those which could be used specifically for genome mining of cyanobacteria.

The relatively common technique of biosynthesis gene inactivation followed by comparative metabolic profiling of the wild-type and mutant strains by way of analytical HPLC is used to relate a gene cluster to its product. Even assuming that the strain is culturable, technical difficulties associated with inactivating genes in cyanobacteria render this method less than desirable. Heterologous gene expression represents an alternative technique, but given that many cyanobacterial gene clusters of interest are of the larger NRPS type, the usual hurdles associated with expressing these, such as determining the optimal expression system, first need to be overcome. Small cyanobacterial NRPS/PKS gene clusters have also been cloned using fosmids, such as in the cloning of the lyngbyatoxin gene cluster from the cyanobacterium Lyngbya majuscula.105 The small size of the lyngbyatoxin biosynthetic gene cluster permitted this approach; however, fosmid cloning and other phagemid-based systems have been largely superseded by BAC cloning, owing to their limited insert size. Promising candidates for the heterologous expression of cyanobacterial compounds include the well-characterised cyanobacterial genera Synechocystis and Synechococcus. These strains do not produce secondary metabolites but are relatively fast-growing and easy to manipulate.115,116 However, mechanisms for large gene cluster transfer, such as BACs and cosmids, have not yet been developed for cyanobacteria. Stepwise homologous recombination for the chromosomal integration of NRPS/PKS genes, such as in the transfer of the 49 kb bacitracin synthetase gene cluster into B. subtilis117 and the 65 kb epothilone NRPS/PKS gene cluster into Myxococcus xanthus,118 may be the only currently feasible approach for cyanobacterial species.

The technique likely to provide the most direct avenue toward identification of orphan gene cluster products in cyanobacteria appears to be the ‘Genomisotopic Approach’. This approach was first used to identify the novel NRPS-derived molecule orfamide A 22 from the bacterium Pseudomonas fluorescens Pf-5.27 The two-pronged approach relies initially upon a bioinformatics-based prediction of natural product precursors from the gene cluster sequence. The appropriate 15N or 13C–15N labelled precursors are fed to the culture under optimised conditions, and the isolation of the natural product is guided by selective NMR experiments. This technique is particularly useful for decoding the products of clusters encoding NRPS adenylation domains.

Models used to predict amino acid recognition by NRPS adenylation domains have been developed based on critical binding-pocket residues.119 These models were not developed using many cyanobacterial sequences, and should thus be used with caution when mining cyanobacterial genomes. Though the predictive tools could indeed provide accurate data, the relaxed substrate specificity of adenylation domains encoded by cyanobacterial genomes means that such tools should be used only as a guide, and not considered definitive. Predictions could be verified using the traditional ATP-[32P]pyrophosphate (PPi) exchange assay or the recently described non-radioactive colorimetric assay that quantifies orthophosphate (Pi) derived from degraded PPi as a means of determining activity.120
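The logic of binding-pocket-based prediction, and the reason for treating its output as a guide only, can be illustrated with a short sketch. It compares a query signature with a set of reference signatures by simple residue identity and reports anything short of an exact match as tentative. Everything here is an assumption made for illustration: the ten-residue strings are invented placeholders rather than real specificity codes, the substrate labels are hypothetical, and the 0.8 threshold is arbitrary; the published models are more elaborate.

```python
# Placeholder signatures (invented strings, NOT real binding-pocket codes).
REFERENCE_SIGNATURES = {
    "substrate_X": "DACDEFGHIK",
    "substrate_Y": "DLMNPQRSTK",
    "substrate_Z": "DVWYACDEFK",
}


def identity(a, b):
    """Fraction of identical positions between two equal-length signatures."""
    if len(a) != len(b):
        raise ValueError("signatures must have the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def predict_substrate(query, references, min_identity=0.8):
    """Return (best reference, identity, status) for a query signature.

    Status is 'exact' only for a perfect match; a near match is 'tentative'
    because a single pocket residue can change the substrate, and anything
    below `min_identity` is left 'unassigned'.
    """
    best_name, best_sig = max(references.items(),
                              key=lambda item: identity(query, item[1]))
    score = identity(query, best_sig)
    if score == 1.0:
        status = "exact"
    elif score >= min_identity:
        status = "tentative"
    else:
        status = "unassigned"
    return best_name, score, status


# A query differing from substrate_X at one of ten positions.
print(predict_substrate("DACDEFGHLK", REFERENCE_SIGNATURES))
```

The one-residue mismatch is deliberately reported as tentative rather than assigned to the closest reference, mirroring the NosA3 case in section 5.1.1, where a single binding-pocket difference relative to a proline-activating domain accompanies activation of L-4-methylproline. A prediction of this kind would still need confirmation by an activity assay.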

6.3 Illustrative examples of genome mining in cyanobacteria

To date there have been very few examples of genome mining in cyanobacteria. The examples that do exist serve to highlight the utility of genome mining, not only for new natural products but also for the identification and characterisation of elusive biosynthesis gene clusters. Aside from the scytonemin biosynthesis gene cluster, the gene clusters highlighted below all code for the biosynthesis of cyclic peptides containing heterocyclised residues. Collectively, these peptides are known as cyanobactins.83

6.3.1 Patellamides. The patellamides are cyclic peptides often isolated from didemnid ascidians and are thought to be biosynthesised by obligate cyanobacterial symbionts of the genus Prochloron. Attempts to culture Prochloron spp. have so far been unsuccessful, and thus biosynthetic studies have proven difficult. In an effort to sustain metabolite production, an approach employing shotgun cloning and heterologous expression of Prochloron sp. DNA in E. coli was used to confirm Prochloron sp. as a patellamide producer.121 Independently, and as part of the Prochloron didemni sequencing project, the draft genome sequence was mined for patellamide biosynthesis genes.122 The patellamide biosynthesis gene cluster (pat) was identified; however, rather than being assembled by an NRPS as first anticipated,121,123 the patellamides were found to be assembled ribosomally.83,122 To unequivocally prove that the 11 kb, 7-ORF (patA–patG) gene cluster was indeed responsible for patellamide biosynthesis, the cluster was heterologously expressed in E. coli. Analysis of the resulting extract by LC-MS revealed production of patellamide A 23, thus providing conclusive evidence for ribosomal assembly.83,122

6.3.2 Trichamide. Searching the genome of Trichodesmium erythraeum IMS101 for homologs of patellamide biosynthesis genes revealed the strikingly similar trichamide biosynthesis gene cluster (tri), encompassing eleven ORFs and spanning 12.5 kb. Structure prediction followed by rigorous chemical analysis of the culture extract revealed trichamide 24, a novel natural product and the first reported metabolite from Trichodesmium erythraeum, thus illustrating the power of genome mining for natural product discovery.124

6.3.3 Microcyclamides. The discovery of the pat biosynthesis gene cluster led to the identification of ribosomal biosynthesis genes responsible for microcyclamide biosynthesis in Microcystis aeruginosa NIES-298, using a PCR-directed approach employing degenerate primers based on pat and tri gene sequences. Scanning of the M. aeruginosa PCC7806 genome for similar clusters ultimately led to the isolation and structure elucidation of two new microcyclamides, 7806A 25 and 7806B 26.81,125 A recent review comparing biosynthesis gene clusters yielding similar peptides in other cyanobacteria revealed a global assembly line responsible for cyanobactins. After PKSs and NRPSs, the cyanobactin biosynthesis assembly line represents another major route to small molecules in cyanobacteria.126
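The in silico counterpart of a degenerate-primer screen is to expand the primer's IUPAC ambiguity codes into a pattern and look for matching sites in a sequence. The following is a minimal sketch of that idea: the primer `GGNTAYTG` and the short test sequence are invented for illustration and are not taken from the pat, tri or mca genes, and only the forward strand is searched, with exact matching.

```python
import re

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]",
    "N": "[ACGT]",
}


def primer_to_regex(primer):
    """Translate a degenerate primer into a regular expression."""
    return "".join(IUPAC[base] for base in primer.upper())


def find_sites(primer, sequence):
    """Return 0-based start positions of all (possibly overlapping) matches
    of the degenerate primer on the forward strand of `sequence`."""
    pattern = re.compile("(?=" + primer_to_regex(primer) + ")")
    return [m.start() for m in pattern.finditer(sequence.upper())]


# Hypothetical primer and toy sequence, for illustration only.
sites = find_sites("GGNTAYTG", "ATGGCTATTGCCGGATACTGA")
print(sites)
```

A practical screen would also search the reverse complement and tolerate mismatches, since real primers anneal imperfectly; this sketch shows only how degeneracy widens the set of sequences a single primer can recognise.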

6.3.4 Scytonemin. Scytonemin 27 is a UV-absorbing pigment that plays an important role in protecting cyanobacteria from harmful UV exposure.127 Transposon mutagenesis of the scytonemin-producing strain Nostoc punctiforme ATCC 29133 resulted in the generation of a mutant strain, SCY 59, unable to produce scytonemin.128 Genomic analyses of the mutated region supported a biosynthetic role for the mutated gene, and allowed the putative assignment of the 18-ORF, 28 kb scytonemin biosynthesis gene cluster. A biosynthetic route to scytonemin, however, was not proposed.128 Subsequent gene expression studies confirmed that the putative cluster was indeed involved in scytonemin biosynthesis,129 and the proposed initial steps of its biosynthesis from L-tryptophan and prephenate have been validated.130 Comparison of scytonemin biosynthesis gene clusters in six cyanobacterial genomes revealed two major architectures, and these appear to have evolved through genetic rearrangements and insertions.131

7 Mining sequenced cyanobacterial genomes for biosynthesis gene clusters

Literature concerning the sequenced genomes of cyanobacteria

lacks any real discussion of genes or gene clusters encoding




proteins involved in natural product biosynthesis. Unlike those

papers111,112,132 reporting the genomes of the actinomycetes

S. avermitilis, S. coelicolor and S. tropica, in which detailed analyses of their biosynthetic potential are presented, cyanobacterial

genome reports focus upon topics such as their evolutionary

adaptation to environmental niches and proposed roles in

complex microbial communities. Some literature describes the

production of vital primary metabolites and the genes respon-

sible for their biosynthesis, as well as genes involved in the

biosynthesis of cofactors and carrier proteins, which may

indeed be linked to secondary metabolite production; however,

there is no global view of the organism’s biosynthetic potential.

In a recent study, cyanobacterial strains were analysed for their

biosynthetic potential using PCR by amplifying NRPS and

PKS genes with degenerate primers. Of the twenty-four test

strains, both NRPS and PKS genes were amplified from

seventeen, while only three showed negative results for both

PKS and NRPS genes.133 While these data may be biased in

that many of the cyanobacteria screened are known toxin

producers, they do reveal the widespread occurrence of PKS and

NRPS genes in cyanobacteria and thus supports the notion that

these genes are indeed useful targets for genome mining.

A genome mining study of 223 bacterial genome sequences

deposited up to 2005 included eight cyanobacteria. The focus of

the study was to identify genes associated with clustered PKSs

and/or NRPSs, or as the authors termed them, thiotemplate

modular systems (TMS).134 TMS genes were detected in only

two of the strains, Anabaena sp. PCC7120 and Gloeobacter

violaceus PCC7421, and they constitute ~1.5% and 1% of their

respective genomes.134 In comparison, the same study revealed

that TMS genes constituted ~3.9% and 1.5% of the 9.0 Mb

S. avermitilis MA-4680 and 8.7 Mb S. coelicolor A3(2)

genomes, respectively.134 S. avermitilis and S. coelicolor A3(2)

are considered prolific producers of secondary metabolites with
~6.6%132 and 8.0%111 of their respective genomes dedicated to

secondary metabolism. The recently sequenced S. tropica

dedicates ~10% of its 5.2 Mb genome to natural product

assembly, with at least 6% dedicated to genes associated

with clusters (not only TMS genes) encoding the biosynthesis of

type I PKS, NRPS, or hybrid PKS/NRPS-derived products,

known and unknown.112 It is interesting to note that the biosyn-

thesis of secondary metabolites is linked to cyanobacteria whose

genomes are generally larger than 4 Mb, and thus the production

of such metabolites can be considered a luxury. To date (as was

the case up to 2005) the majority of published cyanobacterial

genome sequences are Synechococcus and Prochlorococcus spp., all

of which possess small genomes (<4 Mb), and appear to lack

secondary metabolite biosynthesis genes.134 This observation
generally holds for most bacterial taxa.


7.1 Our approach

Up until August 2008, 34 cyanobacterial genomes (Table 2) had

been completely sequenced, and many of these have never

been investigated for their biosynthetic potential using a purely

bioinformatics-based approach. Like previous studies, we

focussed upon genes encoding NRPSs and PKSs, as these are

responsible for the biosynthesis of the biologically active mole-

cules commonly associated with cyanobacteria and, as described

earlier, the clustered gene architecture makes them amenable to

automated bioinformatics analyses.

Rather than searching for genes and gene clusters, our

approach involved identifying replicated genomic features

across sequenced genomes in our dataset, specifically common

catalytic protein domains essential to NRPSs and PKSs. This

methodology is an extension of the method developed by Pasek

and co-workers.135,136 It adopts the technique of targeting protein

domains such as those found in the Protein Family database

(Pfam),137 and allows for the detection of strings of domains

(termed ‘domain teams’) that are conserved in their content but

not necessarily their order. In designating domain teams, we

essentially decomposed NRPS and PKS genes into the domains

of the proteins they code for. To the best of our knowledge this is

the first report using such methodology to mine sequenced

genomes for conserved protein domains, as domain teams, for

the discovery of natural product biosynthesis gene clusters. As

seen from the gene clusters presented earlier, the highly ordered

and repetitive nature of such NRPS and PKS systems makes

them ideal candidates for domain team analysis.
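The idea that a domain team is conserved in content but not necessarily in order can be illustrated with a small sketch. The loci and domain labels below are made up for illustration, and comparing multisets of labels is a simplification of the Domain Teams method rather than its actual algorithm.

```python
from collections import Counter


def same_domain_content(cluster_a, cluster_b):
    """Return True if two clusters carry the same domains, ignoring their order."""
    return Counter(cluster_a) == Counter(cluster_b)


# Hypothetical loci written as ordered lists of domain labels (illustrative only).
locus_1 = ["Condensation", "AMP-binding", "PP-binding"]
locus_2 = ["AMP-binding", "PP-binding", "Condensation"]   # same content, different order
locus_3 = ["Ketoacyl-synt", "Ketoacyl-synt_C", "PP-binding"]

print(same_domain_content(locus_1, locus_2))
print(same_domain_content(locus_1, locus_3))
```

Here the first two loci would be treated as the same team because only the order differs, whereas the third differs in content.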

In short, Pfam is a collection of protein families and domains.

Pfam (version 22.0) contains a total of 9318 protein families

including those involved in natural product assembly, and covers

73% of sequences found in UniProtKB version 9.7.137 In gener-

ating our data, NRPS modules were detected using the Pfam

identifiers PF00501 (AMP-binding enzyme), PF00668 (Condensation domain) and PF00550 (Phosphopantetheine attachment site), while PF00109/PF02801 (Beta-ketoacyl synthase N- and C-terminal domains) were used to identify ketosynthase modules.

All sequences and genome coordinates of protein-coding genes

were extracted from the Genbank entries of the available closed

cyanobacterial genomes and searched against the Pfam database

using an automated HMMER wrapper.138 The search results

were tabulated as an input file for Domain Teams using a custom

PERL script. Each individual chromosome was assigned an

identification code, and in cases where the organism had multiple

chromosomes (i.e. plasmids), the genome and plasmids were

assigned individual identification codes. The final dataset

comprised 59 chromosomes and a total of 154 279 domains.
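The tabulation step described above might look roughly like the following sketch. The record layout `(chromosome_code, locus_tag, start, pfam_id)`, the codes `C01`/`P01` and the example hits are all assumptions made for illustration; the actual HMMER wrapper output and the authors' Perl script are not described in the text.

```python
from collections import defaultdict

# Pfam identifiers quoted in the text; the condensation-domain identifier
# can be added to this mapping in the same way.
TARGET_PFAM = {
    "PF00501": "AMP-binding enzyme",
    "PF00550": "Phosphopantetheine attachment site",
    "PF00109": "Beta-ketoacyl synthase, N-terminal domain",
    "PF02801": "Beta-ketoacyl synthase, C-terminal domain",
}


def tabulate_hits(hits):
    """Group domain hits by chromosome code and order them by start coordinate.

    `hits` is an iterable of (chromosome_code, locus_tag, start, pfam_id)
    tuples; this layout is hypothetical.
    """
    table = defaultdict(list)
    for chrom, locus, start, pfam_id in hits:
        table[chrom].append((start, locus, pfam_id))
    return {chrom: sorted(rows) for chrom, rows in table.items()}


def count_targets(table):
    """Count targeted Pfam domains for each chromosome code."""
    return {
        chrom: sum(1 for _, _, pfam_id in rows if pfam_id in TARGET_PFAM)
        for chrom, rows in table.items()
    }


# Made-up hits for one chromosome (C01) and one plasmid (P01).
example_hits = [
    ("C01", "gene_b", 5200, "PF00550"),
    ("C01", "gene_a", 1200, "PF00501"),
    ("C01", "gene_c", 9100, "PF00072"),  # a non-targeted domain
    ("P01", "gene_x", 300, "PF00109"),
    ("P01", "gene_x", 1100, "PF02801"),
]
table = tabulate_hits(example_hits)
print(count_targets(table))
```

Giving the chromosome and each plasmid separate codes, as in the text, keeps their domain orders independent of one another.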

Domain teams were computed using a d-value of 3. That is, in

defining a domain team, we have stipulated that there cannot be

more than 3 domains between any two of the targeted protein

domains on the same chromosome. Domain teams were sorted

using the formula $10\,N_{\mathrm{chromosomes}} + 5\,N_{\mathrm{domains}} + N_{\mathrm{proteins}}$, where $N_{\mathrm{chromosomes}}$, $N_{\mathrm{domains}}$ and $N_{\mathrm{proteins}}$ are the numbers of chromosomes, domains and proteins, respectively, in a domain team.

Sorting allows those teams containing the highest number of

domains and the larger variety of syntenic chromosomes to be

examined first. In order to predict the total number of clusters

present in each genome and because each putative cluster can



Table 2 Sequenced cyanobacterial genomes

| Strain | Genome size (bp) | Institution (submission date) |
|---|---|---|
| Acaryochloris marina MBIC11017 | 8 361 599^d | Translational Genomics Research Institute (27 Aug 2007) |
| Anabaena variabilis ATCC29413 | 7 068 601^d | US DOE Joint Genome Institute (12 Sep 2005) |
| Crocosphaera watsonii WH8501 | 6 238 156 | US DOE Joint Genome Institute (13 Jun 2005) |
| Gloeobacter violaceus PCC7421 | 4 659 019 | Kazusa DNA Research Institute, Chiba, Japan (15 Aug 2003) |
| Lyngbya aestuarii CCY9616 | 7 087 904 | J. Craig Venter Institute (14 Dec 2006) |
| Microcystis aeruginosa NIES-843 | 5 842 795 | Kazusa DNA Research Institute, Chiba, Japan (22 Nov 2007) |
| Nodularia spumigena CCY9414 | 5 357 061 | J. Craig Venter Institute (25 Oct 2005) |
| Nostoc punctiforme PCC73102 | 9 059 191^d | US DOE Joint Genome Institute (07 Apr 2008) |
| Nostoc sp. PCC7120^a | 7 211 789^d | Kazusa DNA Research Institute, Chiba, Japan (02 May 2001) |
| Prochlorococcus marinus AS9601 | 1 669 886 | J. Craig Venter Institute (06 Nov 2006) |
| Prochlorococcus marinus CCMP1375 | 1 751 080 | Genoscope – Centre National de Sequencage, Evry, France (28 May 2003) |
| Prochlorococcus marinus MED4 | 1 657 990 | DOE Joint Genome Institute (03 Jul 2003) |
| Prochlorococcus marinus MIT 9211 | 1 688 963 | Massachusetts Institute of Technology (24 Oct 2007) |
| Prochlorococcus marinus MIT 9215 | 1 738 790 | US DOE Joint Genome Institute (07 Sep 2007) |
| Prochlorococcus marinus MIT 9301 | 1 641 879 | J. Craig Venter Institute (16 Feb 2007) |
| Prochlorococcus marinus MIT9303 | 2 682 675 | J. Craig Venter Institute (10 Jan 2007) |
| Prochlorococcus marinus MIT 9312 | 1 709 204 | US DOE Joint Genome Institute (27 Jul 2005) |
| Prochlorococcus marinus MIT 9313 | 2 410 873 | US DOE Joint Genome Institute (03 Jul 2003) |
| Prochlorococcus marinus MIT 9515 | 1 704 176 | J. Craig Venter Institute (06 Nov 2006) |
| Prochlorococcus marinus NATL1A | 1 864 731 | J. Craig Venter Institute (06 Nov 2006) |
| Prochlorococcus marinus NATL2A | 1 842 899 | US DOE Joint Genome Institute (08 Aug 2005) |
| Synechococcus sp. CC9311 | 2 606 748 | The Institute for Genomic Research (04 Aug 2006) |
| Synechococcus sp. CC9605 | 2 510 659 | US DOE Joint Genome Institute (27 Jul 2005) |
| Synechococcus sp. CC9902 | 2 234 828 | US DOE Joint Genome Institute (08 Aug 2005) |
| Synechococcus elongatus PCC6301 | 2 696 255^d | Center for Gene Research, Nagoya University, Japan (10 Dec 2004) |
| Synechococcus elongatus PCC7942 | 2 742 269 | US DOE Joint Genome Institute (08 Aug 2005) |
| Synechococcus sp. RCC307 | 2 224 914 | Genoscope – Centre National de Sequencage, Evry, France (19 May 2006) |
| Synechococcus sp. WH8102 | 2 434 428 | US DOE Joint Genome Institute (25 Jun 2001) |
| Synechococcus sp. WH7803 | 2 366 980 | Genoscope – Centre National de Sequencage, Evry, France (19 May 2006) |
| Synechococcus sp. JA-3-3A′b^b | 2 932 766 | The Institute for Genomic Research (10 Mar 2006) |
| Synechococcus sp. JA-2-3B′a^c | 3 046 682 | The Institute for Genomic Research (21 Mar 2006) |
| Synechocystis sp. PCC6803 | 3 947 019^d | Kazusa DNA Research Institute, Chiba, Japan (01 Nov 2001) |
| Thermosynechococcus elongatus BP-1 | 2 593 857 | Kazusa DNA Research Institute, Chiba, Japan (05 Jun 2002) |
| Trichodesmium erythraeum IMS101 | 7 750 108 | US DOE Joint Genome Institute (21 Jun 2006) |

^a Also known as Anabaena sp. PCC7120. ^b Also known as Cyanobacteria Yellowstone A-Prime. ^c Also known as Cyanobacteria Yellowstone B-Prime. ^d Plasmid inclusive.


appear in multiple domain teams, we also developed a custom

PERL script to dereplicate the set of unique PKS/NRPS clusters.

The major limitation of this method is that the amount of

computational time required to determine domain teams grows

exponentially with the size of the input dataset.
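The gap rule (d-value of 3), the sorting score and the dereplication step described in this section can be sketched as follows. This is a simplified illustration, not the Domain Teams algorithm or the authors' Perl script: the single-letter domain labels, the example teams and the choice to dereplicate by merging overlapping coordinate ranges are all assumptions.

```python
def chain_targets(domains, targets, max_gap=3):
    """Split one chromosome's ordered domain list into runs of targeted domains.

    Consecutive targeted domains stay in the same run when at most `max_gap`
    other domains lie between them.
    """
    runs, current, last_index = [], [], None
    for index, domain in enumerate(domains):
        if domain not in targets:
            continue
        if last_index is not None and index - last_index - 1 > max_gap:
            runs.append(current)
            current = []
        current.append(domain)
        last_index = index
    if current:
        runs.append(current)
    return runs


def team_score(n_chromosomes, n_domains, n_proteins):
    """Sorting score quoted in the text."""
    return 10 * n_chromosomes + 5 * n_domains + n_proteins


def dereplicate(loci):
    """Merge overlapping (chromosome, start, end) loci into unique clusters.

    Merging by coordinate overlap is an assumption made for this sketch.
    """
    merged = []
    for chrom, start, end in sorted(loci):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            prev = merged[-1]
            merged[-1] = (chrom, prev[1], max(prev[2], end))
        else:
            merged.append((chrom, start, end))
    return merged


# Hypothetical ordered domain labels on one chromosome ("x" = untargeted domain).
chromosome = ["A", "x", "x", "C", "x", "x", "x", "x", "T", "A"]
runs = chain_targets(chromosome, targets={"A", "C", "T"})

# Hypothetical teams as (n_chromosomes, n_domains, n_proteins) tuples,
# ranked so that the highest-scoring team is examined first.
teams = [(2, 6, 3), (3, 4, 2), (5, 3, 1)]
ranked = sorted(teams, key=lambda t: team_score(*t), reverse=True)

# Hypothetical loci reported by several teams, two of which overlap.
loci = [("C01", 400, 900), ("C01", 100, 500), ("C01", 2000, 2500), ("P01", 100, 500)]
unique = dereplicate(loci)
print(runs, ranked, unique)
```

In the example the first two targeted domains are separated by two untargeted domains and stay in one run, while a gap of four starts a new run.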

7.2 Summary of our findings

Our preliminary genome mining data are displayed in Table 3. The

table shows the numbers of domain teams of PKS, NRPS, and

hybrid PKS/NRPS protein domains detected from sequenced

cyanobacterial genomes (and associated plasmids) using the

approach described above. In order to simplify the table,

biosynthesis clusters (that is clusters of protein domains rather

than the classical gene clusters) detected on plasmids of an

organism are included in brackets. An in-depth discussion of

each biosynthesis cluster (or even each genome) is beyond the

scope of this review.

As in any genome mining exercise, we encountered some data
that could be considered false positives. That is, any PKS and/

or NRPS elements (Pfam identifiers) detected were considered to

be a cluster if the arrangement of domains was detected in

a larger, or for that matter smaller, cluster from any one of the

other chromosomes in the data set. The data included in the table


are the raw, dereplicated, output of our genome mining analysis.

In analysing the data generated, the logical first step is to

determine whether the output can be truly considered a bona fide

biosynthesis cluster. This can be considered selective mining and

be achieved simply by scanning the data, a task simplified by the

graphical output of the Domain Teams analysis. An example is

shown in Fig. 3. Another consideration is one which still causes

some consternation, and that is determining gene cluster

boundaries and linker regions between genes. While our domain teams-based approach does not allow us to unequivocally determine these, automated bioinformatics tools such as UMA

(Udwary–Merski algorithm)139 and CLUSEAN (CLUster

SEquence ANalyzer)140 can be used for multimodular systems

such as PKSs and NRPSs and assist with the complete anno-

tation of a biosynthesis gene cluster.

The graphical output of Domain Teams is highly interactive.

All identified protein domains are clickable links to the Pfam

website, where a summary of the protein family and its function

can be found. This allows for other domains which were not

initially searched for in the genome mining stage of the analysis

to be quickly determined, such as PF00975 (thioesterase domain)

and PF08242 (methyl transferase domain) as shown in Fig. 3.

Protein domains are grouped according to their respective coding

genes, and the clickable NCBI linked locus tag becomes a query



Table 3 Numbers of domain teams identified in sequenced cyanobacterial strains^d

| Strain | PKS | NRPS | Hybrid |
|---|---|---|---|
| Acaryochloris marina MBIC11017 | 3^a | 2^b (2)^b | 1 (1) |
| Anabaena variabilis ATCC29413 | 5 | 1 (1) | 4 (1) |
| Gloeobacter violaceus PCC7421 | 4 | 3 | 2 |
| Microcystis aeruginosa NIES-843 | 4 | 3 | 3 |
| Nostoc punctiforme PCC73102 | 7 | 6 (1) | 8 (1) |
| Nostoc sp. PCC7120 | 6 | 1 | 5 |
| Prochlorococcus marinus AS9601 | 2^a | 1^b | 1^c |
| Prochlorococcus marinus CCMP1375 | 2^a | 1^b | 1^c |
| Prochlorococcus marinus MED4 | 2^a | 1^b | 1^c |
| Prochlorococcus marinus MIT9211 | 2^a | 1^b | 1^c |
| Prochlorococcus marinus MIT9303 | 2^a | 1 | 2^c |
| Prochlorococcus marinus MIT9313 | 1 | 1^b | 1^c |
| Prochlorococcus marinus NATL1A | 2^a | 1^b | 1^c |
| Prochlorococcus marinus NATL2A | 2^a | 1^b | 1^c |
| Synechococcus sp. CC9311 | 3^a | 1^b | 2^c |
| Synechococcus sp. CC9605 | 3^a | 1 | 1^c |
| Synechococcus sp. CC9902 | 2 | 1^b | 1^c |
| Synechococcus elongatus PCC6301 | 2^a | 1^b | 1^c |
| Synechococcus elongatus PCC7942 | 2^a | 1^b | 1^c |
| Synechococcus sp. RCC307 | 3^a | 2^b | 1^c |
| Synechococcus sp. WH7803 | 2^a | 1^b | 1^c |
| Synechococcus sp. WH8102 | 3^a | 1^b | 1^c |
| Synechococcus sp. JA-3-3A′b | 3^a | 1^b | 1 |
| Synechococcus sp. JA-2-3B′a | 2^a | 1 | 1^c |
| Synechocystis sp. PCC6803 | 3 | 1^b | 1^c |
| Thermosynechococcus elongatus BP-1 | 3^a | 1^b | 1^c |
| Trichodesmium erythraeum IMS101 | 3 | 2 | 1 |

^a Chromosomes share a common PKS domain team. ^b Chromosomes share a common NRPS domain team. ^c Chromosomes share a common hybrid domain team. ^d Numbers in brackets refer to domain teams detected on plasmids.


term for an Entrez cross-database search, revealing copious

amounts of useful information, particularly gene sequence

data and locations. This provides an efficient means of dereplicating previously sequenced gene clusters and highly similar clusters across sequenced genomes. With a unique

(orphan) cluster at hand, bioinformatics tools such as the

NRPS/PKS predictive BLAST141 can be used as a first step

toward predicting the backbone of the molecule prior to employing rigorous chemical and molecular techniques required to elucidate the natural product’s structure. Full details of this new method and derived results will be published elsewhere.

Fig. 3 Domain Teams graphical output of the microcystin biosynthesis cluster in M. aeruginosa NIES-843.

The following sections highlight some of our findings by way of comparison of our data with those reported from previous cyanobacterial genome mining studies.

7.2.1 Microcystis aeruginosa NIES-843. The original report and annotation of the Microcystis aeruginosa NIES-843 genome revealed, as expected, gene clusters for the biosynthesis of the microcystins (mcyA–J) and the cyanopeptolins (A 28) (mcnA–C and mcnE–G), as well as one other orphan NRPS spanning 17 kb reportedly located between co-ordinates 5202708–5219745.61 Using the domain teams approach, all three of these clusters were detected along with an additional hybrid NRPS/PKS spanning at least 17 kb over five ORFs (MAE_27800–MAE_27840) located between co-ordinates 2508556–2525761.

7.2.2 Nostoc sp. PCC7120. The domain teams analysis supported Donadio’s findings in Nostoc sp. PCC7120.134 Major clusters include a small PKS likely responsible for heterocyst glycolipid synthesis, a hybrid biosynthesis cluster encoding four PKSs and one NRPS, a monomodular NRPS, and a large cluster encoding two PKSs and ten NRPSs. This cluster spans at least 55 kb over fourteen ORFs and codes for adenylation domains activating serine (×2), glycine (×4), cysteine, valine, as well as two others whose substrates could not be unequivocally determined using the NRPS/PKS BLAST tool. Adjacent to the terminal NRPS module is a gene encoding a putative ABC transporter. The domain teams approach revealed a small cluster coding for two PKS modules, both of which code for a methyl



transferase. Interestingly, a ~3 kb gene located near this

PKS on the genome was annotated as an ABC transporter linked

to toxin secretion.

7.2.3 Gloeobacter violaceus PCC7421. Domain teams analysis revealed a ~30 kb region of the genome containing three

thioesterase domains. Two of these could be assigned to a tri-

modular PKS likely involved in glycolipid biosynthesis134 and

a bimodular PKS. The remaining thioesterase domain is located

on a gene that does not appear to be directly linked to any

biosynthesis cluster. Two small modular PKSs (~11 and 14 kb)

and numerous isolated ketosynthase domains were also detected.

7.2.4 Nostoc punctiforme PCC73102. A bioinformatics-based overview of the ~9 Mb Nostoc punctiforme PCC73102

(also known as ATCC 29133) revealed 62 ORFs encoding

proteins involved in secondary metabolite biosynthesis.59 While

the authors state that these are involved in microcystin

biosynthesis, it is likely that they are involved in the assembly of

other non-microcystin NRPS- and PKS-derived products, as to

date there is no chemical evidence of microcystin production by

terrestrial Nostoc spp. Two clusters of twelve and fourteen

biosynthesis genes spanning ~47 and 49 kb, respectively, were

identified along with a host of smaller gene sets. A subsequent

BLAST-based genomic survey of N. punctiforme ATCC29133

revealed 17 genes encoding NRPSs (encompassing 42

A-domains) and 10 genes encoding PKSs (encompassing

22 KS-domains).142 These data suggest that N. punctiforme’s biosynthetic potential is greater than that of any other sequenced cyanobacterium and support the notion that organisms with

larger genomes encode a greater number of secondary metabolite

pathways than those organisms with smaller genomes. Our

analyses revealed several noteworthy biosynthesis gene clusters,

many of which encoded archetypal PKSs and/or NRPSs. The

seven major biosynthesis clusters (the smallest being 16 kb)

combined encoded 45 A-domains and 24 KS-domains. In total,

these seven gene clusters span at least 250 kb, or 2.5 times the

length of TMS gene sequence as determined by Donadio for

Anabaena sp. PCC7120.134 One of these clusters, located between

co-ordinates 2667205–2708401 (Npun_F2181–Npun_F2188),

codes for nostopeptolide biosynthesis, as previously reported.67

An NRPS/PKS hybrid cluster spanning at least 57 kb over five

genes includes one gene 16 kb in length encoding three NRPSs

and one PKS. An even larger cluster (62 kb) spanning 27 genes

encodes four NRPSs and five PKSs, as well as two methyl-

transferases and an aminotransferase. Of the seven major

clusters, six are hybrids, and the other encodes an NRPS with six

adenylation domains.

7.2.5 Anabaena variabilis ATCC29413. The 7.1 Mb

Anabaena variabilis ATCC29413 genome codes for several small

to mid-sized hybrid NRPS/PKSs including a ~36 kb cluster

encoding seven adenylation domains and a single PKS which we

determined to be the largest cluster from this organism. A 27 kb

locus coding for four ketosynthase domains and two adenylation

domains was the only other cluster identified greater than 20 kb.

The remainder of the ~130 kb TMS sequence from A. variabilis is

accounted for by a small PKS/NRPS (16 kb) containing four

adenylation domains and one ketosynthase, a trimodular NRPS


(13 kb), a small PKS (16 kb) and two 11 kb fragments in close

proximity to each other, each encoding two NRPSs.134 An

unanticipated finding resulted from the analysis of a plasmid

from this strain. We discovered an NRPS containing seven

adenylation domains and a terminal thioesterase encoded by

a 25 kb gene cluster. This discovery nicely illustrates the power of

the approach we used for genome mining.

These selected examples of genome mining using the ‘Domain

Teams’ approach reveal a rich array of potential non-ribosomal

peptide and polyketide biosynthesis gene clusters in cyanobac-

teria. Here, we have found that TMS genes account for at least

127 kb (or 2.2%) of the Microcystis aeruginosa NIES-843

genome, 92 kb (1.3%) of the Nostoc sp. PCC7120 genome, 56 kb

(1.2%) of the Gloeobacter violaceus PCC7421 genome, 283 kb

(3.1%) of the Nostoc punctiforme PCC73102 genome and 157 kb

(2.2%) of the Anabaena variabilis ATCC29413 genome. It is

tempting to speculate on the structures of the natural products

these gene clusters encode, or even assign these clusters to known

chemical entities. However, it is beyond the scope of this review to

detail all the different permutations, and our efforts in assigning

these clusters to either new or known natural products are

ongoing.
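The percentages quoted above follow directly from the TMS totals and the genome sizes listed in Table 2. A short sketch of the arithmetic is given below; the function name is ours, and the inputs are the figures stated in the text and table.

```python
# Genome sizes (bp) from Table 2 and TMS totals (kb) from the paragraph above.
GENOMES = {
    "Microcystis aeruginosa NIES-843": (5_842_795, 127),
    "Nostoc sp. PCC7120": (7_211_789, 92),
    "Gloeobacter violaceus PCC7421": (4_659_019, 56),
    "Nostoc punctiforme PCC73102": (9_059_191, 283),
    "Anabaena variabilis ATCC29413": (7_068_601, 157),
}


def tms_percent(genome_bp, tms_kb):
    """Percentage of a genome (in bp) covered by TMS sequence (in kb)."""
    return 100.0 * (tms_kb * 1000) / genome_bp


for strain, (size_bp, tms_kb) in GENOMES.items():
    print(f"{strain}: {tms_percent(size_bp, tms_kb):.1f}%")
```

Rounded to one decimal place, the results agree with the 2.2%, 1.3%, 1.2%, 3.1% and 2.2% given in the text.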

8 Discussion

We have clearly shown that the ‘Domain Teams’ approach that

we have developed for genome mining is indeed a valid and

robust method for identifying, in this case, biosynthesis clusters

encoding PKSs, NRPSs, and hybrids thereof. Our data support

Donadio’s findings that in general small genomes are devoid of

TMS genes,134 though we did detect protein domains commonly

associated with PKSs (likely involved in fatty acid biosynthesis)

and NRPSs (AMP binding domains), agreeing with Webb’s

observation that there is a significant correlation between

numbers of A and KS domains and genome size.142 In general,

cyanobacteria from the order Nostocales, which includes

Anabaena, Nodularia and Nostoc species, are amongst the more

prolific of the small-molecule-producing microorganisms, and it

is these that possess larger genomes. We successfully mined

genomes for clusters not previously identified from sequenced

cyanobacteria or reported from other genome mining surveys;

however, we are yet to determine whether these are novel. Some

of these clusters may code for biosynthesis pathways in operation

in other yet-to-be-sequenced cyanobacterial strains, and some

clusters we identified here may have already been sequenced in

other organisms, where the focus has been the biosynthesis of

a particular molecule itself. In order to clarify this, and support

any bioinformatics-based genome mining exercise, rigorous

analytical methods need to be employed, especially in those cases

where clusters are reorganised. A major advantage of this

approach is that by changing the search keys (i.e. Pfam identifiers) it is possible to quickly identify other gene clusters such as

those encoding toxin assembly without the need to regenerate the

data matrix. For example, mining for biosynthesis clusters

encoding terpenoid pathways is possible, assuming of course that

the biosynthesis genes (and hence protein domains) are clustered.

This is achievable through searching specifically for catalytic

protein domains associated with terpenoid biosynthesis. We

could potentially take this analysis one step further by, for




example, searching specifically for meroterpenoids (e.g. napyra-

diomycin143) using queries based on both terpene biosynthesis

and polyketide biosynthesis. A general drawback of using

‘Domain Teams’ is that the method can really only be applied to

closed genomes, as it requires knowledge of the relative position

of individual genes (and associated domains) on a chromosome.

Therefore, it is inapplicable to large (and mostly unassembled)

metagenomic data. Given the relatively small number of

cyanobacterial genomes sequenced compared with other micro-

organisms, and the relative abundance of biosynthesis gene

clusters in larger cyanobacterial genomes, it is clear that the

yet-to-be-discovered biosynthetic potential of cyanobacteria is

immense.
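The point about changing search keys without regenerating the data matrix can be illustrated with a minimal sketch. The domain matrix, the locus names and the terpene key below are hypothetical placeholders; only the two ketosynthase identifiers come from the text, and real Pfam identifiers for terpenoid biosynthesis domains would need to be substituted.

```python
# Polyketide keys are the ketosynthase Pfam identifiers quoted earlier in the text.
POLYKETIDE_KEYS = {"PF00109", "PF02801"}
# Placeholder only: substitute real Pfam identifiers for terpenoid biosynthesis domains.
TERPENE_KEYS = {"TERPENE_DOMAIN_PLACEHOLDER"}


def loci_matching(loci, required_groups):
    """Return names of loci containing at least one key from every group."""
    return [
        name
        for name, domains in loci.items()
        if all(domains & group for group in required_groups)
    ]


# Hypothetical precomputed matrix: locus name -> set of domain identifiers.
domain_matrix = {
    "locus_1": {"PF00109", "PF02801", "PF00550"},
    "locus_2": {"TERPENE_DOMAIN_PLACEHOLDER"},
    "locus_3": {"PF00109", "TERPENE_DOMAIN_PLACEHOLDER"},
}

polyketide_loci = loci_matching(domain_matrix, [POLYKETIDE_KEYS])
meroterpenoid_like = loci_matching(domain_matrix, [POLYKETIDE_KEYS, TERPENE_KEYS])
print(polyketide_loci, meroterpenoid_like)
```

The same matrix answers both queries; only the key groups passed in change, which is the advantage the text describes.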

9 Concluding remarks

Nature has evolved discrete strategies for the assembly of

structurally diverse natural products derived from complex

biosynthetic pathways. Mining these pathways in cyanobacterial

genomes has revealed a greater biosynthetic potential for novel

natural products than anticipated. The challenge now is to

exploit such pathways for the discovery of novel natural prod-

ucts, and at the same time use genomic information to assist in

the development of modern technologies relevant to the phar-

maceutical, agricultural, and biotechnology industries.

10 Acknowledgements

Research on toxin biosynthesis in the author’s laboratory is

funded by the Australian Research Council (ARC), and B.A.N.

is a Federation Fellow of the ARC. J.A.K. and F.M.L. are

supported by the UNSW Environmental Microbiology Initia-

tive. F.M.L. is a UNSW Vice-Chancellor’s Postdoctoral

Fellowship recipient.

11 References

1 J. W. Schopf and B. M. Packer, Science, 1987, 237, 70–73.
2 A. C. Allwood, M. R. Walter, B. S. Kamber, C. P. Marshall and I. W. Burch, Nature, 2006, 441, 714–718.
3 S. M. Awramik, Photosynth. Res., 1992, 33, 75–89.
4 R. Buick, Philos. Trans. R. Soc. London, Ser. B, 2008, 363, 2731–2743.
5 A. Lazcano and S. L. Miller, J. Mol. Evol., 1994, 39, 546–554.
6 L. J. Stal, in The Ecology of Cyanobacteria, eds. B. A. Whitton and M. Potts, Kluwer Academic Publishers, Dordrecht, 2000, pp. 61–120.
7 R. Castenholz and J. B. Waterbury, in Bergey’s Manual of Systematic Bacteriology, ed. J. T. Staley, Williams & Wilkins, Sydney, 1989, pp. 1710–1727.
8 H. W. Paerl, J. L. Pinckney and T. F. Steppe, Environ. Microbiol., 2000, 2, 11–26.
9 G. A. Codd, S. G. Bell, K. Kaya, C. J. Ward, K. A. Beattie and J. S. Metcalf, Eur. J. Phycol., 1999, 34, 405–415.
10 A. M. Burja, B. Banaigs, E. Abou-Mansour, J. G. Burgess and P. C. Wright, Tetrahedron, 2001, 57, 9347–9377.
11 R. E. Moore, J. Ind. Microbiol., 1996, 16, 134–143.
12 R. E. Moore, I. Ohtani, B. S. Moore, C. B. Dekoning, W. Y. Yoshida, M. T. C. Runnegar and W. W. Carmichael, Gazz. Chim. Ital., 1993, 123, 329–336.
13 M. Namikoshi and K. L. Rinehart, J. Ind. Microbiol. Biotechnol., 1996, 17, 373–384.
14 R. M. Van Wagoner, A. K. Drummond and J. L. C. Wright, Adv. Appl. Microbiol., 2007, 61, 89–217.
15 L. T. Tan, Phytochemistry, 2007, 68, 954–979.
16 D. J. Faulkner, Nat. Prod. Rep., 1984, 1, 551–598.


17 D. J. Faulkner, Nat. Prod. Rep., 2002, 19, 1–48.
18 J. W. Blunt, B. R. Copp, M. H. G. Munro, P. T. Northcote and M. R. Prinsep, Nat. Prod. Rep., 2003, 20, 1–48.
19 J. W. Blunt, B. R. Copp, W. P. Hu, M. H. G. Munro, P. T. Northcote and M. R. Prinsep, Nat. Prod. Rep., 2009, 26, 170–244.
20 J. Berdy, J. Antibiot., 2005, 58, 1–26.
21 A. L. Demain and S. Sanchez, J. Antibiot., 2009, 62, 5–16.
22 G. A. Codd, L. F. Morrison and J. S. Metcalf, Toxicol. Appl. Pharmacol., 2005, 203, 264–272.
23 K. Harada, Chem. Pharm. Bull., 2004, 52, 889–899.
24 K. M. Botham and J. F. Pennock, Biochem. J., 1971, 122, 127.
25 S. C. Bobzin and R. E. Moore, Tetrahedron, 1993, 49, 7615–7626.
26 B. S. Moore, I. Ohtani, C. B. Dekoning, R. E. Moore and W. W. Carmichael, Tetrahedron Lett., 1992, 33, 6595–6598.
27 H. Gross, V. O. Stockwell, M. D. Henkels, B. Nowak-Thompson, J. E. Loper and W. H. Gerwick, Chem. Biol., 2007, 14, 53–63.
28 M. A. Fischbach and C. T. Walsh, Chem. Rev., 2006, 106, 3468–3496.
29 S. E. O’Connor, Nat. Chem. Biol., 2006, 2, 511–512.
30 J. W. Trauger, R. M. Kohli, H. D. Mootz, M. A. Marahiel and C. T. Walsh, Nature, 2000, 407, 215–218.
31 C. T. Walsh, Science, 2004, 303, 1805–1810.
32 M. C. Moffitt and B. A. Neilan, Appl. Environ. Microbiol., 2004, 70, 6353–6362.
33 D. Tillett, E. Dittmann, M. Erhard, H. von Dohren, T. Borner and B. A. Neilan, Chem. Biol., 2000, 7, 753–764.
34 L. H. Du, C. Sanchez and B. Shen, Metab. Eng., 2001, 3, 78–95.
35 E. S. Sattely, M. A. Fischbach and C. T. Walsh, Nat. Prod. Rep., 2008, 25, 757–793.
36 U. Rix, C. Fischer, L. L. Remsing and J. Rohr, Nat. Prod. Rep., 2002, 19, 542–580.
37 L. S. Luo, M. D. Burkart, T. Stachelhaus and C. T. Walsh, J. Am. Chem. Soc., 2001, 123, 11208–11218.
38 D. B. Stein, U. Linne, M. Hahn and M. A. Marahiel, ChemBioChem, 2006, 7, 1807–1814.
39 T. Stachelhaus and C. T. Walsh, Biochemistry, 2000, 39, 5775–5787.
40 H. Sielaff, E. Dittmann, N. T. De Marsac, C. Bouchier, H. Von Dohren, T. Borner and T. Schwecke, Biochem. J., 2003, 373, 909–916.
41 G. Christiansen, J. Fastner, M. Erhard, T. Borner and E. Dittmann, J. Bacteriol., 2003, 185, 564–572.
42 L. Rouhiainen, T. Vakkilainen, B. L. Siemer, W. Buikema, R. Haselkorn and K. Sivonen, Appl. Environ. Microbiol., 2004, 70, 686–692.
43 M. Namikoshi, K. L. Rinehart, R. Sakai, R. R. Stotts, A. M. Dahlem, V. R. Beasley, W. W. Carmichael and W. R. Evans, J. Org. Chem., 1992, 57, 866–872.
44 H. C. Losey, M. W. Peczuh, Z. Chen, U. S. Eggert, S. D. Dong, I. Pelczer, D. Kahne and C. T. Walsh, Biochemistry, 2001, 40, 4745–4755.
45 C. T. Walsh, H. C. Losey and C. L. Freel Meyers, Biochem. Soc. Trans., 2003, 31, 487–492.
46 W. Saurin, M. Hofnung and E. Dassa, J. Mol. Evol., 1999, 48, 22–41.
47 R. J. Jovell, A. J. L. Macario and E. C. deMacario, Gene, 1996, 174, 281–284.
48 P. M. Jones and A. M. George, FEMS Microbiol. Lett., 1999, 179, 187–202.
49 D. M. Gardiner, R. S. Jarvis and B. J. Howlett, Fungal Genet. Biol., 2005, 42, 257–263.
50 R. H. Proctor, D. W. Brown, R. D. Plattner and A. E. Desjardins, Fungal Genet. Biol., 2003, 38, 237–249.
51 I. B. Holland and M. A. Blight, J. Mol. Biol., 1999, 293, 381–399.
52 J. E. Walker, A. Eberle, N. J. Gay, M. J. Runswick and M. Saraste, Biochem. Soc. Trans., 1982, 10, 203–206.
53 E. Dassa and M. Hofnung, EMBO J., 1985, 4, 2287–2293.
54 Y. Quentin, G. Fichant and F. Denizot, J. Mol. Biol., 1999, 287, 467–484.
55 L. A. Pearson, M. Hisbergues, T. Borner, E. Dittmann and B. A. Neilan, Appl. Environ. Microbiol., 2004, 70, 6370–6378.
56 M. Yoshida, T. Yoshida, A. Kashima, Y. Takashima, N. Hosoda, K. Nagasaki and S. Hiroishi, Appl. Environ. Microbiol., 2008, 74, 3269–3273.




57 M. Welker and H. von Dohren, FEMS Microbiol. Rev., 2006, 30, 530–563.

58 T. Kaneko, A. Tanaka, S. Sato, H. Kotani, T. Sazuka, N. Miyajima, M. Sugiura and S. Tabata, DNA Res., 1995, 2, 153–166.

59 J. C. Meeks, J. Elhai, T. Thiel, M. Potts, F. Larimer, J. Lamerdin, P. Predki and R. Atlas, Photosynth. Res., 2001, 70, 85–106.

60 Y. Nakamura, T. Kaneko, S. Sato, M. Ikeuchi, H. Katoh, S. Sasamoto, A. Watanabe, M. Iriguchi, K. Kawashima, T. Kimura, Y. Kishida, C. Kiyokawa, M. Kohara, M. Matsumoto, A. Matsuno, N. Nakazaki, S. Shimpo, M. Sugimoto, C. Takeuchi, M. Yamada and S. Tabata, DNA Res., 2002, 9, 123–130.

61 T. Kaneko, N. Nakajima, S. Okamoto, I. Suzuki, Y. Tanabe, M. Tamaoki, Y. Nakamura, F. Kasai, A. Watanabe, K. Kawashima, Y. Kishida, A. Ono, Y. Shimizu, C. Takahashi, C. Minami, T. Fujishiro, M. Kohara, M. Katoh, N. Nakazaki, S. Nakayama, M. Yamada, S. Tabata and M. M. Watanabe, DNA Res., 2007, 14, 247–256.

62 G. Christiansen, R. Kurmayer, Q. Liu and T. Borner, Appl. Environ. Microbiol., 2006, 72, 117–123.

63 B. Mikalsen, G. Boison, O. M. Skulberg, J. Fastner, W. Davies, T. M. Gabrielsen, K. Rudi and K. S. Jakobsen, J. Bacteriol., 2003, 185, 2774–2785.

64 Y. Tanabe, K. Kaya and M. M. Watanabe, J. Mol. Evol., 2004, 58, 633–641.

65 A. Rantala, D. P. Fewer, M. Hisbergues, L. Rouhiainen, J. Vaitomaa, T. Borner and K. Sivonen, Proc. Natl. Acad. Sci. U. S. A., 2004, 101, 568–573.

66 J. E. Becker, R. E. Moore and B. S. Moore, Gene, 2004, 325, 35–42.

67 D. Hoffmann, J. M. Hevel, R. E. Moore and B. S. Moore, Gene, 2003, 311, 171–180.

68 M. Kaebernick, E. Dittmann, T. Borner and B. A. Neilan, Appl. Environ. Microbiol., 2002, 68, 449–455.

69 M. Kaebernick and B. A. Neilan, FEMS Microbiol. Ecol., 2001, 35, 1–9.

70 E. Dittmann, M. Erhard, M. Kaebernick, C. Scheler, B. A. Neilan, H. von Dohren and T. Borner, Microbiology, 2001, 147, 3113–3119.

71 M. C. Moffitt and B. A. Neilan, J. Mol. Evol., 2003, 56, 446–457.

72 R. Kellmann, T. K. Mihali, Y. J. Jeon, R. Pickford, F. Pomati and B. A. Neilan, Appl. Environ. Microbiol., 2008, 74, 4044–4053.

73 Z. Q. Beck, C. C. Aldrich, N. A. Magarvey, G. I. Georg and D. H. Sherman, Biochemistry, 2005, 44, 13457–13466.

74 N. A. Magarvey, Z. Q. Beck, T. Golakoti, Y. S. Ding, U. Huber, T. K. Hemscheidt, D. Abelson, R. E. Moore and D. H. Sherman, ACS Chem. Biol., 2006, 1, 766–779.

75 K. Ishida, G. Christiansen, W. Y. Yoshida, R. Kurmayer, M. Welker, N. Valls, J. Bonjoch, C. Hertweck, T. Borner, T. Hemscheidt and E. Dittmann, Chem. Biol., 2007, 14, 565–576.

76 K. Ishida, M. Welker, G. Christiansen, S. Cadel-Six, C. Bouchier, E. Dittmann, C. Hertweck and N. T. de Marsac, Appl. Environ. Microbiol., 2009, 75, 2017–2026.

77 L. Rouhiainen, L. Paulin, S. Suomalainen, H. Hyytiainen, W. Buikema, R. Haselkorn and K. Sivonen, Mol. Microbiol., 2000, 37, 156–167.

78 A. Mejean, S. Mann, T. Maldiney, G. Vassiliadis, O. Lequin and O. Ploux, J. Am. Chem. Soc., 2009, 131, 7512–7513.

79 T. B. Rounge, T. Rohrlack, A. Tooming-Klunderud, T. Kristensen and K. S. Jakobsen, Appl. Environ. Microbiol., 2007, 73, 7322–7330.

80 A. Tooming-Klunderud, T. Rohrlack, K. Shalchian-Tabrizi, T. Kristensen and K. S. Jakobsen, Microbiology, 2007, 153, 1382–1393.

81 N. Ziemert, K. Ishida, A. Liaimer, C. Hertweck and E. Dittmann, Angew. Chem., Int. Ed., 2008, 47, 7756–7759.

82 B. Philmus, G. Christiansen, W. Y. Yoshida and T. K. Hemscheidt, ChemBioChem, 2008, 9, 3066–3073.

83 M. S. Donia, J. Ravel and E. W. Schmidt, Nat. Chem. Biol., 2008, 4, 341–343.

84 T. Nishizawa, M. Asayama, K. Fujii, K. Harada and M. Shirai, J. Biochem., 1999, 126, 520–529.

85 T. Nishizawa, A. Ueda, M. Asayama, K. Fujii, K. Harada, K. Ochi and M. Shirai, J. Biochem., 2000, 127, 779–789.

86 R. E. Moore, J. L. Chen, B. S. Moore, G. M. L. Patterson and W. W. Carmichael, J. Am. Chem. Soc., 1991, 113, 5083–5084.

87 K. L. Rinehart, M. Namikoshi and B. W. Choi, J. Appl. Phycol., 1994, 6, 159–176.

88 L. M. Hicks, M. C. Moffitt, L. L. Beer, B. S. Moore and N. L. Kelleher, ACS Chem. Biol., 2006, 1, 93–102.

89 H. Luesch, D. Hoffmann, J. M. Hevel, J. E. Becker, T. Golakoti and R. E. Moore, J. Org. Chem., 2003, 68, 83–91.

90 F. Kopp, C. Mahlert, J. Grunewald and M. A. Marahiel, J. Am. Chem. Soc., 2006, 128, 16478–16479.

91 F. Kopp and M. A. Marahiel, Nat. Prod. Rep., 2007, 24, 735–749.

92 Y. Shimizu, M. Norte, A. Hori, A. Genenah and M. Kobayashi, J. Am. Chem. Soc., 1984, 106, 6433–6434.

93 L. C. Gu, T. W. Geders, B. Wang, W. H. Gerwick, K. Hakansson, J. L. Smith and D. H. Sherman, Science, 2007, 318, 970–974.

94 D. L. Burgoyne, T. K. Hemscheidt, R. E. Moore and M. T. C. Runnegar, J. Org. Chem., 2000, 65, 152–156.

95 T. K. Mihali, R. Kellmann, J. Muenchhoff, K. D. Barrow and B. A. Neilan, Appl. Environ. Microbiol., 2008, 74, 716–722.

96 A. C. Jones, L. C. Gu, C. M. Sorrels, D. H. Sherman and W. H. Gerwick, Curr. Opin. Chem. Biol., 2009, 13, 216–223.

97 N. Sitachitta, B. L. Marquez, R. T. Williamson, J. Rossi, M. A. Roberts, W. H. Gerwick, V. A. Nguyen and C. L. Willis, Tetrahedron, 2000, 56, 9103–9113.

98 Z. X. Chang, P. Flatt, W. H. Gerwick, V. A. Nguyen, C. L. Willis and D. H. Sherman, Gene, 2002, 296, 235–247.

99 D. P. Galonic, F. H. Vaillancourt and C. T. Walsh, J. Am. Chem. Soc., 2006, 128, 3900–3901.

100 P. Flatt, J. Gautschi, R. Thacker, M. Musafija-Girt, P. Crews and W. Gerwick, Mar. Biol., 2005, 147, 761–774.

101 Z. X. Chang, N. Sitachitta, J. V. Rossi, M. A. Roberts, P. M. Flatt, J. Y. Jia, D. H. Sherman and W. H. Gerwick, J. Nat. Prod., 2004, 67, 1356–1367.

102 T. W. Geders, L. C. Gu, J. C. Mowers, H. C. Liu, W. H. Gerwick, K. Hakansson, D. H. Sherman and J. L. Smith, J. Biol. Chem., 2007, 282, 35954–35963.

103 L. C. Gu, B. Wang, A. Kulkarni, T. W. Geders, R. V. Grindberg, L. Gerwick, K. Hakansson, P. Wipf, J. L. Smith, W. H. Gerwick and D. H. Sherman, Nature, 2009, 459, 731–735.

104 D. J. Edwards, B. L. Marquez, L. M. Nogle, K. McPhail, D. E. Goeger, M. A. Roberts and W. H. Gerwick, Chem. Biol., 2004, 11, 817–833.

105 D. J. Edwards and W. H. Gerwick, J. Am. Chem. Soc., 2004, 126, 11432–11433.

106 K. Irie, S. Tomimatsu, Y. Nakagawa, H. Ohigashi and H. Hayashi, Biosci., Biotechnol., Biochem., 1999, 63, 1669–1670.

107 A. W. Schultz, D. C. Oh, J. R. Carney, R. T. Williamson, D. W. Udwary, P. R. Jensen, S. J. Gould, W. Fenical and B. S. Moore, J. Am. Chem. Soc., 2008, 130, 4507–4516.

108 J. A. Read and C. T. Walsh, J. Am. Chem. Soc., 2007, 129, 15762–15763.

109 A. V. Ramaswamy, C. M. Sorrels and W. H. Gerwick, J. Nat. Prod., 2007, 70, 1977–1986.

110 C. T. Calderone, S. B. Bumpus, N. L. Kelleher, C. T. Walsh and N. A. Magarvey, Proc. Natl. Acad. Sci. U. S. A., 2008, 105, 12809–12814.

111 S. D. Bentley, K. F. Chater, A. M. Cerdeno-Tarraga, G. L. Challis, N. R. Thomson, K. D. James, D. E. Harris, M. A. Quail, H. Kieser, D. Harper, A. Bateman, S. Brown, G. Chandra, C. W. Chen, M. Collins, A. Cronin, A. Fraser, A. Goble, J. Hidalgo, T. Hornsby, S. Howarth, C. H. Huang, T. Kieser, L. Larke, L. Murphy, K. Oliver, S. O’Neil, E. Rabbinowitsch, M. A. Rajandream, K. Rutherford, S. Rutter, K. Seeger, D. Saunders, S. Sharp, R. Squares, S. Squares, K. Taylor, T. Warren, A. Wietzorrek, J. Woodward, B. G. Barrell, J. Parkhill and D. A. Hopwood, Nature, 2002, 417, 141–147.

112 D. W. Udwary, L. Zeigler, R. N. Asolkar, V. Singan, A. Lapidus, W. Fenical, P. R. Jensen and B. S. Moore, Proc. Natl. Acad. Sci. U. S. A., 2007, 104, 10376–10381.

113 G. L. Challis, J. Med. Chem., 2008, 51, 2618–2628.

114 M. Zerikly and G. L. Challis, ChemBioChem, 2009, 10, 625–633.

115 J. N. Copp, A. A. Roberts, M. A. Marahiel and B. A. Neilan, J. Bacteriol., 2007, 189, 3133–3139.

116 O. A. Koksharova and C. P. Wolk, Appl. Microbiol. Biotechnol., 2002, 58, 123–137.

117 K. Eppelmann, S. Doekel and M. A. Marahiel, J. Biol. Chem., 2001, 276, 34824–34831.

118 B. Julien and S. Shah, Antimicrob. Agents Chemother., 2002, 46, 2772–2778.

119 T. Stachelhaus, H. D. Mootz and M. A. Marahiel, Chem. Biol., 1999, 6, 493–505.



120 T. J. McQuade, A. D. Shallop, A. Sheoran, J. E. DelProposto, O. V. Tsodikov and S. Garneau-Tsodikova, Anal. Biochem., 2009, 386, 244–250.

121 P. F. Long, W. C. Dunlap, C. N. Battershill and M. Jaspars, ChemBioChem, 2005, 6, 1760–1765.

122 E. W. Schmidt, J. T. Nelson, D. A. Rasko, S. Sudek, J. A. Eisen, M. G. Haygood and J. Ravel, Proc. Natl. Acad. Sci. U. S. A., 2005, 102, 7315–7320.

123 E. W. Schmidt, S. Sudek and M. G. Haygood, J. Nat. Prod., 2004, 67, 1341–1345.

124 S. Sudek, M. G. Haygood, D. T. A. Youssef and E. W. Schmidt, Appl. Environ. Microbiol., 2006, 72, 4382–4387.

125 N. Ziemert, K. Ishida, P. Quillardet, C. Bouchier, C. Hertweck, N. T. de Marsac and E. Dittmann, Appl. Environ. Microbiol., 2008, 74, 1791–1797.

126 J. A. McIntosh, M. S. Donia and E. W. Schmidt, Nat. Prod. Rep., 2009, 26, 537–559.

127 P. J. Proteau, W. H. Gerwick, F. Garcia-Pichel and R. Castenholz, Experientia, 1993, 49, 825–829.

128 T. Soule, V. Stout, W. D. Swingley, J. C. Meeks and F. Garcia-Pichel, J. Bacteriol., 2007, 189, 4465–4472.

129 T. Soule, F. Garcia-Pichel and V. Stout, J. Bacteriol., 2009, 191, 4639–4646.

130 E. P. Balskus and C. T. Walsh, J. Am. Chem. Soc., 2008, 130, 15260.

131 C. M. Sorrels, P. J. Proteau and W. H. Gerwick, Appl. Environ. Microbiol., 2009, 75, 4861–4869.

132 S. Omura, H. Ikeda, J. Ishikawa, A. Hanamoto, C. Takahashi, M. Shinose, Y. Takahashi, H. Horikawa, H. Nakazawa, T. Osonoe, H. Kikuchi, T. Shiba, Y. Sakaki and M. Hattori, Proc. Natl. Acad. Sci. U. S. A., 2001, 98, 12215–12220.

133 M. E. Barrios-Llerena, A. M. Burja and P. C. Wright, J. Ind. Microbiol. Biotechnol., 2007, 34, 443–456.

134 S. Donadio, P. Monciardini and M. Sosio, Nat. Prod. Rep., 2007, 24, 1073–1109.

135 S. Pasek, in Methods in Molecular Biology, ed. N. H. Bergman, Humana Press, Totowa, 2007, pp. 17–29.

136 S. Pasek, A. Bergeron, J. L. Risler, A. Louis, E. Ollivier and M. Raffinot, Genome Res., 2005, 15, 867–874.

137 R. D. Finn, J. Tate, J. Mistry, P. C. Coggill, S. J. Sammut, H. R. Hotz, G. Ceric, K. Forslund, S. R. Eddy, E. L. L. Sonnhammer and A. Bateman, Nucleic Acids Res., 2008, 36, D281–D288.

138 R. Durbin, S. R. Eddy, A. Krogh and G. J. Mitchison, Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids, Cambridge University Press, Cambridge, 1998.

139 D. W. Udwary, M. Merski and C. A. Townsend, J. Mol. Biol., 2002, 323, 585–598.

140 T. Weber, C. Rausch, P. Lopez, I. Hoof, V. Gaykova, D. H. Huson and W. Wohlleben, J. Biotechnol., 2009, 140, 13–17.

141 G. L. Challis, J. Ravel and C. A. Townsend, Chem. Biol., 2000, 7, 211–224.

142 I. M. Ehrenreich, J. B. Waterbury and E. A. Webb, Appl. Environ. Microbiol., 2005, 71, 7401–7413.

143 J. M. Winter, M. C. Moffitt, E. Zazopoulos, J. B. McAlpine, P. C. Dorrestein and B. S. Moore, J. Biol. Chem., 2007, 282, 16362–16368.


