Research Article Transcript Polymorphism Rates in...

13
Research Article Transcript Polymorphism Rates in Soybean Seed Tissue Are Increased in a Single Transformant of Glycine max Kevin C. Lambirth, 1 Adam M. Whaley, 2 Jessica A. Schlueter, 2 Kenneth J. Piller, 3 and Kenneth L. Bost 3 1 Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, North Carolina Research Campus, Kannapolis, NC 28081, USA 2 Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, Charlotte, NC 28223, USA 3 Department of Biological Sciences, University of North Carolina at Charlotte, Charlotte, NC 28223, USA Correspondence should be addressed to Kevin C. Lambirth; [email protected] Received 29 June 2016; Revised 5 October 2016; Accepted 17 October 2016 Academic Editor: Pierre Sourdille Copyright © 2016 Kevin C. Lambirth et al. is is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Transgenic crops have been utilized for decades to enhance agriculture and more recently have been applied as bioreactors for manufacturing pharmaceuticals. Recently, we investigated the gene expression profiles of several in-house transgenic soybean events, finding one transformant group to be consistently different from our controls. In the present study, we examined polymorphisms and sequence variations in the exomes of the same transgenic soybean events. We found that the previously dissimilar soybean line also exhibited markedly increased levels of polymorphisms within mRNA transcripts from seed tissue, many of which are classified as gene expression modifiers. e results from this work will direct future investigations to examine novel SNPs controlling traits of great interest for breeding and improving transgenic soybean crops. Further, this study marks the first work to investigate SNP rates in transgenic soybean seed tissues and demonstrates that while transgenesis may induce abundant unanticipated changes in gene expression and nucleotide variation, phenotypes and overall health of the plants examined remained unaltered. 1. Introduction Over the past three decades, transgenic plant biotechnology has become integrated into the agriculture of nearly all societies worldwide. Genetically modified food crops have revolutionized conventional farming methods by increasing yields [1], bolstering resistance to pests and herbicides [2, 3], and enhancing resistance to environmental stresses [4]. Furthermore, plant systems have been employed for cost- effective production of biological therapeutics such as oral vaccine candidates [5], monoclonal antibody production [6], and accumulation of therapeutic and diagnostic proteins [7, 8]. Coupled with self-regenerating properties and the natural ability to stably store proteins in varied environmental condi- tions, plants offer a multitude of advantages over traditional cell culture systems for protein generation [9]. Over the past 10 years, our laboratory has focused on Glycine max as a model system for the generation and storage of recombinant proteins that are currently expensive to manufacture and difficult to procure or synthesize in bacterial and cell-based cultures and as oral vaccine candidates by targeting gut- associated lymphoid tissues (GALT) for immune stimulation or suppression [5, 8, 10–13]. Advantages for using soybean as an expression system include high protein content and yield, self-fertilization to simplify the generation of homozygous transformants, and the stability of endogenous proteins in seed tissues, which we have reviewed in depth previously [14]. Selective farming of agricultural food sources has been conducted to propagate advantageous phenotypes such as improved germplasm size and quantity, particularly in crop plants of maize, soybean, rice, cereals, and potato [15]. Backcrossing of parents allows for further control over traits seen in progeny, although this process is time consuming and oſten unpredictable, particularly in species that are Hindawi Publishing Corporation International Journal of Plant Genomics Volume 2016, Article ID 1562041, 12 pages http://dx.doi.org/10.1155/2016/1562041

Transcript of Research Article Transcript Polymorphism Rates in...

Page 1: Research Article Transcript Polymorphism Rates in …downloads.hindawi.com/journals/ijpg/2016/1562041.pdfResearch Article Transcript Polymorphism Rates in Soybean Seed Tissue Are Increased

Research ArticleTranscript Polymorphism Rates in Soybean Seed Tissue AreIncreased in a Single Transformant of Glycine max

Kevin C Lambirth1 Adam M Whaley2 Jessica A Schlueter2

Kenneth J Piller3 and Kenneth L Bost3

1Department of Bioinformatics and Genomics University of North Carolina at Charlotte North Carolina Research CampusKannapolis NC 28081 USA2Department of Bioinformatics and Genomics University of North Carolina at Charlotte Charlotte NC 28223 USA3Department of Biological Sciences University of North Carolina at Charlotte Charlotte NC 28223 USA

Correspondence should be addressed to Kevin C Lambirth kclambi1unccedu

Received 29 June 2016 Revised 5 October 2016 Accepted 17 October 2016

Academic Editor Pierre Sourdille

Copyright copy 2016 Kevin C Lambirth et al This is an open access article distributed under the Creative Commons AttributionLicense which permits unrestricted use distribution and reproduction in any medium provided the original work is properlycited

Transgenic crops have been utilized for decades to enhance agriculture and more recently have been applied as bioreactors formanufacturing pharmaceuticals Recently we investigated the gene expression profiles of several in-house transgenic soybeanevents finding one transformant group to be consistently different from our controls In the present study we examinedpolymorphisms and sequence variations in the exomes of the same transgenic soybean events We found that the previouslydissimilar soybean line also exhibited markedly increased levels of polymorphisms within mRNA transcripts from seed tissuemany of which are classified as gene expression modifiers The results from this work will direct future investigations to examinenovel SNPs controlling traits of great interest for breeding and improving transgenic soybean crops Further this study marks thefirst work to investigate SNP rates in transgenic soybean seed tissues anddemonstrates that while transgenesismay induce abundantunanticipated changes in gene expression and nucleotide variation phenotypes and overall health of the plants examined remainedunaltered

1 Introduction

Over the past three decades transgenic plant biotechnologyhas become integrated into the agriculture of nearly allsocieties worldwide Genetically modified food crops haverevolutionized conventional farming methods by increasingyields [1] bolstering resistance to pests and herbicides [23] and enhancing resistance to environmental stresses [4]Furthermore plant systems have been employed for cost-effective production of biological therapeutics such as oralvaccine candidates [5] monoclonal antibody production [6]and accumulation of therapeutic and diagnostic proteins [78] Coupled with self-regenerating properties and the naturalability to stably store proteins in varied environmental condi-tions plants offer a multitude of advantages over traditionalcell culture systems for protein generation [9] Over the past10 years our laboratory has focused on Glycine max as a

model system for the generation and storage of recombinantproteins that are currently expensive to manufacture anddifficult to procure or synthesize in bacterial and cell-basedcultures and as oral vaccine candidates by targeting gut-associated lymphoid tissues (GALT) for immune stimulationor suppression [5 8 10ndash13] Advantages for using soybean asan expression system include high protein content and yieldself-fertilization to simplify the generation of homozygoustransformants and the stability of endogenous proteins inseed tissues whichwe have reviewed in depth previously [14]

Selective farming of agricultural food sources has beenconducted to propagate advantageous phenotypes such asimproved germplasm size and quantity particularly in cropplants of maize soybean rice cereals and potato [15]Backcrossing of parents allows for further control over traitsseen in progeny although this process is time consumingand often unpredictable particularly in species that are

Hindawi Publishing CorporationInternational Journal of Plant GenomicsVolume 2016 Article ID 1562041 12 pageshttpdxdoiorg10115520161562041

2 International Journal of Plant Genomics

not self-pollinating Segregation of alleles does not alwaysproduce desired functionality in overall phenotype and dueto individual variability it may come at the cost of overallfitness [16]

Glycine max or the modern cultivated soybean is alegume branched from the wild species Glycine soja and isclassified as a diploidized tetraploid (2119899 = 40) Soybean has arelatively lengthy growth cycle (sim6 months) and is primarilycultivated for its seed which is high in both protein and oilcontent [17] It is also primarily a self-pollinating dicot asthe anthers and stigma mature together in the same flower ofmost cultivars preventing cross-pollination with other plants[18] This makes soybean desirable for several reasons Thefirst one is the simplicity of inheritance selections whenbreeding and the other is the high protein content (sim40by weight) of the soybean seed itself which through millionsof years of evolution has been designed to maintain thedurability and stability of its internal cargo until optimumconditions are met for germination Soluble protein extrac-tion from soy seed is also straightforward and has provento be efficient for the generation and purification of vaccineantigens therapeutic proteins and diagnostic peptides [7ndash9] This makes soybean an ideal candidate platform as abioreactor for the generation accumulation and long-termstorage of plant-based biologics [14] removing the trade-offdichotomyof either high yield (leaf tissue expression) or long-term stability (seed tissue)

Single nucleotide polymorphisms (SNPs) and inser-tionsdeletions (INDELS) are major driving forces behindspecies evolution and diversity either changing a singlenucleotide base in the former or shifting the transcriptionalreading frame in the latter The end results depending onthe position of the base changes may alter protein prod-ucts controlling certain traits SNPs may result in benignvariations in humans such as eye color skin color [19] andfinger length changes [20] or alternatively may be responsiblefor debilitating disorders such as lung cancer and multiplesclerosis [21] INDELS and nonsense polymorphisms canhave large effects on gene products disrupting transcriptionstart sites and essential cellular processes Soybean has expe-rienced a dramatic reduction in genomic variability followingthe development of worldwide cultivation practices of thewild ancestor Glycine soja all but eliminating rare alleliccombinations in domesticated varieties [22] Due to thislong-term domestication Glycine max shares only 9765genomic sequence similarity with Glycine soja and contains425 novel genes that are not present in the wild cultivar thatdirectly control seed development oil content and proteinconcentration [23]

Recently we investigated the pleiotropic effects on thetranscriptome of soybean seed tissues expressing and accu-mulating high levels of recombinant protein [24] Resultsrevealed no correlation between transgene or recombinantprotein expression level and the quantity of differentially reg-ulated genes although one of the transgenic lines containedover 3000 differentially expressed genes We concluded thatthe gene expression differences observed may have been dueto the specific properties of the recombinant proteins them-selves on the homeostatic environment of the seed or due to

random mutagenesis however molecular characterizationsof transcript structure and internal modifications remainedunknown Given the ability of single base mutations tocause severe effectual repercussions in conjunction with thehigh number of differentially expressed genes we previouslydetected in our transgenic soybean line we explored whethertranscript polymorphism rates were influenced in the samemanner

In order to elucidate possible transcript modifications asa result of transformation we utilized the publicly availableRNA-seq datasets generated by our previous transcriptomesequencing study [24] to assess exome INDELs and SNPrates in all of the transgenic samples investigated previouslyThis allowed us to examine gene variances exclusive to thetransgenic plants and to characterize the possible resultingeffects of these polymorphisms in conjunction with thedifferentially expressed genes we previously characterized Toour knowledge this is the first effort to characterize exomepolymorphisms in transgenic soybean seed tissues and willserve to further interpret alterations and their possible effectsresulting from transgenesis

2 Materials and Methods

In this study we utilized datasets obtained from whole-transcriptome sequencing of transgenic soybean seed tissuesexpressing three different transgenes to detect exomic poly-morphisms Each experimental group consisted of nine seedsfrom each of the three transgenic events (designated ST111ST77 and 764) as well as the wild type controls All transgenicplants were generated using Agrobacterium transformationand taken to homozygosity over multiple generations priorto sequence acquisition The transgenes and recombinantproteins expressed and accumulated in the seed tissues havebeen described previously by our lab [24] Quality controlfiltering and assembly were conducted as part of the previousstudy prior to polymorphism detection and annotation withsamtools and bcftools

21 Exon SNPINDEL Analysis Soybean cultivation RNAextraction and differential gene expression analyses werecarried out using RNA-seq as previously described in Lam-birth et al [24] In order to facilitate exome SNP callssamtools version 12 [25 26] was used to index the soybeanreference genome version 275 sequence file obtained fromPhytozome [27] which was amended to contain scaffoldsof all three T-DNA sequences using the -faidx commandto allow alignment of nonreference transgene transcriptsAlignment files previously generated by TopHat [28] forall previously reported samples [24] in bam format wereconverted to bcf files using the -mpileup command with-g and -f parameters to specify the output format andto use the indexed reference fasta file The bcftools callcommand was subsequently used on the indexed bcf fileswith the -c parameter to invoke the original consensus callingmethod enabling SNP and INDEL identification The statand plot-vcfstats commands were used to generate statisticalsummaries for each sample Total SNP counts for eachsample were averaged to calculate the standard deviation and

International Journal of Plant Genomics 3

standard error for each experimental group of wild type andtransgenic seeds and unpaired two-tailed 119905-tests were usedto compare each transgenic group with wild type Individualnucleotide base changes transition and transversion ratesINDELS and single and multiallele SNPs were also recordedand compared between groups To identify SNPs present ineach experimental group the intersections of each variantfile for all nine individual replicates in the ST77 ST111 764and WT groups were taken SNPs in these combined filesoverlapping between all three transgenic groups as well asthose unique to each event when compared to WT werepulled from the vcf files using the -vcf-isec command invcftools [29] SNPs shared between the transgenics and WTcontrol group were removed to generate vcf files containingSNPs found exclusively in each transformation event

Visualization of SNPs and INDELS was conducted usingCircos version 069-2 [30] Variance files in bcf format fromthe samtoolsSNPeff output were converted to vcf files withthe -bcftools view -o command then using the cut -f 126command base calls quality score metrics and chromo-somal locations were extracted Formatted tab delimitedfiles were converted to the input text file for Circos using anin-house script generated in Python (version 2711) whichfiltered the calls to exclude all polymorphisms with qualityscores below 15The script is included in the Additional Filesof this manuscript (in Supplementary Material availableonline at httpdxdoiorg10115520161562041) Plots werethen generated using the -circos -conf circusconf commandto display SNP distributions of the transgenic and wild typesamples for all 20 soybean genomic chromosomes SNPswere spaced in the plots every 20 bases and layered vertically

To predict any possible translational effects resultingfrom detected SNPs and INDELS snpEff version 41i [31]was utilized on the resulting variance call files generatedby bcftools A custom database for snpEff was constructedusing the -build command consisting of the soybean genomeFASTA reference file described previously containing our T-DNA sequences as well as the gene model gff3 file fromthe Cufflinks output from our previous RNA-seq data [24]obtained from PhytozomeThe gff3 file provided a referenceindex for gene positions and identifiers as well as intronexonmodels and untranslated regions No codon table configura-tion was necessary as Glycine max utilizes standard codontriplets allowing snpEff to run with the default parametersemploying the SNPINDEL call vcf file from bcftools asinput Multithreaded processing using the -t option was notused as this removes statistical calculations and the resultingreports from the output file Statistical comparisons of SNPrates and effects between each group were conducted usingtwo-tailed unpaired 119905-tests in Microsoft Excel Functionalannotation of genes containing variants was accomplished byloading the gene output list from snpEff into the agriculturalgene ontology (GO) enrichment tool AgriGO [32] using theintegrated single enrichment analysis tool with the Glycinemax Wm82a2v1 background reference Significant termswere calculated using Fisherrsquos exact test with a corrected falsediscovery rate 119901 value threshold of 005 All computationswere conducted on an Apple Macbook Pro with a 27GHzquad-core Intel i7 processor and 16GB of DDR3 RAM

3 Results

31 SNP Rates in Transgenic Soybean Exons Total SNPsdetected by samtools and bcftools across the wild typecontrol group reported a mean of 20707 SNPs per sample(Figure 1(a)) with a mean SNP rate of one polymorphismfor every 42561 nucleotides (Figure 1(b)) Of these 48 onaverage are single allele polymorphisms with 35 multiallelesites containing 23 multiallelic SNPs An overall transition (apurine base changed to another purine base or pyrimidinebase changed to another pyrimidine base) to transversion (apurine base to pyrimidine base or pyrimidine base to purinebase alteration) ratio (TsTv) of 162 (Figure 1(d)) and anaverage of 1862 INDELS were identified per sample in thewild type group (Figure 1(c)) Mean base changes detectedwere as follows 888 ArarrC 3458 ArarrG 1428 ArarrT 793CrarrA769 CrarrG 2977 CrarrT 2941 GrarrA 787 GrarrC 818 GrarrT 1495TrarrA 3433 TrarrC and 944 TrarrG (Figure 2(a))

The ST111 experimental group reported an average of20208 SNPs per individual sample (Figure 1(a)) with apolymorphism rate of one SNP for every 44323 nucleotides(Figure 1(b)) Similar to wild type 48 of the SNPs in theST111 line were single allele alterations with an average of 34multiallele sites and 25 multiallele SNPs The TsTv ratio wasagain very similar to wild type at 163 (Figure 1(d)) with anaverage of 1750 INDELS per sample (Figure 1(c)) Per basechanges reported for the ST111 group were 933 ArarrC 3457ArarrG 1382 ArarrT 747 CrarrA 710 CrarrG 2868 CrarrT 2830GrarrA 754 GrarrC 779 GrarrT 1420 TrarrA 3415 TrarrC and 968TrarrG (Figure 2(b))

The ST77 experimental group was also comparable towild type with 21666 average SNPs per sample (Figure 1(a))at a rate of 41225 nucleotides for every polymorphism(Figure 1(b)) Forty-nine percent of the detected SNPs werereported as singletons with 33 identified multiallele SNPsites and 28 multiallele SNPs The TsTv ratio for the ST77group was slightly lower than wild type at 156 (Figure 1(d))indicating a slightly higher transversion rate in the ST77Dand ST77F siblings INDELS were nearly identical in numberand size to the wild type control group with an average totalof 1829 per sample (Figure 1(c)) Nucleotide base changesinclude 1119 ArarrC 3650 ArarrG 1468 ArarrT 837 CrarrA 772CrarrG 3015 CrarrT 2952 GrarrA 796 GrarrC 842 GrarrT 1520TrarrA 3589 TrarrC and 1135 TrarrG (Figure 2(c))

Lastly the 764 experimental group contained more sub-stantial nucleotide alterations when compared to wild typeand the other two experimental groups (ST111 and ST77)revealing an average SNP count of 38188 per individual(Figure 1(a)) nearly double the quantity of any other groupSNPs were also encountered on average at nearly doublethe frequency of the other samples with one SNP recordedevery 24281 bases (Figure 1(b)) In this group only 27 ofSNPs were reported as singletons with 78 multiallele sitesand 44 multiallele SNPs The TsTv ratio was the lowestof all four groups at 153 (Figure 1(d)) indicating the 764group had the highest overall rate of nucleotide transversionsINDELS also increased to 2390with a larger deviation spreadbetween samples (Figure 1(c)) Base changes included 1972ArarrC 6094 ArarrG 2596 ArarrT 1603 CrarrA 1386 CrarrG

4 International Journal of Plant Genomics

Total SNPs

nsns

000500000

100000015000002000000250000030000003500000400000045000005000000

Wild typeST111

ST77764

p = 4e minus 5lowastlowastlowast

(a)

Bases per SNPns

ns

000500000

100000015000002000000250000030000003500000400000045000005000000

Wild typeST111

ST77764

p = 2e minus 7lowastlowastlowast

(b)

INDELS

ns ns

000

50000

100000

150000

200000

250000

300000

Wild typeST111

ST77764

p = 00181lowast

(c)

Transitiontransversion rations

ns

144146148150152154156158160162164166

Wild typeST111

ST77764

p = 2e minus 6

lowastlowastlowast

(d)

Figure 1 Summary of SNP and INDEL counts Means of total polymorphisms (a) polymorphism rates (b) insertionsdeletions (c) andtransition to transversion ratios (d) were calculated all four experimental groups Standard error bars aswell as statistical results fromunpaired119905-tests between each group and wild type are shown Groups marked with an asterisk indicate significance (119901 lt 005 = lowast 119901 lt 0001 = lowastlowastlowast)Wild type is shown in blue ST111 in red ST77 in green and 764 in purple

5445 CrarrT 5461 GrarrA 1417 GrarrC 1639 GrarrT 2576 TrarrA6104 TrarrC and 1940 TrarrG (Figure 2(d))

Polymorphic sites unique to each transgenic event wereidentified by comparing each transgenic variant list to bothWT and the remaining two transgenic groups The ST77transgenic group shared 2196 common gene variants withWT and exhibited 981 unique polymorphisms The ST111transgenic line revealed similar values sharing 1983 genevariants with the controls while the 764 group shared 1912variants The ST111 and 764 lines each reported 927 and7717 unique gene variants respectively Transgenic linesalso revealed 127 gene variants shared between the threetransgenic groups

Overall SNP calls clearly illustrated that the 764 exper-imental group contained significantly increased levels ofpolymorphisms over the wild type controls and other trans-genic experimental groups while concurrently demonstrat-

ing an increased quantity of more acute transversion typemutations Average total SNPs INDELS SNP rates andtransitiontransversion ratios for each experimental groupare shown in Figure 1 and average individual nucleotidechanges for each group are shown in Figures 2(a)ndash2(d)

32 Classification of Predicted SNP Effects and ImpactsFollowing SNP and INDEL detection with samtools snpEffversion 41i [31] was used to evaluate potential alterations tothe exome resulting from these changes SnpEff annotatesvariants based on their genomic locations including intronsexons upstreamor downstream splice sites and untranslatedregions at 51015840 and 31015840 sequence ends Effects were groupedand sorted according to the variant type potential level ofeffect impact functional class type and region No multi-nucleotide polymorphisms were detected in any individualof all four experimental groups and modifications were

International Journal of Plant Genomics 5

000

100000

200000

300000

400000

500000

600000

700000

Wild type

(a)

(b)

(c)

(d)

(e)

(f)

(g)

(h)

Silent3892

Nonsense311

Missense5797

Other051

Downstream2253

Exon1980

Intergenic263

Intron2776

Splice siteregion199

Upstream1537

510431

UTR 5998400

UTR 3998400

000

100000

200000

300000

400000

500000

600000

700000

Transgenic

39

3

58

046

2332

1968

283

2675183

1578

521

414

Silent

Nonsense

Missense

Other

Downstream

Exon

Intergenic

Intron

Splice siteregion

Upstream

UTR 5998400

UTR 3998400

000

100000

200000

300000

400000

500000

600000

700000

041

2329

2137

291

2491213

1516

555427

3877

3095814

Silent

NonsenseMissense

Other

Downstream

Exon

Intergenic

Intron

Splice siteregion

Upstream

UTR 5998400

UTR 3998400

000

100000

200000

300000

400000

500000

600000

700000

4448

181

5371

003

2505

1994

377

2357125

1619

Silent

Nonsense

Missense

Other

Downstream

Exon

Intergenic

Intron

Splice siteregion

Upstream

UTR 5998400

UTR 3998400

409

584

ArarrC

ArarrG

ArarrT

CrarrA

CrarrG

CrarrT

GrarrA

GrarrC

GrarrT

TrarrA

TrarrC

TrarrG

Figure 2 Polymorphism nucleotide base changes and predicted effects Mean rates of change for each nucleotide base are shown for wildtype (a) ST111 (b) ST77 (c) and 764 (d) with bars indicating the standard error of the mean Regions containing detected polymorphismsand their functional classifications are shown for wild type (e) ST111 (f) ST77 (g) and 764 (h)

6 International Journal of Plant Genomics

limited to SNPs insertions and deletions of bases Acrossall experimental groups polymorphisms classified as loweffect comprise sim10 of the total while moderate effects andhigh effects average 12 and 2 respectively The largestcategory of impact classification is the genomic modifiercategory comprising 72ndash78 of detected polymorphisms inall groups and 60 of shared variants between all transgenics(see snpEff reports in Supporting Data) Missense mutationsare highly prevalent in all four groups averaging sim60in wild type ST111 and ST77 events with a missense tosilent mutation ratio of sim16 (Figures 2(e)ndash2(g)) The 764group exhibited a slightly lower missense percentage ofsim53 however it also contained a higher percentage ofsilent mutations at sim45 compared to the other three groups(sim37) generating a missensesilent mutation ratio of sim120(Figure 2(h)) Nonsense mutations comprised sim3 of thetotal reported SNP effects for the wild type ST111 and ST77groups and sim18 in the 764 group (Figures 2(e)ndash2(h))

All four experimental groups showed a high prevalenceof SNPs of common types namely resulting in down-stream gene variants (sim24) intron variants (sim25) mis-sense variants (sim11) synonymous variants (sim7) andupstream gene variants (sim16)The most common locationsfor these variants are intron regions (sim25) the down-stream transcription start site (sim23) exons (sim23) introns(sim23) and the upstream transcription start site (sim16)The 764 group distribution of polymorphism types andaffected regions seem to mirror those of the wild typeST111 and ST77 groups with a slight reduction of SNPs inexonic regions All individuals also reported variants within51015840 and 31015840 untranslated regions (sim4 and sim6 resp) aswell as minimal detection of frameshifts stop and startvariants (lt1) Regions and predicted effects of detectedpolymorphisms are shown in Figures 2(e)ndash2(h)

Transition base changes of AharrG and CharrT were mostprevalent in all four experimental groups while the transver-sion AharrT was sim50 more common than AharrC CharrG orGharrT transversions Codon changes varied most commonlyat the wobble base position typically resulting from a tran-sition in both wild type and transgenic samples FollowingSNP patterns previously described the prevailing modifiedcodons appeared commonly shared between all individualsmeasured Base changes in the first and second positionsof the codon were much less frequent across all groupsHeat maps of the codon base changes representative for eachexperimental group are shown in Supplemental Figure 1Predicted amino acid changes resulting from codon alter-ations predominantly consist of synonymous substitutionshowever other frequently altered residues include valineto alanine alanine to valine leucine to phenylalanine andproline glutamate to glycine and lysine serine to proline andphenylalanine to leucine Supplemental Figure 2 shows thedistribution heat maps of detected amino acid substitutionaverages representative for all events

Minor changes were also detected within the additionalgenomic scaffolds containing the transgene sequences sup-plemented to the alignment reference file Specifically snpEffreported SNPs in the transgenes of samples 764B3 764H3and 764K1 at position 80 ST77D2 at position 1400 ST77F1

at positions 500 and 700 ST77J3 at position 600 ST111B3 atposition 800 ST111I3 at position 190 and ST111K1 ST111B1and ST111K2 at position 1010 Read alignments were unableto verify all called variants in the transgenes likely due toareas of low coverage however one area was corroborated asa consistent variant in the fillerDNAof all constructs betweenthe right border and promoter region (Supplemental Figure3) which was included as a control Supplemental Table 1summarizes all SNP calls and between group statistical testsSummaries of all SNPINDEL calls from bcftools and theirrespective effects from snpEff along with chromosomal dis-tribution plots additional statistics and lists of SNPs sharedbetween groups and within each group are available from theiPlant collaborative Discovery Environment [33] Links to therespective datasets are provided in the SupportingData of thismanuscript

33 Gene Ontology Analysis of Genes Containing SNPs Giventhe extent of detected polymorphic sites and the possibilitiesof effectual contributions to gene expression changes previ-ously documented in the 764 transgenic line it was necessaryto investigate genes containing SNPs and INDELS withgene ontology to identify the possible functional pathwayscontrolled by the affected transcripts Complete gene listscontaining SNPs provided by the output of snpEff for thewild type (36959 transcripts) ST77 (38691 transcripts)ST111 (34142 transcripts) and 764 (43426 transcripts) lineswere loaded into the AgriGO online gene ontology analysistool and subjected to a single enrichment analysis withmultiple corrections to deduce the functional roles of poly-morphic genes Out of the total Glycine maxV21 backgroundset of 29501 GO terms wild type matched 11245 ST77matched 11660 ST111 matched 10326 and 764 matched13321 GO categories were similar between the wild typeST77 and ST111 groups with minor variations with themajority of terms putatively annotated to translational orRNA processes (Figures 3(a) 3(b) and 3(d)) Intracellulartransport also constituted a large portion of the identifiedcategories (sim12 of total annotations) and in the case ofthe 764 transgenic line was the only other significant GOterm following RNA processing and binding (Figure 3(c))Three categories including ATP-dependent helicase activityubiquitin-dependent protein catabolic processes and ncRNAmetabolic processes were all unique to the ST77 and ST111lines (Figures 3(b) and 3(d)) GTP binding was exclusive tothe ST111 line comprising 21 of the total significant GOterms for the event Figure 4 shows all detected SNPs in achromosomal density map for all four experimental groups

Annotated GO terms for SNPs exclusive to each trans-genic event displayed similar functions with ST77 reporting295 gene variants involved with intracellular organellesphotosystem II cytoskeletal components and protein com-plexes ST111 displayed similar numbers of terms totaling 273annotations including photosynthesis and photosystem IIintracellular organelles ribonucleoprotein complexes thy-lakoids protein complexes and oxidoreductase activity The764 line exhibited the highest number of unique annotatedvariants with 2366 involving translation gene expressionbiosynthetic processes ribosomal structures and protein

International Journal of Plant Genomics 7

Wild typeaminoacylation

tRNA

for proteintranslation

5RNA binding

23

Vesicle-mediatedtransport

13Intracellularprotein transport

12

mRNA metabolic process

4

RNA processing13

Translation30

(a)

ST77ATP-dependenthelicase activity

10

RNA binding19

Ribonucleoproteincomplex biogenesis

4

Intracellularprotein transport

11

ncRNAmetabolic process

10

RNA processing12

Translation27

Ubiquitin-dependentprotein catabolic

process7

(b)

764

Intracellulartransport

25

RNA processing27

RNA binding48

(c)

ST111 ATP-dependenthelicase activity

13Intracellular

proteintransport

13

Ubiquitin-dependent

protein catabolicprocess

10

ncRNA metabolicprocess

13

mRNA processing4

RNA binding26

GTP binding21

(d)

Figure 3 AgriGO enriched gene ontology categories of SNP-containing genes Single enrichment analyses for all genes containing SNPs areshown for wild type (a) ST77 (b) 764 (c) and ST111 (d) Percentages are reflective of how many terms matched the indicated GO group outof the total matches to the background reference

complexes Of the 127 polymorphic genes shared by alltransgenic specimens 40 annotations returned 10 GO termspertaining to intracellular organelles macromolecular com-plexes and protein complexes Specific GO terms for eachgroup are shown in Figure 3 and complete AgriGO featurerelationship trees are shown in Supplemental Figure 4

4 Discussion

SNP rates have been previously evaluated in soybean usingmultiple fragment analysis on several genotypes of culti-vated varieties including Lincoln Mandarin Peking andRichland [34 35] From this fragment analysis transitionand transversion rates were reported to be nearly identicalbetween cultivars at 48 and 52 respectively Furthermorenucleotide diversity rates of cultivated soybean varietieswere estimated to be 5ndash8-fold less than the wild varietyGlycine soja and occur at an even lower frequency than thehighly characterizedArabidopsis thaliana self-crossingmodel[34] More recent investigations of SNPs and INDELS in

soybean using next generation resequencing have revealedgenome-wide polymorphisms in an effort to identify nativedisease resistant and favorable trait loci [36 37] Furtherstudies demonstrate the extensive narrowing of soybeangenomic variation due to selective pressures resulting fromdomestication practices which favor more valued traits suchas increased seed mass and oil content quantitative traitloci [38] In addition the self-pollinating nature of Glycinemax also contributes evolutionarily to the constriction ofgenomic variances in domesticated cultivars Thus withsuch selective pressures bottlenecking natural variations incultivated soybean varieties it is quite intriguing to seesuch consistently increased numbers of SNPs in only onetransgenic experimental group with no definitive initiatingevent

SNPs detected in our datasets were divulged from seedtranscriptome sequences which represent sim65 of the totalsoybean genome and 65 of total genomic protein codingsequences The SNPs described here are not considered tobe representative of the frequencies that may be detected in

8 International Journal of Plant Genomics

Wild type

chr1chr2

chr3

chr4chr5

chr6

chr7

chr8

chr9chr10chr11

chr12

chr13

chr14

chr15

chr16

chr17

chr18

chr19chr20

(a)

chr1chr2

chr3

chr4chr5

chr6

chr7

chr8

chr9chr10chr11

chr12

chr13

chr14

chr15

chr16

chr17

chr18

chr19chr20 1

chr2

chr3

chr4

ch6

chrhr7

chr8

chr9ch 1

chr12

chr13

chr14

chr17

chr18

chr19ch

ST77

(b)chr1

chr2

chr3

chr4chr5

chr6

chr7

chr8

chr9chr10chr11

chr12

chr13

chr14

chr15

chr16

chr17

chr18

chr19chr20 1

chr2

chr3

chr4

ch6

chr7

chr8

chr9ch 1

chr12

chr13

chr14

chr17

chr18

chr19ch

764

(c)

ST111

chr1chr2

chr3

chr4chr5

chr6

chr7

chr8

chr9chr10chr11

chr12

chr13

chr14

chr15

chr16

chr17

chr18

chr19chr20

(d)

Figure 4 Overview of the chromosomal distribution of detected transcript SNPs Aligned SNPs are shown for each soybean chromosomein the wild type controls (a) ST77 (b) 764 (c) and ST111 (d) transgenics The colored bars indicate each chromosome scaled to length witha bold outer hash mark representing every 10Mb of sequence and each nonbolded outer hash mark placed every 1Mb Each internal tickrepresents a detected SNP on which subsequent SNPs are stacked vertically from each transcript at that region

other soybean tissues or in different developmental stagesof these specific transgenic plants rather they demonstrateincreased transcript polymorphism rateswitnessed in a singlesoybean transformant out of several different experimentalgroups Although RNA-seq is focused on the functionalsegments of the genome and does not always capture regu-latory regions such as promoters or nontranscribed regions(eg methylated bases) one aspect of the resulting effectsof polymorphisms in these regions can be directly witnessedthrough examination of gene expression levels Because thesoybean expression systemdescribed previously by our lab [5

7 8 13 14 39] specifically targets seed tissue for recombinantprotein accumulation it was of great interest to identifypossible sequence alterations that may have resulted in geneexpression or protein structure variations in seeds which todate has not been investigated

Base changes for wild type ST77 and ST111 events wereall comparable while the 764 event consistently reportedSNP rates nearly double that of the other groups (1 SNPdetected every sim22000 bases) which is still well below thepreviously reported SNP rate of 1 SNP per sim1400 nucleotidesin soybean seeds [40] Nevertheless SNP base changes

International Journal of Plant Genomics 9

appeared to follow the same pattern of commonality withall groups demonstrating high percentages of transition basechanges with relatively low transversion counts The 764 linedid demonstrate a lower TsTv ratio than the other groups(Figure 1(d)) although progeny from two independent par-ents in the ST77 line (ST77D and ST77F) exhibited lowerratios as well indicating that this characteristic alone cannot reliably indicate internal stresses or overall divergencefrom our controls The majority of detected base changeswere located in the third base of the codon likely gener-ating synonymous (silent) polymorphisms which were themajority of predicted SNP classifications in all events Lesscommon changes occurred in the first or second bases of thecodon generating a nonsynonymous or missense mutationlikely resulting in an amino acid change (Figures 2(e)ndash2(h))Missense to silent ratios detected in all experimental groupsappear to be higher than previously published results forsoybean indicated [40] although the higher nonsynonymousto synonymous mutations may be due to high linkagedisequilibrium exhibited in soy [41]

Interestingly the 764 line had the lowest overall missenseto silent polymorphism ratio demonstrating that while the764 line contained the highest overall number of SNPs ahigher percentage of the total polymorphisms were silentmutations (sim45) compared to the other three experimentalgroups (Figure 2(h)) This reveals the possibility that SNPsand INDELS detected here had originated from the origi-nal transformation event and have been highly conservedthrough self-crossed generations of progeny particularlyconsidering how remarkably well conserved the attributesof detected SNPs were between different biological samplesof the same transformation event Without sequencing datafrom prior generations of each parental line tested herefor comparison this can not be confirmed with completeconfidence however this does pose an interesting inquirythat even silent SNPsmay have the ability to alter gene expres-sion and peptide structure Indeed it has been suggestedthat although synonymous polymorphisms code for identicalamino acids subtle changes resulting from the utilization ofnonoptimal synonymous codons can modulate downstreamtranscription and translation rates [42] thereby altering thefolding conformations and proposed function of the finalpeptides This may explain why the 764 transgenic eventexhibited the highest rates of differentially expressed genesin conjunction with polymorphisms but still maintained ahealthy phenotype

While many SNP studies have excluded synonymousmutations and those present in noncoding regions recentworks have examined these sequences in light of theirpossible epigenetic effects on transcription and translationhowever more advanced tools for the prediction of possibleeffects are still in development and require further testingfor reliable forecasts [43] Nonsynonymous polymorphismsappear at similar rates across the experimental groupsvarying between 50 and 60 of total functional SNP clas-sifications While nonsynonymous mutations nearly alwaysalter amino acid sequences that can potentially yield non-functional protein products this phenomenonmay produce aneutral effect on the organism or protein or activate signaling

cascades in redundant systems [43] The effects of thesedetected nonsynonymous polymorphisms in each transgenicevent are not known or identifiable without extensive pro-teomic investigations into the altered downstream transcriptproductsThe 764 line which previously displayed the largestdegree of differential gene expression exhibited the lowestpercentage of missense and nonsense polymorphisms there-fore these alone are not likely to be the root cause of observedchanges in this transgenic line

Spontaneous mutations may also occur as a result of apreviously occurring stressful event such as parental lineAgrobacterium transformation Epigenetic alterations andnucleotide transpositions have been detected up to fivegenerations forward from transformation in Arabidopsis [4445] however all seeds examined here were bred to orbeyond the 5th generation of progeny The causes for thesemutations are unknown however each transformant was anindependent transformation event and was likely exposedto a varying degree of stress during the process This couldpotentially have introduced genomic variations that havebeen carried forward to the transcriptome as even largecopy number of variations have been reported in plants asa response to stress [46] Segregation of SNPs from existingparental heterozygous loci is expected in progeny howeverall samples examined here were remarkably consistent inSNP numbers location distributions and types within theirexperimental groups further demonstrating the genomicstability of these events This further solidifies the possibilitythat the detected variations in all transgenic lines arose froman early parental event possibly occurring during the originalAgrobacterium transformation or planting event and havebeen stably integrated and carried forward to the generationsexamined Considering the consistency of the SNP patternsthese polymorphisms have likely originated at the genomiclevel and are not likely to appear in such large quantities dueto transcriptional proofreading errors

Detected alterations occurred in a ldquobowtierdquo-shapeddistribution across the 20 chromosomes with lowerinstances of SNPs near the centromeres and the highestincidence of SNPs near the euchromatic telomeric ends of thechromosomal arms This is where gene density is the highestwhich has been illustrated previously by several worksconducted on soybean polymorphism rates [36 38 47] OursnpEff dataset summaries available in the Supporting Dataof this manuscript show the same distribution pattern withno significant alterations of SNP rates between biologicalreplicates of progeny Without complete characterization ofthe targeted SNPs in this work future directions can invokefunctional characterizations to cross-reference detectedpolymorphisms with possible connections to differentiallyexpressed genes and also to improve existing annotatedWilliams 82 cultivar SNPs In order to fully deduce the originof SNP variations with high confidence multiple generationanalyses will need to be conducted Approximately 70 ofthe genes that were differentially expressed also containedSNPs or INDELS however due to the extreme disparityin size between each gene list (a maximum of sim3000 andsim40000 genes resp) this could simply be due to chanceand can not definitively hold concrete biological significance

10 International Journal of Plant Genomics

without further investigation This suggests that the SNPsexamined here may directly impact native gene expressionlevels possibly due to their locations within gene regulatoryregions or enhancer elements located near the telomericends of the chromosomal arms

Gene ontology terms from SNP gene lists of all experi-mental groups were very similar returning many enrichedterms regarding protein transport and localization as wellas RNA and transcriptional processes even in the wild typegroup (Figure 3 and Supplemental Figure 4) Interestinglythese relationships are highly similar to the enriched GOterms derived from the differentially expressed gene sets ofthe transgenic lines indicating the majority of functionalrelationships between these polymorphisms are likely arisingfrom intercultivar variations and are not specifically relatedto transgenesis effects However given the possibility thatmutations from transgenesis could occur within QTLs thatcontrol traits such as plant size flower count and seedquantity identification of novel polymorphisms in theseregions of soybeanrsquos transcriptome is of great importanceto both plant breeders and in biotechnology applicationsIn fact recent investigations have revealed an excess of9 million unique SNPs that occur between 106 differentcultivated soybean varieties [48] Here we have providedlists of exonic SNPs unique to each of the three transgenicgroups and specific to transformation as well as novel SNPsdetected in our Williams 82 wild type controls Furtherinvestigations of the events described will identify the sitesof T-DNA integration and characterize the surroundingsequences to determine whether increased incidences ofpolymorphisms occur within the immediate vicinities of theinsertions Unique SNPs identified and reported here may beused to aide these investigations of crucial trait loci in thefuture to maximize yields and assist in breeding programs oftransgenic specimens

5 Conclusions

This study describes the first direct comparison of poly-morphism rates present in the seed transcriptome of threeindependent transgenic lines of Glycine max with a non-transgenic wild type line all derived from the Williams 82soybean cultivar Previously we examined gene expressionchanges in these transgenic events finding only the 764line to be significantly altered when compared to the othertransgenic lines In the same manner we have expanded ourprevious work and reported transcriptome single nucleotidepolymorphism rates for the ST77 ST111 and 764 trans-genic events compared to wild type Mirroring the resultsfrom our differential expression study the samples of the764 transformation event exhibited transcript SNP ratesnearly double that of the wild type ST77 and ST111 eventswhile also demonstrating higher amounts of transversionmutations While the exact cause for both the large num-ber of differentially expressed genes and polymorphismspresent in the 764 event is not known it is highly likelythat the two phenomena are correlated Our results showthat in addition to differential expression transgenesis mayresult in unpredicted base changes within gene transcripts

and regulatory regions without impacting plant pheno-type

Availability of Supporting Data

The RNA sequencing data described herein may be accessedat httpwwwncbinlmnihgovgeoqueryacccgiacc=GS-E64620 through NCBIrsquos Gene Expression Omnibus underaccession number GSE64620 Summary files for bcftoolsand snpEff outputs encompassing variant calls and predictedvariant effects are available from the iPlant CollaborativeDiscovery Environment directory available at httpdeiplan-tcollaborativeorgdld44044BEE-69B8-4129-A6B5-4C9291-B6B4A1snpeffsummarieszip and httpdeiplantcollabora-tiveorgdldAD464040-9169-4E52-87D5-DEF7466A2C1Evariantsummarieszip Lists of genes containing detectedvariants from all samples are available from iPlant in acompressed zip archive available at httpdeiplantcollabor-ativeorgdld533570A3-1EFB-4864-B9A9-9D82F17E09A8snpeffgeneszip

Competing Interests

Theauthors of this manuscript declare no conflict of interests

Authorsrsquo Contributions

Kevin C Lambirth designed the experiment conducted theSNP detection and effect prediction and generated figuresAdamMWhaley compiled and installed software formattedsequencing data files and was consulted on analysis KevinC Lambirth Kenneth J Piller and Kenneth L Bost wrote themanuscript with input from all authors

Acknowledgments

The authors would like to acknowledge the sequencingefforts of Dr Mike Wang (David H Murdock ResearchInstitute) and Dr Laura Reichenberg (Pfeiffer University) forassistance with plant propagation and selection and Dr KanWang (Iowa State University Agronomy Department) for theestablishment of the ST111 transgenic soybean line

References

[1] A Ziemienowicz ldquoAgrobacterium-mediated plant transforma-tion factors applications and recent advancesrdquoBiocatalysis andAgricultural Biotechnology vol 3 no 4 pp 95ndash102 2014

[2] M S Koch J M Ward S L Levine J A Baum J L Viciniand B G Hammond ldquoThe food and environmental safety of Btcropsrdquo Frontiers in Plant Science vol 6 article 283 2015

[3] G M Dill C A CaJacob and S R Padgette ldquoGlyphosate-resistant crops adoption use and future considerationsrdquo PestManagement Science vol 64 no 4 pp 326ndash331 2008

[4] M A S Valente J A Q A Faria J R L Soares-Ramos et alldquoThe ER luminal binding protein (BiP) mediates an increase indrought tolerance in soybean and delays drought-induced leafsenescence in soybean and tobaccordquo Journal of ExperimentalBotany vol 60 no 2 pp 533ndash546 2009

[5] L C Hudson B S Seabolt J Odle K L Bost C H Stahl andK J Piller ldquoSublethal staphylococcal enterotoxin B challenge

International Journal of Plant Genomics 11

model in pigs to evaluate protection following immunizationwith a soybean-derived vaccinerdquoClinical andVaccine Immunol-ogy vol 20 no 1 pp 24ndash32 2013

[6] Y Zhang D Li X Jin and Z Huang ldquoFighting Ebola withZMapp spotlight on plant-made antibodyrdquo Science China LifeSciences vol 57 no 10 pp 987ndash988 2014

[7] R Powell L C Hudson K C Lambirth et al ldquoRecombinantexpression of homodimeric 660 kDa human thyroglobulin insoybean seeds an alternative source of human thyroglobulinrdquoPlant Cell Reports vol 30 no 7 pp 1327ndash1338 2011

[8] K L Bost K C Lambirth L C Hudson and K J PillerldquoSoybean-derived thyroglobulin as an analyte specific reagentfor in vitro diagnostic tests and devicesrdquo Advances in Medicineand Biology vol 80 pp 23ndash40 2014

[9] R Abiri A Valdiani M Maziah et al ldquoA critical review ofthe concept of transgenic plants insights into pharmaceuticalbiotechnology andmolecular farmingrdquo Current Issues in Molec-ular Biology vol 18 no 1 pp 21ndash42 2016

[10] J L Oakes K L Bost and K J Piller ldquoStability of a soybeanseed-derived vaccine antigen following long-term storage pro-cessing and transport in the absence of a cold chainrdquo Journal ofthe Science of Food andAgriculture vol 89 no 13 pp 2191ndash21992009

[11] R Garg M Tolbert J L Oakes T E Clemente K L Bostand K J Piller ldquoChloroplast targeting of FanC the majorantigenic subunit of Escherichia coli K99 fimbriae in transgenicsoybeanrdquo Plant Cell Reports vol 26 no 7 pp 1011ndash1023 2007

[12] K J Piller T E Clemente S M Jun et al ldquoExpression andimmunogenicity of an Escherichia coli K99 fimbriae subunitantigen in soybeanrdquo Planta vol 222 no 1 pp 6ndash18 2005

[13] L C Hudson R Garg K L Bost and K J Piller ldquoSoybeanseeds a practical host for the production of functional subunitvaccinesrdquo BioMed Research International vol 2014 Article ID340804 13 pages 2014

[14] K L Bost and K J Piller ldquoProtein expression systems whysoybean seedsrdquo in SoybeanmdashMolecular Aspects of Breeding ASudaric Ed Intech Open 2011

[15] D Tilman K G Cassman P A Matson R Naylor and SPolasky ldquoAgricultural sustainability and intensive productionpracticesrdquo Nature vol 418 no 6898 pp 671ndash677 2002

[16] G N De La FuenteU K Frei and T Lubberstedt ldquoAcceleratingplant breedingrdquo Trends in Plant Science vol 18 no 12 pp 667ndash672 2013

[17] L C Hudson K C Lambirth K L Bost and K L PillerAdvancements in Transgenic Soy From Field to Bedside InTech2013

[18] J D Ray T C Kilen C A Abel and R L Paris ldquoSoybeannatural cross-pollination rates under field conditionsrdquo Environ-mental Biosafety Research vol 2 no 2 pp 133ndash138 2003

[19] K L Hart S L Kimura V Mushailov Z M BudimlijaM Prinz and E Wurmbach ldquoImproved eye- and skin-colorprediction based on 8 SNPsrdquo Croatian Medical Journal vol 54no 3 pp 248ndash256 2013

[20] S E Medland T Zayats B Glaser et al ldquoA variant inLIN28B is associatedwith 2D 4D finger-length ratio a putativeretrospective biomarker of prenatal testosterone exposurerdquoTheAmerican Journal of HumanGenetics vol 86 no 4 pp 519ndash5252010

[21] B S Shastry ldquoSNP alleles in human disease and evolutionrdquoJournal of Human Genetics vol 47 no 11 pp 561ndash566 2002

[22] Y-H Li S-C Zhao J-X Ma et al ldquoMolecular footprints ofdomestication and improvement in soybean revealed by wholegenome re-sequencingrdquo BMC Genomics vol 14 article 5792013

[23] T Joshi B Valliyodan J-H Wu S-H Lee D Xu and H TNguyen ldquoGenomic differences between cultivated soybean Gmax and its wild relative G sojardquo BMCGenomics vol 14 articleS5 2013

[24] K C Lambirth AMWhaley I C Blakley et al ldquoA comparisonof transgenic andwild type soybean seeds analysis of transcrip-tome profiles using RNA-Seqrdquo BMC Biotechnology vol 15 no1 article 89 2015

[25] H Li ldquoA statistical framework for SNP callingmutation discov-ery association mapping and population genetical parameterestimation from sequencing datardquo Bioinformatics vol 27 no 21pp 2987ndash2993 2011

[26] H Li B Handsaker A Wysoker et al ldquoThe sequence align-mentMap format and SAMtoolsrdquo Bioinformatics vol 25 no16 pp 2078ndash2079 2009

[27] D M Goodstein S Shu R Howson et al ldquoPhytozome acomparative platform for green plant genomicsrdquo Nucleic AcidsResearch vol 40 no 1 pp D1178ndashD1186 2012

[28] C Trapnell A Roberts L Goff et al ldquoDifferential gene andtranscript expression analysis of RNA-seq experiments withTopHat and Cufflinksrdquo Nature Protocols vol 7 no 3 pp 562ndash578 2012

[29] PDanecek A AutonG Abecasis et al ldquoThe variant call formatand VCFtoolsrdquo Bioinformatics vol 27 no 15 pp 2156ndash21582011

[30] M Krzywinski J Schein I Birol et al ldquoCircos an informationaesthetic for comparative genomicsrdquo Genome Research vol 19no 9 pp 1639ndash1645 2009

[31] P Cingolani A Platts L L Wang et al ldquoA program forannotating and predicting the effects of single nucleotidepolymorphisms SnpEff SNPs in the genome of Drosophilamelanogaster strain w 1118 iso-2 iso-3rdquo Fly vol 6 no 2 pp80ndash92 2012

[32] Z Du X Zhou Y Ling Z Zhang and Z Su ldquoagriGO A GOanalysis toolkit for the agricultural communityrdquo Nucleic AcidsResearch vol 38 no 2 Article ID gkq310 pp W64ndashW70 2010

[33] S AGoffM Vaughn SMcKay et al ldquoThe iPlant collaborativecyberinfrastructure for plant biologyrdquo Frontiers in Plant Sciencevol 2 article 34 2011

[34] Y L Zhu Q J Song D L Hyten et al ldquoSingle-nucleotidepolymorphisms in soybeanrdquo Genetics vol 163 no 3 pp 1123ndash1134 2003

[35] Y Shu Y Li Z Zhu et al ldquoSNPs discovery and CAPS markerconversion in soybeanrdquo Molecular Biology Reports vol 38 no3 pp 1841ndash1846 2011

[36] C B Yadav P Bhareti M Muthamilarasan et al ldquoGenome-wide SNP identification and characterization in two soybeancultivars with contrasting mungbean yellow mosaic india virusdisease resistance traitsrdquo PLoS ONE vol 10 no 4 Article IDe0123897 2015

[37] X Wu C Ren T Joshi T Vuong D Xu and H T NguyenldquoSNP discovery by high-throughput sequencing in soybeanrdquoBMC Genomics vol 11 article 469 2010

[38] Y-H Li J C Reif S A Jackson Y-S Ma R-Z Chang and L-J Qiu ldquoDetecting SNPs underlying domestication-related traitsin soybeanrdquo BMC Plant Biology vol 14 article 251 2014

12 International Journal of Plant Genomics

[39] L C Hudson K L Bost and K J Piller ldquoOptimizing recom-binant protein expression in soybeanrdquo in SoybeanmdashMolecularAspects of Breeding InTech Rijeka Croatia 2011

[40] W Goettel E Xia R Upchurch M-L Wang P Chen andY-Q C An ldquoIdentification and characterization of transcriptpolymorphisms in soybean lines varying in oil composition andcontentrdquo BMC Genomics vol 15 article 299 2014

[41] C Chan X Qi M-W Li F-L Wong and H-M Lam ldquoRecentdevelopments of genomic research in soybeanrdquo Journal ofGenetics and Genomics vol 39 no 7 pp 317ndash324 2012

[42] A A Komar ldquoSNPs silent but not invisiblerdquo Science vol 315no 5811 pp 466ndash467 2007

[43] P Katsonis A Koire S J Wilson et al ldquoSingle nucleotidevariations biological impact and theoretical interpretationrdquoProtein Science vol 23 no 12 pp 1650ndash1666 2014

[44] W J Haun D L Hyten W W Xu et al ldquoThe composition andorigins of genomic variation among individuals of the soybeanreference cultivar williams 82rdquo Plant Physiology vol 155 no 2pp 645ndash655 2011

[45] S DeBolt ldquoCopy number variation shapes genome diversityin Arabidopsis over immediate family generational scalesrdquoGenome Biology and Evolution vol 2 no 1 pp 441ndash453 2010

[46] A Zmienko A Samelak P Kozłowski and M FiglerowiczldquoCopy number polymorphism in plant genomesrdquo Theoreticaland Applied Genetics vol 127 no 1 pp 1ndash18 2014

[47] Y-G Lee N Jeong J H Kim et al ldquoDevelopment validationand genetic analysis of a large soybean SNP genotyping arrayrdquoPlant Journal vol 81 no 4 pp 625ndash636 2015

[48] B Valliyodan Q Dan G Patil et al ldquoLandscape of genomicdiversity and trait discovery in soybeanrdquo Scientific Reports vol6 article 23598 2016

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 2: Research Article Transcript Polymorphism Rates in …downloads.hindawi.com/journals/ijpg/2016/1562041.pdfResearch Article Transcript Polymorphism Rates in Soybean Seed Tissue Are Increased

2 International Journal of Plant Genomics

not self-pollinating Segregation of alleles does not alwaysproduce desired functionality in overall phenotype and dueto individual variability it may come at the cost of overallfitness [16]

Glycine max or the modern cultivated soybean is alegume branched from the wild species Glycine soja and isclassified as a diploidized tetraploid (2119899 = 40) Soybean has arelatively lengthy growth cycle (sim6 months) and is primarilycultivated for its seed which is high in both protein and oilcontent [17] It is also primarily a self-pollinating dicot asthe anthers and stigma mature together in the same flower ofmost cultivars preventing cross-pollination with other plants[18] This makes soybean desirable for several reasons Thefirst one is the simplicity of inheritance selections whenbreeding and the other is the high protein content (sim40by weight) of the soybean seed itself which through millionsof years of evolution has been designed to maintain thedurability and stability of its internal cargo until optimumconditions are met for germination Soluble protein extrac-tion from soy seed is also straightforward and has provento be efficient for the generation and purification of vaccineantigens therapeutic proteins and diagnostic peptides [7ndash9] This makes soybean an ideal candidate platform as abioreactor for the generation accumulation and long-termstorage of plant-based biologics [14] removing the trade-offdichotomyof either high yield (leaf tissue expression) or long-term stability (seed tissue)

Single nucleotide polymorphisms (SNPs) and inser-tionsdeletions (INDELS) are major driving forces behindspecies evolution and diversity either changing a singlenucleotide base in the former or shifting the transcriptionalreading frame in the latter The end results depending onthe position of the base changes may alter protein prod-ucts controlling certain traits SNPs may result in benignvariations in humans such as eye color skin color [19] andfinger length changes [20] or alternatively may be responsiblefor debilitating disorders such as lung cancer and multiplesclerosis [21] INDELS and nonsense polymorphisms canhave large effects on gene products disrupting transcriptionstart sites and essential cellular processes Soybean has expe-rienced a dramatic reduction in genomic variability followingthe development of worldwide cultivation practices of thewild ancestor Glycine soja all but eliminating rare alleliccombinations in domesticated varieties [22] Due to thislong-term domestication Glycine max shares only 9765genomic sequence similarity with Glycine soja and contains425 novel genes that are not present in the wild cultivar thatdirectly control seed development oil content and proteinconcentration [23]

Recently we investigated the pleiotropic effects on thetranscriptome of soybean seed tissues expressing and accu-mulating high levels of recombinant protein [24] Resultsrevealed no correlation between transgene or recombinantprotein expression level and the quantity of differentially reg-ulated genes although one of the transgenic lines containedover 3000 differentially expressed genes We concluded thatthe gene expression differences observed may have been dueto the specific properties of the recombinant proteins them-selves on the homeostatic environment of the seed or due to

random mutagenesis however molecular characterizationsof transcript structure and internal modifications remainedunknown Given the ability of single base mutations tocause severe effectual repercussions in conjunction with thehigh number of differentially expressed genes we previouslydetected in our transgenic soybean line we explored whethertranscript polymorphism rates were influenced in the samemanner

In order to elucidate possible transcript modifications asa result of transformation we utilized the publicly availableRNA-seq datasets generated by our previous transcriptomesequencing study [24] to assess exome INDELs and SNPrates in all of the transgenic samples investigated previouslyThis allowed us to examine gene variances exclusive to thetransgenic plants and to characterize the possible resultingeffects of these polymorphisms in conjunction with thedifferentially expressed genes we previously characterized Toour knowledge this is the first effort to characterize exomepolymorphisms in transgenic soybean seed tissues and willserve to further interpret alterations and their possible effectsresulting from transgenesis

2 Materials and Methods

In this study we utilized datasets obtained from whole-transcriptome sequencing of transgenic soybean seed tissuesexpressing three different transgenes to detect exomic poly-morphisms Each experimental group consisted of nine seedsfrom each of the three transgenic events (designated ST111ST77 and 764) as well as the wild type controls All transgenicplants were generated using Agrobacterium transformationand taken to homozygosity over multiple generations priorto sequence acquisition The transgenes and recombinantproteins expressed and accumulated in the seed tissues havebeen described previously by our lab [24] Quality controlfiltering and assembly were conducted as part of the previousstudy prior to polymorphism detection and annotation withsamtools and bcftools

21 Exon SNPINDEL Analysis Soybean cultivation RNAextraction and differential gene expression analyses werecarried out using RNA-seq as previously described in Lam-birth et al [24] In order to facilitate exome SNP callssamtools version 12 [25 26] was used to index the soybeanreference genome version 275 sequence file obtained fromPhytozome [27] which was amended to contain scaffoldsof all three T-DNA sequences using the -faidx commandto allow alignment of nonreference transgene transcriptsAlignment files previously generated by TopHat [28] forall previously reported samples [24] in bam format wereconverted to bcf files using the -mpileup command with-g and -f parameters to specify the output format andto use the indexed reference fasta file The bcftools callcommand was subsequently used on the indexed bcf fileswith the -c parameter to invoke the original consensus callingmethod enabling SNP and INDEL identification The statand plot-vcfstats commands were used to generate statisticalsummaries for each sample Total SNP counts for eachsample were averaged to calculate the standard deviation and

International Journal of Plant Genomics 3

standard error for each experimental group of wild type andtransgenic seeds and unpaired two-tailed 119905-tests were usedto compare each transgenic group with wild type Individualnucleotide base changes transition and transversion ratesINDELS and single and multiallele SNPs were also recordedand compared between groups To identify SNPs present ineach experimental group the intersections of each variantfile for all nine individual replicates in the ST77 ST111 764and WT groups were taken SNPs in these combined filesoverlapping between all three transgenic groups as well asthose unique to each event when compared to WT werepulled from the vcf files using the -vcf-isec command invcftools [29] SNPs shared between the transgenics and WTcontrol group were removed to generate vcf files containingSNPs found exclusively in each transformation event

Visualization of SNPs and INDELS was conducted usingCircos version 069-2 [30] Variance files in bcf format fromthe samtoolsSNPeff output were converted to vcf files withthe -bcftools view -o command then using the cut -f 126command base calls quality score metrics and chromo-somal locations were extracted Formatted tab delimitedfiles were converted to the input text file for Circos using anin-house script generated in Python (version 2711) whichfiltered the calls to exclude all polymorphisms with qualityscores below 15The script is included in the Additional Filesof this manuscript (in Supplementary Material availableonline at httpdxdoiorg10115520161562041) Plots werethen generated using the -circos -conf circusconf commandto display SNP distributions of the transgenic and wild typesamples for all 20 soybean genomic chromosomes SNPswere spaced in the plots every 20 bases and layered vertically

To predict any possible translational effects resultingfrom detected SNPs and INDELS snpEff version 41i [31]was utilized on the resulting variance call files generatedby bcftools A custom database for snpEff was constructedusing the -build command consisting of the soybean genomeFASTA reference file described previously containing our T-DNA sequences as well as the gene model gff3 file fromthe Cufflinks output from our previous RNA-seq data [24]obtained from PhytozomeThe gff3 file provided a referenceindex for gene positions and identifiers as well as intronexonmodels and untranslated regions No codon table configura-tion was necessary as Glycine max utilizes standard codontriplets allowing snpEff to run with the default parametersemploying the SNPINDEL call vcf file from bcftools asinput Multithreaded processing using the -t option was notused as this removes statistical calculations and the resultingreports from the output file Statistical comparisons of SNPrates and effects between each group were conducted usingtwo-tailed unpaired 119905-tests in Microsoft Excel Functionalannotation of genes containing variants was accomplished byloading the gene output list from snpEff into the agriculturalgene ontology (GO) enrichment tool AgriGO [32] using theintegrated single enrichment analysis tool with the Glycinemax Wm82a2v1 background reference Significant termswere calculated using Fisherrsquos exact test with a corrected falsediscovery rate 119901 value threshold of 005 All computationswere conducted on an Apple Macbook Pro with a 27GHzquad-core Intel i7 processor and 16GB of DDR3 RAM

3 Results

31 SNP Rates in Transgenic Soybean Exons Total SNPsdetected by samtools and bcftools across the wild typecontrol group reported a mean of 20707 SNPs per sample(Figure 1(a)) with a mean SNP rate of one polymorphismfor every 42561 nucleotides (Figure 1(b)) Of these 48 onaverage are single allele polymorphisms with 35 multiallelesites containing 23 multiallelic SNPs An overall transition (apurine base changed to another purine base or pyrimidinebase changed to another pyrimidine base) to transversion (apurine base to pyrimidine base or pyrimidine base to purinebase alteration) ratio (TsTv) of 162 (Figure 1(d)) and anaverage of 1862 INDELS were identified per sample in thewild type group (Figure 1(c)) Mean base changes detectedwere as follows 888 ArarrC 3458 ArarrG 1428 ArarrT 793CrarrA769 CrarrG 2977 CrarrT 2941 GrarrA 787 GrarrC 818 GrarrT 1495TrarrA 3433 TrarrC and 944 TrarrG (Figure 2(a))

The ST111 experimental group reported an average of20208 SNPs per individual sample (Figure 1(a)) with apolymorphism rate of one SNP for every 44323 nucleotides(Figure 1(b)) Similar to wild type 48 of the SNPs in theST111 line were single allele alterations with an average of 34multiallele sites and 25 multiallele SNPs The TsTv ratio wasagain very similar to wild type at 163 (Figure 1(d)) with anaverage of 1750 INDELS per sample (Figure 1(c)) Per basechanges reported for the ST111 group were 933 ArarrC 3457ArarrG 1382 ArarrT 747 CrarrA 710 CrarrG 2868 CrarrT 2830GrarrA 754 GrarrC 779 GrarrT 1420 TrarrA 3415 TrarrC and 968TrarrG (Figure 2(b))

The ST77 experimental group was also comparable towild type with 21666 average SNPs per sample (Figure 1(a))at a rate of 41225 nucleotides for every polymorphism(Figure 1(b)) Forty-nine percent of the detected SNPs werereported as singletons with 33 identified multiallele SNPsites and 28 multiallele SNPs The TsTv ratio for the ST77group was slightly lower than wild type at 156 (Figure 1(d))indicating a slightly higher transversion rate in the ST77Dand ST77F siblings INDELS were nearly identical in numberand size to the wild type control group with an average totalof 1829 per sample (Figure 1(c)) Nucleotide base changesinclude 1119 ArarrC 3650 ArarrG 1468 ArarrT 837 CrarrA 772CrarrG 3015 CrarrT 2952 GrarrA 796 GrarrC 842 GrarrT 1520TrarrA 3589 TrarrC and 1135 TrarrG (Figure 2(c))

Lastly the 764 experimental group contained more sub-stantial nucleotide alterations when compared to wild typeand the other two experimental groups (ST111 and ST77)revealing an average SNP count of 38188 per individual(Figure 1(a)) nearly double the quantity of any other groupSNPs were also encountered on average at nearly doublethe frequency of the other samples with one SNP recordedevery 24281 bases (Figure 1(b)) In this group only 27 ofSNPs were reported as singletons with 78 multiallele sitesand 44 multiallele SNPs The TsTv ratio was the lowestof all four groups at 153 (Figure 1(d)) indicating the 764group had the highest overall rate of nucleotide transversionsINDELS also increased to 2390with a larger deviation spreadbetween samples (Figure 1(c)) Base changes included 1972ArarrC 6094 ArarrG 2596 ArarrT 1603 CrarrA 1386 CrarrG

4 International Journal of Plant Genomics

Total SNPs

nsns

000500000

100000015000002000000250000030000003500000400000045000005000000

Wild typeST111

ST77764

p = 4e minus 5lowastlowastlowast

(a)

Bases per SNPns

ns

000500000

100000015000002000000250000030000003500000400000045000005000000

Wild typeST111

ST77764

p = 2e minus 7lowastlowastlowast

(b)

INDELS

ns ns

000

50000

100000

150000

200000

250000

300000

Wild typeST111

ST77764

p = 00181lowast

(c)

Transitiontransversion rations

ns

144146148150152154156158160162164166

Wild typeST111

ST77764

p = 2e minus 6

lowastlowastlowast

(d)

Figure 1 Summary of SNP and INDEL counts Means of total polymorphisms (a) polymorphism rates (b) insertionsdeletions (c) andtransition to transversion ratios (d) were calculated all four experimental groups Standard error bars aswell as statistical results fromunpaired119905-tests between each group and wild type are shown Groups marked with an asterisk indicate significance (119901 lt 005 = lowast 119901 lt 0001 = lowastlowastlowast)Wild type is shown in blue ST111 in red ST77 in green and 764 in purple

5445 CrarrT 5461 GrarrA 1417 GrarrC 1639 GrarrT 2576 TrarrA6104 TrarrC and 1940 TrarrG (Figure 2(d))

Polymorphic sites unique to each transgenic event wereidentified by comparing each transgenic variant list to bothWT and the remaining two transgenic groups The ST77transgenic group shared 2196 common gene variants withWT and exhibited 981 unique polymorphisms The ST111transgenic line revealed similar values sharing 1983 genevariants with the controls while the 764 group shared 1912variants The ST111 and 764 lines each reported 927 and7717 unique gene variants respectively Transgenic linesalso revealed 127 gene variants shared between the threetransgenic groups

Overall SNP calls clearly illustrated that the 764 exper-imental group contained significantly increased levels ofpolymorphisms over the wild type controls and other trans-genic experimental groups while concurrently demonstrat-

ing an increased quantity of more acute transversion typemutations Average total SNPs INDELS SNP rates andtransitiontransversion ratios for each experimental groupare shown in Figure 1 and average individual nucleotidechanges for each group are shown in Figures 2(a)ndash2(d)

32 Classification of Predicted SNP Effects and ImpactsFollowing SNP and INDEL detection with samtools snpEffversion 41i [31] was used to evaluate potential alterations tothe exome resulting from these changes SnpEff annotatesvariants based on their genomic locations including intronsexons upstreamor downstream splice sites and untranslatedregions at 51015840 and 31015840 sequence ends Effects were groupedand sorted according to the variant type potential level ofeffect impact functional class type and region No multi-nucleotide polymorphisms were detected in any individualof all four experimental groups and modifications were

International Journal of Plant Genomics 5

000

100000

200000

300000

400000

500000

600000

700000

Wild type

(a)

(b)

(c)

(d)

(e)

(f)

(g)

(h)

Silent3892

Nonsense311

Missense5797

Other051

Downstream2253

Exon1980

Intergenic263

Intron2776

Splice siteregion199

Upstream1537

510431

UTR 5998400

UTR 3998400

000

100000

200000

300000

400000

500000

600000

700000

Transgenic

39

3

58

046

2332

1968

283

2675183

1578

521

414

Silent

Nonsense

Missense

Other

Downstream

Exon

Intergenic

Intron

Splice siteregion

Upstream

UTR 5998400

UTR 3998400

000

100000

200000

300000

400000

500000

600000

700000

041

2329

2137

291

2491213

1516

555427

3877

3095814

Silent

NonsenseMissense

Other

Downstream

Exon

Intergenic

Intron

Splice siteregion

Upstream

UTR 5998400

UTR 3998400

000

100000

200000

300000

400000

500000

600000

700000

4448

181

5371

003

2505

1994

377

2357125

1619

Silent

Nonsense

Missense

Other

Downstream

Exon

Intergenic

Intron

Splice siteregion

Upstream

UTR 5998400

UTR 3998400

409

584

ArarrC

ArarrG

ArarrT

CrarrA

CrarrG

CrarrT

GrarrA

GrarrC

GrarrT

TrarrA

TrarrC

TrarrG

Figure 2 Polymorphism nucleotide base changes and predicted effects Mean rates of change for each nucleotide base are shown for wildtype (a) ST111 (b) ST77 (c) and 764 (d) with bars indicating the standard error of the mean Regions containing detected polymorphismsand their functional classifications are shown for wild type (e) ST111 (f) ST77 (g) and 764 (h)

6 International Journal of Plant Genomics

limited to SNPs insertions and deletions of bases Acrossall experimental groups polymorphisms classified as loweffect comprise sim10 of the total while moderate effects andhigh effects average 12 and 2 respectively The largestcategory of impact classification is the genomic modifiercategory comprising 72ndash78 of detected polymorphisms inall groups and 60 of shared variants between all transgenics(see snpEff reports in Supporting Data) Missense mutationsare highly prevalent in all four groups averaging sim60in wild type ST111 and ST77 events with a missense tosilent mutation ratio of sim16 (Figures 2(e)ndash2(g)) The 764group exhibited a slightly lower missense percentage ofsim53 however it also contained a higher percentage ofsilent mutations at sim45 compared to the other three groups(sim37) generating a missensesilent mutation ratio of sim120(Figure 2(h)) Nonsense mutations comprised sim3 of thetotal reported SNP effects for the wild type ST111 and ST77groups and sim18 in the 764 group (Figures 2(e)ndash2(h))

All four experimental groups showed a high prevalenceof SNPs of common types namely resulting in down-stream gene variants (sim24) intron variants (sim25) mis-sense variants (sim11) synonymous variants (sim7) andupstream gene variants (sim16)The most common locationsfor these variants are intron regions (sim25) the down-stream transcription start site (sim23) exons (sim23) introns(sim23) and the upstream transcription start site (sim16)The 764 group distribution of polymorphism types andaffected regions seem to mirror those of the wild typeST111 and ST77 groups with a slight reduction of SNPs inexonic regions All individuals also reported variants within51015840 and 31015840 untranslated regions (sim4 and sim6 resp) aswell as minimal detection of frameshifts stop and startvariants (lt1) Regions and predicted effects of detectedpolymorphisms are shown in Figures 2(e)ndash2(h)

Transition base changes of AharrG and CharrT were mostprevalent in all four experimental groups while the transver-sion AharrT was sim50 more common than AharrC CharrG orGharrT transversions Codon changes varied most commonlyat the wobble base position typically resulting from a tran-sition in both wild type and transgenic samples FollowingSNP patterns previously described the prevailing modifiedcodons appeared commonly shared between all individualsmeasured Base changes in the first and second positionsof the codon were much less frequent across all groupsHeat maps of the codon base changes representative for eachexperimental group are shown in Supplemental Figure 1Predicted amino acid changes resulting from codon alter-ations predominantly consist of synonymous substitutionshowever other frequently altered residues include valineto alanine alanine to valine leucine to phenylalanine andproline glutamate to glycine and lysine serine to proline andphenylalanine to leucine Supplemental Figure 2 shows thedistribution heat maps of detected amino acid substitutionaverages representative for all events

Minor changes were also detected within the additionalgenomic scaffolds containing the transgene sequences sup-plemented to the alignment reference file Specifically snpEffreported SNPs in the transgenes of samples 764B3 764H3and 764K1 at position 80 ST77D2 at position 1400 ST77F1

at positions 500 and 700 ST77J3 at position 600 ST111B3 atposition 800 ST111I3 at position 190 and ST111K1 ST111B1and ST111K2 at position 1010 Read alignments were unableto verify all called variants in the transgenes likely due toareas of low coverage however one area was corroborated asa consistent variant in the fillerDNAof all constructs betweenthe right border and promoter region (Supplemental Figure3) which was included as a control Supplemental Table 1summarizes all SNP calls and between group statistical testsSummaries of all SNPINDEL calls from bcftools and theirrespective effects from snpEff along with chromosomal dis-tribution plots additional statistics and lists of SNPs sharedbetween groups and within each group are available from theiPlant collaborative Discovery Environment [33] Links to therespective datasets are provided in the SupportingData of thismanuscript

33 Gene Ontology Analysis of Genes Containing SNPs Giventhe extent of detected polymorphic sites and the possibilitiesof effectual contributions to gene expression changes previ-ously documented in the 764 transgenic line it was necessaryto investigate genes containing SNPs and INDELS withgene ontology to identify the possible functional pathwayscontrolled by the affected transcripts Complete gene listscontaining SNPs provided by the output of snpEff for thewild type (36959 transcripts) ST77 (38691 transcripts)ST111 (34142 transcripts) and 764 (43426 transcripts) lineswere loaded into the AgriGO online gene ontology analysistool and subjected to a single enrichment analysis withmultiple corrections to deduce the functional roles of poly-morphic genes Out of the total Glycine maxV21 backgroundset of 29501 GO terms wild type matched 11245 ST77matched 11660 ST111 matched 10326 and 764 matched13321 GO categories were similar between the wild typeST77 and ST111 groups with minor variations with themajority of terms putatively annotated to translational orRNA processes (Figures 3(a) 3(b) and 3(d)) Intracellulartransport also constituted a large portion of the identifiedcategories (sim12 of total annotations) and in the case ofthe 764 transgenic line was the only other significant GOterm following RNA processing and binding (Figure 3(c))Three categories including ATP-dependent helicase activityubiquitin-dependent protein catabolic processes and ncRNAmetabolic processes were all unique to the ST77 and ST111lines (Figures 3(b) and 3(d)) GTP binding was exclusive tothe ST111 line comprising 21 of the total significant GOterms for the event Figure 4 shows all detected SNPs in achromosomal density map for all four experimental groups

Annotated GO terms for SNPs exclusive to each trans-genic event displayed similar functions with ST77 reporting295 gene variants involved with intracellular organellesphotosystem II cytoskeletal components and protein com-plexes ST111 displayed similar numbers of terms totaling 273annotations including photosynthesis and photosystem IIintracellular organelles ribonucleoprotein complexes thy-lakoids protein complexes and oxidoreductase activity The764 line exhibited the highest number of unique annotatedvariants with 2366 involving translation gene expressionbiosynthetic processes ribosomal structures and protein

International Journal of Plant Genomics 7

Wild typeaminoacylation

tRNA

for proteintranslation

5RNA binding

23

Vesicle-mediatedtransport

13Intracellularprotein transport

12

mRNA metabolic process

4

RNA processing13

Translation30

(a)

ST77ATP-dependenthelicase activity

10

RNA binding19

Ribonucleoproteincomplex biogenesis

4

Intracellularprotein transport

11

ncRNAmetabolic process

10

RNA processing12

Translation27

Ubiquitin-dependentprotein catabolic

process7

(b)

764

Intracellulartransport

25

RNA processing27

RNA binding48

(c)

ST111 ATP-dependenthelicase activity

13Intracellular

proteintransport

13

Ubiquitin-dependent

protein catabolicprocess

10

ncRNA metabolicprocess

13

mRNA processing4

RNA binding26

GTP binding21

(d)

Figure 3 AgriGO enriched gene ontology categories of SNP-containing genes Single enrichment analyses for all genes containing SNPs areshown for wild type (a) ST77 (b) 764 (c) and ST111 (d) Percentages are reflective of how many terms matched the indicated GO group outof the total matches to the background reference

complexes Of the 127 polymorphic genes shared by alltransgenic specimens 40 annotations returned 10 GO termspertaining to intracellular organelles macromolecular com-plexes and protein complexes Specific GO terms for eachgroup are shown in Figure 3 and complete AgriGO featurerelationship trees are shown in Supplemental Figure 4

4 Discussion

SNP rates have been previously evaluated in soybean usingmultiple fragment analysis on several genotypes of culti-vated varieties including Lincoln Mandarin Peking andRichland [34 35] From this fragment analysis transitionand transversion rates were reported to be nearly identicalbetween cultivars at 48 and 52 respectively Furthermorenucleotide diversity rates of cultivated soybean varietieswere estimated to be 5ndash8-fold less than the wild varietyGlycine soja and occur at an even lower frequency than thehighly characterizedArabidopsis thaliana self-crossingmodel[34] More recent investigations of SNPs and INDELS in

soybean using next generation resequencing have revealedgenome-wide polymorphisms in an effort to identify nativedisease resistant and favorable trait loci [36 37] Furtherstudies demonstrate the extensive narrowing of soybeangenomic variation due to selective pressures resulting fromdomestication practices which favor more valued traits suchas increased seed mass and oil content quantitative traitloci [38] In addition the self-pollinating nature of Glycinemax also contributes evolutionarily to the constriction ofgenomic variances in domesticated cultivars Thus withsuch selective pressures bottlenecking natural variations incultivated soybean varieties it is quite intriguing to seesuch consistently increased numbers of SNPs in only onetransgenic experimental group with no definitive initiatingevent

SNPs detected in our datasets were divulged from seedtranscriptome sequences which represent sim65 of the totalsoybean genome and 65 of total genomic protein codingsequences The SNPs described here are not considered tobe representative of the frequencies that may be detected in

8 International Journal of Plant Genomics

Wild type

chr1chr2

chr3

chr4chr5

chr6

chr7

chr8

chr9chr10chr11

chr12

chr13

chr14

chr15

chr16

chr17

chr18

chr19chr20

(a)

chr1chr2

chr3

chr4chr5

chr6

chr7

chr8

chr9chr10chr11

chr12

chr13

chr14

chr15

chr16

chr17

chr18

chr19chr20 1

chr2

chr3

chr4

ch6

chrhr7

chr8

chr9ch 1

chr12

chr13

chr14

chr17

chr18

chr19ch

ST77

(b)chr1

chr2

chr3

chr4chr5

chr6

chr7

chr8

chr9chr10chr11

chr12

chr13

chr14

chr15

chr16

chr17

chr18

chr19chr20 1

chr2

chr3

chr4

ch6

chr7

chr8

chr9ch 1

chr12

chr13

chr14

chr17

chr18

chr19ch

764

(c)

ST111

chr1chr2

chr3

chr4chr5

chr6

chr7

chr8

chr9chr10chr11

chr12

chr13

chr14

chr15

chr16

chr17

chr18

chr19chr20

(d)

Figure 4 Overview of the chromosomal distribution of detected transcript SNPs Aligned SNPs are shown for each soybean chromosomein the wild type controls (a) ST77 (b) 764 (c) and ST111 (d) transgenics The colored bars indicate each chromosome scaled to length witha bold outer hash mark representing every 10Mb of sequence and each nonbolded outer hash mark placed every 1Mb Each internal tickrepresents a detected SNP on which subsequent SNPs are stacked vertically from each transcript at that region

other soybean tissues or in different developmental stagesof these specific transgenic plants rather they demonstrateincreased transcript polymorphism rateswitnessed in a singlesoybean transformant out of several different experimentalgroups Although RNA-seq is focused on the functionalsegments of the genome and does not always capture regu-latory regions such as promoters or nontranscribed regions(eg methylated bases) one aspect of the resulting effectsof polymorphisms in these regions can be directly witnessedthrough examination of gene expression levels Because thesoybean expression systemdescribed previously by our lab [5

7 8 13 14 39] specifically targets seed tissue for recombinantprotein accumulation it was of great interest to identifypossible sequence alterations that may have resulted in geneexpression or protein structure variations in seeds which todate has not been investigated

Base changes for wild type ST77 and ST111 events wereall comparable while the 764 event consistently reportedSNP rates nearly double that of the other groups (1 SNPdetected every sim22000 bases) which is still well below thepreviously reported SNP rate of 1 SNP per sim1400 nucleotidesin soybean seeds [40] Nevertheless SNP base changes

International Journal of Plant Genomics 9

appeared to follow the same pattern of commonality withall groups demonstrating high percentages of transition basechanges with relatively low transversion counts The 764 linedid demonstrate a lower TsTv ratio than the other groups(Figure 1(d)) although progeny from two independent par-ents in the ST77 line (ST77D and ST77F) exhibited lowerratios as well indicating that this characteristic alone cannot reliably indicate internal stresses or overall divergencefrom our controls The majority of detected base changeswere located in the third base of the codon likely gener-ating synonymous (silent) polymorphisms which were themajority of predicted SNP classifications in all events Lesscommon changes occurred in the first or second bases of thecodon generating a nonsynonymous or missense mutationlikely resulting in an amino acid change (Figures 2(e)ndash2(h))Missense to silent ratios detected in all experimental groupsappear to be higher than previously published results forsoybean indicated [40] although the higher nonsynonymousto synonymous mutations may be due to high linkagedisequilibrium exhibited in soy [41]

Interestingly the 764 line had the lowest overall missenseto silent polymorphism ratio demonstrating that while the764 line contained the highest overall number of SNPs ahigher percentage of the total polymorphisms were silentmutations (sim45) compared to the other three experimentalgroups (Figure 2(h)) This reveals the possibility that SNPsand INDELS detected here had originated from the origi-nal transformation event and have been highly conservedthrough self-crossed generations of progeny particularlyconsidering how remarkably well conserved the attributesof detected SNPs were between different biological samplesof the same transformation event Without sequencing datafrom prior generations of each parental line tested herefor comparison this can not be confirmed with completeconfidence however this does pose an interesting inquirythat even silent SNPsmay have the ability to alter gene expres-sion and peptide structure Indeed it has been suggestedthat although synonymous polymorphisms code for identicalamino acids subtle changes resulting from the utilization ofnonoptimal synonymous codons can modulate downstreamtranscription and translation rates [42] thereby altering thefolding conformations and proposed function of the finalpeptides This may explain why the 764 transgenic eventexhibited the highest rates of differentially expressed genesin conjunction with polymorphisms but still maintained ahealthy phenotype

While many SNP studies have excluded synonymousmutations and those present in noncoding regions recentworks have examined these sequences in light of theirpossible epigenetic effects on transcription and translationhowever more advanced tools for the prediction of possibleeffects are still in development and require further testingfor reliable forecasts [43] Nonsynonymous polymorphismsappear at similar rates across the experimental groupsvarying between 50 and 60 of total functional SNP clas-sifications While nonsynonymous mutations nearly alwaysalter amino acid sequences that can potentially yield non-functional protein products this phenomenonmay produce aneutral effect on the organism or protein or activate signaling

cascades in redundant systems [43] The effects of thesedetected nonsynonymous polymorphisms in each transgenicevent are not known or identifiable without extensive pro-teomic investigations into the altered downstream transcriptproductsThe 764 line which previously displayed the largestdegree of differential gene expression exhibited the lowestpercentage of missense and nonsense polymorphisms there-fore these alone are not likely to be the root cause of observedchanges in this transgenic line

Spontaneous mutations may also occur as a result of apreviously occurring stressful event such as parental lineAgrobacterium transformation Epigenetic alterations andnucleotide transpositions have been detected up to fivegenerations forward from transformation in Arabidopsis [4445] however all seeds examined here were bred to orbeyond the 5th generation of progeny The causes for thesemutations are unknown however each transformant was anindependent transformation event and was likely exposedto a varying degree of stress during the process This couldpotentially have introduced genomic variations that havebeen carried forward to the transcriptome as even largecopy number of variations have been reported in plants asa response to stress [46] Segregation of SNPs from existingparental heterozygous loci is expected in progeny howeverall samples examined here were remarkably consistent inSNP numbers location distributions and types within theirexperimental groups further demonstrating the genomicstability of these events This further solidifies the possibilitythat the detected variations in all transgenic lines arose froman early parental event possibly occurring during the originalAgrobacterium transformation or planting event and havebeen stably integrated and carried forward to the generationsexamined Considering the consistency of the SNP patternsthese polymorphisms have likely originated at the genomiclevel and are not likely to appear in such large quantities dueto transcriptional proofreading errors

Detected alterations occurred in a ldquobowtierdquo-shapeddistribution across the 20 chromosomes with lowerinstances of SNPs near the centromeres and the highestincidence of SNPs near the euchromatic telomeric ends of thechromosomal arms This is where gene density is the highestwhich has been illustrated previously by several worksconducted on soybean polymorphism rates [36 38 47] OursnpEff dataset summaries available in the Supporting Dataof this manuscript show the same distribution pattern withno significant alterations of SNP rates between biologicalreplicates of progeny Without complete characterization ofthe targeted SNPs in this work future directions can invokefunctional characterizations to cross-reference detectedpolymorphisms with possible connections to differentiallyexpressed genes and also to improve existing annotatedWilliams 82 cultivar SNPs In order to fully deduce the originof SNP variations with high confidence multiple generationanalyses will need to be conducted Approximately 70 ofthe genes that were differentially expressed also containedSNPs or INDELS however due to the extreme disparityin size between each gene list (a maximum of sim3000 andsim40000 genes resp) this could simply be due to chanceand can not definitively hold concrete biological significance

10 International Journal of Plant Genomics

without further investigation This suggests that the SNPsexamined here may directly impact native gene expressionlevels possibly due to their locations within gene regulatoryregions or enhancer elements located near the telomericends of the chromosomal arms

Gene ontology terms from SNP gene lists of all experi-mental groups were very similar returning many enrichedterms regarding protein transport and localization as wellas RNA and transcriptional processes even in the wild typegroup (Figure 3 and Supplemental Figure 4) Interestinglythese relationships are highly similar to the enriched GOterms derived from the differentially expressed gene sets ofthe transgenic lines indicating the majority of functionalrelationships between these polymorphisms are likely arisingfrom intercultivar variations and are not specifically relatedto transgenesis effects However given the possibility thatmutations from transgenesis could occur within QTLs thatcontrol traits such as plant size flower count and seedquantity identification of novel polymorphisms in theseregions of soybeanrsquos transcriptome is of great importanceto both plant breeders and in biotechnology applicationsIn fact recent investigations have revealed an excess of9 million unique SNPs that occur between 106 differentcultivated soybean varieties [48] Here we have providedlists of exonic SNPs unique to each of the three transgenicgroups and specific to transformation as well as novel SNPsdetected in our Williams 82 wild type controls Furtherinvestigations of the events described will identify the sitesof T-DNA integration and characterize the surroundingsequences to determine whether increased incidences ofpolymorphisms occur within the immediate vicinities of theinsertions Unique SNPs identified and reported here may beused to aide these investigations of crucial trait loci in thefuture to maximize yields and assist in breeding programs oftransgenic specimens

5 Conclusions

This study describes the first direct comparison of poly-morphism rates present in the seed transcriptome of threeindependent transgenic lines of Glycine max with a non-transgenic wild type line all derived from the Williams 82soybean cultivar Previously we examined gene expressionchanges in these transgenic events finding only the 764line to be significantly altered when compared to the othertransgenic lines In the same manner we have expanded ourprevious work and reported transcriptome single nucleotidepolymorphism rates for the ST77 ST111 and 764 trans-genic events compared to wild type Mirroring the resultsfrom our differential expression study the samples of the764 transformation event exhibited transcript SNP ratesnearly double that of the wild type ST77 and ST111 eventswhile also demonstrating higher amounts of transversionmutations While the exact cause for both the large num-ber of differentially expressed genes and polymorphismspresent in the 764 event is not known it is highly likelythat the two phenomena are correlated Our results showthat in addition to differential expression transgenesis mayresult in unpredicted base changes within gene transcripts

and regulatory regions without impacting plant pheno-type

Availability of Supporting Data

The RNA sequencing data described herein may be accessedat httpwwwncbinlmnihgovgeoqueryacccgiacc=GS-E64620 through NCBIrsquos Gene Expression Omnibus underaccession number GSE64620 Summary files for bcftoolsand snpEff outputs encompassing variant calls and predictedvariant effects are available from the iPlant CollaborativeDiscovery Environment directory available at httpdeiplan-tcollaborativeorgdld44044BEE-69B8-4129-A6B5-4C9291-B6B4A1snpeffsummarieszip and httpdeiplantcollabora-tiveorgdldAD464040-9169-4E52-87D5-DEF7466A2C1Evariantsummarieszip Lists of genes containing detectedvariants from all samples are available from iPlant in acompressed zip archive available at httpdeiplantcollabor-ativeorgdld533570A3-1EFB-4864-B9A9-9D82F17E09A8snpeffgeneszip

Competing Interests

Theauthors of this manuscript declare no conflict of interests

Authorsrsquo Contributions

Kevin C Lambirth designed the experiment conducted theSNP detection and effect prediction and generated figuresAdamMWhaley compiled and installed software formattedsequencing data files and was consulted on analysis KevinC Lambirth Kenneth J Piller and Kenneth L Bost wrote themanuscript with input from all authors

Acknowledgments

The authors would like to acknowledge the sequencingefforts of Dr Mike Wang (David H Murdock ResearchInstitute) and Dr Laura Reichenberg (Pfeiffer University) forassistance with plant propagation and selection and Dr KanWang (Iowa State University Agronomy Department) for theestablishment of the ST111 transgenic soybean line

References

[1] A Ziemienowicz ldquoAgrobacterium-mediated plant transforma-tion factors applications and recent advancesrdquoBiocatalysis andAgricultural Biotechnology vol 3 no 4 pp 95ndash102 2014

[2] M S Koch J M Ward S L Levine J A Baum J L Viciniand B G Hammond ldquoThe food and environmental safety of Btcropsrdquo Frontiers in Plant Science vol 6 article 283 2015

[3] G M Dill C A CaJacob and S R Padgette ldquoGlyphosate-resistant crops adoption use and future considerationsrdquo PestManagement Science vol 64 no 4 pp 326ndash331 2008

[4] M A S Valente J A Q A Faria J R L Soares-Ramos et alldquoThe ER luminal binding protein (BiP) mediates an increase indrought tolerance in soybean and delays drought-induced leafsenescence in soybean and tobaccordquo Journal of ExperimentalBotany vol 60 no 2 pp 533ndash546 2009

[5] L C Hudson B S Seabolt J Odle K L Bost C H Stahl andK J Piller ldquoSublethal staphylococcal enterotoxin B challenge

International Journal of Plant Genomics 11

model in pigs to evaluate protection following immunizationwith a soybean-derived vaccinerdquoClinical andVaccine Immunol-ogy vol 20 no 1 pp 24ndash32 2013

[6] Y Zhang D Li X Jin and Z Huang ldquoFighting Ebola withZMapp spotlight on plant-made antibodyrdquo Science China LifeSciences vol 57 no 10 pp 987ndash988 2014

[7] R Powell L C Hudson K C Lambirth et al ldquoRecombinantexpression of homodimeric 660 kDa human thyroglobulin insoybean seeds an alternative source of human thyroglobulinrdquoPlant Cell Reports vol 30 no 7 pp 1327ndash1338 2011

[8] K L Bost K C Lambirth L C Hudson and K J PillerldquoSoybean-derived thyroglobulin as an analyte specific reagentfor in vitro diagnostic tests and devicesrdquo Advances in Medicineand Biology vol 80 pp 23ndash40 2014

[9] R Abiri A Valdiani M Maziah et al ldquoA critical review ofthe concept of transgenic plants insights into pharmaceuticalbiotechnology andmolecular farmingrdquo Current Issues in Molec-ular Biology vol 18 no 1 pp 21ndash42 2016

[10] J L Oakes K L Bost and K J Piller ldquoStability of a soybeanseed-derived vaccine antigen following long-term storage pro-cessing and transport in the absence of a cold chainrdquo Journal ofthe Science of Food andAgriculture vol 89 no 13 pp 2191ndash21992009

[11] R Garg M Tolbert J L Oakes T E Clemente K L Bostand K J Piller ldquoChloroplast targeting of FanC the majorantigenic subunit of Escherichia coli K99 fimbriae in transgenicsoybeanrdquo Plant Cell Reports vol 26 no 7 pp 1011ndash1023 2007

[12] K J Piller T E Clemente S M Jun et al ldquoExpression andimmunogenicity of an Escherichia coli K99 fimbriae subunitantigen in soybeanrdquo Planta vol 222 no 1 pp 6ndash18 2005

[13] L C Hudson R Garg K L Bost and K J Piller ldquoSoybeanseeds a practical host for the production of functional subunitvaccinesrdquo BioMed Research International vol 2014 Article ID340804 13 pages 2014

[14] K L Bost and K J Piller ldquoProtein expression systems whysoybean seedsrdquo in SoybeanmdashMolecular Aspects of Breeding ASudaric Ed Intech Open 2011

[15] D Tilman K G Cassman P A Matson R Naylor and SPolasky ldquoAgricultural sustainability and intensive productionpracticesrdquo Nature vol 418 no 6898 pp 671ndash677 2002

[16] G N De La FuenteU K Frei and T Lubberstedt ldquoAcceleratingplant breedingrdquo Trends in Plant Science vol 18 no 12 pp 667ndash672 2013

[17] L C Hudson K C Lambirth K L Bost and K L PillerAdvancements in Transgenic Soy From Field to Bedside InTech2013

[18] J D Ray T C Kilen C A Abel and R L Paris ldquoSoybeannatural cross-pollination rates under field conditionsrdquo Environ-mental Biosafety Research vol 2 no 2 pp 133ndash138 2003

[19] K L Hart S L Kimura V Mushailov Z M BudimlijaM Prinz and E Wurmbach ldquoImproved eye- and skin-colorprediction based on 8 SNPsrdquo Croatian Medical Journal vol 54no 3 pp 248ndash256 2013

[20] S E Medland T Zayats B Glaser et al ldquoA variant inLIN28B is associatedwith 2D 4D finger-length ratio a putativeretrospective biomarker of prenatal testosterone exposurerdquoTheAmerican Journal of HumanGenetics vol 86 no 4 pp 519ndash5252010

[21] B S Shastry ldquoSNP alleles in human disease and evolutionrdquoJournal of Human Genetics vol 47 no 11 pp 561ndash566 2002

[22] Y-H Li S-C Zhao J-X Ma et al ldquoMolecular footprints ofdomestication and improvement in soybean revealed by wholegenome re-sequencingrdquo BMC Genomics vol 14 article 5792013

[23] T Joshi B Valliyodan J-H Wu S-H Lee D Xu and H TNguyen ldquoGenomic differences between cultivated soybean Gmax and its wild relative G sojardquo BMCGenomics vol 14 articleS5 2013

[24] K C Lambirth AMWhaley I C Blakley et al ldquoA comparisonof transgenic andwild type soybean seeds analysis of transcrip-tome profiles using RNA-Seqrdquo BMC Biotechnology vol 15 no1 article 89 2015

[25] H Li ldquoA statistical framework for SNP callingmutation discov-ery association mapping and population genetical parameterestimation from sequencing datardquo Bioinformatics vol 27 no 21pp 2987ndash2993 2011

[26] H Li B Handsaker A Wysoker et al ldquoThe sequence align-mentMap format and SAMtoolsrdquo Bioinformatics vol 25 no16 pp 2078ndash2079 2009

[27] D M Goodstein S Shu R Howson et al ldquoPhytozome acomparative platform for green plant genomicsrdquo Nucleic AcidsResearch vol 40 no 1 pp D1178ndashD1186 2012

[28] C Trapnell A Roberts L Goff et al ldquoDifferential gene andtranscript expression analysis of RNA-seq experiments withTopHat and Cufflinksrdquo Nature Protocols vol 7 no 3 pp 562ndash578 2012

[29] PDanecek A AutonG Abecasis et al ldquoThe variant call formatand VCFtoolsrdquo Bioinformatics vol 27 no 15 pp 2156ndash21582011

[30] M Krzywinski J Schein I Birol et al ldquoCircos an informationaesthetic for comparative genomicsrdquo Genome Research vol 19no 9 pp 1639ndash1645 2009

[31] P Cingolani A Platts L L Wang et al ldquoA program forannotating and predicting the effects of single nucleotidepolymorphisms SnpEff SNPs in the genome of Drosophilamelanogaster strain w 1118 iso-2 iso-3rdquo Fly vol 6 no 2 pp80ndash92 2012

[32] Z Du X Zhou Y Ling Z Zhang and Z Su ldquoagriGO A GOanalysis toolkit for the agricultural communityrdquo Nucleic AcidsResearch vol 38 no 2 Article ID gkq310 pp W64ndashW70 2010

[33] S AGoffM Vaughn SMcKay et al ldquoThe iPlant collaborativecyberinfrastructure for plant biologyrdquo Frontiers in Plant Sciencevol 2 article 34 2011

[34] Y L Zhu Q J Song D L Hyten et al ldquoSingle-nucleotidepolymorphisms in soybeanrdquo Genetics vol 163 no 3 pp 1123ndash1134 2003

[35] Y Shu Y Li Z Zhu et al ldquoSNPs discovery and CAPS markerconversion in soybeanrdquo Molecular Biology Reports vol 38 no3 pp 1841ndash1846 2011

[36] C B Yadav P Bhareti M Muthamilarasan et al ldquoGenome-wide SNP identification and characterization in two soybeancultivars with contrasting mungbean yellow mosaic india virusdisease resistance traitsrdquo PLoS ONE vol 10 no 4 Article IDe0123897 2015

[37] X Wu C Ren T Joshi T Vuong D Xu and H T NguyenldquoSNP discovery by high-throughput sequencing in soybeanrdquoBMC Genomics vol 11 article 469 2010

[38] Y-H Li J C Reif S A Jackson Y-S Ma R-Z Chang and L-J Qiu ldquoDetecting SNPs underlying domestication-related traitsin soybeanrdquo BMC Plant Biology vol 14 article 251 2014

12 International Journal of Plant Genomics

[39] L C Hudson K L Bost and K J Piller ldquoOptimizing recom-binant protein expression in soybeanrdquo in SoybeanmdashMolecularAspects of Breeding InTech Rijeka Croatia 2011

[40] W Goettel E Xia R Upchurch M-L Wang P Chen andY-Q C An ldquoIdentification and characterization of transcriptpolymorphisms in soybean lines varying in oil composition andcontentrdquo BMC Genomics vol 15 article 299 2014

[41] C Chan X Qi M-W Li F-L Wong and H-M Lam ldquoRecentdevelopments of genomic research in soybeanrdquo Journal ofGenetics and Genomics vol 39 no 7 pp 317ndash324 2012

[42] A A Komar ldquoSNPs silent but not invisiblerdquo Science vol 315no 5811 pp 466ndash467 2007

[43] P Katsonis A Koire S J Wilson et al ldquoSingle nucleotidevariations biological impact and theoretical interpretationrdquoProtein Science vol 23 no 12 pp 1650ndash1666 2014

[44] W J Haun D L Hyten W W Xu et al ldquoThe composition andorigins of genomic variation among individuals of the soybeanreference cultivar williams 82rdquo Plant Physiology vol 155 no 2pp 645ndash655 2011

[45] S DeBolt ldquoCopy number variation shapes genome diversityin Arabidopsis over immediate family generational scalesrdquoGenome Biology and Evolution vol 2 no 1 pp 441ndash453 2010

[46] A Zmienko A Samelak P Kozłowski and M FiglerowiczldquoCopy number polymorphism in plant genomesrdquo Theoreticaland Applied Genetics vol 127 no 1 pp 1ndash18 2014

[47] Y-G Lee N Jeong J H Kim et al ldquoDevelopment validationand genetic analysis of a large soybean SNP genotyping arrayrdquoPlant Journal vol 81 no 4 pp 625ndash636 2015

[48] B Valliyodan Q Dan G Patil et al ldquoLandscape of genomicdiversity and trait discovery in soybeanrdquo Scientific Reports vol6 article 23598 2016

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 3: Research Article Transcript Polymorphism Rates in …downloads.hindawi.com/journals/ijpg/2016/1562041.pdfResearch Article Transcript Polymorphism Rates in Soybean Seed Tissue Are Increased

International Journal of Plant Genomics 3

standard error for each experimental group of wild type andtransgenic seeds and unpaired two-tailed 119905-tests were usedto compare each transgenic group with wild type Individualnucleotide base changes transition and transversion ratesINDELS and single and multiallele SNPs were also recordedand compared between groups To identify SNPs present ineach experimental group the intersections of each variantfile for all nine individual replicates in the ST77 ST111 764and WT groups were taken SNPs in these combined filesoverlapping between all three transgenic groups as well asthose unique to each event when compared to WT werepulled from the vcf files using the -vcf-isec command invcftools [29] SNPs shared between the transgenics and WTcontrol group were removed to generate vcf files containingSNPs found exclusively in each transformation event

Visualization of SNPs and INDELS was conducted usingCircos version 069-2 [30] Variance files in bcf format fromthe samtoolsSNPeff output were converted to vcf files withthe -bcftools view -o command then using the cut -f 126command base calls quality score metrics and chromo-somal locations were extracted Formatted tab delimitedfiles were converted to the input text file for Circos using anin-house script generated in Python (version 2711) whichfiltered the calls to exclude all polymorphisms with qualityscores below 15The script is included in the Additional Filesof this manuscript (in Supplementary Material availableonline at httpdxdoiorg10115520161562041) Plots werethen generated using the -circos -conf circusconf commandto display SNP distributions of the transgenic and wild typesamples for all 20 soybean genomic chromosomes SNPswere spaced in the plots every 20 bases and layered vertically

To predict any possible translational effects resultingfrom detected SNPs and INDELS snpEff version 41i [31]was utilized on the resulting variance call files generatedby bcftools A custom database for snpEff was constructedusing the -build command consisting of the soybean genomeFASTA reference file described previously containing our T-DNA sequences as well as the gene model gff3 file fromthe Cufflinks output from our previous RNA-seq data [24]obtained from PhytozomeThe gff3 file provided a referenceindex for gene positions and identifiers as well as intronexonmodels and untranslated regions No codon table configura-tion was necessary as Glycine max utilizes standard codontriplets allowing snpEff to run with the default parametersemploying the SNPINDEL call vcf file from bcftools asinput Multithreaded processing using the -t option was notused as this removes statistical calculations and the resultingreports from the output file Statistical comparisons of SNPrates and effects between each group were conducted usingtwo-tailed unpaired 119905-tests in Microsoft Excel Functionalannotation of genes containing variants was accomplished byloading the gene output list from snpEff into the agriculturalgene ontology (GO) enrichment tool AgriGO [32] using theintegrated single enrichment analysis tool with the Glycinemax Wm82a2v1 background reference Significant termswere calculated using Fisherrsquos exact test with a corrected falsediscovery rate 119901 value threshold of 005 All computationswere conducted on an Apple Macbook Pro with a 27GHzquad-core Intel i7 processor and 16GB of DDR3 RAM

3 Results

31 SNP Rates in Transgenic Soybean Exons Total SNPsdetected by samtools and bcftools across the wild typecontrol group reported a mean of 20707 SNPs per sample(Figure 1(a)) with a mean SNP rate of one polymorphismfor every 42561 nucleotides (Figure 1(b)) Of these 48 onaverage are single allele polymorphisms with 35 multiallelesites containing 23 multiallelic SNPs An overall transition (apurine base changed to another purine base or pyrimidinebase changed to another pyrimidine base) to transversion (apurine base to pyrimidine base or pyrimidine base to purinebase alteration) ratio (TsTv) of 162 (Figure 1(d)) and anaverage of 1862 INDELS were identified per sample in thewild type group (Figure 1(c)) Mean base changes detectedwere as follows 888 ArarrC 3458 ArarrG 1428 ArarrT 793CrarrA769 CrarrG 2977 CrarrT 2941 GrarrA 787 GrarrC 818 GrarrT 1495TrarrA 3433 TrarrC and 944 TrarrG (Figure 2(a))

The ST111 experimental group reported an average of20208 SNPs per individual sample (Figure 1(a)) with apolymorphism rate of one SNP for every 44323 nucleotides(Figure 1(b)) Similar to wild type 48 of the SNPs in theST111 line were single allele alterations with an average of 34multiallele sites and 25 multiallele SNPs The TsTv ratio wasagain very similar to wild type at 163 (Figure 1(d)) with anaverage of 1750 INDELS per sample (Figure 1(c)) Per basechanges reported for the ST111 group were 933 ArarrC 3457ArarrG 1382 ArarrT 747 CrarrA 710 CrarrG 2868 CrarrT 2830GrarrA 754 GrarrC 779 GrarrT 1420 TrarrA 3415 TrarrC and 968TrarrG (Figure 2(b))

The ST77 experimental group was also comparable towild type with 21666 average SNPs per sample (Figure 1(a))at a rate of 41225 nucleotides for every polymorphism(Figure 1(b)) Forty-nine percent of the detected SNPs werereported as singletons with 33 identified multiallele SNPsites and 28 multiallele SNPs The TsTv ratio for the ST77group was slightly lower than wild type at 156 (Figure 1(d))indicating a slightly higher transversion rate in the ST77Dand ST77F siblings INDELS were nearly identical in numberand size to the wild type control group with an average totalof 1829 per sample (Figure 1(c)) Nucleotide base changesinclude 1119 ArarrC 3650 ArarrG 1468 ArarrT 837 CrarrA 772CrarrG 3015 CrarrT 2952 GrarrA 796 GrarrC 842 GrarrT 1520TrarrA 3589 TrarrC and 1135 TrarrG (Figure 2(c))

Lastly the 764 experimental group contained more sub-stantial nucleotide alterations when compared to wild typeand the other two experimental groups (ST111 and ST77)revealing an average SNP count of 38188 per individual(Figure 1(a)) nearly double the quantity of any other groupSNPs were also encountered on average at nearly doublethe frequency of the other samples with one SNP recordedevery 24281 bases (Figure 1(b)) In this group only 27 ofSNPs were reported as singletons with 78 multiallele sitesand 44 multiallele SNPs The TsTv ratio was the lowestof all four groups at 153 (Figure 1(d)) indicating the 764group had the highest overall rate of nucleotide transversionsINDELS also increased to 2390with a larger deviation spreadbetween samples (Figure 1(c)) Base changes included 1972ArarrC 6094 ArarrG 2596 ArarrT 1603 CrarrA 1386 CrarrG

4 International Journal of Plant Genomics

Total SNPs

nsns

000500000

100000015000002000000250000030000003500000400000045000005000000

Wild typeST111

ST77764

p = 4e minus 5lowastlowastlowast

(a)

Bases per SNPns

ns

000500000

100000015000002000000250000030000003500000400000045000005000000

Wild typeST111

ST77764

p = 2e minus 7lowastlowastlowast

(b)

INDELS

ns ns

000

50000

100000

150000

200000

250000

300000

Wild typeST111

ST77764

p = 00181lowast

(c)

Transitiontransversion rations

ns

144146148150152154156158160162164166

Wild typeST111

ST77764

p = 2e minus 6

lowastlowastlowast

(d)

Figure 1 Summary of SNP and INDEL counts Means of total polymorphisms (a) polymorphism rates (b) insertionsdeletions (c) andtransition to transversion ratios (d) were calculated all four experimental groups Standard error bars aswell as statistical results fromunpaired119905-tests between each group and wild type are shown Groups marked with an asterisk indicate significance (119901 lt 005 = lowast 119901 lt 0001 = lowastlowastlowast)Wild type is shown in blue ST111 in red ST77 in green and 764 in purple

5445 CrarrT 5461 GrarrA 1417 GrarrC 1639 GrarrT 2576 TrarrA6104 TrarrC and 1940 TrarrG (Figure 2(d))

Polymorphic sites unique to each transgenic event wereidentified by comparing each transgenic variant list to bothWT and the remaining two transgenic groups The ST77transgenic group shared 2196 common gene variants withWT and exhibited 981 unique polymorphisms The ST111transgenic line revealed similar values sharing 1983 genevariants with the controls while the 764 group shared 1912variants The ST111 and 764 lines each reported 927 and7717 unique gene variants respectively Transgenic linesalso revealed 127 gene variants shared between the threetransgenic groups

Overall SNP calls clearly illustrated that the 764 exper-imental group contained significantly increased levels ofpolymorphisms over the wild type controls and other trans-genic experimental groups while concurrently demonstrat-

ing an increased quantity of more acute transversion typemutations Average total SNPs INDELS SNP rates andtransitiontransversion ratios for each experimental groupare shown in Figure 1 and average individual nucleotidechanges for each group are shown in Figures 2(a)ndash2(d)

32 Classification of Predicted SNP Effects and ImpactsFollowing SNP and INDEL detection with samtools snpEffversion 41i [31] was used to evaluate potential alterations tothe exome resulting from these changes SnpEff annotatesvariants based on their genomic locations including intronsexons upstreamor downstream splice sites and untranslatedregions at 51015840 and 31015840 sequence ends Effects were groupedand sorted according to the variant type potential level ofeffect impact functional class type and region No multi-nucleotide polymorphisms were detected in any individualof all four experimental groups and modifications were

International Journal of Plant Genomics 5

000

100000

200000

300000

400000

500000

600000

700000

Wild type

(a)

(b)

(c)

(d)

(e)

(f)

(g)

(h)

Silent3892

Nonsense311

Missense5797

Other051

Downstream2253

Exon1980

Intergenic263

Intron2776

Splice siteregion199

Upstream1537

510431

UTR 5998400

UTR 3998400

000

100000

200000

300000

400000

500000

600000

700000

Transgenic

39

3

58

046

2332

1968

283

2675183

1578

521

414

Silent

Nonsense

Missense

Other

Downstream

Exon

Intergenic

Intron

Splice siteregion

Upstream

UTR 5998400

UTR 3998400

000

100000

200000

300000

400000

500000

600000

700000

041

2329

2137

291

2491213

1516

555427

3877

3095814

Silent

NonsenseMissense

Other

Downstream

Exon

Intergenic

Intron

Splice siteregion

Upstream

UTR 5998400

UTR 3998400

000

100000

200000

300000

400000

500000

600000

700000

4448

181

5371

003

2505

1994

377

2357125

1619

Silent

Nonsense

Missense

Other

Downstream

Exon

Intergenic

Intron

Splice siteregion

Upstream

UTR 5998400

UTR 3998400

409

584

ArarrC

ArarrG

ArarrT

CrarrA

CrarrG

CrarrT

GrarrA

GrarrC

GrarrT

TrarrA

TrarrC

TrarrG

Figure 2 Polymorphism nucleotide base changes and predicted effects Mean rates of change for each nucleotide base are shown for wildtype (a) ST111 (b) ST77 (c) and 764 (d) with bars indicating the standard error of the mean Regions containing detected polymorphismsand their functional classifications are shown for wild type (e) ST111 (f) ST77 (g) and 764 (h)

6 International Journal of Plant Genomics

limited to SNPs insertions and deletions of bases Acrossall experimental groups polymorphisms classified as loweffect comprise sim10 of the total while moderate effects andhigh effects average 12 and 2 respectively The largestcategory of impact classification is the genomic modifiercategory comprising 72ndash78 of detected polymorphisms inall groups and 60 of shared variants between all transgenics(see snpEff reports in Supporting Data) Missense mutationsare highly prevalent in all four groups averaging sim60in wild type ST111 and ST77 events with a missense tosilent mutation ratio of sim16 (Figures 2(e)ndash2(g)) The 764group exhibited a slightly lower missense percentage ofsim53 however it also contained a higher percentage ofsilent mutations at sim45 compared to the other three groups(sim37) generating a missensesilent mutation ratio of sim120(Figure 2(h)) Nonsense mutations comprised sim3 of thetotal reported SNP effects for the wild type ST111 and ST77groups and sim18 in the 764 group (Figures 2(e)ndash2(h))

All four experimental groups showed a high prevalenceof SNPs of common types namely resulting in down-stream gene variants (sim24) intron variants (sim25) mis-sense variants (sim11) synonymous variants (sim7) andupstream gene variants (sim16)The most common locationsfor these variants are intron regions (sim25) the down-stream transcription start site (sim23) exons (sim23) introns(sim23) and the upstream transcription start site (sim16)The 764 group distribution of polymorphism types andaffected regions seem to mirror those of the wild typeST111 and ST77 groups with a slight reduction of SNPs inexonic regions All individuals also reported variants within51015840 and 31015840 untranslated regions (sim4 and sim6 resp) aswell as minimal detection of frameshifts stop and startvariants (lt1) Regions and predicted effects of detectedpolymorphisms are shown in Figures 2(e)ndash2(h)

Transition base changes of AharrG and CharrT were mostprevalent in all four experimental groups while the transver-sion AharrT was sim50 more common than AharrC CharrG orGharrT transversions Codon changes varied most commonlyat the wobble base position typically resulting from a tran-sition in both wild type and transgenic samples FollowingSNP patterns previously described the prevailing modifiedcodons appeared commonly shared between all individualsmeasured Base changes in the first and second positionsof the codon were much less frequent across all groupsHeat maps of the codon base changes representative for eachexperimental group are shown in Supplemental Figure 1Predicted amino acid changes resulting from codon alter-ations predominantly consist of synonymous substitutionshowever other frequently altered residues include valineto alanine alanine to valine leucine to phenylalanine andproline glutamate to glycine and lysine serine to proline andphenylalanine to leucine Supplemental Figure 2 shows thedistribution heat maps of detected amino acid substitutionaverages representative for all events

Minor changes were also detected within the additionalgenomic scaffolds containing the transgene sequences sup-plemented to the alignment reference file Specifically snpEffreported SNPs in the transgenes of samples 764B3 764H3and 764K1 at position 80 ST77D2 at position 1400 ST77F1

at positions 500 and 700 ST77J3 at position 600 ST111B3 atposition 800 ST111I3 at position 190 and ST111K1 ST111B1and ST111K2 at position 1010 Read alignments were unableto verify all called variants in the transgenes likely due toareas of low coverage however one area was corroborated asa consistent variant in the fillerDNAof all constructs betweenthe right border and promoter region (Supplemental Figure3) which was included as a control Supplemental Table 1summarizes all SNP calls and between group statistical testsSummaries of all SNPINDEL calls from bcftools and theirrespective effects from snpEff along with chromosomal dis-tribution plots additional statistics and lists of SNPs sharedbetween groups and within each group are available from theiPlant collaborative Discovery Environment [33] Links to therespective datasets are provided in the SupportingData of thismanuscript

33 Gene Ontology Analysis of Genes Containing SNPs Giventhe extent of detected polymorphic sites and the possibilitiesof effectual contributions to gene expression changes previ-ously documented in the 764 transgenic line it was necessaryto investigate genes containing SNPs and INDELS withgene ontology to identify the possible functional pathwayscontrolled by the affected transcripts Complete gene listscontaining SNPs provided by the output of snpEff for thewild type (36959 transcripts) ST77 (38691 transcripts)ST111 (34142 transcripts) and 764 (43426 transcripts) lineswere loaded into the AgriGO online gene ontology analysistool and subjected to a single enrichment analysis withmultiple corrections to deduce the functional roles of poly-morphic genes Out of the total Glycine maxV21 backgroundset of 29501 GO terms wild type matched 11245 ST77matched 11660 ST111 matched 10326 and 764 matched13321 GO categories were similar between the wild typeST77 and ST111 groups with minor variations with themajority of terms putatively annotated to translational orRNA processes (Figures 3(a) 3(b) and 3(d)) Intracellulartransport also constituted a large portion of the identifiedcategories (sim12 of total annotations) and in the case ofthe 764 transgenic line was the only other significant GOterm following RNA processing and binding (Figure 3(c))Three categories including ATP-dependent helicase activityubiquitin-dependent protein catabolic processes and ncRNAmetabolic processes were all unique to the ST77 and ST111lines (Figures 3(b) and 3(d)) GTP binding was exclusive tothe ST111 line comprising 21 of the total significant GOterms for the event Figure 4 shows all detected SNPs in achromosomal density map for all four experimental groups

Annotated GO terms for SNPs exclusive to each trans-genic event displayed similar functions with ST77 reporting295 gene variants involved with intracellular organellesphotosystem II cytoskeletal components and protein com-plexes ST111 displayed similar numbers of terms totaling 273annotations including photosynthesis and photosystem IIintracellular organelles ribonucleoprotein complexes thy-lakoids protein complexes and oxidoreductase activity The764 line exhibited the highest number of unique annotatedvariants with 2366 involving translation gene expressionbiosynthetic processes ribosomal structures and protein

International Journal of Plant Genomics 7

Wild typeaminoacylation

tRNA

for proteintranslation

5RNA binding

23

Vesicle-mediatedtransport

13Intracellularprotein transport

12

mRNA metabolic process

4

RNA processing13

Translation30

(a)

ST77ATP-dependenthelicase activity

10

RNA binding19

Ribonucleoproteincomplex biogenesis

4

Intracellularprotein transport

11

ncRNAmetabolic process

10

RNA processing12

Translation27

Ubiquitin-dependentprotein catabolic

process7

(b)

764

Intracellulartransport

25

RNA processing27

RNA binding48

(c)

ST111 ATP-dependenthelicase activity

13Intracellular

proteintransport

13

Ubiquitin-dependent

protein catabolicprocess

10

ncRNA metabolicprocess

13

mRNA processing4

RNA binding26

GTP binding21

(d)

Figure 3 AgriGO enriched gene ontology categories of SNP-containing genes Single enrichment analyses for all genes containing SNPs areshown for wild type (a) ST77 (b) 764 (c) and ST111 (d) Percentages are reflective of how many terms matched the indicated GO group outof the total matches to the background reference

complexes Of the 127 polymorphic genes shared by alltransgenic specimens 40 annotations returned 10 GO termspertaining to intracellular organelles macromolecular com-plexes and protein complexes Specific GO terms for eachgroup are shown in Figure 3 and complete AgriGO featurerelationship trees are shown in Supplemental Figure 4

4 Discussion

SNP rates have been previously evaluated in soybean usingmultiple fragment analysis on several genotypes of culti-vated varieties including Lincoln Mandarin Peking andRichland [34 35] From this fragment analysis transitionand transversion rates were reported to be nearly identicalbetween cultivars at 48 and 52 respectively Furthermorenucleotide diversity rates of cultivated soybean varietieswere estimated to be 5ndash8-fold less than the wild varietyGlycine soja and occur at an even lower frequency than thehighly characterizedArabidopsis thaliana self-crossingmodel[34] More recent investigations of SNPs and INDELS in

soybean using next generation resequencing have revealedgenome-wide polymorphisms in an effort to identify nativedisease resistant and favorable trait loci [36 37] Furtherstudies demonstrate the extensive narrowing of soybeangenomic variation due to selective pressures resulting fromdomestication practices which favor more valued traits suchas increased seed mass and oil content quantitative traitloci [38] In addition the self-pollinating nature of Glycinemax also contributes evolutionarily to the constriction ofgenomic variances in domesticated cultivars Thus withsuch selective pressures bottlenecking natural variations incultivated soybean varieties it is quite intriguing to seesuch consistently increased numbers of SNPs in only onetransgenic experimental group with no definitive initiatingevent

SNPs detected in our datasets were divulged from seedtranscriptome sequences which represent sim65 of the totalsoybean genome and 65 of total genomic protein codingsequences The SNPs described here are not considered tobe representative of the frequencies that may be detected in

8 International Journal of Plant Genomics

Wild type

chr1chr2

chr3

chr4chr5

chr6

chr7

chr8

chr9chr10chr11

chr12

chr13

chr14

chr15

chr16

chr17

chr18

chr19chr20

(a)

chr1chr2

chr3

chr4chr5

chr6

chr7

chr8

chr9chr10chr11

chr12

chr13

chr14

chr15

chr16

chr17

chr18

chr19chr20 1

chr2

chr3

chr4

ch6

chrhr7

chr8

chr9ch 1

chr12

chr13

chr14

chr17

chr18

chr19ch

ST77

(b)chr1

chr2

chr3

chr4chr5

chr6

chr7

chr8

chr9chr10chr11

chr12

chr13

chr14

chr15

chr16

chr17

chr18

chr19chr20 1

chr2

chr3

chr4

ch6

chr7

chr8

chr9ch 1

chr12

chr13

chr14

chr17

chr18

chr19ch

764

(c)

ST111

chr1chr2

chr3

chr4chr5

chr6

chr7

chr8

chr9chr10chr11

chr12

chr13

chr14

chr15

chr16

chr17

chr18

chr19chr20

(d)

Figure 4 Overview of the chromosomal distribution of detected transcript SNPs Aligned SNPs are shown for each soybean chromosomein the wild type controls (a) ST77 (b) 764 (c) and ST111 (d) transgenics The colored bars indicate each chromosome scaled to length witha bold outer hash mark representing every 10Mb of sequence and each nonbolded outer hash mark placed every 1Mb Each internal tickrepresents a detected SNP on which subsequent SNPs are stacked vertically from each transcript at that region

other soybean tissues or in different developmental stagesof these specific transgenic plants rather they demonstrateincreased transcript polymorphism rateswitnessed in a singlesoybean transformant out of several different experimentalgroups Although RNA-seq is focused on the functionalsegments of the genome and does not always capture regu-latory regions such as promoters or nontranscribed regions(eg methylated bases) one aspect of the resulting effectsof polymorphisms in these regions can be directly witnessedthrough examination of gene expression levels Because thesoybean expression systemdescribed previously by our lab [5

7 8 13 14 39] specifically targets seed tissue for recombinantprotein accumulation it was of great interest to identifypossible sequence alterations that may have resulted in geneexpression or protein structure variations in seeds which todate has not been investigated

Base changes for wild type ST77 and ST111 events wereall comparable while the 764 event consistently reportedSNP rates nearly double that of the other groups (1 SNPdetected every sim22000 bases) which is still well below thepreviously reported SNP rate of 1 SNP per sim1400 nucleotidesin soybean seeds [40] Nevertheless SNP base changes

International Journal of Plant Genomics 9

appeared to follow the same pattern of commonality withall groups demonstrating high percentages of transition basechanges with relatively low transversion counts The 764 linedid demonstrate a lower TsTv ratio than the other groups(Figure 1(d)) although progeny from two independent par-ents in the ST77 line (ST77D and ST77F) exhibited lowerratios as well indicating that this characteristic alone cannot reliably indicate internal stresses or overall divergencefrom our controls The majority of detected base changeswere located in the third base of the codon likely gener-ating synonymous (silent) polymorphisms which were themajority of predicted SNP classifications in all events Lesscommon changes occurred in the first or second bases of thecodon generating a nonsynonymous or missense mutationlikely resulting in an amino acid change (Figures 2(e)ndash2(h))Missense to silent ratios detected in all experimental groupsappear to be higher than previously published results forsoybean indicated [40] although the higher nonsynonymousto synonymous mutations may be due to high linkagedisequilibrium exhibited in soy [41]

Interestingly the 764 line had the lowest overall missenseto silent polymorphism ratio demonstrating that while the764 line contained the highest overall number of SNPs ahigher percentage of the total polymorphisms were silentmutations (sim45) compared to the other three experimentalgroups (Figure 2(h)) This reveals the possibility that SNPsand INDELS detected here had originated from the origi-nal transformation event and have been highly conservedthrough self-crossed generations of progeny particularlyconsidering how remarkably well conserved the attributesof detected SNPs were between different biological samplesof the same transformation event Without sequencing datafrom prior generations of each parental line tested herefor comparison this can not be confirmed with completeconfidence however this does pose an interesting inquirythat even silent SNPsmay have the ability to alter gene expres-sion and peptide structure Indeed it has been suggestedthat although synonymous polymorphisms code for identicalamino acids subtle changes resulting from the utilization ofnonoptimal synonymous codons can modulate downstreamtranscription and translation rates [42] thereby altering thefolding conformations and proposed function of the finalpeptides This may explain why the 764 transgenic eventexhibited the highest rates of differentially expressed genesin conjunction with polymorphisms but still maintained ahealthy phenotype

While many SNP studies have excluded synonymousmutations and those present in noncoding regions recentworks have examined these sequences in light of theirpossible epigenetic effects on transcription and translationhowever more advanced tools for the prediction of possibleeffects are still in development and require further testingfor reliable forecasts [43] Nonsynonymous polymorphismsappear at similar rates across the experimental groupsvarying between 50 and 60 of total functional SNP clas-sifications While nonsynonymous mutations nearly alwaysalter amino acid sequences that can potentially yield non-functional protein products this phenomenonmay produce aneutral effect on the organism or protein or activate signaling

cascades in redundant systems [43] The effects of thesedetected nonsynonymous polymorphisms in each transgenicevent are not known or identifiable without extensive pro-teomic investigations into the altered downstream transcriptproductsThe 764 line which previously displayed the largestdegree of differential gene expression exhibited the lowestpercentage of missense and nonsense polymorphisms there-fore these alone are not likely to be the root cause of observedchanges in this transgenic line

Spontaneous mutations may also occur as a result of apreviously occurring stressful event such as parental lineAgrobacterium transformation Epigenetic alterations andnucleotide transpositions have been detected up to fivegenerations forward from transformation in Arabidopsis [4445] however all seeds examined here were bred to orbeyond the 5th generation of progeny The causes for thesemutations are unknown however each transformant was anindependent transformation event and was likely exposedto a varying degree of stress during the process This couldpotentially have introduced genomic variations that havebeen carried forward to the transcriptome as even largecopy number of variations have been reported in plants asa response to stress [46] Segregation of SNPs from existingparental heterozygous loci is expected in progeny howeverall samples examined here were remarkably consistent inSNP numbers location distributions and types within theirexperimental groups further demonstrating the genomicstability of these events This further solidifies the possibilitythat the detected variations in all transgenic lines arose froman early parental event possibly occurring during the originalAgrobacterium transformation or planting event and havebeen stably integrated and carried forward to the generationsexamined Considering the consistency of the SNP patternsthese polymorphisms have likely originated at the genomiclevel and are not likely to appear in such large quantities dueto transcriptional proofreading errors

Detected alterations occurred in a ldquobowtierdquo-shapeddistribution across the 20 chromosomes with lowerinstances of SNPs near the centromeres and the highestincidence of SNPs near the euchromatic telomeric ends of thechromosomal arms This is where gene density is the highestwhich has been illustrated previously by several worksconducted on soybean polymorphism rates [36 38 47] OursnpEff dataset summaries available in the Supporting Dataof this manuscript show the same distribution pattern withno significant alterations of SNP rates between biologicalreplicates of progeny Without complete characterization ofthe targeted SNPs in this work future directions can invokefunctional characterizations to cross-reference detectedpolymorphisms with possible connections to differentiallyexpressed genes and also to improve existing annotatedWilliams 82 cultivar SNPs In order to fully deduce the originof SNP variations with high confidence multiple generationanalyses will need to be conducted Approximately 70 ofthe genes that were differentially expressed also containedSNPs or INDELS however due to the extreme disparityin size between each gene list (a maximum of sim3000 andsim40000 genes resp) this could simply be due to chanceand can not definitively hold concrete biological significance

10 International Journal of Plant Genomics

without further investigation This suggests that the SNPsexamined here may directly impact native gene expressionlevels possibly due to their locations within gene regulatoryregions or enhancer elements located near the telomericends of the chromosomal arms

Gene ontology terms from SNP gene lists of all experi-mental groups were very similar returning many enrichedterms regarding protein transport and localization as wellas RNA and transcriptional processes even in the wild typegroup (Figure 3 and Supplemental Figure 4) Interestinglythese relationships are highly similar to the enriched GOterms derived from the differentially expressed gene sets ofthe transgenic lines indicating the majority of functionalrelationships between these polymorphisms are likely arisingfrom intercultivar variations and are not specifically relatedto transgenesis effects However given the possibility thatmutations from transgenesis could occur within QTLs thatcontrol traits such as plant size flower count and seedquantity identification of novel polymorphisms in theseregions of soybeanrsquos transcriptome is of great importanceto both plant breeders and in biotechnology applicationsIn fact recent investigations have revealed an excess of9 million unique SNPs that occur between 106 differentcultivated soybean varieties [48] Here we have providedlists of exonic SNPs unique to each of the three transgenicgroups and specific to transformation as well as novel SNPsdetected in our Williams 82 wild type controls Furtherinvestigations of the events described will identify the sitesof T-DNA integration and characterize the surroundingsequences to determine whether increased incidences ofpolymorphisms occur within the immediate vicinities of theinsertions Unique SNPs identified and reported here may beused to aide these investigations of crucial trait loci in thefuture to maximize yields and assist in breeding programs oftransgenic specimens

5 Conclusions

This study describes the first direct comparison of poly-morphism rates present in the seed transcriptome of threeindependent transgenic lines of Glycine max with a non-transgenic wild type line all derived from the Williams 82soybean cultivar Previously we examined gene expressionchanges in these transgenic events finding only the 764line to be significantly altered when compared to the othertransgenic lines In the same manner we have expanded ourprevious work and reported transcriptome single nucleotidepolymorphism rates for the ST77 ST111 and 764 trans-genic events compared to wild type Mirroring the resultsfrom our differential expression study the samples of the764 transformation event exhibited transcript SNP ratesnearly double that of the wild type ST77 and ST111 eventswhile also demonstrating higher amounts of transversionmutations While the exact cause for both the large num-ber of differentially expressed genes and polymorphismspresent in the 764 event is not known it is highly likelythat the two phenomena are correlated Our results showthat in addition to differential expression transgenesis mayresult in unpredicted base changes within gene transcripts

and regulatory regions without impacting plant pheno-type

Availability of Supporting Data

The RNA sequencing data described herein may be accessedat httpwwwncbinlmnihgovgeoqueryacccgiacc=GS-E64620 through NCBIrsquos Gene Expression Omnibus underaccession number GSE64620 Summary files for bcftoolsand snpEff outputs encompassing variant calls and predictedvariant effects are available from the iPlant CollaborativeDiscovery Environment directory available at httpdeiplan-tcollaborativeorgdld44044BEE-69B8-4129-A6B5-4C9291-B6B4A1snpeffsummarieszip and httpdeiplantcollabora-tiveorgdldAD464040-9169-4E52-87D5-DEF7466A2C1Evariantsummarieszip Lists of genes containing detectedvariants from all samples are available from iPlant in acompressed zip archive available at httpdeiplantcollabor-ativeorgdld533570A3-1EFB-4864-B9A9-9D82F17E09A8snpeffgeneszip

Competing Interests

Theauthors of this manuscript declare no conflict of interests

Authorsrsquo Contributions

Kevin C Lambirth designed the experiment conducted theSNP detection and effect prediction and generated figuresAdamMWhaley compiled and installed software formattedsequencing data files and was consulted on analysis KevinC Lambirth Kenneth J Piller and Kenneth L Bost wrote themanuscript with input from all authors

Acknowledgments

The authors would like to acknowledge the sequencingefforts of Dr Mike Wang (David H Murdock ResearchInstitute) and Dr Laura Reichenberg (Pfeiffer University) forassistance with plant propagation and selection and Dr KanWang (Iowa State University Agronomy Department) for theestablishment of the ST111 transgenic soybean line

References

[1] A Ziemienowicz ldquoAgrobacterium-mediated plant transforma-tion factors applications and recent advancesrdquoBiocatalysis andAgricultural Biotechnology vol 3 no 4 pp 95ndash102 2014

[2] M S Koch J M Ward S L Levine J A Baum J L Viciniand B G Hammond ldquoThe food and environmental safety of Btcropsrdquo Frontiers in Plant Science vol 6 article 283 2015

[3] G M Dill C A CaJacob and S R Padgette ldquoGlyphosate-resistant crops adoption use and future considerationsrdquo PestManagement Science vol 64 no 4 pp 326ndash331 2008

[4] M A S Valente J A Q A Faria J R L Soares-Ramos et alldquoThe ER luminal binding protein (BiP) mediates an increase indrought tolerance in soybean and delays drought-induced leafsenescence in soybean and tobaccordquo Journal of ExperimentalBotany vol 60 no 2 pp 533ndash546 2009

[5] L C Hudson B S Seabolt J Odle K L Bost C H Stahl andK J Piller ldquoSublethal staphylococcal enterotoxin B challenge

International Journal of Plant Genomics 11

model in pigs to evaluate protection following immunizationwith a soybean-derived vaccinerdquoClinical andVaccine Immunol-ogy vol 20 no 1 pp 24ndash32 2013

[6] Y Zhang D Li X Jin and Z Huang ldquoFighting Ebola withZMapp spotlight on plant-made antibodyrdquo Science China LifeSciences vol 57 no 10 pp 987ndash988 2014

[7] R Powell L C Hudson K C Lambirth et al ldquoRecombinantexpression of homodimeric 660 kDa human thyroglobulin insoybean seeds an alternative source of human thyroglobulinrdquoPlant Cell Reports vol 30 no 7 pp 1327ndash1338 2011

[8] K L Bost K C Lambirth L C Hudson and K J PillerldquoSoybean-derived thyroglobulin as an analyte specific reagentfor in vitro diagnostic tests and devicesrdquo Advances in Medicineand Biology vol 80 pp 23ndash40 2014

[9] R Abiri A Valdiani M Maziah et al ldquoA critical review ofthe concept of transgenic plants insights into pharmaceuticalbiotechnology andmolecular farmingrdquo Current Issues in Molec-ular Biology vol 18 no 1 pp 21ndash42 2016

[10] J L Oakes K L Bost and K J Piller ldquoStability of a soybeanseed-derived vaccine antigen following long-term storage pro-cessing and transport in the absence of a cold chainrdquo Journal ofthe Science of Food andAgriculture vol 89 no 13 pp 2191ndash21992009

[11] R Garg M Tolbert J L Oakes T E Clemente K L Bostand K J Piller ldquoChloroplast targeting of FanC the majorantigenic subunit of Escherichia coli K99 fimbriae in transgenicsoybeanrdquo Plant Cell Reports vol 26 no 7 pp 1011ndash1023 2007

[12] K J Piller T E Clemente S M Jun et al ldquoExpression andimmunogenicity of an Escherichia coli K99 fimbriae subunitantigen in soybeanrdquo Planta vol 222 no 1 pp 6ndash18 2005

[13] L C Hudson R Garg K L Bost and K J Piller ldquoSoybeanseeds a practical host for the production of functional subunitvaccinesrdquo BioMed Research International vol 2014 Article ID340804 13 pages 2014

[14] K L Bost and K J Piller ldquoProtein expression systems whysoybean seedsrdquo in SoybeanmdashMolecular Aspects of Breeding ASudaric Ed Intech Open 2011

[15] D Tilman K G Cassman P A Matson R Naylor and SPolasky ldquoAgricultural sustainability and intensive productionpracticesrdquo Nature vol 418 no 6898 pp 671ndash677 2002

[16] G N De La FuenteU K Frei and T Lubberstedt ldquoAcceleratingplant breedingrdquo Trends in Plant Science vol 18 no 12 pp 667ndash672 2013

[17] L C Hudson K C Lambirth K L Bost and K L PillerAdvancements in Transgenic Soy From Field to Bedside InTech2013

[18] J D Ray T C Kilen C A Abel and R L Paris ldquoSoybeannatural cross-pollination rates under field conditionsrdquo Environ-mental Biosafety Research vol 2 no 2 pp 133ndash138 2003

[19] K L Hart S L Kimura V Mushailov Z M BudimlijaM Prinz and E Wurmbach ldquoImproved eye- and skin-colorprediction based on 8 SNPsrdquo Croatian Medical Journal vol 54no 3 pp 248ndash256 2013

[20] S E Medland T Zayats B Glaser et al ldquoA variant inLIN28B is associatedwith 2D 4D finger-length ratio a putativeretrospective biomarker of prenatal testosterone exposurerdquoTheAmerican Journal of HumanGenetics vol 86 no 4 pp 519ndash5252010

[21] B S Shastry ldquoSNP alleles in human disease and evolutionrdquoJournal of Human Genetics vol 47 no 11 pp 561ndash566 2002

[22] Y-H Li S-C Zhao J-X Ma et al ldquoMolecular footprints ofdomestication and improvement in soybean revealed by wholegenome re-sequencingrdquo BMC Genomics vol 14 article 5792013

[23] T Joshi B Valliyodan J-H Wu S-H Lee D Xu and H TNguyen ldquoGenomic differences between cultivated soybean Gmax and its wild relative G sojardquo BMCGenomics vol 14 articleS5 2013

[24] K C Lambirth AMWhaley I C Blakley et al ldquoA comparisonof transgenic andwild type soybean seeds analysis of transcrip-tome profiles using RNA-Seqrdquo BMC Biotechnology vol 15 no1 article 89 2015

[25] H Li ldquoA statistical framework for SNP callingmutation discov-ery association mapping and population genetical parameterestimation from sequencing datardquo Bioinformatics vol 27 no 21pp 2987ndash2993 2011

[26] H Li B Handsaker A Wysoker et al ldquoThe sequence align-mentMap format and SAMtoolsrdquo Bioinformatics vol 25 no16 pp 2078ndash2079 2009

[27] D M Goodstein S Shu R Howson et al ldquoPhytozome acomparative platform for green plant genomicsrdquo Nucleic AcidsResearch vol 40 no 1 pp D1178ndashD1186 2012

[28] C Trapnell A Roberts L Goff et al ldquoDifferential gene andtranscript expression analysis of RNA-seq experiments withTopHat and Cufflinksrdquo Nature Protocols vol 7 no 3 pp 562ndash578 2012

[29] PDanecek A AutonG Abecasis et al ldquoThe variant call formatand VCFtoolsrdquo Bioinformatics vol 27 no 15 pp 2156ndash21582011

[30] M Krzywinski J Schein I Birol et al ldquoCircos an informationaesthetic for comparative genomicsrdquo Genome Research vol 19no 9 pp 1639ndash1645 2009

[31] P Cingolani A Platts L L Wang et al ldquoA program forannotating and predicting the effects of single nucleotidepolymorphisms SnpEff SNPs in the genome of Drosophilamelanogaster strain w 1118 iso-2 iso-3rdquo Fly vol 6 no 2 pp80ndash92 2012

[32] Z Du X Zhou Y Ling Z Zhang and Z Su ldquoagriGO A GOanalysis toolkit for the agricultural communityrdquo Nucleic AcidsResearch vol 38 no 2 Article ID gkq310 pp W64ndashW70 2010

[33] S AGoffM Vaughn SMcKay et al ldquoThe iPlant collaborativecyberinfrastructure for plant biologyrdquo Frontiers in Plant Sciencevol 2 article 34 2011

[34] Y L Zhu Q J Song D L Hyten et al ldquoSingle-nucleotidepolymorphisms in soybeanrdquo Genetics vol 163 no 3 pp 1123ndash1134 2003

[35] Y Shu Y Li Z Zhu et al ldquoSNPs discovery and CAPS markerconversion in soybeanrdquo Molecular Biology Reports vol 38 no3 pp 1841ndash1846 2011

[36] C B Yadav P Bhareti M Muthamilarasan et al ldquoGenome-wide SNP identification and characterization in two soybeancultivars with contrasting mungbean yellow mosaic india virusdisease resistance traitsrdquo PLoS ONE vol 10 no 4 Article IDe0123897 2015

[37] X Wu C Ren T Joshi T Vuong D Xu and H T NguyenldquoSNP discovery by high-throughput sequencing in soybeanrdquoBMC Genomics vol 11 article 469 2010

[38] Y-H Li J C Reif S A Jackson Y-S Ma R-Z Chang and L-J Qiu ldquoDetecting SNPs underlying domestication-related traitsin soybeanrdquo BMC Plant Biology vol 14 article 251 2014

12 International Journal of Plant Genomics

[39] L C Hudson K L Bost and K J Piller ldquoOptimizing recom-binant protein expression in soybeanrdquo in SoybeanmdashMolecularAspects of Breeding InTech Rijeka Croatia 2011

[40] W Goettel E Xia R Upchurch M-L Wang P Chen andY-Q C An ldquoIdentification and characterization of transcriptpolymorphisms in soybean lines varying in oil composition andcontentrdquo BMC Genomics vol 15 article 299 2014

[41] C Chan X Qi M-W Li F-L Wong and H-M Lam ldquoRecentdevelopments of genomic research in soybeanrdquo Journal ofGenetics and Genomics vol 39 no 7 pp 317ndash324 2012

[42] A A Komar ldquoSNPs silent but not invisiblerdquo Science vol 315no 5811 pp 466ndash467 2007

[43] P Katsonis A Koire S J Wilson et al ldquoSingle nucleotidevariations biological impact and theoretical interpretationrdquoProtein Science vol 23 no 12 pp 1650ndash1666 2014

[44] W J Haun D L Hyten W W Xu et al ldquoThe composition andorigins of genomic variation among individuals of the soybeanreference cultivar williams 82rdquo Plant Physiology vol 155 no 2pp 645ndash655 2011

[45] S DeBolt ldquoCopy number variation shapes genome diversityin Arabidopsis over immediate family generational scalesrdquoGenome Biology and Evolution vol 2 no 1 pp 441ndash453 2010

[46] A Zmienko A Samelak P Kozłowski and M FiglerowiczldquoCopy number polymorphism in plant genomesrdquo Theoreticaland Applied Genetics vol 127 no 1 pp 1ndash18 2014

[47] Y-G Lee N Jeong J H Kim et al ldquoDevelopment validationand genetic analysis of a large soybean SNP genotyping arrayrdquoPlant Journal vol 81 no 4 pp 625ndash636 2015

[48] B Valliyodan Q Dan G Patil et al ldquoLandscape of genomicdiversity and trait discovery in soybeanrdquo Scientific Reports vol6 article 23598 2016

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 4: Research Article Transcript Polymorphism Rates in …downloads.hindawi.com/journals/ijpg/2016/1562041.pdfResearch Article Transcript Polymorphism Rates in Soybean Seed Tissue Are Increased

4 International Journal of Plant Genomics

Total SNPs

nsns

000500000

100000015000002000000250000030000003500000400000045000005000000

Wild typeST111

ST77764

p = 4e minus 5lowastlowastlowast

(a)

Bases per SNPns

ns

000500000

100000015000002000000250000030000003500000400000045000005000000

Wild typeST111

ST77764

p = 2e minus 7lowastlowastlowast

(b)

INDELS

ns ns

000

50000

100000

150000

200000

250000

300000

Wild typeST111

ST77764

p = 00181lowast

(c)

Transitiontransversion rations

ns

144146148150152154156158160162164166

Wild typeST111

ST77764

p = 2e minus 6

lowastlowastlowast

(d)

Figure 1 Summary of SNP and INDEL counts Means of total polymorphisms (a) polymorphism rates (b) insertionsdeletions (c) andtransition to transversion ratios (d) were calculated all four experimental groups Standard error bars aswell as statistical results fromunpaired119905-tests between each group and wild type are shown Groups marked with an asterisk indicate significance (119901 lt 005 = lowast 119901 lt 0001 = lowastlowastlowast)Wild type is shown in blue ST111 in red ST77 in green and 764 in purple

5445 CrarrT 5461 GrarrA 1417 GrarrC 1639 GrarrT 2576 TrarrA6104 TrarrC and 1940 TrarrG (Figure 2(d))

Polymorphic sites unique to each transgenic event wereidentified by comparing each transgenic variant list to bothWT and the remaining two transgenic groups The ST77transgenic group shared 2196 common gene variants withWT and exhibited 981 unique polymorphisms The ST111transgenic line revealed similar values sharing 1983 genevariants with the controls while the 764 group shared 1912variants The ST111 and 764 lines each reported 927 and7717 unique gene variants respectively Transgenic linesalso revealed 127 gene variants shared between the threetransgenic groups

Overall SNP calls clearly illustrated that the 764 exper-imental group contained significantly increased levels ofpolymorphisms over the wild type controls and other trans-genic experimental groups while concurrently demonstrat-

ing an increased quantity of more acute transversion typemutations Average total SNPs INDELS SNP rates andtransitiontransversion ratios for each experimental groupare shown in Figure 1 and average individual nucleotidechanges for each group are shown in Figures 2(a)ndash2(d)

32 Classification of Predicted SNP Effects and ImpactsFollowing SNP and INDEL detection with samtools snpEffversion 41i [31] was used to evaluate potential alterations tothe exome resulting from these changes SnpEff annotatesvariants based on their genomic locations including intronsexons upstreamor downstream splice sites and untranslatedregions at 51015840 and 31015840 sequence ends Effects were groupedand sorted according to the variant type potential level ofeffect impact functional class type and region No multi-nucleotide polymorphisms were detected in any individualof all four experimental groups and modifications were

International Journal of Plant Genomics 5

000

100000

200000

300000

400000

500000

600000

700000

Wild type

(a)

(b)

(c)

(d)

(e)

(f)

(g)

(h)

Silent3892

Nonsense311

Missense5797

Other051

Downstream2253

Exon1980

Intergenic263

Intron2776

Splice siteregion199

Upstream1537

510431

UTR 5998400

UTR 3998400

000

100000

200000

300000

400000

500000

600000

700000

Transgenic

39

3

58

046

2332

1968

283

2675183

1578

521

414

Silent

Nonsense

Missense

Other

Downstream

Exon

Intergenic

Intron

Splice siteregion

Upstream

UTR 5998400

UTR 3998400

000

100000

200000

300000

400000

500000

600000

700000

041

2329

2137

291

2491213

1516

555427

3877

3095814

Silent

NonsenseMissense

Other

Downstream

Exon

Intergenic

Intron

Splice siteregion

Upstream

UTR 5998400

UTR 3998400

000

100000

200000

300000

400000

500000

600000

700000

4448

181

5371

003

2505

1994

377

2357125

1619

Silent

Nonsense

Missense

Other

Downstream

Exon

Intergenic

Intron

Splice siteregion

Upstream

UTR 5998400

UTR 3998400

409

584

ArarrC

ArarrG

ArarrT

CrarrA

CrarrG

CrarrT

GrarrA

GrarrC

GrarrT

TrarrA

TrarrC

TrarrG

Figure 2 Polymorphism nucleotide base changes and predicted effects Mean rates of change for each nucleotide base are shown for wildtype (a) ST111 (b) ST77 (c) and 764 (d) with bars indicating the standard error of the mean Regions containing detected polymorphismsand their functional classifications are shown for wild type (e) ST111 (f) ST77 (g) and 764 (h)

6 International Journal of Plant Genomics

limited to SNPs insertions and deletions of bases Acrossall experimental groups polymorphisms classified as loweffect comprise sim10 of the total while moderate effects andhigh effects average 12 and 2 respectively The largestcategory of impact classification is the genomic modifiercategory comprising 72ndash78 of detected polymorphisms inall groups and 60 of shared variants between all transgenics(see snpEff reports in Supporting Data) Missense mutationsare highly prevalent in all four groups averaging sim60in wild type ST111 and ST77 events with a missense tosilent mutation ratio of sim16 (Figures 2(e)ndash2(g)) The 764group exhibited a slightly lower missense percentage ofsim53 however it also contained a higher percentage ofsilent mutations at sim45 compared to the other three groups(sim37) generating a missensesilent mutation ratio of sim120(Figure 2(h)) Nonsense mutations comprised sim3 of thetotal reported SNP effects for the wild type ST111 and ST77groups and sim18 in the 764 group (Figures 2(e)ndash2(h))

All four experimental groups showed a high prevalenceof SNPs of common types namely resulting in down-stream gene variants (sim24) intron variants (sim25) mis-sense variants (sim11) synonymous variants (sim7) andupstream gene variants (sim16)The most common locationsfor these variants are intron regions (sim25) the down-stream transcription start site (sim23) exons (sim23) introns(sim23) and the upstream transcription start site (sim16)The 764 group distribution of polymorphism types andaffected regions seem to mirror those of the wild typeST111 and ST77 groups with a slight reduction of SNPs inexonic regions All individuals also reported variants within51015840 and 31015840 untranslated regions (sim4 and sim6 resp) aswell as minimal detection of frameshifts stop and startvariants (lt1) Regions and predicted effects of detectedpolymorphisms are shown in Figures 2(e)ndash2(h)

Transition base changes of AharrG and CharrT were mostprevalent in all four experimental groups while the transver-sion AharrT was sim50 more common than AharrC CharrG orGharrT transversions Codon changes varied most commonlyat the wobble base position typically resulting from a tran-sition in both wild type and transgenic samples FollowingSNP patterns previously described the prevailing modifiedcodons appeared commonly shared between all individualsmeasured Base changes in the first and second positionsof the codon were much less frequent across all groupsHeat maps of the codon base changes representative for eachexperimental group are shown in Supplemental Figure 1Predicted amino acid changes resulting from codon alter-ations predominantly consist of synonymous substitutionshowever other frequently altered residues include valineto alanine alanine to valine leucine to phenylalanine andproline glutamate to glycine and lysine serine to proline andphenylalanine to leucine Supplemental Figure 2 shows thedistribution heat maps of detected amino acid substitutionaverages representative for all events

Minor changes were also detected within the additionalgenomic scaffolds containing the transgene sequences sup-plemented to the alignment reference file Specifically snpEffreported SNPs in the transgenes of samples 764B3 764H3and 764K1 at position 80 ST77D2 at position 1400 ST77F1

at positions 500 and 700 ST77J3 at position 600 ST111B3 atposition 800 ST111I3 at position 190 and ST111K1 ST111B1and ST111K2 at position 1010 Read alignments were unableto verify all called variants in the transgenes likely due toareas of low coverage however one area was corroborated asa consistent variant in the fillerDNAof all constructs betweenthe right border and promoter region (Supplemental Figure3) which was included as a control Supplemental Table 1summarizes all SNP calls and between group statistical testsSummaries of all SNPINDEL calls from bcftools and theirrespective effects from snpEff along with chromosomal dis-tribution plots additional statistics and lists of SNPs sharedbetween groups and within each group are available from theiPlant collaborative Discovery Environment [33] Links to therespective datasets are provided in the SupportingData of thismanuscript

33 Gene Ontology Analysis of Genes Containing SNPs Giventhe extent of detected polymorphic sites and the possibilitiesof effectual contributions to gene expression changes previ-ously documented in the 764 transgenic line it was necessaryto investigate genes containing SNPs and INDELS withgene ontology to identify the possible functional pathwayscontrolled by the affected transcripts Complete gene listscontaining SNPs provided by the output of snpEff for thewild type (36959 transcripts) ST77 (38691 transcripts)ST111 (34142 transcripts) and 764 (43426 transcripts) lineswere loaded into the AgriGO online gene ontology analysistool and subjected to a single enrichment analysis withmultiple corrections to deduce the functional roles of poly-morphic genes Out of the total Glycine maxV21 backgroundset of 29501 GO terms wild type matched 11245 ST77matched 11660 ST111 matched 10326 and 764 matched13321 GO categories were similar between the wild typeST77 and ST111 groups with minor variations with themajority of terms putatively annotated to translational orRNA processes (Figures 3(a) 3(b) and 3(d)) Intracellulartransport also constituted a large portion of the identifiedcategories (sim12 of total annotations) and in the case ofthe 764 transgenic line was the only other significant GOterm following RNA processing and binding (Figure 3(c))Three categories including ATP-dependent helicase activityubiquitin-dependent protein catabolic processes and ncRNAmetabolic processes were all unique to the ST77 and ST111lines (Figures 3(b) and 3(d)) GTP binding was exclusive tothe ST111 line comprising 21 of the total significant GOterms for the event Figure 4 shows all detected SNPs in achromosomal density map for all four experimental groups

Annotated GO terms for SNPs exclusive to each trans-genic event displayed similar functions with ST77 reporting295 gene variants involved with intracellular organellesphotosystem II cytoskeletal components and protein com-plexes ST111 displayed similar numbers of terms totaling 273annotations including photosynthesis and photosystem IIintracellular organelles ribonucleoprotein complexes thy-lakoids protein complexes and oxidoreductase activity The764 line exhibited the highest number of unique annotatedvariants with 2366 involving translation gene expressionbiosynthetic processes ribosomal structures and protein

International Journal of Plant Genomics 7

Wild typeaminoacylation

tRNA

for proteintranslation

5RNA binding

23

Vesicle-mediatedtransport

13Intracellularprotein transport

12

mRNA metabolic process

4

RNA processing13

Translation30

(a)

ST77ATP-dependenthelicase activity

10

RNA binding19

Ribonucleoproteincomplex biogenesis

4

Intracellularprotein transport

11

ncRNAmetabolic process

10

RNA processing12

Translation27

Ubiquitin-dependentprotein catabolic

process7

(b)

764

Intracellulartransport

25

RNA processing27

RNA binding48

(c)

ST111 ATP-dependenthelicase activity

13Intracellular

proteintransport

13

Ubiquitin-dependent

protein catabolicprocess

10

ncRNA metabolicprocess

13

mRNA processing4

RNA binding26

GTP binding21

(d)

Figure 3 AgriGO enriched gene ontology categories of SNP-containing genes Single enrichment analyses for all genes containing SNPs areshown for wild type (a) ST77 (b) 764 (c) and ST111 (d) Percentages are reflective of how many terms matched the indicated GO group outof the total matches to the background reference

complexes Of the 127 polymorphic genes shared by alltransgenic specimens 40 annotations returned 10 GO termspertaining to intracellular organelles macromolecular com-plexes and protein complexes Specific GO terms for eachgroup are shown in Figure 3 and complete AgriGO featurerelationship trees are shown in Supplemental Figure 4

4 Discussion

SNP rates have been previously evaluated in soybean usingmultiple fragment analysis on several genotypes of culti-vated varieties including Lincoln Mandarin Peking andRichland [34 35] From this fragment analysis transitionand transversion rates were reported to be nearly identicalbetween cultivars at 48 and 52 respectively Furthermorenucleotide diversity rates of cultivated soybean varietieswere estimated to be 5ndash8-fold less than the wild varietyGlycine soja and occur at an even lower frequency than thehighly characterizedArabidopsis thaliana self-crossingmodel[34] More recent investigations of SNPs and INDELS in

soybean using next generation resequencing have revealedgenome-wide polymorphisms in an effort to identify nativedisease resistant and favorable trait loci [36 37] Furtherstudies demonstrate the extensive narrowing of soybeangenomic variation due to selective pressures resulting fromdomestication practices which favor more valued traits suchas increased seed mass and oil content quantitative traitloci [38] In addition the self-pollinating nature of Glycinemax also contributes evolutionarily to the constriction ofgenomic variances in domesticated cultivars Thus withsuch selective pressures bottlenecking natural variations incultivated soybean varieties it is quite intriguing to seesuch consistently increased numbers of SNPs in only onetransgenic experimental group with no definitive initiatingevent

SNPs detected in our datasets were divulged from seedtranscriptome sequences which represent sim65 of the totalsoybean genome and 65 of total genomic protein codingsequences The SNPs described here are not considered tobe representative of the frequencies that may be detected in

8 International Journal of Plant Genomics

Wild type

chr1chr2

chr3

chr4chr5

chr6

chr7

chr8

chr9chr10chr11

chr12

chr13

chr14

chr15

chr16

chr17

chr18

chr19chr20

(a)

chr1chr2

chr3

chr4chr5

chr6

chr7

chr8

chr9chr10chr11

chr12

chr13

chr14

chr15

chr16

chr17

chr18

chr19chr20 1

chr2

chr3

chr4

ch6

chrhr7

chr8

chr9ch 1

chr12

chr13

chr14

chr17

chr18

chr19ch

ST77

(b)chr1

chr2

chr3

chr4chr5

chr6

chr7

chr8

chr9chr10chr11

chr12

chr13

chr14

chr15

chr16

chr17

chr18

chr19chr20 1

chr2

chr3

chr4

ch6

chr7

chr8

chr9ch 1

chr12

chr13

chr14

chr17

chr18

chr19ch

764

(c)

ST111

chr1chr2

chr3

chr4chr5

chr6

chr7

chr8

chr9chr10chr11

chr12

chr13

chr14

chr15

chr16

chr17

chr18

chr19chr20

(d)

Figure 4 Overview of the chromosomal distribution of detected transcript SNPs Aligned SNPs are shown for each soybean chromosomein the wild type controls (a) ST77 (b) 764 (c) and ST111 (d) transgenics The colored bars indicate each chromosome scaled to length witha bold outer hash mark representing every 10Mb of sequence and each nonbolded outer hash mark placed every 1Mb Each internal tickrepresents a detected SNP on which subsequent SNPs are stacked vertically from each transcript at that region

other soybean tissues or in different developmental stagesof these specific transgenic plants rather they demonstrateincreased transcript polymorphism rateswitnessed in a singlesoybean transformant out of several different experimentalgroups Although RNA-seq is focused on the functionalsegments of the genome and does not always capture regu-latory regions such as promoters or nontranscribed regions(eg methylated bases) one aspect of the resulting effectsof polymorphisms in these regions can be directly witnessedthrough examination of gene expression levels Because thesoybean expression systemdescribed previously by our lab [5

7 8 13 14 39] specifically targets seed tissue for recombinantprotein accumulation it was of great interest to identifypossible sequence alterations that may have resulted in geneexpression or protein structure variations in seeds which todate has not been investigated

Base changes for wild type ST77 and ST111 events wereall comparable while the 764 event consistently reportedSNP rates nearly double that of the other groups (1 SNPdetected every sim22000 bases) which is still well below thepreviously reported SNP rate of 1 SNP per sim1400 nucleotidesin soybean seeds [40] Nevertheless SNP base changes

International Journal of Plant Genomics 9

appeared to follow the same pattern of commonality withall groups demonstrating high percentages of transition basechanges with relatively low transversion counts The 764 linedid demonstrate a lower TsTv ratio than the other groups(Figure 1(d)) although progeny from two independent par-ents in the ST77 line (ST77D and ST77F) exhibited lowerratios as well indicating that this characteristic alone cannot reliably indicate internal stresses or overall divergencefrom our controls The majority of detected base changeswere located in the third base of the codon likely gener-ating synonymous (silent) polymorphisms which were themajority of predicted SNP classifications in all events Lesscommon changes occurred in the first or second bases of thecodon generating a nonsynonymous or missense mutationlikely resulting in an amino acid change (Figures 2(e)ndash2(h))Missense to silent ratios detected in all experimental groupsappear to be higher than previously published results forsoybean indicated [40] although the higher nonsynonymousto synonymous mutations may be due to high linkagedisequilibrium exhibited in soy [41]

Interestingly the 764 line had the lowest overall missenseto silent polymorphism ratio demonstrating that while the764 line contained the highest overall number of SNPs ahigher percentage of the total polymorphisms were silentmutations (sim45) compared to the other three experimentalgroups (Figure 2(h)) This reveals the possibility that SNPsand INDELS detected here had originated from the origi-nal transformation event and have been highly conservedthrough self-crossed generations of progeny particularlyconsidering how remarkably well conserved the attributesof detected SNPs were between different biological samplesof the same transformation event Without sequencing datafrom prior generations of each parental line tested herefor comparison this can not be confirmed with completeconfidence however this does pose an interesting inquirythat even silent SNPsmay have the ability to alter gene expres-sion and peptide structure Indeed it has been suggestedthat although synonymous polymorphisms code for identicalamino acids subtle changes resulting from the utilization ofnonoptimal synonymous codons can modulate downstreamtranscription and translation rates [42] thereby altering thefolding conformations and proposed function of the finalpeptides This may explain why the 764 transgenic eventexhibited the highest rates of differentially expressed genesin conjunction with polymorphisms but still maintained ahealthy phenotype

While many SNP studies have excluded synonymousmutations and those present in noncoding regions recentworks have examined these sequences in light of theirpossible epigenetic effects on transcription and translationhowever more advanced tools for the prediction of possibleeffects are still in development and require further testingfor reliable forecasts [43] Nonsynonymous polymorphismsappear at similar rates across the experimental groupsvarying between 50 and 60 of total functional SNP clas-sifications While nonsynonymous mutations nearly alwaysalter amino acid sequences that can potentially yield non-functional protein products this phenomenonmay produce aneutral effect on the organism or protein or activate signaling

cascades in redundant systems [43] The effects of thesedetected nonsynonymous polymorphisms in each transgenicevent are not known or identifiable without extensive pro-teomic investigations into the altered downstream transcriptproductsThe 764 line which previously displayed the largestdegree of differential gene expression exhibited the lowestpercentage of missense and nonsense polymorphisms there-fore these alone are not likely to be the root cause of observedchanges in this transgenic line

Spontaneous mutations may also occur as a result of apreviously occurring stressful event such as parental lineAgrobacterium transformation Epigenetic alterations andnucleotide transpositions have been detected up to fivegenerations forward from transformation in Arabidopsis [4445] however all seeds examined here were bred to orbeyond the 5th generation of progeny The causes for thesemutations are unknown however each transformant was anindependent transformation event and was likely exposedto a varying degree of stress during the process This couldpotentially have introduced genomic variations that havebeen carried forward to the transcriptome as even largecopy number of variations have been reported in plants asa response to stress [46] Segregation of SNPs from existingparental heterozygous loci is expected in progeny howeverall samples examined here were remarkably consistent inSNP numbers location distributions and types within theirexperimental groups further demonstrating the genomicstability of these events This further solidifies the possibilitythat the detected variations in all transgenic lines arose froman early parental event possibly occurring during the originalAgrobacterium transformation or planting event and havebeen stably integrated and carried forward to the generationsexamined Considering the consistency of the SNP patternsthese polymorphisms have likely originated at the genomiclevel and are not likely to appear in such large quantities dueto transcriptional proofreading errors

Detected alterations occurred in a ldquobowtierdquo-shapeddistribution across the 20 chromosomes with lowerinstances of SNPs near the centromeres and the highestincidence of SNPs near the euchromatic telomeric ends of thechromosomal arms This is where gene density is the highestwhich has been illustrated previously by several worksconducted on soybean polymorphism rates [36 38 47] OursnpEff dataset summaries available in the Supporting Dataof this manuscript show the same distribution pattern withno significant alterations of SNP rates between biologicalreplicates of progeny Without complete characterization ofthe targeted SNPs in this work future directions can invokefunctional characterizations to cross-reference detectedpolymorphisms with possible connections to differentiallyexpressed genes and also to improve existing annotatedWilliams 82 cultivar SNPs In order to fully deduce the originof SNP variations with high confidence multiple generationanalyses will need to be conducted Approximately 70 ofthe genes that were differentially expressed also containedSNPs or INDELS however due to the extreme disparityin size between each gene list (a maximum of sim3000 andsim40000 genes resp) this could simply be due to chanceand can not definitively hold concrete biological significance

10 International Journal of Plant Genomics

without further investigation This suggests that the SNPsexamined here may directly impact native gene expressionlevels possibly due to their locations within gene regulatoryregions or enhancer elements located near the telomericends of the chromosomal arms

Gene ontology terms from SNP gene lists of all experi-mental groups were very similar returning many enrichedterms regarding protein transport and localization as wellas RNA and transcriptional processes even in the wild typegroup (Figure 3 and Supplemental Figure 4) Interestinglythese relationships are highly similar to the enriched GOterms derived from the differentially expressed gene sets ofthe transgenic lines indicating the majority of functionalrelationships between these polymorphisms are likely arisingfrom intercultivar variations and are not specifically relatedto transgenesis effects However given the possibility thatmutations from transgenesis could occur within QTLs thatcontrol traits such as plant size flower count and seedquantity identification of novel polymorphisms in theseregions of soybeanrsquos transcriptome is of great importanceto both plant breeders and in biotechnology applicationsIn fact recent investigations have revealed an excess of9 million unique SNPs that occur between 106 differentcultivated soybean varieties [48] Here we have providedlists of exonic SNPs unique to each of the three transgenicgroups and specific to transformation as well as novel SNPsdetected in our Williams 82 wild type controls Furtherinvestigations of the events described will identify the sitesof T-DNA integration and characterize the surroundingsequences to determine whether increased incidences ofpolymorphisms occur within the immediate vicinities of theinsertions Unique SNPs identified and reported here may beused to aide these investigations of crucial trait loci in thefuture to maximize yields and assist in breeding programs oftransgenic specimens

5 Conclusions

This study describes the first direct comparison of poly-morphism rates present in the seed transcriptome of threeindependent transgenic lines of Glycine max with a non-transgenic wild type line all derived from the Williams 82soybean cultivar Previously we examined gene expressionchanges in these transgenic events finding only the 764line to be significantly altered when compared to the othertransgenic lines In the same manner we have expanded ourprevious work and reported transcriptome single nucleotidepolymorphism rates for the ST77 ST111 and 764 trans-genic events compared to wild type Mirroring the resultsfrom our differential expression study the samples of the764 transformation event exhibited transcript SNP ratesnearly double that of the wild type ST77 and ST111 eventswhile also demonstrating higher amounts of transversionmutations While the exact cause for both the large num-ber of differentially expressed genes and polymorphismspresent in the 764 event is not known it is highly likelythat the two phenomena are correlated Our results showthat in addition to differential expression transgenesis mayresult in unpredicted base changes within gene transcripts

and regulatory regions without impacting plant pheno-type

Availability of Supporting Data

The RNA sequencing data described herein may be accessedat httpwwwncbinlmnihgovgeoqueryacccgiacc=GS-E64620 through NCBIrsquos Gene Expression Omnibus underaccession number GSE64620 Summary files for bcftoolsand snpEff outputs encompassing variant calls and predictedvariant effects are available from the iPlant CollaborativeDiscovery Environment directory available at httpdeiplan-tcollaborativeorgdld44044BEE-69B8-4129-A6B5-4C9291-B6B4A1snpeffsummarieszip and httpdeiplantcollabora-tiveorgdldAD464040-9169-4E52-87D5-DEF7466A2C1Evariantsummarieszip Lists of genes containing detectedvariants from all samples are available from iPlant in acompressed zip archive available at httpdeiplantcollabor-ativeorgdld533570A3-1EFB-4864-B9A9-9D82F17E09A8snpeffgeneszip

Competing Interests

Theauthors of this manuscript declare no conflict of interests

Authorsrsquo Contributions

Kevin C Lambirth designed the experiment conducted theSNP detection and effect prediction and generated figuresAdamMWhaley compiled and installed software formattedsequencing data files and was consulted on analysis KevinC Lambirth Kenneth J Piller and Kenneth L Bost wrote themanuscript with input from all authors

Acknowledgments

The authors would like to acknowledge the sequencingefforts of Dr Mike Wang (David H Murdock ResearchInstitute) and Dr Laura Reichenberg (Pfeiffer University) forassistance with plant propagation and selection and Dr KanWang (Iowa State University Agronomy Department) for theestablishment of the ST111 transgenic soybean line

References

[1] A Ziemienowicz ldquoAgrobacterium-mediated plant transforma-tion factors applications and recent advancesrdquoBiocatalysis andAgricultural Biotechnology vol 3 no 4 pp 95ndash102 2014

[2] M S Koch J M Ward S L Levine J A Baum J L Viciniand B G Hammond ldquoThe food and environmental safety of Btcropsrdquo Frontiers in Plant Science vol 6 article 283 2015

[3] G M Dill C A CaJacob and S R Padgette ldquoGlyphosate-resistant crops adoption use and future considerationsrdquo PestManagement Science vol 64 no 4 pp 326ndash331 2008

[4] M A S Valente J A Q A Faria J R L Soares-Ramos et alldquoThe ER luminal binding protein (BiP) mediates an increase indrought tolerance in soybean and delays drought-induced leafsenescence in soybean and tobaccordquo Journal of ExperimentalBotany vol 60 no 2 pp 533ndash546 2009

[5] L C Hudson B S Seabolt J Odle K L Bost C H Stahl andK J Piller ldquoSublethal staphylococcal enterotoxin B challenge

International Journal of Plant Genomics 11

model in pigs to evaluate protection following immunizationwith a soybean-derived vaccinerdquoClinical andVaccine Immunol-ogy vol 20 no 1 pp 24ndash32 2013

[6] Y Zhang D Li X Jin and Z Huang ldquoFighting Ebola withZMapp spotlight on plant-made antibodyrdquo Science China LifeSciences vol 57 no 10 pp 987ndash988 2014

[7] R Powell L C Hudson K C Lambirth et al ldquoRecombinantexpression of homodimeric 660 kDa human thyroglobulin insoybean seeds an alternative source of human thyroglobulinrdquoPlant Cell Reports vol 30 no 7 pp 1327ndash1338 2011

[8] K L Bost K C Lambirth L C Hudson and K J PillerldquoSoybean-derived thyroglobulin as an analyte specific reagentfor in vitro diagnostic tests and devicesrdquo Advances in Medicineand Biology vol 80 pp 23ndash40 2014

[9] R Abiri A Valdiani M Maziah et al ldquoA critical review ofthe concept of transgenic plants insights into pharmaceuticalbiotechnology andmolecular farmingrdquo Current Issues in Molec-ular Biology vol 18 no 1 pp 21ndash42 2016

[10] J L Oakes K L Bost and K J Piller ldquoStability of a soybeanseed-derived vaccine antigen following long-term storage pro-cessing and transport in the absence of a cold chainrdquo Journal ofthe Science of Food andAgriculture vol 89 no 13 pp 2191ndash21992009

[11] R Garg M Tolbert J L Oakes T E Clemente K L Bostand K J Piller ldquoChloroplast targeting of FanC the majorantigenic subunit of Escherichia coli K99 fimbriae in transgenicsoybeanrdquo Plant Cell Reports vol 26 no 7 pp 1011ndash1023 2007

[12] K J Piller T E Clemente S M Jun et al ldquoExpression andimmunogenicity of an Escherichia coli K99 fimbriae subunitantigen in soybeanrdquo Planta vol 222 no 1 pp 6ndash18 2005

[13] L C Hudson R Garg K L Bost and K J Piller ldquoSoybeanseeds a practical host for the production of functional subunitvaccinesrdquo BioMed Research International vol 2014 Article ID340804 13 pages 2014

[14] K L Bost and K J Piller ldquoProtein expression systems whysoybean seedsrdquo in SoybeanmdashMolecular Aspects of Breeding ASudaric Ed Intech Open 2011

[15] D Tilman K G Cassman P A Matson R Naylor and SPolasky ldquoAgricultural sustainability and intensive productionpracticesrdquo Nature vol 418 no 6898 pp 671ndash677 2002

[16] G N De La FuenteU K Frei and T Lubberstedt ldquoAcceleratingplant breedingrdquo Trends in Plant Science vol 18 no 12 pp 667ndash672 2013

[17] L C Hudson K C Lambirth K L Bost and K L PillerAdvancements in Transgenic Soy From Field to Bedside InTech2013

[18] J D Ray T C Kilen C A Abel and R L Paris ldquoSoybeannatural cross-pollination rates under field conditionsrdquo Environ-mental Biosafety Research vol 2 no 2 pp 133ndash138 2003

[19] K L Hart S L Kimura V Mushailov Z M BudimlijaM Prinz and E Wurmbach ldquoImproved eye- and skin-colorprediction based on 8 SNPsrdquo Croatian Medical Journal vol 54no 3 pp 248ndash256 2013

[20] S E Medland T Zayats B Glaser et al ldquoA variant inLIN28B is associatedwith 2D 4D finger-length ratio a putativeretrospective biomarker of prenatal testosterone exposurerdquoTheAmerican Journal of HumanGenetics vol 86 no 4 pp 519ndash5252010

[21] B S Shastry ldquoSNP alleles in human disease and evolutionrdquoJournal of Human Genetics vol 47 no 11 pp 561ndash566 2002

[22] Y-H Li S-C Zhao J-X Ma et al ldquoMolecular footprints ofdomestication and improvement in soybean revealed by wholegenome re-sequencingrdquo BMC Genomics vol 14 article 5792013

[23] T Joshi B Valliyodan J-H Wu S-H Lee D Xu and H TNguyen ldquoGenomic differences between cultivated soybean Gmax and its wild relative G sojardquo BMCGenomics vol 14 articleS5 2013

[24] K C Lambirth AMWhaley I C Blakley et al ldquoA comparisonof transgenic andwild type soybean seeds analysis of transcrip-tome profiles using RNA-Seqrdquo BMC Biotechnology vol 15 no1 article 89 2015

[25] H Li ldquoA statistical framework for SNP callingmutation discov-ery association mapping and population genetical parameterestimation from sequencing datardquo Bioinformatics vol 27 no 21pp 2987ndash2993 2011

[26] H Li B Handsaker A Wysoker et al ldquoThe sequence align-mentMap format and SAMtoolsrdquo Bioinformatics vol 25 no16 pp 2078ndash2079 2009

[27] D M Goodstein S Shu R Howson et al ldquoPhytozome acomparative platform for green plant genomicsrdquo Nucleic AcidsResearch vol 40 no 1 pp D1178ndashD1186 2012

[28] C Trapnell A Roberts L Goff et al ldquoDifferential gene andtranscript expression analysis of RNA-seq experiments withTopHat and Cufflinksrdquo Nature Protocols vol 7 no 3 pp 562ndash578 2012

[29] PDanecek A AutonG Abecasis et al ldquoThe variant call formatand VCFtoolsrdquo Bioinformatics vol 27 no 15 pp 2156ndash21582011

[30] M Krzywinski J Schein I Birol et al ldquoCircos an informationaesthetic for comparative genomicsrdquo Genome Research vol 19no 9 pp 1639ndash1645 2009

[31] P Cingolani A Platts L L Wang et al ldquoA program forannotating and predicting the effects of single nucleotidepolymorphisms SnpEff SNPs in the genome of Drosophilamelanogaster strain w 1118 iso-2 iso-3rdquo Fly vol 6 no 2 pp80ndash92 2012

[32] Z Du X Zhou Y Ling Z Zhang and Z Su ldquoagriGO A GOanalysis toolkit for the agricultural communityrdquo Nucleic AcidsResearch vol 38 no 2 Article ID gkq310 pp W64ndashW70 2010

[33] S AGoffM Vaughn SMcKay et al ldquoThe iPlant collaborativecyberinfrastructure for plant biologyrdquo Frontiers in Plant Sciencevol 2 article 34 2011

[34] Y L Zhu Q J Song D L Hyten et al ldquoSingle-nucleotidepolymorphisms in soybeanrdquo Genetics vol 163 no 3 pp 1123ndash1134 2003

[35] Y Shu Y Li Z Zhu et al ldquoSNPs discovery and CAPS markerconversion in soybeanrdquo Molecular Biology Reports vol 38 no3 pp 1841ndash1846 2011

[36] C B Yadav P Bhareti M Muthamilarasan et al ldquoGenome-wide SNP identification and characterization in two soybeancultivars with contrasting mungbean yellow mosaic india virusdisease resistance traitsrdquo PLoS ONE vol 10 no 4 Article IDe0123897 2015

[37] X Wu C Ren T Joshi T Vuong D Xu and H T NguyenldquoSNP discovery by high-throughput sequencing in soybeanrdquoBMC Genomics vol 11 article 469 2010

[38] Y-H Li J C Reif S A Jackson Y-S Ma R-Z Chang and L-J Qiu ldquoDetecting SNPs underlying domestication-related traitsin soybeanrdquo BMC Plant Biology vol 14 article 251 2014

12 International Journal of Plant Genomics

[39] L C Hudson K L Bost and K J Piller ldquoOptimizing recom-binant protein expression in soybeanrdquo in SoybeanmdashMolecularAspects of Breeding InTech Rijeka Croatia 2011

[40] W Goettel E Xia R Upchurch M-L Wang P Chen andY-Q C An ldquoIdentification and characterization of transcriptpolymorphisms in soybean lines varying in oil composition andcontentrdquo BMC Genomics vol 15 article 299 2014

[41] C Chan X Qi M-W Li F-L Wong and H-M Lam ldquoRecentdevelopments of genomic research in soybeanrdquo Journal ofGenetics and Genomics vol 39 no 7 pp 317ndash324 2012

[42] A A Komar ldquoSNPs silent but not invisiblerdquo Science vol 315no 5811 pp 466ndash467 2007

[43] P Katsonis A Koire S J Wilson et al ldquoSingle nucleotidevariations biological impact and theoretical interpretationrdquoProtein Science vol 23 no 12 pp 1650ndash1666 2014

[44] W J Haun D L Hyten W W Xu et al ldquoThe composition andorigins of genomic variation among individuals of the soybeanreference cultivar williams 82rdquo Plant Physiology vol 155 no 2pp 645ndash655 2011

[45] S DeBolt ldquoCopy number variation shapes genome diversityin Arabidopsis over immediate family generational scalesrdquoGenome Biology and Evolution vol 2 no 1 pp 441ndash453 2010

[46] A Zmienko A Samelak P Kozłowski and M FiglerowiczldquoCopy number polymorphism in plant genomesrdquo Theoreticaland Applied Genetics vol 127 no 1 pp 1ndash18 2014

[47] Y-G Lee N Jeong J H Kim et al ldquoDevelopment validationand genetic analysis of a large soybean SNP genotyping arrayrdquoPlant Journal vol 81 no 4 pp 625ndash636 2015

[48] B Valliyodan Q Dan G Patil et al ldquoLandscape of genomicdiversity and trait discovery in soybeanrdquo Scientific Reports vol6 article 23598 2016

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 5: Research Article Transcript Polymorphism Rates in …downloads.hindawi.com/journals/ijpg/2016/1562041.pdfResearch Article Transcript Polymorphism Rates in Soybean Seed Tissue Are Increased

International Journal of Plant Genomics 5

000

100000

200000

300000

400000

500000

600000

700000

Wild type

(a)

(b)

(c)

(d)

(e)

(f)

(g)

(h)

Silent3892

Nonsense311

Missense5797

Other051

Downstream2253

Exon1980

Intergenic263

Intron2776

Splice siteregion199

Upstream1537

510431

UTR 5998400

UTR 3998400

000

100000

200000

300000

400000

500000

600000

700000

Transgenic

39

3

58

046

2332

1968

283

2675183

1578

521

414

Silent

Nonsense

Missense

Other

Downstream

Exon

Intergenic

Intron

Splice siteregion

Upstream

UTR 5998400

UTR 3998400

000

100000

200000

300000

400000

500000

600000

700000

041

2329

2137

291

2491213

1516

555427

3877

3095814

Silent

NonsenseMissense

Other

Downstream

Exon

Intergenic

Intron

Splice siteregion

Upstream

UTR 5998400

UTR 3998400

000

100000

200000

300000

400000

500000

600000

700000

4448

181

5371

003

2505

1994

377

2357125

1619

Silent

Nonsense

Missense

Other

Downstream

Exon

Intergenic

Intron

Splice siteregion

Upstream

UTR 5998400

UTR 3998400

409

584

ArarrC

ArarrG

ArarrT

CrarrA

CrarrG

CrarrT

GrarrA

GrarrC

GrarrT

TrarrA

TrarrC

TrarrG

Figure 2 Polymorphism nucleotide base changes and predicted effects Mean rates of change for each nucleotide base are shown for wildtype (a) ST111 (b) ST77 (c) and 764 (d) with bars indicating the standard error of the mean Regions containing detected polymorphismsand their functional classifications are shown for wild type (e) ST111 (f) ST77 (g) and 764 (h)

6 International Journal of Plant Genomics

limited to SNPs insertions and deletions of bases Acrossall experimental groups polymorphisms classified as loweffect comprise sim10 of the total while moderate effects andhigh effects average 12 and 2 respectively The largestcategory of impact classification is the genomic modifiercategory comprising 72ndash78 of detected polymorphisms inall groups and 60 of shared variants between all transgenics(see snpEff reports in Supporting Data) Missense mutationsare highly prevalent in all four groups averaging sim60in wild type ST111 and ST77 events with a missense tosilent mutation ratio of sim16 (Figures 2(e)ndash2(g)) The 764group exhibited a slightly lower missense percentage ofsim53 however it also contained a higher percentage ofsilent mutations at sim45 compared to the other three groups(sim37) generating a missensesilent mutation ratio of sim120(Figure 2(h)) Nonsense mutations comprised sim3 of thetotal reported SNP effects for the wild type ST111 and ST77groups and sim18 in the 764 group (Figures 2(e)ndash2(h))

All four experimental groups showed a high prevalenceof SNPs of common types namely resulting in down-stream gene variants (sim24) intron variants (sim25) mis-sense variants (sim11) synonymous variants (sim7) andupstream gene variants (sim16)The most common locationsfor these variants are intron regions (sim25) the down-stream transcription start site (sim23) exons (sim23) introns(sim23) and the upstream transcription start site (sim16)The 764 group distribution of polymorphism types andaffected regions seem to mirror those of the wild typeST111 and ST77 groups with a slight reduction of SNPs inexonic regions All individuals also reported variants within51015840 and 31015840 untranslated regions (sim4 and sim6 resp) aswell as minimal detection of frameshifts stop and startvariants (lt1) Regions and predicted effects of detectedpolymorphisms are shown in Figures 2(e)ndash2(h)

Transition base changes of AharrG and CharrT were mostprevalent in all four experimental groups while the transver-sion AharrT was sim50 more common than AharrC CharrG orGharrT transversions Codon changes varied most commonlyat the wobble base position typically resulting from a tran-sition in both wild type and transgenic samples FollowingSNP patterns previously described the prevailing modifiedcodons appeared commonly shared between all individualsmeasured Base changes in the first and second positionsof the codon were much less frequent across all groupsHeat maps of the codon base changes representative for eachexperimental group are shown in Supplemental Figure 1Predicted amino acid changes resulting from codon alter-ations predominantly consist of synonymous substitutionshowever other frequently altered residues include valineto alanine alanine to valine leucine to phenylalanine andproline glutamate to glycine and lysine serine to proline andphenylalanine to leucine Supplemental Figure 2 shows thedistribution heat maps of detected amino acid substitutionaverages representative for all events

Minor changes were also detected within the additionalgenomic scaffolds containing the transgene sequences sup-plemented to the alignment reference file Specifically snpEffreported SNPs in the transgenes of samples 764B3 764H3and 764K1 at position 80 ST77D2 at position 1400 ST77F1

at positions 500 and 700 ST77J3 at position 600 ST111B3 atposition 800 ST111I3 at position 190 and ST111K1 ST111B1and ST111K2 at position 1010 Read alignments were unableto verify all called variants in the transgenes likely due toareas of low coverage however one area was corroborated asa consistent variant in the fillerDNAof all constructs betweenthe right border and promoter region (Supplemental Figure3) which was included as a control Supplemental Table 1summarizes all SNP calls and between group statistical testsSummaries of all SNPINDEL calls from bcftools and theirrespective effects from snpEff along with chromosomal dis-tribution plots additional statistics and lists of SNPs sharedbetween groups and within each group are available from theiPlant collaborative Discovery Environment [33] Links to therespective datasets are provided in the SupportingData of thismanuscript

33 Gene Ontology Analysis of Genes Containing SNPs Giventhe extent of detected polymorphic sites and the possibilitiesof effectual contributions to gene expression changes previ-ously documented in the 764 transgenic line it was necessaryto investigate genes containing SNPs and INDELS withgene ontology to identify the possible functional pathwayscontrolled by the affected transcripts Complete gene listscontaining SNPs provided by the output of snpEff for thewild type (36959 transcripts) ST77 (38691 transcripts)ST111 (34142 transcripts) and 764 (43426 transcripts) lineswere loaded into the AgriGO online gene ontology analysistool and subjected to a single enrichment analysis withmultiple corrections to deduce the functional roles of poly-morphic genes Out of the total Glycine maxV21 backgroundset of 29501 GO terms wild type matched 11245 ST77matched 11660 ST111 matched 10326 and 764 matched13321 GO categories were similar between the wild typeST77 and ST111 groups with minor variations with themajority of terms putatively annotated to translational orRNA processes (Figures 3(a) 3(b) and 3(d)) Intracellulartransport also constituted a large portion of the identifiedcategories (sim12 of total annotations) and in the case ofthe 764 transgenic line was the only other significant GOterm following RNA processing and binding (Figure 3(c))Three categories including ATP-dependent helicase activityubiquitin-dependent protein catabolic processes and ncRNAmetabolic processes were all unique to the ST77 and ST111lines (Figures 3(b) and 3(d)) GTP binding was exclusive tothe ST111 line comprising 21 of the total significant GOterms for the event Figure 4 shows all detected SNPs in achromosomal density map for all four experimental groups

Annotated GO terms for SNPs exclusive to each trans-genic event displayed similar functions with ST77 reporting295 gene variants involved with intracellular organellesphotosystem II cytoskeletal components and protein com-plexes ST111 displayed similar numbers of terms totaling 273annotations including photosynthesis and photosystem IIintracellular organelles ribonucleoprotein complexes thy-lakoids protein complexes and oxidoreductase activity The764 line exhibited the highest number of unique annotatedvariants with 2366 involving translation gene expressionbiosynthetic processes ribosomal structures and protein

International Journal of Plant Genomics 7

Wild typeaminoacylation

tRNA

for proteintranslation

5RNA binding

23

Vesicle-mediatedtransport

13Intracellularprotein transport

12

mRNA metabolic process

4

RNA processing13

Translation30

(a)

ST77ATP-dependenthelicase activity

10

RNA binding19

Ribonucleoproteincomplex biogenesis

4

Intracellularprotein transport

11

ncRNAmetabolic process

10

RNA processing12

Translation27

Ubiquitin-dependentprotein catabolic

process7

(b)

764

Intracellulartransport

25

RNA processing27

RNA binding48

(c)

ST111 ATP-dependenthelicase activity

13Intracellular

proteintransport

13

Ubiquitin-dependent

protein catabolicprocess

10

ncRNA metabolicprocess

13

mRNA processing4

RNA binding26

GTP binding21

(d)

Figure 3 AgriGO enriched gene ontology categories of SNP-containing genes Single enrichment analyses for all genes containing SNPs areshown for wild type (a) ST77 (b) 764 (c) and ST111 (d) Percentages are reflective of how many terms matched the indicated GO group outof the total matches to the background reference

complexes Of the 127 polymorphic genes shared by alltransgenic specimens 40 annotations returned 10 GO termspertaining to intracellular organelles macromolecular com-plexes and protein complexes Specific GO terms for eachgroup are shown in Figure 3 and complete AgriGO featurerelationship trees are shown in Supplemental Figure 4

4 Discussion

SNP rates have been previously evaluated in soybean usingmultiple fragment analysis on several genotypes of culti-vated varieties including Lincoln Mandarin Peking andRichland [34 35] From this fragment analysis transitionand transversion rates were reported to be nearly identicalbetween cultivars at 48 and 52 respectively Furthermorenucleotide diversity rates of cultivated soybean varietieswere estimated to be 5ndash8-fold less than the wild varietyGlycine soja and occur at an even lower frequency than thehighly characterizedArabidopsis thaliana self-crossingmodel[34] More recent investigations of SNPs and INDELS in

soybean using next generation resequencing have revealedgenome-wide polymorphisms in an effort to identify nativedisease resistant and favorable trait loci [36 37] Furtherstudies demonstrate the extensive narrowing of soybeangenomic variation due to selective pressures resulting fromdomestication practices which favor more valued traits suchas increased seed mass and oil content quantitative traitloci [38] In addition the self-pollinating nature of Glycinemax also contributes evolutionarily to the constriction ofgenomic variances in domesticated cultivars Thus withsuch selective pressures bottlenecking natural variations incultivated soybean varieties it is quite intriguing to seesuch consistently increased numbers of SNPs in only onetransgenic experimental group with no definitive initiatingevent

SNPs detected in our datasets were divulged from seedtranscriptome sequences which represent sim65 of the totalsoybean genome and 65 of total genomic protein codingsequences The SNPs described here are not considered tobe representative of the frequencies that may be detected in

8 International Journal of Plant Genomics

Wild type

chr1chr2

chr3

chr4chr5

chr6

chr7

chr8

chr9chr10chr11

chr12

chr13

chr14

chr15

chr16

chr17

chr18

chr19chr20

(a)

chr1chr2

chr3

chr4chr5

chr6

chr7

chr8

chr9chr10chr11

chr12

chr13

chr14

chr15

chr16

chr17

chr18

chr19chr20 1

chr2

chr3

chr4

ch6

chrhr7

chr8

chr9ch 1

chr12

chr13

chr14

chr17

chr18

chr19ch

ST77

(b)chr1

chr2

chr3

chr4chr5

chr6

chr7

chr8

chr9chr10chr11

chr12

chr13

chr14

chr15

chr16

chr17

chr18

chr19chr20 1

chr2

chr3

chr4

ch6

chr7

chr8

chr9ch 1

chr12

chr13

chr14

chr17

chr18

chr19ch

764

(c)

ST111

chr1chr2

chr3

chr4chr5

chr6

chr7

chr8

chr9chr10chr11

chr12

chr13

chr14

chr15

chr16

chr17

chr18

chr19chr20

(d)

Figure 4 Overview of the chromosomal distribution of detected transcript SNPs Aligned SNPs are shown for each soybean chromosomein the wild type controls (a) ST77 (b) 764 (c) and ST111 (d) transgenics The colored bars indicate each chromosome scaled to length witha bold outer hash mark representing every 10Mb of sequence and each nonbolded outer hash mark placed every 1Mb Each internal tickrepresents a detected SNP on which subsequent SNPs are stacked vertically from each transcript at that region

other soybean tissues or in different developmental stagesof these specific transgenic plants rather they demonstrateincreased transcript polymorphism rateswitnessed in a singlesoybean transformant out of several different experimentalgroups Although RNA-seq is focused on the functionalsegments of the genome and does not always capture regu-latory regions such as promoters or nontranscribed regions(eg methylated bases) one aspect of the resulting effectsof polymorphisms in these regions can be directly witnessedthrough examination of gene expression levels Because thesoybean expression systemdescribed previously by our lab [5

7 8 13 14 39] specifically targets seed tissue for recombinantprotein accumulation it was of great interest to identifypossible sequence alterations that may have resulted in geneexpression or protein structure variations in seeds which todate has not been investigated

Base changes for wild type ST77 and ST111 events wereall comparable while the 764 event consistently reportedSNP rates nearly double that of the other groups (1 SNPdetected every sim22000 bases) which is still well below thepreviously reported SNP rate of 1 SNP per sim1400 nucleotidesin soybean seeds [40] Nevertheless SNP base changes

International Journal of Plant Genomics 9

appeared to follow the same pattern of commonality withall groups demonstrating high percentages of transition basechanges with relatively low transversion counts The 764 linedid demonstrate a lower TsTv ratio than the other groups(Figure 1(d)) although progeny from two independent par-ents in the ST77 line (ST77D and ST77F) exhibited lowerratios as well indicating that this characteristic alone cannot reliably indicate internal stresses or overall divergencefrom our controls The majority of detected base changeswere located in the third base of the codon likely gener-ating synonymous (silent) polymorphisms which were themajority of predicted SNP classifications in all events Lesscommon changes occurred in the first or second bases of thecodon generating a nonsynonymous or missense mutationlikely resulting in an amino acid change (Figures 2(e)ndash2(h))Missense to silent ratios detected in all experimental groupsappear to be higher than previously published results forsoybean indicated [40] although the higher nonsynonymousto synonymous mutations may be due to high linkagedisequilibrium exhibited in soy [41]

Interestingly the 764 line had the lowest overall missenseto silent polymorphism ratio demonstrating that while the764 line contained the highest overall number of SNPs ahigher percentage of the total polymorphisms were silentmutations (sim45) compared to the other three experimentalgroups (Figure 2(h)) This reveals the possibility that SNPsand INDELS detected here had originated from the origi-nal transformation event and have been highly conservedthrough self-crossed generations of progeny particularlyconsidering how remarkably well conserved the attributesof detected SNPs were between different biological samplesof the same transformation event Without sequencing datafrom prior generations of each parental line tested herefor comparison this can not be confirmed with completeconfidence however this does pose an interesting inquirythat even silent SNPsmay have the ability to alter gene expres-sion and peptide structure Indeed it has been suggestedthat although synonymous polymorphisms code for identicalamino acids subtle changes resulting from the utilization ofnonoptimal synonymous codons can modulate downstreamtranscription and translation rates [42] thereby altering thefolding conformations and proposed function of the finalpeptides This may explain why the 764 transgenic eventexhibited the highest rates of differentially expressed genesin conjunction with polymorphisms but still maintained ahealthy phenotype

While many SNP studies have excluded synonymousmutations and those present in noncoding regions recentworks have examined these sequences in light of theirpossible epigenetic effects on transcription and translationhowever more advanced tools for the prediction of possibleeffects are still in development and require further testingfor reliable forecasts [43] Nonsynonymous polymorphismsappear at similar rates across the experimental groupsvarying between 50 and 60 of total functional SNP clas-sifications While nonsynonymous mutations nearly alwaysalter amino acid sequences that can potentially yield non-functional protein products this phenomenonmay produce aneutral effect on the organism or protein or activate signaling

cascades in redundant systems [43] The effects of thesedetected nonsynonymous polymorphisms in each transgenicevent are not known or identifiable without extensive pro-teomic investigations into the altered downstream transcriptproductsThe 764 line which previously displayed the largestdegree of differential gene expression exhibited the lowestpercentage of missense and nonsense polymorphisms there-fore these alone are not likely to be the root cause of observedchanges in this transgenic line

Spontaneous mutations may also occur as a result of apreviously occurring stressful event such as parental lineAgrobacterium transformation Epigenetic alterations andnucleotide transpositions have been detected up to fivegenerations forward from transformation in Arabidopsis [4445] however all seeds examined here were bred to orbeyond the 5th generation of progeny The causes for thesemutations are unknown however each transformant was anindependent transformation event and was likely exposedto a varying degree of stress during the process This couldpotentially have introduced genomic variations that havebeen carried forward to the transcriptome as even largecopy number of variations have been reported in plants asa response to stress [46] Segregation of SNPs from existingparental heterozygous loci is expected in progeny howeverall samples examined here were remarkably consistent inSNP numbers location distributions and types within theirexperimental groups further demonstrating the genomicstability of these events This further solidifies the possibilitythat the detected variations in all transgenic lines arose froman early parental event possibly occurring during the originalAgrobacterium transformation or planting event and havebeen stably integrated and carried forward to the generationsexamined Considering the consistency of the SNP patternsthese polymorphisms have likely originated at the genomiclevel and are not likely to appear in such large quantities dueto transcriptional proofreading errors

Detected alterations occurred in a ldquobowtierdquo-shapeddistribution across the 20 chromosomes with lowerinstances of SNPs near the centromeres and the highestincidence of SNPs near the euchromatic telomeric ends of thechromosomal arms This is where gene density is the highestwhich has been illustrated previously by several worksconducted on soybean polymorphism rates [36 38 47] OursnpEff dataset summaries available in the Supporting Dataof this manuscript show the same distribution pattern withno significant alterations of SNP rates between biologicalreplicates of progeny Without complete characterization ofthe targeted SNPs in this work future directions can invokefunctional characterizations to cross-reference detectedpolymorphisms with possible connections to differentiallyexpressed genes and also to improve existing annotatedWilliams 82 cultivar SNPs In order to fully deduce the originof SNP variations with high confidence multiple generationanalyses will need to be conducted Approximately 70 ofthe genes that were differentially expressed also containedSNPs or INDELS however due to the extreme disparityin size between each gene list (a maximum of sim3000 andsim40000 genes resp) this could simply be due to chanceand can not definitively hold concrete biological significance

10 International Journal of Plant Genomics

without further investigation This suggests that the SNPsexamined here may directly impact native gene expressionlevels possibly due to their locations within gene regulatoryregions or enhancer elements located near the telomericends of the chromosomal arms

Gene ontology terms from SNP gene lists of all experi-mental groups were very similar returning many enrichedterms regarding protein transport and localization as wellas RNA and transcriptional processes even in the wild typegroup (Figure 3 and Supplemental Figure 4) Interestinglythese relationships are highly similar to the enriched GOterms derived from the differentially expressed gene sets ofthe transgenic lines indicating the majority of functionalrelationships between these polymorphisms are likely arisingfrom intercultivar variations and are not specifically relatedto transgenesis effects However given the possibility thatmutations from transgenesis could occur within QTLs thatcontrol traits such as plant size flower count and seedquantity identification of novel polymorphisms in theseregions of soybeanrsquos transcriptome is of great importanceto both plant breeders and in biotechnology applicationsIn fact recent investigations have revealed an excess of9 million unique SNPs that occur between 106 differentcultivated soybean varieties [48] Here we have providedlists of exonic SNPs unique to each of the three transgenicgroups and specific to transformation as well as novel SNPsdetected in our Williams 82 wild type controls Furtherinvestigations of the events described will identify the sitesof T-DNA integration and characterize the surroundingsequences to determine whether increased incidences ofpolymorphisms occur within the immediate vicinities of theinsertions Unique SNPs identified and reported here may beused to aide these investigations of crucial trait loci in thefuture to maximize yields and assist in breeding programs oftransgenic specimens

5 Conclusions

This study describes the first direct comparison of poly-morphism rates present in the seed transcriptome of threeindependent transgenic lines of Glycine max with a non-transgenic wild type line all derived from the Williams 82soybean cultivar Previously we examined gene expressionchanges in these transgenic events finding only the 764line to be significantly altered when compared to the othertransgenic lines In the same manner we have expanded ourprevious work and reported transcriptome single nucleotidepolymorphism rates for the ST77 ST111 and 764 trans-genic events compared to wild type Mirroring the resultsfrom our differential expression study the samples of the764 transformation event exhibited transcript SNP ratesnearly double that of the wild type ST77 and ST111 eventswhile also demonstrating higher amounts of transversionmutations While the exact cause for both the large num-ber of differentially expressed genes and polymorphismspresent in the 764 event is not known it is highly likelythat the two phenomena are correlated Our results showthat in addition to differential expression transgenesis mayresult in unpredicted base changes within gene transcripts

and regulatory regions without impacting plant pheno-type

Availability of Supporting Data

The RNA sequencing data described herein may be accessedat httpwwwncbinlmnihgovgeoqueryacccgiacc=GS-E64620 through NCBIrsquos Gene Expression Omnibus underaccession number GSE64620 Summary files for bcftoolsand snpEff outputs encompassing variant calls and predictedvariant effects are available from the iPlant CollaborativeDiscovery Environment directory available at httpdeiplan-tcollaborativeorgdld44044BEE-69B8-4129-A6B5-4C9291-B6B4A1snpeffsummarieszip and httpdeiplantcollabora-tiveorgdldAD464040-9169-4E52-87D5-DEF7466A2C1Evariantsummarieszip Lists of genes containing detectedvariants from all samples are available from iPlant in acompressed zip archive available at httpdeiplantcollabor-ativeorgdld533570A3-1EFB-4864-B9A9-9D82F17E09A8snpeffgeneszip

Competing Interests

Theauthors of this manuscript declare no conflict of interests

Authorsrsquo Contributions

Kevin C Lambirth designed the experiment conducted theSNP detection and effect prediction and generated figuresAdamMWhaley compiled and installed software formattedsequencing data files and was consulted on analysis KevinC Lambirth Kenneth J Piller and Kenneth L Bost wrote themanuscript with input from all authors

Acknowledgments

The authors would like to acknowledge the sequencingefforts of Dr Mike Wang (David H Murdock ResearchInstitute) and Dr Laura Reichenberg (Pfeiffer University) forassistance with plant propagation and selection and Dr KanWang (Iowa State University Agronomy Department) for theestablishment of the ST111 transgenic soybean line

References

[1] A Ziemienowicz ldquoAgrobacterium-mediated plant transforma-tion factors applications and recent advancesrdquoBiocatalysis andAgricultural Biotechnology vol 3 no 4 pp 95ndash102 2014

[2] M S Koch J M Ward S L Levine J A Baum J L Viciniand B G Hammond ldquoThe food and environmental safety of Btcropsrdquo Frontiers in Plant Science vol 6 article 283 2015

[3] G M Dill C A CaJacob and S R Padgette ldquoGlyphosate-resistant crops adoption use and future considerationsrdquo PestManagement Science vol 64 no 4 pp 326ndash331 2008

[4] M A S Valente J A Q A Faria J R L Soares-Ramos et alldquoThe ER luminal binding protein (BiP) mediates an increase indrought tolerance in soybean and delays drought-induced leafsenescence in soybean and tobaccordquo Journal of ExperimentalBotany vol 60 no 2 pp 533ndash546 2009

[5] L C Hudson B S Seabolt J Odle K L Bost C H Stahl andK J Piller ldquoSublethal staphylococcal enterotoxin B challenge

International Journal of Plant Genomics 11

model in pigs to evaluate protection following immunizationwith a soybean-derived vaccinerdquoClinical andVaccine Immunol-ogy vol 20 no 1 pp 24ndash32 2013

[6] Y Zhang D Li X Jin and Z Huang ldquoFighting Ebola withZMapp spotlight on plant-made antibodyrdquo Science China LifeSciences vol 57 no 10 pp 987ndash988 2014

[7] R Powell L C Hudson K C Lambirth et al ldquoRecombinantexpression of homodimeric 660 kDa human thyroglobulin insoybean seeds an alternative source of human thyroglobulinrdquoPlant Cell Reports vol 30 no 7 pp 1327ndash1338 2011

[8] K L Bost K C Lambirth L C Hudson and K J PillerldquoSoybean-derived thyroglobulin as an analyte specific reagentfor in vitro diagnostic tests and devicesrdquo Advances in Medicineand Biology vol 80 pp 23ndash40 2014

[9] R Abiri A Valdiani M Maziah et al ldquoA critical review ofthe concept of transgenic plants insights into pharmaceuticalbiotechnology andmolecular farmingrdquo Current Issues in Molec-ular Biology vol 18 no 1 pp 21ndash42 2016

[10] J L Oakes K L Bost and K J Piller ldquoStability of a soybeanseed-derived vaccine antigen following long-term storage pro-cessing and transport in the absence of a cold chainrdquo Journal ofthe Science of Food andAgriculture vol 89 no 13 pp 2191ndash21992009

[11] R Garg M Tolbert J L Oakes T E Clemente K L Bostand K J Piller ldquoChloroplast targeting of FanC the majorantigenic subunit of Escherichia coli K99 fimbriae in transgenicsoybeanrdquo Plant Cell Reports vol 26 no 7 pp 1011ndash1023 2007

[12] K J Piller T E Clemente S M Jun et al ldquoExpression andimmunogenicity of an Escherichia coli K99 fimbriae subunitantigen in soybeanrdquo Planta vol 222 no 1 pp 6ndash18 2005

[13] L C Hudson R Garg K L Bost and K J Piller ldquoSoybeanseeds a practical host for the production of functional subunitvaccinesrdquo BioMed Research International vol 2014 Article ID340804 13 pages 2014

[14] K L Bost and K J Piller ldquoProtein expression systems whysoybean seedsrdquo in SoybeanmdashMolecular Aspects of Breeding ASudaric Ed Intech Open 2011

[15] D Tilman K G Cassman P A Matson R Naylor and SPolasky ldquoAgricultural sustainability and intensive productionpracticesrdquo Nature vol 418 no 6898 pp 671ndash677 2002

[16] G N De La FuenteU K Frei and T Lubberstedt ldquoAcceleratingplant breedingrdquo Trends in Plant Science vol 18 no 12 pp 667ndash672 2013

[17] L C Hudson K C Lambirth K L Bost and K L PillerAdvancements in Transgenic Soy From Field to Bedside InTech2013

[18] J D Ray T C Kilen C A Abel and R L Paris ldquoSoybeannatural cross-pollination rates under field conditionsrdquo Environ-mental Biosafety Research vol 2 no 2 pp 133ndash138 2003

[19] K L Hart S L Kimura V Mushailov Z M BudimlijaM Prinz and E Wurmbach ldquoImproved eye- and skin-colorprediction based on 8 SNPsrdquo Croatian Medical Journal vol 54no 3 pp 248ndash256 2013

[20] S E Medland T Zayats B Glaser et al ldquoA variant inLIN28B is associatedwith 2D 4D finger-length ratio a putativeretrospective biomarker of prenatal testosterone exposurerdquoTheAmerican Journal of HumanGenetics vol 86 no 4 pp 519ndash5252010

[21] B S Shastry ldquoSNP alleles in human disease and evolutionrdquoJournal of Human Genetics vol 47 no 11 pp 561ndash566 2002

[22] Y-H Li S-C Zhao J-X Ma et al ldquoMolecular footprints ofdomestication and improvement in soybean revealed by wholegenome re-sequencingrdquo BMC Genomics vol 14 article 5792013

[23] T Joshi B Valliyodan J-H Wu S-H Lee D Xu and H TNguyen ldquoGenomic differences between cultivated soybean Gmax and its wild relative G sojardquo BMCGenomics vol 14 articleS5 2013

[24] K C Lambirth AMWhaley I C Blakley et al ldquoA comparisonof transgenic andwild type soybean seeds analysis of transcrip-tome profiles using RNA-Seqrdquo BMC Biotechnology vol 15 no1 article 89 2015

[25] H Li ldquoA statistical framework for SNP callingmutation discov-ery association mapping and population genetical parameterestimation from sequencing datardquo Bioinformatics vol 27 no 21pp 2987ndash2993 2011

[26] H Li B Handsaker A Wysoker et al ldquoThe sequence align-mentMap format and SAMtoolsrdquo Bioinformatics vol 25 no16 pp 2078ndash2079 2009

[27] D M Goodstein S Shu R Howson et al ldquoPhytozome acomparative platform for green plant genomicsrdquo Nucleic AcidsResearch vol 40 no 1 pp D1178ndashD1186 2012

[28] C Trapnell A Roberts L Goff et al ldquoDifferential gene andtranscript expression analysis of RNA-seq experiments withTopHat and Cufflinksrdquo Nature Protocols vol 7 no 3 pp 562ndash578 2012

[29] PDanecek A AutonG Abecasis et al ldquoThe variant call formatand VCFtoolsrdquo Bioinformatics vol 27 no 15 pp 2156ndash21582011

[30] M Krzywinski J Schein I Birol et al ldquoCircos an informationaesthetic for comparative genomicsrdquo Genome Research vol 19no 9 pp 1639ndash1645 2009

[31] P Cingolani A Platts L L Wang et al ldquoA program forannotating and predicting the effects of single nucleotidepolymorphisms SnpEff SNPs in the genome of Drosophilamelanogaster strain w 1118 iso-2 iso-3rdquo Fly vol 6 no 2 pp80ndash92 2012

[32] Z Du X Zhou Y Ling Z Zhang and Z Su ldquoagriGO A GOanalysis toolkit for the agricultural communityrdquo Nucleic AcidsResearch vol 38 no 2 Article ID gkq310 pp W64ndashW70 2010

[33] S AGoffM Vaughn SMcKay et al ldquoThe iPlant collaborativecyberinfrastructure for plant biologyrdquo Frontiers in Plant Sciencevol 2 article 34 2011

[34] Y L Zhu Q J Song D L Hyten et al ldquoSingle-nucleotidepolymorphisms in soybeanrdquo Genetics vol 163 no 3 pp 1123ndash1134 2003

[35] Y Shu Y Li Z Zhu et al ldquoSNPs discovery and CAPS markerconversion in soybeanrdquo Molecular Biology Reports vol 38 no3 pp 1841ndash1846 2011

[36] C B Yadav P Bhareti M Muthamilarasan et al ldquoGenome-wide SNP identification and characterization in two soybeancultivars with contrasting mungbean yellow mosaic india virusdisease resistance traitsrdquo PLoS ONE vol 10 no 4 Article IDe0123897 2015

[37] X Wu C Ren T Joshi T Vuong D Xu and H T NguyenldquoSNP discovery by high-throughput sequencing in soybeanrdquoBMC Genomics vol 11 article 469 2010

[38] Y-H Li J C Reif S A Jackson Y-S Ma R-Z Chang and L-J Qiu ldquoDetecting SNPs underlying domestication-related traitsin soybeanrdquo BMC Plant Biology vol 14 article 251 2014

12 International Journal of Plant Genomics

[39] L C Hudson K L Bost and K J Piller ldquoOptimizing recom-binant protein expression in soybeanrdquo in SoybeanmdashMolecularAspects of Breeding InTech Rijeka Croatia 2011

[40] W Goettel E Xia R Upchurch M-L Wang P Chen andY-Q C An ldquoIdentification and characterization of transcriptpolymorphisms in soybean lines varying in oil composition andcontentrdquo BMC Genomics vol 15 article 299 2014

[41] C Chan X Qi M-W Li F-L Wong and H-M Lam ldquoRecentdevelopments of genomic research in soybeanrdquo Journal ofGenetics and Genomics vol 39 no 7 pp 317ndash324 2012

[42] A A Komar ldquoSNPs silent but not invisiblerdquo Science vol 315no 5811 pp 466ndash467 2007

[43] P Katsonis A Koire S J Wilson et al ldquoSingle nucleotidevariations biological impact and theoretical interpretationrdquoProtein Science vol 23 no 12 pp 1650ndash1666 2014

[44] W J Haun D L Hyten W W Xu et al ldquoThe composition andorigins of genomic variation among individuals of the soybeanreference cultivar williams 82rdquo Plant Physiology vol 155 no 2pp 645ndash655 2011

[45] S DeBolt ldquoCopy number variation shapes genome diversityin Arabidopsis over immediate family generational scalesrdquoGenome Biology and Evolution vol 2 no 1 pp 441ndash453 2010

[46] A Zmienko A Samelak P Kozłowski and M FiglerowiczldquoCopy number polymorphism in plant genomesrdquo Theoreticaland Applied Genetics vol 127 no 1 pp 1ndash18 2014

[47] Y-G Lee N Jeong J H Kim et al ldquoDevelopment validationand genetic analysis of a large soybean SNP genotyping arrayrdquoPlant Journal vol 81 no 4 pp 625ndash636 2015

[48] B Valliyodan Q Dan G Patil et al ldquoLandscape of genomicdiversity and trait discovery in soybeanrdquo Scientific Reports vol6 article 23598 2016

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 6: Research Article Transcript Polymorphism Rates in …downloads.hindawi.com/journals/ijpg/2016/1562041.pdfResearch Article Transcript Polymorphism Rates in Soybean Seed Tissue Are Increased

6 International Journal of Plant Genomics

limited to SNPs insertions and deletions of bases Acrossall experimental groups polymorphisms classified as loweffect comprise sim10 of the total while moderate effects andhigh effects average 12 and 2 respectively The largestcategory of impact classification is the genomic modifiercategory comprising 72ndash78 of detected polymorphisms inall groups and 60 of shared variants between all transgenics(see snpEff reports in Supporting Data) Missense mutationsare highly prevalent in all four groups averaging sim60in wild type ST111 and ST77 events with a missense tosilent mutation ratio of sim16 (Figures 2(e)ndash2(g)) The 764group exhibited a slightly lower missense percentage ofsim53 however it also contained a higher percentage ofsilent mutations at sim45 compared to the other three groups(sim37) generating a missensesilent mutation ratio of sim120(Figure 2(h)) Nonsense mutations comprised sim3 of thetotal reported SNP effects for the wild type ST111 and ST77groups and sim18 in the 764 group (Figures 2(e)ndash2(h))

All four experimental groups showed a high prevalenceof SNPs of common types namely resulting in down-stream gene variants (sim24) intron variants (sim25) mis-sense variants (sim11) synonymous variants (sim7) andupstream gene variants (sim16)The most common locationsfor these variants are intron regions (sim25) the down-stream transcription start site (sim23) exons (sim23) introns(sim23) and the upstream transcription start site (sim16)The 764 group distribution of polymorphism types andaffected regions seem to mirror those of the wild typeST111 and ST77 groups with a slight reduction of SNPs inexonic regions All individuals also reported variants within51015840 and 31015840 untranslated regions (sim4 and sim6 resp) aswell as minimal detection of frameshifts stop and startvariants (lt1) Regions and predicted effects of detectedpolymorphisms are shown in Figures 2(e)ndash2(h)

Transition base changes of AharrG and CharrT were mostprevalent in all four experimental groups while the transver-sion AharrT was sim50 more common than AharrC CharrG orGharrT transversions Codon changes varied most commonlyat the wobble base position typically resulting from a tran-sition in both wild type and transgenic samples FollowingSNP patterns previously described the prevailing modifiedcodons appeared commonly shared between all individualsmeasured Base changes in the first and second positionsof the codon were much less frequent across all groupsHeat maps of the codon base changes representative for eachexperimental group are shown in Supplemental Figure 1Predicted amino acid changes resulting from codon alter-ations predominantly consist of synonymous substitutionshowever other frequently altered residues include valineto alanine alanine to valine leucine to phenylalanine andproline glutamate to glycine and lysine serine to proline andphenylalanine to leucine Supplemental Figure 2 shows thedistribution heat maps of detected amino acid substitutionaverages representative for all events

Minor changes were also detected within the additionalgenomic scaffolds containing the transgene sequences sup-plemented to the alignment reference file Specifically snpEffreported SNPs in the transgenes of samples 764B3 764H3and 764K1 at position 80 ST77D2 at position 1400 ST77F1

at positions 500 and 700 ST77J3 at position 600 ST111B3 atposition 800 ST111I3 at position 190 and ST111K1 ST111B1and ST111K2 at position 1010 Read alignments were unableto verify all called variants in the transgenes likely due toareas of low coverage however one area was corroborated asa consistent variant in the fillerDNAof all constructs betweenthe right border and promoter region (Supplemental Figure3) which was included as a control Supplemental Table 1summarizes all SNP calls and between group statistical testsSummaries of all SNPINDEL calls from bcftools and theirrespective effects from snpEff along with chromosomal dis-tribution plots additional statistics and lists of SNPs sharedbetween groups and within each group are available from theiPlant collaborative Discovery Environment [33] Links to therespective datasets are provided in the SupportingData of thismanuscript

33 Gene Ontology Analysis of Genes Containing SNPs Giventhe extent of detected polymorphic sites and the possibilitiesof effectual contributions to gene expression changes previ-ously documented in the 764 transgenic line it was necessaryto investigate genes containing SNPs and INDELS withgene ontology to identify the possible functional pathwayscontrolled by the affected transcripts Complete gene listscontaining SNPs provided by the output of snpEff for thewild type (36959 transcripts) ST77 (38691 transcripts)ST111 (34142 transcripts) and 764 (43426 transcripts) lineswere loaded into the AgriGO online gene ontology analysistool and subjected to a single enrichment analysis withmultiple corrections to deduce the functional roles of poly-morphic genes Out of the total Glycine maxV21 backgroundset of 29501 GO terms wild type matched 11245 ST77matched 11660 ST111 matched 10326 and 764 matched13321 GO categories were similar between the wild typeST77 and ST111 groups with minor variations with themajority of terms putatively annotated to translational orRNA processes (Figures 3(a) 3(b) and 3(d)) Intracellulartransport also constituted a large portion of the identifiedcategories (sim12 of total annotations) and in the case ofthe 764 transgenic line was the only other significant GOterm following RNA processing and binding (Figure 3(c))Three categories including ATP-dependent helicase activityubiquitin-dependent protein catabolic processes and ncRNAmetabolic processes were all unique to the ST77 and ST111lines (Figures 3(b) and 3(d)) GTP binding was exclusive tothe ST111 line comprising 21 of the total significant GOterms for the event Figure 4 shows all detected SNPs in achromosomal density map for all four experimental groups

Annotated GO terms for SNPs exclusive to each trans-genic event displayed similar functions with ST77 reporting295 gene variants involved with intracellular organellesphotosystem II cytoskeletal components and protein com-plexes ST111 displayed similar numbers of terms totaling 273annotations including photosynthesis and photosystem IIintracellular organelles ribonucleoprotein complexes thy-lakoids protein complexes and oxidoreductase activity The764 line exhibited the highest number of unique annotatedvariants with 2366 involving translation gene expressionbiosynthetic processes ribosomal structures and protein

International Journal of Plant Genomics 7

Wild typeaminoacylation

tRNA

for proteintranslation

5RNA binding

23

Vesicle-mediatedtransport

13Intracellularprotein transport

12

mRNA metabolic process

4

RNA processing13

Translation30

(a)

ST77ATP-dependenthelicase activity

10

RNA binding19

Ribonucleoproteincomplex biogenesis

4

Intracellularprotein transport

11

ncRNAmetabolic process

10

RNA processing12

Translation27

Ubiquitin-dependentprotein catabolic

process7

(b)

764

Intracellulartransport

25

RNA processing27

RNA binding48

(c)

ST111 ATP-dependenthelicase activity

13Intracellular

proteintransport

13

Ubiquitin-dependent

protein catabolicprocess

10

ncRNA metabolicprocess

13

mRNA processing4

RNA binding26

GTP binding21

(d)

Figure 3 AgriGO enriched gene ontology categories of SNP-containing genes Single enrichment analyses for all genes containing SNPs areshown for wild type (a) ST77 (b) 764 (c) and ST111 (d) Percentages are reflective of how many terms matched the indicated GO group outof the total matches to the background reference

complexes Of the 127 polymorphic genes shared by alltransgenic specimens 40 annotations returned 10 GO termspertaining to intracellular organelles macromolecular com-plexes and protein complexes Specific GO terms for eachgroup are shown in Figure 3 and complete AgriGO featurerelationship trees are shown in Supplemental Figure 4

4 Discussion

SNP rates have been previously evaluated in soybean usingmultiple fragment analysis on several genotypes of culti-vated varieties including Lincoln Mandarin Peking andRichland [34 35] From this fragment analysis transitionand transversion rates were reported to be nearly identicalbetween cultivars at 48 and 52 respectively Furthermorenucleotide diversity rates of cultivated soybean varietieswere estimated to be 5ndash8-fold less than the wild varietyGlycine soja and occur at an even lower frequency than thehighly characterizedArabidopsis thaliana self-crossingmodel[34] More recent investigations of SNPs and INDELS in

soybean using next generation resequencing have revealedgenome-wide polymorphisms in an effort to identify nativedisease resistant and favorable trait loci [36 37] Furtherstudies demonstrate the extensive narrowing of soybeangenomic variation due to selective pressures resulting fromdomestication practices which favor more valued traits suchas increased seed mass and oil content quantitative traitloci [38] In addition the self-pollinating nature of Glycinemax also contributes evolutionarily to the constriction ofgenomic variances in domesticated cultivars Thus withsuch selective pressures bottlenecking natural variations incultivated soybean varieties it is quite intriguing to seesuch consistently increased numbers of SNPs in only onetransgenic experimental group with no definitive initiatingevent

SNPs detected in our datasets were divulged from seedtranscriptome sequences which represent sim65 of the totalsoybean genome and 65 of total genomic protein codingsequences The SNPs described here are not considered tobe representative of the frequencies that may be detected in

8 International Journal of Plant Genomics

Wild type

chr1chr2

chr3

chr4chr5

chr6

chr7

chr8

chr9chr10chr11

chr12

chr13

chr14

chr15

chr16

chr17

chr18

chr19chr20

(a)

chr1chr2

chr3

chr4chr5

chr6

chr7

chr8

chr9chr10chr11

chr12

chr13

chr14

chr15

chr16

chr17

chr18

chr19chr20 1

chr2

chr3

chr4

ch6

chrhr7

chr8

chr9ch 1

chr12

chr13

chr14

chr17

chr18

chr19ch

ST77

(b)chr1

chr2

chr3

chr4chr5

chr6

chr7

chr8

chr9chr10chr11

chr12

chr13

chr14

chr15

chr16

chr17

chr18

chr19chr20 1

chr2

chr3

chr4

ch6

chr7

chr8

chr9ch 1

chr12

chr13

chr14

chr17

chr18

chr19ch

764

(c)

ST111

chr1chr2

chr3

chr4chr5

chr6

chr7

chr8

chr9chr10chr11

chr12

chr13

chr14

chr15

chr16

chr17

chr18

chr19chr20

(d)

Figure 4 Overview of the chromosomal distribution of detected transcript SNPs Aligned SNPs are shown for each soybean chromosomein the wild type controls (a) ST77 (b) 764 (c) and ST111 (d) transgenics The colored bars indicate each chromosome scaled to length witha bold outer hash mark representing every 10Mb of sequence and each nonbolded outer hash mark placed every 1Mb Each internal tickrepresents a detected SNP on which subsequent SNPs are stacked vertically from each transcript at that region

other soybean tissues or in different developmental stagesof these specific transgenic plants rather they demonstrateincreased transcript polymorphism rateswitnessed in a singlesoybean transformant out of several different experimentalgroups Although RNA-seq is focused on the functionalsegments of the genome and does not always capture regu-latory regions such as promoters or nontranscribed regions(eg methylated bases) one aspect of the resulting effectsof polymorphisms in these regions can be directly witnessedthrough examination of gene expression levels Because thesoybean expression systemdescribed previously by our lab [5

7 8 13 14 39] specifically targets seed tissue for recombinantprotein accumulation it was of great interest to identifypossible sequence alterations that may have resulted in geneexpression or protein structure variations in seeds which todate has not been investigated

Base changes for wild type ST77 and ST111 events wereall comparable while the 764 event consistently reportedSNP rates nearly double that of the other groups (1 SNPdetected every sim22000 bases) which is still well below thepreviously reported SNP rate of 1 SNP per sim1400 nucleotidesin soybean seeds [40] Nevertheless SNP base changes

International Journal of Plant Genomics 9

appeared to follow the same pattern of commonality withall groups demonstrating high percentages of transition basechanges with relatively low transversion counts The 764 linedid demonstrate a lower TsTv ratio than the other groups(Figure 1(d)) although progeny from two independent par-ents in the ST77 line (ST77D and ST77F) exhibited lowerratios as well indicating that this characteristic alone cannot reliably indicate internal stresses or overall divergencefrom our controls The majority of detected base changeswere located in the third base of the codon likely gener-ating synonymous (silent) polymorphisms which were themajority of predicted SNP classifications in all events Lesscommon changes occurred in the first or second bases of thecodon generating a nonsynonymous or missense mutationlikely resulting in an amino acid change (Figures 2(e)ndash2(h))Missense to silent ratios detected in all experimental groupsappear to be higher than previously published results forsoybean indicated [40] although the higher nonsynonymousto synonymous mutations may be due to high linkagedisequilibrium exhibited in soy [41]

Interestingly the 764 line had the lowest overall missenseto silent polymorphism ratio demonstrating that while the764 line contained the highest overall number of SNPs ahigher percentage of the total polymorphisms were silentmutations (sim45) compared to the other three experimentalgroups (Figure 2(h)) This reveals the possibility that SNPsand INDELS detected here had originated from the origi-nal transformation event and have been highly conservedthrough self-crossed generations of progeny particularlyconsidering how remarkably well conserved the attributesof detected SNPs were between different biological samplesof the same transformation event Without sequencing datafrom prior generations of each parental line tested herefor comparison this can not be confirmed with completeconfidence however this does pose an interesting inquirythat even silent SNPsmay have the ability to alter gene expres-sion and peptide structure Indeed it has been suggestedthat although synonymous polymorphisms code for identicalamino acids subtle changes resulting from the utilization ofnonoptimal synonymous codons can modulate downstreamtranscription and translation rates [42] thereby altering thefolding conformations and proposed function of the finalpeptides This may explain why the 764 transgenic eventexhibited the highest rates of differentially expressed genesin conjunction with polymorphisms but still maintained ahealthy phenotype

While many SNP studies have excluded synonymousmutations and those present in noncoding regions recentworks have examined these sequences in light of theirpossible epigenetic effects on transcription and translationhowever more advanced tools for the prediction of possibleeffects are still in development and require further testingfor reliable forecasts [43] Nonsynonymous polymorphismsappear at similar rates across the experimental groupsvarying between 50 and 60 of total functional SNP clas-sifications While nonsynonymous mutations nearly alwaysalter amino acid sequences that can potentially yield non-functional protein products this phenomenonmay produce aneutral effect on the organism or protein or activate signaling

cascades in redundant systems [43] The effects of thesedetected nonsynonymous polymorphisms in each transgenicevent are not known or identifiable without extensive pro-teomic investigations into the altered downstream transcriptproductsThe 764 line which previously displayed the largestdegree of differential gene expression exhibited the lowestpercentage of missense and nonsense polymorphisms there-fore these alone are not likely to be the root cause of observedchanges in this transgenic line

Spontaneous mutations may also occur as a result of apreviously occurring stressful event such as parental lineAgrobacterium transformation Epigenetic alterations andnucleotide transpositions have been detected up to fivegenerations forward from transformation in Arabidopsis [4445] however all seeds examined here were bred to orbeyond the 5th generation of progeny The causes for thesemutations are unknown however each transformant was anindependent transformation event and was likely exposedto a varying degree of stress during the process This couldpotentially have introduced genomic variations that havebeen carried forward to the transcriptome as even largecopy number of variations have been reported in plants asa response to stress [46] Segregation of SNPs from existingparental heterozygous loci is expected in progeny howeverall samples examined here were remarkably consistent inSNP numbers location distributions and types within theirexperimental groups further demonstrating the genomicstability of these events This further solidifies the possibilitythat the detected variations in all transgenic lines arose froman early parental event possibly occurring during the originalAgrobacterium transformation or planting event and havebeen stably integrated and carried forward to the generationsexamined Considering the consistency of the SNP patternsthese polymorphisms have likely originated at the genomiclevel and are not likely to appear in such large quantities dueto transcriptional proofreading errors

Detected alterations occurred in a ldquobowtierdquo-shapeddistribution across the 20 chromosomes with lowerinstances of SNPs near the centromeres and the highestincidence of SNPs near the euchromatic telomeric ends of thechromosomal arms This is where gene density is the highestwhich has been illustrated previously by several worksconducted on soybean polymorphism rates [36 38 47] OursnpEff dataset summaries available in the Supporting Dataof this manuscript show the same distribution pattern withno significant alterations of SNP rates between biologicalreplicates of progeny Without complete characterization ofthe targeted SNPs in this work future directions can invokefunctional characterizations to cross-reference detectedpolymorphisms with possible connections to differentiallyexpressed genes and also to improve existing annotatedWilliams 82 cultivar SNPs In order to fully deduce the originof SNP variations with high confidence multiple generationanalyses will need to be conducted Approximately 70 ofthe genes that were differentially expressed also containedSNPs or INDELS however due to the extreme disparityin size between each gene list (a maximum of sim3000 andsim40000 genes resp) this could simply be due to chanceand can not definitively hold concrete biological significance

10 International Journal of Plant Genomics

without further investigation This suggests that the SNPsexamined here may directly impact native gene expressionlevels possibly due to their locations within gene regulatoryregions or enhancer elements located near the telomericends of the chromosomal arms

Gene ontology terms from SNP gene lists of all experi-mental groups were very similar returning many enrichedterms regarding protein transport and localization as wellas RNA and transcriptional processes even in the wild typegroup (Figure 3 and Supplemental Figure 4) Interestinglythese relationships are highly similar to the enriched GOterms derived from the differentially expressed gene sets ofthe transgenic lines indicating the majority of functionalrelationships between these polymorphisms are likely arisingfrom intercultivar variations and are not specifically relatedto transgenesis effects However given the possibility thatmutations from transgenesis could occur within QTLs thatcontrol traits such as plant size flower count and seedquantity identification of novel polymorphisms in theseregions of soybeanrsquos transcriptome is of great importanceto both plant breeders and in biotechnology applicationsIn fact recent investigations have revealed an excess of9 million unique SNPs that occur between 106 differentcultivated soybean varieties [48] Here we have providedlists of exonic SNPs unique to each of the three transgenicgroups and specific to transformation as well as novel SNPsdetected in our Williams 82 wild type controls Furtherinvestigations of the events described will identify the sitesof T-DNA integration and characterize the surroundingsequences to determine whether increased incidences ofpolymorphisms occur within the immediate vicinities of theinsertions Unique SNPs identified and reported here may beused to aide these investigations of crucial trait loci in thefuture to maximize yields and assist in breeding programs oftransgenic specimens

5 Conclusions

This study describes the first direct comparison of poly-morphism rates present in the seed transcriptome of threeindependent transgenic lines of Glycine max with a non-transgenic wild type line all derived from the Williams 82soybean cultivar Previously we examined gene expressionchanges in these transgenic events finding only the 764line to be significantly altered when compared to the othertransgenic lines In the same manner we have expanded ourprevious work and reported transcriptome single nucleotidepolymorphism rates for the ST77 ST111 and 764 trans-genic events compared to wild type Mirroring the resultsfrom our differential expression study the samples of the764 transformation event exhibited transcript SNP ratesnearly double that of the wild type ST77 and ST111 eventswhile also demonstrating higher amounts of transversionmutations While the exact cause for both the large num-ber of differentially expressed genes and polymorphismspresent in the 764 event is not known it is highly likelythat the two phenomena are correlated Our results showthat in addition to differential expression transgenesis mayresult in unpredicted base changes within gene transcripts

and regulatory regions without impacting plant pheno-type

Availability of Supporting Data

The RNA sequencing data described herein may be accessedat httpwwwncbinlmnihgovgeoqueryacccgiacc=GS-E64620 through NCBIrsquos Gene Expression Omnibus underaccession number GSE64620 Summary files for bcftoolsand snpEff outputs encompassing variant calls and predictedvariant effects are available from the iPlant CollaborativeDiscovery Environment directory available at httpdeiplan-tcollaborativeorgdld44044BEE-69B8-4129-A6B5-4C9291-B6B4A1snpeffsummarieszip and httpdeiplantcollabora-tiveorgdldAD464040-9169-4E52-87D5-DEF7466A2C1Evariantsummarieszip Lists of genes containing detectedvariants from all samples are available from iPlant in acompressed zip archive available at httpdeiplantcollabor-ativeorgdld533570A3-1EFB-4864-B9A9-9D82F17E09A8snpeffgeneszip

Competing Interests

Theauthors of this manuscript declare no conflict of interests

Authorsrsquo Contributions

Kevin C Lambirth designed the experiment conducted theSNP detection and effect prediction and generated figuresAdamMWhaley compiled and installed software formattedsequencing data files and was consulted on analysis KevinC Lambirth Kenneth J Piller and Kenneth L Bost wrote themanuscript with input from all authors

Acknowledgments

The authors would like to acknowledge the sequencingefforts of Dr Mike Wang (David H Murdock ResearchInstitute) and Dr Laura Reichenberg (Pfeiffer University) forassistance with plant propagation and selection and Dr KanWang (Iowa State University Agronomy Department) for theestablishment of the ST111 transgenic soybean line

References

[1] A Ziemienowicz ldquoAgrobacterium-mediated plant transforma-tion factors applications and recent advancesrdquoBiocatalysis andAgricultural Biotechnology vol 3 no 4 pp 95ndash102 2014

[2] M S Koch J M Ward S L Levine J A Baum J L Viciniand B G Hammond ldquoThe food and environmental safety of Btcropsrdquo Frontiers in Plant Science vol 6 article 283 2015

[3] G M Dill C A CaJacob and S R Padgette ldquoGlyphosate-resistant crops adoption use and future considerationsrdquo PestManagement Science vol 64 no 4 pp 326ndash331 2008

[4] M A S Valente J A Q A Faria J R L Soares-Ramos et alldquoThe ER luminal binding protein (BiP) mediates an increase indrought tolerance in soybean and delays drought-induced leafsenescence in soybean and tobaccordquo Journal of ExperimentalBotany vol 60 no 2 pp 533ndash546 2009

[5] L C Hudson B S Seabolt J Odle K L Bost C H Stahl andK J Piller ldquoSublethal staphylococcal enterotoxin B challenge

International Journal of Plant Genomics 11

model in pigs to evaluate protection following immunizationwith a soybean-derived vaccinerdquoClinical andVaccine Immunol-ogy vol 20 no 1 pp 24ndash32 2013

[6] Y Zhang D Li X Jin and Z Huang ldquoFighting Ebola withZMapp spotlight on plant-made antibodyrdquo Science China LifeSciences vol 57 no 10 pp 987ndash988 2014

[7] R Powell L C Hudson K C Lambirth et al ldquoRecombinantexpression of homodimeric 660 kDa human thyroglobulin insoybean seeds an alternative source of human thyroglobulinrdquoPlant Cell Reports vol 30 no 7 pp 1327ndash1338 2011

[8] K L Bost K C Lambirth L C Hudson and K J PillerldquoSoybean-derived thyroglobulin as an analyte specific reagentfor in vitro diagnostic tests and devicesrdquo Advances in Medicineand Biology vol 80 pp 23ndash40 2014

[9] R Abiri A Valdiani M Maziah et al ldquoA critical review ofthe concept of transgenic plants insights into pharmaceuticalbiotechnology andmolecular farmingrdquo Current Issues in Molec-ular Biology vol 18 no 1 pp 21ndash42 2016

[10] J L Oakes K L Bost and K J Piller ldquoStability of a soybeanseed-derived vaccine antigen following long-term storage pro-cessing and transport in the absence of a cold chainrdquo Journal ofthe Science of Food andAgriculture vol 89 no 13 pp 2191ndash21992009

[11] R Garg M Tolbert J L Oakes T E Clemente K L Bostand K J Piller ldquoChloroplast targeting of FanC the majorantigenic subunit of Escherichia coli K99 fimbriae in transgenicsoybeanrdquo Plant Cell Reports vol 26 no 7 pp 1011ndash1023 2007

[12] K J Piller T E Clemente S M Jun et al ldquoExpression andimmunogenicity of an Escherichia coli K99 fimbriae subunitantigen in soybeanrdquo Planta vol 222 no 1 pp 6ndash18 2005

[13] L C Hudson R Garg K L Bost and K J Piller ldquoSoybeanseeds a practical host for the production of functional subunitvaccinesrdquo BioMed Research International vol 2014 Article ID340804 13 pages 2014

[14] K L Bost and K J Piller ldquoProtein expression systems whysoybean seedsrdquo in SoybeanmdashMolecular Aspects of Breeding ASudaric Ed Intech Open 2011

[15] D Tilman K G Cassman P A Matson R Naylor and SPolasky ldquoAgricultural sustainability and intensive productionpracticesrdquo Nature vol 418 no 6898 pp 671ndash677 2002

[16] G N De La FuenteU K Frei and T Lubberstedt ldquoAcceleratingplant breedingrdquo Trends in Plant Science vol 18 no 12 pp 667ndash672 2013

[17] L C Hudson K C Lambirth K L Bost and K L PillerAdvancements in Transgenic Soy From Field to Bedside InTech2013

[18] J D Ray T C Kilen C A Abel and R L Paris ldquoSoybeannatural cross-pollination rates under field conditionsrdquo Environ-mental Biosafety Research vol 2 no 2 pp 133ndash138 2003

[19] K L Hart S L Kimura V Mushailov Z M BudimlijaM Prinz and E Wurmbach ldquoImproved eye- and skin-colorprediction based on 8 SNPsrdquo Croatian Medical Journal vol 54no 3 pp 248ndash256 2013

[20] S E Medland T Zayats B Glaser et al ldquoA variant inLIN28B is associatedwith 2D 4D finger-length ratio a putativeretrospective biomarker of prenatal testosterone exposurerdquoTheAmerican Journal of HumanGenetics vol 86 no 4 pp 519ndash5252010

[21] B S Shastry ldquoSNP alleles in human disease and evolutionrdquoJournal of Human Genetics vol 47 no 11 pp 561ndash566 2002

[22] Y-H Li S-C Zhao J-X Ma et al ldquoMolecular footprints ofdomestication and improvement in soybean revealed by wholegenome re-sequencingrdquo BMC Genomics vol 14 article 5792013

[23] T Joshi B Valliyodan J-H Wu S-H Lee D Xu and H TNguyen ldquoGenomic differences between cultivated soybean Gmax and its wild relative G sojardquo BMCGenomics vol 14 articleS5 2013

[24] K C Lambirth AMWhaley I C Blakley et al ldquoA comparisonof transgenic andwild type soybean seeds analysis of transcrip-tome profiles using RNA-Seqrdquo BMC Biotechnology vol 15 no1 article 89 2015

[25] H Li ldquoA statistical framework for SNP callingmutation discov-ery association mapping and population genetical parameterestimation from sequencing datardquo Bioinformatics vol 27 no 21pp 2987ndash2993 2011

[26] H Li B Handsaker A Wysoker et al ldquoThe sequence align-mentMap format and SAMtoolsrdquo Bioinformatics vol 25 no16 pp 2078ndash2079 2009

[27] D M Goodstein S Shu R Howson et al ldquoPhytozome acomparative platform for green plant genomicsrdquo Nucleic AcidsResearch vol 40 no 1 pp D1178ndashD1186 2012

[28] C Trapnell A Roberts L Goff et al ldquoDifferential gene andtranscript expression analysis of RNA-seq experiments withTopHat and Cufflinksrdquo Nature Protocols vol 7 no 3 pp 562ndash578 2012

[29] PDanecek A AutonG Abecasis et al ldquoThe variant call formatand VCFtoolsrdquo Bioinformatics vol 27 no 15 pp 2156ndash21582011

[30] M Krzywinski J Schein I Birol et al ldquoCircos an informationaesthetic for comparative genomicsrdquo Genome Research vol 19no 9 pp 1639ndash1645 2009

[31] P Cingolani A Platts L L Wang et al ldquoA program forannotating and predicting the effects of single nucleotidepolymorphisms SnpEff SNPs in the genome of Drosophilamelanogaster strain w 1118 iso-2 iso-3rdquo Fly vol 6 no 2 pp80ndash92 2012

[32] Z Du X Zhou Y Ling Z Zhang and Z Su ldquoagriGO A GOanalysis toolkit for the agricultural communityrdquo Nucleic AcidsResearch vol 38 no 2 Article ID gkq310 pp W64ndashW70 2010

[33] S AGoffM Vaughn SMcKay et al ldquoThe iPlant collaborativecyberinfrastructure for plant biologyrdquo Frontiers in Plant Sciencevol 2 article 34 2011

[34] Y L Zhu Q J Song D L Hyten et al ldquoSingle-nucleotidepolymorphisms in soybeanrdquo Genetics vol 163 no 3 pp 1123ndash1134 2003

[35] Y Shu Y Li Z Zhu et al ldquoSNPs discovery and CAPS markerconversion in soybeanrdquo Molecular Biology Reports vol 38 no3 pp 1841ndash1846 2011

[36] C B Yadav P Bhareti M Muthamilarasan et al ldquoGenome-wide SNP identification and characterization in two soybeancultivars with contrasting mungbean yellow mosaic india virusdisease resistance traitsrdquo PLoS ONE vol 10 no 4 Article IDe0123897 2015

[37] X Wu C Ren T Joshi T Vuong D Xu and H T NguyenldquoSNP discovery by high-throughput sequencing in soybeanrdquoBMC Genomics vol 11 article 469 2010

[38] Y-H Li J C Reif S A Jackson Y-S Ma R-Z Chang and L-J Qiu ldquoDetecting SNPs underlying domestication-related traitsin soybeanrdquo BMC Plant Biology vol 14 article 251 2014

12 International Journal of Plant Genomics

[39] L C Hudson K L Bost and K J Piller ldquoOptimizing recom-binant protein expression in soybeanrdquo in SoybeanmdashMolecularAspects of Breeding InTech Rijeka Croatia 2011

[40] W Goettel E Xia R Upchurch M-L Wang P Chen andY-Q C An ldquoIdentification and characterization of transcriptpolymorphisms in soybean lines varying in oil composition andcontentrdquo BMC Genomics vol 15 article 299 2014

[41] C Chan X Qi M-W Li F-L Wong and H-M Lam ldquoRecentdevelopments of genomic research in soybeanrdquo Journal ofGenetics and Genomics vol 39 no 7 pp 317ndash324 2012

[42] A A Komar ldquoSNPs silent but not invisiblerdquo Science vol 315no 5811 pp 466ndash467 2007

[43] P Katsonis A Koire S J Wilson et al ldquoSingle nucleotidevariations biological impact and theoretical interpretationrdquoProtein Science vol 23 no 12 pp 1650ndash1666 2014

[44] W J Haun D L Hyten W W Xu et al ldquoThe composition andorigins of genomic variation among individuals of the soybeanreference cultivar williams 82rdquo Plant Physiology vol 155 no 2pp 645ndash655 2011

[45] S DeBolt ldquoCopy number variation shapes genome diversityin Arabidopsis over immediate family generational scalesrdquoGenome Biology and Evolution vol 2 no 1 pp 441ndash453 2010

[46] A Zmienko A Samelak P Kozłowski and M FiglerowiczldquoCopy number polymorphism in plant genomesrdquo Theoreticaland Applied Genetics vol 127 no 1 pp 1ndash18 2014

[47] Y-G Lee N Jeong J H Kim et al ldquoDevelopment validationand genetic analysis of a large soybean SNP genotyping arrayrdquoPlant Journal vol 81 no 4 pp 625ndash636 2015

[48] B Valliyodan Q Dan G Patil et al ldquoLandscape of genomicdiversity and trait discovery in soybeanrdquo Scientific Reports vol6 article 23598 2016

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 7: Research Article Transcript Polymorphism Rates in …downloads.hindawi.com/journals/ijpg/2016/1562041.pdfResearch Article Transcript Polymorphism Rates in Soybean Seed Tissue Are Increased

International Journal of Plant Genomics 7

Wild typeaminoacylation

tRNA

for proteintranslation

5RNA binding

23

Vesicle-mediatedtransport

13Intracellularprotein transport

12

mRNA metabolic process

4

RNA processing13

Translation30

(a)

ST77ATP-dependenthelicase activity

10

RNA binding19

Ribonucleoproteincomplex biogenesis

4

Intracellularprotein transport

11

ncRNAmetabolic process

10

RNA processing12

Translation27

Ubiquitin-dependentprotein catabolic

process7

(b)

764

Intracellulartransport

25

RNA processing27

RNA binding48

(c)

ST111 ATP-dependenthelicase activity

13Intracellular

proteintransport

13

Ubiquitin-dependent

protein catabolicprocess

10

ncRNA metabolicprocess

13

mRNA processing4

RNA binding26

GTP binding21

(d)

Figure 3 AgriGO enriched gene ontology categories of SNP-containing genes Single enrichment analyses for all genes containing SNPs areshown for wild type (a) ST77 (b) 764 (c) and ST111 (d) Percentages are reflective of how many terms matched the indicated GO group outof the total matches to the background reference

complexes Of the 127 polymorphic genes shared by alltransgenic specimens 40 annotations returned 10 GO termspertaining to intracellular organelles macromolecular com-plexes and protein complexes Specific GO terms for eachgroup are shown in Figure 3 and complete AgriGO featurerelationship trees are shown in Supplemental Figure 4

4 Discussion

SNP rates have been previously evaluated in soybean usingmultiple fragment analysis on several genotypes of culti-vated varieties including Lincoln Mandarin Peking andRichland [34 35] From this fragment analysis transitionand transversion rates were reported to be nearly identicalbetween cultivars at 48 and 52 respectively Furthermorenucleotide diversity rates of cultivated soybean varietieswere estimated to be 5ndash8-fold less than the wild varietyGlycine soja and occur at an even lower frequency than thehighly characterizedArabidopsis thaliana self-crossingmodel[34] More recent investigations of SNPs and INDELS in

soybean using next generation resequencing have revealedgenome-wide polymorphisms in an effort to identify nativedisease resistant and favorable trait loci [36 37] Furtherstudies demonstrate the extensive narrowing of soybeangenomic variation due to selective pressures resulting fromdomestication practices which favor more valued traits suchas increased seed mass and oil content quantitative traitloci [38] In addition the self-pollinating nature of Glycinemax also contributes evolutionarily to the constriction ofgenomic variances in domesticated cultivars Thus withsuch selective pressures bottlenecking natural variations incultivated soybean varieties it is quite intriguing to seesuch consistently increased numbers of SNPs in only onetransgenic experimental group with no definitive initiatingevent

SNPs detected in our datasets were divulged from seedtranscriptome sequences which represent sim65 of the totalsoybean genome and 65 of total genomic protein codingsequences The SNPs described here are not considered tobe representative of the frequencies that may be detected in

8 International Journal of Plant Genomics

Wild type

chr1chr2

chr3

chr4chr5

chr6

chr7

chr8

chr9chr10chr11

chr12

chr13

chr14

chr15

chr16

chr17

chr18

chr19chr20

(a)

chr1chr2

chr3

chr4chr5

chr6

chr7

chr8

chr9chr10chr11

chr12

chr13

chr14

chr15

chr16

chr17

chr18

chr19chr20 1

chr2

chr3

chr4

ch6

chrhr7

chr8

chr9ch 1

chr12

chr13

chr14

chr17

chr18

chr19ch

ST77

(b)chr1

chr2

chr3

chr4chr5

chr6

chr7

chr8

chr9chr10chr11

chr12

chr13

chr14

chr15

chr16

chr17

chr18

chr19chr20 1

chr2

chr3

chr4

ch6

chr7

chr8

chr9ch 1

chr12

chr13

chr14

chr17

chr18

chr19ch

764

(c)

ST111

chr1chr2

chr3

chr4chr5

chr6

chr7

chr8

chr9chr10chr11

chr12

chr13

chr14

chr15

chr16

chr17

chr18

chr19chr20

(d)

Figure 4 Overview of the chromosomal distribution of detected transcript SNPs Aligned SNPs are shown for each soybean chromosomein the wild type controls (a) ST77 (b) 764 (c) and ST111 (d) transgenics The colored bars indicate each chromosome scaled to length witha bold outer hash mark representing every 10Mb of sequence and each nonbolded outer hash mark placed every 1Mb Each internal tickrepresents a detected SNP on which subsequent SNPs are stacked vertically from each transcript at that region

other soybean tissues or in different developmental stagesof these specific transgenic plants rather they demonstrateincreased transcript polymorphism rateswitnessed in a singlesoybean transformant out of several different experimentalgroups Although RNA-seq is focused on the functionalsegments of the genome and does not always capture regu-latory regions such as promoters or nontranscribed regions(eg methylated bases) one aspect of the resulting effectsof polymorphisms in these regions can be directly witnessedthrough examination of gene expression levels Because thesoybean expression systemdescribed previously by our lab [5

7 8 13 14 39] specifically targets seed tissue for recombinantprotein accumulation it was of great interest to identifypossible sequence alterations that may have resulted in geneexpression or protein structure variations in seeds which todate has not been investigated

Base changes for wild type ST77 and ST111 events wereall comparable while the 764 event consistently reportedSNP rates nearly double that of the other groups (1 SNPdetected every sim22000 bases) which is still well below thepreviously reported SNP rate of 1 SNP per sim1400 nucleotidesin soybean seeds [40] Nevertheless SNP base changes

International Journal of Plant Genomics 9

appeared to follow the same pattern of commonality withall groups demonstrating high percentages of transition basechanges with relatively low transversion counts The 764 linedid demonstrate a lower TsTv ratio than the other groups(Figure 1(d)) although progeny from two independent par-ents in the ST77 line (ST77D and ST77F) exhibited lowerratios as well indicating that this characteristic alone cannot reliably indicate internal stresses or overall divergencefrom our controls The majority of detected base changeswere located in the third base of the codon likely gener-ating synonymous (silent) polymorphisms which were themajority of predicted SNP classifications in all events Lesscommon changes occurred in the first or second bases of thecodon generating a nonsynonymous or missense mutationlikely resulting in an amino acid change (Figures 2(e)ndash2(h))Missense to silent ratios detected in all experimental groupsappear to be higher than previously published results forsoybean indicated [40] although the higher nonsynonymousto synonymous mutations may be due to high linkagedisequilibrium exhibited in soy [41]

Interestingly the 764 line had the lowest overall missenseto silent polymorphism ratio demonstrating that while the764 line contained the highest overall number of SNPs ahigher percentage of the total polymorphisms were silentmutations (sim45) compared to the other three experimentalgroups (Figure 2(h)) This reveals the possibility that SNPsand INDELS detected here had originated from the origi-nal transformation event and have been highly conservedthrough self-crossed generations of progeny particularlyconsidering how remarkably well conserved the attributesof detected SNPs were between different biological samplesof the same transformation event Without sequencing datafrom prior generations of each parental line tested herefor comparison this can not be confirmed with completeconfidence however this does pose an interesting inquirythat even silent SNPsmay have the ability to alter gene expres-sion and peptide structure Indeed it has been suggestedthat although synonymous polymorphisms code for identicalamino acids subtle changes resulting from the utilization ofnonoptimal synonymous codons can modulate downstreamtranscription and translation rates [42] thereby altering thefolding conformations and proposed function of the finalpeptides This may explain why the 764 transgenic eventexhibited the highest rates of differentially expressed genesin conjunction with polymorphisms but still maintained ahealthy phenotype

While many SNP studies have excluded synonymousmutations and those present in noncoding regions recentworks have examined these sequences in light of theirpossible epigenetic effects on transcription and translationhowever more advanced tools for the prediction of possibleeffects are still in development and require further testingfor reliable forecasts [43] Nonsynonymous polymorphismsappear at similar rates across the experimental groupsvarying between 50 and 60 of total functional SNP clas-sifications While nonsynonymous mutations nearly alwaysalter amino acid sequences that can potentially yield non-functional protein products this phenomenonmay produce aneutral effect on the organism or protein or activate signaling

cascades in redundant systems [43] The effects of thesedetected nonsynonymous polymorphisms in each transgenicevent are not known or identifiable without extensive pro-teomic investigations into the altered downstream transcriptproductsThe 764 line which previously displayed the largestdegree of differential gene expression exhibited the lowestpercentage of missense and nonsense polymorphisms there-fore these alone are not likely to be the root cause of observedchanges in this transgenic line

Spontaneous mutations may also occur as a result of apreviously occurring stressful event such as parental lineAgrobacterium transformation Epigenetic alterations andnucleotide transpositions have been detected up to fivegenerations forward from transformation in Arabidopsis [4445] however all seeds examined here were bred to orbeyond the 5th generation of progeny The causes for thesemutations are unknown however each transformant was anindependent transformation event and was likely exposedto a varying degree of stress during the process This couldpotentially have introduced genomic variations that havebeen carried forward to the transcriptome as even largecopy number of variations have been reported in plants asa response to stress [46] Segregation of SNPs from existingparental heterozygous loci is expected in progeny howeverall samples examined here were remarkably consistent inSNP numbers location distributions and types within theirexperimental groups further demonstrating the genomicstability of these events This further solidifies the possibilitythat the detected variations in all transgenic lines arose froman early parental event possibly occurring during the originalAgrobacterium transformation or planting event and havebeen stably integrated and carried forward to the generationsexamined Considering the consistency of the SNP patternsthese polymorphisms have likely originated at the genomiclevel and are not likely to appear in such large quantities dueto transcriptional proofreading errors

Detected alterations occurred in a ldquobowtierdquo-shapeddistribution across the 20 chromosomes with lowerinstances of SNPs near the centromeres and the highestincidence of SNPs near the euchromatic telomeric ends of thechromosomal arms This is where gene density is the highestwhich has been illustrated previously by several worksconducted on soybean polymorphism rates [36 38 47] OursnpEff dataset summaries available in the Supporting Dataof this manuscript show the same distribution pattern withno significant alterations of SNP rates between biologicalreplicates of progeny Without complete characterization ofthe targeted SNPs in this work future directions can invokefunctional characterizations to cross-reference detectedpolymorphisms with possible connections to differentiallyexpressed genes and also to improve existing annotatedWilliams 82 cultivar SNPs In order to fully deduce the originof SNP variations with high confidence multiple generationanalyses will need to be conducted Approximately 70 ofthe genes that were differentially expressed also containedSNPs or INDELS however due to the extreme disparityin size between each gene list (a maximum of sim3000 andsim40000 genes resp) this could simply be due to chanceand can not definitively hold concrete biological significance

10 International Journal of Plant Genomics

without further investigation This suggests that the SNPsexamined here may directly impact native gene expressionlevels possibly due to their locations within gene regulatoryregions or enhancer elements located near the telomericends of the chromosomal arms

Gene ontology terms from SNP gene lists of all experi-mental groups were very similar returning many enrichedterms regarding protein transport and localization as wellas RNA and transcriptional processes even in the wild typegroup (Figure 3 and Supplemental Figure 4) Interestinglythese relationships are highly similar to the enriched GOterms derived from the differentially expressed gene sets ofthe transgenic lines indicating the majority of functionalrelationships between these polymorphisms are likely arisingfrom intercultivar variations and are not specifically relatedto transgenesis effects However given the possibility thatmutations from transgenesis could occur within QTLs thatcontrol traits such as plant size flower count and seedquantity identification of novel polymorphisms in theseregions of soybeanrsquos transcriptome is of great importanceto both plant breeders and in biotechnology applicationsIn fact recent investigations have revealed an excess of9 million unique SNPs that occur between 106 differentcultivated soybean varieties [48] Here we have providedlists of exonic SNPs unique to each of the three transgenicgroups and specific to transformation as well as novel SNPsdetected in our Williams 82 wild type controls Furtherinvestigations of the events described will identify the sitesof T-DNA integration and characterize the surroundingsequences to determine whether increased incidences ofpolymorphisms occur within the immediate vicinities of theinsertions Unique SNPs identified and reported here may beused to aide these investigations of crucial trait loci in thefuture to maximize yields and assist in breeding programs oftransgenic specimens

5 Conclusions

This study describes the first direct comparison of poly-morphism rates present in the seed transcriptome of threeindependent transgenic lines of Glycine max with a non-transgenic wild type line all derived from the Williams 82soybean cultivar Previously we examined gene expressionchanges in these transgenic events finding only the 764line to be significantly altered when compared to the othertransgenic lines In the same manner we have expanded ourprevious work and reported transcriptome single nucleotidepolymorphism rates for the ST77 ST111 and 764 trans-genic events compared to wild type Mirroring the resultsfrom our differential expression study the samples of the764 transformation event exhibited transcript SNP ratesnearly double that of the wild type ST77 and ST111 eventswhile also demonstrating higher amounts of transversionmutations While the exact cause for both the large num-ber of differentially expressed genes and polymorphismspresent in the 764 event is not known it is highly likelythat the two phenomena are correlated Our results showthat in addition to differential expression transgenesis mayresult in unpredicted base changes within gene transcripts

and regulatory regions without impacting plant pheno-type

Availability of Supporting Data

The RNA sequencing data described herein may be accessedat httpwwwncbinlmnihgovgeoqueryacccgiacc=GS-E64620 through NCBIrsquos Gene Expression Omnibus underaccession number GSE64620 Summary files for bcftoolsand snpEff outputs encompassing variant calls and predictedvariant effects are available from the iPlant CollaborativeDiscovery Environment directory available at httpdeiplan-tcollaborativeorgdld44044BEE-69B8-4129-A6B5-4C9291-B6B4A1snpeffsummarieszip and httpdeiplantcollabora-tiveorgdldAD464040-9169-4E52-87D5-DEF7466A2C1Evariantsummarieszip Lists of genes containing detectedvariants from all samples are available from iPlant in acompressed zip archive available at httpdeiplantcollabor-ativeorgdld533570A3-1EFB-4864-B9A9-9D82F17E09A8snpeffgeneszip

Competing Interests

Theauthors of this manuscript declare no conflict of interests

Authorsrsquo Contributions

Kevin C Lambirth designed the experiment conducted theSNP detection and effect prediction and generated figuresAdamMWhaley compiled and installed software formattedsequencing data files and was consulted on analysis KevinC Lambirth Kenneth J Piller and Kenneth L Bost wrote themanuscript with input from all authors

Acknowledgments

The authors would like to acknowledge the sequencingefforts of Dr Mike Wang (David H Murdock ResearchInstitute) and Dr Laura Reichenberg (Pfeiffer University) forassistance with plant propagation and selection and Dr KanWang (Iowa State University Agronomy Department) for theestablishment of the ST111 transgenic soybean line

References

[1] A Ziemienowicz ldquoAgrobacterium-mediated plant transforma-tion factors applications and recent advancesrdquoBiocatalysis andAgricultural Biotechnology vol 3 no 4 pp 95ndash102 2014

[2] M S Koch J M Ward S L Levine J A Baum J L Viciniand B G Hammond ldquoThe food and environmental safety of Btcropsrdquo Frontiers in Plant Science vol 6 article 283 2015

[3] G M Dill C A CaJacob and S R Padgette ldquoGlyphosate-resistant crops adoption use and future considerationsrdquo PestManagement Science vol 64 no 4 pp 326ndash331 2008

[4] M A S Valente J A Q A Faria J R L Soares-Ramos et alldquoThe ER luminal binding protein (BiP) mediates an increase indrought tolerance in soybean and delays drought-induced leafsenescence in soybean and tobaccordquo Journal of ExperimentalBotany vol 60 no 2 pp 533ndash546 2009

[5] L C Hudson B S Seabolt J Odle K L Bost C H Stahl andK J Piller ldquoSublethal staphylococcal enterotoxin B challenge

International Journal of Plant Genomics 11

model in pigs to evaluate protection following immunizationwith a soybean-derived vaccinerdquoClinical andVaccine Immunol-ogy vol 20 no 1 pp 24ndash32 2013

[6] Y Zhang D Li X Jin and Z Huang ldquoFighting Ebola withZMapp spotlight on plant-made antibodyrdquo Science China LifeSciences vol 57 no 10 pp 987ndash988 2014

[7] R Powell L C Hudson K C Lambirth et al ldquoRecombinantexpression of homodimeric 660 kDa human thyroglobulin insoybean seeds an alternative source of human thyroglobulinrdquoPlant Cell Reports vol 30 no 7 pp 1327ndash1338 2011

[8] K L Bost K C Lambirth L C Hudson and K J PillerldquoSoybean-derived thyroglobulin as an analyte specific reagentfor in vitro diagnostic tests and devicesrdquo Advances in Medicineand Biology vol 80 pp 23ndash40 2014

[9] R Abiri A Valdiani M Maziah et al ldquoA critical review ofthe concept of transgenic plants insights into pharmaceuticalbiotechnology andmolecular farmingrdquo Current Issues in Molec-ular Biology vol 18 no 1 pp 21ndash42 2016

[10] J L Oakes K L Bost and K J Piller ldquoStability of a soybeanseed-derived vaccine antigen following long-term storage pro-cessing and transport in the absence of a cold chainrdquo Journal ofthe Science of Food andAgriculture vol 89 no 13 pp 2191ndash21992009

[11] R Garg M Tolbert J L Oakes T E Clemente K L Bostand K J Piller ldquoChloroplast targeting of FanC the majorantigenic subunit of Escherichia coli K99 fimbriae in transgenicsoybeanrdquo Plant Cell Reports vol 26 no 7 pp 1011ndash1023 2007

[12] K J Piller T E Clemente S M Jun et al ldquoExpression andimmunogenicity of an Escherichia coli K99 fimbriae subunitantigen in soybeanrdquo Planta vol 222 no 1 pp 6ndash18 2005

[13] L C Hudson R Garg K L Bost and K J Piller ldquoSoybeanseeds a practical host for the production of functional subunitvaccinesrdquo BioMed Research International vol 2014 Article ID340804 13 pages 2014

[14] K L Bost and K J Piller ldquoProtein expression systems whysoybean seedsrdquo in SoybeanmdashMolecular Aspects of Breeding ASudaric Ed Intech Open 2011

[15] D Tilman K G Cassman P A Matson R Naylor and SPolasky ldquoAgricultural sustainability and intensive productionpracticesrdquo Nature vol 418 no 6898 pp 671ndash677 2002

[16] G N De La FuenteU K Frei and T Lubberstedt ldquoAcceleratingplant breedingrdquo Trends in Plant Science vol 18 no 12 pp 667ndash672 2013

[17] L C Hudson K C Lambirth K L Bost and K L PillerAdvancements in Transgenic Soy From Field to Bedside InTech2013

[18] J D Ray T C Kilen C A Abel and R L Paris ldquoSoybeannatural cross-pollination rates under field conditionsrdquo Environ-mental Biosafety Research vol 2 no 2 pp 133ndash138 2003

[19] K L Hart S L Kimura V Mushailov Z M BudimlijaM Prinz and E Wurmbach ldquoImproved eye- and skin-colorprediction based on 8 SNPsrdquo Croatian Medical Journal vol 54no 3 pp 248ndash256 2013

[20] S E Medland T Zayats B Glaser et al ldquoA variant inLIN28B is associatedwith 2D 4D finger-length ratio a putativeretrospective biomarker of prenatal testosterone exposurerdquoTheAmerican Journal of HumanGenetics vol 86 no 4 pp 519ndash5252010

[21] B S Shastry ldquoSNP alleles in human disease and evolutionrdquoJournal of Human Genetics vol 47 no 11 pp 561ndash566 2002

[22] Y-H Li S-C Zhao J-X Ma et al ldquoMolecular footprints ofdomestication and improvement in soybean revealed by wholegenome re-sequencingrdquo BMC Genomics vol 14 article 5792013

[23] T Joshi B Valliyodan J-H Wu S-H Lee D Xu and H TNguyen ldquoGenomic differences between cultivated soybean Gmax and its wild relative G sojardquo BMCGenomics vol 14 articleS5 2013

[24] K C Lambirth AMWhaley I C Blakley et al ldquoA comparisonof transgenic andwild type soybean seeds analysis of transcrip-tome profiles using RNA-Seqrdquo BMC Biotechnology vol 15 no1 article 89 2015

[25] H Li ldquoA statistical framework for SNP callingmutation discov-ery association mapping and population genetical parameterestimation from sequencing datardquo Bioinformatics vol 27 no 21pp 2987ndash2993 2011

[26] H Li B Handsaker A Wysoker et al ldquoThe sequence align-mentMap format and SAMtoolsrdquo Bioinformatics vol 25 no16 pp 2078ndash2079 2009

[27] D M Goodstein S Shu R Howson et al ldquoPhytozome acomparative platform for green plant genomicsrdquo Nucleic AcidsResearch vol 40 no 1 pp D1178ndashD1186 2012

[28] C Trapnell A Roberts L Goff et al ldquoDifferential gene andtranscript expression analysis of RNA-seq experiments withTopHat and Cufflinksrdquo Nature Protocols vol 7 no 3 pp 562ndash578 2012

[29] PDanecek A AutonG Abecasis et al ldquoThe variant call formatand VCFtoolsrdquo Bioinformatics vol 27 no 15 pp 2156ndash21582011

[30] M Krzywinski J Schein I Birol et al ldquoCircos an informationaesthetic for comparative genomicsrdquo Genome Research vol 19no 9 pp 1639ndash1645 2009

[31] P Cingolani A Platts L L Wang et al ldquoA program forannotating and predicting the effects of single nucleotidepolymorphisms SnpEff SNPs in the genome of Drosophilamelanogaster strain w 1118 iso-2 iso-3rdquo Fly vol 6 no 2 pp80ndash92 2012

[32] Z Du X Zhou Y Ling Z Zhang and Z Su ldquoagriGO A GOanalysis toolkit for the agricultural communityrdquo Nucleic AcidsResearch vol 38 no 2 Article ID gkq310 pp W64ndashW70 2010

[33] S AGoffM Vaughn SMcKay et al ldquoThe iPlant collaborativecyberinfrastructure for plant biologyrdquo Frontiers in Plant Sciencevol 2 article 34 2011

[34] Y L Zhu Q J Song D L Hyten et al ldquoSingle-nucleotidepolymorphisms in soybeanrdquo Genetics vol 163 no 3 pp 1123ndash1134 2003

[35] Y Shu Y Li Z Zhu et al ldquoSNPs discovery and CAPS markerconversion in soybeanrdquo Molecular Biology Reports vol 38 no3 pp 1841ndash1846 2011

[36] C B Yadav P Bhareti M Muthamilarasan et al ldquoGenome-wide SNP identification and characterization in two soybeancultivars with contrasting mungbean yellow mosaic india virusdisease resistance traitsrdquo PLoS ONE vol 10 no 4 Article IDe0123897 2015

[37] X Wu C Ren T Joshi T Vuong D Xu and H T NguyenldquoSNP discovery by high-throughput sequencing in soybeanrdquoBMC Genomics vol 11 article 469 2010

[38] Y-H Li J C Reif S A Jackson Y-S Ma R-Z Chang and L-J Qiu ldquoDetecting SNPs underlying domestication-related traitsin soybeanrdquo BMC Plant Biology vol 14 article 251 2014

12 International Journal of Plant Genomics

[39] L C Hudson K L Bost and K J Piller ldquoOptimizing recom-binant protein expression in soybeanrdquo in SoybeanmdashMolecularAspects of Breeding InTech Rijeka Croatia 2011

[40] W Goettel E Xia R Upchurch M-L Wang P Chen andY-Q C An ldquoIdentification and characterization of transcriptpolymorphisms in soybean lines varying in oil composition andcontentrdquo BMC Genomics vol 15 article 299 2014

[41] C Chan X Qi M-W Li F-L Wong and H-M Lam ldquoRecentdevelopments of genomic research in soybeanrdquo Journal ofGenetics and Genomics vol 39 no 7 pp 317ndash324 2012

[42] A A Komar ldquoSNPs silent but not invisiblerdquo Science vol 315no 5811 pp 466ndash467 2007

[43] P Katsonis A Koire S J Wilson et al ldquoSingle nucleotidevariations biological impact and theoretical interpretationrdquoProtein Science vol 23 no 12 pp 1650ndash1666 2014

[44] W J Haun D L Hyten W W Xu et al ldquoThe composition andorigins of genomic variation among individuals of the soybeanreference cultivar williams 82rdquo Plant Physiology vol 155 no 2pp 645ndash655 2011

[45] S DeBolt ldquoCopy number variation shapes genome diversityin Arabidopsis over immediate family generational scalesrdquoGenome Biology and Evolution vol 2 no 1 pp 441ndash453 2010

[46] A Zmienko A Samelak P Kozłowski and M FiglerowiczldquoCopy number polymorphism in plant genomesrdquo Theoreticaland Applied Genetics vol 127 no 1 pp 1ndash18 2014

[47] Y-G Lee N Jeong J H Kim et al ldquoDevelopment validationand genetic analysis of a large soybean SNP genotyping arrayrdquoPlant Journal vol 81 no 4 pp 625ndash636 2015

[48] B Valliyodan Q Dan G Patil et al ldquoLandscape of genomicdiversity and trait discovery in soybeanrdquo Scientific Reports vol6 article 23598 2016

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 8: Research Article Transcript Polymorphism Rates in …downloads.hindawi.com/journals/ijpg/2016/1562041.pdfResearch Article Transcript Polymorphism Rates in Soybean Seed Tissue Are Increased

8 International Journal of Plant Genomics

Wild type

chr1chr2

chr3

chr4chr5

chr6

chr7

chr8

chr9chr10chr11

chr12

chr13

chr14

chr15

chr16

chr17

chr18

chr19chr20

(a)

chr1chr2

chr3

chr4chr5

chr6

chr7

chr8

chr9chr10chr11

chr12

chr13

chr14

chr15

chr16

chr17

chr18

chr19chr20 1

chr2

chr3

chr4

ch6

chrhr7

chr8

chr9ch 1

chr12

chr13

chr14

chr17

chr18

chr19ch

ST77

(b)chr1

chr2

chr3

chr4chr5

chr6

chr7

chr8

chr9chr10chr11

chr12

chr13

chr14

chr15

chr16

chr17

chr18

chr19chr20 1

chr2

chr3

chr4

ch6

chr7

chr8

chr9ch 1

chr12

chr13

chr14

chr17

chr18

chr19ch

764

(c)

ST111

chr1chr2

chr3

chr4chr5

chr6

chr7

chr8

chr9chr10chr11

chr12

chr13

chr14

chr15

chr16

chr17

chr18

chr19chr20

(d)

Figure 4 Overview of the chromosomal distribution of detected transcript SNPs Aligned SNPs are shown for each soybean chromosomein the wild type controls (a) ST77 (b) 764 (c) and ST111 (d) transgenics The colored bars indicate each chromosome scaled to length witha bold outer hash mark representing every 10Mb of sequence and each nonbolded outer hash mark placed every 1Mb Each internal tickrepresents a detected SNP on which subsequent SNPs are stacked vertically from each transcript at that region

other soybean tissues or in different developmental stagesof these specific transgenic plants rather they demonstrateincreased transcript polymorphism rateswitnessed in a singlesoybean transformant out of several different experimentalgroups Although RNA-seq is focused on the functionalsegments of the genome and does not always capture regu-latory regions such as promoters or nontranscribed regions(eg methylated bases) one aspect of the resulting effectsof polymorphisms in these regions can be directly witnessedthrough examination of gene expression levels Because thesoybean expression systemdescribed previously by our lab [5

7 8 13 14 39] specifically targets seed tissue for recombinantprotein accumulation it was of great interest to identifypossible sequence alterations that may have resulted in geneexpression or protein structure variations in seeds which todate has not been investigated

Base changes for wild type ST77 and ST111 events wereall comparable while the 764 event consistently reportedSNP rates nearly double that of the other groups (1 SNPdetected every sim22000 bases) which is still well below thepreviously reported SNP rate of 1 SNP per sim1400 nucleotidesin soybean seeds [40] Nevertheless SNP base changes

International Journal of Plant Genomics 9

appeared to follow the same pattern of commonality withall groups demonstrating high percentages of transition basechanges with relatively low transversion counts The 764 linedid demonstrate a lower TsTv ratio than the other groups(Figure 1(d)) although progeny from two independent par-ents in the ST77 line (ST77D and ST77F) exhibited lowerratios as well indicating that this characteristic alone cannot reliably indicate internal stresses or overall divergencefrom our controls The majority of detected base changeswere located in the third base of the codon likely gener-ating synonymous (silent) polymorphisms which were themajority of predicted SNP classifications in all events Lesscommon changes occurred in the first or second bases of thecodon generating a nonsynonymous or missense mutationlikely resulting in an amino acid change (Figures 2(e)ndash2(h))Missense to silent ratios detected in all experimental groupsappear to be higher than previously published results forsoybean indicated [40] although the higher nonsynonymousto synonymous mutations may be due to high linkagedisequilibrium exhibited in soy [41]

Interestingly the 764 line had the lowest overall missenseto silent polymorphism ratio demonstrating that while the764 line contained the highest overall number of SNPs ahigher percentage of the total polymorphisms were silentmutations (sim45) compared to the other three experimentalgroups (Figure 2(h)) This reveals the possibility that SNPsand INDELS detected here had originated from the origi-nal transformation event and have been highly conservedthrough self-crossed generations of progeny particularlyconsidering how remarkably well conserved the attributesof detected SNPs were between different biological samplesof the same transformation event Without sequencing datafrom prior generations of each parental line tested herefor comparison this can not be confirmed with completeconfidence however this does pose an interesting inquirythat even silent SNPsmay have the ability to alter gene expres-sion and peptide structure Indeed it has been suggestedthat although synonymous polymorphisms code for identicalamino acids subtle changes resulting from the utilization ofnonoptimal synonymous codons can modulate downstreamtranscription and translation rates [42] thereby altering thefolding conformations and proposed function of the finalpeptides This may explain why the 764 transgenic eventexhibited the highest rates of differentially expressed genesin conjunction with polymorphisms but still maintained ahealthy phenotype

While many SNP studies have excluded synonymousmutations and those present in noncoding regions recentworks have examined these sequences in light of theirpossible epigenetic effects on transcription and translationhowever more advanced tools for the prediction of possibleeffects are still in development and require further testingfor reliable forecasts [43] Nonsynonymous polymorphismsappear at similar rates across the experimental groupsvarying between 50 and 60 of total functional SNP clas-sifications While nonsynonymous mutations nearly alwaysalter amino acid sequences that can potentially yield non-functional protein products this phenomenonmay produce aneutral effect on the organism or protein or activate signaling

cascades in redundant systems [43] The effects of thesedetected nonsynonymous polymorphisms in each transgenicevent are not known or identifiable without extensive pro-teomic investigations into the altered downstream transcriptproductsThe 764 line which previously displayed the largestdegree of differential gene expression exhibited the lowestpercentage of missense and nonsense polymorphisms there-fore these alone are not likely to be the root cause of observedchanges in this transgenic line

Spontaneous mutations may also occur as a result of apreviously occurring stressful event such as parental lineAgrobacterium transformation Epigenetic alterations andnucleotide transpositions have been detected up to fivegenerations forward from transformation in Arabidopsis [4445] however all seeds examined here were bred to orbeyond the 5th generation of progeny The causes for thesemutations are unknown however each transformant was anindependent transformation event and was likely exposedto a varying degree of stress during the process This couldpotentially have introduced genomic variations that havebeen carried forward to the transcriptome as even largecopy number of variations have been reported in plants asa response to stress [46] Segregation of SNPs from existingparental heterozygous loci is expected in progeny howeverall samples examined here were remarkably consistent inSNP numbers location distributions and types within theirexperimental groups further demonstrating the genomicstability of these events This further solidifies the possibilitythat the detected variations in all transgenic lines arose froman early parental event possibly occurring during the originalAgrobacterium transformation or planting event and havebeen stably integrated and carried forward to the generationsexamined Considering the consistency of the SNP patternsthese polymorphisms have likely originated at the genomiclevel and are not likely to appear in such large quantities dueto transcriptional proofreading errors

Detected alterations occurred in a ldquobowtierdquo-shapeddistribution across the 20 chromosomes with lowerinstances of SNPs near the centromeres and the highestincidence of SNPs near the euchromatic telomeric ends of thechromosomal arms This is where gene density is the highestwhich has been illustrated previously by several worksconducted on soybean polymorphism rates [36 38 47] OursnpEff dataset summaries available in the Supporting Dataof this manuscript show the same distribution pattern withno significant alterations of SNP rates between biologicalreplicates of progeny Without complete characterization ofthe targeted SNPs in this work future directions can invokefunctional characterizations to cross-reference detectedpolymorphisms with possible connections to differentiallyexpressed genes and also to improve existing annotatedWilliams 82 cultivar SNPs In order to fully deduce the originof SNP variations with high confidence multiple generationanalyses will need to be conducted Approximately 70 ofthe genes that were differentially expressed also containedSNPs or INDELS however due to the extreme disparityin size between each gene list (a maximum of sim3000 andsim40000 genes resp) this could simply be due to chanceand can not definitively hold concrete biological significance

10 International Journal of Plant Genomics

without further investigation This suggests that the SNPsexamined here may directly impact native gene expressionlevels possibly due to their locations within gene regulatoryregions or enhancer elements located near the telomericends of the chromosomal arms

Gene ontology terms from SNP gene lists of all experi-mental groups were very similar returning many enrichedterms regarding protein transport and localization as wellas RNA and transcriptional processes even in the wild typegroup (Figure 3 and Supplemental Figure 4) Interestinglythese relationships are highly similar to the enriched GOterms derived from the differentially expressed gene sets ofthe transgenic lines indicating the majority of functionalrelationships between these polymorphisms are likely arisingfrom intercultivar variations and are not specifically relatedto transgenesis effects However given the possibility thatmutations from transgenesis could occur within QTLs thatcontrol traits such as plant size flower count and seedquantity identification of novel polymorphisms in theseregions of soybeanrsquos transcriptome is of great importanceto both plant breeders and in biotechnology applicationsIn fact recent investigations have revealed an excess of9 million unique SNPs that occur between 106 differentcultivated soybean varieties [48] Here we have providedlists of exonic SNPs unique to each of the three transgenicgroups and specific to transformation as well as novel SNPsdetected in our Williams 82 wild type controls Furtherinvestigations of the events described will identify the sitesof T-DNA integration and characterize the surroundingsequences to determine whether increased incidences ofpolymorphisms occur within the immediate vicinities of theinsertions Unique SNPs identified and reported here may beused to aide these investigations of crucial trait loci in thefuture to maximize yields and assist in breeding programs oftransgenic specimens

5 Conclusions

This study describes the first direct comparison of poly-morphism rates present in the seed transcriptome of threeindependent transgenic lines of Glycine max with a non-transgenic wild type line all derived from the Williams 82soybean cultivar Previously we examined gene expressionchanges in these transgenic events finding only the 764line to be significantly altered when compared to the othertransgenic lines In the same manner we have expanded ourprevious work and reported transcriptome single nucleotidepolymorphism rates for the ST77 ST111 and 764 trans-genic events compared to wild type Mirroring the resultsfrom our differential expression study the samples of the764 transformation event exhibited transcript SNP ratesnearly double that of the wild type ST77 and ST111 eventswhile also demonstrating higher amounts of transversionmutations While the exact cause for both the large num-ber of differentially expressed genes and polymorphismspresent in the 764 event is not known it is highly likelythat the two phenomena are correlated Our results showthat in addition to differential expression transgenesis mayresult in unpredicted base changes within gene transcripts

and regulatory regions without impacting plant pheno-type

Availability of Supporting Data

The RNA sequencing data described herein may be accessedat httpwwwncbinlmnihgovgeoqueryacccgiacc=GS-E64620 through NCBIrsquos Gene Expression Omnibus underaccession number GSE64620 Summary files for bcftoolsand snpEff outputs encompassing variant calls and predictedvariant effects are available from the iPlant CollaborativeDiscovery Environment directory available at httpdeiplan-tcollaborativeorgdld44044BEE-69B8-4129-A6B5-4C9291-B6B4A1snpeffsummarieszip and httpdeiplantcollabora-tiveorgdldAD464040-9169-4E52-87D5-DEF7466A2C1Evariantsummarieszip Lists of genes containing detectedvariants from all samples are available from iPlant in acompressed zip archive available at httpdeiplantcollabor-ativeorgdld533570A3-1EFB-4864-B9A9-9D82F17E09A8snpeffgeneszip

Competing Interests

Theauthors of this manuscript declare no conflict of interests

Authorsrsquo Contributions

Kevin C Lambirth designed the experiment conducted theSNP detection and effect prediction and generated figuresAdamMWhaley compiled and installed software formattedsequencing data files and was consulted on analysis KevinC Lambirth Kenneth J Piller and Kenneth L Bost wrote themanuscript with input from all authors

Acknowledgments

The authors would like to acknowledge the sequencingefforts of Dr Mike Wang (David H Murdock ResearchInstitute) and Dr Laura Reichenberg (Pfeiffer University) forassistance with plant propagation and selection and Dr KanWang (Iowa State University Agronomy Department) for theestablishment of the ST111 transgenic soybean line

References

[1] A Ziemienowicz ldquoAgrobacterium-mediated plant transforma-tion factors applications and recent advancesrdquoBiocatalysis andAgricultural Biotechnology vol 3 no 4 pp 95ndash102 2014

[2] M S Koch J M Ward S L Levine J A Baum J L Viciniand B G Hammond ldquoThe food and environmental safety of Btcropsrdquo Frontiers in Plant Science vol 6 article 283 2015

[3] G M Dill C A CaJacob and S R Padgette ldquoGlyphosate-resistant crops adoption use and future considerationsrdquo PestManagement Science vol 64 no 4 pp 326ndash331 2008

[4] M A S Valente J A Q A Faria J R L Soares-Ramos et alldquoThe ER luminal binding protein (BiP) mediates an increase indrought tolerance in soybean and delays drought-induced leafsenescence in soybean and tobaccordquo Journal of ExperimentalBotany vol 60 no 2 pp 533ndash546 2009

[5] L C Hudson B S Seabolt J Odle K L Bost C H Stahl andK J Piller ldquoSublethal staphylococcal enterotoxin B challenge

International Journal of Plant Genomics 11

model in pigs to evaluate protection following immunizationwith a soybean-derived vaccinerdquoClinical andVaccine Immunol-ogy vol 20 no 1 pp 24ndash32 2013

[6] Y Zhang D Li X Jin and Z Huang ldquoFighting Ebola withZMapp spotlight on plant-made antibodyrdquo Science China LifeSciences vol 57 no 10 pp 987ndash988 2014

[7] R Powell L C Hudson K C Lambirth et al ldquoRecombinantexpression of homodimeric 660 kDa human thyroglobulin insoybean seeds an alternative source of human thyroglobulinrdquoPlant Cell Reports vol 30 no 7 pp 1327ndash1338 2011

[8] K L Bost K C Lambirth L C Hudson and K J PillerldquoSoybean-derived thyroglobulin as an analyte specific reagentfor in vitro diagnostic tests and devicesrdquo Advances in Medicineand Biology vol 80 pp 23ndash40 2014

[9] R Abiri A Valdiani M Maziah et al ldquoA critical review ofthe concept of transgenic plants insights into pharmaceuticalbiotechnology andmolecular farmingrdquo Current Issues in Molec-ular Biology vol 18 no 1 pp 21ndash42 2016

[10] J L Oakes K L Bost and K J Piller ldquoStability of a soybeanseed-derived vaccine antigen following long-term storage pro-cessing and transport in the absence of a cold chainrdquo Journal ofthe Science of Food andAgriculture vol 89 no 13 pp 2191ndash21992009

[11] R Garg M Tolbert J L Oakes T E Clemente K L Bostand K J Piller ldquoChloroplast targeting of FanC the majorantigenic subunit of Escherichia coli K99 fimbriae in transgenicsoybeanrdquo Plant Cell Reports vol 26 no 7 pp 1011ndash1023 2007

[12] K J Piller T E Clemente S M Jun et al ldquoExpression andimmunogenicity of an Escherichia coli K99 fimbriae subunitantigen in soybeanrdquo Planta vol 222 no 1 pp 6ndash18 2005

[13] L C Hudson R Garg K L Bost and K J Piller ldquoSoybeanseeds a practical host for the production of functional subunitvaccinesrdquo BioMed Research International vol 2014 Article ID340804 13 pages 2014

[14] K L Bost and K J Piller ldquoProtein expression systems whysoybean seedsrdquo in SoybeanmdashMolecular Aspects of Breeding ASudaric Ed Intech Open 2011

[15] D Tilman K G Cassman P A Matson R Naylor and SPolasky ldquoAgricultural sustainability and intensive productionpracticesrdquo Nature vol 418 no 6898 pp 671ndash677 2002

[16] G N De La FuenteU K Frei and T Lubberstedt ldquoAcceleratingplant breedingrdquo Trends in Plant Science vol 18 no 12 pp 667ndash672 2013

[17] L C Hudson K C Lambirth K L Bost and K L PillerAdvancements in Transgenic Soy From Field to Bedside InTech2013

[18] J D Ray T C Kilen C A Abel and R L Paris ldquoSoybeannatural cross-pollination rates under field conditionsrdquo Environ-mental Biosafety Research vol 2 no 2 pp 133ndash138 2003

[19] K L Hart S L Kimura V Mushailov Z M BudimlijaM Prinz and E Wurmbach ldquoImproved eye- and skin-colorprediction based on 8 SNPsrdquo Croatian Medical Journal vol 54no 3 pp 248ndash256 2013

[20] S E Medland T Zayats B Glaser et al ldquoA variant inLIN28B is associatedwith 2D 4D finger-length ratio a putativeretrospective biomarker of prenatal testosterone exposurerdquoTheAmerican Journal of HumanGenetics vol 86 no 4 pp 519ndash5252010

[21] B S Shastry ldquoSNP alleles in human disease and evolutionrdquoJournal of Human Genetics vol 47 no 11 pp 561ndash566 2002

[22] Y-H Li S-C Zhao J-X Ma et al ldquoMolecular footprints ofdomestication and improvement in soybean revealed by wholegenome re-sequencingrdquo BMC Genomics vol 14 article 5792013

[23] T Joshi B Valliyodan J-H Wu S-H Lee D Xu and H TNguyen ldquoGenomic differences between cultivated soybean Gmax and its wild relative G sojardquo BMCGenomics vol 14 articleS5 2013

[24] K C Lambirth AMWhaley I C Blakley et al ldquoA comparisonof transgenic andwild type soybean seeds analysis of transcrip-tome profiles using RNA-Seqrdquo BMC Biotechnology vol 15 no1 article 89 2015

[25] H Li ldquoA statistical framework for SNP callingmutation discov-ery association mapping and population genetical parameterestimation from sequencing datardquo Bioinformatics vol 27 no 21pp 2987ndash2993 2011

[26] H Li B Handsaker A Wysoker et al ldquoThe sequence align-mentMap format and SAMtoolsrdquo Bioinformatics vol 25 no16 pp 2078ndash2079 2009

[27] D M Goodstein S Shu R Howson et al ldquoPhytozome acomparative platform for green plant genomicsrdquo Nucleic AcidsResearch vol 40 no 1 pp D1178ndashD1186 2012

[28] C Trapnell A Roberts L Goff et al ldquoDifferential gene andtranscript expression analysis of RNA-seq experiments withTopHat and Cufflinksrdquo Nature Protocols vol 7 no 3 pp 562ndash578 2012

[29] PDanecek A AutonG Abecasis et al ldquoThe variant call formatand VCFtoolsrdquo Bioinformatics vol 27 no 15 pp 2156ndash21582011

[30] M Krzywinski J Schein I Birol et al ldquoCircos an informationaesthetic for comparative genomicsrdquo Genome Research vol 19no 9 pp 1639ndash1645 2009

[31] P Cingolani A Platts L L Wang et al ldquoA program forannotating and predicting the effects of single nucleotidepolymorphisms SnpEff SNPs in the genome of Drosophilamelanogaster strain w 1118 iso-2 iso-3rdquo Fly vol 6 no 2 pp80ndash92 2012

[32] Z Du X Zhou Y Ling Z Zhang and Z Su ldquoagriGO A GOanalysis toolkit for the agricultural communityrdquo Nucleic AcidsResearch vol 38 no 2 Article ID gkq310 pp W64ndashW70 2010

[33] S AGoffM Vaughn SMcKay et al ldquoThe iPlant collaborativecyberinfrastructure for plant biologyrdquo Frontiers in Plant Sciencevol 2 article 34 2011

[34] Y L Zhu Q J Song D L Hyten et al ldquoSingle-nucleotidepolymorphisms in soybeanrdquo Genetics vol 163 no 3 pp 1123ndash1134 2003

[35] Y Shu Y Li Z Zhu et al ldquoSNPs discovery and CAPS markerconversion in soybeanrdquo Molecular Biology Reports vol 38 no3 pp 1841ndash1846 2011

[36] C B Yadav P Bhareti M Muthamilarasan et al ldquoGenome-wide SNP identification and characterization in two soybeancultivars with contrasting mungbean yellow mosaic india virusdisease resistance traitsrdquo PLoS ONE vol 10 no 4 Article IDe0123897 2015

[37] X Wu C Ren T Joshi T Vuong D Xu and H T NguyenldquoSNP discovery by high-throughput sequencing in soybeanrdquoBMC Genomics vol 11 article 469 2010

[38] Y-H Li J C Reif S A Jackson Y-S Ma R-Z Chang and L-J Qiu ldquoDetecting SNPs underlying domestication-related traitsin soybeanrdquo BMC Plant Biology vol 14 article 251 2014

12 International Journal of Plant Genomics

[39] L C Hudson K L Bost and K J Piller ldquoOptimizing recom-binant protein expression in soybeanrdquo in SoybeanmdashMolecularAspects of Breeding InTech Rijeka Croatia 2011

[40] W Goettel E Xia R Upchurch M-L Wang P Chen andY-Q C An ldquoIdentification and characterization of transcriptpolymorphisms in soybean lines varying in oil composition andcontentrdquo BMC Genomics vol 15 article 299 2014

[41] C Chan X Qi M-W Li F-L Wong and H-M Lam ldquoRecentdevelopments of genomic research in soybeanrdquo Journal ofGenetics and Genomics vol 39 no 7 pp 317ndash324 2012

[42] A A Komar ldquoSNPs silent but not invisiblerdquo Science vol 315no 5811 pp 466ndash467 2007

[43] P Katsonis A Koire S J Wilson et al ldquoSingle nucleotidevariations biological impact and theoretical interpretationrdquoProtein Science vol 23 no 12 pp 1650ndash1666 2014

[44] W J Haun D L Hyten W W Xu et al ldquoThe composition andorigins of genomic variation among individuals of the soybeanreference cultivar williams 82rdquo Plant Physiology vol 155 no 2pp 645ndash655 2011

[45] S DeBolt ldquoCopy number variation shapes genome diversityin Arabidopsis over immediate family generational scalesrdquoGenome Biology and Evolution vol 2 no 1 pp 441ndash453 2010

[46] A Zmienko A Samelak P Kozłowski and M FiglerowiczldquoCopy number polymorphism in plant genomesrdquo Theoreticaland Applied Genetics vol 127 no 1 pp 1ndash18 2014

[47] Y-G Lee N Jeong J H Kim et al ldquoDevelopment validationand genetic analysis of a large soybean SNP genotyping arrayrdquoPlant Journal vol 81 no 4 pp 625ndash636 2015

[48] B Valliyodan Q Dan G Patil et al ldquoLandscape of genomicdiversity and trait discovery in soybeanrdquo Scientific Reports vol6 article 23598 2016

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 9: Research Article Transcript Polymorphism Rates in …downloads.hindawi.com/journals/ijpg/2016/1562041.pdfResearch Article Transcript Polymorphism Rates in Soybean Seed Tissue Are Increased

International Journal of Plant Genomics 9

appeared to follow the same pattern of commonality withall groups demonstrating high percentages of transition basechanges with relatively low transversion counts The 764 linedid demonstrate a lower TsTv ratio than the other groups(Figure 1(d)) although progeny from two independent par-ents in the ST77 line (ST77D and ST77F) exhibited lowerratios as well indicating that this characteristic alone cannot reliably indicate internal stresses or overall divergencefrom our controls The majority of detected base changeswere located in the third base of the codon likely gener-ating synonymous (silent) polymorphisms which were themajority of predicted SNP classifications in all events Lesscommon changes occurred in the first or second bases of thecodon generating a nonsynonymous or missense mutationlikely resulting in an amino acid change (Figures 2(e)ndash2(h))Missense to silent ratios detected in all experimental groupsappear to be higher than previously published results forsoybean indicated [40] although the higher nonsynonymousto synonymous mutations may be due to high linkagedisequilibrium exhibited in soy [41]

Interestingly the 764 line had the lowest overall missenseto silent polymorphism ratio demonstrating that while the764 line contained the highest overall number of SNPs ahigher percentage of the total polymorphisms were silentmutations (sim45) compared to the other three experimentalgroups (Figure 2(h)) This reveals the possibility that SNPsand INDELS detected here had originated from the origi-nal transformation event and have been highly conservedthrough self-crossed generations of progeny particularlyconsidering how remarkably well conserved the attributesof detected SNPs were between different biological samplesof the same transformation event Without sequencing datafrom prior generations of each parental line tested herefor comparison this can not be confirmed with completeconfidence however this does pose an interesting inquirythat even silent SNPsmay have the ability to alter gene expres-sion and peptide structure Indeed it has been suggestedthat although synonymous polymorphisms code for identicalamino acids subtle changes resulting from the utilization ofnonoptimal synonymous codons can modulate downstreamtranscription and translation rates [42] thereby altering thefolding conformations and proposed function of the finalpeptides This may explain why the 764 transgenic eventexhibited the highest rates of differentially expressed genesin conjunction with polymorphisms but still maintained ahealthy phenotype

While many SNP studies have excluded synonymousmutations and those present in noncoding regions recentworks have examined these sequences in light of theirpossible epigenetic effects on transcription and translationhowever more advanced tools for the prediction of possibleeffects are still in development and require further testingfor reliable forecasts [43] Nonsynonymous polymorphismsappear at similar rates across the experimental groupsvarying between 50 and 60 of total functional SNP clas-sifications While nonsynonymous mutations nearly alwaysalter amino acid sequences that can potentially yield non-functional protein products this phenomenonmay produce aneutral effect on the organism or protein or activate signaling

cascades in redundant systems [43] The effects of thesedetected nonsynonymous polymorphisms in each transgenicevent are not known or identifiable without extensive pro-teomic investigations into the altered downstream transcriptproductsThe 764 line which previously displayed the largestdegree of differential gene expression exhibited the lowestpercentage of missense and nonsense polymorphisms there-fore these alone are not likely to be the root cause of observedchanges in this transgenic line

Spontaneous mutations may also occur as a result of apreviously occurring stressful event such as parental lineAgrobacterium transformation Epigenetic alterations andnucleotide transpositions have been detected up to fivegenerations forward from transformation in Arabidopsis [4445] however all seeds examined here were bred to orbeyond the 5th generation of progeny The causes for thesemutations are unknown however each transformant was anindependent transformation event and was likely exposedto a varying degree of stress during the process This couldpotentially have introduced genomic variations that havebeen carried forward to the transcriptome as even largecopy number of variations have been reported in plants asa response to stress [46] Segregation of SNPs from existingparental heterozygous loci is expected in progeny howeverall samples examined here were remarkably consistent inSNP numbers location distributions and types within theirexperimental groups further demonstrating the genomicstability of these events This further solidifies the possibilitythat the detected variations in all transgenic lines arose froman early parental event possibly occurring during the originalAgrobacterium transformation or planting event and havebeen stably integrated and carried forward to the generationsexamined Considering the consistency of the SNP patternsthese polymorphisms have likely originated at the genomiclevel and are not likely to appear in such large quantities dueto transcriptional proofreading errors

Detected alterations occurred in a ldquobowtierdquo-shapeddistribution across the 20 chromosomes with lowerinstances of SNPs near the centromeres and the highestincidence of SNPs near the euchromatic telomeric ends of thechromosomal arms This is where gene density is the highestwhich has been illustrated previously by several worksconducted on soybean polymorphism rates [36 38 47] OursnpEff dataset summaries available in the Supporting Dataof this manuscript show the same distribution pattern withno significant alterations of SNP rates between biologicalreplicates of progeny Without complete characterization ofthe targeted SNPs in this work future directions can invokefunctional characterizations to cross-reference detectedpolymorphisms with possible connections to differentiallyexpressed genes and also to improve existing annotatedWilliams 82 cultivar SNPs In order to fully deduce the originof SNP variations with high confidence multiple generationanalyses will need to be conducted Approximately 70 ofthe genes that were differentially expressed also containedSNPs or INDELS however due to the extreme disparityin size between each gene list (a maximum of sim3000 andsim40000 genes resp) this could simply be due to chanceand can not definitively hold concrete biological significance

10 International Journal of Plant Genomics

without further investigation This suggests that the SNPsexamined here may directly impact native gene expressionlevels possibly due to their locations within gene regulatoryregions or enhancer elements located near the telomericends of the chromosomal arms

Gene ontology terms from SNP gene lists of all experi-mental groups were very similar returning many enrichedterms regarding protein transport and localization as wellas RNA and transcriptional processes even in the wild typegroup (Figure 3 and Supplemental Figure 4) Interestinglythese relationships are highly similar to the enriched GOterms derived from the differentially expressed gene sets ofthe transgenic lines indicating the majority of functionalrelationships between these polymorphisms are likely arisingfrom intercultivar variations and are not specifically relatedto transgenesis effects However given the possibility thatmutations from transgenesis could occur within QTLs thatcontrol traits such as plant size flower count and seedquantity identification of novel polymorphisms in theseregions of soybeanrsquos transcriptome is of great importanceto both plant breeders and in biotechnology applicationsIn fact recent investigations have revealed an excess of9 million unique SNPs that occur between 106 differentcultivated soybean varieties [48] Here we have providedlists of exonic SNPs unique to each of the three transgenicgroups and specific to transformation as well as novel SNPsdetected in our Williams 82 wild type controls Furtherinvestigations of the events described will identify the sitesof T-DNA integration and characterize the surroundingsequences to determine whether increased incidences ofpolymorphisms occur within the immediate vicinities of theinsertions Unique SNPs identified and reported here may beused to aide these investigations of crucial trait loci in thefuture to maximize yields and assist in breeding programs oftransgenic specimens

5 Conclusions

This study describes the first direct comparison of poly-morphism rates present in the seed transcriptome of threeindependent transgenic lines of Glycine max with a non-transgenic wild type line all derived from the Williams 82soybean cultivar Previously we examined gene expressionchanges in these transgenic events finding only the 764line to be significantly altered when compared to the othertransgenic lines In the same manner we have expanded ourprevious work and reported transcriptome single nucleotidepolymorphism rates for the ST77 ST111 and 764 trans-genic events compared to wild type Mirroring the resultsfrom our differential expression study the samples of the764 transformation event exhibited transcript SNP ratesnearly double that of the wild type ST77 and ST111 eventswhile also demonstrating higher amounts of transversionmutations While the exact cause for both the large num-ber of differentially expressed genes and polymorphismspresent in the 764 event is not known it is highly likelythat the two phenomena are correlated Our results showthat in addition to differential expression transgenesis mayresult in unpredicted base changes within gene transcripts

and regulatory regions without impacting plant pheno-type

Availability of Supporting Data

The RNA sequencing data described herein may be accessedat httpwwwncbinlmnihgovgeoqueryacccgiacc=GS-E64620 through NCBIrsquos Gene Expression Omnibus underaccession number GSE64620 Summary files for bcftoolsand snpEff outputs encompassing variant calls and predictedvariant effects are available from the iPlant CollaborativeDiscovery Environment directory available at httpdeiplan-tcollaborativeorgdld44044BEE-69B8-4129-A6B5-4C9291-B6B4A1snpeffsummarieszip and httpdeiplantcollabora-tiveorgdldAD464040-9169-4E52-87D5-DEF7466A2C1Evariantsummarieszip Lists of genes containing detectedvariants from all samples are available from iPlant in acompressed zip archive available at httpdeiplantcollabor-ativeorgdld533570A3-1EFB-4864-B9A9-9D82F17E09A8snpeffgeneszip

Competing Interests

Theauthors of this manuscript declare no conflict of interests

Authorsrsquo Contributions

Kevin C Lambirth designed the experiment conducted theSNP detection and effect prediction and generated figuresAdamMWhaley compiled and installed software formattedsequencing data files and was consulted on analysis KevinC Lambirth Kenneth J Piller and Kenneth L Bost wrote themanuscript with input from all authors

Acknowledgments

The authors would like to acknowledge the sequencingefforts of Dr Mike Wang (David H Murdock ResearchInstitute) and Dr Laura Reichenberg (Pfeiffer University) forassistance with plant propagation and selection and Dr KanWang (Iowa State University Agronomy Department) for theestablishment of the ST111 transgenic soybean line

References

[1] A Ziemienowicz ldquoAgrobacterium-mediated plant transforma-tion factors applications and recent advancesrdquoBiocatalysis andAgricultural Biotechnology vol 3 no 4 pp 95ndash102 2014

[2] M S Koch J M Ward S L Levine J A Baum J L Viciniand B G Hammond ldquoThe food and environmental safety of Btcropsrdquo Frontiers in Plant Science vol 6 article 283 2015

[3] G M Dill C A CaJacob and S R Padgette ldquoGlyphosate-resistant crops adoption use and future considerationsrdquo PestManagement Science vol 64 no 4 pp 326ndash331 2008

[4] M A S Valente J A Q A Faria J R L Soares-Ramos et alldquoThe ER luminal binding protein (BiP) mediates an increase indrought tolerance in soybean and delays drought-induced leafsenescence in soybean and tobaccordquo Journal of ExperimentalBotany vol 60 no 2 pp 533ndash546 2009

[5] L C Hudson B S Seabolt J Odle K L Bost C H Stahl andK J Piller ldquoSublethal staphylococcal enterotoxin B challenge

International Journal of Plant Genomics 11

model in pigs to evaluate protection following immunizationwith a soybean-derived vaccinerdquoClinical andVaccine Immunol-ogy vol 20 no 1 pp 24ndash32 2013

[6] Y Zhang D Li X Jin and Z Huang ldquoFighting Ebola withZMapp spotlight on plant-made antibodyrdquo Science China LifeSciences vol 57 no 10 pp 987ndash988 2014

[7] R Powell L C Hudson K C Lambirth et al ldquoRecombinantexpression of homodimeric 660 kDa human thyroglobulin insoybean seeds an alternative source of human thyroglobulinrdquoPlant Cell Reports vol 30 no 7 pp 1327ndash1338 2011

[8] K L Bost K C Lambirth L C Hudson and K J PillerldquoSoybean-derived thyroglobulin as an analyte specific reagentfor in vitro diagnostic tests and devicesrdquo Advances in Medicineand Biology vol 80 pp 23ndash40 2014

[9] R Abiri A Valdiani M Maziah et al ldquoA critical review ofthe concept of transgenic plants insights into pharmaceuticalbiotechnology andmolecular farmingrdquo Current Issues in Molec-ular Biology vol 18 no 1 pp 21ndash42 2016

[10] J L Oakes K L Bost and K J Piller ldquoStability of a soybeanseed-derived vaccine antigen following long-term storage pro-cessing and transport in the absence of a cold chainrdquo Journal ofthe Science of Food andAgriculture vol 89 no 13 pp 2191ndash21992009

[11] R Garg M Tolbert J L Oakes T E Clemente K L Bostand K J Piller ldquoChloroplast targeting of FanC the majorantigenic subunit of Escherichia coli K99 fimbriae in transgenicsoybeanrdquo Plant Cell Reports vol 26 no 7 pp 1011ndash1023 2007

[12] K J Piller T E Clemente S M Jun et al ldquoExpression andimmunogenicity of an Escherichia coli K99 fimbriae subunitantigen in soybeanrdquo Planta vol 222 no 1 pp 6ndash18 2005

[13] L C Hudson R Garg K L Bost and K J Piller ldquoSoybeanseeds a practical host for the production of functional subunitvaccinesrdquo BioMed Research International vol 2014 Article ID340804 13 pages 2014

[14] K L Bost and K J Piller ldquoProtein expression systems whysoybean seedsrdquo in SoybeanmdashMolecular Aspects of Breeding ASudaric Ed Intech Open 2011

[15] D Tilman K G Cassman P A Matson R Naylor and SPolasky ldquoAgricultural sustainability and intensive productionpracticesrdquo Nature vol 418 no 6898 pp 671ndash677 2002

[16] G N De La FuenteU K Frei and T Lubberstedt ldquoAcceleratingplant breedingrdquo Trends in Plant Science vol 18 no 12 pp 667ndash672 2013

[17] L C Hudson K C Lambirth K L Bost and K L PillerAdvancements in Transgenic Soy From Field to Bedside InTech2013

[18] J D Ray T C Kilen C A Abel and R L Paris ldquoSoybeannatural cross-pollination rates under field conditionsrdquo Environ-mental Biosafety Research vol 2 no 2 pp 133ndash138 2003

[19] K L Hart S L Kimura V Mushailov Z M BudimlijaM Prinz and E Wurmbach ldquoImproved eye- and skin-colorprediction based on 8 SNPsrdquo Croatian Medical Journal vol 54no 3 pp 248ndash256 2013

[20] S E Medland T Zayats B Glaser et al ldquoA variant inLIN28B is associatedwith 2D 4D finger-length ratio a putativeretrospective biomarker of prenatal testosterone exposurerdquoTheAmerican Journal of HumanGenetics vol 86 no 4 pp 519ndash5252010

[21] B S Shastry ldquoSNP alleles in human disease and evolutionrdquoJournal of Human Genetics vol 47 no 11 pp 561ndash566 2002

[22] Y-H Li S-C Zhao J-X Ma et al ldquoMolecular footprints ofdomestication and improvement in soybean revealed by wholegenome re-sequencingrdquo BMC Genomics vol 14 article 5792013

[23] T Joshi B Valliyodan J-H Wu S-H Lee D Xu and H TNguyen ldquoGenomic differences between cultivated soybean Gmax and its wild relative G sojardquo BMCGenomics vol 14 articleS5 2013

[24] K C Lambirth AMWhaley I C Blakley et al ldquoA comparisonof transgenic andwild type soybean seeds analysis of transcrip-tome profiles using RNA-Seqrdquo BMC Biotechnology vol 15 no1 article 89 2015

[25] H Li ldquoA statistical framework for SNP callingmutation discov-ery association mapping and population genetical parameterestimation from sequencing datardquo Bioinformatics vol 27 no 21pp 2987ndash2993 2011

[26] H Li B Handsaker A Wysoker et al ldquoThe sequence align-mentMap format and SAMtoolsrdquo Bioinformatics vol 25 no16 pp 2078ndash2079 2009

[27] D M Goodstein S Shu R Howson et al ldquoPhytozome acomparative platform for green plant genomicsrdquo Nucleic AcidsResearch vol 40 no 1 pp D1178ndashD1186 2012

[28] C Trapnell A Roberts L Goff et al ldquoDifferential gene andtranscript expression analysis of RNA-seq experiments withTopHat and Cufflinksrdquo Nature Protocols vol 7 no 3 pp 562ndash578 2012

[29] PDanecek A AutonG Abecasis et al ldquoThe variant call formatand VCFtoolsrdquo Bioinformatics vol 27 no 15 pp 2156ndash21582011

[30] M Krzywinski J Schein I Birol et al ldquoCircos an informationaesthetic for comparative genomicsrdquo Genome Research vol 19no 9 pp 1639ndash1645 2009

[31] P Cingolani A Platts L L Wang et al ldquoA program forannotating and predicting the effects of single nucleotidepolymorphisms SnpEff SNPs in the genome of Drosophilamelanogaster strain w 1118 iso-2 iso-3rdquo Fly vol 6 no 2 pp80ndash92 2012

[32] Z Du X Zhou Y Ling Z Zhang and Z Su ldquoagriGO A GOanalysis toolkit for the agricultural communityrdquo Nucleic AcidsResearch vol 38 no 2 Article ID gkq310 pp W64ndashW70 2010

[33] S AGoffM Vaughn SMcKay et al ldquoThe iPlant collaborativecyberinfrastructure for plant biologyrdquo Frontiers in Plant Sciencevol 2 article 34 2011

[34] Y L Zhu Q J Song D L Hyten et al ldquoSingle-nucleotidepolymorphisms in soybeanrdquo Genetics vol 163 no 3 pp 1123ndash1134 2003

[35] Y Shu Y Li Z Zhu et al ldquoSNPs discovery and CAPS markerconversion in soybeanrdquo Molecular Biology Reports vol 38 no3 pp 1841ndash1846 2011

[36] C B Yadav P Bhareti M Muthamilarasan et al ldquoGenome-wide SNP identification and characterization in two soybeancultivars with contrasting mungbean yellow mosaic india virusdisease resistance traitsrdquo PLoS ONE vol 10 no 4 Article IDe0123897 2015

[37] X Wu C Ren T Joshi T Vuong D Xu and H T NguyenldquoSNP discovery by high-throughput sequencing in soybeanrdquoBMC Genomics vol 11 article 469 2010

[38] Y-H Li J C Reif S A Jackson Y-S Ma R-Z Chang and L-J Qiu ldquoDetecting SNPs underlying domestication-related traitsin soybeanrdquo BMC Plant Biology vol 14 article 251 2014

12 International Journal of Plant Genomics

[39] L C Hudson K L Bost and K J Piller ldquoOptimizing recom-binant protein expression in soybeanrdquo in SoybeanmdashMolecularAspects of Breeding InTech Rijeka Croatia 2011

[40] W Goettel E Xia R Upchurch M-L Wang P Chen andY-Q C An ldquoIdentification and characterization of transcriptpolymorphisms in soybean lines varying in oil composition andcontentrdquo BMC Genomics vol 15 article 299 2014

[41] C Chan X Qi M-W Li F-L Wong and H-M Lam ldquoRecentdevelopments of genomic research in soybeanrdquo Journal ofGenetics and Genomics vol 39 no 7 pp 317ndash324 2012

[42] A A Komar ldquoSNPs silent but not invisiblerdquo Science vol 315no 5811 pp 466ndash467 2007

[43] P Katsonis A Koire S J Wilson et al ldquoSingle nucleotidevariations biological impact and theoretical interpretationrdquoProtein Science vol 23 no 12 pp 1650ndash1666 2014

[44] W J Haun D L Hyten W W Xu et al ldquoThe composition andorigins of genomic variation among individuals of the soybeanreference cultivar williams 82rdquo Plant Physiology vol 155 no 2pp 645ndash655 2011

[45] S DeBolt ldquoCopy number variation shapes genome diversityin Arabidopsis over immediate family generational scalesrdquoGenome Biology and Evolution vol 2 no 1 pp 441ndash453 2010

[46] A Zmienko A Samelak P Kozłowski and M FiglerowiczldquoCopy number polymorphism in plant genomesrdquo Theoreticaland Applied Genetics vol 127 no 1 pp 1ndash18 2014

[47] Y-G Lee N Jeong J H Kim et al ldquoDevelopment validationand genetic analysis of a large soybean SNP genotyping arrayrdquoPlant Journal vol 81 no 4 pp 625ndash636 2015

[48] B Valliyodan Q Dan G Patil et al ldquoLandscape of genomicdiversity and trait discovery in soybeanrdquo Scientific Reports vol6 article 23598 2016

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 10: Research Article Transcript Polymorphism Rates in …downloads.hindawi.com/journals/ijpg/2016/1562041.pdfResearch Article Transcript Polymorphism Rates in Soybean Seed Tissue Are Increased

10 International Journal of Plant Genomics

without further investigation This suggests that the SNPsexamined here may directly impact native gene expressionlevels possibly due to their locations within gene regulatoryregions or enhancer elements located near the telomericends of the chromosomal arms

Gene ontology terms from SNP gene lists of all experi-mental groups were very similar returning many enrichedterms regarding protein transport and localization as wellas RNA and transcriptional processes even in the wild typegroup (Figure 3 and Supplemental Figure 4) Interestinglythese relationships are highly similar to the enriched GOterms derived from the differentially expressed gene sets ofthe transgenic lines indicating the majority of functionalrelationships between these polymorphisms are likely arisingfrom intercultivar variations and are not specifically relatedto transgenesis effects However given the possibility thatmutations from transgenesis could occur within QTLs thatcontrol traits such as plant size flower count and seedquantity identification of novel polymorphisms in theseregions of soybeanrsquos transcriptome is of great importanceto both plant breeders and in biotechnology applicationsIn fact recent investigations have revealed an excess of9 million unique SNPs that occur between 106 differentcultivated soybean varieties [48] Here we have providedlists of exonic SNPs unique to each of the three transgenicgroups and specific to transformation as well as novel SNPsdetected in our Williams 82 wild type controls Furtherinvestigations of the events described will identify the sitesof T-DNA integration and characterize the surroundingsequences to determine whether increased incidences ofpolymorphisms occur within the immediate vicinities of theinsertions Unique SNPs identified and reported here may beused to aide these investigations of crucial trait loci in thefuture to maximize yields and assist in breeding programs oftransgenic specimens

5 Conclusions

This study describes the first direct comparison of poly-morphism rates present in the seed transcriptome of threeindependent transgenic lines of Glycine max with a non-transgenic wild type line all derived from the Williams 82soybean cultivar Previously we examined gene expressionchanges in these transgenic events finding only the 764line to be significantly altered when compared to the othertransgenic lines In the same manner we have expanded ourprevious work and reported transcriptome single nucleotidepolymorphism rates for the ST77 ST111 and 764 trans-genic events compared to wild type Mirroring the resultsfrom our differential expression study the samples of the764 transformation event exhibited transcript SNP ratesnearly double that of the wild type ST77 and ST111 eventswhile also demonstrating higher amounts of transversionmutations While the exact cause for both the large num-ber of differentially expressed genes and polymorphismspresent in the 764 event is not known it is highly likelythat the two phenomena are correlated Our results showthat in addition to differential expression transgenesis mayresult in unpredicted base changes within gene transcripts

and regulatory regions without impacting plant pheno-type

Availability of Supporting Data

The RNA sequencing data described herein may be accessedat httpwwwncbinlmnihgovgeoqueryacccgiacc=GS-E64620 through NCBIrsquos Gene Expression Omnibus underaccession number GSE64620 Summary files for bcftoolsand snpEff outputs encompassing variant calls and predictedvariant effects are available from the iPlant CollaborativeDiscovery Environment directory available at httpdeiplan-tcollaborativeorgdld44044BEE-69B8-4129-A6B5-4C9291-B6B4A1snpeffsummarieszip and httpdeiplantcollabora-tiveorgdldAD464040-9169-4E52-87D5-DEF7466A2C1Evariantsummarieszip Lists of genes containing detectedvariants from all samples are available from iPlant in acompressed zip archive available at httpdeiplantcollabor-ativeorgdld533570A3-1EFB-4864-B9A9-9D82F17E09A8snpeffgeneszip

Competing Interests

Theauthors of this manuscript declare no conflict of interests

Authorsrsquo Contributions

Kevin C Lambirth designed the experiment conducted theSNP detection and effect prediction and generated figuresAdamMWhaley compiled and installed software formattedsequencing data files and was consulted on analysis KevinC Lambirth Kenneth J Piller and Kenneth L Bost wrote themanuscript with input from all authors

Acknowledgments

The authors would like to acknowledge the sequencingefforts of Dr Mike Wang (David H Murdock ResearchInstitute) and Dr Laura Reichenberg (Pfeiffer University) forassistance with plant propagation and selection and Dr KanWang (Iowa State University Agronomy Department) for theestablishment of the ST111 transgenic soybean line

References

[1] A Ziemienowicz ldquoAgrobacterium-mediated plant transforma-tion factors applications and recent advancesrdquoBiocatalysis andAgricultural Biotechnology vol 3 no 4 pp 95ndash102 2014

[2] M S Koch J M Ward S L Levine J A Baum J L Viciniand B G Hammond ldquoThe food and environmental safety of Btcropsrdquo Frontiers in Plant Science vol 6 article 283 2015

[3] G M Dill C A CaJacob and S R Padgette ldquoGlyphosate-resistant crops adoption use and future considerationsrdquo PestManagement Science vol 64 no 4 pp 326ndash331 2008

[4] M A S Valente J A Q A Faria J R L Soares-Ramos et alldquoThe ER luminal binding protein (BiP) mediates an increase indrought tolerance in soybean and delays drought-induced leafsenescence in soybean and tobaccordquo Journal of ExperimentalBotany vol 60 no 2 pp 533ndash546 2009

[5] L C Hudson B S Seabolt J Odle K L Bost C H Stahl andK J Piller ldquoSublethal staphylococcal enterotoxin B challenge

International Journal of Plant Genomics 11

model in pigs to evaluate protection following immunizationwith a soybean-derived vaccinerdquoClinical andVaccine Immunol-ogy vol 20 no 1 pp 24ndash32 2013

[6] Y Zhang D Li X Jin and Z Huang ldquoFighting Ebola withZMapp spotlight on plant-made antibodyrdquo Science China LifeSciences vol 57 no 10 pp 987ndash988 2014

[7] R Powell L C Hudson K C Lambirth et al ldquoRecombinantexpression of homodimeric 660 kDa human thyroglobulin insoybean seeds an alternative source of human thyroglobulinrdquoPlant Cell Reports vol 30 no 7 pp 1327ndash1338 2011

[8] K L Bost K C Lambirth L C Hudson and K J PillerldquoSoybean-derived thyroglobulin as an analyte specific reagentfor in vitro diagnostic tests and devicesrdquo Advances in Medicineand Biology vol 80 pp 23ndash40 2014

[9] R Abiri A Valdiani M Maziah et al ldquoA critical review ofthe concept of transgenic plants insights into pharmaceuticalbiotechnology andmolecular farmingrdquo Current Issues in Molec-ular Biology vol 18 no 1 pp 21ndash42 2016

[10] J L Oakes K L Bost and K J Piller ldquoStability of a soybeanseed-derived vaccine antigen following long-term storage pro-cessing and transport in the absence of a cold chainrdquo Journal ofthe Science of Food andAgriculture vol 89 no 13 pp 2191ndash21992009

[11] R Garg M Tolbert J L Oakes T E Clemente K L Bostand K J Piller ldquoChloroplast targeting of FanC the majorantigenic subunit of Escherichia coli K99 fimbriae in transgenicsoybeanrdquo Plant Cell Reports vol 26 no 7 pp 1011ndash1023 2007

[12] K J Piller T E Clemente S M Jun et al ldquoExpression andimmunogenicity of an Escherichia coli K99 fimbriae subunitantigen in soybeanrdquo Planta vol 222 no 1 pp 6ndash18 2005

[13] L C Hudson R Garg K L Bost and K J Piller ldquoSoybeanseeds a practical host for the production of functional subunitvaccinesrdquo BioMed Research International vol 2014 Article ID340804 13 pages 2014

[14] K L Bost and K J Piller ldquoProtein expression systems whysoybean seedsrdquo in SoybeanmdashMolecular Aspects of Breeding ASudaric Ed Intech Open 2011

[15] D Tilman K G Cassman P A Matson R Naylor and SPolasky ldquoAgricultural sustainability and intensive productionpracticesrdquo Nature vol 418 no 6898 pp 671ndash677 2002

[16] G N De La FuenteU K Frei and T Lubberstedt ldquoAcceleratingplant breedingrdquo Trends in Plant Science vol 18 no 12 pp 667ndash672 2013

[17] L C Hudson K C Lambirth K L Bost and K L PillerAdvancements in Transgenic Soy From Field to Bedside InTech2013

[18] J D Ray T C Kilen C A Abel and R L Paris ldquoSoybeannatural cross-pollination rates under field conditionsrdquo Environ-mental Biosafety Research vol 2 no 2 pp 133ndash138 2003

[19] K L Hart S L Kimura V Mushailov Z M BudimlijaM Prinz and E Wurmbach ldquoImproved eye- and skin-colorprediction based on 8 SNPsrdquo Croatian Medical Journal vol 54no 3 pp 248ndash256 2013

[20] S E Medland T Zayats B Glaser et al ldquoA variant inLIN28B is associatedwith 2D 4D finger-length ratio a putativeretrospective biomarker of prenatal testosterone exposurerdquoTheAmerican Journal of HumanGenetics vol 86 no 4 pp 519ndash5252010

[21] B S Shastry ldquoSNP alleles in human disease and evolutionrdquoJournal of Human Genetics vol 47 no 11 pp 561ndash566 2002

[22] Y-H Li S-C Zhao J-X Ma et al ldquoMolecular footprints ofdomestication and improvement in soybean revealed by wholegenome re-sequencingrdquo BMC Genomics vol 14 article 5792013

[23] T Joshi B Valliyodan J-H Wu S-H Lee D Xu and H TNguyen ldquoGenomic differences between cultivated soybean Gmax and its wild relative G sojardquo BMCGenomics vol 14 articleS5 2013

[24] K C Lambirth AMWhaley I C Blakley et al ldquoA comparisonof transgenic andwild type soybean seeds analysis of transcrip-tome profiles using RNA-Seqrdquo BMC Biotechnology vol 15 no1 article 89 2015

[25] H Li ldquoA statistical framework for SNP callingmutation discov-ery association mapping and population genetical parameterestimation from sequencing datardquo Bioinformatics vol 27 no 21pp 2987ndash2993 2011

[26] H Li B Handsaker A Wysoker et al ldquoThe sequence align-mentMap format and SAMtoolsrdquo Bioinformatics vol 25 no16 pp 2078ndash2079 2009

[27] D M Goodstein S Shu R Howson et al ldquoPhytozome acomparative platform for green plant genomicsrdquo Nucleic AcidsResearch vol 40 no 1 pp D1178ndashD1186 2012

[28] C Trapnell A Roberts L Goff et al ldquoDifferential gene andtranscript expression analysis of RNA-seq experiments withTopHat and Cufflinksrdquo Nature Protocols vol 7 no 3 pp 562ndash578 2012

[29] PDanecek A AutonG Abecasis et al ldquoThe variant call formatand VCFtoolsrdquo Bioinformatics vol 27 no 15 pp 2156ndash21582011

[30] M Krzywinski J Schein I Birol et al ldquoCircos an informationaesthetic for comparative genomicsrdquo Genome Research vol 19no 9 pp 1639ndash1645 2009

[31] P Cingolani A Platts L L Wang et al ldquoA program forannotating and predicting the effects of single nucleotidepolymorphisms SnpEff SNPs in the genome of Drosophilamelanogaster strain w 1118 iso-2 iso-3rdquo Fly vol 6 no 2 pp80ndash92 2012

[32] Z Du X Zhou Y Ling Z Zhang and Z Su ldquoagriGO A GOanalysis toolkit for the agricultural communityrdquo Nucleic AcidsResearch vol 38 no 2 Article ID gkq310 pp W64ndashW70 2010

[33] S AGoffM Vaughn SMcKay et al ldquoThe iPlant collaborativecyberinfrastructure for plant biologyrdquo Frontiers in Plant Sciencevol 2 article 34 2011

[34] Y L Zhu Q J Song D L Hyten et al ldquoSingle-nucleotidepolymorphisms in soybeanrdquo Genetics vol 163 no 3 pp 1123ndash1134 2003

[35] Y Shu Y Li Z Zhu et al ldquoSNPs discovery and CAPS markerconversion in soybeanrdquo Molecular Biology Reports vol 38 no3 pp 1841ndash1846 2011

[36] C B Yadav P Bhareti M Muthamilarasan et al ldquoGenome-wide SNP identification and characterization in two soybeancultivars with contrasting mungbean yellow mosaic india virusdisease resistance traitsrdquo PLoS ONE vol 10 no 4 Article IDe0123897 2015

[37] X Wu C Ren T Joshi T Vuong D Xu and H T NguyenldquoSNP discovery by high-throughput sequencing in soybeanrdquoBMC Genomics vol 11 article 469 2010

[38] Y-H Li J C Reif S A Jackson Y-S Ma R-Z Chang and L-J Qiu ldquoDetecting SNPs underlying domestication-related traitsin soybeanrdquo BMC Plant Biology vol 14 article 251 2014

12 International Journal of Plant Genomics

[39] L C Hudson K L Bost and K J Piller ldquoOptimizing recom-binant protein expression in soybeanrdquo in SoybeanmdashMolecularAspects of Breeding InTech Rijeka Croatia 2011

[40] W Goettel E Xia R Upchurch M-L Wang P Chen andY-Q C An ldquoIdentification and characterization of transcriptpolymorphisms in soybean lines varying in oil composition andcontentrdquo BMC Genomics vol 15 article 299 2014

[41] C Chan X Qi M-W Li F-L Wong and H-M Lam ldquoRecentdevelopments of genomic research in soybeanrdquo Journal ofGenetics and Genomics vol 39 no 7 pp 317ndash324 2012

[42] A A Komar ldquoSNPs silent but not invisiblerdquo Science vol 315no 5811 pp 466ndash467 2007

[43] P Katsonis A Koire S J Wilson et al ldquoSingle nucleotidevariations biological impact and theoretical interpretationrdquoProtein Science vol 23 no 12 pp 1650ndash1666 2014

[44] W J Haun D L Hyten W W Xu et al ldquoThe composition andorigins of genomic variation among individuals of the soybeanreference cultivar williams 82rdquo Plant Physiology vol 155 no 2pp 645ndash655 2011

[45] S DeBolt ldquoCopy number variation shapes genome diversityin Arabidopsis over immediate family generational scalesrdquoGenome Biology and Evolution vol 2 no 1 pp 441ndash453 2010

[46] A Zmienko A Samelak P Kozłowski and M FiglerowiczldquoCopy number polymorphism in plant genomesrdquo Theoreticaland Applied Genetics vol 127 no 1 pp 1ndash18 2014

[47] Y-G Lee N Jeong J H Kim et al ldquoDevelopment validationand genetic analysis of a large soybean SNP genotyping arrayrdquoPlant Journal vol 81 no 4 pp 625ndash636 2015

[48] B Valliyodan Q Dan G Patil et al ldquoLandscape of genomicdiversity and trait discovery in soybeanrdquo Scientific Reports vol6 article 23598 2016

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 11: Research Article Transcript Polymorphism Rates in …downloads.hindawi.com/journals/ijpg/2016/1562041.pdfResearch Article Transcript Polymorphism Rates in Soybean Seed Tissue Are Increased

International Journal of Plant Genomics 11

model in pigs to evaluate protection following immunizationwith a soybean-derived vaccinerdquoClinical andVaccine Immunol-ogy vol 20 no 1 pp 24ndash32 2013

[6] Y Zhang D Li X Jin and Z Huang ldquoFighting Ebola withZMapp spotlight on plant-made antibodyrdquo Science China LifeSciences vol 57 no 10 pp 987ndash988 2014

[7] R Powell L C Hudson K C Lambirth et al ldquoRecombinantexpression of homodimeric 660 kDa human thyroglobulin insoybean seeds an alternative source of human thyroglobulinrdquoPlant Cell Reports vol 30 no 7 pp 1327ndash1338 2011

[8] K L Bost K C Lambirth L C Hudson and K J PillerldquoSoybean-derived thyroglobulin as an analyte specific reagentfor in vitro diagnostic tests and devicesrdquo Advances in Medicineand Biology vol 80 pp 23ndash40 2014

[9] R Abiri A Valdiani M Maziah et al ldquoA critical review ofthe concept of transgenic plants insights into pharmaceuticalbiotechnology andmolecular farmingrdquo Current Issues in Molec-ular Biology vol 18 no 1 pp 21ndash42 2016

[10] J L Oakes K L Bost and K J Piller ldquoStability of a soybeanseed-derived vaccine antigen following long-term storage pro-cessing and transport in the absence of a cold chainrdquo Journal ofthe Science of Food andAgriculture vol 89 no 13 pp 2191ndash21992009

[11] R Garg M Tolbert J L Oakes T E Clemente K L Bostand K J Piller ldquoChloroplast targeting of FanC the majorantigenic subunit of Escherichia coli K99 fimbriae in transgenicsoybeanrdquo Plant Cell Reports vol 26 no 7 pp 1011ndash1023 2007

[12] K J Piller T E Clemente S M Jun et al ldquoExpression andimmunogenicity of an Escherichia coli K99 fimbriae subunitantigen in soybeanrdquo Planta vol 222 no 1 pp 6ndash18 2005

[13] L C Hudson R Garg K L Bost and K J Piller ldquoSoybeanseeds a practical host for the production of functional subunitvaccinesrdquo BioMed Research International vol 2014 Article ID340804 13 pages 2014

[14] K L Bost and K J Piller ldquoProtein expression systems whysoybean seedsrdquo in SoybeanmdashMolecular Aspects of Breeding ASudaric Ed Intech Open 2011

[15] D Tilman K G Cassman P A Matson R Naylor and SPolasky ldquoAgricultural sustainability and intensive productionpracticesrdquo Nature vol 418 no 6898 pp 671ndash677 2002

[16] G N De La FuenteU K Frei and T Lubberstedt ldquoAcceleratingplant breedingrdquo Trends in Plant Science vol 18 no 12 pp 667ndash672 2013

[17] L C Hudson K C Lambirth K L Bost and K L PillerAdvancements in Transgenic Soy From Field to Bedside InTech2013

[18] J D Ray T C Kilen C A Abel and R L Paris ldquoSoybeannatural cross-pollination rates under field conditionsrdquo Environ-mental Biosafety Research vol 2 no 2 pp 133ndash138 2003

[19] K L Hart S L Kimura V Mushailov Z M BudimlijaM Prinz and E Wurmbach ldquoImproved eye- and skin-colorprediction based on 8 SNPsrdquo Croatian Medical Journal vol 54no 3 pp 248ndash256 2013

[20] S E Medland T Zayats B Glaser et al ldquoA variant inLIN28B is associatedwith 2D 4D finger-length ratio a putativeretrospective biomarker of prenatal testosterone exposurerdquoTheAmerican Journal of HumanGenetics vol 86 no 4 pp 519ndash5252010

[21] B S Shastry ldquoSNP alleles in human disease and evolutionrdquoJournal of Human Genetics vol 47 no 11 pp 561ndash566 2002

[22] Y-H Li S-C Zhao J-X Ma et al ldquoMolecular footprints ofdomestication and improvement in soybean revealed by wholegenome re-sequencingrdquo BMC Genomics vol 14 article 5792013

[23] T Joshi B Valliyodan J-H Wu S-H Lee D Xu and H TNguyen ldquoGenomic differences between cultivated soybean Gmax and its wild relative G sojardquo BMCGenomics vol 14 articleS5 2013

[24] K C Lambirth AMWhaley I C Blakley et al ldquoA comparisonof transgenic andwild type soybean seeds analysis of transcrip-tome profiles using RNA-Seqrdquo BMC Biotechnology vol 15 no1 article 89 2015

[25] H Li ldquoA statistical framework for SNP callingmutation discov-ery association mapping and population genetical parameterestimation from sequencing datardquo Bioinformatics vol 27 no 21pp 2987ndash2993 2011

[26] H Li B Handsaker A Wysoker et al ldquoThe sequence align-mentMap format and SAMtoolsrdquo Bioinformatics vol 25 no16 pp 2078ndash2079 2009

[27] D M Goodstein S Shu R Howson et al ldquoPhytozome acomparative platform for green plant genomicsrdquo Nucleic AcidsResearch vol 40 no 1 pp D1178ndashD1186 2012

[28] C Trapnell A Roberts L Goff et al ldquoDifferential gene andtranscript expression analysis of RNA-seq experiments withTopHat and Cufflinksrdquo Nature Protocols vol 7 no 3 pp 562ndash578 2012

[29] PDanecek A AutonG Abecasis et al ldquoThe variant call formatand VCFtoolsrdquo Bioinformatics vol 27 no 15 pp 2156ndash21582011

[30] M Krzywinski J Schein I Birol et al ldquoCircos an informationaesthetic for comparative genomicsrdquo Genome Research vol 19no 9 pp 1639ndash1645 2009

[31] P Cingolani A Platts L L Wang et al ldquoA program forannotating and predicting the effects of single nucleotidepolymorphisms SnpEff SNPs in the genome of Drosophilamelanogaster strain w 1118 iso-2 iso-3rdquo Fly vol 6 no 2 pp80ndash92 2012

[32] Z Du X Zhou Y Ling Z Zhang and Z Su ldquoagriGO A GOanalysis toolkit for the agricultural communityrdquo Nucleic AcidsResearch vol 38 no 2 Article ID gkq310 pp W64ndashW70 2010

[33] S AGoffM Vaughn SMcKay et al ldquoThe iPlant collaborativecyberinfrastructure for plant biologyrdquo Frontiers in Plant Sciencevol 2 article 34 2011

[34] Y L Zhu Q J Song D L Hyten et al ldquoSingle-nucleotidepolymorphisms in soybeanrdquo Genetics vol 163 no 3 pp 1123ndash1134 2003

[35] Y Shu Y Li Z Zhu et al ldquoSNPs discovery and CAPS markerconversion in soybeanrdquo Molecular Biology Reports vol 38 no3 pp 1841ndash1846 2011

[36] C B Yadav P Bhareti M Muthamilarasan et al ldquoGenome-wide SNP identification and characterization in two soybeancultivars with contrasting mungbean yellow mosaic india virusdisease resistance traitsrdquo PLoS ONE vol 10 no 4 Article IDe0123897 2015

[37] X Wu C Ren T Joshi T Vuong D Xu and H T NguyenldquoSNP discovery by high-throughput sequencing in soybeanrdquoBMC Genomics vol 11 article 469 2010

[38] Y-H Li J C Reif S A Jackson Y-S Ma R-Z Chang and L-J Qiu ldquoDetecting SNPs underlying domestication-related traitsin soybeanrdquo BMC Plant Biology vol 14 article 251 2014

12 International Journal of Plant Genomics

[39] L C Hudson K L Bost and K J Piller ldquoOptimizing recom-binant protein expression in soybeanrdquo in SoybeanmdashMolecularAspects of Breeding InTech Rijeka Croatia 2011

[40] W Goettel E Xia R Upchurch M-L Wang P Chen andY-Q C An ldquoIdentification and characterization of transcriptpolymorphisms in soybean lines varying in oil composition andcontentrdquo BMC Genomics vol 15 article 299 2014

[41] C Chan X Qi M-W Li F-L Wong and H-M Lam ldquoRecentdevelopments of genomic research in soybeanrdquo Journal ofGenetics and Genomics vol 39 no 7 pp 317ndash324 2012

[42] A A Komar ldquoSNPs silent but not invisiblerdquo Science vol 315no 5811 pp 466ndash467 2007

[43] P Katsonis A Koire S J Wilson et al ldquoSingle nucleotidevariations biological impact and theoretical interpretationrdquoProtein Science vol 23 no 12 pp 1650ndash1666 2014

[44] W J Haun D L Hyten W W Xu et al ldquoThe composition andorigins of genomic variation among individuals of the soybeanreference cultivar williams 82rdquo Plant Physiology vol 155 no 2pp 645ndash655 2011

[45] S DeBolt ldquoCopy number variation shapes genome diversityin Arabidopsis over immediate family generational scalesrdquoGenome Biology and Evolution vol 2 no 1 pp 441ndash453 2010

[46] A Zmienko A Samelak P Kozłowski and M FiglerowiczldquoCopy number polymorphism in plant genomesrdquo Theoreticaland Applied Genetics vol 127 no 1 pp 1ndash18 2014

[47] Y-G Lee N Jeong J H Kim et al ldquoDevelopment validationand genetic analysis of a large soybean SNP genotyping arrayrdquoPlant Journal vol 81 no 4 pp 625ndash636 2015

[48] B Valliyodan Q Dan G Patil et al ldquoLandscape of genomicdiversity and trait discovery in soybeanrdquo Scientific Reports vol6 article 23598 2016

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 12: Research Article Transcript Polymorphism Rates in …downloads.hindawi.com/journals/ijpg/2016/1562041.pdfResearch Article Transcript Polymorphism Rates in Soybean Seed Tissue Are Increased

12 International Journal of Plant Genomics

[39] L C Hudson K L Bost and K J Piller ldquoOptimizing recom-binant protein expression in soybeanrdquo in SoybeanmdashMolecularAspects of Breeding InTech Rijeka Croatia 2011

[40] W Goettel E Xia R Upchurch M-L Wang P Chen andY-Q C An ldquoIdentification and characterization of transcriptpolymorphisms in soybean lines varying in oil composition andcontentrdquo BMC Genomics vol 15 article 299 2014

[41] C Chan X Qi M-W Li F-L Wong and H-M Lam ldquoRecentdevelopments of genomic research in soybeanrdquo Journal ofGenetics and Genomics vol 39 no 7 pp 317ndash324 2012

[42] A A Komar ldquoSNPs silent but not invisiblerdquo Science vol 315no 5811 pp 466ndash467 2007

[43] P Katsonis A Koire S J Wilson et al ldquoSingle nucleotidevariations biological impact and theoretical interpretationrdquoProtein Science vol 23 no 12 pp 1650ndash1666 2014

[44] W J Haun D L Hyten W W Xu et al ldquoThe composition andorigins of genomic variation among individuals of the soybeanreference cultivar williams 82rdquo Plant Physiology vol 155 no 2pp 645ndash655 2011

[45] S DeBolt ldquoCopy number variation shapes genome diversityin Arabidopsis over immediate family generational scalesrdquoGenome Biology and Evolution vol 2 no 1 pp 441ndash453 2010

[46] A Zmienko A Samelak P Kozłowski and M FiglerowiczldquoCopy number polymorphism in plant genomesrdquo Theoreticaland Applied Genetics vol 127 no 1 pp 1ndash18 2014

[47] Y-G Lee N Jeong J H Kim et al ldquoDevelopment validationand genetic analysis of a large soybean SNP genotyping arrayrdquoPlant Journal vol 81 no 4 pp 625ndash636 2015

[48] B Valliyodan Q Dan G Patil et al ldquoLandscape of genomicdiversity and trait discovery in soybeanrdquo Scientific Reports vol6 article 23598 2016

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 13: Research Article Transcript Polymorphism Rates in …downloads.hindawi.com/journals/ijpg/2016/1562041.pdfResearch Article Transcript Polymorphism Rates in Soybean Seed Tissue Are Increased

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology