Computer Lab Day: Sample Analysis in QIIME

Joslynn Lee [email protected]


Let’s go to our agenda today!

Please go to the website: https://joslynnlee.github.io/2017-05-25-GSLI/

Changes and mostly online :)

What is CyVerse?

• Cyberinfrastructure for the Universe
• NSF-funded project
• Provides storage and computing
• Large-scale analysis and big data
• Educational resources
• Atmosphere: cloud computing

Login

Atmosphere

Click on the Projects tab

Click on Create New Project

Add a name and information, then click Create

Click on the Dashboard tab

Click on launch a new instance

Under ‘image search’ type “qiime”; under ‘all images’ click on Qiime-1.9

Click on Launch

Use the pull-down to select the name of the project you just created

Fill out the necessary items, select your project name!

The instance will build and the result will look like this one...

Open up your terminal

Type in: ssh username@ipaddress


If all is connected, you should see this

Now enter: ezj -3

Enter your CyVerse password

This launched our Jupyter Notebook!

Copy and paste the web address into Google Chrome


Recap of sample collection sites

Need to understand a little more about this so it all makes sense!

Metadata is essential
Organize file names

Why study the waterways of NM and CO?

https://commons.wikimedia.org/wiki/File:Sanjuanrivermap.jpg

GKM

Boom to Bust: Resulted in Abandonment

• Abandoned mines have been leaking toxic acid mineral waste into the streams.

• Federal and state officials estimate that about 1,500 gallons of acid mine drainage a day flow into the river from these Silverton-area mines.

• Gold King Mine had been leaking 200 gallons of waste a minute before the EPA crew began digging to investigate the source.

https://www.abqjournal.com/941984/dont-blame-epa-for-gold-king-spill.html

EPA explains what happened

• The August 2015 spill at the Gold King Mine in southwestern Colorado released 3 million gallons of wastewater tainted with iron, aluminum, manganese, lead, copper and other metals. Rivers in Colorado, New Mexico and Utah were polluted, with stretches of waterway turning an eerie orange-yellow.

• While excavating above the old adit, pressurized water began leaking above the mine tunnel, spilling about three million gallons of water stored behind the collapsed material into Cement Creek, a tributary of the Animas River.

• EPA takes responsibility for the Gold King Mine release and is committed to continue working hand-in-hand with the impacted local governments, states and tribes.

https://www.epa.gov/goldkingmine/follow-monitoring-data-gold-king-mine-incident
https://www.epa.gov/goldkingmine

Yellow mine wastewater at entrance to the GKM on Aug. 5

Environmental Protection Agency via Reuters
https://www.washingtonpost.com/news/morning-mix/wp/2015/08/10/what-the-epa-was-doing-when-it-sent-yellow-sludge-spilling-into-a-colorado-creek/?utm_term=.6d8369e091d3

Timeline of waste hitting various towns in CO/NM/AZ

August 5

Aztec: Aug 7

Farmington: Aug 8

Shiprock: Aug 10

Lake Powell: Aug 14

Map of Sample Collection Sites

Video of pump in action

Two-day sample collection, 7 sites

• Day 1
  o Gold King Mine, Silverton, CO
  o Baker’s Bridge, Animas River, Hermosa, CO
  o FLC Bathroom, Durango, CO

• Day 2
  o San Juan River (not affected), Bloomfield, NM
  o San Juan River, Upper Fruitland, NM
  o San Juan River (canal), Nenahnezad, NM
  o San Juan River, Hogback, NM
  o Skate Park, Animas River, Durango, CO

First site: Gold King Mine (GKM) site, north of Silverton

First site: Gold King Mine (GKM) site, treatment center

March 2017; May 2017


First site: Gold King Mine (GKM) site, sampling inside treatment center

Pre-treated water; filter after running pre-treated water

Sediment sample collection of sludge

Gold King Mine water treatment plant, pretreatment water pipe, estimated pH 5.8
Gold King Mine water treatment plant, post-treatment [flocculant Drewfloc 2499 added for water treatment]

First site: Gold King Mine (GKM) site, aerial view

May 2017

First site: Gold King Mine (GKM) site, looking towards the mine

March 2017; May 2017

First site: Gold King Mine (GKM) site, old mine area

March 2017; May 2017

Second site: Baker’s Bridge (BB) site north of Durango


March 2017; May 2017

Second site: Baker’s Bridge (BB) site north of Durango

August 2015; May 2017

Second site: Baker’s Bridge (BB) site north of Durango

Outdoor lab: pump set-up
Dr. Lowell was our sediment collector.

Second site: Baker’s Bridge (BB) site north of Durango

March 2017; May 2017

Lower water level earlier this year

Third site: Fort Lewis College Women’s Bathroom

Fourth site: Bloomfield (BLM), NM and San Juan River (before Animas River)


Public site: River walk near bridge
River water appeared turbid; clogged 2 filters; location is San Juan River before joining of Animas River

Fourth site: Upper Fruitland (UF), beginning of Navajo Nation


After Animas River joins San Juan, NN canal system begins

Canal start in Upper Fruitland Chapter

Fifth site: Nenahnezad (NZ) canal on Navajo Nation


Canal location: Diverts water upstream for farms and returns it to the San Juan River

Sediment collection at drop point by Tara and Jacob

Fifth site: Nenahnezad (NZ) canal on Navajo Nation

Truck Lab: Brandon demonstrating the process
Water was easier to collect at drop point

Sixth site: Hogback (HB) river location


Steady flow of the river before more canals
Cloudy water

Seventh site: Animas River Durango (DRO) location

Rocks in the area

Water near the skate park

Murky river water before bridge

All finished collecting samples, went fast!

DNA extraction at DNALC, send off for sequencing!

PowerSoil tubes (smaller than PowerWater) extract DNA

First cleaning

Microbiome Analysis

Yongwook Choi (JCVI)

Introduce students to metadata

• Metadata is structured information that describes, explains, locates, or otherwise makes it easier to retrieve, use, or manage an information resource. Metadata is often called data about data or information about information.

• Sample collection:
  o GPS location
  o Temperature
  o Wind
  o pH
  o controls

**I will prepare a sheet with information

Understanding Metadata, National Information Standards Organization

INPUT: Sequence read format

• Know what form your sequencing reads come in
• A .fastq format stores both a nucleotide sequence and its corresponding quality scores.

• A .fastq file normally uses four lines per sequence:
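
One way to start such a sheet is a tab-separated table with one row per collection site. The sketch below is a minimal Python example, not the sheet used in the lab: the column names are hypothetical stand-ins for the fields listed above, the site codes are the abbreviations used on the site slides, and every measurement is left blank so it can be filled in from field notes.

```python
import csv
import io

# Site abbreviations as they appear on the site slides.
SITE_CODES = ["GKM", "BB", "BLM", "UF", "NZ", "HB", "DRO"]

# Hypothetical column names based on the sample-collection fields.
FIELDS = ["SampleID", "GPS", "Temperature", "Wind", "pH", "Control"]


def blank_metadata_sheet(site_codes, fields=FIELDS):
    """Return a tab-separated sheet with one empty row per site code."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=fields, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    for code in site_codes:
        row = {name: "" for name in fields}  # measurements left blank
        row["SampleID"] = code
        writer.writerow(row)
    return buffer.getvalue()


sheet = blank_metadata_sheet(SITE_CODES)
print(sheet)
```

Keeping the sample identifier in the first column makes it easy to match each row to the file names of the sequencing reads later.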

@SRR2146911.1 1 length=78
ACGAGTGCGTTTAGATAACCTGGTAGCTAGCTCAGTACGAGACTGCCAAGGAAGTCGTAACAAGGTAACTAGCTCAGT
+SRR2146911.1 1 length=78
IIIIIIIHD666?IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIHHCCCCIIIIIIIIIIIIIIIIIIIIIIIIIII
@SRR2146911.2 2 length=66
ACGAGTGCGTATTAGATACCCGGTAGCTAGCTCAGTACTAAGTCGTAACAAGGTACCTAGCTCAGT
+SRR2146911.2 2 length=66
FFFHHHHHHIEBBBDE<5/////50/17<<<?AAAD??<99;<?;8445////76;<<:;;[…]
@SRR2146911.3 3 length=55
ACGAGTGCGTATTAGATACCCAGGTAGGAAGTCGTAACAAGGTACCTAGCTCAGT
+SRR2146911.3 3 length=55
IIIIIIIIIIFFDBGHD:///7449<////<CD?>>571133CFAAB>AAADFDG


Line 1 begins with a '@' character and is followed by a sequence identifier and an optional description (like a FASTA title line).
Line 2 is the raw sequence letters.
Line 3 begins with a '+' character and is optionally followed by the same sequence identifier (and any description) again.
Line 4 encodes the quality values for the sequence in Line 2, and must contain the same number of symbols as letters in the sequence.
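
The four-line layout can be checked with a few lines of Python. This is a minimal sketch, not how QIIME reads its input: `parse_fastq` and `phred_scores` are hypothetical helper names, and the quality offset of 33 (Phred+33) is an assumption about the encoding, since the slides do not state it. The record is the third read from the example.

```python
def parse_fastq(lines):
    """Yield (identifier, sequence, quality) for each four-line record."""
    lines = [ln.rstrip("\n") for ln in lines if ln.strip()]
    if len(lines) % 4 != 0:
        raise ValueError("expected four lines per record")
    for i in range(0, len(lines), 4):
        header, seq, plus, qual = lines[i:i + 4]
        if not header.startswith("@"):
            raise ValueError("line 1 must start with '@'")
        if not plus.startswith("+"):
            raise ValueError("line 3 must start with '+'")
        if len(seq) != len(qual):
            raise ValueError("line 4 must be as long as line 2")
        yield header[1:], seq, qual


def phred_scores(qual, offset=33):
    """Decode quality characters to integers (assumes Phred+33)."""
    return [ord(ch) - offset for ch in qual]


example = """@SRR2146911.3 3 length=55
ACGAGTGCGTATTAGATACCCAGGTAGGAAGTCGTAACAAGGTACCTAGCTCAGT
+SRR2146911.3 3 length=55
IIIIIIIIIIFFDBGHD:///7449<////<CD?>>571133CFAAB>AAADFDG
"""

records = list(parse_fastq(example.splitlines()))
for identifier, seq, qual in records:
    scores = phred_scores(qual)
    print(identifier, len(seq), min(scores), max(scores))
```

Reading in fixed groups of four lines, rather than searching for '@', matters because '@' is also a legal quality character and can appear at the start of line 4.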


16S rRNA Analysis Platforms

• Bioinformatics tools, many options
• QIIME (canonically pronounced "chime") stands for Quantitative Insights Into Microbial Ecology

• Open-source bioinformatics pipeline for performing microbiome analysis from raw DNA sequencing data

• Written in Python

What are the steps?

Porazinska, D. & Xu, Z.

Using a known database to cluster sequences into operational taxonomic units (OTUs)
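
As a toy illustration of the idea, the sketch below assigns each read to the most similar reference sequence if the similarity reaches a threshold. It is not QIIME's algorithm: the similarity is a naive position-by-position comparison with no alignment, the 0.97 threshold is a commonly used convention assumed here, and the read and reference sequences are made up for the example.

```python
def identity(a, b):
    """Fraction of matching positions, relative to the longer sequence."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / longest


def assign_otus(reads, references, threshold=0.97):
    """Map each read ID to its best reference ID, or None if below threshold."""
    assignments = {}
    for read_id, seq in reads.items():
        best_id, best_score = None, 0.0
        for ref_id, ref_seq in references.items():
            score = identity(seq, ref_seq)
            if score > best_score:
                best_id, best_score = ref_id, score
        assignments[read_id] = best_id if best_score >= threshold else None
    return assignments


# Hypothetical reference database and reads, 40 bases each.
references = {"ref_A": "ACGT" * 10, "ref_B": "TTGGCCAA" * 5}
reads = {
    "read1": "ACGT" * 10,          # identical to ref_A
    "read2": "ACGT" * 9 + "ACGA",  # one mismatch against ref_A
    "read3": "G" * 40,             # unlike either reference
}

assignments = assign_otus(reads, references)
print(assignments)
```

Reads that match nothing in the database stay unassigned, which is why the choice of reference database affects what the analysis can report.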

Porazinska, D. & Xu, Z.

16S rRNA gene databases to classify sequences

Greengenes offers annotated, chimera-checked, full-length 16S rRNA gene sequences in standard alignment formats.

http://aem.asm.org/content/72/7/5069.full

OUTPUT: OTU biom table

Teaching tool with Jupyter Notebooks

The notebook extends the console-based approach to interactive computing in a qualitatively new direction, providing a web-based application suitable for capturing the whole computation process: developing, documenting, and executing code, as well as communicating the results.
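
Conceptually, an OTU table is a grid of counts with one row per OTU and one column per sample. The sketch below builds such a grid in plain Python dictionaries to show the shape of the data. It does not read or write the BIOM file format, and the OTU names and read counts are invented for illustration; only the sample codes GKM and BB come from the site slides.

```python
from collections import defaultdict


def build_otu_table(observations):
    """Count reads per OTU per sample from (sample_id, otu_id) pairs."""
    table = defaultdict(lambda: defaultdict(int))
    for sample_id, otu_id in observations:
        table[otu_id][sample_id] += 1
    return table


def format_table(table, samples):
    """Render the counts as tab-separated text, one row per OTU."""
    lines = ["#OTU ID\t" + "\t".join(samples)]
    for otu_id in sorted(table):
        counts = [str(table[otu_id].get(s, 0)) for s in samples]
        lines.append(otu_id + "\t" + "\t".join(counts))
    return "\n".join(lines)


# Hypothetical per-read assignments: (sample, OTU).
observations = [
    ("GKM", "OTU_1"),
    ("GKM", "OTU_1"),
    ("BB", "OTU_1"),
    ("BB", "OTU_2"),
]

otu_table = build_otu_table(observations)
text = format_table(otu_table, ["GKM", "BB"])
print(text)
```

Most cells in a real table are zero because each sample contains only a fraction of all OTUs.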

http://nbviewer.jupyter.org/github/biocore/qiime/blob/1.9.1/examples/ipynb/illumina_overview_tutorial.ipynb