Sequence Conservation/Variation Analysis

On the ViPR homepage, choose a virus family or a Featured Virus to start.

1. Mouse over the “Analyze & Visualize” tab and click “Analyze Sequence Variation (SNP)”.

2. On the SNP landing page, use one of the three options to input sequences:
2.1 Upload a sequence file in FASTA format, OR
2.2 Paste sequences in FASTA format, OR
2.3 Use a working set from your Workbench.
Then click “Run” to run the analysis.

3. As soon as the analysis is finished, a report similar to the sample report will be displayed on the screen.
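Before uploading or pasting sequences, it can help to confirm locally that the input is valid FASTA and holds at least two sequences, which is the minimum the SNP input page states. The following is a minimal sketch of such a check, not part of ViPR; the function names `read_fasta` and `check_input` are hypothetical.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (defline, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new defline closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is None:
            raise ValueError("sequence data found before the first '>' defline")
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def check_input(text, minimum=2):
    """Return the parsed records, or raise if there are too few sequences."""
    records = read_fasta(text)
    if len(records) < minimum:
        raise ValueError(f"need at least {minimum} sequences, found {len(records)}")
    return records


example = """>seq1
ACGTACGT
>seq2
ACGTACGA
"""
print(len(check_input(example)))  # prints 2
```

The check only covers structure (deflines and sequence count); it does not validate the alphabet or sequence type.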

ViPR is funded by the National Institute of Allergy and Infectious Diseases (NIH / DHHS) under Contract No. HHSN272200900041C and is a collaboration between Northrop Grumman Health IT, J. Craig Venter Institute and Vecna Technologies. Comments, questions, suggestions? Contact us at [email protected]

http://www.viprbrc.org/

Freely available • Integrated datasets • Bioinformatics tool suite


• Analyze sequence polymorphism at the nucleotide or amino acid level.
• Calculate consensus sequence and polymorphism of ViPR sequences or your own sequences.

Sequence Variation Analysis Sample Report

The analysis report page shows the polymorphism score, consensus, and counts for each different base/amino acid at each position.

Consensus sequence and raw alignment are available for download.

At each position, the consensus is the allele with frequency greater than 50%. If no allele exceeds 50%, N (for nucleotide) or Xaa (for amino acid) is used to indicate ambiguity.

Download consensus sequence in FASTA format

Download raw alignment of all sequences

Score ranges from 0 (no polymorphism) to 232 (highest polymorphism).

Count for different nucleotides at each position

Save SNP result to Workbench for future retrieval

Release Date: Sep 7, 2012

This project is funded by the National Institute of Allergy and Infectious Diseases (NIH / DHHS) under Contract No. HHSN272200900041C and is a collaboration between Northrop Grumman Health IT, J. Craig Venter Institute and Vecna Technologies. Virus images courtesy of CDC Public Health Image Library, Wellcome Images, U.S. Department of Veterans Affairs, Science of the Invisible and ViralZone, Swiss Institute of Bioinformatics.

INPUT SEQUENCES

Three input options are offered: upload a file containing your sequences in FASTA format, paste sequences in FASTA format, or use working sets. Sequences can also be selected from search results or a working set in your Workbench.

The minimum number of sequences is 2.

For polymorphism calculation, MUSCLE is used for multiple sequence alignment. A consensus sequence is created by "majority rule": at each position, the consensus is the allele with frequency greater than 50%, regardless of coverage. If no allele exceeds 50%, N (for nucleotide) or Xaa (for amino acid) indicates ambiguity. Sequences in the alignment are then compared to the consensus to identify polymorphisms.
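The majority rule described above can be sketched in a few lines of Python. This is an illustration only, not ViPR's implementation: it assumes the sequences are already aligned (for example by MUSCLE), handles nucleotides only, and treats a gap character as just another symbol, which is an assumption about how "regardless of coverage" is applied. The function name `consensus` and its `ambiguity` parameter are hypothetical.

```python
from collections import Counter


def consensus(aligned, ambiguity="N"):
    """Majority-rule consensus of equal-length aligned sequences."""
    if not aligned:
        raise ValueError("no sequences given")
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("aligned sequences must have equal length")
    out = []
    for column in zip(*aligned):
        # Most frequent symbol in this alignment column and its count.
        allele, count = Counter(column).most_common(1)[0]
        # Strictly more than 50% of the sequences, otherwise ambiguous.
        out.append(allele if count * 2 > len(column) else ambiguity)
    return "".join(out)


alignment = ["ACGT", "ACGA", "ACTC", "AGTG"]
print(consensus(alignment))  # prints ACNN
```

In the example, the third column is a 2/2 tie and the fourth has four different bases, so neither exceeds 50% and both become N.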

To score polymorphism at each position, a formula modified from the one cited in Crooks et al. is used:

S = -100 * Sum (Pi * log Pi), where Pi is the frequency of the ith allele.

The score is the normalized entropy of the observed allele distribution. For nucleotides, scores can range from 0 (no polymorphism) to 232 (4 alleles and an indel, 20% frequency each). The maximum of 232 corresponds to a base-2 logarithm, since -100 * log2(0.2) is approximately 232.
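As a worked check of the scoring formula, the sketch below computes S for a single alignment column. It is not ViPR's code. The base-2 logarithm is an inference from the stated maximum of 232 (five equally frequent symbols give 100 * log2(5), about 232.2), and whether the tool rounds or truncates the result is not stated in the source. The function name `polymorphism_score` is hypothetical.

```python
import math
from collections import Counter


def polymorphism_score(column):
    """Score one alignment column as -100 * sum(p_i * log2(p_i))."""
    total = len(column)
    if total == 0:
        raise ValueError("empty column")
    # Shannon entropy in bits over the observed symbols (gaps included).
    entropy = -sum((n / total) * math.log2(n / total)
                   for n in Counter(column).values())
    return 100 * entropy


print(round(polymorphism_score("AAAAA")))  # prints 0 (no polymorphism)
print(round(polymorphism_score("ACGT-")))  # prints 232 (4 alleles and an indel)
```

A column split evenly between two bases, such as "AACC", scores 100 under the same assumptions.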

Option 1: Calculate consensus sequence and sequence variation of your own sequences

Three options to input sequences (steps 2.1 to 2.3)







Option 2: Calculate consensus sequence and sequence variation of ViPR sequences


Select sequences and add them to a working set for future analysis. You’ll need to register for a Workbench account to use this feature.

• Select display fields
• Custom-sort records

Click to view details of the record





While data is still processing, the “Processing...” page shows that results will be displayed when ready and offers three options:

• Ticket number: if you do not want to wait for the results, use your ticket number (e.g. SA_925272578970) to come back to the Retrieve Results by Ticket Number page at a later time and retrieve your results.
• Save analysis to Workbench: enter the name you want to use and click “Save to Workbench” if you want to save the analysis when the results are ready.
• Notification of completion: enter your email and click “Request Notification” if you want to receive a notification when the results are ready.

On the ViPR homepage, choose a virus family or a Featured Virus to start.

1. Search for nucleotide or protein sequences in ViPR by using the “Genomes” or “Genes & Proteins” search option available from the “Search Data” tab. For this example, we will use genome sequences.

2. Select search criteria on the Genome Search page and click the “Search” button to run your query.

3. On the search results page, select the desired sequences by clicking the checkboxes, mouse over the yellow “Run Analysis” button and click “Analyze Sequence Variation (SNP)”.

If you want to include sequences that are not in this search result, or to use the sequences for further analysis, select the desired sequences and click “Add to Working Set”. Then add other sequences to the same working set later by repeating the process. Click the “Workbench” tab and find the working set you saved. Click next to it to view the details of the working set. Then mouse over the yellow “Run Analysis” button and click “Analyze Sequence Variation (SNP)”.

4. A “Select Sequence Type” lightbox will pop up. Select the appropriate sequence type and click “Continue”.

5. On the next page, you will see a brief description of the SNP tool. Click “Run” to proceed.

6. If you have a large number of long sequences to analyze, the analysis may take a few minutes to run. While it is running, you can choose to save it (upon completion) to your Workbench by entering a name for the analysis and then clicking the “Save to Workbench” button. You can then move to other parts of the ViPR site and retrieve the SNP analysis result later from your Workbench.

7. As soon as the analysis is finished, a report similar to the sample report will be displayed on the screen.