Influenza Research Database (IRD) November 2017 New Features … · 2020-04-20 · • December...



Visualizing Sequence Variations and Bootstrap Values on Trees

The IRD/ViPR team continues to provide new capabilities for visualizing your phylogenetic trees using the newly developed Archaeopteryx.js tree viewer. If you create a tree using sequences from the IRD/ViPR database, you can color-code tree nodes based on metadata characteristics associated with each sequence (e.g. host species, country, geographic region, isolation year, virus subtype). To support more advanced comparative genomics analyses, we now provide the following new features:

• Color-code tree nodes and/or labels by the sequence variation(s) observed at a specified position. To use this new feature, compute a phylogenetic tree, click View Tree to visualize it, and then use the MSA Residue Pos. option in the Visualizations panel to select the position to visualize (either with the slider or by entering the position directly in the text box).

• Display bootstrap values on RAxML trees that were computed with bootstrapping by clicking Confidence in the left panel.
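The idea behind coloring by residue can be sketched in a few lines of Python. This is an illustrative sketch only, not the Archaeopteryx.js implementation: the function name `color_by_residue`, the color scheme (gray for the most common residue, palette colors for the rest), and the toy alignment are all assumptions made for the example.

```python
from collections import Counter

def color_by_residue(alignment, position, palette=("red", "blue", "green", "orange")):
    """Assign each sequence a color based on its residue at a 1-based alignment column.

    Hypothetical scheme: the most common residue is gray; the remaining
    residues take palette colors in order of decreasing frequency.
    """
    # Residue observed in each sequence at the requested column.
    column = {name: seq[position - 1].upper() for name, seq in alignment.items()}
    # Rank residues by frequency; ties keep first-seen order.
    ranked = [residue for residue, _ in Counter(column.values()).most_common()]
    colors = {ranked[0]: "gray"}
    for i, residue in enumerate(ranked[1:]):
        colors[residue] = palette[i % len(palette)]
    return {name: (residue, colors[residue]) for name, residue in column.items()}

# Toy aligned sequences (made up for illustration).
toy_alignment = {
    "seq1": "ACTGA",
    "seq2": "ACTGA",
    "seq3": "ACAGA",
    "seq4": "ACTGA",
    "seq5": "ACCGA",
}

node_colors = color_by_residue(toy_alignment, position=3)
for name, (residue, color) in node_colors.items():
    print(name, residue, color)
```

In a real tree viewer the resulting name-to-color mapping would be applied to the tree nodes or labels; here it is simply printed.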

Host Factor Data Models

To facilitate host factor data analysis, we added support for data models in the September 2017 release. In this release, host factor data models have been computed for 24 microarray datasets and 3 RNA-seq datasets in IRD.

To recap, these data models are designed to help users identify modules of co-regulated genes that can be invisible at the individual gene level, and to allow guilt-by-association predictions. They are constructed in the following key steps:

• Use WGCNA to cluster genes into modules based on their common gene expression profile across samples. (N.B. WGCNA, weighted gene co-expression network analysis, is a popular method for identifying genes with correlated expression and constructing gene co-expression networks.)

• Correlate WGCNA clusters with metadata, such as virus, MOI and time.

• Visualize data matrices in heat maps.
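The correlation step can be illustrated with a small Python sketch. It is not WGCNA itself (WGCNA is an R package) and not the IRD pipeline: the module summary profile and sample annotations below are made-up values, and `pearson_r` and `indicator` are hypothetical helper names.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def indicator(values, level):
    """Encode a metadata category (e.g. MOI equal to 1) as a 0/1 vector over samples."""
    return [1.0 if v == level else 0.0 for v in values]

# Made-up summary expression of one module across six samples.
module_profile = [0.9, 1.1, 1.0, -1.0, -0.8, -1.2]
# Made-up sample metadata: MOI of each sample.
moi = [1, 1, 1, 0, 0, 0]

r_moi1 = pearson_r(module_profile, indicator(moi, 1))
r_moi0 = pearson_r(module_profile, indicator(moi, 0))
print(f"r(module, MOI_1) = {r_moi1:.4f}")
print(f"r(module, MOI_0) = {r_moi0:.4f}")
```

Because the two MOI indicators are complementary, their correlations with the module have equal magnitude and opposite sign.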

Data Models for Host Factor Experiment IRD_SV_ICL004-R (https://www.fludb.org > Search Data > Host Factor Experiments > ICL004-R).

Data Model 1 is a heat map showing relationships between the modules identified by WGCNA. A module is a set of co-expressed genes. Each module is represented by a distinct color. Click on a color in the list of modules to obtain the gene list and connectivity within the module.

Data Model 2 is a heat map showing the strength of correlation (Pearson’s r) and the significance of correlation (-log10 p-value) between modules and metadata values. The heat map is colored based on the -log10 p-values. The legend on the right shows the color code; each cell also shows the -log10 p-value followed by the correlation strength in parentheses. Click on a cell in the heat map to obtain the gene list and gene significance values for the corresponding module-metadata relationship.

12/21/2017 Virus Pathogen Database and Analysis Resource (ViPR) - Flaviviridae - Host Factor Experiment [IRD_SV_ICL004-R]

https://www.viprbrc.org/brc/svExperimentDetails.spg?expAccession=IRD_SV_ICL004-R&decorator=flavi_hcv&context=1513889701915 3/4

Two data models generated using WGCNA are shown. Model 1 is a heat map showing relationships between modules. A module is a set of co-expressed genes. Each module is represented by a distinct color. Click on a color in the list of modules to obtain the genes and connectivity within the module. Model 2 is a heat map showing the strength of correlation (Pearson's r) and the significance of correlation (-log10 p-value) between modules and metadata. The heat map is colored according to the -log10 p-values. The legend on the right shows the -log10 p-values, while the -log10 p-value and correlation strength are provided as numbers in each heat map cell. Click on a cell of the heat map to obtain the gene list and gene significance values for the corresponding module-metadata relationship.
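To read a heat map cell, it can help to convert the -log10 p-value back to a plain p-value. The sketch below assumes a cell is written as the -log10 p-value followed by Pearson's r in parentheses, for example `3.5582(-0.5709)`; the function name `parse_cell` and the regular expression are assumptions made for illustration, not part of IRD.

```python
import re

# Assumed cell layout: "<-log10 p>(<r>)", with optional spaces.
CELL_PATTERN = re.compile(r"^\s*([0-9.]+)\s*\(\s*(-?[0-9.]+)\s*\)\s*$")

def parse_cell(text):
    """Return (-log10 p, Pearson's r, p) for one heat map cell string."""
    # Copied text may carry a soft hyphen or a Unicode minus instead of "-".
    cleaned = text.replace("\u00ad", "-").replace("\u2212", "-")
    match = CELL_PATTERN.match(cleaned)
    if match is None:
        raise ValueError(f"unrecognized cell format: {text!r}")
    neg_log10_p = float(match.group(1))
    r = float(match.group(2))
    p_value = 10 ** (-neg_log10_p)
    return neg_log10_p, r, p_value

neg_log10_p, r, p_value = parse_cell("3.5582(-0.5709)")
print(f"-log10 p = {neg_log10_p}, r = {r}, p = {p_value:.2e}")
```

On this scale, larger numbers mean stronger significance: p = 0.05 corresponds to a -log10 p-value of about 1.30.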

Click on a module to view all host factors within

Host Factor Results

Data Models

0,0,+,+,+,0   83   +
0,0,+,0,0,-   75   +
0,0,0,-,0,+   70
0,0,-,0,+,+   67   -
0,0,+,+,0,+   49   +
0,0,-,+,+,+   40   -
0,0,-,-,0,+   29   -
0,0,+,0,-,-   28   +
0,0,+,0,+,0   27   +
0,0,-,0,-,0   26   -
0,0,+,-,-,-   25   +
0,0,+,-,0,0   12   +
0,0,+,-,0,-   10   +
0,0,-,0,+,0    7   -
0,0,+,+,0,-    7   +
0,0,+,0,-,0    5   +
0,0,-,+,0,0    3   -
0,0,0,-,-,+    3
0,0,0,-,+,+    3
0,0,+,-,-,0    3   +
0,0,+,-,0,+    2   +
0,0,0,+,-,-    2
0,0,-,+,0,+    2   -
0,0,-,+,+,0    1   -
0,0,-,-,+,+    1   -
0,0,-,-,-,+    1   -
0,0,0,+,+,-    1

The November 2017 release of IRD is now available; visit www.fludb.org

Outreach Events

• December 5-8, 2017: Bioinformatics Workshop on Applications of Genomics & Bioinformatics to Infectious Diseases, Lyon, France

The GABRIEL Network and the J. Craig Venter Institute co-organized this workshop. Twelve participants from seven countries (Bangladesh, Brazil, Cambodia, France, India, Laos and Paraguay) attended. The IRD/ViPR team taught sessions on next-generation sequencing technologies, sequences and sequence annotations in public databases, evolutionary analysis, and comparative genomics analysis.

An H7N9 HA nucleotide tree calculated by RAxML. Tree nodes are color-coded based on nucleotide variations at the specified aligned position (704). The majority of the sequences carry T at this position (gray), which gives rise to the human-adaptive substitution 235L (226L in H3 numbering), while a few sequences carry A (red) or C (blue). Bootstrap values are displayed on the tree.
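The link between nucleotide position 704 and amino acid position 235 can be checked with simple arithmetic. The sketch below assumes that the aligned position coincides with the position in the coding sequence (in frame, counted from the first base of the start codon, no gaps before it). The codon context `C?G` and the names `codon_of` and `CODON_TABLE` are assumptions chosen for illustration; the actual codons in the H7N9 sequences are not given in the text.

```python
def codon_of(nt_position):
    """Map a 1-based nucleotide position in an in-frame coding sequence
    to (1-based codon number, 1-based base position within the codon)."""
    if nt_position < 1:
        raise ValueError("positions are 1-based")
    codon_number = (nt_position - 1) // 3 + 1
    base_in_codon = (nt_position - 1) % 3 + 1
    return codon_number, base_in_codon

codon_number, base_in_codon = codon_of(704)
print(codon_number, base_in_codon)

# Small excerpt of the standard genetic code, enough for this illustration.
CODON_TABLE = {"CTG": "L", "CAG": "Q", "CCG": "P"}

# Hypothetical codon context C?G: vary only the middle base.
variants = {base: CODON_TABLE["C" + base + "G"] for base in ("T", "A", "C")}
print(variants)
```

Under these assumptions, position 704 is the second base of codon 235, so a change at that single base changes the encoded amino acid.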


Release Date: Dec 6, 2017

This system is provided for authorized users only. Anyone using this system expressly consents to monitoring while using the system. Improper use of this system may be referred to law enforcement officials. This project is funded by the National Institute of Allergy and Infectious Diseases (NIH / DHHS) under Contract No. HHSN272201400028C and is a collaboration between Northrop Grumman Health IT, J. Craig Venter Institute, and Vecna Technologies. Virus images courtesy of CDC Public Health Image Library, Wellcome Images, U.S. Department of Veterans Affairs, Science of the Invisible and ViralZone, Swiss Institute of Bioinformatics.

Data Model 2 heat map for Host Factor Experiment IRD_SV_ICL004-R

Metadata Categories (x-axis): MOI_0, MOI_1, REPLICATE1, REPLICATE2, REPLICATE3, TIMEPOINT0, TIMEPOINT12, TIMEPOINT18, TIMEPOINT24, TIMEPOINT3, TIMEPOINT7

Module (y-axis labels as listed in the figure): green, turquoise, lightcyan, red, brown, grey, midnightblue, cyan, blue, black, tan, magenta, salmon, greenyellow, yellow, purple, pink

Color legend scale: 1 to 6

Cell values, given as -log10 p-value with Pearson's r in parentheses, one row of 11 cells per module, columns in the metadata order above:

MOI_0 | MOI_1 | REPLICATE1 | REPLICATE2 | REPLICATE3 | TIMEPOINT0 | TIMEPOINT12 | TIMEPOINT18 | TIMEPOINT24 | TIMEPOINT3 | TIMEPOINT7
3.5582 (-0.5709) | 3.5582 (0.5709) | 0.0159 (0.0078) | 0.2259 (0.0918) | 0.2493 (-0.0996) | 0.7895 (-0.2379) | 0.6211 (0.2012) | 0.2475 (0.099) | 0.1533 (0.0659) | 1.6513 (-0.3798) | 0.8576 (0.2516)
4.1278 (-0.6114) | 4.1278 (0.6114) | 0.0182 (0.0089) | 0.056 (-0.0263) | 0.0364 (0.0174) | 1.2351 (-0.3187) | 0.1908 (0.0796) | 1.0394 (0.2856) | 1.3932 (0.3432) | 0.8273 (-0.2456) | 0.3962 (-0.1441)
0.3793 (-0.1394) | 0.3793 (0.1394) | 0.2565 (0.102) | 0.7891 (-0.2378) | 0.367 (0.1358) | 0.2885 (-0.1122) | 0.6699 (-0.2123) | 0.1618 (0.0691) | 0.8038 (0.2408) | 0.1335 (0.0583) | 0.0969 (-0.0437)
6.3206 (-0.7283) | 6.3206 (0.7283) | 0.0121 (0.006) | 0.0302 (0.0146) | 0.0432 (-0.0205) | 0.1197 (-0.0529) | 0.881 (0.2562) | 0.0424 (0.0202) | 1.1105 (-0.298) | 0.0952 (-0.0431) | 0.3059 (0.1176)
4.4283 (0.6308) | 4.4283 (-0.6308) | 0.0678 (-0.0315) | 0.0803 (0.0368) | 0.0109 (-0.0054) | 0.072 (-0.0333) | 0.4948 (0.1705) | 0.0501 (0.0237) | 0.0509 (-0.024) | 0.4721 (-0.1647) | 0.0593 (0.0278)
0.3585 (0.1334) | 0.3585 (-0.1334) | 0.3409 (-0.1282) | 0.0562 (-0.0264) | 0.4342 (0.1546) | 3.2128 (-0.5435) | 0.4445 (0.1574) | 0.3762 (-0.1385) | 0.1125 (-0.0501) | 0.889 (0.2577) | 1.2247 (0.317)
0.7647 (-0.2328) | 0.7647 (0.2328) | 0.4886 (0.1689) | 0.4236 (-0.1517) | 0.0359 (-0.0172) | 0.0624 (0.0291) | 0.8079 (-0.2417) | 0.4405 (-0.1563) | 0.6144 (-0.1997) | 0.8109 (0.2423) | 1.2828 (0.3262)
0.6729 (0.213) | 0.6729 (-0.213) | 0.0449 (-0.0213) | 0.0595 (0.0279) | 0.0133 (-0.0066) | 1.5044 (-0.3595) | 0.5633 (0.1876) | 1.625 (0.3763) | 3.1224 (0.536) | 3.3263 (-0.5528) | 0.5631 (-0.1875)
1.7162 (0.3884) | 1.7162 (-0.3884) | 0.0025 (0.0013) | 0.1357 (0.0592) | 0.139 (-0.0605) | 0.9459 (0.2686) | 0.0696 (0.0322) | 1.1399 (-0.303) | 2.5304 (-0.4815) | 0.4469 (0.158) | 1.2791 (0.3257)
2.0473 (-0.4294) | 2.0473 (0.4294) | 0.0911 (0.0414) | 0.1783 (-0.0751) | 0.0731 (0.0338) | 0.3123 (0.1196) | 1.9774 (0.4211) | 0.2625 (0.1039) | 1.3269 (-0.3331) | 0.0513 (-0.0242) | 1.049 (-0.2873)
0.2318 (-0.0938) | 0.2318 (0.0938) | 0.0656 (-0.0305) | 0.1165 (0.0517) | 0.0445 (-0.0211) | 0.0177 (-0.0086) | 2.5789 (-0.4863) | 0.1775 (0.0748) | 4.2733 (0.6209) | 0.0114 (-0.0056) | 0.5954 (-0.1952)
4.1568 (-0.6133) | 4.1568 (0.6133) | 0.3489 (0.1306) | 0.2922 (-0.1134) | 0.0359 (-0.0172) | 0.7427 (-0.2281) | 0.3898 (-0.1423) | 0.1621 (0.0692) | 0.2779 (0.1089) | 0.0316 (0.0152) | 0.5212 (0.1772)
2.2437 (-0.4515) | 2.2437 (0.4515) | 0.007 (0.0034) | 0.1883 (0.0787) | 0.198 (-0.0821) | 0.3165 (0.1209) | 0.1334 (0.0583) | 0.9954 (-0.2777) | 2.9892 (-0.5245) | 0.3615 (0.1343) | 2.6034 (0.4887)
0.1545 (0.0663) | 0.1545 (-0.0663) | 0.0051 (0.0025) | 0.238 (0.0959) | 0.2457 (-0.0984) | 0.0615 (-0.0287) | 0.81 (0.2421) | 0.1416 (-0.0615) | 1.0053 (-0.2795) | 0.7416 (-0.2279) | 1.4767 (0.3555)
0.1634 (0.0696) | 0.1634 (-0.0696) | 0.0357 (0.0171) | 0.1189 (-0.0526) | 0.0772 (0.0355) | 1.6188 (0.3754) | 1.1838 (0.3103) | 0.695 (-0.2178) | 5.5467 (-0.6928) | 1.0714 (0.2912) | 0.1546 (-0.0664)
0.2422 (-0.0973) | 0.2422 (0.0973) | 0.1106 (-0.0493) | 0.1177 (0.0522) | 0.0057 (-0.0028) | 1.6638 (-0.3815) | 2.9464 (0.5207) | 1.018 (0.2818) | 0.2389 (0.0962) | 2.358 (-0.4638) | 0.1206 (-0.0533)
2.7079 (0.4988) | 2.7079 (-0.4988) | 0.0688 (-0.0319) | 0.1373 (0.0598) | 0.0596 (-0.0279) | 0.9442 (0.2682) | 3.0475 (-0.5295) | 1.0002 (-0.2785) | 0.744 (0.2284) | 0.9068 (0.2612) | 0.113 (0.0503)
