New Features in IRD: Influenza Research Database (IRD), November 2017
Visualizing Sequence Variations and Bootstrap Values on Trees
The IRD/ViPR team continues to add new capabilities for visualizing phylogenetic trees with the newly developed Archaeopteryx.js tree viewer. If you build a tree from sequences in the IRD/ViPR database, you can color-code tree nodes by the metadata associated with each sequence (e.g., host species, country, geographic region, isolation year, or virus subtype). To support more advanced comparative genomics analyses, we now provide the following new features:
• Color-code tree nodes and/or labels by the sequence variation(s) observed at a specified position. To use this feature, after computing a phylogenetic tree, click View Tree to display it, then use the MSA Residue Pos. option in the Visualizations panel to select the position to visualize (either with the slider or by entering the position in the text box).
• Display bootstrap values on RAxML trees that include them by clicking Confidence in the left panel.
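The residue-based coloring can be illustrated with a small Python sketch. It assumes the alignment is available as a dictionary mapping sequence names to aligned sequences and that positions are 1-based, as in the MSA Residue Pos. box; the function names and the palette are hypothetical and do not describe how Archaeopteryx.js implements the feature. The most common residue receives the first color, so rarer variants stand out.

```python
from __future__ import annotations

from collections import Counter


def residues_at(alignment: dict[str, str], pos: int) -> dict[str, str]:
    """Return {sequence name: residue} for 1-based alignment column `pos`."""
    if pos < 1:
        raise ValueError("alignment positions are 1-based")
    residues = {}
    for name, seq in alignment.items():
        if pos > len(seq):
            raise ValueError(f"{name} has no column {pos}")
        residues[name] = seq[pos - 1].upper()
    return residues


def color_by_residue(residues: dict[str, str], palette: list[str]) -> dict[str, str]:
    """Assign colors by residue frequency: most common residue -> palette[0], and so on.

    Ties keep first-seen order (Counter.most_common orders equal counts that way);
    if there are more distinct residues than colors, the palette is reused cyclically.
    """
    ranked = [res for res, _ in Counter(residues.values()).most_common()]
    color_of = {res: palette[i % len(palette)] for i, res in enumerate(ranked)}
    return {name: color_of[res] for name, res in residues.items()}


if __name__ == "__main__":
    # Toy alignment (hypothetical sequences), colored at column 4.
    aln = {"seq1": "ATGT", "seq2": "ATGT", "seq3": "ATGA", "seq4": "ATGC"}
    print(color_by_residue(residues_at(aln, 4), ["gray", "red", "blue"]))
    # {'seq1': 'gray', 'seq2': 'gray', 'seq3': 'red', 'seq4': 'blue'}
```

Ranking by frequency is one reasonable design choice; a fixed residue-to-color table would keep colors stable across positions instead.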
Host Factor Data Models
To facilitate host factor data analysis, we added support for data models in the September 2017 release. In this release, host factor data models have been computed for 24 microarray datasets and 3 RNA-seq datasets in IRD.
To recap, these data models are designed to help users identify modules of co-regulated genes that may not be apparent at the individual-gene level, and to enable guilt-by-association predictions. They are constructed in the following key steps:
• Use WGCNA (weighted gene co-expression network analysis), a popular method for identifying genes with correlated expression and constructing gene co-expression networks, to cluster genes into modules based on their shared expression profiles across samples.
• Correlate the WGCNA modules with metadata such as virus, multiplicity of infection (MOI), and time point.
• Visualize data matrices in heat maps.
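The first two steps can be sketched in Python with NumPy. This is a simplified illustration, not the WGCNA package itself: it builds an unsigned soft-thresholded adjacency |cor|^β (β = 6 is assumed here as a commonly used value for unsigned networks), takes module assignments as given rather than reproducing WGCNA's topological-overlap clustering and dynamic tree cut, and summarizes a module by its eigengene (the first principal component of the module's standardized expression) before correlating it with a numeric metadata column. The function names and toy data are hypothetical.

```python
import numpy as np


def soft_threshold_adjacency(expr: np.ndarray, beta: int = 6) -> np.ndarray:
    """Unsigned adjacency a_ij = |cor(gene_i, gene_j)| ** beta.

    expr is a samples x genes matrix; raising |r| to a power keeps strong
    co-expression links while shrinking weak ones toward zero.
    """
    corr = np.corrcoef(expr, rowvar=False)  # genes x genes Pearson correlations
    adj = np.abs(corr) ** beta
    np.fill_diagonal(adj, 0.0)              # ignore self-links
    return adj


def module_eigengene(expr: np.ndarray, genes) -> np.ndarray:
    """First principal component (one value per sample) of a module's standardized genes."""
    sub = expr[:, genes]
    z = (sub - sub.mean(axis=0)) / sub.std(axis=0, ddof=1)
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    eig = u[:, 0] * s[0]
    # SVD signs are arbitrary; orient the eigengene to follow mean module expression.
    if np.dot(eig, z.mean(axis=1)) < 0:
        eig = -eig
    return eig


def module_trait_cor(eigengene: np.ndarray, trait) -> float:
    """Pearson r between a module eigengene and a numeric metadata column (e.g. MOI as 0/1)."""
    return float(np.corrcoef(eigengene, np.asarray(trait, dtype=float))[0, 1])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    moi = np.array([0, 0, 0, 1, 1, 1], dtype=float)  # hypothetical mock vs infected samples
    signal = moi + 0.1 * rng.standard_normal(6)
    # Genes 0-2 follow the infection signal; genes 3-4 are unrelated noise.
    expr = np.column_stack(
        [signal + 0.05 * rng.standard_normal(6) for _ in range(3)]
        + [rng.standard_normal(6) for _ in range(2)]
    )
    adj = soft_threshold_adjacency(expr)
    eig = module_eigengene(expr, [0, 1, 2])
    print(round(module_trait_cor(eig, moi), 3))  # strongly positive for this toy module
```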
Data Models for Host Factor Experiment IRD_SV_ICL004-R (https://www.fludb.org > Search Data > Host Factor Experiments > ICL004-R).
Data Model 1 is a heat map showing relationships between the modules identified by WGCNA. A module is a set of co-expressed genes. Each module is represented by a distinct color. Click on a color in the list of modules to obtain the gene list and connectivity within the module.
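The connectivity listed for a module can be illustrated with the usual WGCNA definitions, sketched here in Python with NumPy under the assumption that a gene-by-gene adjacency matrix and one module label per gene are already available. A gene's total connectivity is the sum of its adjacencies to all other genes, its intramodular connectivity is the sum over genes in the same module, and the remainder links it to other modules; genes with the highest intramodular connectivity are often treated as hub genes. The function name and toy values are hypothetical, and this is not necessarily how IRD computes the values it displays.

```python
import numpy as np


def intramodular_connectivity(adj, labels):
    """Return (k_total, k_within, k_out) for each gene.

    adj: symmetric genes x genes adjacency matrix; labels: module name per gene.
    """
    a = np.array(adj, dtype=float)  # copy so the caller's matrix is untouched
    np.fill_diagonal(a, 0.0)        # a gene is not connected to itself
    labels = np.asarray(labels)
    same_module = labels[:, None] == labels[None, :]
    k_total = a.sum(axis=1)
    k_within = (a * same_module).sum(axis=1)
    return k_total, k_within, k_total - k_within


if __name__ == "__main__":
    adj = [[0.0, 0.9, 0.8, 0.1],
           [0.9, 0.0, 0.7, 0.0],
           [0.8, 0.7, 0.0, 0.2],
           [0.1, 0.0, 0.2, 0.0]]
    labels = ["turquoise", "turquoise", "turquoise", "blue"]
    k_total, k_within, k_out = intramodular_connectivity(adj, labels)
    print(k_within)  # approximately [1.7, 1.6, 1.5, 0.0]; gene 0 is the turquoise hub
```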
Data Model 2 is a heat map showing the strength of correlation (Pearson’s r) and the significance of correlation (p-value) between modules and metadata values. Cells are colored by log10-transformed p-value, with the color scale shown in the legend on the right; each cell also displays the log10 p-value and the correlation strength as numbers. Click a cell in the heat map to obtain the gene list and gene significance values for the corresponding module-metadata relationship.
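To make the two numbers in each cell concrete, the following Python sketch (standard library only) computes Pearson's r between a module summary and a numeric metadata column and converts it to a two-sided p-value. As an assumption made for brevity, it uses Fisher's z approximation rather than an exact Student t test, so its p-values can differ slightly from those IRD reports; the cell layout (-log10 p followed by r in parentheses) and the function names are likewise hypothetical illustrations.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 4:
        raise ValueError("need two sequences of equal length >= 4")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def fisher_p_value(r, n):
    """Approximate two-sided p-value for H0: rho = 0 via Fisher's z = atanh(r) * sqrt(n - 3)."""
    r = max(min(r, 1 - 1e-15), -1 + 1e-15)  # keep atanh finite when |r| is 1
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))


def heatmap_cell(x, y):
    """Format one module-metadata cell as '-log10(p)(r)'."""
    r = pearson_r(x, y)
    p = max(fisher_p_value(r, len(x)), 1e-300)  # avoid log10(0) on underflow
    return f"{-math.log10(p):.4f}({r:.4f})"


if __name__ == "__main__":
    eigengene = [0.1, 0.2, 0.0, 0.9, 1.1, 1.0]  # hypothetical module summary per sample
    moi = [0, 0, 0, 1, 1, 1]                    # hypothetical 0/1 metadata column
    print(heatmap_cell(eigengene, moi))
```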
Figure: Screenshot (12/21/2017) of the Virus Pathogen Database and Analysis Resource (ViPR) Host Factor Experiment page for IRD_SV_ICL004-R, Data Models tab. Model 1 lists the WGCNA modules; click a module to view all host factors within it and their connectivity. Model 2 is a heat map of module-metadata correlation strength (Pearson's r) colored by log10 p-value.
The November 2017 release of IRD is now available at www.fludb.org.
Outreach Events
• December 5-8, 2017: Bioinformatics Workshop on Applications of Genomics & Bioinformatics to Infectious Diseases, Lyon, France
The GABRIEL Network and the J. Craig Venter Institute co-organized this workshop, which was attended by twelve participants from seven countries (Bangladesh, Brazil, Cambodia, France, India, Laos, and Paraguay). The IRD/ViPR team taught sessions on next-generation sequencing technologies, sequences and sequence annotations in public databases, evolutionary analysis, and comparative genomics analysis.
An H7N9 HA nucleotide tree computed with RAxML. Tree nodes are color-coded by the nucleotide at the specified aligned position (704). Most sequences carry T at this position (gray), which gives rise to the human-adaptive substitution 235L (226L in H3 numbering), while a few sequences carry A (red) or C (blue). Bootstrap values are displayed on the tree.
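The link between the aligned nucleotide position in the caption and residue 235 is simple codon arithmetic, sketched below in Python. It assumes, as an unverified simplification, that alignment column 704 coincides with nucleotide 704 of the HA coding sequence counted from the start codon (no gap columns before it); under that assumption, position 704 is the second base of codon 235. Every standard leucine codon (CTN, TTA, TTG) has T at its second position, consistent with the T-carrying (gray) sequences encoding 235L. The helper name is hypothetical.

```python
from typing import Tuple


def codon_of(nt_pos: int) -> Tuple[int, int]:
    """Map a 1-based coding-sequence position to (codon number, position within codon)."""
    if nt_pos < 1:
        raise ValueError("nucleotide positions are 1-based")
    return (nt_pos - 1) // 3 + 1, (nt_pos - 1) % 3 + 1


# The standard-code leucine codons all share T at the second position.
LEUCINE_CODONS = ["CTT", "CTC", "CTA", "CTG", "TTA", "TTG"]
assert all(codon[1] == "T" for codon in LEUCINE_CODONS)

if __name__ == "__main__":
    print(codon_of(704))  # (235, 2): second base of codon 235
```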
Release Date: Dec 6, 2017
This project is funded by the National Institute of Allergy and Infectious Diseases (NIH / DHHS) under Contract No. HHSN272201400028C and is a collaboration between Northrop Grumman Health IT, J. Craig Venter Institute, and Vecna Technologies. Virus images courtesy of CDC Public Health Image Library, Wellcome Images, U.S. Department of Veterans Affairs, Science of the Invisible and ViralZone, Swiss Institute of Bioinformatics.
Figure: Data Model 2 heat map for IRD_SV_ICL004-R. Rows (Module): green, turquoise, lightcyan, red, brown, grey, midnightblue, cyan, blue, black, tan, magenta, salmon, greenyellow, yellow, purple, pink. Columns (Metadata Categories): MOI_0, MOI_1, REPLICATE1, REPLICATE2, REPLICATE3, TIMEPOINT0, TIMEPOINT3, TIMEPOINT7, TIMEPOINT12, TIMEPOINT18, TIMEPOINT24. Each cell shows the log10 p-value followed by the correlation strength in parentheses; the color legend spans 1 to 6.