
Bioinformatics Training and Research Collaborations Among North East IDeA Programs

Benjamin L. King¹, W. Kelley Thomas², James Vincent³, Zahir Shaikh⁴, Shawn W. Polson⁵, Cathy Wu⁵ and The North East Bioinformatics Collaborative

¹Maine INBRE, ²New Hampshire INBRE, ³Vermont Genetics Network, ⁴Rhode Island INBRE, ⁵Delaware INBRE

Goals of North East Bioinformatics Collaborative

• Leverage bioinformatics expertise and resources across the region to expand the research capacity of individual states.
• Coordinate and conduct bioinformatics research training through regional workshops and courses.
• Conduct collaborative cyber-enabled research to benefit IDeA-funded investigators in multiple states.

Abstract

Bioinformatics experts in Maine, New Hampshire, Vermont, Rhode Island and Delaware collaborate to deliver training and conduct research through the North East Bioinformatics Collaborative (NEBC) of the North East Cyberinfrastructure Consortium (NECC). Analyses of large datasets are fundamental to biomedical research, and bioinformatics skills are in high demand. Leveraging bioinformatics expertise and resources across regional IDeA programs has expanded the capacity of individual states to deliver training and conduct collaborative research. For example, the Applied Bioinformatics Course has trained a total of 97 faculty and students from all five states over four years. In addition to other courses and workshops, NEBC experts support graduate degree programs in bioinformatics. NEBC cyber-enabled collaborative research began with the skate (Leucoraja erinacea) genome project, which has informed comparative studies of genes important to human development and health. New functional studies of non-coding genes in the zebrafish (Danio rerio) will advance this powerful regional collaboration.


Collaborative Training in Bioinformatics

Collaborative Research

Background

North East Cyberinfrastructure Consortium (NECC)

The NECC successfully built regional cyber-infrastructure and conducted coordinated cyber-enabled bioinformatics research and training through ongoing research projects. The NEBC was created by the NECC to facilitate collaborations in bioinformatics across member states.

Funding

Skate genome sequencing was funded by NIH NCRR ARRA Supplements to P20 RR016463-12 (MDIBL), P20 RR016472-12 (UD) and P20 RR16462 (UVM).

The North East Cyberinfrastructure Consortium is funded by:
- NIH National Center for Research Resources grants: P20 RR016463 (MDIBL), P20 RR016472 (UD), P20 RR16462 (UVM), P20 RR016457 (URI), P20 RR030360 (UNH)
- NIH National Institute of General Medical Sciences grants: P20 GM103423 (MDIBL), P20 GM103446 (UD), P20 GM103449 (UVM), P20 GM103430 (URI), P20 GM103506 (Dartmouth)
- National Science Foundation EPSCoR grants: EPS-0904155 (UM), EPS-081425 (UD), EPS-1101317 (UVM), EPS-1004057 (URI), EPS-1101245 (UNH)

Future Directions

Continue to collaboratively:
• Develop and deliver bioinformatics training materials.
  - Metagenomics workshop in 2016.
• Improve the skate genome assembly and analyze zebrafish non-coding RNAs.
• Share regional computational and data storage infrastructure.

Conclusions

• NEBC is a powerful model for collaboration and research sharing among IDeA programs.
• Workshops and courses have increased knowledge and application of bioinformatics across the region.
• Impact of collaborative research projects: three publications cited over 50 times
  - King BL et al., Science (2011) 334(6062):1517.
  - Wang Q et al., Database (2012) 2012:bar064.
  - Wyffels J et al., F1000Research (2014) 3:191.

http://www.necyberconsortium.org

Figure: Leucoraja erinacea. Cartilaginous fishes are the most basal extant jawed vertebrates.

Figure: Regional fiber network map (new and existing fiber) showing sites at Presque Isle (to Canada), Orono, Ellsworth, Brunswick, Manchester, White River Junction/Hanover, Burlington, Warwick/Narragansett/Kingston, Newark and Dover, and the Maine and Delaware data centers.

Facilitate Bioinformatics Training
• Jointly host and teach workshops and courses
• Share training materials
• Support bioinformatics educational programs

Foster research collaborations within and across states

Collaborative Bioinformatics Workshops and Courses

Year  Training                                         Participants
2010  Genome Sequence and Assembly Workshop, May (DE)  27 (all 5 states)
2010  Skate Genome Annotation Workshop, Oct. (ME)      22 (all 5 states)
2011  Skate Genome Annotation Workshop, May (DE)       30 (all 5 states)
2011  Mitochondria Genome Annotation, Sept. (online)   29 (all 5 states)
2012  Applied Bioinformatics Course, Oct. (ME)         25 (NH, ME)
2013  Applied Bioinformatics Course, Oct. (ME)         23 (DE, NH, VT, ME)
2014  Applied Bioinformatics Course, Oct. (ME)         23 (NH, ME)
2015  Genome Informatics Workshop, Apr. (ME)           28 (DE, VT, ME)
2015  Applied Bioinformatics Course, Aug. (ME)         24 (NH, VT, RI, ME)
2016  Metagenomics Workshop (planning)                 -
TOTAL                                                  231
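The TOTAL row and the reach of these trainings across the five states can be checked with a short Python sketch. The rows are transcribed from the workshops and courses table; the 2016 Metagenomics Workshop is left out because it was still being planned, and the variable names are illustrative only.

```python
from collections import Counter

ALL_STATES = ("DE", "ME", "NH", "RI", "VT")

# (year, training, participants, states represented), transcribed from the table;
# the 2016 Metagenomics Workshop is omitted because it was still in planning.
trainings = [
    (2010, "Genome Sequence and Assembly Workshop", 27, ALL_STATES),
    (2010, "Skate Genome Annotation Workshop", 22, ALL_STATES),
    (2011, "Skate Genome Annotation Workshop", 30, ALL_STATES),
    (2011, "Mitochondria Genome Annotation", 29, ALL_STATES),
    (2012, "Applied Bioinformatics Course", 25, ("NH", "ME")),
    (2013, "Applied Bioinformatics Course", 23, ("DE", "NH", "VT", "ME")),
    (2014, "Applied Bioinformatics Course", 23, ("NH", "ME")),
    (2015, "Genome Informatics Workshop", 28, ("DE", "VT", "ME")),
    (2015, "Applied Bioinformatics Course", 24, ("NH", "VT", "RI", "ME")),
]

# Sum of the participant column.
total = sum(n for _, _, n, _ in trainings)

# Number of offerings that drew participants from each state.
offerings_per_state = Counter(s for _, _, _, states in trainings for s in states)

print(total)  # 231
print(dict(sorted(offerings_per_state.items())))  # {'DE': 6, 'ME': 9, 'NH': 8, 'RI': 5, 'VT': 7}
```

Every state appears in at least five of the nine offerings, and Maine, which hosted most of them, appears in all nine.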

Figure: Increased Knowledge of High-Throughput Sequence Data Analysis Among 2015 Applied Bioinformatics Course Participants. Pre-course and post-course self-evaluation responses (Strongly Agree, Agree, Undecided, Disagree, Strongly Disagree); y-axis: number of participants.

Skate Genome Project

The skate is a model organism for renal physiology, stem cell biology, immunology, developmental biology and toxicology.
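PacBio long reads were used to improve the skate assembly by bridging scaffolds. The core check behind such a join is simple: one read aligns to the end of one scaffold and to the start of another, and the read bases lying between the two alignments estimate the gap. The Python sketch below illustrates only that check, under stated assumptions: the `Alignment` record, the `bridge` function, the forward-strand-only simplification and the 500 bp tolerance for unaligned scaffold ends are all hypothetical, and this is not the pipeline used for the skate genome.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Alignment:
    """One local alignment of a long read to a scaffold (forward strand only, for simplicity)."""
    scaffold: str
    scaffold_len: int
    read_start: int  # 0-based start on the read
    read_end: int    # exclusive end on the read
    scaf_start: int  # 0-based start on the scaffold
    scaf_end: int    # exclusive end on the scaffold


def bridge(a: Alignment, b: Alignment, max_overhang: int = 500) -> Optional[tuple[str, str, int]]:
    """Return (left_scaffold, right_scaffold, estimated_gap) if one read links two scaffold ends."""
    if a.scaffold == b.scaffold:
        return None
    # Order the two alignments by where they fall along the read.
    left, right = (a, b) if a.read_start <= b.read_start else (b, a)
    # The left alignment must reach the end of its scaffold; the right one must begin at the start.
    left_tail = left.scaffold_len - left.scaf_end
    right_head = right.scaf_start
    if left_tail > max_overhang or right_head > max_overhang:
        return None
    # Read bases between the two blocks, minus unaligned scaffold ends, estimate the gap
    # (a negative value would indicate overlapping scaffold ends).
    gap = (right.read_start - left.read_end) - left_tail - right_head
    return left.scaffold, right.scaffold, gap


# Hypothetical example: an 11 kb read spanning the end of scf1 and the start of scf2.
a = Alignment("scf1", 50_000, read_start=0, read_end=4_000, scaf_start=46_000, scaf_end=50_000)
b = Alignment("scf2", 30_000, read_start=4_600, read_end=11_000, scaf_start=0, scaf_end=6_400)
print(bridge(a, b))  # ('scf1', 'scf2', 600)
```

A real assembler would also handle reverse-strand alignments, require several supporting reads per join and reject conflicting links; the sketch keeps only the geometric test.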

New PacBio Genome Sequence Data

Hox genes regulate patterning during embryonic development. The skate HoxA and HoxD clusters are well characterized in the current genome assembly, but the HoxB cluster is split across six scaffolds. PacBio reads improve the HoxB cluster assembly by bridging scaffolds together.

Figure: An 11 kb PacBio read that bridges two scaffolds.

Figure: Distribution of PacBio read lengths (x-axis: length, bp; y-axis: number of reads).

New Resource Available for Future Collaborative Efforts

Zebrafish RNA-Seq dataset for the study of non-coding RNAs

Diversity of Gene Types in Zebrafish Genome

• Protein-coding genes: 25,642
• Small non-coding: 3,172
• Long non-coding: 2,741
• Other non-coding: 95
• Pseudogenes: 293
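The gene-type counts above are easier to compare as shares of all listed genes. A minimal Python sketch using only those five counts (the dictionary keys are illustrative labels):

```python
# Gene-type counts for the zebrafish genome, copied from the counts above.
gene_types = {
    "Protein-coding": 25_642,
    "Small non-coding": 3_172,
    "Long non-coding": 2_741,
    "Other non-coding": 95,
    "Pseudogenes": 293,
}

total = sum(gene_types.values())  # 31,943
# Percentage of all listed genes in each category.
shares = {name: 100 * n / total for name, n in gene_types.items()}

for name, n in gene_types.items():
    print(f"{name:<18}{n:>8,}{shares[name]:7.1f}%")
print(f"{'Total':<18}{total:>8,}")
```

With these counts, protein-coding genes make up roughly 80% of the 31,943 listed genes, and the three non-coding classes together about 19%.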

Example Application: lncRNA Expression During Epimorphic Appendage Regeneration

Figure: x-axis: log2(read counts per million); y-axis: log2(fold change).
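The two plot axes correspond to a common way of summarizing differential expression: each gene is placed by its average log2 counts per million (CPM) and by its log2 fold change between conditions. The Python sketch below shows only that arithmetic, under stated assumptions: two hypothetical samples (control and regenerating), hypothetical gene names and counts, and a pseudocount of 0.5 to avoid log2(0). It is not the analysis behind the figure; tools such as edgeR or DESeq2 also normalize libraries and model count dispersion before estimating fold changes.

```python
import math


def log2_cpm(counts: dict[str, int], prior: float = 0.5) -> dict[str, float]:
    """log2 counts per million for one library; `prior` avoids log2(0) for unexpressed genes."""
    lib_size = sum(counts.values())
    return {gene: math.log2((c + prior) / lib_size * 1e6) for gene, c in counts.items()}


def ma_points(control: dict[str, int], treated: dict[str, int]) -> dict[str, tuple[float, float]]:
    """Map each shared gene to (average log2 CPM, log2 fold change of treated vs. control)."""
    a, b = log2_cpm(control), log2_cpm(treated)
    return {g: ((a[g] + b[g]) / 2, b[g] - a[g]) for g in a if g in b}


# Hypothetical counts: two libraries of 1,000,000 reads each.
control = {"lncA": 100, "lncB": 1_000, "other_genes": 998_900}
regenerating = {"lncA": 400, "lncB": 1_000, "other_genes": 998_600}

for gene, (avg_log_cpm, log_fc) in ma_points(control, regenerating).items():
    print(f"{gene:<12} log2CPM={avg_log_cpm:6.2f}  log2FC={log_fc:+.2f}")
```

Here the hypothetical lncA comes out close to +2 (about a fourfold increase), while lncB, unchanged between libraries of equal size, sits at 0.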