Scott Edmunds: Revolutionizing Data Dissemination: GigaScience
Revolutionizing data dissemination.
www.gigasciencejournal.com
GSC13, Shenzhen
Scott Edmunds
Large-Scale Data Journal/Database
Editor-in-Chief: Laurie Goodman, PhD
Editor: Scott Edmunds, PhD
Assistant Editor: Alexandra Basford, PhD
Lead Curator: Tam Sneddon, D.Phil
In conjunction with:
Now taking submissions…
www.gigaDB.org
Associated Database
BGI
Data Reuse
Funders
Databases
Journals
Data Producers
Users
…Data Flow
Data Re-use
($)
Effort
Usability
Need to lower the hurdles…
Cloud solutions?
Better tools for assessing data quality…
Better handling of metadata…
Cloud?
More efficient handling of data…
Do we need to keep everything?
Compression?
Better incentives?
($)
Effort
Usability
?
New incentives/credit
Credit where credit is overdue:
“One option would be to provide researchers who release data to public repositories with a means of accreditation.”
“An ability to search the literature for all online papers that used a particular data set would enable appropriate attribution for those who share.”
Nature Biotechnology 27, 579 (2009)
Prepublication data sharing (Toronto International Data Release Workshop):
“Data producers benefit from creating a citable reference, as it can later be used to reflect impact of the data sets.”
Nature 461, 168-170 (2009)
Data citation: DataCite and DOIs
Aims to:
“increase acceptance of research data as legitimate, citable contributions to the scholarly record”.
“data generated in the course of research are just as valuable to the ongoing academic discourse as papers and monographs”.
For data citation to work, needs:
• Proven utility/potential user base.
• Acceptance/inclusion by journals.
• Data+Citation: inclusion in the references.
• Tracking by citation indexes.
• Usage of the metrics by the community…
>1.3 million DOIs since Dec 2009
Data citation: utility/user base.
BGI Datasets Get DOI®s
doi:10.5524/100004
Plants:
Chinese cabbage
Cucumber
Foxtail millet
Pigeonpea
Potato
Sorghum
Microbe:
E. coli O104:H4 TY-2482
Cell line:
Chinese Hamster Ovary
Human:
Asian individual (YH)
- DNA Methylome
- Genome Assembly
- Transcriptome
Ancient DNA (coming soon)
- Saqqaq Eskimo
- Aboriginal Australian
Vertebrates:
Giant panda
Macaque
- Chinese rhesus
- Crab-eating
Naked mole rat
Penguin
- Emperor penguin
- Adelie penguin
Pigeon, domestic
Polar bear
Sheep
Tibetan antelope
Invertebrates:
Ant
- Florida carpenter ant
- Jerdon’s jumping ant
- Leaf-cutter ant
Roundworm
Silkworm
Many released pre-publication…
Our first DOI:
To maximize its utility to the research community and aid those fighting the current epidemic, genomic data is released here into the public domain under a CC0 license. Until the publication of research papers on the assembly and whole-genome analysis of this isolate, we would ask you to cite this dataset as:
Li, D; Xi, F; Zhao, M; Liang, Y; Chen, W; Cao, S; Xu, R; Wang, G; Wang, J; Zhang, Z; Li, Y; Cui, Y; Chang, C; Cui, C; Luo, Y; Qin, J; Li, S; Li, J; Peng, Y; Pu, F; Sun, Y; Chen, Y; Zong, Y; Ma, X; Yang, X; Cen, Z; Zhao, X; Chen, F; Yin, X; Song, Y; Rohde, H; Li, Y; Wang, J; Wang, J and the Escherichia coli O104:H4 TY-2482 isolate genome sequencing consortium (2011) Genomic data from Escherichia coli O104:H4 isolate TY-2482. BGI Shenzhen. doi:10.5524/100001 http://dx.doi.org/10.5524/100001
To the extent possible under law, BGI Shenzhen has waived all copyright and related or neighboring rights to Genomic Data from the 2011 E. coli outbreak. This work is published from: China.
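The citation format requested for the E. coli dataset (authors, year, title, publisher, DOI, resolver link) can be assembled mechanically from a handful of metadata fields. The following is a minimal sketch in Python, not GigaDB's own tooling: the `DatasetRecord` class, the function names, and the "et al." truncation rule are all assumptions made for illustration. Only the field values and the `http://dx.doi.org/` resolver prefix come from the slide.

```python
from __future__ import annotations

from dataclasses import dataclass

# Resolver prefix as it appears on the slide.
DOI_RESOLVER = "http://dx.doi.org/"


@dataclass
class DatasetRecord:
    """Hypothetical container for the fields used in a dataset citation."""
    authors: list[str]
    year: int
    title: str
    publisher: str
    doi: str


def normalize_doi(doi: str) -> str:
    """Strip an optional 'doi:' label and check the basic prefix/suffix shape."""
    bare = doi.strip()
    if bare.lower().startswith("doi:"):
        bare = bare[4:].strip()
    prefix, sep, suffix = bare.partition("/")
    if not (prefix.startswith("10.") and sep and suffix):
        raise ValueError(f"not a DOI: {doi!r}")
    return bare


def doi_url(doi: str) -> str:
    """Build a resolver link for a DOI."""
    return DOI_RESOLVER + normalize_doi(doi)


def format_citation(rec: DatasetRecord, max_authors: int = 3) -> str:
    """Format a citation; the author truncation is an illustrative choice."""
    author_part = "; ".join(rec.authors[:max_authors])
    if len(rec.authors) > max_authors:
        author_part += " et al."
    bare = normalize_doi(rec.doi)
    return (f"{author_part} ({rec.year}) {rec.title}. "
            f"{rec.publisher}. doi:{bare} {doi_url(bare)}")


# Only the first four authors from the slide are listed here, for brevity.
record = DatasetRecord(
    authors=["Li, D", "Xi, F", "Zhao, M", "Liang, Y"],
    year=2011,
    title="Genomic data from Escherichia coli O104:H4 isolate TY-2482",
    publisher="BGI Shenzhen",
    doi="doi:10.5524/100001",
)
citation = format_citation(record)
print(citation)
```

Keeping the DOI as a separate, validated field means the same record can produce both the human-readable citation and the resolver link that goes into a reference list.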
Data Citation: acceptance by journals
Data+Citation: inclusion in the references
• Data submitted to NCBI databases:
- Raw data: SRA:SRA046843
- Assemblies of 3 strains: Genbank:AHAO00000000-AHAQ00000000
- SNPs: dbSNP:1056306
- CNVs, InDels, SV: dbGAP:nstd63
• Submission to public databases complemented by its citable form in GigaDB.
Published 21st November 2011
In the references…
Is the DOI…
And now in Nature Biotech…
Data citation: tracking?
Plans in 2012 to link central metadata repository with WoS
- Will finally track and credit use!
To be continued…
DataCite metadata in harvestable form (OAI-PMH)
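To make "harvestable form" concrete: OAI-PMH is a plain HTTP protocol in which a harvester sends a `verb` plus a few query parameters and receives XML back. The sketch below builds a `ListIdentifiers` request and pulls the identifiers out of a response. The base URL is a placeholder, not a real endpoint, and the sample response is hand-written and trimmed for illustration (a real response also carries datestamps and other required elements, and real identifiers may be formatted differently). The verb, the `metadataPrefix`, `from`, and `set` parameters, and the XML namespace are part of the OAI-PMH 2.0 protocol.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespace defined by the OAI-PMH 2.0 protocol.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Placeholder endpoint: an assumption for illustration, not a real service.
BASE_URL = "https://oai.example.org/oai"


def list_identifiers_url(base_url: str,
                         metadata_prefix: str = "oai_dc",
                         from_date: str | None = None,
                         set_spec: str | None = None) -> str:
    """Build a ListIdentifiers request URL for an OAI-PMH repository."""
    params = {"verb": "ListIdentifiers", "metadataPrefix": metadata_prefix}
    if from_date:
        params["from"] = from_date      # selective harvesting by date
    if set_spec:
        params["set"] = set_spec        # selective harvesting by set
    return f"{base_url}?{urlencode(params)}"


def extract_identifiers(xml_text: str) -> list[str]:
    """Return the text of every header/identifier element in a response."""
    root = ET.fromstring(xml_text.strip())
    return [el.text for el in
            root.iterfind(".//oai:header/oai:identifier", OAI_NS)]


# Hand-written, trimmed sample response (illustrative only).
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListIdentifiers>
    <header><identifier>doi:10.5524/100001</identifier></header>
    <header><identifier>doi:10.5524/100004</identifier></header>
  </ListIdentifiers>
</OAI-PMH>
"""

request_url = list_identifiers_url(BASE_URL, from_date="2009-12-01")
identifiers = extract_identifiers(SAMPLE_RESPONSE)
print(request_url)
print(identifiers)
```

A citation index could, in principle, run this kind of harvest on a schedule and match the returned DOIs against reference lists, which is the tracking step the slide anticipates.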
Thanks to:
Laurie Goodman, Alexandra Basford, Tam Sneddon, Shaoguang Liang, Tin-Lap Lee (CUHK), Qiong Luo (HKUST)
Follow us:
@gigascience
facebook.com/GigaScience
blogs.openaccesscentral.com/blogs/gigablog/
Contact us:
www.gigasciencejournal.com
[email protected]
GSC13 special series
• Rapid review - rolling publication after launch issue
• High-visibility - published/promoted by BMC/GigaScience
• Article Processing Charge covered by BGI
• Hosting of any test datasets in GigaDB
Seeking submissions highlighting best practice in genomics research:
• Discussion/comment/white papers
• Cloud computing, software for data handling
• Research highlighting best practice