Data Formats Document - Complete Genomics

  • Data File Formats

    File format v1.7

    Software v1.12

    May 20, 2011

    Copyright © 2011 Complete Genomics Incorporated. All rights reserved.

    cPAL and DNB are trademarks of Complete Genomics, Inc. in the US and certain other countries. All other trademarks are the property of their respective owners.

    Disclaimer of Warranties. COMPLETE GENOMICS, INC. PROVIDES THESE DATA IN GOOD FAITH TO THE RECIPIENT “AS IS.” COMPLETE GENOMICS, INC. MAKES NO REPRESENTATION OR WARRANTY, EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED WARRANTY OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE OR USE, OR ANY OTHER STATUTORY WARRANTY. COMPLETE GENOMICS, INC. ASSUMES NO LEGAL LIABILITY OR RESPONSIBILITY FOR ANY PURPOSE FOR WHICH THE DATA ARE USED.

    Any permitted redistribution of the data should carry the Disclaimer of Warranties provided above.

    Data file formats are expected to evolve over time. Backward compatibility of any new file format is not guaranteed.

    Complete Genomics data is for Research Use Only and not for use in the treatment or diagnosis of any human subject. Information, descriptions and specifications in this publication are subject to change without notice.

  • Data Formats Document Table of Contents

    Table of Contents

    Preface
      Conventions
      Analysis Tools
      References

    Introduction
      Sequencing Approach
      Mapping Reads and Calling Variations
      Read Data Format
      Data Delivery

    Data File Formats and Conventions
      Data File Structure
      Header Format
      Sequence Coordinate System
      Data File Content and Organization

    ASM Results
      Small Variations and Annotations Files
        Variations: var-[ASM-ID].tsv.bz2
          Variations File Content
          Variations Type Description
          Interpreting the Variations File
        Master Variations: masterVarBeta-[ASM-ID].tsv.bz2
        Annotated Variants within Genes: gene-[ASM-ID].tsv.bz2
        Annotated Variants within Non-coding RNAs: ncRNA-[ASM-ID].tsv.bz2
        Count of Variations by Gene: geneVarSummary-[ASM-ID].tsv
        Variations at Known dbSNP Loci: dbSNPAnnotated-[ASM-ID].tsv.bz2
        Sequencing Metrics and Variations Summary: summary-[ASM-ID].tsv
      Copy Number Variation Files
        Copy Number Segmentation: cnvSegmentsBeta-[ASM-ID].tsv
        Detailed Ploidy and Coverage Information: cnvDetailsBeta-[ASM-ID].tsv.bz2
      Genomic Copy Number Analysis of Tumor Samples Files
        Tumor CNV Segments: cnvTumorSegmentsBeta-[ASM-ID].tsv.bz2
        Detailed Tumor Coverage Level Information: cnvTumorDetailsBeta-[ASM-ID].tsv.bz2
        Depth of Coverage Report: depthOfCoverage_100000-[ASM-ID].tsv
      Structural Variation Files
        Detected Junctions and Associated Annotations: allJunctionsBeta-[ASM-ID].tsv
        High-confidence Junctions and Associated Annotations: highConfidenceJunctionBeta-[ASM-ID].tsv
        Alignments of DNBs in Junction Cluster: evidenceJunctionDnbBeta-[ASM-ID].tsv.bz2
        Evidence Junctions and Annotations: evidenceJunctionClustersBeta-[ASM-ID].tsv
      Mobile Element Insertion Files
        Mobile Element Insertion Sites: mobileElementInsertionsBeta-[ASM-ID].tsv
        Mobile Element Insertion ROC Graph: mobileElementInsertionsROCBeta-[ASM-ID].png
        Mobile Element Insertion Reference Counts Graph: mobileElementInsertionsRefCountsBeta-[ASM-ID].png

      Assemblies Underlying Called Variants Files
        Alignment CIGAR Format
        Results from Assembled Intervals: evidenceIntervals-[CHROMOSOME-ID]-[ASM-ID].tsv.bz2
        Individual Reads Aligned to Assembled Sequences: evidenceDnbs-[CHROMOSOME-ID]-[ASM-ID].tsv.bz2
        Correlation of Evidence between Assemblies: correlation.tsv.bz
      Coverage and Reference Scores Files: coverageRefScore-[CHROMOSOME-ID]-[ASM-ID].tsv.bz2
      Quality and Characteristics of Sequenced Genome Files
        Coverage Distribution Report File: coverage-[ASM-ID].tsv and cove