Ulysses J. Balis, MD, Professor of Pathology, University of Michigan, ulysses@umich.edu

  • Slide 1
  • Ulysses J. Balis, MD, Professor of Pathology, University of Michigan, ulysses@umich.edu
  • Slide 2
  • * A brief history of digital imaging, digital annotation, markup and metadata encoding * Primer on image data encoding concepts and requirements * Primer on image storage data formats * Primer on DICOM * Primer on image metadata concepts * Exploration of OME (Open Microscopy Environment) as a representative image metadata lexicon * Case studies in image metadata challenges arising from the current lack of a single common framework
  • Slide 3
  • Corona Satellite Image Program (1959-1972): film-based, but with digital assistance in its latest phase. The challenge of image search, as experienced in this project, provided the first insights into the difficulty of this type of computational problem.
  • Slide 4
  • Modern Remote Sensing Era (1972-present)
  • Slide 5
  • Taken together, this was, and continues to be, a big mess. * With the advent of massive remote sensing repositories, it became necessary to have both image file formats and standard conventions for storing the accompanying image markup or annotation data * Initially, images were stored as raw data, with separate associated metadata files * Over time, it became obvious that there was an organizational and ontological advantage in blending image data with its descriptive information * This blending leads directly to the contemporary reality, because there is no one right way to blend such data; there are many ways * Some are public knowledge * Some are proprietary * Some are blends of open and proprietary formats
  • Slide 6
  • Diagram: a big-data, cloud-based repository shared by an Investigator, a Collaborating Scientist, and a Health Management Team. Query modes: by the image itself, by ROI, by free text, by metadata tag. Challenge: to allow these disparate user classes and groups to make effective use of this data, it will be necessary to leverage a common set of definitions so that concept-based retrieval is always consistent and complete.
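Slide 6's requirement, that concept-based retrieval stay consistent and complete no matter who phrases the query, can be sketched by resolving every free-text term to a shared concept identifier before indexing and searching. This is a minimal illustration only: the synonym table and the `CONCEPT:` identifiers are hypothetical placeholders, not codes from any real ontology.

```python
from __future__ import annotations

# Hypothetical synonym table: each free-text term resolves to one shared
# concept ID. The IDs are placeholders, not codes from a real ontology.
SYNONYMS = {
    "carcinoma": "CONCEPT:0001",
    "cancer": "CONCEPT:0001",
    "malignant neoplasm": "CONCEPT:0001",
    "benign": "CONCEPT:0002",
}


def normalize(term: str) -> str | None:
    """Resolve a user's term to a concept ID, ignoring case and padding."""
    return SYNONYMS.get(term.strip().lower())


def build_index(tags_by_image: dict[str, list[str]]) -> dict[str, set[str]]:
    """Index images by concept ID rather than by the literal tag text."""
    index: dict[str, set[str]] = {}
    for image_id, tags in tags_by_image.items():
        for tag in tags:
            concept = normalize(tag)
            if concept is not None:
                index.setdefault(concept, set()).add(image_id)
    return index


def query(index: dict[str, set[str]], term: str) -> set[str]:
    """Any synonym of a concept retrieves the same image set."""
    concept = normalize(term)
    return set(index.get(concept, ())) if concept else set()


index = build_index({
    "img-1": ["Carcinoma"],
    "img-2": ["malignant neoplasm"],
    "img-3": ["benign"],
})
print(sorted(query(index, "cancer")))  # ['img-1', 'img-2']
```

Any term missing from the shared table is invisible to search, which is why the slide stresses a *common* set of definitions rather than per-group vocabularies.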
  • Slide 7
  • Slide 8
  • * Image information must be encoded into files in some standard format in order to be correctly decoded * A key challenge arises from the contemporary reality that there are literally hundreds of image file formats, and even more formats of embedded metadata * This reality is greatly compounded by the frequent presence of proprietary data elements, whose meaning and syntax are difficult, if not impossible, to elucidate without proprietary documentation
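Because a decoder must first know which format it is holding, identification tools commonly start by checking a file's leading signature ("magic") bytes. The sketch below recognizes only a handful of well-known signatures; a real identifier would need many more, and a matching signature says nothing about the proprietary metadata that may sit inside the file.

```python
from typing import Optional

# Leading signature bytes for a few common image containers.
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"II*\x00", "TIFF (little-endian)"),
    (b"MM\x00*", "TIFF (big-endian)"),
    (b"II+\x00", "BigTIFF (little-endian)"),
    (b"MM\x00+", "BigTIFF (big-endian)"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
]


def identify(data: bytes) -> Optional[str]:
    """Guess an image container format from its first bytes."""
    for magic, name in SIGNATURES:
        if data.startswith(magic):
            return name
    # DICOM Part 10 files carry a 128-byte preamble followed by "DICM".
    if len(data) >= 132 and data[128:132] == b"DICM":
        return "DICOM Part 10"
    return None
```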
  • Slide 9
  • * Key modules of a typical image file include the following: * Image descriptive header: describes the type, geometry, color order and bit depth of the image * Additional metadata header: image creation date, capture conditions, proprietary data elements * Image payload: the image itself (there may be more than one image) * Appended image data: any type of image data or metadata that needs to be added after image creation (e.g., chain-of-custody information)
  • Slide 10
  • Diagram of the file layout, starting at File Start: Header Data, then General and Proprietary Metadata, then Image Payload, then Optional Appended/Journal Data
  • Slide 11
  • Header data is typically used to represent the following top-level concepts: * Image type * Image dimensions * Image bit depth * Encoding sequence * Number of planes * Alpha channel detail
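To make the header concepts concrete, here is a sketch of a fixed-layout binary header that carries exactly those fields. The layout, the `DEMO` signature, and the type codes are hypothetical and belong to no real format; the point is that a decoder can interpret the payload only if it knows this layout in advance.

```python
import struct
from dataclasses import dataclass

# Hypothetical 20-byte little-endian header layout, for illustration only:
# magic(4s) image_type(B) width(I) height(I) bit_depth(B) planes(B) alpha(B) order(4s)
HEADER = struct.Struct("<4sBIIBBB4s")
MAGIC = b"DEMO"  # hypothetical format identifier


@dataclass
class ImageHeader:
    image_type: int   # hypothetical code: 0 = grayscale, 1 = color
    width: int        # image dimensions in pixels
    height: int
    bit_depth: int    # bits per sample
    planes: int       # number of planes/channels
    has_alpha: bool   # alpha channel present?
    order: bytes      # encoding sequence of channels, e.g. b"RGBA"


def pack_header(h: ImageHeader) -> bytes:
    """Serialize the header into its fixed binary layout."""
    return HEADER.pack(MAGIC, h.image_type, h.width, h.height,
                       h.bit_depth, h.planes, int(h.has_alpha), h.order)


def unpack_header(data: bytes) -> ImageHeader:
    """Decode the header from the start of a file's bytes."""
    magic, itype, width, height, depth, planes, alpha, order = HEADER.unpack_from(data)
    if magic != MAGIC:
        raise ValueError("unrecognized file signature")
    return ImageHeader(itype, width, height, depth, planes, bool(alpha), order)
```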
  • Slide 12
  • Image metadata (the General and Proprietary Metadata section) can include anything beyond the usual image-level descriptors: * Acquisition conditions * Experimental conditions (in vivo / in vitro) * Reagents and lot numbers * Observations/diagnoses * Categorical data * SNPs * Variants * Small molecules * Computational findings * Anything else
  • Slide 13
  • The image payload is typically used to house the actual image data itself, including: * Individual color/fluorescence channels * Alpha channels * Annotation channels * Collaborative annotation channels * Spatially gated image markup (global markup is usually in the metadata section) * Mask data
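Payload channels may be stored interleaved (all samples of one pixel together, e.g. RGBARGBA...) or planar (one contiguous plane per channel). Planar storage arguably makes it easier to add or drop an alpha, annotation, or mask channel without touching the others. Which layout a given format uses varies; the sketch below assumes 8-bit samples and shows the conversion in both directions.

```python
from __future__ import annotations


def deinterleave(pixels: bytes, channels: int) -> list[bytes]:
    """Split interleaved 8-bit samples into one plane per channel."""
    if len(pixels) % channels:
        raise ValueError("buffer length is not a multiple of the channel count")
    return [pixels[c::channels] for c in range(channels)]


def interleave(planes: list[bytes]) -> bytes:
    """Inverse of deinterleave: rebuild the interleaved buffer from equal-length planes."""
    n = len(planes)
    out = bytearray(len(planes[0]) * n)
    for c, plane in enumerate(planes):
        out[c::n] = plane
    return bytes(out)
```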
  • Slide 14
  • The optional Appended/Journal Data section can house: * Chain-of-custody events * Post-acquisition image transformation and normalization operations * Logged data-sharing events * Post-capture analytic information, such as that derived from multiparametric analyses and machine-learning operations
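One way to make journal entries such as chain-of-custody events tamper-evident is a hash chain: each entry's hash covers both the event and the previous entry's hash. The sketch assumes an in-memory list of JSON-serializable events; it is not a format the slides specify. On its own it detects edits only if the latest hash is also stored or signed somewhere an editor cannot rewrite, since anyone could recompute the whole chain.

```python
from __future__ import annotations

import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(event: dict, prev: str) -> str:
    body = json.dumps({"event": event, "prev": prev}, sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append_event(journal: list[dict], event: dict) -> None:
    """Append an entry whose hash covers the event and the previous entry's hash."""
    prev = journal[-1]["hash"] if journal else GENESIS
    journal.append({"event": event, "prev": prev, "hash": _digest(event, prev)})


def verify(journal: list[dict]) -> bool:
    """Recompute the chain; an edited or reordered entry breaks it."""
    prev = GENESIS
    for entry in journal:
        if entry["prev"] != prev or entry["hash"] != _digest(entry["event"], prev):
            return False
        prev = entry["hash"]
    return True
```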
  • Slide 15
  • Slide 16
  • First Part of header
  • Slide 17
  • Second Part of header
  • Slide 18
  • Slide 19
  • Slide 20
  • From http://bigtiff.org
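Assuming the header slides concern TIFF, as the bigtiff.org reference suggests, the two header variants can be decoded as sketched below. Classic TIFF stores the version number 42 and 32-bit offsets; BigTIFF stores 43, an offset size of 8, a reserved zero, and 64-bit offsets, which lifts the 4 GB file-size ceiling. The function names are illustrative, and the sketch lists only the tag, type, and count of each entry in the first image file directory (IFD).

```python
from __future__ import annotations

import struct


def parse_tiff_header(data: bytes) -> dict:
    """Decode a classic TIFF or BigTIFF header."""
    if data[:2] == b"II":
        e = "<"                      # little-endian ("Intel")
    elif data[:2] == b"MM":
        e = ">"                      # big-endian ("Motorola")
    else:
        raise ValueError("not a TIFF file")
    (version,) = struct.unpack_from(e + "H", data, 2)
    if version == 42:                # classic TIFF: 32-bit offsets
        (first_ifd,) = struct.unpack_from(e + "I", data, 4)
        return {"endian": e, "bigtiff": False, "first_ifd": first_ifd}
    if version == 43:                # BigTIFF: offset size, reserved, 64-bit offset
        offset_size, reserved, first_ifd = struct.unpack_from(e + "HHQ", data, 4)
        if offset_size != 8 or reserved != 0:
            raise ValueError("malformed BigTIFF header")
        return {"endian": e, "bigtiff": True, "first_ifd": first_ifd}
    raise ValueError(f"unsupported TIFF version {version}")


def list_ifd_tags(data: bytes) -> list[tuple[int, int, int]]:
    """Return (tag, field type, count) for each entry of the first IFD."""
    h = parse_tiff_header(data)
    e, pos = h["endian"], h["first_ifd"]
    # Classic entries: tag(2) type(2) count(4) value/offset(4) = 12 bytes.
    # BigTIFF entries: tag(2) type(2) count(8) value/offset(8) = 20 bytes.
    count_fmt, entry_fmt, entry_size = ("Q", "HHQ", 20) if h["bigtiff"] else ("H", "HHI", 12)
    (n,) = struct.unpack_from(e + count_fmt, data, pos)
    pos += struct.calcsize(e + count_fmt)
    return [struct.unpack_from(e + entry_fmt, data, pos + i * entry_size) for i in range(n)]
```

For example, a baseline TIFF's first IFD typically includes tag 256 (ImageWidth) and tag 257 (ImageLength).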
  • Slide 21
  • Slide 22
  • Slide 23
  • * All with distinct header, metadata and payload formats * Many have proprietary data elements * Tools to cross-walk the images exist, but a universal metadata translator is not yet available
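A metadata crosswalk maps one vocabulary's fields onto another's, often with unit or structure conversion along the way. The deliberately tiny mapping below, from a few DICOM attribute keywords to OME-XML Pixels attributes, is a hypothetical sketch, not a published crosswalk. It returns the unmapped fields separately, which shows why a universal translator is hard: everything outside the table, including any proprietary element, has no destination.

```python
from __future__ import annotations


def _pixel_spacing(value) -> dict:
    """DICOM PixelSpacing is [row spacing, column spacing] in mm;
    OME-XML PhysicalSizeY/X default to micrometres, so scale by 1000."""
    row_mm, col_mm = value
    return {"PhysicalSizeY": row_mm * 1000, "PhysicalSizeX": col_mm * 1000}


# Hypothetical crosswalk: DICOM keyword -> converter producing OME-XML Pixels attributes.
CROSSWALK = {
    "Columns": lambda v: {"SizeX": v},
    "Rows": lambda v: {"SizeY": v},
    "PixelSpacing": _pixel_spacing,
}


def crosswalk(record: dict) -> tuple[dict, dict]:
    """Translate mapped keys; return (translated, unmapped) so nothing is silently lost."""
    translated, unmapped = {}, {}
    for key, value in record.items():
        if key in CROSSWALK:
            translated.update(CROSSWALK[key](value))
        else:
            unmapped[key] = value
    return translated, unmapped
```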
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • * Formal ISO name: * ISO 12052:2006, "Health informatics -- Digital imaging and communication in medicine (DICOM) including workflow and data management" * Created as a direct consequence of the proprietary manufacturer standards that arose in the late 1970s with the new natively digital imaging modalities (CT, MRI, US, etc.) * A standard for handling, storing, printing, and transmitting information in medical imaging * The result of a longstanding partnership between the American College of Radiology and the National Electrical Manufacturers Association (ACR/NEMA) * Now in its third major version, published as the multipart standard PS3 (versions 1 and 2 having been unsuccessful) * Essentially a TCP/IP-based network protocol (but more recently also a storage format and even a media storage specification)
  • Slide 28
  • * The standard is both a technical specification for the encoding of image (and waveform) data and for the encoding of any additional metadata that supports the image data * Metadata is divided into three classes: * Mandatory: the element must be included by the encoding instrument, in a format constrained to the normative specification * User: the element is optional and conforms to a vendor-provided specification that may or may not be proprietary. Its format need not be constrained to any particular normative specification, other than the global requirement that it fit in the allotted space * Conditional: the element may be required for inclusion if certain image acquisition conditions apply. If included, it must be encoded in a format constrained to the normative specification * From the above list, it can be understood that DICOM did not solve the problem of proprietary data elements; rather, it provided a compromise by which manufacturers could still encode some data elements in proprietary format, provided they encoded a minimum essential set of data in a commonly agreed-upon framework and ontology. As such, DICOM is a hybrid open/closed standard
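The three classes above translate naturally into validation rules. In the sketch below the keywords are real DICOM attribute names, but the rule set is a hypothetical excerpt, not a transcription of any actual IOD. User (optional) elements are not checked at all, which mirrors the compromise described on this slide: their content is left to the vendor.

```python
from __future__ import annotations

# Illustrative rules using the slide's three classes (hypothetical excerpt).
RULES = {
    "Modality": ("mandatory", None),
    "Rows": ("mandatory", None),
    "Columns": ("mandatory", None),
    # Conditional: required only when the image has more than one sample per pixel.
    "PlanarConfiguration": ("conditional", lambda ds: ds.get("SamplesPerPixel", 1) > 1),
    # User/optional: no presence check, and its content is not constrained here.
    "StationName": ("user", None),
}


def validate(dataset: dict) -> list[str]:
    """Return the problems found; an empty list means the dataset passes."""
    problems = []
    for keyword, (klass, condition) in RULES.items():
        present = keyword in dataset
        if klass == "mandatory" and not present:
            problems.append(f"missing mandatory element {keyword}")
        elif klass == "conditional" and condition(dataset) and not present:
            problems.append(f"missing conditional element {keyword}")
    return problems
```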
  • Slide 29
  • * Parts of the Standard * PS 3.1: Introduction and Overview * PS 3.2: Conformance * PS 3.3: Information Object Definitions * PS 3.4: Service Class Specifications * PS 3.5: Data Structure and Encoding * PS 3.6: Data Dictionary * PS 3.7: Message Exchange * PS 3.8: Network Communication Support for Message Exchange * PS 3.9: Retired (formerly Point-to-Point Communication Support for Message Exchange) * PS 3.10: Media Storage and File Format for Media Interchange * PS 3.11: Media Storage Application Profiles * PS 3.12: Media Formats and Physical Media for Media Interchange * PS 3.13: Retired (formerly Print Management Point-to-Point Communication Support) * PS 3.14: Grayscale Standard Display Function * PS 3.15: Security and System Management Profiles * PS 3.16: Content Mapping Resource * PS 3.17: Explanatory Information * PS 3.18: Web Access to DICOM Persistent Objects (WADO) * PS 3.19: Application Hosting * PS 3.20: Transformation of DICOM to and from HL7 Standards
  • Slide 30
  • Slide 31
  • * Specification Document Structure * IODs (Information Object Definitions) * The core of DICOM * Define the storage specification for each distinct imaging modality, e.g.: * CT: Computed Tomography * MR: Magnetic Resonance Imaging * US: Ultrasound * VL: Visible Light (used for endoscopy, ophthalmology and single-field microscopy)
  • Slide 32
  • Slide 33
  • * Specification Document Structure * Data Structure and Encoding / Service Classes * The core specification of DICOM interoperability * Extremely technical and detailed * Learning curve is steep
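As a small taste of PS 3.5, the sketch below walks data elements encoded with the Explicit VR Little Endian transfer syntax: a 2-byte group, a 2-byte element, a 2-character VR, then either a 2-byte length or, for VRs such as OB, OW, SQ and UN, two reserved bytes and a 4-byte length. It assumes the input is the dataset body only (no Part 10 preamble or file meta header) and that no element uses undefined length. It also flags private elements, which use odd group numbers and are where the proprietary data discussed on Slide 28 lives.

```python
import struct

# VRs that use 2 reserved bytes plus a 4-byte length in Explicit VR Little Endian.
LONG_VRS = {b"OB", b"OD", b"OF", b"OL", b"OV", b"OW", b"SQ",
            b"SV", b"UC", b"UN", b"UR", b"UT", b"UV"}


def parse_elements(buf: bytes):
    """Yield (group, element, vr, value) from an Explicit VR Little Endian dataset.

    Sketch only: undefined-length elements (e.g. some sequences) are rejected.
    """
    pos = 0
    while pos + 8 <= len(buf):
        group, element = struct.unpack_from("<HH", buf, pos)
        vr = buf[pos + 4:pos + 6]
        if vr in LONG_VRS:
            (length,) = struct.unpack_from("<I", buf, pos + 8)
            pos += 12
        else:
            (length,) = struct.unpack_from("<H", buf, pos + 6)
            pos += 8
        if length == 0xFFFFFFFF:
            raise ValueError("undefined length is not handled in this sketch")
        yield group, element, vr.decode("ascii"), buf[pos:pos + length]
        pos += length


def is_private(group: int) -> bool:
    """Private (vendor-defined) data elements use odd group numbers."""
    return group % 2 == 1
```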
  • Slide 34
  • Slide 35
  • * Specification Document Structure * Data Dictionary * Conceived in a pre-XML era * Maps terms to fixed-length binary words (tags) * Meshes with DICOM's overall philosophy of being a fixed-field data format * Will need significant updating to be compatible with modern ontological framework concepts
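The "fixed-length binary word" design means every term is addressed by a 32-bit tag, a 16-bit group and a 16-bit element, rather than by a self-describing name. The five entries below are real PS 3.6 dictionary entries; the `<unknown>`/`UN` fallback for anything else is a choice of this sketch, and it is roughly what a reader sees for a private or unrecognized tag.

```python
# A few real entries from the DICOM data dictionary (PS 3.6), keyed by the
# 32-bit tag word (group << 16 | element): (keyword, value representation).
DICTIONARY = {
    0x00080060: ("Modality", "CS"),
    0x00100010: ("PatientName", "PN"),
    0x00100020: ("PatientID", "LO"),
    0x00280010: ("Rows", "US"),
    0x00280011: ("Columns", "US"),
}


def tag_word(group: int, element: int) -> int:
    """Pack a (group, element) pair into the fixed-width 32-bit tag."""
    return (group << 16) | element


def describe(group: int, element: int) -> str:
    """Render a tag with its dictionary keyword, or a fallback if unknown."""
    keyword, vr = DICTIONARY.get(tag_word(group, element), ("<unknown>", "UN"))
    return f"({group:04X},{element:04X}) {keyword} [{vr}]"


print(describe(0x0028, 0x0010))  # (0028,0010) Rows [US]
```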
  • Slide 36
  • Slide 37
  • * Metadata is, in essence, data about data, of which there are two general classes: * Structural metadata: the design and specification of data structures (i.e., data about the containers of data) * Descriptive metadata: specification of individual instances of application data (i.e., the data content itself) * Term coined in 1968 by Philip Bagley, in the text "Extension of programming language concepts"
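In code, the structural/descriptive distinction maps roughly onto a type definition versus a value of that type. The `Acquisition` class and its fields are hypothetical, chosen only to illustrate the split.

```python
from dataclasses import dataclass, fields


# Structural metadata: the design of the container (field names and types).
@dataclass
class Acquisition:
    instrument: str
    magnification: float
    stain: str


# Descriptive metadata: one instance of application data filling that design.
record = Acquisition(instrument="scanner-01", magnification=40.0, stain="H&E")

# The structure can be inspected independently of any particular record.
schema = {f.name: f.type for f in fields(Acquisition)}
```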
  • Slide 38
  • * Key concepts * Create an encoding system that is both human- and machine-readable * Whenever possible, constrain concepts and terms to a normative namespace * Unfortunately, namespaces are either nonexistent or duplicated * Moreover, there is a plurality of ways of representing namespaces
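XML namespaces are one common way to meet both goals: the record stays human-readable text, while software distinguishes identically named terms by their namespace URI. The `lab` vocabulary and its URI below are hypothetical; `example.org` is a reserved placeholder domain.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace for a lab's own metadata vocabulary.
LAB = "http://example.org/lab-vocab/1.0"
ET.register_namespace("lab", LAB)


def build_record(stain: str, lot: str) -> str:
    """Serialize a small metadata record whose terms are namespace-qualified."""
    root = ET.Element(f"{{{LAB}}}ImageMetadata")
    ET.SubElement(root, f"{{{LAB}}}Stain").text = stain
    ET.SubElement(root, f"{{{LAB}}}ReagentLot").text = lot
    return ET.tostring(root, encoding="unicode")


def read_stain(xml_text: str):
    """Find Stain only in the lab namespace; same-named terms elsewhere are ignored."""
    node = ET.fromstring(xml_text).find("lab:Stain", {"lab": LAB})
    return None if node is None else node.text
```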
  • Slide 39
  • * Select an ontology representation model * Select the set of namespaces to reference for images, from a vast field of candidates * Identify domains where no suitable namespace exists and build one, consortially, from the ground up * Curate the construct in perpetuity, recognizing that standards are dynamic constructs * All this is hard work
  • Slide 40
  • Slide 41
  • Slide 42
  • * Another possible starting point for ontology development
  • Slide 43
  • Slide 44
  • * An investigator seeks to automate the laser capture microdissection (LCM) process by converting a manual workflow into a computer-aided, region-of-interest-driven workflow * Upon developing the image segmentation algorithms, the investigator determines that the laser cut maps used by the Arcturus XT instrument are derived from proprietary coordinate maps buried within the multiple field-captured JPEG images used by the platform * Effective turnkey integration will require reverse engineering of the file format
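A plausible first step in a case like this is to list a JPEG file's marker segments and look for unexplained application (APPn) or comment (COM) payloads, since those are the standard places where extra data can ride along with the image. The slides do not say where the LCM platform actually stores its coordinate maps; the sketch relies only on the standard JPEG segment layout (a 2-byte marker, then a 2-byte big-endian length that includes itself) and stops at the start-of-scan marker, after which entropy-coded data follows.

```python
from __future__ import annotations

import struct

# Markers that stand alone (no length field): TEM, RST0-RST7, SOI, EOI.
STANDALONE = {0x01, *range(0xD0, 0xD8), 0xD8, 0xD9}


def list_segments(data: bytes) -> list[tuple[str, int]]:
    """List (segment name, payload length) for JPEG header segments up to SOS."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG (missing SOI)")
    pos, segments = 2, []
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = data[pos + 1]
        if marker == 0xFF:           # fill byte: skip it
            pos += 1
            continue
        if marker in STANDALONE:
            pos += 2
            continue
        (length,) = struct.unpack_from(">H", data, pos + 2)  # includes the 2 length bytes
        if 0xE0 <= marker <= 0xEF:
            name = f"APP{marker - 0xE0}"
        elif marker == 0xFE:
            name = "COM"
        else:
            name = f"FF{marker:02X}"
        segments.append((name, length - 2))
        if marker == 0xDA:           # SOS: entropy-coded image data follows
            break
        pos += 2 + length
    return segments
```

An APPn segment that no public specification accounts for would be a candidate for closer inspection.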
  • Slide 45
  • Slide 46
  • Slide 47
  • * What are possible mitigating strategies for this type of hidden metadata? * How would an open ontology solve this problem? * Is it reasonable for vendors of such platforms to contractually set in place stipulations banning reverse-engineering of proprietary file formats, and if so, what remedies remain available to the investigator?
  • Slide 48
  • * A whole slide imaging vendor makes a new high-throughput scanner available at a substantial discount to a community surgical pathology department, with the proviso (one of many, actually) that the department will exclusively use that vendor's image/case viewer application * The department agrees to the terms of the contract and soon discovers that there is no programmatic pathway by which non-proprietary images can be exported from the system for consultative review at outside locations * When approached, the vendor indicates that an image file format conversion software package is available, but the licensing model is per image, with no volume discount
  • Slide 49
  • * What measures could have prevented this interoperability challenge in the first place? * Are vendors legally able to restrict the use of data that comes off their systems in proprietary format? Who owns the data, anyway? * Are contractual limitations on reverse engineering legally binding/enforceable?
  • Slide 50
  • * A histomorphology investigative team seeks to create a consortial network of investigative partners that will use a common image viewing / image analysis framework for distributed case review. * A review of the contemporary offerings reveals that no software package offers what is needed.
  • Slide 51
  • * What are possible mitigating actions the investigative team can carry out to address this interoperability need? * What standards can be brought to bear immediately to help address the need? * What interoperability needs will remain, after the deployment of a partial image format solution?