
KNIME for Open-Source BioImage Analysis - A Tutorial

Christian Dietz and Michael R. Berthold

Abstract The open analytics platform KNIME is a modular environment that enables easy visual assembly and interactive execution of workflows. KNIME is already widely used in various areas of research, for instance in cheminformatics or classical data analysis. This tutorial introduces the KNIME Image Processing Extension, which adds the capability to process and analyze large numbers of images. In combination with other KNIME extensions, KNIME Image Processing opens up new possibilities for inter-domain analysis of image data in an understandable and reproducible way.

1 Introduction

Every day, research involves recording increasing numbers of images as a result of constantly improving imaging techniques, which makes images key to life science research. Advanced microscopy allows the acquisition of multidimensional images almost without any user interaction and can therefore generate a plethora of heterogeneous image data. However, to make sense of the generated image data and finally draw conclusions, an exhaustive analysis of the images has to be conducted. In addition to classical image processing techniques, more sophisticated algorithms from the fields of machine learning and data mining are increasingly being applied (Eliceiri et al, 2012). The extracted information is then further analysed with established statistical analysis techniques. For instance, detecting objects within images (i.e. segmentation) and the detailed statistical evaluation of the collected results are essential stages of a typical image analysis process (Saha et al, 2013; Ljosa et al, 2012; Aligeti et al, 2014). To fully exploit the outcome, an appropriate visualization of the information or a link to information sources from other domains may be necessary to gain new insights.

Christian Dietz
University of Konstanz, Chair for Bioinformatics and Information Mining, Universitaetsstrasse 10, 78464 Konstanz, Germany e-mail: [email protected]

Michael R. Berthold
University of Konstanz, Chair for Bioinformatics and Information Mining, Universitaetsstrasse 10, 78464 Konstanz, Germany e-mail: [email protected]

A large number of monolithic and highly task-oriented software solutions have been proposed to tackle the problems that occur in each step of bio-image analysis (Eliceiri et al, 2012). As a result, researchers have to choose from a set of stand-alone tools, which must be orchestrated to solve the given task. Two approaches are typically used to link these kinds of tools: one is to transfer the data manually between the tools, while the other involves writing a customized program or script to automate a particular process. However, both approaches typically lead to a number of critical problems:

• Transferring the data manually involves a human being and is therefore time-consuming; it does not scale with the number of acquired images.

• Customized scripts are prone to errors. Furthermore, results calculated with these highly problem-specific scripts frequently cannot be reproduced or reused by others.

A straightforward but infeasible solution to these problems is to build a single monolithic platform that covers the complete range of functionality required by a bio-image analysis workflow. However, future demands are not yet known, so a closed, proprietary software solution cannot keep pace with the new requirements that emerge as technology advances. The open-source community has therefore recognized the great need for, and benefit of, closer cooperation by fostering interoperability among individual projects and open, extensible platforms. Following this approach, the open-source analytics platform KNIME (Berthold et al, 2008) provides the ability to seamlessly integrate a diverse and powerful collection of existing software tools and libraries. KNIME is a user-friendly and comprehensive open-source data integration, processing, analysis, and exploration platform designed to handle large amounts of heterogeneous data. It has been developed since 2006 and is used by professionals in industry and academia. As an integration platform, KNIME directly combines the advantages of several different tools and domains. The integrated tools are encapsulated in KNIME nodes, the basic processing units in KNIME, which in turn can be combined to form so-called workflows. KNIME workflows not only inherently document the entire analysis process, but can also be exported and easily made available to others, who can then reproduce the results or use the workflows as a starting point for their own analysis. To guarantee reproducibility, whenever a module changes in any way, for example when the version of an integrated tool changes, KNIME carefully deprecates the previous version of that module but keeps it as part of the platform. Hence, workflows published years ago still run with the most recent releases of KNIME. Once a workflow has been created, it can be applied to hundreds of thousands of images and other large data sets, even on small-scale devices, thanks to the intelligent caching technology of KNIME. This makes KNIME well-suited for high-throughput screening, in which the analysis results can also be quite large.


The KNIME Image Processing Extension enhances KNIME by providing algorithms and data structures to process and analyse images. To avoid reinventing the wheel, KNIME Image Processing uses and integrates state-of-the-art libraries such as ImageJ1 (Schindelin et al, 2012), ImageJ2¹, SCIFIO², OMERO (Allan et al, 2012), ClearVolume (Royer et al, 2015), ImgLib2 (Pietzsch et al, 2012), CellProfiler (Kamentsky et al, 2011), TrackMate³ and others. These well-known image processing tools can exchange data and therefore be used in combination, and their output can also be linked to other extensions from completely different domains. For example, once interesting hits have been identified in the image data, the respective molecules can be explored with one of the many KNIME cheminformatics extensions, for instance the KNIME RDKit extension⁴.

An image processing and analysis workflow typically consists of some or all of the following consecutive steps: loading images, (pre-)processing, segmentation, tracking, feature extraction, model learning, and the subsequent visualization and statistical analysis of the information gathered in the previous steps. Different problems can arise in each of these steps, depending on the image analysis task itself. However, by combining KNIME Image Processing nodes with nodes from other available KNIME extensions, it is easy to orchestrate comprehensible workflows that solve these issues in KNIME, possibly spanning multiple domains, without writing a single line of code. Section 2 introduces the main concepts of KNIME and KNIME Image Processing. Building on this, Section 3 explains an example image processing workflow step by step.

2 Basic Concepts

This section explains how KNIME and its extensions are downloaded and installed. Next, the KNIME User Interface is described, while the last part of this section covers the most fundamental concepts of KNIME Image Processing, which are important for understanding the image processing workflow explained in Section 3.

2.1 Download and Installation

The open analytics platform KNIME can be downloaded and installed from the KNIME website⁵. KNIME comes with an installer for Windows and Mac systems; Linux users simply have to extract the downloaded archive. As KNIME is a plugin-based system, several extensions are not part of the basic KNIME installation. These extensions are easily installed via so-called update sites. KNIME Image Processing⁶, for example, is installed from the Trusted Community Contributions site. For details on how to install additional plugins, please see http://tech.knime.org/community.

1 http://imagej.net.
2 http://scif.io.
3 http://fiji.sc/TrackMate.
4 http://tech.knime.org/community/rdkit.
5 http://www.knime.org.

2.2 KNIME User Interface

Fig. 1 KNIME User Interface

Figure 1 shows the KNIME User Interface. The KNIME Explorer (A) shows the various locations where workflows can be stored or uploaded. By default, two locations are available: (i) the KNIME Example server, on which several example workflows can be found, and (ii) the LOCAL workspace, which was selected on the first start-up of KNIME. A new workflow can be created with File > New > New KNIME Workflow. This new, empty workflow is accessed via the LOCAL workspace. Workflows in KNIME are essentially graphs that connect nodes (atomic processing units in KNIME) and visually model the individual processing steps of a certain task. Double-clicking a workflow in the KNIME Explorer (A) opens it in the Workflow Editor (C). The user can now drag and drop nodes from the Node Repository (B) onto the canvas of the workflow editor to compose complex yet clear workflows, for example to process and analyse images. The nodes can then be connected by drawing a line from an output port of one node to an input port of another, which enables the data to be passed from node to node. Additionally, each KNIME node provides a Node Description (D) that explains which input data the node requires, what its parameters mean, what it does with the incoming data and what it outputs. The Node Repository (B) contains all of the KNIME nodes that are part of the currently installed KNIME extensions. The default KNIME Open Analytics Platform installation provides a basic set of nodes for data manipulation, data mining, a selection of data views, node control, time series analytics, and basic IO and database access. KNIME nodes for image analysis can be added by installing further KNIME extensions, as described in Section 2.1. The KNIME Console (E) view displays error and warning messages to provide feedback to the user. Finally, the Outline (F) view provides an overview of the whole workflow even if only a small part is visible in the workflow editor, and the Favorite Nodes (G) view provides quick access to personal favorite, frequently used and recently used nodes.

6 http://knime.imagej.net.

2.3 Handling of Images and Labelings in KNIME

Fig. 2 A typical KNIME table with five columns. Each column of the table has a certain data-type,e.g. numbers, text, molecules or images.


A workflow usually starts with a node that represents a data source, e.g. connecting to a database, reading a text file, or reading images. The data are transported between the connected nodes, typically organized in data tables consisting of columns of certain (extensible) data types and an arbitrary number of rows. A typical data table is depicted in Figure 2; each column of the table can hold an arbitrary type of object, e.g. numbers, text or molecules. KNIME Image Processing adds two new column types to the mix: images and labelings. Labelings represent the segmentation of an image, i.e. the partitioning of an image into segments. Unlike images, labelings store one or more labels for each pixel instead of numeric values. A label associates each pixel with an object, class value, track number or any other information.

Contrary to what might be assumed initially, images and labelings stored in a single cell of a data table can be of arbitrary dimensionality. For example, a table cell may contain a multi-channel video or z-stack. To accomplish n-dimensional image processing, KNIME uses ImgLib2 as its underlying programming framework.
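To make the difference between images and labelings concrete, the following Python sketch models a labeling as a sparse mapping from n-dimensional pixel coordinates to sets of labels. The class and its methods are hypothetical and do not reflect how ImgLib2 stores labelings internally. The sketch only illustrates that one pixel can carry several labels and that the dimensionality is not fixed to two.

```python
from typing import Dict, FrozenSet, List, Tuple

Coord = Tuple[int, ...]  # pixel position; its length is the dimensionality


class Labeling:
    """Hypothetical minimal labeling: each pixel maps to a (possibly empty) set of labels."""

    def __init__(self, shape: Tuple[int, ...]):
        self.shape = shape
        # Sparse storage: pixels without any label are simply absent.
        self._labels: Dict[Coord, FrozenSet[str]] = {}

    def _check(self, pos: Coord) -> None:
        if len(pos) != len(self.shape) or any(not 0 <= p < s for p, s in zip(pos, self.shape)):
            raise ValueError(f"position {pos} is outside a labeling of shape {self.shape}")

    def add_label(self, pos: Coord, label: str) -> None:
        self._check(pos)
        self._labels[pos] = self._labels.get(pos, frozenset()) | {label}

    def labels_at(self, pos: Coord) -> FrozenSet[str]:
        self._check(pos)
        return self._labels.get(pos, frozenset())

    def pixels_with(self, label: str) -> List[Coord]:
        return sorted(p for p, labels in self._labels.items() if label in labels)


lab = Labeling((2, 4, 4))                   # a tiny z-stack indexed as (z, y, x)
lab.add_label((0, 1, 1), "nucleus_1")
lab.add_label((0, 1, 1), "class_positive")  # same pixel, second label
lab.add_label((1, 1, 1), "nucleus_1")
print(sorted(lab.labels_at((0, 1, 1))))     # ['class_positive', 'nucleus_1']
print(lab.pixels_with("nucleus_1"))         # [(0, 1, 1), (1, 1, 1)]
```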

2.4 Image Processing Specific Dialog Components

2.4.1 Dimension Selection

Fig. 3 The configuration dialog of the Image Normalizer node.

To give users the flexibility to choose how images and labelings with more than two dimensions are processed, most of the nodes provided in the KNIME Image Processing Extension offer a so-called Dimension Selection dialog (see Figure 3). This dialog enables users to select the dimensions on which an algorithm will operate. For instance, in the case of a simple z-stack, the Image Normalizer node can be configured to normalize either each X,Y plane independently or the entire X,Y,Z cube, by selecting X,Y or X,Y,Z, respectively, in the dimension selection.
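The effect of the dimension selection can be sketched in Python on a tiny z-stack. The sketch assumes plain min-max normalization to the range 0 to 1, which is not necessarily the method used by the Image Normalizer node. The function name and the "XY"/"XYZ" strings are hypothetical stand-ins for the dialog settings.

```python
from typing import List

Stack = List[List[List[float]]]  # indexed as stack[z][y][x]


def _rescale(v: float, lo: float, hi: float) -> float:
    # Linear map of [lo, hi] onto [0, 1]; a constant region maps to 0.
    return 0.0 if hi == lo else (v - lo) / (hi - lo)


def normalize(stack: Stack, dims: str) -> Stack:
    """Min-max normalize each X,Y plane on its own (dims="XY")
    or the whole X,Y,Z cube with one shared range (dims="XYZ")."""
    if dims == "XYZ":
        values = [v for plane in stack for row in plane for v in row]
        lo, hi = min(values), max(values)
        return [[[_rescale(v, lo, hi) for v in row] for row in plane] for plane in stack]
    if dims == "XY":
        result = []
        for plane in stack:
            values = [v for row in plane for v in row]
            lo, hi = min(values), max(values)
            result.append([[_rescale(v, lo, hi) for v in row] for row in plane])
        return result
    raise ValueError("dims must be 'XY' or 'XYZ'")


# Two 2x2 planes; the second plane is much brighter than the first.
stack = [[[0, 10], [20, 30]], [[100, 110], [120, 130]]]
print(normalize(stack, "XY")[1][0][0])   # 0.0: plane 1 is rescaled on its own range
print(normalize(stack, "XYZ")[1][0][0])  # about 0.77: shared range 0..130
```

With X,Y selected, every plane spans the full output range, which hides brightness differences between slices. With X,Y,Z selected, those differences are preserved.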

2.4.2 Column Selection

Many KNIME Image Processing nodes whose input is an image or labeling operate on a row-by-row basis: for each input image, another image or labeling is calculated by the algorithm implemented in the node. The user can determine the layout of the output table of these nodes with a dialog component called Column Selection. Generally, a user has three options: the resulting column of images or labelings can be appended to the incoming table, it can replace the existing column, or an entirely new table can be created.
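The three Column Selection options can be illustrated with a hypothetical Python helper. It treats a table as a list of rows (dictionaries) and uses plain numbers as stand-ins for image cells. The mode names and the column-name suffix are assumptions for this sketch, not KNIME settings.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


def apply_to_column(table: List[Row], column: str, fn: Callable[[Any], Any],
                    mode: str = "append", suffix: str = " (processed)") -> List[Row]:
    """Apply fn to every cell of `column` and lay out the result as:
    'append'  - all input columns plus the result as a new column,
    'replace' - the input column overwritten by the result,
    'new'     - a table holding only the result column."""
    out = []
    for row in table:
        result = fn(row[column])
        if mode == "append":
            out.append({**row, column + suffix: result})
        elif mode == "replace":
            out.append({**row, column: result})
        elif mode == "new":
            out.append({column + suffix: result})
        else:
            raise ValueError(f"unknown mode: {mode}")
    return out


table = [{"Image": 3, "Name": "well_A01"}]
double = lambda v: v * 2  # stand-in for an image operation
print(apply_to_column(table, "Image", double, "append"))
# [{'Image': 3, 'Name': 'well_A01', 'Image (processed)': 6}]
```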

2.5 Visualization of Images and Labelings

Fig. 4 The KNIME Image Processing Image Viewer allows users to inspect the images in more detail. Users can browse through the various dimensions of an image, inspect the values of the pixels and obtain information about important meta-data.

KNIME Image Processing enables users to explore images and labelings in more detail, which is especially useful if an image or labeling comprises more than two dimensions (e.g. z-stacks or videos). The user can access this view via Right Click > Open Image Viewer on a KNIME Image Processing node (see Figure 4). Another, more specific view is the Interactive Segmentation View node. It can be used to validate segmentation, classification or tracking results, as it offers an overlay view of images and labelings. Additional visualization plugins can be installed to extend KNIME Image Processing. For instance, the ClearVolume Integration offers fast, GPU-accelerated 5D volume rendering and can easily be used within KNIME.

3 Step by Step to Phenotype Classification

In this section we walk step by step through an image processing workflow. The workflow classifies cells as either positive or negative according to their phenotype (see Figure 5)⁷.

Fig. 5 Two images from the publicly available high-content screening image data provided in (Ljosa et al, 2012). The left image contains positive cells and the right image negative cells.

The cells in this example come from the publicly available high-content screening image data provided in (Ljosa et al, 2012) (human cytoplasm-nucleus translocation assay, available from the Broad Bioimage Benchmark Collection). The images were taken of stably transfected osteosarcoma cells seeded in a 96-well plate. They contain information about the translocation of the Forkhead (FKHR-EGFP) fusion protein from the cytoplasm (Channel 2) to the nucleus (Channel 1)⁸.

7 The entire Phenotype Classification workflow is available for download at http://knime.imagej.net/aaec.
8 For detailed information see http://www.broadinstitute.org/bbbc/BBBC013/. Please note: the BMP images available on the website are already split into the individual channels.

The example workflow is depicted in Figure 6. The individual parts of the workflow are organized in so-called meta nodes to reduce the complexity of the workflow. Meta nodes are nodes that contain subworkflows: in the workflow they look like a single node, yet they can contain many nodes and even further meta nodes. This provides a series of advantages, such as enabling the user to design much larger, more complex workflows and encapsulating specific actions.
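Conceptually, a meta node is an instance of the composite pattern: it exposes the same interface as a single node but delegates to the nodes it contains. For simplicity, the Python sketch below assumes that nodes are chained linearly and pass one table along. Real KNIME workflows are general graphs with multiple ports, and the class names here are hypothetical.

```python
from typing import Callable, List, Sequence

Table = List[dict]


class Node:
    """A processing unit: takes a table and returns a table."""

    def __init__(self, name: str, fn: Callable[[Table], Table]):
        self.name = name
        self.fn = fn

    def execute(self, table: Table) -> Table:
        return self.fn(table)


class MetaNode(Node):
    """Looks like a single node from the outside but runs a chain of inner
    nodes, which may themselves be meta nodes."""

    def __init__(self, name: str, nodes: Sequence[Node]):
        super().__init__(name, self._run_inner)
        self.nodes = list(nodes)

    def _run_inner(self, table: Table) -> Table:
        for node in self.nodes:
            table = node.execute(table)
        return table


double = Node("Double", lambda t: [{**r, "v": r["v"] * 2} for r in t])
increment = Node("Increment", lambda t: [{**r, "v": r["v"] + 1} for r in t])
inner = MetaNode("Inner", [double, increment])
outer = MetaNode("Preprocessing", [inner, double])  # a meta node nested in a meta node
print(outer.execute([{"v": 1}]))  # [{'v': 6}]: (1 * 2 + 1) * 2
```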

Fig. 6 The workflow discussed in this tutorial.

Double-clicking a meta node lets the user take a closer look at what is inside. In the following, the content of each meta node is explained in more detail. However, the individual parts of the workflow can easily be replaced by other nodes that may be more suitable for other image processing tasks. Besides KNIME Image Processing, integrations with, for example, R, Python and Weka, and especially KNIME itself, offer a wide range of functionality for more complex visualizations and advanced machine learning and data mining techniques.

3.1 Loading Images

Fig. 7 Detailed look inside the Loading Images meta node.

Traditionally, the proprietary file formats of microscope image analysis software have made it difficult for open-source platforms to load images generically. However, SCIFIO, with its integration of the Bio-Formats (Linkert et al, 2010) library, can convert approximately 125 file formats used by various microscope manufacturers, such as Zeiss LSM, Metamorph Stack, Leica LCS or DICOM, into a KNIME-compatible format. In KNIME, this functionality can be accessed via the Image Reader node, which integrates these libraries. A user can either select the images in the Image Reader configuration dialog (Right Click > Configure) or provide URLs to the images as an input table coming from another node, e.g. the List Files node. The resulting workflow of the latter approach is illustrated in Figure 7.

The List Files node is used to list the URLs of all images in a certain folder. Connecting it to the Image Reader node enables the user to configure the Image Reader node (Right Click on Image Reader > Configure) so that it loads all images from these URLs into KNIME. To do so, on the Additional Options tab, the setting File name column in optional column has to be set to the column of the incoming table that contains the URLs of the requested images.
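The List Files node contributes a column of file URLs that the Image Reader can consume. Its core can be approximated in a few lines of Python. The function name and the default extensions are assumptions, and the actual node offers its own filtering options.

```python
from pathlib import Path
from typing import Iterable, List


def list_image_urls(folder: str,
                    extensions: Iterable[str] = (".tif", ".tiff", ".bmp")) -> List[str]:
    """Return file:// URLs of all files in `folder` (not recursive) whose
    extension matches one of `extensions`, case-insensitively, sorted by path."""
    wanted = {e.lower() for e in extensions}
    files = sorted(p for p in Path(folder).iterdir()
                   if p.is_file() and p.suffix.lower() in wanted)
    # as_uri() needs an absolute path, hence resolve().
    return [p.resolve().as_uri() for p in files]


# Example (hypothetical folder holding the BBBC013 BMP files):
# urls = list_image_urls("/data/BBBC013")
```

Each URL then corresponds to one row of the table that is handed to the image reader.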

3.2 Preprocessing Images

Fig. 8 Detailed look into the Preprocessing meta node. The images are split into Nuclei and Cytoplasm channels and renamed accordingly.

KNIME Image Processing offers a range of general (pre-)processing techniques to enhance image quality. Standard linear and non-linear filters are available, as are morphological and binary operations, pixel-wise image arithmetic, edge detectors, background subtraction algorithms, projections, and nodes for manipulating dimensionality, such as splitting and merging images. Additionally, the ImageJ-Macro node, which is part of the KNIME Image Processing - ImageJ Integration⁹, allows arbitrary ImageJ1 macros to be executed on large amounts of image data.

The image preprocessing used in this tutorial is implemented in the Preprocessing meta node (see Figure 6). First, the channels of the images are split, which results in a table with two columns, Nuclei and Cytoplasm (see Figure 8). Next, each channel is preprocessed individually. The images in the Nuclei column suffer from non-uniform illumination, which makes it difficult to apply automated segmentation methods. Therefore, the quality of the images in the Nuclei column is enhanced in the Background Subtraction of Images in Nuclei Channel meta node. Here, a very simple background subtraction technique was chosen, specifically to demonstrate how individual KNIME nodes can easily be combined, without any programming, to recreate an existing processing or analysis technique or to create a completely new one. Figure 9 depicts the workflow implementing the algorithm. The images in the Nuclei column are filtered with a very large kernel (sigma = 100.0) using the Gaussian Convolution node, and the output is appended as an additional column to the input table. The mean pixel intensity of each filtered image is then calculated using the Image Feature node.

Fig. 9 Detailed look into the Background Subtraction of Images in Nuclei Channel meta node.

9 For details and installation instructions see https://tech.knime.org/community/imagej.

Fig. 10 Configuration dialog of the Image Calculator node.


Finally, the background-corrected images are obtained with the Image Calculator node. At each pixel position, it subtracts the sum of the filtered image and the mean of the filtered image from the original image.
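The background subtraction assembled from the Gaussian Convolution, Image Feature and Image Calculator nodes can be sketched in plain Python. It follows the arithmetic described above: the original minus the sum of the filtered image and its mean. The border handling (clamping to the nearest edge pixel) and the truncation of the kernel at 3 sigma are assumptions that may differ from the KNIME nodes. Pure Python is also far too slow for sigma = 100.0 on real images; the demo uses a small sigma.

```python
import math
from typing import List

Image = List[List[float]]  # indexed as image[y][x]


def gaussian_kernel(sigma: float) -> List[float]:
    """1D Gaussian kernel truncated at 3 sigma and normalized to sum to 1."""
    radius = max(1, int(math.ceil(3 * sigma)))
    k = [math.exp(-(i * i) / (2 * sigma * sigma)) for i in range(-radius, radius + 1)]
    total = sum(k)
    return [v / total for v in k]


def _filter_rows(img: Image, k: List[float]) -> Image:
    r = len(k) // 2
    w = len(img[0])
    # Out-of-bounds positions are clamped to the nearest border pixel.
    return [[sum(k[j] * row[min(max(x + j - r, 0), w - 1)] for j in range(len(k)))
             for x in range(w)] for row in img]


def _transpose(img: Image) -> Image:
    return [list(col) for col in zip(*img)]


def gaussian_blur(img: Image, sigma: float) -> Image:
    """Separable 2D Gaussian: filter along x, then along y."""
    k = gaussian_kernel(sigma)
    return _transpose(_filter_rows(_transpose(_filter_rows(img, k)), k))


def subtract_background(img: Image, sigma: float = 100.0) -> Image:
    blurred = gaussian_blur(img, sigma)
    mean = sum(map(sum, blurred)) / (len(blurred) * len(blurred[0]))
    # Per pixel: original - (filtered + mean of filtered), as described in the text.
    return [[o - (b + mean) for o, b in zip(orow, brow)] for orow, brow in zip(img, blurred)]


# A 9x9 ramp (uneven illumination) with one bright spot at (4, 4).
img = [[float(x) for x in range(9)] for _ in range(9)]
img[4][4] += 50.0
corrected = subtract_background(img, sigma=3.0)
brightest = max((v, y, x) for y, row in enumerate(corrected) for x, v in enumerate(row))
print(brightest[1:])  # (4, 4)
```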

The configuration of the Image Calculator node is shown in Figure 10. The last node in the Background Subtraction of Images in Nuclei Channel meta node is the Image Converter. This node can be used to normalize and scale the intensity values of the images to a certain range. In this tutorial we normalize and scale the values to the range 0 to 255 (UnsignedByteType) to reduce the amount of memory required.
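The normalization performed by the Image Converter node can be approximated by a linear min-max rescaling to 0..255 followed by rounding. This is an assumption about the node's behavior, sketched in Python with a hypothetical function name.

```python
from typing import List


def to_unsigned_byte(img: List[List[float]]) -> List[List[int]]:
    """Linearly rescale so that the minimum maps to 0 and the maximum to 255,
    then round to integers, i.e. 8-bit unsigned values."""
    flat = [v for row in img for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        return [[0 for _ in row] for row in img]  # constant image: nothing to stretch
    scale = 255.0 / (hi - lo)
    return [[int(round((v - lo) * scale)) for v in row] for row in img]


print(to_unsigned_byte([[0.0, 1.0], [2.0, 3.0]]))  # [[0, 85], [170, 255]]
```

In this sketch, min-max rescaling is insensitive to a constant offset. The mean term subtracted during background correction therefore does not change the result of this step.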

Back in the Preprocessing meta node, both the background-corrected images from the Nuclei column and the images from the Cytoplasm column are filtered with a small Gaussian kernel (sigma = 2.0). The results are appended to the table.

3.3 Segmentation

To classify the cells as positive or negative according to their phenotype, both the nuclei and their cytoplasm have to be segmented. The subworkflow for this segmentation is encapsulated in the Segmentation meta node and is shown in Figure 11.

Fig. 11 Workflow to segment the nuclei and cytoplasm, respectively.

The images in the Nuclei column are segmented using the well-known Otsu thresholding algorithm (Otsu, 1975), which is implemented in the Global Thresholder node. The output of the node is a binary image consisting only of black and white pixels, in which each pixel indicates whether it belongs to a nucleus or to the background of the image. To split potentially touching objects into individual nuclei, the ImageJ1 Watershed macro is executed on the binary images using the ImageJ1 Macro node, which is part of the KNIME Image Processing - ImageJ Integration. The subsequent Connected Component Analysis node derives a labeling from the binary images. This labeling determines which individual nucleus each pixel belongs to, rather than merely whether the pixel belongs to a nucleus or to the background. The result is appended to the table. With the connected Labeling Filter node, objects that are either too small or too big can be removed from the labeling by manually defining the expected size of the nuclei. In this workflow we set the minimum size of nuclei to 50. The remaining nuclei then serve as seeding points for the segmentation of the cytoplasm. Starting at each nucleus in parallel, the region growing algorithm implemented in the Voronoi Segmentation node extends the seeding segments until no more pixels can be added to the individual segments. A pixel cannot be added if it already belongs to another segment or if its intensity value is lower than a manually defined threshold. The Voronoi Segmentation node was configured to return the segmentation of the cytoplasm without the seeds, using a threshold of 25 with Fill Holes activated.
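The thresholding, labeling and size-filtering steps above can be sketched in Python:

* Otsu's method chooses the threshold that maximizes the between-class variance of the intensity histogram.
* A breadth-first search assigns a separate label to each 4-connected foreground region.
* Segments below a minimum pixel count are dropped.

The watershed step is omitted. The choice of 4-connectivity and of measuring size in pixels are assumptions, since the KNIME nodes expose their own settings for these. All function names are hypothetical.

```python
from collections import deque
from typing import Dict, List


def otsu_threshold(img: List[List[int]]) -> int:
    """Return t maximizing the between-class variance of {v <= t} vs {v > t}.
    Assumes integer intensities in 0..255 (e.g. after conversion to unsigned byte)."""
    hist = [0] * 256
    for row in img:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))
    w_bg, sum_bg = 0, 0
    best_t, best_var = 0, -1.0
    for t in range(256):
        w_bg += hist[t]
        sum_bg += t * hist[t]
        w_fg = total - w_bg
        if w_bg == 0:
            continue
        if w_fg == 0:
            break
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        var_between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t


def connected_components(binary: List[List[bool]]) -> List[List[int]]:
    """Label 4-connected foreground regions 1, 2, 3, ...; background stays 0."""
    h, w = len(binary), len(binary[0])
    labels = [[0] * w for _ in range(h)]
    next_label = 0
    for y in range(h):
        for x in range(w):
            if binary[y][x] and labels[y][x] == 0:
                next_label += 1
                labels[y][x] = next_label
                queue = deque([(y, x)])
                while queue:
                    cy, cx = queue.popleft()
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                        if 0 <= ny < h and 0 <= nx < w and binary[ny][nx] and labels[ny][nx] == 0:
                            labels[ny][nx] = next_label
                            queue.append((ny, nx))
    return labels


def filter_by_size(labels: List[List[int]], min_size: int) -> List[List[int]]:
    """Set segments with fewer than min_size pixels back to background (0)."""
    sizes: Dict[int, int] = {}
    for row in labels:
        for lab in row:
            if lab:
                sizes[lab] = sizes.get(lab, 0) + 1
    return [[lab if lab and sizes[lab] >= min_size else 0 for lab in row] for row in labels]


img = [[10, 10, 10, 10, 10, 10],
       [10, 200, 200, 10, 10, 10],
       [10, 200, 200, 10, 210, 10],
       [10, 10, 10, 10, 10, 10]]
t = otsu_threshold(img)
binary = [[v > t for v in row] for row in img]
labels = connected_components(binary)  # 2x2 blob -> label 1, single pixel -> label 2
nuclei = filter_by_size(labels, min_size=2)
print(t, nuclei[2][4])  # 10 0: the one-pixel object was removed
```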

Figure 12 shows the resulting segmentation of the images in the Cytoplasm channel, displayed with the Interactive Segmentation View node.
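The seeded region growing described for the Voronoi Segmentation node can be approximated with a multi-source breadth-first search. All seeds grow simultaneously, a pixel joins whichever segment reaches it first, and pixels below the threshold are never added. This Python sketch is not the node's actual implementation. The function name is hypothetical and the Fill Holes option is left out.

```python
from collections import deque
from typing import List


def grow_from_seeds(img: List[List[float]], seeds: List[List[int]], threshold: float,
                    include_seeds: bool = False) -> List[List[int]]:
    """Grow all seed segments in parallel into 4-neighbors whose intensity is
    >= threshold; each pixel keeps the label of the first segment to reach it."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in seeds]  # copy so the seed labeling is not modified
    queue = deque((y, x) for y in range(h) for x in range(w) if seeds[y][x])
    while queue:
        y, x = queue.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if 0 <= ny < h and 0 <= nx < w and out[ny][nx] == 0 and img[ny][nx] >= threshold:
                out[ny][nx] = out[y][x]
                queue.append((ny, nx))
    if not include_seeds:
        # Keep only the grown region, e.g. the cytoplasm without its nucleus.
        out = [[0 if seeds[y][x] else out[y][x] for x in range(w)] for y in range(h)]
    return out


img = [[30.0] * 7 for _ in range(3)]
img[1][3] = 0.0                      # a dark pixel that no segment may enter
seeds = [[0] * 7 for _ in range(3)]
seeds[1][0], seeds[1][6] = 1, 2      # two "nuclei"
cytoplasm = grow_from_seeds(img, seeds, threshold=25)
print(cytoplasm[1])  # [0, 1, 1, 0, 2, 2, 0]
```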

Fig. 12 Results of the Voronoi Segmentation.

For other segmentation tasks, KNIME Image Processing offers a wide range of simple and more advanced segmentation techniques. Besides established algorithms such as Graph Cuts or Local Thresholding, the KNIME Image Processing - Supervised Image Segmentation (SUISE) extension comprises nodes for supervised pixel and segment classification¹⁰.

10 For details see the example workflows on http://tech.knime.org/supervised-image-segmentation.


3.4 Feature Extraction

After certain objects have been identified and segmented, features can be calculated either from the derived labelings alone or from their combination with the source images. Features numerically describe the individual objects and are instrumental in drawing conclusions from the acquired images. These features are therefore part of most image processing and analysis tasks.

Fig. 13 Detailed look into the Feature Extraction meta node. First Order Statistics, Geometric and Haralick Features are extracted individually for the Cytoplasm and the Nuclei channel.

KNIME Image Processing provides several feature implementations. Examples include simple first-order statistics of the intensity values of a segment (mean intensity, standard deviation, kurtosis, etc.) and geometric properties of a segment (roundness, size, convexity, Zernike moments (Khotanzad and Hong, 1990), Fourier shape descriptors, etc.), as well as more complex texture measurements (Haralick (Haralick et al, 1973), Tamura (Tamura et al, 1978), etc.).
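As an illustration of first-order statistics per segment, the following Python sketch computes the size (pixel count), mean, standard deviation and kurtosis of the intensities under each label. It uses population moments, and kurtosis is the plain fourth standardized moment rather than excess kurtosis. The definitions used by KNIME's feature nodes may differ, and the function name is hypothetical.

```python
import math
from typing import Dict, List


def segment_features(img: List[List[float]],
                     labels: List[List[int]]) -> Dict[int, Dict[str, float]]:
    """Per segment: size in pixels plus mean, standard deviation and kurtosis."""
    values: Dict[int, List[float]] = {}
    for irow, lrow in zip(img, labels):
        for v, lab in zip(irow, lrow):
            if lab:  # label 0 is background
                values.setdefault(lab, []).append(v)
    features = {}
    for lab, vs in values.items():
        n = len(vs)
        mean = sum(vs) / n
        var = sum((v - mean) ** 2 for v in vs) / n  # population variance
        fourth = sum((v - mean) ** 4 for v in vs) / n
        kurtosis = fourth / var ** 2 if var > 0 else float("nan")  # undefined for constant segments
        features[lab] = {"size": n, "mean": mean, "std": math.sqrt(var), "kurtosis": kurtosis}
    return features


img = [[1, 3, 0], [5, 5, 0]]
labels = [[1, 1, 0], [2, 2, 0]]
print(segment_features(img, labels)[1])  # {'size': 2, 'mean': 2.0, 'std': 1.0, 'kurtosis': 1.0}
```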

Figure 14 shows the output table of the Joiner node in Figure 13. This node combines the results of the preceding Image Segment Feature nodes by joining the rows of the individual tables according to their RowID. Given the nuclei (Channel 1), the cytoplasm (Channel 2) and their corresponding labelings derived in the previous Segmentation meta node, these nodes calculate the first-order statistics, Haralick texture features and several geometric properties for each identified object. Each row of the output table of the Feature Extraction meta node thus holds the numerical description of a single object.
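Joining on the row identifier, as the Joiner node does here, amounts to matching rows that share a key and merging their columns. The Python sketch below performs an inner join on hypothetical row IDs. The suffix used to disambiguate duplicate column names is an assumption, not the Joiner node's default.

```python
from typing import Any, Dict

Table = Dict[str, Dict[str, Any]]  # RowID -> {column name: value}


def join_on_row_id(left: Table, right: Table, suffix: str = "_right") -> Table:
    """Inner join on RowID: keep rows present in both tables and merge their
    columns; a right-hand column whose name already exists gets `suffix`."""
    joined: Table = {}
    for row_id, left_row in left.items():
        if row_id not in right:
            continue  # inner join: drop rows without a partner
        row = dict(left_row)
        for column, value in right[row_id].items():
            row[column + suffix if column in row else column] = value
        joined[row_id] = row
    return joined


nuclei = {"img1_obj1": {"Mean": 120.0, "Size": 80}, "img1_obj2": {"Mean": 95.0, "Size": 64}}
cytoplasm = {"img1_obj1": {"Mean": 40.0}, "img1_obj3": {"Mean": 55.0}}
print(join_on_row_id(nuclei, cytoplasm))
# {'img1_obj1': {'Mean': 120.0, 'Size': 80, 'Mean_right': 40.0}}
```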


Fig. 14 Output table of the Image Segment Feature node. Each row corresponds to the numerical measurements of a cell's nucleus and cytoplasm.

3.5 Model Learning

The output of the Feature Extraction meta node can now be connected to KNIME nodes that operate on numerical data.

Fig. 15 Detailed look into the Model Learning meta node.

Typical examples include nodes for statistical testing, machine learning and data mining, or visualization. In this example, a supervised classification of the nuclei and cytoplasm is performed on the calculated features to determine whether each cell is considered positive or negative (see Figure 15). As a first step, the ground-truth data, which is part of the publicly available high-content screening image data set, is read into KNIME as a text file and joined with the already loaded image data. The ground-truth data contains class labels for several cells, which then serve as training data for a supervised learning algorithm¹¹.

If no such ground-truth data is available, users can create it manually with the Interactive Labeling Editor node. Given the ground truth and the numerical description of the nuclei and cytoplasm, the Decision Tree Learner node can be used to train a decision tree model. This model can in turn be applied to previously unseen cells using the Decision Tree Predictor node (see Figure 15). The output of the Decision Tree Predictor node comprises an additional column with the classification result.
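The split between learner and predictor can be illustrated with a small CART-style decision tree in Python that chooses axis-parallel splits by minimizing the weighted Gini impurity. This is a generic sketch, not the algorithm behind KNIME's Decision Tree Learner, which has its own settings such as the quality measure and pruning. The feature values in the example are made up.

```python
from collections import Counter
from typing import List, Optional


class TreeNode:
    """Inner nodes test x[feature] <= threshold; leaves only carry a prediction."""

    def __init__(self, prediction: str, feature: Optional[int] = None, threshold: float = 0.0,
                 left: Optional["TreeNode"] = None, right: Optional["TreeNode"] = None):
        self.prediction, self.feature, self.threshold = prediction, feature, threshold
        self.left, self.right = left, right


def gini(labels: List[str]) -> float:
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def learn_tree(X: List[List[float]], y: List[str], max_depth: int = 5, min_size: int = 2) -> TreeNode:
    majority = Counter(y).most_common(1)[0][0]
    if max_depth == 0 or len(y) < min_size or len(set(y)) == 1:
        return TreeNode(majority)
    best = None  # (weighted impurity, feature, threshold)
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t)
    if best is None:  # all features constant: no split possible
        return TreeNode(majority)
    _, f, t = best
    li = [i for i, row in enumerate(X) if row[f] <= t]
    ri = [i for i, row in enumerate(X) if row[f] > t]
    return TreeNode(majority, f, t,
                    learn_tree([X[i] for i in li], [y[i] for i in li], max_depth - 1, min_size),
                    learn_tree([X[i] for i in ri], [y[i] for i in ri], max_depth - 1, min_size))


def predict(node: TreeNode, x: List[float]) -> str:
    while node.feature is not None:
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.prediction


# Made-up training rows: [feature_a, feature_b] with known classes.
X = [[0.1, 5.0], [0.2, 3.0], [0.8, 4.0], [0.9, 6.0]]
y = ["negative", "negative", "positive", "positive"]
tree = learn_tree(X, y)
print(predict(tree, [0.15, 100.0]), predict(tree, [0.85, 0.0]))  # negative positive
```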

In this example, the Decision Tree model is used with its default configuration settings. However, other machine learning techniques can easily be applied instead, for example Support Vector Machines (Scholkopf and Smola, 2001), Random Forests (Breiman, 2001) or any other algorithm that is either included in KNIME or available through a KNIME extension, such as the Weka, R or Python integrations.

Fig. 16 Boosting of a Naive-Bayes learner.

Figure 16, for instance, depicts the well-known Boosting algorithm, which is offered by the KNIME Ensemble Learning plugin; this plugin also comprises nodes for Bagging and Stacking.
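The core idea of boosting fits in a few lines. The following discrete AdaBoost sketch repeatedly fits a weak learner to re-weighted training data and combines the learners by a weighted vote. For brevity it uses one-level decision stumps as weak learners instead of the Naive Bayes learner shown in Figure 16 and assumes labels in {-1, +1}. It is not the implementation behind the KNIME Ensemble Learning nodes, and the toy data is made up.

```python
import math
from typing import List, Sequence, Tuple

Stump = Tuple[int, float, int]  # (feature index, threshold, polarity)


def stump_predict(stump: Stump, x: Sequence[float]) -> int:
    """Weak learner: +polarity if x[feature] <= threshold, else -polarity."""
    f, t, p = stump
    return p if x[f] <= t else -p


def fit_stump(X, y, w) -> Tuple[Stump, float]:
    """Exhaustively pick the stump with the lowest weighted training error."""
    best, best_err = (0, 0.0, 1), float("inf")
    for f in range(len(X[0])):
        for t in sorted({x[f] for x in X}):
            for p in (1, -1):
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if stump_predict((f, t, p), xi) != yi)
                if err < best_err:
                    best, best_err = (f, t, p), err
    return best, best_err


def adaboost(X, y, rounds: int = 10) -> List[Tuple[float, Stump]]:
    """Discrete AdaBoost for labels in {-1, +1}."""
    n = len(X)
    w = [1.0 / n] * n  # start with uniform sample weights
    ensemble = []
    for _ in range(rounds):
        stump, err = fit_stump(X, y, w)
        if err >= 0.5:  # weak learner no better than chance: stop
            break
        err = max(err, 1e-10)  # avoid division by zero for a perfect stump
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Up-weight misclassified samples, down-weight correct ones.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, xi))
             for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def boosted_predict(ensemble, x) -> int:
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * stump_predict(s, x) for alpha, s in ensemble)
    return 1 if score >= 0 else -1


# Hypothetical 1-D toy data: positives at both ends, negatives in the middle,
# which no single stump can separate.
X = [[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]]
y = [1, 1, -1, -1, 1, 1]
model = adaboost(X, y, rounds=3)
print([boosted_predict(model, x) for x in X])  # [1, 1, -1, -1, 1, 1]
```

Each individual stump misclassifies part of this data, but after three rounds the weighted vote separates it, which is the effect boosting aims for with any weak learner.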

3.6 Evaluation and Validation

Users often want to manually explore and validate the information extracted from the raw images, such as the features or the results of a classification task. KNIME itself offers a wide range of functionality to visualize numerical data: scatter plots, line plots, bar plots and histograms are just some of the available options. Even more plots are available with the R and Python extensions in KNIME. Furthermore, KNIME provides nodes for statistical significance testing, for example t-tests or ANOVA.

11 For details, see the Phenotype Classification workflow at http://knime.imagej.net/aaec.

Fig. 17 Detailed look into the Evaluation and Validation meta node.
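As an illustration of what such a test computes, the following sketch derives Welch's two-sample t statistic (which does not assume equal variances) and its approximate degrees of freedom with Python's `statistics` module. Turning the statistic into a p-value requires the CDF of the t distribution, which the standard library lacks; `scipy.stats.ttest_ind(a, b, equal_var=False)` computes the same test including the p-value. Whether KNIME's t-test nodes use exactly this variant is not claimed here, and the group values are hypothetical.

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def welch_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Return Welch's t statistic and the Welch-Satterthwaite degrees of freedom."""
    na, nb = len(a), len(b)
    # Squared standard errors of the two means (sample variance, n - 1 denominator).
    sa, sb = variance(a) / na, variance(b) / nb
    t = (mean(a) - mean(b)) / math.sqrt(sa + sb)
    df = (sa + sb) ** 2 / (sa ** 2 / (na - 1) + sb ** 2 / (nb - 1))
    return t, df


# Hypothetical per-image counts of positive cells in two groups of wells.
group_a = [1.0, 2.0, 3.0, 4.0, 5.0]
group_b = [2.0, 4.0, 6.0, 8.0, 10.0]
t, df = welch_t(group_a, group_b)
print(round(t, 3), round(df, 3))  # -1.897 5.882
```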

Fig. 18 Line-Plot visualizing the classification results of the Model Learning meta node.

The Evaluation and Validation meta node contains one example of how to visualize the results of the classification conducted in the Model Learning meta node (see Figure 17). The resulting line plot (see Figure 18) shows the counts of positive and negative cells for the images in row D of the well plate. First, the Row Filter node removes all images that are not part of row D. The subsequent Group By node counts the number of positive and negative cells for each image in row D, while the Pivoting node rearranges the KNIME table such that the two cell counts appear next to each other. The number of positive cells increases across the columns of row D of the well plate, which matches the expectation from the plate design¹².
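The Row Filter, Group By and Pivoting steps described above can be mimicked on a list of per-cell records, for instance to check the numbers behind the plot outside KNIME. The sketch below is plain Python; the field names `image`, `plate_row` and `prediction` and the toy records are hypothetical, and images are sorted by name, which assumes zero-padded names such as `D01`. With pandas, the same steps would map roughly onto boolean filtering, `groupby` and `pivot_table`.

```python
from collections import Counter
from typing import Dict, List


def counts_per_image(cells: List[Dict[str, str]],
                     plate_row: str = "D") -> Dict[str, Dict[str, int]]:
    """Row Filter -> Group By -> Pivoting, on a list of per-cell dicts."""
    # Row Filter: keep only cells from images in the requested plate row.
    in_row = [c for c in cells if c["plate_row"] == plate_row]
    # Group By: count cells per (image, predicted class).
    counts = Counter((c["image"], c["prediction"]) for c in in_row)
    # Pivoting: one entry per image with one column per predicted class,
    # filling in 0 where a class does not occur in an image.
    classes = sorted({cls for _, cls in counts})
    return {
        image: {cls: counts[(image, cls)] for cls in classes}
        for image in sorted({img for img, _ in counts})
    }


# Hypothetical classified cells; the image names are made up.
cells = [
    {"image": "D01", "plate_row": "D", "prediction": "negative"},
    {"image": "D01", "plate_row": "D", "prediction": "negative"},
    {"image": "D01", "plate_row": "D", "prediction": "positive"},
    {"image": "D02", "plate_row": "D", "prediction": "positive"},
    {"image": "D02", "plate_row": "D", "prediction": "positive"},
    {"image": "C01", "plate_row": "C", "prediction": "positive"},
]
print(counts_per_image(cells))
# {'D01': {'negative': 2, 'positive': 1}, 'D02': {'negative': 0, 'positive': 2}}
```

Each entry of the result corresponds to one point on the x-axis of a line plot like Figure 18, with one line per class.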

4 Conclusions

In this tutorial, the basic concepts of KNIME Image Processing have been introduced, and the advantages of combining different software packages in a single understandable, multi-domain KNIME workflow have been demonstrated by means of an example workflow for phenotype classification. The techniques applied in this use case are simple and exemplary. However, the already published workflows in (Saha et al, 2013; Lodermeyer et al, 2013; Gunkel et al, 2014; Strauch et al, 2014; Aligeti et al, 2014) solve simple problems like counting cells or measuring the intensity of segmented cells, as well as more complex tasks involving machine learning and data mining techniques. For instance, in (Gunkel et al, 2014) the entire image acquisition process was controlled by a KNIME workflow: the images are analysed on the fly and feedback is provided instantly to the microscope. By avoiding the acquisition and storage of uninteresting images, screening costs can be reduced by 90 %.

KNIME has also been successfully applied in other fields of life science research, for instance to pharmacophore identification in classic high-throughput screening (HTS) data or to outlier detection in medical claims. Use cases also exist in entirely unrelated sectors: KNIME workflows for customer segmentation, churn analysis, market basket analysis, sentiment analysis on social media data (using the KNIME Text Processing and Network Analysis extensions) and credit scoring based on historical data are just a few examples¹³.

The wide range of application areas is a direct result of KNIME's openness. The need for open platforms in classic data analytics is even more pressing now that data analysts have easy access to an ever-growing number of internal and external data sources. To tackle this challenge, they need quick and easy access to best-of-breed tools to intuitively explore new analysis ideas, unburdened by the artificial barriers of closed environments. In this setting, open platforms are much more powerful than any monolithic application can ever be. Because in-house, legacy and external technology can easily be mixed and matched within the same intuitive environment, analysts can choose which data and tools they want to use instead of being restricted to the tools available in a proprietary toolbox.

12 See http://www.broadinstitute.org/bbbc/BBBC013/ for details on the plate design.

13 See https://www.knime.org/applications.



References

Aligeti M, Behrens RT, Pocock GM, Schindelin J, Dietz C, Eliceiri KW, Swanson CM, Malim MH, Ahlquist P, Sherer NM (2014) Cooperativity among Rev-associated nuclear export signals regulates HIV-1 gene expression and is a determinant of virus species tropism. Journal of Virology 88(24):14207–14221, DOI 10.1128/JVI.01897-14, URL http://jvi.asm.org/content/88/24/14207.full.pdf+html

Allan C, Burel JM, Moore J, Blackburn C, Linkert M, Loynton S, MacDonald D, Moore WJ, Neves C, Patterson A, et al (2012) OMERO: flexible, model-driven data management for experimental biology. Nature Methods 9(3):245–253

Berthold MR, Cebron N, Dill F, Gabriel TR, Kotter T, Meinl T, Ohl P, Sieb C, Thiel K, Wiswedel B (2008) KNIME: The Konstanz Information Miner. Springer

Breiman L (2001) Random forests. Machine Learning 45(1):5–32

Eliceiri KW, Berthold MR, Goldberg IG, Ibanez L, Manjunath B, Martone ME, Murphy RF, Peng H, Plant AL, Roysam B, et al (2012) Biological imaging software tools. Nature Methods 9(7):697–710

Gunkel M, Flottmann B, Heilemann M, Reymann J, Erfle H (2014) Integrated and correlative high-throughput and super-resolution microscopy. Histochemistry and Cell Biology 141(6):597–603, DOI 10.1007/s00418-014-1209-y, URL http://dx.doi.org/10.1007/s00418-014-1209-y

Haralick RM, Shanmugam K, Dinstein IH (1973) Textural features for image classification. IEEE Transactions on Systems, Man and Cybernetics (6):610–621

Kamentsky L, Jones TR, Fraser A, Bray MA, Logan DJ, Madden KL, Ljosa V, Rueden C, Eliceiri KW, Carpenter AE (2011) Improved structure, function and compatibility for CellProfiler: modular high-throughput image analysis software. Bioinformatics 27(8):1179–1180

Khotanzad A, Hong YH (1990) Invariant image recognition by Zernike moments. IEEE Transactions on Pattern Analysis and Machine Intelligence 12(5):489–497

Linkert M, Rueden CT, Allan C, Burel JM, Moore W, Patterson A, Loranger B, Moore J, Neves C, MacDonald D, et al (2010) Metadata matters: access to image data in the real world. The Journal of Cell Biology 189(5):777–782

Ljosa V, Sokolnicki KL, Carpenter AE (2012) Annotated high-throughput microscopy image sets for validation. Nature Methods 9(7):637

Lodermeyer V, Suhr K, Schrott N, Kolbe C, Stuerzel CM, Krnavek D, Muench J, Dietz C, Waldmann T, Kirchhoff F, Goffinet C (2013) 90K, an interferon-stimulated gene product, reduces the infectivity of HIV-1

Otsu N (1975) A threshold selection method from gray-level histograms. Automatica 11(285-296):23–27

Pietzsch T, Preibisch S, Tomancak P, Saalfeld S (2012) ImgLib2: generic image processing in Java. Bioinformatics 28(22):3009–3011

Royer LA, et al (2015) ClearVolume: open-source live 3D visualization for light-sheet microscopy. Nature Methods, DOI 10.1038/nmeth.3372 (in press)

Saha AK, Kappes F, Mundade A, Deutzmann A, Rosmarin DM, Legendre M, Chatain N, Al-Obaidi Z, Adams BS, Ploegh HL, Ferrando-May E, Mor-Vaknin N, Markovitz DM (2013) Intercellular trafficking of the nuclear oncoprotein DEK. Proceedings of the National Academy of Sciences 110(17):6847–6852, DOI 10.1073/pnas.1220751110

Schindelin J, Arganda-Carreras I, Frise E, Kaynig V, Longair M, Pietzsch T, Preibisch S, Rueden C, Saalfeld S, Schmid B, et al (2012) Fiji: an open-source platform for biological-image analysis. Nature Methods 9(7):676–682

Scholkopf B, Smola AJ (2001) Learning with kernels: support vector machines, regularization, optimization, and beyond

Strauch M, Luedke A, Muench D, Laudes T, Galizia CG, Martinelli E, Lavra L, Paolesse R, Ulivieri A, Catini A, Capuano R, Di Natale C (2014) More than apples and oranges: detecting cancer with a fruit fly's antenna

Tamura H, Mori S, Yamawaki T (1978) Textural features corresponding to visual perception. IEEE Transactions on Systems, Man and Cybernetics 8(6):460–473