Balloon Extraction from Complex Comic Books Using Edge Detection and Histogram Scoring

João Miguel Coronha Correia

Submitted to the University of Beira Interior in partial fulfillment of the requirements for the degree of Master of Science in Tecnologia e Sistemas de Informação

Supervised by Prof. Dr. Abel João Padrão Gomes

Departamento de Informática

Universidade da Beira Interior

Covilhã, Portugal

http://www.di.ubi.pt

Acknowledgments

I would like to express my gratitude to my advisor, Prof. Dr. Abel João Padrão Gomes, for his scientific guidance during the development of this master's thesis and related affairs.

João Correia

Covilhã, Portugal

October 22, 2012

Resumo

Over the years, several disruptive technologies have changed the way we access the content and information present in the various media and forms of cultural expression. Recently, with the growth of the market for portable devices such as smartphones, tablets, laptops and ultrabooks, it has become necessary to adapt content to this reduced screen format. Adaptations of books, films and music were relatively quick and successful because their intrinsic characteristics could be modified for correct representation on small, touch-operated screens. Text books were particularly successful in these new formats. Text blocks mold easily to any space, and the font size can be adjusted to allow greater visibility. But some types of books did not adapt as well, namely comic books and comic magazines. Their structure, composed of art and text, is not moldable, and their complex shapes are not as adaptable as those of text books. Nevertheless, there is a growing potential market of young people with access to smart devices who are, at the same time, the preferred target audience for comic magazines. It therefore becomes relevant to find a way to adapt the content to the devices as well as possible.

One of the biggest problems in adapting to mobile devices is the legibility of comic book text when it is displayed smaller than normal. Since the text is represented as part of the drawing, inside balloons, how can only those relevant elements be found? How can the balloons be highlighted, made more visible or larger, without losing the visual context within the page? How can enlarging the image through an excessive zoom level be avoided while still allowing the content to be discerned? How can this be done for comics more complex than three-panel web comics with minimalist art styles?

This dissertation presents a possible solution to this problem by introducing an algorithm for identifying balloons within comic book pages, with a procedure applicable to any type of comic, even the most elaborate ones.

Abstract

Over the years, several disruptive technologies have changed the way we access content and information on various media and means of cultural expression. Recently, with the expansion of the market for handheld devices like smartphones, tablets, laptops and ultrabooks, it has become necessary to adapt content to this reduced screen size. Changes in books, movies or music were relatively fast and successful because their intrinsic characteristics could be modified so that they were correctly represented on small screens and handled with tactile interfaces. Text books were particularly successful in these new formats. Text blocks are easily molded to any space, and font size can be adjusted to allow for better visibility. But some types of books were not so easily adapted, namely comic books. Their structure, with art and text, is not adjustable, and their complex shapes are not as easy to change as those of text books. Nevertheless, there is a growing potential market of young people with access to smart devices who are, simultaneously, the preferred target audience for comic books. It is relevant, then, to find ways to adapt the content to the devices as well as possible.

One of the greatest problems in the adaptation to mobile devices is the readability of comic book text when it is viewed at smaller than normal sizes. Since the text is embedded in the art, inside balloons, how can just those relevant elements be found? How can the balloons be enhanced, making them more visible or bigger, without losing visual context inside the page? How can enlarging the image with an excessive zoom level be avoided while still allowing the content to be understood? How can this be done for comic books more complex than three-panel web comics with minimalistic art styles?

This dissertation presents a possible solution to this problem by introducing an algorithm to identify balloons inside comic book pages, with a method that works for any type of comic book, even the most complex ones.

Contents

Resumo
Abstract
Contents
List of Figures
List of Tables
1 Introduction
    1.1 Motivation and Objectives
    1.2 Research Contribution
    1.3 Organization of the Thesis
    1.4 Publications
    1.5 Target Audience
2 Balloon Extraction from Complex Comic Books Using Edge Detection And Histogram Scoring
    2.1 Introduction
    2.2 Page Layout and Terminology
    2.3 Balloon Extraction Algorithm
        2.3.1 Page Segmentation
            2.3.1.1 Conversion to gray scale
            2.3.1.2 Sobel-based edge detection
            2.3.1.3 Negative pages
            2.3.1.4 Flood fill and region extraction
        2.3.2 Balloon extraction
            2.3.2.1 Region culling
            2.3.2.2 Region scoring
            2.3.2.3 Region sorting
            2.3.2.4 Region filtering
    2.4 Experimental Results
        2.4.1 Analysis of results
        2.4.2 Comparison to other algorithms
        2.4.3 OCR
    2.5 Final Remarks
3 Conclusions
References

List of Figures

2.1 Alpha Flight 6, January 1984, page 9 [1].
2.2 Comic book page components.
2.3 (left) Original image. The Amazing Spider-Man 679, April 2012, story page 5; (right) Gray scale image.
2.4 (left) Book page after applying the Sobel edge detector; (right) Negative of the Sobelized book page.
2.5 Flood fill method comparison. (left) standard flood fill extraction, (right) modified flood fill extraction.
2.6 (top) Typical histogram of a balloon region; (bottom) histogram of an ordinary (non-balloon) region.
2.7 Balloon regions extracted from the book page shown in Fig. 2.3. Although not visible here, the regions have a balloon shape indeed.
2.8 Special case balloons that were successfully recognized. Balloons having different text colors than other balloons, different text colors inside the same balloon, shapes different from standard balloons, wavy text, different font faces in the same balloon.

List of Tables

2.1 Criteria for size-based region culling.
2.2 Extraction results for [2]
2.3 Extraction results for [3]
2.4 Extraction results for [4]
2.5 Extraction results for [5]
2.6 Extraction results for [6]
2.7 Extraction results for [7]
2.8 Extraction results for [8]
2.9 Extraction results for [9]
2.10 Extraction results for [10]
2.11 Extraction results for [11]
2.12 Extraction results for [12]
2.13 Extraction results for [13]

Chapter 1

Introduction

A handheld device, also called a mobile device or handheld computer, is a small computing device defined by a small screen and touch input or a minimal keyboard. The category comprises smartphones, tablets and PDAs. These devices started to appear in mass production at the turn of the century, driven by a need for always-accessible communication and content. Early usage scenarios, beyond phone calls, involved text messaging, e-mail or simple web browsing, but with the increase in network bandwidth, CPU power and battery life, usage scenarios grew more complex and now encompass everything from book reading to watching movies or video chat. This device market has seen explosive growth in the last few years, particularly since the introduction of Apple's iPad. While some types of content, like text books, are perfect matches for these devices, others require trade-offs when viewed on reduced screen sizes. Comic books are one such content type.

Comic books, as an art form, have been present in human cultures for many years, with some possible representations as far back as the Middle Ages [14], and are a popular form of entertainment and storytelling around the world. Given the convenience of handheld devices and the almost perfect match between the target audiences for comic books and for handheld devices, it was logical to expect comic books to be widely used on those devices. Yet, although the major publishers have recently begun to distribute digital versions of their comics [15][16], adoption has been slow at best when compared to e-books, music or movies. This is, in no small measure, caused by the poor fit of the traditional comic book to the small screen. Currently, one of two things happens when you try to read a comic book on a handheld device:

- The digital comic book has been pre-processed, and embedded information regarding the text has been included in the file.

- The digital comic book has not been pre-processed, and in order to read it correctly the reader has to manually zoom in on every text element of the comic to be able to read it and comprehend the story.

Both of these approaches have drawbacks. The first requires more pre-publishing time and more manual processing before a new comic book reaches the market, which makes it more expensive than necessary and delays distribution. The second makes the reading experience very user-unfriendly and ultimately leads potential readers not to choose these devices. Digital comic books have multiple advantages over traditional paper comic books, such as easier storage, global availability, potentially lower price, faster distribution, better searching and indexing and, even, better quality, so it is unfortunate not to take advantage of them. Of no lesser importance is the fact that digital comic books give younger (or new) readers access to comics that are no longer available in paper format [17].

As in other areas, like automatic document digitization [18] or video processing [19], there are digital image processing techniques that can be applied to comic books to provide some of the missing functionality, namely, correctly identifying the elements of the comic book image that contain text: the balloons. Relying on edge detection methods like Sobel, together with flood fill and histogram analysis [20], the algorithm presented in this thesis offers an answer to the problem of how to extract the balloons from comic book pages, in order to improve usability, provide a better experience to the end user and allow for an adoption of digital comic books comparable to that of text e-books.

1.1 Motivation and Objectives

The main goal of this thesis is to demonstrate the feasibility of applying digital image processing techniques to comic book images to correctly identify and extract the balloons and, in doing so, to make it possible to read comic books adequately on small-screen devices, in the same way as those devices are used to read traditional text books today.

Existing proposed solutions to this problem have almost always focused on simple web comics, which traditionally have a much simpler art style and a much smaller scope than commercial comic books published by companies like Marvel Comics, DC Comics or Dark Horse. Those approaches have produced extremely good results for those types of web comics, but less than adequate results [21] for commercial comic books. Other solutions are aimed at specific types of comics, like manga, and exploit specific characteristics, such as top-to-bottom text flow, to identify text balloons [22]. So far, it has been impossible to find a solution with an acceptable level of success in extracting balloons across all types of comics.

Identifying the balloons is by no means a trivial task [18]. Balloons can have any shape or size and can be in any location inside the comic book page. Artistic style can make balloons spread over multiple panels or overlap each other. Text inside the balloons can have any color or font face, sometimes even inside the same balloon [23][14].

Another important consideration is that handheld devices are battery powered, so any approach to this problem must account for the fact that CPU-intensive operations drain the battery faster and, as such, result in a bad user experience. This would work against the adoption of these devices for comic book reading.

This has been an insurmountable problem, so much so that, although the main publishers have digital distribution systems in place, the digital comic books they actually sell are pre-processed manually to identify the balloons. This is, in a way, comparable to the medieval practice of copying books by a scribe's hand, and it clamours for a workable, automated solution that both enhances the digital comic book experience and allows for the automated indexing and processing expected in the digital age.

Irrespective of all those challenges, the main motivation for tackling this problem stems from the desire to find an enjoyable solution that combines the convenience of handheld devices, which can store several thousand comic books, with the pleasure of having comic books available to read almost anywhere, and that finally replaces dead-tree comics with equally good digital versions.

1.2 Research Contribution

The major contribution of this thesis is an automatic balloon identification and extraction algorithm that uses digital image processing techniques.

1.3 Organization of the Thesis

This document has the following chapters:

- Chapter 1. This is the current chapter, which introduces the subject of the thesis, its motivation and objectives, as well as the problem solved in this thesis, the organization of the thesis, and the audience to which the thesis is addressed.

- Chapter 2. This chapter is the core of this thesis, because it describes in detail the algorithm to extract balloons from comic book pages. This algorithm is based on image processing techniques, and applies to any sort of comics.

- Chapter 3. This is the last chapter of this thesis, where relevant conclusions are drawn and possible future work is outlined.

1.4 Publications

Resulting from this work, the following paper has been submitted for publication:

J. Correia and A. Gomes. Complex Comic Book Balloon Extraction Using Edge Detection And Histogram Scoring.

1.5 Target Audience

The target audience of this thesis includes, but is not limited to, software developers for mobile platforms, comic book publishers and creators, digital imaging researchers, and mobile device content publishers and distributors.

Chapter 2

Balloon Extraction from Complex Comic Books Using Edge Detection And Histogram Scoring

This chapter proposes an algorithm based on the Sobel operator to identify balloons in comic book pages. Unlike other approaches, this method works on colored and complex comic book images, as well as comic strips, without making any assumptions regarding the line continuity of the image, the orientation of the text or the color depth of the image. Each comic book page is input to the Sobel operator, and each closed region of the resulting page is then identified. Each region is afterwards subjected to equalization and scored using the mean value of its histogram. Experimental results show that our method significantly improves the rate of correctly detected balloons and simultaneously decreases the number of false positives when compared to other methods.
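The scoring step mentioned above (equalization followed by the mean value of the histogram) can be sketched as follows. This is only an illustrative sketch, assuming 8-bit gray levels and a region given as a flat list of pixel values; the function names `histogram`, `equalize` and `region_score` are hypothetical, and the exact scoring rule and thresholds used by the method may differ.

```python
def histogram(pixels, levels=256):
    """Count how many pixels fall on each gray level."""
    counts = [0] * levels
    for p in pixels:
        counts[p] += 1
    return counts


def equalize(pixels, levels=256):
    """Classic histogram equalization of a flat list of gray values."""
    counts = histogram(pixels, levels)
    total = len(pixels)
    cdf, running = [], 0
    for c in counts:
        running += c
        cdf.append(running)
    cdf_min = next(v for v in cdf if v > 0)
    if total == cdf_min:
        # Single gray level: nothing to stretch.
        return list(pixels)
    scale = (levels - 1) / (total - cdf_min)
    return [round((cdf[p] - cdf_min) * scale) for p in pixels]


def region_score(pixels, levels=256):
    """Mean gray level of the equalized region's histogram."""
    counts = histogram(equalize(pixels, levels), levels)
    return sum(level * n for level, n in enumerate(counts)) / len(pixels)


# Toy data (assumed, not from the thesis): a mostly white region with a
# little dark text versus a region whose tones are spread evenly.
balloon_like = [255] * 90 + [0] * 10
ordinary = list(range(100))
print(region_score(balloon_like), region_score(ordinary))
```

In this toy example the balloon-like region scores higher than the evenly toned one, which is the kind of separation a histogram-based score relies on.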

2.1 Introduction

Textual information present in comic books has so far been out of reach of automated indexing because there was no reliable way to extract it from complex comic books. Most existing methods achieve good results on simple web comics or comic strips, but have much lower success rates on colored and complex comic book pages.

The method presented in this paper offers a possible approach to tackle this problem. It opens possibilities for information indexing, processing and retrieval similar to what already exists for standard electronic books, and is therefore also suited to tablets and smartphones. These devices typically have small screens, which makes reading comic books either cumbersome (having to zoom in on each portion of the page) or tiresome (straining the eyes to read the small text). With our method, applications only need to enhance or zoom in on the balloons, without loss of visual context and without preprocessing requirements at the time of publishing.

Current commercial solutions for digital comic books build on proprietary formats that incorporate textual information introduced manually at the time of publishing. This procedure increases file sizes, adds complexity and delays the time-to-market. With the method proposed here, the books can instead simply be packaged as a set of scanned images. This makes the transition to digital formats faster and thus also fosters easier adoption of such formats on handheld devices.

But correctly identifying the text balloons in a complex comic book page is neither easy nor straightforward. As others [18] have also recognized, text extraction from comic book images is harder than text extraction from traditional images like pictures or documents, because the noise level and the image texture of the drawings (e.g., the sense of depth is not as pronounced as in pictures) are especially problematic.

The vast disparity of possible shapes, colors and arrangements of the elements within a given comic book page makes the automated identification of text elements a very hard process. There is no technique or shape for the text elements that is always adhered to, and no assumptions can be made regarding positioning inside the panels or even regarding the shape of the panels themselves. To deal with this problem, the currently proposed solutions fall into three main categories, namely:

- edge based detection,

- blob (or region) based detection,

- connected component based detection.

Some other methods of text extraction from generic images, rather than methods specifically tailored for comic book images, have also been studied (see, for example, [24][25]), but they are less relevant to the problem at hand.

Edge based techniques rely on identifying the edges in the panels and balloons, including the edges of the text glyphs. [18] presented a generic method for separating text from imagery that, while not specifically designed for comic book images, still provides acceptable results for simple comic strip images. The method relies on the principle that edges of text glyphs are smaller than the edges or lines of the surrounding image, so that balloons are the regions in which the smaller edges are located.

[26] used the same principle of text size to enclose the text of each balloon inside a bounding box. For that, they used a Canny edge detector and other image filters to separate the text from the rest of the image.

The above methods produce a number of false positives, that is, a number of non-textual edges are mistaken for textual edges. For example, hair strands growing out of the skin can be mistaken for text glyphs in a comic book page. [27] presents a possible solution to this problem, which consists of a text extraction method based on the curvature of the edges. The accentuated curvature of text elements allows us to distinguish between small non-textual edges and small textual edges, and this could be adapted for extracting text from balloons in comic books.

Interestingly, all of those methods produce fairly good results for simple images such as those of web comics, but not for commercial comic books, in which the artwork is complex. They assume that the edges are well delineated, but that ideal is not met in complex images of commercial comic books, whose pages may be populated by many crossing lines and by balloons overlapping multiple panels and gutters. There is also no guarantee that the balloon boundary is completely enclosed and connected.

Following [28], our method makes use of a Sobel edge detector, which allows us to overcome the aforementioned problems. More specifically, the Sobel operator produces delineated edges even when such edges are not well defined in the original image. It achieves this by identifying gradient changes and enhancing the boundaries of regions in the image.
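For illustration, the gradient computation behind the Sobel operator can be sketched in a few lines. This is a minimal pure-Python sketch, assuming an 8-bit gray scale image stored as a list of rows; the function name `sobel_magnitude`, the untouched one-pixel border and the clamping to 255 are choices made for this example rather than details taken from the method described here.

```python
# Standard 3x3 Sobel kernels for horizontal (GX) and vertical (GY) gradients.
GX = [[-1, 0, 1],
      [-2, 0, 2],
      [-1, 0, 1]]
GY = [[-1, -2, -1],
      [0, 0, 0],
      [1, 2, 1]]


def sobel_magnitude(gray):
    """Return the gradient magnitude of a gray scale image (list of rows).

    Border pixels are left at 0 because the 3x3 window does not fit there.
    """
    h, w = len(gray), len(gray[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = gy = 0
            for j in range(3):
                for i in range(3):
                    p = gray[y + j - 1][x + i - 1]
                    gx += GX[j][i] * p
                    gy += GY[j][i] * p
            # Clamp so the result is again a valid 8-bit gray level.
            out[y][x] = min(255, int((gx * gx + gy * gy) ** 0.5))
    return out


# A vertical step from black to white: the response peaks along the step
# and is zero inside the flat areas.
step = [[0, 0, 0, 255, 255, 255] for _ in range(5)]
edges = sobel_magnitude(step)
print(edges[2])
```

Because the operator responds to gradient changes rather than to drawn lines only, a boundary between two flat color areas also shows up as an edge, which is the property the text relies on.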

On the other hand, blob detection methods depend on identifying shapes and detecting contrast between text and background. [21] follows an approach based on shape identification: the shapes in a dictionary of letters and symbols are compared against the binarized images in an attempt to locate the text elements. Brahma concludes that the method does not produce results adequate enough to replace traditional paper comics with electronic comic books. In [22], blob identification is also based on rejecting blobs according to their size, and this screening results in a list of potential text blobs. Once again, these methods produce good results on simple comic book images, but fail on comics with balloons that span multiple panels or on images with oddly shaped balloon elements.

Figure 2.1: Alpha Flight 6, January 1984, page 9 [1].

Finally, connected component based methods rely on the assumption of continuity between the outer edges of the image and every other colored pixel, with the exception of text pixels. The method works by tracking all pixels that differ from a background color (usually white) and eliminating them from the edge of the image towards the center. The downside is that, even for simple images, there is a very small probability of all those pixels being connected somehow, so it is not possible to distinguish between text pixels and background pixels. Empirical evidence suggests that, for complex and colored comic book pages, this probability is zero in practice. Even when a connected component based method is applied to simple web comic images, some line noise (i.e., left-over lines) may remain in discontinuity areas. [29] suggests going over the left-over areas and applying a width/height threshold to the bounding boxes of the remaining connected components in order to identify text areas. Guo et al. report that the result was acceptable for 80% of the samples considered, in spite of persistent left-over noise artifacts.

Figure 2.2: Comic book page components.
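The border-tracking idea behind connected component based methods can be sketched as follows. This is one possible reading of the description above, not code from any of the cited works: it assumes a binarized page where 1 marks ink and 0 marks background, uses 4-connectivity, and the function name `strip_border_connected` is hypothetical.

```python
from collections import deque


def strip_border_connected(binary):
    """Erase every ink pixel (1) that is connected to the image border.

    Whatever ink survives is isolated from the border and is therefore a
    text candidate under the continuity assumption.
    """
    h, w = len(binary), len(binary[0])
    out = [row[:] for row in binary]
    queue = deque()
    # Seed the search with all ink pixels lying on the image border.
    for y in range(h):
        for x in range(w):
            on_border = y in (0, h - 1) or x in (0, w - 1)
            if on_border and out[y][x] == 1:
                out[y][x] = 0
                queue.append((y, x))
    # Walk inwards, erasing ink reachable from the border.
    while queue:
        y, x = queue.popleft()
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w and out[ny][nx] == 1:
                out[ny][nx] = 0
                queue.append((ny, nx))
    return out


# Toy page: a frame with a small spur attached to it, plus two isolated
# ink pixels standing in for a text glyph.
page = [
    [1, 1, 1, 1, 1, 1, 1],
    [1, 0, 0, 0, 0, 0, 1],
    [1, 0, 1, 1, 0, 0, 1],
    [1, 0, 0, 0, 0, 1, 1],
    [1, 1, 1, 1, 1, 1, 1],
]
for row in strip_border_connected(page):
    print(row)
```

The sketch also shows why the approach is fragile: any drawing line that is not connected to the border survives as left-over noise, and any glyph that touches the artwork is erased with it.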

[30] tries to extract panels and text elements from comic pages by means of a connected component based method combined with histogram analysis to identify text areas. The method depends strongly on text orientation, because it compares the horizontal pixel density with the vertical pixel density to identify text regions. If the text is in any other orientation, the ratio of horizontal to vertical pixel density falls outside the pre-defined threshold.

In the context of image processing, [31] compares edge based methods and connected component methods on color images, and specifically tests those methods under different lighting conditions. Sushma and Padmaja conclude that edge based methods frequently produce more robust results and consider them superior to connected component methods, a conclusion that can also be extended to the problem at hand.

In short, those three types of approaches produce good results when they are applied to

simple images like web comics, but not to complex images such as comic book pages. Edge

detection methods depend heavily on the principle of perfect line separation, thus they fail

when there are many intersections and text overlap. Connected component methods produce

noise artifacts which are often confused with text glyphs during the OCR (optical character

recognition) processing. In respect to region detection based methods, one assumes that the

text glyphs are always of the same shape, which is not true, or the text glyphs have only

horizontal orientation and are not overlapped by other elements, which is not true either.

Despite that, each category of algorithms has interesting techniques, like histogram analysis

on blob-based methods or edge detection on edge-based methods.

Edge detection is indeed adequate for the type of images in comic books because the draw-

ings have defined edges between all color areas, either by having steep gradient changes,

easily detectable by derivative operators like Sobel, or by strong, thick, black lines separating
the elements in the drawing. This is a defining characteristic of the comic book medium.

Also, connected component techniques can be used to, within each defined area, identify

all the contained pixels, and in turn, be used to calculate the bounding boxes used in region

based detection.

The method presented in this paper ends up doing just that, taking the strong points of each of the

methods described and producing a more robust extractor of text balloons that works in

complex comic book images with a comparable, or even better, result rate when compared

to each of the other methods when applied to simpler images. It is also worth noting that the

method presented in this paper does not intend to extract the text (that is, perform OCR), but

rather correctly find the text-containing elements of the comic book page —the balloons.

This can be used as a first step for performing OCR, but can also be useful on its own

because, when viewing comics on small screen devices, it is often enough to simply zoom
in on the balloons rather than actually getting the text inside them. When OCR is actually
needed (for indexing, searching, etc.), the method presented here produces images that
contain just the balloons and have no image noise, which then permits straightforward
text extraction using readily available OCR processing engines.


2.2 Page Layout and Terminology

Since this method is aimed towards comic book pages, the elements of each one of these

pages are described here to allow a better understanding of the terminology used. For the

purpose of this document, a comic book page and a comic vignette (like a web comic strip)

are indistinguishable and interchangeable. Both contain the same basic elements and

as such, the method can be used on either without compromising the validity of the results.

A page in a comic book is composed of several distinct elements:

- Panel — This is the basic element of a comic book page. Usually, a panel contains a

single illustration or drawing. Commonly, but not always, a panel is delimited by a

clear boundary that separates it from other panels. Each panel represents a scene or

moment in the story. A page can contain one or more panels of diverse shapes.

- Gutter — This is the space between panels. On older comic books or on vignettes,

the gutter is usually white and clearly visible. On modern comic books, it is barely

present, but when it exists, it provides an empty space that separates each panel

from its adjacent panels. Artistic freedom exists in modern comic books, so some

art intrudes itself into the gutter, and even flows from one panel to another across it.

- Art — The art consists of the drawings, which usually are made inside panels. Nev-

ertheless, there is artistic leeway that allows for its extension beyond the limits of the

panels at the discretion of the artist. Drawings are used to convey the narrative and

setting of the story. All colored drawing areas have very well defined borders, either

through black contour lines or contrasting colors. In the case of black and white art,

the regions are separated by black contour lines or different shades of gray.

- Balloons — These elements are enclosed areas where text is written. They rep-

resent speech, thoughts or narrative data. They are usually placed inside panels,

but sometimes they extend outside those panels. Their shape is usually rounded

for speech balloons and thought balloons, and usually rectangular for narrative data

panels. Balloons usually have a tail that points towards the originator of the speech or

the thought in the drawing.

- Splash Balloons — The main difference in relation to other balloons is the shape

which is jagged. Usually, a splash balloon conveys an exclamation or dramatic text,


which is very common on Manga books (Japanese comic books).

Traditionally, balloons have a white background and black text. In fact, this is the only feature
common to all types of comic books. Some exceptions do exist, however, usually to lend a
different quality to the voice or thought process of the comic book character. This element
is created to be as clearly visible and legible as possible, as well as to draw the attention of
the reader, who must interpret it quickly and without ambiguity. On a more practical level, this

translates into text balloons that are brighter than ordinary regions in the page. Obviously,

this helps the reader to follow the sequence of the story.

Regarding the overall composition, comic books have such a diversity of styles that there

are no guarantees that all elements are present in a given book page. Also, no assumptions

can be made concerning the shape, size or disposition of those elements. In fact, some

comic books have whole pages with nothing in them as part of the narrative [1]. This formal

freedom can show up in other forms such as in the drawing of panels-within-panels-within-

panels or deliberately in the drawing of balloons that overlap.

Also, there are some highly distinctive styles between European, American and Japanese

comic books. Those are the predominant types of comic books existing today, and each

has specific ways of using the comic book elements. For example, American comic books

have practically no gutter between panels, and the art elements may flow from one panel to

another, while Japanese comics, or manga, have different panel layouts, with long, dramatic

panels covering the width or height of the pages occurring more often.

2.3 Balloon Extraction Algorithm

Our algorithm belongs to the category of edge based algorithms while at the same time using

histogram analysis common in blob-based algorithms. The balloon extraction algorithm

described here is essentially a two-stage algorithm. The first stage is an image segmentation

algorithm that divides each comic book page into regions. The second stage then selects,
among those regions, the ones that correspond to balloons.

The only assumption made by the algorithm is that the balloons in the images have a white
background (or a very bright background) and black text or, alternatively, they are composed

of a single background color and a single text color. In fact, only residual cases do not


Figure 2.3: (left) Original image. The Amazing Spider-Man 679, April 2012, story page 5; (right)

Gray scale image.


Figure 2.4: (left) Book page after applying the Sobel edge detector; (right) Negative of the Sobelized

book page.


follow this rule. This is because text balloons are designed to be highly legible, i.e.,
easy for the reader to locate. Naturally, there are a few special
cases that fall outside this definition, but we will see how well our algorithm applies to them
in the results section.

2.3.1 Page Segmentation

Let $B = \{P_i\}$, $i = 0, \ldots, n$, be a colored comic book with n pages. A comic book page is
composed of multiple closed regions, possibly with different colors, but separated by black
outlines. Text balloons are particular cases of closed regions and are therefore also delimited
by black outlines. The segmentation of each comic book page $P_i$ into regions involves
the following steps:

1. Convert $P_i$ to a grayscale page $P_i^G$.

2. Apply the Sobel operator to $P_i^G$ to get $P_i^S$.

3. Compute the negative page $P_i^N$ of $P_i^S$.

4. Apply an enhanced flood fill operator to all black pixels of $P_i^N$ to obtain all page regions.

By using a Sobel-based edge detector together with a flood fill operator, all regions in a

comic book page can be thus separated and extracted.

2.3.1.1 Conversion to gray scale

To ensure that this method can be applied to color images and also to black and white

images, the first step is to apply an operator that converts color images into grayscale images.

Since the overall goal is to be able to detect the brightest regions (i.e., balloons) in the

image, the conversion to a gray scale representation can be achieved by calculating the

luminance of each pixel from its RGB components.

Luminance is a measure of the brightness of a given pixel when adjusted for human vision.

The human eye can detect brightness variations better than color variations [20], so the


conversion should respect this feature. One way to achieve such a conversion is to utilize

the RGB-to-YCrCb conversion as follows [32]:

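\[
\begin{bmatrix} Y \\ C_b \\ C_r \end{bmatrix}
=
\begin{bmatrix}
0.2989 & 0.5866 & 0.1145 \\
-0.1687 & -0.3312 & 0.5000 \\
0.5000 & -0.4183 & -0.0816
\end{bmatrix}
\begin{bmatrix} R \\ G \\ B \end{bmatrix}.
\tag{2.1}
\]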

This allows us, in a simple manner, to obtain color information in terms of luminance and

chrominance (YCrCb), rather than just color (RGB). The luminance channel (Y) is simply

a grayscale representation of the original image, such that brighter colors are transformed

into brighter grays. The other channels (Cb and Cr) can be ignored for the purpose of our

algorithm, so they do not have to be calculated. They represent the chrominance of blues and

reds. Obviously, if the original book pages already are in black and white, or in grayscale,

this conversion step can be skipped.
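As an illustration only, the luminance row of (2.1) can be sketched in a few lines of Python. The function names and the representation of a page as nested lists of (R, G, B) tuples are assumptions made for this sketch, not part of the original implementation.

```python
# Weights taken from the first row of the matrix in (2.1).
Y_WEIGHTS = (0.2989, 0.5866, 0.1145)


def luminance(r, g, b):
    """Return the luminance Y of one RGB pixel (components in 0..255)."""
    wr, wg, wb = Y_WEIGHTS
    return wr * r + wg * g + wb * b


def to_grayscale(page):
    """Convert a page given as rows of (R, G, B) tuples into rows of Y values.

    The nested-list page layout is a hypothetical choice for this sketch.
    """
    return [[luminance(*pixel) for pixel in row] for row in page]
```

Only the Y channel is computed here, since Cb and Cr are not needed by the algorithm.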

It is also clear that color space representations other than YCrCb might be used,
as long as they include a channel for luminance [33]. We could even use distinct weights
for the values of R, G, and B in the computation of luminance, but, as explained further
ahead, this may lead to a need for applying histogram equalization to page regions. The
important thing to retain here is that the computation of luminance should allow us to mimic
the way human vision delineates contours of regions, so that edges extracted by
the algorithm correspond to edges identified by the human eye.

Therefore, the grayscale conversion step is crucial to enable the Sobel operator to detect
intensity changes (edges) more easily, and also to ensure that the detected edges are closer to
what a human would perceive as a border when looking at the image.

2.3.1.2 Sobel-based edge detection

From the grayscale page image, the edges must be identified and enhanced for further

processing. In image processing, an edge is a group of pixels that have a significant dif-

ference of brightness in relation to neighbouring pixels. This can be seen as a sudden

change (or discontinuity) when looking at the color variation on a given area. These changes

exist between regions of an image, or between foreground objects and background objects

[28][20].


We used the Sobel operator for edge detection, since it has proved itself perfectly adequate,

not to speak of its simplicity, to accentuate shape contours. In fact, in contrast with photo-

graphic images, art lines drawn in comic books are continuous, which guarantees that the

edges found in book pages are precise, in particular those concerning boundaries found in

the page image. This also means that there is no need for any kind of image pre-processing

using a sharpening filter to better delineate contours in images.

The Sobel operator convolves two 3×3 kernels with the original image in order to compute

approximations to the derivatives in both x-direction and y-direction, which are sensitive to

vertical and horizontal intensity changes, respectively [20]. Thus, this operator slides each
kernel over the book page, computing at each pixel the sum of the element-wise products of
the kernel and the 3×3 neighborhood of that pixel. When compared to other edge detection
operators, the Sobel operator produces thicker edges [28], which is desirable for this type of
images because it eases the detection of region contours, as needed when performing region
extraction through the flood fill algorithm.

Also, this operator is fast to traverse the complete image because it only requires knowledge
of the 3×3 neighborhood of each page pixel. As a side note, we
also tested a Laplacian operator, which is even faster than the Sobel operator, to detect
and enhance edges in the image, but it produced aliasing artifacts on the edges, causing
continuity problems across page regions. More specifically, some regions become connected
when they are in fact separate in the page.
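The sliding of the two 3×3 kernels can be sketched as follows. This is a minimal pure-Python illustration, not the implementation used in the experiments: the function name, the use of the gradient magnitude to combine both derivatives, the clamping to 255, and leaving border pixels at 0 are all assumptions. The sketch yields bright edges on a dark background, so rendering edges as dark lines, as described in the text, would be a further display choice.

```python
import math

# Standard Sobel kernels for the x- and y-derivatives.
SOBEL_X = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))
SOBEL_Y = ((-1, -2, -1), (0, 0, 0), (1, 2, 1))


def sobel_magnitude(gray):
    """Return the gradient magnitude of a grayscale image (rows of 0..255 values).

    Each kernel is applied without flipping (a correlation); for the Sobel
    kernels this only changes the sign of the derivative, not the magnitude.
    Border pixels are left at 0 and the result is clamped to 0..255.
    """
    height, width = len(gray), len(gray[0])
    out = [[0] * width for _ in range(height)]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = gy = 0
            for j in range(3):
                for i in range(3):
                    value = gray[y + j - 1][x + i - 1]
                    gx += SOBEL_X[j][i] * value
                    gy += SOBEL_Y[j][i] * value
            out[y][x] = min(255, int(math.hypot(gx, gy)))
    return out
```

On a sharp vertical step, both pixels adjacent to the step respond, which illustrates the thicker edges mentioned above.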

2.3.1.3 Negative pages

As shown above, edge detection tries to match the way the human eye perceives colors
and brightness, so each book page is first transformed to a grayscale representation
immediately before applying the Sobel operator, which increases the contrast of the edges,
rendering them as black lines on a white background. To facilitate processing, one calculates
the negative of each book page, resulting in white edges on a black background (right-hand side of
Fig. 2.4).
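Computing the negative is a per-pixel operation. A minimal sketch, assuming 8-bit grayscale values stored as nested lists (the function name is hypothetical):

```python
def negative(gray):
    """Return the negative of an 8-bit grayscale image given as rows of 0..255 values."""
    return [[255 - value for value in row] for row in gray]
```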


2.3.1.4 Flood fill and region extraction

Once its negative has been computed, each book page can be separated into closed regions. Each region is a

set of pixels from the original book page delimited by a number of edges (white pixels in

the negative book page).

Careful analysis revealed that the letters inside text balloons are also identified as edges.

This is the expected behavior of an edge detection operator, since the text letters can be

considered as discontinuities (edges) in the image. However, this fact is detrimental to a proper
extraction of the balloon regions, because extracting the regions as they are would yield
balloons without the letters; in other words, we would obtain balloon regions
with holes left by the letters, as illustrated on the left-hand side of Fig. 2.5. Therefore, the flood
fill must be adapted to overcome this problem.

Recall that a standard filling algorithm inundates the basin of a region, in a progressive
manner, as long as its frontier is not reached by the water. A letter inside a balloon region works
as a barrier to the progression of the water. Consequently, flooding a balloon region gives
rise to a region with holes. Filling in these holes can be accomplished by first identifying
the leftmost pixel (see red pixels in the right hand side of Fig. 2.5) and the rightmost pixel
(see green pixels in the left hand side of Fig. 2.5) of each row of the flooded region, and then
copying all the pixels between them in the corresponding row of the original colored book page onto a colored
balloon region. Thus, at the end of this stage, the extracted regions of a book page are all
colored regions.
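The modified flood fill can be sketched as below. This is a hedged illustration rather than the original code: it assumes the negative page has been binarized so that edge pixels are 255 and everything else is 0, it uses 4-connectivity, and all function names are hypothetical.

```python
from collections import deque


def flood_region(negative, seed):
    """Standard 4-connected flood fill over black (0) pixels, starting at seed (x, y)."""
    height, width = len(negative), len(negative[0])
    sx, sy = seed
    if negative[sy][sx] != 0:
        return set()
    filled = {(sx, sy)}
    queue = deque([(sx, sy)])
    while queue:
        x, y = queue.popleft()
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if (0 <= nx < width and 0 <= ny < height
                    and (nx, ny) not in filled and negative[ny][nx] == 0):
                filled.add((nx, ny))
                queue.append((nx, ny))
    return filled


def row_spans(filled):
    """Map each row index to the (leftmost, rightmost) x of the flooded pixels."""
    spans = {}
    for x, y in filled:
        lo, hi = spans.get(y, (x, x))
        spans[y] = (min(lo, x), max(hi, x))
    return spans


def extract_region(original, negative, seed):
    """Copy, row by row, the original pixels between the leftmost and rightmost
    flooded pixel, so that holes left by letters are filled in."""
    spans = row_spans(flood_region(negative, seed))
    return {y: original[y][lo:hi + 1] for y, (lo, hi) in spans.items()}
```

Note that copying whole row spans also fills any concavity of the region outline, which is a property of this sketch and may or may not match the original implementation.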

2.3.2 Balloon extraction

The page segmentation described above has produced a set $\{R_j\}$, $j = 0, \ldots, N$, of regions
for each book page, some of which are text balloons. The question then is how
to single out the regions that are text balloons, which corresponds to the second stage of our
algorithm, i.e., balloon extraction, and consists of the following steps:

1. Discard either too large or too small regions.

2. Score the remaining regions (mid-sized regions) with respect to their grayscale his-

tograms.


Figure 2.5: Flood fill method comparison. (left) standard flood fill extraction, (right) modified flood

fill extraction.


3. Sort mid-sized regions according to their weighted average luminance.

4. Select the balloon regions using an empirical threshold based on the weighted average
luminance.

2.3.2.1 Region culling

Having extracted all the regions present in the image, it is first necessary to cull a
significant number of regions in order to increase the overall performance of the algorithm.
During testing, it was found that most comic book pages had between 500 and 1000
separate regions; for example, the book page shown in Figure 2.3 contains 661 regions.
Moreover, most of them were too small (just a couple of pixels) or too large (in height or
width) to be considered balloon regions. The very small regions usually concern art
details or single letters, which are irrelevant to the outcome of the algorithm and thus can be
discarded straight away. As for the regions that are too large, we noted that they usually represent
the gutter or extensive background segments, but not balloon regions, so they should also
be discarded altogether.

| Criterion | Too small | Too large |
| --- | --- | --- |
| Relative region width | < 1.5% | > 50% |
| Relative region height | < 1.5% | > 50% |
| Relative region area | < 1% | > 20% |

Table 2.1: Criteria for size-based region culling.

Therefore, as shown in Table 2.1, culling of regions is based on one of the following criteria:

- Relative region width;

- Relative region height;

- Relative region area.

If the region width is less than 1.5% or is greater than 50% relative to page width, it is

then considered as discardable and thus excluded from the remaining set of regions. These

culling percentages also apply to the relative height of each region. Additionally, any region

whose area is smaller than 1% or larger than 20% relative to page area is discarded. This


usually removes 90% of the extracted regions, making the next step of the algorithm even

faster.
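The culling criteria of Table 2.1 can be expressed as a single predicate. The sketch below assumes that a region is discarded as soon as any one criterion is violated, that width and height refer to the bounding box of the region, and that the area is the pixel count of the region; these readings, like the function name, are assumptions.

```python
def keep_region(bbox_width, bbox_height, area, page_width, page_height):
    """Return True if a region survives the size-based culling of Table 2.1."""
    rel_width = bbox_width / page_width
    rel_height = bbox_height / page_height
    rel_area = area / (page_width * page_height)
    return (0.015 <= rel_width <= 0.50
            and 0.015 <= rel_height <= 0.50
            and 0.01 <= rel_area <= 0.20)
```

For a 1024x1590 page, for instance, a 200x150 region covering 30,000 pixels is kept, while a 5x5 speck or a region spanning 600 pixels in width is discarded.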

2.3.2.2 Region scoring

Before proceeding any further, it may be convenient to remember that we still have about
10% of the negative regions, which need to be ranked in some way, since many of them
will also be dropped. For this purpose, we first extract the color regions from the original
colored book page that are homologous to those negative regions, using a pixel-wise
copy operation.

Afterwards, we convert these color regions to grayscale regions because we intuitively
know that brighter regions correspond to balloon regions. One way of ranking the remaining
regions, and identifying the ones most likely to be balloons, is to generate a histogram for each
region, taking into consideration the overall luminance of the region.

Obviously, the luminance of a region depends on the luminance of each one of its pixels,
which is given by the Y value in (2.1). Note that $Y \in [0,255] \subset \mathbb{R}$, so after calculating the
Y-value of every pixel $p_k$ of a given region with N pixels, we have to map it into the discrete
grayscale of the histogram as follows:

\[ G(p_k) = \operatorname{round}(Y(p_k)) \tag{2.2} \]

so that the corresponding histogram bin scores one more point; for example, if $G(p_k) = 231$,
bin 231 is incremented by 1, that is,

\[ h(231) = h(231) + 1, \tag{2.3} \]

where $h(i)$, $i \in [0,255] \subset \mathbb{N}$, denotes the i-th bin of the histogram.
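Equations (2.2) and (2.3) amount to the following histogram construction. This is a minimal sketch with a hypothetical function name; the clamping to 0..255 is an added safeguard. Python's built-in `round` rounds exact halves to the nearest even integer, which may differ from the rounding rule used in the original implementation.

```python
def region_histogram(luminances):
    """Build the 256-bin grayscale histogram of a region from its pixel Y values."""
    h = [0] * 256
    for y in luminances:
        g = min(255, max(0, round(y)))  # G(p_k) = round(Y(p_k)), as in (2.2)
        h[g] += 1                       # bin update, as in (2.3)
    return h
```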

2.3.2.3 Region sorting

So, taking into account that we are dealing once again with grayscale regions, the histogram

of each region has a range of values G ∈ [0,255] ⊂ N (dark – white). It is then clear that a

histogram of a brighter region (i.e., a balloon region) has a peak closer to its right hand side


Figure 2.6: (top) Typical histogram of a balloon region; (bottom) histogram of an ordinary (non-

balloon) region.

(or white) (Fig. 2.6 Top). Thus, a grayscale region with N pixels can be ranked according

to the weighted average luminance:

\[ L = \frac{\sum_{i=0}^{255} i \times h(i)}{N} \tag{2.4} \]

Since the histogram range values vary from 0 (black) to 255 (white), regions with a higher

percentage of white will score more, and darker regions will score less (Fig. 2.6 Bottom).

Thus, the higher scoring regions are typically the balloons in the image.

The regions whose histograms are shown in Fig. 2.6 represent a balloon (top), with L = 254.00,
and an ordinary region (bottom), with L = 84.25.
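The score in (2.4) can be computed directly from a 256-bin histogram. The sketch below uses a hypothetical function name and takes N as the sum of the bin counts, which equals the number of pixels in the region.

```python
def weighted_average_luminance(hist):
    """Return L = sum(i * h(i)) / N for a 256-bin histogram, as in (2.4)."""
    n = sum(hist)
    if n == 0:
        raise ValueError("empty region: histogram has no pixels")
    return sum(i * count for i, count in enumerate(hist)) / n
```

As a worked example, a region with 90 pixels at gray level 255 and 10 pixels at level 0 scores (90 × 255) / 100 = 229.5.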

2.3.2.4 Region filtering

Experimental results showed that the weighted average luminance L = 247 is the threshold

above which a region is considered as a balloon; otherwise, the region will be discarded.

Note that the white band [247,255] represents only 3% of the histogram range. For example,


Figure 2.7: Balloon regions extracted from the book page shown in Fig. 2.3. Although not visible

here, the regions have a balloon shape indeed.

the regions shown in Fig. 2.7 concern the text balloons that were filtered in this manner from

the book page depicted in Fig. 2.3.
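Sorting and thresholding can then be sketched as follows. The dictionary of scores keyed by a region identifier and the function name are hypothetical, and the strict comparison reflects one reading of "above which".

```python
BALLOON_THRESHOLD = 247  # empirical threshold reported in the text


def select_balloons(scores, threshold=BALLOON_THRESHOLD):
    """Return the ids of regions whose score L exceeds the threshold, brightest first.

    scores maps a region id to its weighted average luminance L.
    """
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [region_id for region_id, score in ranked if score > threshold]
```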

Interestingly, when the ranking produces no results above the threshold L = 247, a different

method can be used to identify the correct regions, specifically, color counting. Using the

histograms already created for each region, score each region by counting the number of

spikes in the histogram. Any region with exactly two spikes is a candidate balloon region

(foreground and background colors). Image artifacts like aliasing, blurring or sharpening

can cause the number of spikes to change, so care should be taken not to use other image

processing filters before creating the histogram. This alternative method deals with almost


all cases where balloons are not the brightest areas on the page.
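The text does not define precisely what counts as a spike, so the sketch below adopts one possible definition as an assumption: a spike is a run of consecutive bins each holding at least a given share of the region's pixels. The `min_share` parameter and the function name are hypothetical.

```python
def count_spikes(hist, min_share=0.05):
    """Count runs of consecutive bins that each hold at least min_share of the pixels."""
    n = sum(hist)
    if n == 0:
        return 0
    spikes = 0
    in_spike = False
    for count in hist:
        above = count / n >= min_share
        if above and not in_spike:
            spikes += 1  # a new run of dominant bins starts here
        in_spike = above
    return spikes
```

Under this definition, a region would be a candidate balloon when `count_spikes(hist) == 2`, and a single-color region yields 1.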

Besides, it may happen that, with poor quality scanned images or scans of old, yellowed
paper, the original image is too dark or has very low contrast. All the balloons
would then fall below the threshold for balloon detection. In this case, an equalization of the
histogram will solve this problem, by normalizing the color representation values for the
purpose of scoring only.
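The text does not specify which equalization variant is used. As an assumption, the sketch below applies the textbook method based on the cumulative distribution of the histogram, remapping gray levels for scoring purposes only; the function names are hypothetical.

```python
def equalization_lut(hist):
    """Return a 256-entry lookup table mapping each gray level to its equalized level."""
    n = sum(hist)
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)
    cdf_min = next((c for c in cdf if c > 0), 0)
    if n == cdf_min:
        # Empty histogram or a single gray level: nothing to spread out.
        return list(range(256))
    return [max(0, round((c - cdf_min) * 255 / (n - cdf_min))) for c in cdf]


def equalize_histogram(hist):
    """Return the histogram obtained after remapping gray levels with the lookup table."""
    lut = equalization_lut(hist)
    out = [0] * 256
    for level, count in enumerate(hist):
        out[lut[level]] += count
    return out
```

For a dark scan whose paper sits at gray level 180 and whose text sits at level 40, the paper is remapped to 255 and the text to 0, so the region score moves towards the white end of the histogram.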

2.4 Experimental Results

The tests of our algorithm were performed using a PC powered by an Intel Core i7 860
processor, 2.8 GHz clock, with 8 GB RAM and one ATI 5670, running the Windows 7 (64
bit version) operating system.

To test and demonstrate the validity of the method presented, a test run with 12 comic books,
from different publishers and with different artistic styles, was used. The comic books
were scanned as lossless PNG files, with resolution 1024x1590, 150 DPI, and only the first
10 story pages per book were used, which makes a total of 120 pages of comics analysed.
Note that the page numbers in Tables 2.2-2.13 are relative to the actual page number of the
book’s story. For example, page number 1 might correspond, in a particular book, to
page 4, discounting covers, publicity pages, title or credits pages, since those have no text
balloons and are not actually comic book images. Also, when we have two consecutive
pages, each showing a half of a larger image (i.e., a page spread), they count as only one,
since that is actually what they represent. This was done purely for convenience; had
it been done differently, it would not have affected the result. As a side note, the
average processing time of each book page was 1 to 2 seconds.

The balloons per page were counted manually and this number was compared with the

number of balloons correctly identified by our method. False positives, as well as missed

balloons, were also counted. Recall that a false positive is an image region that the method

presents as being a balloon containing text but in reality is not, and a missed balloon is

a balloon existing in the image but not identified as such. Those two cases are presented

separately because they represent the two possible points-of-failure (resulting usually from

either under or over-tuning of the thresholding parameter).[2]


Table 2.2: Extraction results for [2]

Batman 670

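| Page | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | Total | Percentage (%) | Success (%) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Detections | 6 | 6 | 8 | 10 | 4 | 9 | 6 | 8 | 6 | 7 | 70 | | |
| Actual Balloons | 6 | 6 | 7 | 6 | 4 | 7 | 7 | 6 | 6 | 7 | 62 | | |
| False Positives | 0 | 0 | 1 | 4 | 0 | 2 | 0 | 2 | 0 | 0 | 9 | 12.86 | 87.14 |
| Missed Balloons | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 1.43 | 98.57 |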

Table 2.3: Extraction results for [3]

Batman 671

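| Page | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | Total | Percentage (%) | Success (%) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Detections | 1 | 4 | 4 | 6 | 23 | 1 | 3 | 6 | 8 | 6 | 62 | | |
| Actual Balloons | 1 | 5 | 4 | 6 | 8 | 4 | 4 | 4 | 8 | 6 | 50 | | |
| False Positives | 0 | 0 | 0 | 0 | 15 | 0 | 1 | 2 | 0 | 1 | 19 | 30.65 | 69.35 |
| Missed Balloons | 0 | 1 | 0 | 0 | 0 | 3 | 2 | 0 | 0 | 1 | 7 | 11.29 | 88.71 |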

Table 2.4: Extraction results for [4]

Batman 672

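| Page | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | Total | Percentage (%) | Success (%) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Detections | 1 | 8 | 6 | 4 | 8 | 9 | 7 | 5 | 9 | 4 | 61 | | |
| Actual Balloons | 1 | 8 | 6 | 4 | 11 | 9 | 7 | 5 | 9 | 4 | 64 | | |
| False Positives | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1.64 | 98.36 |
| Missed Balloons | 1 | 0 | 0 | 0 | 3 | 0 | 0 | 0 | 0 | 0 | 4 | 6.56 | 93.44 |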

Table 2.5: Extraction results for [5]

Batman 673

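| Page | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | Total | Percentage (%) | Success (%) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Detections | 2 | 1 | 2 | 1 | 5 | 3 | 8 | 8 | 6 | 4 | 40 | | |
| Actual Balloons | 2 | 1 | 2 | 0 | 2 | 2 | 8 | 8 | 6 | 4 | 35 | | |
| False Positives | 0 | 0 | 0 | 1 | 3 | 1 | 0 | 0 | 0 | 0 | 5 | 12.50 | 87.50 |
| Missed Balloons | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.00 | 100.00 |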


Table 2.6: Extraction results for [6]

Batman 674

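| Page | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | Total | Percentage (%) | Success (%) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Detections | 4 | 16 | 4 | 6 | 3 | 5 | 9 | 11 | 2 | 3 | 63 | | |
| Actual Balloons | 4 | 5 | 4 | 6 | 3 | 5 | 9 | 11 | 2 | 3 | 52 | | |
| False Positives | 0 | 11 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 11 | 17.46 | 82.54 |
| Missed Balloons | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.00 | 100.00 |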

Table 2.7: Extraction results for [7]

Amazing Spider-Man 643

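| Page | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | Total | Percentage (%) | Success (%) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Detections | 8 | 12 | 9 | 5 | 8 | 7 | 4 | 19 | 10 | 7 | 89 | | |
| Actual Balloons | 7 | 12 | 9 | 4 | 8 | 6 | 4 | 12 | 10 | 5 | 77 | | |
| False Positives | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 7 | 0 | 2 | 12 | 13.48 | 86.52 |
| Missed Balloons | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.00 | 100.00 |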

Table 2.8: Extraction results for [8]

Amazing Spider-Man 644

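| Page | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | Total | Percentage (%) | Success (%) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Detections | 10 | 8 | 10 | 11 | 8 | 10 | 9 | 11 | 7 | 15 | 99 | | |
| Actual Balloons | 8 | 7 | 10 | 10 | 0 | 9 | 8 | 9 | 7 | 13 | 81 | | |
| False Positives | 2 | 1 | 0 | 1 | 8 | 1 | 1 | 2 | 0 | 2 | 18 | 18.18 | 81.82 |
| Missed Balloons | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.00 | 100.00 |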


Table 2.9: Extraction results for [9]

Amazing Spider-Man 645

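| Page | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | Total | Percentage (%) | Success (%) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Detections | 8 | 12 | 7 | 9 | 7 | 8 | 15 | 10 | 11 | 10 | 97 | | |
| Actual Balloons | 8 | 10 | 7 | 9 | 6 | 8 | 15 | 7 | 11 | 10 | 91 | | |
| False Positives | 0 | 3 | 0 | 0 | 1 | 0 | 0 | 3 | 0 | 1 | 8 | 8.25 | 91.75 |
| Missed Balloons | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 2 | 2.06 | 97.94 |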

Table 2.10: Extraction results for [10]

Amazing Spider-Man 646

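| Page | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | Total | Percentage (%) | Success (%) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Detections | 4 | 4 | 9 | 9 | 2 | 6 | 7 | 5 | 7 | 8 | 61 | | |
| Actual Balloons | 6 | 4 | 8 | 9 | 2 | 7 | 8 | 1 | 7 | 8 | 60 | | |
| False Positives | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 4 | 0 | 0 | 5 | 8.20 | 91.80 |
| Missed Balloons | 2 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 4 | 6.56 | 93.44 |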

Table 2.11: Extraction results for [11]

Amazing Spider-Man 647

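| Page | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | Total | Percentage (%) | Success (%) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Detections | 7 | 2 | 10 | 13 | 12 | 8 | 8 | 8 | 4 | 4 | 76 | | |
| Actual Balloons | 8 | 2 | 10 | 13 | 13 | 7 | 9 | 7 | 4 | 3 | 76 | | |
| False Positives | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 3 | 3.95 | 96.05 |
| Missed Balloons | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 3 | 3.95 | 96.05 |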


Table 2.12: Extraction results for [12]

Amazing Spider-Man 648

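| Page | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | Total | Percentage (%) | Success (%) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Detections | 7 | 4 | 10 | 16 | 15 | 12 | 14 | 12 | 19 | 11 | 120 | | |
| Actual Balloons | 7 | 4 | 9 | 11 | 13 | 12 | 14 | 12 | 16 | 11 | 109 | | |
| False Positives | 0 | 0 | 1 | 5 | 3 | 0 | 0 | 1 | 3 | 0 | 13 | 10.83 | 89.17 |
| Missed Balloons | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 2 | 1.67 | 98.33 |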

Table 2.13: Extraction results for [13]

Amazing Spider-Man 649

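| Page | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | Total | Percentage (%) | Success (%) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Detections | 10 | 14 | 16 | 6 | 9 | 12 | 5 | 14 | 15 | 3 | 104 | | |
| Actual Balloons | 9 | 11 | 16 | 6 | 7 | 11 | 4 | 8 | 14 | 3 | 89 | | |
| False Positives | 2 | 3 | 1 | 0 | 2 | 1 | 1 | 4 | 1 | 0 | 15 | 14.42 | 85.58 |
| Missed Balloons | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 1.92 | 98.08 |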

2.4.1 Analysis of results

Tables 2.2-2.13 show the results for 12 comic books (our bookset), one table per book. More

specifically, as mentioned above, each table presents test results for the first 10 pages of a

single book.

Pages with no false positives and no missed balloons are classified as having optimal results. This happens in 62 out of 120 pages in the bookset, which is a very high value given the complexity of the pages. It means that balloon extraction is optimal, that is, completely automatic, in more than 50% of the book pages.

Also, only 25 out of the 846 balloons in the bookset were missed by the algorithm, which represents about 2.96% of the total number of balloons.

Taking into consideration that each book page has a minimum of approximately 500 regions, we end up processing at least 5,000 regions for the first 10 pages of each book, for a total of about 60,000 regions across the 12 books. By inspection of Tables 2.2–2.13, we note that the algorithm produced only 119 false positives, which represents about 0.2% of the total number of regions.
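The bookset-level figures quoted above follow from simple ratios, reproduced in the short sketch below. Because 500 regions per page is an approximate lower bound, the resulting false-positive rate per region is best read as an upper bound. The variable names are illustrative only.

```python
PAGES_PER_BOOK = 10
BOOKS = 12
MIN_REGIONS_PER_PAGE = 500  # approximate lower bound quoted in the text

total_pages = PAGES_PER_BOOK * BOOKS
total_regions = MIN_REGIONS_PER_PAGE * total_pages

# Counts reported for the whole bookset.
false_positives = 119
missed_balloons = 25
actual_balloons = 846
optimal_pages = 62

false_positive_rate = 100.0 * false_positives / total_regions   # % of regions
missed_rate = 100.0 * missed_balloons / actual_balloons         # % of balloons
optimal_page_rate = 100.0 * optimal_pages / total_pages         # % of pages

print(total_regions, false_positive_rate, missed_rate, optimal_page_rate)
```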

The one page with a significant number of false positives is page 5 of Batman 671, which has 15 false positives. This happens because page 5 depicts snow, which in some circumstances can easily be mistaken for a balloon region, since both are white. In fact, all false positives are caused by very bright, single-color regions that are about the same size as balloons. Such false positives could be avoided by counting the peaks in the histogram, since every false-positive region has only one color. Our false positives are visually similar to the false positives described in [30].
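The peak-counting idea can be sketched as follows. This is not part of the evaluated algorithm: it assumes a region is given as a flat list of 8-bit gray levels, and the bin count and the `min_fraction` threshold are arbitrary values chosen for illustration. All function names are hypothetical.

```python
def gray_histogram(pixels, bins=16):
    """Histogram of 8-bit gray levels (0-255) over `bins` equal-width bins."""
    width = 256 // bins
    hist = [0] * bins
    for value in pixels:
        hist[min(value // width, bins - 1)] += 1
    return hist


def count_peaks(hist, min_fraction=0.05):
    """Count local maxima that hold at least `min_fraction` of all pixels."""
    total = sum(hist)
    if total == 0:
        return 0
    threshold = min_fraction * total
    peaks = 0
    for i, count in enumerate(hist):
        left = hist[i - 1] if i > 0 else 0
        right = hist[i + 1] if i + 1 < len(hist) else 0
        # Strict on the left so two equal neighbouring bins count only once.
        if count >= threshold and count > left and count >= right:
            peaks += 1
    return peaks


def is_single_color(pixels):
    """True when the region's histogram has at most one significant peak."""
    return count_peaks(gray_histogram(pixels)) <= 1


snow_patch = [250] * 1000               # uniform bright region, e.g. snow
balloon = [250] * 900 + [10] * 100      # bright background plus dark lettering
print(is_single_color(snow_patch), is_single_color(balloon))
```

A uniform bright patch yields a single peak and would be rejected, while a balloon with dark lettering yields a second peak at the dark end of the histogram.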

2.4.2 Comparison to other algorithms

Other authors have tackled the problem of extracting text balloons from comics (see, for example, [34] and [22]). However, they used simple comics, such as web comics and flat-color books (e.g., Asterix, Lucky Luke or Garfield), rather than more complex comics like those published by Marvel or DC, such as Batman or Spider-Man. With respect to simple comics, we can say that our algorithm does not fail, that is, it produces neither false positives nor missed balloons.

Moreover, because this algorithm is concerned only with the balloons themselves and not with the text inside them, it can detect balloons that contain text in any orientation, direction, alphabet (Cyrillic, Arabic, Chinese, etc.), font face or color. During testing, our algorithm successfully detected balloons with special characteristics, like those shown in Fig. 2.8.

All of the balloons depicted in Fig. 2.8 cause problems for other methods. Differently colored text inside the balloons would lead to false positive detections, wavy text would defeat horizontal text searches, and different font faces with disconnected letters would cause other methods to fail.

2.4.3 OCR

This algorithm does not attempt to perform OCR on the detected balloons, because that is not always the goal of balloon extraction and because there are readily available solutions to detect text in images, provided that those images contain only text, as is the case for the balloons produced by this algorithm.

Figure 2.8: Special case balloons that were successfully recognized: balloons with text colors different from those of other balloons, different text colors inside the same balloon, shapes different from standard balloons, wavy text, and different font faces in the same balloon.

On the other hand, when optical character recognition of the text in the extracted balloon images is necessary, we can use any traditional OCR engine that supports the font types used in comic books or, simply, train the OCR engine to recognize those font types. Traditionally, the font types used in American and European comic books belong to the ComicScript and Comicraft families. Those are the font types of the textual elements, not those of onomatopoeia or of text elements drawn into the image (like street signs, billboards, etc.).

During testing with Google’s Tesseract OCR engine, we obtained better results by enlarging the extracted balloons by a factor of 4 than by simply using the regions of the original image. This facilitated the correct recognition of individual letters that appear connected in the original book page.
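The enlargement step can be illustrated with a small sketch. The text does not say which interpolation was used, so nearest-neighbour replication is assumed here purely for simplicity; a real pipeline would more likely rely on an imaging library's smoother resampling before handing the image to the OCR engine. The image is modelled as a list of rows of gray levels, the function name is hypothetical, and the OCR call itself is left out.

```python
def upscale_nearest(image, factor=4):
    """Enlarge a 2D image (list of rows) by an integer factor.

    Each pixel is replicated `factor` times horizontally and each row
    `factor` times vertically (nearest-neighbour scaling).
    """
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    enlarged = []
    for row in image:
        wide_row = [pixel for pixel in row for _ in range(factor)]
        # Copy the widened row so the output rows are independent lists.
        enlarged.extend(list(wide_row) for _ in range(factor))
    return enlarged


# A tiny 2x3 "balloon crop": 255 is background, 0 is lettering.
crop = [[255, 0, 255],
        [255, 0, 0]]
big = upscale_nearest(crop, factor=4)
print(len(big), len(big[0]))
```

With a factor of 4, a crop of height h and width w becomes an image of height 4h and width 4w, which gives the OCR engine more pixels per glyph.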

2.5 Final Remarks

The method presented offers better results for complex comics than other methods, and does so with lower processing requirements. The results show that, for many pages, it is optimal, in the sense that it produces no false positives and correctly detects all balloons. It does, however, fail for specific pages with bright areas or balloons with uncommon background colors, like black or other dark colors. These are corner cases, and do not represent any significant portion of existing comic books. It should be noted that no other existing method can deal with such corner cases successfully either. Future work will address finding those balloons and eliminating the false positives that result from small bright areas in the image.

Chapter 3

Conclusions

The results achieved with the algorithm presented allow us to conclude that balloon extraction from comic book pages can be performed in an automated fashion without requiring intense CPU power. Those results show that this algorithm is both reliable and comprehensive in scope, in that it is not tailored to any specific style of comic book, but is generic enough to be applied to any comic book without compromising the results. It is true that there are still some corner cases that make total automation, for all comic books, inadvisable, but it is also observable from the results that this algorithm produces better and more consistent results than other methods.

Experimental results show that optimal cases, that is, processed pages with no false positives and no missed balloons, are the majority, which is the ultimate goal of any algorithm for solving this problem.

The work presented can be expanded upon by looking at ways to double-check the regions identified as balloons, in order to reduce the remaining false positives. This could possibly be achieved by including OCR elements like character matching and excluding regions without matches, but this has not yet been confirmed. Also, it would be interesting to explore the possibility of implementing the proposed algorithm completely in hardware, possibly with some type of SoC (system-on-chip), and coupling it with e-paper devices, for a complete comic book reading experience on a single sheet of paper.

References

[1] John Byrne. Alpha Flight. Number 6. January 1984.

[2] Grant Morrison and Tony Daniel. Batman. Number 670. DC Comics, December 2007.

[3] Grant Morrison and Tony Daniel. Batman. Number 671. DC Comics, January 2008.

[4] Grant Morrison and Tony Daniel. Batman. Number 672. DC Comics, February 2008.

[5] Grant Morrison and Tony Daniel. Batman. Number 673. DC Comics, March 2008.

[6] Grant Morrison and Tony Daniel. Batman. Number 674. DC Comics, April 2008.

[7] Paul Azaceta, Marcos Martin, Mark Waid, and Stan Lee. The Amazing Spider-Man. Number 643. Marvel Comics, November 2010.

[8] Marcos Martin, Mark Waid, and Stan Lee. The Amazing Spider-Man. Number 644. Marvel Comics, November 2010.

[9] Mathew Southworth, Mark Waid, and Paul Azaceta. The Amazing Spider-Man. Number 645. Marvel Comics, December 2010.

[10] Paul Azaceta and Mark Waid. The Amazing Spider-Man. Number 646. Marvel Comics, December 2010.

[11] Dan Slott, Fred Van Lente, Mark Waid, Zeb Wells, Max Fiumara, Karl Kesel, Paul Azaceta, Bob Gale, and Joe Kelly. The Amazing Spider-Man. Number 647. Marvel Comics, December 2010.

[12] Joe Quesada, Clayton Henry, Dan Slott, and Paul Tobin. The Amazing Spider-Man. Number 648. Marvel Comics, January 2011.

[13] Humberto Ramos and Dan Slott. The Amazing Spider-Man. Number 649. Marvel Comics, January 2011.

[14] Scott McCloud. Understanding Comics - The Invisible Art. Harper Collins, 1994.

[15] Marvel Comics. Marvel Digital Comics Shop, (accessed October 12, 2012). http://comicstore.marvel.com/.

[16] DC Comics. DC Comics Digital Comics Shop, (accessed October 12, 2012). http://www.readdcentertainment.com/.

[17] DigitalComicMuseum. Digital Comic Museum, (accessed October 12, 2012). http://digitalcomicmuseum.com/.

[18] Ruini Cao and Chew Lim Tan. Separation of overlapping text from graphics. In Proceedings of the Sixth International Conference on Document Analysis and Recognition (ICDAR’01), pages 44–48, 2001.

[19] Trung Quy Phan, Palaiahnakote Shivakumara, and Chew Lim Tan. A Laplacian approach to multi-oriented text detection in video. In IEEE Transactions on Pattern Analysis and Machine Intelligence, volume 33, pages 412–419, 2011.

[20] Rafael C. Gonzalez and Richard E. Woods. Digital Image Processing, Third Edition. Prentice Hall, 2007.

[21] Siddhartha Brahma. Text extraction using shape context matching, 2006.

[22] Kohei Arai and Herman Tolle. Method for real time text extraction of digital manga comic. International Journal of Image Processing (IJIP), 4(6):669–676, 2011.

[23] Will Eisner. Comics & Sequential Art. Poorhouse Press, 1985.

[24] M. Praneesh and R. Jaya Kumar. Novel approach for color based comic image segmentation for extraction of text using modify fuzzy possibilistic c-means clustering algorithm. Special Issue of International Journal of Computer Applications (0975-8887) on Information Processing and Remote Computing - IPRC, pages 16–18, August 2012.

[25] Keiichiro Hoashi, Chihiro Ono, Daisuke Ishii, and Hiroshi Watanabe. Automatic preview generation of comic episodes for digitized comic search. In Proceedings of the 19th International Conference on Multimedia 2011, 2011.

[26] Q. Yuan and C. L. Tan. Page segmentation and text extraction from gray scale image in microfilm format, 2001.

[27] Sachin Grover, Kushal Arora, and Suman K. Mitra. Text extraction from document images using edge information. In IEEE India Council Conference, INDICON 2009: Ahmedabad, 2009.

[28] Wenshuo Gao, Xiaoguang Zhang, Lei Yang, and Huizhong Liu. An improved Sobel edge detection. In Computer Science and Information Technology (ICCSIT), 2010 3rd IEEE International Conference on, volume 5, pages 67–71, 2010.

[29] Qinqlian Guo, Kyoko Kato, Norio Sato, and Yuko Hoshino. An algorithm for extracting text strings from comic strips, 2006.

[30] Christophe Rigaud, Norbert Tsopze, Jean-Christophe Burie, and Jean-Marc Ogier. Extraction robuste des cases et du texte de bandes dessinées, 2012.

[31] J. Sushma and M. Padmaja. Text detection in color images. In IEEE IAMA 2009, 2009.

[32] ITU-R. Recommendation ITU-R BT.601-7, March 2011.

[33] R.W.G. Hunt. The Reproduction of Colour in Photography, Printing and Television. Fountain Press, 1987.

[34] Anh Khoi Ngo Ho, Jean-Christophe Burie, and Jean-Marc Ogier. Panel and speech balloon extraction from comic books. In 2012 10th IAPR International Workshop on Document Analysis Systems, 2012.