
Page 1: Video Movement Analysis for Diagnosis of Dyslexia - Freebousseau.adrien.free.fr/images/reportSigmedia.pdf · Video Movement Analysis for Diagnosis of Dyslexia A dissertation submitted

Video Movement Analysis for Diagnosis of Dyslexia

A dissertation submitted to the University of La Rochelle for the MSc’s Degree in Computer Engineering, 2003/2004 promotion

Adrien BOUSSEAU

Supervisors: Dr Laurent JOYEUX, Dr Anil KOKARAM
Trinity College Dublin

Signal Processing and Media Applications
Trinity College Dublin

IUP Génie Mathématique et Informatique
La Rochelle University


Acknowledgements

I would like to thank everyone who helped me during this training course, and in particular Dr Laurent Joyeux for his supervision, advice, ideas and suggestions throughout the course. I want to thank Dr Anil Kokaram for proposing the project undertaken during the training course, and for his precious advice on this report. I would also like to thank David Corrigan for his numerous corrections of this report. My work with the SigMedia team has been really pleasant, so I want to thank them for their kindness and their tolerance of my French accent. I address special thanks to Serge Yeterian for his solidarity in maintaining our faith in French cuisine, and to Akash Pooransingh for his ‘really good’ cricket tutorial. Finally I thank Christophe Demko for agreeing to be my academic supervisor.


Abstract

I carried out the training course for my MSc’s Degree in Computer Engineering in the SigMedia research group of the Department of Electronic and Electrical Engineering in Trinity College Dublin.

My project was part of the DysVideo project, a collaboration between the SigMedia group and the Department of Psychology in Trinity College. The aim of this project is to use video analysis for the diagnosis and treatment of Dyslexia. It relies on the recently discovered persistence of primary reflexes in Dyslexic children. The idea is to record children performing physical exercises designed to trigger primary reflexes, and to analyse the resulting videos to score possible Dyslexia. My work was therefore to develop video analysis algorithms that automatically measure the children’s movements.

The development focused on an exercise in which the Dyslexia score depends on how the child’s arms bend in response to a rotation of the head. To extract the arms, a pixel-based skin detector using RGB colour segmentation was developed. The arm bends were then estimated by extracting the main line of each arm and computing the equations of these lines by fitting a line model with a robust least squares algorithm. Finally, two solutions were considered for the estimation of head rotations: the first uses a simple correlation method, and the second uses optical flow.

The algorithms were applied to DysVideo sequences using DirectShow technology, which allows easy processing of multimedia streams in real time.

Key Words

Image Processing, Image Analysis, Movement Measurement, Skin Detection, Line Detection, Rotation Estimation, DirectShow, Dyslexia, Trinity College Dublin, SigMedia.


Summary

I carried out the training course for my IUP Génie Mathématique et Informatique master’s degree in the SigMedia team (Signal Processing and Media Application Group) of the Department of Electronic and Electrical Engineering at Trinity College Dublin. The SigMedia group is led by Dr Anil Kokaram, who has done extensive work on image and video restoration. The team consists of three postdoctoral researchers (including my training supervisor, Dr Laurent Joyeux) and eight PhD students. SigMedia has worked on a wide variety of projects related to the processing and analysis of multimedia documents, notably sports video indexing and automatic film restoration.

The project I worked on is a collaboration between the SigMedia group and the Department of Psychology of Trinity College. Its aim is to provide software that assists in the diagnosis of dyslexia. The project relies on the recent discovery of a relationship between dyslexia and the persistence of primary reflexes in children. These reflexes, four in number (such as the sucking reflex), assist the newborn during the first months of life but then disappear as the child grows. In dyslexics, however, the primary reflexes persist, which prevents them from performing certain movements. The idea is therefore to observe children performing physical exercises likely to trigger primary reflexes. These observations could allow dyslexia to be diagnosed before the usual clues, such as failure to learn to read, appear.

Within the DysVideo project, every exercise session has been filmed and stored for many children, which amounts to a database of five hundred hours of video. Inspection of these videos by an operator is very time consuming and not really reliable, since some movements are very subtle. This is why the SigMedia group seeks to develop video analysis algorithms with a view to automatic diagnosis.

Since the DysVideo system is intended for schools and clinics, it must be affordable and place no constraints on the child. This is why each session is filmed by a single camera, without complex technologies such as magnetic sensors for movement measurement. During my training course, the analysis was focused on two exercises based on limb movements (hands and arms). To measure these movements directly on the videos, the limbs are extracted using a skin colour detector. Many skin detectors have been proposed by the scientific community [14], notably for face recognition applications. The simplest and fastest method is to classify each pixel according to its colour. The two difficulties of this approach are the choice of colour space and the choice of the model representing skin colour. The RGB (Red, Green, Blue) colour space was chosen for the DysVideo application because it allows an intuitive definition of the colour model (skin contains more red than green and blue). Unlike other spaces such as YUV, RGB exhibits strong correlation between its components and between chrominance and luminance data, which can make colour detection sensitive to light variations. This is not a major drawback for the DysVideo system, however, because the sessions are filmed indoors under constant lighting. The skin colour model adopted is a set of rules and thresholds defining the boundaries of a ‘skin’ region in the colour space.

Once the skin detector was defined, the second part of the training course focused on developing algorithms to measure the movement in one of the exercises (#10). In this exercise the child kneels facing the camera, with the supervisor standing behind. The supervisor turns the child’s head to the right and to the left, which causes the arm to bend in children presumed to be dyslexic. The goal is therefore to measure the orientation of the arm in relation to the rotations of the head. The position of the arms in the image is obtained by projecting the skin pixels onto the x and y axes; large skin areas (arms, head) produce peaks in the projections. The orientation of the arms is then computed by detecting the main line of each arm. Several line detectors were tested, such as the Hough transform and least squares regression. The chosen solution uses robust regression to compute the main line of each arm while ignoring outliers.

Two approaches were considered for estimating head rotations. The first is a simple correlation method from one frame to the next. After the head position has been obtained from that of the arms, the head region undergoes a series of rotations within the range of possible angles. For each rotation, the difference between the transformed head and the head in the next frame is computed. The rotation angle that minimises this difference is retained as the solution. The second method studied for estimating the rotation is based on optical flow. Optical flow is a vector field representing the apparent motion of objects in the image. Since the motion vectors of a rotation are distributed around the centre of rotation, their perpendiculars converge towards this centre. The idea is therefore to draw the lines perpendicular to the motion vectors in an accumulator and to take the point of maximum intersection as the centre of rotation. Once the centre of rotation has been estimated, the rotation angle is computed by simple trigonometry.

The main difficulties encountered during the development of the skin detector and the analysis of exercise #10 stem from the many differences from one session to another, both in colours (skin, clothing) and in camera framing. Some problems could not be solved, such as the presence of foreground objects that distorts arm localisation.

Finally, the preceding techniques were applied to the sequences of the DysVideo project using Microsoft’s DirectShow technology. This technology allows multimedia data to be analysed and processed in real time. The DirectShow architecture divides the processing of a multimedia stream into a series of tasks called filters. Many filters are available for capturing, decompressing or rendering video. Users can develop their own filters suited to their needs. This is what was done for the DysVideo project, with the development of transform filters. These filters receive each frame of a video from the decompression filter and deliver the transformed frames to the rendering filter. This makes it easy to apply the processing without having to deal with decoding and displaying the video.

Key words

Image processing, image analysis, movement measurement, skin detection, line detection, rotation estimation, DirectShow, dyslexia, Trinity College Dublin, SigMedia.


Contents

I Introduction and Project Overview

1 Introduction

2 Introducing the Company and Project
  2.1 Company Description
    2.1.1 Trinity College
    2.1.2 SigMedia
  2.2 Project Overview

II Work Realized

3 Skin Detector
  3.1 State of the Art
    3.1.1 Colour spaces
    3.1.2 Skin Modelling
  3.2 Our Skin Detector
    3.2.1 Colour spaces and Related Rules
    3.2.2 Difficulties
    3.2.3 Final Results

4 Exercise #10
  4.1 Pre-Processing
    4.1.1 Arm Localisation
    4.1.2 Arm Skeleton
  4.2 Line Estimation
    4.2.1 Hough Transform
    4.2.2 Least Square Regression
    4.2.3 Robust Regression
  4.3 Results
  4.4 Difficulties
    4.4.1 Occluding Objects
    4.4.2 Arm and Knee Merging
    4.4.3 Distance from the camera
  4.5 Head Rotation
    4.5.1 Correlation Based Method
    4.5.2 Optical Flow Based Method
  4.6 Compression Rate

5 DirectShow
  5.1 Technology Presentation
  5.2 Implementation
    5.2.1 Transform Filter
    5.2.2 Command Line Program

6 Conclusion


Part I

Introduction and Project Overview


Chapter 1

Introduction

My training course took place in the SigMedia research group of Trinity College Dublin. SigMedia is currently working with the Department of Psychology of Trinity College on a video analysis application for the automatic diagnosis of Dyslexia. This project, called DysVideo and sponsored by Enterprise Ireland, is based on the recently discovered persistence of primary reflexes in Dyslexic children. The SigMedia group is working on video analysis algorithms to detect and measure primary reflexes. I took part in this project by developing algorithms for the measurement of one of these primary reflexes. In this report I will first present Trinity College and the SigMedia group. I will then describe the DysVideo project. Finally I will explain the algorithms I used to perform the movement measurement.


Chapter 2

Introducing the Company and Project

2.1 Company Description

I spent the 5 months of my training course in the SigMedia research group, at Trinity College Dublin. After a short presentation of Trinity College I will introduce the SigMedia group and its research activities.

2.1.1 Trinity College

Founded by Queen Elizabeth the First in 1592, Trinity College Dublin is the oldest university in Ireland and one of the older universities of Western Europe.

Based on the general pattern of ancient colleges like Oxford and Cambridge, the Trinity campus extends over 40 acres in the heart of Dublin. There are 6 faculties in the university: Arts (Humanities); Arts (Letters); Business, Economic and Social Studies; Engineering and Systems Sciences; Health Science; and Science. Almost 15,000 students are registered at Trinity, with almost 10% coming from foreign countries.

Many famous Irish people, like Oscar Wilde, Samuel Beckett or Douglas Hyde (Ireland’s first president), were educated at Trinity. Mary McAleese (Ireland’s current President) and Mary Robinson (the previous President and former UN High Commissioner for Human Rights) were on the staff of the College. Trinity is also one of Ireland’s major tourist attractions, notably for the Old Library (Fig. 2.1), a building from the 18th century which contains a priceless 9th century manuscript: The Book of Kells.


Figure 2.1: Trinity College and the Old Library (pictures from www.tcd.ie)

2.1.2 SigMedia

The Signal Processing and Media Application Group (SigMediaApps) is a research group of the Department of Electronic and Electrical Engineering in Trinity College Dublin. The group is led by Dr Anil Kokaram, who has performed extensive research in the area of Image and Video Restoration. Currently he coordinates an EU funded Research Training Network in Information Retrieval (MOUMIR) and is also involved with the EU Project Prestospace, which deals with Restoration of Archived Video. SigMedia currently consists of 4 doctors (Dr Anil Kokaram, Dr Rozenn Dahyot, Dr Laurent Joyeux and Dr Sid-Ahmed Berrani) and 8 PhD students.

SigMedia has been involved in a wide variety of projects including Sport Video Retrieval and Indexing, and Automatic Digital Film Restoration (Fig. 2.2). The current research projects are Video Indexing for the MOUMIR project (www.moumir.org), View Interpolation Improvement for Sport Physiotherapy/Medicine, Content Analysis for Sports Media Streaming (CASMS: Fig. 2.2), and Dyslexia Treatment using Video Content Retrieval and Analysis Technology (DysVideo). Most of the projects in the group are funded by Enterprise Ireland and the EU. There is also a connection with film special effects through a company called The Foundry, in London, UK.

Figure 2.2: Film Restoration (left) and CASMS Project (right)


2.2 Project Overview

The project I worked on is a collaboration between the Department of Psychology in Trinity College and the SigMedia group. The aim of this project, called DysVideo, is to offer a software video analysis toolkit for the diagnosis and treatment of Dyslexia.

The project relies on the recently discovered connection between Dyslexia (also known as ‘Specific Learning Difficulty’) and the persistence of primary reflexes [3]. Primary reflexes, like the sucking reflex, assist the child during the first months of life. In normal development, these reflexes become integrated into postural reflexes to allow the child to progress to the next stage of movement. But in Dyslexics the early reflexes persist, which makes the child unable to perform certain movements. So the idea is to observe children performing physical exercises designed to trigger primary reflexes (Fig. 2.3, Fig. 2.4). These observations could allow diagnosis of Dyslexia before the usual clues, like failure to read, become apparent. Moreover, Dyslexia can be treated by repeating these specific movements.

Figure 2.3: Physical exercise #10

Figure 2.4: Physical exercise #12


For the DysVideo project, each session of exercises is recorded, compressed and stored for offline analysis (Fig. 2.5). Each child engages in 3 sessions of 14 exercises, so the final set of sequences will represent 500 hours of video. A browser has been developed to give Psychologists access to a particular exercise, for a given session and child. But visual inspection of each video is time consuming and unreliable, since some movements are very subtle. Thus the SigMedia group is working on a video analysis technology to perform automatic scoring based on movement evaluation. Of the 14 exercises, 6 correspond to primary reflexes and have to be assessed automatically.

Figure 2.5: DysVideo architecture

The DysVideo system is designed to be used in schools and clinics. The system therefore has to be affordable and mobile, with a minimal set of equipment (PC, video camera and video interface card). Consequently the movement evaluation has to use a single camera view to extract useful position related information. Moreover the system cannot use complex technologies like magnetic sensors placed on the subject, because they might be too uncomfortable for a child and are certainly not robust enough because of variations of the magnetic field from one room to another.

Finally the DysVideo Project is a potential product for the creation of a start-up working in collaboration with clinics and schools.


Part II

Work Realized


During my training period, the video analysis focused on two exercises based on hand and arm movements. To perform the movement estimation, limbs had to be extracted. The easiest way to extract human limbs is to use a skin detector. The first task was to improve the skin detector currently used in the group and test it on a large set of sequences (more than 100).

The second part of my project was to develop algorithms for the movement detection of the two exercises (#12 and #10) and to implement these algorithms as DirectShow filters. The main part of my work concentrated on analysis and development for exercise #10. For exercise #12 my task was to implement the existing algorithm in C.

In the following sections the skin detector will first be explained. The analysis of exercise #10 will then be described. Finally the DirectShow technology used to implement the video analysis will be presented.


Chapter 3

Skin Detector

As previously stated, the DysVideo system cannot use multiple camera views or magnetic sensors to measure body movements. Thus, to detect body position, a common image processing technique, the skin detector, has been used. After a review of skin detection algorithms, our own detector is explained and a set of results is given.

3.1 State of the Art

Skin detection for body or face localisation has been the topic of numerous publications [10, 6]. Many heuristics and methods have been proposed [14]; pixel-based colour segmentation is the most popular because it is simple and fast. Another method is region-based segmentation, which uses the spatial arrangement of skin pixels. This second method is used as a post-processing stage of our pixel-based detector to enhance the final result.

The two main difficulties of skin colour segmentation are choosing the right colour space and the right skin model. In the following section, the most popular colour spaces used in skin detection and different ways to define skin models in these colour spaces will be presented.

3.1.1 Colour spaces

There are many colour spaces in computer graphics, each with different properties according to its origins (colourimetry, video signal transmission, compression...). The most popular colour spaces in skin detection are RGB, normalised RGB and YUV.

RGB

The RGB colour space corresponds to the way pictures are displayed by a monitor, with three monochromatic rays: Red, Green and Blue. It is the most used representation to store and process image data. But correlation between the three channels and mixing of chrominance and luminance data make colour segmentation in this representation sensitive to light variations.

Normalised RGB

The normalised RGB representation is a simple normalisation of the RGB values as follows:

$$ r(R, G, B) = \frac{R}{R + G + B} \tag{3.1} $$

$$ g(R, G, B) = \frac{G}{R + G + B} \tag{3.2} $$

$$ b(R, G, B) = \frac{B}{R + G + B} \tag{3.3} $$

Where r, g and b are the transformed colour channels. As the sum of the three components is equal to 1, one can be omitted to reduce the space dimension. In comparison to the RGB colour space, normalised RGB is less sensitive to light variations because of the normalisation [13]. If, due to a variation in light intensity, each colour plane is scaled by the same factor α, then the components r, g and b remain unchanged:

$$ r(\alpha R, \alpha G, \alpha B) = r(R, G, B) \tag{3.4} $$

$$ g(\alpha R, \alpha G, \alpha B) = g(R, G, B) \tag{3.5} $$

$$ b(\alpha R, \alpha G, \alpha B) = b(R, G, B) \tag{3.6} $$
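The invariance stated in Eq. 3.4 to 3.6 can be checked numerically. The following Python sketch is only an illustration: the function name `normalise_rgb` and the handling of a black pixel (mapped to equal thirds) are assumptions, not part of the DysVideo implementation.

```python
def normalise_rgb(R, G, B):
    """Return (r, g, b) as defined in Eq. 3.1 to 3.3."""
    total = R + G + B
    if total == 0:
        # Assumption: a black pixel has no defined chromaticity,
        # so it is mapped to the neutral point (1/3, 1/3, 1/3).
        return (1 / 3, 1 / 3, 1 / 3)
    return (R / total, G / total, B / total)


# The same surface seen under full light and under half the light (alpha = 0.5).
bright = normalise_rgb(180, 120, 90)
dimmed = normalise_rgb(0.5 * 180, 0.5 * 120, 0.5 * 90)

# The two triples agree up to floating-point rounding.
same = all(abs(x - y) < 1e-12 for x, y in zip(bright, dimmed))
```

Because r + g + b = 1, two of the three values are enough to describe a pixel, which is the dimension reduction mentioned above.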

YUV

YUV is defined by a linear combination of RGB values [12]:

$$ Y = 0.299R + 0.587G + 0.114B \tag{3.7} $$

$$ U = -0.169R - 0.331G + 0.500B + 128.0 \tag{3.8} $$

$$ V = 0.500R - 0.419G - 0.081B + 128.0 \tag{3.9} $$

This colour space separates luminance (Y) and chrominance (U and V), so the space dimension can be reduced by considering only the U and V components. Moreover, YUV is often used in video and compression formats, so conversion is not always necessary.
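As a minimal sketch of Eq. 3.7 to 3.9, the conversion can be written in Python as below. The function name `rgb_to_yuv` is hypothetical, and 8-bit channel values (0 to 255) are assumed.

```python
def rgb_to_yuv(R, G, B):
    """Convert one 8-bit RGB pixel to (Y, U, V) using Eq. 3.7 to 3.9."""
    Y = 0.299 * R + 0.587 * G + 0.114 * B
    U = -0.169 * R - 0.331 * G + 0.500 * B + 128.0
    V = 0.500 * R - 0.419 * G - 0.081 * B + 128.0
    return Y, U, V


# A grey pixel carries no chrominance: U and V both sit at the 128 offset,
# while Y equals the grey level.
y, u, v = rgb_to_yuv(100, 100, 100)
```

The grey example shows why only U and V are needed for colour segmentation: all brightness information ends up in Y.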

3.1.2 Skin Modelling

Once a colour space has been chosen, the second step in extracting skin is to define decision rules to discriminate skin and non-skin pixels. The two main kinds of rules are explicitly defined skin regions and skin distribution modelling.


Explicit Definition of Skin Colour

This method attempts to define the boundaries of a skin cluster in the selected colour space. The boundaries are given by several rules and thresholds, which produce a simple and fast extractor. The difficulty is to find good rules that extract skin pixels without extracting other features like clothes. In the RGB colour space, these rules can be found intuitively (skin has a red value larger than green and blue), but in other colour spaces it is not as easy. Gomez and Morales [4] proposed a machine learning approach to build simple rules combining colour attributes and arithmetic operators.

Skin Colour Distribution Modelling

The idea of this method is to use segmented images as a training dataset to find the skin model, and then apply this model to new images. Several techniques can be used to perform this modelling, like the Bayes classifier or the Kohonen map. The main advantage of this method is its independence of the shape of the skin distribution. The drawback is that it requires time and storage space to process the training data.

3.2 Our Skin Detector

The entire analysis of the dyslexia videos has to be processed in real time. Consequently the fastest method to extract skin pixels has been chosen: the explicit definition of skin colour. Finding the best colour space and decision rules for our videos was therefore the main consideration for the skin detector. The second consideration was to improve the detector to deal with bad detections.

3.2.1 Colour spaces and Related Rules

YUV

A skin detector had previously been developed in the YUV colour space, which has the least correlation between channels and the least sensitivity to light variations. By translating the basic RGB skin rule (red > blue and green, Fig. 3.1) into the YUV colour space (Fig. 3.2), the following rule had been obtained:

$$ U \le 128 - \varepsilon \ \text{ and } \ V \ge 128 + \varepsilon \tag{3.10} $$

Where ε is set to reject grey pixels (0 ≤ ε ≤ 10). This rule gave good results, even on black skin. But it is too sensitive to red and yellow colours, which can be a problem for children’s clothes (Fig. 3.3).
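The rule of Eq. 3.10 and its sensitivity to red and yellow can be illustrated with a short Python sketch. The function name `is_skin_yuv` and the sample pixel values are assumptions chosen for illustration; they are not taken from the DysVideo sequences.

```python
def is_skin_yuv(R, G, B, eps=10):
    """Apply Eq. 3.10 to an 8-bit RGB pixel after converting it to U and V."""
    U = -0.169 * R - 0.331 * G + 0.500 * B + 128.0   # Eq. 3.8
    V = 0.500 * R - 0.419 * G - 0.081 * B + 128.0    # Eq. 3.9
    return U <= 128 - eps and V >= 128 + eps


skin_like = is_skin_yuv(200, 140, 120)   # hypothetical skin tone: accepted
grey = is_skin_yuv(128, 128, 128)        # grey pixel: rejected thanks to eps
red_cloth = is_skin_yuv(220, 30, 30)     # saturated red: also accepted
yellow_cloth = is_skin_yuv(230, 210, 40) # saturated yellow: also accepted
```

The last two samples show the weakness described above: any colour with a strong red component and a weak blue component satisfies the rule, whether or not it is skin.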


Figure 3.1: Skin pixels distribution in RGB colour space (R > B and G)

Figure 3.2: Skin pixels distribution in UV colour space (U ≤ 128 and V ≥ 128)

Figure 3.3: Red and yellow sensitivity in YUV colour space, skin pixels in red, ε = 10


Normalised RGB

The normalised RGB colour space is potentially a good colour space because of its relative independence of light variations. But this colour space is not intuitive enough to easily formulate rules and thresholds that define a colour cluster. The rules obtained by Gomez and Morales [4] have been tried:

$$ \frac{r}{g} > 1.185 \ \text{ and} \tag{3.11} $$

$$ \frac{r \cdot b}{(r + g + b)^2} > 0.107 \ \text{ and} \tag{3.12} $$

$$ \frac{r \cdot g}{(r + g + b)^2} > 0.112 \tag{3.13} $$

But these rules did not produce good results on our videos (Fig. 3.4). To find correct rules, the same kind of machine learning algorithm as detailed by Gomez and Morales should be applied, but this method is more complex than the empirical formulation used in the other colour spaces.
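For reference, the three rules of Eq. 3.11 to 3.13 can be sketched in Python as below. The function name `is_skin_gomez_morales` is hypothetical. The sketch works directly on the raw R, G, B values: the ratio r/g equals R/G, and since r + g + b = 1 the products r·b and r·g equal R·B and R·G divided by (R + G + B) squared.

```python
def is_skin_gomez_morales(R, G, B):
    """Apply Eq. 3.11 to 3.13 to one pixel given as raw RGB values."""
    total = R + G + B
    if total == 0 or G == 0:
        # Assumption: pixels for which the ratios are undefined are rejected.
        return False
    total_sq = total * total
    return (R / G > 1.185
            and (R * B) / total_sq > 0.107
            and (R * G) / total_sq > 0.112)


accepted = is_skin_gomez_morales(200, 140, 120)  # hypothetical skin tone
rejected = is_skin_gomez_morales(128, 128, 128)  # grey: r/g is exactly 1
```

The thresholds were learned by Gomez and Morales on their own data, which is consistent with the poor results reported above on videos with different lighting.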

Figure 3.4: Errors in normalised RGB colour space, skin pixels in red

RGB

This colour space is apparently worse than the others because of its variation with luminance conditions. But in our case this drawback is not so important because the scenes are indoors, with constant lighting. Moreover, the RGB colour space is more intuitive, which makes rule formulation easier. For our videos, the rules proposed by Peer, Kovac and Solina [10] have been used. The thresholds have been adapted to the lighting conditions:

$$ R > 95 \ \text{ and } \ G > 80 \ \text{ and } \ B > 40 \quad \text{(skin colour)} \tag{3.14} $$

$$ \text{and } \ R > G \ \text{ and } \ R > B \quad \text{(blood in skin)} \tag{3.15} $$

$$ \text{and } \ R - \min(G, B) > 10 \quad \text{(avoid grey pixels)} \tag{3.16} $$

$$ \text{and } \ R - G > 15 \quad \text{(avoid yellow pixels)} \tag{3.17} $$
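A minimal Python sketch of the rules in Eq. 3.14 to 3.17 follows. The function and parameter names are hypothetical. The `EXERCISE_10` values are taken from the caption of Fig. 3.6; that caption lists no R − G threshold, so setting `yellow_margin` to 0 there is an assumption.

```python
def is_skin_rgb(R, G, B, min_r=95, min_g=80, min_b=40,
                grey_margin=10, yellow_margin=15):
    """Classify one 8-bit RGB pixel with the explicit rules of Eq. 3.14 to 3.17."""
    return (R > min_r and G > min_g and B > min_b   # Eq. 3.14: skin colour
            and R > G and R > B                     # Eq. 3.15: blood in skin
            and R - min(G, B) > grey_margin         # Eq. 3.16: avoid grey pixels
            and R - G > yellow_margin)              # Eq. 3.17: avoid yellow pixels


# Looser thresholds for exercise #10 (Fig. 3.6); yellow_margin=0 is an assumption.
EXERCISE_10 = dict(min_r=70, min_g=60, min_b=20, grey_margin=5, yellow_margin=0)

skin_like = is_skin_rgb(200, 140, 120)               # hypothetical skin tone
red_cloth = is_skin_rgb(220, 30, 30)                 # rejected: green too low
shadowed_default = is_skin_rgb(90, 70, 50)           # rejected: red below 95
shadowed_ex10 = is_skin_rgb(90, 70, 50, **EXERCISE_10)
```

Keeping the thresholds as parameters reflects the fact that they are tuned per exercise rather than fixed once for all sequences.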


These rules give excellent results and detect enough skin to apply the other algorithms in almost 85% of the cases (visual tests on 100 sequences of 1 min 30 s, resolution 720 × 576 pixels). The errors are due to shadow or brightness, but can be avoided by reducing the thresholds of Eq. 3.16 and Eq. 3.17. We can see in Fig. 3.5 that a few regions of skin are undetected in exercise #12, but the results are still good because we are interested in finding the finger positions, which are well extracted.

Figure 3.5: Good results in RGB colour space for exercise #12 (R > 95, G > 80, B > 40, R − min(G, B) > 10, R − G > 15), skin pixels in red

The difficulty with these rules is finding appropriate thresholds. The thresholds can be the same within an exercise, but different thresholds are needed for different exercises according to the body orientation. For exercise #10 the thresholds have to be less strict than for exercise #12 because the arms are in the child’s shadow and receive green light reflected from the ground. As a result, clothes are sometimes selected. But this is not a problem because the feature to compute is the direction of the arms, which are in the foreground and occlude other objects (Fig. 3.6).

Figure 3.6: Good results in RGB colour space for exercise #10 (R > 70, G > 60, B > 20, R − min(G, B) > 5), skin pixels in red


3.2.2 Difficulties

The RGB skin detector presented above gives good results on most of the sequences. But we had to deal with missed detections, which can cause holes in skin regions. Another important problem was the colour deviation between different videos.

Missed Pixels

As stated earlier, the skin detector sometimes misses skin because of bad lighting conditions (shadow or brightness). To correct this drawback, a region-based segmentation method is used to post-process the results. The idea is to use the detected skin pixels as starting points of a growing region. On each line of the image, the region is extended to both the left and the right while the colour is homogeneous. This fast processing fills almost 80% of the holes (Fig. 3.7).
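A minimal sketch of this row-wise growing is given here. The text does not define "homogeneous", so the sketch assumes that a pixel joins the region when each of its colour channels differs by at most `tol` from the neighbouring pixel already in the region; both the criterion and the value of `tol` are assumptions, and the function name is hypothetical.

```python
def grow_rows(image, mask, tol=12):
    """Extend each detected skin pixel left and right along its image row."""
    def similar(p, q):
        # Assumed homogeneity test: every channel differs by at most tol.
        return max(abs(a - b) for a, b in zip(p, q)) <= tol

    grown = [row[:] for row in mask]
    for y, row in enumerate(image):
        width = len(row)
        for x in range(width):
            if not mask[y][x]:
                continue  # only originally detected pixels act as seeds
            i = x - 1     # grow to the left of the seed
            while i >= 0 and not grown[y][i] and similar(row[i], row[i + 1]):
                grown[y][i] = True
                i -= 1
            i = x + 1     # grow to the right of the seed
            while i < width and not grown[y][i] and similar(row[i], row[i - 1]):
                grown[y][i] = True
                i += 1
    return grown
```

Because each row is processed independently with a single left and right sweep per seed, the cost stays linear in the number of pixels, which is consistent with the "fast processing" described in the text.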

Figure 3.7: Holes due to bad detection (left), and its correction (right)

A morphological closing filter (3 pixels dilation + 4 pixels erosion) is finally applied to fill the remaining holes and remove noise.

Colour Deviation

One of the main difficulties with the skin detection was the deviation of the blue channel on several videos. The cause of this artefact has not been found, but the artificial lighting conditions are suspected. To correct it, an object present in every video (the carpet) is extracted, the average colour of this object is computed, and the result is compared with a reference colour. The carpet extraction is done with a green pixel detector based on the same kind of rules as the skin detector: G > 120, B − R > 50 and G − R > 60. The comparison gives correction coefficients (the difference between the reference colour and the average colour), which are added to every pixel of the picture. This process gives good results and allows the skin detector to be applied (Fig. 3.8).
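The correction can be sketched as follows. This is an illustration under stated assumptions: the image is a list of rows of 8-bit (r, g, b) tuples, the reference carpet colour is supplied by the caller, corrected values are clamped to [0, 255], and an image without carpet pixels is returned unchanged. The function names are hypothetical.

```python
def is_carpet(r, g, b):
    """Green pixel rule quoted in the text for the carpet."""
    return g > 120 and b - r > 50 and g - r > 60


def correct_colour(image, reference):
    """Shift every pixel by (reference colour - average carpet colour)."""
    carpet = [p for row in image for p in row if is_carpet(*p)]
    if not carpet:
        return image  # assumption: nothing to correct without carpet pixels
    n = len(carpet)
    mean = [sum(p[c] for p in carpet) / n for c in range(3)]
    offset = [reference[c] - mean[c] for c in range(3)]

    def shift(p):
        # Add the correction coefficients and clamp to the 8-bit range.
        return tuple(min(255, max(0, round(p[c] + offset[c]))) for c in range(3))

    return [[shift(p) for p in row] for row in image]
```

With a carpet whose blue channel is 30 levels too high, every pixel of the frame has 30 subtracted from its blue channel, after which the fixed skin thresholds can be applied again.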


Figure 3.8: Image with a deviation of the blue channel and its correction

3.2.3 Final Results

After applying the different enhancements, the skin detector gives good results for almost 90% of the sequences (visual evaluation on 100 sequences of 1 min 30 s, resolution 720 × 576 pels). The results are better on exercise #10 than on exercise #12 because the thresholds for exercise #10 are less strict. The false alarms on exercise #12 are often due to the colour of clothes, which is sometimes very similar to skin colour (pink, yellow or beige).


Chapter 4

Exercise #10

The Ayres Test is one of the exercises designed to trigger an Asymmetrical Tonic Neck Reflex (ATNR) [3]. During this exercise, the child is on all fours, head turned to the camera. The supervisor turns the child’s head left and right, which may trigger a tremor or a bend in the child’s arms. The goal is to measure the size and speed of the variation in the angle of the bend, according to the head rotations. For this purpose, the image is pre-processed to extract the main line of the arms, and line estimation is applied to compute their equations. The line slope gives the orientation of the arm. Two methods have been considered to measure head rotations from one frame to the next: a simple correlation method and an optical flow method.

4.1 Pre-Processing

Pre-processing is a necessary step after skin detection to identify the significant information. A zone of interest is first defined as a bounding box around the arms, and the skeleton of the skin regions is then computed in this zone to obtain the main lines.

4.1.1 Arm Localisation

The image computed by the skin detector contains the skin of the child’s arms, but also the skin of his head and that of the supervisor. To allow the line detection to ignore these other skin regions, a bounding box has to be defined that only includes the child’s arms. The projection of skin pixels on the x and y axes is used to find the skin zones of interest.

Horizontal Bounds

Pixels of the bottom half of the image are first projected on the x axis, since the top part of the image only contains the child’s head or the supervisor’s arms. This projection gives a histogram with two main maxima (m1 and m2) which correspond to the two arms. The bounds are then propagated from these maxima while there is skin projected (Fig. 4.1). The maxima are defined as the two highest values of the projection, separated by at least 50 pixels. When there are other high values (> (m1 + m2)/3), the ones which delimit the most centred zone are selected, because in most cases the child is at the centre of the screen.
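The projection and bound propagation can be sketched as below, assuming a boolean skin mask stored as a list of rows. The sketch only covers the basic case (two highest values at least 50 pixels apart); the additional rule that prefers the most centred zone is left out, and the function names are hypothetical.

```python
def x_projection(mask):
    """Count skin pixels per column in the bottom half of the mask."""
    bottom = mask[len(mask) // 2:]
    return [sum(1 for row in bottom if row[x]) for x in range(len(mask[0]))]


def two_maxima(proj, min_gap=50):
    """Columns of the two highest values separated by at least min_gap pixels.

    Assumes the projection is wider than min_gap, otherwise no second
    maximum exists and max() raises ValueError.
    """
    m1 = max(range(len(proj)), key=proj.__getitem__)
    far = [x for x in range(len(proj)) if abs(x - m1) >= min_gap]
    m2 = max(far, key=proj.__getitem__)
    return tuple(sorted((m1, m2)))


def propagate(proj, peak):
    """Extend the bounds from a maximum while skin is still projected."""
    left = right = peak
    while left > 0 and proj[left - 1] > 0:
        left -= 1
    while right < len(proj) - 1 and proj[right + 1] > 0:
        right += 1
    return left, right
```

Calling `propagate` once for each column returned by `two_maxima` gives the horizontal bounds of each arm.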

Figure 4.1: x projection and the related bounds

Vertical Bounds

Once the horizontal bounds have been found, the skin pixels within these bounds are projected on the y axis. The result is a histogram whose first maximum from the bottom corresponds to the hands. The hands must not be selected because the main lines of the fingers could perturb the arm line calculation. The hand size is estimated by measuring the size S between the beginning of the projection and the first local maximum. The first vertical bound is then positioned 1.5 × S pixels after the beginning of the projection, which approximately corresponds to the wrist position.

To find the second vertical bound the algorithm uses several rules. The first rule, of course, is to put the bound where the skin projection stops, as in Fig. 4.2. But there is sometimes a connection between the arm and the head skin zone. In that case the proportions of the child’s body are used. After measuring the size of arms and hands on 30 sequences, an average ratio has been found: arm size = 2.5 × hand size. As the hand size is known, the ratio is applied to obtain the second vertical bound.


Figure 4.2: y projection and the related bounds

4.1.2 Arm Skeleton

The second step after finding the bounding box is to extract the main line of each arm so that a line estimator can be applied. To perform this task a very simple skeleton is calculated by taking the middle of each skin zone on every line of the frame (Fig. 4.3). The skeleton produces a coarse line used by the following line estimations and reduces the number of points to take into account.
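A sketch of this skeleton computation, assuming a boolean skin mask stored as a list of rows and integer midpoints (rounded down), is given here. The function name is hypothetical.

```python
def row_skeleton(mask):
    """Return one (x, y) point per horizontal run of skin pixels: its middle."""
    points = []
    for y, row in enumerate(mask):
        x = 0
        while x < len(row):
            if row[x]:
                start = x
                while x < len(row) and row[x]:
                    x += 1  # advance to the first pixel after the run
                points.append(((start + x - 1) // 2, y))
            else:
                x += 1
    return points
```

Each skin zone crossed by an image line contributes a single point, so an arm that spans a few hundred lines yields a few hundred points for the line estimators instead of the full region.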

Figure 4.3: Skeleton (blue pixels) of skin zone (red pixels)


4.2 Line Estimation

Line estimation is the final step, which gives us the equations of the arm direction lines. These equations will then be used to calculate feature vectors for dyslexia diagnosis. As a consequence, a precise and robust method had to be developed to obtain reliable results. The methods tested are the Hough transform, least square regression and robust regression.

4.2.1 Hough Transform

The Hough transform is a well-known and simple technique used in image analysis to find straight lines [7]. The idea of this algorithm is to count, for each possible line equation, the number of pixels in the image which belong to it. The line containing the most pixels is the main line of the image.

To define each line the standard equation y = ax + b could be used, but it cannot represent vertical lines (x = k). As a result, in most cases the Hough algorithm uses another definition: ρ = x cos θ + y sin θ (Fig. 4.4).

Figure 4.4: Parametric line definition

For each pair of parameters (ρ, θ) the algorithm stores the number of pixels intersected in an array called the accumulator, or Hough space. Maxima of the accumulator indicate the ρ and θ of the important lines. In most cases, a line does not correspond to a single point in the Hough space but to a whole area. Therefore the accumulator is filtered to only contain local maxima (Fig. 4.5).

Figure 4.5: Hough accumulator (left) and its local maxima (right)

The main drawback of the Hough transform is the high cost of testing every possibility. In our case this problem is resolved by reducing the variation domain of θ to [−π/8, π/8] because arm lines are almost vertical.
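The voting scheme with the restricted θ domain can be sketched as follows. This is an illustration only: the number of θ samples, the ρ bin width and the sparse dictionary used as accumulator are assumptions, and the local maxima filtering is replaced by simply taking the first global maximum.

```python
import math


def hough_main_line(points, n_theta=45, rho_step=1.0):
    """Vote in a (rho, theta) accumulator with theta limited to [-pi/8, pi/8]."""
    thetas = [-math.pi / 8 + i * (math.pi / 4) / (n_theta - 1) for i in range(n_theta)]
    accumulator = {}  # sparse accumulator: (rho bin, theta index) -> votes
    for i, theta in enumerate(thetas):
        cos_t, sin_t = math.cos(theta), math.sin(theta)
        for x, y in points:
            # rho = x cos(theta) + y sin(theta), quantised into bins
            rho_bin = round((x * cos_t + y * sin_t) / rho_step)
            accumulator[(rho_bin, i)] = accumulator.get((rho_bin, i), 0) + 1
    (rho_bin, i), votes = max(accumulator.items(), key=lambda item: item[1])
    return rho_bin * rho_step, thetas[i], votes
```

For a short vertical segment several neighbouring θ values collect the same number of votes, which illustrates why a line corresponds to an area of the Hough space rather than to a single cell.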


However, the Hough transform is still not very appropriate in our case since it can only match existing points. As the skeleton line is slightly curved, the Hough transform does not give an average line of the data set, which would be more precise for our application. Moreover, from one frame to another the Hough lines can match different points, which produces oscillations and noise in the final results (Fig. 4.6). As a result we decided to use linear regression to estimate the orientation of the lines.

Figure 4.6: Oscillations of the Hough line (green pixels)

4.2.2 Least Square Regression

The least square regression algorithm estimates the equation y = f(x) which minimises the sum of the squared distances between yi and f(xi) for the data set M(xi, yi). The distance minimised corresponds to the error of the estimation.

In our case the lines are nearly vertical, so the least square method is used to fit the line equation x = cy + d, where x and y are the variables, c the slope and d the intercept with the x axis. In this case the error to minimise is given by Eq. 4.1:

\[ E(c, d) = \sum_{k=1}^{N} (c\,y_k + d - x_k)^2 \tag{4.1} \]

Eq. 4.1 reaches its minimum value when its partial derivatives are equal to zero [9]:

\[ \frac{\partial E(c, d)}{\partial c} = 2 \sum_{k=1}^{N} \left( c\,y_k^2 + d\,y_k - x_k y_k \right) = 0 \tag{4.2} \]

\[ \Rightarrow \quad c \sum_{k=1}^{N} y_k^2 + d \sum_{k=1}^{N} y_k - \sum_{k=1}^{N} x_k y_k = 0 \tag{4.3} \]

\[ \frac{\partial E(c, d)}{\partial d} = 2 \sum_{k=1}^{N} \left( c\,y_k + d - x_k \right) = 0 \tag{4.4} \]

\[ \Rightarrow \quad c \sum_{k=1}^{N} y_k + N d - \sum_{k=1}^{N} x_k = 0 \tag{4.5} \]

The least square line parameters are obtained by solving Eq. 4.3 and Eq. 4.5 [8]:

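\[ c = \frac{\sum_{k=1}^{N} x_k \sum_{k=1}^{N} y_k - N \sum_{k=1}^{N} x_k y_k}{\left( \sum_{k=1}^{N} y_k \right)^2 - N \sum_{k=1}^{N} y_k^2} \tag{4.6} \]

\[ d = \frac{-\sum_{k=1}^{N} x_k \sum_{k=1}^{N} y_k^2 + \sum_{k=1}^{N} x_k y_k \sum_{k=1}^{N} y_k}{\left( \sum_{k=1}^{N} y_k \right)^2 - N \sum_{k=1}^{N} y_k^2} \tag{4.7} \]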
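Eq. 4.6 and Eq. 4.7 translate directly into a few sums over the skeleton points. The sketch below assumes the points are given as (x, y) pairs; the function name is hypothetical.

```python
def fit_line_x_of_y(points):
    """Least square fit of x = c*y + d using the closed forms 4.6 and 4.7."""
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxy = sum(x * y for x, y in points)
    syy = sum(y * y for _, y in points)
    denom = sy * sy - n * syy  # common denominator of Eq. 4.6 and Eq. 4.7
    if denom == 0:
        raise ValueError("all points share the same y, the slope is undefined")
    c = (sx * sy - n * sxy) / denom
    d = (-sx * syy + sxy * sy) / denom
    return c, d
```

For points lying exactly on x = 0.5y + 3 the function returns c = 0.5 and d = 3. Writing the line as x = cy + d keeps c close to zero for nearly vertical arms, whereas the slope of y = ax + b would become very large.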

To apply the least square estimator to both arms, bounds have to be defined for each arm. These bounds are initialised with the two main skin zones intersecting the bottom line of the bounding box. As we can see in Fig. 4.7, least square regression gives much better results than the Hough transform by calculating an average line of the data set.

Figure 4.7: Hough line (left) and least square line (right). Arm bounds in cyan

But this averaging can be a drawback when there are bad data points far from the main cloud of data, which can produce a deviation of the resulting line (Fig. 4.8). Consequently, robust regression is used to avoid this problem.


Figure 4.8: Least square line deviated by bad points

4.2.3 Robust Regression

The problem with the least square estimator is that the further a point is from the main line, the more it influences the result (Fig. 4.9).

Figure 4.9: Least square estimator

The idea of robust regression is to use other estimators which minimise the influence of bad points that are too far from the model. Several estimators have been tested, such as the Lorentzian and Andrews’ Sine [2]:

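Lorentzian:

\[ E(c, d, \sigma) = \sum_{k=1}^{N} \log \left( 1 + \frac{1}{2} \left( \frac{c\,y_k + d - x_k}{\sigma} \right)^2 \right) \tag{4.8} \]

Andrews’ Sine:

\[ E(c, d, r) = \sum_{k=1}^{N} \begin{cases} -r \cos \left( \dfrac{c\,y_k + d - x_k}{r} \right) & \text{if } |c\,y_k + d - x_k| < \pi r, \\ r & \text{otherwise.} \end{cases} \tag{4.9} \]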

where σ and r correspond to the width of the estimator. In our case these parameters are initialised with the arm width.

Andrews’ Sine gives the best results because the error stops increasing beyond the specified distance, so distant points have a bounded influence, while within that distance the error decreases as points get closer to the line (Fig. 4.10).


Figure 4.10: Lorentzian (left) and Andrews’ Sine (right) estimators

But these estimators are more complex than the least square estimator, so an analytic solution like Eq. 4.6 and Eq. 4.7 could not be found. Thus a multiresolution exhaustive search is performed to find the correct line parameters. Arm lines are nearly vertical, so a rough estimate of the parameters which minimise the error is made in the slope interval [−0.3, 0.3] and the intercept interval [arm first horizontal bound, arm second horizontal bound]. Another estimate is then performed in a half-sized interval centred on the rough estimate. The final estimate of the line parameters is obtained by iterating this process.
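The search can be sketched with the Andrews' Sine error of Eq. 4.9. The grid resolution (`steps`) and the number of refinement levels (`levels`) are not given in the text and are assumptions here, as are the function names; points are (x, y) pairs and the line model is x = cy + d.

```python
import math


def andrews_sine(e, r):
    """Per-point error of Eq. 4.9 for a residual e and estimator width r."""
    return -r * math.cos(e / r) if abs(e) < math.pi * r else r


def robust_error(points, c, d, r):
    """Total Andrews' Sine error of the line x = c*y + d."""
    return sum(andrews_sine(c * y + d - x, r) for x, y in points)


def multires_search(points, d_lo, d_hi, r, c_lo=-0.3, c_hi=0.3, steps=21, levels=8):
    """Exhaustive grid search, repeated on intervals halved around the best point."""
    best = None
    for _ in range(levels):
        cs = [c_lo + i * (c_hi - c_lo) / (steps - 1) for i in range(steps)]
        ds = [d_lo + i * (d_hi - d_lo) / (steps - 1) for i in range(steps)]
        best = min((robust_error(points, c, d, r), c, d) for c in cs for d in ds)
        _, c, d = best
        quarter_c = (c_hi - c_lo) / 4  # the new interval is half as wide
        quarter_d = (d_hi - d_lo) / 4
        c_lo, c_hi = c - quarter_c, c + quarter_c
        d_lo, d_hi = d - quarter_d, d + quarter_d
    return best[1], best[2]
```

On 40 points taken from x = 0.1y + 50 plus three points placed about 40 pixels away, the search returns parameters close to (0.1, 50): the distant points each contribute the constant r to the error and therefore do not pull the line.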

The solution gives an average line of the main cloud of points, without being affected by bad points (Fig. 4.11).

Figure 4.11: Line deviation corrected by robust estimation

4.3 Results

The previous algorithm computes two line equations for each frame. The slope term of these equations gives the orientation of the arms represented by these lines (Fig. 4.12).


Figure 4.12: Line slope variations according to arm movements

In most of the sequences, a few minutes are used to explain the exercise to the child or to set up the camera position and zoom. During this time the child tends to move his hands, and the arm orientation measurements are considered invalid. So, to allow detection of the start and end of the exercise period, the two hand abscissas (intersections between the estimated arm lines and the line y = wrist position) are added to the final results (Fig. 4.13).

The hand abscissa is also used to detect errors such as occlusions, which produce sudden changes in the line detection. Thus in the final results, only periods where the hand positions are constant are taken into account.


Figure 4.13: Line slope and hand abscissa variations

4.4 Difficulties

The algorithm used for exercise #10 gives good results in almost 80% of the cases (visual evaluation on 30 sequences of 1 min 30 s, resolution 720 × 576 pels). But there are still unsolved problems which produce bad results: occluding objects, arm and knee merging, and the child being positioned too far from the camera.

4.4.1 Occluding Objects

The main condition for good line detection is a good detection of the arms. But in several videos this detection is disturbed by occluding objects like the supervisor’s hair or arms (Fig. 4.14). These occlusions cannot be corrected, but in most cases they are detectable on the hand abscissa plot. However, small occlusions are sometimes hard to detect, e.g. when the child puts his head in front of his arm.

Some occluding objects also produce bad results on the entire sequence, for example when a soft toy animal is in front of the child: the bounding box locks on it.

Figure 4.14: Samples of occlusions

4.4.2 Arm and Knee Merging

Another difficulty in arm detection arises when arms and knees form a single region: knee pixels cause a deviation of the resulting line. The robust estimation can correct this kind of defect, but the estimator is not robust enough when the knee is longer than half the size of the arm, or when knee pixels are not far enough from the main line.

An edge detector is used to divide the two areas (Fig. 4.15). The edge detector selects pixels where the absolute value of the horizontal gradient is greater than 20, and labels these pixels as non-skin pixels. Consequently, when the skeleton of skin pixels is computed, it produces one skeleton for each area (arm and knee), and the robust estimator only fits the line to the arm skeleton. This process works in almost 50% of the merging cases, but there are still bad results when knee and arm colours are too similar.


Figure 4.15: Arm and knee merging, and its correction

4.4.3 Distance from the camera

The distance from the child to the camera can vary significantly from one sequence to another (the child’s size can be magnified by a factor of 2.5). When the child is too far from the camera, the lines are well detected but the arms are too small to give precise results. This produces noisy results, which are less reliable for the final diagnosis (Fig. 4.16).


Figure 4.16: Child far from the camera (left) and the noisy result (right)

4.5 Head Rotation

In exercise #10, psychologists have defined the arm bend during head rotations as a possible indication of dyslexia. Thus, once the arm angles have been calculated, the program has to estimate head rotations to extract the appropriate arm measurement episodes. As for the previous movement estimation, this process is decomposed into two steps: localisation and movement estimation. The following section presents two methods to perform these steps: a correlation based method and an optical flow based method.

4.5.1 Correlation Based Method

Head Localisation

The head localisation is made easy by knowing the arms’ positions. As the head is usually between the two arms, the vertical head bounds are initialised with the arm bounds. As the skin detector extracts pixels of the child’s face and of the supervisor’s hands, a projection on the y axis of the skin pixels between the child’s arms is used to define the horizontal head bounds. The projection is filtered to remove noise, and weights are applied to the projection according to body scales to highlight the head zone. The lower horizontal bound corresponds to the first line of the main skin zone (starting from the bottom), and the upper horizontal bound is set so as to define a square bounding box (Fig. 4.17).


Figure 4.17: Head Localisation

This head localisation is fast to process but does not give reliable results on every sequence, because the child’s head is not always perfectly between the two arms. But it is difficult to find another solution because hair colour and texture can be very different from one child to another.

Movement Estimation

The correlation algorithm used to estimate the head rotation between two frames uses a direct matching method. The idea is to compute each possible rotation of the first frame and select the one which is the most correlated with the second. So for a frame n, the circular zone of the head in frame n − 1 is rotated from −10° to 10°, and the difference between the head in each rotated image and the head in frame n is computed. To compensate for camera or head translation, each image is also translated in a 10 × 10 pel neighbourhood. The angle which minimises the difference is the estimate of the rotation (Fig. 4.18).

Figure 4.18: Rotation estimation


This process gives the head rotation angle between consecutive frames, and the cumulative sum of these angles gives the head rotation during the sequence (Fig. 4.19).

Figure 4.19: Results of the correlation based rotation estimation

But the quality of this rotation estimation depends on the quality of the head localisation. Consequently, a method using optical flow motion estimation has been considered.

4.5.2 Optical Flow Based Method

Optical flow is a vector field which represents, from one frame to the next, the apparent motion of objects projected on the image plane (Fig. 4.20). The motion estimator used to compute the optical flow for the following algorithm is based on the gradient based approach of J. Biemond [1, 5].


Figure 4.20: Optical flow on a DysVideo frame

Head Localisation

The head localisation is performed by estimating the centre of the head rotation. As motion vectors are distributed around the rotation centre, the centre corresponds to the convergence point of the perpendiculars to the motion vectors. Thus the algorithm computes the perpendicular of each significant motion vector, and draws the corresponding lines in an accumulator. Intersections have a higher value in the accumulator, so the centre of rotation, where most of the lines converge, is located at the maximum value of the accumulator. In reality the centre is not a single point but an area of many intersections, so a local density is computed to find the centre that represents most of the intersections (Fig. 4.21). The local density is calculated for each pixel by summing the values of the intersections in a 10 × 10 pel neighbourhood, weighted by the distance from the intersection to the pixel.

Figure 4.21: Steps of the head localisation algorithm

This head localisation cannot give good results when the head is not rotating, but this is not a problem in our case because psychologists only measure arm bends during head rotation to score dyslexia. Results during periods where the head is not rotating are ignored.

To detect periods where the head is rotating, the maximum density value can be used. Indeed, this value is only high when many lines converge on the same point, i.e. when there are significant rotations (Fig. 4.22).

Figure 4.22: Variations of the maximum density. High values (> 1000) correspond to rotations

Movement Estimation

Once the rotation centre has been estimated, the angle of rotation is computed using simple trigonometry. For each motion vector, the sine of the angle is computed by dividing the vector’s norm by the distance between the end of the vector and the centre of rotation (Fig. 4.23).

Figure 4.23: Angle estimation: sin α = A/B

The inverse sine gives the absolute value of the angle. The sign of the angle is then computed by considering the sign of the motion vector’s y component and the position of the motion vector relative to the centre of rotation. If the vector is to the left of the rotation centre, the sign of the angle is the opposite of the sign of the motion vector’s y component. If the vector is to the right of the rotation centre, the sign of the angle is the sign of the motion vector’s y component (Fig. 4.24).


Figure 4.24: Computation of the angle sign

By applying this process to every significant motion vector, an array of angles is obtained. The median value of this array is chosen as the estimated value of the angle. After extracting the rotation periods by thresholding the density value, the final angle variations are computed (Fig. 4.25).
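The per-vector angle and its sign can be sketched as below. Several points are assumptions: each motion vector is given as a (start, displacement) pair, the start point decides whether the vector lies left or right of the centre, the per-vector angles are combined with a median, and vectors that the rule cannot handle (zero y component, or a norm larger than the distance to the centre) are skipped. The function names are hypothetical.

```python
import math
from statistics import median


def vector_angle(start, vec, centre):
    """Signed rotation angle (radians) suggested by one motion vector."""
    end = (start[0] + vec[0], start[1] + vec[1])
    dist = math.hypot(end[0] - centre[0], end[1] - centre[1])
    norm = math.hypot(vec[0], vec[1])
    sign_vy = (vec[1] > 0) - (vec[1] < 0)
    if dist == 0 or norm > dist or sign_vy == 0:
        return None  # assumption: such vectors are ignored
    alpha = math.asin(norm / dist)  # sine = vector norm / distance to centre
    # Left of the centre: opposite of the sign of the y component;
    # right of the centre: same sign as the y component.
    sign = -sign_vy if start[0] < centre[0] else sign_vy
    return sign * alpha


def head_rotation(vectors, centre):
    """Combine the per-vector angles with a median; 0.0 if none is usable."""
    angles = [vector_angle(s, v, centre) for s, v in vectors]
    angles = [a for a in angles if a is not None]
    return median(angles) if angles else 0.0
```

A median is less sensitive than a mean to the few motion vectors that belong to the background or to the supervisor's hands.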

Figure 4.25: Final angle variations

The optical flow based method has only been tested on 2 sequences but it gives interesting results. The main difficulty may be finding the correct threshold on the density to perform a reliable extraction of rotation periods.


Figure 4.26: Results of the optical flow based rotation estimation

4.6 Compression Rate

There is an enormous amount of data being recorded for the DysVideo project (more than 500 hours of video), and it is not practical to store the data in a raw state. Consequently, MPEG-4 compression is applied to the sequences, at a rate of 128 kB/s. To assess the effect of compression on the performance of the algorithms, the results from compressed sequences were compared with the results from a raw sequence. Fig. 4.27 shows the effect of compression on the arm bend measurement. The difference e is calculated as:

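\[ e = \frac{1}{N} \sum_{k=1}^{N} \frac{\left| \, |\theta_k| - |\hat{\theta}_k| \, \right|}{|\theta_k|} \tag{4.10} \]

where \theta_k is the result on the raw sequence and \hat{\theta}_k is the result on the compressed sequence.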
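Eq. 4.10 is a mean relative difference and can be sketched in a few lines. The sketch assumes that frames where the raw measurement is exactly zero are skipped to avoid a division by zero, a case the text does not discuss; the function name is hypothetical.

```python
def compression_error(raw, compressed):
    """Mean relative difference of Eq. 4.10 between two measurement series."""
    pairs = [(abs(a), abs(b)) for a, b in zip(raw, compressed) if a != 0]
    if not pairs:
        raise ValueError("no non-zero raw measurement to compare against")
    return sum(abs(a - b) / a for a, b in pairs) / len(pairs)
```

For example, raw angles [10, -20] measured as [9, -22] after compression give e = (1/10 + 2/20) / 2 = 0.1, i.e. a 10% difference.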


Figure 4.27: Difference between results on compressed sequences and results on the original sequence

The current compression (128 kB/s rate) produces an error rate of only 6.5%.


Chapter 5

DirectShow

DirectShow is the technology used in the DysVideo project to perform real time video analysis. The DirectShow technology is presented first, and its use in the DysVideo implementation is then explained.

5.1 Technology Presentation

DirectShow is a technology developed by Microsoft, as a part of the DirectX technology, to provide easy processing of multimedia streams. The DirectShow architecture divides media processing tasks into a set of steps called filters. Many filters are already available for media capture, conversion, decompression or rendering. In addition, users can develop their own filters for custom applications.

DirectShow filters are connected in a filter graph to perform video playback.A typical playback graph (Fig. 5.1) is composed of:

• a source filter, responsible for reading data from a file.

• a parser (split) filter, which separates audio and video data.

• decoders for audio and video that decompress the data.

• renderers for audio and video that play or draw the decompressed data.


Figure 5.1: Typical avi playback graph, displayed in the GraphEdit application.

In a video processing application, the processing filter (called a Transform Filter) is connected between the video decoder and the video renderer. For each frame, the decoder filter provides an image to the transform filter input, and the renderer receives the modified image from the transform filter output. Thus, for the DysVideo project, a transform filter has to be developed for the analysis of each exercise.

5.2 Implementation

Two applications are used for the analysis of a DysVideo exercise. The first is the DirectShow transform filter which performs the video analysis and processing. The second is a command line program which builds the filter graph to apply the transform filter to a specific video.

5.2.1 Transform Filter

The development of DirectShow filters is made easier by the Transform Filter Wizard for Visual Studio, developed by Ross Cutler [11]. This wizard produces the class definitions and method implementations for a basic filter, which only copies the input stream to the output stream. The only job for the developer is then to use this filter as a skeleton, and implement his own operations on the stream.

5.2.2 Command Line Program

The command line program is developed to make it easy to apply a DirectShow transform filter to any exercise of any sequence of the DysVideo project. In the DysVideo framework, this software is used after the scene cut processing and could be included as a functionality of the browser. The parameters of the command line are the child identity code (represented by 6 digits), the session number, the exercise number, the DirectShow filter name, and the name of the folder in which to write the result files.

To find the right video for a given exercise, session and child, the program uses the same database as the browser. The start and end times of each exercise have been recorded by supervisors. But a few seconds are often used to prepare the child before the real execution of the exercise, so a video analysis may be needed to find the exact exercise period. Once the video is found, the second step is to build the filter graph to play that video. The construction of a playback filter graph is done automatically by calling the RenderFile function. The connection of a transform filter is also performed automatically if the filter is added to the graph before calling the RenderFile function.


Chapter 6

Conclusion

This training course in the SigMedia group has been very beneficial for my academic studies. The project allowed me to see practical applications of algorithms I studied this year, like the Hough Transform or Least Square Estimation, and to compare them to other techniques, like Robust Regression.

Additionally, applying these image processing algorithms to a real project made me discover the numerous difficulties caused by the variety of configurations that can occur in the real world. Most of the time there is no perfect solution, just a solution that is better than the others.

For future work, a few improvements could be made to the current algorithms. The skin detector presented in this report is simple and fast, but the poor results caused by shadows or clothes could perhaps be avoided by using more sophisticated techniques based on a training dataset. Arm orientations are well estimated by the current algorithm, but the head rotation estimation is not finalised. A head tracking algorithm could be developed to improve the head localisation. An accurate head localisation is necessary for the correlation-based method, and it could also be used to define a bounding box for the computation of the optical flow.

For the movement measurement, visual markers could make the work much easier. The head rotation estimation might, for example, be simpler if the child wore a black helmet with a white line.

On a personal level, this period in a foreign country was a perfect way to open my mind and discover new cultures. It also allowed me to improve my English, a good level of English being essential in today's computer science field.

Bibliography

[1] Biemond J., Boekee D. E., Looijenga L., Plompen R.H.J.M.: A pel-recursive Wiener based displacement estimation algorithm. In Signal Processing, vol. 13, p. 399-412, 1987.

[2] Black Michael J.: Robust Incremental Optical Flow, 1992. http://www.cs.brown.edu/people/black/Thesis/thesis.pdf

[3] Goddard S.: A Teachers Window into a Child’s Mind, A non-invasive approach to solving learning and behaviour problems. Fern Ridge Press, Oregon, 1996.

[4] Gomez G., Morales E.: Automatic feature construction and a simple rule induction algorithm for skin detection. In Proc. of the ICML Workshop on Machine Learning in Computer Vision, 31-38, 2002. http://shasta.mpi-stuttgart.mpg.de/people/giovani/mlcv.pdf.gz

[5] Kokaram Anil: Motion Picture Restoration. Springer-Verlag, 1998.

[6] Linder Yves: Développement d’un système de reconnaissance visuelle pour Robota, 2003. http://asl.epfl.ch/research/projects/RobotaPocketPC/rapport-yves-linder.pdf

[7] Maître H.: Un panorama de la transformée de Hough. In Traitement du signal 2(4): 306-617, 1985.

[8] Mathews John H.: Lab for Least Squares Lines and Polynomials, 1998. http://math.fullerton.edu/mathews/numerical/lp.htm

[9] Mathews John H., Fink Kurtis K.: Numerical Methods Using Matlab, 4th Edition, p. 254-255, 2004. http://math.fullerton.edu/mathews/n2003/leastsquaresline/LeastSqLineProof.pdf

[10] Peer P., Kovac J., Solina F.: Human skin colour clustering for face detection. Submitted to EUROCON 2003 - International Conference on Computer as a Tool, 2003. http://lrv.fri.uni-lj.si/~peterp/publications/eurocon03.pdf

[11] Pesce Mark D.: Programming Microsoft DirectShow for Digital Video and Television. Microsoft Press, 2003.

[12] Peyronnet Rémi: Plugin GIMP : Transformation YUV - RGB, 2002. http://people.via.ecp.fr/~remi/ecp/tpi/rapport/yuv.html

[13] Skarbeck W., Koschan A.: Colour image segmentation - a survey. Institute for Technical Informatics, Technical University of Berlin, 1994. http://iristown.engr.utk.edu/~koschan/paper/coseg.pdf

[14] Vezhnevets V., Sazonov V., Andreeva A.: A Survey on Pixel-Based Skin Color Detection Techniques. Proc. Graphicon-2003, pp. 85-92, 2003. http://graphics.cs.msu.su/en/publications/text/gc2003vsa.pdf