Voice Based Classification of Healthy Subjects and ...

1
Voice Based Classification of Healthy Subjects and Patients with Amyotrophic Lateral Sclerosis Suhas BN, Deep Patel, Nithin Rao, Yamini Belur, Pradeep Reddy, Nalini Atcharyam Ravi Yadav, Dipanjan Gope and Prasanta Kumar Ghosh SPIRE Lab, Electrical Engineering, Indian Institute of Science, Bengaluru 560012, India National Institute of Mental Health and Neurosciences, Bengaluru 560029, India PROBLEM STATEMENT To determine the choice of task and device in developing an assistive tool for detection and monitoring of ALS MOTIVATION Need for detection of ALS: Early detection helps better management Monitoring of ALS: Following the progress of the condition [1] CHALLENGES Identifying speech cues that help in better diagnosis [2]. Access for people of different socio-economic backgrounds DATASET Recording setup Moto G5 Plus (MOT), Zoom H6 X/Y recorder (ZOO), Xiaomi Redmi 4 (XIA), Dell XPS 15 (LAP), Apple iPhone 7 (IPH). Sampling freq 44.1 Khz Speech Stimuli Spontaneous (SPON), Diadochokinetic rate (DDK), Sustained Phonation (PHON) Native Language (count) Bengali (5 subjects each), Hindi (5 subjects each) , Kannada (5 subjects each), Odiya (3 subjects each), Tamil (3 subjects each), Telugu (4 subjects each). Total duration SPON (7 hours), DDK (5.36 hours), PHON (7.9 hours) WAVEFORMS AND SPECTROGRAMS (A) subject with ALSFRS-R 0 (B) subject with ALSFRS-R 2 (C) A healthy subject ALSFRS-R RATING FOR SPEECH Severity 0 1 2 3 4 Finding Loss of useful speech Speech combined with nonvocal communications Intelligible with repeating Detectable speech disturbance Normal Scan here for audio samples METHODOLOGY EXPERIMENTS 50 subjects, 5 fold cross validation setup. Per fold : 5 ALS & 5 Healthy Controls balanced for ALS severity, age and language. Classifiers : Support Vector Machines (SVM) & Deep Neural Networks (DNN) Parameters: Choices of N w = {0.5, 0.8, 1, 2, 3}s with N sh = 0.1s. Optimal N w =0.8s. SVM:C& γ selected by maximizing the performance on the validation set DNN: Activation Functions = {‘sigmoid’, ‘tanh’, ‘relu’} Hidden Layers = {1, 2, 3} & No. of Neurons={64, 128, 256, 512} for which the validation loss is minimized. RESULTS 1 2 3 70 80 90 SPON 1 2 3 70 80 90 1 2 3 70 80 90 1 2 3 70 80 90 1 2 3 70 80 90 1 2 3 70 80 90 DDK 1 2 3 70 80 90 1 2 3 70 80 90 1 2 3 70 80 90 1 2 3 70 80 90 1 2 3 70 80 90 PHON 1 2 3 70 80 90 1 2 3 70 80 90 1 2 3 70 80 90 1 2 3 70 80 90 FRAME LENGTH (s) A C C U R A C Y (%) Classification accuracy by varying N w . ? SVM, DNN. A comparative study on speech classification of ALS vs Healthy controls using 3 speech tasks & 5 recording devices. DDK task consistently performs better than other tasks while MOT performs the best across all devices. 0 5 10 15 20 25 30 35 45 50 55 60 65 70 75 80 85 SPON DDK PHON MOT A C C U R A C Y (%) C O E F F I C I E N T STATIC MFCC VELOCITY MFCC ACCELERATION MFCC Classification results of (3xN) coefficients of suprasegmental features of each mean, median and S.D of one MFCC. SPON: 3 rd , 15 th & 27 th dim of MFCC yield highest accuracy : 78.7%, 82.7% and 84.8% among 36-D MFCC : correspond to 4 th DCT basis function. DDK: 2 nd , 14 th & 26 th dim of MFCC yield highest accuracy : 71.2%, 84.4% and 84.4% among 36-D MFCC : correspond to 3 rd DCT basis function. REFERENCES 1. B.K. Yamini and A. Nalini, ‘Measures of maximum performance of speech related tasks in patients with amyotrophic lateral sclerosis,’ Amyotrophic Lateral Sclerosis, vol. 9, 2008. 2. Aravind Illa, Deep Patel, BK Yamini, N Shivashankar, Preethish Kumar Veeramani, Kiran Polavarapui, Saraswati Nashi, Atchayaram Nalini, Prasanta Kumar Ghosh, et al., ‘Comparison of speech tasks for automatic classification of patients with amyotrophic lateral sclerosis and healthy subjects,’ in 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2018, pp. 6014-6018. ACKNOWLEDGEMENT Authors thank the Pratiksha Trust for their support. Ongoing work : Developing an assistive tool for detection and monitoring of ALS. http://spire.ee.iisc.ac.in/spire/ [email protected], [email protected]

Transcript of Voice Based Classification of Healthy Subjects and ...

Page 1: Voice Based Classification of Healthy Subjects and ...

Voice Based Classification of Healthy Subjects and Patients withAmyotrophic Lateral Sclerosis

Suhas BN, Deep Patel, Nithin Rao, Yamini Belur, Pradeep Reddy, Nalini AtcharyamRavi Yadav, Dipanjan Gope and Prasanta Kumar Ghosh

SPIRE Lab, Electrical Engineering, Indian Institute of Science, Bengaluru 560012, IndiaNational Institute of Mental Health and Neurosciences, Bengaluru 560029, India

PROBLEM STATEMENTTo determine the choice of task and device in developing an assistive tool for detection and monitoring of ALS

MOTIVATIONNeed for detection of ALS: Early detection helps better management

Monitoring of ALS: Following the progress of the condition [1]

CHALLENGESIdentifying speech cues that help in better diagnosis [2].Access for people of different socio-economic backgrounds

DATASET

Recording setupMoto G5 Plus (MOT), Zoom H6 X/Y recorder (ZOO), Xiaomi

Redmi 4 (XIA), Dell XPS 15 (LAP), Apple iPhone 7 (IPH).

Sampling freq 44.1 Khz

Speech StimuliSpontaneous (SPON), Diadochokinetic rate (DDK),

Sustained Phonation (PHON)

Native Language (count)

Bengali (5 subjects each), Hindi (5 subjects each) ,

Kannada (5 subjects each), Odiya (3 subjects each),

Tamil (3 subjects each), Telugu (4 subjects each).

Total duration SPON (7 hours), DDK (5.36 hours), PHON (7.9 hours)

WAVEFORMS AND SPECTROGRAMS

(A) subject with ALSFRS-R 0 (B) subject with ALSFRS-R 2 (C) A healthy subject

ALSFRS-R RATING FOR SPEECH

Severity 0 1 2 3 4

Finding

Loss of

useful

speech

Speech combined

with nonvocal

communications

Intelligible

with

repeating

Detectable

speech

disturbanceNormal

Scan here foraudio samples

METHODOLOGY

EXPERIMENTS50 subjects, 5 fold cross validation setup.

Per fold : 5 ALS & 5 Healthy Controls balanced for ALS severity, age and language.

Classifiers : Support Vector Machines (SVM) & Deep Neural Networks (DNN)

Parameters: Choices of Nw = {0.5, 0.8, 1, 2, 3}s with Nsh = 0.1s. Optimal Nw=0.8s.

SVM: C & γ selected by maximizing the performance on the validation setDNN: Activation Functions = {‘sigmoid’, ‘tanh’, ‘relu’}

Hidden Layers = {1, 2, 3}& No. of Neurons={64, 128, 256, 512} for which the validation loss is minimized.

RESULTS

1 2 3

70

80

90

SPON

1 2 3

70

80

90

1 2 3

70

80

90

1 2 3

70

80

90

1 2 3

70

80

90

1 2 3

70

80

90

DDK

1 2 3

70

80

90

1 2 3

70

80

90

1 2 3

70

80

90

1 2 3

70

80

90

1 2 3

70

80

90

PHON

1 2 3

70

80

90

1 2 3

70

80

90

1 2 3

70

80

90

1 2 3

70

80

90

FRAME LENGTH (s)

A

C

C

U

R

A

C

Y

(%)

Classification accuracy by varying Nw. ? SVM, • DNN.

A comparative study on

speech classification of

ALS vs Healthy controls

using 3 speech tasks & 5

recording devices.

DDK task consistently

performs better than

other tasks while MOT

performs the best across

all devices.

0 5 10 15 20 25 30 3545

50

55

60

65

70

75

80

85

SPON

DDK

PHON

MOT

A

C

C

U

R

A

C

Y

(%)

C O E F F I C I E N T

STATIC MFCC VELOCITY MFCC ACCELERATION MFCC

Classification results of (3xN) coefficients of suprasegmental

features of each mean, median and S.D of one MFCC.

SPON: 3rd, 15th & 27th

dim of MFCC yield

highest accuracy :

78.7%, 82.7% and

84.8% among 36-D

MFCC : correspond to

4th DCT basis function.

DDK: 2nd, 14th & 26th dim

of MFCC yield highest

accuracy : 71.2%, 84.4%

and 84.4% among 36-D

MFCC : correspond to

3rd DCT basis function.

REFERENCES1. B.K. Yamini and A. Nalini, ‘Measures of maximum performance of speech related tasks in patients with amyotrophic

lateral sclerosis,’ Amyotrophic Lateral Sclerosis, vol. 9, 2008.

2. Aravind Illa, Deep Patel, BK Yamini, N Shivashankar, Preethish Kumar Veeramani, Kiran Polavarapui, Saraswati Nashi,

Atchayaram Nalini, Prasanta Kumar Ghosh, et al., ‘Comparison of speech tasks for automatic classification of patients

with amyotrophic lateral sclerosis and healthy subjects,’ in 2018 IEEE International Conference on Acoustics, Speech

and Signal Processing (ICASSP). IEEE, 2018, pp. 6014-6018.

ACKNOWLEDGEMENTAuthors thank the Pratiksha Trust for their support.

Ongoing work :Developing an assistive tool for detectionand monitoring of ALS.

http://spire.ee.iisc.ac.in/spire/ [email protected], [email protected]