Test-Driven Development - 國立臺灣大學r97002/temp/cellop/eml_tdd.pdf · Test-Driven...

Test-Driven Development
Joe Huang, Anti-Spam Team, Cellopoint
Joint work with Alex Fu and H.-H. Tu
August 12, 2013


Outline
1. Test-Driven Development
   - TDD Process
   - Automatic Unit Tests
2. Case Study: Subject Analysis
   - Digging into E-mails
   - Subject Analysis
   - Flowchart
   - Token Collection State Machine
   - Our Development Process
3. Conclusion

Joe Huang et al. TDD Example


Test-Driven Development (TDD)
- An improved development process
- Reduces cost
- Eases the pain


Why Testing Is Important
- "Code that isn't tested doesn't work - this seems to be the safe assumption." (Kent Beck)
- The parts of a system that are easier to test get tested a lot more than the parts that are harder to test.
- Testing is a major activity in any development lifecycle; a large part of a project budget is spent on it.


The Test Automation Approach
- One of the most common strategies today for improving our ability to test a system is test automation.
- The adoption of test-first practices (TDD) demonstrates how test automation needs can be addressed.


TDD Process
Figure: the TDD cycle.
1. Add a test.
2. Run the tests and see the new failure.
3. Write the production code.
4. Run the tests and see them all pass.
5. Refactor.
6. Add features by repeating the cycle.
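One turn of this cycle can be sketched with Python's standard unittest module. count_digits() is a hypothetical function chosen only for illustration; it is not part of the original project. In TDD the test class is written first, so the first run fails, and the implementation is then added to make it pass.

```python
import unittest


# Step 1, "add a test": in TDD this test class is written first. Running it
# before count_digits() exists fails with a NameError (the "new failure").
class CountDigitsTest(unittest.TestCase):
    def test_ascii_digits(self):
        self.assertEqual(count_digits("7/25 7/26"), 6)

    def test_no_digits(self):
        self.assertEqual(count_digits("SPSS"), 0)


# Step 2, "production code": the simplest implementation that makes every
# test pass. Step 3, "refactor": it may later be restructured, re-running
# the tests after each change to confirm they still all pass.
def count_digits(text):
    """Return how many characters of `text` are decimal digits."""
    return sum(1 for ch in text if ch.isdigit())
```

Saved as a module, the tests run with `python -m unittest <module name>`.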


How Test Automation Works
- Tests are written before the production code.
- This ensures that every part added to the system keeps the whole system testable.
- A testable system can evolve easily: you can add features knowing that existing ones did not break.


The Result of Less Testing
- "To test or not to test, that is the question."
- Giving testing a low priority during the system design phases makes the system hard to test.
- Because writing automated tests is then hard, less effort is actually invested in them.
- In the end, the priority of automated testing drops even further.


Automatic Unit Tests
- When we write automatic unit tests (AUT), the main difficulty is isolating the parts of the system under test from the rest of it.
- We address the issues described in Zilberfeld (2012a):
  - Instantiating a class
  - Isolation from dependencies
  - Verifying interactions
- Goal: test every unit with minimum interference.
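The three issues can be illustrated with Python's unittest.mock. This is a minimal sketch: HeaderReader is a hypothetical class, not part of the original code, and its header handling is simplified (folded header lines are not joined). The test instantiates the class, isolates it from the file system by patching the built-in open(), and then verifies the interaction with the patched dependency.

```python
import unittest
from unittest import mock


class HeaderReader:
    """Hypothetical unit under test: reads raw header lines from an .eml file."""

    def read_header_lines(self, path):
        lines = []
        with open(path, encoding="utf-8") as f:
            for line in f:
                line = line.rstrip("\r\n")
                if not line:          # a blank line ends the header block
                    break
                lines.append(line)
        return lines


class HeaderReaderTest(unittest.TestCase):
    def test_reads_only_header(self):
        fake_eml = "From: JOE\nSubject: hello\n\nbody text\n"
        # Isolation from dependencies: replace open() so no real file is touched.
        with mock.patch("builtins.open", mock.mock_open(read_data=fake_eml)) as fake_open:
            reader = HeaderReader()   # instantiating the class
            lines = reader.read_header_lines("any.eml")
        self.assertEqual(lines, ["From: JOE", "Subject: hello"])
        # Verifying interactions: open() was called once with these arguments.
        fake_open.assert_called_once_with("any.eml", encoding="utf-8")
```

Iterating over a mock_open handle with a for loop requires Python 3.8 or later.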


Digging into E-mails
- An e-mail has two parts that can be analyzed:
  - Header
  - Body
- The body is much more complex than the header.
- The header mainly consists of multiple fields:
  - Received
  - From, To, Cc, Bcc
  - Reply-To
  - Subject
- As a first step, we analyze the Subject field of the header.


Subject Analysis
- Spam can often be identified by its subject.
- Faster analysis
- Feature generation


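Subjects of Hams and Spams
We may find subjects such as the following:
- Ham
  - 確定開課! 最後 3 個名額 7/25 7/26 SPSS Statistics 基礎與軟體實作 (Course confirmed! Last 3 seats, 7/25 and 7/26, SPSS Statistics: fundamentals and hands-on software practice)
  - 材料科技的突破,让我们拭目以待! (A breakthrough in materials technology, let's wait and see!)
- Spam
  - 真的有. 夠俗每片 26 元 - (It's real. Really cheap, only 26 yuan a piece)
  - 我的天你好!有《全 -国 -通 -用)《国 -地 -税》机打《发《票》代开咨询 (Oh my, hello! Inquiries welcome: we issue machine-printed national and local tax invoices, valid nationwide)
- Spam subjects show abnormal patterns, such as odd punctuation and mismatched brackets inserted between characters.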


Illustration
Figure: from an .eml file to a feature vector.
.eml file (header excerpt):

    From: JOE <[email protected]>
    Sender: JOE <[email protected]>
    To: [email protected]: JOE <[email protected]>
    Date: 18 Jul 2013 09:36:16 +0800
    Subject: =?utf-8?b?5Luj6ZaL44qj44qj55m856Wo?=

The decoded subject is 代開㊣㊣發票 (roughly "invoices issued on your behalf", with two circled ㊣ "genuine" symbols). It yields the feature vector

    [2.0, 2.0, 0.0, 1.0, 2.0, 0.0, 6.0, 0.0, 0.0, 0.0, 0.0]

where the first three values are the text-token features, the next three are the non-text-token features, 6.0 is the subject length, and the last four values are the digit character counts.


Flowchart
File Setup → Subject Extraction → Token Collection → Feature Generation → Output


File Setup
- set_file(): read the file and parse the header
- set_enable_feature(): set the list of features that are turned on
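A minimal sketch of this stage using Python 3's standard email.parser module. SubjectAnalyzerSketch and its attribute names are hypothetical, not the original class; the behaviour of set_file() (raise IOError for a bad path, return True on success) follows the unit test shown later in this deck, while the 0/1 flag representation for set_enable_feature() is an assumption.

```python
from email.parser import BytesParser


class SubjectAnalyzerSketch:
    """Hypothetical sketch of the File Setup stage (not the original class)."""

    def __init__(self):
        self.header = None   # parsed header, set by set_file()
        self.enabled = []    # one 0/1 flag per feature, set by set_enable_feature()

    def set_file(self, path):
        # open() raises an OSError subclass for a missing path or a directory;
        # in Python 3, IOError is an alias of OSError, so assertRaises(IOError, ...)
        # also catches these.
        with open(path, "rb") as f:
            # headersonly=True parses the header block and leaves the body unparsed.
            self.header = BytesParser().parse(f, headersonly=True)
        return True

    def set_enable_feature(self, flags):
        # Normalise any truthy/falsy values to 1 (on) or 0 (off).
        self.enabled = [1 if flag else 0 for flag in flags]
```

With the default (compat32) policy, self.header["Subject"] returns the still-encoded subject, which is what the Subject Extraction stage works on.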


Subject Extraction
- get_enc_subject(): extract the encoded subject and its locale encoding
- dec_subject(): decode the subject to UTF-8
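The decoding step can be sketched with Python 3's standard email.header module, assuming the subject uses RFC 2047 encoded words like the one in the illustration. dec_subject here is a hypothetical stand-alone function, not the original method; it returns the decoded text together with the declared charsets (the "locale encoding").

```python
from email.header import decode_header, make_header


def dec_subject(enc_subject):
    """Decode an RFC 2047 subject; return (text, list of declared charsets)."""
    decoded = decode_header(enc_subject)   # [(bytes or str, charset or None), ...]
    # make_header() re-joins the chunks, handling whitespace between encoded words.
    text = str(make_header(decoded))
    charsets = [charset for _, charset in decoded if charset]
    return text, charsets
```

For the subject in the illustration, dec_subject("=?utf-8?b?5Luj6ZaL44qj44qj55m856Wo?=") returns ("代開㊣㊣發票", ["utf-8"]); a plain ASCII subject comes back unchanged with an empty charset list.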


Token Collection
- get_token(): extract the tokens that match a given specification


Token Collection State Machine
Figure: a two-state machine with the states "in token" and "out of token" and a START arrow. Reading a token char moves to (or stays in) "in token"; reading a non-token char moves to (or stays in) "out of token".

Trace for the string 代開㊣㊣發票, collecting text tokens (TempToken is the token being built, Result holds the tokens collected so far):
- Start: TempToken = u'', Result = []
- Read 代 (token char): TempToken = u'代', Result = []
- Read 開 (token char): TempToken = u'代開', Result = []
- Read ㊣ (non-token char): the token ends, TempToken = u'', Result = [u'代開']
- Read ㊣ (non-token char): TempToken = u'', Result = [u'代開']
- Read 發 (token char): TempToken = u'發', Result = [u'代開']
- Read 票 (token char): TempToken = u'發票', Result = [u'代開']
- End of string: the pending token is added, TempToken = u'', Result = [u'代開', u'發票']
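The two-state machine can be sketched in Python 3 as follows. The slides do not define which characters count as token characters, so as an assumption, text characters are Unicode letters and non-text characters are anything that is neither a letter, a number, nor whitespace. get_token is a stand-alone stand-in for the original _get_token method; the mode values "text" and "nontext" mirror the calls shown later in this deck.

```python
import unicodedata

IN_TOKEN = "in token"
OUT_OF_TOKEN = "out of token"


def is_text_char(ch):
    # Assumption: text characters are Unicode letters (categories L*),
    # e.g. CJK ideographs such as 代 or Latin letters.
    return unicodedata.category(ch).startswith("L")


def is_nontext_char(ch):
    # Assumption: non-text characters are neither letters, numbers nor
    # whitespace, e.g. the symbol ㊣ (category So) or punctuation.
    category = unicodedata.category(ch)
    return not (category[0] in "LN" or ch.isspace())


TOKEN_CHAR_TESTS = {"text": is_text_char, "nontext": is_nontext_char}


def get_token(subject, mode="text"):
    """Collect maximal runs of token characters with a two-state machine."""
    is_token_char = TOKEN_CHAR_TESTS[mode]
    state = OUT_OF_TOKEN          # START
    temp_token = ""
    result = []
    for ch in subject:
        if is_token_char(ch):
            # token char: enter (or stay in) "in token" and extend TempToken
            state = IN_TOKEN
            temp_token += ch
        else:
            if state == IN_TOKEN:
                # leaving a token: move TempToken into Result
                result.append(temp_token)
                temp_token = ""
            state = OUT_OF_TOKEN  # non-token char: "out of token"
    if state == IN_TOKEN:
        # end of string inside a token: add the pending TempToken
        result.append(temp_token)
    return result
```

Under these assumptions, get_token("代開㊣㊣發票") reproduces the trace above, and get_token("代開㊣㊣發票", mode="nontext") collects the single non-text token "㊣㊣".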


Feature Generation
- token_to_feature(): compute the features from a given token list
- get_digit_count(): count the occurrences of each type of number
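A minimal sketch of token_to_feature() as a stand-alone Python 3 function. It returns [number of tokens, mean token length, standard deviation of token lengths]; the population standard deviation matches the expected values in the unit tests shown later in this deck. The result for an empty list, [0.0, 0.0, 0.0], is an assumption, because the original empty-list test is cut off.

```python
import math


def token_to_feature(tokens):
    """Return [token count, mean token length, population std of lengths]."""
    n = len(tokens)
    if n == 0:
        # Assumption: an empty token list yields all-zero features.
        return [0.0, 0.0, 0.0]
    lengths = [len(token) for token in tokens]
    mean = sum(lengths) / float(n)
    # Population variance (divide by n), e.g. sqrt(0.1875) for lengths 2, 2, 2, 3.
    variance = sum((length - mean) ** 2 for length in lengths) / n
    return [float(n), mean, math.sqrt(variance)]
```

For the example subject, token_to_feature(["代開", "發票"]) gives [2.0, 2.0, 0.0] and token_to_feature(["㊣㊣"]) gives [1.0, 2.0, 0.0], the first six values of the illustrated feature vector. Compare results with a tolerance such as math.isclose, since values like 3.4 or 0.8 are not exact in binary floating point.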


Feature Generation
Features are computed from the collected tokens and from other characteristics of the subject. For the subject 代開㊣㊣發票:
- Text/non-text token features:
  - Number of tokens: 2.0 (text tokens); 1.0 (non-text tokens)
  - Average token length: 2.0 (text tokens); 2.0 (non-text tokens)
  - Standard deviation of token lengths: 0.0 (text tokens); 0.0 (non-text tokens)
- Subject length: 6.0
- Digit character counts: 0.0, 0.0, 0.0, 0.0
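Written out, the three token features for a token list with lengths ℓ_1 to ℓ_n are n, the mean length, and the standard deviation of the lengths. The slides do not state the formulas; the population form below (dividing by n rather than n − 1) is inferred from the expected values in the unit tests, such as sqrt(0.1875) for the token lengths 2, 2, 2, 3.

```latex
\begin{gathered}
\bar{\ell} = \frac{1}{n}\sum_{i=1}^{n} \ell_i ,
\qquad
\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\ell_i - \bar{\ell}\right)^{2}}
\\[4pt]
\ell = (2, 2, 2, 3):\quad
\bar{\ell} = \tfrac{9}{4} = 2.25 ,\quad
\sigma = \sqrt{\tfrac{3\,(0.25)^{2} + (0.75)^{2}}{4}} = \sqrt{0.1875}
\end{gathered}
```

For 代開㊣㊣發票, the text tokens have lengths (2, 2) and the single non-text token ㊣㊣ has length 2, so both standard deviations are 0.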


Output
- output(): write the result to the console or to a file
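A minimal sketch of an output step, assuming the features are a list of numbers and the enabled features are given as one 0/1 flag per feature. The function name and parameters are hypothetical. Passing the destination in as a writable text stream covers both the console (sys.stdout) and an open file.

```python
import sys


def output(features, enabled, stream=None):
    """Write the enabled features as one space-separated line to `stream`.

    Hypothetical sketch: `stream` is any writable text stream (an open file
    or sys.stdout); features whose flag is 0 are skipped.
    """
    if stream is None:
        # Resolve sys.stdout at call time so later redirection is respected.
        stream = sys.stdout
    values = [str(value) for value, flag in zip(features, enabled) if flag]
    stream.write(" ".join(values) + "\n")
```

Taking the stream as a parameter also makes output() easy to unit-test with io.StringIO instead of a real file.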


Our Development Process
- Detailed separation of functions
  - Write a unit test for each function.
  - Every code revision should pass all tests.
- Code review
  - Han-Hsing: code efficiency and clarity
  - Alex: testability and readability
- Unit testing in Python


Function Separation
- Every unit is responsible for a single action.
- Less entangled code and fewer bugs
- Easier debugging


Example 1 - Subject Extraction

    def gen_feature(self):
        # Operation: extract the subject feature
        self._get_enc_subject()
        self._dec_subject()

The subject extraction process can be split into two parts:
- Extraction of the encoded subject
- Decoding of the subject


Example 2 - Token Collection and Feature Generation

    def _get_token_feature(self):
        # Returns a list: text-token features followed by non-text-token features
        text_token_list = self._get_token(mode="text")
        text_token_feature = self._token_to_feature(text_token_list)
        nontext_token_list = self._get_token(mode="nontext")
        nontext_token_feature = self._token_to_feature(nontext_token_list)
        return text_token_feature + nontext_token_feature

- Token collection
- Feature generation


Code Review
- Pair code review
- Weekly meetings
- Code revision


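Example - List Comprehension
Before revision:

    for i in xrange(self.m_number_of_feature):
        # Write only the features whose flag is on; use "==" (equality),
        # not "is" (object identity), to compare numbers.
        if self.m_feature_list[i] == 1:
            output_file.write("{0} ".format(self.m_feature[i]))

After revision:

    # Zero out disabled features, then write all values on one line.
    result_feature = [self.m_feature[i] * self.m_feature_list[i]
                      for i in xrange(self.m_number_of_feature)]
    output_file.write("{0}\n".format(" ".join(str(value) for value in result_feature)))

Note that the two versions produce different output: the revised code writes every feature, with a zero value in place of each disabled one, and ends the line with a newline, while the original skips disabled features.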


Unit Test in Python
- The unittest package
- Putting "test first" into practice
- Testing both successful and failed cases
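A minimal sketch of a unittest test case covering both a successful case and failure cases, plus a helper that runs the suite and reports whether everything passed, the kind of check behind the rule that every revision should pass all tests. parse_count and run_all_tests are hypothetical names, not part of the original project.

```python
import io
import unittest


def parse_count(text):
    """Hypothetical function under test: parse a non-negative integer."""
    value = int(text)            # raises ValueError for non-numeric text
    if value < 0:
        raise ValueError("count must be non-negative")
    return value


class ParseCountTest(unittest.TestCase):
    def test_success_case(self):
        self.assertEqual(parse_count("26"), 26)

    def test_failure_cases(self):
        # Invalid input must be rejected with an exception.
        self.assertRaises(ValueError, parse_count, "abc")
        self.assertRaises(ValueError, parse_count, "-3")


def run_all_tests():
    """Run the suite quietly and return True only if every test passed."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ParseCountTest)
    result = unittest.TextTestRunner(stream=io.StringIO(), verbosity=0).run(suite)
    return result.wasSuccessful()
```

From the command line, `python -m unittest` with no arguments discovers and runs the test modules in the current directory in the same way.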


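Example 1 - Testing set_file()

    def test_set_file(self):
        subject_obj = emlSubjectAnalyzer.subject()
        # a path that does not exist
        self.assertRaises(IOError, subject_obj.set_file, "ABCDEFGH")
        # a directory, not a regular file
        self.assertRaises(IOError, subject_obj.set_file, "/usr")
        # an existing file
        self.assertTrue(subject_obj.set_file("./logconf"))

assertRaises() checks that calling the function with the given arguments raises the expected exception.
assertTrue() checks that the returned value is true (any truthy value passes, not only True).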
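A sketch of a set_file() that would satisfy the test above, under the assumption that it should reject missing paths and directories with IOError and return True for an existing regular file. Subject is a hypothetical stand-in for emlSubjectAnalyzer.subject, and temporary paths replace "/usr" and "./logconf" so the test runs on any machine.

```python
import os
import tempfile
import unittest


class Subject(object):
    """Hypothetical stand-in for emlSubjectAnalyzer.subject (set_file only)."""

    def __init__(self):
        self.path = None

    def set_file(self, path):
        # Assumed contract: missing paths and directories raise IOError,
        # otherwise the path is stored and True is returned.
        if not os.path.exists(path):
            raise IOError("no such file: %s" % path)
        if os.path.isdir(path):
            raise IOError("is a directory: %s" % path)
        self.path = path
        return True


class SetFileTest(unittest.TestCase):
    def test_set_file(self):
        subject_obj = Subject()
        self.assertRaises(IOError, subject_obj.set_file, "ABCDEFGH")
        # A temporary directory replaces "/usr" to keep the test portable.
        self.assertRaises(IOError, subject_obj.set_file, tempfile.gettempdir())
        # A temporary file replaces "./logconf".
        with tempfile.NamedTemporaryFile() as tmp:
            self.assertTrue(subject_obj.set_file(tmp.name))


suite = unittest.TestLoader().loadTestsFromTestCase(SetFileTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```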


Example 2 - Testing _token_to_feature()

    # at module level
    import math
    import emlSubjectAnalyzer

    def test_token_to_feature(self):
        subject_obj = emlSubjectAnalyzer.subject()

        # English test
        self.assertEqual(subject_obj._token_to_feature(["aaa", "bbb", "ccc"]),
                         [3.0, 3.0, 0.0])

        # Traditional Chinese test
        self.assertEqual(subject_obj._token_to_feature([u"修身", u"齊家", u"治國", u"平天下"]),
                         [4.0, 2.25, math.sqrt(0.1875)])

        # Simplified Chinese test
        self.assertEqual(subject_obj._token_to_feature([u"中国共产党", u"毛泽东", u"江泽民", u"胡锦涛", u"习近平"]),
                         [5.0, 3.4, 0.8])
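The expected values in these tests are consistent with a feature vector of [number of tokens, mean token length in characters, population standard deviation of token lengths]; this reading is inferred from the numbers, not stated on the slides. For the second test (修身, 齊家, 治國, 平天下) the token lengths are 2, 2, 2 and 3:

```latex
\ell = (2, 2, 2, 3), \qquad n = 4, \qquad
\bar{\ell} = \frac{2 + 2 + 2 + 3}{4} = 2.25

\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (\ell_i - \bar{\ell})^2}
       = \sqrt{\frac{3\,(0.25)^2 + (0.75)^2}{4}}
       = \sqrt{\frac{0.75}{4}}
       = \sqrt{0.1875}
```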


Example 2 - Testing _token_to_feature() (Cont'd)

        # Japanese kanji test
        self.assertEqual(subject_obj._token_to_feature([u"選挙区", u"稼働率", u"走塁"]),
                         [3.0, 8.0/3, math.sqrt(2.0/9)])

        # Korean hanja test
        self.assertEqual(subject_obj._token_to_feature([u"曺圭賢", u"裵勇浚"]),
                         [2.0, 3.0, 0.0])

        # Empty list test
        self.assertEqual(subject_obj._token_to_feature([]),
                         [0.0, 0.0, 0.0])
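A hypothetical re-implementation of _token_to_feature() that is consistent with all six expected vectors, assuming the features are (token count, mean token length, population standard deviation of lengths). Because values such as 3.4 and 0.8 are not exactly representable as binary floats, this sketch compares results with assertAlmostEqual rather than assertEqual; token_to_feature is a stand-in name, not the analyzer's method.

```python
import math
import unittest


def token_to_feature(tokens):
    """Hypothetical sketch: [token count, mean length, population std dev of lengths]."""
    n = len(tokens)
    if n == 0:
        # The empty-list test expects all three features to be 0.0.
        return [0.0, 0.0, 0.0]
    lengths = [len(t) for t in tokens]  # characters per token; each CJK character counts as 1
    mean = float(sum(lengths)) / n
    variance = sum((x - mean) ** 2 for x in lengths) / n  # divide by n, not n - 1
    return [float(n), mean, math.sqrt(variance)]


class TokenToFeatureTest(unittest.TestCase):
    def check(self, tokens, expected):
        # Element-wise comparison with a tolerance instead of exact float equality.
        actual = token_to_feature(tokens)
        self.assertEqual(len(actual), len(expected))
        for a, e in zip(actual, expected):
            self.assertAlmostEqual(a, e, places=9)

    def test_token_to_feature(self):
        self.check(["aaa", "bbb", "ccc"], [3.0, 3.0, 0.0])
        self.check([u"修身", u"齊家", u"治國", u"平天下"], [4.0, 2.25, math.sqrt(0.1875)])
        self.check([u"中国共产党", u"毛泽东", u"江泽民", u"胡锦涛", u"习近平"], [5.0, 3.4, 0.8])
        self.check([u"選挙区", u"稼働率", u"走塁"], [3.0, 8.0 / 3, math.sqrt(2.0 / 9)])
        self.check([u"曺圭賢", u"裵勇浚"], [2.0, 3.0, 0.0])
        self.check([], [0.0, 0.0, 0.0])


suite = unittest.TestLoader().loadTestsFromTestCase(TokenToFeatureTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```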


Conclusion

Apply TDD to facilitate the development process
Improved both quality and quantity
Real-world case: Google


Acknowledgement

Thanks to Han-Hsing for valuable suggestions on Python
Thanks to Alex Fu for helpful materials on TDD


References

G. Zilberfeld. Design for testability - the true story. 2012. URL http://www.infoq.com/articles/Testability.


Thanks for your attention :)
