
Final review LING 572 Fei Xia 03/07/06


Page 1

Final review

LING 572

Fei Xia

03/07/06

Page 2

Misc

• Parts 3 and 4 were due at 6am today.

• Presentation: email me the slides by 6am on 3/9.

• Final report: email me by 6am on 3/14.

• Group meetings: 1:30-4:00pm on 3/16.

Page 3

Outline

• Main topics

• Applying to NLP tasks

• Tricks

Page 4

Main topics

Page 5

Main topics

• Supervised learning
– Decision tree
– Decision list
– TBL
– MaxEnt
– Boosting

• Semi-supervised learning
– Self-training
– Co-training
– EM
– Co-EM

Page 6

Main topics (cont)

• Unsupervised learning
– The EM algorithm
– The EM algorithm for PM models
• Forward-backward
• Inside-outside
• IBM models for MT

• Others
– Two dynamic models: FSA and HMM
– Re-sampling: bootstrap
– System combination
– Bagging

Page 7

Main topics (cont)

• Homework
– Hw1: FSA and HMM
– Hw2: DT, DL, CNF, DNF, and TBL
– Hw3: Boosting

• Project
– P1: Trigram (learn to use Carmel, relation between HMM and FSA)
– P2: TBL
– P3: MaxEnt
– P4: Bagging, boosting, system combination, SSL

Page 8

Supervised learning

Page 9

A classification problem

District | House type    | Income | Previous Customer | Outcome
Suburban | Detached      | High   | No                | Nothing
Suburban | Semi-detached | High   | Yes               | Respond
Rural    | Semi-detached | Low    | No                | Respond
Urban    | Detached      | Low    | Yes               | Nothing
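As an illustration (not from the course slides), a decision-tree learner would pick a split attribute for this toy table by information gain; a minimal sketch in Python, with the table hard-coded:

```python
from math import log2
from collections import Counter

# Toy data from the slide: (District, House type, Income, Previous Customer, Outcome)
data = [
    ("Suburban", "Detached", "High", "No", "Nothing"),
    ("Suburban", "Semi-detached", "High", "Yes", "Respond"),
    ("Rural", "Semi-detached", "Low", "No", "Respond"),
    ("Urban", "Detached", "Low", "Yes", "Nothing"),
]
ATTRS = ["District", "House type", "Income", "Previous Customer"]

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def info_gain(rows, attr_idx):
    # Gain = H(labels) - weighted entropy of the label subsets per value
    labels = [r[-1] for r in rows]
    remainder = 0.0
    for v in set(r[attr_idx] for r in rows):
        subset = [r[-1] for r in rows if r[attr_idx] == v]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy(labels) - remainder

gains = {a: info_gain(data, i) for i, a in enumerate(ATTRS)}
```

On this data, House type alone separates Respond from Nothing, so its gain is maximal.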

Page 10

Classification and estimation problems

• Given
– x: input attributes
– y: the goal
– training data: a set of (x, y) pairs

• Predict y given a new x:
– y is a discrete variable → classification problem
– y is a continuous variable → estimation problem

Page 11

Five ML methods

• Decision tree

• Decision list

• TBL

• Boosting

• MaxEnt

Page 12

Decision tree

• Modeling: tree representation

• Training: top-down induction, greedy algorithm

• Decoding: find the path from root to a leaf node, where the tests along the path are satisfied.

Page 13

Decision tree (cont)

• Main algorithms: ID3, C4.5, CART

• Strengths:
– Ability to generate understandable rules
– Ability to clearly indicate the best attributes

• Weaknesses:
– Data splitting
– Trouble with non-rectangular regions
– The instability of top-down induction (→ bagging)

Page 14

Decision list

• Modeling: a list of decision rules

• Training: greedy, iterative algorithm

• Decoding: find the 1st rule that applies

• Each decision is based on a single piece of evidence, in contrast to MaxEnt, boosting, TBL
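The decoding step (find the first rule that applies) can be sketched in a few lines; the rules and attribute names below are invented for illustration:

```python
def dl_decode(rules, default, x):
    # Scan the ordered rules; the first test that fires decides the label
    for test, label in rules:
        if test(x):
            return label
    return default

# Hypothetical decision list over the toy customer data
rules = [(lambda x: x["income"] == "High", "Respond"),
         (lambda x: x["district"] == "Rural", "Respond")]
label = dl_decode(rules, "Nothing", {"income": "Low", "district": "Urban"})
```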

Page 15

TBL

• Modeling: a list of transformations (similar to decision rules)

• Training:
– Greedy, iterative algorithm
– The concept of a current state

• Decoding: apply every transformation to the data
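The decoding step above (apply every transformation, in learned order, to the current state) might be sketched as follows for POS-style tagging; the lexicon and the single transformation are made up for illustration:

```python
def tbl_decode(tokens, lexicon, transformations):
    # Initial state: each token gets its most-frequent tag
    tags = [lexicon[t] for t in tokens]
    # Apply every transformation, in learned order, to the current state
    for old, new, trigger in transformations:
        tags = [new if tags[i] == old and trigger(tokens, tags, i) else tags[i]
                for i in range(len(tokens))]
    return tags

# Hypothetical transformation: retag NN -> MD when the previous tag is PRP
trans = [("NN", "MD", lambda toks, tags, i: i > 0 and tags[i - 1] == "PRP")]
tags = tbl_decode(["I", "can", "swim"],
                  {"I": "PRP", "can": "NN", "swim": "VB"}, trans)
```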

Page 16

TBL (cont)

• Strengths:
– Minimizing the error rate directly
– Ability to handle non-classification problems
• Dynamic problem: POS tagging
• Non-classification problem: parsing

• Weaknesses:
– Transformations are hard to interpret as they interact with one another
– Probabilistic TBL: TBL-DT

Page 17

Boosting

[Diagram: the training sample is fed to ML to learn f1; reweighted samples are fed to ML to learn f2, …, fT; the weak classifiers f1 … fT are combined into the final classifier f.]

Page 18

Boosting (cont)

• Modeling: combining a set of weak classifiers to produce a powerful committee.

• Training: learn one classifier at each iteration

• Decoding: use the weighted majority vote of the weak classifiers
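The weighted majority vote can be sketched as below (AdaBoost-style, with labels in {+1, −1}; the weak classifiers and weights are made up):

```python
def boosted_decode(weak_classifiers, alphas, x):
    # Weighted majority vote: sign of the alpha-weighted sum of votes
    score = sum(a * h(x) for h, a in zip(weak_classifiers, alphas))
    return 1 if score >= 0 else -1

# Hypothetical decision stumps on a scalar input, with their vote weights
hs = [lambda x: 1 if x > 0 else -1,
      lambda x: 1 if x > 5 else -1,
      lambda x: -1]
alphas = [0.9, 0.4, 0.3]
```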

Page 19

Boosting (cont)

• Strengths:
– It comes with a set of theoretical guarantees (e.g., on training error and test error).
– It only needs to find weak classifiers.

• Weaknesses:
– It is susceptible to noise.
– The actual performance depends on the data and the base learner.

Page 20

MaxEnt

The task: find $p^*$ such that

$p^* = \arg\max_{p \in P} H(p)$

where

$P = \{\, p \mid E_p f_j = E_{\tilde{p}} f_j,\; j \in \{1, \ldots, k\} \,\}$

If $p^*$ exists, it has the form

$p^*(x) = \frac{1}{Z} \exp\Big(\sum_{j=1}^{k} \lambda_j f_j(x)\Big)$
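Assuming the exponential form $p(x) = \frac{1}{Z}\exp(\sum_j \lambda_j f_j(x))$, a toy sketch with invented binary features and weights:

```python
from math import exp

def maxent_prob(lambdas, features, xs):
    # Unnormalized score exp(sum_j lambda_j * f_j(x)), then normalize by Z
    scores = {x: exp(sum(l * f(x) for l, f in zip(lambdas, features)))
              for x in xs}
    Z = sum(scores.values())
    return {x: s / Z for x, s in scores.items()}

# Hypothetical features and weights over a three-outcome space
feats = [lambda x: 1.0 if x == "a" else 0.0,
         lambda x: 1.0 if x in ("a", "b") else 0.0]
p = maxent_prob([0.5, 1.0], feats, ["a", "b", "c"])
```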

Page 21

MaxEnt (cont)

• If $p^*$ exists, then

$p^* = \arg\max_{q \in Q} L(q)$

where

$L(q) = \sum_x \tilde{p}(x) \log q(x)$

$Q = \{\, q \mid q(x) = \frac{1}{Z} \exp\big(\sum_{j=1}^{k} \lambda_j f_j(x)\big) \,\}$

Page 22

MaxEnt (cont)

• Training: GIS, IIS

• Feature selection: – Greedy algorithm– Select one (or more) at a time

• In general, MaxEnt achieves good performance on many NLP tasks.

Page 23

Common issues

• Objective function / quality measure:
– DT, DL: e.g., information gain
– TBL, Boosting: minimize training errors
– MaxEnt: maximize entropy while satisfying constraints

Page 24

Common issues (cont)

• Avoiding overfitting
– Use development data
– Two strategies:
• stop early
• post-pruning

Page 25

Common issues (cont)

• Missing attribute values:
– Assume a “blank” value
– Assign the most common value among all “similar” examples in the training data
– (DL, DT): Assign a fraction of the example to each possible class

• Continuous-valued attributes
– Choosing thresholds by checking the training data

Page 26

Common issues (cont)

• Attributes with different costs
– DT: Change the quality measure to include the costs

• Continuous-valued goal attribute
– DT, DL: each “leaf” node is marked with a real value or a linear function
– TBL, MaxEnt, Boosting: ??

Page 27

Comparison of supervised learners

                | DT         | DL                    | TBL                             | Boosting                     | MaxEnt
Probabilistic   | PDT        | PDL                   | TBL-DT                          | Confidence                   | Y
Parametric      | N          | N                     | N                               | N                            | Y
Representation  | Tree       | Ordered list of rules | Ordered list of transformations | List of weighted classifiers | List of weighted features
Each iteration  | Attribute  | Rule                  | Transformation                  | Classifier & weight          | Feature & weight
Data processing | Split data | Split data*           | Change cur_y                    | Reweight (x,y)               | None
Decoding        | Path       | 1st rule              | Sequence of rules               | Calc f(x)                    | Calc f(x)

Page 28

Semi-supervised Learning

Page 29

Semi-supervised learning

• Each learning method makes some assumptions about the problem.

• SSL works when those assumptions are satisfied.

• SSL could degrade the performance when mistakes reinforce themselves.

Page 30

SSL (cont)

• We have covered four methods: self-training, co-training, EM, co-EM

Page 31

Co-training

• The original paper: (Blum and Mitchell, 1998)
– Two “independent” views: split the features into two sets.
– Train a classifier on each view.
– Each classifier labels data that can be used to train the other classifier.

• Extensions:
– Relax the conditional independence assumptions
– Instead of using two views, use two or more classifiers trained on the whole feature set.

Page 32

Unsupervised learning

Page 33

Unsupervised learning

• EM is a method of estimating parameters in the MLE framework.

• It finds a sequence of parameters that improve the likelihood of the training data.

Page 34

The EM algorithm

• Start with an initial estimate, $\theta^0$

• Repeat until convergence
– E-step: calculate

$Q(\theta, \theta^t) = \sum_{i=1}^{n} \sum_{y} P(y \mid x_i, \theta^t) \log P(x_i, y \mid \theta)$

– M-step: find

$\theta^{(t+1)} = \arg\max_{\theta} Q(\theta, \theta^t)$
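As a concrete (hypothetical, not from the course) instance of these two steps: EM for a mixture of two biased coins, where each observation is a head count out of n flips and the coin identity is hidden.

```python
def em_two_coins(heads, n, p=(0.6, 0.5), pi=(0.5, 0.5), iters=100):
    for _ in range(iters):
        # E-step: responsibility P(coin k | h) under the current parameters
        resp = []
        for h in heads:
            like = [pi[k] * p[k] ** h * (1 - p[k]) ** (n - h) for k in range(2)]
            s = sum(like)
            resp.append([l / s for l in like])
        # M-step: closed-form re-estimates of the biases and mixing weights
        p = tuple(sum(r[k] * h for r, h in zip(resp, heads)) /
                  sum(r[k] * n for r in resp) for k in range(2))
        pi = tuple(sum(r[k] for r in resp) / len(heads) for k in range(2))
    return p, pi

# Invented data: three high-head runs and three low-head runs of 10 flips
p, pi = em_two_coins([8, 9, 7, 2, 1, 3], 10)
```

The M-step here has an analytical solution, which is the case for many classes of problems mentioned on the next slide.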

Page 35

The EM algorithm (cont)

• The optimal solution for the M-step exists for many classes of problems.

A number of well-known methods are special cases of EM.

• The EM algorithm for PM models
– Forward-backward algorithm
– Inside-outside algorithm
– …

Page 36

Other topics

Page 37

FSA and HMM

• Two types of HMMs:
– State-emission and arc-emission HMMs
– They are equivalent

• We can convert an HMM into a WFA
• Modeling: Markov assumption
• Training:
– Supervised: counting
– Unsupervised: forward-backward algorithm
• Decoding: Viterbi algorithm
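A minimal Viterbi sketch for a state-emission HMM; the two-state parameters below are toy values, not from the course:

```python
def viterbi(obs, states, start, trans, emit):
    # V[t][s]: probability of the best path ending in state s at time t
    V = [{s: start[s] * emit[s][obs[0]] for s in states}]
    back = []
    for o in obs[1:]:
        row, ptr = {}, {}
        for s in states:
            prev = max(states, key=lambda q: V[-1][q] * trans[q][s])
            row[s] = V[-1][prev] * trans[prev][s] * emit[s][o]
            ptr[s] = prev
        V.append(row)
        back.append(ptr)
    # Follow back-pointers from the best final state
    last = max(states, key=lambda s: V[-1][s])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return list(reversed(path))

# Hypothetical two-state HMM emitting symbols 1..3
start = {"H": 0.8, "C": 0.2}
trans = {"H": {"H": 0.7, "C": 0.3}, "C": {"H": 0.4, "C": 0.6}}
emit = {"H": {1: 0.2, 2: 0.4, 3: 0.4}, "C": {1: 0.5, 2: 0.4, 3: 0.1}}
path = viterbi([3, 1, 3], ["H", "C"], start, trans, emit)
```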

Page 38

Bootstrap

[Diagram: ML is applied to each of B bootstrap samples to produce f1, f2, …, fB, which are combined into f.]

Page 39

Bootstrap (cont)

• A method of re-sampling:
– One original sample → B bootstrap samples

• It has a strong mathematical background.

• It is a method for estimating standard errors, bias, and so on.
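A sketch of estimating a standard error by the bootstrap (the data and B are arbitrary):

```python
import random

def bootstrap_se(sample, B, rng):
    # Draw B bootstrap samples (same size, with replacement) and use the
    # spread of their means to estimate the standard error of the mean
    means = []
    for _ in range(B):
        resample = [rng.choice(sample) for _ in sample]
        means.append(sum(resample) / len(resample))
    m = sum(means) / B
    return (sum((x - m) ** 2 for x in means) / (B - 1)) ** 0.5

se = bootstrap_se([2, 4, 4, 4, 5, 5, 7, 9], 2000, random.Random(0))
```

For this sample the estimate should land near the analytical value sd/√n ≈ 0.7.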

Page 40

System combination

[Diagram: different learners ML1, ML2, …, MLB produce f1, f2, …, fB, which are combined into f.]

Page 41

System combination (cont)

• Hybridization: combine substructures to produce a new one.
– Voting
– Naïve Bayes

• Switching: choose one of the $f_i(x)$
– Similarity switching
– Naïve Bayes

$f(x) = g(f_1(x), \ldots, f_m(x))$
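Combination by unweighted voting can be sketched as follows; breaking ties in favor of the earliest system is one possible convention, not necessarily the one used in class:

```python
from collections import Counter

def vote(outputs):
    # Pick the output proposed by the most systems; ties go to the
    # earliest system in the list (an assumed convention)
    counts = Counter(outputs)
    best = counts.most_common(1)[0][1]
    for o in outputs:
        if counts[o] == best:
            return o

label = vote(["NN", "VB", "NN"])
```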

Page 42

Bagging

[Diagram: ML is applied to each of B bootstrap samples to produce f1, f2, …, fB, which are combined into f.]

bagging = bootstrap + system combination

Page 43

Bagging (cont)

• It is effective for unstable learning methods:
– Decision tree
– Regression tree
– Neural network

• It does not help stable learning methods:
– K-nearest neighbors

Page 44

Relations

Page 45

Relations

• WFSA and HMM

• DL, DT, TBL

• EM, EM for PM

Page 46

WFSA and HMM

[Diagram: an HMM augmented with “Start” and “Finish” states.]

Add a “Start” state and a transition from “Start” to any state in the HMM.
Add a “Finish” state and a transition from any state in the HMM to “Finish”.

Page 47

DT, DL, CNF, DNF, TBL

[Diagram: relationships among k-CNF, k-DNF, k-DT, k-DL, and k-TBL.]

Page 48

The EM algorithm

[Diagram: the generalized EM contains the EM algorithm, which contains the EM algorithm for PM models: Gaussian mixtures, inside-outside, forward-backward, IBM models.]

Page 49

Solving an NLP problem

Page 50

Issues

• Modeling: represent the problem as a formula and decompose the formula into a function of parameters

• Training: estimate model parameters

• Decoding: find the best answer given the parameters

• Other issues:
– Preprocessing
– Postprocessing
– Evaluation
– …

Page 51

Modeling

• Generative vs. discriminative models

• Introducing hidden variables

• The order of decomposition

$P(F,E)$ vs. $P(F \mid E)$ vs. $P(E \mid F)$

$P(F \mid E) = \sum_a P(F, a \mid E)$

$P(F, a \mid E) = P(a \mid E) \cdot P(F \mid a, E)$

$P(F, a \mid E) = P(F \mid E) \cdot P(a \mid F, E)$

Page 52

Modeling (cont)

• Approximation / assumptions

• Final formulae and types of parameters

$P(a \mid E) = \prod_i P(a_i \mid a_1^{i-1}, E) \approx \prod_i P(a_i \mid a_{i-1})$

$P(F \mid E) = \frac{P(m \mid l)}{(l+1)^m} \prod_{j=1}^{m} \sum_{i=0}^{l} P(f_j \mid e_i)$

Page 53

Modeling (cont)

• Using classifiers for non-classification problems
– POS tagging
– Chunking
– Parsing

Page 54

Training

• Objective functions:
– Maximize likelihood: EM
– Minimize error rate: TBL
– Maximum entropy: MaxEnt
– …

• Supervised, semi-supervised, unsupervised:
– Ex: maximize likelihood
• Supervised: simple counting
• Unsupervised: EM

Page 55

Training (cont)

• At each iteration:
– Choose one attribute / rule / weight / … at a time, and never change it later: DT, DL, TBL, …
– Update all the parameters at each iteration: EM

• Choose “untrained” parameters (e.g., thresholds) using development data.
– Minimal “gain” for continuing iteration

Page 56

Decoding

• Dynamic programming:
– CYK for PCFG
– Viterbi for HMM

• Dynamic problem:
– Decode from left to right
– Features only look at the left context
– Keep top-N hypotheses at each position
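The left-to-right, top-N strategy can be sketched as a beam decoder; the scoring function here is a stand-in (it simply rewards a label different from the previous one) to keep the example self-contained:

```python
def beam_decode(tokens, labels, score, N):
    # Each hypothesis is a (label history, cumulative score) pair
    beam = [([], 0.0)]
    for i, _ in enumerate(tokens):
        cand = [(hist + [lab], s + score(tokens, hist, i, lab))
                for hist, s in beam for lab in labels]
        # Keep only the top-N hypotheses at this position
        beam = sorted(cand, key=lambda c: -c[1])[:N]
    return beam[0][0]

# Stand-in scorer: +1 for a label different from the previous one
sc = lambda toks, hist, i, lab: 1.0 if lab != (hist[-1] if hist else "B") else 0.0
best = beam_decode(["w", "w", "w"], ["A", "B"], sc, 2)
```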

Page 57

Preprocessing

• Sentence segmentation

• Sentence alignment (for MT)

• Tokenization

• Morphing

• POS tagging

• …

Page 58

Post-processing

• System combination

• Casing (MT)

• …

Page 59

Evaluation

• Use standard training/test data if possible.

• Choose appropriate evaluation measures:
– WSD: for what applications?
– Word alignment: F-measure vs. AER. How does it affect the MT result?
– Parsing: F-measure vs. dependency link accuracy

Page 60

Tricks

Page 61

Tricks

• Algebra

• Probability

• Optimization

• Programming

Page 62

Algebra

The order of sums:

$\sum_{x_1} \cdots \sum_{x_n} f(x_1, \ldots, x_n) = \sum_{x_n} \cdots \sum_{x_1} f(x_1, \ldots, x_n)$

Pulling out constants:

$\sum_{x_1} \cdots \sum_{x_n} \big( c_1 f_1(x_1, \ldots, x_n) + c_2 f_2(x_1, \ldots, x_n) \big) = c_1 \sum_{x_1} \cdots \sum_{x_n} f_1(x_1, \ldots, x_n) + c_2 \sum_{x_1} \cdots \sum_{x_n} f_2(x_1, \ldots, x_n)$

Page 63

Algebra (cont)

The order of log and product / sum:

$\log \prod_i f_i = \sum_i \log f_i$

The order of sums and products:

$\sum_{x_1} \sum_{x_2} \cdots \sum_{x_n} \prod_{i=1}^{n} f_i(x_i) = \prod_{i=1}^{n} \sum_{x_i} f_i(x_i)$

Page 64

Probability

Introducing a new random variable:

$p(x) = \sum_y p(x, y) = \sum_y p(y) \cdot p(x \mid y)$

The order of decomposition:

$P(x, y, z) = P(y) \, P(z \mid y) \, P(x \mid y, z)$

$P(x, y, z) = P(x) \, P(y \mid x) \, P(z \mid x, y)$

Page 65

More general cases

$P(A_1, \ldots, A_n) = P(A_1) \prod_{i=2}^{n} P(A_i \mid A_1, \ldots, A_{i-1})$

Page 66

Probability (cont)

Bayes Rule:

$p(y \mid x) = \frac{p(y) \, p(x \mid y)}{p(x)}$

Source-channel model:

$\arg\max_y p(y \mid x) = \arg\max_y p(y) \, p(x \mid y)$

Page 67

Probability (cont)

Normalization:

$p(x) = \frac{Ct(x)}{\sum_{x'} Ct(x')}$

Jensen’s inequality:

$E[\log(p(x))] \le \log(E[p(x)])$

Page 68

Optimization

• When there is no analytical solution, use iterative approach.

• If the optimal solution to g(x) is hard to find, look for the optimal solution to a (tight) lower bound of g(x).

Page 69

Optimization (cont)

• Using Lagrange multipliers: Constrained problem: maximize f(x) with the constraint that g(x)=0

Unconstrained problem: maximize f(x) – λg(x)

• Take first derivatives to find the stationary points.

Page 70

Programming

• Using/creating a good package:
– Tutorial, sample data, well-written code
– Multiple levels of code
• Core ML algorithm: e.g., TBL
• Wrapper for a task: e.g., a POS tagger
• Wrapper to deal with input, output, etc.

Page 71

Programming (cont)

• Good practice:
– Write notes and create wrappers (all the commands should be stored in the notes, or even better in a wrapper script)

– Use standard directory structures:
• src/, include/, exec/, bin/, obj/, docs/, sample/, data/, result/

– Give meaningful filenames to important code: e.g., build_trigram_tagger.pl rather than aaa100.exec

– Give meaningful function and variable names

– Don’t use global variables

Page 72

Final words

• We have covered a lot of topics: 5+4+3+4

• It takes time to digest, but at least we understand the basic concepts.

• The next step: applying them to real applications.