Distinguish Wild Mushrooms with Decision Tree
Shiqin Yan
Objective
Use the existing mushroom database to build a decision tree that helps determine whether a mushroom is poisonous.
DataSet
Existing records drawn from The Audubon Society Field Guide to North American Mushrooms (1981), G. H. Lincoff (Pres.), New York: Alfred A. Knopf
Number of Instances: 8124 (each classified as either edible or poisonous)
Number of Attributes: 22
Training: 5416, Tuning: 1354, Testing: 1354
Missing attribute values: 2480 (denoted by "?"), all for attribute 11
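The three subset sizes above add up to the full 8124 records. The slides do not say how the records were assigned to each subset, so the following is only a minimal sketch that assumes a seeded random shuffle; the function name split_indices and the seed value are hypothetical.

```python
import random

def split_indices(n, n_train, n_tune, seed=0):
    """Shuffle record indices and cut them into train/tune/test parts.

    Assumption: a random split. The original slides do not state
    how the split was actually made.
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    train = idx[:n_train]
    tune = idx[n_train:n_train + n_tune]
    test = idx[n_train + n_tune:]  # whatever remains
    return train, tune, test

# Sizes taken from the slide: 5416 training, 1354 tuning, rest testing.
train, tune, test = split_indices(8124, 5416, 1354)
print(len(train), len(tune), len(test))
```

The tuning subset would typically be held back for choices such as how far to prune, with the testing subset used only for the final accuracy estimate.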
Mushroom Features
1. cap-shape: bell=b, conical=c, convex=x, flat=f, knobbed=k, sunken=s
2. cap-surface: fibrous=f, grooves=g, scaly=y, smooth=s
3. cap-color: brown=n, buff=b, cinnamon=c, gray=g, green=r, pink=p, purple=u, red=e, white=w, yellow=y
4. bruise?: bruises=t, no=f
5. odor: almond=a, anise=l, creosote=c, fishy=y, foul=f …
Approach
Use mutual information to choose the feature on which each node of the tree is split.
Mutual information: Y: label, X: feature. Choose the feature X that maximizes I(Y;X).
Most informative features extracted from the decision tree: odor, spore-print-color, habitat, population
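The split criterion can be sketched in a few lines using the standard identity I(Y;X) = H(Y) - H(Y|X), where H is entropy in bits. This is an illustrative sketch, not the presenter's code: the function names are hypothetical, and the six toy rows are made up for the example rather than taken from the mushroom data.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy H(Y) in bits of a list of labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def mutual_information(xs, ys):
    """I(Y;X) = H(Y) - H(Y|X) for one categorical feature column xs."""
    n = len(ys)
    groups = {}
    for x, y in zip(xs, ys):
        groups.setdefault(x, []).append(y)
    # H(Y|X): entropy of the labels within each feature value, weighted by frequency.
    cond = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(ys) - cond

def best_feature(rows, labels, feature_names):
    """Score every feature and return the one that maximizes I(Y;X)."""
    scores = {
        name: mutual_information([row[i] for row in rows], labels)
        for i, name in enumerate(feature_names)
    }
    return max(scores, key=scores.get), scores

# Toy data (invented): columns are odor and cap-shape, labels are e/p.
rows = [("a", "x"), ("l", "b"), ("f", "x"), ("f", "b"), ("a", "b"), ("f", "x")]
labels = ["e", "e", "p", "p", "e", "p"]
name, scores = best_feature(rows, labels, ["odor", "cap-shape"])
print(name, scores)
```

In this toy example odor separates the two classes perfectly, so it receives the full 1 bit of label entropy and is chosen as the split; a tree builder would apply the same selection recursively to the rows under each branch.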
Prior Research
by Wlodzislaw Duch, Department of Computer Methods, Nicholas Copernicus University
Future
Add cross-validation to improve the accuracy
Prune the tree to avoid over-fitting