CV M4 L01 - cs.ou.edu
• CV_M4_L01
Andrew H. Fagg: Machine Learning Practice 1
CS/DSA 5970: Machine Learning Practice
Representing Data
Connecting Real World Data to our ML Tools
There is often a huge disconnect between the two. Our ML
tools often rely on:
• Well-defined formatting of the data
• Cut into distinct examples. Each example:
– List of property values. Most often assume each example
consists of the same properties.
– Label / expected output value (for supervised problems)
Connecting Real World Data to our ML Tools
(cont) Our ML tools often assume:
• Properties are numerical
• Statistical independence between the different examples
• All examples are drawn from the same statistical
distribution
Connecting Real World Data to our ML Tools
Real world data can:
• Be weakly formatted
• Properties can be enumerated types (e.g., strings such as
“circle”, “square”)
• Values can be incorrect
• Values can be missing
• Different examples can have different properties
• The distribution that we draw examples from can change
over time
Connecting Real World Data to our ML Tools
Transforming the raw data to a well-formatted form is a key
first step:
• This step can take much of our project time, depending on
the form of the data
• How careful we are in taking this step can dramatically
affect everything else we do
• As a byproduct of this step, it is important to really
understand the nature of your data
Roadmap
• Pandas package
– Importing data from standard formats
– Data massaging
• Numpy package
– Efficient representation of numerical data
• Matplotlib package
– Matlab-like visualization package
Pandas
Toolkit for data handling and analysis
• File I/O, including csv files
• Hooks for visualization
• Basic statistics
• Data selection and massaging
• SQL-type operations
Classes Provided by Pandas
Two primary Python classes:
• Series: 1D data
– Indexed by integer location in the array or by some index
variable (index values can be numerical or strings)
• DataFrame: 2D data
– Each dimension indexed by integer index or other index
variable
– Most common for us: examples (rows) x features (columns)
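A minimal sketch of the two classes, using made-up values and hypothetical index names (the sensor and example labels below are assumptions, not data from the lecture). It shows indexing by integer location (iloc) and by index value (loc):

```python
import pandas as pd

# A Series is 1D; here it is indexed by (hypothetical) string labels.
s = pd.Series([0.5, 1.2, -0.3], index=["thigh", "forearm", "foot"])

by_position = s.iloc[1]       # integer location in the array
by_label = s.loc["forearm"]   # index value

# A DataFrame is 2D: examples (rows) x features (columns).
df = pd.DataFrame(
    {"x": [0.0, 1.0, 2.0], "y": [10.0, 11.0, 12.0]},
    index=["ex0", "ex1", "ex2"],
)

cell_by_label = df.loc["ex1", "y"]   # row label, column label
cell_by_position = df.iloc[1, 1]     # row position, column position
```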
Some Useful DataFrame Operations
• Data exploration:
– Show row / column index names
– Compute statistics for individual columns
• Create a new DataFrame that contains a subset of the
rows and/or columns
• Remove or repair rows and/or columns that contain
invalid data
• Export data to a numpy array for use with ML methods
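The operations listed above might look roughly like this in practice. The column names and values are invented for illustration, and dropping rows is only one of several possible ways to handle invalid data:

```python
import numpy as np
import pandas as pd

# Toy DataFrame with one missing value (NaN) in column "x".
df = pd.DataFrame({
    "time": [0.0, 0.1, 0.2, 0.3],
    "x": [1.0, np.nan, 3.0, 4.0],
    "y": [2.0, 2.5, 3.0, 3.5],
})

# Data exploration: index names and a per-column statistic.
row_names = list(df.index)
col_names = list(df.columns)
x_mean = df["x"].mean()          # NaN values are skipped by default

# New DataFrame containing a subset of the rows and columns.
subset = df.loc[df["time"] >= 0.1, ["x", "y"]]

# Remove rows that contain invalid (missing) data.
cleaned = df.dropna()

# Export to a numpy array for use with ML methods.
X = cleaned[["x", "y"]].to_numpy()
```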
Numpy
Numerical methods package
• Representation of vectors, matrices, tensors
– Vector: yet another way of representing a list of numbers
• Implementation of many linear algebra type operations
– Computing matrix inverses, Singular Value Decomposition …
• Basis for many ML packages, including Scikit-Learn
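As a small illustration of the linear algebra operations mentioned above, the sketch below inverts an arbitrary 2x2 matrix (chosen only for this example) and computes its Singular Value Decomposition:

```python
import numpy as np

# A vector and a small invertible matrix (values chosen arbitrarily).
v = np.array([1.0, 2.0])
A = np.array([[2.0, 0.0],
              [1.0, 3.0]])

# Matrix inverse: A @ A_inv should be (numerically) the identity.
A_inv = np.linalg.inv(A)
identity_check = A @ A_inv

# Singular Value Decomposition: A = U @ diag(S) @ Vt.
U, S, Vt = np.linalg.svd(A)
A_rebuilt = U @ np.diag(S) @ Vt
```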
• CV_M4_L02
Real-Time Activity Recognition for
Assistive Robotics
OU Crawling Assistant (Kolobe, Fagg, Miller, Ding); Scientific American (Oct 2016)
Infants Learning to Crawl
• Learning to crawl is in part a reinforcement learning
process:
– Initially: making novel things happen (such as the body rolling
or shifting a bit) is rewarding
– Eventually: it becomes rewarding to grasp toys (or car keys)
• These rewards are important:
– Practice many types of motor skills
– Drives the development of spatial skills
Infants at Risk for Cerebral Palsy
• Initial exploratory movements do not result in interesting
things happening
• These infants show a dramatic delay in the onset of
crawling
• This impacts the learning of other motor skills & the
development of spatial skills
SIPPC Crawling Assistant
[Figure labels: wide-angle cameras, vertical lifts, 6-axis load cell, infant support, covered omni-wheels, EEG head net]
Kinematic Capture Suit
IMU-based kinematic suit
• 12 sensors mounted in suit
• Real-time reconstruction of
body posture
• Recognition of crawling-like
actions
[Figure labels: lower leg, thigh, back sensor and central processor, shoulder, upper arm, forearm, foot]
Southerland (2012)
Infant-Robot Interaction
Three modes of interaction:
• Force control: robot velocity is linearly related to ground
reaction forces
• Power steering: small ground reaction forces produce a
substantial robot movement
• Gesture-based control: recognized crawling-like
movements produce robot movement
Machine Learning Questions
• Predict robot motion from kinematic data
• Predict visual attention from kinematic and robot data
• Predict limb motion from EEG data
• Predict visual attention from EEG data
• …
• CV_M4_L03
CS/DSA 5970: Machine Learning Practice
Introduction to Pandas
Pandas Roadmap
• Importing data from a Comma Separated Values (CSV) file
• Exploring data
• Indexing rows and columns
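A rough sketch of these three steps follows. To keep it self-contained, the CSV content is an in-memory string with invented columns rather than a real file; with a file on disk, a path would be passed to read_csv instead:

```python
import io

import pandas as pd

# Stand-in for a CSV file on disk (contents are made up for illustration).
csv_text = """time,x,y
0.0,1.0,5.0
0.1,1.5,5.5
0.2,2.0,6.0
"""

# Importing: read_csv accepts a path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

# Exploring: shape and per-column summary statistics.
shape = df.shape
summary = df.describe()

# Indexing: by position, by column name, and by boolean condition.
first_row = df.iloc[0]
x_column = df["x"]
late_rows = df.loc[df["time"] > 0.05, ["time", "y"]]
```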
Pandas
• Live example
• CV_M4_L04
CS/DSA 5970: Machine Learning Practice
Pandas: Basic Plotting
• Live example
• CV_M4_L05
CS/DSA 5970: Machine Learning Practice
Introduction to Numpy
Numpy Roadmap
• Transforming Pandas data to a Numpy matrix
• Indexing Numpy matrices
• Combining vectors to create a matrix
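One way these three topics could look in code is sketched below. The DataFrame contents are invented, and column_stack / vstack are just two of several numpy functions that combine vectors:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"x": [1.0, 2.0, 3.0], "y": [4.0, 5.0, 6.0]})

# Transforming Pandas data to a Numpy matrix: 3 rows x 2 columns.
M = df.to_numpy()

# Indexing: [row, column], with slices selecting ranges.
first_row = M[0, :]     # all columns of row 0
y_col = M[:, 1]         # all rows of column 1
last_two = M[1:, :]     # rows 1 and 2

# Combining vectors to create a matrix.
a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0, 6.0])
as_columns = np.column_stack((a, b))   # shape (3, 2)
as_rows = np.vstack((a, b))            # shape (2, 3)
```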
• Live example
• CV_M4_L06
CS/DSA 5970: Machine Learning Practice
Visualization with Matplotlib
Matplotlib Roadmap
• Creating temporal figures
• Creating scatter plots
• Tuning the display of figure elements
• Subplots
• Repairing a Pandas dataset & visualizing the results
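A minimal sketch of a temporal figure and a scatter plot arranged as subplots. The signals are synthetic sine and cosine waves chosen only for illustration, and the "Agg" backend is an assumption made so the code runs without a display:

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; omit this in a notebook
import matplotlib.pyplot as plt
import numpy as np

# Synthetic time series (not the lecture's data).
t = np.arange(0.0, 2.0, 0.01)
x = np.sin(2 * np.pi * t)
y = np.cos(2 * np.pi * t)

# Subplots: one row, two columns.
fig, axes = plt.subplots(1, 2, figsize=(8, 3))

# Temporal figure: both signals as a function of time.
axes[0].plot(t, x, label="x")
axes[0].plot(t, y, "--", label="y")
axes[0].set_xlabel("time (s)")
axes[0].set_ylabel("value")
axes[0].legend()

# Scatter plot: one signal against the other.
axes[1].scatter(x, y, s=4)
axes[1].set_xlabel("x")
axes[1].set_ylabel("y")
axes[1].set_aspect("equal")   # tune a figure element: equal axis scaling

fig.tight_layout()
```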
• Live example
• CV_M4_L07
CS/DSA 5970: Machine Learning Practice
Pipelines
Pipelines
• Data processing often involves multiple computational
steps, only some of which involve ML
• The Scikit-Learn Pipeline class provides a clean interface
for expressing these steps
– Each step (or pipeline element) is implemented by a class that
adheres to a standard interface
– This allows us to mix-and-match elements for different
purposes
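As an illustration, the sketch below chains a preprocessing step and an ML step with the Scikit-Learn Pipeline class. The choice of StandardScaler and LinearRegression, the step names, and the toy data are assumptions made for this example:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Toy data following y = 2x + 1.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = 2.0 * X[:, 0] + 1.0

# Each step is a (name, element) pair; the elements share a standard interface.
pipe = Pipeline([
    ("scale", StandardScaler()),     # non-ML step: standardize the inputs
    ("model", LinearRegression()),   # ML step: fit a linear model
])

pipe.fit(X, y)                          # fits every step in order
pred = pipe.predict(np.array([[4.0]]))  # applies scaling, then the model
```

Swapping either step for a different element leaves the rest of the code unchanged, which is the mix-and-match property described above.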
Flavors of Pipeline Element Classes
A pipeline element class is some combination of:
• Estimator
• Transformer
• Predictor
Flavors of Pipeline Element Classes
Estimator: given a dataset, compute some measure or
some model parameters
• Implements the fit() method
– Takes as input one or two datasets (input data & desired
output)
• Our ML methods are estimators
Flavors of Pipeline Element Classes
Transformer: modifies a dataset in some way
• Implements the transform() method
– Takes as input one dataset
– Returns a dataset
• Transformers can be used to clean a dataset before it is
used by a ML method
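A sketch of a custom element that is both an estimator and a transformer. MeanCenterer is a hypothetical class written for this example, not a Scikit-Learn class; only BaseEstimator and TransformerMixin come from the library:

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin


class MeanCenterer(BaseEstimator, TransformerMixin):
    """Hypothetical element: subtracts the per-column mean learned in fit()."""

    def fit(self, X, y=None):
        # Estimator part: compute a measure (the column means) from the data.
        self.mean_ = np.asarray(X, dtype=float).mean(axis=0)
        return self

    def transform(self, X):
        # Transformer part: take one dataset and return a modified dataset.
        return np.asarray(X, dtype=float) - self.mean_


X = np.array([[1.0, 10.0],
              [3.0, 30.0]])
centered = MeanCenterer().fit(X).transform(X)
```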
Flavors of Pipeline Element Classes
Predictor: predicts some quantity given a dataset
• Implements the predict() method
– Takes as input one dataset and returns a different dataset
• Implements a score() method that evaluates a prediction
– Takes as input an input dataset and an expected output dataset
– Returns a score
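For example, Scikit-Learn's LinearRegression acts as both an estimator and a predictor. The data below is invented, and for regressors score() returns the coefficient of determination (R^2):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data following y = 2x + 1 exactly.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])

model = LinearRegression()
model.fit(X, y)            # estimator: learn the model parameters

y_hat = model.predict(X)   # predictor: input dataset -> predicted dataset
r2 = model.score(X, y)     # compares predictions on X against expected y
```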
Pipeline Notes
• Pipeline elements are classes in and of themselves
• The Pipeline class is also a pipeline element
– So, we can nest pipelines!
• Python classes can inherit from multiple classes
– An element can be both an Estimator and a Predictor
• Datasets are generally Pandas objects or Numpy tensors
– A particular pipeline element will use only one type as an input
and one type as an output
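Nesting can be sketched as follows: a preprocessing pipeline is used as a single step inside a larger pipeline. The specific elements (SimpleImputer, StandardScaler, LinearRegression), the step names, and the data are assumptions chosen for illustration:

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Inner pipeline: fill missing values, then standardize.
preprocess = Pipeline([
    ("impute", SimpleImputer(strategy="mean")),
    ("scale", StandardScaler()),
])

# Outer pipeline: the inner pipeline is just another element.
full = Pipeline([
    ("preprocess", preprocess),
    ("model", LinearRegression()),
])

# Toy data with one missing input value.
X = np.array([[0.0], [1.0], [np.nan], [3.0]])
y = np.array([0.0, 2.0, 4.0, 6.0])

full.fit(X, y)
pred = full.predict(X)
```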
• Live example
• CV_M4_L07
CS/DSA 5970: Machine Learning Practice
Creating Pipeline Elements
• Live example
• CV_M4_L08b
• CV_M4_L09
CS/DSA 5970: Machine Learning Practice
Creating Pipeline Element Classes
• Live example
• CV_M4_L10
CS/DSA 5970: Machine Learning Practice
Pipeline Example: Computing Derivatives
Computing Derivatives
Numerical differentiation of a timeseries x:
• For each time t:
$$\dot{x}[t] \approx \frac{x[t+1] - x[t]}{\Delta t}$$
• Often will want to include some filtering to address the
discrete nature of the data (though we won’t do this here)
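The forward difference above can be wrapped as a pipeline element, as sketched below. ForwardDifference is a hypothetical class name, and returning one fewer row than the input is a design assumption of this sketch; the lecture's live example may handle the final sample differently. No filtering is applied:

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin


class ForwardDifference(BaseEstimator, TransformerMixin):
    """Hypothetical element: (x[t+1] - x[t]) / dt along the time (row) axis."""

    def __init__(self, dt=1.0):
        self.dt = dt

    def fit(self, X, y=None):
        return self  # nothing to learn

    def transform(self, X):
        X = np.asarray(X, dtype=float)
        # The last time step has no successor, so the output is one row shorter.
        return (X[1:] - X[:-1]) / self.dt


dt = 0.1
t = np.arange(0.0, 1.0, dt).reshape(-1, 1)   # time as a column vector
x = t ** 2                                   # true derivative is 2t
dx = ForwardDifference(dt=dt).fit_transform(x)
```

For x = t^2 the forward difference equals 2t + dt exactly, so the estimate approaches the true derivative 2t as dt shrinks.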
• Live example
• CV_M4_L11
CS/DSA 5970: Machine Learning Practice
Pipeline Example: Linear Imputer
Linear Imputer
For our implementation: we will take advantage of the
DataFrame.interpolate() method
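One possible shape for such an element is sketched here. The LinearImputer class below is a hypothetical reconstruction, not the lecture's actual code, and it assumes the input is a Pandas DataFrame whose rows are evenly spaced samples:

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin


class LinearImputer(BaseEstimator, TransformerMixin):
    """Hypothetical element: fills NaNs by linear interpolation between rows."""

    def fit(self, X, y=None):
        return self  # nothing to learn

    def transform(self, X):
        # method="linear" treats the rows as equally spaced samples.
        return X.interpolate(method="linear")


# Toy column with gaps of one and two missing samples.
df = pd.DataFrame({"x": [0.0, np.nan, 2.0, np.nan, np.nan, 5.0]})
filled = LinearImputer().fit_transform(df)
```

With the default arguments, missing values before the first valid sample are left as NaN, so a real dataset may need an extra fill step.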
• Live example
• CV_M4_L12
CS/DSA 5970: Machine Learning Practice
Pipeline Example: Building a New Pipeline
• Live example
• CV_M4_L13
CS/DSA 5970: Machine Learning Practice
Representing Categorical Data
Handling Categorical Data
• Discrete, finite set of values
– Most often the different values are strings or symbols
– Also known as an enumerated type
• Most ML algorithms only address numerical data, so we
need some way of transforming categorical values into a
numerical representation
Handling Categorical Data
Often done in stages:
• Identify the set of possible categorical values
• Transform these values into an integer index
– Order is arbitrary
• Transform the integer index into a 1-hot encoding
– Array of bits: one bit per possible index value
– For a given categorical value, only one bit is one and all others
are zeros
• Different from book: use OneHotEncoder to do all of this!
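A minimal sketch of the single-step approach with Scikit-Learn's OneHotEncoder, using invented shape names as the categorical values. Note that this encoder orders the categories it discovers by sorting them, which is one arbitrary but reproducible choice:

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# One categorical feature (a column of strings); values are made up.
shapes = np.array([["circle"], ["square"], ["circle"], ["triangle"]])

encoder = OneHotEncoder()

# fit_transform identifies the categories and encodes in one call.
# The default result is a sparse matrix, so convert it to a dense array.
one_hot = encoder.fit_transform(shapes).toarray()

# Categories discovered for the (only) feature, in column order.
categories = encoder.categories_[0]
```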
• Live example
• CV_M4_L14
CS/DSA 5970: Machine Learning Practice
Example: Adding Data to a DataFrame
Example: Adding Data to a DataFrame
Our example:
• Create a discrete label as a function of Z
• Convert discrete label to a 1-Hot Encoding
• Add these columns to the original DataFrame
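These three steps could be sketched as follows. The Z values, the threshold at zero, the label names "high" and "low", and the new column names are all assumptions made for this example; the lecture's labeling function is not given in the slides:

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

df = pd.DataFrame({"Z": [-1.5, 0.2, 3.0, -0.1]})

# Step 1: discrete label as a function of Z (threshold is an assumption).
label = np.where(df["Z"] >= 0.0, "high", "low")

# Step 2: convert the label to a 1-hot encoding (one column per category).
encoder = OneHotEncoder()
one_hot = encoder.fit_transform(label.reshape(-1, 1)).toarray()

# Step 3: add the encoded columns to the original DataFrame.
new_cols = [f"Z_{c}" for c in encoder.categories_[0]]
one_hot_df = pd.DataFrame(one_hot, columns=new_cols, index=df.index)
df_extended = pd.concat([df, one_hot_df], axis=1)
```

Passing index=df.index when building the new columns keeps the rows aligned when the two DataFrames are concatenated.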
• Live example