MACHINE LEARNING & APPLICATIONS - WordPress.com · Machine Learning & Applications Course...
MACHINE LEARNING &
APPLICATIONS
Introductory Videos
What is Machine Learning
Example of Machine Learning
Career opportunities
Search engine companies: Google, Yahoo, Bing (Microsoft), Ask
Social network companies: Facebook, Twitter, LinkedIn, Instagram, Tumblr
Engineering-related companies: Intel, the oil industry, IBM, HCL Technologies, Wipro Technologies, Verizon, Visa, Boeing, SAP, Oracle
Finance-related companies: Amazon, Apple, eBay, EMC, Bank of America, Capital One, PayPal, GE Capital
Career opportunities
Data science vendors: Palantir Teradata Pixar SAS Alpine Labs Pivotal Tableau
More than 8,000 companies are hiring data scientists.
First Lecture
Examination Scheme
Prerequisites
Course Objectives
Course Outcomes
Syllabus Mapping
Last Three Year Result
Examination Scheme
Machine Learning & Applications
Course Objectives:
Understanding human learning aspects.
Understanding the primitives and methods in the learning process by computers.
Understanding the nature of problems solved with machine learning.
Machine Learning & Applications
Course Outcomes:
Model the learning primitives.
Build the learning model.
Tackle real world problems in the domain of Data Mining and Big Data Analytics, Information Retrieval, Computer vision, Linguistics and Bioinformatics.
Text Books
http://www-bcf.usc.edu/~gareth/
faculty.washington.edu/ dwitten/
web.stanford.edu/~hastie/
statweb.stanford.edu/~tibs/
Syllabus Mapping (UNIT-I through UNIT-VI)
Result of Last Three Years
[Chart: Result Analysis, showing Result in % (0 to 100) vs Academic Year for 2017-18, 2016-17 and 2015-16]
Academic Year | Highest Marks | Name of the Subject Topper
2017-18       | 79            | Pagare Snehal
2016-17       | 79            | Prerana Bafna
2015-16       | 71            | Kothawade Priyanka
Unit 1: INTRODUCTION TO MACHINE LEARNING
Introduction: What is Machine Learning, Examples of Machine Learning applications, Training versus Testing, Positive and Negative Class, Cross-validation
Types of Learning: Supervised, Unsupervised and Semi-Supervised Learning
Dimensionality Reduction: Introduction to Dimensionality Reduction, Subset Selection, Introduction to Principal Component Analysis
Overview of Machine Learning
Recap
Machine Designing or Machine Learning?
8/2/2018 21
Design and Analysis of Algorithms (DAA) or Learning and Analysis of Algorithms (LAA)?
Recap
• Y is calculated
• A is designed
• One-stroke process
• Complexity (lower bound, upper bound), e.g. Strassen's matrix multiplication and its lower bound
[Diagram: X → A → Y]
DAA
Design Analysis (Design Effects)
Designing
Techniques/Models/Methods:
Divide and Conquer
Greedy
Dynamic Programming
Backtracking, etc.
A calculates/searches the actual output.
• Y is estimated
• A is learned
• Two-stroke process
• Complexity
• Overfitting/underfitting
• Bias-variance tradeoff
• Learning curves (Ein & Eout vs N)
[Diagram: X → A → Y]
LAA
Learning Analysis (Learning Effects)
Two-Stroke Process:
• Training data set and testing data set
• Universal data set (OMG -- !!) -- unobservable in practice
• Probability is used for inference, emulating the universal set to prove the feasibility of learning
• Hoeffding's inequality tells us how poor (or good) the training set is: it relates the knowledge hidden in the training set to the universal set
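For a single fixed hypothesis h evaluated on N training examples, the Hoeffding inequality referred to above can be stated as:

```latex
P\left[\,\left|E_{\mathrm{in}}(h) - E_{\mathrm{out}}(h)\right| > \epsilon\,\right] \;\le\; 2e^{-2\epsilon^{2}N}
```

So the larger the training set N, the less likely it is that Ein is far from Eout, which is what makes learning from a finite sample feasible.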
Techniques/Models/Methods:
Linear/Non-Linear
Parametric/Non-Parametric
Kernel Based Models
Probabilistic Models, etc.
Designing and Learning
A learns/searches a "hypothesis" that helps to predict/describe the output.
Definition of Machine Learning
Arthur Samuel defined machine learning as a "field of study that gives computers the ability to learn without being explicitly programmed."
Herbert Simon's definition: "Learning is any process by which a system improves performance from experience."
Tom M. Mitchell's definition: "Machine Learning is the study of algorithms that improve their performance P at some task T with experience E." A well-defined learning task is given by <P, T, E>.
Spam filter Example
Consider the example of a spam filter: an email program that decides whether each incoming email should be marked as spam or not.
T: classifying emails as spam or not spam
E: a set of emails with known spam/not-spam labels
P: the fraction of emails correctly classified as spam/not spam
Let’s do exercise…. Importance of label
• Apple • Banana • Red • Yellow • Orange • Blue
• Cherry
• Apple • Banana • Orange • Cherry
• Red • Yellow • Blue
• Apple • Banana • Cherry
• Red • Yellow • Blue • Orange
Let’s do exercise….
• Apple - Fruit • Banana -Fruit • Red -Color • Yellow-Color • Orange – Fruit • Blue - Color
• Cherry -Fruit
• Apple • Banana • Orange • Cherry
• Red • Yellow • Blue
Supervised and Unsupervised Learning
Suppose you have a basket filled with different kinds of fruits, and your task is to arrange them into groups. For clarity, let me first name the fruits in our basket.
Unsupervised Learning :
No training rules or labeled data are available while grouping the fruits. Suppose you first consider color:
RED COLOR GROUP: apples & cherries. GREEN COLOR GROUP: bananas & grapes.
Next you take another physical characteristic, such as size:
RED COLOR AND BIG SIZE: apple. RED COLOR AND SMALL SIZE: cherry. GREEN COLOR AND BIG SIZE: banana. GREEN COLOR AND SMALL SIZE: grape.
Unsupervised Learning
Here you did not learn anything beforehand: labels are not included in the training data and there is no response variable. This type of learning is known as unsupervised learning.
Four types of fruits: Apple, Banana, Grapes, Cherry
Supervised Learning :
The physical characteristics of the fruits are known, so arranging the same type of fruit in one place is now easy. Your previous work is called training data. You have already learned from your training data; this is because of the response variable. A response variable is simply a decision variable. You can observe the response variable below (FRUIT NAME).
No. | SIZE  | COLOR | SHAPE                                      | FRUIT NAME
1   | Big   | Red   | Rounded shape with a depression at the top | Apple
2   | Small | Red   | Heart-shaped to nearly globular            | Cherry
3   | Big   | Green | Long curving cylinder                      | Banana
4   | Small | Green | Round to oval, bunch shape cylindrical     | Grape
Supervised Learning
If you learn beforehand from training data and then apply that knowledge to the test data (for a new fruit), this type of learning is called supervised learning.
Learning Associations
Suppose that out of 100 customers, 10 buy butter, 8 buy milk, and 6 buy both (reading the slide's figures this way). Then
P(Milk | Butter) = 6/10 = 0.6,
i.e., 60 percent of the customers who buy butter also buy milk. Note that the conditional probability divides by the number of butter buyers, not by all 100 customers.
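The conditional probability above can be computed from raw transactions; the ten transactions below are made up for illustration.

```python
# Sketch: estimating P(Milk | Butter) from transaction data.
# The transaction list is illustrative, not data from the course.
transactions = [
    {"butter", "milk"}, {"butter", "milk"}, {"butter", "bread"},
    {"milk"}, {"bread"}, {"butter", "milk"}, {"milk", "bread"},
    {"butter"}, {"butter", "milk"}, {"milk"},
]

butter = [t for t in transactions if "butter" in t]
both = [t for t in butter if "milk" in t]

# Conditional probability = (# buying both) / (# buying butter)
confidence = len(both) / len(butter)
print(confidence)  # 4 of the 6 butter buyers also bought milk
```

Here 6 customers bought butter and 4 of them also bought milk, so P(Milk | Butter) = 4/6.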
Classification
IF income> θ1 AND savings> θ2 THEN low-risk ELSE high-risk
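The rule above can be written directly as a function; the threshold values for θ1 and θ2 below are placeholders, not values from the course.

```python
def credit_risk(income, savings, theta1=50000, theta2=10000):
    """Classify a customer as 'low-risk' or 'high-risk' using the
    two-threshold rule from the slide. The default thresholds are
    illustrative placeholders."""
    if income > theta1 and savings > theta2:
        return "low-risk"
    return "high-risk"

print(credit_risk(60000, 20000))  # low-risk
print(credit_risk(40000, 20000))  # high-risk
```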
Pattern recognition
• Recognition or authentication of people using their physiological characteristics
• It is difficult to write a program that recognizes a face directly
• A machine learning algorithm instead takes labeled examples and produces such a program
Regression
A car's brand, year, engine capacity, mileage, and other information.
Let X denote the car attributes and Y the price of the car:
Y = wX + w0, for suitable values of w and w0
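Fitting w and w0 by least squares can be sketched with NumPy; the five (x, y) points below are synthetic, standing in for a single car attribute and prices.

```python
import numpy as np

# Sketch: fitting y = w*x + w0 by least squares on synthetic data
# generated roughly from y = 2x + 1.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.1, 4.9, 7.2, 8.8, 11.1])

# Design matrix with a column of ones for the intercept w0
A = np.column_stack([x, np.ones_like(x)])
(w, w0), *_ = np.linalg.lstsq(A, y, rcond=None)

print(round(w, 2), round(w0, 2))  # close to w = 2 and w0 = 1
```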
Unsupervised Learning
The goal is to find clusters or groupings of similar input data.
Reinforcement learning
Examples of Machine Learning Application
•Learning Associations •Classification •Regression •Unsupervised Learning •Reinforcement Learning
Training Vs Testing
Training Phase:
• Input: training dataset having attributes and class labels, used to prepare the model.
• Goal: to find relationships, detect patterns, understand complex problems and make decisions.
• Training error is the error that occurs when the model is applied to the same data on which it was trained.
• Simply put, when the actual output of the training data and the predicted output of the model do not match, the training error Ein is said to occur.
• Training error is much easier to compute.

Testing Phase:
• Input: a test dataset for which the class label is unknown.
• Used for assessment of the finally chosen model.
• The training and testing datasets are completely different.
• Testing error is the error that occurs when the model is assessed on data unknown to it.
• Simply put, when the actual output of the testing data and the predicted output of the model do not match, the testing error Eout is said to occur.
• Eout is generally observed to be larger than Ein.
Cross-validation
• Goal: minimize the generalization error.
• The generalization error is essentially the average error on data the model has never seen.
• In general, the dataset is divided into two partitions: training and test sets.
• The fit method is called on the training set to build the model.
• The fitted model is then applied to the test set to estimate the target values and evaluate the model's performance.
• The data is divided into training and test sets so that the test set can estimate how well the model, trained on the training data, would perform on unseen data.
• However, cross-validation is a method that goes beyond evaluating a single model using a single train/test split of the data.
• It is applied to multiple subsets created from the training dataset, each of which is used to train and evaluate a separate model.
• Cross-validation thus gives a reliable estimate of model performance using only the available training data.
• There are several ways to cross-validate; the most common is K-fold cross-validation.
Cross Validation….
K Fold Cross Validation
Algorithm for K-Fold Cross-Validation:
1. Split the dataset into K equal partitions (or "folds").
2. Use fold 1 as the testing set and the union of the other folds as the training set.
3. Calculate the testing accuracy.
4. Repeat steps 2 and 3 K times, using a different fold as the testing set each time.
5. Use the average testing accuracy as the estimate of out-of-sample accuracy.
A value of k = 10 is very common in the field of applied machine learning.
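The five steps above can be sketched in Python. The "model" below (predicting the training mean) and the data are illustrative stand-ins so the example stays self-contained; in practice any learner can be plugged in at the marked step.

```python
import numpy as np

# Sketch of K-fold cross-validation with a trivial mean-predictor model.
def k_fold_cv(X, y, k=5):
    idx = np.arange(len(X))
    folds = np.array_split(idx, k)           # step 1: K partitions
    scores = []
    for i in range(k):                       # step 4: repeat K times
        test = folds[i]                      # step 2: fold i is the test set
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        pred = y[train].mean()               # "train" the mean model here
        mse = ((y[test] - pred) ** 2).mean() # step 3: testing error
        scores.append(mse)
    return float(np.mean(scores))            # step 5: average estimate

y = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0])
X = y.reshape(-1, 1)
print(k_fold_cv(X, y, k=5))
```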
Positive and Negative Class
In binary classification, the class of interest (e.g. spam) is conventionally called the positive class, and the other class (e.g. not spam) the negative class.
The ingredients of Machine Learning
The right features to build the right models that achieve the right tasks.
• Features define a 'language' in which the relevant objects of a particular domain are described. E.g. a car object can have features such as model number, manufacturing year, kilometers run, etc.
• A task is an abstract representation of a problem we want to solve regarding those domain objects, e.g. deciding the price of a used car.
• Many tasks can be represented as a mapping from data points to outputs. This mapping is done by the machine learning model.
• There is a wide variety of models to choose from, so it is observed that models lend the machine learning field diversity, while tasks and features give it unity.
Machine Learning Task:
The problems that can be solved with machine learning are generally defined by tasks. Tasks fall into the following broad categories:
Supervised and Unsupervised Learning: The task of grouping data with prior information is known as supervised learning, and the task of finding hidden structure in the given data is unsupervised learning.
Predictive and Descriptive Models: The output of a predictive model involves the target variable; the model tries to predict a value using the other values in the dataset, e.g. whether a loan is approved or not, or whether an e-mail is spam or not. The output of a descriptive model does not involve the target variable; a descriptive model instead tries to find structure in the data in novel and interesting ways. More specifically, it detects or recognizes particular patterns.
Categories of Machine Learning Task:
Predictive Task
Binary Classification: the task of classifying the given instances into two groups on the basis of classification rules. It is intuitive and easy to explain, e.g. deciding the category of an email: spam or ham.
Multiclass Classification: the task of classifying the instances into more than two groups, e.g. deciding the category of an email: spam, private mail, or work-related mail.
Regression: sometimes it is natural to discard the notion of discrete classes and instead predict a real number, e.g. randomly selecting an email from the inbox and labelling it with an urgency score (between 0 and 1), with work-related emails, say, receiving the highest scores.
Clustering: the task of grouping data without prior information is known as clustering. A typical clustering works by measuring the similarities between given instances, putting similar instances into the same cluster and dissimilar instances into different clusters. In one style of clustering, every cluster has a representative known as an exemplar; this is known as predictive clustering.
Descriptive Task
Subgroup Discovery: the dataset is given with instances and some attributes of the instances. The machine learning task is to find subgroups of instances that are statistically interesting. Subgroup discovery attempts to find relations between different properties or variables of a set with respect to a target variable. The relations are generally represented through rules, e.g. if LoC > 100 and complexity > 4 then the code is defective.
Association Rule Discovery: association analysis is useful for discovering interesting relationships hidden in large datasets. The relationships can be represented in the form of association rules or frequent item sets. In Market Basket Analysis, an association rule over a two-item set {X, Y} has the form X→Y, e.g. {bread}→{milk}: a person who has purchased bread also purchased milk.
Descriptive Clustering: in descriptive clustering, exemplars are not used.
Unit 1: INTRODUCTION TO MACHINE LEARNING
Introduction: What is Machine Learning, Examples of Machine Learning applications, Training versus Testing, Positive and Negative Class, Cross-validation
Types of Learning: Supervised, Unsupervised and Semi-Supervised Learning
Dimensionality Reduction: Introduction to Dimensionality Reduction, Subset Selection, Introduction to Principal Component Analysis
Supervised Learning
• The task of grouping data with prior information in terms of labelled training data is known as supervised learning.
• In the training data each instance is a pair of an input object and a desired output value.
• Supervised learning uses input variables (X), an output variable (Y) and an algorithm to learn the mapping function from the input to the output: 𝑌 = 𝑓(𝑋)
• A supervised learning algorithm analyzes the training data and produces an inferred function, which can be used to map new examples.
Supervised Learning Model
Supervised Learning Task
Classification: a classification problem is when the output variable is a category, such as "red"/"blue" or "disease"/"no disease".
Regression: a regression problem is when the output variable is a real value, such as "dollars" or "weight".
Unsupervised Learning
• In machine learning, the task of unsupervised learning is to try to find hidden structure in unlabeled data.
• The training data is unlabeled, so there is no error or reward signal to evaluate a partial solution.
• Unsupervised learning has input data (X) and no corresponding output variables.
• The goal of unsupervised learning is to model the underlying structure or distribution of the data in order to learn more about the data.
• It is called unsupervised learning because, unlike supervised learning above, there are no correct answers and there is no teacher.
Unsupervised Learning Model
Unsupervised Learning Task
Clustering: a clustering problem is where you want to discover the inherent groupings in the data, such as grouping customers by purchasing behavior.
Association: an association rule learning problem is where you want to discover rules that describe large portions of your data, such as people who buy X also tending to buy Y.
Semisupervised Learning Task
• A large amount of input data (X) is available, but only some of the data is labeled (Y).
• It sits between supervised and unsupervised learning.
• A good example is a photo archive where only some of the images are labeled (e.g. dog, cat, person) and the majority are unlabeled.
• It can be expensive or time-consuming to label data, as it may require access to domain experts, whereas unlabeled data is cheap and easy to collect and store.
• One can use unsupervised learning techniques to discover and learn the structure in the input variables.
• One can also use supervised learning techniques to make best-guess predictions for the unlabeled data, feed that data back into the supervised learning algorithm as training data, and use the model to make predictions on new unseen data.
Unit 1: INTRODUCTION TO MACHINE LEARNING
Introduction: What is Machine Learning, Examples of Machine Learning applications, Training versus Testing, Positive and Negative Class, Cross-validation
Types of Learning: Supervised, Unsupervised and Semi-Supervised Learning
Dimensionality Reduction: Introduction to Dimensionality Reduction, Subset Selection, Introduction to Principal Component Analysis
Features
• Features are the pillars of machine learning.
• They determine much of the success of a machine learning application.
• The goodness of a model depends on the goodness of its features.
• Mathematically, features are functions that map from the instance space to some set of feature values called the domain of the feature.
• Features vary in nature, e.g.:
  o a set of integers, such as the number of occurrences of a particular word
  o Boolean, such as true/false for an email being spam or ham
  o an arbitrary finite set, such as colors or shapes
Usages of Features
• Feature as split
  Binary split: spam, ham. Non-binary split: priority mail (education, placement, etc.)
• Feature as predictor: a linear score Σᵢ₌₁ⁿ 𝑤ᵢ𝑥ᵢ, where 𝑥ᵢ is a numerical feature
  • if 𝑤ᵢ is large and positive, a positive 𝑥ᵢ increases the score;
  • if 𝑤ᵢ is negative, a positive 𝑥ᵢ decreases the score;
  • if 𝑤ᵢ ≈ 0, 𝑥ᵢ's influence is negligible.
These two uses of features – 'features as splits' and 'features as predictors' – are sometimes combined in a single model.
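The 'feature as predictor' score can be written directly; the weights and feature values below are illustrative, chosen to show one large positive, one negative, and one near-zero weight.

```python
# Sketch: a linear scoring model s(x) = sum_i w_i * x_i,
# the "features as predictors" use described above.
def score(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

w = [2.0, -1.5, 0.0]   # large positive, negative, ~zero weights
x = [1.0, 1.0, 1.0]
print(score(w, x))     # 2.0 - 1.5 + 0.0 = 0.5
```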
Dimensionality Reduction
• Too many factors form the basis for the final classification or regression.
• Factors are basically variables, called features.
• The higher the number of features, the harder it gets to visualize the training set.
• Many of these features are correlated, and hence redundant.
• Dimensionality reduction is a family of techniques in machine learning and statistics to reduce the number of random variables under consideration.
• Two approaches: feature selection and feature extraction.
Dimensionality Reduction
Why Dimensionality Reduction?
• Reduces time complexity: less computation
• Reduces space complexity: fewer parameters
• Saves the cost of observing the features
• Simpler models are more robust on small datasets
• More interpretable; simpler explanation
• Enables data visualization (structure, groups, outliers, etc.) if plotted in 2 or 3 dimensions
Feature Selection
• Find a smaller subset of a many-dimensional data set with which to create a data model.
• That is, find the k features of the d dimensions that give us the most information and discard the other (d − k) dimensions.
• Subset selection is one of the most widely used methods.
Feature Extraction
• Transform high-dimensional data into a space of fewer dimensions.
• That is, find a new set of k dimensions that are combinations of the original d dimensions.
• Feature extraction can be supervised or unsupervised, depending on whether or not it uses the output information.
• Principal Component Analysis (PCA) is the most widely used method.
Subset Selection
• Goal: find the best subset of the set of features.
• The best subset contains the least number of dimensions that contribute most to accuracy.
• Subset selection is used in both regression and classification problems.
• There are 2^𝑑 possible subsets of 𝑑 variables.
• It is not possible to test all of them unless 𝑑 is small.
• Instead, heuristics are designed to get a reasonable (but not optimal) solution in reasonable (polynomial) time.
• Two such heuristics: Forward Selection and Backward Selection
Forward Selection
o It starts with no variables (the null model).
o At each step it adds one feature that has not been considered before.
o After adding each feature, the error is checked.
o The process continues until the subset of features that decreases the error the most is found, or until any further addition does not decrease the error.
Terminologies for Algorithm(FS & BS)
• In either case, the error should be checked on a validation set distinct from the training set.
• With more features, the training error can generally be reduced, but the validation error may not decrease.
• Let 𝐹 denote a feature set of input dimensions 𝑥ᵢ, 𝑖 = 1, ..., 𝑑.
• 𝐸(𝐹) denotes the error incurred on the validation sample when only the inputs in 𝐹 are used.
• Depending on the application, the error is either the mean squared error or the misclassification error.
Algorithm - Forward Selection
1. Start with no features: 𝐹 = ∅.
2. At each step, for all possible 𝑥ᵢ, train the model on the training set and calculate 𝐸(𝐹 ∪ {𝑥ᵢ}) on the validation set.
3. Choose the input 𝑥ⱼ that causes the least error:
   𝑗 = arg minᵢ 𝐸(𝐹 ∪ {𝑥ᵢ})
   and add 𝑥ⱼ to 𝐹 if 𝐸(𝐹 ∪ {𝑥ⱼ}) < 𝐸(𝐹).
4. Stop if adding any feature does not decrease 𝐸; stop earlier if the decrease in error is too small.
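The greedy forward-selection loop can be sketched as follows. Here toy_error and the feature names are made-up stand-ins: in practice error(F) would train a model on the features in F and return its validation error E(F).

```python
# Sketch of greedy forward selection.
def forward_selection(features, error):
    F = set()
    best_err = error(F)
    while True:
        candidates = [f for f in features if f not in F]
        if not candidates:
            break
        # j = argmin_i E(F U {x_i})
        best_f = min(candidates, key=lambda f: error(F | {f}))
        new_err = error(F | {best_f})
        if new_err < best_err:      # add x_j only if it reduces E
            F.add(best_f)
            best_err = new_err
        else:
            break
    return F

useful = {"x1", "x3"}
def toy_error(F):
    # Illustrative stand-in: penalize missing useful features,
    # and slightly penalize extra features.
    return len(useful - F) + 0.1 * len(F - useful)

print(sorted(forward_selection(["x1", "x2", "x3", "x4"], toy_error)))
```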
Limitation of Forward Selection
• It may be costly: to go through the dimensions from d down to a subset of size k, the system is trained and tested 𝑑 + (𝑑 − 1) + (𝑑 − 2) + ··· + (𝑑 − 𝑘) times, which takes 𝑂(𝑑²) time.
• It is a local search procedure and does not guarantee finding the optimal subset, namely the minimal subset causing the smallest error.
• For example, 𝑥ᵢ and 𝑥ⱼ individually may not give a good effect, but together they may decrease the error significantly. In this situation forward selection is not a good choice: because the algorithm is greedy and adds attributes one by one, it may not be able to detect the combined effect of several features.
Algorithm - Backward Selection
1. Start with 𝐹 containing all features.
2. At each step, remove the one attribute from 𝐹 that causes the least error:
   𝑗 = arg minᵢ 𝐸(𝐹 − {𝑥ᵢ})
   and remove 𝑥ⱼ from 𝐹 if 𝐸(𝐹 − {𝑥ⱼ}) < 𝐸(𝐹).
3. Stop if removing a feature does not decrease the error.
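Backward selection can be sketched the same way: start from all features and greedily remove the one whose removal lowers the error most. As before, toy_error is an illustrative stand-in for a trained model's validation error E(F).

```python
# Sketch of backward selection.
def backward_selection(features, error):
    F = set(features)
    best_err = error(F)
    while F:
        # j = argmin_i E(F - {x_i}); sorted() keeps ties deterministic
        best_f = min(sorted(F), key=lambda f: error(F - {f}))
        new_err = error(F - {best_f})
        if new_err < best_err:      # remove x_j only if it reduces E
            F.remove(best_f)
            best_err = new_err
        else:
            break
    return F

useful = {"x1", "x3"}
def toy_error(F):
    # Illustrative stand-in: penalize dropping useful features,
    # small cost for keeping extras.
    return len(useful - F) + 0.1 * len(F - useful)

print(sorted(backward_selection(["x1", "x2", "x3", "x4"], toy_error)))
```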
Comment on -Backward Selection
Backward search has the same order of complexity as forward search, except that training a system with more features is costlier than training a system with fewer features; hence forward search may be preferable, especially if we expect many useless features.
Principal Component Analysis:
• PCA maps from a d-dimensional space to a new (k < d)-dimensional space with minimum loss of information.
• As the dimensionality of the data increases, the difficulty of visualizing it and performing computations on it also increases.
• There are mainly two strategies to reduce the dimensions of data:
  o Remove the redundant dimensions
  o Keep only the most important dimensions
Concepts used in PCA
Variance: a measure of variability; it simply measures how spread out the data set is.
Concepts used in PCA
Covariance: a measure of the extent to which corresponding elements from two sets of ordered data move in the same direction.
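Both quantities can be computed directly with NumPy; the two small data vectors below are illustrative.

```python
import numpy as np

# Sketch: sample variance and sample covariance.
x = np.array([2.0, 4.0, 6.0, 8.0])
y = np.array([1.0, 3.0, 5.0, 7.0])

var_x = np.var(x, ddof=1)             # sample variance of x
cov_xy = np.cov(x, y, ddof=1)[0, 1]   # sample covariance of x and y

print(var_x, cov_xy)  # equal here: x and y move perfectly together
```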
Concepts used in PCA
• PCA finds a new set of dimensions such that all the dimensions are orthogonal (and hence linearly independent) and ranked according to the variance of the data along them.
• This means the more important principal axis occurs first (more important = more variance / more spread-out data).
• The principal direction in which the data varies is shown by the U axis, and the second most important direction is the V axis, orthogonal to it.
• If each (X, Y) instance is transformed into its corresponding (U, V) coordinates, the data is de-correlated, meaning that the covariance between the U and V variables is zero.
Concepts used in PCA
• The directions U and V are called the principal components.
PCA for Data Representation
PCA for Dimension Reduction
Working of PCA
1. The first step is to gather reliable raw data from a sample, for example via a questionnaire.
2. The second step is to calculate the correlations between the variables.
3. In principal component analysis, principal components are extracted and presented as a table with the components in columns and the variables in rows.
4. The principal components table is truncated: components are reported in order of eigenvalue and of the proportion of total variance explained. Frequently, these components are easily interpreted.
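A minimal PCA sketch using eigendecomposition of the covariance matrix, assuming NumPy; the 2-D data set is synthetic (stretched along one axis), and the sketch keeps k = 1 of d = 2 dimensions, ranking directions by variance as described above.

```python
import numpy as np

# Sketch: PCA via eigendecomposition of the covariance matrix.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2)) @ np.array([[3.0, 0.0], [0.0, 0.5]])

Xc = X - X.mean(axis=0)                  # center the data
C = np.cov(Xc, rowvar=False)             # covariance matrix
eigvals, eigvecs = np.linalg.eigh(C)     # eigh: for symmetric matrices

order = np.argsort(eigvals)[::-1]        # rank directions by variance
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

Z = Xc @ eigvecs[:, :1]                  # project onto the first PC
print(Z.shape)                           # (200, 1)
```

The columns of eigvecs are the principal components (the U and V directions of the slides); projecting onto them de-correlates the data.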
Objectives of PCA
• PCA helps extract the most important information from the data table and compresses the size of the data set by keeping only the important information.
• PCA also simplifies the description of the data set and helps analyze the structure of the observations and the variables.
Achieved 1st Milestone