Transcript of new AI (7/31/2019)
STEPS OF PATTERN RECOGNITION
PARAMETER ESTIMATION
Estimation theory is a branch of statistics and signal processing that deals with estimating the values of parameters from measured or empirical data that has a random component. The parameters describe an underlying physical setting in such a way that their values affect the distribution of the measured data. An estimator attempts to approximate the unknown parameters from the measurements.
In estimation theory, the measured data are assumed to be random, with a probability distribution that depends on the parameters of interest. For example, in electrical communication theory, the measurements carrying information about the parameters of interest are often associated with a noisy signal. Without randomness, or noise, the problem would be deterministic and estimation would not be needed.
We could design an optimal classifier if we knew the prior probabilities P(ωi) and the class-conditional densities p(x|ωi). Unfortunately, in pattern recognition applications we rarely, if ever, have this kind of complete knowledge about the probabilistic structure of the problem. In a typical case we merely have some vague, general knowledge about the situation, together with a number of design samples, or training data: particular representatives of the patterns we want to classify. The problem, then, is to find some way to use this information to design or train the classifier.
One approach to this problem is to use the samples to estimate the unknown probabilities and probability
densities, and to use the resulting estimates as if they were the true values. In typical supervised pattern
classification problems, the estimation of the prior probabilities presents no serious difficulties. However,
estimation of the class-conditional densities is quite another matter.
The problem of parameter estimation is a classical one in statistics, and it can be approached in several ways. We
shall consider two common and reasonable procedures, maximum likelihood estimation and Bayesian estimation.
Although the results obtained with these two procedures are frequently nearly identical, the approaches are
conceptually quite different.
The two approaches differ on several practical grounds. One is computational complexity: maximum-likelihood methods are often preferred, since they require only differential-calculus techniques or a gradient search for θ, rather than the possibly complex multidimensional integration needed in Bayesian estimation. Another is interpretability: in many cases the maximum-likelihood solution is easier to interpret and understand, since it returns the single best model from the set the designer provided (and presumably understands). In contrast, Bayesian methods give a weighted average over models (parameters), often leading to solutions more complicated and harder to understand than those the designer provided; in return, the Bayesian approach reflects the remaining uncertainty about the possible models.
Maximum likelihood
In statistics, maximum-likelihood estimation (MLE) is a method of estimating the parameters of a statistical model. When applied to a data set and a given statistical model, maximum-likelihood estimation provides estimates for the model's parameters.
The method of maximum likelihood corresponds to many well-known estimation methods in statistics. For example, one may be interested in the heights of adult female giraffes, but be unable, due to cost or time constraints, to measure the height of every single giraffe in the population. Assuming that the heights are normally (Gaussian) distributed with some unknown mean and variance, the mean and variance can be estimated with MLE while knowing only the heights of some sample of the overall population. MLE accomplishes this by taking the mean and variance as parameters and finding the particular parameter values that make the observed results most probable (given the model).
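The giraffe example can be sketched in code. For a univariate Gaussian the maximum-likelihood estimates have a closed form: the sample mean, and the sample variance with a divide-by-n (not n - 1). The heights below are synthetic values generated for illustration only; the "true" parameters 4.6 m and 0.25 m are assumptions for the demo, not measured data.

```python
import random

def gaussian_mle(samples):
    """Closed-form maximum-likelihood estimates for a univariate Gaussian.

    The sample mean maximizes the likelihood in the mean parameter; the
    variance estimate divides by n (not n - 1), which is the MLE."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((x - mean) ** 2 for x in samples) / n
    return mean, var

# Synthetic "giraffe heights" in metres, drawn from an assumed N(4.6, 0.25**2).
random.seed(0)
heights = [random.gauss(4.6, 0.25) for _ in range(500)]

mu_hat, var_hat = gaussian_mle(heights)  # close to 4.6 and 0.25**2 = 0.0625
```

With 500 samples the estimates land near the generating parameters, illustrating the consistency property discussed below.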
Parameters are fixed but unknown.
The best parameters are obtained by maximizing the probability of obtaining the samples observed.
MLE has good convergence properties as the sample size increases.
It is often simpler than alternative techniques.
The General Principle
Suppose that we separate a collection of samples according to class, so that we have c sets, D1, ..., Dc, with the samples in Dj having been drawn independently according to the probability law p(x|ωj). We say such samples are i.i.d. (independent, identically distributed) random variables. We assume that p(x|ωj) has a known parametric form, and is therefore determined uniquely by the value of a parameter vector θj. For example, we might have p(x|ωj) ~ N(μj, Σj), where θj consists of the components of μj and Σj. To show the dependence of p(x|ωj) on θj explicitly, we write p(x|ωj) as p(x|ωj, θj). Our problem is to use the information provided by the training samples to obtain good estimates for the unknown parameter vectors θ1, ..., θc associated with each category.
To simplify treatment of this problem, we shall assume that samples in Di give no information about θj if i ≠ j; that is, we shall assume that the parameters for the different classes are functionally independent. This permits us to work with each class separately, and to simplify our notation by deleting indications of class distinctions.
With this assumption we thus have c separate problems of the following form: use a set D of training samples drawn independently from the probability density p(x|θ) to estimate the unknown parameter vector θ.
We use the n training samples in a class to estimate θ, where D contains n independently drawn samples, x1, x2, ..., xn.
The ML estimate of θ is, by definition, the value that maximizes p(D|θ): it is the value of θ that best agrees with the actually observed training samples.
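Since the samples in D are drawn independently, the likelihood factorizes, and in practice one maximizes its logarithm; in standard notation (reconstructed to match the definitions above):

```latex
p(D \mid \theta) = \prod_{k=1}^{n} p(x_k \mid \theta),
\qquad
\hat{\theta} = \arg\max_{\theta} \ln p(D \mid \theta)
             = \arg\max_{\theta} \sum_{k=1}^{n} \ln p(x_k \mid \theta).
```

For differentiable densities, the maximizer is found where the gradient of the log-likelihood with respect to θ vanishes.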
The maximum-likelihood estimator has essentially no optimal properties for finite samples. However, for many problems the maximum-likelihood estimator possesses a number of attractive asymptotic properties, including:
Consistency: the estimator converges in probability to the value being estimated.
Asymptotic normality: as the sample size increases, the distribution of the MLE tends to a Gaussian distribution with mean θ and covariance matrix equal to the inverse of the Fisher information matrix.
Efficiency: it achieves the Cramér–Rao lower bound as the sample size tends to infinity, meaning that no asymptotically unbiased estimator has lower asymptotic mean squared error than the MLE.
Second-order efficiency after correction for bias.
Applications
Maximum likelihood estimation is used for a wide range of statistical models, including:
linear models and generalized linear models;
exploratory and confirmatory factor analysis;
structural equation modeling;
many situations in the context of hypothesis testing and confidence-interval formation;
discrete choice models.
These uses arise across a widespread set of fields, including:
communication systems;
econometrics;
data modeling in nuclear and particle physics;
magnetic resonance imaging;
http://en.wikipedia.org/wiki/Optimization_(mathematics)http://en.wikipedia.org/wiki/Optimization_(mathematics)http://en.wikipedia.org/wiki/Optimization_(mathematics)http://en.wikipedia.org/wiki/Maximum_likelihood#cite_note-2http://en.wikipedia.org/wiki/Maximum_likelihood#cite_note-2http://en.wikipedia.org/wiki/Maximum_likelihood#cite_note-2http://en.wikipedia.org/wiki/Asymptotic_theory_(statistics)http://en.wikipedia.org/wiki/Asymptotic_theory_(statistics)http://en.wikipedia.org/wiki/Asymptotic_theory_(statistics)http://en.wikipedia.org/wiki/Asymptotic_theory_(statistics)http://en.wikipedia.org/wiki/Consistency_of_an_estimatorhttp://en.wikipedia.org/wiki/Consistency_of_an_estimatorhttp://en.wikipedia.org/wiki/Asymptotic_normalityhttp://en.wikipedia.org/wiki/Asymptotic_normalityhttp://en.wikipedia.org/wiki/Fisher_informationhttp://en.wikipedia.org/wiki/Fisher_informationhttp://en.wikipedia.org/wiki/Fisher_informationhttp://en.wikipedia.org/wiki/Efficient_estimatorhttp://en.wikipedia.org/wiki/Efficient_estimatorhttp://en.wikipedia.org/wiki/Cram%C3%A9r%E2%80%93Rao_lower_boundhttp://en.wikipedia.org/wiki/Cram%C3%A9r%E2%80%93Rao_lower_boundhttp://en.wikipedia.org/wiki/Cram%C3%A9r%E2%80%93Rao_lower_boundhttp://en.wikipedia.org/wiki/Cram%C3%A9r%E2%80%93Rao_lower_boundhttp://en.wikipedia.org/wiki/Cram%C3%A9r%E2%80%93Rao_lower_boundhttp://en.wikipedia.org/wiki/Mean_squared_errorhttp://en.wikipedia.org/wiki/Mean_squared_errorhttp://en.wikipedia.org/wiki/Mean_squared_errorhttp://en.wikipedia.org/wiki/Linear_modelhttp://en.wikipedia.org/wiki/Linear_modelhttp://en.wikipedia.org/wiki/Generalized_linear_modelhttp://en.wikipedia.org/wiki/Generalized_linear_modelhttp://en.wikipedia.org/wiki/Generalized_linear_modelhttp://en.wikipedia.org/wiki/Factor_analysishttp://en.wikipedia.org/wiki/Factor_analysishttp://en.wikipedia.org/wiki/Confirmatory_factor_analysishttp://en.wikipedia.org/wiki/Confirmatory_factor_analysishttp://en.wikipedia.org/wiki/Confirmatory_factor_analysishttp://en.wiki
pedia.org/wiki/Structural_equation_modelinghttp://en.wikipedia.org/wiki/Structural_equation_modelinghttp://en.wikipedia.org/wiki/Hypothesis_testinghttp://en.wikipedia.org/wiki/Hypothesis_testinghttp://en.wikipedia.org/wiki/Hypothesis_testinghttp://en.wikipedia.org/wiki/Confidence_intervalhttp://en.wikipedia.org/wiki/Confidence_intervalhttp://en.wikipedia.org/wiki/Confidence_intervalhttp://en.wikipedia.org/wiki/Discrete_choicehttp://en.wikipedia.org/wiki/Discrete_choicehttp://en.wikipedia.org/wiki/Communication_systemshttp://en.wikipedia.org/wiki/Communication_systemshttp://en.wikipedia.org/wiki/Econometricshttp://en.wikipedia.org/wiki/Econometricshttp://en.wikipedia.org/wiki/Econometricshttp://en.wikipedia.org/wiki/Communication_systemshttp://en.wikipedia.org/wiki/Discrete_choicehttp://en.wikipedia.org/wiki/Confidence_intervalhttp://en.wikipedia.org/wiki/Hypothesis_testinghttp://en.wikipedia.org/wiki/Structural_equation_modelinghttp://en.wikipedia.org/wiki/Confirmatory_factor_analysishttp://en.wikipedia.org/wiki/Factor_analysishttp://en.wikipedia.org/wiki/Generalized_linear_modelhttp://en.wikipedia.org/wiki/Linear_modelhttp://en.wikipedia.org/wiki/Mean_squared_errorhttp://en.wikipedia.org/wiki/Cram%C3%A9r%E2%80%93Rao_lower_boundhttp://en.wikipedia.org/wiki/Efficient_estimatorhttp://en.wikipedia.org/wiki/Fisher_informationhttp://en.wikipedia.org/wiki/Asymptotic_normalityhttp://en.wikipedia.org/wiki/Consistency_of_an_estimatorhttp://en.wikipedia.org/wiki/Asymptotic_theory_(statistics)http://en.wikipedia.org/wiki/Maximum_likelihood#cite_note-2http://en.wikipedia.org/wiki/Optimization_(mathematics) -
7/31/2019 new AI
6/12
INTRODUCTION TO CLUSTERING TECHNIQUES
A hidden Markov model (HMM) is a statistical Markov model in which the system being modeled is assumed to be a Markov process with unobserved (hidden) states. An HMM can be considered the simplest dynamic Bayesian network.
In a regular Markov model, the state is directly visible to the observer, so the state transition probabilities are the only parameters. In a hidden Markov model, the state is not directly visible, but output dependent on the state is visible. Each state has a probability distribution over the possible output tokens; therefore the sequence of tokens generated by an HMM gives some information about the sequence of states. Note that the adjective 'hidden' refers to the state sequence through which the model passes, not to the parameters of the model; even if the model parameters are known exactly, the model is still 'hidden'.
We continue to assume that at every time step t the system is in a state ω(t), but now we also assume that it emits some (visible) symbol v(t). While sophisticated Markov models allow for the emission of continuous functions (e.g., spectra), we restrict ourselves to the case where a discrete symbol is emitted. As with the states, we define a particular sequence of such visible states as VT = {v(1), v(2), ..., v(T)}, and thus we might have V6 = {v5, v1, v1, v5, v2, v3}.
Our model is then that in any state ω(t) we have a probability of emitting a particular visible symbol vk(t). We denote this probability P(vk(t)|ωj(t)) = bjk. Because we have access only to the visible symbols, while the ωi are unobservable, such a full model is called a hidden Markov model.
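An HMM is easy to simulate forward, which makes the hidden/visible distinction concrete: the generator knows the state path ω(1), ..., ω(T), but an observer sees only the symbol sequence. This is an illustrative sketch; the two-state model, its state and symbol names, and all probabilities are invented for the demo.

```python
import random

def sample_hmm(start_p, trans_p, emit_p, T, seed=1):
    """Draw one hidden-state path and its visible symbol sequence.

    start_p: initial state distribution; trans_p[i][j] = a_ij;
    emit_p[j][k] = b_jk, the probability that state j emits symbol k."""
    rng = random.Random(seed)

    def draw(dist):
        """Sample one outcome from a {outcome: probability} table."""
        r, acc = rng.random(), 0.0
        for outcome, p in dist.items():
            acc += p
            if r < acc:
                return outcome
        return outcome  # guard against floating-point round-off

    states, symbols = [], []
    s = draw(start_p)
    for _ in range(T):
        states.append(s)
        symbols.append(draw(emit_p[s]))  # visible emission
        s = draw(trans_p[s])             # hidden transition
    return states, symbols

# Invented two-state example; only `symbols` would be observable in practice.
start_p = {'s1': 0.5, 's2': 0.5}
trans_p = {'s1': {'s1': 0.9, 's2': 0.1}, 's2': {'s1': 0.2, 's2': 0.8}}
emit_p = {'s1': {'a': 0.7, 'b': 0.3}, 's2': {'a': 0.1, 'b': 0.9}}
states, symbols = sample_hmm(start_p, trans_p, emit_p, T=6)
```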
A concrete example
Consider two friends, Alice and Bob, who live far apart from each other and who talk together daily over the telephone about what they did that day. Bob is only interested in three activities: walking in the park, shopping, and cleaning his apartment. The choice of what to do is determined exclusively by the weather on a given day. Alice has no definite information about the weather where Bob lives, but she knows general trends. Based on what Bob tells her he did each day, Alice tries to guess what the weather must have been like.
Alice believes that the weather operates as a discrete Markov chain. There are two states, "Rainy" and "Sunny", but she cannot observe them directly; that is, they are hidden from her. On each day, there is a certain chance that Bob will perform one of the following activities, depending on the weather: "walk", "shop", or "clean". Since Bob tells Alice about his activities, those are the observations. The entire system is that of a hidden Markov model (HMM).
states = ('Rainy', 'Sunny')
observations = ('walk', 'shop', 'clean')
start_probability = {'Rainy': 0.6, 'Sunny': 0.4}
transition_probability = {
    'Rainy': {'Rainy': 0.7, 'Sunny': 0.3},
    'Sunny': {'Rainy': 0.4, 'Sunny': 0.6},
}
emission_probability = {
    'Rainy': {'walk': 0.1, 'shop': 0.4, 'clean': 0.5},
    'Sunny': {'walk': 0.6, 'shop': 0.3, 'clean': 0.1},
}
In this example, there is only a 30% chance that tomorrow will be sunny if today is rainy. The emission_probability represents how likely Bob is to perform a certain activity on each day: if it is rainy, there is a 50% chance that he is cleaning his apartment; if it is sunny, there is a 60% chance that he is outside for a walk.
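The question Alice actually faces, namely which weather sequence best explains a series of observed activities, is answered by the Viterbi algorithm. The sketch below is a plain dynamic-programming implementation over the tables of the example (repeated so the snippet is self-contained):

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return (probability, path) for the most probable hidden-state path."""
    # V[s] = (probability of the best path ending in state s, that path)
    V = {s: (start_p[s] * emit_p[s][obs[0]], [s]) for s in states}
    for o in obs[1:]:
        V_next = {}
        for s in states:
            # best predecessor state for s at this time step
            prob, path = max((V[p][0] * trans_p[p][s], V[p][1]) for p in states)
            V_next[s] = (prob * emit_p[s][o], path + [s])
        V = V_next
    return max(V.values())

states = ('Rainy', 'Sunny')
start_probability = {'Rainy': 0.6, 'Sunny': 0.4}
transition_probability = {
    'Rainy': {'Rainy': 0.7, 'Sunny': 0.3},
    'Sunny': {'Rainy': 0.4, 'Sunny': 0.6},
}
emission_probability = {
    'Rainy': {'walk': 0.1, 'shop': 0.4, 'clean': 0.5},
    'Sunny': {'walk': 0.6, 'shop': 0.3, 'clean': 0.1},
}

prob, path = viterbi(('walk', 'shop', 'clean'), states,
                     start_probability, transition_probability,
                     emission_probability)
# path == ['Sunny', 'Rainy', 'Rainy'], prob == 0.01344: it was most likely
# sunny on the walking day and rainy on the shopping and cleaning days.
```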
Knowledge representation (KR) is an area of artificial intelligence research aimed at representing knowledge in symbols to facilitate inferencing from those knowledge elements, creating new elements of knowledge. The KR can be made independent of the underlying knowledge model or knowledge base system (KBS), such as a semantic network.
KR research involves analysis of how to reason accurately and effectively, and how best to use a set of symbols to represent a set of facts within a knowledge domain. A symbol vocabulary and a system of logic are combined to enable inferences about elements in the KR, creating new KR sentences. Logic is used to supply formal semantics for how reasoning functions should be applied to the symbols in the KR system, and to define how operators can process and reshape the knowledge. Examples of operators and operations include negation, conjunction, adverbs, adjectives, quantifiers, and modal operators. The logic is the interpretation theory. These elements (symbols, operators, and interpretation theory) are what give sequences of symbols meaning within a KR.
A knowledge representation (KR) is most fundamentally a surrogate, a substitute for the thing itself, used to enable an entity to determine consequences by thinking rather than acting, i.e., by reasoning about the world rather than taking action in it.
It is a set of ontological commitments, i.e., an answer to the question: in what terms should I think about the world?
It is a fragmentary theory of intelligent reasoning, expressed in terms of three components: (i) the representation's fundamental conception of intelligent reasoning; (ii) the set of inferences the representation sanctions; and (iii) the set of inferences it recommends.
It is a medium for pragmatically efficient computation, i.e., the computational environment in which thinking is accomplished. One contribution to this pragmatic efficiency is supplied by the guidance a representation provides for organizing information so as to facilitate making the recommended inferences.
It is a medium of human expression, i.e., a language in which we say things about the world.
Some issues that arise in knowledge representation from an AI perspective are:
How do people represent knowledge? What is the nature of knowledge? Should a representation scheme deal with a particular domain, or should it be general purpose? How expressive is a representation scheme or formal language? Should the scheme be declarative or procedural?
Characteristics
A good knowledge representation covers six basic characteristics:
Coverage: the KR covers a breadth and depth of information. Without wide coverage, the KR cannot determine anything or resolve ambiguities.
Understandable by humans: KR is viewed as a natural language, so the logic should flow freely. It should support modularity and hierarchies of classes (polar bears are bears, which are animals). It should also have simple primitives that combine in complex forms.
Consistency: if John closed the door, it can also be interpreted as the door was closed by John. By being consistent, the KR can eliminate redundant or conflicting knowledge.
Efficiency.
Ease of modifying and updating.
Support for the intelligent activity that uses the knowledge base.
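The class-hierarchy requirement ("polar bears are bears, which are animals") can be made concrete with a toy semantic network; the `isa` table and the names in it are invented for the demo, and real KBS systems use far richer link types.

```python
# Hypothetical minimal semantic network: only 'isa' links, chained upward.
isa = {
    'polar bear': 'bear',
    'bear': 'animal',
}

def is_a(entity, category):
    """Follow 'isa' links upward; inference yields facts (polar bears are
    animals) that are not stored explicitly in the network."""
    while entity in isa:
        if entity == category:
            return True
        entity = isa[entity]
    return entity == category
```

Even this tiny sketch shows the point of the Coverage and Consistency criteria: the stored links are few, but inference over them answers a larger set of questions without duplicating facts.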