Old Machine Learning notes

Updated: Jan 09, 2020 by Pradeep Gowda.

“Sooner or later, you will have to derive it”

This is a work in progress and not ready for public consumption (if it ever will be!).



Textbooks and references

  1. Pattern Recognition and Machine Learning by Christopher Bishop
  2. The Elements of Statistical Learning by T. Hastie, R. Tibshirani and J. Friedman

Journals and conferences

Lecture notes and videos


Random variables

Probability Distributions


As probability theory is used in quite diverse applications, terminology is not uniform and sometimes confusing. The following terms are used for non-cumulative probability distribution functions:

The following terms are somewhat ambiguous as they can refer to non-cumulative or cumulative distributions, depending on authors’ preferences:


Basic terms

Conditional probability

Marginal Probability

Joint Probability

Discrete Probability Distribution

Examples: the Poisson, Bernoulli, binomial, geometric, and negative binomial distributions.

A discrete probability distribution is often represented as a generalized probability density function built from Dirac delta functions; this representation substantially unifies the treatment of continuous and discrete distributions. It is especially useful for distributions that have both a continuous and a discrete part.
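As a worked form of the statement above: for a discrete variable taking values $x_i$ with probabilities $p_i$, the generalized density is a weighted sum of delta functions, and integrating any function $g$ against it recovers the usual discrete expectation (the sifting property of $\delta$). The second line is an illustrative assumption, not from the original: a variable that equals 0 with probability $w$ and is otherwise standard normal, with $\phi$ the standard normal density.

$$f(x) = \sum_i p_i\,\delta(x - x_i), \qquad \int_{-\infty}^{\infty} g(x)\, f(x)\, dx = \sum_i p_i\, g(x_i)$$

$$f_{\text{mixed}}(x) = w\,\delta(x) + (1 - w)\,\phi(x), \qquad 0 \le w \le 1$$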

Normal Distribution or Gaussian Distribution

The normal distribution is a continuous probability distribution with a bell-shaped probability density function, known as the Gaussian function or, informally, the bell curve:

$f(x; \mu,\sigma^{2}) = \frac{1}{\sqrt{2\pi}\sigma}e^{-(x-\mu)^2/(2\sigma^2)}$

where $\mu$ is the mean or expectation (the location of the peak) and $\sigma^2$ is the variance.

Ref: http://en.wikipedia.org/wiki/Normal_distribution
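A minimal sketch of the normal density $f(x; \mu, \sigma^2)$, parameterised by the variance $\sigma^2$ to match the formula. The name `normal_pdf` is hypothetical; the cross-check uses the standard library's `statistics.NormalDist` (Python 3.8+), which takes the standard deviation $\sigma$ rather than $\sigma^2$.

```python
import math
from statistics import NormalDist


def normal_pdf(x: float, mu: float = 0.0, sigma2: float = 1.0) -> float:
    """Density of N(mu, sigma2) at x, written term by term from the formula."""
    if sigma2 <= 0:
        raise ValueError("variance must be positive")
    sigma = math.sqrt(sigma2)
    # 1 / (sqrt(2*pi) * sigma) is the normalising constant
    coeff = 1.0 / (math.sqrt(2.0 * math.pi) * sigma)
    # exp(-(x - mu)^2 / (2 sigma^2)) gives the bell shape centred at mu
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma2))


if __name__ == "__main__":
    # Compare with the standard library; NormalDist wants sigma, not sigma^2.
    for x in (-1.0, 0.0, 2.5):
        print(x, normal_pdf(x, mu=1.0, sigma2=4.0), NormalDist(mu=1.0, sigma=2.0).pdf(x))
```

The two columns should agree up to floating-point rounding; the peak sits at $x = \mu$.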

Probability Distribution Function

The probability distribution function of a random variable describes the relative frequencies (probabilities) of the different values that the random variable can take.

Joint Distribution Function

Covariance Matrix

Precision matrix: the inverse of the covariance matrix, $\Sigma^{-1}$.
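A small dependency-free sketch for two variables: build the sample covariance matrix $\Sigma$ (assuming the usual $n-1$ denominator) and invert it to get the precision matrix $\Sigma^{-1}$, which exists only when $\Sigma$ is non-singular. The helper names and the data are hypothetical; with NumPy, `numpy.cov` and `numpy.linalg.inv` would do the same job for any dimension.

```python
def covariance_matrix(xs, ys):
    """Sample covariance matrix (n - 1 denominator) of paired observations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    # Symmetric: both off-diagonal entries are cov(x, y)
    return [[sxx, sxy], [sxy, syy]]


def inverse_2x2(m):
    """Inverse of a 2x2 matrix via the adjugate formula."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if abs(det) < 1e-12:
        raise ValueError("matrix is singular, so no precision matrix exists")
    return [[d / det, -b / det], [-c / det, a / det]]


def matmul_2x2(p, q):
    """Plain 2x2 matrix product, used only to check the inverse."""
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


if __name__ == "__main__":
    xs = [1.0, 2.0, 3.0, 4.0, 6.0]
    ys = [2.0, 1.0, 4.0, 3.0, 7.0]
    sigma = covariance_matrix(xs, ys)
    precision = inverse_2x2(sigma)
    # Sigma times Sigma^{-1} should be (numerically) the 2x2 identity matrix
    print(matmul_2x2(sigma, precision))
```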

Multivariate Distribution

Multivariate Normal Distribution

Some definitions

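Exponentiating a quadratic function $f(x) = ax^2 + bx + c$ gives $e^{f(x)} = e^{ax^2 + bx + c}$; when $a < 0$ this is a Gaussian function, with the same bell shape as the normal density.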
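Completing the square makes the link between an exponentiated quadratic (with $a < 0$) and the normal density $f(x; \mu, \sigma^2)$ explicit. A short derivation:

$$ax^2 + bx + c = a\left(x + \frac{b}{2a}\right)^2 + c - \frac{b^2}{4a}$$

$$e^{ax^2 + bx + c} = e^{\,c - b^2/(4a)}\, \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right), \qquad \mu = -\frac{b}{2a}, \quad \sigma^2 = -\frac{1}{2a}$$

If the constant $e^{\,c - b^2/(4a)}$ happens to equal $\frac{1}{\sqrt{2\pi}\sigma}$, the result is exactly the normal density with mean $\mu$ and variance $\sigma^2$.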

Parameterisation is the process of deciding and defining the parameters necessary for a complete or relevant specification of a model or geometric object.

Natural Parameter of the normal distribution: ?
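One standard answer, assuming the usual exponential-family convention $h(x)\exp\big(\eta^\top T(x) - A(\eta)\big)$: expanding the square in the normal density gives

$$f(x;\mu,\sigma^2) = \frac{1}{\sqrt{2\pi}} \exp\!\left( \frac{\mu}{\sigma^2}\,x - \frac{1}{2\sigma^2}\,x^2 - \frac{\mu^2}{2\sigma^2} - \ln\sigma \right)$$

so the natural parameters and the remaining pieces are

$$\eta = \begin{pmatrix} \mu/\sigma^2 \\ -1/(2\sigma^2) \end{pmatrix}, \quad T(x) = \begin{pmatrix} x \\ x^2 \end{pmatrix}, \quad h(x) = \frac{1}{\sqrt{2\pi}}, \quad A(\eta) = -\frac{\eta_1^2}{4\eta_2} - \frac{1}{2}\ln(-2\eta_2)$$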

Mathematics notes

Posterior probability

The Wikipedia page has a good explanation.

The posterior probability of a random event or an uncertain proposition is the conditional probability that is assigned after the relevant evidence is taken into account. Similarly, the posterior probability distribution is the distribution of an unknown quantity, treated as a random variable, conditional on the evidence obtained from an experiment or survey.

$$P(A|B) = \frac{P(B|A) P(A)}{P(B)}$$

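The posterior probability distribution of one random variable given the value of another can be calculated with Bayes’ theorem by multiplying the prior probability distribution by the likelihood function and then dividing by the normalizing constant. For a parameter θ and observed data x:

$$p(\theta \mid x) = \frac{p(x \mid \theta)\, p(\theta)}{\int p(x \mid \theta')\, p(\theta')\, d\theta'}$$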
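A minimal sketch of the prior × likelihood ÷ normalizing-constant recipe, on a toy problem that is an assumption, not from the original: inferring a coin's bias θ from 7 heads in 10 tosses. The continuous θ is approximated by a grid of 101 points, so the normalizing integral becomes a sum. `grid_posterior` is a hypothetical helper name.

```python
from math import comb


def grid_posterior(grid, prior, heads, tosses):
    """Posterior over a grid of coin biases theta: prior * likelihood / normaliser."""
    # Likelihood p(x | theta): binomial probability of `heads` out of `tosses`
    likelihood = [comb(tosses, heads) * t**heads * (1 - t) ** (tosses - heads) for t in grid]
    unnormalised = [p * lik for p, lik in zip(prior, likelihood)]
    # p(x), the normalising constant; a sum stands in for the integral
    evidence = sum(unnormalised)
    if evidence == 0:
        raise ValueError("data has zero probability under the prior")
    return [u / evidence for u in unnormalised]


if __name__ == "__main__":
    grid = [i / 100 for i in range(101)]      # theta = 0.00, 0.01, ..., 1.00
    prior = [1 / len(grid)] * len(grid)       # uniform prior p(theta)
    post = grid_posterior(grid, prior, heads=7, tosses=10)
    mode = grid[max(range(len(grid)), key=post.__getitem__)]
    print(f"sum = {sum(post):.6f}, posterior mode = {mode}")
```

With a uniform prior the posterior is proportional to the likelihood, so its mode lands at the observed frequency 7/10.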

Conjugate Prior

If the posterior distribution p(θ|x) is in the same family as the prior probability distribution p(θ), the prior and posterior are called conjugate distributions, and the prior is called a conjugate prior for the likelihood function.

A Compendium of Conjugate Priors (pdf) has a good explanation.

Conjugate prior relationships.
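A Beta prior with a binomial likelihood is a classic conjugate pair: the posterior is again a Beta distribution, so the update is just arithmetic on its two parameters. A sketch in Python; the `Beta` class, `update_beta_binomial`, and the coin example (7 heads in 10 tosses) are hypothetical illustrations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Beta:
    """Beta(alpha, beta) distribution over a probability theta in [0, 1]."""
    alpha: float
    beta: float

    def mean(self) -> float:
        return self.alpha / (self.alpha + self.beta)


def update_beta_binomial(prior: Beta, heads: int, tosses: int) -> Beta:
    """Conjugate update: Beta prior + binomial likelihood -> Beta posterior."""
    if not 0 <= heads <= tosses:
        raise ValueError("need 0 <= heads <= tosses")
    # Successes add to alpha, failures add to beta; the family stays Beta.
    return Beta(prior.alpha + heads, prior.beta + (tosses - heads))


if __name__ == "__main__":
    prior = Beta(1.0, 1.0)  # Beta(1, 1) is the uniform prior on theta
    post = update_beta_binomial(prior, heads=7, tosses=10)
    print(post, post.mean())
```

Because the family is closed under the update, data can also be absorbed in batches: updating with 3 of 4 and then 4 of 6 gives the same posterior as updating once with 7 of 10.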

Closed form

In mathematics, an expression is said to be a closed-form expression if it can be expressed analytically in terms of a bounded number of certain “well-known” functions. Typically, these well-known functions are defined to be elementary functions: constants, one variable x, the elementary operations of arithmetic (+ − × ÷), nth roots, exponentials and logarithms (which thus also include trigonometric functions and inverse trigonometric functions).

An equation is said to be a closed-form solution if it solves a given problem in terms of functions and mathematical operations from a given generally accepted set. For example, an infinite sum would generally not be considered closed-form. However, the choice of what to call closed-form and what not is rather arbitrary since a new “closed-form” function could simply be defined in terms of the infinite sum.[2]
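As an illustration (the example is an assumption, not from the original): the partial sum of a geometric series can be computed with a loop whose length grows with n, or with the closed-form expression a(1 − rⁿ)/(1 − r), which uses a fixed number of arithmetic operations and one power. The function names are hypothetical.

```python
def geometric_sum_loop(a: float, r: float, n: int) -> float:
    """a + a*r + ... + a*r^(n-1), computed term by term (n additions)."""
    total, term = 0.0, a
    for _ in range(n):
        total += term
        term *= r
    return total


def geometric_sum_closed(a: float, r: float, n: int) -> float:
    """Same sum via the closed form a * (1 - r^n) / (1 - r); r == 1 handled separately."""
    if r == 1:
        return a * n
    return a * (1 - r**n) / (1 - r)


if __name__ == "__main__":
    for n in (1, 5, 20):
        print(n, geometric_sum_loop(3.0, 0.5, n), geometric_sum_closed(3.0, 0.5, n))
```

For |r| < 1 the infinite series also has the closed form a/(1 − r), which the finite sums above approach as n grows.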

Lagrange multipliers

Identity Matrix

Eigenvalues and Eigenvectors

Indicator Random variables

  1. http://en.wikipedia.org/wiki/Probability_distribution

  2. http://mathworld.wolfram.com/Closed-FormSolution.html