# Old Machine Learning notes

Updated: Jan 09, 2020 by Pradeep Gowda.

“Sooner or later, you will have to derive it”

This is a work in progress and not ready for public consumption (if it ever will be!).

## Introduction

### Pre-requisites

• Algebra
• Optimisation
• Probability and Randomness

### Text books and references

1. Pattern Recognition and Machine Learning by Christopher Bishop
2. The Elements of Statistical Learning by T. Hastie, R. Tibshirani and J. Friedman

## Probability Distributions

Terminology

As probability theory is used in quite diverse applications, terminology is not uniform and sometimes confusing. The following terms are used for non-cumulative probability distribution functions:

• Probability mass, Probability mass function, p.m.f.: for discrete random variables.
• Categorical distribution: for discrete random variables with a finite set of values.
• Probability density, Probability density function, p.d.f.: most often reserved for continuous random variables.
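The p.m.f./p.d.f. distinction above can be made concrete in code. A minimal sketch (the helper names `binom_pmf` and `norm_pdf` are my own, not from any library): a p.m.f. value is a genuine probability and the values over the support sum to 1, while a p.d.f. value is only a density.

```python
import math

def binom_pmf(k, n, p):
    """P(X = k) for a Binomial(n, p) random variable -- a true probability."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def norm_pdf(x, mu=0.0, sigma=1.0):
    """Density of N(mu, sigma^2) at x -- not a probability by itself."""
    return math.exp(-(x - mu)**2 / (2 * sigma**2)) / (math.sqrt(2 * math.pi) * sigma)

# p.m.f. values over the whole support sum to exactly 1
total = sum(binom_pmf(k, 10, 0.5) for k in range(11))

print(round(binom_pmf(3, 10, 0.5), 4))  # 0.1172
print(round(norm_pdf(0.0), 4))          # 0.3989
print(round(total, 6))                  # 1.0
```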

The following terms are somewhat ambiguous as they can refer to non-cumulative or cumulative distributions, depending on authors’ preferences:

• Probability distribution function: Continuous or discrete, non-cumulative or cumulative.
• Probability function: Even more ambiguous, can mean any of the above, or anything else.

Finally,

• Probability distribution: Either the same as probability distribution function. Or understood as something more fundamental underlying an actual mass or density function.

Basic terms

• Mode: the most frequently occurring value in a distribution
• Tail: the region of least frequently occurring values in a distribution
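For a finite sample, the (empirical) mode is easy to compute. A quick sketch with the standard library:

```python
from collections import Counter

samples = [2, 3, 3, 5, 3, 2, 7, 3, 4, 2]

# Mode: the most frequently occurring value in the sample
mode, count = Counter(samples).most_common(1)[0]
print(mode, count)  # 3 4
```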

### Discrete Probability Distribution

Examples: the Poisson, Bernoulli, binomial, geometric, and negative binomial distributions.

A discrete probability distribution is often represented as a generalized probability density function involving Dirac delta functions which substantially unifies the treatment of continuous and discrete distributions. This is especially useful when dealing with probability distributions involving both a continuous and a discrete part.
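A distribution with "both a continuous and a discrete part" can be illustrated by sampling. A hedged sketch (the function name and parameter values are my own choices): a point mass at 0, which the Dirac delta notation above would write as a `delta(x)` term, mixed with an exponential continuous part.

```python
import random

def sample_mixed(p_zero=0.3, rate=1.0):
    """Sample from a distribution with a point mass at 0 (weight p_zero)
    and an Exponential(rate) continuous part (weight 1 - p_zero).
    In generalized-density notation:
    f(x) = p_zero * delta(x) + (1 - p_zero) * rate * exp(-rate * x)."""
    if random.random() < p_zero:
        return 0.0                   # the discrete atom
    return random.expovariate(rate)  # the continuous part

random.seed(0)
xs = [sample_mixed() for _ in range(10_000)]
frac_zero = sum(x == 0.0 for x in xs) / len(xs)
print(frac_zero)  # approximately 0.3
```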

### Normal Distribution or Gaussian Distribution

The normal (or Gaussian) distribution is a continuous probability distribution with a bell-shaped probability density function, known as the Gaussian function or, informally, the bell curve:

$f(x; \mu,\sigma^{2}) = \frac{1}{\sqrt{2\pi}\sigma}e^{-(x-\mu)^2/(2\sigma^2)}$

where $\mu$ is the mean or expectation (the location of the peak) and $\sigma^2$ is the variance.

### Probability Distribution Function

The probability distribution function of a random variable describes the relative frequencies of different values for that random variable.

### Some definitions

Exponentiating a quadratic function $f(x) = ax^2 + bx + c$ gives $e^{ax^2 + bx + c}$.
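This is why the exponentiated quadratic matters for the normal distribution: for $a < 0$, completing the square shows $e^{ax^2+bx+c}$ is an unnormalised Gaussian with mean $\mu = -b/(2a)$ and variance $\sigma^2 = -1/(2a)$. A quick numeric check (the coefficient values are arbitrary examples):

```python
import math

# For a < 0, exp(a*x^2 + b*x + c) is an unnormalised Gaussian:
# completing the square gives mean mu = -b/(2a), variance sigma2 = -1/(2a).
a, b, c = -0.5, 2.0, 1.0
mu, sigma2 = -b / (2 * a), -1 / (2 * a)

def f(x):
    return math.exp(a * x**2 + b * x + c)

# The ratio of f to the Gaussian kernel exp(-(x-mu)^2 / (2*sigma2))
# is the same constant at every x
ratios = [f(x) / math.exp(-(x - mu)**2 / (2 * sigma2)) for x in (-1.0, 0.0, 2.5)]
print(mu, sigma2)                                      # 2.0 1.0
print(all(abs(r - ratios[0]) < 1e-9 for r in ratios))  # True
```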

Parameterisation is the process of deciding and defining the parameters necessary for a complete or relevant specification of a model or geometric object.

Natural Parameter of the normal distribution: ?

## Mathematics notes

#### Posterior probability

The posterior probability of a random event or an uncertain proposition is the conditional probability that is assigned after the relevant evidence is taken into account. Similarly, the posterior probability distribution is the distribution of an unknown quantity, treated as a random variable, conditional on the evidence obtained from an experiment or survey.

$$P(A|B) = \frac{P(B|A) P(A)}{P(B)}$$

The posterior probability distribution of one random variable given the value of another can be calculated with Bayes’ theorem: multiply the prior probability distribution by the likelihood function, and then divide by the normalizing constant.
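The update can be sketched on the classic diagnostic-test example (the numbers below are made up for illustration):

```python
# Bayes' theorem: posterior = likelihood * prior / normalizing constant.
prior = 0.01              # P(disease)
like_pos_disease = 0.95   # P(+ | disease), the test's sensitivity
like_pos_healthy = 0.05   # P(+ | healthy), the false-positive rate

# Normalizing constant P(+) via the law of total probability
p_pos = like_pos_disease * prior + like_pos_healthy * (1 - prior)

posterior = like_pos_disease * prior / p_pos
print(round(posterior, 3))  # 0.161 -- a positive test still leaves P(disease) low
```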

#### Conjugate Prior

If the posterior distributions p(θ|x) are in the same family as the prior probability distribution p(θ), the prior and posterior are called conjugate distributions, and the prior is called a conjugate prior for the likelihood function.

A Compendium of Conjugate Priors (pdf) has a good explanation.
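The standard introductory example of conjugacy is the Beta prior for Bernoulli data: the posterior is again a Beta distribution, and the update is just adding counts. A minimal sketch (the prior pseudo-counts and the flip data are invented for illustration):

```python
# Beta-Bernoulli conjugacy: a Beta(alpha, beta) prior on the success
# probability, updated with coin-flip data, stays a Beta distribution --
# the posterior is Beta(alpha + heads, beta + tails), no integration needed.
alpha, beta = 2.0, 2.0          # prior pseudo-counts (assumed for illustration)
flips = [1, 0, 1, 1, 0, 1, 1]   # observed Bernoulli data

heads = sum(flips)
tails = len(flips) - heads
alpha_post = alpha + heads
beta_post = beta + tails

posterior_mean = alpha_post / (alpha_post + beta_post)
print(alpha_post, beta_post)     # 7.0 4.0
print(round(posterior_mean, 3))  # 0.636
```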

#### Closed form

In mathematics, an expression is said to be a closed-form expression if it can be expressed analytically in terms of a bounded number of certain “well-known” functions. Typically, these well-known functions are defined to be elementary functions—constants, one variable x, elementary operations of arithmetic (+ − × ÷), nth roots, exponent and logarithm (which thus also include trigonometric functions and inverse trigonometric functions).

An equation is said to be a closed-form solution if it solves a given problem in terms of functions and mathematical operations from a given generally accepted set. For example, an infinite sum would generally not be considered closed-form. However, the choice of what to call closed-form and what not is rather arbitrary since a new “closed-form” function could simply be defined in terms of the infinite sum.
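The geometric series is the textbook example of this distinction: the infinite sum $1 + r + r^2 + \dots$ is not closed-form as written, but for $|r| < 1$ it equals the closed-form expression $1/(1-r)$. A quick check:

```python
# Infinite sum vs. its closed form, for r = 0.5
r = 0.5
partial = sum(r**k for k in range(100))  # truncated infinite sum
closed = 1 / (1 - r)                     # closed-form expression 1/(1-r)

print(closed)                          # 2.0
print(abs(partial - closed) < 1e-12)   # True: the truncation already agrees
```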