What are the two kinds of calculus?

  1. Integral Calculus
  2. Differential Calculus

What are the two key concerns when fitting models?

  1. optimization: the process of fitting our models to observed data
  2. generalization: the mathematical principles and practitioners’ wisdom that guide us in producing models whose validity extends beyond the exact set of data examples used to train them

What is the derivative f’ of f: R -> R?

A derivative can be interpreted as the instantaneous rate of change of a function with respect to its variable. It is also the slope of the tangent line to the curve of the function.

f’(x) = limit((f(x+h)-f(x))/h, h, 0)
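
To make the limit concrete, here is a minimal numerical sketch. The function f(x) = 3x^2 - 4x and the helper name `difference_quotient` are assumptions chosen for illustration; they do not come from these notes.

```python
def f(x):
    # Example function (an assumption, not defined in the notes).
    return 3 * x ** 2 - 4 * x


def difference_quotient(f, x, h):
    # The expression inside the limit: (f(x + h) - f(x)) / h.
    return (f(x + h) - f(x)) / h


# Shrink h toward 0 and watch the quotient settle near f'(1).
for h in [0.1, 0.01, 0.001, 0.0001]:
    print(f"h={h:<7} quotient={difference_quotient(f, 1.0, h):.5f}")
```

For this particular f the quotient at x = 1 simplifies to 2 + 3h, so the printed values approach 2 as h shrinks.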

What is the derivative of a constant C?

0

What is the derivative of x^n?

n*x^(n-1)

What is the derivative of e^x?

e^x

What is the derivative of ln(x)?

1/x

What is the derivative of C*f(x)?

C*derivative(f(x))

What is the derivative of f(x)+g(x)?

derivative(f(x)) + derivative(g(x))

What is the derivative of f(x)*g(x)?

f(x)*derivative(g(x)) + derivative(f(x))*g(x)

What is the derivative of f(x)/g(x)?

(g(x)*derivative(f(x)) - f(x)*derivative(g(x)))/g(x)^2
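
The product and quotient rules can be sanity-checked numerically. This is a rough sketch, assuming f(x) = e^x and g(x) = ln(x) as test functions and a central-difference helper named `derivative` with an arbitrary step size; none of these choices come from the notes.

```python
import math


def derivative(func, x, h=1e-6):
    # Central-difference approximation of func'(x); h is an assumed step size.
    return (func(x + h) - func(x - h)) / (2 * h)


f, df = math.exp, math.exp               # derivative of e^x is e^x
g, dg = math.log, lambda x: 1 / x        # derivative of ln(x) is 1/x

x = 2.0

# Product rule: (f*g)' = f*g' + f'*g
product_numeric = derivative(lambda t: f(t) * g(t), x)
product_rule = f(x) * dg(x) + df(x) * g(x)

# Quotient rule: (f/g)' = (g*f' - f*g') / g^2
quotient_numeric = derivative(lambda t: f(t) / g(t), x)
quotient_rule = (g(x) * df(x) - f(x) * dg(x)) / g(x) ** 2

print(product_numeric, product_rule)
print(quotient_numeric, quotient_rule)
```

Each printed pair should agree to several decimal places; a large gap would point to a mistake in the rule as written.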

Plot the function u = f(x) and its tangent line y = 2x - 3 at x = 1, where the coefficient 2 is the slope of the tangent line.

x = np.arange(0, 3, 0.1)
plot(x, [f(x), 2 * x - 3], 'x', 'f(x)', legend=['f(x)', 'Tangent line (x=1)'])
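
The snippet above relies on `np`, `f`, and a `plot` helper that are defined elsewhere. A self-contained sketch follows, assuming f(x) = 3x^2 - 4x (one function whose tangent at x = 1 is y = 2x - 3) and using matplotlib directly in place of the helper. The names `tangent` and `plot_function_and_tangent` are hypothetical.

```python
def f(x):
    # Assumed example function; the notes do not define f.
    return 3 * x ** 2 - 4 * x


def tangent(x):
    # Tangent line from the notes: y = 2x - 3, touching f at x = 1.
    return 2 * x - 3


# Same grid as np.arange(0, 3, 0.1), built without NumPy.
xs = [0.1 * i for i in range(30)]


def plot_function_and_tangent():
    # Imported lazily so the definitions above work without matplotlib.
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.plot(xs, [f(x) for x in xs], label="f(x)")
    ax.plot(xs, [tangent(x) for x in xs], "--", label="Tangent line (x=1)")
    ax.set_xlabel("x")
    ax.set_ylabel("f(x)")
    ax.legend()
    ax.grid(True)
    plt.show()
```

Calling `plot_function_and_tangent()` draws both curves; they touch at (1, -1), where the assumed f has slope 2.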

What is a partial derivative?

Let y = f(x_1, x_2, …, x_n) be a function with n variables. The partial derivative of y with respect to its i-th parameter x_i is limit((f(x_1, …, x_i + h, …, x_n) - f(x_1, …, x_i, …, x_n))/h, h, 0). We simply treat x_1, …, x_(i-1), x_(i+1), …, x_n as constants and calculate the derivative of y with respect to x_i.
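
A small sketch of the same idea in code: nudge one coordinate by h and hold the rest constant. The helper `partial_derivative` and the example f(x_1, x_2) = x_1^2 + 3*x_1*x_2 are assumptions for illustration only.

```python
def partial_derivative(f, point, i, h=1e-6):
    # Approximate the partial derivative of f at `point` with respect to
    # coordinate i (0-based): shift x_i by h, keep every other x_j fixed.
    shifted = list(point)
    shifted[i] += h
    return (f(shifted) - f(point)) / h


def f(x):
    # Hypothetical example: f(x1, x2) = x1^2 + 3*x1*x2
    return x[0] ** 2 + 3 * x[0] * x[1]


point = [1.0, 2.0]
print(partial_derivative(f, point, 0))  # analytic value: 2*x1 + 3*x2 = 8
print(partial_derivative(f, point, 1))  # analytic value: 3*x1 = 3
```

The printed numbers are approximations, so expect them to match the analytic values only up to a small error on the order of h.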

What is a gradient?

We concatenate partial derivatives of a multivariate function with respect to all its variables to obtain the gradient vector of the function.

Let x be an n-dimensional vector. The following rules are often used when differentiating multivariate functions:

For all A ∈ ℝ^(m×n), ∇_x Ax = A^⊤,

For all A ∈ ℝ^(n×m), ∇_x x^⊤ A = A,

For all A ∈ ℝ^(n×n), ∇_x x^⊤ A x = (A + A^⊤) x,

∇_x ‖x‖^2 = ∇_x x^⊤ x = 2x.

Similarly, for any matrix X, we have ∇_X ‖X‖_F^2 = 2X.
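
Two of these rules can be checked with a numerical gradient. The sketch below is pure Python; the helpers `gradient`, `matvec`, and `quadratic_form`, and the values of A and x, are assumptions made up for the check.

```python
def gradient(f, x, h=1e-6):
    # Numerical gradient: one central difference per coordinate of x.
    grad = []
    for i in range(len(x)):
        up = list(x)
        up[i] += h
        down = list(x)
        down[i] -= h
        grad.append((f(up) - f(down)) / (2 * h))
    return grad


def matvec(A, x):
    # Matrix-vector product A x for a list-of-rows matrix.
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def quadratic_form(A, x):
    # Scalar x^T A x.
    return sum(xi * yi for xi, yi in zip(x, matvec(A, x)))


A = [[1.0, 2.0], [3.0, 4.0]]  # hypothetical, deliberately non-symmetric
x = [1.0, -2.0]
n = len(A)

# Rule: gradient of x^T A x is (A + A^T) x.
numeric = gradient(lambda v: quadratic_form(A, v), x)
A_plus_At = [[A[i][j] + A[j][i] for j in range(n)] for i in range(n)]
analytic = matvec(A_plus_At, x)

# Rule: gradient of ||x||^2 is 2x.
norm_numeric = gradient(lambda v: sum(t * t for t in v), x)
norm_analytic = [2 * t for t in x]

print(numeric, analytic)
print(norm_numeric, norm_analytic)
```

A non-symmetric A is used on purpose: with a symmetric matrix, (A + A^⊤)x and 2Ax coincide, which would hide a wrong rule.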

What is the chain rule?

The chain rule enables us to differentiate composite functions.

Suppose that functions y = f(u) and u = g(x) are both differentiable. Then the chain rule states that dy/dx = dy/du * du/dx.
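
As a quick check of the single-variable rule, take y = ln(u) with u = x^2 + 1. These two functions and the evaluation point are assumptions picked for the example.

```python
import math


def g(x):
    # Inner function: u = g(x) = x^2 + 1
    return x ** 2 + 1


def f(u):
    # Outer function: y = f(u) = ln(u)
    return math.log(u)


x = 1.5
u = g(x)

dy_du = 1 / u        # derivative of ln(u)
du_dx = 2 * x        # derivative of x^2 + 1
chain = dy_du * du_dx

# Compare against a central difference on the composite f(g(x)).
h = 1e-6
numeric = (f(g(x + h)) - f(g(x - h))) / (2 * h)

print(chain, numeric)
```

The two printed values should agree closely, since both estimate dy/dx = 2x / (x^2 + 1) at x = 1.5.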

Suppose that the differentiable function y has variables u_1, u_2, …, u_m, where each differentiable function u_i has variables x_1, x_2, …, x_n. Note that y is a function of x_1, x_2, …, x_n. Then the chain rule gives: dy/dx_i = dy/du_1 * du_1/dx_i + dy/du_2 * du_2/dx_i + … + dy/du_m * du_m/dx_i
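
The multivariate form can be sketched the same way. Here m = n = 2, with u_1 = x_1 + x_2, u_2 = x_1 * x_2, and y = u_1 * u_2. All of these functions and the helper names are assumptions chosen so the derivatives are easy to write by hand.

```python
def u(x1, x2):
    # Intermediate variables: u1 = x1 + x2, u2 = x1 * x2
    return x1 + x2, x1 * x2


def y(u1, u2):
    # Outer function: y = u1 * u2
    return u1 * u2


def chain_rule_dy_dx(x1, x2):
    u1, u2 = u(x1, x2)
    dy_du = (u2, u1)       # (dy/du1, dy/du2)
    du_dx1 = (1.0, x2)     # (du1/dx1, du2/dx1)
    du_dx2 = (1.0, x1)     # (du1/dx2, du2/dx2)
    # Sum over the intermediate variables, one term per u_j.
    dy_dx1 = sum(a * b for a, b in zip(dy_du, du_dx1))
    dy_dx2 = sum(a * b for a, b in zip(dy_du, du_dx2))
    return dy_dx1, dy_dx2


def numeric_dy_dx(x1, x2, h=1e-6):
    # Central differences on the composite y(u(x1, x2)).
    d1 = (y(*u(x1 + h, x2)) - y(*u(x1 - h, x2))) / (2 * h)
    d2 = (y(*u(x1, x2 + h)) - y(*u(x1, x2 - h))) / (2 * h)
    return d1, d2


print(chain_rule_dy_dx(2.0, 3.0))
print(numeric_dy_dx(2.0, 3.0))
```

Expanding y = x_1^2 * x_2 + x_1 * x_2^2 by hand gives dy/dx_1 = 21 and dy/dx_2 = 16 at (2, 3), which both functions should reproduce up to numerical error.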