Take for example $f(x,y) = x^y$. I defined the total derivative to be the best linear approximation of $f$. Without working out the Jacobian I found that $$Df(x,y)(h_1,h_2) = h_1 y x^{y-1} + h_2 x^y \log(x)$$

However, the Jacobian gives me a $1 \times 2$ matrix: $$\begin{bmatrix} yx^{y-1} & x^y \log(x) \end{bmatrix}$$

I don't really understand why computing the total derivative explicitly gives me a real number while computing the Jacobian gives me a matrix. How are they related? I do know they somehow are.

The Jacobian is just the matrix of a linear function. Apply it to the vector $\begin{bmatrix} h_1 \\ h_2 \end{bmatrix}$ and you get back the value of the total derivative at $(h_1, h_2)$, which is the real number you computed.
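Written out for the $f(x,y) = x^y$ from the question, the matrix product is

$$
\begin{bmatrix} yx^{y-1} & x^y \log(x) \end{bmatrix} \begin{bmatrix} h_1 \\ h_2 \end{bmatrix} = h_1 y x^{y-1} + h_2 x^y \log(x) = Df(x,y)(h_1,h_2)
$$

So the matrix represents the whole linear map, and the real number is its value at one particular $(h_1, h_2)$.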

The total derivative gives the best linear approximation of $f$ near a point; the graph of that approximation is an affine plane over $x,y$ space.

$$
f(x+h_1,y+h_2) = f(x, y) + \mbox{grad } f \cdot (h_1, h_2)^T + O(h^2)
$$

Compare this with the linear map $f'(x,y) : \mathbb{R}^2 \to \mathbb{R}$ in

$$
\lim_{(h_1, h_2) \to 0}
\frac{\lVert f(x+h_1, y+h_2) - f(x,y) - f'(x,y) (h_1, h_2)^T \rVert}{\lVert (h_1, h_2)^T \rVert} = 0
$$

You computed

$$
\Delta f \approx \mbox{grad } f \cdot (h_1, h_2)^T
$$

while the total derivative is

$$
f' = (\mbox{grad } f)^T
$$

in this case.
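If you want to convince yourself numerically, here is a minimal Python sketch. The function names `jacobian`, `total_derivative` and `relative_error`, the sample point $(2, 3)$ and the direction $(t, t)$ are my own choices for illustration, and it assumes $x > 0$ so that $\log(x)$ is defined. It evaluates the quotient from the limit definition above for shrinking $h$, which should tend to $0$.

```python
import math


def f(x, y):
    """The example function f(x, y) = x**y (assumes x > 0)."""
    return x ** y


def jacobian(x, y):
    """The 1x2 Jacobian of f, returned as a tuple of its two entries."""
    return (y * x ** (y - 1), x ** y * math.log(x))


def total_derivative(x, y, h1, h2):
    """Apply the Jacobian at (x, y) to the vector (h1, h2): a real number."""
    j1, j2 = jacobian(x, y)
    return j1 * h1 + j2 * h2


def relative_error(x, y, h1, h2):
    """The quotient inside the limit that defines the total derivative."""
    remainder = f(x + h1, y + h2) - f(x, y) - total_derivative(x, y, h1, h2)
    return abs(remainder) / math.hypot(h1, h2)


if __name__ == "__main__":
    x, y = 2.0, 3.0  # arbitrary sample point, chosen only for illustration
    for k in range(1, 6):
        t = 10.0 ** (-k)
        # Shrink h = (t, t) toward zero and watch the quotient shrink too.
        print(t, relative_error(x, y, t, t))
```

The Jacobian is the fixed pair of numbers returned by `jacobian`, while `total_derivative` is what you get after feeding it a specific $(h_1, h_2)$.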