Questions
1. Explaining the derivation of the hazard function
A bit of context
The hazard function is often found stated in brevity as:
$$h(t)=\frac{f(t)}{S(t)}$$
where $f(\cdot)$ is the probability density function, and $S(\cdot)$ is the survival function. Throughout this question I will be referring to the descriptions given by Rodríguez and Tian.
Traditionally, the survival and hazard functions come into play when the random variable $T$ is nonnegative and continuous. In this sense, the concept of the survival function at least is remarkably straightforward: it is the probability that $T$ is greater than $t$.
$$S(t)= 1-F(t) = P(T>t)$$
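The definition $S(t) = P(T > t)$ also suggests how one could estimate it from data: take the fraction of observed times that exceed $t$. Below is a minimal sketch under an assumption of simulated exponential lifetimes. The rate 0.5 and all function names are hypothetical choices for illustration. Because the lifetimes are simulated, the estimate can be compared with the exact value $1 - F(t) = e^{-0.5t}$.

```python
import math
import random

RATE = 0.5  # hypothetical exponential rate, chosen only for illustration


def exp_cdf(t: float) -> float:
    """F(t) = P(T <= t) for an exponential lifetime with rate RATE."""
    return 1.0 - math.exp(-RATE * t)


def survival(t: float) -> float:
    """S(t) = 1 - F(t) = P(T > t)."""
    return 1.0 - exp_cdf(t)


def empirical_survival(samples: list[float], t: float) -> float:
    """Fraction of observed lifetimes strictly greater than t (an estimate of S(t))."""
    return sum(1 for s in samples if s > t) / len(samples)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is reproducible
    lifetimes = [rng.expovariate(RATE) for _ in range(100_000)]
    for t in (1.0, 2.0, 4.0):
        print(f"t={t}: empirical={empirical_survival(lifetimes, t):.4f}, exact={survival(t):.4f}")
```

With 100,000 draws the empirical and exact columns should agree to about two decimal places.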
This is especially intuitive when put in context, e.g. the time following diagnosis of a disease until death.
From the definition of the hazard function above, it is clear that it is not a probability, as it can take values greater than one.
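A simple case where the hazard exceeds one is the exponential distribution with rate $\lambda$. There $f(t)=\lambda e^{-\lambda t}$ and $S(t)=e^{-\lambda t}$, so $h(t)=\lambda$ for every $t$. The sketch below checks this numerically. The rate of 3 is an arbitrary, hypothetical choice, picked to be greater than one on purpose.

```python
import math


def exp_pdf(t: float, rate: float) -> float:
    """Density f(t) of an exponential random variable with the given rate."""
    return rate * math.exp(-rate * t)


def exp_survival(t: float, rate: float) -> float:
    """S(t) = P(T > t) = exp(-rate * t) for the exponential distribution."""
    return math.exp(-rate * t)


def hazard(t: float, rate: float) -> float:
    """h(t) = f(t) / S(t); for the exponential this is the constant `rate`."""
    return exp_pdf(t, rate) / exp_survival(t, rate)


if __name__ == "__main__":
    rate = 3.0  # events per unit time; deliberately larger than 1
    for t in (0.0, 0.5, 2.0):
        print(t, hazard(t, rate))
```

Every printed hazard is (up to rounding) 3, so $h(t)$ is clearly not bounded by one.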
My confusion comes in at Rodríguez’s definition:
$$ h(t) = \lim\limits_{dt\rightarrow0}\frac{P(t\leq T<t+dt \mid T\geq t)}{dt}$$
To me, this really only reads in a manner that makes sense in context: e.g. the numerator is the probability that the diagnosed person dies in some increment of time ($dt$) following some passage of time $t$, given that they have lived at least as long as $t$ (or, more simply, if it has been $t$ time since diagnosis, the probability that you’ll die within the next $dt$ time).
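This reading can also be checked numerically. Compute the conditional probability for a finite $dt$, divide by $dt$, and watch the ratio settle on $h(t)$ as $dt$ shrinks. The sketch below assumes a Weibull lifetime with shape 2 and scale 1, which is a hypothetical choice. For that distribution $S(t)=e^{-t^2}$ and the exact hazard is $h(t)=2t$.

```python
import math

SHAPE, SCALE = 2.0, 1.0  # hypothetical Weibull parameters chosen for illustration


def survival(t: float) -> float:
    """S(t) = exp(-(t / SCALE) ** SHAPE) for the Weibull distribution."""
    return math.exp(-((t / SCALE) ** SHAPE))


def hazard_exact(t: float) -> float:
    """Closed-form Weibull hazard (SHAPE / SCALE) * (t / SCALE) ** (SHAPE - 1)."""
    return (SHAPE / SCALE) * (t / SCALE) ** (SHAPE - 1)


def hazard_approx(t: float, dt: float) -> float:
    """P(t <= T < t + dt | T >= t) / dt for a finite dt."""
    # For continuous T: P(T >= t) = S(t) and P(t <= T < t + dt) = S(t) - S(t + dt).
    conditional = (survival(t) - survival(t + dt)) / survival(t)
    return conditional / dt


if __name__ == "__main__":
    t = 1.5  # exact hazard here is 2 * 1.5 = 3
    for dt in (1.0, 0.1, 0.01, 0.001):
        print(dt, hazard_approx(t, dt), hazard_exact(t))
```

For large $dt$ the ratio is noticeably off, but it approaches 3 as $dt$ gets small, which is exactly what the limit in the definition expresses.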
Confusion starts here:
Prior to the definition of equation (7.3) he states:
“The conditional probability in the numerator may be written as the ratio of the joint probability that $T$ is in the interval $[t,t+dt)$ and $T\geq t$ (which is, of course, the same as the probability that $t$ is in the interval), to the probability of the condition $T\geq t$. The former may be written as $f(t)dt$ for small $dt$, while the latter is $S(t)$ by definition”
My confusion comes from the following:
In my experience, joint distributions come from two random variables, not one as is the case here ($T$).
If I just accept that this can be the case, I can then use a rule from conditional probability, $$P(A\cap B)=P(A\mid B)P(B),$$ to restructure the numerator:
$$P(t \leq T < t+dt \mid T \geq t) = \frac{P(t \leq T < t+dt \cap T\geq t)}{P(T\geq t)}$$
then substitute back in to get:
$$h(t) = \lim\limits_{dt\rightarrow0} \frac{P(t \leq T < t+dt \cap T\geq t)}{P(T\geq t)\,dt}$$

It is stated as a matter of fact that $P(t \leq T < t+dt \cap T\geq t)$ may be written as $f(t)\,dt$ for small $dt$. How?

What does passing to the limit mean?

The claim is made that $h(t) = -\frac{d}{dt}\log S(t)$; while possibly trivial, I would appreciate seeing this calculation.

What are the units of the hazard function (other than a vaguely defined likelihood)?
2. Time-independent random variables for hazard functions?
Since the hazard function is often used in a time-dependent manner, can one use it for a time-independent continuous random variable?

You are correct that most usage of the word “joint” comes from the joint distribution of multiple random variables. Here the author evidently uses “joint probability” to describe the probability of the intersection of events. I do see this usage on the web and in other texts, but I am not sure whether it is very frequent.

By definition
$$ f_T(t) = \frac{d}{dt} F_T(t)
= \lim_{\Delta t \to 0} \frac{F_T(t+\Delta t) - F_T(t)}{\Delta t}
= \lim_{\Delta t \to 0} \frac{\Pr\{t < T \leq t + \Delta t\}}{\Delta t}$$
Therefore you can claim that $\Pr\{t < T \leq t + \Delta t\} \approx f_T(t)\,\Delta t$ when $\Delta t$ is small. Also note
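The approximation $\Pr\{t < T \leq t + \Delta t\} \approx f_T(t)\,\Delta t$ can be seen numerically by comparing the exact interval probability $F_T(t+\Delta t) - F_T(t)$ with $f_T(t)\,\Delta t$. The sketch below assumes a hypothetical exponential distribution with rate 2. The ratio of the two quantities should tend to 1 as $\Delta t$ shrinks.

```python
import math

RATE = 2.0  # hypothetical exponential rate, for illustration only


def cdf(t: float) -> float:
    """F_T(t) = 1 - exp(-RATE * t)."""
    return 1.0 - math.exp(-RATE * t)


def pdf(t: float) -> float:
    """f_T(t) = RATE * exp(-RATE * t), the derivative of cdf."""
    return RATE * math.exp(-RATE * t)


def interval_prob(t: float, dt: float) -> float:
    """Pr{t < T <= t + dt} = F_T(t + dt) - F_T(t)."""
    return cdf(t + dt) - cdf(t)


if __name__ == "__main__":
    t = 0.7
    for dt in (0.1, 0.01, 0.001):
        exact = interval_prob(t, dt)
        approx = pdf(t) * dt
        print(f"dt={dt}: exact={exact:.6g}, f(t)*dt={approx:.6g}, ratio={exact / approx:.6f}")
```

For this distribution the ratio works out to $(1 - e^{-2\Delta t})/(2\Delta t)$, which is roughly $1 - \Delta t$ for small $\Delta t$, so the relative error of the approximation vanishes in the limit.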
$$\Pr\{t < T \leq t + \Delta t \cap T > t\} = \Pr\{t < T \leq t + \Delta t\}$$
as $\{t < T \leq t + \Delta t\}$ is a subset of $\{T > t\}$.

Passing to the limit means taking the limit (after some calculations); here, that means dividing by $dt$ and letting $dt \to 0$. If you are not sure about the concept, look up the definition of the limit of a sequence or of a function.
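For reference, the whole argument can be written out with the limit taken in the last step. This assumes, as in the question, that $T$ is continuous with $S(t) > 0$. Under continuity, single points carry zero probability, which lets $P(T \geq t)$ be replaced by $S(t) = P(T > t)$ and $P(t \leq T < t+dt)$ by $F(t+dt) - F(t)$:

$$ h(t) = \lim_{dt\to 0}\frac{P(t\leq T<t+dt \mid T\geq t)}{dt}
= \lim_{dt\to 0}\frac{P(t\leq T<t+dt)}{P(T\geq t)\,dt}
= \frac{1}{S(t)}\lim_{dt\to 0}\frac{F(t+dt)-F(t)}{dt}
= \frac{f(t)}{S(t)}$$

Each equality has a specific justification. The second equality uses the fact that $\{t \leq T < t+dt\}$ is contained in $\{T \geq t\}$, so their intersection is just the interval event. The third pulls the constant $1/S(t)$ out of the limit. The last step is the definition of the density as the derivative of the CDF.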

It depends on your fundamental definition of $h(t)$. Taking $h(t) = f(t)/S(t)$ as the definition:
$$ h(t) = \frac{f(t)}{S(t)}
= \frac{1}{S(t)} \frac{d}{dt} F(t)
= \frac{1}{S(t)} \frac{d}{dt} [1 - S(t)]
= -\frac{1}{S(t)} \frac{d}{dt} S(t)
= -\frac{d}{dt} \ln S(t)$$
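The identity $f(t)/S(t) = -\frac{d}{dt}\ln S(t)$ can be sanity-checked numerically by approximating the derivative with a central finite difference. The sketch below assumes a hypothetical Weibull distribution with shape 1.5 and scale 2. The step size `eps` is also an arbitrary choice.

```python
import math

SHAPE, SCALE = 1.5, 2.0  # hypothetical Weibull parameters


def survival(t: float) -> float:
    """S(t) = exp(-(t / SCALE) ** SHAPE)."""
    return math.exp(-((t / SCALE) ** SHAPE))


def pdf(t: float) -> float:
    """Weibull density f(t) = (SHAPE / SCALE) * (t / SCALE) ** (SHAPE - 1) * S(t)."""
    return (SHAPE / SCALE) * (t / SCALE) ** (SHAPE - 1) * survival(t)


def hazard_ratio(t: float) -> float:
    """h(t) computed as f(t) / S(t)."""
    return pdf(t) / survival(t)


def hazard_log_derivative(t: float, eps: float = 1e-6) -> float:
    """h(t) computed as -d/dt ln S(t), using a central finite difference."""
    return -(math.log(survival(t + eps)) - math.log(survival(t - eps))) / (2 * eps)


if __name__ == "__main__":
    for t in (0.5, 1.0, 3.0):
        print(t, hazard_ratio(t), hazard_log_derivative(t))
```

The two columns should agree to many decimal places; the small residual difference comes from the finite-difference step, not from the identity.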

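From the definition, the hazard is not unitless: the survival function is just a (dimensionless) probability, but the pdf is the derivative of the CDF with respect to $t$, so it carries units of 1/time. Hence $h(t)$ is measured per unit time (e.g. per day or per year), which is why it is often called a hazard rate.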
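One quick way to see the units is to re-express the same lifetime in different time units. If $h(t)$ were dimensionless, switching from days to weeks would leave it unchanged. The sketch below assumes a hypothetical exponential lifetime with a hazard of 0.1 per day. It measures the same lifetime in weeks via $U = T/7$, so that $S_U(u) = S_T(7u)$ and, by change of variables, $f_U(u) = 7 f_T(7u)$.

```python
import math

RATE_PER_DAY = 0.1  # hypothetical constant hazard, in 1/day
DAYS_PER_WEEK = 7.0


def survival_days(t_days: float) -> float:
    """S(t) = P(T > t) with t measured in days."""
    return math.exp(-RATE_PER_DAY * t_days)


def pdf_days(t_days: float) -> float:
    """Density of T with t in days; its units are 1/day."""
    return RATE_PER_DAY * math.exp(-RATE_PER_DAY * t_days)


def hazard_days(t_days: float) -> float:
    """h(t) = f(t) / S(t), in units of 1/day."""
    return pdf_days(t_days) / survival_days(t_days)


def hazard_weeks(t_weeks: float) -> float:
    """Hazard of the same lifetime measured in weeks, U = T / 7.

    S_U(u) = S_T(7u) and f_U(u) = 7 * f_T(7u), so the hazard is
    multiplied by 7: units of 1/week instead of 1/day.
    """
    t_days = DAYS_PER_WEEK * t_weeks
    return DAYS_PER_WEEK * pdf_days(t_days) / survival_days(t_days)


if __name__ == "__main__":
    print(hazard_days(14.0))  # hazard at day 14, per day
    print(hazard_weeks(2.0))  # the same moment, per week
```

The same moment yields 0.1 per day but 0.7 per week, while the survival probability at that moment is identical in both units.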
Not sure about your last question. The hazard function is often used in time modelling in survival analysis, but inherently nothing prohibits using it elsewhere.
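Nothing in $h = f/S$ requires the variable to be a time; the formula only needs a density and a positive survival function. As an illustration, the example below uses the standard normal (my own choice, not from the question). Note that this variable is not even nonnegative, so it sits outside the usual survival-analysis setting. Its "hazard" $\varphi(x)/(1-\Phi(x))$ is the quantity known as the inverse Mills ratio. The function names are hypothetical.

```python
import math


def normal_pdf(x: float) -> float:
    """Standard normal density phi(x)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)


def normal_survival(x: float) -> float:
    """P(X > x) for a standard normal, via the complementary error function."""
    return 0.5 * math.erfc(x / math.sqrt(2))


def normal_hazard(x: float) -> float:
    """Hazard f(x) / S(x) of the standard normal (the inverse Mills ratio)."""
    return normal_pdf(x) / normal_survival(x)


if __name__ == "__main__":
    for x in (-2.0, 0.0, 2.0, 5.0):
        print(x, normal_hazard(x))
```

At $x=0$ this gives $\sqrt{2/\pi} \approx 0.798$, and for large positive $x$ it grows roughly like $x$, so here too the "hazard" is not bounded by one.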