Gradient MC and Semi-gradient n-step TD directly parallel their tabular counterparts. However, function approximation also allows us to introduce something new: TD($\lambda$). This method is a little more subtle than those we have already met, and so requires some extra explanation.
MC and n-step TD methods take a forward view: the values of states are updated by looking ahead to the values of future states. This works well, but it means that the target is not actually available until $n$ steps into the future, so the updates are not evenly distributed in time: our first update is delayed by $n$ steps from the start of an episode, and this leaves us $n$ updates to catch up on after the episode has finished. Additionally, in order to perform each update we must maintain a store of the last $n$ feature vectors (and rewards), which comes with a computational cost.
TD($\lambda$) converts these forward view methods into backward-view versions. The mechanism for this is a short-term memory vector, the eligibility trace $\mathbf{z}_t \in \mathbb{R}^d$, that parallels the long-term weight vector $\mathbf{w} \in \mathbb{R}^d$, keeping track of which components of $\mathbf{w}$ have contributed to recent state valuations.
$$\begin{align*} &\mathbf{z}_{-1} = \mathbf{0} \\
&\mathbf{z}_t = \gamma \lambda \mathbf{z}_{t-1} + \nabla \hat{v}(S_t, \mathbf{w}_t), \quad 0 \leq t \leq T,\end{align*}$$
where $\gamma$ is the usual discount rate, $\lambda \in [0, 1]$, and the product of these two quantities defines the concept of 'recent' in the description above.
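For example, in the common linear case where $\hat{v}(s, \mathbf{w}) = \mathbf{w}^\top \mathbf{x}(s)$ for a feature vector $\mathbf{x}(s)$, the gradient is just the feature vector itself and the trace update becomes
$$\mathbf{z}_t = \gamma \lambda \mathbf{z}_{t-1} + \mathbf{x}(S_t),$$
so the trace is literally a decaying sum of the features of recently visited states.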
The trace indicates which components of $\mathbf{w}$ deserve the most credit for the error at the current state, where error is defined by the moment-by-moment one-step TD error, $\delta_t = G_{t:t+1} - \hat{v}(S_t, \mathbf{w}_t)$. Components of $\mathbf{w}$ that have contributed most recently or most frequently to preceding state valuations are assigned the most credit, and are said to be the most 'eligible' for an update. Concretely, each component of $\mathbf{w}$ is updated in proportion to the scalar TD error and its corresponding component of the eligibility trace vector.
$$\mathbf{w}_{t+1} = \mathbf{w}_t + \alpha \delta_t \mathbf{z}_t. $$
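To make the update concrete, here is a minimal sketch of semi-gradient TD($\lambda$) prediction with a linear value function and accumulating traces. The environment interface (`env.reset()`, `env.step()`, `env.action_space.sample()`) and the `feature_fn` helper are assumptions made for this illustration, not part of the discussion above.

```python
import numpy as np

def semi_gradient_td_lambda(env, feature_fn, n_features, num_episodes,
                            alpha=0.01, gamma=0.99, lam=0.9):
    """Semi-gradient TD(lambda) policy evaluation with a linear value function
    and accumulating eligibility traces.

    Assumed interfaces for this sketch:
    env.reset() -> state, env.step(action) -> (next_state, reward, done),
    env.action_space.sample() -> action,
    feature_fn(state) -> np.ndarray of shape (n_features,)."""
    w = np.zeros(n_features)                      # long-term weight vector
    for _ in range(num_episodes):
        state = env.reset()
        z = np.zeros(n_features)                  # eligibility trace, z_{-1} = 0
        done = False
        while not done:
            action = env.action_space.sample()    # evaluate a fixed (random) policy
            next_state, reward, done = env.step(action)

            x = feature_fn(state)                 # gradient of v_hat is x for linear v_hat
            z = gamma * lam * z + x               # accumulate the trace
            v_next = 0.0 if done else w @ feature_fn(next_state)
            delta = reward + gamma * v_next - w @ x   # one-step TD error
            w = w + alpha * delta * z             # credit the eligible components
            state = next_state
    return w
```

Note how the single scalar $\delta_t$ is broadcast back along the trace, which records which components contributed to recent state valuations.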
In forward view n-step TD we can imagine the agent moving through state space with a telescope, looking ahead to future rewards and states in order to evaluate the currently visited state. In the backward view TD($\lambda$), we can imagine the same agent moving through state space with a blindfold, computing the TD error at each state visited, and shouting this back to previously visited states.
When $\lambda = 0$ the trace is exactly the value gradient at $S_t$, and the update for $\mathbf{w}$ reduces to the one-step semi-gradient TD update. When $\lambda = 1$ the trace decays only according to $\gamma$ and, although this is less obvious, the update reduces to the gradient MC update. Intermediate values of $\lambda$ represent intermediate levels of bootstrapping between these two extremes, just as intermediate values of $n$ do in the n-step TD algorithm.
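To spell out the $\lambda = 0$ case: the trace retains only the current gradient, so substituting $\delta_t$ into the weight update recovers the familiar one-step semi-gradient TD(0) rule,
$$\mathbf{z}_t = \nabla \hat{v}(S_t, \mathbf{w}_t) \quad \Longrightarrow \quad \mathbf{w}_{t+1} = \mathbf{w}_t + \alpha \big[ R_{t+1} + \gamma \hat{v}(S_{t+1}, \mathbf{w}_t) - \hat{v}(S_t, \mathbf{w}_t) \big] \nabla \hat{v}(S_t, \mathbf{w}_t).$$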
In fact it can be shown that TD($\lambda$) corresponds to n-step TD with the n-step return $G_{t:t+n}$ replaced by a compound return: a weighted average of $\textit{all}$ the n-step returns, in which the n-step return is weighted in proportion to $\lambda^{n-1}$ (the same $\lambda$ as above) and normalised by a factor of $1-\lambda$ to ensure that the weights sum to $1$. (The correspondence is exact when the updates are accumulated and applied offline at the end of each episode; the online updates above approximate it.) This compound return is called the $\lambda$-return, $G_t^\lambda$.
$$ G_t^\lambda = (1 - \lambda) \sum_{n=1}^\infty \lambda^{n-1}G_{t:t+n}. $$
This correspondence is easiest to see for the cases $\lambda = 0$ and $\lambda = 1$. In the former case only the $n = 1$ term has non-zero weight, so $G_t^\lambda$ reduces to $G_{t:t+1}$, the one-step TD return. In the latter case, noting that in an episodic task every n-step return with $t + n \geq T$ is just the full return $G_t$, all of the weight falls on $G_t$, the complete MC return. Both are consistent with the behaviour of TD($\lambda$) described above.
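As a small illustration, the following sketch computes the $\lambda$-return for a single state given its n-step returns. The list `n_step_returns` and its layout are assumptions for this example: it holds $G_{t:t+1}, G_{t:t+2}, \dots$, with the final entry being the full MC return $G_t$.

```python
import numpy as np

def lambda_return(n_step_returns, lam):
    """Compute the lambda-return for one state from its n-step returns.

    n_step_returns holds G_{t:t+1}, G_{t:t+2}, ..., with the final
    entry being the complete Monte Carlo return G_t (an assumed layout)."""
    returns = np.asarray(n_step_returns, dtype=float)
    horizon = len(returns)                            # steps until termination
    weights = (1 - lam) * lam ** np.arange(horizon)   # (1 - lambda) * lambda^(n-1)
    weights[-1] = lam ** (horizon - 1)                # weight beyond termination collapses onto G_t
    return float(weights @ returns)
```

With `lam = 0` all of the weight sits on the first entry (the one-step return), and with `lam = 1` it all sits on the last (the MC return), matching the two special cases above.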
The advantages of eligibility traces over n-step methods are clear. Updates are performed continually and uniformly in time rather than being delayed $n$ steps and then caught up at the end of the episode (note that to use the $\lambda$-return directly we would have to wait all the way until the end of the episode to make the first update, since only then is the longest of its component returns available). This means that learning can occur, and affect behaviour, immediately after a state is encountered rather than being delayed by $n$ steps. Additionally, only a single trace vector needs to be stored rather than the last $n$ feature vectors.
The formulation we have used for the eligibility trace is called the accumulating trace: each time a state is visited the corresponding components of $\mathbf{z}$ are bumped up, decaying away between visits. Another kind of trace is the replacing trace, which resets those components to $1$ on each visit rather than growing them further (i.e. the frequency with which components of $\mathbf{w}$ have contributed to recent state valuations is dropped as a measure of eligibility).
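A sketch of the difference, under the assumption of binary (0/1) feature vectors so that 'the components corresponding to the visited state' are simply the active features:

```python
def update_trace(z, x, gamma, lam, replacing=False):
    """Decay the trace, then credit the features of the current state.

    x is assumed to be a binary (0/1) feature vector for this sketch."""
    z = gamma * lam * z
    if replacing:
        z[x > 0] = 1.0        # replacing trace: reset active components to 1
    else:
        z = z + x             # accumulating trace: bump active components up
    return z
```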
Of course the concept of eligibility traces works for action-value function approximation methods too. As usual all that is required is to swap $\hat{v}$s for $\hat{q}$s in the discussion above. The resulting methods are called Sarsa($\lambda$) methods.
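For completeness, a sketch of the corresponding inner-loop update for a linear Sarsa($\lambda$), where `feature_fn(state, action)` is an assumed helper returning state-action features and `w`, `z`, `alpha`, `gamma`, `lam` are as in the state-value sketch above:

```python
# Inner-loop update for a linear Sarsa(lambda) with accumulating traces.
x = feature_fn(state, action)                         # state-action features (assumed helper)
z = gamma * lam * z + x                               # trace over state-action features
q_next = 0.0 if done else w @ feature_fn(next_state, next_action)
delta = reward + gamma * q_next - (w @ x)             # one-step Sarsa TD error
w = w + alpha * delta * z
```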