Everyday computational science

Project maintained by adolgert Hosted on GitHub Pages — Theme by mattgraham

02 Sep 2020 - adolgert

Correction for how to calculate the mean age of death.

For demography, actuarial mathematics, or global health, we often count life events that happen to people within some time span. Then we analyze the rate at which those events happen. The mean age is the average time at which events happen, given that they happen within an interval, and weighted by the number of individuals remaining.

The standard calculation for the demographic calculation of the mean age, for constant mortality, isn’t numerically stable, and this is a note on how to fix it. Rarely does the final result of work rely on an assumption that mortality is constant across an age interval, unless it chooses small age intervals, but the calculation of mean age is often used to bootstrap iterative methods or provide a bound on results.

When I say mortality rate, I mean hazard rate for events. Sometimes that’s a mortality rate. Sometimes that’s a hazard rate for giving birth.

The mean age for an interval of time that starts at time \(x\) and ends at time \(x+n\), for hazard rate \(\lambda(t)\), is the ratio of the expected value of the time over a normalization by population.

\[\begin{equation} {}_na_x = \frac{\int_x^{x+n}(a-x)\lambda(a)e^{-\int_x^a\lambda(s)ds}}{\int_x^{x+n}\lambda(a)e^{-\int_x^a\lambda(s)ds}} \end{equation}\]This mean age is the value within the time interval. When we assume that hazard is constant, \(\lambda(t)=m_x\), this simplifies considerably to

\[\begin{equation} {}_na_x = \frac{1}{m_x}- \frac{n e^{-m_x n}}{1-e^{-m_x n}} \end{equation}\]The problem is that \(m_x\) can be near or at zero for some intervals. The mean age, as \(m_x\) approaches zero, does converge to \(n / 2\) mathematically, but floating-point calculation shows lots of crazy values as \(m_x\rightarrow 0\).

The problem is worse when we evaluate draws from a distribution of events, because outliers can often be zero. I see this even for smaller national populations where mortality rates can be low for 10–15 year-olds. In particular, there is an iterative method, called the graduation method, that fails some percentage of the time (about one percent for national-level data) that calculates mortality from mortality rate using this mean age equation.

Given that the limit of the mean age, as \(m_x\rightarrow 0\), converges to \(n/2\), it should be possible to find a computation that doesn’t diverge.

\[\begin{equation} {}_na_x = \frac{n}{2}\left(1 - \coth(m_x n)) + \frac{1}{m_x n}\right) \end{equation}\]Writing the equation this way shows that we need to remove the infinity from the hyperbolic cotangent. To see this, look at the series approximation to the hyperbolic cotangent.

\[\begin{equation} \coth x = \frac{1}{x} + \frac{x}{3} - \frac{x^3}{45} + \frac{2x^5}{945}+\cdots \end{equation}\]This removal even has a name, the Langevin function.

\[\begin{equation} L(x) = \coth(x) - \frac{1}{x} \end{equation}\]We can write the mean age of death for constant mortality using the Langevin function.

\[\begin{equation} {}_na_x = \frac{n}{2}\left(1 - L(m_x n)\right) \end{equation}\]If your math library doesn’t implement the Langevin
function, then *An Atlas of Functions* has a nice
approximation, in the form of a continued fraction.

That approximation is really good, but it doesn’t hold as \(x\) approaches one, so we would use a uniform interpolation between that and the full value. Call the continued fraction \(\tilde{L}(x)\). Between a lower value \(a=1/1000\) and an upper value \(b=1/100\), interpolate.

\[\begin{eqnarray} w_1 & = & (x - a) / (b - a) \\ w_2 & = & (b - x) / (b - a) \\ L(x) & =& w_1 \tilde{L}(x) + w_2 (\coth(x) - \frac{1}{x}) \end{eqnarray}\]With the Langevin function in hand, you can throw any mortality rate at the mean age equation.

Spanier, Jerome, and Keith B. Oldham. An Atlas of Functions. New York: Hemisphere publishing corporation, 1987.