Approaches and Methods for Causal Analysis of Panel Data in the Area of Morbidity and Mortality

We aim to give an overview of the state of the art of causal analysis of demographic issues related to morbidity and mortality. We systematically introduce strategies to identify causal mechanisms, which are inherently linked to panel data from observational surveys and population registers. We focus on health and mortality, and on the issues of unobserved heterogeneity and reverse causation between health and (1) retirement, (2) socio-economic status, and (3) characteristics of partnership and fertility history. The boundaries between demographic research on mortality and morbidity and the neighbouring disciplines of epidemiology, public health and economics are often blurred. We highlight the specific contribution of demography by reviewing methods used in the demographic literature. We classify these methods according to important criteria, such as a design-based versus model-based approach and control for unobserved confounders. We present examples from the literature for each of the methods and discuss their assumptions, advantages and disadvantages for the identification of causal effects in demographic morbidity and mortality research. The distinction between methods that control for unobserved confounders and those that do not reveals a fundamental difference between (1) methods that try to emulate a randomised experiment and have higher internal validity and (2) methods that attempt to achieve conditional independence by including all relevant factors in the model. The latter usually have higher external validity but require more assumptions and prior knowledge of relevant factors and their relationships. It is impossible to define in general which sort of validity is more important, as there is always a trade-off between generalising the results to the population of interest and avoiding biases in the estimation of causal effects in the sample.
We hope that our review will aid researchers in identifying strategies to answer their specific research questions.


Introduction
Researchers in population studies have long dealt with descriptive analyses (Smith 2009) and ever more refined methods of using aggregate data to develop life tables, estimate vital rates and describe population dynamics. While causality was not at the centre of demographic research for quite some time (Smith 2009), the demographic community exploring mortality and morbidity has since embraced a variety of causal study designs and methods. With the greater availability of panel studies and registers, demographers also began to ask causal questions (Engelhardt et al. 2009; Moffitt 2005) and took up questions and methods from disciplines with a long tradition in causal analysis, such as economics and epidemiology. However, there is no single approach to causal analysis, neither in demography nor in the other disciplines. As Moffitt (2005: 92) pointed out, conclusions about causality require "synthesis and reconciliation studies that are based on a variety of different approaches".
In this article we take up this argument and discuss demographic studies with causal approaches in the field of mortality and morbidity. First, we briefly describe the most common methods and designs focused on causal relationships in the social sciences. We use the general term "treatment", which may reflect a binary treatment, such as retirement or divorce, but may also be a continuous treatment, such as socio-economic status (SES) or number of children. Then we use the methods described as search terms in a structured literature review of the leading demographic journal "Demography". The aim of this review is to explore how prominently causal research features in a journal that is generally assumed to represent the core research of the discipline, and which methods and data were used to control for unobserved heterogeneity or selection. We limited our search to journals that explicitly have "Demography" or "Population Studies" in their title, but we must bear in mind that demographers also frequently publish in epidemiological journals, as the boundaries between demographic research on health and mortality and neighbouring disciplines are often blurred. While epidemiological journals may be preferred because of higher impact factors and faster peer review, it may also be that demographers consider their research more appropriate for an epidemiological than for a demographic audience when the issue is causality, and vice versa when methodological issues are involved. Finally, we give an overview of causal approaches in three important research topics on morbidity and mortality by describing studies about the dual relationship between health outcomes and (1) retirement, (2) SES, and (3) characteristics of partnership and fertility history.
Reflecting on our choice of the three research areas, it is worth mentioning that we preferred non-recursive causal research questions that deal with the direction of causality, i.e. the dual relation between health/mortality and some other factor, where health selection plays a major role in addition to unobserved heterogeneity. We consider these to be very clear and very difficult causal problems, ideal for illustrating the use of different causal methods. At the same time, there are many demographic questions of recursive causality that are of no less importance, e.g. the causal effect of a policy on health, where reverse causality is not an issue. In principle, the same fundamental problems apply to both types of causality. In recursive causal questions, however, the alternative to direct causality is indirect causality or no causality, while in non-recursive questions the focus tends to be on the direction of causality, although the alternatives "indirect causality" and "no causality" exist here as well. We close with a discussion of our findings.

Causal methods in the social sciences

Fixed effects (FE) and random effects (RE) models
Both models assume individual-specific effects and try to control for omitted variable bias due to unobserved heterogeneity. In this definition, "individual" can be (1) an individual or (2) a population segment, such as a social group; it may also be (3) kinship units, such as mothers, grandmothers or twins, or (4) an aggregate group measure, e.g. in an area or cohort analysis (Moffitt 2005, 2009). The random effects assumption is that the individual-specific unobserved effects are uncorrelated with the independent variables. The fixed effects model does not need this assumption. Therefore, fixed effects estimates are consistent even if the random effects assumption does not hold, whereas random effects estimates will be biased in this situation. However, if the random effects assumption holds, both models provide consistent estimates (provided the "strict exogeneity assumption" holds), but the random effects estimator is more efficient (Brüderl/Ludwig 2015; Wooldridge 2002).
In the individual-level FE model, the time-invariant heterogeneity is removed by first-differencing or by subtracting each individual's mean over time; in area- or group-level FE models, the group average over time is subtracted. The estimator based on the demeaned data is also called the within estimator. Because it uses only variation within units, it provides the average effect in the subgroup of the population whose treatment status changes during the observation period. This often raises the criticism of extrapolating from this subgroup to the total population (Cameron/Trivedi 2005; Nerlove 2005).
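To make the within transformation concrete, the following is a minimal sketch in plain Python (standard library only). The data-generating process is hypothetical: an unobserved, time-constant trait `alpha` raises both the probability of treatment and the outcome, so a pooled regression is confounded, while subtracting each individual's mean removes `alpha`. All function names are illustrative and not taken from any package.

```python
import random


def simulate_panel(n_individuals=2000, n_waves=5, beta=-0.5, seed=1):
    """Hypothetical panel: a time-constant unobserved trait (alpha) raises both
    the chance of treatment and the outcome, so pooled OLS is confounded."""
    rng = random.Random(seed)
    rows = []  # (individual id, treatment x, outcome y)
    for i in range(n_individuals):
        alpha = rng.gauss(0, 1)
        p_treat = 0.5 if alpha > 0 else 0.3
        for _ in range(n_waves):
            x = 1.0 if rng.random() < p_treat else 0.0
            y = beta * x + 2.0 * alpha + rng.gauss(0, 1)
            rows.append((i, x, y))
    return rows


def ols_slope(xs, ys):
    """Slope of a simple OLS regression of ys on xs (with intercept)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def within_estimator(rows):
    """FE (within) estimator: subtract each individual's mean over time from x
    and y, which removes alpha, then regress demeaned y on demeaned x."""
    totals = {}
    for i, x, y in rows:
        sx, sy, k = totals.get(i, (0.0, 0.0, 0))
        totals[i] = (sx + x, sy + y, k + 1)
    x_dm = [x - totals[i][0] / totals[i][2] for i, x, _ in rows]
    y_dm = [y - totals[i][1] / totals[i][2] for i, _, y in rows]
    return ols_slope(x_dm, y_dm)


rows = simulate_panel()
pooled = ols_slope([x for _, x, _ in rows], [y for _, _, y in rows])
fe = within_estimator(rows)
print(f"pooled OLS: {pooled:.2f}  within (FE): {fe:.2f}  true effect: -0.50")
```

With this setup the pooled slope is pulled far above the assumed effect of -0.5, while the within estimate lands close to it. The sketch omits standard errors, which in practice would usually be clustered by individual.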
It is common to estimate both FE and RE specifications and to test beforehand whether the RE assumption is fulfilled, e.g. with a Hausman test (Baltagi 2008; Hsiao 2014). There is also the "between-within estimator" (also known as the Mundlak estimator), which combines the advantages of random and fixed effects estimators (Schunck 2013). As far as we know, it has not yet been used in demography.
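The between-within (Mundlak) idea can be sketched by adding each individual's mean of the treatment as an extra regressor in a pooled regression. In this hypothetical example, the coefficient on `x` reproduces the within (FE) estimate, and a clearly non-zero coefficient on the individual mean signals that the random effects assumption is violated. The small Gauss-Jordan solver is an illustrative stand-in for a statistics package, not part of the method itself.

```python
import random


def ols(X, y):
    """Multiple OLS: solve (X'X) b = X'y by Gauss-Jordan elimination."""
    k = len(X[0])
    aug = [[sum(r[a] * r[b] for r in X) for b in range(k)]
           + [sum(r[a] * yi for r, yi in zip(X, y))] for a in range(k)]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(k):
            if r != col:
                f = aug[r][col] / aug[col][col]
                aug[r] = [u - f * v for u, v in zip(aug[r], aug[col])]
    return [aug[r][k] / aug[r][r] for r in range(k)]


def mundlak(rows):
    """Pooled OLS of y on [1, x, individual mean of x].
    Returns (coefficient on x, coefficient on the individual mean)."""
    totals = {}
    for i, x, _ in rows:
        s, n = totals.get(i, (0.0, 0))
        totals[i] = (s + x, n + 1)
    X = [[1.0, x, totals[i][0] / totals[i][1]] for i, x, _ in rows]
    y = [yi for _, _, yi in rows]
    _, b_x, b_xbar = ols(X, y)
    return b_x, b_xbar


# Hypothetical panel: alpha is unobserved and correlated with the treatment x
rng = random.Random(2)
rows = []
for i in range(2000):
    alpha = rng.gauss(0, 1)
    for _ in range(5):
        x = 1.0 if rng.random() < (0.5 if alpha > 0 else 0.3) else 0.0
        rows.append((i, x, -0.5 * x + 2.0 * alpha + rng.gauss(0, 1)))

b_within, b_mean = mundlak(rows)
print(f"effect of x: {b_within:.2f}  coefficient on mean(x): {b_mean:.2f}")
```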

Instrumental variables
The instrumental variable approach addresses the problem of unobserved confounding due to selection; it is less concerned with the bias that individual unobservables introduce into the estimation process than with the problem that the treatment effect might be biased by non-random selection into the treatment group. An instrumental variable needs to be mean-independent of the unobservables directly influencing the outcome. In addition, it must be relevant, i.e. correlated with the probability of receiving the treatment (Moffitt 2009). These relationships may be specified in two equations: the first estimates the relationship between the instrumental variable and the treatment, and the second estimates the relationship between the predicted treatment (from the first equation) and the outcome. Substituting the first equation into the second yields the reduced form, which relates the outcome directly to the instrument. There are several types of instrumental variables, and Moffitt (2005, 2009) distinguishes between:
1. cross-sectional ecological variables, such as differences in policies, laws and social structure, which are independent of an individual's own choice;
2. population-segment fixed effects instruments, where the segments are defined by social or demographic groups and the instruments pertain to groups, such as welfare reforms for certain income or demographic groups (e.g. single mothers);
3. siblings and related instruments, where the instrument is the deviation of each individual's treatment from the average group-specific treatment, e.g. one sibling has children, the other does not;
4. natural experiments, which are often defined as a residual category of instruments that appear to be random, such as the month of birth or the birth of (naturally conceived) twins.
Combining the instrumental variable approach with FE panel analysis is considered the superior approach to simultaneously control for selection and for estimation bias due to individual unobserved heterogeneity (Moffitt 2009). Another prominent type of instrumental variable approach is Mendelian randomisation, in which genetic variants serve as instruments (Mills et al. 2020).
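The two equations can be illustrated with a hand-coded two-stage least squares (2SLS) sketch. The data are hypothetical: an unobserved confounder `u` (think of prior health) drives both treatment take-up and the outcome, and a binary instrument `z` (think of an eligibility rule) shifts take-up only. The naive regression is biased, whereas 2SLS, which with a single instrument equals the ratio of the reduced-form to the first-stage coefficient (the Wald estimator), recovers the assumed effect of 0.3.

```python
import random


def cov(a, b):
    """Sample covariance of two equally long sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((u - ma) * (v - mb) for u, v in zip(a, b)) / (len(a) - 1)


# Hypothetical data: u affects both treatment take-up and the outcome;
# z shifts take-up but has no direct effect on y (exclusion restriction).
rng = random.Random(3)
z, d, y = [], [], []
for _ in range(20000):
    u = rng.gauss(0, 1)
    zi = 1.0 if rng.random() < 0.5 else 0.0
    di = 1.0 if 1.5 * zi - 0.7 * u + rng.gauss(0, 1) > 0.4 else 0.0
    yi = 0.3 * di + u + rng.gauss(0, 1)          # true effect of d on y: 0.3
    z.append(zi)
    d.append(di)
    y.append(yi)

naive_ols = cov(d, y) / cov(d, d)

# First stage: regress the treatment on the instrument, keep fitted values
pi1 = cov(z, d) / cov(z, z)
pi0 = sum(d) / len(d) - pi1 * sum(z) / len(z)
d_hat = [pi0 + pi1 * zi for zi in z]

# Second stage: regress the outcome on the predicted treatment
tsls = cov(d_hat, y) / cov(d_hat, d_hat)

# With a single instrument this equals reduced form / first stage (Wald ratio)
wald = cov(z, y) / cov(z, d)
print(f"OLS: {naive_ols:.2f}  2SLS: {tsls:.2f}  Wald: {wald:.2f}")
```

Running the second stage by hand gives the correct point estimate but not correct standard errors; dedicated IV routines adjust them.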

Regression discontinuity
Regression discontinuity can be used when individuals or other units are assigned to a programme or policy depending on a cut-off point of a continuous measure, e.g. eligibility according to age (Ludwig/Miller 2007). While other sources of external variation are (unexpected) historical events, in regression discontinuity designs the relevant variation is created by a known institutional rule that assigns or withholds treatment, either deterministically ("sharp") or probabilistically ("fuzzy") (Gangl 2010); see Imbens and Lemieux (2008) for a review. The basic idea is that, provided the relationship between the assignment variable and the outcome is otherwise smooth at the cut-off, exposure at the cut-off point is as good as random. Thus, a comparison of the outcomes of those just below and just above the cut-off point provides an estimate of the effect of the programme or policy (Hu et al. 2017).
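A sharp regression discontinuity can be sketched as two local linear regressions, one on each side of the cut-off, compared at the cut-off itself. The age threshold of 65, the bandwidth of three years, the uniform kernel and the simulated effect of 0.4 are all hypothetical choices for illustration.

```python
import random


def line_fit(xs, ys):
    """Intercept and slope of a simple OLS regression of ys on xs."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return my - b * mx, b


def sharp_rd(running, outcome, cutoff, bandwidth):
    """Sharp RD estimate: gap between local linear fits on each side of the
    cut-off, evaluated at the cut-off (uniform kernel within the bandwidth)."""
    left = [(r - cutoff, y) for r, y in zip(running, outcome)
            if cutoff - bandwidth <= r < cutoff]
    right = [(r - cutoff, y) for r, y in zip(running, outcome)
             if cutoff <= r <= cutoff + bandwidth]
    a_left, _ = line_fit(*zip(*left))
    a_right, _ = line_fit(*zip(*right))
    return a_right - a_left


# Hypothetical data: a rule assigns treatment to everyone aged 65 or older,
# shifting a health score by +0.4 on top of a smooth age gradient.
rng = random.Random(4)
ages = [rng.uniform(55, 75) for _ in range(40000)]
health = [0.4 * (a >= 65) - 0.05 * (a - 65) + rng.gauss(0, 1) for a in ages]
effect = sharp_rd(ages, health, cutoff=65, bandwidth=3)
print(f"estimated jump at the cut-off: {effect:.2f}")
```

In applied work the bandwidth is usually chosen by a data-driven procedure and results are checked across bandwidths; in a fuzzy design, the jump in the outcome is divided by the jump in the probability of treatment at the cut-off.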

Difference-in-differences
Difference-in-differences is similar to regression discontinuity in exploiting external variation from an event or rule, but it adds a control group to a before-and-after comparison. This method compares the change in the outcome for an exposed group before and after an event (e.g. the implementation of a policy) with the change in the outcome over the same period for a non-exposed group (Athey/Imbens 2006; Jones/Rice 2011). The two groups may have different levels of the outcome before the policy; confounding by unobserved time-constant factors that differ between the exposed and unexposed is thus taken into account (Hu et al. 2017). Under the assumption that both groups would have followed a common trend in the absence of the event, the difference in the change in the outcome between the exposed and non-exposed groups can be interpreted as an effect of the event or policy (Harper et al. 2014).
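In the simplest two-group, two-period case, the difference-in-differences logic reduces to four group means. In this hypothetical simulation the exposed group starts at a higher level, both groups share a common trend of +0.3, and the policy effect is set to -0.25. A before-after comparison in the exposed group alone mixes the effect with the common trend, whereas the difference in differences removes both the level gap and the trend.

```python
import random
from statistics import mean

# Hypothetical two-group, two-period data: a time-constant level difference,
# a common trend of +0.3, and a policy effect of -0.25 for the exposed group.
rng = random.Random(5)
records = []  # (exposed, post, outcome)
for _ in range(8000):
    exposed = rng.random() < 0.5
    for post in (0, 1):
        y = (1.0 if exposed else 0.0) + 0.3 * post - 0.25 * post * exposed
        records.append((exposed, post, y + rng.gauss(0, 1)))


def cell_mean(exposed, post):
    """Mean outcome in one group-period cell."""
    return mean(y for e, p, y in records if e == exposed and p == post)


change_exposed = cell_mean(True, 1) - cell_mean(True, 0)
change_control = cell_mean(False, 1) - cell_mean(False, 0)
did = change_exposed - change_control
print(f"before-after (exposed only): {change_exposed:.2f}  DiD: {did:.2f}")
```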

Interrupted time series
Where time series data are available and there is a clear-cut event at a specific point in time, interrupted time series analysis can be used to estimate the effect of this event. The event can be a shock to individuals or a change in an institution, programme or policy. Regression analysis of the repeated measures (e.g. segmented linear regression or, for binary outcomes, repeated-measures logistic regression) can be used to detect a sudden change in the level of the outcome (in regression terms: a change of intercept) or a more sustained change in the trend of the outcome (a change of slope) around the time of the event. The analysis estimates the causal effect by comparing the outcomes before and after the event. Interrupted time series differs from difference-in-differences analysis in that it does not use a control group. When randomised controlled trials are not feasible, a time series design is an alternative for estimating the effect of events (Fretheim et al. 2013) because it controls for trends before the event and studies the dynamics of change afterwards (Burdorf 2012; Taljaard et al. 2014; Wagner et al. 2002). However, it is problematic in situations with lagged effects or with other events affecting the outcome at the same time.
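A segmented regression for an interrupted time series can be sketched as follows, assuming a continuous outcome observed at regular intervals. With time centred at the event, fitting separate lines before and after the event yields the change in level and the change in slope. The series, event month and effect sizes are invented for illustration, and the sketch ignores autocorrelation and seasonality, which a real analysis would typically need to address.

```python
import random


def line_fit(ts, ys):
    """Intercept and slope of a simple OLS regression of ys on ts."""
    mt, my = sum(ts) / len(ts), sum(ys) / len(ys)
    b = (sum((t - mt) * (y - my) for t, y in zip(ts, ys))
         / sum((t - mt) ** 2 for t in ts))
    return my - b * mt, b


def segmented_its(series, event_time):
    """Level and slope change at event_time. Fitting separate lines before and
    after the event (time centred at the event) is equivalent to the segmented
    model y = b0 + b1*t + b2*D + b3*(t - event_time)*D, D = 1 if t >= event_time."""
    pre = [(t - event_time, y) for t, y in enumerate(series) if t < event_time]
    post = [(t - event_time, y) for t, y in enumerate(series) if t >= event_time]
    a_pre, b_pre = line_fit(*zip(*pre))
    a_post, b_post = line_fit(*zip(*post))
    return a_post - a_pre, b_post - b_pre   # (b2, b3)


# Hypothetical monthly series: upward trend, then at month 60 a drop of 1.0
# in level and a slope reduction of 0.03 per month.
rng = random.Random(6)
T0 = 60
series = [10 + 0.05 * t + (t >= T0) * (-1.0 - 0.03 * (t - T0)) + rng.gauss(0, 0.2)
          for t in range(120)]
level_change, slope_change = segmented_its(series, T0)
print(f"level change: {level_change:.2f}  slope change: {slope_change:.3f}")
```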

Growth curve model
The main focus of a (latent) growth curve model (GC) is on change or development over time. This requires the subjects to be followed over time with repeated measures of each variable of interest. The goal of the model is to make inferences about the features of growth trajectories, i.e. the initial levels of the outcome measures and their rate of change. In GCs, change is represented by growth parameters (specified as latent variables): the intercept, i.e. the initial value of the outcome measure, and the slope, i.e. the rate at which the outcome changes over time. In terms of causal analysis, the model can estimate the causal effect of the initial level on the rate of change. GCs assume that the subjects' growth trajectories vary randomly around the overall mean trajectory (Bollen/Curran 2006; Wang/Wang 2012).
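The idea of individual trajectories varying around a mean trajectory can be illustrated with a simple two-stage sketch: fit a line to each person's repeated measures, then summarise the intercepts and slopes. This is only a rough stand-in for a latent growth curve model, which estimates these parameters jointly (e.g. by maximum likelihood in an SEM framework) and separates true variation in slopes from measurement error. The panel and its parameters are hypothetical.

```python
import random
from statistics import mean, stdev


def line_fit(ts, ys):
    """Intercept and slope of a simple OLS regression of ys on ts."""
    mt, my = sum(ts) / len(ts), sum(ys) / len(ys)
    b = (sum((t - mt) * (y - my) for t, y in zip(ts, ys))
         / sum((t - mt) ** 2 for t in ts))
    return my - b * mt, b


# Hypothetical panel: 1000 people observed at 5 waves (t = 0..4); each person
# has an own intercept (initial level) and slope (rate of change), varying
# randomly around the population means 50 and -0.5.
rng = random.Random(7)
waves = [0, 1, 2, 3, 4]
people = []
for _ in range(1000):
    z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
    intercept = 50 + 5 * z1
    slope = -0.5 + 0.2 * (-0.4 * z1 + (1 - 0.4 ** 2) ** 0.5 * z2)  # corr -0.4
    people.append([intercept + slope * t + rng.gauss(0, 1) for t in waves])

# Stage 1: one trajectory per person; stage 2: summarise across persons
fits = [line_fit(waves, ys) for ys in people]
mean_intercept = mean(a for a, _ in fits)
mean_slope = mean(b for _, b in fits)
sd_slope = stdev(b for _, b in fits)
print(f"mean initial level: {mean_intercept:.1f}  mean slope: {mean_slope:.2f}")
```

The spread of the person-specific slopes (`sd_slope`) overstates the assumed standard deviation of 0.2, because each slope is estimated from only five noisy observations; this is precisely the error component a latent growth curve model separates out.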

Propensity scores
The propensity score was defined by Rosenbaum and Rubin (1983) as the probability of treatment assignment conditional on observed baseline covariates. In randomised experiments, the true propensity score is known and is defined by the study design. In observational studies, the true propensity score is generally unknown, but it can be estimated from the study data. In practice, the propensity score is most often estimated using a logistic regression model in which treatment status is regressed on observed baseline characteristics. The estimated propensity score is the predicted probability of treatment derived from the fitted regression model. Conditional on the propensity score, the distribution of observed baseline covariates will be similar between treated and untreated subjects. Propensity scores control for observable characteristics at baseline in the hope that they also control for correlated unobservable characteristics.
Four different propensity score methods are used to remove the effects of confounding when estimating the effects of treatment on outcomes:
1. Propensity score matching forms matched sets of treated and untreated subjects who share a similar value of the propensity score (Rosenbaum/Rubin 1983) and is used to estimate the average treatment effect among the treated (ATT) (Imbens 2004).
2. Stratification by the propensity score groups individuals into mutually exclusive subsets based on their estimated propensity score, often into five equal-size groups using the quintiles of the estimated propensity score. Within each stratum, the effect of treatment on outcomes can be estimated, and by pooling the stratum-specific estimates (as in a meta-analysis), the mean treatment effect for the study population can be estimated.
3. Inverse probability of treatment weighting (IPTW) using the propensity score uses weights based on the propensity score to create a synthetic sample in which the distribution of measured baseline covariates is independent of treatment assignment (Morgan/Todd 2008).
4. Covariate adjustment using the propensity score regresses the outcome variable on an indicator variable denoting treatment status and the estimated propensity score.
More and more often, propensity scores are being estimated using machine learning methods (Brand et al. 2019).
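Two of the four methods, IPTW and stratification into quintiles, can be sketched in plain Python once the score has been estimated by logistic regression (fitted here by Newton-Raphson). The data are hypothetical: a single observed covariate `x` affects both treatment and outcome, and the treatment effect is set to 0.5. Because only `x` confounds the relationship in this simulation, conditioning on the estimated score removes the bias; with unobserved confounders it would not. Matching and covariate adjustment are omitted to keep the example short.

```python
import math
import random
from statistics import mean


def fit_logit(x, t, iters=15):
    """Logistic regression of t on [1, x] by Newton-Raphson; returns (b0, b1)."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, ti in zip(x, t):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += ti - p
            g1 += (ti - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det   # Newton step: b += I^-1 * score
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def iptw_effect(t, y, ps):
    """IPTW estimate of the average treatment effect (normalised weights)."""
    w1 = [ti / p for ti, p in zip(t, ps)]
    w0 = [(1 - ti) / (1 - p) for ti, p in zip(t, ps)]
    m1 = sum(w * yi for w, yi in zip(w1, y)) / sum(w1)
    m0 = sum(w * yi for w, yi in zip(w0, y)) / sum(w0)
    return m1 - m0


def stratified_effect(t, y, ps, n_strata=5):
    """Within-stratum mean differences (strata = quantiles of ps), pooled
    with weights proportional to stratum size (one simple pooling choice)."""
    order = sorted(range(len(ps)), key=lambda i: ps[i])
    size = len(order) // n_strata
    total = 0.0
    for s in range(n_strata):
        idx = order[s * size:] if s == n_strata - 1 else order[s * size:(s + 1) * size]
        diff = (mean(y[i] for i in idx if t[i] == 1)
                - mean(y[i] for i in idx if t[i] == 0))
        total += len(idx) * diff
    return total / len(order)


# Hypothetical data: x raises both treatment probability and outcome
rng = random.Random(8)
x = [rng.gauss(0, 1) for _ in range(10000)]
t = [1 if rng.random() < 1 / (1 + math.exp(0.2 - xi)) else 0 for xi in x]
y = [0.5 * ti + xi + rng.gauss(0, 1) for ti, xi in zip(t, x)]

b0, b1 = fit_logit(x, t)
ps = [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]
naive = (mean(yi for yi, ti in zip(y, t) if ti == 1)
         - mean(yi for yi, ti in zip(y, t) if ti == 0))
print(f"naive: {naive:.2f}  IPTW: {iptw_effect(t, y, ps):.2f}  "
      f"stratified: {stratified_effect(t, y, ps):.2f}")
```

With only five strata, some covariate imbalance tends to remain within the extreme strata, so the stratified estimate is typically a little further from the assumed effect than the IPTW estimate in this setup.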

Structural equations
A structural equation model (SEM) is a multivariate regression model that extends standard regression by allowing multiple outcomes, known as "endogenous" variables, and unobserved "latent" variables. For each endogenous variable there is a corresponding regression equation, which can depend on other endogenous variables as well as on exogenous variables. Exogenous variables are the predictor variables that are not determined by any other variable in the model. An SEM combines confirmatory factor analysis for the measurement model with path analysis for the structural model. The measurement model describes how well the observed indicator variables measure the underlying latent variable, whereas the structural model describes the causal relationships among the variables. This combination is the core advantage of SEM: it can simultaneously take into account random measurement error and multiple dependent variables, and estimate direct, indirect and total effects (Acock 2013; Bollen 1989; Wang/Wang 2012).
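The decomposition into direct, indirect and total effects can be sketched for the structural part of an SEM with observed variables only, i.e. a path analysis without a measurement model. The hypothetical path diagram is x → m → y plus a direct path x → y. Each equation is estimated by OLS from covariances, the indirect effect is the product of the paths a and b, and the total effect adds the direct path. A causal reading rests entirely on the assumed diagram and on the absence of unmeasured confounding.

```python
import random


def cov(a, b):
    """Sample covariance of two equally long sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((u - ma) * (v - mb) for u, v in zip(a, b)) / (len(a) - 1)


# Hypothetical recursive path model with observed variables only:
# x -> m (path a), m -> y (path b), x -> y (direct path c').
rng = random.Random(9)
x = [rng.gauss(0, 1) for _ in range(10000)]
m = [0.6 * xi + rng.gauss(0, 1) for xi in x]
y = [0.3 * xi + 0.5 * mi + rng.gauss(0, 1) for xi, mi in zip(x, m)]

sxx, smm, sxm = cov(x, x), cov(m, m), cov(x, m)
sxy, smy = cov(x, y), cov(m, y)

a = sxm / sxx                                  # equation for m: m on x
det = sxx * smm - sxm ** 2                     # equation for y: y on x and m
b = (sxx * smy - sxm * sxy) / det
c_direct = (smm * sxy - sxm * smy) / det

indirect = a * b
total = c_direct + indirect
print(f"direct: {c_direct:.2f}  indirect: {indirect:.2f}  total: {total:.2f}")
```

In this linear model the total effect equals the coefficient from a simple regression of y on x, which gives a quick consistency check on the decomposition.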

G-computation
Similar to structural equations, G-computation is a flexible approach to causal analysis that is especially useful for mediation analysis. The parametric g-formula, originally developed by biostatisticians (Robins/Hernán 2009), has recently been used in demographic studies on interdependent life course processes with time-varying confounding (Bijlsma/Wilson 2020). The method first runs multivariable regression models to estimate the interrelationships between variables, based on the assumptions of a specified directed acyclic graph (DAG). It then uses these fitted regression models to perform a series of micro-simulations and compares counterfactual scenarios to estimate causal effects. The g-formula approach assumes that all relevant variables are observed, but the risk of unmeasured confounding can be explored with sensitivity analyses (Carnegie et al. 2016; Lin et al. 2017; VanderWeele 2015). The g-formula approach can be summarised as a series of three analytical phases (Bijlsma/Wilson 2020):
1. Causal diagram: A DAG is created to describe the causal interrelationships to be studied. It describes the assumed relationships between the observed variables, including time-varying effects, confounders, mediators and outcomes.
2. Estimation: Based on the DAG, a series of multivariable regression models is estimated using observed data. These models can take any parametric (functional) form, and the number of models depends on the number of variables involved.
3. Simulation: With the estimated parameters from the models, a series of causal processes can be simulated.
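The three phases can be illustrated with a minimal, single-time-point sketch of parametric g-computation in plain Python. The DAG, variable names and coefficients are hypothetical: a baseline confounder `conf` affects treatment, mediator and outcome, and the treatment affects the outcome directly and through the mediator. The models are fitted to the data, the treatment is then set to 1 and to 0 for everyone, and the mediator and outcome are simulated from the fitted models. The full parametric g-formula repeats this loop over several waves to handle time-varying confounding.

```python
import random
from statistics import mean


def ols(X, y):
    """Multiple OLS: solve (X'X) b = X'y by Gauss-Jordan elimination."""
    k = len(X[0])
    aug = [[sum(r[a] * r[b] for r in X) for b in range(k)]
           + [sum(r[a] * yi for r, yi in zip(X, y))] for a in range(k)]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(k):
            if r != col:
                f = aug[r][col] / aug[col][col]
                aug[r] = [u - f * v for u, v in zip(aug[r], aug[col])]
    return [aug[r][k] / aug[r][r] for r in range(k)]


# Phase 1 (assumed DAG, hypothetical): conf -> treat, conf -> med, conf -> out,
# treat -> med, treat -> out, med -> out. True total effect: 0.2 + 0.6 * 0.5.
rng = random.Random(10)
data = []  # (conf, treat, med, out)
for _ in range(10000):
    conf = rng.gauss(0, 1)
    treat = 1.0 if 0.8 * conf + rng.gauss(0, 1) > 0 else 0.0
    med = 0.5 * treat + 0.4 * conf + rng.gauss(0, 1)
    out = 0.2 * treat + 0.6 * med + 0.7 * conf + rng.gauss(0, 1)
    data.append((conf, treat, med, out))

# Phase 2 (estimation): one parametric model per non-root node of the DAG
g_med = ols([[1.0, tr, c] for c, tr, _, _ in data], [md for _, _, md, _ in data])
g_out = ols([[1.0, tr, md, c] for c, tr, md, _ in data], [o for _, _, _, o in data])
resid = [md - (g_med[0] + g_med[1] * tr + g_med[2] * c) for c, tr, md, _ in data]
sd_med = (sum(r * r for r in resid) / (len(resid) - 3)) ** 0.5


def simulate_mean_out(treat_value, n_sims=20000, seed=11):
    """Phase 3 (simulation): set treat for everyone, draw conf from the data,
    simulate med from its fitted model, then predict out from its model."""
    sim = random.Random(seed)
    outs = []
    for _ in range(n_sims):
        conf = sim.choice(data)[0]
        med = g_med[0] + g_med[1] * treat_value + g_med[2] * conf + sim.gauss(0, sd_med)
        outs.append(g_out[0] + g_out[1] * treat_value + g_out[2] * med + g_out[3] * conf)
    return mean(outs)


effect = simulate_mean_out(1.0) - simulate_mean_out(0.0)
naive = (mean(o for _, tr, _, o in data if tr == 1.0)
         - mean(o for _, tr, _, o in data if tr == 0.0))
print(f"naive difference: {naive:.2f}  g-computation: {effect:.2f}")
```

Using the same random seed for both scenarios (common random numbers) removes simulation noise from the contrast. The naive treated-untreated difference is strongly confounded by `conf`, whereas the simulated contrast is close to the assumed total effect of 0.2 + 0.6 × 0.5 = 0.5.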

Causal methods used in demographic research published in the journal "Demography"
We searched the journal "Demography" using the online search tool provided by the publisher Springer. Publications from 2010 until 13 May 2020 were included, and the following search was applied: (("Unobserved heterogeneity" OR "Random effects" OR "Regression Discontinuity" OR "Interrupted time series" OR "G-Computation" OR "Structural Equation" OR "Growth Curves" OR "Instrumental variables" OR "Propensity Score" OR "difference-in-differences" OR "fixed effects") AND ("health" OR "morbidity" OR "mortality" OR "death") AND ("panel" OR "longitudinal")).
After screening titles and method sections, this search produced 37 publications (Table 1). The most common causal method was fixed effects models (23), with individual FE but also kinship FE (mothers, grandmothers, siblings, twins) and region/country/cohort FE. The latter were most often used with aggregate data and thus did not use individual panel data. FE models were sometimes combined with other design features, such as random effects (RE), the difference-in-differences approach, propensity score matching/regression, or growth curve models. Both register data and panel data were used, and panels and registers were often linked to accommodate the various designs. The second most common method was (latent) growth curve models (9), followed by difference-in-differences designs (4), instrumental variables (3), propensity scores (3), and random effects models (3). A second search in the journal "Population Studies" confirmed the overall picture, with a strong reliance on FE models with sibling designs based on register data (not shown).

Topic 1: The dual relationship between retirement and health
The transition to retirement is an important step in the life course and an important subject of study in demography. This is because retirement policies are a widely used instrument to address demographic change, and harmonising labour market demands, individual preferences, health development and public finances is a core problem of public policy. The empirical analysis of the effects between health and retirement is complex because the retirement process depends on health, and retirement has an effect on subsequent health (Oksanen/Virtanen 2012; Radó/Boissonneault 2018). It is relatively straightforward to hypothesise, and to find empirical evidence, that poor health leads to earlier retirement (van Rijn et al. 2014). We therefore concentrate on the question of how retirement affects health, which is more complicated because prior health is a confounder that influences both the retirement transition and health after retirement. Available theory suggests two possible effects: on the one hand, leaving an active role in the labour market can lead to a form of crisis (Atchley 1975); on the other hand, retirement can be a relief from work-related burdens and stress (Westerlund et al. 2009). Given these theoretical assumptions, it is not surprising that the related empirical findings are inconsistent: some studies show a positive influence of retirement on health (Eibich 2015; Insler 2014; Jokela et al. 2010; Westerlund et al. 2009), while others find a negative effect (e.g. Behncke 2012; Stenholm et al. 2014). A systematic review (Shim et al. 2013) shows a positive correlation between retirement and mortality, but points out that the problem of selection (i.e. poor health being not only a consequence but also a cause of retirement) is only insufficiently dealt with. Two other systematic reviews suggest that the effect of retirement on health can differ between occupational status groups (effect heterogeneity) (Schaap et al. 2018; van der Heide et al. 2013).
In the following we describe the four main causal methods that have been applied to longitudinal data to explore the effect of retirement on health.
a) Fixed effects models (FE): As mentioned above, FE models use multiple measurements per individual to take account of unobserved time-constant confounders. However, the main confounder in this setting is prior health, which influences both retirement and health after retirement; this confounder may vary over time and would then not be accounted for by FE. Using data from the Health and Retirement Study (HRS), Calvo et al. (2013) combine fixed and random effects with an instrumental variable approach and find that early retirement worsens health.
b) Several studies used the statutory retirement age as an instrumental variable (IV) that influences retirement without affecting health. For example, Hessel […]
c) The regression discontinuity design is similar to the IV approach in that it also uses the statutory retirement age, e.g. different age cut-offs in different countries. Coe and Zamarro (2011) use SHARE and Eibich (2015) uses data from the German Socio-Economic Panel (SOEP); both find that retirement improves health. Giesecke (2019) uses administrative data from the German federal pension insurance and finds substantial effect heterogeneity between people with different pension types and different lifetime earnings: poorer people experience a decrease in mortality, while richer people experience an increase. With this method, too, it is possible that only local causal effects are measured (among those just below and just above the age threshold), which may not be generalisable to the whole population.
d) Finally, it is possible to study the effects of retirement on health with an interrupted time series analysis that compares the health trend before and after retirement. Schuring et al. (2015) study health trends before and after transitions into early retirement with data from the European Community Household Panel (ECHP). They conclude that less educated people are more likely to be selected into early retirement because of poor health, while more highly educated people who retire early experience a health decline after retirement.

Topic 2: The dual relationship between socio-economic status (SES) and health
The well-known association between SES and health can be explained by two opposing causal mechanisms: social causation, by which SES influences health, and health selection, by which health influences SES (Goldman 2001). For a complete account of causal models explaining health inequalities, one needs to add indirect selection, which means that a common underlying factor, such as genes, cognitive ability, family factors, personality, or general lifestyle orientations, influences both SES and health, so that there is no causal effect between SES and health. It is important to note that SES is itself a highly debated concept. We use it here as a simplifying label for all studies that look at the relation between any aspect of SES (the most important being education, occupation and income) and health. But it is an open question whether there is a latent variable, such as the "socio-economic status" of a person, which can be measured and approximated by education, occupation or income as indicators, or whether SES is too imprecise, so that meaningful (causal) relationships can only be established for specific resources, such as education, income, or those related to occupational status or occupational class. On the one hand, specific variables (e.g. income) more convincingly fulfil the requirement that any causal interpretation should be based on theoretical knowledge of the pathways and mechanisms. On the other hand, such an analytical approach to social variables can become unrealistic because social variables do not work in isolation. For example, it is possible to study the effect of lottery wins on health (Lindahl 2005): while this isolated and randomised variation in income comes closer to the causal "rules" of an experiment, it presumably differs greatly from the mechanisms that create the health gradient between income groups in the general population.
While the association between SES and health is generally accepted, as is probably the assumption that all three models are real to some extent (Smith 1999), which of them actually contributes most to the social gradient in health is highly debated. The highest level of agreement can probably be reached for the effect of education on health, partly because it has been studied with a variety of approaches, such as natural experiments on school reforms or compulsory schooling laws (see below) and twin studies (Madsen et al. 2010). However, for the association between income and health there is still disagreement between studies and disciplines on the direction of causation. While countless studies by social epidemiologists and medical sociologists implicitly or explicitly assume that income has an effect on health, Galama and van Kippersluis (2018), for example, argue that the loss of income and wealth as a consequence of poor health is the dominant causal relation between health and the dimensions of SES. In the following we present a selection of approaches, studies and their results which, we believe, have contributed to this ongoing discussion. They also illustrate the methods that can be used for this and similar questions exploiting panel data for causal analysis.

a) Two examples illustrate the use of fixed effects models. First, Foverskov and Holm (2016) is one of the few studies that address all three causal models in the same study. They analyse the British Household Panel Survey and find no support for social causation and limited support for health selection. They conclude that indirect selection may be the most important mechanism. This conclusion is questionable, because persons aged 30 to 60 are observed for only 5 years and everything before age 30 is defined as a common background factor. Second, Frijters et al. (2005) applied a fixed effects model to a quite different setting, the natural experiment of German reunification, using the German Socio-Economic Panel, and looked only at the causal effect of changes in income on health. They find a statistically significant but small effect.
b) In the same vein as the studies above, and again reflecting the great interest of economists in this question, a large number of studies analyse the effects between material wealth and health with instrumental variables. The following three studies show how sources of external variation have been exploited to create causal evidence. Michaud and van Soest (2008) use inheritances and find no evidence that wealth affects health, but strong evidence of effects from both spouses' health on household wealth in the Health and Retirement Study. Lindahl (2005) uses the Swedish Level of Living Survey and lottery prizes as an exogenous source of variation in income and finds causal effects from income to health. Finally, in several publications based on the Health and Retirement Study and the Panel Study of Income Dynamics, Smith (e.g. 2004) uses stock market changes as an instrument for changes in income and concludes that income as such has no effect on health, but socio-economic status does, namely through education. As conceded by Smith himself, it is questionable whether the instruments used in his study and by the other authors above capture the causal effects of material wealth in the general population.
[…] (2019) use this framework to compare the effects between health and education, occupation, and income (see discussion about SES above). A crucial and untestable assumption of this approach is that all relevant confounders are taken into account, so that treatment assignment is independent of the potential outcomes conditional on the covariates. Further methods within the broad definition of structural equation models have been used to study effects between SES and health, e.g. path analysis (Palloni et al. 2009) and G-computation (Bijlsma et al. 2017).
Finally, it is worth mentioning that other methods from our overview above have also been used for causal analysis between SES and health, e.g. propensity score matching, often combined with difference-in-differences, but these studies focus more on the effect of policies on an SES- or health-related outcome.

Topic 3: The dual relationship between partnership, fertility history, and health/mortality
The relationship between characteristics of family formation and health has long been an issue in demography. Following the recent review by Hank and Steinbach (2018), we differentiate between the aspects of partnership and fertility history; both are confronted with the underlying question of whether certain characteristics of these biographies have protective or detrimental effects on later-life health, or whether any empirical relations are caused by selection into or out of partnership and parenthood. While these two aspects are often explored separately, there are increasingly also attempts to model their effects on health jointly (see below, e.g. O'Flaherty et al. 2016).

Partnership
For a long time, differences in health between single, married, divorced and widowed individuals were at the centre of research. It has been repeatedly suggested that marriage has a protective effect on health due to economic and social benefits, as well as beneficial lifestyle choices, particularly among men. As Hank and Steinbach point out in their review, however, selection into marriage may be related to better health in the first place, since health affects an individual's chance of getting married. On the other hand, marital disruption and divorce appear to have detrimental effects in both the short and the long term, even after remarriage. The effect of divorce also differs by marital satisfaction and by gender: women's health appears to suffer more after divorce than men's (Monden/Uunk 2013).
a) The following example demonstrates how a fixed effects model, to control for unobserved confounding, was combined with propensity score matching, to control for observed confounding, to explore the effect of divorce on men's health. Using the Survey of Income and Program Participation (SIPP) panel study matched to US Social Security administrative data, Couch et al. (2015) followed continuously married men for 20 years and compared them with their counterparts who experienced divorce. In addition, they filtered the data for previous health issues to help control for health selection into divorce, and found that among those who did not remarry, divorce increases men's long-term probability of both self-reported work limitations and receipt of federal disability benefits. They attribute the negative health impact to the lack of marital resources.
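The combination in a) can be sketched roughly as follows: each treated individual (here, a man who divorces) is matched to the control with the closest propensity score, and within-person changes rather than levels are compared, so that stable unobserved traits drop out. This is a simplified illustration, not the specification of Couch et al. (2015). The propensity scores are assumed to have been estimated beforehand (for example, by logistic regression on pre-divorce covariates), and all records and field names are hypothetical.

```python
from statistics import fmean

def matched_change_att(treated, controls):
    """Average effect on the treated from 1:1 nearest-neighbour matching
    (with replacement) on a pre-estimated propensity score, applied to
    within-person changes so that time-invariant traits cancel out."""
    effects = []
    for t in treated:
        # observed confounding: pick the control with the closest score
        match = min(controls, key=lambda c: abs(c["ps"] - t["ps"]))
        # unobserved, time-constant confounding: compare changes, not levels
        change_treated = t["post"] - t["pre"]
        change_control = match["post"] - match["pre"]
        effects.append(change_treated - change_control)
    return fmean(effects)

# Hypothetical records: ps = propensity score, pre/post = health-limitation score
treated = [{"ps": 0.30, "pre": 0, "post": 3},
           {"ps": 0.60, "pre": 1, "post": 5}]
controls = [{"ps": 0.28, "pre": 0, "post": 1},
            {"ps": 0.65, "pre": 2, "post": 3},
            {"ps": 0.10, "pre": 0, "post": 0}]

print(matched_change_att(treated, controls))  # 2.5
```

Matching with replacement lets one control serve several treated individuals. Calipers and covariate balance checks, which a real analysis would need, are omitted here.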
b) The following two studies used growth curve models to explore marital separation in the context of Chinese labour migration and marital disruption in Australia. Based on the China Health and Nutrition Survey, Chen et al. (2015) investigated the health trajectories of left-behind rural individuals whose spouses had migrated for work. Their linear growth curve models take into account that individuals start with different levels of self-rated physical health and that each individual could experience a different rate of change across age depending on their marital status and the residence of the spouse. The time-varying covariates allow each individual to serve as his or her own control, accounting for within- and between-individual unobserved heterogeneity.
Their results point towards a clear health disadvantage of married individuals whose spouses are absent compared with those whose spouses live in the same household. Longer absence led to worse health, and health deficits were stronger for men than for women. The second article covers the whole family life cycle from age 18 to 50, exploring both fertility and partnership histories (O'Flaherty et al. 2016). Using the Household, Income and Labour Dynamics in Australia (HILDA) survey, the authors applied a two-stage analysis strategy. First, they used sequence analysis to group individuals with similar fertility and partnership histories and used this categorisation as the primary independent variable in the second stage, where they applied growth curve models to establish the relationship with later-life health. Their results were gendered, with a stronger link between long-term family trajectories and health among men than among women. Early or no family formation, marital disruption, or high fertility was particularly harmful for men, while for women high fertility combined with a disrupted marital biography was particularly harmful.
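The first stage of the two-stage strategy described above groups similar life courses. The sketch below uses the Hamming distance (the number of years in which two yearly state sequences differ) and assigns each sequence to the nearest of a few hypothetical reference trajectories. This is a deliberate simplification: sequence analysis typically uses optimal matching with insertion/deletion and substitution costs, followed by cluster analysis, and the states, sequences and group labels here are invented for illustration.

```python
def hamming(a, b):
    """Number of positions (years) in which two equal-length state sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must cover the same years")
    return sum(x != y for x, y in zip(a, b))

def assign_type(seq, reference):
    """Return the label of the reference trajectory closest to seq."""
    return min(reference, key=lambda label: hamming(seq, reference[label]))

# Hypothetical yearly partnership states: S = single, M = married, D = divorced
reference = {
    "early, stable marriage": "MMMMMMMM",
    "late marriage":          "SSSSMMMM",
    "disrupted marriage":     "SMMMDDDD",
}
print(assign_type("SSSMMMMM", reference))  # late marriage
```

In a two-stage design, the resulting labels would then enter the growth curve model of the second stage as a categorical predictor.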

Fertility history
Turning to the second aspect, Hank and Steinbach (2018) provided an extensive overview of the fertility characteristics that have been explored, such as parity, childlessness, and birth spacing/intervals. They discussed both biological and social factors that may underlie the relationship between fertility characteristics and health in (later) life among both sexes, stressing the lack of knowledge about the relative importance of these factors. The biological factors include diseases such as breast cancer and other cancers of the female reproductive system, which have been shown to be associated with pregnancy, childbirth and lactation; among the social factors, differences in socio-economic status, social relationships, and health behaviours across the life course have been suggested. At the same time, health selection into certain fertility characteristics may play an important role, producing biased estimates if not accounted for (Doblhammer/Oeppen 2003).
a) Fixed effects sibling designs are common with register data because family links are available in the data and sample sizes are large. Einiö et al. (2016) applied such a design, which is described in more detail below in connection with the research by Barclay and Kolk (2015, 2018), and explored the relationship between the number of children and later-life mortality among Finns. They confirmed earlier findings, which did not use causal modelling techniques, that all-cause mortality relative to those with two children is highest among childless women, followed by women with one child, and were also able to extend these findings to men. They concluded that living conditions in adulthood contributed to the association between the number of children and mortality to a greater extent than childhood background, and that chronic conditions contributed to the excess mortality of the childless, probably revealing health selection into childlessness. These designs suffer from a problem of extrapolation because they can only use sibships and thus exclude childless individuals and those with one child only. In addition, they need heterogeneity in the outcome, so sibships in which all individuals are still alive are also excluded. The results of the study, however, were reproducible for the total population without a sibling design. When studying mortality by cause of death, small sample sizes are common in sibling designs despite the large number of observations in register studies. Although the sibling comparison design with sibling fixed effects is very common in demography, concerns about the causal interpretation of its findings have been raised.
To our knowledge, these concerns have not been widely acknowledged in demography. They include problems related to the precision and bias of estimates (Gilman/Loucks 2014), violation of the assumption that the exposure and outcome of an individual do not affect the exposure and outcome of his or her siblings ("sibling carryover") (Sjölander et al. 2016), and overcontrol through the indiscriminate adjustment for confounders, mediators and colliders (Sjölander/Zetterqvist 2017).
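The core of the sibling fixed effects design in a) is a within-sibship comparison: each sibling's exposure and outcome are expressed as deviations from the sibship mean, so everything siblings share drops out. The linear sketch below (toy data, hypothetical exposure and outcome) also shows why sibships in which all siblings have the same exposure contribute nothing to the estimate. Applied mortality studies usually rely on survival or logistic models stratified by sibship rather than on this linear estimator.

```python
from collections import defaultdict
from statistics import fmean

def sibling_fe_slope(rows):
    """Within-sibship OLS slope of outcome on exposure.

    rows: iterable of (sibship_id, exposure, outcome) tuples.
    """
    groups = defaultdict(list)
    for sib_id, x, y in rows:
        groups[sib_id].append((x, y))
    num = den = 0.0
    for members in groups.values():
        mx = fmean(x for x, _ in members)
        my = fmean(y for _, y in members)
        for x, y in members:
            # deviations from the sibship mean remove shared background factors
            num += (x - mx) * (y - my)
            den += (x - mx) ** 2
    if den == 0:
        raise ValueError("no within-sibship variation in exposure")
    return num / den

# Hypothetical (sibship, number of children, age at death)
rows = [(1, 0, 70), (1, 2, 74),
        (2, 1, 80), (2, 3, 84),
        (3, 2, 60), (3, 2, 90)]    # sibship 3: no exposure variation
print(sibling_fe_slope(rows))      # 2.0
print(sibling_fe_slope(rows[:4]))  # 2.0: sibship 3 adds no information
```

The large outcome difference within sibship 3 does not move the estimate at all, which illustrates how much of the sample a sibling design can effectively discard.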
b) Recent studies extended the topic to siblings and explored the effect of sibship size, birth order, and birth intervals between siblings on their later-life health. Using Swedish multigenerational population register data, Barclay and Kolk (2015) applied fixed-effects discrete-time survival analysis, using a within-family comparison of siblings with the same biological mother-father pairing. The fixed effects were applied at the sibship level to adjust for all factors that are constant within the sibling group (e.g. sibship size), as well as factors that are difficult to observe and measure, such as shared socio-economic background and general parenting style. Exploiting within-family variation in mortality by cause of death, they showed that the relative effect of birth order was greater among sisters than among brothers, particularly for mortality attributable to cancers of the respiratory system and to external causes. Social pathways mediated the relationship between birth order and mortality risk in adulthood only to a limited degree. In a second article using the Swedish multigenerational register data linked to the Swedish military conscription register and applying the same modelling strategy, Barclay and Kolk (2018) concluded that birth intervals had little effect on the long-term health outcomes of brothers.
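Discrete-time survival analysis of the kind described in b) starts from a person-period file: each individual contributes one row per period at risk, with an event indicator that is 1 only in the period of death. A sketch of that expansion follows. The field names are hypothetical, and the subsequent model (for example, a logistic regression with sibship fixed effects or a conditional logit stratified by sibship) is not shown.

```python
def to_person_period(people):
    """Expand one record per person into one row per period at risk.

    people: iterable of dicts with keys
      id, sibship, birth_order, last_period (last period observed, >= 1),
      died (True if the person died in last_period, False if censored).
    """
    rows = []
    for p in people:
        for t in range(1, p["last_period"] + 1):
            rows.append({
                "id": p["id"],
                "sibship": p["sibship"],        # stratum for the family fixed effect
                "birth_order": p["birth_order"],
                "period": t,                    # e.g. age band or calendar year
                "event": int(p["died"] and t == p["last_period"]),
            })
    return rows

people = [
    {"id": "a", "sibship": 1, "birth_order": 1, "last_period": 3, "died": True},
    {"id": "b", "sibship": 1, "birth_order": 2, "last_period": 2, "died": False},
]
pp = to_person_period(people)
print(len(pp), [r["event"] for r in pp])  # 5 [0, 0, 1, 0, 0]
```

Censored individuals such as "b" contribute risk time but no event, which is what allows the hazard in each period to be modelled as a binary outcome.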
c) To explore the relationship between fertility history and both the level and the rate of change of cognitive functioning, Read and Grundy (2017) used the English Longitudinal Study of Ageing (ELSA) and applied growth curve models with random effects to capture individual differences and fixed effects to estimate the average growth of the entire sample. The results showed associations between the number and timing of births and cognitive functioning in older age, in particular adverse effects of high parity, early childbearing and low parity, which appeared to reflect underlying socio-economic and health disparities.

Discussion
Our empirical account of causal methods applied to longitudinal observational data in the field of demography rests on two parts. First, we searched the studies published in the leading demographic journal "Demography" for causal methods, using specific keywords. Fixed effects (FE) models accounting for time-invariant individual or family characteristics, in combination with other methods, were the leading approach, followed by growth curve models. These methods were applied to both panel and register data, sometimes linked with each other; FE area/cohort models with aggregate trend data were also used. Second, we drew on our own knowledge and experience to analyse three important areas of research in demography and described the contributions of prominent methods to the evidence base in these fields and to the concrete research questions. The first approach is more systematic but also more superficial, while the second is admittedly more arbitrary but allows a deeper insight into how a variety of methods have been used to answer core demographic questions. These methods can complement each other, but they can also produce conflicting evidence. Overall, fixed effects models seem to be the most prominent causal approach in demographic analysis of health and mortality. FE models eliminate stable unmeasured factors related to time-invariant characteristics. At the individual level, this works well for assessing the effects of time-varying characteristics but precludes the analysis of time-invariant characteristics, such as completed family size, completed parity, or birth order. In these cases, individual FE models are not possible because all time-invariant characteristics cancel out of the equation. Researchers therefore try to control for unobserved invariant characteristics, such as genes, upbringing, living circumstances, values and norms, by using FE for higher-level entities, such as siblings, mothers, or grandmothers.
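Why individual fixed effects cannot estimate the effect of time-invariant characteristics can be seen directly from the within transformation, which subtracts each person's own mean from every observation. In the hypothetical panel below, a time-varying variable (marital status) retains variation after demeaning, while a fixed characteristic (completed parity) becomes identically zero, so its coefficient is not identified. The field names are illustrative only.

```python
from collections import defaultdict
from statistics import fmean

def within_transform(panel, var):
    """Demean `var` within person: the fixed-effects (within) transformation."""
    values = defaultdict(list)
    for row in panel:
        values[row["id"]].append(row[var])
    person_mean = {pid: fmean(v) for pid, v in values.items()}
    return [row[var] - person_mean[row["id"]] for row in panel]

# Hypothetical person-years: marital status changes, completed parity does not
panel = [
    {"id": 1, "married": 0, "parity": 2},
    {"id": 1, "married": 1, "parity": 2},
    {"id": 2, "married": 1, "parity": 0},
    {"id": 2, "married": 1, "parity": 0},
]
print(within_transform(panel, "married"))  # [-0.5, 0.5, 0.0, 0.0]
print(within_transform(panel, "parity"))   # [0.0, 0.0, 0.0, 0.0]
```

Demeaning within a family instead of within a person (replacing "id" by a sibship identifier) would preserve variation in characteristics that are fixed for an individual but differ between siblings, which is the motivation for higher-level FE.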
Register data, in particular, lend themselves to within-family comparisons using sibling designs with FE, since observations of this nature are seldom available in panel studies. Individual fixed effects are naturally suited to studying the effects of SES, partnership status, or retirement processes on health, since these characteristics are largely time-varying. They do not work for the study of (long-term) effects of (completed) fertility characteristics and other characteristics related to early life, because these features vary little or not at all over the life course. Growth curve models are an extension of FE models and are the second most common approach in the journal "Demography". These models are often combined with FE approaches. In contrast to FE models, they permit estimates of the effect of observables both on the outcome at baseline and on its growth rate.
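The distinction between effects on the baseline level and on the growth rate can be illustrated with a simple two-stage approximation: fit a straight line (intercept and slope over age) for each individual, then compare average intercepts and slopes between groups. Mixed-effects growth curve models estimate both stages jointly and shrink individual estimates towards the group mean; this sketch, with hypothetical groups and self-rated health scores, omits that step and is meant only to show what the two parameters represent.

```python
from statistics import fmean

def fit_line(ages, health, centre=50):
    """OLS intercept (health at age `centre`) and slope (change per year)."""
    x = [a - centre for a in ages]
    mx, my = fmean(x), fmean(health)
    num = sum((xi - mx) * (yi - my) for xi, yi in zip(x, health))
    den = sum((xi - mx) ** 2 for xi in x)
    slope = num / den
    return my - slope * mx, slope

def group_growth(trajectories):
    """Mean baseline level and mean growth rate per group.

    trajectories: dict group -> list of (ages, health) pairs, one per person.
    """
    out = {}
    for group, people in trajectories.items():
        fits = [fit_line(ages, h) for ages, h in people]
        out[group] = (fmean(f[0] for f in fits), fmean(f[1] for f in fits))
    return out

# Hypothetical self-rated health (higher = better) at ages 50, 52, 54
trajectories = {
    "spouse present": [([50, 52, 54], [4.0, 3.9, 3.8]), ([50, 52, 54], [3.6, 3.5, 3.4])],
    "spouse absent":  [([50, 52, 54], [3.8, 3.5, 3.2]), ([50, 52, 54], [3.4, 3.1, 2.8])],
}
for group, (level, rate) in group_growth(trajectories).items():
    print(group, round(level, 2), round(rate, 3))
```

In these toy data the two groups differ both in their level at age 50 and in their yearly rate of decline, the two quantities a growth curve model separates.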
In our literature review, a couple of studies used instrumental variables to identify recursive causality, exploiting pension eligibility, political climate, differences in infant mortality between black and white populations, the decline in child labour, or the decline in family size and their relation to health. Apart from unobserved heterogeneity and reverse causation, the hierarchical structure of the data presents an additional source of bias, and multi-level modelling is often applied to address it. Usually, regional macro-data are combined with individual-level data to differentiate between contextual and individual-level determinants of mortality and morbidity. Two of the many examples for Germany are a study of individual- and area-level effects on mortality based on data from the German pension fund (Kibele 2014), and a study of individual and contextual determinants of health among ethnic German immigrants (Kreft/Doblhammer 2012).
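A common first step in multi-level analyses of this kind is to ask how much of the variation in a health outcome lies between areas rather than between individuals. The intraclass correlation (ICC) from a one-way analysis of variance gives a simple descriptive answer. The sketch below assumes equal group sizes and uses invented values; it is not a substitute for a fitted multi-level model with covariates.

```python
from statistics import fmean

def icc_oneway(groups):
    """ANOVA estimator of the intraclass correlation for equal-sized groups.

    groups: list of lists, e.g. individual outcomes grouped by region.
    """
    k = len(groups[0])                       # individuals per group
    if any(len(g) != k for g in groups):
        raise ValueError("this simple estimator assumes equal group sizes")
    g, n = len(groups), k * len(groups)
    grand = fmean(y for grp in groups for y in grp)
    means = [fmean(grp) for grp in groups]
    ms_between = k * sum((m - grand) ** 2 for m in means) / (g - 1)
    ms_within = sum((y - m) ** 2 for grp, m in zip(groups, means) for y in grp) / (n - g)
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)

# Hypothetical health scores of three residents in each of two regions
print(round(icc_oneway([[1, 2, 3], [4, 5, 6]]), 3))  # 0.806
```

A high ICC signals that observations within an area are far from independent, which is exactly the situation in which ignoring the hierarchical structure biases standard errors.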
In the following, we discuss the categorisation of causal methods into design-based versus model-based approaches (Koch/Gillings 2006) and a similar categorisation into methods that address unobserved confounders versus methods that deal with observed confounders. These two categorisations largely overlap in the sense that conditioning on (time-constant) unobservables is generally design-based (fixed effects, instrumental variables, regression discontinuity, and difference-in-differences), while model-based methods address observed confounders (interrupted time series, growth curve models, propensity scores, structural equation models, and G-computation). The first group originates in econometrics, the second in biostatistics and epidemiology. However, this distinction can be fuzzy: e.g. the g-formula can use fixed effects intercepts, and difference-in-differences approaches sometimes use additional covariates to strengthen the parallel trends assumption. Twin studies, which are arguably a design, use twin fixed effects, whereas fixed effects in general are a model that needs no special design (in the sense of exploiting external variation) other than a longitudinal one. Finally, it is difficult to decide whether interrupted time series is a design-based approach (because it is a quasi-experimental method that needs a natural experiment) or a model-based approach (because specific modelling approaches, e.g. ARIMA, have been developed for it). It is easier to claim that interrupted time series cannot deal with unobserved confounders. We would argue that all studies have both a model and a design, and that they either deal or do not deal with unobserved confounders; we therefore prefer the categorisation into unobserved versus observed confounders.
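On the model-based, observed-confounder side of this distinction, the g-formula for a single time point with one discrete confounder reduces to standardisation: estimate the mean outcome within each confounder stratum under a given treatment level, then average over the confounder distribution of the whole sample. The sketch below is nonparametric and uses hypothetical data. Applied G-computation (e.g. for time-varying treatments) replaces the cell means with fitted regression models and simulates over several periods.

```python
from collections import Counter, defaultdict
from statistics import fmean

def g_formula_mean(rows, a):
    """Standardised mean outcome had everyone received treatment level `a`.

    rows: list of (l, a, y) tuples with a single discrete confounder l.
    Relies on no unmeasured confounding given l and on positivity
    (every stratum of l contains people with treatment a).
    """
    n = len(rows)
    p_l = {l: count / n for l, count in Counter(l for l, _, _ in rows).items()}
    outcomes = defaultdict(list)
    for l, treat, y in rows:
        outcomes[(l, treat)].append(y)
    return sum(p * fmean(outcomes[(l, a)]) for l, p in p_l.items())

# Hypothetical data: l = prior health stratum, a = exposure, y = later health score
rows = [(0, 0, 1), (0, 0, 3), (0, 1, 4),
        (1, 0, 5), (1, 1, 7), (1, 1, 9)]
effect = g_formula_mean(rows, 1) - g_formula_mean(rows, 0)
print(effect)  # 2.5
```

In these toy data the crude difference between exposed and unexposed is larger (about 3.67) because the exposed are concentrated in stratum l = 1; standardisation removes that observed imbalance but does nothing about unobserved confounders.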
Methods conditioning on unobservables generally have higher internal validity, while methods conditioning on observables have higher external validity. However, this external validity depends on the sample and on whether it is a random sample of the population of interest. The trade-off between these two types of validity when assessing different causal methods has long been discussed (Moffitt 2005; Smith 2013). Disciplinary traditions and changes over time influence which of these two aspects of scientific inquiry is deemed more important. Moffitt (2005) mentions the "danger in maximising internal validity at the expense of external validity", and Smith (2013) advocates a balance of randomisation, representation and realism as integrated aspects of a meaningful causal analysis. Since not all researchers across disciplines are open to all methods, this calls for more mixing and comparison of methods, which can yield the greatest synergies through interdisciplinary cooperation, e.g. between demographers, sociologists and (social) epidemiologists in research on health and mortality.
What is the special contribution of demographers to the field of causality in health research? One may say: (1) While demographers, like other social scientists, use many different longitudinal surveys, they also have a special preference for register data, sometimes linked with panel data, and for designs that are uniquely suited to register data, such as kinship designs.
(2) As demographers are used to working at the level of the total population, they tend to explore causal relationships in health for the total population rather than for specific groups in specific situations. Thus, they often seek high external validity, which may come at the cost of reduced internal validity and may weaken the claim of causality.
(3) Compared with epidemiologists, they use general health outcomes (in addition to mortality) and explore determinants of health that may act through a number of different causal pathways, such as fertility history, SES, or marital status. Both the general health outcomes and the multi-faceted influencing factors complicate the search for causality, and often the relative importance of these factors cannot be determined. This also calls for the interdisciplinary cooperation mentioned above, because results for more specific and isolated variables need to complement results from more holistic concepts.
(4) There is a long demographic tradition of conducting ecological studies with panel data which, although not suitable for exploring causal mechanisms, provide important information on associations between socio-economic factors and health and mortality. Complex decomposition methods have been developed to distinguish the effects of age structure from those of disease/mortality rates on regional or temporal differences in health/mortality. One could also argue that this focus on ecological study designs and decomposition has delayed the adoption of causal analysis once individual panel data became available.
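The simplest of these decomposition methods, the classic two-factor Kitagawa decomposition, splits the difference between two crude rates into a part due to differences in age-specific rates and a part due to differences in age composition. A minimal sketch with two hypothetical age groups follows; more refined methods (for example for more than two factors or for continuous age) build on the same idea.

```python
def kitagawa(c1, m1, c2, m2):
    """Decompose CR1 - CR2, where CR = sum(c_i * m_i).

    c1, c2: age-composition shares (each summing to 1); m1, m2: age-specific rates.
    Returns (rate_effect, composition_effect); their sum equals CR1 - CR2.
    """
    rate = sum((a + b) / 2 * (x - y) for a, b, x, y in zip(c1, c2, m1, m2))
    comp = sum((x + y) / 2 * (a - b) for a, b, x, y in zip(c1, c2, m1, m2))
    return rate, comp

# Hypothetical populations with two age groups (younger, older)
c1, m1 = [0.5, 0.5], [0.01, 0.05]   # older population
c2, m2 = [0.8, 0.2], [0.01, 0.04]   # younger population
rate, comp = kitagawa(c1, m1, c2, m2)
print(round(rate, 4), round(comp, 4))  # 0.0035 0.0105
```

Here the crude rates are 0.030 and 0.016; three quarters of the difference is due to the older age structure of the first population rather than to higher age-specific rates, the kind of distinction such decompositions were designed to make.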
We close this chapter with a quotation from Paul W. Holland (2003: 9-10), who reminds us that it is important to establish methods and their rules for identifying causal effects, but that adherence to these rules alone does not justify ranking empirical research as better or worse: "Being able to assert that the association is based on a causal connection is, in many circumstances, merely a status symbol, one that confers importance to the finding without any consequence for improved public health […] it is the use of an association for important purposes that has enduring value and not its status as a causal variable."