A Review of Causal Identifiability Techniques across Different Observational Datasets

We aggregate the causal identifiability techniques, and the assumptions they rest on, that have been advanced in the extant literature for datasets of atypical origins: datasets that do not necessarily conform to the independent and identically distributed (i.i.d.), multinomial, or Gaussian settings. The data-generating process can sometimes yield datasets of the following forms: linear and non-Gaussian; nonlinear and non-Gaussian; datasets with missing values; datasets tainted by selection bias; datasets whose variables form cycles; datasets with heterogeneous or nonstationary variables; datasets with confounding or latent variables; time-series datasets; deterministic datasets; etc. After the introduction, the study proper begins in section 2 with basic background on the concept of causality with observational data. Section 3 explicates the graph as an embodiment of background knowledge, together with the structural causal model (SCM). Section 4 sets out the basic assumptions employed, especially in common observational data settings, and presents a categorization of the algorithms used in causality. Section 5, the crux of the study, aggregates and expounds the causal identifiability techniques and their associated assumptions across the varying datasets; table 1 recapitulates them. The main contribution of this study is an aggregate review of causal techniques and their assumptions across different data settings, especially those of atypical origins, as such reviews are largely lacking in the extant literature.