Counterfactual Analysis on DHI data: A need for Causal Perspective for Risk Assessment

Collaborators: Prithwijit Chowdhury, Ahmad Mustafa, Mohit Prabhushankar, and Ghassan AlRegib

Goal/Motivation: Causal analysis of a dataset lets us isolate the effect that a change in a chosen feature set has on the outcome, separate from the influence of the other features present. This leads to better explainability [1], [2] and ultimately to better decision making.

Challenges: In a high-dimensional dataset for hydrocarbon detection, measuring the association between a feature set and the outcome directly from the performance metrics of a neural network (NN) trained on observational data may not be a suitable approach, because the feature sets may themselves be correlated in unknown or unobserved ways. These correlations can introduce unaccounted influences into the calculated association metrics.

High Level Description of the Work: 

Confidence score:

For ML classification tasks that use counterfactuals in high-dimensional datasets, we need to ensure that each generated counterfactual is a valid data point lying close to the observed data manifold. For this we need a confidence score: a form of distance metric that tells us how much of an outlier a newly generated counterfactual is.
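One way such a score could be computed is sketched below. This is an illustration under assumptions, not the method used in this work: it treats the mean distance to the k nearest observed points as a proxy for distance to the data manifold, and reports the fraction of observed points that are at least as isolated as the candidate. The names `knn_distance` and `confidence_score` and the choice `k=3` are hypothetical.

```python
import math


def knn_distance(point, data, k=3):
    """Mean Euclidean distance from `point` to its k nearest rows in `data`."""
    dists = sorted(math.dist(point, row) for row in data)
    return sum(dists[:k]) / k


def confidence_score(candidate, data, k=3):
    """Fraction of observed points at least as isolated as the candidate.

    Assumption: k-NN distance stands in for distance to the data manifold.
    A score near 1 means the candidate sits in a dense region of the data;
    a score near 0 means it is farther out than almost every observed point.
    """
    d_candidate = knn_distance(candidate, data, k)
    reference = []
    for i, row in enumerate(data):
        # Leave the point itself out so its own zero distance is not counted.
        others = data[:i] + data[i + 1:]
        reference.append(knn_distance(row, others, k))
    return sum(1 for d in reference if d >= d_candidate) / len(reference)


# Toy observational data: a 5 x 5 grid of points.
observed = [(float(x), float(y)) for x in range(5) for y in range(5)]

inside = confidence_score((2.0, 2.5), observed)     # lies within the grid
outside = confidence_score((10.0, 10.0), observed)  # far from every point
```

A counterfactual whose score falls below a chosen threshold could then be rejected or regenerated; the threshold itself would need to be tuned per dataset.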

Uncertainty Metric:

Prediction uncertainty refers to the variability in the output y that results from uncertainty in the input, and is characterized by the prediction probability distribution f_y. The model structure m(.) is assumed known and fixed, which is the usual case, so the probability distribution is conditioned on m(.).
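The idea can be illustrated with a small Monte Carlo sketch: sample inputs from an assumed input distribution, push each sample through the fixed model m(.), and summarize the resulting samples of y as an empirical stand-in for f_y. Everything specific here is an assumption made for illustration: the logistic model `m`, its coefficients, and the independent Gaussian input noise are hypothetical and not taken from this work.

```python
import math
import random
import statistics


def m(x):
    """Hypothetical fixed model: a logistic function of two features."""
    z = 1.5 * x[0] - 0.8 * x[1]
    return 1.0 / (1.0 + math.exp(-z))


def prediction_uncertainty(model, x_mean, x_std, n_samples=2000, seed=0):
    """Estimate the mean and spread of y = model(x) under input uncertainty.

    Assumption: each input feature is independent and Gaussian with the
    given mean and standard deviation. The model itself stays fixed, so
    the result is conditioned on the model.
    """
    rng = random.Random(seed)
    ys = []
    for _ in range(n_samples):
        x = [rng.gauss(mu, sd) for mu, sd in zip(x_mean, x_std)]
        ys.append(model(x))
    return statistics.fmean(ys), statistics.stdev(ys)


x_mean = [0.2, 0.1]
_, narrow = prediction_uncertainty(m, x_mean, [0.1, 0.1])
_, wide = prediction_uncertainty(m, x_mean, [0.5, 0.5])
exact_mean, exact_spread = prediction_uncertainty(m, x_mean, [0.0, 0.0])
```

With zero input uncertainty the spread collapses to zero and the mean equals m(x_mean); larger input uncertainty produces a wider distribution of y.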

Beyond Correlation: Towards Causality [3]

We propose analyzing feature data for prospect risk assessment from a causal attribution perspective. In the interventionist definition of causality, we say that an event A causes another event B if we observe a difference in B’s value after changing A, keeping everything else constant.
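The difference between association and the interventionist definition can be seen in a toy simulation. The sketch below is an assumption-laden illustration and does not use the DoWhy API [3]: it defines a hypothetical structural model in which an unobserved variable Z drives both A and B, and in which the true effect of A on B is 2. Regressing B on A in observational data overstates the effect, while setting A by intervention with everything else held fixed recovers it.

```python
import random
import statistics


def simulate(n, do_a=None, seed=0):
    """Draw (A, B) pairs from a hypothetical structural model.

    Z ~ N(0, 1) is a confounder, A = Z + noise, B = 2*A + 3*Z + noise.
    If `do_a` is given, A is set to that value (an intervention) while
    Z and the noise terms keep the same draws for a given seed.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = rng.gauss(0, 1)
        a_natural = z + rng.gauss(0, 1)  # always drawn to keep streams aligned
        a = a_natural if do_a is None else do_a
        b = 2.0 * a + 3.0 * z + rng.gauss(0, 1)
        rows.append((a, b))
    return rows


def slope(rows):
    """Least-squares slope of B on A: the naive association measure."""
    a_mean = statistics.fmean(a for a, _ in rows)
    b_mean = statistics.fmean(b for _, b in rows)
    cov = sum((a - a_mean) * (b - b_mean) for a, b in rows)
    var = sum((a - a_mean) ** 2 for a, _ in rows)
    return cov / var


def interventional_effect(n, a0=0.0, a1=1.0, seed=0):
    """Change in mean B per unit change in A when A is set by intervention."""
    b0 = statistics.fmean(b for _, b in simulate(n, do_a=a0, seed=seed))
    b1 = statistics.fmean(b for _, b in simulate(n, do_a=a1, seed=seed))
    return (b1 - b0) / (a1 - a0)


naive = slope(simulate(20000))         # inflated by the confounder Z
causal = interventional_effect(20000)  # close to the true effect of 2
```

In this model the naive slope is 3.5 in expectation, since cov(A, Z) / var(A) = 1/2 adds 3 × 1/2 to the true coefficient. On real data the confounders are not known in advance, which is the difficulty described under Challenges.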

References:

  1. A. Mustafa and G. AlRegib, "Explainable Machine Learning for Hydrocarbon Prospect Risking," in International Meeting for Applied Geoscience & Energy (IMAGE), Houston, TX, Aug. 28-Sept. 1, 2022.

  2. G. AlRegib and M. Prabhushankar, "Explanatory Paradigms in Neural Networks," arXiv preprint arXiv:2202.11838, 2022.

  3. A. Sharma and E. Kıcıman, "DoWhy: An End-to-End Library for Causal Inference," arXiv preprint arXiv:2011.04216, 2020.