Practical Active Learning for Seismic Interpretation

 Collaborators: Ahmad Mustafa and Ghassan AlRegib


Goal/Motivation: To (i) devise a novel application of the popular machine learning framework of active learning (AL) to seismic interpretation, and (ii) rethink the typical AL formulation to address the unique challenges it faces in the seismic domain.

Challenges: While there is a plethora of literature on active learning applied to natural-image classification, few works deal with AL for dense prediction tasks like image segmentation. Another issue that makes it challenging to translate AL strategies to this setting is the absence of image-level labels, which form a core component of much AL work in traditional image classification.

High Level Description of the Work: We propose a novel active learning methodology for seismic interpretation based on learning reconstruction manifolds with deep autoencoders. Autoencoders are a family of models trained to reconstruct their inputs. They are designed so that they can only reconstruct data sampled from the training distribution, which prevents them from regressing to a trivial identity mapping. As a useful by-product, they learn the manifold structure of high-dimensional data [1, 2]. Kwon et al. [3] exploit such a learned manifold for anomaly detection on image datasets by thresholding the distribution of reconstruction-error-based scores on input examples.
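To make the manifold idea concrete, the sketch below uses the fact that a linear autoencoder is equivalent to PCA: it fits a low-rank reconstruction on "normal" data and scores inputs by reconstruction error, so off-manifold samples score high. This is only an illustrative toy, not the deep architecture used in the work.

```python
import numpy as np

rng = np.random.default_rng(0)

# "Training" data lying near a 2-D subspace of a 10-D ambient space.
basis = rng.normal(size=(2, 10))
train = rng.normal(size=(500, 2)) @ basis + 0.01 * rng.normal(size=(500, 10))

# A linear autoencoder with a rank-2 bottleneck is equivalent to PCA:
# keep the top-2 principal components as encoder/decoder weights.
mean = train.mean(axis=0)
_, _, vt = np.linalg.svd(train - mean, full_matrices=False)
components = vt[:2]

def reconstruction_error(x):
    """Squared error between x and its projection onto the learned manifold."""
    centered = x - mean
    recon = centered @ components.T @ components
    return np.sum((centered - recon) ** 2, axis=-1)

# In-distribution samples reconstruct well; an off-manifold sample does not,
# so thresholding this score separates anomalies from training-like data.
in_dist = rng.normal(size=(1, 2)) @ basis
off_manifold = 5.0 * rng.normal(size=(1, 10))
print(reconstruction_error(in_dist), reconstruction_error(off_manifold))
```

In the actual work the linear projection is replaced by a deep convolutional autoencoder, but the scoring principle is the same.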

We view active learning as the challenge of identifying new training samples that are most dissimilar from the manifold learned from preexisting training samples. These are also the samples likely to add the most information about the dataset that the model does not already possess. There is a caveat, however: we are primarily interested in supervised, discriminative tasks like classification and segmentation, and in practice we would not have ground-truth labels for these tasks ahead of time. We can, however, decide which training samples are informative based on the reconstruction manifolds learned via deep autoencoders. To strengthen the link between the manifold learned for reconstruction and supervised tasks like segmentation, we present a network architecture that learns representations for the two tasks simultaneously in a joint learning framework. This provides a stronger guarantee that training samples identified as informative for reconstruction are also informative for the supervised task. We show later that this indeed turns out to be the case.
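A minimal sketch of such a joint objective is below: per-pixel segmentation cross-entropy plus reconstruction MSE on the same inputs, so both tasks shape the shared representation. The function name, toy shapes, and the weighting `lam` are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def joint_loss(seg_logits, seg_labels, recon, inputs, lam=1.0):
    """Combine per-pixel segmentation cross-entropy with reconstruction MSE,
    so one shared representation is trained for both tasks."""
    # Numerically stable softmax cross-entropy over the class axis (last axis).
    shifted = seg_logits - seg_logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    ce = -np.take_along_axis(log_probs, seg_labels[..., None], axis=-1).mean()
    # Pixel-wise reconstruction error against the input section.
    mse = np.mean((recon - inputs) ** 2)
    return ce + lam * mse

# Toy shapes: a batch of 4 "sections" of 8x8 pixels, 3 facies classes.
rng = np.random.default_rng(1)
logits = rng.normal(size=(4, 8, 8, 3))
labels = rng.integers(0, 3, size=(4, 8, 8))
x = rng.normal(size=(4, 8, 8))
recon = x + 0.1 * rng.normal(size=(4, 8, 8))
loss = joint_loss(logits, labels, recon, x, lam=0.5)
```

Because the two loss terms backpropagate into the same encoder, sections that reconstruct poorly tend to be sections the segmentation head also handles poorly, which is the link the acquisition strategy relies on.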

We train an encoder-decoder architecture simultaneously for reconstruction and seismic facies segmentation, sharing the same feature representations during the training phase. In the inference phase, all unlabeled seismic sections/images are scanned; the one with the highest reconstruction error is sampled, labeled, and added to the training dataset for retraining the network in the next cycle (see figure above). The underlying assumption, based on the shared-representation learning framework, is that the seismic sections with the highest reconstruction errors are also the ones on which the network would have performed most poorly at segmentation. Identifying such training examples leads to improved generalization over the whole seismic volume compared to sampling the same number of sections randomly or in some other arbitrary fashion. We verify this hypothesis by comparing the proposed method to a baseline data-sampling technique in the seismic interpretation domain, as demonstrated in the figure below.
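The acquisition step of this cycle can be sketched as follows. The helper names are hypothetical, and the training/labeling steps between cycles are elided; the sketch only shows how the highest-error section moves from the unlabeled pool to the labeled set.

```python
import numpy as np

def select_most_informative(unlabeled_errors):
    """Index of the candidate with the highest reconstruction error,
    i.e. the section farthest from the learned manifold."""
    return int(np.argmax(unlabeled_errors))

def active_learning_cycle(labeled_idx, unlabeled_idx, errors):
    """One acquisition cycle: move the worst-reconstructed section from the
    unlabeled pool to the labeled set (network retraining happens between
    cycles and is omitted here)."""
    pick = unlabeled_idx[select_most_informative(errors[unlabeled_idx])]
    labeled_idx = labeled_idx + [pick]
    unlabeled_idx = [i for i in unlabeled_idx if i != pick]
    return labeled_idx, unlabeled_idx

# Toy example: 6 sections; section 4 reconstructs worst, so it is acquired.
errors = np.array([0.1, 0.2, 0.15, 0.3, 0.9, 0.05])
labeled, unlabeled = active_learning_cycle([0], [1, 2, 3, 4, 5], errors)
```

Repeating this cycle for a fixed labeling budget yields the comparison against the baseline sampling strategy described above.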

References

  1. Martinez-Murcia, Francisco J., et al. "Studying the manifold structure of Alzheimer's disease: a deep learning approach using convolutional autoencoders." IEEE Journal of Biomedical and Health Informatics 24.1 (2019): 17-26.

  2. Mustafa, Ahmad, and Ghassan AlRegib. "Man-recon: manifold learning for reconstruction with deep autoencoder for smart seismic interpretation." 2021 IEEE International Conference on Image Processing (ICIP). IEEE, 2021. [PDF]

  3. Kwon, Gukyeong, et al. "Backpropagated gradient representations for anomaly detection." European Conference on Computer Vision. Springer, Cham, 2020. [PDF] [Code]

  4. Alaudah, Yazeed, et al. "A machine-learning benchmark for facies classification." Interpretation 7.3 (2019): SE175-SE187. [PDF][Code]