Unpaired image-to-image translation refers to learning inter-image-domain mappings without corresponding image pairs. Existing methods learn deterministic mappings without explicitly modelling robustness to outliers or predictive uncertainty, leading to performance degradation when unseen perturbations are encountered at test time. To address this, we propose a novel probabilistic method based on Uncertainty-aware Generalized Adaptive Cycle Consistency (UGAC), which models the per-pixel residual with a generalized Gaussian distribution, capable of modelling heavy-tailed distributions. We compare our model with a wide variety of state-of-the-art methods on various challenging tasks, including unpaired translation of natural images using standard datasets spanning autonomous driving, maps, and facades, as well as the medical imaging domain consisting of MRI. Experimental results demonstrate that our method exhibits stronger robustness towards unseen perturbations in test data. Code is released here: https://github.com/ExplainableML/UncertaintyAwareCycleConsistency.
Long Summary
Translating an image from a distribution, i.e. source domain, to an image in another distribution, i.e.
target domain, with a distribution shift is an ill-posed problem as a unique deterministic one-to-one
mapping may not exist between the two domains. Furthermore, since the correspondence between
inter-domain samples may be missing, their joint distribution needs to be inferred from a set of
marginal distributions. However, since infinitely many joint distributions can decompose into the same
set of marginal distributions, the problem is ill-posed in the absence of additional constraints.
Image translation approaches often learn a deterministic mapping between
the domains where every pixel in the input domain is mapped to a fixed pixel value in the output
domain. However, such a deterministic formulation can lead to mode collapse and, at the same time,
cannot quantify the model's predictive uncertainty, which is important for critical applications, e.g.,
medical image analysis. We propose an unpaired probabilistic image-to-image
translation method trained without inter-domain correspondence in an end-to-end manner. The
probabilistic nature of this method provides uncertainty estimates for the predictions. Moreover,
modelling the residuals between the predictions and the ground-truth with heavy-tailed distributions
makes our model robust to outliers and to unseen perturbations in the data.
Notations
Let there be two image domains A and B. Let the set of images from domain A and B be defined by
(i) $S_A := \{a_1, a_2, \ldots, a_n\}$, where $a_i \sim \mathcal{P}_A \,\forall i$,
and
(ii) $S_B := \{b_1, b_2, \ldots, b_m\}$, where $b_i \sim \mathcal{P}_B \,\forall i$, respectively.
The elements $a_i$ and $b_i$ represent the $i^{th}$ image from domain A and B, respectively, and are drawn from the underlying unknown probability distributions $\mathcal{P}_A$ and $\mathcal{P}_B$, respectively.
Let each image have K pixels, and let $u_{ik}$ represent the $k^{th}$ pixel of a particular image $u_i$.
We are interested in learning the mappings from domain A to B ($A \to B$) and from B to A ($B \to A$) in an unpaired manner, so that correspondence between samples from $\mathcal{P}_A$ and $\mathcal{P}_B$ is not required at the learning stage.
In other words, we want to learn the underlying joint distribution $\mathcal{P}_{AB}$ from the given marginal distributions $\mathcal{P}_A$ and $\mathcal{P}_B$.
This work utilizes CycleGANs, which leverage cycle consistency to learn mappings in both directions ($A \to B$ and $B \to A$).
Cycle Consistency and its interpretation as Maximum Likelihood Estimation (MLE)
CycleGAN enforces an additional structure on the joint distribution using a set of primary networks (forming a GAN) and a set of auxiliary networks. The primary networks are represented by $\{G_A(\cdot;\theta_{G_A}), D_A(\cdot;\theta_{D_A})\}$, where $G_A$ represents a generator and $D_A$ represents a discriminator. The auxiliary networks are represented by $\{G_B(\cdot;\theta_{G_B}), D_B(\cdot;\theta_{D_B})\}$.
While the primary networks learn the mapping $A \to B$, the auxiliary networks learn $B \to A$.
Let the output of the generator $G_A$ translating samples from domain A (say $a_i$) to domain B be called $\hat{b}_i$. Similarly, let the output of the generator $G_B$ translating samples from domain B (say $b_i$) to domain A be called $\hat{a}_i$, i.e.,
$\hat{b}_i = G_A(a_i; \theta_{G_A})$ and $\hat{a}_i = G_B(b_i; \theta_{G_B})$.
To simplify the notation, we will omit writing parameters of the networks in the equation.
The cycle consistency constraint re-translates the above predictions $(\hat{b}_i, \hat{a}_i)$ to get back the reconstructions in the original domains $(\bar{a}_i, \bar{b}_i)$, where
$\bar{a}_i = G_B(\hat{b}_i)$ and $\bar{b}_i = G_A(\hat{a}_i)$,
and attempts to make the reconstructed images $(\bar{a}_i, \bar{b}_i)$ similar to the original inputs $(a_i, b_i)$ by penalizing the residuals between the reconstructions and the original input images with the L1 norm, giving the cycle consistency loss
$\mathcal{L}_{cyc} = \sum_{j=1}^{K} \left( |\bar{a}_{ij} - a_{ij}| + |\bar{b}_{ij} - b_{ij}| \right).$
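As a concrete illustration, a minimal PyTorch sketch of this cycle pass and its L1 penalty might look as follows (the generator modules G_A and G_B are placeholders, not the authors' released implementation):

```python
import torch
import torch.nn as nn

def l1_cycle_consistency(G_A: nn.Module, G_B: nn.Module,
                         a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """Standard CycleGAN cycle-consistency loss with an L1 penalty."""
    b_hat = G_A(a)       # translate A -> B
    a_hat = G_B(b)       # translate B -> A
    a_bar = G_B(b_hat)   # re-translate to reconstruct the domain-A input
    b_bar = G_A(a_hat)   # re-translate to reconstruct the domain-B input
    return (a_bar - a).abs().mean() + (b_bar - b).abs().mean()
```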
The underlying assumption when penalizing with the L1 norm is that the residual at every pixel between the reconstruction and the input follows a zero-mean, fixed-variance Laplace distribution, i.e.,
$\bar{a}_{ij} = a_{ij} + \epsilon^a_{ij}$ and $\bar{b}_{ij} = b_{ij} + \epsilon^b_{ij}$, with
$\epsilon^a_{ij}, \epsilon^b_{ij} \sim \text{Laplace}(\epsilon; 0, \sigma/\sqrt{2}) \equiv \frac{1}{\sqrt{2\sigma^2}} e^{-\sqrt{\frac{2}{\sigma^2}}|\epsilon - 0|},$
where $\sigma^2$ represents the fixed variance of the distribution, $a_{ij}$ represents the $j^{th}$ pixel of image $a_i$, and $\epsilon^a_{ij}$ represents the noise in the $j^{th}$ pixel of the reconstruction $\bar{a}_{ij}$.
This assumption on the residuals between the reconstruction and the input enforces the likelihood (i.e., $\mathcal{L}(\Theta|X) = P(X|\Theta)$, where $\Theta := \theta_{G_A} \cup \theta_{G_B} \cup \theta_{D_A} \cup \theta_{D_B}$ and $X := S_A \cup S_B$) to follow a factored Laplace distribution,
$\mathcal{L}(\Theta|X) \propto \prod_{i,j} e^{-\sqrt{\frac{2}{\sigma^2}}\,|\bar{a}_{ij} - a_{ij}|}\; e^{-\sqrt{\frac{2}{\sigma^2}}\,|\bar{b}_{ij} - b_{ij}|},$
where minimizing the negative log-likelihood yields $\mathcal{L}_{cyc}$.
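For completeness, the step from the factored Laplace likelihood to the L1 objective can be written out explicitly (a standard derivation, shown here for clarity): taking the negative logarithm and dropping terms that do not depend on the network parameters gives
$-\log \mathcal{L}(\Theta|X) = \sqrt{\tfrac{2}{\sigma^2}} \sum_{i,j} \left( |\bar{a}_{ij} - a_{ij}| + |\bar{b}_{ij} - b_{ij}| \right) + \text{const} \;\propto\; \mathcal{L}_{cyc}.$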
This formulation has two limitations. First, in the presence of outliers the residuals may not follow the Laplace distribution but instead a heavy-tailed distribution. Second, the i.i.d. assumption leads to a fixed-variance distribution for the residuals, which does not allow modelling heteroscedasticity to aid in uncertainty estimation.
Building Uncertainty-aware Cycle Consistency
We propose to alleviate the mentioned issues by modelling the underlying per-pixel residual distribution as an independent but non-identically distributed zero-mean generalized Gaussian distribution (GGD), i.e., with no fixed shape ($\beta > 0$) and scale ($\alpha > 0$) parameters. Instead, all the shape and scale parameters of the distributions are predicted by the networks, and the residuals are formulated as
$\epsilon_{ij} \sim \text{GGD}(\epsilon; 0, \bar{\alpha}_{ij}, \bar{\beta}_{ij}) \equiv \frac{\bar{\beta}_{ij}}{2\bar{\alpha}_{ij}\Gamma(1/\bar{\beta}_{ij})}\, e^{-\left(|\epsilon - 0|/\bar{\alpha}_{ij}\right)^{\bar{\beta}_{ij}}}.$
For each $\epsilon_{ij}$, the parameters of the distribution $\{\bar{\alpha}_{ij}, \bar{\beta}_{ij}\}$ may not be the same as the parameters for other $\epsilon_{ik}$'s; therefore, the residuals are non-identically distributed, which allows modelling with heavier-tailed distributions.
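As a quick numerical illustration of the heavier tails (not taken from the paper), SciPy's gennorm distribution implements this family; lowering the shape parameter β moves probability mass into the tails:

```python
from scipy.stats import gennorm

# Two-sided tail mass beyond |x| > 4 for the same scale but different shapes beta:
for beta in (2.0, 1.0, 0.5):          # Gaussian-like, Laplace, heavy-tailed
    tail = 2 * gennorm.sf(4.0, beta)  # survival function gives P(x > 4)
    print(f"beta={beta}: P(|x| > 4) = {tail:.2e}")
```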
The likelihood for our proposed model is
$\mathcal{L}(\Theta|X) = \prod_{i,j} G(\bar{\beta}^a_{ij}, \bar{\alpha}^a_{ij}, \bar{a}_{ij}, a_{ij}) \; G(\bar{\beta}^b_{ij}, \bar{\alpha}^b_{ij}, \bar{b}_{ij}, b_{ij}),$
where $\bar{\beta}^a_{ij}$ represents the $j^{th}$ pixel of domain A's shape-parameter map $\bar{\beta}^a_i$ (and similarly for the others). $G(\bar{\beta}^u_{ij}, \bar{\alpha}^u_{ij}, \bar{u}_{ij}, u_{ij})$ is the pixel-wise likelihood at the $j^{th}$ pixel of image $u_i$ (where $u$ can denote an image from either domain A or B), formulated as
$G(\bar{\beta}^u_{ij}, \bar{\alpha}^u_{ij}, \bar{u}_{ij}, u_{ij}) = \text{GGD}(u_{ij}; \bar{u}_{ij}, \bar{\alpha}^u_{ij}, \bar{\beta}^u_{ij}).$
Minimizing the negative log-likelihood yields a new cycle consistency loss, which we call the uncertainty-aware generalized adaptive cycle consistency loss $\mathcal{L}_{ucyc}$. Given $\mathcal{A} = \{\bar{a}_i, \bar{\alpha}^a_i, \bar{\beta}^a_i, a_i\}$ and $\mathcal{B} = \{\bar{b}_i, \bar{\alpha}^b_i, \bar{\beta}^b_i, b_i\}$,
$\mathcal{L}_{ucyc}(\mathcal{A}, \mathcal{B}) = \mathcal{L}_{\alpha\beta}(\mathcal{A}) + \mathcal{L}_{\alpha\beta}(\mathcal{B}),$
where $\mathcal{L}_{\alpha\beta}(\mathcal{A}) = \mathcal{L}_{\alpha\beta}(\bar{a}_i, \bar{\alpha}^a_i, \bar{\beta}^a_i, a_i)$ is the new objective function corresponding to domain A,
$\mathcal{L}_{\alpha\beta}(\mathcal{A}) = \sum_{j=1}^{K} \left( \frac{|\bar{a}_{ij} - a_{ij}|}{\bar{\alpha}^a_{ij}} \right)^{\bar{\beta}^a_{ij}} - \log \frac{\bar{\beta}^a_{ij}}{\bar{\alpha}^a_{ij}} + \log \Gamma\!\left( \frac{1}{\bar{\beta}^a_{ij}} \right),$
where $(\bar{a}_i, \bar{b}_i)$ are the reconstructions of $(a_i, b_i)$, and $(\bar{\alpha}^a_i, \bar{\beta}^a_i)$ and $(\bar{\alpha}^b_i, \bar{\beta}^b_i)$ are the scale and shape maps for the reconstructions $(\bar{a}_i, \bar{b}_i)$, respectively.
The L1 norm-based cycle consistency $\mathcal{L}_{cyc}$ is a special case of $\mathcal{L}_{ucyc}$ with
$(\bar{\alpha}^a_{ij}, \bar{\beta}^a_{ij}, \bar{\alpha}^b_{ij}, \bar{\beta}^b_{ij}) = (1, 1, 1, 1) \ \forall i, j.$
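The per-pixel loss above can be sketched in PyTorch as follows (an illustrative implementation assuming predicted per-pixel alpha and beta maps, not the authors' released code); setting alpha and beta to all-ones tensors recovers the mean L1 cycle consistency:

```python
import torch

def ggd_nll(recon: torch.Tensor, target: torch.Tensor,
            alpha: torch.Tensor, beta: torch.Tensor,
            eps: float = 1e-6) -> torch.Tensor:
    """Negative log-likelihood of the residuals under a zero-mean GGD
    with per-pixel scale (alpha) and shape (beta) maps; constants dropped."""
    alpha = alpha.clamp(min=eps)
    beta = beta.clamp(min=eps)
    residual = (recon - target).abs()
    nll = (residual / alpha) ** beta - torch.log(beta / alpha) \
          + torch.lgamma(1.0 / beta)
    return nll.mean()

def uncertainty_aware_cycle_loss(a_bar, alpha_a, beta_a, a,
                                 b_bar, alpha_b, beta_b, b):
    """L_ucyc = L_alphabeta(A) + L_alphabeta(B)."""
    return ggd_nll(a_bar, a, alpha_a, beta_a) + ggd_nll(b_bar, b, alpha_b, beta_b)
```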
To utilize $\mathcal{L}_{ucyc}$, one must have the α maps and the β maps for the reconstructions of the inputs.
To obtain the reconstructed image, the α (scale) map, and the β (shape) map, we modify the head of the generators (the last few convolutional layers) and split it into three heads connected to a common backbone.
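A minimal sketch of such a three-headed generator in PyTorch (the backbone and layer sizes here are placeholders, not the architecture used in the paper); softplus activations keep the predicted α and β maps positive:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ThreeHeadedGenerator(nn.Module):
    """Shared backbone with separate heads for the image, alpha, and beta maps."""
    def __init__(self, channels: int = 3, features: int = 64):
        super().__init__()
        self.backbone = nn.Sequential(  # placeholder backbone
            nn.Conv2d(channels, features, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(features, features, 3, padding=1), nn.ReLU(inplace=True),
        )
        self.image_head = nn.Conv2d(features, channels, 3, padding=1)
        self.alpha_head = nn.Conv2d(features, channels, 3, padding=1)
        self.beta_head = nn.Conv2d(features, channels, 3, padding=1)

    def forward(self, x: torch.Tensor):
        h = self.backbone(x)
        image = torch.tanh(self.image_head(h))   # translated/reconstructed image
        alpha = F.softplus(self.alpha_head(h))   # positive per-pixel scale map
        beta = F.softplus(self.beta_head(h))     # positive per-pixel shape map
        return image, alpha, beta
```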
Once we train the model, for every input image, the model will provide the scale (α) and the shape (β) maps that can be used to obtain the aleatoric uncertainty given by,
$\sigma^2_{aleatoric} = \frac{\alpha^2\,\Gamma(3/\beta)}{\Gamma(1/\beta)}.$
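A small PyTorch helper to evaluate this formula from the predicted maps (an illustrative sketch; torch.lgamma is used for numerical stability) might look like:

```python
import torch

def aleatoric_uncertainty(alpha: torch.Tensor, beta: torch.Tensor,
                          eps: float = 1e-6) -> torch.Tensor:
    """Per-pixel GGD variance: sigma^2 = alpha^2 * Gamma(3/beta) / Gamma(1/beta)."""
    beta = beta.clamp(min=eps)
    log_ratio = torch.lgamma(3.0 / beta) - torch.lgamma(1.0 / beta)
    return alpha ** 2 * torch.exp(log_ratio)
```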
To see the resulting uncertainty maps along with our perturbation analysis of the trained model please check Section 4 of the paper.