Multimodal Learning: Improved Representation Learning in Multimodal VAEs
Date:
- Designed a multimodal VAE that adds a soft constraint to a data-dependent mixture-of-experts prior, inspired by VampPrior and comparable to contrastive learning while remaining within a generative modeling framework
- Improved the quality of latent representations and conditional generation while preserving the information in the original uncompressed features, by guiding each modality toward a shared latent space
- Validated performance on benchmark datasets and a neuroscience case study, demonstrating gains over conventional methods in downstream-task accuracy (an increase of over 25 percentage points) and in data imputation
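
The idea above can be sketched in code. The following is a minimal, hypothetical PyTorch illustration, not the actual implementation: it pairs a VampPrior-style data-dependent mixture prior (built from learnable pseudo-inputs) with a soft alignment term (here, a symmetric KL between the two modality posteriors) that nudges modalities toward a shared latent space. All class names, architectures, and hyperparameters are illustrative assumptions.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F


class ModalityEncoder(nn.Module):
    """Maps one modality to the mean/log-variance of a Gaussian posterior."""

    def __init__(self, in_dim, latent_dim):
        super().__init__()
        self.net = nn.Linear(in_dim, 2 * latent_dim)

    def forward(self, x):
        mu, logvar = self.net(x).chunk(2, dim=-1)
        return mu, logvar


def gaussian_kl(mu_q, logvar_q, mu_p, logvar_p):
    # KL(N(mu_q, var_q) || N(mu_p, var_p)), summed over latent dims.
    return 0.5 * (
        logvar_p - logvar_q
        + (logvar_q.exp() + (mu_q - mu_p) ** 2) / logvar_p.exp()
        - 1.0
    ).sum(-1)


class MoEPriorVAE(nn.Module):
    """Two-modality VAE with a data-dependent mixture-of-experts prior."""

    def __init__(self, dims=(16, 24), latent_dim=8, n_pseudo=10):
        super().__init__()
        self.encoders = nn.ModuleList(ModalityEncoder(d, latent_dim) for d in dims)
        self.decoders = nn.ModuleList(nn.Linear(latent_dim, d) for d in dims)
        # VampPrior-style learnable pseudo-inputs, one set per modality.
        self.pseudo = nn.ParameterList(
            nn.Parameter(torch.randn(n_pseudo, d)) for d in dims
        )

    def prior_log_prob(self, z):
        # log p(z) under a uniform mixture of the posteriors evaluated
        # at the pseudo-inputs (aggregated across modalities).
        mus, logvars = [], []
        for enc, u in zip(self.encoders, self.pseudo):
            mu, lv = enc(u)
            mus.append(mu)
            logvars.append(lv)
        mus, logvars = torch.cat(mus), torch.cat(logvars)   # (2K, L)
        z_ = z.unsqueeze(1)                                 # (B, 1, L)
        log_comp = (
            -0.5 * (logvars + (z_ - mus) ** 2 / logvars.exp() + math.log(2 * math.pi))
        ).sum(-1)                                           # (B, 2K)
        return torch.logsumexp(log_comp, dim=1) - math.log(mus.shape[0])

    def forward(self, xs, align_weight=1.0):
        stats = [enc(x) for enc, x in zip(self.encoders, xs)]
        # Soft constraint: symmetric KL pulls the per-modality posteriors
        # toward each other, i.e. toward a shared latent space.
        (mu0, lv0), (mu1, lv1) = stats
        align = 0.5 * (
            gaussian_kl(mu0, lv0, mu1, lv1) + gaussian_kl(mu1, lv1, mu0, lv0)
        ).mean()
        loss = align_weight * align
        for (mu, lv), dec, x in zip(stats, self.decoders, xs):
            z = mu + torch.randn_like(mu) * (0.5 * lv).exp()  # reparameterize
            loss = loss + F.mse_loss(dec(z), x)               # reconstruction
            # Single-sample Monte-Carlo KL against the mixture prior.
            log_q = (
                -0.5 * (lv + (z - mu) ** 2 / lv.exp() + math.log(2 * math.pi))
            ).sum(-1)
            loss = loss + (log_q - self.prior_log_prob(z)).mean()
        return loss
```

In this sketch the mixture prior is shaped by the data (through the learned pseudo-inputs), while the symmetric-KL term plays a role analogous to a contrastive alignment objective but stays entirely within the generative (ELBO-based) framework.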