Multimodal Learning: Improved Representation Learning in Multimodal VAEs

Date:

  • Designed a multimodal VAE that adds a soft alignment constraint to a data-dependent mixture-of-experts prior, inspired by VampPrior; the constraint plays a role comparable to contrastive learning while remaining within a generative modeling framework (see the sketch after this list)
  • Improved the quality of latent representations and cross-modal conditional generation by guiding the modality-specific posteriors toward a shared latent space, while preserving the information content of the original, uncompressed features
  • Validated the approach on benchmark datasets and a neuroscience case study, showing gains of over 25 percentage points in downstream task accuracy and improved data imputation over conventional methods
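
As a rough illustration of the first bullet, below is a minimal PyTorch sketch of a two-modality VAE whose prior is an equal-weight mixture built from the unimodal posteriors (a data-dependent, VampPrior-style prior), combined with a soft symmetric-KL alignment penalty. All names (Encoder, elbo_with_moe_prior, align_weight), the architecture sizes, the mixture weights, and the loss weighting are illustrative assumptions, not the project's actual implementation.

```python
# Hypothetical sketch: two-modality VAE with a data-dependent
# mixture-of-experts prior and a soft alignment penalty.
# All names, dimensions, and weightings are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

LATENT = 16

class Encoder(nn.Module):
    def __init__(self, in_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU())
        self.mu = nn.Linear(128, LATENT)
        self.logvar = nn.Linear(128, LATENT)

    def forward(self, x):
        h = self.net(x)
        return self.mu(h), self.logvar(h)

class Decoder(nn.Module):
    def __init__(self, out_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(LATENT, 128), nn.ReLU(),
                                 nn.Linear(128, out_dim))

    def forward(self, z):
        return self.net(z)

def gaussian_kl(mu_q, lv_q, mu_p, lv_p):
    # KL( N(mu_q, e^lv_q) || N(mu_p, e^lv_p) ), summed over latent dims.
    return 0.5 * ((lv_p - lv_q)
                  + (lv_q.exp() + (mu_q - mu_p) ** 2) / lv_p.exp()
                  - 1).sum(-1)

def elbo_with_moe_prior(enc_a, enc_b, dec_a, dec_b, xa, xb, align_weight=1.0):
    mu_a, lv_a = enc_a(xa)
    mu_b, lv_b = enc_b(xb)

    # Data-dependent mixture-of-experts prior: an equal mixture of the
    # two unimodal posteriors. KL to a mixture has no closed form, so we
    # use the standard convexity upper bound: the average of the KLs to
    # each mixture component (components detached to act as a prior).
    kl_a = 0.5 * (gaussian_kl(mu_a, lv_a, mu_a.detach(), lv_a.detach())
                  + gaussian_kl(mu_a, lv_a, mu_b.detach(), lv_b.detach()))
    kl_b = 0.5 * (gaussian_kl(mu_b, lv_b, mu_a.detach(), lv_a.detach())
                  + gaussian_kl(mu_b, lv_b, mu_b.detach(), lv_b.detach()))

    # Reparameterized samples and per-modality reconstructions.
    za = mu_a + (0.5 * lv_a).exp() * torch.randn_like(mu_a)
    zb = mu_b + (0.5 * lv_b).exp() * torch.randn_like(mu_b)
    rec = F.mse_loss(dec_a(za), xa) + F.mse_loss(dec_b(zb), xb)

    # Soft alignment constraint: symmetric KL that pulls the unimodal
    # posteriors toward a shared region of latent space without forcing
    # them to coincide.
    align = 0.5 * (gaussian_kl(mu_a, lv_a, mu_b, lv_b)
                   + gaussian_kl(mu_b, lv_b, mu_a, lv_a)).mean()

    return rec + (kl_a + kl_b).mean() + align_weight * align

if __name__ == "__main__":
    enc_a, enc_b = Encoder(32), Encoder(64)
    dec_a, dec_b = Decoder(32), Decoder(64)
    xa, xb = torch.randn(8, 32), torch.randn(8, 64)
    loss = elbo_with_moe_prior(enc_a, enc_b, dec_a, dec_b, xa, xb)
    loss.backward()
```

The symmetric-KL term is one plausible reading of the "soft constraint": unlike hard posterior sharing, it nudges the modalities toward a common latent region while each posterior keeps modality-specific structure.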