Multimodal Learning: Improved Representation Learning in Multimodal VAEs
Date:
- Designed a multimodal VAE that adds a soft constraint to a data-dependent mixture-of-experts prior, inspired by VampPrior and comparable to contrastive learning while remaining within a generative modeling framework
- Improved the quality of latent representations and conditional generation while preserving the information in the original uncompressed features, by guiding each modality toward a shared latent space
- Validated performance on benchmark datasets and a neuroscience case study, demonstrating gains over conventional methods in downstream-task accuracy (an increase of over 25 percentage points) and in data imputation
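
The idea above can be sketched in code. The following is a minimal, hypothetical PyTorch illustration, not the actual implementation: it pairs a VampPrior-style data-dependent mixture prior (built from learnable pseudo-inputs) with a soft alignment term (here, a symmetric KL between the two modality posteriors) that nudges modalities toward a shared latent space. All class names, architectures, and hyperparameters are illustrative assumptions.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F


class ModalityEncoder(nn.Module):
    """Maps one modality to the mean/log-variance of a Gaussian posterior."""

    def __init__(self, in_dim, latent_dim):
        super().__init__()
        self.net = nn.Linear(in_dim, 2 * latent_dim)

    def forward(self, x):
        mu, logvar = self.net(x).chunk(2, dim=-1)
        return mu, logvar


def gaussian_kl(mu_q, logvar_q, mu_p, logvar_p):
    # KL(N(mu_q, var_q) || N(mu_p, var_p)), summed over latent dims.
    return 0.5 * (
        logvar_p - logvar_q
        + (logvar_q.exp() + (mu_q - mu_p) ** 2) / logvar_p.exp()
        - 1.0
    ).sum(-1)


class MoEPriorVAE(nn.Module):
    """Two-modality VAE with a data-dependent mixture-of-experts prior."""

    def __init__(self, dims=(16, 24), latent_dim=8, n_pseudo=10):
        super().__init__()
        self.encoders = nn.ModuleList(ModalityEncoder(d, latent_dim) for d in dims)
        self.decoders = nn.ModuleList(nn.Linear(latent_dim, d) for d in dims)
        # VampPrior-style learnable pseudo-inputs, one set per modality.
        self.pseudo = nn.ParameterList(
            nn.Parameter(torch.randn(n_pseudo, d)) for d in dims
        )

    def prior_log_prob(self, z):
        # log p(z) under a uniform mixture of the posteriors evaluated
        # at the pseudo-inputs (aggregated across modalities).
        mus, logvars = [], []
        for enc, u in zip(self.encoders, self.pseudo):
            mu, lv = enc(u)
            mus.append(mu)
            logvars.append(lv)
        mus, logvars = torch.cat(mus), torch.cat(logvars)   # (2K, L)
        z_ = z.unsqueeze(1)                                 # (B, 1, L)
        log_comp = (
            -0.5 * (logvars + (z_ - mus) ** 2 / logvars.exp() + math.log(2 * math.pi))
        ).sum(-1)                                           # (B, 2K)
        return torch.logsumexp(log_comp, dim=1) - math.log(mus.shape[0])

    def forward(self, xs, align_weight=1.0):
        stats = [enc(x) for enc, x in zip(self.encoders, xs)]
        # Soft constraint: symmetric KL pulls the per-modality posteriors
        # toward each other, i.e. toward a shared latent space.
        (mu0, lv0), (mu1, lv1) = stats
        align = 0.5 * (
            gaussian_kl(mu0, lv0, mu1, lv1) + gaussian_kl(mu1, lv1, mu0, lv0)
        ).mean()
        loss = align_weight * align
        for (mu, lv), dec, x in zip(stats, self.decoders, xs):
            z = mu + torch.randn_like(mu) * (0.5 * lv).exp()  # reparameterize
            loss = loss + F.mse_loss(dec(z), x)               # reconstruction
            # Single-sample Monte-Carlo KL against the mixture prior.
            log_q = (
                -0.5 * (lv + (z - mu) ** 2 / lv.exp() + math.log(2 * math.pi))
            ).sum(-1)
            loss = loss + (log_q - self.prior_log_prob(z)).mean()
        return loss
```

In this sketch the mixture prior is shaped by the data (through the learned pseudo-inputs), while the symmetric-KL term plays a role analogous to a contrastive alignment objective but stays entirely within the generative (ELBO-based) framework.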