ORCID
- Amir Aly: 0000-0001-5169-0679
- Thomas Wennekers: 0000-0002-2917-8895
Abstract
Word grounding tasks aim to associate individual words with corresponding elements in visual scenes, enabling machines to link language with perception for effective human–machine interaction. However, existing grounding models struggle to generalize to synonyms or unseen lexical variants, limiting their performance in open-domain scenarios. In this paper, we present a Bayesian multimodal grounding model that incorporates word embeddings as priors within a probabilistic generative process to improve robustness under lexical variation. We compare the effects of static FastText and contextual BERT embeddings on grounding accuracy by conditioning word–visual associations on their semantic representations. Experiments use CLEVR-generated 3D scenes paired with structured compositional descriptions to test the grounding of object categories, colors, and spatial relations across lexical shifts. Results show that contextual embeddings such as BERT consistently outperform static embeddings like FastText in overall grounding accuracy and in resolving spatial relations. We demonstrate that integrating structured probabilistic inference with rich semantic embeddings offers a principled and scalable solution for robust, interpretable word grounding.
Publication Date
2025-11-17
Event
22nd Pacific Rim International Conference on Artificial Intelligence (PRICAI 2025)
Publication Title
PRICAI Conference 2025
Publisher
Springer
Acceptance Date
2025-08-22
Deposit Date
2025-09-14
Recommended Citation
Shaukat, S., Aly, A., Wennekers, T., & Cangelosi, A. (2025) 'Evaluating Semantic Representations in Multimodal Word Grounding', PRICAI Conference 2025. Springer. Retrieved from https://pearl.plymouth.ac.uk/secam-research/2180
