| Issue | Int. J. Simul. Multidisci. Des. Optim., Volume 16, 2025: Multi-modal Information Learning and Analytics on Cross-Media Data Integration |
|---|---|
| Article Number | 12 |
| Number of page(s) | 16 |
| DOI | https://doi.org/10.1051/smdo/2025016 |
| Published online | 17 September 2025 |
Research Article
Modeling analysis of perceived differences in product appearance design based on visual communication
J. Liu*
Academy of Art and Design, Jingdezhen Ceramic University, Jingdezhen 333000, Jiangxi, China
* e-mail: 041014@jcu.edu.cn
Received: 16 June 2025
Accepted: 18 August 2025
To address the reliance on subjective evaluation and the lack of a sound indicator system in modeling perceived differences in product appearance design, this study proposes a quantitative analysis method based on an improved Contrastive Language-Image Pre-training (CLIP) model. Within the theoretical framework of visual semiotics, the visual communication mechanism is analyzed along three dimensions: the morphological layer (the physical structure of color and form), the semantic layer (the symbolic meaning carried by color symbolism and morphological metaphor), and the pragmatic layer (user cognitive logic and cultural background). First, a channel attention mechanism is introduced to construct a joint embedding space of color, form, and semantics: geometric features are extracted according to morphological grammar, color-texture combinations with strong emotional relevance are screened on the basis of design semantics theory, and the key design elements are deconstructed at a theoretical level to strengthen the alignment of visual and textual features. Second, the CLIP model is improved to extract fine-grained visual-text embeddings, and a 12-dimensional perceptual difference matrix is generated from their cosine similarities. A dual-path feature interaction module is then designed: the visual path uses Grad-CAM++ to locate key morphological regions, while the semantic path improves text style transfer through adversarial training, with a GRU performing dynamic cross-modal fusion. Gradient-weighted class activation mapping is further used to visualize the model's attention regions, and the contribution weights of color, texture, shape, and other factors to perceived difference are quantified through Shapley value decomposition. Experiments show that the method reduces cross-cultural perceptual prediction error, achieves a top-5 cross-modal matching accuracy of 0.85, and lowers the median MSE of perceptual difference quantification to 0.13, effectively addressing the lack of objective quantification and of a systematic indicator framework.
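As a rough illustration of the joint embedding step, the PyTorch sketch below pairs a squeeze-and-excitation style channel attention block with projection heads and cosine similarity. All module names, feature dimensions, and the use of 12 semantic descriptors are illustrative assumptions; the abstract does not specify the paper's actual improved-CLIP architecture.

```python
# A minimal sketch, assuming pooled 512-d visual/text features (hypothetical).
import torch
import torch.nn as nn
import torch.nn.functional as F

class ChannelAttention(nn.Module):
    """Squeeze-and-excitation style gating, standing in for the channel
    attention mechanism described in the abstract."""
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels) pooled visual features; reweight each channel.
        return x * self.fc(x)

class JointEmbedding(nn.Module):
    """Projects visual and text features into a shared space and scores
    their alignment by cosine similarity."""
    def __init__(self, vis_dim: int = 512, txt_dim: int = 512, embed_dim: int = 256):
        super().__init__()
        self.attn = ChannelAttention(vis_dim)
        self.vis_proj = nn.Linear(vis_dim, embed_dim)
        self.txt_proj = nn.Linear(txt_dim, embed_dim)

    def forward(self, vis_feats: torch.Tensor, txt_feats: torch.Tensor) -> torch.Tensor:
        v = F.normalize(self.vis_proj(self.attn(vis_feats)), dim=-1)
        t = F.normalize(self.txt_proj(txt_feats), dim=-1)
        # Cosine-similarity matrix; with 12 semantic descriptors each image
        # gets a 12-dimensional perceptual profile.
        return v @ t.T

# Hypothetical usage: 8 product images scored against 12 semantic descriptors.
model = JointEmbedding()
sim = model(torch.randn(8, 512), torch.randn(12, 512))
print(sim.shape)  # torch.Size([8, 12])
```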
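The Shapley decomposition step can be sketched in the same spirit. The value function below is a toy additive stand-in for the paper's perceptual-difference predictor, which the abstract does not give; with only three factors (color, texture, shape), the exact Shapley formula is cheap to evaluate.

```python
# A minimal sketch of exact Shapley-value decomposition over design factors.
from itertools import combinations
from math import factorial

factors = ["color", "texture", "shape"]

def v(coalition: frozenset) -> float:
    # Hypothetical perceived-difference score for a subset of factors.
    scores = {"color": 0.40, "texture": 0.15, "shape": 0.25}
    return sum(scores[f] for f in coalition)  # additive toy model

def shapley(factor: str) -> float:
    n = len(factors)
    others = [f for f in factors if f != factor]
    total = 0.0
    for k in range(len(others) + 1):
        for subset in combinations(others, k):
            s = frozenset(subset)
            # Weight of each coalition times the factor's marginal contribution.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            total += weight * (v(s | {factor}) - v(s))
    return total

for f in factors:
    print(f, round(shapley(f), 3))  # contribution weight of each factor
```

In the additive toy model each factor's Shapley value reduces to its own score; with a real (non-additive) predictor the decomposition would apportion interaction effects across factors.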
Key words: Product appearance design / cross-modal feature fusion / perceptual difference quantification / channel attention mechanism / dynamic temperature coefficient optimization
© J. Liu, Published by EDP Sciences, 2025
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
