Multimodal Large Language Models and Frame Analysis: Theory, Methods, and Practice
Cheng Xiaoxiao (程萧潇), Du Yiqi (杜依淇)
Abstract:
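In recent years, the rapid development of multimodal large language models (MLLMs) has brought major breakthroughs to artificial intelligence, and their strong capabilities in cross-modal understanding and generation have drawn wide attention. As visual communication becomes dominant and media reality becomes increasingly multimodal, traditional frame analysis methods built for a single modality face many challenges when applied to multimodal content. This study systematically traces how framing theory has evolved from a unimodal to a multimodal perspective and examines in depth the limitations of existing multimodal frame analysis methods. On this basis, it proposes a frame analysis approach built on multimodal large language models. The approach follows a logic of deconstruction, recombination, and extraction applied to multimodal frame elements. It integrates the analysis of text, of images, and of cross-modal image-text interaction, and makes full use of the technical strengths of MLLMs. An empirical study of multimodal news frames on climate change demonstrates the effectiveness of the proposed approach. The findings show that MLLMs perform well in identifying and reasoning about frame elements across different modalities. To some extent, this study advances methodological innovation in multimodal frame analysis.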
Keywords: multimodality; large language models; frame analysis; media frames; visual communication
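Funding: This article is an interim output of the 2025 Zhejiang Provincial Natural Science Foundation project "The Structure, Mechanisms, and Effects of the Transnational Diffusion of China's Climate Policy" (LQN25G030005) and the 2023 National Social Science Fund of China project "Effects of and Strategies for Improving China's International Communication on Climate Issues" (23CXW034).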