Multimedia

Authors and titles for recent submissions

[ total of 24 entries: 1-24 ]

Mon, 3 Jun 2024

[1] arXiv:2405.20775 (cross-list from cs.CR) [pdf, other]: Title: Cross-Modality Jailbreak and Mismatched Attacks on Medical Multimodal Large Language Models

Authors: Xijie Huang, Xinyuan Wang, Hantao Zhang, Jiawen Xi, Jingkun An, Hao Wang, Chengwei Pan

Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Multimedia (cs.MM)
[2] arXiv:2405.20687 (cross-list from cs.CV) [pdf, other]: Title: Conditioning GAN Without Training Dataset

Authors: Kidist Amde Mekonnen

Comments: 5 pages, 2 figures, Part of my MSc project course, School Project Course 2022

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multimedia (cs.MM)
[3] arXiv:2405.20675 (cross-list from cs.CV) [pdf, other]: Title: Adv-KD: Adversarial Knowledge Distillation for Faster Diffusion Sampling

Authors: Kidist Amde Mekonnen, Nicola Dall'Asen, Paolo Rota

Comments: 7 pages, 11 figures, ELLIS Doctoral Symposium 2023 in Helsinki, Finland

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multimedia (cs.MM)
[4] arXiv:2405.20606 (cross-list from cs.CV) [pdf, other]: Title: Vision-Language Meets the Skeleton: Progressively Distillation with Cross-Modal Knowledge for 3D Action Representation Learning

Authors: Yang Chen, Tian He, Junfeng Fu, Ling Wang, Jingcai Guo, Hong Cheng

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multimedia (cs.MM)

Fri, 31 May 2024

[5] arXiv:2405.20078 [pdf, ps, other]: Title: NeRF View Synthesis: Subjective Quality Assessment and Objective Metrics Evaluation

Authors: Pedro Martin, Antonio Rodrigues, Joao Ascenso, Maria Paula Queluz

Subjects: Multimedia (cs.MM)
[6] arXiv:2405.19802 [pdf, other]: Title: Exploring the Robustness of Decision-Level Through Adversarial Attacks on LLM-Based Embodied Models

Authors: Shuyuan Liu, Jiawei Chen, Shouwei Ruan, Hang Su, Zhaoxia Yin

Subjects: Multimedia (cs.MM)
[7] arXiv:2405.20032 (cross-list from cs.NI) [pdf, other]: Title: Promptus: Can Prompts Streaming Replace Video Streaming with Stable Diffusion

Authors: Jiangkai Wu, Liming Liu, Yunpeng Tan, Junlin Hao, Xinggong Zhang

Subjects: Networking and Internet Architecture (cs.NI); Artificial Intelligence (cs.AI); Multimedia (cs.MM)
[8] arXiv:2405.19889 (cross-list from eess.SP) [pdf, other]: Title: Deep Joint Semantic Coding and Beamforming for Near-Space Airship-Borne Massive MIMO Network

Authors: Minghui Wu, Zhen Gao, Zhaocheng Wang, Dusit Niyato, George K. Karagiannidis, Sheng Chen

Comments: Major Revision by IEEE JSAC

Subjects: Signal Processing (eess.SP); Information Theory (cs.IT); Machine Learning (cs.LG); Multimedia (cs.MM)

Thu, 30 May 2024

[9] arXiv:2405.19226 (cross-list from cs.CV) [pdf, other]: Title: ContextBLIP: Doubly Contextual Alignment for Contrastive Image Retrieval from Linguistically Complex Descriptions

Authors: Honglin Lin, Siyu Li, Guoshun Nan, Chaoyue Tang, Xueting Wang, Jingxin Xu, Rong Yankai, Zhili Zhou, Yutong Gao, Qimei Cui, Xiaofeng Tao

Comments: Accepted in ACL 2024 Findings

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[10] arXiv:2405.18991 (cross-list from cs.CV) [pdf, other]: Title: EasyAnimate: A High-Performance Long Video Generation Method based on Transformer Architecture

Authors: Jiaqi Xu, Xinyi Zou, Kunzhe Huang, Yunkuo Chen, Bo Liu, MengLi Cheng, Xing Shi, Jun Huang

Comments: 6 pages, 5 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Multimedia (cs.MM)
[11] arXiv:2405.18959 (cross-list from cs.CV) [pdf, other]: Title: Transcending Fusion: A Multi-Scale Alignment Method for Remote Sensing Image-Text Retrieval

Authors: Rui Yang, Shuang Wang, Yingping Han, Yuanheng Li, Dong Zhao, Dou Quan, Yanhe Guo, Licheng Jiao

Comments: 16 pages, 9 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[12] arXiv:2405.18790 (cross-list from cs.CV) [pdf, other]: Title: Opinion-Unaware Blind Image Quality Assessment using Multi-Scale Deep Feature Statistics

Authors: Zhangkai Ni, Yue Liu, Keyan Ding, Wenhan Yang, Hanli Wang, Shiqi Wang

Comments: Accepted to IEEE Transactions on Multimedia 2024

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Image and Video Processing (eess.IV)
[13] arXiv:2405.18726 (cross-list from cs.SD) [pdf, other]: Title: Reverse the auditory processing pathway: Coarse-to-fine audio reconstruction from fMRI

Authors: Che Liu, Changde Du, Xiaoyu Chen, Huiguang He

Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)

Wed, 29 May 2024

[14] arXiv:2405.18386 (cross-list from cs.SD) [pdf, other]: Title: Instruct-MusicGen: Unlocking Text-to-Music Editing for Music Language Models via Instruction Tuning

Authors: Yixiao Zhang, Yukara Ikemiya, Woosung Choi, Naoki Murata, Marco A. Martínez-Ramírez, Liwei Lin, Gus Xia, Wei-Hsiang Liao, Yuki Mitsufuji, Simon Dixon

Comments: Code and demo are available at: this https URL

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[15] arXiv:2405.17842 (cross-list from cs.CV) [pdf, other]: Title: Discriminator-Guided Cooperative Diffusion for Joint Audio and Video Generation

Authors: Akio Hayakawa, Masato Ishii, Takashi Shibuya, Yuki Mitsufuji

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[16] arXiv:2405.17730 (cross-list from cs.CV) [pdf, other]: Title: MMPareto: Boosting Multimodal Learning with Innocent Unimodal Assistance

Authors: Yake Wei, Di Hu

Comments: Accepted by ICML2024

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multimedia (cs.MM)
[17] arXiv:2405.17729 (cross-list from cs.CV) [pdf, other]: Title: Hierarchical Action Recognition: A Contrastive Video-Language Approach with Hierarchical Interactions

Authors: Rui Zhang, Shuailong Li, Junxiao Xue, Feng Lin, Qing Zhang, Xiao Ma, Xiaoran Yan

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)

Tue, 28 May 2024

[18] arXiv:2405.17147 [pdf, other]: Title: Large Language Models (LLMs): Deployment, Tokenomics and Sustainability

Authors: Haiwei Dong, Shuang Xie

Comments: Accepted by IEEE CTSoc-NCT

Subjects: Multimedia (cs.MM)
[19] arXiv:2405.16961 (cross-list from eess.IV) [pdf, other]: Title: Blind Data Adaptation to tackle Covariate Shift in Operational Steganalysis

Authors: Rony Abecidan (CRIStAL), Vincent Itier (IMT Nord Europe, CRIStAL), Jérémie Boulanger (CRIStAL), Patrick Bas (CRIStAL), Tomáš Pevný (CTU)

Subjects: Image and Video Processing (eess.IV); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Multimedia (cs.MM)
[20] arXiv:2405.16807 (cross-list from cs.CV) [pdf, other]: Title: Extreme Compression of Adaptive Neural Images

Authors: Leo Hoshikawa, Marcos V. Conde, Takeshi Ohashi, Atsushi Irie

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Graphics (cs.GR); Multimedia (cs.MM)
[21] arXiv:2405.16728 (cross-list from cs.CV) [pdf, other]: Title: Towards Multi-Task Multi-Modal Models: A Video Generative Perspective

Authors: Lijun Yu

Comments: PhD thesis

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multimedia (cs.MM)
[22] arXiv:2405.16640 (cross-list from cs.AI) [pdf, other]: Title: A Survey of Multimodal Large Language Model from A Data-centric Perspective

Authors: Tianyi Bai, Hao Liang, Binwang Wan, Ling Yang, Bozhou Li, Yifan Wang, Bin Cui, Conghui He, Binhang Yuan, Wentao Zhang

Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[23] arXiv:2405.16296 (cross-list from cs.CV) [pdf, other]: Title: Neural Network-Based Tracking and 3D Reconstruction of Baseball Pitch Trajectories from Single-View 2D Video

Authors: Jhen Hsieh

Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Multimedia (cs.MM)
[24] arXiv:2405.16000 (cross-list from cs.SD) [pdf, other]: Title: Carnatic Raga Identification System using Rigorous Time-Delay Neural Network

Authors: Sanjay Natesan, Homayoon Beigi

Comments: 7 pages, 2 tables, 3 figures

Journal-ref: Recognition Technologies, Inc. Technical Report (2024), RTI-20240524-01

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
