Audio and Speech Processing

Authors and titles for recent submissions

Mon, 3 Jun 2024
Fri, 31 May 2024
Thu, 30 May 2024
Wed, 29 May 2024
Tue, 28 May 2024

[ total of 41 entries: 1-41 ]

Mon, 3 Jun 2024

[1] arXiv:2405.21069 [pdf, other]: Title: Very Low Complexity Speech Synthesis Using Framewise Autoregressive GAN (FARGAN) with Pitch Prediction

Authors: Jean-Marc Valin, Ahmed Mustafa, Jan Büthe

Comments: 5 pages

Subjects: Audio and Speech Processing (eess.AS)
[2] arXiv:2405.20402 [pdf, other]: Title: Cross-Talk Reduction

Authors: Zhong-Qiu Wang, Anurag Kumar, Shinji Watanabe

Comments: in International Joint Conference on Artificial Intelligence (IJCAI), 2024

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Signal Processing (eess.SP)
[3] arXiv:2405.20887 (cross-list from cs.SD) [pdf, other]: Title: On the Condition Monitoring of Bolted Joints through Acoustic Emission and Deep Transfer Learning: Generalization, Ordinal Loss and Super-Convergence

Authors: Emmanuel Ramasso, Rafael de O. Teloli, Romain Marcel

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[4] arXiv:2405.20884 (cross-list from cs.SD) [pdf, other]: Title: Effects of Dataset Sampling Rate for Noise Cancellation through Deep Learning

Authors: Brandon Colelough, Andrew Zheng

Comments: 16 pages, 8 pictures, 3 tables

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[5] arXiv:2405.20410 (cross-list from cs.CL) [pdf, other]: Title: SeamlessExpressiveLM: Speech Language Model for Expressive Speech-to-Speech Translation with Chain-of-Thought

Authors: Hongyu Gong, Bandhav Veluri

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Sound (cs.SD); Audio and Speech Processing (eess.AS)

Fri, 31 May 2024

[6] arXiv:2405.20064 [pdf, other]: Title: 1st Place Solution to Odyssey Emotion Recognition Challenge Task1: Tackling Class Imbalance Problem

Authors: Mingjie Chen, Hezhao Zhang, Yuanchao Li, Jiachen Luo, Wen Wu, Ziyang Ma, Peter Bell, Catherine Lai, Joshua Reiss, Lin Wang, Philip C. Woodland, Xie Chen, Huy Phan, Thomas Hain

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[7] arXiv:2405.19497 [pdf, other]: Title: Gaussian Flow Bridges for Audio Domain Transfer with Unpaired Data

Authors: Eloi Moliner, Sebastian Braun, Hannes Gamper

Comments: Submitted to IWAENC 2024

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[8] arXiv:2405.20336 (cross-list from cs.CV) [pdf, other]: Title: RapVerse: Coherent Vocals and Whole-Body Motions Generations from Text

Authors: Jiaben Chen, Xin Yan, Yihang Chen, Siyuan Cen, Qinwei Ma, Haoyu Zhen, Kaizhi Qian, Lie Lu, Chuang Gan

Comments: Project website: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[9] arXiv:2405.20172 (cross-list from cs.SD) [pdf, other]: Title: Iterative Feature Boosting for Explainable Speech Emotion Recognition

Authors: Alaa Nfissi, Wassim Bouachir, Nizar Bouguila, Brian Mishara

Comments: Published in: 2023 International Conference on Machine Learning and Applications (ICMLA)

Journal-ref: 2023 International Conference on Machine Learning and Applications (ICMLA), Jacksonville, FL, USA, 2023, pp. 543-549

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[10] arXiv:2405.20101 (cross-list from cs.SD) [pdf, other]: Title: Fill in the Gap! Combining Self-supervised Representation Learning with Neural Audio Synthesis for Speech Inpainting

Authors: Ihab Asaad, Maxime Jacquelin, Olivier Perrotin, Laurent Girin, Thomas Hueber

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[11] arXiv:2405.20059 (cross-list from cs.SD) [pdf, other]: Title: Spectral Mapping of Singing Voices: U-Net-Assisted Vocal Segmentation

Authors: Adam Sorrenti

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[12] arXiv:2405.19796 (cross-list from cs.SD) [pdf, other]: Title: Explainable Attribute-Based Speaker Verification

Authors: Xiaoliang Wu, Chau Luu, Peter Bell, Ajitha Rajan

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[13] arXiv:2405.19426 (cross-list from cs.CL) [pdf, other]: Title: Deep Learning for Assessment of Oral Reading Fluency

Authors: Mithilesh Vaidya, Binaya Kumar Sahoo, Preeti Rao

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[14] arXiv:2405.19343 (cross-list from cs.SD) [pdf, other]: Title: Luganda Speech Intent Recognition for IoT Applications

Authors: Andrew Katumba, Sudi Murindanyi, John Trevor Kasule, Elvis Mugume

Comments: Presented as a conference paper at ICLR 2024/AfricaNLP

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[15] arXiv:2405.19342 (cross-list from cs.SD) [pdf, other]: Title: Sonos Voice Control Bias Assessment Dataset: A Methodology for Demographic Bias Assessment in Voice Assistants

Authors: Chloé Sekkat, Fanny Leroy, Salima Mdhaffar, Blake Perry Smith, Yannick Estève, Joseph Dureau, Alice Coucke

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)

Thu, 30 May 2024

[16] arXiv:2405.19041 (cross-list from cs.CL) [pdf, other]: Title: BLSP-KD: Bootstrapping Language-Speech Pre-training via Knowledge Distillation

Authors: Chen Wang, Minpeng Liao, Zhongqiang Huang, Jiajun Zhang

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[17] arXiv:2405.18726 (cross-list from cs.SD) [pdf, other]: Title: Reverse the auditory processing pathway: Coarse-to-fine audio reconstruction from fMRI

Authors: Che Liu, Changde Du, Xiaoyu Chen, Huiguang He

Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[18] arXiv:2405.18669 (cross-list from cs.LG) [pdf, other]: Title: Zipper: A Multi-Tower Decoder Architecture for Fusing Modalities

Authors: Vicky Zayats, Peter Chen, Melissa Ferrari, Dirk Padfield

Comments: Under review at NeurIPS

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[19] arXiv:2405.18639 (cross-list from q-bio.NC) [pdf, other]: Title: Improving Speech Decoding from ECoG with Self-Supervised Pretraining

Authors: Brian A. Yuan, Joseph G. Makin

Subjects: Neurons and Cognition (q-bio.NC); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[20] arXiv:2405.18503 (cross-list from cs.SD) [pdf, other]: Title: SoundCTM: Uniting Score-based and Consistency Models for Text-to-Sound Generation

Authors: Koichi Saito, Dongjun Kim, Takashi Shibuya, Chieh-Hsin Lai, Zhi Zhong, Yuhta Takida, Yuki Mitsufuji

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)

Wed, 29 May 2024

[21] arXiv:2405.18386 (cross-list from cs.SD) [pdf, other]: Title: Instruct-MusicGen: Unlocking Text-to-Music Editing for Music Language Models via Instruction Tuning

Authors: Yixiao Zhang, Yukara Ikemiya, Woosung Choi, Naoki Murata, Marco A. Martínez-Ramírez, Liwei Lin, Gus Xia, Wei-Hsiang Liao, Yuki Mitsufuji, Simon Dixon

Comments: Code and demo are available at: this https URL

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[22] arXiv:2405.18213 (cross-list from cs.SD) [pdf, other]: Title: NeRAF: 3D Scene Infused Neural Radiance and Acoustic Fields

Authors: Amandine Brunetto, Sascha Hornauer, Fabien Moutarde

Comments: Project Page: this https URL

Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Audio and Speech Processing (eess.AS)
[23] arXiv:2405.18153 (cross-list from cs.SD) [pdf, other]: Title: Practical aspects for the creation of an audio dataset from field recordings with optimized labeling budget with AI-assisted strategy

Authors: Javier Naranjo-Alcazar, Jordi Grau-Haro, Ruben Ribes-Serrano, Pedro Zuccarello

Comments: Submitted to ICML 2024 Workshop on Data-Centric Machine Learning Research

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[24] arXiv:2405.17927 (cross-list from cs.AI) [pdf, other]: Title: The Evolution of Multimodal Model Architectures

Authors: Shakti N. Wadekar, Abhishek Chaurasia, Aman Chadha, Eugenio Culurciello

Comments: 30 pages, 6 tables, 7 figures

Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[25] arXiv:2405.17842 (cross-list from cs.CV) [pdf, other]: Title: Discriminator-Guided Cooperative Diffusion for Joint Audio and Video Generation

Authors: Akio Hayakawa, Masato Ishii, Takashi Shibuya, Yuki Mitsufuji

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[26] arXiv:2405.17809 (cross-list from cs.CL) [pdf, other]: Title: TransVIP: Speech to Speech Translation System with Voice and Isochrony Preservation

Authors: Chenyang Le, Yao Qian, Dongmei Wang, Long Zhou, Shujie Liu, Xiaofei Wang, Midia Yousefi, Yanmin Qian, Jinyu Li, Sheng Zhao, Michael Zeng

Comments: Work in progress

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[27] arXiv:2405.17615 (cross-list from cs.SD) [pdf, other]: Title: Listenable Maps for Zero-Shot Audio Classifiers

Authors: Francesco Paissan, Luca Della Libera, Mirco Ravanelli, Cem Subakan

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
[28] arXiv:2405.17569 (cross-list from cs.LG) [pdf, other]: Title: Discriminant audio properties in deep learning based respiratory insufficiency detection in Brazilian Portuguese

Authors: Marcelo Matheus Gauy, Larissa Cristina Berti, Arnaldo Cândido Jr, Augusto Camargo Neto, Alfredo Goldman, Anna Sara Shafferman Levin, Marcus Martins, Beatriz Raposo de Medeiros, Marcelo Queiroz, Ester Cerdeira Sabino, Flaviane Romani Fernandes Svartman, Marcelo Finger

Comments: 5 pages, 2 figures, 1 table. Published in Artificial Intelligence in Medicine (AIME) 2023

Journal-ref: Artificial Intelligence in Medicine Proceedings 2023, page 271-275

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Sound (cs.SD); Audio and Speech Processing (eess.AS)

Tue, 28 May 2024

[29] arXiv:2405.17364 [pdf, other]: Title: Speech Loudness in Broadcasting and Streaming

Authors: Matteo Torcoli, Mhd Modar Halimeh, Thomas Leitz, Yannik Grewe, Michael Kratschmer, Bernhard Neugebauer, Adrian Murtaza, Harald Fuchs, Emanuël A. P. Habets

Comments: Accepted for presentation at the Audio Engineering Society (AES) 156th Convention, June 2024, Madrid, Spain

Subjects: Audio and Speech Processing (eess.AS)
[30] arXiv:2405.16952 [pdf, other]: Title: A Variance-Preserving Interpolation Approach for Diffusion Models with Applications to Single Channel Speech Enhancement and Recognition

Authors: Zilu Guo, Qing Wang, Jun Du, Jia Pan, Qing-Feng Liu, Chin-Hui

Subjects: Audio and Speech Processing (eess.AS)
[31] arXiv:2405.16834 [pdf, other]: Title: Speech enhancement deep-learning architecture for efficient edge processing

Authors: Monisankha Pal, Arvind Ramanathan, Ted Wada, Ashutosh Pandey

Subjects: Audio and Speech Processing (eess.AS)
[32] arXiv:2405.16677 [pdf, other]: Title: Crossmodal ASR Error Correction with Discrete Speech Units

Authors: Yuanchao Li, Pinzhen Chen, Peter Bell, Catherine Lai

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[33] arXiv:2405.17413 (cross-list from cs.SD) [pdf, ps, other]: Title: Enhancing Music Genre Classification through Multi-Algorithm Analysis and User-Friendly Visualization

Authors: Navin Kamuni, Dheerendra Panwar

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[34] arXiv:2405.17100 (cross-list from cs.CR) [pdf, other]: Title: Sok: Comprehensive Security Overview, Challenges, and Future Directions of Voice-Controlled Systems

Authors: Haozhe Xu, Cong Wu, Yangyang Gu, Xingcan Shang, Jing Chen, Kun He, Ruiying Du

Subjects: Cryptography and Security (cs.CR); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[35] arXiv:2405.17028 (cross-list from cs.SD) [pdf, other]: Title: RSET: Remapping-based Sorting Method for Emotion Transfer Speech Synthesis

Authors: Haoxiang Shi, Jianzong Wang, Xulong Zhang, Ning Cheng, Jun Yu, Jing Xiao

Comments: Accepted by the 8th APWeb-WAIM International Joint Conference on Web and Big Data

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[36] arXiv:2405.16797 (cross-list from cs.SD) [pdf, ps, other]: Title: A Real-Time Voice Activity Detection Based On Lightweight Neural

Authors: Jidong Jia, Pei Zhao, Di Wang

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[37] arXiv:2405.16687 (cross-list from cs.SD) [pdf, other]: Title: Reconstructing the Charlie Parker Omnibook using an audio-to-score automatic transcription pipeline

Authors: Xavier Riley, Simon Dixon

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[38] arXiv:2405.16136 (cross-list from cs.AI) [pdf, other]: Title: C3LLM: Conditional Multimodal Content Generation Using Large Language Models

Authors: Zixuan Wang, Qinkai Duan, Yu-Wing Tai, Chi-Keung Tang

Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[39] arXiv:2405.16000 (cross-list from cs.SD) [pdf, other]: Title: Carnatic Raga Identification System using Rigorous Time-Delay Neural Network

Authors: Sanjay Natesan, Homayoon Beigi

Comments: 7 pages, 2 tables, 3 figures

Journal-ref: Recognition Technologies, Inc. Technical Report (2024), RTI-20240524-01

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[40] arXiv:2405.15923 (cross-list from eess.SP) [pdf, ps, other]: Title: Spiketrum: An FPGA-based Implementation of a Neuromorphic Cochlea

Authors: MHD Anas Alsakkal, Jayawan Wijekoon

Comments: To be published at "IEEE Transactions on Circuits and Systems"

Subjects: Signal Processing (eess.SP); Neural and Evolutionary Computing (cs.NE); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[41] arXiv:2405.15863 (cross-list from cs.SD) [pdf, other]: Title: Quality-aware Masked Diffusion Transformer for Enhanced Music Generation

Authors: Chang Li, Ruoyu Wang, Lijuan Liu, Jun Du, Yixuan Sun, Zilu Guo, Zhenrong Zhang, Yuan Jiang

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
