eess.AS audio processing, 44 papers in total
[1] Towards General-Purpose Text-Instruction-Guided Voice Conversion
Link: https://arxiv.org/abs/2309.14324
Authors: Chun-Yi Kuan, Chen An Li, Tsu-Yuan Hsu, Tse-Yang Lin, Ho-Lam Chung, Kai-Wei Chang, Shuo-yiin Chang, Hung-yi Lee
Comments: Accepted to ASRU 2023
[2] Speaker anonymization using neural audio codec language models
Link: https://arxiv.org/abs/2309.14129
Authors: Michele Panariello, Francesco Nespoli, Massimiliano Todisco, Nicholas Evans
Comments: Submitted to ICASSP 2024
[3] Haha-Pod: An Attempt for Laughter-based Non-Verbal Speaker Verification
Link: https://arxiv.org/abs/2309.14109
Authors: Yuke Lin, Xiaoyi Qin, Ning Jiang, Guoqing Zhao, Ming Li
Comments: Accepted by ASRU 2023
[4] Wav2vec-based Detection and Severity Level Classification of Dysarthria from Speech
Link: https://arxiv.org/abs/2309.14107
Authors: Farhad Javanmardi, Saska Tirronen, Manila Kodali, Sudarsana Reddy Kadiri, Paavo Alku
[5] BiSinger: Bilingual Singing Voice Synthesis
Link: https://arxiv.org/abs/2309.14089
Authors: Huali Zhou, Yueqian Lin, Yao Shi, Peng Sun, Ming Li
Comments: Accepted by ASRU 2023
[6] Analysis and Detection of Pathological Voice using Glottal Source Features
Link: https://arxiv.org/abs/2309.14080
Authors: Sudarsana Reddy Kadiri, Paavo Alku
[7] Unsupervised Accent Adaptation Through Masked Language Model Correction Of Discrete Self-Supervised Speech Units
Link: https://arxiv.org/abs/2309.13994
Authors: Jakob Poncelet, Hugo Van hamme
Comments: Submitted to ICASSP 2024
[8] Connecting Speech Encoder and Large Language Model for ASR
Link: https://arxiv.org/abs/2309.13963
Authors: Wenyi Yu, Changli Tang, Guangzhi Sun, Xianzhao Chen, Tian Tan, Wei Li, Lu Lu, Zejun Ma, Chao Zhang
[9] Evaluating Classification Systems Against Soft Labels with Fuzzy Precision and Recall
Link: https://arxiv.org/abs/2309.13938
Authors: Manu Harju, Annamaria Mesaros
Comments: Published in DCASE 2023
[10] Frame-wise streaming end-to-end speaker diarization with non-autoregressive self-attention-based attractors
Link: https://arxiv.org/abs/2309.13916
Authors: Di Liang, Nian Shao, Xiaofei Li
[11] AutoPrep: An Automatic Preprocessing Framework for In-the-Wild Speech Data
Link: https://arxiv.org/abs/2309.13905
Authors: Jianwei Yu, Hangting Chen, Yanyao Bian, Xiang Li, Yi Luo, Jinchuan Tian, Mengyang Liu, Jiayi Jiang, Shuai Wang
[12] Diffusion Conditional Expectation Model for Efficient and Robust Target Speech Extraction
Link: https://arxiv.org/abs/2309.13874
Authors: Leying Zhang, Yao Qian, Linfeng Yu, Heming Wang, Xinkai Wang, Hemin Yang, Long Zhou, Shujie Liu, Yanmin Qian, Michael Zeng
Comments: Submitted to ICASSP 2024
[13] A Two-Step Approach for Narrowband Source Localization in Reverberant Rooms
Link: https://arxiv.org/abs/2309.13819
Authors: Wei-Ting Lai, Lachlan Birnie, Thushara Abhayapala, Amy Bastine, Shaoheng Xu, Prasanga Samarasinghe
[14] VoiceLDM: Text-to-Speech with Environmental Context
Link: https://arxiv.org/abs/2309.13664
Authors: Yeonghyeon Lee, Inmo Yeon, Juhan Nam, Joon Son Chung
Comments: Demos and code are available at this https URL
[15] Cross-modal Alignment with Optimal Transport for CTC-based ASR
Link: https://arxiv.org/abs/2309.13650
Authors: Xugang Lu, Peng Shen, Yu Tsao, Hisashi Kawai
Comments: Accepted to IEEE ASRU 2023
[16] Efficient Black-Box Speaker Verification Model Adaptation with Reprogramming and Backend Learning
Link: https://arxiv.org/abs/2309.13605
Authors: Jingyu Li, Tan Lee
[17] Speech enhancement with frequency domain auto-regressive modeling
Link: https://arxiv.org/abs/2309.13537
Authors: Anurenjan Purushothaman, Debottam Dutta, Rohit Kumar, Sriram Ganapathy
Comments: 10 pages
[18] Attention Is All You Need For Blind Room Volume Estimation
Link: https://arxiv.org/abs/2309.13504
Authors: Chunxi Wang, Maoshen Jia, Meiran Li, Changchun Bao, Wenyu Jin
Comments: 5 pages, 4 figures, submitted to ICASSP 2024
[19] Contrastive Speaker Embedding With Sequential Disentanglement
Link: https://arxiv.org/abs/2309.13253
Authors: Youzhi Tu, Man-Wai Mak, Jen-Tzung Chien
Comments: Submitted to ICASSP 2024
[20] Importance of Smoothness Induced by Optimizers in FL4ASR: Towards Understanding Federated Learning for End-to-End ASR
Link: https://arxiv.org/abs/2309.13102
Authors: Sheikh Shams Azam, Tatiana Likhomanenko, Martin Pelikan, Jan "Honza" Silovsky
Comments: In Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2023
[21] An Investigation of Distribution Alignment in Multi-Genre Speaker Recognition
Link: https://arxiv.org/abs/2309.14158
Authors: Zhenyu Zhou, Junhui Chen, Namin Wang, Lantian Li, Dong Wang
Comments: Submitted to ICASSP 2024
[22] Multi-Domain Adaptation by Self-Supervised Learning for Speaker Verification
Link: https://arxiv.org/abs/2309.14149
Authors: Wan Lin, Lantian Li, Dong Wang
Comments: Submitted to ICASSP 2024
[23] On the Relation between Internal Language Model and Sequence Discriminative Training for Neural Transducers
Link: https://arxiv.org/abs/2309.14130
Authors: Zijian Yang, Wei Zhou, Ralf Schlüter, Hermann Ney
Comments: Submitted to ICASSP 2024
[24] VoiceLens: Controllable Speaker Generation and Editing with Flow
Link: https://arxiv.org/abs/2309.14094
Authors: Yao Shi, Ming Li
[25] Audio classification with Dilated Convolution with Learnable Spacings
Link: https://arxiv.org/abs/2309.13972
Authors: Ismail Khalfaoui-Hassani, Timothée Masquelier, Thomas Pellegrini
[26] Speed Co-Augmentation for Unsupervised Audio-Visual Pre-training
Link: https://arxiv.org/abs/2309.13942
Authors: Jiangliu Wang, Jianbo Jiao, Yibing Song, Stephen James, Zhan Tong, Chongjian Ge, Pieter Abbeel, Yun-hui Liu
Comments: Published at the CVPR 2023 Sight and Sound workshop
[27] Real-Time Emergency Vehicle Detection using Mel Spectrograms and Regular Expressions
Link: https://arxiv.org/abs/2309.13920
Authors: Alberto Pacheco-Gonzalez, Raymundo Torres, Raul Chacon, Isidro Robledo
Comments: In Spanish
[28] HiGNN-TTS: Hierarchical Prosody Modeling with Graph Neural Networks for Expressive Long-form TTS
Link: https://arxiv.org/abs/2309.13907
Authors: Dake Guo, Xinfa Zhu, Liumeng Xue, Tao Li, Yuanjun Lv, Yuepeng Jiang, Lei Xie
Comments: Accepted by ASRU 2023
[29] Reproducing Whisper-Style Training Using an Open-Source Toolkit and Publicly Available Data
Link: https://arxiv.org/abs/2309.13876
Authors: Yifan Peng, Jinchuan Tian, Brian Yan, Dan Berrebbi, Xuankai Chang, Xinjian Li, Jiatong Shi, Siddhant Arora, William Chen, Roshan Sharma, Wangyou Zhang, Yui Sudo, Muhammad Shakeel, Jee-weon Jung, Soumi Maiti, Shinji Watanabe
Comments: Accepted at ASRU 2023
[30] Fast-HuBERT: An Efficient Training Framework for Self-Supervised Speech Representation Learning
Link: https://arxiv.org/abs/2309.13860
Authors: Guanrou Yang, Ziyang Ma, Zhisheng Zheng, Yakun Song, Zhikang Niu, Xie Chen
[31] The second multi-channel multi-party meeting transcription challenge (M2MeT 2.0): A benchmark for speaker-attributed ASR
Link: https://arxiv.org/abs/2309.13573
Authors: Yuhao Liang, Mohan Shi, Fan Yu, Yangze Li, Shiliang Zhang, Zhihao Du, Qian Chen, Lei Xie, Yanmin Qian, Jian Wu, Zhuo Chen, Kong Aik Lee, Zhijie Yan, Hui Bu
Comments: 8 pages, Accepted by ASRU 2023
[32] Related Rhythms: Recommendation System To Discover Music You May Like
Link: https://arxiv.org/abs/2309.13544
Authors: Rahul Singh, Pranav Kanuparthi
[33] Coco-Nut: Corpus of Japanese Utterance and Voice Characteristics Description for Prompt-based Control
Link: https://arxiv.org/abs/2309.13509
Authors: Aya Watanabe, Shinnosuke Takamichi, Yuki Saito, Wataru Nakata, Detai Xin, Hiroshi Saruwatari
Comments: Submitted to ASRU 2023
[34] Hierarchical attention interpretation: an interpretable speech-level transformer for bi-modal depression detection
Link: https://arxiv.org/abs/2309.13476
Authors: Qingkun Deng, Saturnino Luz, Sofia de la Fuente Garcia
Comments: 5 pages, 3 figures, submitted to IEEE International Conference on Acoustics, Speech, and Signal Processing
[35] Asca: less audio data is more insightful
Link: https://arxiv.org/abs/2309.13373
Authors: Xiang Li, Junhao Chen, Chao Li, Hongwu Lv
Comments: 6 pages, 3 figures
[36] My Science Tutor (MyST) -- A Large Corpus of Children's Conversational Speech
Link: https://arxiv.org/abs/2309.13347
Authors: Sameer S. Pradhan, Ronald A. Cole, Wayne H. Ward
[37] Two vs. Four-Channel Sound Event Localization and Detection
Link: https://arxiv.org/abs/2309.13343
Authors: Julia Wilkins, Magdalena Fuentes, Luca Bondi, Shabnam Ghaffarzadegan, Ali Abavisani, Juan Pablo Bello
[38] Beyond Fairness: Age-Harmless Parkinson's Detection via Voice
Link: https://arxiv.org/abs/2309.13292
Authors: Yicheng Wang, Xiaotian Han, Leisheng Yu, Na Zou
[39] WikiMT++ Dataset Card
Link: https://arxiv.org/abs/2309.13259
Authors: Monan Zhou, Shangda Wu, Yuan Wang, Wei Li
[40] Importance of negative sampling in weak label learning
Link: https://arxiv.org/abs/2309.13227
Authors: Ankit Shah, Fuyu Tang, Zelin Ye, Rita Singh, Bhiksha Raj
[41] Invisible Watermarking for Audio Generation Diffusion Models
Link: https://arxiv.org/abs/2309.13166
Authors: Xirong Cao, Xiang Li, Divyesh Jadav, Yanzhao Wu, Zhehui Chen, Chen Zeng, Wenqi Wei
Comments: This is an invited paper for IEEE TPS, part of the IEEE CIC/CogMI/TPS 2023 conference
[42] Towards Lexical Analysis of Dog Vocalizations via Online Videos
Link: https://arxiv.org/abs/2309.13086
Authors: Yufei Wang, Chunhao Zhang, Jieyi Huang, Mengyue Wu, Kenny Zhu
[43] Does My Dog "Speak" Like Me? The Acoustic Correlation between Pet Dogs and Their Human Owners
Link: https://arxiv.org/abs/2309.13085
Authors: Jieyi Huang, Chunhao Zhang, Yufei Wang, Mengyue Wu, Kenny Zhu
[44] Applied design thinking in urban air mobility: creating the airtaxi cabin design of the future from a user perspective
Link: https://arxiv.org/abs/2309.05353
Authors: F. Reimer, J. Herzig, L. Winkler, J. Biedermann, F. Meller, B. Nagel
Comments: 13 pages
Reposted from the "arXiv每日学术速递" (arXiv Daily Academic Digest) WeChat official account.