Audio Processing Academic Digest [9.26]

Posted: 2023/9/27 9:38:19

eess.AS (Audio and Speech Processing), 44 papers in total

1. Towards General-Purpose Text-Instruction-Guided Voice Conversion

Link: https://arxiv.org/abs/2309.14324

Authors: Chun-Yi Kuan, Chen An Li, Tsu-Yuan Hsu, Tse-Yang Lin, Ho-Lam Chung, Kai-Wei Chang, Shuo-yiin Chang, Hung-yi Lee

Comments: Accepted to ASRU 2023

2. Speaker anonymization using neural audio codec language models

Link: https://arxiv.org/abs/2309.14129

Authors: Michele Panariello, Francesco Nespoli, Massimiliano Todisco, Nicholas Evans

Comments: Submitted to ICASSP 2024

3. Haha-Pod: An Attempt for Laughter-based Non-Verbal Speaker Verification

Link: https://arxiv.org/abs/2309.14109

Authors: Yuke Lin, Xiaoyi Qin, Ning Jiang, Guoqing Zhao, Ming Li

Comments: Accepted by ASRU 2023

4. Wav2vec-based Detection and Severity Level Classification of Dysarthria from Speech

Link: https://arxiv.org/abs/2309.14107

Authors: Farhad Javanmardi, Saska Tirronen, Manila Kodali, Sudarsana Reddy Kadiri, Paavo Alku

5. BiSinger: Bilingual Singing Voice Synthesis

Link: https://arxiv.org/abs/2309.14089

Authors: Huali Zhou, Yueqian Lin, Yao Shi, Peng Sun, Ming Li

Comments: Accepted by ASRU 2023

6. Analysis and Detection of Pathological Voice using Glottal Source Features

Link: https://arxiv.org/abs/2309.14080

Authors: Sudarsana Reddy Kadiri, Paavo Alku

7. Unsupervised Accent Adaptation Through Masked Language Model Correction Of Discrete Self-Supervised Speech Units

Link: https://arxiv.org/abs/2309.13994

Authors: Jakob Poncelet, Hugo Van hamme

Comments: Submitted to ICASSP 2024

8. Connecting Speech Encoder and Large Language Model for ASR

Link: https://arxiv.org/abs/2309.13963

Authors: Wenyi Yu, Changli Tang, Guangzhi Sun, Xianzhao Chen, Tian Tan, Wei Li, Lu Lu, Zejun Ma, Chao Zhang

9. Evaluating Classification Systems Against Soft Labels with Fuzzy Precision and Recall

Link: https://arxiv.org/abs/2309.13938

Authors: Manu Harju, Annamaria Mesaros

Comments: Published in DCASE 2023

10. Frame-wise streaming end-to-end speaker diarization with non-autoregressive self-attention-based attractors

Link: https://arxiv.org/abs/2309.13916

Authors: Di Liang, Nian Shao, Xiaofei Li

11. AutoPrep: An Automatic Preprocessing Framework for In-the-Wild Speech Data

Link: https://arxiv.org/abs/2309.13905

Authors: Jianwei Yu, Hangting Chen, Yanyao Bian, Xiang Li, Yi Luo, Jinchuan Tian, Mengyang Liu, Jiayi Jiang, Shuai Wang

12. Diffusion Conditional Expectation Model for Efficient and Robust Target Speech Extraction

Link: https://arxiv.org/abs/2309.13874

Authors: Leying Zhang, Yao Qian, Linfeng Yu, Heming Wang, Xinkai Wang, Hemin Yang, Long Zhou, Shujie Liu, Yanmin Qian, Michael Zeng

Comments: Submitted to ICASSP 2024

13. A Two-Step Approach for Narrowband Source Localization in Reverberant Rooms

Link: https://arxiv.org/abs/2309.13819

Authors: Wei-Ting Lai, Lachlan Birnie, Thushara Abhayapala, Amy Bastine, Shaoheng Xu, Prasanga Samarasinghe

14. VoiceLDM: Text-to-Speech with Environmental Context

Link: https://arxiv.org/abs/2309.13664

Authors: Yeonghyeon Lee, Inmo Yeon, Juhan Nam, Joon Son Chung

Comments: Demos and code are available at this https URL

15. Cross-modal Alignment with Optimal Transport for CTC-based ASR

Link: https://arxiv.org/abs/2309.13650

Authors: Xugang Lu, Peng Shen, Yu Tsao, Hisashi Kawai

Comments: Accepted to IEEE ASRU 2023

16. Efficient Black-Box Speaker Verification Model Adaptation with Reprogramming and Backend Learning

Link: https://arxiv.org/abs/2309.13605

Authors: Jingyu Li, Tan Lee

17. Speech enhancement with frequency domain auto-regressive modeling

Link: https://arxiv.org/abs/2309.13537

Authors: Anurenjan Purushothaman, Debottam Dutta, Rohit Kumar, Sriram Ganapathy

Comments: 10 pages

18. Attention Is All You Need For Blind Room Volume Estimation

Link: https://arxiv.org/abs/2309.13504

Authors: Chunxi Wang, Maoshen Jia, Meiran Li, Changchun Bao, Wenyu Jin

Comments: 5 pages, 4 figures, submitted to ICASSP 2024

19. Contrastive Speaker Embedding With Sequential Disentanglement

Link: https://arxiv.org/abs/2309.13253

Authors: Youzhi Tu, Man-Wai Mak, Jen-Tzung Chien

Comments: Submitted to ICASSP 2024

20. Importance of Smoothness Induced by Optimizers in FL4ASR: Towards Understanding Federated Learning for End-to-End ASR

Link: https://arxiv.org/abs/2309.13102

Authors: Sheikh Shams Azam, Tatiana Likhomanenko, Martin Pelikan, Jan "Honza" Silovsky

Comments: In Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2023

21. An Investigation of Distribution Alignment in Multi-Genre Speaker Recognition

Link: https://arxiv.org/abs/2309.14158

Authors: Zhenyu Zhou, Junhui Chen, Namin Wang, Lantian Li, Dong Wang

Comments: Submitted to ICASSP 2024

22. Multi-Domain Adaptation by Self-Supervised Learning for Speaker Verification

Link: https://arxiv.org/abs/2309.14149

Authors: Wan Lin, Lantian Li, Dong Wang

Comments: Submitted to ICASSP 2024

23. On the Relation between Internal Language Model and Sequence Discriminative Training for Neural Transducers

Link: https://arxiv.org/abs/2309.14130

Authors: Zijian Yang, Wei Zhou, Ralf Schlüter, Hermann Ney

Comments: Submitted to ICASSP 2024

24. VoiceLens: Controllable Speaker Generation and Editing with Flow

Link: https://arxiv.org/abs/2309.14094

Authors: Yao Shi, Ming Li

25. Audio classification with Dilated Convolution with Learnable Spacings

Link: https://arxiv.org/abs/2309.13972

Authors: Ismail Khalfaoui-Hassani, Timothée Masquelier, Thomas Pellegrini

26. Speed Co-Augmentation for Unsupervised Audio-Visual Pre-training

Link: https://arxiv.org/abs/2309.13942

Authors: Jiangliu Wang, Jianbo Jiao, Yibing Song, Stephen James, Zhan Tong, Chongjian Ge, Pieter Abbeel, Yun-hui Liu

Comments: Published at the CVPR 2023 Sight and Sound workshop

27. Real-Time Emergency Vehicle Detection using Mel Spectrograms and Regular Expressions

Link: https://arxiv.org/abs/2309.13920

Authors: Alberto Pacheco-Gonzalez, Raymundo Torres, Raul Chacon, Isidro Robledo

Comments: In Spanish

28. HiGNN-TTS: Hierarchical Prosody Modeling with Graph Neural Networks for Expressive Long-form TTS

Link: https://arxiv.org/abs/2309.13907

Authors: Dake Guo, Xinfa Zhu, Liumeng Xue, Tao Li, Yuanjun Lv, Yuepeng Jiang, Lei Xie

Comments: Accepted by ASRU 2023

29. Reproducing Whisper-Style Training Using an Open-Source Toolkit and Publicly Available Data

Link: https://arxiv.org/abs/2309.13876

Authors: Yifan Peng, Jinchuan Tian, Brian Yan, Dan Berrebbi, Xuankai Chang, Xinjian Li, Jiatong Shi, Siddhant Arora, William Chen, Roshan Sharma, Wangyou Zhang, Yui Sudo, Muhammad Shakeel, Jee-weon Jung, Soumi Maiti, Shinji Watanabe

Comments: Accepted at ASRU 2023

30. Fast-HuBERT: An Efficient Training Framework for Self-Supervised Speech Representation Learning

Link: https://arxiv.org/abs/2309.13860

Authors: Guanrou Yang, Ziyang Ma, Zhisheng Zheng, Yakun Song, Zhikang Niu, Xie Chen

31. The second multi-channel multi-party meeting transcription challenge (M2MeT 2.0): A benchmark for speaker-attributed ASR

Link: https://arxiv.org/abs/2309.13573

Authors: Yuhao Liang, Mohan Shi, Fan Yu, Yangze Li, Shiliang Zhang, Zhihao Du, Qian Chen, Lei Xie, Yanmin Qian, Jian Wu, Zhuo Chen, Kong Aik Lee, Zhijie Yan, Hui Bu

Comments: 8 pages, accepted by ASRU 2023

32. Related Rhythms: Recommendation System To Discover Music You May Like

Link: https://arxiv.org/abs/2309.13544

Authors: Rahul Singh, Pranav Kanuparthi

33. Coco-Nut: Corpus of Japanese Utterance and Voice Characteristics Description for Prompt-based Control

Link: https://arxiv.org/abs/2309.13509

Authors: Aya Watanabe, Shinnosuke Takamichi, Yuki Saito, Wataru Nakata, Detai Xin, Hiroshi Saruwatari

Comments: Submitted to ASRU 2023

34. Hierarchical attention interpretation: an interpretable speech-level transformer for bi-modal depression detection

Link: https://arxiv.org/abs/2309.13476

Authors: Qingkun Deng, Saturnino Luz, Sofia de la Fuente Garcia

Comments: 5 pages, 3 figures, submitted to IEEE International Conference on Acoustics, Speech, and Signal Processing

35. Asca: less audio data is more insightful

Link: https://arxiv.org/abs/2309.13373

Authors: Xiang Li, Junhao Chen, Chao Li, Hongwu Lv

Comments: 6 pages, 3 figures

36. My Science Tutor (MyST) -- A Large Corpus of Children's Conversational Speech

Link: https://arxiv.org/abs/2309.13347

Authors: Sameer S. Pradhan, Ronald A. Cole, Wayne H. Ward

37. Two vs. Four-Channel Sound Event Localization and Detection

Link: https://arxiv.org/abs/2309.13343

Authors: Julia Wilkins, Magdalena Fuentes, Luca Bondi, Shabnam Ghaffarzadegan, Ali Abavisani, Juan Pablo Bello

38. Beyond Fairness: Age-Harmless Parkinson's Detection via Voice

Link: https://arxiv.org/abs/2309.13292

Authors: Yicheng Wang, Xiaotian Han, Leisheng Yu, Na Zou

39. WikiMT++ Dataset Card

Link: https://arxiv.org/abs/2309.13259

Authors: Monan Zhou, Shangda Wu, Yuan Wang, Wei Li

40. Importance of negative sampling in weak label learning

Link: https://arxiv.org/abs/2309.13227

Authors: Ankit Shah, Fuyu Tang, Zelin Ye, Rita Singh, Bhiksha Raj

41. Invisible Watermarking for Audio Generation Diffusion Models

Link: https://arxiv.org/abs/2309.13166

Authors: Xirong Cao, Xiang Li, Divyesh Jadav, Yanzhao Wu, Zhehui Chen, Chen Zeng, Wenqi Wei

Comments: This is an invited paper for IEEE TPS, part of the IEEE CIC/CogMI/TPS 2023 conference

42. Towards Lexical Analysis of Dog Vocalizations via Online Videos

Link: https://arxiv.org/abs/2309.13086

Authors: Yufei Wang, Chunhao Zhang, Jieyi Huang, Mengyue Wu, Kenny Zhu

43. Does My Dog "Speak" Like Me? The Acoustic Correlation between Pet Dogs and Their Human Owners

Link: https://arxiv.org/abs/2309.13085

Authors: Jieyi Huang, Chunhao Zhang, Yufei Wang, Mengyue Wu, Kenny Zhu

44. Applied design thinking in urban air mobility: creating the airtaxi cabin design of the future from a user perspective

Link: https://arxiv.org/abs/2309.05353

Authors: F. Reimer, J. Herzig, L. Winkler, J. Biedermann, F. Meller, B. Nagel

Comments: 13 pages

Source: the "arXiv Daily Academic Digest" (arXiv每日学术速递) WeChat official account
