Full Description
This book constitutes the refereed proceedings of the 20th National Conference on Man-Machine Speech Communication, NCMMSC 2025, held in Zhenjiang, China, during October 16-19, 2025.
The 40 papers included in these proceedings were carefully reviewed and selected from 157 submissions. The conference featured special events such as a Young Scholars Forum, Student Forum, Industry Forum, and Product and Technology Exhibition. Beyond the main program, the conference also included public outreach activities, grant-writing workshops, and several special sessions.
Contents
.- Zero- and One-Shot Data Augmentation for Sentence-Level Dysarthric Speech Recognition in Constrained Scenarios.
.- Multilevel and Granular L2 Pronunciation Assessment Using Stress-Based Suprasegmental Features and Proficiency Adaptation.
.- CDMGTU-Net: A Causal Dual-Branch Multi-Channel Speech Enhancement Network with Multi-Scale Gated Feature Fusion.
.- A Two-Stage Band-Split Mamba-2 Network For Music Source Separation.
.- Ideal-LLM: Integrating Dual Encoders and Language-Adapted LLM for Multilingual Speech-to-Text.
.- MambaVoc: State Space Models for High-Fidelity Audio Synthesis.
.- StreamFlow: Streaming Flow Matching with Block-wise Guided Attention Mask for Speech Token Decoding.
.- Automatic Speech Evaluation Method Leveraging Deep Feature Fusion.
.- Curriculum Reinforcement Learning for Robust Low-Resource Chinese Dialect Speech Recognition.
.- An Acoustic Study on Intonation Production of English Learners from Guanzhong Region in Shaanxi Province.
.- Improving Anomalous Sound Detection with Top-M Pseudo-Labeling.
.- Dementia Detection via Speech Temporal Sequences with Shifted Windows.
.- CL-EDiff: Cross-lingual emotional TTS system based on diffusion model.
.- When AI Speaks, Do We Follow? Phonetic Entrainment in Human-AI Dialogues.
.- Aishell1Mix: Towards Robust Mandarin Speech Separation with Scalable Audio Language Models.
.- Study of the Low-Rank Minimum Variance Distortionless Response Beamformer for Speech Enhancement.
.- Exploring Gender Bias in Alzheimer's Disease Detection: Insights from Mandarin and Greek Speech Perception.
.- UniDaugMamba: A Unimodal Data-augmented Mamba for Speech-Based Depression Detection.
.- Serial-Parallel Dual-Path Architecture for Speaking Style Recognition.
.- Knowledge Augmented Finetuning Matters in Both RAG and Agent Based Dialog Systems.
.- NC-KWS: Few-Shot Class-Incremental Keyword Spotting Based on Neural Collapse.
.- ZSEmo-MTVITS: A Zero-Shot Cross-Lingual Emotional Speech Synthesis Model for Mandarin and Tibetan Based on VITS.
.- CUHK-EE Systems for the vTAD Challenge at NCMMSC 2025.
.- Accent Familiarity and Phonological Weighting in Spoken-Word Recognition.
.- Audio Deepfake Detection via Dual Branch Classifier with Self-Supervised Pre-Trained Model.
.- A Multi-Subspace Attention Approach for Robust Speech Spoofing Detection in Silence-Trimming Conditions.
.- Temporally Consistent Teeth Restoration for Talking Heads.
.- EEG as a Biometric Identifier: The Impact of Electrode Arrangement, Brain Areas, and Frequency Bands.
.- The Phonetic Modification and Facial Movements Made During Mandarin Vowel and Tone Production in Noise.
.- Exploring Audio-Visual Fusion for Sound Event Localization and Detection with BEATs.
.- On Multi-Input Multi-Frame MVDR Filter for Speech Enhancement with Heterophasic Presentation.
.- Adaptive Multi-source Fusion for Uyghur ASR Error Correction.
.- The determinants of Chinese lexical stress.
.- Introducing Discriminative Speaker Embeddings for Voice Timbre Attribute Detection.
.- TSELM: Target Speaker Extraction using Discrete Tokens and Language Models.
.- A Timbre Attribute Discrimination System Fusing Pre-trained Speaker Feature Extractors with Gender Prior Features.
.- Improving the Robustness of Audio-Visual Target Speaker Extraction With AV-HuBERT Based Lip Features.
.- A Hierarchical Fusion Modeling from Perception to Prediction with Personalized Features for Multimodal Depression Detection.
.- Revisiting Target Signal Definitions in Distortionless Superdirective Beamforming for Reverberant Speech Enhancement.
.- HiStyle: Hierarchical Style Embedding Prediction for Text-Prompt-Guided Controllable Speech Synthesis.



