Last update:
Sat Jun 8 14:56:32 MDT 2024
Anonymous Table of Contents . . . . . . . . . . . 1--2
Anonymous Table of Contents . . . . . . . . . . . 3--4
L. Deng and
S. Renals and
M. Federico and
M. Ostendorf Editorial: Expanding the Technical Reach
of our Transactions . . . . . . . . . . 5--5
J. Taghia and
R. Martin Objective Intelligibility Measures Based
on Mutual Information for Speech
Subjected to Speech Enhancement
Processing . . . . . . . . . . . . . . . 6--16
Liang Lu and
A. Ghoshal and
S. Renals Cross-Lingual Subspace Gaussian Mixture
Models for Low-Resource Speech
Recognition . . . . . . . . . . . . . . 17--27
M. Gasic and
S. Young Gaussian Processes for POMDP-Based
Dialogue Manager Optimization . . . . . 28--40
I. Mezghani-Marrakchi and
G. Mahe and
S. Djaziri-Larbi and
M. Jaidane and
M. Turki-Hadj Alouane Nonlinear Audio Systems Identification
Through Audio Input Gaussianization . . 41--53
J. B. Crespo and
R. C. Hendriks Multizone Speech Reinforcement . . . . . 54--66
Chao Pan and
Jingdong Chen and
J. Benesty Performance Study of the MVDR Beamformer
as a Function of the Source Incidence
Angle . . . . . . . . . . . . . . . . . 67--79
Hung-yi Lee and
Lin-shan Lee Improved Semantic Retrieval of Spoken
Content by Document/Query Expansion with
Random Walk Over Acoustic Similarity
Graphs . . . . . . . . . . . . . . . . . 80--94
V. Leutnant and
A. Krueger and
R. Haeb-Umbach A New Observation Model in the
Logarithmic Mel Power Spectral Domain
for the Automatic Recognition of Noisy
Reverberant Speech . . . . . . . . . . . 95--109
N. F. Chen and
S. W. Tam and
Wade Shen and
J. P. Campbell Characterizing Phonetic Transformations
and Acoustic Differences Across English
Dialects . . . . . . . . . . . . . . . . 110--124
D. Markovic and
K. Kowalczyk and
F. Antonacci and
C. Hofmann and
A. Sarti and
W. Kellermann Estimation of Acoustic Reflection
Coefficients Through Pseudospectrum
Matching . . . . . . . . . . . . . . . . 125--137
Zhiyao Duan and
Jinyu Han and
B. Pardo Multi-pitch Streaming of Harmonic Sound
Mixtures . . . . . . . . . . . . . . . . 138--150
Shilin Liu and
Khe Chai Sim Temporally Varying Weight Regression: A
Semi-Parametric Trajectory Model for
Automatic Speech Recognition . . . . . . 151--160
V. S. Tomar and
R. C. Rose A Family of Discriminative Manifold
Learning Algorithms and Their
Application to Speech Recognition . . . 161--171
H. Doi and
T. Toda and
K. Nakamura and
H. Saruwatari and
K. Shikano Alaryngeal Speech Enhancement Based on
One-to-Many Eigenvoice Conversion . . . 172--183
E. Arisoy and
S. F. Chen and
B. Ramabhadran and
A. Sethy Converting Neural Network Language
Models into Back-off Language Models for
Efficient Decoding in Automatic Speech
Recognition . . . . . . . . . . . . . . 184--192
C. T. Jin and
N. Epain and
A. Parthy Design, Optimization and Evaluation of a
Dual-Radius Spherical Microphone Array 193--204
R. Mignot and
G. Chardon and
L. Daudet Low Frequency Interpolation of Room
Impulse Responses Using Compressed
Sensing . . . . . . . . . . . . . . . . 205--216
M. Senoussaoui and
P. Kenny and
T. Stafylakis and
P. Dumouchel A Study of the Cosine Distance-Based
Mean Shift for Telephone Speech
Diarization . . . . . . . . . . . . . . 217--227
H. Tachibana and
N. Ono and
S. Sagayama Singing Voice Enhancement in Monaural
Music Signals Based on Two-stage
Harmonic/Percussive Sound Separation on
Multiple Resolution Spectrograms . . . . 228--237
N. R. Shabtai and
B. Rafaely Generalized Spherical Array Beamforming
for Binaural Speech Reproduction . . . . 238--247
S. Cumani and
P. Laface Factorized Sub-Space Estimation for Fast
and Memory Effective $I$-vector
Extraction . . . . . . . . . . . . . . . 248--259
Yuan Zeng and
R. C. Hendriks Distributed Delay and Sum Beamformer for
Speech Enhancement via Randomized Gossip 260--273
Zhenghua Li and
Min Zhang and
Wanxiang Che and
Ting Liu and
Wenliang Chen Joint Optimization for Chinese POS
Tagging and Dependency Parsing . . . . . 274--286
Anonymous IEEE/ACM Transactions on Audio, Speech,
and Language Processing --- EDICS . . . 289--290
Anonymous IEEE/ACM Transactions on Audio, Speech,
and Language Processing Information for
Authors . . . . . . . . . . . . . . . . 291--292
Anonymous Open Access . . . . . . . . . . . . . . 293--293
Anonymous [Blank page] . . . . . . . . . . . . . . B287--B288
Anonymous Front Cover . . . . . . . . . . . . . . C1
Anonymous IEEE/ACM Transactions on Audio, Speech,
and Language Processing publication
information . . . . . . . . . . . . . . C2
Anonymous IEEE Signal Processing Society
Information . . . . . . . . . . . . . . C3
Anonymous [Blank page --- back cover] . . . . . . C4
Anonymous Table of contents . . . . . . . . . . . 289--290
Anonymous Table of contents . . . . . . . . . . . 291--292
Dehong Gao and
Wenjie Li and
Xiaoyan Cai and
Renxian Zhang and
You Ouyang Sequential Summarization: a Full View of
Twitter Trending Topics . . . . . . . . 293--302
P. W. J. van Hengel and
J. D. Krijnders A Comparison of Spectro-Temporal
Representations of Audio Signals . . . . 303--313
I. Zitouni and
Y. Benajiba Aligned-Parallel-Corpora Based
Semi-Supervised Learning for Arabic
Mention Detection . . . . . . . . . . . 314--324
E. Molina and
A. M. Barbancho and
L. J. Tardon and
I. Barbancho Dissonance Reduction In Polyphonic Audio
Using Harmonic Reorganization . . . . . 325--334
D. P. K. Lun and
Tak-Wai Shen and
K. C. Ho A Novel Expectation-Maximization
Framework for Speech Enhancement in
Non-Stationary Noise Environments . . . 335--346
S. Cosentino and
T. H. Falk and
D. McAlpine and
T. Marquardt Cochlear Implant Filterbank Design and
Optimization: A Simulation Study . . . . 347--353
M. Souden and
K. Kinoshita and
M. Delcroix and
T. Nakatani Location Feature Integration for
Clustering-Based Speech Separation in
Distributed Microphone Arrays . . . . . 354--367
H. Kallasjoki and
J. F. Gemmeke and
K. J. Palomaki Estimating Uncertainty to Improve
Exemplar-Based Feature Enhancement for
Noise Robust Speech Recognition . . . . 368--380
T. Hasan and
J. H. L. Hansen Maximum Likelihood Acoustic Factor
Analysis Models for Robust Speaker
Verification in Noise . . . . . . . . . 381--391
O. Schwartz and
S. Gannot Speaker Tracking Using Recursive EM
Algorithms . . . . . . . . . . . . . . . 392--402
Yu Tsao and
S. Matsuda and
C. Hori and
H. Kashioka and
Chin-Hui Lee A MAP-based Online Estimation Approach
to Ensemble Speaker and Speaking
Environment Modeling . . . . . . . . . . 403--416
Pui-Yu Hui and
H. Meng Latent Semantic Analysis for Multimodal
User Input With Speech and Gestures . . 417--429
J. Jensen and
C. H. Taal Speech Intelligibility Prediction Based
on Mutual Information . . . . . . . . . 430--440
A. Primavera and
S. Cecchi and
Junfeng Li and
F. Piazza Objective and Subjective Investigation
on a Novel Method for Digital
Reverberator Parameters Estimation . . . 441--452
M. Speed and
D. Murphy and
D. Howard Modeling the Vocal Tract Transfer
Function Using a $3$D Digital Waveguide
Mesh . . . . . . . . . . . . . . . . . . 453--464
Hüseyin Hacıhabiboğlu Theoretical Analysis of Open Spherical
Microphone Arrays for Acoustic Intensity
Measurements . . . . . . . . . . . . . . 465--476
Taemin Cho and
J. P. Bello On the Relative Importance of Individual
Components of Chord Recognition Systems 477--492
T. Otsuka and
K. Ishiguro and
H. Sawada and
H. G. Okuno Bayesian Nonparametrics for Microphone
Array Processing . . . . . . . . . . . . 493--504
Jianjun He and
Ee-Leng Tan and
Woon-Seng Gan Linear Estimation Based Primary-Ambient
Extraction for Stereo Audio Signals . . 505--517
S. Gonzalez and
M. Brookes PEFAC --- A Pitch Estimation Algorithm
Robust to High Levels of Noise . . . . . 518--530
Min Zhang and
Xiangyu Duan and
Wenliang Chen Bayesian Constituent Context Model for
Grammar Induction . . . . . . . . . . . 531--541
Dah-Chung Chang and
Fei-Tao Chu Feedforward Active Noise Control With a
New Variable Tap-Length and Step-Size
Filtered-X LMS Algorithm . . . . . . . . 542--555
M. McVicar and
R. Santos-Rodriguez and
Yizhao Ni and
Tijl De Bie Automatic Chord Estimation from Audio: a
Review of the State of the Art . . . . . 556--575
Anonymous IEEE/ACM Transactions on Audio, Speech,
and Language Processing --- EDICS . . . 576--577
Anonymous IEEE/ACM Transactions on Audio, Speech,
and Language Processing Information for
Authors . . . . . . . . . . . . . . . . 578--579
Anonymous Open Access . . . . . . . . . . . . . . 580--580
Anonymous Front Cover . . . . . . . . . . . . . . C1
Anonymous IEEE/ACM Transactions on Audio, Speech,
and Language Processing publication
information . . . . . . . . . . . . . . C2
Anonymous IEEE Signal Processing Society
Information . . . . . . . . . . . . . . C3
Anonymous [Blank page --- back cover] . . . . . . C4
Anonymous Table of Contents . . . . . . . . . . . 581--582
Anonymous Table of Contents . . . . . . . . . . . 583--584
Chung-Hsien Wu and
Yi-Chin Huang and
Chung-Han Lee and
Jun-Cheng Guo Synthesis of Spontaneous Speech With
Syllable Contraction Using State-Based
Context-Dependent Voice Transformation 585--595
M. Airaksinen and
T. Raitio and
B. Story and
P. Alku Quasi Closed Phase Glottal Inverse
Filtering Analysis With Weighted Linear
Prediction . . . . . . . . . . . . . . . 596--607
Jae-Mo Yang and
Hong-Goo Kang Online Speech Dereverberation Algorithm
Based on Adaptive Multichannel Linear
Prediction . . . . . . . . . . . . . . . 608--619
A. Asaei and
M. Golbabaee and
H. Bourlard and
V. Cevher Structured Sparsity Models for
Reverberant Speech Separation . . . . . 620--633
R. S. Rashobh and
A. W. H. Khong and
Di Liu Multichannel Equalization in the KLT and
Frequency Domains With Application to
Speech Dereverberation . . . . . . . . . 634--646
P. Samarasinghe and
T. Abhayapala and
M. Poletti Wavefield Analysis Over Large Areas
Using Distributed Higher Order
Microphones . . . . . . . . . . . . . . 647--658
Wen-Li Wei and
Chung-Hsien Wu and
Jen-Chun Lin and
Han Li Exploiting Psychological Factors for
Interaction Style Recognition in Spoken
Conversation . . . . . . . . . . . . . . 659--671
S. A. Raczyński and
E. Vincent Genre-Based Music Language Modeling with
Latent Hierarchical Pitman-Yor Process
Allocation . . . . . . . . . . . . . . . 672--681
Dalei Wu and
Wei-Ping Zhu and
M. N. S. Swamy The Theory of Compressive Sensing
Matching Pursuit Considering Time-domain
Noise with Application to Speech
Enhancement . . . . . . . . . . . . . . 682--696
T. Nanjundaswamy and
K. Rose Cascaded Long Term Prediction for
Enhanced Compression of Polyphonic Audio
Signals . . . . . . . . . . . . . . . . 697--710
K. Audhkhasi and
A. M. Zavou and
P. G. Georgiou and
S. S. Narayanan Theoretical Analysis of Diversity in an
Ensemble of Automatic Speech Recognition
Systems . . . . . . . . . . . . . . . . 711--726
J. Nikunen and
T. Virtanen Direction of Arrival Based Spatial
Covariance Model for Blind Sound Source
Separation . . . . . . . . . . . . . . . 727--739
Anonymous IEEE/ACM Transactions on Audio, Speech,
and Language Processing EDICS . . . . . 741--742
Anonymous IEEE/ACM Transactions on Audio, Speech,
and Language Processing Information for
Authors . . . . . . . . . . . . . . . . 743--744
Anonymous Open Access . . . . . . . . . . . . . . 745--745
Anonymous Publish your article in IEEE Access . . 746--746
Anonymous [Blank page] . . . . . . . . . . . . . . B740
Anonymous [Front cover] . . . . . . . . . . . . . C1
Anonymous IEEE/ACM Transactions on Audio, Speech,
and Language Processing publication
information . . . . . . . . . . . . . . C2
Anonymous IEEE Signal Processing Society
Information . . . . . . . . . . . . . . C3
Anonymous [Blank page --- back cover] . . . . . . C4
Anonymous Table of contents . . . . . . . . . . . 741--742
Anonymous Table of contents . . . . . . . . . . . 743--744
Jinyu Li and
Li Deng and
Yifan Gong and
R. Haeb-Umbach An Overview of Noise-Robust Automatic
Speech Recognition . . . . . . . . . . . 745--777
R. Sarikaya and
G. E. Hinton and
A. Deoras Application of Deep Belief Networks for
Natural Language Understanding . . . . . 778--784
R. Serizel and
M. Moonen and
B. Van Dijk and
J. Wouters Low-rank Approximation Based
Multichannel Wiener Filter Algorithms
for Noise Reduction with Application in
Cochlear Implants . . . . . . . . . . . 785--799
M. Crocco and
A. Trucco Design of Superdirective Planar Arrays
With Sparse Aperiodic Layouts for
Processing Broadband Signals via $3$-D
Beamforming . . . . . . . . . . . . . . 800--815
J. R. Zapata and
M. E. P. Davies and
E. Gomez Multi-Feature Beat Tracking . . . . . . 816--825
A. Narayanan and
Deliang Wang Investigation of Speech Separation as a
Front-End for Noise Robust Speech
Recognition . . . . . . . . . . . . . . 826--835
Xiaojia Zhao and
Yuxuan Wang and
Deliang Wang Robust Speaker Identification in Noisy
and Reverberant Conditions . . . . . . . 836--845
S. Cumani and
O. Plchot and
P. Laface On the use of $i$-vector posterior
distributions in Probabilistic Linear
Discriminant Analysis . . . . . . . . . 846--857
Chung-Hsien Wu and
Han-Ping Shen and
Yan-Ting Yang Chinese--English Phone Set Construction
for Code-Switching ASR Using Acoustic
and DNN-Extracted Articulatory Features 858--862
Anonymous IEEE/ACM Transactions on Audio, Speech,
and Language Processing EDICS . . . . . 863--864
Anonymous IEEE/ACM Transactions on Audio, Speech,
and Language Processing Information for
Authors . . . . . . . . . . . . . . . . 865--866
Anonymous Open Access . . . . . . . . . . . . . . 867--867
Anonymous Publish your article in IEEE Access . . 868--868
Anonymous [Front cover] . . . . . . . . . . . . . C1
Anonymous IEEE/ACM Transactions on Audio, Speech,
and Language Processing publication
information . . . . . . . . . . . . . . C2
Anonymous IEEE Signal Processing Society
Information . . . . . . . . . . . . . . C3
Anonymous [Blank page --- back cover] . . . . . . C4
Anonymous Table of Contents . . . . . . . . . . . 869--870
Anonymous Table of Contents . . . . . . . . . . . 871--872
Weibin Zhang and
P. Fung Discriminatively Trained Sparse Inverse
Covariance Matrices for Speech
Recognition . . . . . . . . . . . . . . 873--882
Hung-yi Lee and
Sz-Rung Shiang and
Ching-Feng Yeh and
Yun-Nung Chen and
Yu Huang and
Sheng-Yi Kong and
Lin-shan Lee Spoken Knowledge Organization by
Semantic Structuring and a Prototype
Course Lecture System for Personalized
Learning . . . . . . . . . . . . . . . . 883--898
L. Zão and
R. Coelho and
P. Flandrin Speech Enhancement with EMD and
Hurst-Based Mode Selection . . . . . . . 899--911
D. Giacobello and
M. G. Christensen and
T. L. Jensen and
M. N. Murthi and
S. H. Jensen and
M. Moonen Stable $1$-Norm Error Minimization Based
Linear Predictors for Speech Modeling 912--922
Y. Lacouture-Parodi and
E. A. P. Habets and
Jingdong Chen and
J. Benesty Multichannel Noise Reduction in the
Karhunen--Loève Expansion Domain . . . 923--936
S. O. Sadjadi and
J. H. L. Hansen Blind Spectral Weighting for Robust
Speaker Identification under
Reverberation Mismatch . . . . . . . . . 937--945
G. Mantena and
S. Achanta and
K. Prahallad Query-by-Example Spoken Term Detection
using Frequency Domain Linear Prediction
and Non-Segmental Dynamic Time Warping 946--955
C. Osterwise and
S. L. Grant On Over-Determined Frequency Domain BSS 956--966
D. P. Jarrett and
M. Taseska and
E. A. P. Habets and
P. A. Naylor Noise Reduction in the Spherical
Harmonic Domain Using a Tradeoff
Beamformer and Narrowband DOA Estimates 967--978
V. Rieser and
O. Lemon and
S. Keizer Natural Language Generation as
Incremental Planning Under Uncertainty:
Adaptive Information Presentation for
Statistical Dialogue Systems . . . . . . 979--994
J. Cheer and
S. J. Elliott Comments on ``Complete Parallel
Narrowband Active Noise Control
Systems'' . . . . . . . . . . . . . . . 995--996
Anonymous IEEE/ACM Transactions on Audio, Speech,
and Language Processing EDICS . . . . . 999--1000
Anonymous IEEE/ACM Transactions on Audio, Speech,
and Language Processing Information for
Authors . . . . . . . . . . . . . . . . 1001--1002
Anonymous Blank page . . . . . . . . . . . . . . . B997--B998
Anonymous [Front cover] . . . . . . . . . . . . . C1
Anonymous IEEE/ACM Transactions on Audio, Speech,
and Language Processing publication
information . . . . . . . . . . . . . . C2
Anonymous IEEE Signal Processing Society
Information . . . . . . . . . . . . . . C3
Anonymous [Blank page --- back cover] . . . . . . C4
Anonymous Table of contents . . . . . . . . . . . 999--1000
Anonymous Table of contents . . . . . . . . . . . 1001--1002
V. Arora and
L. Behera Musical Source Clustering and
Identification in Polyphonic Audio . . . 1003--1012
R. C. Nongpiur Design of Minimax Broadband Beamformers
that are Robust to Microphone Gain,
Phase, and Position Errors . . . . . . . 1013--1022
A. Venkitaraman and
C. S. Seelamantula Binaural Signal Processing Motivated
Generalized Analytic Signal Construction
and AM--FM Demodulation . . . . . . . . 1023--1036
J. T. Geiger and
F. Weninger and
J. F. Gemmeke and
M. Wollmer and
B. Schuller and
G. Rigoll Memory-Enhanced Neural Networks and NMF
for Robust ASR . . . . . . . . . . . . . 1037--1046
Haiquan Zhao and
Yi Yu and
Shibin Gao and
Xiangping Zeng and
Zhengyou He Memory Proportionate APA with Individual
Activation Factors for Acoustic Echo
Cancellation . . . . . . . . . . . . . . 1047--1055
M. J. Gangeh and
P. Fewzee and
A. Ghodsi and
M. S. Kamel and
F. Karray Multiview Supervised Dictionary Learning
in Speech Emotion Recognition . . . . . 1056--1068
Jae-Hun Choi and
Joon-Hyuk Chang Dual-Microphone Voice Activity Detection
Technique Based on Two-Step Power Level
Difference Ratio . . . . . . . . . . . . 1069--1081
X. Alameda-Pineda and
R. Horaud A Geometric Approach to Sound Source
Localization from Time-Delay Estimates 1082--1095
K. Reindl and
S. Meier and
H. Barfuss and
W. Kellermann Minimum Mutual Information-Based
Linearly Constrained Broadband Signal
Extraction . . . . . . . . . . . . . . . 1096--1108
Anonymous IEEE/ACM Transactions on Audio, Speech,
and Language Processing EDICS . . . . . 1109--1110
Anonymous IEEE/ACM Transactions on Audio, Speech,
and Language Processing Information for
Authors . . . . . . . . . . . . . . . . 1111--1112
Anonymous [Front cover] . . . . . . . . . . . . . C1
Anonymous IEEE/ACM Transactions on Audio, Speech,
and Language Processing publication
information . . . . . . . . . . . . . . C2
Anonymous IEEE Signal Processing Society
Information . . . . . . . . . . . . . . C3
Anonymous [Blank page --- back cover] . . . . . . C4
Anonymous Table of Contents . . . . . . . . . . . 1113--1114
Anonymous Table of Contents . . . . . . . . . . . 1115--1116
M. H. Bahari and
N. Dehak and
H. Van hamme and
L. Burget and
A. M. Ali and
J. Glass Non-Negative Factor Analysis of Gaussian
Mixture Model Weight Adaptation for
Language and Dialect Recognition . . . . 1117--1129
Guangzhao Bao and
Yangfei Xu and
Zhongfu Ye Learning a Discriminative Dictionary for
Single-Channel Speech Separation . . . . 1130--1138
I. J. Kelly and
F. M. Boland Detecting Arrivals in Room Impulse
Responses With Dynamic Time Warping . . 1139--1147
M. Guldenschuh and
R. de Callafon Detection of Secondary-Path
Irregularities in Active Noise Control
Headphones . . . . . . . . . . . . . . . 1148--1157
Sin-Horng Chen and
Chiao-Hua Hsieh and
Chen-Yu Chiang and
Hsi-Chun Hsiao and
Yih-Ru Wang and
Yuan-Fu Liao and
Hsiu-Min Yu Modeling of Speaking Rate Influences on
Mandarin Speech Prosody and Its
Application to Speaking Rate-controlled
TTS . . . . . . . . . . . . . . . . . . 1158--1171
D. Comminiello and
M. Scarpiniti and
L. A. Azpicueta-Ruiz and
J. Arenas-Garcia and
A. Uncini Nonlinear Acoustic Echo Cancellation
Based on Sparse Functional Link
Representations . . . . . . . . . . . . 1172--1183
Wen Zhang and
T. D. Abhayapala Three Dimensional Sound Field
Reproduction using Multiple Circular
Loudspeaker Arrays: Functional Analysis
Guided Approach . . . . . . . . . . . . 1184--1194
M. Taseska and
E. A. P. Habets Informed Spatial Filtering for Sound
Extraction Using Distributed Microphone
Arrays . . . . . . . . . . . . . . . . . 1195--1207
Mo Shen and
D. Kawahara and
S. Kurohashi Dependency Parse Reranking with Rich
Subtree Features . . . . . . . . . . . . 1208--1218
Anonymous IEEE/ACM Transactions on Audio, Speech,
and Language Processing EDICS . . . . . 1221--1222
Anonymous IEEE/ACM Transactions on Audio, Speech,
and Language Processing Information for
Authors . . . . . . . . . . . . . . . . 1223--1224
Anonymous Open Access . . . . . . . . . . . . . . 1225--1225
Anonymous [Blank page] . . . . . . . . . . . . . . B1219--B1220
Anonymous [Front cover] . . . . . . . . . . . . . C1
Anonymous IEEE/ACM Transactions on Audio, Speech,
and Language Processing publication
information . . . . . . . . . . . . . . C2
Anonymous IEEE Signal Processing Society
Information . . . . . . . . . . . . . . C3
Anonymous [Blank page --- back cover] . . . . . . C4
Anonymous Table of contents . . . . . . . . . . . 1221--1222
Anonymous Table of contents . . . . . . . . . . . 1223--1224
Zhibao Li and
K. F. C. Yiu and
S. Nordholm On the Indoor Beamformer Design With
Reverberation . . . . . . . . . . . . . 1225--1235
M. B. Hawes and
Wei Liu Sparse Array Design for Wideband
Beamforming With Reduced Complexity in
Tapped Delay-Lines . . . . . . . . . . . 1236--1247
Yi FanChiang and
Cheng-Wen Wei and
Yi-Le Meng and
Yu-Wen Lin and
Shyh-Jye Jou and
Tian-Sheuan Chang Low Complexity Formant Estimation
Adaptive Feedback Cancellation for
Hearing Aids Using Pitch Based
Processing . . . . . . . . . . . . . . . 1248--1259
S. Conan and
O. Derrien and
M. Aramaki and
S. Ystad and
R. Kronland-Martinet A Synthesis Model With Intuitive Control
Capabilities for Rolling Sounds . . . . 1260--1273
C. Schuldt and
P. Handel Decay Rate Estimators and Their
Performance for Blind Reverberation Time
Estimation . . . . . . . . . . . . . . . 1274--1284
S. Ganapathy and
S. H. Mallidi and
H. Hermansky Robust Feature Extraction Using
Modulation Filtering of Autoregressive
Models . . . . . . . . . . . . . . . . . 1285--1295
Bo Li and
Khe Chai Sim A Spectral Masking Approach to
Noise-Robust Speech Recognition Using
Deep Neural Networks . . . . . . . . . . 1296--1305
E. Yilmaz and
J. F. Gemmeke and
H. Van hamme Noise Robust Exemplar Matching Using
Sparse Representations of Speech . . . . 1306--1319
D. Schmid and
G. Enzner and
S. Malik and
D. Kolossa and
R. Martin Variational Bayesian Inference for
Multichannel Dereverberation and Noise
Reduction . . . . . . . . . . . . . . . 1320--1335
Anonymous IEEE/ACM Transactions on Audio, Speech,
and Language Processing EDICS . . . . . 1336--1337
Anonymous IEEE/ACM Transactions on Audio, Speech,
and Language Processing Information for
Authors . . . . . . . . . . . . . . . . 1338--1339
Anonymous Open Access . . . . . . . . . . . . . . 1340--1340
Anonymous [Front cover] . . . . . . . . . . . . . C1
Anonymous IEEE/ACM Transactions on Audio, Speech,
and Language Processing publication
information . . . . . . . . . . . . . . C2
Anonymous IEEE Signal Processing Society
Information . . . . . . . . . . . . . . C3
Anonymous [Blank page --- back cover] . . . . . . C4
Anonymous Table of Contents . . . . . . . . . . . 1341--1342
Anonymous Table of Contents . . . . . . . . . . . 1343--1344
B. Masiero and
M. Vorlander A Framework for the Calculation of
Dynamic Crosstalk Cancellation Filters 1345--1354
A. Schasse and
R. Martin Estimation of Subband Speech
Correlations for Noise Reduction via
MVDR Processing . . . . . . . . . . . . 1355--1365
Michal Novotný and
Jan Rusz and
Roman Čmejla and
Evžen Růžička Automatic Evaluation of Articulatory
Disorders in Parkinson's Disease . . . . 1366--1378
F. Lim and
Wancheng Zhang and
E. A. P. Habets and
P. A. Naylor Robust Multichannel Dereverberation
using Relaxed Multichannel Least Squares 1379--1390
S. H. Ghalehjegh and
R. C. Rose Linear Regression Based Acoustic
Adaptation for the Subspace Gaussian
Mixture Model . . . . . . . . . . . . . 1391--1402
J. Botts and
L. Savioja Spectral and Pseudospectral Properties
of Finite Difference Models Used in
Audio and Room Acoustics . . . . . . . . 1403--1412
Yong Xiang and
I. Natgunanathan and
Song Guo and
Wanlei Zhou and
S. Nahavandi Patchwork-Based Audio Watermarking
Method Robust to De-synchronization
Attacks . . . . . . . . . . . . . . . . 1413--1423
I. V. McLoughlin Super-Audible Voice Activity Detection 1424--1433
A. Alinaghi and
P. J. Jackson and
Qingju Liu and
Wenwu Wang Joint Mixing Vector and Binaural Model
Based Stereo Source Separation . . . . . 1434--1448
Anonymous IEEE/ACM Transactions on Audio, Speech,
and Language Processing EDICS . . . . . 1451--1452
Anonymous IEEE/ACM Transactions on Audio, Speech,
and Language Processing Information for
Authors . . . . . . . . . . . . . . . . 1453--1454
Anonymous Open Access . . . . . . . . . . . . . . 1455--1455
Anonymous Together, we are advancing technology 1456--1456
Anonymous [Blank page] . . . . . . . . . . . . . . B1449--B1450
Anonymous Front Cover . . . . . . . . . . . . . . C1
Anonymous IEEE/ACM Transactions on Audio, Speech,
and Language Processing publication
information . . . . . . . . . . . . . . C2
Anonymous IEEE Signal Processing Society
Information . . . . . . . . . . . . . . C3
Anonymous [Blank page --- back cover] . . . . . . C4
Anonymous Table of contents . . . . . . . . . . . 1451--1452
Anonymous Table of contents . . . . . . . . . . . 1453--1454
Liheng Zhao and
J. Benesty and
Jingdong Chen Design of Robust Differential Microphone
Arrays . . . . . . . . . . . . . . . . . 1455--1466
P. Jain and
R. B. Pachori Event-Based Method for Instantaneous
Fundamental Frequency Estimation from
Voiced Speech Based on Eigenvalue
Decomposition of the Hankel Matrix . . . 1467--1482
Y. Vaizman and
B. McFee and
G. Lanckriet Codebook-Based Audio Feature
Representation for Music Information
Retrieval . . . . . . . . . . . . . . . 1483--1493
O. Nadiri and
B. Rafaely Localization of Multiple Speakers under
High Reverberation using a Spherical
Microphone Array and the Direct-Path
Dominance Test . . . . . . . . . . . . . 1494--1505
Zhizheng Wu and
T. Virtanen and
Eng Siong Chng and
Haizhou Li Exemplar-Based Sparse Representation
With Residual Compensation for Voice
Conversion . . . . . . . . . . . . . . . 1506--1521
D. S. Talagala and
Wen Zhang and
T. D. Abhayapala Efficient Multi-Channel Adaptive Room
Compensation for Spatial Soundfield
Reproduction Using a Modal Decomposition 1522--1532
O. Abdel-Hamid and
A.-R. Mohamed and
Hui Jiang and
Li Deng and
G. Penn and
Dong Yu Convolutional Neural Networks for Speech
Recognition . . . . . . . . . . . . . . 1533--1545
S. Koyama and
K. Furuya and
Y. Hiwasaki and
Y. Haneda and
Y. Suzuki Wave Field Reconstruction Filtering in
Cylindrical Harmonic Domain for
With-Height Recording and Reproduction 1546--1557
Chia-Ping Chen and
Yi-Chin Huang and
Chung-Hsien Wu and
Kuan-De Lee Polyglot Speech Synthesis Based on
Cross-Lingual Frame Selection Using
Auditory and Articulatory Features . . . 1558--1570
Anonymous IEEE/ACM Transactions on Audio, Speech,
and Language Processing EDICS . . . . . 1571--1572
Anonymous IEEE/ACM Transactions on Audio, Speech,
and Language Processing Information for
Authors . . . . . . . . . . . . . . . . 1573--1574
Anonymous Open Access . . . . . . . . . . . . . . 1575--1575
Anonymous Together, we are advancing technology 1576--1576
Anonymous Front Cover . . . . . . . . . . . . . . C1
Anonymous IEEE/ACM Transactions on Audio, Speech,
and Language Processing publication
information . . . . . . . . . . . . . . C2
Anonymous IEEE Signal Processing Society
Information . . . . . . . . . . . . . . C3
Anonymous [Blank page --- back cover] . . . . . . C4
Anonymous Table of Contents . . . . . . . . . . . 1577--1578
Anonymous Table of Contents . . . . . . . . . . . 1579--1580
Jian Xu and
Zhi-Jie Yan and
Qiang Huo An Unsupervised Adaptation Approach to
Leveraging Feedback Loop Data by Using
$i$-Vector for Data Clustering and
Selection . . . . . . . . . . . . . . . 1581--1589
S. Cumani and
P. Laface Large-Scale Training of Pairwise Support
Vector Machines for Speaker Recognition 1590--1600
Jun Du and
Qiang Huo An Improved VTS Feature Compensation
using Mixture Models of Distortion and
IVN Training for Noisy Speech
Recognition . . . . . . . . . . . . . . 1601--1611
M. Togami and
Y. Kawaguchi Simultaneous Optimization of Acoustic
Echo Reduction, Speech Dereverberation,
and Noise Reduction against Mutual
Interference . . . . . . . . . . . . . . 1612--1623
J. Lorente and
M. Ferrer and
M. de Diego and
A. Gonzalez GPU Implementation of Multichannel
Adaptive Algorithms for Local Active
Noise Control . . . . . . . . . . . . . 1624--1635
T. Helie Simulation of Fractional-Order Low-Pass
Filters . . . . . . . . . . . . . . . . 1636--1647
B. Defraene and
T. van Waterschoot and
M. Diehl and
M. Moonen Embedded-Optimization-Based Loudspeaker
Precompensation Using a Hammerstein
Loudspeaker Model . . . . . . . . . . . 1648--1659
Guangsen Wang and
Khe Chai Sim Regression-Based Context-Dependent
Modeling of Deep Neural Networks for
Speech Recognition . . . . . . . . . . . 1660--1669
R. Badeau and
M. D. Plumbley Multichannel High-Resolution NMF for
Modeling Convolutive Mixtures of
Non-Stationary Signals in the
Time-Frequency Domain . . . . . . . . . 1670--1680
Anonymous IEEE/ACM Transactions on Audio, Speech,
and Language Processing EDICS . . . . . 1683--1684
Anonymous IEEE/ACM Transactions on Audio, Speech,
and Language Processing Information for
Authors . . . . . . . . . . . . . . . . 1685--1686
Anonymous [Blank page] . . . . . . . . . . . . . . B1681--B1682
Anonymous Front Cover . . . . . . . . . . . . . . C1
Anonymous IEEE/ACM Transactions on Audio, Speech,
and Language Processing publication
information . . . . . . . . . . . . . . C2
Anonymous IEEE Signal Processing Society
Information . . . . . . . . . . . . . . C3
Anonymous [Blank page --- back cover] . . . . . . C4
Anonymous Table of contents . . . . . . . . . . . 1683--1685
Deng Farewell editorial: Keeping up the
momentum of innovations . . . . . . . . 1687--1687
S. H. Yella and
H. Bourlard Overlapping Speech Detection Using
Long-Term Conversational Features for
Speaker Diarization in Meeting Room
Conversations . . . . . . . . . . . . . 1688--1700
R. K. Chivukula and
Y. A. Reznik and
Yanyan Hu and
V. Devarajan and
M. Jayendra-Lakshman Fast Algorithms for Low-Delay TDAC
Filterbanks in MPEG-4 AAC--ELD . . . . . 1701--1712
Shaofei Xue and
O. Abdel-Hamid and
Hui Jiang and
Lirong Dai and
Qingfeng Liu Fast Adaptation of Deep Neural Network
Based on Discriminant Codes for Speech
Recognition . . . . . . . . . . . . . . 1713--1725
M. E. P. Davies and
P. Hamel and
K. Yoshii and
M. Goto AutoMashUpper: Automatic Creation of
Multi-Song Music Mashups . . . . . . . . 1726--1737
Chao Weng and
D. L. Thomson and
P. Haffner and
B.-H. F. Juang Latent Semantic Rational Kernels for
Topic Spotting on Conversational Speech 1738--1749
N. Wachowski and
M. R. Azimi-Sadjadi Detection and Classification of
Nonstationary Transient Signals Using
Sparse Approximations and Bayesian
Networks . . . . . . . . . . . . . . . . 1750--1764
G. Percival and
G. Tzanetakis Streamlined Tempo Estimation Based on
Autocorrelation and Cross-correlation
With Pulses . . . . . . . . . . . . . . 1765--1776
A. Barkefors and
M. Sternad and
L.-J. Brannmark Design and Analysis of Linear Quadratic
Gaussian Feedforward Controllers for
Active Noise Control . . . . . . . . . . 1777--1791
M. Cobos and
J. J. Perez-Solano and
S. Felici-Castell and
J. Segura and
J. M. Navarro Cumulative-Sum-Based Localization of
Sound Events in Low-Cost Wireless
Acoustic Sensor Networks . . . . . . . . 1792--1802
V. Tourbabin and
B. Rafaely Theoretical Framework for the
Optimization of Microphone Array
Configuration for Humanoid Robot
Audition . . . . . . . . . . . . . . . . 1803--1814
Y. Zakharov and
V. H. Nascimento Sliding-Window RLS Low-Cost
Implementation of Proportionate Affine
Projection Algorithms . . . . . . . . . 1815--1824
S. D'Angelo and
V. Valimaki Generalized Moog Ladder Filter: Part I
--- Linear Analysis and Parameterization 1825--1832
Na Yang and
He Ba and
Weiyang Cai and
I. Demirkol and
W. Heinzelman BaNa: a Noise Resilient Fundamental
Frequency Detection Algorithm for Speech
and Music . . . . . . . . . . . . . . . 1833--1848
Yuxuan Wang and
A. Narayanan and
Deliang Wang On Training Targets for Supervised
Speech Separation . . . . . . . . . . . 1849--1858
Ling-Hui Chen and
Zhen-Hua Ling and
Li-Juan Liu and
Li-Rong Dai Voice Conversion Using Deep Neural
Networks With Layer-Wise Generative
Training . . . . . . . . . . . . . . . . 1859--1872
S. D'Angelo and
V. Valimaki Generalized Moog Ladder Filter: Part II
--- Explicit Nonlinear Model through a
Novel Delay-Free Loop Implementation
Method . . . . . . . . . . . . . . . . . 1873--1883
Z. Rafii and
Zhiyao Duan and
B. Pardo Combining Rhythm-Based and Pitch-Based
Methods for Background and Melody
Separation . . . . . . . . . . . . . . . 1884--1893
J. Ramo and
V. Valimaki and
B. Bank High-Precision Parallel Graphic
Equalizer . . . . . . . . . . . . . . . 1894--1904
Y. Panagakis and
C. L. Kotropoulos and
G. R. Arce Music Genre Classification via Joint
Sparse Low-Rank Representation of Audio
Features . . . . . . . . . . . . . . . . 1905--1917
A. Maezawa and
K. Itoyama and
K. Yoshii and
H. G. Okuno Nonparametric Bayesian Dereverberation
of Power Spectrograms Based on
Infinite-Order Autoregressive Processes 1918--1930
M. Krawczyk and
T. Gerkmann STFT Phase Reconstruction in Voiced
Speech for an Improved Single-Channel
Speech Enhancement . . . . . . . . . . . 1931--1940
V. Khanagha and
K. Daoudi and
H. M. Yahia Detection of Glottal Closure Instants
Based on the Microcanonical Multiscale
Formalism . . . . . . . . . . . . . . . 1941--1950
A. Venturini and
L. Zao and
R. Coelho On speech features fusion,
$\alpha$-integration Gaussian modeling and
multi-style training for noise robust
speaker classification . . . . . . . . . 1951--1964
P. Foster and
M. Mauch and
S. Dixon Sequential Complexity as a Descriptor
for Musical Similarity . . . . . . . . . 1965--1977
Gang Liu and
J. H. L. Hansen An Investigation into Back-end
Advancements for Speaker Recognition in
Multi-Session and Noisy Enrollment
Scenarios . . . . . . . . . . . . . . . 1978--1992
Jitong Chen and
Yuxuan Wang and
Deliang Wang A Feature Study for Classification-Based
Speech Separation at Low Signal-to-Noise
Ratios . . . . . . . . . . . . . . . . . 1993--2002
J. van Mourik and
D. Murphy Explicit Higher-Order FDTD Schemes for
$3$D Room Acoustic Simulation . . . . . 2003--2011
Pei Chee Yong and
S. Nordholm and
Hai Huyen Dam Effective Binaural Multi-Channel
Processing Algorithm for Improved
Environmental Presence . . . . . . . . . 2012--2024
A. Chen and
M. A. Hasegawa-Johnson Mixed Stereo Audio Classification Using
a Stereo-Input Mixed-to-Panned Level
Feature . . . . . . . . . . . . . . . . 2025--2033
Gongping Huang and
J. Benesty and
Tao Long and
Jingdong Chen A Family of Maximum SNR Filters for
Noise Reduction . . . . . . . . . . . . 2034--2047
Su Yan and
Xiaojun Wan SRRank: Leveraging Semantic Roles for
Extractive Multi-Document Summarization 2048--2058
H. Tachibana and
N. Ono and
H. Kameoka and
S. Sagayama Harmonic/Percussive Sound Separation
Based on Anisotropic Smoothness of
Spectrograms . . . . . . . . . . . . . . 2059--2073
J. M. Gil-Cacho and
T. van Waterschoot and
M. Moonen and
S. H. Jensen A Frequency-Domain Adaptive Filter
(FDAF) Prediction Error Method (PEM)
Framework for Double-Talk-Robust
Acoustic Echo Cancellation . . . . . . . 2074--2086
Qi Wang and
W. L. Woo and
S. S. Dlay Informed Single-Channel Speech
Separation Using HMM--GMM User-Generated
Exemplar Source . . . . . . . . . . . . 2087--2100
D. Erro and
T.-C. Zorila and
Y. Stylianou Enhancing the Intelligibility of
Statistically Generated Synthetic Speech
by Means of Noise-Independent
Modifications . . . . . . . . . . . . . 2101--2111
Yi Jiang and
Deliang Wang and
Runsheng Liu and
ZhenMing Feng Binaural Classification for Reverberant
Speech Segregation Using Deep Neural
Networks . . . . . . . . . . . . . . . . 2112--2121
Li Su and
Hsin-Ming Lin and
Yi-Hsuan Yang Sparse Modeling of Magnitude and
Phase-Derived Spectra for Playing
Technique Classification . . . . . . . . 2122--2132
V. V. Reddy and
A. W. H. Khong and
Boon Poh Ng Unambiguous Speech DOA Estimation Under
Spatial Aliasing Conditions . . . . . . 2133--2145
A. Mohammadi and
S. S. Sarfjoo and
C. Demiroglu Eigenvoice Speaker Adaptation with
Minimal Data for Statistical Speech
Synthesis Systems Using a MAP Approach
and Nearest-Neighbors . . . . . . . . . 2146--2157
Kun Han and
Deliang Wang Neural Network Based Pitch Tracking in
Very Noisy Speech . . . . . . . . . . . 2158--2168
Yongsheng Mu and
Peifeng Ji and
Wei Ji and
Ming Wu and
Jun Yang Modeling and Compensation for the
Distortion of Parametric Loudspeakers
Using a One-Dimension Volterra Filter 2169--2181
O. Thiergart and
M. Taseska and
E. A. P. Habets An Informed Parametric Spatial Filter
Based on Instantaneous
Direction-of-Arrival Estimates . . . . . 2182--2196
J. F. Santos and
T. H. Falk Updating the SRMR--CI Metric for
Improved Intelligibility Prediction for
Cochlear Implant Users . . . . . . . . . 2197--2206
Seon Man Kim and
Hong Kook Kim Direction-of-Arrival Based SNR
Estimation for Dual-Microphone Speech
Enhancement . . . . . . . . . . . . . . 2207--2217
T. Otsuka and
K. Ishiguro and
T. Yoshioka and
H. Sawada and
H. G. Okuno Multichannel Sound Source
Dereverberation and Separation for
Arbitrary Number of Sources Based on
Bayesian Nonparametrics . . . . . . . . 2218--2232
J. Traa and
P. Smaragdis Multichannel Source Separation and
Tracking With RANSAC and Directional
Statistics . . . . . . . . . . . . . . . 2233--2243
Weifeng Li and
Longbiao Wang and
Yicong Zhou and
J. Dines and
M. Magimai-Doss and
H. Bourlard and
Qingmin Liao Feature Mapping of Multiple Beamformed
Sources for Robust Overlapping Speech
Recognition Using a Microphone Array . . 2244--2255
Y. FanChiang and
C.-W. Wei and
Y.-L. Meng and
Y.-W. Lin and
S.-J. Jou and
T.-S. Chang Correction to ``Low Complexity Formant
Estimation Adaptive Feedback
Cancellation for Hearing Aids Using
Pitch Based Processing'' [Aug 14
1248--1259] . . . . . . . . . . . . . . 2256--2256
Anonymous List of Reviewers . . . . . . . . . . . 2257--2259
Anonymous IEEE/ACM Transactions on Audio, Speech,
and Language Processing EDICS . . . . . 2260--2261
Anonymous IEEE/ACM Transactions on Audio, Speech,
and Language Processing Information for
Authors . . . . . . . . . . . . . . . . 2262--2263
Anonymous 2014 Index IEEE/ACM Transactions on
Audio, Speech, and Language Processing
Vol. 22 . . . . . . . . . . . . . . . . 2264--2288
Anonymous [Blank page] . . . . . . . . . . . . . . B1686
Anonymous Front Cover . . . . . . . . . . . . . . C1
Anonymous IEEE/ACM Transactions on Audio, Speech,
and Language Processing publication
information . . . . . . . . . . . . . . C2
Anonymous IEEE Signal Processing Society
Information . . . . . . . . . . . . . . C3
Anonymous [Blank page --- back cover] . . . . . . C4
Anonymous Table of contents . . . . . . . . . . . 1--2
Anonymous Table of contents . . . . . . . . . . . 3--4
H. Li Inaugural Editorial: Embracing New
Opportunities for Growth . . . . . . . . 5--6
Yong Xu and
Jun Du and
Li-Rong Dai and
Chin-Hui Lee A Regression Approach to Speech
Enhancement Based on Deep Neural
Networks . . . . . . . . . . . . . . . . 7--19
H. Phan and
M. Maas and
R. Mazur and
A. Mertins Random Regression Forests for Acoustic
Event Detection and Classification . . . 20--31
Yuntao Wu and
L. Amir and
J. R. Jensen and
Guisheng Liao Joint Pitch and DOA Estimation Using the
ESPRIT Method . . . . . . . . . . . . . 32--45
R. Decorsiere and
P. L. Søndergaard and
E. N. MacDonald and
T. Dau Inversion of Auditory Spectrograms,
Traditional Spectrograms, and Other
Envelope Representations . . . . . . . . 46--56
J. Poignant and
L. Besacier and
G. Quénot Unsupervised Speaker Identification in
TV Broadcast Based on Written Names . . 57--68
Renjie Tong and
Yingyue Zhou and
Long Zhang and
Guangzhao Bao and
Zhongfu Ye A Robust Time-Frequency Decomposition
Model for Suppression of Mixed
Gaussian-Impulse Noise in Audio Signals 69--79
S. Ahani and
S. Ghaemmaghami and
Z. J. Wang A Sparse Representation-Based Wavelet
Domain Speech Steganography Method . . . 80--91
A. Narayanan and
Deliang Wang Improving Robustness of Deep Neural
Network Acoustic Models via Speech
Separation and Joint Adaptive Training 92--101
Rongfeng Su and
Xunying Liu and
Lan Wang Automatic Complexity Control of
Generalized Variable Parameter HMMs for
Noise Robust Speech Recognition . . . . 102--114
Zixing Zhang and
E. Coutinho and
Jun Deng and
B. Schuller Cooperative Learning and its Application
to Emotion Recognition from Speech . . . 115--126
Pei-hao Su and
Chuan-hsun Wu and
Lin-shan Lee A Recursive Dialogue Game for
Personalized Computer-Aided
Pronunciation Training . . . . . . . . . 127--141
A. Rakotomamonjy and
G. Gasso Histogram of Gradients of
Time--Frequency Representations for
Audio Scene Classification . . . . . . . 142--153
S. A. Khoubrouy and
I. M. S. Panahi and
J. H. L. Hansen Howling Detection in Hearing Aids Based
on Generalized Teager--Kaiser Operator 154--161
J. B. B. Nielsen and
J. Nielsen and
J. Larsen Perception-Based Personalization of
Hearing Aids Using Gaussian Processes
and Active Learning . . . . . . . . . . 162--173
J. R. Jensen and
M. G. Christensen and
J. Benesty and
S. H. Jensen Joint Spatio-Temporal Filtering Methods
for DOA and Fundamental Frequency
Estimation . . . . . . . . . . . . . . . 174--185
J. Jensen and
Zheng-Hua Tan Minimum Mean-Square Error Estimation of
Mel-Frequency Cepstral Features --- A
Theoretically Consistent Approach . . . 186--197
C.-D. Martinez-Hinarejos and
J.-M. Benedi and
V. Tamarit Unsegmented Dialogue Act Annotation and
Decoding With $N$-Gram Transducers . . . 198--211
Lin Wang and
Zhe Chen and
Fuliang Yin A Novel Hierarchical Decomposition
Vector Quantization Method for
High-Order LPC Parameters . . . . . . . 212--221
Anonymous [Blank page] . . . . . . . . . . . . . . 222--222
Anonymous IEEE/ACM Transactions on Audio, Speech,
and Language Processing EDICS . . . . . 223--224
Anonymous Information for Authors . . . . . . . . 225--226
Anonymous Open Access . . . . . . . . . . . . . . 227--227
Anonymous [Front cover] . . . . . . . . . . . . . C1
Anonymous IEEE/ACM Transactions on Audio, Speech,
and Language Processing publication
information . . . . . . . . . . . . . . C2
Anonymous IEEE Signal Processing Society
Information . . . . . . . . . . . . . . C3
Anonymous [Blank page --- back cover] . . . . . . C4
Anonymous Table of contents . . . . . . . . . . . 223--224
Anonymous Table of contents . . . . . . . . . . . 225--226
Guang Hua and
J. Goh and
V. L. L. Thing Time-Spread Echo-Based Audio
Watermarking With Optimized
Imperceptibility and Robustness . . . . 227--239
O. Schwartz and
S. Gannot and
E. A. P. Habets Multi-Microphone Speech Dereverberation
and Noise Reduction Using Relative Early
Transfer Functions . . . . . . . . . . . 240--251
E. Molina and
L. J. Tardon and
A. M. Barbancho and
I. Barbancho SiPTH: Singing Transcription Based on
Hysteresis Defined on the Pitch-Time
Curve . . . . . . . . . . . . . . . . . 252--263
Haipeng Wang and
Tan Lee and
Cheung-Chi Leung and
Bin Ma and
Haizhou Li Acoustic Segment Modeling with Spectral
Clustering Methods . . . . . . . . . . . 264--277
V. Arora and
L. Behera Multiple F0 Estimation and Source
Clustering of Polyphonic Music Audio
Using PLCA and HMRFs . . . . . . . . . . 278--287
R. Sugiura and
Y. Kamamoto and
N. Harada and
H. Kameoka and
T. Moriya Resolution Warped Spectral
Representation for Low-Delay and
Low-Bit-Rate Audio Coder . . . . . . . . 288--299
Chao Weng and
B.-H. F. Juang Discriminative Training Using
Non-Uniform Criteria for Keyword
Spotting on Spontaneous Speech . . . . . 300--312
Y. Matsuyama and
A. Saito and
S. Fujie and
T. Kobayashi Automatic Expressive Opinion Sentence
Generation for Enjoyable Conversational
Systems . . . . . . . . . . . . . . . . 313--326
P. N. Petkov and
W. B. Kleijn Spectral Dynamics Recovery for Enhanced
Speech Intelligibility in Noise . . . . 327--338
E. Bicici and
D. Yuret Optimizing Instance Selection for
Statistical Machine Translation with
Feature Decay Algorithms . . . . . . . . 339--350
Mengqiu Zhang and
R. A. Kennedy and
T. D. Abhayapala Empirical Determination of Frequency
Representation in Spherical
Harmonics-Based HRTF Functional Modeling 351--360
Zu-Ren Feng and
Qing Zhou and
Jun Zhang and
Ping Jiang and
Xue-Wen Yang A Target Guided Subband Filter for
Acoustic Event Detection in Noisy
Environments Using Wavelet Packets . . . 361--372
N. Hirayama and
K. Yoshino and
K. Itoyama and
S. Mori and
H. G. Okuno Automatic Speech Recognition for Mixed
Dialect Utterances by Mixing Dialect
Language Models . . . . . . . . . . . . 373--382
A. Schasse and
T. Gerkmann and
R. Martin and
W. Sorgel and
T. Pilgrim and
H. Puder Two-Stage Filter-Bank System for
Improved Single-Channel Noise Reduction
in Hearing Aids . . . . . . . . . . . . 383--393
B. Schwartz and
S. Gannot and
E. A. P. Habets Online Speech Dereverberation Using
Kalman Filter and EM Algorithm . . . . . 394--406
B. Gerazov and
Z. Ivanovski Kernel Power Flow Orientation
Coefficients for Noise-Robust Speech
Recognition . . . . . . . . . . . . . . 407--419
Anonymous IEEE/ACM Transactions on Audio, Speech,
and Language Processing EDICS . . . . . 420--421
Anonymous Information for Authors . . . . . . . . 422--423
Anonymous Open Access . . . . . . . . . . . . . . 424--424
Anonymous [Front cover] . . . . . . . . . . . . . C1
Anonymous IEEE/ACM Transactions on Audio, Speech,
and Language Processing publication
information . . . . . . . . . . . . . . C2
Anonymous IEEE Signal Processing Society
Information . . . . . . . . . . . . . . C3
Anonymous [Blank page --- back cover] . . . . . . C4
Anonymous Table of Contents . . . . . . . . . . . 425--426
H. Li and
M. Federico and
X. He and
H. Meng and
I. Trancoso Introduction to the Special Section on
Continuous Space and Related Methods in
Natural Language Processing . . . . . . 427--430
H. Adel and
Ngoc Thang Vu and
K. Kirchhoff and
D. Telaar and
T. Schultz Syntactic and Semantic Features For
Code-Switching Factored Language Models 431--440
Xiaodong Zeng and
D. F. Wong and
L. S. Chao and
I. Trancoso Graph-Based Lexicon Regularization for
PCFG With Latent Annotations . . . . . . 441--450
Wenliang Chen and
Min Zhang and
Yue Zhang Distributed Feature Representations for
Dependency Parsing . . . . . . . . . . . 451--460
Ruiji Fu and
Jiang Guo and
Bing Qin and
Wanxiang Che and
Haifeng Wang and
Ting Liu Learning Semantic Hierarchies: a
Continuous Vector Space Approach . . . . 461--471
R. E. Banchs and
L. F. D'Haro and
Haizhou Li Adequacy--Fluency Metrics: Evaluating MT
in the Continuous Space Model Framework 472--482
Deyi Xiong and
Min Zhang and
Xing Wang Topic-Based Coherence Modeling for
Statistical Machine Translation . . . . 483--493
B. Hutchinson and
M. Ostendorf and
M. Fazel A Sparse Plus Low-Rank Exponential
Language Model for Limited Resource
Scenarios . . . . . . . . . . . . . . . 494--504
M. A. A. Rashwan and
A. A. Al Sallab and
H. M. Raafat and
A. Rafea Deep Learning Framework with Confused
Sub-Set Resolution Architecture for
Automatic Arabic Diacritization . . . . 505--516
M. Sundermeyer and
H. Ney and
R. Schluter From Feedforward to Recurrent LSTM
Neural Networks for Language Modeling 517--529
G. Mesnil and
Y. Dauphin and
Kaisheng Yao and
Y. Bengio and
Li Deng and
D. Hakkani-Tur and
Xiaodong He and
L. Heck and
G. Tur and
Dong Yu and
G. Zweig Using Recurrent Neural Networks for Slot
Filling in Spoken Language Understanding 530--539
I. McLoughlin and
Haomin Zhang and
Zhipeng Xie and
Yan Song and
Wei Xiao Robust Sound Event Classification Using
Deep Neural Networks . . . . . . . . . . 540--552
D. Zahoransky and
I. Polasek Text Search of Surnames in Some Slavic
and Other Morphologically Rich Languages
Using Rule Based Phonetic Algorithms . . 553--563
Yow-Bang Wang and
Lin-shan Lee Supervised Detection and Unsupervised
Discovery of Pronunciation Error
Patterns for Computer-Assisted Language
Learning . . . . . . . . . . . . . . . . 564--579
T. Nakashika and
T. Takiguchi and
Y. Ariki Voice Conversion Using RNN Pre-Trained
by Recurrent Temporal Restricted
Boltzmann Machines . . . . . . . . . . . 580--587
N. Obin and
P. Lanchantin Symbolic Modeling of Prosody: From
Linguistics to Statistics . . . . . . . 588--599
Anonymous IEEE/ACM Transactions on Audio, Speech,
and Language Processing EDICS . . . . . 601--602
Anonymous Information for Authors . . . . . . . . 603--604
Anonymous IEEE Member Digital Library . . . . . . 606--606
Anonymous Blank page . . . . . . . . . . . . . . . B600
Anonymous Front Cover . . . . . . . . . . . . . . C1
Anonymous IEEE/ACM Transactions on Audio, Speech,
and Language Processing publication
information . . . . . . . . . . . . . . C2
Anonymous IEEE Signal Processing Society
Information . . . . . . . . . . . . . . C3
Anonymous Blank page . . . . . . . . . . . . . . . C4
Anonymous Table of Contents . . . . . . . . . . . 601--602
Anonymous Table of Contents . . . . . . . . . . . 603--604
Langzhou Chen and
N. Braunschweiler and
M. J. F. Gales Speaker and Expression Factorization for
Audiobook Data: Expressiveness and
Transplantation . . . . . . . . . . . . 605--618
Xinjie Zhou and
Xiaojun Wan and
Jianguo Xiao CLOpinionMiner: Opinion Target
Extraction in a Cross-Language Scenario 619--630
Pan Zhou and
Hui Jiang and
Li-Rong Dai and
Yu Hu and
Qing-Feng Liu State-Clustering Based Multiple Deep
Neural Networks Modeling Approach for
Speech Recognition . . . . . . . . . . . 631--642
Ying Hu and
Guizhong Liu Separation of Singing Voice Using
Nonnegative Matrix Partial
Co-Factorization for Singer
Identification . . . . . . . . . . . . . 643--653
D. Kitamura and
H. Saruwatari and
H. Kameoka and
Yu. Takahashi and
K. Kondo and
S. Nakamura Multichannel Signal Separation Combining
Directional Clustering and Nonnegative
Matrix Factorization with Spectrogram
Restoration . . . . . . . . . . . . . . 654--669
Van-Khanh Mai and
D. Pastor and
A. Aissa-El-Bey and
R. Le-Bidan Robust Estimation of Non-Stationary
Noise Power Spectrum for Speech
Enhancement . . . . . . . . . . . . . . 670--682
E. Blanco and
D. Moldovan A Semantic Logic-Based Approach to
Determine Textual Similarity . . . . . . 683--693
Myung Jong Kim and
Younggwan Kim and
Hoirin Kim Automatic Intelligibility Assessment of
Dysarthric Speech Using
Phonologically-Structured Sparse Linear
Model . . . . . . . . . . . . . . . . . 694--704
G. Aneeja and
B. Yegnanarayana Single Frequency Filtering Approach for
Discriminating Speech and Nonspeech . . 705--717
A. Deleforge and
R. Horaud and
Y. Y. Schechner and
L. Girin Co-Localization of Audio Sources in
Images Using Binaural Features and
Locally-Linear Regression . . . . . . . 718--731
D. Dov and
R. Talmon and
I. Cohen Audio-Visual Voice Activity Detection
Using Diffusion Maps . . . . . . . . . . 732--745
M. Habibi and
A. Popescu-Belis Keyword Extraction and Clustering for
Document Recommendation in Conversations 746--759
N. Mamun and
W. A. Jassim and
M. S. A. Zilany Prediction of Speech Intelligibility
Using a Neurogram Orthogonal Polynomial
Measure (NOPM) . . . . . . . . . . . . . 760--773
E. De Sena and
N. Antonello and
M. Moonen and
T. van Waterschoot On the Modeling of Rectangular
Geometries in Room Acoustic Simulations 774--786
Hao Huang and
Haihua Xu and
Xianhui Wang and
W. Silamu Maximum F1-Score Discriminative Training
Criterion for Automatic Mispronunciation
Detection . . . . . . . . . . . . . . . 787--797
Chung-Che Wang and
J.-S. R. Jang Improving Query-by-Singing/Humming by
Combining Melody and Lyric Information 798--806
Anonymous IEEE/ACM Transactions on Audio, Speech,
and Language Processing EDICS . . . . . 807--808
Anonymous Information for Authors . . . . . . . . 809--810
Anonymous IEEE Member Digital Library . . . . . . 812--812
Anonymous Front Cover . . . . . . . . . . . . . . C1
Anonymous IEEE/ACM Transactions on Audio, Speech,
and Language Processing publication
information . . . . . . . . . . . . . . C2
Anonymous IEEE Signal Processing Society
Information . . . . . . . . . . . . . . C3
Anonymous Blank page . . . . . . . . . . . . . . . C4
Anonymous Table of Contents . . . . . . . . . . . 813--814
Anonymous Table of Contents . . . . . . . . . . . 815--816
F. Krebs and
A. Holzapfel and
A. T. Cemgil and
G. Widmer Inferring Metrical Structure in Music
Using Particle Filters . . . . . . . . . 817--827
Janghoon Cho and
C. D. Yoo Underdetermined Convolutive BSS: Bayes
Risk Minimization Based on a Mixture of
Super-Gaussian Posterior Approximation 828--839
Hao Mu and
Woon-Seng Gan and
Ee-Leng Tan An Objective Analysis Method for
Perceptual Quality of a Virtual Bass
System . . . . . . . . . . . . . . . . . 840--850
R. C. Hendriks and
J. B. Crespo and
J. Jensen and
C. H. Taal Optimal Near-End Speech Intelligibility
Improvement Incorporating Additive Noise
and Late Reverberation Under an
Approximation of the Short-Time SII . . 851--862
A. H. Abdelaziz and
S. Zeiler and
D. Kolossa Learning Dynamic Stream Weights For
Coupled-HMM-Based Audio-Visual Speech
Recognition . . . . . . . . . . . . . . 863--876
R. Berkun and
I. Cohen and
J. Benesty Combined Beamformers for Robust
Broadband Regularized Superdirective
Beamforming . . . . . . . . . . . . . . 877--886
J. Breebaart Evaluation of Statistical Inference
Tests Applied to Subjective Audio
Quality Data With Small Sample Size . . 887--897
M. Zivanović Harmonic Bandwidth Companding for
Separation of Overlapping Harmonics in
Pitched Signals . . . . . . . . . . . . 898--908
Jen-Tzung Chien Laplace Group Sensing for Acoustic
Models . . . . . . . . . . . . . . . . . 909--922
Ying Wei and
Yinfeng Wang Design of Low Complexity Adjustable
Filter Bank for Personalized Hearing Aid
Solutions . . . . . . . . . . . . . . . 923--931
A. Perez-Carrillo and
M. M. Wanderley Indirect Acquisition of Violin
Instrumental Controls from Audio Signal
with Hidden Markov Models . . . . . . . 932--940
A. Mansikkaniemi and
M. Kurimo Adaptation of Morph-Based Speech
Recognition for Foreign Names and
Acronyms . . . . . . . . . . . . . . . . 941--950
Anonymous IEEE/ACM Transactions on Audio, Speech,
and Language Processing EDICS . . . . . 953--954
Anonymous Information for Authors . . . . . . . . 955--956
Anonymous Open Access . . . . . . . . . . . . . . 957--957
Anonymous Blank page . . . . . . . . . . . . . . . B951--B952
Anonymous [Front cover] . . . . . . . . . . . . . C1
Anonymous IEEE/ACM Transactions on Audio, Speech,
and Language Processing publication
information . . . . . . . . . . . . . . C2
Anonymous IEEE Signal Processing Society
Information . . . . . . . . . . . . . . C3
Anonymous Blank page . . . . . . . . . . . . . . . C4
Anonymous Table of Contents . . . . . . . . . . . 953--954
Anonymous Table of Contents . . . . . . . . . . . 955--956
Shih-Hung Liu and
Kuan-Yu Chen and
B. Chen and
Hsin-Min Wang and
Hsu-Chun Yen and
Wen-Lian Hsu Combining Relevance Language Modeling
and Clarity Measure for Extractive
Speech Summarization . . . . . . . . . . 957--969
M. Niedzwiecki and
M. Ciolek and
K. Cisowski Elimination of Impulsive Disturbances
From Stereo Audio Recordings Using
Vector Autoregressive Modeling and
Variable-order Kalman Filtering . . . . 970--981
Kun Han and
Yuxuan Wang and
Deliang Wang and
W. S. Woods and
I. Merks and
Tao Zhang Learning Spectral Mapping for Speech
Dereverberation and Denoising . . . . . 982--992
P. Foster and
S. Dixon and
A. Klapuri Identifying Cover Songs Using
Information-Theoretic Measures of
Similarity . . . . . . . . . . . . . . . 993--1005
A. Schwarz and
W. Kellermann Coherent-to-Diffuse Power Ratio
Estimation for Dereverberation . . . . . 1006--1018
M. Cernak and
P. N. Garner and
A. Lazaridis and
P. Motlicek and
Xingyu Na Incremental Syllable-Context Phonetic
Vocoding . . . . . . . . . . . . . . . . 1019--1030
M. Rouvier and
S. Oger and
G. Linares and
D. Matrouf and
B. Merialdo and
Y. Li Audio-Based Video Genre Identification 1031--1041
H. Kameoka and
K. Yoshizato and
T. Ishihara and
K. Kadowaki and
Y. Ohishi and
K. Kashino Generative Modeling of Voice Fundamental
Frequency Contours . . . . . . . . . . . 1042--1053
Dejan Marković and
Fabio Antonacci and
Augusto Sarti and
Stefano Tubaro Multiview Soundfield Imaging in the
Projective Ray Space . . . . . . . . . . 1054--1067
A. P. Bates and
Z. Khalid and
R. A. Kennedy Novel Sampling Scheme on the Sphere for
Head-Related Transfer Function
Measurements . . . . . . . . . . . . . . 1068--1081
Maoshen Jia and
Ziyu Yang and
Changchun Bao and
Xiguang Zheng and
C. Ritz Encoding Multiple Audio Objects Using
Intra-Object Sparsity . . . . . . . . . 1082--1095
Anonymous IEEE/ACM Transactions on Audio, Speech,
and Language Processing EDICS . . . . . 1096--1097
Anonymous Information for Authors . . . . . . . . 1098--1099
Anonymous Open Access . . . . . . . . . . . . . . 1100--1100
Anonymous [Front cover] . . . . . . . . . . . . . C1
Anonymous IEEE/ACM Transactions on Audio, Speech,
and Language Processing publication
information . . . . . . . . . . . . . . C2
Anonymous IEEE Signal Processing Society
Information . . . . . . . . . . . . . . C3
Anonymous Blank page . . . . . . . . . . . . . . . C4
Anonymous Table of Contents . . . . . . . . . . . 1101--1102
Anonymous Table of Contents . . . . . . . . . . . 1103--1104
M. McVicar and
S. Fukayama and
M. Goto AutoGuitarTab: Computer-Aided
Composition of Rhythm and Lead Guitar
Parts in the Tablature Space . . . . . . 1105--1117
M. Van Segbroeck and
R. Travadi and
S. S. Narayanan Rapid Language Identification . . . . . 1118--1129
D. Marelli and
R. Baumgartner and
P. Majdak Efficient Approximation of Head-Related
Transfer Functions in Subbands for
Accurate Sound Localization . . . . . . 1130--1143
Ching-Feng Yeh and
Lin-shan Lee An Improved Framework for Recognizing
Highly Imbalanced Bilingual
Code-Switched Lectures with
Cross-Language Acoustic Modeling and
Frame-Level Language Identification . . 1144--1159
D. Basaran and
A. T. Cemgil and
E. Anarim A Probabilistic Model-Based Approach for
Aligning Multiple Audio Sequences . . . 1160--1171
Dongpeng Chen and
B. K.-W. Mak Multitask Learning of Deep Neural
Networks for Low-Resource Speech
Recognition . . . . . . . . . . . . . . 1172--1183
T. Meyer and
N. Hajlaoui and
A. Popescu-Belis Disambiguating Discourse Connectives for
Statistical Machine Translation . . . . 1184--1197
U. Remes and
A. Ramirez Lopez and
K. Palomaki and
M. Kurimo Bounded Conditional Mean Imputation with
Observation Uncertainties and Acoustic
Model Adaptation . . . . . . . . . . . . 1198--1208
Rui Wang and
Hai Zhao and
Bao-Liang Lu and
M. Utiyama and
E. Sumita Bilingual Continuous-Space Language
Model Growing for Statistical Machine
Translation . . . . . . . . . . . . . . 1209--1220
Tze Yuang Chong and
R. E. Banchs and
Eng Siong Chng and
Haizhou Li Decoupling Word-Pair Distance and
Co-occurrence Information for Effective
Long History Context Language Modeling 1221--1232
Meng Sun and
Yinan Li and
J. F. Gemmeke and
Xiongwei Zhang Speech Enhancement Under Low SNR
Conditions Via Noise Estimation Using
Sparse and Low-Rank NMF with
Kullback--Leibler Divergence . . . . . . 1233--1242
Anonymous IEEE/ACM Transactions on Audio, Speech,
and Language Processing EDICS . . . . . 1245--1246
Anonymous Information for Authors . . . . . . . . 1247--1248
Anonymous Blank page . . . . . . . . . . . . . . . B1243--B1244
Anonymous Front Cover . . . . . . . . . . . . . . C1
Anonymous IEEE/ACM Transactions on Audio, Speech,
and Language Processing publication
information . . . . . . . . . . . . . . C2
Anonymous IEEE Signal Processing Society
Information . . . . . . . . . . . . . . C3
Anonymous Blank page . . . . . . . . . . . . . . . C4
Anonymous Table of Contents . . . . . . . . . . . 1245--1246
Anonymous Table of Contents . . . . . . . . . . . 1247--1248
H. Momeni and
H. R. Abutalebi and
A. Tadaion Joint Detection and Estimation of Speech
Spectral Amplitude Using Noncontinuous
Gain Functions . . . . . . . . . . . . . 1249--1258
Jen-Tzung Chien Hierarchical Pitman--Yor--Dirichlet
Language Model . . . . . . . . . . . . . 1259--1272
M. Fallahpour and
D. Megias Audio Watermarking Based on Fibonacci
Numbers . . . . . . . . . . . . . . . . 1273--1282
P. Mowlaee and
J. Kulmer Phase Estimation in Single-Channel
Speech Enhancement: Limits-Potential . . 1283--1294
M. Morchid and
M. Bouallegue and
R. Dufour and
G. Linares and
D. Matrouf and
R. De Mori Compact Multiview Representation of
Documents Based on the Total Variability
Space . . . . . . . . . . . . . . . . . 1295--1308
R. Sugiura and
Y. Kamamoto and
N. Harada and
H. Kameoka and
T. Moriya Optimal Coding of
Generalized-Gaussian-Distributed
Frequency Spectra for Low-Delay Audio
Coder With Powered All-Pole Spectrum
Estimation . . . . . . . . . . . . . . . 1309--1321
Kuan-Yu Chen and
Shih-Hung Liu and
B. Chen and
Hsin-Min Wang and
Ea-Ee Jan and
Wen-Lian Hsu and
Hsin-Hsi Chen Extractive Broadcast News Summarization
Leveraging Recurrent Neural Network
Language Modeling Techniques . . . . . . 1322--1334
Z. Koldovsky and
J. Malek and
S. Gannot Spatial Source Subtraction Based on
Incomplete Measurements of Relative
Transfer Function . . . . . . . . . . . 1335--1347
D. Dimitriadis and
E. Bocchieri Use of Micro-Modulation Features in
Large Vocabulary Continuous Speech
Recognition Tasks . . . . . . . . . . . 1348--1357
Xun Wang and
Y. Yoshida and
T. Hirao and
M. Nagata and
K. Sudoh Summarization Based on Task-Oriented
Discourse Parsing . . . . . . . . . . . 1358--1367
C. Spa and
A. Rey and
E. Hernandez A GPU Implementation of an Explicit
Compact FDTD Algorithm with a Digital
Impedance Filter for Room Acoustics
Applications . . . . . . . . . . . . . . 1368--1380
Anonymous IEEE/ACM Transactions on Audio, Speech,
and Language Processing EDICS . . . . . 1381--1382
Anonymous Information for Authors . . . . . . . . 1383--1384
Anonymous Front Cover . . . . . . . . . . . . . . C1
Anonymous IEEE/ACM Transactions on Audio, Speech,
and Language Processing publication
information . . . . . . . . . . . . . . C2
Anonymous IEEE Signal Processing Society
Information . . . . . . . . . . . . . . C3
Anonymous Blank page . . . . . . . . . . . . . . . C4
Anonymous Table of Contents . . . . . . . . . . . 1385--1386
Anonymous Table of Contents . . . . . . . . . . . 1387--1388
Lin-shan Lee and
J. Glass and
Hung-yi Lee and
Chun-an Chan Spoken Content Retrieval --- Beyond
Cascading Speech Recognition with Text
Retrieval . . . . . . . . . . . . . . . 1389--1420
Yishan Jiao and
V. Berisha and
Ming Tu and
J. Liss Convex Weighting Criteria for Speaking
Rate Estimation . . . . . . . . . . . . 1421--1430
Jianjun He and
Woon-Seng Gan and
Ee-Leng Tan Primary-Ambient Extraction Using Ambient
Spectrum Estimation for Immersive
Spatial Audio Reproduction . . . . . . . 1431--1444
Qing Shen and
Wei Liu and
Wei Cui and
Siliang Wu and
Y. D. Zhang and
M. G. Amin Low-Complexity Direction-of-Arrival
Estimation Based on Wideband Co-Prime
Arrays . . . . . . . . . . . . . . . . . 1445--1456
Yu-Ren Chien and
Hsin-Min Wang and
Shyh-Kang Jeng An Acoustic-Phonetic Model of F0
Likelihood for Vocal Melody Extraction 1457--1468
Xiaodong Cui and
V. Goel and
B. Kingsbury Data Augmentation for Deep Neural
Network Acoustic Modeling . . . . . . . 1469--1477
E. De Sena and
H. Hacıhabiboğlu and
Z. Cvetković and
J. O. Smith Efficient Synthesis of Room Acoustics
via Scattering Delay Networks . . . . . 1478--1492
Lin Wang and
T. Gerkmann and
S. Doclo Noise Power Spectral Density Estimation
Using MaxNSR Blocking Matrix . . . . . . 1493--1508
A. Jukic and
T. van Waterschoot and
T. Gerkmann and
S. Doclo Multi-Channel Linear Prediction-Based
Speech Dereverberation With Sparse
Priors . . . . . . . . . . . . . . . . . 1509--1520
P. Mowlaee and
J. Kulmer Harmonic Phase Estimation in
Single-Channel Speech Enhancement Using
Phase Decomposition and SNR Information 1521--1532
Anonymous IEEE/ACM Transactions on Audio, Speech,
and Language Processing EDICS . . . . . 1535--1536
Anonymous Information for Authors . . . . . . . . 1537--1538
Anonymous How can you get your idea to market
first? . . . . . . . . . . . . . . . . . 1539--1539
Anonymous Blank page . . . . . . . . . . . . . . . B1533--B1534
Anonymous Front Cover . . . . . . . . . . . . . . C1
Anonymous IEEE/ACM Transactions on Audio, Speech,
and Language Processing publication
information . . . . . . . . . . . . . . C2
Anonymous IEEE Signal Processing Society
Information . . . . . . . . . . . . . . C3
Anonymous Blank page . . . . . . . . . . . . . . . C4
Anonymous Table of Contents . . . . . . . . . . . 1535--1536
Anonymous Table of Contents . . . . . . . . . . . 1537--1538
S. Tervo and
A. Politis Direction of Arrival Estimation of
Reflections from Room Impulse Responses
Using a Spherical Microphone Array . . . 1539--1551
Jia-Ching Wang and
Yu-Hao Chin and
Bo-Wei Chen and
Chang-Hong Lin and
Chung-Hsien Wu Speech Emotion Verification Using
Emotion Variance Modeling and
Discriminant Scale-Frequency Maps . . . 1552--1562
A. Canclini and
P. Bestagini and
F. Antonacci and
M. Compagnoni and
A. Sarti and
S. Tubaro A Robust and Low-Complexity Source
Localization Algorithm for Asynchronous
Distributed Microphone Networks . . . . 1563--1575
Jianjun He and
Woon-Seng Gan and
Ee-Leng Tan Time-Shifting Based Primary-Ambient
Extraction for Spatial Audio
Reproduction . . . . . . . . . . . . . . 1576--1588
P. Shah and
I. Lewis and
S. Grant and
S. Angrignon Nonlinear Acoustic Echo Cancellation
Using Voltage and Current Feedback . . . 1589--1599
Li Su and
Yi-Hsuan Yang Combining Spectral and Temporal
Representations for Multipitch
Estimation of Polyphonic Music . . . . . 1600--1612
T. Fujioka and
Y. Nagata and
M. Abe High-Precision Harmonic Distortion Level
Measurement of a Loudspeaker Using
Adaptive Filters in a Noisy Environment 1613--1622
Tsz-Kin Hon and
Lin Wang and
J. D. Reiss and
A. Cavallaro Audio Fingerprinting for Multi-Device
Self-Localization . . . . . . . . . . . 1623--1636
Ye Tian and
Zhe Chen and
Fuliang Yin Distributed IMM-Unscented Kalman Filter
for Speaker Tracking in Microphone Array
Networks . . . . . . . . . . . . . . . . 1637--1647
Na Li and
Man-Wai Mak SNR-Invariant PLDA Modeling in
Nonparametric Subspace for Robust
Speaker Verification . . . . . . . . . . 1648--1659
J. Vilkamo and
S. Delikaris-Manias Perceptual Reproduction of Spatial Sound
Using Loudspeaker-Signal-Domain
Parametrization . . . . . . . . . . . . 1660--1669
Chao Weng and
Dong Yu and
M. L. Seltzer and
J. Droppo Deep Neural Networks for Single-Channel
Multi-Talker Speech Recognition . . . . 1670--1679
M. Ruhland and
J. Bitzer and
M. Brandt and
S. Goetze Reduction of Gaussian, Supergaussian,
and Impulsive Noise by Interpolation of
the Binary Mask Residual . . . . . . . . 1680--1691
Y. Dorfan and
S. Gannot Tree-Based Recursive
Expectation-Maximization Algorithm for
Localization of Acoustic Sources . . . . 1692--1703
Anonymous IEEE/ACM Transactions on Audio, Speech,
and Language Processing EDICS . . . . . 1704--1705
Anonymous Information for Authors . . . . . . . . 1706--1707
Anonymous How can you get your idea to market
first? . . . . . . . . . . . . . . . . . 1708--1708
Anonymous Front Cover . . . . . . . . . . . . . . C1
Anonymous IEEE/ACM Transactions on Audio, Speech,
and Language Processing publication
information . . . . . . . . . . . . . . C2
Anonymous IEEE Signal Processing Society
Information . . . . . . . . . . . . . . C3
Anonymous Blank page . . . . . . . . . . . . . . . C4
A. Sarmiento and
I. Duran-Diaz and
A. Cichocki and
S. Cruces A Contrast Function Based on Generalized
Divergences for Solving the Permutation
Problem in Convolved Speech Mixtures . . 1713--1726
Xiaojia Zhao and
Yuxuan Wang and
Deliang Wang Cochannel Speaker Identification in
Anechoic and Reverberant Conditions . . 1727--1736
Liang-Yu Chen and
J.-S. R. Jang Automatic Pronunciation Scoring with
Score Combination by Learning to Rank
and Class-Normalized DP-Based
Quantization . . . . . . . . . . . . . . 1737--1749
Duyu Tang and
Bing Qin and
Furu Wei and
Li Dong and
Ting Liu and
Ming Zhou A Joint Segmentation and Classification
Framework for Sentence Level Sentiment
Classification . . . . . . . . . . . . . 1750--1761
F.-M. Hoffmann and
F. M. Fazi Theoretical Study of Acoustic Circular
Arrays With Tangential Pressure Gradient
Sensors . . . . . . . . . . . . . . . . 1762--1774
N. Souviraa-Labastie and
A. Olivero and
E. Vincent and
F. Bimbot Multi-Channel Audio Source Separation
Using Multiple Deformed References . . . 1775--1787
D. Baby and
T. Virtanen and
J. F. Gemmeke and
H. Van hamme Coupled Dictionaries for Exemplar-Based
Speech Enhancement and Automatic Speech
Recognition . . . . . . . . . . . . . . 1788--1799
M. T. Islam and
C. Shahnaz and
Wei-Ping Zhu and
M. O. Ahmad Speech Enhancement Based on Student
Modeling of Teager Energy Operated
Perceptual Wavelet Packet Coefficients
and a Custom Thresholding Function . . . 1800--1811
Quynh Thi Ngoc Do and
S. Bethard and
M.-F. Moens Domain Adaptation in Semantic Role
Labeling Using a Neural Language Model
and Linguistic Resources . . . . . . . . 1812--1823
H. Aragonda and
C. S. Seelamantula Demodulation of Narrowband Speech
Spectrograms Using the Riesz Transform 1824--1834
D. T. Tran and
E. Vincent and
D. Jouvet Nonparametric Uncertainty Estimation and
Propagation for Noise Robust ASR . . . . 1835--1846
Mei Tu and
Yu Zhou and
Chengqing Zong Exploring Diverse Features for
Statistical Machine Translation Model
Pruning . . . . . . . . . . . . . . . . 1847--1857
G. Okopal and
S. Wisdom and
L. Atlas Speech Analysis With the Strong
Uncorrelating Transform . . . . . . . . 1858--1868
M. F. Simon Galvez and
S. J. Elliott and
J. Cheer Time Domain Optimization of Filters Used
in a Loudspeaker Array for Personal
Audio . . . . . . . . . . . . . . . . . 1869--1878
M. H. Bokaei and
H. Sameti and
Yang Liu Linear Discourse Segmentation of
Multi-Party Meetings Based on Local and
Global Information . . . . . . . . . . . 1879--1891
Chung-Hsien Wu and
Han-Ping Shen and
Chun-Shan Hsu Code-Switching Event Detection by Using
a Latent Language Space Model and the
Delta-Bayesian Information Criterion . . 1892--1903
Zhangli Chen and
V. Hohmann Online Monaural Speech Enhancement Based
on Periodicity Analysis and A Priori SNR
Estimation . . . . . . . . . . . . . . . 1904--1916
S. Sarreshtedari and
M. A. Akhaee and
A. Abbasfar A Watermarking Method for Digital Speech
Self-Recovery . . . . . . . . . . . . . 1917--1925
N. Moritz and
J. Anemuller and
B. Kollmeier An Auditory Inspired Amplitude
Modulation Filter Bank for Robust
Feature Extraction in Automatic Speech
Recognition . . . . . . . . . . . . . . 1926--1937
Yajie Miao and
Hao Zhang and
F. Metze Speaker Adaptive Training of Deep Neural
Network Acoustic Models Using
$I$-Vectors . . . . . . . . . . . . . . 1938--1949
V. Morfi and
G. Degottex and
A. Mouchtaris Speech Analysis and Synthesis with a
Computationally Efficient Adaptive
Harmonic Model . . . . . . . . . . . . . 1950--1962
J. Dennis and
H. D. Tran and
Haizhou Li Generalized Hough Transform for Speech
Pattern Classification . . . . . . . . . 1963--1972
Feng Deng and
Changchun Bao and
W. B. Kleijn Sparse Hidden Markov Models for Speech
Enhancement in Non-Stationary Noise
Environments . . . . . . . . . . . . . . 1973--1987
R. Ranjan and
Woon-Seng Gan Natural Listening over Headphones in
Augmented Reality Using Adaptive
Filtering Techniques . . . . . . . . . . 1988--2002
L.-H. Chen and
T. Raitio and
C. Valentini-Botinhao and
Z.-H. Ling and
J. Yamagishi A Deep Generative Architecture for
Postfiltering in Statistical Parametric
Speech Synthesis . . . . . . . . . . . . 2003--2014
Ho Seon Shin and
T. Fingscheidt and
Hong-Goo Kang A Priori SNR Estimation Using Air- and
Bone-Conduction Microphones . . . . . . 2015--2025
Ji Wu and
Miao Li and
Chin-Hui Lee A Probabilistic Framework for
Representing Dialog Systems and
Entropy-Based Dialog Management Through
Dynamic Stochastic State Evolution . . . 2026--2035
S. Cumani Fast Scoring of Full Posterior PLDA
Models . . . . . . . . . . . . . . . . . 2036--2045
V. Tourbabin and
B. Rafaely Direction of Arrival Estimation Using
Microphone Array Processing for Moving
Humanoid Robots . . . . . . . . . . . . 2046--2058
Y. J. Chu and
S. C. Chan A New Local Polynomial Modeling-Based
Variable Forgetting Factor RLS Algorithm
and Its Acoustic Applications . . . . . 2059--2069
F. de-la-Calle-Silos and
F. J. Valverde-Albacete and
A. Gallardo-Antolin and
C. Pelaez-Moreno Morphologically Filtered
Power-Normalized Cochleograms as Robust,
Biologically Inspired Features for ASR 2070--2080
T. Hirao and
M. Nishino and
Y. Yoshida and
J. Suzuki and
N. Yasuda and
M. Nagata Summarizing a Document by Trimming the
Discourse Tree . . . . . . . . . . . . . 2081--2092
Chao Pan and
Jingdong Chen and
J. Benesty Theoretical Analysis of Differential
Microphone Array Beamforming and an
Improved Solution . . . . . . . . . . . 2093--2105
Wanxiang Che and
Yanyan Zhao and
Honglei Guo and
Zhong Su and
Ting Liu Sentence Compression for Aspect-Based
Sentiment Analysis . . . . . . . . . . . 2111--2124
J. Sheaffer and
M. van Walstijn and
B. Rafaely and
K. Kowalczyk Binaural Reproduction of Finite
Difference Simulations Using Spherical
Array Processing . . . . . . . . . . . . 2125--2135
Po-Sen Huang and
Minje Kim and
M. Hasegawa-Johnson and
P. Smaragdis Joint Optimization of Masks and Deep
Recurrent Neural Networks for Monaural
Source Separation . . . . . . . . . . . 2136--2147
A. Heidel and
Hsiang-Hung Lu and
Lin-Shan Lee Finding Complex Features for Guest
Language Fragment Recovery in
Resource-Limited Code-Mixed Speech
Recognition . . . . . . . . . . . . . . 2148--2161
D. Marquardt and
V. Hohmann and
S. Doclo Interaural Coherence Preservation in
Multi-Channel Wiener Filtering-Based
Noise Reduction for Binaural Hearing
Aids . . . . . . . . . . . . . . . . . . 2162--2176
Kai Yu and
Kai Sun and
Lu Chen and
Su Zhu Constrained Markov Bayesian Polynomial
for Efficient Dialogue State Tracking 2177--2188
C. A. Anderson and
P. D. Teal and
M. A. Poletti Spatially Robust Far-field Beamforming
Using the von Mises(--Fisher)
Distribution . . . . . . . . . . . . . . 2189--2197
J. Schroder and
S. Goetze and
J. Anemuller Spectro-Temporal Gabor Filterbank
Features for Acoustic Event Detection 2198--2208
Inseok Heo and
W. A. Sethares Classification Based on Speech Rhythm
via a Temporal Alignment of Spoken
Sentences . . . . . . . . . . . . . . . 2209--2216
P. Samarasinghe and
T. Abhayapala and
M. Poletti and
T. Betlehem An Efficient Parameterization of the
Room Transfer Function . . . . . . . . . 2217--2227
Yong Xiang and
I. Natgunanathan and
Yue Rong and
Song Guo Spread Spectrum-Based High Embedding
Capacity Watermarking Method for Audio
Signals . . . . . . . . . . . . . . . . 2228--2237
In-Chul Yoo and
Hyeontaek Lim and
Dongsuk Yook Formant-Based Robust Voice Activity
Detection . . . . . . . . . . . . . . . 2238--2245
T. Hueber and
L. Girin and
X. Alameda-Pineda and
G. Bailly Speaker-Adaptive Acoustic-Articulatory
Inversion Using Cascaded Gaussian
Mixture Regression . . . . . . . . . . . 2246--2259
Hequn Bai and
G. Richard and
L. Daudet Late Reverberation Synthesis: From
Radiance Transfer to Feedback Delay
Networks . . . . . . . . . . . . . . . . 2260--2271
I. Bayram A Multichannel Audio Denoising
Formulation Based on Spectral Sparsity 2272--2285
H. Delgado and
X. Anguera and
C. Fredouille and
J. Serrano Fast Single- and Cross-Show Speaker
Diarization Using Binary Key Speaker
Modeling . . . . . . . . . . . . . . . . 2286--2297
W. S. Percybrooks and
E. Moore A New HMM-Based Voice Conversion
Methodology Evaluated on Monolingual and
Cross-Lingual Conversion Tasks . . . . . 2298--2310
M. Graja and
M. Jaoua and
L. H. Belguith Statistical Framework with Knowledge
Base Integration for Robust Speech
Understanding of the Tunisian Dialect 2311--2321
F. Strasser and
H. Puder Adaptive Feedback Cancellation for
Realistic Hearing Aid Applications . . . 2322--2333
Yu Ting Yeung and
Tan Lee and
Cheung-Chi Leung Supervised Single-Microphone
Multi-Talker Speech Separation with
Conditional Random Fields . . . . . . . 2334--2342
Wenyu Jin and
W. B. Kleijn Theory and Design of Multizone
Soundfield Reproduction Using Sparse
Methods . . . . . . . . . . . . . . . . 2343--2355
Xionghu Zhong and
J. R. Hopgood A Time--Frequency Masking Based Random
Finite Set Particle Filtering Method for
Multiple Acoustic Source Detection and
Tracking . . . . . . . . . . . . . . . . 2356--2370
K. Vijayan and
K. S. R. Murty Analysis of Phase Spectrum of Speech
Signals Using Allpass Modeling . . . . . 2371--2383
D. Marquardt and
E. Hadad and
S. Gannot and
S. Doclo Theoretical Analysis of Linearly
Constrained Multi-Channel Wiener
Filtering Algorithms for Combined Noise
Reduction and Binaural Cue Preservation
in Binaural Hearing Aids . . . . . . . . 2384--2397
M. Zohrer and
R. Peharz and
F. Pernkopf Representation Learning for
Single-Channel Source Separation and
Bandwidth Extension . . . . . . . . . . 2398--2409
Hao Fang and
M. Ostendorf and
P. Baumann and
J. Pierrehumbert Exponential Language Modeling Using
Morphological Features and Multi-Task
Learning . . . . . . . . . . . . . . . . 2410--2421
M. A. Carlin and
M. Elhilali A Framework for Speech Activity
Detection Using Adaptive Auditory
Receptive Fields . . . . . . . . . . . . 2422--2433
S. Saito and
K. Oishi and
T. Furukawa Convolutive Blind Source Separation
Using an Iterative Least-Squares
Algorithm for Non-Orthogonal Approximate
Joint Diagonalization . . . . . . . . . 2434--2448
E. Hadad and
D. Marquardt and
S. Doclo and
S. Gannot Theoretical Analysis of Binaural
Transfer Function MVDR Beamformers with
Interference Cue Preservation
Constraints . . . . . . . . . . . . . . 2449--2464
Guang Yang and
R. F. Lyon and
E. M. Drakakis Psychophysical Evaluation of An
Ultra-Low Power, Analog Biomimetic
Cochlear Implant Processor Filterbank
Architecture With Across Channels AGC 2465--2473
Anonymous List of Reviewers . . . . . . . . . . . 2474--2476
Anonymous Table of Contents . . . . . . . . . . . 1--2
Anonymous Table of Contents . . . . . . . . . . . 3--4
S. Brognaux and
T. Drugman HMM-Based Speech Segmentation:
Improvements of Fully Automatic
Approaches . . . . . . . . . . . . . . . 5--15
M. Tahon and
L. Devillers Towards a Small Set of Robust Acoustic
Features for Emotion Recognition:
Challenges . . . . . . . . . . . . . . . 16--28
H. Behravan and
V. Hautamaki and
S. M. Siniscalchi and
T. Kinnunen and
Chin-Hui Lee $i$-Vector Modeling of Speech Attributes
for Automatic Foreign Accent Recognition 29--41
R. Saeidi and
P. Alku and
T. Backstrom Feature Extraction Using Power-Law
Adjusted Linear Prediction With
Application to Speaker Recognition Under
Severe Vocal Effort Mismatch . . . . . . 42--53
I. T. Ardekani and
J. P. Kaipio and
A. Nasiri and
H. Sharifzadeh and
W. H. Abdulla A Statistical Inverse Problem Approach
to Online Secondary Path Modeling in
Active Noise Control . . . . . . . . . . 54--64
T. Stafylakis and
P. Kenny and
M. J. Alam and
M. Kockmann Speaker and Channel Factors in
Text-Dependent Speaker Recognition . . . 65--78
Yanzhang He and
P. Baumann and
Hao Fang and
B. Hutchinson and
A. Jaech and
M. Ostendorf and
E. Fosler-Lussier and
J. Pierrehumbert Using Pronunciation-Based Morphological
Subword Units to Improve OOV Handling in
Keyword Search . . . . . . . . . . . . . 79--92
Meng Sun and
Xiongwei Zhang and
H. Van Hamme and
T. F. Zheng Unseen Noise Estimation Using Separable
Deep Auto Encoder for Speech Enhancement 93--104
L. Ferrer and
Yun Lei and
M. McLaren and
N. Scheffer Study of Senone-Based Deep Neural
Network Approaches for Spoken Language
Recognition . . . . . . . . . . . . . . 105--116
S. I. Adalbjornsson and
T. Kronvall and
S. Burgess and
K. Astrom and
A. Jakobsson Sparse Localization of Harmonic Audio
Sources . . . . . . . . . . . . . . . . 117--129
Man-Wai Mak and
Xiaomin Pang and
Jen-Tzung Chien Mixture of PLDA for Noise Robust
$I$-Vector Speaker Verification . . . . 130--142
C. A. Anderson and
P. D. Teal and
M. A. Poletti Spatial Correlation of Radial Gaussian
and Uniform Spherical Volume Near-Field
Source Distributions . . . . . . . . . . 143--150
H. Torres and
J. Gurlekian Novel Estimation Method for the
Superpositional Intonation Model . . . . 151--160
S. Bilbao and
B. Hamilton and
J. Botts and
L. Savioja Finite Volume Time Domain Room Acoustics
Simulation under General Impedance
Boundary Conditions . . . . . . . . . . 161--173
A. H. Harati Nejad Torbati and
J. Picone A Doubly Hierarchical Dirichlet Process
Hidden Markov Model with a Non-Ergodic
Structure . . . . . . . . . . . . . . . 174--184
Jen-Tzung Chien and
Po-Kai Yang Bayesian Factorization and Learning for
Monaural Source Separation . . . . . . . 185--195
D. L. Alon and
B. Rafaely Beamforming with Optimal Aliasing
Cancellation in Spherical Microphone
Arrays . . . . . . . . . . . . . . . . . 196--210
Anonymous IEEE/ACM Transactions on Audio, Speech,
and Language Processing Edics . . . . . 211--212
Anonymous Information for authors . . . . . . . . 213--214
Anonymous Special issue on sound scene and event
analysis . . . . . . . . . . . . . . . . 215
Anonymous [Front cover] . . . . . . . . . . . . . C1
Anonymous IEEE/ACM Transactions on Audio, Speech,
and Language Processing publication
information . . . . . . . . . . . . . . C2
Anonymous IEEE Signal Processing Society
Information . . . . . . . . . . . . . . C3
Anonymous [Blank page] . . . . . . . . . . . . . . C4
Anonymous Table of contents . . . . . . . . . . . 211--212
Anonymous Table of contents . . . . . . . . . . . 213--214
E. Rasumow and
M. Hansen and
S. van de Par and
D. Puschel and
V. Mellert and
S. Doclo and
M. Blau Regularization Approaches for
Synthesizing HRTF Directivity Patterns 215--225
Chao Pan and
J. Benesty and
Jingdong Chen Design of Directivity Patterns with a
Unique Null of Maximum Multiplicity . . 226--235
Jeih-Weih Hung and
Hsin-Ju Hsieh and
Berlin Chen Robust Speech Recognition via Enhancing
the Complex-Valued Acoustic Spectrum in
Modulation Domain . . . . . . . . . . . 236--251
Xiao-Lei Zhang and
DeLiang Wang Boosting Contextual Information for Deep
Neural Network Based Voice Activity
Detection . . . . . . . . . . . . . . . 252--264
M. A. Tugtekin Turan and
E. Erzin Source and Filter Estimation for
Throat-Microphone Speech Enhancement . . 265--275
N. Mohammadiha and
S. Doclo Speech Dereverberation Using
Non-Negative Convolutive Transfer
Function and Spectro-Temporal Modeling 276--289
A. Sharma and
S. Kaul Two-Stage Supervised Learning-Based
Method to Detect Screams and Cries in
Urban Environments . . . . . . . . . . . 290--299
Xiaoguang Wu and
Huawei Chen Directivity Factors of the First-Order
Steerable Differential Array With
Microphone Mismatches: Deterministic and
Worst-Case Analysis . . . . . . . . . . 300--315
A. I. Koutrouvelis and
G. P. Kafentzis and
N. D. Gaubitch and
R. Heusdens A Fast Method for High-Resolution
Voiced/Unvoiced Detection and Glottal
Closure/Opening Instant Estimation of
Speech . . . . . . . . . . . . . . . . . 316--328
T. Nakamura and
E. Nakamura and
S. Sagayama Real-Time Audio-to-Score Alignment of
Music Performances Containing Errors and
Arbitrary Repeats and Skips . . . . . . 329--339
A. Bahne and
A. Ahlen Optimizing the Similarity of
Loudspeaker-Room Responses in Multiple
Listening Positions . . . . . . . . . . 340--353
J. M. Kates and
K. H. Arehart The Hearing-Aid Audio Quality Index
(HAAQI) . . . . . . . . . . . . . . . . 354--365
H. Schepker and
S. Doclo A Semidefinite Programming Approach to
Min-max Estimation of the Common Part of
Acoustic Feedback Paths in Hearing Aids 366--377
Bong-Ki Lee and
Joon-Hyuk Chang Packet Loss Concealment Based on Deep
Neural Networks for Digital Speech
Transmission . . . . . . . . . . . . . . 378--387
L. Bentivogli and
N. Bertoldi and
M. Cettolo and
M. Federico and
M. Negri and
M. Turchi On the Evaluation of Adaptive Machine
Translation for Human Post-Editing . . . 388--399
Anonymous IEEE/ACM Transactions on Audio, Speech,
and Language Processing Edics . . . . . 400--401
Anonymous Information for authors . . . . . . . . 402--403
Anonymous Special issue on sound scene and event
analysis . . . . . . . . . . . . . . . . 404
Anonymous [Front cover] . . . . . . . . . . . . . C1
Anonymous IEEE/ACM Transactions on Audio, Speech,
and Language Processing publication
information . . . . . . . . . . . . . . C2
Anonymous IEEE Signal Processing Society
Information . . . . . . . . . . . . . . C3
Anonymous [Blank page] . . . . . . . . . . . . . . C4
Anonymous Table of Contents . . . . . . . . . . . 405--406
Anonymous Table of Contents . . . . . . . . . . . 407--408
Reinhard Sonnleitner and
Gerhard Widmer Robust Quad-Based Audio Fingerprinting 409--421
Li Dong and
Furu Wei and
Ke Xu and
Shixia Liu and
Ming Zhou Adaptive Multi-Compositionality for
Recursive Neural Network Models . . . . 422--431
Zheng Lin and
Xiaolong Jin and
Xueke Xu and
Yuanzhuo Wang and
Xueqi Cheng and
Weiping Wang and
Dan Meng An Unsupervised Cross-Lingual Topic
Model Framework for Sentiment
Classification . . . . . . . . . . . . . 432--444
Anil Nagathil and
Claus Weihs and
Rainer Martin Spectral Complexity Reduction of Music
Signals for Mitigating Effects of
Cochlear Hearing Loss . . . . . . . . . 445--458
Tian Tan and
Yanmin Qian and
Kai Yu Cluster Adaptive Training for Deep
Neural Network Based Acoustic Model . . 459--468
Arne Leijon and
Gustav Eje Henter and
Martin Dahlquist Bayesian Analysis of Phoneme Confusion
Matrices . . . . . . . . . . . . . . . . 469--482
Donald S. Williamson and
Yuxuan Wang and
DeLiang Wang Complex Ratio Masking for Monaural
Speech Separation . . . . . . . . . . . 483--492
Johannes Traa and
David Wingate and
Noah D. Stein and
Paris Smaragdis Robust Source Localization and
Enhancement With a Probabilistic Steered
Response Power Model . . . . . . . . . . 493--503
Sven Ewan Shepstone and
Kong Aik Lee and
Haizhou Li and
Zheng-Hua Tan and
Søren Holdt Jensen Total Variability Modeling Using
Source-Specific Priors . . . . . . . . . 504--517
Martin Schneider and
Walter Kellermann Multichannel Acoustic Echo Cancellation
in the Wave Domain With Increased
Robustness to Nonuniqueness . . . . . . 518--529
Ken O'Hanlon and
Hidehisa Nagano and
Nicolas Keriven and
Mark D. Plumbley Non-Negative Group Sparsity with
Subspace Note Modelling for Polyphonic
Transcription . . . . . . . . . . . . . 530--542
Elior Hadad and
Simon Doclo and
Sharon Gannot The Binaural LCMV Beamformer and its
Performance Analysis . . . . . . . . . . 543--558
Felipe Grijalva and
Luiz Martini and
Dinei Florencio and
Siome Goldenstein A Manifold Learning Approach for
Personalizing HRTFs from Anthropometric
Features . . . . . . . . . . . . . . . . 559--570
Lin Wang and
Simon Doclo Correlation Maximization-Based Sampling
Rate Offset Estimation for Distributed
Microphone Arrays . . . . . . . . . . . 571--582
Nasim Radmanesh and
Ian S. Burnett and
Bhaskar D. Rao A Lasso-LS Optimization with a Frequency
Variable Dictionary in a Multizone Sound
System . . . . . . . . . . . . . . . . . 583--593
Xin Liu and
Changchun Bao Audio Bandwidth Extension Based on
Ensemble Echo State Networks with
Temporal Evolution . . . . . . . . . . . 594--607
Anonymous EDICS Categories for IEEE/ACM
Transactions on Audio, Speech, and
Language Processing . . . . . . . . . . 608--609
Anonymous Information for Authors . . . . . . . . 610--611
Anonymous Special issue on sound scene and event
analysis . . . . . . . . . . . . . . . . 612
Anonymous Introducing IEEE Collabratec . . . . . . 613
Anonymous Front Cover . . . . . . . . . . . . . . C1
Anonymous IEEE/ACM Transactions on Audio, Speech,
and Language Processing . . . . . . . . C2
Anonymous IEEE/ACM Transactions on Audio, Speech,
and Language Processing . . . . . . . . C3
Anonymous Table of Contents . . . . . . . . . . . 608--609
Anonymous Table of Contents . . . . . . . . . . . 610--611
Peifeng Li and
Guodong Zhou Joint Argument Inference in Chinese
Event Extraction with Argument
Consistency and Event Relevance . . . . 612--622
Jianming Liu and
Steven L. Grant Proportionate Adaptive Filtering for
Block-Sparse System Identification . . . 623--630
Jesper Rindom Jensen and
Jacob Benesty and
Mads Græsbøll Christensen Noise Reduction with Optimal Variable
Span Linear Filters . . . . . . . . . . 631--644
Sidsel Marie Nørholm and
Jesper Rindom Jensen and
Mads Græsbøll Christensen Enhancement and Noise Statistics
Estimation for Non-Stationary Voiced
Speech . . . . . . . . . . . . . . . . . 645--658
Daryush D. Mehta and
Jarrad H. Van Stan and
Robert E. Hillman Relationships Between Vocal Function
Measures Derived from an Acoustic
Microphone and a Subglottal Neck-Surface
Accelerometer . . . . . . . . . . . . . 659--668
Herman Kamper and
Aren Jansen and
Sharon Goldwater Unsupervised Word Segmentation and
Lexicon Discovery Using Acoustic Word
Embeddings . . . . . . . . . . . . . . . 669--679
Ina Kodrasi and
Simon Doclo Joint Dereverberation and Noise
Reduction Based on Acoustic
Multi-Channel Equalization . . . . . . . 680--693
Hamid Palangi and
Li Deng and
Yelong Shen and
Jianfeng Gao and
Xiaodong He and
Jianshu Chen and
Xinying Song and
Rabab Ward Deep Sentence Embedding Using Long
Short-Term Memory Networks: Analysis and
Application to Information Retrieval . . 694--707
Michael Jeffet and
Noam R. Shabtai and
Boaz Rafaely Theory and Perceptual Evaluation of the
Binaural Reproduction and Beamforming
Tradeoff in the Generalized Spherical
Array Beamformer . . . . . . . . . . . . 708--718
Pablo Peso Parada and
Dushyant Sharma and
Jose Lainez and
Daniel Barreda and
Toon van Waterschoot and
Patrick A. Naylor A Single-Channel Non-Intrusive C50
Estimator Correlated With Speech
Recognition Performance . . . . . . . . 719--732
Ming-Hsiang Su and
Chung-Hsien Wu and
Yu-Ting Zheng Exploiting Turn-Taking Temporal
Evolution for Personality Trait
Perception in Dyadic Conversations . . . 733--744
Sadaf Abdul-Rauf and
Holger Schwenk and
Patrik Lambert and
Mohammad Nawaz Empirical Use of Information Retrieval
to Build Synthetic Data for SMT Domain
Adaptation . . . . . . . . . . . . . . . 745--754
Shinnosuke Takamichi and
Tomoki Toda and
Alan W. Black and
Graham Neubig and
Sakriani Sakti and
Satoshi Nakamura Postfilters to Modify the Modulation
Spectrum for Statistical Parametric
Speech Synthesis . . . . . . . . . . . . 755--767
Zhizheng Wu and
Phillip L. De Leon and
Cenk Demiroglu and
Ali Khodabakhsh and
Simon King and
Zhen-Hua Ling and
Daisuke Saito and
Bryan Stewart and
Tomoki Toda and
Mirjam Wester and
Junichi Yamagishi Anti-Spoofing for Text-Independent
Speaker Verification: an Initial
Database, Comparison of Countermeasures,
and Human Performance . . . . . . . . . 768--783
Kristian Timm Andersen and
Marc Moonen Adaptive Time-Frequency Analysis for
Noise Reduction in an Audio Filter Bank
With Low Delay . . . . . . . . . . . . . 784--795
Zhong-Qiu Wang and
DeLiang Wang A Joint Training Framework for Robust
Automatic Speech Recognition . . . . . . 796--806
Huy Phan and
Lars Hertel and
Marco Maass and
Radoslaw Mazur and
Alfred Mertins Learning Representations for Nonspeech
Audio Events Through Their Similarities
to Speech Patterns . . . . . . . . . . . 807--822
Anonymous EDICS Categories for IEEE/ACM
Transactions on Audio, Speech, and
Language Processing . . . . . . . . . . 823--824
Anonymous Information for Authors . . . . . . . . 825--826
Anonymous Special issue on sound scene and event
analysis . . . . . . . . . . . . . . . . 827
Anonymous Introducing IEEE Collabratec . . . . . . 828
Anonymous Front Cover . . . . . . . . . . . . . . C1
Anonymous IEEE/ACM Transactions on Audio, Speech,
and Language Processing . . . . . . . . C2
Anonymous Table of Contents . . . . . . . . . . . 829--830
Anonymous Table of Contents . . . . . . . . . . . 831--832
T. J. Tsai and
Andreas Stolcke Robust and Efficient Multiple Alignment
of Unsynchronized Meeting Recordings . . 833--845
Simon Receveur and
Robin Weiß and
Tim Fingscheidt Turbo Automatic Speech Recognition . . . 846--862
Ricard Marxer and
Hendrik Purwins Unsupervised Incremental Online Learning
and Prediction of Musical Audio Signals 863--874
Mohammad Adeli and
Jean Rouat and
Sean Wood and
Stéphane Molotchnikoff and
Eric Plourde A Flexible Bio-Inspired Hierarchical
Model for Analyzing Musical Timbre . . . 875--889
Geliang Zhang and
Simon Godsill Fundamental Frequency Estimation in
Speech Signals With Variable Rate
Particle Filters . . . . . . . . . . . . 890--900
Nadine Kroher and
Emilia Gómez Automatic Transcription of Flamenco
Singing From Polyphonic Music Recordings 901--913
Fiete Winter and
Jens Ahrens and
Sascha Spors On Analytic Methods for $2.5$-D Local
Sound Field Synthesis Using Circular
Distributions of Secondary Sources . . . 914--926
Siddharth Sigtia and
Emmanouil Benetos and
Simon Dixon An End-to-End Neural Network for
Polyphonic Piano Music Transcription . . 927--939
Martin Krawczyk-Becker and
Timo Gerkmann Fundamental Frequency Informed Speech
Enhancement in a Flexible Statistical
Framework . . . . . . . . . . . . . . . 940--951
Joseph Szurley and
Alexander Bertrand and
Bas Van Dijk and
Marc Moonen Binaural Noise Cue Preservation in a
Binaural Noise Reduction System With a
Remote Microphone Signal . . . . . . . . 952--966
Xiao-Lei Zhang and
DeLiang Wang A Deep Ensemble Learning Method for
Monaural Speech Separation . . . . . . . 967--977
Haotian Xu and
Zhijian Ou Scalable Discovery of Audio Fingerprint
Motifs in Broadcast Streams With
Determinantal Point Process Based Motif
Clustering . . . . . . . . . . . . . . . 978--989
Anonymous IEEE/ACM Transactions on Audio, Speech,
and Language Processing Edics . . . . . 990--991
Anonymous IEEE/ACM Transactions on Audio, Speech,
and Language Processing information for
authors . . . . . . . . . . . . . . . . 992--993
Anonymous Special issue on sound scene and event
analysis . . . . . . . . . . . . . . . . 994
Anonymous Special Issue on Biosignal-based Spoken
Communication . . . . . . . . . . . . . 995
Anonymous Front Cover . . . . . . . . . . . . . . C1
Anonymous IEEE/ACM Transactions on Audio, Speech,
and Language Processing publication
information . . . . . . . . . . . . . . C2
Anonymous IEEE Power Electronics Society
Information . . . . . . . . . . . . . . C3
Anonymous Blank page . . . . . . . . . . . . . . . C4
Anonymous Table of Contents . . . . . . . . . . . 990--991
Anonymous Table of Contents . . . . . . . . . . . 992--993
Asli Celikyilmaz and
Ruhi Sarikaya and
Minwoo Jeong and
Anoop Deoras An Empirical Investigation of Word
Class-Based Features for Natural
Language Understanding . . . . . . . . . 994--1005
Duc Hoang Ha Nguyen and
Xiong Xiao and
Eng Siong Chng and
Haizhou Li Feature Adaptation Using Linear
Spectro-Temporal Transform for Robust
Speech Recognition . . . . . . . . . . . 1006--1019
Xiaojun Qian and
Helen Meng and
Frank Soong A Two-Pass Framework of Mispronunciation
Detection and Diagnosis for
Computer-Aided Pronunciation Training 1020--1028
Lijiang Chen and
Xia Mao and
Hong Yan Text-Independent Phoneme Segmentation
Combining EGG and Speech Data . . . . . 1029--1037
Vincent Mohammad Tavakoli and
Jesper Rindom Jensen and
Mads Græsbøll Christensen and
Jacob Benesty A Framework for Speech Enhancement With
Ad Hoc Microphone Arrays . . . . . . . . 1038--1051
Yan-You Chen and
Chung-Hsien Wu and
Yi-Chin Huang and
Shih-Lun Lin and
Jhing-Fa Wang Candidate Expansion and Prosody
Adjustment for Natural Speech Synthesis
Using a Small Corpus . . . . . . . . . . 1052--1065
Xueliang Zhang and
Hui Zhang and
Shuai Nie and
Guanglai Gao and
Wenju Liu A Pairwise Algorithm Using the Deep
Stacking Network for Speech Separation
and Pitch Estimation . . . . . . . . . . 1066--1078
Lin Wang and
Tsz-Kin Hon and
Joshua D. Reiss and
Andrea Cavallaro An Iterative Approach to Source Counting
and Localization Using Two Distant
Microphones . . . . . . . . . . . . . . 1079--1093
Seán O'Leary and
Axel Röbel A Montage Approach to Sound Texture
Synthesis . . . . . . . . . . . . . . . 1094--1105
Chahid Ouali and
Pierre Dumouchel and
Vishwa Gupta Fast Audio Fingerprinting System Using
GPU and a Clustering-Based Technique . . 1106--1118
Francisco Raposo and
Ricardo Ribeiro and
David Martins de Matos Using Generic Summarization to Improve
Music Information Retrieval Tasks . . . 1119--1128
Lantian Li and
Dong Wang and
Chenhao Zhang and
Thomas Fang Zheng Improving Short Utterance Speaker
Recognition by Modeling Speech Unit
Classes . . . . . . . . . . . . . . . . 1129--1139
Jalal Taghia and
Rainer Martin A Frequency-Domain Adaptive Line
Enhancer With Step-Size Control Based on
Mutual Information for Harmonic Noise
Reduction . . . . . . . . . . . . . . . 1140--1154
Anonymous IEEE/ACM Transactions on Audio, Speech,
and Language Processing Edics . . . . . 1155--1156
Anonymous IEEE/ACM Transactions on Audio, Speech,
and Language Processing information for
authors . . . . . . . . . . . . . . . . 1157--1158
Anonymous Special issue on sound scene and event
analysis . . . . . . . . . . . . . . . . 1159
Anonymous Special Issue on Biosignal-based Spoken
Communication . . . . . . . . . . . . . 1160
Anonymous Front Cover . . . . . . . . . . . . . . C1
Anonymous IEEE/ACM Transactions on Audio, Speech,
and Language Processing publication
information . . . . . . . . . . . . . . C2
Anonymous IEEE Power Electronics Society
Information . . . . . . . . . . . . . . C3
Anonymous Blank page . . . . . . . . . . . . . . . C4
Min Gao and
Jing Lu and
Xiaojun Qiu A Simplified Subband ANC Algorithm
Without Secondary Path Modeling . . . . 1164--1174
Ryo Aihara and
Tetsuya Takiguchi and
Yasuo Ariki Multiple Non-Negative Matrix
Factorization for Many-to-Many Voice
Conversion . . . . . . . . . . . . . . . 1175--1184
Kai Chen and
Qiang Huo Training Deep Bidirectional LSTM
Acoustic Model for LVCSR by a
Context-Sensitive-Chunk BPTT Approach 1185--1193
Themos Stafylakis and
Md. Jahangir Alam and
Patrick Kenny Text-Dependent Speaker Recognition With
Random Digit Strings . . . . . . . . . . 1194--1203
K. T. Deepak and
S. R. Mahadeva Prasanna Foreground Speech Segmentation and
Enhancement Using Glottal Closure
Instants and Mel Cepstral Coefficients 1204--1218
Habib Hajimolahoseini and
Rassoul Amirfattahi and
Saeed Gazor and
Hamid Soltanian-Zadeh Robust Estimation and Tracking of Pitch
Period Using an Efficient Bayesian
Filter . . . . . . . . . . . . . . . . . 1219--1229
Subhasmita Sahoo and
Aurobinda Routray A Novel Method of Glottal Inverse
Filtering . . . . . . . . . . . . . . . 1230--1241
Gilles Degottex and
Luc Ardaillon and
Axel Roebel Multi-Frame Amplitude Envelope
Estimation for Modification of Singing
Voice . . . . . . . . . . . . . . . . . 1242--1254
Zhizheng Wu and
Simon King Improving Trajectory Modelling for
DNN-Based Speech Synthesis by Using
Stacked Bottleneck Features and Minimum
Generation Error Training . . . . . . . 1255--1265
Xabier Jaureguiberry and
Emmanuel Vincent and
Gaël Richard Fusion Methods for Speech Enhancement
and Audio Source Separation . . . . . . 1266--1279
Rajib Lochan Das and
Mrityunjoy Chakraborty Improving the Performance of the PNLMS
Algorithm Using Norm Regularization . . 1280--1290
Maja Taseska and
Emanuël A. P. Habets Spotforming: Spatial Filtering With
Distributed Arrays for
Position-Selective Sound Acquisition . . 1291--1304
Guangyou Zhou and
Zhiwen Xie and
Tingting He and
Jun Zhao and
Xiaohua Tony Hu Learning the Multilingual Translation
Representations for Question Retrieval
in Community Question Answering via
Non-Negative Matrix Factorization . . . 1305--1314
Chanwoo Kim and
Richard M. Stern Power-Normalized Cepstral Coefficients
(PNCC) for Robust Speech Recognition . . 1315--1329
Henning Schepker and
Simon Doclo Least-Squares Estimation of the Common
Pole-Zero Filter of Acoustic Feedback
Paths in Hearing Aids . . . . . . . . . 1334--1347
Hannes Pessentheiner and
Martin Hagmüller and
Gernot Kubin Localization and Characterization of
Multiple Harmonic Sources . . . . . . . 1348--1363
Hanieh Khalilian and
Ivan V. Bajić and
Rodney G. Vaughan Comparison of Loudspeaker Placement
Methods for Sound Field Reproduction . . 1364--1379
Cheng-Yen Yang and
Chih-Wei Liu and
Shyh-Jye Jou A Systematic ANSI S1.11 Filter Bank
Specification Relaxation and Its
Efficient Multirate Architecture for
Hearing-Aid Systems . . . . . . . . . . 1380--1392
Bracha Laufer-Goldshtein and
Ronen Talmon and
Sharon Gannot Semi-Supervised Sound Source
Localization Based on Manifold
Regularization . . . . . . . . . . . . . 1393--1407
Dionyssos Kounades-Bastian and
Laurent Girin and
Xavier Alameda-Pineda and
Sharon Gannot and
Radu Horaud A Variational EM Algorithm for the
Separation of Time-Varying Convolutive
Audio Mixtures . . . . . . . . . . . . . 1408--1423
Jun Du and
Yanhui Tu and
Li-Rong Dai and
Chin-Hui Lee A Regression Approach to Single-Channel
Speech Separation Via High-Resolution
Deep Neural Networks . . . . . . . . . . 1424--1437
Xunying Liu and
Xie Chen and
Yongqiang Wang and
Mark J. F. Gales and
Philip C. Woodland Two Efficient Lattice Rescoring Methods
Using Recurrent Neural Network Language
Models . . . . . . . . . . . . . . . . . 1438--1449
Pawel Swietojanski and
Jinyu Li and
Steve Renals Learning Hidden Unit Contributions for
Unsupervised Acoustic Model Adaptation 1450--1463
Meng Zhang and
Yang Liu and
Huanbo Luan and
Maosong Sun Listwise Ranking Functions for
Statistical Machine Translation . . . . 1464--1472
Anonymous Table of Contents . . . . . . . . . . . 1477--1478
Anonymous Table of Contents . . . . . . . . . . . 1479--1480
Daniel C. Cavalieri and
Sira E. Palazuelos-Cagigas and
Teodiano F. Bastos-Filho and
Mário Sarcinelli-Filho Combination of Language Models for Word
Prediction: An Exponential Approach . . 1481--1494
Ofer Schwartz and
Sharon Gannot and
Emanuël A. P. Habets An Expectation-Maximization Algorithm
for Multimicrophone Speech
Dereverberation and Noise Reduction With
Coherence Matrix Estimation . . . . . . 1495--1510
Symeon Delikaris-Manias and
Juha Vilkamo and
Ville Pulkki Signal-Dependent Spatial Filtering Based
on Weighted-Orthogonal Beamformers in
the Spherical Harmonic Domain . . . . . 1511--1523
Sheng Li and
Yuya Akita and
Tatsuya Kawahara Semi-Supervised Acoustic Model Training
by Discriminative Data Selection From
Multiple ASR Systems' Hypotheses . . . . 1524--1534
Christian Dittmar and
Meinard Müller Reverse Engineering the Amen Break ---
Score-Informed Separation and
Restoration Applied to Drum Recordings 1535--1547
Chao Pan and
Jingdong Chen and
Jacob Benesty Reduced-Order Robust Superdirective
Beamforming With Uniform Linear
Microphone Arrays . . . . . . . . . . . 1548--1559
Derry FitzGerald and
Antoine Liutkus and
Roland Badeau Projection-Based Demixing of Spatial
Audio . . . . . . . . . . . . . . . . . 1560--1572
Lin Wang and
Joshua D. Reiss and
Andrea Cavallaro Over-Determined Source Separation and
Localization Using Distributed
Microphones . . . . . . . . . . . . . . 1573--1588
Yang Liu and
Sujian Li and
Furu Wei and
Heng Ji Relation Classification Via Modeling
Augmented Dependency Paths . . . . . . . 1589--1598
Adam Kuklasiński and
Simon Doclo and
Søren Holdt Jensen and
Jesper Jensen Maximum Likelihood PSD Estimation for
Speech Enhancement in Reverberation and
Noise . . . . . . . . . . . . . . . . . 1599--1612
Sam Karimian-Azari and
Jesper Rindom Jensen and
Mads Græsbøll Christensen Computationally Efficient and Noise
Robust DOA and Pitch Estimation . . . . 1613--1625
Daichi Kitamura and
Nobutaka Ono and
Hiroshi Sawada and
Hirokazu Kameoka and
Hiroshi Saruwatari Determined Blind Source Separation
Unifying Independent Vector Analysis and
Nonnegative Matrix Factorization . . . . 1626--1641
Nicolas Obin and
Axel Roebel Similarity Search of Acted Voices for
Automatic Voice Casting . . . . . . . . 1642--1651
Aditya Arie Nugraha and
Antoine Liutkus and
Emmanuel Vincent Multichannel Audio Source Separation
With Deep Neural Networks . . . . . . . 1652--1664
Stephen H. Shum and
David F. Harwath and
Najim Dehak and
James R. Glass On the Use of Acoustic Unit Discovery
for Language Recognition . . . . . . . . 1665--1676
Anonymous IEEE/ACM Transactions on
Audio, Speech, and Language Processing
Edics . . . . . . . . . . . . . . . . . 1677--1678
Anonymous IEEE Transactions on
Multimedia information for authors . . . 1679--1680
Anonymous Introducing the IEEE PES Resource Center 1681
Anonymous Front Cover . . . . . . . . . . . . . . C1
Anonymous IEEE Signal Processing Society . . . . . C2
Anonymous IEEE Signal Processing Society . . . . . C3
Anonymous Blank page . . . . . . . . . . . . . . . C4
Anonymous Table of Contents . . . . . . . . . . . 1677--1678
Anonymous Table of Contents . . . . . . . . . . . 1679--1680
James Eaton and
Nikolay D. Gaubitch and
Alastair H. Moore and
Patrick A. Naylor Estimation of Room Acoustic Parameters:
The ACE Challenge . . . . . . . . . . . 1681--1693
Takashi Nose Efficient Implementation of Global
Variance Compensation for Parametric
Speech Synthesis . . . . . . . . . . . . 1694--1704
Shabnam Ghaffarzadegan and
Hynek Bořil and
John H. L. Hansen Generative Modeling of Pseudo-Whisper
for Robust Whispered Speech Recognition 1705--1720
Seyedmahdad Mirsamadi and
John H. L. Hansen A Generalized Nonnegative Tensor
Factorization Approach for Distant
Speech Recognition With Distributed
Microphones . . . . . . . . . . . . . . 1721--1731
Laura Fuster and
Maria de Diego and
Luis A. Azpicueta-Ruiz and
Miguel Ferrer Adaptive Filtered-x Algorithms for Room
Equalization Based on Block-Based
Combination Schemes . . . . . . . . . . 1732--1745
Kamil Adiloğlu and
Emmanuel Vincent Variational Bayesian Inference for
Source Separation and Robust Feature
Extraction . . . . . . . . . . . . . . . 1746--1758
Steffen Kortlang and
Giso Grimm and
Volker Hohmann and
Birger Kollmeier and
Stephan D. Ewert Auditory Model-Based Dynamic Compression
Controlled by Subband Instantaneous
Frequency and Speech Presence
Probability Estimates . . . . . . . . . 1759--1772
Pawel Swietojanski and
Steve Renals Differentiable Pooling for Unsupervised
Acoustic Model Adaptation . . . . . . . 1773--1784
Kenta Niwa and
Yusuke Hioka and
Kazunori Kobayashi Optimal Microphone Array Observation for
Clear Recording of Distant Sound Sources 1785--1795
Nicolas Epain and
Craig T. Jin Spherical Harmonic Signal Covariance and
Sound Field Diffuseness . . . . . . . . 1796--1807
Tudor-Cătălin Zorilă and
Yannis Stylianou and
Tatsuma Ishihara and
Masami Akamine Near and Far Field Speech-in-Noise
Intelligibility Improvements Based on a
Time--Frequency Energy Reallocation
Approach . . . . . . . . . . . . . . . . 1808--1818
Xi Ma and
Dong Wang and
Javier Tejedor Similar Word Model for Unfrequent Word
Enhancement in Speech Recognition . . . 1819--1830
Mohammad Hadi Bokaei and
Hossein Sameti and
Yang Liu Summarizing Meeting Transcripts Based on
Functional Segmentation . . . . . . . . 1831--1841
Jiajun Zhang and
Yu Zhou and
Chengqing Zong Abstractive Cross-Language Summarization
via Translation Model Enhanced Predicate
Argument Structure Fusing . . . . . . . 1842--1853
Grégoire Lafay and
Mathieu Lagrange and
Mathias Rossignol and
Emmanouil Benetos and
Axel Roebel A Morphological Model for Simulating
Acoustic Scenes and Its Application to
Sound Event Detection . . . . . . . . . 1854--1864
An Ji and
Michael T. Johnson and
Jeffrey J. Berry Parallel Reference Speaker Weighting for
Kinematic-Independent
Acoustic-to-Articulatory Inversion . . . 1865--1875
Anonymous IEEE/ACM Transactions on
Audio, Speech, and Language Processing
Edics . . . . . . . . . . . . . . . . . 1876--1877
Anonymous IEEE Transactions on
Multimedia information for authors . . . 1878--1879
Anonymous Introducing the IEEE PES Resource Center 1880
Anonymous Front Cover . . . . . . . . . . . . . . C1
Anonymous IEEE Signal Processing Society . . . . . C2
Anonymous IEEE Signal Processing Society . . . . . C3
Anonymous Blank page . . . . . . . . . . . . . . . C4
Anonymous Table of Contents . . . . . . . . . . . 1881--1882
Anonymous Table of Contents . . . . . . . . . . . 1883--1884
Aggelos Gkiokas and
Vassilis Katsouros and
George Carayannis Towards Multi-Purpose Spectral Rhythm
Features: An Application to Dance Style,
Meter and Tempo Estimation . . . . . . . 1885--1896
Yi-Chin Huang and
Chung-Hsien Wu and
Si-Ting Weng Improving Mandarin Prosody Generation
Using Alternative Smoothing Techniques 1897--1907
Asger Heidemann Andersen and
Jan Mark de Haan and
Zheng-Hua Tan and
Jesper Jensen Predicting the Intelligibility of Noisy
and Nonlinearly Processed Binaural
Speech . . . . . . . . . . . . . . . . . 1908--1920
Qiaoling Zhang and
Zhe Chen and
Fuliang Yin Distributed Marginalized Auxiliary
Particle Filter for Speaker Tracking in
Distributed Microphone Networks . . . . 1921--1934
Marc Ferràs and
Srikanth Madikeri and
Hervé Bourlard Speaker Diarization and Linking of
Meeting Data . . . . . . . . . . . . . . 1935--1945
Yuzong Liu and
Katrin Kirchhoff Graph-Based Semisupervised Learning for
Acoustic Modeling in Automatic Speech
Recognition . . . . . . . . . . . . . . 1946--1956
Jin Wang and
Liang-Chih Yu and
K. Robert Lai and
Xuejie Zhang Community-Based Weighted Graph Model for
Valence-Arousal Prediction of Affective
Words . . . . . . . . . . . . . . . . . 1957--1968
Alberto Carini and
Stefania Cecchi and
Laura Romoli Robust Room Impulse Response Measurement
Using Perfect Sequences for Legendre
Nonlinear Filters . . . . . . . . . . . 1969--1982
Sebastian Ewert and
Mark Sandler Piano Transcription in the Studio Using
an Extensible Alternating Directions
Framework . . . . . . . . . . . . . . . 1983--1997
Yu-Ren Chien and
Hsin-Min Wang and
Shyh-Kang Jeng Alignment of Lyrics With Accompanied
Singing Audio Based on Acoustic-Phonetic
Vowel Likelihood Modeling . . . . . . . 1998--2008
Jesper Jensen and
Cees H. Taal An Algorithm for Predicting the
Intelligibility of Speech Masked by
Modulated Noise Maskers . . . . . . . . 2009--2022
Xiaodong Cui and
Vaibhava Goel Maximum Likelihood Nonlinear
Transformations Based on Deep Neural
Networks . . . . . . . . . . . . . . . . 2023--2031
Toru Nakashika and
Tetsuya Takiguchi and
Yasuhiro Minami Non-Parallel Training in Voice
Conversion Using an Adaptive Restricted
Boltzmann Machine . . . . . . . . . . . 2032--2045
I-Bin Liao and
Chen-Yu Chiang and
Yih-Ru Wang and
Sin-Horng Chen Speaker Adaptation of SR-HPM for
Speaking Rate-Controlled Mandarin TTS 2046--2058
Hiroki Ouchi and
Kevin Duh and
Hiroyuki Shindo and
Yuji Matsumoto Transition-Based Dependency Parsing
Exploiting Supertags . . . . . . . . . . 2059--2068
Tong Xiao and
Derek F. Wong and
Jingbo Zhu A Loss-Augmented Approach to Training
Syntactic Machine Translation Systems 2069--2083
Yukara Ikemiya and
Katsutoshi Itoyama and
Kazuyoshi Yoshii Singing Voice Separation and Vocal F0
Estimation Based on Mutual Combination
of Robust Principal Component Analysis
and Subharmonic Summation . . . . . . . 2084--2095
Siddharth Sigtia and
Adam M. Stark and
Sacha Krstulović and
Mark D. Plumbley Automatic Environmental Sound
Recognition: Performance Versus
Computational Cost . . . . . . . . . . . 2096--2107
Srinivas Parthasarathy and
Roddy Cowie and
Carlos Busso Using Agreement on Direction of Change
to Build Rank-Based Emotion Classifiers 2108--2121
Jia-Ching Wang and
Yuan-Shan Lee and
Chang-Hong Lin and
Shu-Fan Wang and
Chih-Hao Shih and
Chung-Hsien Wu Compressive Sensing-Based Speech
Enhancement . . . . . . . . . . . . . . 2122--2131
Siying Wang and
Sebastian Ewert and
Simon Dixon Robust and Efficient Joint Alignment of
Multiple Musical Performances . . . . . 2132--2145
Xie Chen and
Xunying Liu and
Yongqiang Wang and
Mark J. F. Gales and
Philip C. Woodland Efficient Training and Evaluation of
Recurrent Neural Network Language Models
for Automatic Speech Recognition . . . . 2146--2157
Ping-Keng Jao and
Li Su and
Yi-Hsuan Yang and
Brendt Wohlberg Monaural Music Source Separation Using
Convolutional Sparse Coding . . . . . . 2158--2170
Andrea Cogliati and
Zhiyao Duan and
Brendt Wohlberg Context-Dependent Piano Music
Transcription With Convolutional Sparse
Coding . . . . . . . . . . . . . . . . . 2218--2230
Yanmin Qian and
Tian Tan and
Dong Yu Neural Network Based Multi-Factor Aware
Joint Training for Robust Speech
Recognition . . . . . . . . . . . . . . 2231--2240
Lahiru Samarakoon and
Khe Chai Sim Factorized Hidden Layer Adaptation for
Deep Neural Network Based Acoustic
Modeling . . . . . . . . . . . . . . . . 2241--2250
Martin Krawczyk-Becker and
Timo Gerkmann On MMSE-Based Estimation of Amplitude
and Complex Speech Spectral Coefficients
Under Phase-Uncertainty . . . . . . . . 2251--2262
Yanmin Qian and
Mengxiao Bi and
Tian Tan and
Kai Yu Very Deep Convolutional Neural Networks
for Noise Robust Speech Recognition . . 2263--2276
Yi-Chan Wu and
Homer H. Chen Generation of Affective Accompaniment in
Accordance With Emotion Flow . . . . . . 2277--2287
Mahmood Movassagh and
Peter Kabal Scalable Audio Coding Using
Trellis-Based Optimized Joint Entropy
Coding and Quantization . . . . . . . . 2288--2300
Milos Cernak and
Alexandros Lazaridis and
Afsaneh Asaei and
Philip N. Garner Composition of Deep and Spiking Neural
Networks for Very Low Bit Rate Speech
Coding . . . . . . . . . . . . . . . . . 2301--2312
David Dov and
Ronen Talmon and
Israel Cohen Kernel Method for Voice Activity
Detection in the Presence of Transients 2313--2326
Jesús Villalba and
Antonio Miguel and
Alfonso Ortega and
Eduardo Lleida Bayesian Networks to Model the
Variability of Speaker Verification
Scores in Adverse Environments . . . . . 2327--2340
Hardik B. Sailor and
Hemant A. Patil Novel Unsupervised Auditory Filterbank
Learning Using Convolutional RBM for
Speech Recognition . . . . . . . . . . . 2341--2353
Sidsel Marie Nørholm and
Jesper Rindom Jensen and
Mads Græsbøll Christensen Instantaneous Fundamental Frequency
Estimation With Optimal Segmentation for
Nonstationary Voiced Speech . . . . . . 2354--2367
Sheng Zhang and
Jiashu Zhang and
Hongyu Han Robust Variable Step-Size Decorrelation
Normalized Least-Mean-Square Algorithm
and its Application to Acoustic Echo
Cancellation . . . . . . . . . . . . . . 2368--2376
Tom Barker and
Tuomas Virtanen Blind Separation of Audio Mixtures
Through Nonnegative Tensor Factorization
of Modulation Spectrograms . . . . . . . 2377--2389
Jinxin Liu and
Xuefeng Chen Adaptive Compensation of Misequalization
in Narrowband Active Noise Equalizer
Systems . . . . . . . . . . . . . . . . 2390--2399
Atsunori Ogawa and
Takaaki Hori and
Atsushi Nakamura Estimating Speech Recognition Accuracy
Based on Error Type Classification . . . 2400--2413
Finnian Kelly and
John H. L. Hansen Score-Aging Calibration for Speaker
Verification . . . . . . . . . . . . . . 2414--2424
Bochen Li and
Zhiyao Duan An Approach to Score Following for Piano
Performances With the Sustained Effect 2425--2438
Niko Moritz and
Birger Kollmeier and
Jörn Anemüller Integration of Optimized Modulation
Filter Sets Into Deep Neural Networks
for Automatic Speech Recognition . . . . 2439--2452
Simon Leglaive and
Roland Badeau and
Gaël Richard Multichannel Audio Source Separation
With Probabilistic Reverberation Priors 2453--2465
Sakari Tervo Single Snapshot Detection and Estimation
of Reflections From Room Impulse
Responses in the Spherical Harmonic
Domain . . . . . . . . . . . . . . . . . 2466--2480
Dejan Marković and
Fabio Antonacci and
Lucio Bianchi and
Stefano Tubaro and
Augusto Sarti Extraction of Acoustic Sources Through
the Processing of Sound Field Maps in
the Ray Space . . . . . . . . . . . . . 2481--2494
Anonymous Table of Contents . . . . . . . . . . . 222--223
Anonymous Table of Contents . . . . . . . . . . . 224--225
Hanchi Chen and
Thushara Dheemantha Abhayapala and
Prasanga N. Samarasinghe and
Wen Zhang Direct-to-Reverberant Energy Ratio
Estimation Using a First-Order
Microphone . . . . . . . . . . . . . . . 226--237
Peter Bell and
Pawel Swietojanski and
Steve Renals Multitask Learning of Context-Dependent
Targets in Deep Neural Network Acoustic
Models . . . . . . . . . . . . . . . . . 238--247
Rui Zhao and
Kezhi Mao Topic-Aware Deep Compositional Models
for Sentence Classification . . . . . . 248--260
Dalia El Badawy and
Ngoc Q. K. Duong and
Alexey Ozerov On-the-Fly Audio Source Separation --- A
Novel User-Friendly Framework . . . . . 261--272
Filip Elvander and
Johan Swärd and
Andreas Jakobsson Online Estimation of Multiple Harmonic
Signals . . . . . . . . . . . . . . . . 273--284
Vincent Renkens and
Hugo Van hamme Weakly Supervised Learning of Hidden
Markov Models for Spoken Language
Acquisition . . . . . . . . . . . . . . 285--295
Luca Remaggi and
Philip J. B. Jackson and
Philip Coleman and
Wenwu Wang Acoustic Reflector Localization: Novel
Image Source Reversion and Direct
Localization Methods . . . . . . . . . . 296--309
Prasanga N. Samarasinghe and
Thushara D. Abhayapala and
Hanchi Chen Estimating the Direct-to-Reverberant
Energy Ratio Using a Spherical
Harmonics-Based Spatial Correlation
Model . . . . . . . . . . . . . . . . . 310--319
Shmulik Markovich-Golan and
Sharon Gannot and
Walter Kellermann Combined LCMV-TRINICON Beamforming for
Separating Multiple Speech Sources in
Noisy and Reverberant Environments . . . 320--332
Shakeel Ahmed and
Muhammad Tahir Akhtar Gain Scheduling of Auxiliary Noise and
Variable Step-Size for Online Acoustic
Feedback Cancellation in Narrow-Band
Active Noise Control Systems . . . . . . 333--343
Gabriel Sargent and
Frédéric Bimbot and
Emmanuel Vincent Estimating the Structural Segmentation
of Popular Music Pieces Under Regularity
Constraints . . . . . . . . . . . . . . 344--358
Jordan Cheer and
Stephen Daley An Investigation of Delayless Subband
Adaptive Filtering for Multi-Input
Multi-Output Active Noise Control
Applications . . . . . . . . . . . . . . 359--373
Sebastian J. Schlecht and
Emanuël A. P. Habets Feedback Delay Networks: Echo Density
and Mixing Time . . . . . . . . . . . . 374--383
Johannes Abel and
Magdalena Kaniewska and
Cyril Guillaumé and
Wouter Tirry and
Tim Fingscheidt An Instrumental Quality Measure for
Artificially Bandwidth-Extended Speech
Signals . . . . . . . . . . . . . . . . 384--396
Robert Rehr and
Timo Gerkmann An Analysis of Adaptive Recursive
Smoothing with Applications to Noise PSD
Estimation . . . . . . . . . . . . . . . 397--408
Emilio Granell and
Carlos-D. Martínez-Hinarejos Multimodal Crowdsourcing for
Transcribing Handwritten Documents . . . 409--419
Yaping Ma and
Yegui Xiao A New Strategy for Online Secondary-Path
Modeling of Narrowband Active Noise
Control . . . . . . . . . . . . . . . . 420--434
Jose A. Belloch and
Alberto Gonzalez and
Enrique S. Quintana-Ortí and
Miguel Ferrer and
Vesa Välimäki GPU-Based Dynamic Wave Field Synthesis
Using Fractional Delay Filters and Room
Compensation . . . . . . . . . . . . . . 435--447
Anonymous IEEE/ACM Transactions on Audio, Speech,
and Language Processing Edics . . . . . 448--449
Anonymous IEEE Transactions on Multimedia
information for authors . . . . . . . . 450--451
Anonymous Introducing IEEE Collabratec . . . . . . 452
Anonymous Front Cover . . . . . . . . . . . . . . C1
Anonymous IEEE Signal Processing Society . . . . . C2
Anonymous Table of Contents . . . . . . . . . . . 3--4
Anonymous Table of Contents . . . . . . . . . . . 3--4
Anonymous Table of Contents . . . . . . . . . . . 3--4
Anonymous Table of Contents . . . . . . . . . . . 3--4
Qi He and
Feng Bao and
Changchun Bao Multiplicative Update of Auto-Regressive
Gains for Codebook-Based Speech
Enhancement . . . . . . . . . . . . . . 457--468
Zhongqing Wang and
Sophia Yat Mei Lee and
Shoushan Li and
Guodong Zhou Emotion Analysis in Code-Switching Text
With Joint Factor Graph Model . . . . . 469--480
Ashwin Bellur and
Mounya Elhilali Feedback-Driven Sensory Mapping
Adaptation for Robust Speech Activity
Detection . . . . . . . . . . . . . . . 481--492
Zhiyuan Tang and
Lantian Li and
Dong Wang and
Ravichander Vipperla Collaborative Joint Training With
Multitask Recurrent Model for Speech and
Speaker Recognition . . . . . . . . . . 493--504
Bidisha Sharma and
S. R. Mahadeva Prasanna Sonority Measurement Using System,
Source, and Suprasegmental Information 505--518
Hung-Yi Lee and
Bo-Hsiang Tseng and
Tsung-Hsien Wen and
Yu Tsao Personalizing
Recurrent-Neural-Network-Based Language
Model by Social Network . . . . . . . . 519--530
Ji Ming and
Danny Crookes Speech Enhancement Based on
Full-Sentence Correlation and Clean
Speech Recognition . . . . . . . . . . . 531--543
Quoc Truong Do and
Tomoki Toda and
Graham Neubig and
Sakriani Sakti and
Satoshi Nakamura Preserving Word-Level Emphasis in
Speech-to-Speech Translation . . . . . . 544--556
Zhenghua Li and
Jiayuan Chao and
Min Zhang and
Wenliang Chen and
Meishan Zhang and
Guohong Fu Coupled POS Tagging on Heterogeneous
Annotations . . . . . . . . . . . . . . 557--571
Clement S. J. Doire and
Mike Brookes and
Patrick A. Naylor and
Christopher M. Hicks and
Dave Betts and
Mohammad A. Dmour and
Søren Holdt Jensen Single-Channel Online Enhancement of
Speech Corrupted by Reverberation and
Noise . . . . . . . . . . . . . . . . . 572--587
Aleksandr Sizov and
Kong Aik Lee and
Tomi Kinnunen Direct Optimization of the Detection
Cost for $I$-Vector-Based Spoken
Language Recognition . . . . . . . . . . 588--597
Imran Sheikh and
Dominique Fohr and
Irina Illina and
Georges Linarès Modelling Semantic Context of OOV Words
in Large Vocabulary Continuous Speech
Recognition . . . . . . . . . . . . . . 598--610
Mojtaba Farmani and
Michael Syskind Pedersen and
Zheng-Hua Tan and
Jesper Jensen Informed Sound Source Localization Using
Relative Transfer Functions for Hearing
Aid Applications . . . . . . . . . . . . 611--623
C. M. Vikram and
S. R. Mahadeva Prasanna Epoch Extraction From Telephone Quality
Speech Using Single Pole Filter . . . . 624--636
Motoi Omachi and
Tetsuji Ogawa and
Tetsunori Kobayashi Associative Memory Model-Based Linear
Filtering and Its Application to Tandem
Connectionist Blind Source Separation 637--650
Dani Cherkassky and
Sharon Gannot Blind Synchronization in Wireless
Acoustic Sensor Networks . . . . . . . . 651--661
Laurent Girin and
Thomas Hueber and
Xavier Alameda-Pineda Extending the Cascaded Gaussian Mixture
Regression Framework for Cross-Speaker
Acoustic-Articulatory Mapping . . . . . 662--673
Mohamad Hasan Bahari and
Alexander Bertrand and
Marc Moonen Blind Sampling Rate Offset Estimation
for Wireless Acoustic Sensor Networks
Through Weighted Least-Squares Coherence
Drift Estimation . . . . . . . . . . . . 674--686
Adam Kuklasiński and
Simon Doclo and
Søren Holdt Jensen and
Jesper Jensen Correction to ``Maximum Likelihood PSD
Estimation for Speech Enhancement in
Reverberation and Noise'' . . . . . . . 687--687
Anonymous Table of Contents . . . . . . . . . . . 688--689
Anonymous Table of Contents . . . . . . . . . . . 690--691
Sharon Gannot and
Emmanuel Vincent and
Shmulik Markovich-Golan and
Alexey Ozerov A Consolidated Perspective on
Multimicrophone Speech Enhancement and
Source Separation . . . . . . . . . . . 692--730
Dongwen Ying and
Ruohua Zhou and
Junfeng Li and
Yonghong Yan Window-Dominant Signal Subspace Methods
for Multiple Short-Term Speech Source
Localization . . . . . . . . . . . . . . 731--744
Sean U. N. Wood and
Jean Rouat and
Stéphane Dupont and
Gueorgui Pironkov Blind Speech Separation and Enhancement
With GCC-NMF . . . . . . . . . . . . . . 745--755
Constantin Spille and
Birger Kollmeier and
Bernd T. Meyer Combining Binaural and Cortical Features
for Robust Speech Recognition . . . . . 756--767
Yuma Koizumi and
Kenta Niwa and
Yusuke Hioka and
Kazunori Kobayashi and
Hitoshi Ohmuro Informative Acoustic Feature Selection
to Maximize Mutual Information for
Collecting Target Sources . . . . . . . 768--779
Takuya Higuchi and
Nobutaka Ito and
Shoko Araki and
Takuya Yoshioka and
Marc Delcroix and
Tomohiro Nakatani Online MVDR Beamformer Based on Complex
Gaussian Mixture Model With Spatial
Prior for Noise Robust ASR . . . . . . . 780--793
Eita Nakamura and
Kazuyoshi Yoshii and
Shigeki Sagayama Rhythm Transcription of Polyphonic Piano
Music Based on Merged-Output HMM for
Multiple Voices . . . . . . . . . . . . 794--806
Omid Ghahabi and
Javier Hernando Deep Learning Backend for Single and
Multisession $i$-Vector Speaker
Recognition . . . . . . . . . . . . . . 807--817
Penny Karanasou and
Chunyang Wu and
Mark Gales and
Philip C. Woodland $I$-Vectors and Structured Neural
Networks for Rapid Adaptation of
Acoustic Models . . . . . . . . . . . . 818--828
G. Aneeja and
B. Yegnanarayana Extraction of Fundamental Frequency From
Degraded Speech Using Temporal Envelopes
at High SNR Frequencies . . . . . . . . 829--838
Seyyed Saeed Sarfjoo and
Cenk Demiroğlu and
Simon King Using Eigenvoices and Nearest-Neighbors
in HMM-Based Cross-Lingual Speaker
Adaptation With Limited Data . . . . . . 839--851
Yung-Yue Chen and
Jia-Hao Zhang Background Noise Reduction Design for
Dual Microphone Cellular Phones: Robust
Approach . . . . . . . . . . . . . . . . 852--862
Liner Yang and
Xinxiong Chen and
Zhiyuan Liu and
Maosong Sun Improving Word Representations with
Document Labels . . . . . . . . . . . . 863--870
Shiliang Zhang and
Cong Liu and
Hui Jiang and
Si Wei and
Lirong Dai and
Yu Hu Nonrecurrent Neural Structure for
Long-Term Dependence . . . . . . . . . . 871--884
Xuefeng Yang and
Kezhi Mao Task Independent Fine Tuning for Word
Embeddings . . . . . . . . . . . . . . . 885--894
Yu Bao and
Huawei Chen Design of Robust Broadband Beamformers
Using Worst-Case Performance
Optimization: a Semidefinite Programming
Approach . . . . . . . . . . . . . . . . 895--907
Sandro Cumani and
Pietro Laface Nonlinear I-Vector Transformations for
PLDA-Based Speaker Recognition . . . . . 908--919
Anonymous IEEE/ACM Transactions on Audio,
Speech, and Language Processing Edics 920--921
Anonymous IEEE Transactions on Audio, Speech, and
Language Processing information for
authors . . . . . . . . . . . . . . . . 922--923
Anonymous Introducing IEEE Collabratec . . . . . . 924
Anonymous Front Cover . . . . . . . . . . . . . . C1
Anonymous IEEE Signal Processing Society . . . . . C2
Anonymous IEEE Signal Processing Society . . . . . C3
Anonymous Blank page . . . . . . . . . . . . . . . C4
Anonymous Table of Contents . . . . . . . . . . . 925--926
Anonymous Table of Contents . . . . . . . . . . . 927--928
Manu Airaksinen and
Tom Bäckström and
Paavo Alku Quadratic Programming Approach to
Glottal Inverse Filtering by Joint
Norm-1 and Norm-2 Optimization . . . . . 929--939
Ofer Schwartz and
Sharon Gannot and
Emanuël A. P. Habets Multispeaker LCMV Beamformer and
Postfilter for Source Separation and
Noise Reduction . . . . . . . . . . . . 940--951
Dongmei Wang and
Chengzhu Yu and
John H. L. Hansen Robust Harmonic Features for
Classification-Based Pitch Estimation 952--964
Tara N. Sainath and
Ron J. Weiss and
Kevin W. Wilson and
Bo Li and
Arun Narayanan and
Ehsan Variani and
Michiel Bacchiani and
Izhak Shafran and
Andrew Senior and
Kean Chin and
Ananya Misra and
Chanwoo Kim Multichannel Signal Processing With Deep
Neural Networks for Automatic Speech
Recognition . . . . . . . . . . . . . . 965--979
Hanieh Khalilian and
Ivan V. Bajić and
Rodney G. Vaughan A Simulation Study of a
Three-Dimensional Sound Field
Reproduction System for Immersive
Communication . . . . . . . . . . . . . 980--995
Andreas Franck and
Wenwu Wang and
Filippo Maria Fazi Sparse $\ell_1$-Optimal
Multiloudspeaker Panning and Its
Relation to Vector Base Amplitude
Panning . . . . . . . . . . . . . . . . 996--1010
Songbin Li and
Yizhen Jia and
C.-C. Jay Kuo Steganalysis of QIM Steganography in
Low-Bit-Rate Speech Signals . . . . . . 1011--1022
Naoyuki Kanda and
Xugang Lu and
Hisashi Kawai Maximum-a-Posteriori-Based Decoding for
End-to-End Acoustic Models . . . . . . . 1023--1034
Navid Shokouhi and
John H. L. Hansen Teager--Kaiser Energy Operators for
Overlapped Speech Detection . . . . . . 1035--1047
Yi-Chin Huang and
Chung-Hsien Wu and
Yan-You Chen and
Ming-Ge Shie and
Jhing-Fa Wang Personalized Spontaneous Speech
Synthesis Using a Small-Sized
Unsegmented Semispontaneous Speech . . . 1048--1060
Jeongsoo Park and
Jaeyoung Shin and
Kyogu Lee Exploiting Continuity/Discontinuity of
Basis Vectors in Spectrogram
Decomposition for Harmonic-Percussive
Sound Separation . . . . . . . . . . . . 1061--1074
Xueliang Zhang and
DeLiang Wang Deep Learning Based Binaural Speech
Separation in Reverberant Environments 1075--1084
Masood Delfarah and
DeLiang Wang Features for Masking-Based Monaural
Speech Separation in Reverberant
Conditions . . . . . . . . . . . . . . . 1085--1094
Feiran Yang and
Gerald Enzner and
Jun Yang Statistical Convergence Analysis for
Optimal Control of DFT-Domain Adaptive
Echo Canceler . . . . . . . . . . . . . 1095--1106
Takashi Nose and
Yusuke Arao and
Takao Kobayashi and
Komei Sugiura and
Yoshinori Shiga Sentence Selection Based on Extended
Entropy Using Phonetic and Prosodic
Contexts for Statistical Parametric
Speech Synthesis . . . . . . . . . . . . 1107--1116
Gergely Firtha and
Péter Fiala and
Frank Schultz and
Sascha Spors Improved Referencing Schemes for 2.5D
Wave Field Synthesis Driving Functions 1117--1127
Esteban Maestre and
Gary P. Scavone and
Julius O. Smith Joint Modeling of Bridge Admittance and
Body Radiativity for Efficient Synthesis
of String Instrument Sound by Digital
Waveguides . . . . . . . . . . . . . . . 1128--1139
Gongping Huang and
Jacob Benesty and
Jingdong Chen On the Design of Frequency-Invariant
Beampatterns With Uniform Circular
Microphone Arrays . . . . . . . . . . . 1140--1153
Zdeněk Průša and
Peter Balazs and
Peter Lempel Søndergaard A Noniterative Method for Reconstruction
of Phase From STFT Magnitude . . . . . . 1154--1164
Anonymous IEEE/ACM Transactions on
Audio, Speech, and Language Processing
Edics . . . . . . . . . . . . . . . . . 1167--1168
Anonymous IEEE/ACM Transactions on
Audio, Speech, and Language Processing
for authors . . . . . . . . . . . . . . 1169--1170
Anonymous Open Access . . . . . . . . . . . . . . 1171
Anonymous Introducing IEEE Collabratec . . . . . . 1172
Anonymous Member Get-A-Member (MGM) Program . . . 1173
Anonymous Blank Page . . . . . . . . . . . . . . . B1165--B1166
Anonymous Front Cover . . . . . . . . . . . . . . C1
Anonymous IEEE Signal Processing Society . . . . . C2
Anonymous IEEE Signal Processing Society . . . . . C3
Anonymous Blank page . . . . . . . . . . . . . . . C4
Anonymous Table of Contents . . . . . . . . . . . 1167--1168
G. Richard and
T. Virtanen and
J. P. Bello and
N. Ono and
H. Glotin Introduction to the Special Section on
Sound Scene and Event Analysis . . . . . 1169--1171
Héctor A. Sánchez-Hevia and
David Ayllón and
Roberto Gil-Pita and
Manuel Rosa-Zurera Maximum Likelihood Decision Fusion for
Weapon Classification in Wireless
Acoustic Sensor Networks . . . . . . . . 1172--1182
Nithin Rao Koluguri and
G. Nisha Meenakshi and
Prasanta Kumar Ghosh Spectrogram Enhancement Using Multiple
Window Savitzky--Golay (MWSG) Filter for
Robust Bird Sound Detection . . . . . . 1183--1192
Dan Stowell and
Emmanouil Benetos and
Lisa F. Gill On-Bird Sound Recordings: Automatic
Acoustic Recognition of Activities and
Contexts . . . . . . . . . . . . . . . . 1193--1206
Brandon T. Carroll and
Bradley M. Whitaker and
Wayne Dayley and
David V. Anderson Outlier Learning via Augmented Frozen
Dictionaries . . . . . . . . . . . . . . 1207--1215
Victor Bisot and
Romain Serizel and
Slim Essid and
Gaël Richard Feature Learning With Matrix
Factorization Applied to Acoustic Scene
Classification . . . . . . . . . . . . . 1216--1229
Yong Xu and
Qiang Huang and
Wenwu Wang and
Peter Foster and
Siddharth Sigtia and
Philip J. B. Jackson and
Mark D. Plumbley Unsupervised Feature Learning Based on
Deep Models for Environmental Audio
Tagging . . . . . . . . . . . . . . . . 1230--1241
René Grzeszick and
Axel Plinge and
Gernot A. Fink Bag-of-Features Methods for Acoustic
Event Detection and Classification . . . 1242--1252
Alain Rakotomamonjy Supervised Representation Learning for
Audio Scene Classification . . . . . . . 1253--1265
Emmanouil Benetos and
Grégoire Lafay and
Mathieu Lagrange and
Mark D. Plumbley Polyphonic Sound Event Tracking Using
Linear Dynamical Systems . . . . . . . . 1266--1277
Huy Phan and
Lars Hertel and
Marco Maass and
Philipp Koch and
Radoslaw Mazur and
Alfred Mertins Improved Audio Scene Classification
Based on Label-Tree Embeddings and
Convolutional Neural Networks . . . . . 1278--1290
Emre Çakır and
Giambattista Parascandolo and
Toni Heittola and
Heikki Huttunen and
Tuomas Virtanen Convolutional Recurrent Neural Networks
for Polyphonic Sound Event Detection . . 1291--1303
Jens Schröder and
Niko Moritz and
Jörn Anemüller and
Stefan Goetze and
Birger Kollmeier Classifier Architectures for Acoustic
Scenes and Events: Implications for
DNNs, TDNNs, and Perceptual Features
from DCASE 2016 . . . . . . . . . . . . 1304--1314
Wenjun Yang and
Sridhar Krishnan Combining Temporal Features by Local
Binary Pattern for Acoustic Scene
Classification . . . . . . . . . . . . . 1315--1321
David Dov and
Ronen Talmon and
Israel Cohen Multimodal Kernel Method for Activity
Detection of Sound Sources . . . . . . . 1322--1334
Keisuke Imoto and
Nobutaka Ono Spatial Cepstrum as a Spatial Feature
Using a Distributed Microphone Array for
Acoustic Scene Analysis . . . . . . . . 1335--1343
Ivo Trowitzsch and
Johannes Mohr and
Youssef Kashef and
Klaus Obermayer Robust Detection of Environmental Sounds
in Binaural Auditory Scenes . . . . . . 1344--1356
Abu Shafin Mohammad Mahdee Jameel and
Shaikh Anowarul Fattah and
Rajib Goswami and
Wei-Ping Zhu and
M. Omair Ahmad Noise Robust Formant Frequency
Estimation Method Based on Spectral
Model of Repeated Autocorrelation of
Speech . . . . . . . . . . . . . . . . . 1357--1370
Na Li and
Man-Wai Mak and
Jen-Tzung Chien DNN-Driven Mixture of PLDA for Robust
Speaker Verification . . . . . . . . . . 1371--1383
Kai Wu and
Vaninirappuputhenpurayil Gopalan Reju and
Andy W. H. Khong and
Shu Ting Goh Swarm Intelligence Based Particle Filter
for Alternating Talker Localization and
Tracking Using Microphone Arrays . . . . 1384--1397
Anonymous IEEE/ACM Transactions on
Audio, Speech, and Language Processing
Edics . . . . . . . . . . . . . . . . . 1398--1399
Anonymous IEEE/ACM Transactions on
Audio, Speech, and Language Processing
for authors . . . . . . . . . . . . . . 1400--1401
Anonymous Open Access . . . . . . . . . . . . . . 1402
Anonymous Introducing IEEE Collabratec . . . . . . 1403
Anonymous Member Get-A-Member (MGM) Program . . . 1404
Anonymous Front Cover . . . . . . . . . . . . . . C1
Anonymous IEEE Signal Processing Society . . . . . C2
Anonymous IEEE Signal Processing Society . . . . . C3
Anonymous Blank page . . . . . . . . . . . . . . . C4
Anonymous Table of Contents . . . . . . . . . . . 1405--1406
Anonymous Table of Contents Edics . . . . . . . . 1407--1408
Yu-An Chen and
Ju-Chiang Wang and
Yi-Hsuan Yang and
Homer H. Chen Component Tying for Mixture Model
Adaptation in Personalization of Music
Emotion Recognition . . . . . . . . . . 1409--1420
Hossein Zeinali and
Hossein Sameti and
Lukáš Burget HMM-Based Phrase-Independent $i$-Vector
Extractor for Text-Dependent Speaker
Verification . . . . . . . . . . . . . . 1421--1435
Xinzhou Xu and
Jun Deng and
Nicholas Cummins and
Zixing Zhang and
Chen Wu and
Li Zhao and
Björn Schuller A Two-Dimensional Framework of Multiple
Kernel Subspace Learning for Recognizing
Emotion in Speech . . . . . . . . . . . 1436--1449
Mandy Korpusik and
James Glass Spoken Language Understanding for a
Nutrition Dialogue System . . . . . . . 1450--1461
Mahmoud Fakhry and
Piergiorgio Svaizer and
Maurizio Omologo Audio Source Separation in Reverberant
Environments Using $\beta$-Divergence-Based
Nonnegative
Factorization . . . . . . . . . . . . . 1462--1476
Bracha Laufer-Goldshtein and
Ronen Talmon and
Sharon Gannot Semi-Supervised Source Localization on
Multiple Manifolds With Distributed
Microphones . . . . . . . . . . . . . . 1477--1491
Donald S. Williamson and
DeLiang Wang Time-Frequency Masking in the Complex
Domain for Speech Dereverberation and
Denoising . . . . . . . . . . . . . . . 1492--1501
Liang Lu and
Steve Renals Small-Footprint Highway Deep Neural
Networks for Speech Recognition . . . . 1502--1511
Ina Kodrasi and
Simon Doclo Signal-Dependent Penalty Functions for
Robust Acoustic Multi-Channel
Equalization . . . . . . . . . . . . . . 1512--1525
Jung-Hee Kim and
Jin Kim and
Jae Hyeon Jeon and
Sang Won Nam Delayless Individual-Weighting-Factors
Sign Subband Adaptive Filter With
Band-Dependent Variable Step-Sizes . . . 1526--1534
Yannan Wang and
Jun Du and
Li-Rong Dai and
Chin-Hui Lee A Gender Mixture Detection Approach to
Unsupervised Single-Channel Speech
Separation Based on Deep Neural Networks 1535--1546
Giacomo Vairetti and
Enzo De Sena and
Michael Catrysse and
Søren Holdt Jensen and
Marc Moonen and
Toon van Waterschoot A Scalable Algorithm for Physically
Motivated and Sparse Approximation of
Room Impulse Responses With Orthonormal
Basis Functions . . . . . . . . . . . . 1547--1561
Anonymous IEEE/ACM Transactions on
Audio, Speech, and Language Processing
Edics . . . . . . . . . . . . . . . . . 1562--1563
Anonymous IEEE Transactions on
Multimedia information for authors . . . 1564--1565
Anonymous Open Access . . . . . . . . . . . . . . 1566
Anonymous Introducing IEEE Collabratec . . . . . . 1567
Anonymous Front Cover . . . . . . . . . . . . . . C1
Anonymous IEEE Signal Processing Society . . . . . C2
Anonymous IEEE Signal Processing Society . . . . . C3
Anonymous Blank page . . . . . . . . . . . . . . . C4
Anonymous Table of Contents . . . . . . . . . . . 1562--1563
Anonymous Table of Contents . . . . . . . . . . . 1564--1565
Francis Stevens and
Damian T. Murphy and
Lauri Savioja and
Vesa Välimäki Modeling Sparsely Reflecting Outdoor
Acoustic Scenes Using the Waveguide Web 1566--1578
Ferdinando Olivieri and
Filippo Maria Fazi and
Simone Fontana and
Dylan Menzies and
Philip Arthur Nelson Generation of Private Sound With a
Circular Loudspeaker Array and the
Weighted Pressure Matching Method . . . 1579--1591
Samy Elshamy and
Nilesh Madhu and
Wouter Tirry and
Tim Fingscheidt Instantaneous A Priori SNR Estimation by
Cepstral Excitation Manipulation . . . . 1592--1605
Paavo Alku and
Rahim Saeidi The Linear Predictive Modeling of Speech
From Higher-Lag Autocorrelation
Coefficients Applied to Noise-Robust
Speaker Recognition . . . . . . . . . . 1606--1617
Cheng Pang and
Hong Liu and
Jie Zhang and
Xiaofei Li Binaural Sound Localization Based on
Reverberation Weighting and Generalized
Parametric Mapping . . . . . . . . . . . 1618--1632
Somanath Pradhan and
Vinal Patel and
Dipen Somani and
Nithin V. George An Improved Proportionate Delayless
Multiband-Structured Subband Adaptive
Feedback Canceller for Digital Hearing
Aids . . . . . . . . . . . . . . . . . . 1633--1643
Szymon Drgas and
Tuomas Virtanen and
Jörg Lücke and
Antti Hurmalainen Binary Non-Negative Matrix Deconvolution
for Audio Dictionary Learning . . . . . 1644--1656
Fatemeh Saki and
Nasser Kehtarnavaz Real-Time Unsupervised Classification of
Environmental Noise Signals . . . . . . 1657--1667
Lakshmish Kaushik and
Abhijeet Sangwan and
John H. L. Hansen Automatic Sentiment Detection in
Naturalistic Audio . . . . . . . . . . . 1668--1679
Ofer Schwartz and
Sharon Gannot and
Emanuël A. P. Habets Cramér--Rao Bound Analysis of
Reverberation Level Estimators for
Dereverberation and Noise Reduction . . 1680--1693
Seyran Khademi and
Richard C. Hendriks and
W. Bastiaan Kleijn Intelligibility Enhancement Based on
Mutual Information . . . . . . . . . . . 1694--1708
Yuta Hatano and
Chuang Shi and
Yoshinobu Kajikawa Compensation for Nonlinear Distortion of
the Frequency Modulation-Based
Parametric Array Loudspeaker . . . . . . 1709--1717
Yu-Ren Chien and
Daryush D. Mehta and
Jón Guðnason and
Matías Zañartu and
Thomas F. Quatieri Evaluation of Glottal Inverse Filtering
Algorithms Using a Physiologically Based
Articulatory Speech Synthesizer . . . . 1718--1730
Anonymous IEEE/ACM Transactions on
Audio, Speech, and Language Processing
Edics . . . . . . . . . . . . . . . . . 1731--1732
Anonymous IEEE Transactions on
Multimedia information for authors . . . 1733--1734
Anonymous Open Access . . . . . . . . . . . . . . 1735
Anonymous Introducing IEEE Collabratec . . . . . . 1736
Anonymous Front Cover . . . . . . . . . . . . . . C1
Anonymous IEEE Signal Processing Society . . . . . C2
Anonymous IEEE Signal Processing Society . . . . . C3
Anonymous Blank page . . . . . . . . . . . . . . . C4
Anonymous Table of Contents . . . . . . . . . . . 1737--1738
Anonymous Table of Contents . . . . . . . . . . . 1739--1740
Jakob Abeßer and
Gerald Schuller Instrument-Centered Music Transcription
of Solo Bass Guitar Recordings . . . . . 1741--1750
Thomas Le Cornu and
Ben Milner Generating Intelligible Audio Speech
From Visual Speech . . . . . . . . . . . 1751--1761
Lemao Liu and
Atsushi Fujita and
Masao Utiyama and
Andrew Finch and
Eiichiro Sumita Translation Quality Estimation Using
Only Bilingual Corpora . . . . . . . . . 1762--1772
Emad M. Grais and
Gerard Roma and
Andrew J. R. Simpson and
Mark D. Plumbley Two-Stage Single-Channel Audio Source
Separation Using Deep Neural Networks 1773--1783
Giuliano Bernardi and
Toon van Waterschoot and
Jan Wouters and
Marc Moonen Adaptive Feedback Cancellation Using a
Partitioned-Block Frequency-Domain
Kalman Filter Approach With PEM-Based
Signal Prewhitening . . . . . . . . . . 1784--1798
Vinal Patel and
Jordan Cheer and
Nithin V. George Modified Phase-Scheduled-Command FxLMS
Algorithm for Active Sound Profiling . . 1799--1808
Killian Janod and
Mohamed Morchid and
Richard Dufour and
Georges Linarès and
Renato De Mori Denoised Bottleneck Features From Deep
Autoencoders for Telephone Conversation
Analysis . . . . . . . . . . . . . . . . 1809--1820
Nikolaos Stefanakis and
Despoina Pavlidi and
Athanasios Mouchtaris Perpendicular Cross-Spectra Fusion for
Sound Source Localization With a Planar
Microphone Array . . . . . . . . . . . . 1821--1835
Takenori Yoshimura and
Kei Hashimoto and
Keiichiro Oura and
Yoshihiko Nankaku and
Keiichi Tokuda Simultaneous Optimization of Multiple
Tree-Based Factor Analyzed HMM for
Speech Synthesis . . . . . . . . . . . . 1836--1845
Eita Nakamura and
Kazuyoshi Yoshii and
Simon Dixon Note Value Recognition for Piano
Transcription Using Markov Random Fields 1846--1858
Anonymous IEEE/ACM Transactions on
Audio, Speech, and Language Processing
Edics . . . . . . . . . . . . . . . . . 1859--1860
Anonymous IEEE Transactions on
Multimedia information for authors . . . 1861--1862
Anonymous Open Access . . . . . . . . . . . . . . 1863
Anonymous Front Cover . . . . . . . . . . . . . . C1
Anonymous IEEE Signal Processing Society . . . . . C2
Anonymous IEEE Signal Processing Society . . . . . C3
Anonymous Blank page . . . . . . . . . . . . . . . C4
Anonymous Table of Contents . . . . . . . . . . . 1859--1860
Anonymous Table of Contents . . . . . . . . . . . 1861--1862
Xiaohai Tian and
Siu Wa Lee and
Zhizheng Wu and
Eng Siong Chng and
Haizhou Li An Exemplar-Based Approach to Frequency
Warping for Voice Conversion . . . . . . 1863--1876
Siying Wang and
Sebastian Ewert and
Simon Dixon Identifying Missing and Extra Notes in
Piano Recordings Using Score-Informed
Dictionary Learning . . . . . . . . . . 1877--1889
Sandro Cumani and
Pietro Laface Joint Estimation of PLDA and Nonlinear
Transformations of Speaker Vectors . . . 1890--1900
Morten Kolbæk and
Dong Yu and
Zheng-Hua Tan and
Jesper Jensen Multitalker Speech Separation With
Utterance-Level Permutation Invariant
Training of Deep Recurrent Neural
Networks . . . . . . . . . . . . . . . . 1901--1913
Cheng-Tao Chung and
Cheng-Yu Tsai and
Chia-Hsiang Liu and
Lin-Shan Lee Unsupervised Iterative Deep Learning of
Speech Features and Acoustic Tokens with
Applications to Spoken Term Detection 1914--1928
Niccolò Antonello and
Enzo De Sena and
Marc Moonen and
Patrick A. Naylor and
Toon van Waterschoot Room Impulse Response Interpolation
Using a Sparse Spatio-Temporal
Representation of the Sound Field . . . 1929--1941
Yanmin Qian and
Nanxin Chen and
Heinrich Dinkel and
Zhizheng Wu Deep Feature Engineering for Noise
Robust Spoofing Detection . . . . . . . 1942--1955
Sina Hafezi and
Alastair H. Moore and
Patrick A. Naylor Augmented Intensity Vectors for
Direction of Arrival Estimation in the
Spherical Harmonic Domain . . . . . . . 1956--1968
Byeongho Jo and
Jung-Woo Choi Spherical Harmonic Smoothing for
Localizing Coherent Sound Sources . . . 1969--1984
Emma Jokinen and
Ulpu Remes and
Paavo Alku Intelligibility Enhancement of Telephone
Speech Using Gaussian Process Regression
for Normal-to-Lombard Spectral Tilt
Conversion . . . . . . . . . . . . . . . 1985--1996
Xiaofei Li and
Laurent Girin and
Radu Horaud and
Sharon Gannot Multiple-Speaker Localization Based on
Direct-Path Features and Likelihood
Maximization With Spatial Sparsity
Regularization . . . . . . . . . . . . . 1997--2012
Marc Arnela and
Oriol Guasch Finite Element Synthesis of Diphthongs
Using Tuned Two-Dimensional Vocal Tracts 2013--2023
Deepak Baby and
Hugo Van hamme Joint Denoising and Dereverberation
Using Exemplar-Based Sparse
Representations and Decaying Norm
Constraint . . . . . . . . . . . . . . . 2024--2035
Anonymous IEEE/ACM Transactions on
Audio, Speech, and Language Processing
Edics . . . . . . . . . . . . . . . . . 2036--2037
Anonymous IEEE Transactions on
Multimedia information for authors . . . 2038--2039
Anonymous Open Access . . . . . . . . . . . . . . 2040
Anonymous Front Cover . . . . . . . . . . . . . . C1
Anonymous IEEE Signal Processing Society . . . . . C2
Anonymous IEEE Signal Processing Society . . . . . C3
Anonymous Blank page . . . . . . . . . . . . . . . C4
Anonymous Table of Contents . . . . . . . . . . . 2041--2042
Anonymous Table of Contents . . . . . . . . . . . 2043--2044
Qinghua Huang and
Lin Zhang and
Yong Fang Two-Stage Decoupled DOA Estimation Based
on Real Spherical Harmonics for
Spherical Arrays . . . . . . . . . . . . 2045--2058
Tomoki Hayashi and
Shinji Watanabe and
Tomoki Toda and
Takaaki Hori and
Jonathan Le Roux and
Kazuya Takeda Duration-Controlled LSTM for Polyphonic
Sound Event Detection . . . . . . . . . 2059--2070
Monisankha Pal and
Goutam Saha Spectral Mapping Using Prior
Re-Estimation of $i$-Vectors and System
Fusion for Voice Conversion . . . . . . 2071--2084
Seppo Enarvi and
Peter Smit and
Sami Virpioja and
Mikko Kurimo Automatic Speech Recognition With Very
Large Conversational Finnish and
Estonian Vocabularies . . . . . . . . . 2085--2097
Hannah Muckenhirn and
Pavel Korshunov and
Mathew Magimai-Doss and
Sébastien Marcel Long-Term Spectral Statistics for Voice
Presentation Attack Detection . . . . . 2098--2111
Brian Hamilton and
Stefan Bilbao FDTD Methods for $3$-D Room Acoustics
Simulation With High-Order Accuracy in
Space and Time . . . . . . . . . . . . . 2112--2124
Pejman Mowlaee and
Martin Blass and
W. Bastiaan Kleijn New Results in Modulation-Domain
Single-Channel Speech Enhancement . . . 2125--2137
Dylan Menzies and
Filippo Maria Fazi Decoding and Compression of Channel and
Scene Objects for Spatial Audio . . . . 2138--2151
Eunwoo Song and
Frank K. Soong and
Hong-Goo Kang Effective Spectral and Excitation
Modeling Techniques for LSTM--RNN-Based
Speech Synthesis Systems . . . . . . . . 2152--2161
Pulkit Sharma and
Vinayak Abrol and
Anil Kumar Sao Deep-Sparse-Representation-Based
Features for Speech Recognition . . . . 2162--2175
Iynkaran Natgunanathan and
Yong Xiang and
Guang Hua and
Gleb Beliakov and
John Yearwood Patchwork-Based Multilayer Audio
Watermarking . . . . . . . . . . . . . . 2176--2187
Chengzhu Yu and
John H. L. Hansen Active Learning Based Constrained
Clustering For Speaker Diarization . . . 2188--2198
Emil Solsbæk Ottosen and
Monika Dörfler A Phase Vocoder Based on Nonstationary
Gabor Frames . . . . . . . . . . . . . . 2199--2208
Boaz Schwartz and
Sharon Gannot and
Emanuël A. P. Habets Two Model-Based EM Algorithms for Blind
Source Separation in Noisy Environments 2209--2222
Maja Taseska and
Emanuël A. P. Habets Nonstationary Noise PSD Matrix
Estimation for Multichannel Blind Speech
Extraction . . . . . . . . . . . . . . . 2223--2236
Bruno Di Giorgi and
Simon Dixon and
Massimiliano Zanoni and
Augusto Sarti A Data-Driven Model of Tonal Chord
Sequence Complexity . . . . . . . . . . 2237--2250
N. Stefanakis and
D. Pavlidi and
A. Mouchtaris Corrections to ``Perpendicular
Cross-Spectra Fusion for Sound Source
Localization With a Planar Microphone
Array'' [Sep 17 1821--1835] . . . . . . 2251
Anonymous IEEE/ACM Transactions on
Audio, Speech, and Language Processing
Edics . . . . . . . . . . . . . . . . . 2252--2253
Anonymous IEEE Transactions on
Multimedia information for authors . . . 2254--2255
Anonymous Open Access . . . . . . . . . . . . . . 2256
Anonymous Front Cover . . . . . . . . . . . . . . C1
Anonymous IEEE Signal Processing Society . . . . . C2
Anonymous IEEE Signal Processing Society . . . . . C3
Anonymous Blank page . . . . . . . . . . . . . . . C4
Anonymous Table of Contents . . . . . . . . . . . 2252--2253
T. Schultz and
T. Hueber and
D. J. Krusienski and
J. S. Brumberg Introduction to the Special Issue on
Biosignal-Based Spoken Communication . . 2254--2256
Tanja Schultz and
Michael Wand and
Thomas Hueber and
Dean J. Krusienski and
Christian Herff and
Jonathan S. Brumberg Biosignal-Based Spoken Communication: a
Survey . . . . . . . . . . . . . . . . . 2257--2271
Christopher Dromey and
Katherine M. Black Effects of Laryngeal Activity on
Articulation . . . . . . . . . . . . . . 2272--2280
Michal Borsky and
Daryush D. Mehta and
Jarrad H. Van Stan and
Jon Gudnason Modal and Nonmodal Voice Quality
Classification Using Acoustic and
Electroglottographic Features . . . . . 2281--2291
Alborz Rezazadeh Sereshkeh and
Robert Trott and
Aurélien Bricout and
Tom Chau EEG Classification of Covert Speech
Using Regularized Neural Networks . . . 2292--2300
Reza Sahraeian and
Dirk Van Compernolle Crosslingual and Multilingual Speech
Recognition Based on the Speech Manifold 2301--2312
Đorđe T. Grozdić and
Slobodan T. Jovičić Whispered Speech Recognition Using Deep
Denoising Autoencoder and Inverse
Filtering . . . . . . . . . . . . . . . 2313--2322
Myungjong Kim and
Beiming Cao and
Ted Mau and
Jun Wang Speaker-Independent Silent Speech
Recognition From Flesh-Point
Articulatory Movements Using an LSTM
Neural Network . . . . . . . . . . . . . 2323--2336
Patrick Lumban Tobing and
Kazuhiro Kobayashi and
Tomoki Toda Articulatory Controllable Speech
Modification Based on Statistical
Inversion and Production Mappings . . . 2337--2350
Ingmar Steiner and
Sébastien Le Maguer and
Alexander Hewer Synthesis of Tongue Motion and Acoustics
From Text Using a Multimodal
Articulatory Database . . . . . . . . . 2351--2361
Jose A. Gonzalez and
Lam A. Cheah and
Angel M. Gomez and
Phil D. Green and
James M. Gilbert and
Stephen R. Ell and
Roger K. Moore and
Ed Holdsworth Direct Speech Reconstruction From
Articulatory Sensor Data by Machine
Learning . . . . . . . . . . . . . . . . 2362--2374
Matthias Janke and
Lorenz Diener EMG-to-Speech: Direct Generation of
Speech From Facial Electromyographic
Signals . . . . . . . . . . . . . . . . 2375--2385
Geoffrey S. Meltzner and
James T. Heaton and
Yunbin Deng and
Gianluca De Luca and
Serge H. Roy and
Joshua C. Kline Silent Speech Recognition as an
Alternative Communication Device for
Persons With Laryngectomy . . . . . . . 2386--2398
Fei Chen and
Lan Wang and
Hui Chen and
Gang Peng Investigations on Mandarin Aspiratory
Animations Using an Airflow Model . . . 2399--2409
Wayne Xiong and
Jasha Droppo and
Xuedong Huang and
Frank Seide and
Michael L. Seltzer and
Andreas Stolcke and
Dong Yu and
Geoffrey Zweig Toward Human Parity in Conversational
Speech Recognition . . . . . . . . . . . 2410--2423
Biao Zhang and
Deyi Xiong and
Jinsong Su and
Hong Duan A Context-Aware Recurrent Encoder for
Neural Machine Translation . . . . . . . 2424--2432
Afsaneh Asaei and
Milos Cernak and
Hervé Bourlard Perceptual Information Loss due to
Impaired Speech Production . . . . . . . 2433--2443
Ning Ma and
Tobias May and
Guy J. Brown Exploiting Deep Neural Networks and Head
Movements for Robust Binaural
Localization of Multiple Sources in
Reverberant Environments . . . . . . . . 2444--2453
Anonymous List of Reviewers . . . . . . . . . . . 2454--2457
Anonymous IEEE/ACM Transactions on
Audio, Speech, and Language Processing
Edics . . . . . . . . . . . . . . . . . 2458--2459
Anonymous IEEE Transactions on
Multimedia information for authors . . . 2460--2461
Anonymous Open Access . . . . . . . . . . . . . . 2462
Anonymous 2017 Subject Index IEEE
Transactions on Applied
Superconductivity Vol. 27 . . . . . . . 2463--2488
Anonymous Front Cover . . . . . . . . . . . . . . C1
Anonymous IEEE Signal Processing Society . . . . . C2
Anonymous IEEE Signal Processing Society . . . . . C3
Anonymous Blank page . . . . . . . . . . . . . . . C4
Anonymous Table of contents . . . . . . . . . . . 1--2
Anonymous Table of Contents [Edics] . . . . . . . 3--4
Dianna Yee and
Homayoun Kamkar-Parsi and
Rainer Martin and
Henning Puder A Noise Reduction Postfilter for
Binaurally Linked Single-Microphone
Hearing Aids Utilizing a Nearby External
Microphone . . . . . . . . . . . . . . . 5--18
Tom Bäckström and
Johannes Fischer Fast Randomization for Distributed
Low-Bitrate Coding of Speech and Audio 19--30
Jun Deng and
Xinzhou Xu and
Zixing Zhang and
Sascha Frühholz and
Björn Schuller Semisupervised Autoencoders for Speech
Emotion Recognition . . . . . . . . . . 31--43
Md. Sahidullah and
Dennis Alexander Lehmann Thomsen and
Rosa Gonzalez Hautamäki and
Tomi Kinnunen and
Zheng-Hua Tan and
Robert Parts and
Martti Pitkänen Robust Voice Liveness Detection and
Speaker Verification Using Throat
Microphones . . . . . . . . . . . . . . 44--56
Gilles Degottex and
Pierre Lanchantin and
Mark Gales A Log Domain Pulse Model for Parametric
Speech Synthesis . . . . . . . . . . . . 57--70
Johannes Abel and
Tim Fingscheidt Artificial Speech Bandwidth Extension
Using Deep Neural Networks for Wideband
Spectral Envelope Estimation . . . . . . 71--83
Yuki Saito and
Shinnosuke Takamichi and
Hiroshi Saruwatari Statistical Parametric Speech Synthesis
Incorporating Generative Adversarial
Networks . . . . . . . . . . . . . . . . 84--96
Kristian Timm Andersen and
Marc Moonen Robust Speech-Distortion Weighted
Interframe Wiener Filters for
Single-Channel Noise Reduction . . . . . 97--107
Chen-Yu Chiang Cross-Dialect Adaptation Framework for
Constructing Prosodic Models for Chinese
Dialect Text-to-Speech Systems . . . . . 108--121
Bingquan Liu and
Zhen Xu and
Chengjie Sun and
Baoxun Wang and
Xiaolong Wang and
Derek F. Wong and
Min Zhang Content-Oriented User Modeling for
Personalized Response Ranking in
Chatbots . . . . . . . . . . . . . . . . 122--133
Zhiyuan Tang and
Dong Wang and
Yixiang Chen and
Lantian Li and
Andrew Abel Phonetic Temporal Neural Model for
Language Identification . . . . . . . . 134--144
Soumitro Chakrabarty and
Emanuël A. P. Habets A Bayesian Approach to Informed Spatial
Filtering With Robustness Against DOA
Estimation Errors . . . . . . . . . . . 145--160
Kuan-Yu Chen and
Shih-Hung Liu and
Berlin Chen and
Hsin-Min Wang An Information Distillation Framework
for Extractive Summarization . . . . . . 161--170
Ma Jin and
Yan Song and
Ian McLoughlin and
Li-Rong Dai LID-Senones and Their Statistics for
Language Identification . . . . . . . . 171--183
Zhehuai Chen and
Jasha Droppo and
Jinyu Li and
Wayne Xiong Progressive Joint Modeling in
Unsupervised Single-Channel Overlapped
Speech Recognition . . . . . . . . . . . 184--196
Shivesh Ranjan and
John H. L. Hansen Curriculum Learning Based Approaches for
Noise Robust Speaker Recognition . . . . 197--210
Yoshiaki Bando and
Katsutoshi Itoyama and
Masashi Konyo and
Satoshi Tadokoro and
Kazuhiro Nakadai and
Kazuyoshi Yoshii and
Tatsuya Kawahara and
Hiroshi G. Okuno Speech Enhancement Based on Bayesian
Low-Rank and Sparse Decomposition of
Multichannel Magnitude Spectrograms . . 215--230
Yu-Ping Ruan and
Qian Chen and
Zhen-Hua Ling A Sequential Neural Encoder With Latent
Structured Description for Modeling
Sentences . . . . . . . . . . . . . . . 231--242
Amelia J. Gully and
Helena Daffern and
Damian T. Murphy Diphthong Synthesis Using the Dynamic
$3$D Digital Waveguide Mesh . . . . . . 243--255
Chunyang Wu and
Mark J. F. Gales and
Anton Ragni and
Penny Karanasou and
Khe Chai Sim Improving Interpretability and
Regularization in Deep Learning . . . . 256--265
Kehai Chen and
Tiejun Zhao and
Muyun Yang and
Lemao Liu and
Akihiro Tamura and
Rui Wang and
Masao Utiyama and
Eiichiro Sumita A Neural Approach to Source Dependence
Based Context Model for Statistical
Machine Translation . . . . . . . . . . 266--280
Joonas Nikunen and
Aleksandr Diment and
Tuomas Virtanen Separation of Moving Sound Sources Using
Multichannel NMF and Acoustic Tracking 281--295
Johan Swärd and
Hongbin Li and
Andreas Jakobsson Off-Grid Fundamental Frequency
Estimation . . . . . . . . . . . . . . . 296--303
Dylan Menzies and
Marcos F. Simón Gálvez and
Filippo Maria Fazi A Low-Frequency Panning Method With
Compensation for Head Rotation . . . . . 304--317
Branimir Dropuljić and
Igor Mijić and
Davor Petrinović and
Tanja Jovanovic and
Krešimir Ćosić Vocal Analysis of Acoustic Startle
Responses . . . . . . . . . . . . . . . 318--329
Philipp Aichinger and
Martin Hagmüller and
Berit Schneider-Stickler and
Jean Schoentgen and
Franz Pernkopf Tracking of Multiple Fundamental
Frequencies in Diplophonic Voices . . . 330--341
Anastasios Alexandridis and
Athanasios Mouchtaris Multiple Sound Source Location
Estimation in Wireless Acoustic Sensor
Networks Using DOA Estimates: The
Data-Association Problem . . . . . . . . 342--356
Robert Rehr and
Timo Gerkmann On the Importance of Super-Gaussian
Speech Priors for Machine-Learning Based
Speech Enhancement . . . . . . . . . . . 357--366
Sonia Djaziri-Larbi and
Gaël Mahé and
Imen Mezghani and
Monia Turki and
Mériem Jaïdane Watermark-Driven Acoustic Echo
Cancellation . . . . . . . . . . . . . . 367--378
Annamaria Mesaros and
Toni Heittola and
Emmanouil Benetos and
Peter Foster and
Mathieu Lagrange and
Tuomas Virtanen and
Mark D. Plumbley Detection and Classification of Acoustic
Scenes and Events: Outcome of the DCASE
2016 Challenge . . . . . . . . . . . . . 379--393
Cheng-Tao Chung and
Lin-Shan Lee Unsupervised Discovery of Structured
Acoustic Tokens With Applications to
Spoken Term Detection . . . . . . . . . 394--405
Tobias May Robust Speech Dereverberation With a
Neural Network-Based Post-Filter That
Exploits Multi-Conditional Training of
Binaural Cues . . . . . . . . . . . . . 406--414
Majid Mirbagheri and
Les Atlas and
Adrian K. C. Lee Regression Factor Analysis With an
Application to Continuous HRIR
Measurement . . . . . . . . . . . . . . 415--421
Jen-Tzung Chien Bayesian Nonparametric Learning for
Hierarchical and Sparse Topics . . . . . 422--435
Johannes Stahl and
Pejman Mowlaee A Pitch-Synchronous Simultaneous
Detection-Estimation Framework for
Speech Enhancement . . . . . . . . . . . 436--450
Anonymous Front Cover . . . . . . . . . . . . . . C1--C1
Anonymous IEEE Signal Processing Society . . . . . C2--C2
Anonymous Table of Contents . . . . . . . . . . . 457--458
Anonymous Table of Contents [Edics] . . . . . . . 459--460
C. D. Salvador and
S. Sakamoto and
J. Treviño and
Y. Suzuki Boundary Matching Filters for Spherical
Microphone and Loudspeaker Arrays . . . 461--474
A. H. Abdelaziz Comparing Fusion Models for DNN-Based
Audiovisual Continuous Speech
Recognition . . . . . . . . . . . . . . 475--484
S. Emura Residual Echo Reduction for Multichannel
Acoustic Echo Cancelers With a
Complex-Valued Residual Echo Estimate 485--500
V. H. Do and
N. F. Chen and
B. P. Lim and
M. A. Hasegawa-Johnson Multitask Learning for Phone Recognition
of Underresourced Languages Using
Mismatched Transcription . . . . . . . . 501--514
M. Zohourian and
G. Enzner and
R. Martin Binaural Speaker Localization Integrated
Into an Adaptive Beamformer for Hearing
Aids . . . . . . . . . . . . . . . . . . 515--528
Y. Xiang and
I. Natgunanathan and
D. Peng and
G. Hua and
B. Liu Spread Spectrum Audio Watermarking Using
Multiple Orthogonal PN Sequences and
Variable Embedding Strengths and
Polarities . . . . . . . . . . . . . . . 529--539
C. Tan and
F. Wei and
Q. Zhou and
N. Yang and
B. Du and
W. Lv and
M. Zhou Context-Aware Answer Sentence Selection
With Hierarchical Gated Recurrent Neural
Networks . . . . . . . . . . . . . . . . 540--549
J. Zhang and
S. P. Chepuri and
R. C. Hendriks and
R. Heusdens Microphone Subset Selection for MVDR
Beamformer Based Noise Reduction . . . . 550--563
S. Wang and
P. Lin and
Y. Tsao and
J. Hung and
B. Su Suppression by Selecting Wavelets for
Feature Compression in Distributed
Speech Recognition . . . . . . . . . . . 564--579
Y. Wang and
M. Brookes Model-Based Speech Enhancement in the
Modulation Domain . . . . . . . . . . . 580--594
C. Huemmer and
C. Hofmann and
R. Maas and
W. Kellermann Estimating Parameters of Nonlinear
Systems Using the Elitist Particle
Filter Based on Evolutionary Strategies 595--608
D. Salvati and
C. Drioli and
G. L. Foresti A Low-Complexity Robust Beamforming
Using Diagonal Unloading for Acoustic
Source Localization . . . . . . . . . . 609--622
J. Su and
J. Zeng and
D. Xiong and
Y. Liu and
M. Wang and
J. Xie A Hierarchy-to-Sequence Attentional
Neural Machine Translation Model . . . . 623--632
W. B. Kheder and
D. Matrouf and
M. Ajili and
J. Bonastre A Unified Joint Model to Deal With
Nuisance Variabilities in the $i$-Vector
Space . . . . . . . . . . . . . . . . . 633--645
G. Gelly and
J. Gauvain Optimization of RNN-Based Speech
Activity Detection . . . . . . . . . . . 646--656
M. Taseska and
E. A. P. Habets Blind Source Separation of Moving
Sources Using Sparsity-Based Source
Detection and Tracking . . . . . . . . . 657--670
L. Yu and
J. Wang and
K. R. Lai and
X. Zhang Refining Word Embeddings Using Intensity
Scores for Sentiment Analysis . . . . . 671--681
Y. Dorfan and
A. Plinge and
G. Hazan and
S. Gannot Distributed Expectation-Maximization
Algorithm for Speaker Localization in
Reverberant Environments . . . . . . . . 682--695
Anonymous IEEE/ACM Transactions on
Audio, Speech, and Language Processing
Edics . . . . . . . . . . . . . . . . . 696--697
Anonymous IEEE Transactions on
Multimedia information for authors . . . 698--699
Anonymous Open Access . . . . . . . . . . . . . . 700--700
Anonymous Introducing IEEE Collabratec . . . . . . 701--701
Anonymous IEEE Signal Processing Society . . . . . C3--C3
Anonymous Blank page . . . . . . . . . . . . . . . C4--C4
Anonymous Front Cover . . . . . . . . . . . . . . C1--C1
Anonymous IEEE Signal Processing Society . . . . . C2--C2
Anonymous Table of Contents . . . . . . . . . . . 696--697
Anonymous Table of Contents [Edics] . . . . . . . 698--699
Z. Tan and
M. Mak and
B. K. Mak DNN-Based Score Calibration With
Multitask Learning for Noise Robust
Speaker Verification . . . . . . . . . . 700--712
Y. Hu and
Z. Ling Extracting Spectral Features Using Deep
Autoencoders With Binary Distributed
Hidden Units for Statistical Parametric
Speech Synthesis . . . . . . . . . . . . 713--724
B. Laufer-Goldshtein and
R. Talmon and
S. Gannot A Hybrid Approach for Speaker Tracking
Based on TDOA and Data-Driven Models . . 725--735
S. Cumani and
P. Laface Speaker Recognition Using e-Vectors . . 736--748
L. Xu and
K. A. Lee and
H. Li and
Z. Yang Generalizing I-Vector Estimation for
Rapid Speaker Recognition . . . . . . . 749--759
Y. Buchris and
I. Cohen and
J. Benesty Frequency-Domain Design of Asymmetric
Circular Differential Microphone Arrays 760--773
J. Zhang and
T. D. Abhayapala and
W. Zhang and
P. N. Samarasinghe and
S. Jiang Active Noise Control Over Space: a Wave
Domain Approach . . . . . . . . . . . . 774--786
Y. Luo and
Z. Chen and
N. Mesgarani Speaker-Independent Speech Separation
With Deep Attractor Network . . . . . . 787--796
N. M. Joy and
S. R. Kothinti and
S. Umesh FMLLR Speaker Normalization With
i-Vector: In Pseudo-FMLLR and
Distillation Framework . . . . . . . . . 797--805
S. Chandna and
W. Wang Bootstrap Averaging for Model-Based
Source Separation in Reverberant
Conditions . . . . . . . . . . . . . . . 806--819
Z. Tan and
M. Mak and
B. K. Mak and
Y. Zhu Denoised Senone I-Vectors for Robust
Speaker Verification . . . . . . . . . . 820--830
K. Itakura and
Y. Bando and
E. Nakamura and
K. Itoyama and
K. Yoshii and
T. Kawahara Bayesian Multichannel Audio Source
Separation Based on Integrated Source
and Spatial Models . . . . . . . . . . . 831--846
Anonymous IEEE/ACM Transactions on
Audio, Speech, and Language Processing
Edics . . . . . . . . . . . . . . . . . 847--848
Anonymous IEEE Transactions on
Multimedia information for authors . . . 849--850
Anonymous Open Access . . . . . . . . . . . . . . 851--851
Anonymous Introducing IEEE Collabratec . . . . . . 852--852
Anonymous IEEE Signal Processing Society . . . . . C3--C3
Anonymous Blank page . . . . . . . . . . . . . . . C4--C4
Anonymous Table of Contents . . . . . . . . . . . 853--854
Anonymous Table of Contents [Edics] . . . . . . . 855--856
Y. E. Baba and
A. Walther and
E. A. P. Habets $3$D Room Geometry Inference Based on
Room Impulse Response Stacks . . . . . . 857--872
Q. Zhang and
J. H. L. Hansen Language/Dialect Recognition Based on
Unsupervised Deep Learning . . . . . . . 873--882
Z. Ling and
Y. Ai and
Y. Gu and
L. Dai Waveform Modeling and Generation Using
Hierarchical Recurrent Neural Networks
for Speech Bandwidth Extension . . . . . 883--894
M. Delcroix and
K. Kinoshita and
A. Ogawa and
C. Huemmer and
T. Nakatani Context Adaptive Neural Network Based
Acoustic Models for Rapid Adaptation . . 895--908
L. T. T. Tran and
S. E. Nordholm and
H. Schepker and
H. H. Dam and
S. Doclo Two-Microphone Hearing Aids Using
Prediction Error Method for Adaptive
Feedback Control . . . . . . . . . . . . 909--923
J. Chang and
M. Marschall Periphony-Lattice Mixed-Order Ambisonic
Scheme for Spherical Microphone Arrays 924--936
N. Dionelis and
M. Brookes Phase-Aware Single-Channel Speech
Enhancement With Modulation-Domain
Kalman Filtering . . . . . . . . . . . . 937--950
C. Zheng and
A. Deleforge and
X. Li and
W. Kellermann Statistical Analysis of the Multichannel
Wiener Filter Using a Bivariate Normal
Distribution for Sample Covariance
Matrices . . . . . . . . . . . . . . . . 951--966
C. Vaz and
V. Ramanarayanan and
S. Narayanan Acoustic Denoising Using Dictionary
Learning With Spectral and Temporal
Regularization . . . . . . . . . . . . . 967--980
L. Wang and
A. Cavallaro Pseudo-Determined Blind Source
Separation for Ad-hoc Microphone
Networks . . . . . . . . . . . . . . . . 981--994
S. Cumani and
P. Laface Scoring Heterogeneous Speaker Vectors
Using Nonlinear Transformations and Tied
PLDA Models . . . . . . . . . . . . . . 995--1009
G. Bernardi and
T. van Waterschoot and
J. Wouters and
M. Moonen Subjective and Objective Sound-Quality
Evaluation of Adaptive Feedback
Cancellation Algorithms . . . . . . . . 1010--1024
Anonymous IEEE/ACM Transactions on
Audio, Speech, and Language Processing
Edics . . . . . . . . . . . . . . . . . 1025--1026
Anonymous IEEE Transactions on
Multimedia information for authors . . . 1027--1028
Anonymous Front Cover . . . . . . . . . . . . . . C1--C1
Anonymous IEEE Signal Processing Society . . . . . C2--C2
Anonymous IEEE Signal Processing Society . . . . . C3--C3
Anonymous Blank page . . . . . . . . . . . . . . . C4--C4
Anonymous Front Cover . . . . . . . . . . . . . . C1--C1
Anonymous IEEE Signal Processing Society . . . . . C2--C2
Anonymous Table of Contents . . . . . . . . . . . 1025--1026
Anonymous Table of Contents [Edics] . . . . . . . 1027--1028
H. Kameoka and
T. Higuchi and
M. Tanaka and
L. Li Nonnegative Matrix Factorization With
Basis Clustering Using Cepstral Distance
Regularization . . . . . . . . . . . . . 1029--1040
J. Donley and
C. Ritz and
W. B. Kleijn Multizone Soundfield Reproduction With
Privacy- and Quality-Based Speech
Masking Filters . . . . . . . . . . . . 1041--1055
S. Braun and
A. Kuklasiński and
O. Schwartz and
O. Thiergart and
E. A. P. Habets and
S. Gannot and
S. Doclo and
J. Jensen Evaluation and Comparison of Late
Reverberation Power Spectral Density
Estimators . . . . . . . . . . . . . . . 1056--1071
E. L. Benaroya and
N. Obin and
M. Liuni and
A. Roebel and
W. Raumel and
S. Argentieri Binaural Localization of Multiple Sound
Sources by Non-Negative Tensor
Factorization . . . . . . . . . . . . . 1072--1082
N. Perraudin and
N. Holighaus and
P. Majdak and
P. Balazs Inpainting of Long Audio Segments With
Similarity Graphs . . . . . . . . . . . 1083--1094
P. Magron and
R. Badeau and
B. David Model-Based STFT Phase Recovery for
Audio Source Separation . . . . . . . . 1095--1105
I. Kodrasi and
S. Doclo Analysis of Eigenvalue
Decomposition-Based Late Reverberation
Power Spectral Density Estimation . . . 1106--1118
S. Braun and
E. A. P. Habets Linear Prediction-Based Online
Dereverberation and Noise Reduction
Using Alternating Kalman Filters . . . . 1119--1129
D. Ram and
A. Asaei and
H. Bourlard Sparse Subspace Modeling for Query by
Example Spoken Term Detection . . . . . 1130--1143
M. Krawczyk-Becker and
T. Gerkmann On Speech Enhancement Under PSD
Uncertainty . . . . . . . . . . . . . . 1144--1153
S. Leglaive and
R. Badeau and
G. Richard Student's $t$-Source and Mixing Models
for Multichannel Audio Source Separation 1154--1168
Anonymous IEEE/ACM Transactions on
Audio, Speech, and Language Processing
Edics . . . . . . . . . . . . . . . . . 1169--1170
Anonymous IEEE Transactions on
Multimedia information for authors . . . 1171--1172
Anonymous IEEE Signal Processing Society . . . . . C3--C3
Anonymous Blank page . . . . . . . . . . . . . . . C4--C4
Anonymous Front Cover . . . . . . . . . . . . . . C1--C1
Anonymous IEEE Signal Processing Society . . . . . C2--C2
Anonymous Table of Contents . . . . . . . . . . . 1173--1174
Anonymous Table of Contents [Edics] . . . . . . . 1175--1176
T. Yoshimura and
K. Hashimoto and
K. Oura and
Y. Nankaku and
K. Tokuda Mel-Cepstrum-Based Quantization Noise
Shaping Applied to Neural-Network-Based
Speech Waveform Synthesis . . . . . . . 1177--1184
Q. Wang and
J. Du and
L. Dai and
C. Lee A Multiobjective Learning and Ensembling
Approach to High-Performance Speech
Enhancement With Compact Neural Network
Architectures . . . . . . . . . . . . . 1185--1197
M. Á. Del-Agua and
A. Giménez and
A. Sanchis and
J. Civera and
A. Juan Speaker-Adapted Confidence Measures for
ASR Using Deep Bidirectional Recurrent
Neural Networks . . . . . . . . . . . . 1198--1206
J. Proença and
C. Lopes and
M. Tjalve and
A. Stolcke and
S. Candeias and
F. Perdigão Mispronunciation Detection in Children's
Reading of Sentences . . . . . . . . . . 1207--1219
Ljubiša Stanković and
Miloš Brajović Analysis of the Reconstruction of Sparse
Signals in the DCT Domain Applied to
Audio Signals . . . . . . . . . . . . . 1220--1235
J. F. Santos and
T. H. Falk Speech Dereverberation With
Context-Aware Recurrent Neural Networks 1236--1246
M. Geronazzo and
S. Spagnol and
F. Avanzini Do We Need Individual Head-Related
Transfer Functions for Vertical
Localization? The Case Study of a
Spectral Notch Distance Metric . . . . . 1247--1260
D. Marquardt and
S. Doclo Interaural Coherence Preservation for
Binaural Noise Reduction Using Partial
Noise Estimation and Spectral
Postfiltering . . . . . . . . . . . . . 1261--1274
M. Farmani and
M. S. Pedersen and
Z. Tan and
J. Jensen Bias-Compensated Informed Sound Source
Localization Using Relative Transfer
Functions . . . . . . . . . . . . . . . 1275--1289
F. Tao and
C. Busso Gating Neural Network for Large
Vocabulary Audiovisual Speech
Recognition . . . . . . . . . . . . . . 1290--1302
Anonymous IEEE/ACM Transactions on
Audio, Speech, and Language Processing
Edics . . . . . . . . . . . . . . . . . 1303--1304
Anonymous IEEE Transactions on
Multimedia information for authors . . . 1305--1306
Anonymous IEEE Signal Processing Society . . . . . C3--C3
Anonymous Blank page . . . . . . . . . . . . . . . C4--C4
Anonymous Front Cover . . . . . . . . . . . . . . C1--C1
Anonymous IEEE Signal Processing Society . . . . . C2--C2
Anonymous Table of Contents . . . . . . . . . . . 1303--1304
Anonymous Table of Contents [Edics] . . . . . . . 1305--1306
Z. Rafii and
A. Liutkus and
F. Stöter and
S. I. Mimilakis and
D. FitzGerald and
B. Pardo An Overview of Lead and Accompaniment
Separation in Music . . . . . . . . . . 1307--1335
C. Wang and
J. Wang and
A. Santoso and
C. Chiang and
C. Wu Sound Event Recognition Using
Auditory-Receptive-Field Binary Pattern
and Hierarchical-Diving Deep Belief
Network . . . . . . . . . . . . . . . . 1336--1351
L. Yang and
M. Zhang and
Y. Liu and
M. Sun and
N. Yu and
G. Fu Joint POS Tagging and Dependence Parsing
With Transition-Based Neural Networks 1352--1358
K. Yu and
Z. Zhao and
X. Wu and
H. Lin and
X. Liu Rich Short Text Conversation Using
Semantic-Key-Controlled Sequence
Generation . . . . . . . . . . . . . . . 1359--1368
B. Lehner and
J. Schlüter and
G. Widmer Online, Loudness-Invariant Vocal
Detection in Mixed Music Signals . . . . 1369--1380
S. Stone and
M. Marxen and
P. Birkholz Construction and Evaluation of a
Parametric One-Dimensional Vocal Tract
Model . . . . . . . . . . . . . . . . . 1381--1392
T. Tan and
Y. Qian and
H. Hu and
Y. Zhou and
W. Ding and
K. Yu Adaptive Very Deep Convolutional
Residual Network for Noise Robust Speech
Recognition . . . . . . . . . . . . . . 1393--1405
X. Wang and
S. Takaki and
J. Yamagishi Autoregressive Neural F0 Model for
Statistical Parametric Speech Synthesis 1406--1419
C. Valentini-Botinhao and
J. Yamagishi Speech Enhancement of Noisy and
Reverberant Speech for Text-to-Speech 1420--1433
A. I. Koutrouvelis and
T. W. Sherson and
R. Heusdens and
R. C. Hendriks A Low-Cost Robust Distributed Linearly
Constrained Beamformer for Wireless
Acoustic Sensor Networks With Arbitrary
Topology . . . . . . . . . . . . . . . . 1434--1448
Anonymous IEEE/ACM Transactions on
Audio, Speech, and Language Processing
Edics . . . . . . . . . . . . . . . . . 1449--1450
Anonymous IEEE Transactions on
Multimedia information for authors . . . 1451--1452
Anonymous IEEE Signal Processing Society . . . . . C3--C3
Anonymous Blank page . . . . . . . . . . . . . . . C4--C4
Anonymous Front Cover . . . . . . . . . . . . . . C1--C1
Anonymous IEEE Signal Processing Society . . . . . C2--C2
Anonymous Table of Contents . . . . . . . . . . . 1453--1454
Anonymous Table of Contents [Edics] . . . . . . . 1455--1456
C. Wu and
C. Dittmar and
C. Southall and
R. Vogl and
G. Widmer and
J. Hockman and
M. Müller and
A. Lerch A Review of Automatic Drum Transcription 1457--1483
C. Evers and
P. A. Naylor Acoustic SLAM . . . . . . . . . . . . . 1484--1498
C. Laroche and
M. Kowalski and
H. Papadopoulos and
G. Richard Hybrid Projective Nonnegative Matrix
Factorization With Drum Dictionaries for
Harmonic/Percussive Source Separation 1499--1511
J. J. Carabias-Orti and
J. Nikunen and
T. Virtanen and
P. Vera-Candeas Multichannel Blind Sound Source
Separation Using Spatial Covariance
Model With Level and Time Differences
and Nonnegative Matrix Factorization . . 1512--1527
M. Zhang and
N. Yu and
G. Fu A Simple and Effective Neural Model for
Joint Word Segmentation and POS Tagging 1528--1538
D. Menzies and
F. M. Fazi A Complex Panning Method for Near-Field
Imaging . . . . . . . . . . . . . . . . 1539--1548
A. Misra and
J. H. L. Hansen Maximum-Likelihood Linear Transformation
for Unsupervised Domain Adaptation in
Speaker Verification . . . . . . . . . . 1549--1558
Y. Wakabayashi and
T. Fukumori and
M. Nakayama and
T. Nishiura and
Y. Yamashita Single-Channel Speech Enhancement With
Phase Reconstruction Based on Phase
Distortion Averaging . . . . . . . . . . 1559--1569
S. Fu and
T. Wang and
Y. Tsao and
X. Lu and
H. Kawai End-to-End Waveform Utterance
Enhancement for Direct Evaluation
Metrics Optimization by Fully
Convolutional Neural Networks . . . . . 1570--1584
K. Xiao and
S. Wang and
M. Wan and
L. Wu Radiated Noise Suppression for
Electrolarynx Speech Based on Multiband
Time-Domain Amplitude Modulation . . . . 1585--1593
A. Fahim and
P. N. Samarasinghe and
T. D. Abhayapala PSD Estimation and Source Separation in
a Noisy Reverberant Environment Using a
Spherical Microphone Array . . . . . . . 1594--1607
H. He and
J. Chen and
J. Benesty and
T. Yang Noise Robust Frequency-Domain Adaptive
Blind Multichannel Identification With
$\ell_p$-Norm Constraint . . . . . . . . 1608--1619
W. Zhang and
Z. Chen and
F. Yin and
Q. Zhang Melody Extraction From Polyphonic Music
Using Particle Filter and Dynamic
Programming . . . . . . . . . . . . . . 1620--1632
C. Zhang and
K. Koishida and
J. H. L. Hansen Text-Independent Speaker Verification
Based on Triplet Convolutional Neural
Network Embeddings . . . . . . . . . . . 1633--1644
A. R. MV and
P. K. Ghosh PSFM: A Probabilistic Source Filter Model
for Noise Robust Glottal Closure Instant
Detection . . . . . . . . . . . . . . . 1645--1657
M. Airaksinen and
L. Juvela and
B. Bollepalli and
J. Yamagishi and
P. Alku A Comparison Between STRAIGHT, Glottal,
and Sinusoidal Vocoding in Statistical
Parametric Speech Synthesis . . . . . . 1658--1670
G. Mahé and
M. Jaïdane Perceptually Controlled Reshaping of
Sound Histograms . . . . . . . . . . . . 1671--1683
Q. Huang and
L. Zhang and
Y. Fang Two-Step Spherical Harmonics ESPRIT-Type
Algorithms and Performance Analysis . . 1684--1697
Anonymous IEEE/ACM Transactions on
Audio, Speech, and Language Processing
Edics . . . . . . . . . . . . . . . . . 1698--1699
Anonymous IEEE Transactions on
Multimedia information for authors . . . 1700--1702
Anonymous IEEE Signal Processing Society . . . . . C3--C3
Anonymous Blank page . . . . . . . . . . . . . . . C4--C4
Anonymous Front Cover . . . . . . . . . . . . . . C1--C1
Anonymous IEEE Signal Processing Society . . . . . C2--C2
Anonymous Table of Contents . . . . . . . . . . . 1698--1699
Anonymous Table of Contents [Edics] . . . . . . . 1700--1701
D. Wang and
J. Chen Supervised Speech Separation Based on
Deep Learning: an Overview . . . . . . . 1702--1726
R. Wang and
M. Utiyama and
A. Finch and
L. Liu and
K. Chen and
E. Sumita Sentence Selection and Weighting for
Neural Machine Translation Domain
Adaptation . . . . . . . . . . . . . . . 1727--1741
F. U. Khan and
B. P. Milner and
T. Le Cornu Using Visual Speech Information in
Masking Methods for Audio Speaker
Separation . . . . . . . . . . . . . . . 1742--1754
X. Li and
S. Gannot and
L. Girin and
R. Horaud Multichannel Identification and
Nonnegative Equalization for
Dereverberation and Noise Reduction
Based on Convolutive Transfer Function 1755--1768
Lütfi Kerem Şenel and
İhsan Utlu and
Veysel Yücesoy and
Aykut Koç and
Tolga Çukur Semantic Structure and Interpretability
of Word Embeddings . . . . . . . . . . . 1769--1779
Y. Koizumi and
K. Niwa and
Y. Hioka and
K. Kobayashi and
Y. Haneda DNN-Based Source Enhancement to Increase
Objective Sound Quality Assessment Score 1780--1792
C. Paleologu and
J. Benesty and
S. Ciochină Linear System Identification Based on a
Kronecker Product Decomposition . . . . 1793--1808
F. Xiong and
S. Goetze and
B. Kollmeier and
B. T. Meyer Exploring Auditory-Inspired Acoustic
Features for Room Acoustic Parameter
Estimation From Monaural Speech . . . . 1809--1820
G. Le Lan and
D. Charlet and
A. Larcher and
S. Meignier An Adaptive Method for Cross-Recording
Speaker Diarization . . . . . . . . . . 1821--1832
W. Xue and
A. H. Moore and
M. Brookes and
P. A. Naylor Modulation-Domain Multichannel Kalman
Filtering for Speech Enhancement . . . . 1833--1847
K. Wu and
V. G. Reju and
A. W. H. Khong Multisource DOA Estimation in a
Reverberant Environment Using a Single
Acoustic Vector Sensor . . . . . . . . . 1848--1859
J. Huang and
Y. Sun and
W. Zhang and
H. Wang and
T. Liu Entity Highlight Generation as
Statistical and Neural Machine
Translation . . . . . . . . . . . . . . 1860--1872
Q. T. Do and
S. Sakti and
S. Nakamura Sequence-to-Sequence Models for Emphasis
Speech Translation . . . . . . . . . . . 1873--1883
F. Fontana and
E. Bozzo Explicit Fixed-Point Computation of
Nonlinear Delay-Free Loop Filter
Networks . . . . . . . . . . . . . . . . 1884--1896
S. Widmark Causal IIR Audio Precompensator Filters
Subject to Quadratic Constraints . . . . 1897--1912
F. Winter and
H. Wierstorf and
C. Hold and
F. Krüger and
A. Raake and
S. Spors Colouration in Local Wave Field
Synthesis . . . . . . . . . . . . . . . 1913--1924
A. H. Andersen and
J. M. de Haan and
Z. Tan and
J. Jensen Nonintrusive Speech Intelligibility
Prediction Using Convolutional Neural
Networks . . . . . . . . . . . . . . . . 1925--1939
Anonymous IEEE/ACM Transactions on
Audio, Speech, and Language Processing
Edics . . . . . . . . . . . . . . . . . 1940--1941
Anonymous IEEE Transactions on
Multimedia information for authors . . . 1942--1944
Anonymous IEEE Signal Processing Society . . . . . C3--C3
Anonymous Blank page . . . . . . . . . . . . . . . C4--C4
Anonymous Front Cover . . . . . . . . . . . . . . C1--C1
Anonymous IEEE Signal Processing Society . . . . . C2--C2
Anonymous Table of Contents . . . . . . . . . . . 1945--1946
Anonymous Table of Contents [Edics] . . . . . . . 1947--1948
H. Hadian and
H. Sameti and
D. Povey and
S. Khudanpur Flat-Start Single-Stage Discriminatively
Trained HMM-Based Models for ASR . . . . 1949--1961
F. Katzberg and
R. Mazur and
M. Maass and
P. Koch and
A. Mertins A Compressed Sensing Framework for
Dynamic Sound-Field Measurements . . . . 1962--1975
H. Sundar and
T. V. Sreenivas and
C. S. Seelamantula TDOA-Based Multiple Acoustic Source
Localization Without Association
Ambiguity . . . . . . . . . . . . . . . 1976--1990
R. Sahraeian and
D. Van Compernolle Cross-Entropy Training of DNN Ensemble
Acoustic Models for Low-Resource ASR . . 1991--2001
H. Dinkel and
Y. Qian and
K. Yu Investigating Raw Wave Deep Neural
Networks for End-to-End Speaker Spoofing
Detection . . . . . . . . . . . . . . . 2002--2014
J. Zhang and
R. Heusdens and
R. C. Hendriks Rate-Distributed Spatial Filtering Based
Noise Reduction in Wireless Acoustic
Sensor Networks . . . . . . . . . . . . 2015--2026
M. Heck and
S. Sakti and
S. Nakamura Dirichlet Process Mixture of Mixtures
Model for Unsupervised Subword Modeling 2027--2042
S. Nie and
S. Liang and
W. Liu and
X. Zhang and
J. Tao Deep Learning Based Speech Separation
via NMF-Style Reconstructions . . . . . 2043--2055
H. Dubey and
A. Sangwan and
J. H. L. Hansen Leveraging Frequency-Dependent Kernel
and DIP-Based Clustering for Robust
Speech Activity Detection in
Naturalistic Audio Streams . . . . . . . 2056--2071
Y. Jang and
J. Ham and
B. Lee and
K. Kim Cross-Language Neural Dialog State
Tracker for Large Ontologies Using
Hierarchical Attention . . . . . . . . . 2072--2082
G. Weisz and
P. Budzianowski and
P. Su and
M. Gašić Sample Efficient Deep Reinforcement
Learning for Dialogue Systems With Large
Action Spaces . . . . . . . . . . . . . 2083--2097
S. Lin Reverberation-Robust Localization of
Speakers Using Distinct Speech Onsets
and Multichannel Cross Correlations . . 2098--2111
S. Abidin and
R. Togneri and
F. Sohel Spectrotemporal Analysis Using Local
Binary Pattern Variants for Acoustic
Scene Classification . . . . . . . . . . 2112--2121
N. Ma and
J. A. Gonzalez and
G. J. Brown Robust Binaural Localization of a Target
Sound Source by Combining Spectral
Source Models and Deep Neural Networks 2122--2131
S. Wu and
D. Zhang and
Z. Zhang and
N. Yang and
M. Li and
M. Zhou Dependency-to-Dependency Neural Machine
Translation . . . . . . . . . . . . . . 2132--2141
J. Xu and
H. He and
X. Sun and
X. Ren and
S. Li Cross-Domain and Semisupervised Named
Entity Recognition in Chinese Social
Media: a Unified Model . . . . . . . . . 2142--2152
S. Van Kuyk and
W. B. Kleijn and
R. C. Hendriks An Evaluation of Intrusive Instrumental
Intelligibility Metrics . . . . . . . . 2153--2166
X. Ouyang and
K. Gu and
P. Zhou Spatial Pyramid Pooling Mechanism in 3D
Convolutional Network for Sentence-Level
Classification . . . . . . . . . . . . . 2167--2179
B. McFee and
J. Salamon and
J. P. Bello Adaptive Pooling Operators for Weakly
Labeled Sound Event Detection . . . . . 2180--2193
I. Barbancho and
G. Tzanetakis and
A. M. Barbancho and
L. J. Tardón Discrimination Between
Ascending/Descending Pitch Arpeggios . . 2194--2203
Y. Kim and
M. Kim and
J. Goo and
H. Kim Learning Self-Informed Feature
Contribution for Deep Learning-Based
Acoustic Modeling . . . . . . . . . . . 2204--2214
M. B. Çöteli and
O. Olgun and
H. Hacıhabiboğlu Multiple Sound Source Localization With
Steered Response Power Density and
Hierarchical Grid Refinement . . . . . . 2215--2229
J. Bao and
Y. Gong and
N. Duan and
M. Zhou and
T. Zhao Question Generation With Doubly
Adversarial Nets . . . . . . . . . . . . 2230--2239
B. Bu and
C. Bao and
M. Jia Design of a Planar First-Order
Loudspeaker Array for Global Active
Noise Control . . . . . . . . . . . . . 2240--2250
Anonymous IEEE/ACM Transactions on
Audio, Speech, and Language Processing
Edics . . . . . . . . . . . . . . . . . 2251--2252
Anonymous IEEE Transactions on
Multimedia information for authors . . . 2253--2255
Anonymous IEEE Signal Processing Society . . . . . C3--C3
Anonymous Blank page . . . . . . . . . . . . . . . C4--C4
Anonymous Front Cover . . . . . . . . . . . . . . C1--C1
Anonymous IEEE Signal Processing Society . . . . . C2--C2
Anonymous Table of Contents . . . . . . . . . . . 2251--2252
Anonymous Table of Contents [Edics] . . . . . . . 2253--2254
X. Wang and
Z. Tu and
M. Zhang Incorporating Statistical Machine
Translation Word Knowledge Into Neural
Machine Translation . . . . . . . . . . 2255--2266
Y. Zhao and
M. Kuruvilla-Dugdale and
M. Song Structured Sparse Spectral Transforms
and Structural Measures for Voice
Conversion . . . . . . . . . . . . . . . 2267--2276
H. Salehi and
D. Suelzle and
P. Folkeard and
V. Parsa Learning-Based Reference-Free Speech
Quality Measures for Hearing Aid
Applications . . . . . . . . . . . . . . 2277--2288
G. Enzner and
P. Thüne Bayesian MMSE Filtering of Noisy Speech
by SNR Marginalization With Global PSD
Priors . . . . . . . . . . . . . . . . . 2289--2304
G. Huang and
J. Chen and
J. Benesty Insights Into Frequency-Invariant
Beamforming With Concentric Circular
Microphone Arrays . . . . . . . . . . . 2305--2318
Ayana and
S. Shen and
Y. Chen and
C. Yang and
Z. Liu and
M. Sun Zero-Shot Cross-Lingual Neural Headline
Generation . . . . . . . . . . . . . . . 2319--2327
S. Surendran and
T. K. Kumar Oblique Projection and Cepstral
Subtraction in Signal Subspace Speech
Enhancement for Colored Noise Reduction 2328--2340
Q. Li and
D. F. Wong and
L. S. Chao and
M. Zhu and
T. Xiao and
J. Zhu and
M. Zhang Linguistic Knowledge-Aware Neural
Machine Translation . . . . . . . . . . 2341--2354
W. Zhang and
C. Hofmann and
M. Buerger and
T. D. Abhayapala and
W. Kellermann Spatial Noise-Field Control With Online
Secondary Path Modeling: a Wave-Domain
Approach . . . . . . . . . . . . . . . . 2355--2370
A. Meynard and
B. Torrésani Spectral Analysis for Nonstationary
Audio . . . . . . . . . . . . . . . . . 2371--2380
I. Martín-Morató and
M. Cobos and
F. J. Ferri Adaptive Mid-Term Representations for
Robust Audio Event Classification . . . 2381--2392
G. Firtha and
P. Fiala and
F. Schultz and
S. Spors On the General Relation of Wave Field
Synthesis and Spectral Division Method
for Linear Arrays . . . . . . . . . . . 2393--2403
P. Birkholz and
S. Stone and
K. Wolf and
D. Plettemeier Non-Invasive Silent Phoneme Recognition
Using Microwave Signals . . . . . . . . 2404--2411
W. Lin and
M. Mak and
J. Chien Multisource I-Vectors Domain Adaptation
Using Maximum Mean Discrepancy Based
Autoencoders . . . . . . . . . . . . . . 2412--2422
M. Abdelwahab and
C. Busso Domain Adversarial for Acoustic Emotion
Recognition . . . . . . . . . . . . . . 2423--2435
D. El Badawy and
I. Dokmanić Direction of Arrival With One
Microphone, a Few LEGOs, and
Non-Negative Matrix Factorization . . . 2436--2446
H. Lee and
P. Chung and
Y. Wu and
T. Lin and
T. Wen Interactive Spoken Content Retrieval by
Deep Reinforcement Learning . . . . . . 2447--2459
S. Elshamy and
N. Madhu and
W. Tirry and
T. Fingscheidt DNN-Supported Speech Enhancement With
Cepstral Estimation of Both Excitation
and Envelope . . . . . . . . . . . . . . 2460--2474
Y. Bao and
H. Chen A Chance-Constrained Programming
Approach to the Design of Robust
Broadband Beamformers With Microphone
Mismatches . . . . . . . . . . . . . . . 2475--2488
Anonymous Farewell Editorial . . . . . . . . . . . 2489--2489
Anonymous List of Reviewers . . . . . . . . . . . 2490--2496
Anonymous IEEE/ACM Transactions on
Audio, Speech, and Language Processing
Edics . . . . . . . . . . . . . . . . . 2497--2498
Anonymous IEEE Transactions on
Multimedia information for authors . . . 2499--2501
Anonymous IEEE Open Access Publishing . . . . . . 2502--2502
Anonymous 2018 Index IEEE/ACM
Transactions on Audio, Speech, and
Language Processing Vol. 26 . . . . . . 2503--2528
Anonymous IEEE Signal Processing Society . . . . . C3--C3
Anonymous Blank page . . . . . . . . . . . . . . . C4--C4
Anonymous Table of contents . . . . . . . . . . . C1--1
Anonymous IEEE Signal Processing Society . . . . . C2--C2
Anonymous Table of Contents [Edics] . . . . . . . 2--3
Anonymous [Blank page] . . . . . . . . . . . . . . B4--B4
Anonymous Inaugural Editorial: Innovations in an
Era of Ubiquitous Audio, Speech, and
Language Processing . . . . . . . . . . 5--6
F. Bao and
W. H. Abdulla A New Ratio Mask Representation for
CASA-Based Speech Enhancement . . . . . 7--19
P. Magron and
T. Virtanen Complex ISNMF: a Phase-Aware Model for
Monaural Audio Source Separation . . . . 20--31
T. T. H. Duong and
N. Q. K. Duong and
P. C. Nguyen and
C. Q. Nguyen Gaussian Modeling-Based Multichannel
Audio Source Separation Exploiting
Generic Source Spectral Model . . . . . 32--43
G. Zhang and
J. Tao and
X. Qiu and
I. Burnett Decentralized Two-Channel Active Noise
Control for Single Frequency by Shaping
Matrix Eigenvalues . . . . . . . . . . . 44--52
Y. Zhao and
Z. Wang and
D. Wang Two-Stage Deep Learning for
Noisy-Reverberant Speech Enhancement . . 53--62
N. Zheng and
X. Zhang Phase-Aware Speech Enhancement Based on
Deep Neural Networks . . . . . . . . . . 63--76
T. Moriya and
T. Tanaka and
T. Shinozaki and
S. Watanabe and
K. Duh Evolution-Strategy-Based Automation of
System Development for High-Performance
Speech Recognition . . . . . . . . . . . 77--88
H. Kamper and
G. Shakhnarovich and
K. Livescu Semantic Speech Retrieval With a
Visually Grounded Model of Untranscribed
Speech . . . . . . . . . . . . . . . . . 89--98
M. S. Kavalekalam and
J. K. Nielsen and
J. B. Boldt and
M. G. Christensen Model-Based Speech Enhancement for
Intelligibility Improvement in Binaural
Hearing Aids . . . . . . . . . . . . . . 99--113
A. R. MV and
P. K. Ghosh Glottal Inverse Filtering Using
Probabilistic Weighted Linear Prediction 114--124
Y. Sun and
W. Wang and
J. Chambers and
S. M. Naqvi Two-Stage Monaural Source Separation in
Reverberant Room Environments Using Deep
Neural Networks . . . . . . . . . . . . 125--139
L. Ferrer and
M. K. Nandwana and
M. McLaren and
D. Castan and
A. Lawson Toward Fail-Safe Speaker Recognition:
Trial-Based Calibration With a Reject
Option . . . . . . . . . . . . . . . . . 140--153
J. Amini and
R. C. Hendriks and
R. Heusdens and
M. Guo and
J. Jensen Asymmetric Coding for Rate-Constrained
Noise Reduction in Binaural Hearing Aids 154--167
J. Yu and
J. Jiang and
R. Xia Global Inference for Aspect and Opinion
Terms Co-Extraction Based on Multi-Task
Neural Networks . . . . . . . . . . . . 168--177
Z. Wang and
X. Zhang and
D. Wang Robust Speaker Localization Guided by
Deep Learning-Based Time-Frequency
Masking . . . . . . . . . . . . . . . . 178--188
K. Tan and
J. Chen and
D. Wang Gated Residual Networks With Dilated
Convolutions for Monaural Speech
Enhancement . . . . . . . . . . . . . . 189--198
G. H. Ngo and
M. Nguyen and
N. F. Chen Phonology-Augmented Statistical
Framework for Machine Transliteration
Using Limited Linguistic Resources . . . 199--211
Y. Koizumi and
S. Saito and
H. Uematsu and
Y. Kawachi and
N. Harada Unsupervised Detection of Anomalous
Sound Based on Deep Learning and the
Neyman--Pearson Lemma . . . . . . . . . 212--224
Y. Laufer and
S. Gannot A Bayesian Hierarchical Model for Speech
Enhancement With Time-Varying Audio
Channel . . . . . . . . . . . . . . . . 225--239
Anonymous Erratum for Nonlinear Audio Systems
Identification Through Audio Input
Gaussianization . . . . . . . . . . . . 240--240
Anonymous IEEE Signal Processing Society . . . . . C3--C3
Anonymous Table of Contents . . . . . . . . . . . C1--241
Anonymous IEEE Signal Processing Society . . . . . C2--C2
Anonymous Table of Contents [Edics] . . . . . . . 242--243
T. Nakashika and
S. Takaki and
J. Yamagishi Complex-Valued Restricted Boltzmann
Machine for Speaker-Dependent Speech
Parameterization From Complex Spectra 244--254
F. Xiong and
S. Goetze and
B. Kollmeier and
B. T. Meyer Joint Estimation of Reverberation Time
and Early-To-Late Reverberation Ratio
From Single-Channel Speech Signals . . . 255--267
F. Stöter and
S. Chakrabarty and
B. Edler and
E. A. P. Habets CountNet: Estimating the Number of
Concurrent Speakers Using Supervised
Learning . . . . . . . . . . . . . . . . 268--282
M. Kolbæk and
Z. Tan and
J. Jensen On the Relationship Between Short-Time
Objective Intelligibility and Short-Time
Spectral-Amplitude Mean-Square Error for
Speech Enhancement . . . . . . . . . . . 283--295
M. W. Hansen and
J. R. Jensen and
M. G. Christensen Estimation of Fundamental Frequencies in
Stereophonic Music Mixtures . . . . . . 296--310
J. Bao and
D. Tang and
N. Duan and
Z. Yan and
M. Zhou and
T. Zhao Text Generation From Tables . . . . . . 311--320
A. I. Koutrouvelis and
R. C. Hendriks and
R. Heusdens and
J. Jensen A Convex Approximation of the Relaxed
Binaural Beamforming Optimization
Problem . . . . . . . . . . . . . . . . 321--331
T. Hashimoto and
D. Saito and
N. Minematsu Many-to-Many and Completely
Parallel-Data-Free Voice Conversion
Based on Eigenspace DNN . . . . . . . . 332--341
F. Pishdadian and
B. Pardo Multi-Resolution Common Fate Transform 342--354
Y. Wu and
W. Li Automatic Audio Chord Recognition With
MIDI-Trained Deep Feature and BLSTM-CRF
Sequence Decoding Model . . . . . . . . 355--366
K. Imoto and
N. Ono Acoustic Topic Model for Scene Analysis
With Intermittently Missing Observations 367--382
K. Xiao and
S. Wang and
M. Wan and
L. Wu Reconstruction of Mandarin
Electrolaryngeal Fricatives With Hybrid
Noise Source . . . . . . . . . . . . . . 383--391
L. Krishnan and
T. Betlehem and
P. D. Teal Fast Algorithms for Acoustic Impulse
Response Shaping . . . . . . . . . . . . 392--403
V. Zakeri and
A. J. Hodgson Automatic Identification of Hard and
Soft Bone Tissues by Analyzing Drilling
Sounds . . . . . . . . . . . . . . . . . 404--414
S. Bilbao and
B. Hamilton Directional Sources in Wave-Based
Acoustic Simulation . . . . . . . . . . 415--428
Y. Zhang and
B. Pardo and
Z. Duan Siamese Style Convolutional Neural
Networks for Sound Search by Vocal
Imitation . . . . . . . . . . . . . . . 429--441
F. Feng and
M. Kowalski Underdetermined Reverberant Blind Source
Separation: Sparse Approaches for
Multiplicative and Convolutive
Narrowband Approximation . . . . . . . . 442--456
Z. Wang and
D. Wang Combining Spectral and Spatial Features
for Deep Learning Based Blind Speaker
Separation . . . . . . . . . . . . . . . 457--468
Anonymous IEEE Signal Processing Society . . . . . C3--C3
Anonymous Table of Contents . . . . . . . . . . . C1--469
Anonymous IEEE Signal Processing Society . . . . . C2--C2
Anonymous Table of Contents [Edics] . . . . . . . 470--471
M. Z. Jahromi and
A. Zahedi and
J. Jensen and
J. Østergaard Information Loss in the Human Auditory
System . . . . . . . . . . . . . . . . . 472--481
Y. Buchris and
A. Amar and
J. Benesty and
I. Cohen Incoherent Synthesis of Sparse Arrays
for Frequency-Invariant Beamforming . . 482--495
Y. Rahulamathavan and
K. R. Sutharsini and
I. G. Ray and
R. Lu and
M. Rajarajan Privacy-Preserving $i$Vector-Based
Speaker Verification . . . . . . . . . . 496--506
J. Zhang and
Y. Zhao and
H. Li and
C. Zong Attention With Sparsity Regularization
for Neural Machine Translation and
Summarization . . . . . . . . . . . . . 507--518
A. H. Moore and
W. Xue and
P. A. Naylor and
M. Brookes Noise Covariance Matrix Estimation for
Rotating Microphone Arrays . . . . . . . 519--530
G. Yang and
H. He and
Q. Chen Emotion-Semantic-Enhanced Neural Network 531--543
T. Dietzen and
A. Spriet and
W. Tirry and
S. Doclo and
M. Moonen and
T. van Waterschoot Comparative Analysis of Generalized
Sidelobe Cancellation and Multi-Channel
Linear Prediction for Speech
Dereverberation and Noise Reduction . . 544--558
J. Gao and
J. Du and
E. Chen Mixed-Bandwidth Cross-Channel Speech
Recognition via Joint Optimization of
DNN-Based Bandwidth Expansion and
Acoustic Modeling . . . . . . . . . . . 559--571
S. Deena and
M. Hasan and
M. Doulaty and
O. Saz and
T. Hain Recurrent Neural Network Language Model
Adaptation for Multi-Genre Broadcast
Speech Recognition and Alignment . . . . 572--582
F. B. Gelderblom and
T. V. Tronstad and
E. M. Viggen Subjective Evaluation of a Noise-Reduced
Training Target for Deep Neural
Network-Based Speech Enhancement . . . . 583--594
M. Luis Valero and
E. A. P. Habets Low-Complexity Multi-Microphone Acoustic
Echo Control in the Short-Time Fourier
Transform Domain . . . . . . . . . . . . 595--609
Q. Zhu and
P. Coleman and
X. Qiu and
M. Wu and
J. Yang and
I. Burnett Robust Personal Audio Geometry
Optimization in the SVD-Based Modal
Domain . . . . . . . . . . . . . . . . . 610--620
J. Yi and
J. Tao and
Z. Wen and
Y. Bai Language-Adversarial Transfer Learning
for Low-Resource Speech Recognition . . 621--630
J. Zhang and
Z. Ling and
L. Liu and
Y. Jiang and
L. Dai Sequence-to-Sequence Acoustic Modeling
for Voice Conversion . . . . . . . . . . 631--644
X. Li and
L. Girin and
S. Gannot and
R. Horaud Multichannel Speech Separation and
Enhancement Using the Convolutive
Transfer Function . . . . . . . . . . . 645--659
Anonymous IEEE Signal Processing Society . . . . . C3--C3
Anonymous Table of Contents . . . . . . . . . . . C1--660
Anonymous IEEE Signal Processing Society . . . . . C2--C2
Anonymous Table of Contents [Edics] . . . . . . . 661--662
Z. Zhao and
H. Liu and
T. Fingscheidt Convolutional Neural Networks to Enhance
Coded Speech . . . . . . . . . . . . . . 663--678
H. Schepker and
S. E. Nordholm and
L. T. T. Tran and
S. Doclo Null-Steering Beamformer-Based Feedback
Cancellation for Multi-Microphone
Hearing Aids With Incoming Signal
Preservation . . . . . . . . . . . . . . 679--691
Z. Li and
Y. Song and
L. Dai and
I. McLoughlin Listening and Grouping: an Online
Autoregressive Approach for Monaural
Speech Separation . . . . . . . . . . . 692--703
D. Deng and
L. Jing and
J. Yu and
S. Sun and
M. K. Ng Sentiment Lexicon Construction With
Hierarchical Supervision Topic Model . . 704--718
M. Zhou and
M. Huang and
X. Zhu Story Ending Selection by Finding Hints
From Pairwise Candidate Endings . . . . 719--729
J. Richter and
J. Fels On the Influence of Continuous Subject
Rotation During High-Resolution
Head-Related Transfer Function
Measurements . . . . . . . . . . . . . . 730--741
J. Yu and
K. Markov and
T. Matsui Articulatory and Spectrum Information
Fusion Based on Deep Recurrent Neural
Networks . . . . . . . . . . . . . . . . 742--752
F. P. Itturriet and
M. H. Costa Perceptually Relevant Preservation of
Interaural Time Differences in Binaural
Hearing Aids . . . . . . . . . . . . . . 753--764
J. Abel and
T. Fingscheidt Sinusoidal-Based Lowband Synthesis for
Artificial Speech Bandwidth Extension 765--776
Q. Kong and
Y. Xu and
I. Sobieraj and
W. Wang and
M. D. Plumbley Sound Event Detection and Time Frequency
Segmentation from Weakly Labelled Data 777--787
Y. Tuan and
H. Lee Improving Conditional Sequence
Generative Adversarial Networks by
Stepwise Evaluation . . . . . . . . . . 788--798
N. Dionelis and
M. Brookes Modulation-Domain Kalman Filtering for
Monaural Blind Speech Denoising and
Dereverberation . . . . . . . . . . . . 799--814
R. Lotfian and
C. Busso Curriculum Learning for Speech Emotion
Recognition From Crowdsourced Labels . . 815--826
S. Lin Robust Pitch Estimation and Tracking For
Speakers Based on Subband Encoding and
The Generalized Labeled Multi-Bernoulli
Filter . . . . . . . . . . . . . . . . . 827--841
X. Wang and
I. Cohen and
J. Chen and
J. Benesty On Robust and High Directive Beamforming
With Small-Spacing Microphone Arrays for
Scattered Sources . . . . . . . . . . . 842--852
Z. Quan and
Z. Wang and
Y. Le and
B. Yao and
K. Li and
J. Yin An Efficient Framework for Sentence
Similarity Modeling . . . . . . . . . . 853--865
N. Lubis and
S. Sakti and
K. Yoshino and
S. Nakamura Positive Emotion Elicitation in
Chat-Based Dialogue Systems . . . . . . 866--877
Anonymous IEEE Signal Processing Society . . . . . C3--C3
Anonymous Table of Contents . . . . . . . . . . . C1--878
Anonymous IEEE Signal Processing Society . . . . . C2--C2
Anonymous Table of Contents . . . . . . . . . . . 879--880
F. J. Ibarrola and
R. D. Spies and
L. E. D. Persia Switching Divergences for Spectral
Learning in Blind Speech Dereverberation 881--891
I. Cohen and
J. Benesty and
J. Chen Differential Kronecker Product
Beamforming . . . . . . . . . . . . . . 892--902
C. Elisei-Iliescu and
C. Paleologu and
J. Benesty and
C. Stanciu and
C. Anghel and
S. Ciochină Recursive Least-Squares Algorithms for
the Identification of Low-Rank Systems 903--918
A. Kumar and
T. Guha and
P. K. Ghosh Dirichlet Latent Variable Model: a
Dynamic Model Based on Dirichlet Prior
for Audio Processing . . . . . . . . . . 919--931
P. Jancovic and
M. Köküer Bird Species Recognition Using
Unsupervised Modeling of Individual
Vocalization Elements . . . . . . . . . 932--947
T. Koriyama and
T. Kobayashi Statistical Parametric Speech Synthesis
Using Deep Gaussian Processes . . . . . 948--959
K. Shimada and
Y. Bando and
M. Mimura and
K. Itoyama and
K. Yoshii and
T. Kawahara Unsupervised Speech Enhancement Based on
Multichannel NMF-Informed Beamforming
for Noise-Robust Automatic Speech
Recognition . . . . . . . . . . . . . . 960--971
S. Widmark Causal MSE-Optimal Filters for Personal
Audio Subject to Constrained Contrast 972--987
Anonymous Article Awards for the
IEEE/ACM Transactions on
Audio, Speech, and Language Processing 988--988
Anonymous IEEE Signal Processing Society . . . . . C3--C3
Anonymous Table of contents . . . . . . . . . . . C1--989
Anonymous IEEE Signal Processing Society . . . . . C2--C2
Anonymous Table of contents (EDICS) . . . . . . . 990--991
A. Mesaros and
A. Diment and
B. Elizalde and
T. Heittola and
E. Vincent and
B. Raj and
T. Virtanen Sound Event Detection in the DCASE 2017
Challenge . . . . . . . . . . . . . . . 992--1006
S. R. Chetupalli and
T. V. Sreenivas Late Reverberation Cancellation Using
Bayesian Estimation of Multi-Channel
Linear Predictors and Student's
$t$-Source Prior . . . . . . . . . . . . 1007--1018
L. Juvela and
B. Bollepalli and
V. Tsiaras and
P. Alku GlotNet --- a Raw Waveform Model for the
Glottal Excitation in Statistical
Parametric Speech Synthesis . . . . . . 1019--1030
F. Winter and
F. Schultz and
G. Firtha and
S. Spors A Geometric Model for Prediction of
Spatial Aliasing in $ 2.5 $D Sound Field
Synthesis . . . . . . . . . . . . . . . 1031--1046
Y. Liu and
T. Lee and
T. Law and
K. Y. Lee Acoustical Assessment of Voice Disorder
With Continuous Speech Using ASR
Posterior Features . . . . . . . . . . . 1047--1059
C. Pörschmann and
J. M. Arend and
F. Brinkmann Directional Equalization of Sparse
Head-Related Transfer Function Sets for
Spatial Upsampling . . . . . . . . . . . 1060--1071
S. S. Payal and
V. J. Mathews and
D. J. Button and
A. Iyer and
R. H. Lambert and
J. Hutchings and
L. A. Azpicueta-Ruiz Equalization of Nonlinear Propagation
Distortion in Cylindrical Waveguides . . 1072--1084
B. Sisman and
M. Zhang and
H. Li Group Sparse Representation With WaveNet
Vocoder Adaptation for Spectrum and
Prosody Conversion . . . . . . . . . . . 1085--1097
J. Lee and
H. Kang A Joint Learning Algorithm for
Complex-Valued T--F Masks in Deep
Learning-Based Single-Channel Speech
Enhancement Systems . . . . . . . . . . 1098--1108
Anonymous IEEE Signal Processing Society . . . . . C3--C3
Anonymous Table of Contents . . . . . . . . . . . C1--1109
Anonymous IEEE Signal Processing Society . . . . . C2--C2
Anonymous Table of Contents [Edics] . . . . . . . 1110--1111
J. Fleßner and
T. Biberger and
S. D. Ewert Subjective and Objective Assessment of
Monaural and Binaural Aspects of Audio
Quality . . . . . . . . . . . . . . . . 1112--1125
B. Yusuf and
B. Gundogdu and
M. Saraclar Low Resource Keyword Search With
Synthesized Crosslingual Exemplars . . . 1126--1135
A. I. Koutrouvelis and
R. C. Hendriks and
R. Heusdens and
J. Jensen Robust Joint Estimation of
Multimicrophone Signal Model Parameters 1136--1150
B. Cauchi and
K. Siedenburg and
J. F. Santos and
T. H. Falk and
S. Doclo and
S. Goetze Non-Intrusive Speech Quality Prediction
Using Modulation Energies and
LSTM-Network . . . . . . . . . . . . . . 1151--1163
Y. Zhang and
P. Zhang and
Y. Yan Tailoring an Interpretable Neural
Language Model . . . . . . . . . . . . . 1164--1178
A. Pandey and
D. Wang A New Framework for CNN-Based Speech
Enhancement in the Time Domain . . . . . 1179--1188
C. M. Vikram and
N. Adiga and
S. R. M. Prasanna Detection of Nasalized Voiced Stops in
Cleft Palate Speech Using
Epoch-Synchronous Features . . . . . . . 1189--1200
H. Luo and
T. Li and
B. Liu and
B. Wang and
H. Unger Improving Aspect Term Extraction With
Bidirectional Dependency Tree
Representation . . . . . . . . . . . . . 1201--1212
Anonymous IEEE Signal Processing Society . . . . . C3--C3
Anonymous Table of Contents . . . . . . . . . . . C1--1213
Anonymous IEEE Signal Processing Society . . . . . C2--C2
Anonymous Table of Contents . . . . . . . . . . . 1214--1215
T. Zhang and
J. Wu Constrained Learned Feature Extraction
for Acoustic Scene Classification . . . 1216--1228
L. Gabrielli and
S. Tomassetti and
S. Squartini and
C. Zinato and
S. Guaiana A Multi-Stage Algorithm for Acoustic
Physical Model Parameters Estimation . . 1229--1240
B. Yang and
H. Liu and
C. Pang and
X. Li Multiple Sound Source Counting and
Localization Based on TF-Wise Spatial
Spectrum Clustering . . . . . . . . . . 1241--1255
Y. Luo and
N. Mesgarani Conv-TasNet: Surpassing Ideal Time
Frequency Magnitude Masking for Speech
Separation . . . . . . . . . . . . . . . 1256--1266
A. K. Sarkar and
Z. Tan and
H. Tang and
S. Shon and
J. Glass Time-Contrastive Learning Based Deep
Bottleneck Features for Text-Dependent
Speaker Verification . . . . . . . . . . 1267--1279
J. Chua and
W. B. Kleijn A Low Latency Approach for Blind Source
Separation . . . . . . . . . . . . . . . 1280--1294
C. Pan and
J. Chen and
J. Benesty and
G. Shi On the Design of Target Beampatterns for
Differential Microphone Arrays . . . . . 1295--1307
A. M. Azmi and
M. N. Almutery and
H. A. Aboalsamh Real-Word Errors in Arabic Texts: a
Better Algorithm for Detection and
Correction . . . . . . . . . . . . . . . 1308--1320
M. Korpusik and
J. Glass Deep Learning for Database Mapping and
Asking Clarification Questions in
Dialogue Systems . . . . . . . . . . . . 1321--1334
J. Pak and
J. W. Shin Sound Localization Based on Phase
Difference Enhancement Using Deep Neural
Networks . . . . . . . . . . . . . . . . 1335--1345
Anonymous IEEE Signal Processing Society . . . . . C3--C3
R. Ali and
G. Bernardi and
T. van Waterschoot and
M. Moonen Methods of Extending a Generalized
Sidelobe Canceller With External
Microphones . . . . . . . . . . . . . . 1349--1364
X. Li and
L. Girin and
S. Gannot and
R. Horaud Multichannel Online Dereverberation
Based on Spectral Magnitude Inverse
Filtering . . . . . . . . . . . . . . . 1365--1377
L. Chen and
Z. Chen and
B. Tan and
S. Long and
M. Gašić and
K. Yu AgentGraph: Toward Universal Dialogue
Management With Structured Deep
Reinforcement Learning . . . . . . . . . 1378--1391
L. Li and
J. Wang and
J. Li and
Q. Ma and
J. Wei Relation Classification via
Keyword-Attentive Sentence Mechanism and
Synthetic Stimulation Loss . . . . . . . 1392--1404
M. B. Møller and
J. K. Nielsen and
E. Fernandez-Grande and
S. K. Olesen On the Influence of Transfer Function
Noise on Sound Zone Control in a Room 1405--1418
Z. Xu and
C. Sun and
Y. Long and
B. Liu and
B. Wang and
M. Wang and
M. Zhang and
X. Wang Dynamic Working Memory for Context-Aware
Response Generation . . . . . . . . . . 1419--1431
H. Kameoka and
T. Kaneko and
K. Tanaka and
N. Hojo ACVAE-VC: Non-Parallel Voice Conversion
With Auxiliary Classifier Variational
Autoencoder . . . . . . . . . . . . . . 1432--1443
X. Chen and
X. Liu and
Y. Wang and
A. Ragni and
J. H. M. Wong and
M. J. F. Gales Exploiting Future Word Contexts in
Neural Network Language Models for
Speech Recognition . . . . . . . . . . . 1444--1454
R. Wang and
Z. Chen and
F. Yin DOA-Based Three-Dimensional Node
Geometry Calibration in Acoustic Sensor
Networks and Its Cramér--Rao Bound and
Sensitivity Analysis . . . . . . . . . . 1455--1468
C. Lee and
H. Lee and
S. Wu and
C. Liu and
W. Fang and
J. Hsu and
B. Tseng Machine Comprehension of Spoken Content:
TOEFL Listening Test and Spoken SQuAD 1469--1480
Y. Chen and
S. Huang and
H. Lee and
Y. Wang and
C. Shen Audio Word2vec: Sequence-to-Sequence
Autoencoding for Unsupervised Learning
of Audio Segmentation and Representation 1481--1493
P. Li and
C. Chen and
W. Zheng and
Y. Deng and
F. Ye and
Z. Zheng STD: an Automatic Evaluation Metric for
Machine Translation Based on Word
Embeddings . . . . . . . . . . . . . . . 1497--1506
J. Zhang and
R. Heusdens and
R. C. Hendriks Relative Acoustic Transfer Function
Estimation in Wireless Acoustic Sensor
Networks . . . . . . . . . . . . . . . . 1507--1519
J. Park and
J. Chang State-Space Microphone Array Nonlinear
Acoustic Echo Cancellation Using
Multi-Microphone Near-End Speech
Covariance . . . . . . . . . . . . . . . 1520--1534
Z. Luo and
J. Chen and
T. Takiguchi and
Y. Ariki Emotional Voice Conversion Using Dual
Supervised Adversarial Networks With
Continuous Wavelet Transform F0 Features 1535--1548
H. As'ad and
M. Bouchard and
H. Kamkar-Parsi A Robust Target Linearly Constrained
Minimum Variance Beamformer With Spatial
Cues Preservation for Binaural Hearing
Aids . . . . . . . . . . . . . . . . . . 1549--1563
Y. Wang and
Y. Xia and
L. Zhao and
J. Bian and
T. Qin and
E. Chen and
T. Liu Semi-Supervised Neural Machine
Translation via Marginal Distribution
Estimation . . . . . . . . . . . . . . . 1564--1576
A. Jati and
P. Georgiou Neural Predictive Coding Using
Convolutional Neural Networks Toward
Unsupervised Learning of Speaker
Characteristics . . . . . . . . . . . . 1577--1589
F. Fontana and
E. Bozzo Newton--Raphson Solution of Nonlinear
Delay-Free Loop Filter Networks . . . . 1590--1600
N. Makishima and
S. Mogami and
N. Takamune and
D. Kitamura and
H. Sumino and
S. Takamichi and
H. Saruwatari and
N. Ono Independent Deeply Learned Matrix
Analysis for Determined Audio Source
Separation . . . . . . . . . . . . . . . 1601--1615
J. J. Prakash and
H. A. Murthy Analysis of Inter-Pausal Units in Indian
Languages and Its Application to
Text-to-Speech Synthesis . . . . . . . . 1616--1628
Y. Lan and
S. Wang and
J. Jiang Knowledge Base Question Answering With a
Matching-Aggregation Model and
Question-Specific Contextual Relations 1629--1638
X. Bai and
H. Cao and
K. Chen and
T. Zhao A Bilingual Adversarial Autoencoder for
Unsupervised Bilingual Lexicon Induction 1639--1648
G. Zhao and
R. Gutierrez-Osuna Using Phonetic Posteriorgram Based Frame
Pairing for Segmental Accent Conversion 1649--1660
Z. Zhang and
H. Zhao and
K. Ling and
J. Li and
Z. Li and
S. He and
G. Fu Effective Subword Segmentation for Text
Comprehension . . . . . . . . . . . . . 1664--1674
Y. Xie and
R. Liang and
Z. Liang and
C. Huang and
C. Zou and
B. Schuller Speech Emotion Classification Using
Attention-Based LSTM . . . . . . . . . . 1675--1685
S. Wang and
Z. Huang and
Y. Qian and
K. Yu Discriminative Neural Embedding Learning
for Short-Duration Text-Independent
Speaker Verification . . . . . . . . . . 1686--1696
R. Lu and
Z. Duan and
C. Zhang Audio Visual Deep Clustering for Speech
Separation . . . . . . . . . . . . . . . 1697--1712
N. Ueno and
S. Koyama and
H. Saruwatari Three-Dimensional Sound Field
Reproduction Based on Weighted
Mode-Matching Method . . . . . . . . . . 1852--1867
L. Wu and
X. Tan and
T. Qin and
J. Lai and
T. Liu Beyond Error Propagation: Language
Branching Also Affects the Accuracy of
Sequence Generation . . . . . . . . . . 1868--1879
A. Das and
J. Li and
G. Ye and
R. Zhao and
Y. Gong Advancing Acoustic-to-Word CTC Model
With Attention and Mixed-Units . . . . . 1880--1892
N. Antonello and
E. De Sena and
M. Moonen and
P. A. Naylor and
T. van Waterschoot Joint Acoustic Localization and
Dereverberation Through Plane Wave
Decomposition and Sparse Regularization 1893--1905
F. Borra and
A. Bernardini and
F. Antonacci and
A. Sarti Uniform Linear Arrays of First-Order
Steerable Differential Microphones . . . 1906--1918
L. Chai and
J. Du and
Q. Liu and
C. Lee Using Generalized Gaussian Distributions
to Improve Regression Error Modeling for
Deep Learning-Based Speech Enhancement 1919--1931
J. Qi and
J. Du and
S. M. Siniscalchi and
C. Lee A Theory on Deep Neural Network Based
Vector-to-Vector Regression With an
Illustration of Its Expressive Power in
Speech Enhancement . . . . . . . . . . . 1932--1943
X. Dang and
Q. Cheng and
H. Zhu Indoor Multiple Sound Source
Localization via Multi-Dimensional
Assignment Data Association . . . . . . 1944--1956
M. Schneider and
E. A. P. Habets Iterative DFT-Domain Inverse Filter
Optimization Using a Weighted
Least-Squares Criterion . . . . . . . . 1957--1969
K. Chen and
R. Wang and
M. Utiyama and
E. Sumita and
T. Zhao Neural Machine Translation With
Sentence-Level Topic Context . . . . . . 1970--1984
A. Gomez-Alanis and
A. M. Peinado and
J. A. Gonzalez and
A. M. Gomez A Gated Recurrent Convolutional Neural
Network for Robust Spoofing Detection 1985--1999
S. Feng and
T. Lee Exploiting Cross-Lingual Speaker and
Phonetic Diversity for Unsupervised
Subword Modeling . . . . . . . . . . . . 2000--2011
W. Li and
N. F. Chen and
S. M. Siniscalchi and
C. Lee Improving Mispronunciation Detection of
Mandarin Tones for Non-Native Learners
With Soft-Target Tone Labels and
BLSTM-Based Deep Tone Models . . . . . . 2012--2024
Q. Tu and
H. Chen On Mainlobe Orientation of the First-
and Second-Order Differential Microphone
Arrays . . . . . . . . . . . . . . . . . 2025--2040
J. Chorowski and
R. J. Weiss and
S. Bengio and
A. van den Oord Unsupervised Speech Representation
Learning Using WaveNet Autoencoders . . 2041--2053
V. Varanasi and
A. Agarwal and
R. M. Hegde Near-Field Acoustic Source Localization
Using Spherical Harmonic Features . . . 2054--2066
Y. Zheng and
J. Tao and
Z. Wen and
J. Yi Forward Backward Decoding Sequence for
Regularizing End-to-End TTS . . . . . . 2067--2079
Y. Tu and
J. Du and
C. Lee Speech Enhancement Based on Teacher
Student Deep Learning Using Improved
Speech Presence Probability for
Noise-Robust Speech Recognition . . . . 2080--2091
Y. Liu and
D. Wang Divide and Conquer: A Deep CASA Approach
to Talker-Independent Monaural Speaker
Separation . . . . . . . . . . . . . . . 2092--2102
X. Liu and
D. F. Wong and
L. S. Chao and
Y. Liu Latent Attribute Based Hierarchical
Decoder for Neural Machine Translation 2103--2112
J. Hu and
N. Chen Enhanced Feature Summarizing for
Effective Cover Song Identification . . 2113--2126
Q. Ma and
L. Yu and
S. Tian and
E. Chen and
W. W. Y. Ng Global-Local Mutual Attention Model for
Text Classification . . . . . . . . . . 2127--2139
V. Välimäki and
J. Rämö Neurally Controlled Graphic Equalizer 2140--2149
S. U. N. Wood and
J. K. W. Stahl and
P. Mowlaee Binaural Codebook-Based Speech
Enhancement With Atomic Speech Presence
Probability . . . . . . . . . . . . . . 2150--2161
L. Pfeifenberger and
M. Zöhrer and
F. Pernkopf Eigenvector-Based Speech Mask Estimation
for Multi-Channel Speech Enhancement . . 2162--2172
M. Arnela and
S. Dabbaghchian and
O. Guasch and
O. Engwall MRI-Based Vocal Tract Representations
for the Three-Dimensional Finite Element
Synthesis of Diphthongs . . . . . . . . 2173--2182
K. Sekiguchi and
Y. Bando and
A. A. Nugraha and
K. Yoshii and
T. Kawahara Semi-Supervised Multichannel Speech
Enhancement With a Deep Speech Prior . . 2197--2212
Q. Guo and
X. Qiu and
X. Xue and
Z. Zhang Low-Rank and Locality Constrained
Self-Attention for Sequence Modeling . . 2213--2222
J. Yu and
Q. Ling and
C. Luo and
C. W. Chen Synthesizing $3$D Trump: Predicting and
Visualizing the Relationship Between
Text, Speech, and Articulatory Movements 2223--2233
R. Sugiura and
Y. Kamamoto and
T. Moriya Shape Control of Discrete Generalized
Gaussian Distributions for
Frequency-Domain Audio Coding . . . . . 2234--2248
Z. Ben-Hur and
D. L. Alon and
R. Mehra and
B. Rafaely Efficient Representation and Sparse
Sampling of Head-Related Transfer
Functions Using Phase-Correction Based
on Ear Alignment . . . . . . . . . . . . 2249--2262
L. Remaggi and
P. J. B. Jackson and
W. Wang Modeling the Comb Filter Effect and
Interaural Coherence for Binaural Source
Separation . . . . . . . . . . . . . . . 2263--2277
B. Zhang and
D. Xiong and
J. Su and
J. Luo Future-Aware Knowledge Distillation for
Neural Machine Translation . . . . . . . 2278--2287
R. Ali and
T. Van Waterschoot and
M. Moonen Integration of a Priori and Estimated
Constraints Into an MVDR Beamformer for
Speech Enhancement . . . . . . . . . . . 2288--2300
N. Tiwari and
P. C. Pandey Speech Enhancement Using Noise
Estimation With Dynamic Quantile
Tracking . . . . . . . . . . . . . . . . 2301--2312
J. Duan and
X. Ding and
Y. Zhang and
T. Liu TEND: A Target-Dependent Representation
Learning Framework for News Document . . 2313--2325
L. Zhao and
X. Qiu and
Q. Zhang and
X. Huang Sequence Labeling With Deep Gated Dual
Path CNN . . . . . . . . . . . . . . . . 2326--2335
A. Kato and
T. H. Kinnunen Statistical Regression Models for Noise
Robust F0 Estimation Using Recurrent
Deep Neural Networks . . . . . . . . . . 2336--2349
D. Liu and
J. Fu and
Q. Qu and
J. Lv BFGAN: Backward and Forward Generative
Adversarial Networks for Lexically
Constrained Sentence Generation . . . . 2350--2361
A. Marafioti and
N. Perraudin and
N. Holighaus and
P. Majdak A Context Encoder For Audio Inpainting 2362--2372
J. Yang and
R. K. Das and
N. Zhou Extraction of Octave Spectra Information
for Spoofing Attack Detection . . . . . 2373--2384
Jamal Amini and
Richard Christian Hendriks and
Richard Heusdens and
Meng Guo and
Jesper Jensen Rate-Constrained Noise Reduction in
Wireless Acoustic Sensor Networks . . . 1--12
Chitralekha Gupta and
Haizhou Li and
Ye Wang Automatic Leaderboard: Evaluation of
Singing Quality Without a Standard
Reference . . . . . . . . . . . . . . . 13--26
Sefik Emre Eskimez and
Ross K. Maddox and
Chenliang Xu and
Zhiyao Duan Noise-Resilient Training Method for Face
Landmark Generation From Speech . . . . 27--38
Peidong Wang and
Ke Tan and
DeLiang Wang Bridging the Gap Between Monaural Speech
Enhancement and Recognition With
Distortion-Independent Acoustic Modeling 39--48
Yuki Mitsufuji and
Stefan Uhlich and
Norihiro Takamune and
Daichi Kitamura and
Shoichi Koyama and
Hiroshi Saruwatari Multichannel Non-Negative Matrix
Factorization Using Banded Spatial
Covariance Matrices in Wavenumber Domain 49--60
Yaron Laufer and
Sharon Gannot Scoring-Based ML Estimation and CRBs for
Reverberation, Speech, and Noise PSDs in
a Spatially Homogeneous Noise Field . . 61--76
Naveen Kumar Desiraju and
Simon Doclo and
Markus Buck and
Tobias Wolff Online Estimation of Reverberation
Parameters For Late Residual Echo
Suppression . . . . . . . . . . . . . . 77--91
Mehdi Zohourian and
Rainer Martin Binaural Direct-to-Reverberant Energy
Ratio and Speaker Distance Estimation 92--104
Youhyun Shin and
Sang-goo Lee Learning Context Using Segment-Level
LSTM for Neural Sequence Labeling . . . 105--115
Gongping Huang and
Jingdong Chen and
Jacob Benesty Design of Planar Differential Microphone
Arrays With Fractional Orders . . . . . 116--130
Ming-Hsiang Su and
Chung-Hsien Wu and
Liang-Yu Chen Attention-Based Response Generation
Using Parallel Double Q-Learning for
Dialog Policy Decision in a
Conversational System . . . . . . . . . 131--143
Satoru Emura Wave-Domain Residual Echo Reduction
Using Subspace Tracking . . . . . . . . 144--156
Xin Wang and
Shinji Takaki and
Junichi Yamagishi and
Simon King and
Keiichi Tokuda A Vector Quantized Variational
Autoencoder (VQ-VAE) Autoregressive
Neural $ F_0 $ Model for Statistical
Parametric Speech Synthesis . . . . . . 157--170
Falk-Martin Hoffmann and
Philip Arthur Nelson and
Filippo Maria Fazi DOA Estimation Performance With Circular
Arrays in Sound Fields With Finite Rate
of Innovation . . . . . . . . . . . . . 171--184
Rongfeng Su and
Xunying Liu and
Lan Wang and
Jingzhou Yang Cross-Domain Deep Visual Feature
Generation for Mandarin Audio--Visual
Speech Recognition . . . . . . . . . . . 185--197
Titouan Parcollet and
Mohamed Morchid and
Xavier Bost and
Georges Linarès and
Renato De Mori Real to H-Space Autoencoders for Theme
Identification in Telephone
Conversations . . . . . . . . . . . . . 198--210
Antonio Canclini and
Fabio Antonacci and
Stefano Tubaro and
Augusto Sarti A Methodology for the Robust Estimation
of the Radiation Pattern of Acoustic
Sources . . . . . . . . . . . . . . . . 211--224
Yi Yu and
Hongsen He and
Badong Chen and
Jianghui Li and
Youwen Zhang and
Lu Lu $M$-Estimate Based Normalized Subband
Adaptive Filter Algorithm: Performance
Analysis and Improvements . . . . . . . 225--239
Hao-Xiang Wen and
Sen-Quan Yang and
Yuan-Quan Hong and
Huan Luo A Partial Update Adaptive Algorithm for
Sparse System Identification . . . . . . 240--255
Martin Bo Møller and
Jan Østergaard A Moving Horizon Framework for Sound
Zones . . . . . . . . . . . . . . . . . 256--265
Stylianos Ioannis Mimilakis and
Konstantinos Drossos and
Estefanía Cano and
Gerald Schuller Examining the Mapping Functions of
Denoising Autoencoders in Singing Voice
Separation . . . . . . . . . . . . . . . 266--278
Lachlan I. Birnie and
Thushara D. Abhayapala and
Prasanga N. Samarasinghe Reflection Assisted Sound Source
Localization Through a Harmonic Domain
MUSIC Framework . . . . . . . . . . . . 279--293
Wenhao Ding and
Liang He Adaptive Multi-Scale Detection of
Acoustic Events . . . . . . . . . . . . 294--306
Weijian Zhang and
Peng Song Transfer Sparse Discriminant Subspace
Learning for Cross-Corpus Speech Emotion
Recognition . . . . . . . . . . . . . . 307--318
Bidisha Sharma and
Ye Wang Automatic Evaluation of Song
Intelligibility Using Singing Adapted
STOI and Vocal-Specific Features . . . . 319--331
Hai Morgenstern and
Boaz Rafaely Perceptually-Transparent Online
Estimation of Two-Channel Room Transfer
Function for Sound Calibration . . . . . 332--342
Shaojin Ding and
Guanlong Zhao and
Christopher Liberatore and
Ricardo Gutierrez-Osuna Learning Structured Sparse
Representations for Voice Conversion . . 343--354
Mireia Diez and
Lukáš Burget and
Federico Landini and
Jan Černocký Analysis of Speaker Diarization Based on
Bayesian HMM With Eigenvoice Priors . . 355--368
Jia-Chen Gu and
Zhen-Hua Ling and
Quan Liu Utterance-to-Utterance Interactive
Matching Network for Multi-Turn Response
Selection in Retrieval-Based Chatbots 369--379
Ke Tan and
DeLiang Wang Learning Complex Spectral Mapping With
Gated Convolutional Recurrent Networks
for Monaural Speech Enhancement . . . . 380--390
Richeng Duan and
Tatsuya Kawahara and
Masatake Dantsuji and
Hiroaki Nanjo Cross-Lingual Transfer Learning of
Non-Native Acoustic Modeling for
Pronunciation Error Detection and
Diagnosis . . . . . . . . . . . . . . . 391--401
Xin Wang and
Shinji Takaki and
Junichi Yamagishi Neural Source-Filter Waveform Models for
Statistical Parametric Speech Synthesis 402--415
Sanjeel Parekh and
Slim Essid and
Alexey Ozerov and
Ngoc Q. K. Duong and
Patrick Pérez and
Gaël Richard Weakly Supervised Representation
Learning for Audio-Visual Scene Analysis 416--428
Jianfei Yu and
Jing Jiang and
Rui Xia Entity-Sensitive Attention and Fusion
Network for Entity-Level Multimodal
Sentiment Classification . . . . . . . . 429--439
John G. Beerends and
Niels M. P. Neumann and
Egon L. van den Broek and
Anna Llagostera Casanovas and
Jovana Torres Menendez and
Christian Schmidmer and
Jens Berger Subjective and Objective Assessment of
Full Bandwidth Speech Quality . . . . . 440--449
Vikram C. Mathad and
S. R. Mahadeva Prasanna Vowel Onset Point Based Screening of
Misarticulated Stops in Cleft Lip and
Palate Speech . . . . . . . . . . . . . 450--460
Minh Nguyen and
Gia H. Ngo and
Nancy F. Chen Hierarchical Character Embeddings:
Learning Phonological and Semantic
Representations in Languages of
Logographic Origin Using Recursive
Neural Networks . . . . . . . . . . . . 461--473
Dani Cherkassky and
Sharon Gannot Successive Relative Transfer Function
Identification Using Blind Oblique
Projection . . . . . . . . . . . . . . . 474--486
Ivo Trowitzsch and
Christopher Schymura and
Dorothea Kolossa and
Klaus Obermayer Joining Sound Event Detection and
Localization Through Spatial Segregation 487--502
Shinichi Mogami and
Norihiro Takamune and
Daichi Kitamura and
Hiroshi Saruwatari and
Yu Takahashi and
Kazunobu Kondo and
Nobutaka Ono Independent Low-Rank Matrix Analysis
Based on Time-Variant Sub-Gaussian
Source Model for Determined Blind Source
Separation . . . . . . . . . . . . . . . 503--518
Hamzeh Ghasemzadeh and
Meisam K. Arjmandi Toward Optimum Quantification of
Pathology-Induced Noises: an
Investigation of Information Missed by
Human Auditory System . . . . . . . . . 519--528
Fei Ma and
Wen Zhang and
Thushara Dheemantha Abhayapala Active Control of Outgoing Broadband
Noise Fields in Rooms . . . . . . . . . 529--539
Jing-Xuan Zhang and
Zhen-Hua Ling and
Li-Rong Dai Non-Parallel Sequence-to-Sequence Voice
Conversion With Disentangled Linguistic
and Speaker Representations . . . . . . 540--552
Tao Dai and
Li Zhu and
Yaxiong Wang and
Kathleen M. Carley Attentive Stacked Denoising Autoencoder
With Bi-LSTM for Personalized
Context-Aware Citation Recommendation 553--568
Yuta Nishimura and
Katsuhito Sudoh and
Graham Neubig and
Satoshi Nakamura Multi-Source Neural Machine Translation
With Missing Data . . . . . . . . . . . 569--580
Jin Wang and
Liang-Chih Yu and
K. Robert Lai and
Xuejie Zhang Tree-Structured Regional CNN-LSTM Model
for Dimensional Sentiment Analysis . . . 581--591
Abul Azad and
Lamine Mili Robust Speech Filter and Voice Encoder
Parameter Estimation Using the
Phase--Phase Correlator . . . . . . . . 592--604
Abdullah Fahim and
Prasanga N. Samarasinghe and
Thushara D. Abhayapala Multi-Source DOA Estimation Through
Pattern Recognition of the Modal
Coherence of a Reverberant Soundfield 605--618
Yaron Laufer and
Bracha Laufer-Goldshtein and
Sharon Gannot ML Estimation and CRBs for
Reverberation, Speech, and Noise PSDs in
Rank-Deficient Noise Field . . . . . . . 619--634
Zhongqing Wang and
Qingying Sun and
Shoushan Li and
Qiaoming Zhu and
Guodong Zhou Neural Stance Detection With
Hierarchical Linguistic Representations 635--645
Ruizhi Li and
Xiaofei Wang and
Sri Harish Mallidi and
Shinji Watanabe and
Takaaki Hori and
Hynek Hermansky Multi-Stream End-to-End Speech
Recognition . . . . . . . . . . . . . . 646--655
Yu Maeno and
Yuki Mitsufuji and
Prasanga N. Samarasinghe and
Naoki Murata and
Thushara D. Abhayapala Spherical-Harmonic-Domain Feedforward
Active Noise Control Using Sparse
Decomposition of Reference Signals from
Distributed Sensor Arrays . . . . . . . 656--670
Qingyu Zhou and
Nan Yang and
Furu Wei and
Shaohan Huang and
Ming Zhou and
Tiejun Zhao A Joint Sentence Scoring and Selection
Framework for Neural Extractive Document
Summarization . . . . . . . . . . . . . 671--681
Ivan Kukanov and
Trung Ngo Trong and
Ville Hautamäki and
Sabato Marco Siniscalchi and
Valerio Mario Salerno and
Kong Aik Lee Maximal Figure-of-Merit Framework to
Detect Multi-Label Phonetic Features for
Spoken Language Recognition . . . . . . 682--695
Shoichi Koyama and
Gilles Chardon and
Laurent Daudet Optimizing Source and Sensor Placement
for Sound Field Control: an Overview . . 696--714
Atsushi Ando and
Ryo Masumura and
Hosana Kamiyama and
Satoshi Kobashikawa and
Yushi Aono and
Tomoki Toda Customer Satisfaction Estimation in
Contact Center Calls Based on a
Hierarchical Multi-Task Model . . . . . 715--728
Thomas Dietzen and
Simon Doclo and
Marc Moonen and
Toon van Waterschoot Integrated Sidelobe Cancellation and
Linear Prediction Kalman Filter for
Joint Multi-Microphone Speech
Dereverberation, Interfering Speech
Cancellation, and Noise Reduction . . . 740--754
Thomas Dietzen and
Simon Doclo and
Marc Moonen and
Toon van Waterschoot Square Root-Based Multi-Source Early PSD
Estimation and Recursive RETF Update in
Reverberant Environments by Means of the
Orthogonal Procrustes Problem . . . . . 755--769
Liwen Zhang and
Ziqiang Shi and
Jiqing Han Pyramidal Temporal Pooling With
Discriminative Mapping for Audio
Classification . . . . . . . . . . . . . 770--784
Mengfan Zhang and
Zhongshu Ge and
Tiejun Liu and
Xihong Wu and
Tianshu Qu Modeling of Individual HRTFs Based on
Spatial Principal Component Analysis . . 785--797
Bijue Jia and
Jiancheng Lv and
Xi Peng and
Yao Chen and
Shenglan Yang Hierarchical Regulated Iterative Network
for Joint Task of Music Detection and
Music Relative Loudness Estimation . . . 1--13
Nauman Dawalatabad and
Srikanth Madikeri and
C. Chandra Sekhar and
Hema A. Murthy Novel Architectures for Unsupervised
Information Bottleneck Based Speaker
Diarization of Meetings . . . . . . . . 14--27
Midia Yousefi and
John H. L. Hansen Block-Based High Performance CNN
Architectures for Frame-Level
Overlapping Speech Detection . . . . . . 28--40
Jiaming Cheng and
Ruiyu Liang and
Zhenlin Liang and
Li Zhao and
Chengwei Huang and
Björn Schuller A Deep Adaptation Network for Speech
Enhancement: Combining a Relativistic
Discriminator With Multi-Kernel Maximum
Mean Discrepancy . . . . . . . . . . . . 41--53
Franz Anders and
Mario Hlawitschka and
Mirco Fuchs Comparison of Artificial Neural Network
Types for Infant Vocalization
Classification . . . . . . . . . . . . . 54--67
Tomohiko Nakamura and
Hirokazu Kameoka Harmonic-Temporal Factor Decomposition
for Unsupervised Monaural Separation of
Harmonic Sounds . . . . . . . . . . . . 68--82
Jens Ahrens and
Stefan Bilbao Computation of Spherical Harmonic
Representations of Source Directivity
Based on the Finite-Distance Signature 83--92
Shun-Po Chuang and
Alexander H. Liu and
Tzu-Wei Sung and
Hung-yi Lee Improving Automatic Speech Recognition
and Speech Translation via Word
Embedding Prediction . . . . . . . . . . 93--105
Li Chai and
Jun Du and
Qing-Feng Liu and
Chin-Hui Lee A Cross-Entropy-Guided Measure (CEGM)
for Assessing Speech Recognition
Performance and Optimizing DNN-Based
Speech Enhancement . . . . . . . . . . . 106--117
De Hu and
Zhe Chen and
Fuliang Yin Passive Geometry Calibration for
Microphone Arrays Based on Distributed
Damped Newton Optimization . . . . . . . 118--131
Berrak Sisman and
Junichi Yamagishi and
Simon King and
Haizhou Li An Overview of Voice Conversion and Its
Challenges: From Statistical Modeling to
Deep Learning . . . . . . . . . . . . . 132--157
Jilu Jin and
Gongping Huang and
Xuehan Wang and
Jingdong Chen and
Jacob Benesty and
Israel Cohen Steering Study of Linear Differential
Microphone Arrays . . . . . . . . . . . 158--170
Ching-Hua Lee and
Bhaskar D. Rao and
Harinath Garudadri Proportionate Adaptive Filtering
Algorithms Derived Using an Iterative
Reweighting Framework . . . . . . . . . 171--186
Shakeel Ahmed and
Muhammad Tufail and
Muhammad Rehan and
Tanveer Abbas and
Amna Majid A Novel Approach for Improved Noise
Reduction Performance in Feed-Forward
Active Noise Control Systems With
(Loudspeaker) Saturation Non-Linearity
in the Secondary Path . . . . . . . . . 187--197
Cunhang Fan and
Jiangyan Yi and
Jianhua Tao and
Zhengkun Tian and
Bin Liu and
Zhengqi Wen Gated Recurrent Fusion With Joint
Training Framework for Robust End-to-End
Speech Recognition . . . . . . . . . . . 198--209
Amin Edraki and
Wai-Yip Chan and
Jesper Jensen and
Daniel Fogerty Speech Intelligibility Prediction Using
Spectro-Temporal Modulation Analysis . . 210--225
Phan Le Son On the Design of Sparse Arrays With
Frequency-Invariant Beam Pattern . . . . 226--238
Dylan Menzies and
Philip Coleman and
Filippo Maria Fazi A Room Compensation Method by
Modification of Reverberant Audio
Objects . . . . . . . . . . . . . . . . 239--252
Yonggang Hu and
Thushara D. Abhayapala and
Prasanga N. Samarasinghe Multiple Source Direction of Arrival
Estimations Using Relative Sound
Pressure Based MUSIC . . . . . . . . . . 253--264
Alan Kan and
Qinglin Meng The Temporal Limits Encoder as a Sound
Coding Strategy for Bilateral Cochlear
Implants . . . . . . . . . . . . . . . . 265--273
Rui Liu and
Berrak Sisman and
Feilong Bao and
Jichen Yang and
Guanglai Gao and
Haizhou Li Exploiting Morphological and
Phonological Features to Improve
Prosodic Phrasing for Mongolian Speech
Synthesis . . . . . . . . . . . . . . . 274--285
Fei Ma and
Thushara D. Abhayapala and
Wen Zhang Multiple Circular Arrays of Vector
Sensors for Real-Time Sound Field
Analysis . . . . . . . . . . . . . . . . 286--299
David Diaz-Guerra and
Antonio Miguel and
Jose R. Beltran Robust Sound Source Tracking Using
SRP-PHAT and 3D Convolutional Neural
Networks . . . . . . . . . . . . . . . . 300--311
Viet Anh Trinh and
Michael Mandel Directly Comparing the Listening
Strategies of Humans and Machines . . . 312--323
Leda Sari and
Mark Hasegawa-Johnson and
Samuel Thomas Auxiliary Networks for Joint Speaker
Adaptation and Speaker Change Detection 324--333
Jielong Yang and
Xionghu Zhong and
Weiguang Chen and
Wenwu Wang Multiple Acoustic Source Localization in
Microphone Array Networks . . . . . . . 334--347
Bin Wu and
Sakriani Sakti and
Jinsong Zhang and
Satoshi Nakamura Tackling Perception Bias in Unsupervised
Phoneme Discovery Using DPGMM-RNN Hybrid
Model and Functional Load . . . . . . . 348--362
Taewoong Lee and
Liming Shi and
Jesper Kjær Nielsen and
Mads Græsbøll Christensen Fast Generation of Sound Zones Using
Variable Span Trade-Off Filters in the
DFT-Domain . . . . . . . . . . . . . . . 363--378
Maoshen Jia and
Yuxuan Wu and
Changchun Bao and
Christian Ritz Multi-Source DOA Estimation in
Reverberant Environments by Jointing
Detection and Modeling of Time-Frequency
Points . . . . . . . . . . . . . . . . . 379--392
Wei Xue and
Alastair H. Moore and
Mike Brookes and
Patrick A. Naylor Speech Enhancement Based on
Modulation-Domain Parametric
Multichannel Kalman Filtering . . . . . 393--405
Wei Song and
Jingjin Guo and
Ruiji Fu and
Ting Liu and
Lizhen Liu A Knowledge Graph Embedding Approach for
Metaphor Processing . . . . . . . . . . 406--420
Longbiao Cheng and
Xingwei Sun and
Dingding Yao and
Junfeng Li and
Yonghong Yan Estimation Reliability Function Assisted
Sound Source Localization With Enhanced
Steering Vector Phase Difference . . . . 421--435
Wangyang Yu and
W. Bastiaan Kleijn Room Acoustical Parameter Estimation
From Room Impulse Responses Using Deep
Neural Networks . . . . . . . . . . . . 436--447
Miguel Ferrer and
Maria de Diego and
Gema Piñero and
Alberto Gonzalez Affine Projection Algorithm Over
Acoustic Sensor Networks for Active
Noise Control . . . . . . . . . . . . . 448--461
Nico Gößling and
Daniel Marquardt and
Simon Doclo Performance Analysis of the Extended
Binaural MVDR Beamformer With Partial
Noise Estimation . . . . . . . . . . . . 462--476
Gábor Gosztolya and
Róbert Busa-Fekete Ensemble Bag-of-Audio-Words
Representation Improves Paralinguistic
Classification Accuracy . . . . . . . . 477--488
Alfred Mertins and
Marco Maass and
Fabrice Katzberg Room Impulse Response Reshaping and
Crosstalk Cancellation Using Convex
Optimization . . . . . . . . . . . . . . 489--502
Xuefeng Bai and
Pengbo Liu and
Yue Zhang Investigating Typed Syntactic
Dependencies for Targeted Sentiment
Classification Using Graph Attention
Neural Network . . . . . . . . . . . . . 503--514
Bengt J. Borgström and
Michael S. Brandstein Speech Enhancement via Attention Masking
Network (SEAMNET): an End-to-End System
for Joint Suppression of Noise and
Reverberation . . . . . . . . . . . . . 515--526
Juan M. Miramont and
Marcelo A. Colominas and
Gastón Schlotthauer Voice Jitter Estimation Using High-Order
Synchrosqueezing Operators . . . . . . . 527--536
Peidong Wang and
Zhuo Chen and
DeLiang Wang and
Jinyu Li and
Yifan Gong Speaker Separation Using Speaker
Inventories and Estimated Speech . . . . 537--546
Sandro Cumani On the Distribution of Speaker
Verification Scores: Generative Models
for Unsupervised Calibration . . . . . . 547--562
Yu-Ren Chien and
Jón Guðnason Acoustic Measure of Vocal Strain Based
on Glottal Airflow Periodicity . . . . . 563--574
Xingfa Shen and
Xingkun Shao and
Quanbo Ge and
Lili Liu RARS: Recognition of Audio Recording
Source Based on Residual Neural Network 575--584
Gang Chen and
Yang Liu and
Huanbo Luan and
Meng Zhang and
Qun Liu and
Maosong Sun Learning to Generate Explainable Plots
for Neural Story Generation . . . . . . 585--593
Wenxing Yang and
Jacob Benesty and
Gongping Huang and
Jingdong Chen A New Class of Differential Beamformers 594--606
Yuki Mitsufuji and
Norihiro Takamune and
Shoichi Koyama and
Hiroshi Saruwatari Multichannel Blind Source Separation
Based on Evanescent-Region-Aware
Non-Negative Tensor Factorization in
Spherical Harmonic Domain . . . . . . . 607--617
Dörte Fischer and
Simon Doclo Robust Constrained MFMVDR Filters for
Single-Channel Speech Enhancement Based
on Spherical Uncertainty Set . . . . . . 618--631
Xudong Zhao and
Jacob Benesty and
Jingdong Chen and
Gongping Huang Differential Beamforming From the
Beampattern Factorization Perspective 632--643
Yuki Kawara and
Chenhui Chu and
Yuki Arase Preordering Encoding on Transformer for
Translation . . . . . . . . . . . . . . 644--655
Anonymous Table of Contents . . . . . . . . . . . c1--ix
Anonymous IEEE Signal Processing Society . . . . . c2--c2
Anonymous Table of Contents . . . . . . . . . . . x--xx
Hirokazu Kameoka and
Wen-Chin Huang and
Kou Tanaka and
Takuhiro Kaneko and
Nobukatsu Hojo and
Tomoki Toda Many-to-Many Voice Transformer Network 656--670
Jie Zhang and
Huawei Chen and
Li-Rong Dai and
Richard Christian Hendriks A Study on Reference Microphone
Selection for Multi-Microphone Speech
Enhancement . . . . . . . . . . . . . . 671--683
Archontis Politis and
Annamaria Mesaros and
Sharath Adavanne and
Toni Heittola and
Tuomas Virtanen Overview and Evaluation of Sound Event
Localization and Detection in DCASE 2019 684--698
Markus Niermann and
Peter Vary Listening Enhancement in Noisy
Environments: Solutions in Time and
Frequency Domain . . . . . . . . . . . . 699--709
Hyeonseung Lee and
Woo Hyun Kang and
Sung Jun Cheon and
Hyeongju Kim and
Nam Soo Kim Gated Recurrent Context: Softmax-Free
Attention for Online Encoder-Decoder
Speech Recognition . . . . . . . . . . . 710--719
Elizabeth Vargas and
James R. Hopgood and
Keith Brown and
Kartic Subr On Improved Training of CNN for Acoustic
Source Localisation . . . . . . . . . . 720--732
Yunqi Cai and
Lantian Li and
Andrew Abel and
Xiaoyan Zhu and
Dong Wang Deep Normalization for Speaker Vectors 733--744
Wen-Chin Huang and
Tomoki Hayashi and
Yi-Chiao Wu and
Hirokazu Kameoka and
Tomoki Toda Pretraining Techniques for
Sequence-to-Sequence Voice Conversion 745--755
Arindam Jati and
Amrutha Nadarajan and
Raghuveer Peri and
Karel Mundnich and
Tiantian Feng and
Benjamin Girault and
Shrikanth Narayanan Temporal Dynamics of Workplace Acoustic
Scenes: Egocentric Analysis and
Prediction . . . . . . . . . . . . . . . 756--769
Chaoqun Duan and
Kehai Chen and
Rui Wang and
Masao Utiyama and
Eiichiro Sumita and
Conghui Zhu and
Tiejun Zhao Modeling Future Cost for Neural Machine
Translation . . . . . . . . . . . . . . 770--781
Kashif Munir and
Hai Zhao and
Zuchao Li Adaptive Convolution for Semantic Role
Labeling . . . . . . . . . . . . . . . . 782--791
Yi-Chiao Wu and
Tomoki Hayashi and
Takuma Okamoto and
Hisashi Kawai and
Tomoki Toda Quasi-Periodic Parallel WaveGAN: a
Non-Autoregressive Raw Waveform
Generative Model With Pitch-Dependent
Dilated Convolution Neural Network . . . 792--806
Weitao Yuan and
Bofei Dong and
Shengbei Wang and
Masashi Unoki and
Wenwu Wang Evolving Multi-Resolution Pooling CNN
for Monaural Singing Voice Separation 807--822
Liming Shi and
Taewoong Lee and
Lijun Zhang and
Jesper Kjær Nielsen and
Mads Græsbøll Christensen Generation of Personal Sound Zones With
Physical Meaningful Constraints and
Conjugate Gradient Method . . . . . . . 823--837
Xi Chen and
Jacob Benesty and
Gongping Huang and
Jingdong Chen On the Robustness of the Superdirective
Beamformer . . . . . . . . . . . . . . . 838--849
Xinsheng Wang and
Tingting Qiao and
Jihua Zhu and
Alan Hanjalic and
Odette Scharenborg Generating Images From Spoken
Descriptions . . . . . . . . . . . . . . 850--865
Vevake Balaraman and
Bernardo Magnini Domain-Aware Dialogue State Tracker for
Multi-Domain Dialogue Systems . . . . . 866--873
Xixin Wu and
Yuewen Cao and
Hui Lu and
Songxiang Liu and
Shiyin Kang and
Zhiyong Wu and
Xunying Liu and
Helen Meng Exemplar-Based Emotive Speech Synthesis 874--886
Heinrich Dinkel and
Mengyue Wu and
Kai Yu Towards Duration Robust Weakly
Supervised Sound Event Detection . . . . 887--900
Zamir Ben-Hur and
David Lou Alon and
Ravish Mehra and
Boaz Rafaely Binaural Reproduction Based on Bilateral
Ambisonics and Ear-Aligned HRTFs . . . . 901--913
Philipp Aichinger and
Franz Pernkopf Synthesis and Analysis-By-Synthesis of
Modulated Diplophonic Glottal Area
Waveforms . . . . . . . . . . . . . . . 914--926
Finnian Kelly and
John H. L. Hansen Analysis and Calibration of Lombard
Effect and Whisper for Speaker
Recognition . . . . . . . . . . . . . . 927--942
Matthias Müller and
Thilo Schulz and
Tatiana Ermakova and
Philipp P. Caffier Lyric or Dramatic --- Vibrato Analysis
for Voice Type Classification in
Professional Opera Singers . . . . . . . 943--955
Demóstenes Z. Rodríguez and
Dick Carrillo and
Miguel A. Ramírez and
Pedro H. J. Nardelli and
Sebastian Möller Incorporating Wireless Communication
Parameters Into the E-Model Algorithm 956--968
Tianrui Zong and
Yong Xiang and
Iynkaran Natgunanathan and
Longxiang Gao and
Guang Hua and
Wanlei Zhou Non-Linear-Echo Based Anti-Collusion
Mechanism for Audio Signals . . . . . . 969--984
Zheng Lian and
Bin Liu and
Jianhua Tao CTNet: Conversational Transformer
Network for Emotion Recognition . . . . 985--1000
Jiacheng Zhang and
Huanbo Luan and
Maosong Sun and
Feifei Zhai and
Jingfang Xu and
Yang Liu Neural Machine Translation With Explicit
Phrase Alignment . . . . . . . . . . . . 1001--1010
Maria Vukovic and
Melissa Stolar and
Margaret Lech Cognitive Load Estimation From Speech
Commands to Simulated Aircraft . . . . . 1011--1022
De Hu and
Zhe Chen and
Fuliang Yin Geometry Calibration for Acoustic
Transceiver Networks Based on Network
Newton Distributed Optimization . . . . 1023--1032
Yuki Saito and
Shinnosuke Takamichi and
Hiroshi Saruwatari Perceptual-Similarity-Aware Deep Speaker
Representation Learning for
Multi-Speaker Generative Modeling . . . 1033--1048
Tadashi Sakata and
Naomitsu Ikeda and
Yuichi Ueda and
Akira Watanabe Vocal Tract Length Estimation Using
Accumulated Means of Formants and Its
Effects on Speaker-Normalization . . . . 1049--1064
Jichen Yang and
Hongji Wang and
Rohan Kumar Das and
Yanmin Qian Modified Magnitude-Phase Spectrum
Information for Spoofing Detection . . . 1065--1078
Yanmin Qian and
Zhengyang Chen and
Shuai Wang Audio-Visual Deep Neural Network for
Robust Person Verification . . . . . . . 1079--1092
Peiqin Lin and
Meng Yang and
Jianhuang Lai Deep Selective Memory Network With
Selective Attention and Inter-Aspect
Modeling for Aspect Level Sentiment
Classification . . . . . . . . . . . . . 1093--1106
Herman Kamper and
Yevgen Matusevych and
Sharon Goldwater Improved Acoustic Word Embeddings for
Zero-Resource Languages Using
Multilingual Transfer . . . . . . . . . 1107--1118
Weiqing Wang and
Jin Pan and
Hua Yi and
Zhanmei Song and
Ming Li Audio-Based Piano Performance Evaluation
for Beginners With Convolutional Neural
Network and Attention Mechanism . . . . 1119--1133
Yi-Chiao Wu and
Tomoki Hayashi and
Patrick Lumban Tobing and
Kazuhiro Kobayashi and
Tomoki Toda Quasi-Periodic WaveNet: an
Autoregressive Raw Waveform Generative
Model With Pitch-Dependent Dilated
Convolution Neural Network . . . . . . . 1134--1148
Vesa Välimäki and
Karolina Prawda Late-Reverberation Synthesis Using
Interleaved Velvet-Noise Sequences . . . 1149--1160
Zhuosheng Zhang and
Junlong Li and
Hai Zhao Multi-Turn Dialogue Reading
Comprehension With Pivot Turns and
Knowledge . . . . . . . . . . . . . . . 1161--1173
Clément Gaultier and
Srđan Kitić and
Rémi Gribonval and
Nancy Bertin Sparsity-Based Audio Declipping Methods:
Selected Overview, New Algorithms, and
Large-Scale Evaluation . . . . . . . . . 1174--1187
Lachlan Birnie and
Thushara Abhayapala and
Vladimir Tourbabin and
Prasanga Samarasinghe Mixed Source Sound Field Translation for
Virtual Binaural Application With
Perceptual Validation . . . . . . . . . 1188--1203
Monisankha Pal and
Manoj Kumar and
Raghuveer Peri and
Tae Jin Park and
So Hyun Kim and
Catherine Lord and
Somer Bishop and
Shrikanth Narayanan Meta-Learning With Latent Space
Clustering in Generative Adversarial
Network for Speaker Diarization . . . . 1204--1219
Jie Zhang and
Jun Du and
Li-Rong Dai Sensor Selection for Relative Acoustic
Transfer Function Steered
Linearly-Constrained Beamformers . . . . 1220--1232
Huang Xie and
Tuomas Virtanen Zero-Shot Audio Classification Via
Semantic Embeddings . . . . . . . . . . 1233--1242
Xianhong Chen and
Changchun Bao Phoneme-Unit-Specific Time-Delay Neural
Network for Speaker Verification . . . . 1243--1255
Dongyuan Shi and
Woon-Seng Gan and
Bhan Lam and
Shulin Wen and
Xiaoyi Shen Optimal Output-Constrained Active Noise
Control Based on Inverse Adaptive
Modeling Leak Factor Estimate . . . . . 1256--1269
Ashutosh Pandey and
DeLiang Wang Dense CNN With Self-Attention for
Time-Domain Speech Enhancement . . . . . 1270--1279
Libo Qin and
Wanxiang Che and
Minheng Ni and
Yangming Li and
Ting Liu Knowing Where to Leverage: Context-Aware
Graph Convolutional Network With an
Adaptive Fusion Layer for Contextual
Spoken Language Understanding . . . . . 1280--1289
Mingyang Zhang and
Yi Zhou and
Li Zhao and
Haizhou Li Transfer Learning From Speech Synthesis
to Voice Conversion With Non-Parallel
Training Data . . . . . . . . . . . . . 1290--1302
Weipeng He and
Petr Motlicek and
Jean-Marc Odobez Neural Network Adaptation and Data
Augmentation for Multi-Speaker
Direction-of-Arrival Estimation . . . . 1303--1317
Yile Wang and
Leyang Cui and
Yue Zhang Improving Skip-Gram Embeddings Using
BERT . . . . . . . . . . . . . . . . . . 1318--1328
Linzhi Wu and
Meishan Zhang Deep Graph-Based Character-Level Chinese
Dependency Parsing . . . . . . . . . . . 1329--1339
Ye Bai and
Jiangyan Yi and
Jianhua Tao and
Zhengqi Wen and
Zhengkun Tian and
Shuai Zhang Integrating Knowledge Into End-to-End
Speech Recognition From External
Text-Only Data . . . . . . . . . . . . . 1340--1351
Byung Joon Cho and
Hyung-Min Park Convolutional Maximum-Likelihood
Distortionless Response Beamforming With
Steering Vector Estimation for Robust
Speech Recognition . . . . . . . . . . . 1352--1367
Daniel Michelsanti and
Zheng-Hua Tan and
Shi-Xiong Zhang and
Yong Xu and
Meng Yu and
Dong Yu and
Jesper Jensen An Overview of Deep-Learning-Based
Audio-Visual Speech Enhancement and
Separation . . . . . . . . . . . . . . . 1368--1396
Gal Itzhak and
Jacob Benesty and
Israel Cohen On the Design of Differential Kronecker
Product Beamformers . . . . . . . . . . 1397--1410
Zhongshu Ge and
Liang Li and
Tianshu Qu Partially Matching Projection Decoding
Method Evaluation Under Different
Playback Conditions . . . . . . . . . . 1411--1423
Sijie Mai and
Songlong Xing and
Haifeng Hu Analyzing Multimodal Sentiment Via
Acoustic- and Visual-LSTM With
Channel-Aware Temporal Convolution
Network . . . . . . . . . . . . . . . . 1424--1437
Tao Qian and
Meishan Zhang and
Yinxia Lou and
Daiwen Hua A Joint Model for Named Entity
Recognition With Sentence-Level Entity
Type Attentions . . . . . . . . . . . . 1438--1448
Ryotaro Sato and
Kenta Niwa and
Kazunori Kobayashi Ambisonic Signal Processing DNNs
Guaranteeing Rotation, Scale and Time
Translation Equivariance . . . . . . . . 1449--1462
Sooyeon Park and
Jung-Woo Choi Iterative Echo Labeling Algorithm With
Convex Hull Expansion for Room Geometry
Estimation . . . . . . . . . . . . . . . 1463--1478
Aidan O. T. Hogg and
Christine Evers and
Alastair H. Moore and
Patrick A. Naylor Overlapping Speaker Segmentation Using
Multiple Hypothesis Tracking of
Fundamental Frequency . . . . . . . . . 1479--1490
Rajib Sharma and
Israel Cohen and
Baruch Berdugo Controlling Elevation and Azimuth
Beamwidths With Concentric Circular
Microphone Arrays . . . . . . . . . . . 1491--1502
Run-Ze Wang and
Zhen-Hua Ling and
Jing-Bo Zhou and
Yu Hu A Multiple-Integration Encoder for
Multi-Turn Text-to-SQL Semantic Parsing 1503--1513
Shoukang Hu and
Xurong Xie and
Shansong Liu and
Jianwei Yu and
Zi Ye and
Mengzhe Geng and
Xunying Liu and
Helen Meng Bayesian Learning of LF-MMI Trained Time
Delay Neural Networks for Speech
Recognition . . . . . . . . . . . . . . 1514--1529
Matteo Torcoli and
Thorsten Kastner and
Jürgen Herre Objective Measures of Perceptual Audio
Quality Reviewed: an Evaluation of Their
Application Domain Dependence . . . . . 1530--1541
Heinrich Dinkel and
Shuai Wang and
Xuenan Xu and
Mengyue Wu and
Kai Yu Voice Activity Detection in the Wild: a
Data-Driven Approach Using
Teacher-Student Training . . . . . . . . 1542--1555
Songbin Li and
Jingang Wang and
Peng Liu and
Miao Wei and
Qiandong Yan Detection of Multiple Steganography
Methods in Compressed Speech Based on
Code Element Embedding, Bi-LSTM and CNN
With Attention Mechanisms . . . . . . . 1556--1569
Qianli Ma and
Jiangyue Yan and
Zhenxi Lin and
Liuhong Yu and
Zipeng Chen Deformable Self-Attention for Text
Classification . . . . . . . . . . . . . 1570--1581
Ya-Jie Zhang and
Zhen-Hua Ling Extracting and Predicting Word-Level
Style Variations for Speech Synthesis 1582--1593
Alexander Bohlender and
Ann Spriet and
Wouter Tirry and
Nilesh Madhu Exploiting Temporal Context in CNN Based
Multisource DOA Estimation . . . . . . . 1594--1608
Kohei Yatabe and
Daichi Kitamura Determined BSS Based on Time-Frequency
Masking and Its Application to Harmonic
Vector Analysis . . . . . . . . . . . . 1609--1625
Ji Won Yoon and
Hyeonseung Lee and
Hyung Yong Kim and
Won Ik Cho and
Nam Soo Kim TutorNet: Towards Flexible Knowledge
Distillation for End-to-End Speech
Recognition . . . . . . . . . . . . . . 1626--1638
Prachi Singh and
Sriram Ganapathy Self-Supervised Representation Learning
With Path Integral Clustering for
Speaker Diarization . . . . . . . . . . 1639--1649
Penghui Wei and
Jiahao Zhao and
Wenji Mao A Graph-to-Sequence Learning Framework
for Summarizing Opinionated Texts . . . 1650--1660
Dovid Y. Levin and
Shmulik Markovich-Golan and
Sharon Gannot Near-Field Superdirectivity: an
Analytical Perspective . . . . . . . . . 1661--1674
Jia-Hao Hsu and
Ming-Hsiang Su and
Chung-Hsien Wu and
Yi-Hsuan Chen Speech Emotion Recognition Considering
Nonverbal Vocalization in Affective
Conversations . . . . . . . . . . . . . 1675--1686
Tomohiko Nakamura and
Shihori Kozuka and
Hiroshi Saruwatari Time-Domain Audio Source Separation With
Neural Networks Based on Multiresolution
Analysis . . . . . . . . . . . . . . . . 1687--1701
Yun Zhang and
Yongguo Liu and
Jiajing Zhu and
Xindong Wu FSPRM: a Feature Subsequence Based
Probability Representation Model for
Chinese Word Embedding . . . . . . . . . 1702--1716
Songxiang Liu and
Yuewen Cao and
Disong Wang and
Xixin Wu and
Xunying Liu and
Helen Meng Any-to-Many Voice Conversion With
Location-Relative Sequence-to-Sequence
Modeling . . . . . . . . . . . . . . . . 1717--1728
Rafael A. Chiea and
Márcio H. Costa and
Júlio A. Cordioli An Optimal Envelope-Based Noise
Reduction Method for Cochlear Implants:
an Upper Bound Performance Investigation 1729--1739
Junliang Guo and
Zhirui Zhang and
Linli Xu and
Boxing Chen and
Enhong Chen Adaptive Adapters: an Efficient Way to
Incorporate BERT Into Neural Machine
Translation . . . . . . . . . . . . . . 1740--1751
Yi Luo and
Cong Han and
Nima Mesgarani Group Communication With Context Codec
for Lightweight Source Separation . . . 1752--1761
Zhiwen Xie and
Runjie Zhu and
Jin Liu and
Guangyou Zhou and
Jimmy Xiangji Huang Hierarchical Neighbor Propagation With
Bidirectional Graph Attention Network
for Relation Prediction . . . . . . . . 1762--1773
Xuehan Wang and
Jacob Benesty and
Jingdong Chen and
Gongping Huang and
Israel Cohen Beamforming with Cube Microphone Arrays
Via Kronecker Product Decompositions . . 1774--1784
Ke Tan and
DeLiang Wang Towards Model Compression for Deep
Learning Based Speech Enhancement . . . 1785--1794
Kristina Tesch and
Timo Gerkmann Nonlinear Spatial Filtering in
Multichannel Speech Enhancement . . . . 1795--1805
Rui Liu and
Berrak Sisman and
Guanglai Gao and
Haizhou Li Expressive TTS Training With Frame and
Style Reconstruction Loss . . . . . . . 1806--1818
Jipeng Qiang and
Xinyu Lu and
Yun Li and
Yunhao Yuan and
Xindong Wu Chinese Lexical Simplification . . . . . 1819--1828
Andong Li and
Wenzhe Liu and
Chengshi Zheng and
Cunhang Fan and
Xiaodong Li Two Heads are Better Than One: a
Two-Stage Complex Spectral Mapping
Approach for Monaural Speech Enhancement 1829--1843
Eric C. Hamdan and
Filippo Maria Fazi Weighted Orthogonal Vector Rejection
Method for Loudspeaker-Based Binaural
Audio Reproduction . . . . . . . . . . . 1844--1852
Ke Tan and
Xueliang Zhang and
DeLiang Wang Deep Learning Based Real-Time Speech
Enhancement for Dual-Microphone Mobile
Phones . . . . . . . . . . . . . . . . . 1853--1863
Kunkun SongGong and
Huawei Chen and
Wenwu Wang Indoor Multi-Speaker Localization Based
on Bayesian Nonparametrics in the
Circular Harmonic Domain . . . . . . . . 1864--1880
Aleksej Chinaev and
Philipp Thüne and
Gerald Enzner Double-Cross-Correlation Processing for
Blind Sampling-Rate and Time-Offset
Estimation . . . . . . . . . . . . . . . 1881--1896
Ye Bai and
Jiangyan Yi and
Jianhua Tao and
Zhengkun Tian and
Zhengqi Wen and
Shuai Zhang Fast End-to-End Speech Recognition Via
Non-Autoregressive Models and
Cross-Modal Knowledge Transferring From
BERT . . . . . . . . . . . . . . . . . . 1897--1911
Öykü Deniz Köse and
Murat Saraçlar Multimodal Representations for
Synchronized Speech and Real-Time MRI
Video Processing . . . . . . . . . . . . 1912--1924
N. P. Narendra and
Björn Schuller and
Paavo Alku The Detection of Parkinson's Disease
From Speech Using Voice Source
Information . . . . . . . . . . . . . . 1925--1936
Robert Rehr and
Timo Gerkmann SNR-Based Features and Diverse Training
Data for Robust DNN-Based Speech
Enhancement . . . . . . . . . . . . . . 1937--1949
Nobutaka Ito and
Rintaro Ikeshita and
Hiroshi Sawada and
Tomohiro Nakatani A Joint Diagonalization Based Efficient
Approach to Underdetermined Blind Audio
Source Separation Using the Multichannel
Wiener Filter . . . . . . . . . . . . . 1950--1965
Hao Fei and
Shengqiong Wu and
Yafeng Ren and
Donghong Ji Second-Order Semantic Role Labeling With
Global Structural Refinement . . . . . . 1966--1976
Humberto M. Torres and
Mercedes Güemes and
Jorge A. Gurlekian and
Diego A. Evin F0 Perturbation Due to Articulatory
Movements: Filtering, Characterization
and Applications . . . . . . . . . . . . 1977--1986
Khaled Koutini and
Hamid Eghbal-zadeh and
Gerhard Widmer Receptive Field Regularization
Techniques for Audio Classification and
Tagging With Deep Convolutional Neural
Networks . . . . . . . . . . . . . . . . 1987--2000
Zhong-Qiu Wang and
Peidong Wang and
DeLiang Wang Multi-microphone Complex Spectral
Mapping for Utterance-wise and
Continuous Speech Separation . . . . . . 2001--2014
Mengjia Zhou and
Donghong Ji and
Fei Li Relation Extraction in Dialogues: a Deep
Learning Model Based on the Generality
and Specialty of Dialogue Text . . . . . 2015--2026
Minh Nguyen and
Gia H. Ngo and
Nancy F. Chen Domain-Shift Conditioning Using
Adaptable Filtering Via Hierarchical
Embeddings for Robust Chinese Spell
Check . . . . . . . . . . . . . . . . . 2027--2036
Lior Madmoni and
Shir Tibor and
Israel Nelken and
Boaz Rafaely The Effect of Partial Time-Frequency
Masking of the Direct Sound on the
Perception of Reverberant Speech . . . . 2037--2047
Haibin Chen and
Qianli Ma and
Liuhong Yu and
Zhenxi Lin and
Jiangyue Yan Corpus-Aware Graph Aggregation Network
for Sequence Labeling . . . . . . . . . 2048--2057
Heming Wang and
DeLiang Wang Towards Robust Speech Super-Resolution 2058--2066
Jianwei Yu and
Shi-Xiong Zhang and
Bo Wu and
Shansong Liu and
Shoukang Hu and
Mengzhe Geng and
Xunying Liu and
Helen Meng and
Dong Yu Audio-Visual Multi-Channel Integration
and Recognition of Overlapped Speech . . 2067--2082
Olga Slizovskaia and
Gloria Haro and
Emilia Gómez Conditioned Source Separation for
Musical Instrument Performances . . . . 2083--2095
Xurong Xie and
Xunying Liu and
Tan Lee and
Lan Wang Bayesian Learning for Deep Neural
Network Adaptation . . . . . . . . . . . 2096--2110
Sankha Subhra Bhattacharjee and
Nithin V. George Nearest Kronecker Product Decomposition
Based Linear-in-The-Parameters Nonlinear
Filters . . . . . . . . . . . . . . . . 2111--2122
Canguang Li and
Guohua Wang and
Jin Cao and
Yi Cai A Multi-Agent Communication Based Model
for Nested Named Entity Recognition . . 2123--2136
Jonah Ong and
Ba Tuong Vo and
Sven Nordholm Blind Separation for Multiple Moving
Sources With Labeled Random Finite Sets 2137--2151
Yixuan Su and
Yan Wang and
Deng Cai and
Simon Baker and
Anna Korhonen and
Nigel Collier PROTOTYPE-TO-STYLE: Dialogue Generation
With Style-Aware Editing on Retrieval
Memory . . . . . . . . . . . . . . . . . 2152--2161
Alberto Bernardini and
Enrico Bozzo and
Federico Fontana and
Augusto Sarti A Wave Digital Newton--Raphson Method
for Virtual Analog Modeling of Audio
Circuits with Multiple One-Port
Nonlinearities . . . . . . . . . . . . . 2162--2173
Gang Guo and
Yi Yu and
Rodrigo C. de Lamare and
Zongsheng Zheng and
Lu Lu and
Qiangming Cai Proximal Normalized Subband Adaptive
Filtering for Acoustic Echo Cancellation 2174--2188
Juho Liski and
Aki Mäkivirta and
Vesa Välimäki Audibility of Group-Delay Equalization 2189--2201
Farjana Sultana Mim and
Naoya Inoue and
Paul Reisert and
Hiroki Ouchi and
Kentaro Inui Corruption Is Not All Bad: Incorporating
Discourse Structure Into Pre-Training
via Corruption for Essay Scoring . . . . 2202--2215
Dror Kipnis and
Roee Diamant Graph-Based Clustering of Dolphin
Whistles . . . . . . . . . . . . . . . . 2216--2227
Yuanyuan Liu and
Nelly Penttilä and
Tiina Ihalainen and
Juulia Lintula and
Rachel Convey and
Okko Räsänen Language-Independent Approach for
Automatic Computation of Vowel
Articulation Features in Dysarthric
Speech Assessment . . . . . . . . . . . 2228--2243
C. Medina and
R. Coelho and
L. Zão Impulsive Noise Detection for Speech
Enhancement in HHT Domain . . . . . . . 2244--2253
Iván López-Espejo and
Zheng-Hua Tan and
Jesper Jensen A Novel Loss Function and Training
Strategy for Noise-Robust Keyword
Spotting . . . . . . . . . . . . . . . . 2254--2266
Shansong Liu and
Mengzhe Geng and
Shoukang Hu and
Xurong Xie and
Mingyu Cui and
Jianwei Yu and
Xunying Liu and
Helen Meng Recent Progress in the CUHK Dysarthric
Speech Recognition System . . . . . . . 2267--2281
Juan Zhao and
Tianrui Zong and
Yong Xiang and
Longxiang Gao and
Wanlei Zhou and
Gleb Beliakov Desynchronization Attacks Resilient
Watermarking Method Based on Frequency
Singular Value Coefficient Modification 2282--2295
Mert Burkay Çöteli and
Hüseyin Hacıhabiboğlu Sparse Representations With Legendre
Kernels for DOA Estimation and Acoustic
Source Separation . . . . . . . . . . . 2296--2309
Nicolas Furnon and
Romain Serizel and
Slim Essid and
Irina Illina DNN-Based Mask Estimation for
Distributed Speech Enhancement in
Spatially Unconstrained Microphone
Arrays . . . . . . . . . . . . . . . . . 2310--2323
Or Haim Anidjar and
Itshak Lapidot and
Chen Hajaj and
Amit Dvir and
Issachar Gilad Hybrid Speech and Text Analysis Methods
for Speaker Change Detection . . . . . . 2324--2338
Chuang Fan and
Chaofa Yuan and
Lin Gui and
Yue Zhang and
Ruifeng Xu Multi-Task Sequence Tagging for
Emotion-Cause Pair Extraction Via Tag
Distribution Refinement . . . . . . . . 2339--2350
Andy T. Liu and
Shang-Wen Li and
Hung-yi Lee TERA: Self-Supervised Learning of
Transformer Encoder Representation for
Speech . . . . . . . . . . . . . . . . . 2351--2366
Guanlong Zhao and
Shaojin Ding and
Ricardo Gutierrez-Osuna Converting Foreign Accent Speech Without
a Reference . . . . . . . . . . . . . . 2367--2381
Kilian Schulze-Forster and
Clement S. J. Doire and
Gaël Richard and
Roland Badeau Phoneme Level Lyrics Alignment and
Text-Informed Singing Voice Separation 2382--2395
Shengqiong Wu and
Hao Fei and
Yafeng Ren and
Bobo Li and
Fei Li and
Donghong Ji High-Order Pair-Wise Aspect and Opinion
Terms Extraction With Edge-Enhanced
Syntactic Graph Convolution . . . . . . 2396--2406
Jingyi Wu and
Lin Shang and
Xiaoying Gao Sentiment Time Series Calibration for
Event Detection . . . . . . . . . . . . 2407--2420
Kashif Munir and
Hai Zhao and
Zuchao Li Learning Context-Aware Convolutional
Filters for Implicit Discourse Relation
Classification . . . . . . . . . . . . . 2421--2433
Seokhwan Kim and
Hannes Schulz and
Chulaka Gunasekara and
Chiori Hori and
Abhinav Rastogi and
Luis Fernando D. Haro Editorial: Special Issue on the Eighth
Dialog System Technology Challenge . . . 2434--2436
Byoungjae Kim and
Jungyun Seo and
Myoung-Wan Koo Randomly Wired Network Based on RoBERTa
and Dialog History Attention for
Response Selection . . . . . . . . . . . 2437--2442
Jia-Chen Gu and
Tianda Li and
Zhen-Hua Ling and
Quan Liu and
Zhiming Su and
Yu-Ping Ruan and
Xiaodan Zhu Deep Contextualized Utterance
Representations for Response Selection
and Dialogue Analysis . . . . . . . . . 2443--2455
Yun-Wei Chu and
Kuan-Yen Lin and
Chao-Chun Hsu and
Lun-Wei Ku End-to-End Recurrent Cross-Modality
Attention for Video Dialogue . . . . . . 2456--2464
Kun Xu and
Han Wu and
Linfeng Song and
Haisong Zhang and
Linqi Song and
Dong Yu Conversational Semantic Role Labeling 2465--2475
Zekang Li and
Zongjia Li and
Jinchao Zhang and
Yang Feng and
Jie Zhou Bridging Text and Video: a Universal
Multimodal Transformer for Audio-Visual
Scene-Aware Dialog . . . . . . . . . . . 2476--2483
Igor Shalyminov and
Alessandro Sordoni and
Adam Atkinson and
Hannes Schulz GRTr: Generative-Retrieval Transformers
for Data-Efficient Dialogue Domain
Adaptation . . . . . . . . . . . . . . . 2484--2492
Jiali Zeng and
Yongjing Yin and
Yang Liu and
Yubin Ge and
Jinsong Su Domain Adaptive Meta-Learning for
Dialogue State Tracking . . . . . . . . 2493--2501
Chen Zhang and
Grandee Lee and
Luis Fernando D. Haro and
Haizhou Li D-Score: Holistic Dialogue Evaluation
Without Reference . . . . . . . . . . . 2502--2516
Shrikant Malviya and
Rohit Mishra and
Santosh Kumar Barnwal and
Uma Shanker Tiwary HDRS: Hindi Dialogue Restaurant Search
Corpus for Dialogue State Tracking in
Task-Oriented Environment . . . . . . . 2517--2528
Seokhwan Kim and
Michel Galley and
Chulaka Gunasekara and
Sungjin Lee and
Adam Atkinson and
Baolin Peng and
Hannes Schulz and
Jianfeng Gao and
Jinchao Li and
Mahmoud Adada and
Minlie Huang and
Luis Lastras and
Jonathan K. Kummerfeld and
Walter S. Lasecki and
Chiori Hori and
Anoop Cherian and
Tim K. Marks and
Abhinav Rastogi and
Xiaoxue Zang and
Srinivas Sunkara and
Raghav Gupta Overview of the Eighth Dialog System
Technology Challenge: DSTC8 . . . . . . 2529--2540
Myeongho Jeong and
Seungtaek Choi and
Jinyoung Yeo and
Seung-won Hwang Label and Context Augmentation for
Response Selection at DSTC8 . . . . . . 2541--2550
Qing Liu and
Lei Chen and
Yuan Yuan and
Huarui Wu History Reuse and Bag-of-Words Loss for
Long Summary Generation . . . . . . . . 2551--2560
Lu Zhang and
Mingjiang Wang and
Qiquan Zhang and
Xinsheng Wang and
Ming Liu PhaseDCN: a Phase-Enhanced Dual-Path
Dilated Convolutional Network for
Single-Channel Speech Enhancement . . . 2561--2574
Kazi Nazmul Haque and
Rajib Rana and
Jiajun Liu and
John H. L. Hansen and
Nicholas Cummins and
Carlos Busso and
Björn W. Schuller Guided Generative Adversarial Neural
Network for Representation Learning and
Audio Generation Using Fewer Labelled
Audio Data . . . . . . . . . . . . . . . 2575--2590
Toru Nakashika and
Kohei Yatabe Gamma Boltzmann Machine for Audio
Modeling . . . . . . . . . . . . . . . . 2591--2605
Xintong Li and
Lemao Liu and
Zhaopeng Tu and
Guanlin Li and
Shuming Shi and
Max Q.-H. Meng Attending From Foresight: a Novel
Attention Mechanism for Neural Machine
Translation . . . . . . . . . . . . . . 2606--2616
Hengshun Zhou and
Jun Du and
Yuanyuan Zhang and
Qing Wang and
Qing-Feng Liu and
Chin-Hui Lee Information Fusion in Attention Networks
Using Adaptive and Multi-Level
Factorized Bilinear Pooling for
Audio-Visual Emotion Recognition . . . . 2617--2629
Yuling Li and
Kui Yu and
Yuhong Zhang Learning Cross-Lingual Mappings in
Imperfectly Isomorphic Embedding Spaces 2630--2642
Xiao Zhou and
Zhen-Hua Ling and
Li-Rong Dai UnitNet: a Sequence-to-Sequence Acoustic
Model for Concatenative Speech Synthesis 2643--2655
Zihan Pan and
Malu Zhang and
Jibin Wu and
Jiadong Wang and
Haizhou Li Multi-Tone Phase Coding of Interaural
Time Difference for Sound Source
Localization With Spiking Neural
Networks . . . . . . . . . . . . . . . . 2656--2670
Ken O'Hanlon and
Mark B. Sandler FifthNet: Structured Compact Neural
Networks for Automatic Chord Recognition 2671--2682
Simone Spagnol and
Riccardo Miccini and
Marius George Onofrei and
Runar Unnthorsson and
Stefania Serafin Estimation of Spectral Notches From
Pinna Meshes: Insights From a Simple
Computational Model . . . . . . . . . . 2683--2695
Chenglin Xu and
Wei Rao and
Jibin Wu and
Haizhou Li Target Speaker Verification With
Selective Auditory Attention for Single
and Multi-Talker Speech . . . . . . . . 2696--2709
Adel Zahedi and
Michael Syskind Pedersen and
Jan Østergaard and
Thomas Ulrich Christiansen and
Lars Bramsløw and
Jesper Jensen Minimum Processing Beamforming . . . . . 2710--2724
Xianghui Wang and
Jie Chen and
Xiaoyi Chen and
Jing Guo and
Qian Xiang Multichannel Iterative Noise Reduction
Filters in the
Short-Time-Fourier-Transform Domain
Based on Kronecker Product Decomposition 2725--2740
Kai-Li Yin and
Yi-Fei Pu and
Lu Lu Robust Q-Gradient Subband Adaptive
Filter for Nonlinear Active Noise
Control . . . . . . . . . . . . . . . . 2741--2752
Jaeuk Byun and
Jong Won Shin Monaural Speech Separation Using Speaker
Embedding From Preliminary Separation 2753--2763
Xudong Zhao and
Gongping Huang and
Jingdong Chen and
Jacob Benesty On the Design of 3D Steerable
Beamformers With Uniform Concentric
Circular Microphone Arrays . . . . . . . 2764--2778
Zifeng Cheng and
Zhiwei Jiang and
Yafeng Yin and
Na Li and
Qing Gu A Unified Target-Oriented
Sequence-to-Sequence Model for
Emotion-Cause Pair Extraction . . . . . 2779--2791
Hamid Azadi and
Mohammad-R. Akbarzadeh-T and
Hamid-R. Kobravi and
Ali Shoeibi Robust Voice Feature Selection Using
Interval Type-2 Fuzzy AHP for Automated
Diagnosis of Parkinson's Disease . . . . 2792--2802
Yukiya Hono and
Kei Hashimoto and
Keiichiro Oura and
Yoshihiko Nankaku and
Keiichi Tokuda Sinsy: a Deep Neural Network-Based
Singing Voice Synthesis System . . . . . 2803--2815
Jian Tang and
Jie Zhang and
Yan Song and
Ian McLoughlin and
Li-Rong Dai Multi-Granularity Sequence Alignment
Mapping for Encoder-Decoder Based
End-to-End ASR . . . . . . . . . . . . . 2816--2828
Chongman Leong and
Xuebo Liu and
Derek F. Wong and
Lidia S. Chao Exploiting Translation Model for
Parallel Corpus Mining . . . . . . . . . 2829--2839
Neil Zeghidour and
David Grangier Wavesplit: End-to-End Speech Separation
by Speaker Clustering . . . . . . . . . 2840--2849
Dino Oglic and
Zoran Cvetkovic and
Peter Sollich Learning Waveform-Based Acoustic Models
Using Deep Variational Convolutional
Neural Networks . . . . . . . . . . . . 2850--2863
Alexandru Nelus and
Rainer Martin Privacy-Preserving Audio Classification
Using Variational Information Feature
Extraction . . . . . . . . . . . . . . . 2864--2877
Hao Li and
DeLiang Wang and
Xueliang Zhang and
Guanglai Gao Recurrent Neural Networks and Acoustic
Features for Frame-Level Signal-to-Noise
Ratio Estimation . . . . . . . . . . . . 2878--2887
Yi Zhou and
Xiaoqing Zheng and
Xuanjing Huang Generating Responses With a Given
Syntactic Pattern in Chinese Dialogues 2888--2898
Viktor Gunnarsson and
Mikael Sternad Binaural Auralization of Microphone
Array Room Impulse Responses Using
Causal Wiener Filtering . . . . . . . . 2899--2914
Zuolong Chen and
Huawei Chen and
Quansheng Tu Sensor Imperfection Tolerance Analysis
of Robust Linear Differential Microphone
Arrays . . . . . . . . . . . . . . . . . 2915--2929
Yusheng Su and
Xu Han and
Yankai Lin and
Zhengyan Zhang and
Zhiyuan Liu and
Peng Li and
Jie Zhou and
Maosong Sun CSS-LM: a Contrastive Framework for
Semi-Supervised Fine-Tuning of
Pre-Trained Language Models . . . . . . 2930--2941
Tobias Kabzinski and
Peter Jax A Causality-Constrained Frequency-Domain
Least-Squares Filter Design Method for
Crosstalk Cancellation . . . . . . . . . 2942--2956
Frank Zalkow and
Meinard Müller CTC-Based Learning of Chroma Features
for Score Audio Music Retrieval . . . . 2957--2971
Teck Kai Chan and
Cheng Siong Chin Multi-Branch Convolutional Macaron Net
for Sound Event Detection . . . . . . . 2972--2985
Tedd Kourkounakis and
Amirhossein Hajavi and
Ali Etemad FluentNet: End-to-End Detection of
Stuttered Speech Disfluencies With Deep
Learning . . . . . . . . . . . . . . . . 2986--2999
Haoyu Li and
Junichi Yamagishi Multi-Metric Optimization Using
Generative Adversarial Networks for
Near-End Speech Intelligibility
Enhancement . . . . . . . . . . . . . . 3000--3011
Zehao Lin and
Shaobo Cui and
Guodun Li and
Xiaoming Kang and
Feng Ji and
Fenglin Li and
Zhongzhou Zhao and
Haiqing Chen and
Yin Zhang Predict-Then-Decide: a Predictive
Approach for Wait or Answer Task in
Dialogue Systems . . . . . . . . . . . . 3012--3024
Metin Calis and
Steven van de Par and
Richard Heusdens and
Richard Christian Hendriks Localization Based on Enhanced Low
Frequency Interaural Level Difference 3025--3039
Christopher Liberatore Native-Nonnative Voice Conversion by
Residual Warping in a Sparse,
Anchor-Based Representation . . . . . . 3040--3051
Shoichi Koyama and
Jesper Brunnström and
Hayato Ito and
Natsuki Ueno and
Hiroshi Saruwatari Spatial Active Noise Control Based on
Kernel Interpolation of Sound Field . . 3052--3063
Jipeng Qiang and
Yun Li and
Yi Zhu and
Yunhao Yuan and
Yang Shi and
Xindong Wu LSBert: Lexical Simplification Based on
BERT . . . . . . . . . . . . . . . . . . 3064--3076
Ningyu Zhang and
Hongbin Ye and
Shumin Deng and
Chuanqi Tan and
Mosha Chen and
Songfang Huang and
Fei Huang and
Huajun Chen Contrastive Information Extraction With
Generative Transformer . . . . . . . . . 3077--3088
Jianyu Wang and
Shanzheng Guan and
Shupei Liu and
Xiao-Lei Zhang Minimum-Volume Multichannel Nonnegative
Matrix Factorization for Blind Audio
Source Separation . . . . . . . . . . . 3089--3103
Alberto Carini and
Stefania Cecchi and
Alessandro Terenzi and
Simone Orcioni A Room Impulse Response Measurement
Method Robust Towards Nonlinearities
Based on Orthogonal Periodic Sequences 3104--3117
Jie Zhang and
Changheng Li Quantization-Aware Binaural MWF Based
Noise Reduction Incorporating External
Wireless Devices . . . . . . . . . . . . 3118--3131
Biru Zhu and
Xingyao Zhang and
Ming Gu and
Yangdong Deng Knowledge Enhanced Fact Checking and
Verification . . . . . . . . . . . . . . 3132--3143
Mark A. Poletti and
Paul D. Teal A Superfast Toeplitz Matrix Inversion
Method for Single- and Multi-Channel
Inverse Filters and Its Application to
Room Equalization . . . . . . . . . . . 3144--3157
Guanlin Li and
Lemao Liu and
Conghui Zhu and
Rui Wang and
Tiejun Zhao and
Shuming Shi Detecting Source Contextual Barriers for
Understanding Neural Machine Translation 3158--3169
Chia-Chih Kuo and
Kuan-Yu Chen and
Shang-Bao Luo Audio-Aware Spoken Multiple-Choice
Question Answering With Pre-Trained
Language Models . . . . . . . . . . . . 3170--3179
Rui Liu and
Zheng Lin and
Weiping Wang Addressing Extraction and Generation
Separately: Keyphrase Prediction With
Pre-Trained Language Models . . . . . . 3180--3191
Jiangnan Li and
Hongliang Pan and
Zheng Lin and
Peng Fu and
Weiping Wang Sarcasm Detection with Commonsense
Knowledge . . . . . . . . . . . . . . . 3192--3201
Runyan Yang and
Gaofeng Cheng and
Haoran Miao and
Ta Li and
Pengyuan Zhang and
Yonghong Yan Keyword Search Using Attention-Based
End-to-End ASR and Frame-Synchronous
Phoneme Alignments . . . . . . . . . . . 3202--3215
Tareq Alkhaldi and
Chenhui Chu and
Sadao Kurohashi Flexibly Focusing on Supporting Facts,
Using Bridge Links, and Jointly Training
Specialized Modules for Multi-Hop
Question Answering . . . . . . . . . . . 3216--3225
Wenyi Wu and
Yegui Xiao and
Jianhui Lin and
Liying Ma and
Khashayar Khorasani An Efficient Filter Bank Structure for
Adaptive Notch Filtering and
Applications . . . . . . . . . . . . . . 3226--3241
Xinsheng Wang and
Justin van der Hout and
Jihua Zhu and
Mark Hasegawa-Johnson and
Odette Scharenborg Synthesizing Spoken Descriptions of
Images . . . . . . . . . . . . . . . . . 3242--3254
Vincent W. Neo and
Christine Evers and
Patrick A. Naylor Enhancement of Noisy Reverberant Speech
Using Polynomial Matrix Eigenvalue
Decomposition . . . . . . . . . . . . . 3255--3266
Riccardo Giampiccolo and
Mauro Giuseppe de Bari and
Alberto Bernardini and
Augusto Sarti Wave Digital Modeling and Implementation
of Nonlinear Audio Circuits With Nullors 3267--3279
Xixin Wu and
Yuewen Cao and
Hui Lu and
Songxiang Liu and
Disong Wang and
Zhiyong Wu and
Xunying Liu and
Helen Meng Speech Emotion Recognition Using
Sequential Capsule Networks . . . . . . 3280--3291
Yuan Gong and
Yu-An Chung and
James Glass PSLA: Improving Audio Tagging With
Pretraining, Sampling, Labeling, and
Aggregation . . . . . . . . . . . . . . 3292--3306
Licheng Zhang and
Zhendong Mao and
Benfeng Xu and
Quan Wang and
Yongdong Zhang Review and Arrange: Curriculum Learning
for Natural Language Understanding . . . 3307--3320
Fei He and
Ling He and
Jing Zhang and
Yuanyuan Li and
Xi Xiong Automatic Detection of Affective
Flattening in Schizophrenia: Acoustic
Correlates to Sound Waves and Auditory
Perception . . . . . . . . . . . . . . . 3321--3334
Saoussen Mathlouthi Bouzid and
Chiraz Ben Othmane Zribi Efficient Learning Approach for
Pronominal Anaphora and Ellipsis
Identification and Resolution in Arabic
Texts . . . . . . . . . . . . . . . . . 3335--3348
Arda Yüksel and
Berke Uğurlu and
Aykut Koç Semantic Change Detection With Gaussian
Word Embeddings . . . . . . . . . . . . 3349--3361
Mei Li and
Lu Xiang and
Xiaomian Kang and
Yang Zhao and
Yu Zhou and
Chengqing Zong Medical Term and Status Generation From
Chinese Clinical Dialogue With
Multi-Granularity Transformer . . . . . 3362--3374
Yongwei Li and
Jianhua Tao and
Donna Erickson and
Bin Liu and
Masato Akagi $ F_0 $-Noise-Robust Glottal Source and
Vocal Tract Analysis Based on ARX-LF
Model . . . . . . . . . . . . . . . . . 3375--3383
Xianwen Liao and
Yongzhong Huang and
Yongzhuang Wei and
Chenhao Zhang and
Fu Wang and
Yong Wang Efficient Estimate of Sentence's
Representation Based on the Difference
Semantics Model . . . . . . . . . . . . 3384--3399
Kwang Myung Jeon and
Geon Woo Lee and
Nam Kyun Kim and
Hong Kook Kim TAU-Net: Temporal Activation $U$-Net
Shared With Nonnegative Matrix
Factorization for Speech Enhancement in
Unseen Noise Environments . . . . . . . 3400--3414
Yi-Yang Ding and
Hao-Jian Lin and
Li-Juan Liu and
Zhen-Hua Ling and
Yu Hu Robustness of Speech Spoofing Detectors
Against Adversarial Post-Processing of
Voice Conversion . . . . . . . . . . . . 3415--3426
Yi Zhou and
Xiaohai Tian and
Haizhou Li Language Agnostic Speaker Embedding for
Cross-Lingual Personalized Speech
Generation . . . . . . . . . . . . . . . 3427--3439
Ju Lin and
Adriaan J. de Lind van Wijngaarden and
Kuang-Ching Wang and
Melissa C. Smith Speech Enhancement Using Multi-Stage
Self-Attentive Temporal Convolutional
Networks . . . . . . . . . . . . . . . . 3440--3450
Wei-Ning Hsu and
Benjamin Bolte and
Yao-Hung Hubert Tsai and
Kushal Lakhotia and
Ruslan Salakhutdinov and
Abdelrahman Mohamed HuBERT: Self-Supervised Speech
Representation Learning by Masked
Prediction of Hidden Units . . . . . . . 3451--3460
Kouei Yamaoka and
Nobutaka Ono and
Shoji Makino Time-Frequency-Bin-Wise Linear
Combination of Beamformers for
Distortionless Signal Enhancement . . . 3461--3475
Zhong-Qiu Wang and
Gordon Wichern and
Jonathan Le Roux Convolutive Prediction for Monaural
Speech Dereverberation and
Noisy-Reverberant Speaker Separation . . 3476--3490
Bing Yang and
Hong Liu and
Xiaofei Li Learning Deep Direct-Path Relative
Transfer Function for Binaural Sound
Source Localization . . . . . . . . . . 3491--3503
Yiming Cui and
Wanxiang Che and
Ting Liu and
Bing Qin and
Ziqing Yang Pre-Training With Whole Word Masking for
Chinese BERT . . . . . . . . . . . . . . 3504--3514
Leda Sar and
Mark Hasegawa-Johnson and
Chang D. Yoo Counterfactually Fair Automatic Speech
Recognition . . . . . . . . . . . . . . 3515--3525
Zhuohuang Zhang and
Yong Xu and
Meng Yu and
Shi-Xiong Zhang and
Lianwu Chen and
Donald S. Williamson and
Dong Yu Multi-Channel Multi-Frame ADL-MVDR for
Target Speech Separation . . . . . . . . 3526--3540
Nils L. Westhausen and
Rainer Huber and
Hannah Baumgartner and
Ragini Sinha and
Jan Rennies and
Bernd T. Meyer Reduction of Subjective Listening Effort
for TV Broadcast Signals With Recurrent
Neural Networks . . . . . . . . . . . . 3541--3550
Shota Sasaki and
Jun Suzuki and
Kentaro Inui Subword-Based Compact Reconstruction for
Open-Vocabulary Neural Word Embeddings 3551--3564
Xiaodong Cui and
Wei Zhang and
Abdullah Kayi and
Mingrui Liu and
Ulrich Finkler and
Brian Kingsbury and
George Saon and
David Kung Asynchronous Decentralized Distributed
Training of Acoustic Models . . . . . . 3565--3576
Junqing Zhang and
Wen Zhang and
Jihui Aimee Zhang and
Thushara Dheemantha Abhayapala and
Lijun Zhang Spatial Active Noise Control in Rooms
Using Higher Order Sources . . . . . . . 3577--3591
Bingzhi Chen and
Qi Cao and
Mixiao Hou and
Zheng Zhang and
Guangming Lu and
David Zhang Multimodal Emotion Recognition With
Temporal and Semantic Consistency . . . 3592--3603
S. Supraja and
Andy W. H. Khong and
S. Tatinati Regularized Phrase-Based Topic Model for
Automatic Question Classification With
Domain-Agnostic Class Labels . . . . . . 3604--3616
Natsuko Maeda and
Filippo Maria Fazi and
Falk-Martin Hoffmann Sound Field Reproduction With a
Cylindrical Loudspeaker Array Using
First Order Wall Reflections . . . . . . 3617--3630
Xugang Lu and
Peng Shen and
Yu Tsao and
Hisashi Kawai Coupling a Generative Model With a
Discriminative Learning Framework for
Speaker Verification . . . . . . . . . . 3631--3641
Hannes Helmholz and
David Lou Alon and
Sebastià V. Amengual Garí and
Jens Ahrens Effects of Additive Noise in Binaural
Rendering of Spherical Microphone Array
Signals . . . . . . . . . . . . . . . . 3642--3653
Joanna Hong and
Minsu Kim and
Se Jin Park and
Yong Man Ro Speech Reconstruction With Reminiscent
Sound Via Visual Voice Memory . . . . . 3654--3667
Ran Weisman and
Tom Shlomo and
Vladimir Tourbabin and
Paul Calamia and
Boaz Rafaely Robustness of Acoustic Rake Filters in
Minimum Variance Beamforming . . . . . . 3668--3678
Junhao Xu and
Jianwei Yu and
Shoukang Hu and
Xunying Liu and
Helen Meng Mixed Precision Low-Bit Quantization of
Neural Network Language Models for
Speech Recognition . . . . . . . . . . . 3679--3693
Jidong Ge and
Yunyun Huang and
Xiaoyu Shen and
Chuanyi Li and
Wei Hu Learning Fine-Grained Fact-Article
Correspondence in Legal Cases . . . . . 3694--3706
Qiuqiang Kong and
Bochen Li and
Xuchen Song and
Yuan Wan and
Yuxuan Wang High-Resolution Piano Transcription With
Pedals by Regressing Onset and Offset
Times . . . . . . . . . . . . . . . . . 3707--3717
Anonymous 2021 Index IEEE/ACM
Transactions on Audio, Speech, and
Language Processing Vol. 29 . . . . . . 3718--3760
Anonymous IEEE Signal Processing Society . . . . . C2--C2
Qianying Liu and
Wenyu Guan and
Sujian Li and
Fei Cheng and
Daisuke Kawahara and
Sadao Kurohashi RODA: Reverse Operation Based Data
Augmentation for Solving Math Word
Problems . . . . . . . . . . . . . . . . 1--11
Kai Zhen and
Jongmo Sung and
Mi Suk Lee and
Seungkwon Beack and
Minje Kim Scalable and Efficient Neural Speech
Coding: a Hybrid Design . . . . . . . . 12--25
Sen Yang and
Yang Liu and
Dawei Feng and
Dongsheng Li Text Generation From Data With Dynamic
Planning . . . . . . . . . . . . . . . . 26--34
Stefan Liebich and
Peter Vary Occlusion Effect Cancellation in
Headphones and Hearing Devices: The
Sister of Active Noise Cancellation . . 35--48
Zhuosheng Zhang and
Haojie Yu and
Hai Zhao and
Masao Utiyama Which Apple Keeps Which Doctor Away?
Colorful Word Representations With
Visual Oracles . . . . . . . . . . . . . 49--59
Zhenyu Wang and
John H. L. Hansen Multi-Source Domain Adaptation for
Text-Independent Forensic Speaker
Recognition . . . . . . . . . . . . . . 60--75
Kengtao Zheng and
Nankai Lin and
Shengyi Jiang Unsupervised Character Embedding
Correction and Candidate Word Denoising 76--86
Bing Ma and
Haifeng Sun and
Jingyu Wang and
Qi Qi and
Jianxin Liao Extractive Dialogue Summarization
Without Annotation Based on Distantly
Supervised Machine Reading Comprehension
in Customer Service . . . . . . . . . . 87--97
Shengcai Liu and
Ning Lu and
Cheng Chen and
Ke Tang Efficient Combinatorial Optimization for
Word-Level Adversarial Textual Attack 98--111
Alessandro Terenzi and
Nicola Ortolani and
Inês Nolasco and
Emmanouil Benetos and
Stefania Cecchi Comparison of Feature Extraction Methods
for Sound-Based Classification of Honey
Bee Activity . . . . . . . . . . . . . . 112--122
Shuiyang Mao and
P. C. Ching and
Tan Lee Enhancing Segment-Based Speech Emotion
Recognition by Iterative Self-Learning 123--134
Abdolreza Sabzi Shahrebabaki and
Giampiero Salvi and
Torbjørn Svendsen and
Sabato Marco Siniscalchi Acoustic-to-Articulatory Mapping With
Joint Optimization of Deep Speech
Enhancement and Articulatory Inversion
Models . . . . . . . . . . . . . . . . . 135--147
Javier Jorge and
Adrià Giménez and
Joan Albert Silvestre-Cerdà and
Jorge Civera and
Albert Sanchis and
Alfons Juan Live Streaming Speech Recognition Using
Deep Bidirectional LSTM Acoustic Models
and Interpolated Language Models . . . . 148--161
Muhammed P. V. Shifas and
Cătălin Zorilă and
Yannis Stylianou End-to-End Neural Based Modification of
Noisy Speech for Speech-in-Noise
Intelligibility Improvement . . . . . . 162--173
Joon-Young Yang and
Joon-Hyuk Chang VACE-WPE: Virtual Acoustic Channel
Expansion Based on Neural Networks for
Weighted Prediction Error-Based Speech
Dereverberation . . . . . . . . . . . . 174--189
Chenpeng Du and
Kai Yu Phone-Level Prosody Modelling With
GMM-Based MDN for Diverse and
Controllable Speech Synthesis . . . . . 190--201
Haibin Wu and
Xu Li and
Andy T. Liu and
Zhiyong Wu and
Helen Meng and
Hung-Yi Lee Improving the Adversarial Robustness for
Speaker Verification by Self-Supervised
Learning . . . . . . . . . . . . . . . . 202--217
Mixiao Hou and
Zheng Zhang and
Qi Cao and
David Zhang and
Guangming Lu Multi-View Speech Emotion Recognition
Via Collective Relation Construction . . 218--229
Da-rong Liu and
Po-chun Hsu and
Yi-chen Chen and
Sung-feng Huang and
Shun-po Chuang and
Da-yi Wu and
Hung-yi Lee Learning Phone Recognition From Unpaired
Audio and Phone Sequences Based on
Generative Adversarial Network . . . . . 230--243
Yuting Zhao and
Mamoru Komachi and
Tomoyuki Kajiwara and
Chenhui Chu Word-Region Alignment-Guided Multimodal
Neural Machine Translation . . . . . . . 244--259
Zhuosheng Zhang and
Yiqing Zhang and
Hai Zhao Syntax-Aware Multi-Spans Generation for
Reading Comprehension . . . . . . . . . 260--268
Pengfei Zhu and
Zhuosheng Zhang and
Hai Zhao and
Xiaoguang Li DUMA: Reading Comprehension With
Transposition Thinking . . . . . . . . . 269--279
Jiayuan Xie and
Ningxin Peng and
Yi Cai and
Tao Wang and
Qingbao Huang Diverse Distractor Generation for
Constructing High-Quality Multiple
Choice Questions . . . . . . . . . . . . 280--291
Jie Zhang and
Guanghui Zhang A Parametric Unconstrained Beamformer
Based Binaural Noise Reduction for
Assistive Hearing . . . . . . . . . . . 292--304
Luca Turchet and
Johan Pauwels Music Emotion Recognition: Intention of
Composers-Performers Versus Perception
of Musicians, Non-Musicians, and
Listening Machines . . . . . . . . . . . 305--316
Wenxin Hou and
Han Zhu and
Yidong Wang and
Jindong Wang and
Tao Qin and
Renjun Xu and
Takahiro Shinozaki Exploiting Adapters for Cross-Lingual
Low-Resource Speech Recognition . . . . 317--329
Kehai Chen and
Rui Wang and
Masao Utiyama and
Eiichiro Sumita Integrating Prior Translation Knowledge
Into Neural Machine Translation . . . . 330--339
Keqi Deng and
Gaofeng Cheng and
Runyan Yang and
Yonghong Yan Alleviating ASR Long-Tailed Problem by
Decoupling the Learning of
Representation and Classification . . . 340--354
Zuchao Li and
Junru Zhou and
Hai Zhao and
Kevin Parnow HPSG-Inspired Joint Neural Constituent
and Dependency Parsing in $ O(n^3) $
Time Complexity . . . . . . . . . . . . 355--366
Xuan Shi and
Erica Cooper and
Junichi Yamagishi Use of Speaker Recognition Approaches
for Learning and Evaluating Embedding
Representations of Musical Instrument
Sounds . . . . . . . . . . . . . . . . . 367--377
Zengwei Yao and
Wenjie Pei and
Fanglin Chen and
Guangming Lu and
David Zhang Stepwise-Refining Speech Separation
Network via Fine-Grained Encoding in
High-Order Latent Domain . . . . . . . . 378--393
Yanmin Qian and
Zhikai Zhou Optimizing Data Usage for Low-Resource
Speech Recognition . . . . . . . . . . . 394--403
Narla John Metilda Sagaya Mary and
Srinivasan Umesh and
Sandesh Varadaraju Katta S-Vectors and TESA: Speaker Embeddings
and a Speaker Authenticator Based on
Transformer Encoder . . . . . . . . . . 404--413
Bengt J. Borgström Bayesian Estimation of PLDA in the
Presence of Noisy Training Labels, With
Applications to Speaker Verification . . 414--428
Menglong Lu and
Zhen Huang and
Binyang Li and
Yunxiang Zhao and
Zheng Qin and
DongSheng Li SIFTER: a Framework for Robust Rumor
Detection . . . . . . . . . . . . . . . 429--442
Lantian Li and
Dong Wang and
Jiawen Kang and
Renyu Wang and
Jing Wu and
Zhendong Gao and
Xiao Chen A Principle Solution for Enroll-Test
Mismatch in Speaker Recognition . . . . 443--455
Feiran Yang Analysis of Deficient-Length
Partitioned-Block Frequency-Domain
Adaptive Filters . . . . . . . . . . . . 456--467
Hui Jiang and
Linfeng Song and
Yubin Ge and
Fandong Meng and
Junfeng Yao and
Jinsong Su An AST Structure Enhanced Decoder for
Code Generation . . . . . . . . . . . . 468--476
Anssi Kanervisto and
Ville Hautamäki and
Tomi Kinnunen and
Junichi Yamagishi Optimizing Tandem Speaker Verification
and Anti-Spoofing Systems . . . . . . . 477--488
Xin Ni and
Jia Ren FC-U2-Net: a Novel Deep Neural Network
for Singing Voice Separation . . . . . . 489--494
Neil Zeghidour and
Alejandro Luebs and
Ahmed Omran and
Jan Skoglund and
Marco Tagliasacchi SoundStream: an End-to-End Neural Audio
Codec . . . . . . . . . . . . . . . . . 495--507
Wageesha Manamperi and
Thushara D. Abhayapala and
Jihui Zhang and
Prasanga N. Samarasinghe Drone Audition: Sound Source
Localization Using On-Board Microphones 508--519
Qian Li and
Hao Peng and
Jianxin Li and
Jia Wu and
Yuanxing Ning and
Lihong Wang and
Philip S. Yu and
Zheng Wang Reinforcement Learning-Based Dialogue
Guided Event Extraction to Exploit
Argument Relations . . . . . . . . . . . 520--533
Santiago Ruiz and
Toon van Waterschoot and
Marc Moonen Distributed Combined Acoustic Echo
Cancellation and Noise Reduction in
Wireless Acoustic Sensor and Actuator
Networks . . . . . . . . . . . . . . . . 534--547
Lukas Grinewitschus and
Peter Jung The Harmonic Shift Algorithm for
Efficient Multi-Pitch Detection . . . . 548--561
Ziyao Lu and
Xiang Li and
Yang Liu and
Chulun Zhou and
Jianwei Cui and
Bin Wang and
Min Zhang and
Jinsong Su Exploring Multi-Stage Information
Interactions for Multi-Source Neural
Machine Translation . . . . . . . . . . 562--570
Jingxuan Yang and
Si Li and
Sheng Gao and
Jun Guo CorefDPR: a Joint Model for Coreference
Resolution and Dropped Pronoun Recovery
in Chinese Conversations . . . . . . . . 571--581
Timuçin Berk Atalay and
Zühre Sü Gül and
Enzo De Sena and
Zoran Cvetković and
Hüseyin Hacıhabiboğlu Scattering Delay Network Simulator of
Coupled Volume Acoustics . . . . . . . . 582--593
Yi Zhang and
Lei Li and
Yunfang Wu and
Qi Su and
Xu Sun Alleviating the Knowledge-Language
Inconsistency: a Study for Deep
Commonsense Knowledge . . . . . . . . . 594--604
Ke Tan and
Zhong-Qiu Wang and
DeLiang Wang Neural Spectrospatial Filtering . . . . 605--621
Qianren Mao and
Jianxin Li and
Chenghua Lin and
Congwen Chen and
Hao Peng and
Lihong Wang and
Philip S. Yu Adaptive Pre-Training and Collaborative
Fine-Tuning: a Win-Win Strategy to
Improve Review Analysis Tasks . . . . . 622--634
Zifeng Cheng and
Zhiwei Jiang and
Yafeng Yin and
Cong Wang and
Qing Gu Learning to Classify Open Intent via
Soft Labeling and Manifold Mixup . . . . 635--645
Xiaochun An and
Frank K. Soong and
Lei Xie Disentangling Style and Speaker
Attributes for TTS Style Transfer . . . 646--658
Zhuang Chen and
Tieyun Qian Retrieve-and-Edit Domain Adaptation for
End2End Aspect Based Sentiment Analysis 659--672
Jian Liu and
Mengshi Yu and
Yufeng Chen and
Jinan Xu Cross-Domain Slot Filling as Machine
Reading Comprehension: a New Perspective 673--685
Yongkang Liu and
Qingbao Huang and
Jing Li and
Linzhang Mo and
Yi Cai and
Qing Li SSAP: Storylines and Sentiment Aware
Pre-Trained Model for Story Ending
Generation . . . . . . . . . . . . . . . 686--694
Ying Zhou and
Xuefeng Liang and
Yu Gu and
Yifei Yin and
Longshan Yao Multi-Classifier Interactive Learning
for Ambiguous Speech Emotion Recognition 695--705
Poul Hoang and
Jan Mark de Haan and
Zheng-Hua Tan and
Jesper Jensen Multichannel Speech Enhancement With Own
Voice-Based Interfering Speech
Suppression for Hearing Assistive
Devices . . . . . . . . . . . . . . . . 706--720
Weijie Yu and
Chen Xu and
Jun Xu and
Liang Pang and
Ji-Rong Wen Distribution Distance Regularized
Sequence Representation for Text
Matching in Asymmetrical Domains . . . . 721--733
Heming Wang and
DeLiang Wang Neural Cascade Architecture With
Triple-Domain Loss for Speech
Enhancement . . . . . . . . . . . . . . 734--743
Riccardo R. De Lucia and
Antonio Canclini and
Fabio Antonacci and
Augusto Sarti Group Dictionary Equivalent Source
Method for Sparse Nearfield Acoustic
Holography . . . . . . . . . . . . . . . 744--757
Tong Ma and
Ying Wei and
Xin Lou Reconfigurable Nonuniform Filter Bank
for Hearing Aid Systems . . . . . . . . 758--771
Victoria Mingote and
Antonio Miguel and
Dayana Ribas and
Alfonso Ortega and
Eduardo Lleida aDCF Loss Function for Deep Metric
Learning in End-to-End Text-Dependent
Speaker Verification Systems . . . . . . 772--784
Quansheng Tu and
Huawei Chen Theoretical Lower Bounds on the
Performance of the First-Order
Differential Microphone Arrays With
Sensor Imperfections . . . . . . . . . . 785--801
Taihui Wang and
Feiran Yang and
Jun Yang Convolutive Transfer Function-Based
Multichannel Nonnegative Matrix
Factorization for Overdetermined Blind
Source Separation . . . . . . . . . . . 802--815
Yi Zhang and
Guangyou Zhou and
Zhiwen Xie and
Jimmy Xiangji Huang HGEN: Learning Hierarchical
Heterogeneous Graph Encoding for Math
Word Problem Solving . . . . . . . . . . 816--828
Eduardo Fonseca and
Xavier Favory and
Jordi Pons and
Frederic Font and
Xavier Serra FSD50K: an Open Dataset of Human-Labeled
Sound Events . . . . . . . . . . . . . . 829--852
Yi Lei and
Shan Yang and
Xinsheng Wang and
Lei Xie MsEmoTTS: Multi-Scale Emotion Transfer,
Prediction, and Control for Emotional
Speech Synthesis . . . . . . . . . . . . 853--864
Tao Wang and
Ruibo Fu and
Jiangyan Yi and
Jianhua Tao and
Zhengqi Wen NeuralDPS: Neural Deterministic Plus
Stochastic Model With Multiband
Excitation for Noise-Controllable
Waveform Generation . . . . . . . . . . 865--878
Simon Stone and
Yingming Gao and
Peter Birkholz Articulatory Synthesis of Vocalized /r/
Allophones in German . . . . . . . . . . 879--889
Prashant Serai and
Vishal Sunder and
Eric Fosler-Lussier Hallucination of Speech Recognition
Errors With Sequence to Sequence
Learning . . . . . . . . . . . . . . . . 890--900
Bin Wu and
Sakriani Sakti and
Jinsong Zhang and
Satoshi Nakamura Modeling Unsupervised Empirical
Adaptation by DPGMM and DPGMM-RNN Hybrid
Model to Extract Perceptual Features for
Low-Resource ASR . . . . . . . . . . . . 901--916
Mi Zhang and
Tieyun Qian and
Bing Liu Exploit Feature and Relation Hierarchy
for Relation Extraction . . . . . . . . 917--930
Wenxiang Jiao and
Xing Wang and
Shilin He and
Zhaopeng Tu and
Irwin King and
Michael R. Lyu Exploiting Inactive Examples for Natural
Language Generation With Data
Rejuvenation . . . . . . . . . . . . . . 931--943
Youzhi Tu and
Man-Wai Mak Aggregating Frame-Level Information in
the Spectral Domain With Self-Attention
for Speaker Embedding . . . . . . . . . 944--957
Zhixing Tan and
Zeyuan Yang and
Meng Zhang and
Qun Liu and
Maosong Sun and
Yang Liu Dynamic Multi-Branch Layers for
On-Device Neural Machine Translation . . 958--967
Weiwei Lin and
Man-Wai Mak Mixture Representation Learning for Deep
Speaker Embedding . . . . . . . . . . . 968--978
Peng Zhu and
Dawei Cheng and
Fangzhou Yang and
Yifeng Luo and
Dingjiang Huang and
Weining Qian and
Aoying Zhou Improving Chinese Named Entity
Recognition by Large-Scale Syntactic
Dependency Graph . . . . . . . . . . . . 979--991
Xiaobo Liang and
Lijun Wu and
Juntao Li and
Tao Qin and
Min Zhang and
Tie-Yan Liu Multi-Teacher Distillation With Single
Model for Neural Machine Translation . . 992--1002
Xiaofeng Chen and
Guohua Wang and
Haopeng Ren and
Yi Cai and
Ho-fung Leung and
Tao Wang Task-Adaptive Feature Fusion for
Generalized Few-Shot Relation
Classification in an Open World
Environment . . . . . . . . . . . . . . 1003--1015
Yu-Chen Lin and
Cheng Yu and
Yi-Te Hsu and
Szu-Wei Fu and
Yu Tsao and
Tei-Wei Kuo SEOFP-NET: Compression and Acceleration
of Deep Neural Networks for Speech
Enhancement Using Sign-Exponent-Only
Floating-Points . . . . . . . . . . . . 1016--1031
Tomohiro Nakatani and
Rintaro Ikeshita and
Keisuke Kinoshita and
Hiroshi Sawada and
Naoyuki Kamo and
Shoko Araki Switching Independent Vector Analysis
and its Extension to Blind and Spatially
Guided Convolutional Beamforming
Algorithms . . . . . . . . . . . . . . . 1032--1047
Jianhua Geng and
Sifan Wang and
Qinglai Liu and
Xin Lou Multi-Level Time-Frequency Bins
Selection for Direction of Arrival
Estimation Using a Single Acoustic
Vector Sensor . . . . . . . . . . . . . 1048--1060
Qinzhuo Wu and
Qi Zhang and
Xuanjing Huang Automatic Math Word Problem Generation
With Topic-Expression Co-Attention
Mechanism and Reinforcement Learning . . 1061--1072
Michael Nigro and
Sridhar Krishnan Multimodal System for Audio Scene Source
Counting and Analysis . . . . . . . . . 1073--1082
Yishu Peng and
Sheng Zhang and
Jiashu Zhang and
Wei Xing Zheng Combined-Sample Multiband-Structured
Subband Filtering Algorithms . . . . . . 1083--1092
Shoukang Hu and
Xurong Xie and
Mingyu Cui and
Jiajun Deng and
Shansong Liu and
Jianwei Yu and
Mengzhe Geng and
Xunying Liu and
Helen Meng Neural Architecture Search for LF-MMI
Trained Time Delay Neural Networks . . . 1093--1107
Xudong Dang and
Wen Ma and
Emanuël A. P. Habets and
Hongyan Zhu TDOA-Based Robust Sound Source
Localization With Sparse Regularization
in Wireless Acoustic Sensor Networks . . 1108--1123
Shan Gao and
Jing Lin and
Xihong Wu and
Tianshu Qu Sparse DNN Model for Frequency Expanding
of Higher Order Ambisonics Encoding
Process . . . . . . . . . . . . . . . . 1124--1135
Giovanni Pepe and
Leonardo Gabrielli and
Stefano Squartini and
Carlo Tripodi and
Nicolò Strozzi Deep Optimization of Parametric IIR
Filters for Audio Equalization . . . . . 1136--1149
Moa Lee and
Junmo Lee and
Joon-Hyuk Chang Non-Autoregressive Fully Parallel Deep
Convolutional Neural Speech Synthesis 1150--1159
Liam Barrett and
Junchao Hu and
Peter Howell Systematic Review of Machine Learning
Approaches for Detecting Developmental
Stuttering . . . . . . . . . . . . . . . 1160--1172
Sang-Hoon Lee and
Hyeong-Rae Noh and
Woo-Jeoung Nam and
Seong-Whan Lee Duration Controllable Voice Conversion
via Phoneme-Based Information Bottleneck 1173--1183
Zhihong Shao and
Zhongqin Wu and
Minlie Huang AdvExpander: Generating Natural Language
Adversarial Examples by Expanding Text 1184--1196
Dhanunjaya Varma Devalraju and
Padmanabhan Rajan Multiview Embeddings for Soundscape
Classification . . . . . . . . . . . . . 1197--1206
Chengyu Wang and
Suyang Dai and
Yipeng Wang and
Fei Yang and
Minghui Qiu and
Kehan Chen and
Wei Zhou and
Jun Huang ARoBERT: an ASR Robust Pre-Trained
Language Model for Spoken Language
Understanding . . . . . . . . . . . . . 1207--1218
Jonah Ong and
Ba Tuong Vo and
Sven Nordholm and
Ba-Ngu Vo and
Diluka Moratuwage and
Changbeom Shim Audio-Visual Based Online Multi-Source
Separation . . . . . . . . . . . . . . . 1219--1234
Leyang Cui and
Yafu Li and
Yue Zhang Label Attention Network for Structured
Prediction . . . . . . . . . . . . . . . 1235--1248
Sarinah Sutojo and
Tobias May and
Steven van de Par Segmentation of Multitalker Mixtures
Based on Local Feature Contrasts and
Auditory Glimpses . . . . . . . . . . . 1249--1262
Hao Gao and
Xuelei Feng and
Yong Shen Weighted Loudspeaker Placement Method
for Sound Field Reproduction . . . . . . 1263--1276
Gongping Huang and
Jacob Benesty and
Israel Cohen and
Jingdong Chen Kronecker Product Multichannel Linear
Filtering for Adaptive Weighted
Prediction Error-Based Speech
Dereverberation . . . . . . . . . . . . 1277--1289
Takehiro Sugimoto Loudness-Level-Chasing Algorithm for
Multiformat Live Audio Production . . . 1290--1304
Junshuang Wu and
Richong Zhang and
Yongyi Mao and
Jinpeng Huai Dealing With Hierarchical Types and
Label Noise in Fine-Grained Entity
Typing . . . . . . . . . . . . . . . . . 1305--1318
Anton Ragni and
Mark J. F. Gales and
Oliver Rose and
Katherine M. Knill and
Alexandros Kastanos and
Qiujia Li and
Preben M. Ness Increasing Context for Estimating
Confidence Scores in Automatic Speech
Recognition . . . . . . . . . . . . . . 1319--1329
Zhongxin Bai and
Jianyu Wang and
Xiao-Lei Zhang and
Jingdong Chen End-to-End Speaker Verification via
Curriculum Bipartite Ranking Weighted
Binary Cross-Entropy . . . . . . . . . . 1330--1344
Shang-Yi Chuang and
Hsin-Min Wang and
Yu Tsao Improved Lite Audio-Visual Speech
Enhancement . . . . . . . . . . . . . . 1345--1359
Gaofeng Cheng and
Haoran Miao and
Runyan Yang and
Keqi Deng and
Yonghong Yan ETEH: Unified Attention-Based End-to-End
ASR and KWS Architecture . . . . . . . . 1360--1373
Ashutosh Pandey and
DeLiang Wang Self-Attending RNN for Speech
Enhancement to Improve Cross-Corpus
Generalization . . . . . . . . . . . . . 1374--1385
Di Jin and
Shuyang Gao and
Seokhwan Kim and
Yang Liu and
Dilek Hakkani-Tür Towards Textual Out-of-Domain Detection
Without In-Domain Labels . . . . . . . . 1386--1395
K. Mrinalini and
P. Vijayalakshmi and
T. Nagarajan SBSim: a Sentence-BERT Similarity-Based
Evaluation Metric for Indian Language
Neural Machine Translation Systems . . . 1396--1406
Changhong Wang and
Emmanouil Benetos and
Vincent Lostanlen and
Elaine Chew Adaptive Scattering Transforms for
Playing Technique Recognition . . . . . 1407--1421
Danwei Cai and
Weiqing Wang and
Ming Li Incorporating Visual Information in
Audio Based Self-Supervised Speaker
Recognition . . . . . . . . . . . . . . 1422--1435
Yu Luo and
Lina Pu EC-ANC: Edge Case-Enhanced Active Noise
Cancellation for True Wireless Stereo
Earbuds . . . . . . . . . . . . . . . . 1436--1447
Tao Li and
Xinsheng Wang and
Qicong Xie and
Zhichao Wang and
Lei Xie Cross-Speaker Emotion Disentangling and
Transfer for End-to-End Speech Synthesis 1448--1460
Yilin Zhao and
Zhuosheng Zhang and
Hai Zhao Reference Knowledgeable Network for
Machine Reading Comprehension . . . . . 1461--1473
Fu-Hao Yu and
Kuan-Yu Chen and
Ke-Han Lu Non-Autoregressive ASR Modeling Using
Pre-Trained Language Models for Chinese
Speech Recognition . . . . . . . . . . . 1474--1482
Yiming Cui and
Ting Liu and
Wanxiang Che and
Zhigang Chen and
Shijin Wang Teaching Machines to Read, Answer and
Explain . . . . . . . . . . . . . . . . 1483--1492
Shota Horiguchi and
Yusuke Fujita and
Shinji Watanabe and
Yawen Xue and
Paola García Encoder-Decoder Based Attractors for
End-to-End Neural Diarization . . . . . 1493--1507
Chenda Li and
Zhuo Chen and
Yanmin Qian Dual-Path Modeling With Memory Embedding
Model for Continuous Speech Separation 1508--1520
Yu Tong and
Jingzhi Guo and
Jizhe Zhou Separation Inference: a Unified
Framework for Word Segmentation in East
Asian Languages . . . . . . . . . . . . 1521--1530
Mrinmoy Bhattacharjee and
S. R. M. Prasanna and
Prithwijit Guha Clean vs. Overlapped Speech-Music
Detection Using Harmonic-Percussive
Features and Multi-Task Learning . . . . 1--10
Zhaojie Luo and
Shoufeng Lin and
Rui Liu and
Jun Baba and
Yuichiro Yoshikawa and
Hiroshi Ishiguro Decoupling Speaker-Independent Emotions
for Voice Conversion via Source-Filter
Networks . . . . . . . . . . . . . . . . 11--24
Jinchuan Tian and
Jianwei Yu and
Chao Weng and
Yuexian Zou and
Dong Yu Integrating Lattice-Free MMI Into
End-to-End Speech Recognition . . . . . 25--38
Ravi Shankar and
Hsi-Wei Hsieh and
Nicolas Charon and
Archana Venkataraman A Diffeomorphic Flow-Based Variational
Framework for Multi-Speaker Emotion
Conversion . . . . . . . . . . . . . . . 39--53
Ryandhimas E. Zezario and
Szu-Wei Fu and
Fei Chen and
Chiou-Shann Fuh and
Hsin-Min Wang and
Yu Tsao Deep Learning-Based Non-Intrusive
Multi-Objective Speech Assessment Model
With Cross-Domain Features . . . . . . . 54--70
Xiaoyi Qin and
Danwei Cai and
Ming Li Robust Multi-Channel Far-Field Speaker
Verification Under Different In-Domain
Data Availability Scenarios . . . . . . 71--85
Vikram C. Mathad and
Julie M. Liss and
Kathy Chapman and
Nancy Scherer and
Visar Berisha Consonant-Vowel Transition Models Based
on Deep Learning for Objective
Evaluation of Articulation . . . . . . . 86--95
Li Li and
Hirokazu Kameoka and
Shoji Makino FastMVAE2: On Improving and Accelerating
the Fast Variational Autoencoder-Based
Source Separation Algorithm for
Determined Mixtures . . . . . . . . . . 96--110
Jie Wang and
Yan Yang and
Keyu Liu and
Zhiping Zhu and
Xiaorong Liu M3S: Scene Graph Driven
Multi-Granularity Multi-Task Learning
for Multi-Modal NER . . . . . . . . . . 111--120
Marc Delcroix and
Jorge Bennasar Vazquez and
Tsubasa Ochiai and
Keisuke Kinoshita and
Yasunori Ohishi and
Shoko Araki SoundBeam: Target Sound Extraction
Conditioned on Sound-Class Labels and
Enrollment Clues for Increased
Performance and Continuous Learning . . 121--136
Daisuke Niizumi and
Daiki Takeuchi and
Yasunori Ohishi and
Noboru Harada and
Kunio Kashino BYOL for Audio: Exploring Pre-Trained
General-Purpose Audio Representations 137--151
Yingrui Xu and
Hao Liu and
Jingguo Ge and
Xiaodan Zhang and
Jingyuan Hu and
Yulei Wu and
Honglei Lv and
Hongbin Shi and
Wei Zhou Mining Weak Relations Between Reviews
for Opinion Spam Detection . . . . . . . 152--162
Yoshiki Masuyama and
Kohei Yatabe and
Kento Nagatomo and
Yasuhiro Oikawa Online Phase Reconstruction via
DNN-Based Phase Differences Estimation 163--176
Jiang Liu and
Donghong Ji and
Jingye Li and
Dongdong Xie and
Chong Teng and
Liang Zhao and
Fei Li TOE: a Grid-Tagging Discontinuous NER
Model Enhanced by Embedding Tag/Word
Relations and More Fine-Grained
Tags . . . . . . . . . . . . . . . . . . 177--187
Zhe Hu and
Zhiwei Cao and
Hou Pong Chan and
Jiachen Liu and
Xinyan Xiao and
Jinsong Su and
Hua Wu Controllable Dialogue Generation With
Disentangled Multi-Grained Style
Specification and Attribute Consistency
Reward . . . . . . . . . . . . . . . . . 188--199
Sondes Abderrazek and
Corinne Fredouille and
Alain Ghio and
Muriel Lalain and
Christine Meunier and
Virginie Woisard Interpreting Deep Representations of
Phonetic Features via Neuro-Based
Concept Detector: Application to Speech
Disorders Due to Head and Neck Cancer 200--214
Jie Zhang and
Rui Tao and
Jun Du and
Li-Rong Dai Energy-Efficient Sparsity-Driven Speech
Enhancement in Wireless Acoustic Sensor
Networks . . . . . . . . . . . . . . . . 215--228
Xianke Wang and
Bowen Tian and
Weiming Yang and
Wei Xu and
Wenqing Cheng MusicYOLO: a Vision-Based Framework for
Automatic Singing Transcription . . . . 229--241
Yuanyuan Liu and
Mittapalle Kiran Reddy and
Nelly Penttila and
Tiina Ihalainen and
Paavo Alku and
Okko Rasanen Automatic Assessment of
Parkinson's Disease Using Speech
Representations of Phonation and
Articulation . . . . . . . . . . . . . . 242--255
David Sudholt and
Alec Wright and
Cumhur Erkut and
Vesa Välimäki Pruning Deep Neural Network Models of
Guitar Distortion Effects . . . . . . . 256--264
Fangkai Jiao and
Yangyang Guo and
Minlie Huang and
Liqiang Nie Enhanced Multi-Domain Dialogue State
Tracker With Second-Order Slot
Interactions . . . . . . . . . . . . . . 265--276
Hui Tian and
Yiqin Qiu and
Wojciech Mazurczyk and
Haizhou Li and
Zhenxing Qian STFF-SM: Steganalysis Model Based on
Spatial and Temporal Feature Fusion for
Speech Streams . . . . . . . . . . . . . 277--289
Gopendra Vikram Singh and
Mauajama Firdaus and
Asif Ekbal and
Pushpak Bhattacharyya EmoInt-Trans: a Multimodal Transformer
for Identifying Emotions and Intents in
Social Conversations . . . . . . . . . . 290--300
De Hu and
Huaiwen Zhang and
Feilong Bao and
Rui Wang Distributed Sampling Rate Offset
Estimation Over Acoustic Sensor Networks
Based on Asynchronous Network Newton
Optimization . . . . . . . . . . . . . . 301--312
David Diaz-Guerra and
Antonio Miguel and
Jose R. Beltran Direction of Arrival Estimation of Sound
Sources Using Icosahedral CNNs . . . . . 313--321
Peiming Guo and
Shen Huang and
Peijie Jiang and
Yueheng Sun and
Meishan Zhang and
Min Zhang Curriculum-Style Fine-Grained Adaption
for Unsupervised Cross-Lingual
Dependency Transfer . . . . . . . . . . 322--332
Naveen Kumar Desiraju and
Simon Doclo and
Markus Buck and
Tobias Wolff Joint Online Estimation of Early and
Late Residual Echo PSD for Residual Echo
Suppression . . . . . . . . . . . . . . 333--344
Guangzhi Sun and
Chao Zhang and
Philip C. Woodland Minimising Biasing Word Errors for
Contextual ASR With the Tree-Constrained
Pointer Generator . . . . . . . . . . . 345--354
Jonah Casebeer and
Nicholas J. Bryan and
Paris Smaragdis Meta-AF: Meta-Learning for Adaptive
Filters . . . . . . . . . . . . . . . . 355--370
Yingwen Fu and
Nankai Lin and
Boyu Chen and
Ziyu Yang and
Shengyi Jiang Cross-Lingual Named Entity Recognition
for Heterogeneous Languages . . . . . . 371--382
Jun-You Wang and
Jyh-Shing Roger Jang Training a Singing Transcription Model
Using Connectionist Temporal
Classification Loss and Cross-Entropy
Loss . . . . . . . . . . . . . . . . . . 383--396
Zhong-Qiu Wang and
Gordon Wichern and
Shinji Watanabe and
Jonathan Le Roux STFT-Domain Neural Speech Enhancement
With Very Low Algorithmic Latency . . . 397--410
Yu Li and
Bojie Hu and
Jian Liu and
Yufeng Chen and
Jinan Xu A Neighborhood Re-Ranking Model With
Relation Constraint for Knowledge Graph
Completion . . . . . . . . . . . . . . . 411--425
Alessio Miaschi and
Dominique Brunato and
Felice Dell'Orletta and
Giulia Venturi On Robustness and Sensitivity of a
Neural Language Model: a Case Study on
Italian L1 Learner Errors . . . . . . . 426--438
Rong Xiao and
Yu Wan and
Baosong Yang and
Haibo Zhang and
Huajin Tang and
Derek F. Wong and
Boxing Chen Towards Energy-Preserving Natural
Language Understanding With Spiking
Neural Networks . . . . . . . . . . . . 439--447
Juan Zhao and
Tianrui Zong and
Yong Xiang and
Longxiang Gao and
Guang Hua and
Keshav Sood and
Yushu Zhang SSVS-SSVD Based Desynchronization
Attacks Resilient Watermarking Method
for Stereo Signals . . . . . . . . . . . 448--461
Qiquan Zhang and
Xinyuan Qian and
Zhaoheng Ni and
Aaron Nicolson and
Eliathamby Ambikairajah and
Haizhou Li A Time-Frequency Attention Module for
Neural Speech Enhancement . . . . . . . 462--475
Binhong Xie and
Yu Li and
Hongyan Zhao and
Lihu Pan and
Enhui Wang A Cross-Attention Fusion Based Graph
Convolution Auto-Encoder for Open
Relation Extraction . . . . . . . . . . 476--485
Qian-Bei Hong and
Chung-Hsien Wu and
Hsin-Min Wang Generalization Ability Improvement of
Speaker Representation and
Anti-Interference for Speaker
Verification . . . . . . . . . . . . . . 486--499
Xinglin Lyu and
Junhui Li and
Min Zhang and
Chenchen Ding and
Hideki Tanaka and
Masao Utiyama Refining History for Future-Aware Neural
Machine Translation . . . . . . . . . . 500--512
Mou Wang and
Junqi Chen and
Xiao-Lei Zhang and
Susanto Rahardja End-to-End Multi-Modal Speech
Recognition on an Air and Bone Conducted
Speech Corpus . . . . . . . . . . . . . 513--524
Asier López Zorrilla and
María Inés Torres and
Heriberto Cuayáhuitl Audio Embedding-Aware Dialogue Policy
Learning . . . . . . . . . . . . . . . . 525--538
Xichen Shang and
Chuxin Chen and
Zipeng Chen and
Qianli Ma Modularized Mutuality Network for
Emotion-Cause Pair Extraction . . . . . 539--549
Xinyuan Qian and
Zhengdong Wang and
Jiadong Wang and
Guohui Guan and
Haizhou Li Audio-Visual Cross-Attention Network for
Robotic Speaker Tracking . . . . . . . . 550--562
Kristina Tesch and
Timo Gerkmann Insights Into Deep Non-Linear Filters
for Improved Multi-Channel Speech
Enhancement . . . . . . . . . . . . . . 563--575
Thilo von Neumann and
Keisuke Kinoshita and
Christoph Boeddeker and
Marc Delcroix and
Reinhold Haeb-Umbach Segment-Less Continuous Speech
Separation of Meetings: Training and
Evaluation Criteria . . . . . . . . . . 576--589
Davide Albertini and
Alberto Bernardini and
Federico Borra and
Fabio Antonacci and
Augusto Sarti Two-Stage Beamforming With Arbitrary
Planar Arrays of Differential Microphone
Array Units . . . . . . . . . . . . . . 590--602
Yi-Syuan Chen and
Yun-Zhu Song and
Hong-Han Shuai SPEC: Summary Preference Decomposition
for Low-Resource Abstractive
Summarization . . . . . . . . . . . . . 603--618
Yingying Xiao and
Shanmou Chen and
Qiangqiang Zhang and
Dongyuan Lin and
Minglin Shen and
Junhui Qian and
Shiyuan Wang Generalized Hyperbolic Tangent Based
Random Fourier Conjugate Gradient Filter
for Nonlinear Active Noise Control . . . 619--632
Jun Qi and
Chao-Han Huck Yang and
Pin-Yu Chen and
Javier Tejedor Exploiting Low-Rank Tensor-Train Deep
Neural Networks Based on Riemannian
Gradient Descent With Illustrations of
Speech Processing . . . . . . . . . . . 633--642
Bin Gu and
Wu Guo and
Jie Zhang Memory Storable Network Based Feature
Aggregation for Speaker Representation
Learning . . . . . . . . . . . . . . . . 643--655
Takumi Abe and
Shoichi Koyama and
Natsuki Ueno and
Hiroshi Saruwatari Amplitude Matching for Multizone Sound
Field Control . . . . . . . . . . . . . 656--669
Mahdi Barhoush and
Ahmed Hallawa and
Arne Peine and
Lukas Martin and
Anke Schmeink Localization-Driven Speech Enhancement
in Noisy Multi-Speaker Hospital
Environments Using Deep Learning and
Meta Learning . . . . . . . . . . . . . 670--683
Herman Kamper Word Segmentation on Discovered Phone
Units With Dynamic Programming and
Self-Supervised Scoring . . . . . . . . 684--694
Changheng Li and
Jorge Martinez and
Richard Christian Hendriks Joint Maximum Likelihood Estimation of
Microphone Array Parameters for a
Reverberant Single Source Scenario . . . 695--705
Shota Horiguchi and
Shinji Watanabe and
Paola García and
Yuki Takashima and
Yohei Kawaguchi Online Neural Diarization of Unlimited
Numbers of Speakers Using Global and
Local Attractors . . . . . . . . . . . . 706--720
Ling He and
Jia Fu and
Yuanyuan Li and
Xi Xiong and
Jing Zhang WNSA-Net: an Axial-Attention-Based
Network for Schizophrenia Detection
Using Wideband and Narrowband
Spectrograms . . . . . . . . . . . . . . 721--733
Anusha Prakash and
Hema A. Murthy Exploring the Role of Language Families
for Building Indic Speech Synthesisers 734--747
Mahdin Rohmatillah and
Jen-Tzung Chien Hierarchical Reinforcement Learning With
Guidance for Multi-Domain Dialogue
Policy . . . . . . . . . . . . . . . . . 748--761
Shahram Ghorbani and
John H. L. Hansen Domain Expansion for End-to-End Speech
Recognition: Applications for
Accent/Dialect Speech . . . . . . . . . 762--774
Weidong Chen and
Xiaofen Xing and
Xiangmin Xu and
Jianxin Pang and
Lan Du SpeechFormer++: a Hierarchical Efficient
Framework for Paralinguistic Speech
Processing . . . . . . . . . . . . . . . 775--788
Nicki Holighaus and
Günther Koliander and
Clara Hollomey and
Friedrich Pillichshammer Grid-Based Decimation for Wavelet
Transforms With Stably Invertible
Implementation . . . . . . . . . . . . . 789--801
Weiwei Lin and
Man-Wai Mak Robust Speaker Verification Using Deep
Weight Space Ensemble . . . . . . . . . 802--812
Lin Zhang and
Xin Wang and
Erica Cooper and
Nicholas Evans and
Junichi Yamagishi The PartialSpoof Database and
Countermeasures for the Detection of
Short Fake Speech Segments Embedded in
an Utterance . . . . . . . . . . . . . . 813--825
Jie Mei and
Yufan Wang and
Xinhui Tu and
Ming Dong and
Tingting He Incorporating BERT With
Probability-Aware Gate for Spoken
Language Understanding . . . . . . . . . 826--834
Tsubasa Ochiai and
Marc Delcroix and
Tomohiro Nakatani and
Shoko Araki Mask-Based Neural Beamforming for Moving
Speakers With Self-Attention-Based
Tracking . . . . . . . . . . . . . . . . 835--848
Rongzhi Gu and
Shi-Xiong Zhang and
Yuexian Zou and
Dong Yu Towards Unified All-Neural Beamforming
for Time and Frequency Domain Speech
Separation . . . . . . . . . . . . . . . 849--862
Naotake Masuda and
Daisuke Saito Improving Semi-Supervised Differentiable
Synthesizer Sound Matching for Practical
Applications . . . . . . . . . . . . . . 863--875
Erfan Loweimi and
Zhengjun Yue and
Peter Bell and
Steve Renals and
Zoran Cvetković Multi-Stream Acoustic Modelling Using
Raw Real and Imaginary Parts of the
Fourier Transform . . . . . . . . . . . 876--890
Bengt J. Borgström A Generative Approach to Condition-Aware
Score Calibration for Speaker
Verification . . . . . . . . . . . . . . 891--901
Irene Martín-Morató and
Annamaria Mesaros Strong Labeling of Sound Events Using
Crowdsourced Weak Labels and Annotator
Competence Estimation . . . . . . . . . 902--914
Wenzhao Zhu and
Lei Luo and
Jinwei Sun and
Mads Græsbøll Christensen A New Virtual Tracking Sub-Algorithm
Based Hybrid Active Control System for
Narrowband Noise With Impulsive
Interference . . . . . . . . . . . . . . 915--926
Thomas Deppisch and
Sebastià V. Amengual Garí and
Paul Calamia and
Jens Ahrens Direct and Residual Subspace
Decomposition of Spatial Room Impulse
Responses . . . . . . . . . . . . . . . 927--942
Eloi Moliner and
Vesa Välimäki BEHM-GAN: Bandwidth Extension of
Historical Music Using Generative
Adversarial Networks . . . . . . . . . . 943--956
Martin Jälmby and
Filip Elvander and
Toon van Waterschoot Low-Rank Room Impulse Response
Estimation . . . . . . . . . . . . . . . 957--969
Hong Liu and
Yucheng Cai and
Zhenru Lin and
Zhijian Ou and
Yi Huang and
Junlan Feng Variational Latent-State GPT for
Semi-Supervised Task-Oriented Dialog
Systems . . . . . . . . . . . . . . . . 970--984
De Hu and
Qintuya Si and
Rui Liu and
Feilong Bao Distributed Sensor Selection for Speech
Enhancement With Acoustic Sensor
Networks . . . . . . . . . . . . . . . . 985--999
Yingke Zhu and
Brian Mak Bayesian Self-Attentive Speaker
Embeddings for Text-Independent Speaker
Verification . . . . . . . . . . . . . . 1000--1012
Yuying Li and
Yuchen Liu and
Donald S. Williamson A Composite T60 Regression and
Classification Approach for Speech
Dereverberation . . . . . . . . . . . . 1013--1023
Hanyi Zhang and
Longbiao Wang and
Kong Aik Lee and
Meng Liu and
Jianwu Dang and
Helen Meng Meta-Generalization for Domain-Invariant
Speaker Verification . . . . . . . . . . 1024--1036
Shu-Tong Niu and
Jun Du and
Lei Sun and
Yu Hu and
Chin-Hui Lee QDM-SSD: Quality-Aware Dynamic Masking
for Separation-Based Speaker Diarization 1037--1049
Boyang Lyu and
Chunxiao Fan and
Yue Ming and
Panzi Zhao and
Nannan Hu En-HACN: Enhancing Hybrid Architecture
With Fast Attention and Capsule Network
for End-to-end Speech Recognition . . . 1050--1062
Yang Liu and
Haoqin Sun and
Wenbo Guan and
Yuqi Xia and
Yongwei Li and
Masashi Unoki and
Zhen Zhao A Discriminative Feature Representation
Method Based on Cascaded Attention
Network With Adversarial Strategy for
Speech Emotion Recognition . . . . . . . 1063--1074
Hao Zhang and
Nianwen Si and
Yaqi Chen and
Wenlin Zhang and
Xukui Yang and
Dan Qu and
Wei-Qiang Zhang Improving Speech Translation by
Cross-Modal Multi-Grained Contrastive
Learning . . . . . . . . . . . . . . . . 1075--1086
Wei-Cheng Lin and
Carlos Busso Sequential Modeling by Leveraging
Non-Uniform Distribution of Speech
Emotion . . . . . . . . . . . . . . . . 1087--1099
Achyut Mani Tripathi and
Om Jee Pandey Divide and Distill: New Outlooks on
Knowledge Distillation for Environmental
Sound Classification . . . . . . . . . . 1100--1113
Hao Zhang and
Ashutosh Pandey and
DeLiang Wang Low-Latency Active Noise Control Using
Attentive Recurrent Network . . . . . . 1114--1123
Avital Bross and
Sharon Gannot Training-Based Multiple Source Tracking
Using Manifold-Learning and Recursive
Expectation-Maximization . . . . . . . . 1124--1140
Guimin Hu and
Yi Zhao and
Guangming Lu Emotion Prediction Oriented Method With
Multiple Supervisions for Emotion-Cause
Pair Extraction . . . . . . . . . . . . 1141--1152
Reza Mohsenipour and
Daniel Massicotte and
Wei-Ping Zhu PI Control of Loudspeakers Based on
Linear Fractional Order Model . . . . . 1153--1162
Tim Lübeck and
Johannes M. Arend and
Christoph Pörschmann Spatial Upsampling of Sparse Spherical
Microphone Array Signals . . . . . . . . 1163--1174
Jiajun Deng and
Xurong Xie and
Tianzi Wang and
Mingyu Cui and
Boyang Xue and
Zengrui Jin and
Guinan Li and
Shujie Hu and
Xunying Liu Confidence Score Based Speaker
Adaptation of Conformer Speech
Recognition Systems . . . . . . . . . . 1175--1190
Hongsheng Zhang and
Jizhang Gan and
Ting Liu and
Kui Huang and
Hong Yang Coefficients-Switched Normalized
Least-Mean-Squares Adaption in Echo
Canceler of Sparse-Echo-Path . . . . . . 1191--1199
Eric Guizzo and
Tillman Weyde and
Simone Scardapane and
Danilo Comminiello Learning Speech Emotion Representations
in the Quaternion Domain . . . . . . . . 1200--1212
Jiaqi Bai and
Ze Yang and
Jian Yang and
Hongcheng Guo and
Zhoujun Li KINet: Incorporating Relevant Facts Into
Knowledge-Grounded Dialog Generation . . 1213--1222
Haiquan Zhao and
Yuan Gao and
Yingying Zhu Robust Subband Adaptive Filter
Algorithms-Based Mixture Correntropy and
Application to Acoustic Echo
Cancellation . . . . . . . . . . . . . . 1223--1233
Chen Zhang and
Luis Fernando D'Haro and
Qiquan Zhang and
Thomas Friedrichs and
Haizhou Li PoE: a Panel of Experts for Generalized
Automatic Dialogue Assessment . . . . . 1234--1250
Qing Wang and
Jun Du and
Hua-Xin Wu and
Jia Pan and
Feng Ma and
Chin-Hui Lee A Four-Stage Data Augmentation Approach
to ResNet-Conformer Based Acoustic
Modeling for Sound Event Localization
and Detection . . . . . . . . . . . . . 1251--1264
Yingwen Fu and
Nankai Lin and
Xiaohui Yu and
Shengyi Jiang Self-Training With Double Selectors for
Low-Resource Named Entity Recognition 1265--1275
Kilian Schulze-Forster and
Gaël Richard and
Liam Kelley and
Clement S. J. Doire and
Roland Badeau Unsupervised Music Source Separation
Using Differentiable Parametric Source
Models . . . . . . . . . . . . . . . . . 1276--1289
Yinggang Liu and
Hong Fu and
Ying Wei and
Hanbing Zhang Sound Event Classification Based on
Frequency-Energy Feature Representation
and Two-Stage Data Dimension Reduction 1290--1304
Ege Erdem and
Zoran Cvetković and
Hüseyin Hacıhabiboğlu 3D Perceptual Soundfield
Reconstruction via Virtual Microphone
Synthesis . . . . . . . . . . . . . . . 1305--1317
Dongyuan Shi and
Woon-Seng Gan and
Bhan Lam and
Xiaoyi Shen A Frequency-Domain Output-Constrained
Active Noise Control Algorithm Based on
an Intuitive Circulant Convolutional
Penalty Factor . . . . . . . . . . . . . 1318--1332
Muhammed Zahid Ozturk and
Chenshu Wu and
Beibei Wang and
Min Wu and
K. J. Ray Liu RadioSES: mmWave-Based Audioradio Speech
Enhancement and Separation System . . . 1333--1347
Jianwei Zhang and
Julie Liss and
Suren Jayasuriya and
Visar Berisha Robust Vocal Quality Feature Embeddings
for Dysphonic Voice Detection . . . . . 1348--1359
Ashutosh Pandey and
DeLiang Wang Attentive Training: a New Training
Framework for Speech Enhancement . . . . 1360--1370
Hirofumi Inaguma and
Tatsuya Kawahara Alignment Knowledge Distillation for
Online Streaming Attention-Based Speech
Recognition . . . . . . . . . . . . . . 1371--1385
Mittapalle Kiran Reddy and
Paavo Alku Exemplar-Based Sparse Representations
for Detection of Parkinson's Disease
From Speech . . . . . . . . . . . . . . 1386--1396
Shunsuke Kita and
Yoshinobu Kajikawa Sound Source Localization Inside a
Structure Under Semi-Supervised
Conditions . . . . . . . . . . . . . . . 1397--1408
Guowei Wu and
Shipei Liu and
Xiaoya Fan The Power of Fragmentation: a
Hierarchical Transformer Model for
Structural Segmentation in Symbolic
Music Generation . . . . . . . . . . . . 1409--1420
Xueqin Luo and
Gongping Huang and
Jilu Jin and
Jingdong Chen and
Jacob Benesty and
Wen Zhang and
Mengyao Zhu and
Chunjian Li Design of Maximum Directivity
Beamformers With Linear Acoustic Vector
Sensor Arrays . . . . . . . . . . . . . 1421--1435
Ruchao Fan and
Wei Chu and
Peng Chang and
Abeer Alwan A CTC Alignment-Based Non-Autoregressive
Transformer for End-to-End Automatic
Speech Recognition . . . . . . . . . . . 1436--1448
Tianyou Li and
Siyuan Lian and
Sipei Zhao and
Jing Lu and
Ian S. Burnett Distributed Active Noise Control Based
on an Augmented Diffusion FxLMS
Algorithm . . . . . . . . . . . . . . . 1449--1463
Jiayuan Xie and
Wenhao Fang and
Qingbao Huang and
Yi Cai and
Tao Wang Enhancing Paraphrase Question Generation
With Prior Knowledge . . . . . . . . . . 1464--1475
Chen Chen and
Hansheng Hong and
Jie Guo and
Bin Song Inter-Intra Modal Representation
Augmentation With Trimodal Collaborative
Disentanglement Network for Multimodal
Sentiment Analysis . . . . . . . . . . . 1476--1488
Jian Yang and
Yuwei Yin and
Liqun Yang and
Shuming Ma and
Haoyang Huang and
Dongdong Zhang and
Furu Wei and
Zhoujun Li GTrans: Grouping and Fusing Transformer
Layers for Neural Machine Translation 1489--1498
Xin Wu and
Yi Cai and
Zetao Lian and
Ho-fung Leung and
Tao Wang Generating Natural Language From Logic
Expressions With Structural
Representation . . . . . . . . . . . . . 1499--1510
Yi Li and
Yang Sun and
Wenwu Wang and
Syed Mohsen Naqvi U-Shaped Transformer With Frequency-Band
Aware Attention for Speech Enhancement 1511--1521
Christian Antoñanzas and
Miguel Ferrer and
Maria de Diego and
Alberto Gonzalez Remote Microphone Technique for Active
Noise Control Over Distributed Networks 1522--1535
Yi Zhu and
Abhishek Tiwari and
João Monteiro and
Shruti Kshirsagar and
Tiago Henrique Falk COVID-19 Detection via Fusion of
Modulation Spectrum and Linear
Prediction Speech Features . . . . . . . 1536--1549
Jijie Li and
Kai Shuang and
Jinyu Guo and
Zengyi Shi and
Hongman Wang Enhancing Semantic Relation
Classification With Shortest Dependency
Path Reasoning . . . . . . . . . . . . . 1550--1560
Mao-Kui He and
Jun Du and
Qing-Feng Liu and
Chin-Hui Lee ANSD-MA-MSE: Adaptive Neural Speaker
Diarization Using Memory-Aware
Multi-Speaker Embedding . . . . . . . . 1561--1573
Longting Xu and
Jichen Yang and
Chang Huai You and
Xinyuan Qian and
Daiyu Huang Device Features Based on Linear
Transformation With Parallel Training
Data for Replay Speech Detection . . . . 1574--1586
Huajian Fang and
Dennis Becker and
Stefan Wermter and
Timo Gerkmann Integrating Uncertainty Into Neural
Network-Based Speech Enhancement . . . . 1587--1600
Libo Qin and
Xiao Xu and
Lehan Wang and
Yue Zhang and
Wanxiang Che Modularized Pre-Training for End-to-End
Task-Oriented Dialogue . . . . . . . . . 1601--1610
Hanlei Zhang and
Hua Xu and
Shaojie Zhao and
Qianrui Zhou Learning Discriminative Representations
and Decision Boundaries for Open Intent
Detection . . . . . . . . . . . . . . . 1611--1623
Guangsheng Bao and
Yue Zhang A General Contextualized Rewriting
Framework for Text Summarization . . . . 1624--1635
Christoph Kirsch and
Stephan D. Ewert A Universal Filter Approximation of Edge
Diffraction for Geometrical Acoustics 1636--1651
Peyman Goli and
Steven van de Par Deep Learning-Based Speech Specific
Source Localization by Using Binaural
and Monaural Microphone Arrays in
Hearing Aids . . . . . . . . . . . . . . 1652--1666
Nguyen Binh Thien and
Yukoh Wakabayashi and
Kenta Iwai and
Takanobu Nishiura Inter-Frequency Phase Difference for
Phase Reconstruction Using Deep Neural
Networks and Maximum Likelihood . . . . 1667--1680
Srikanth Raj Chetupalli and
Emanuël A. P. Habets Speaker Counting and Separation From
Single-Channel Noisy Mixtures . . . . . 1681--1692
Guangyan Zhang and
Ying Qin and
Wenjie Zhang and
Jialun Wu and
Mei Li and
Yutao Gai and
Feijun Jiang and
Tan Lee iEmoTTS: Toward Robust Cross-Speaker
Emotion Transfer and Control for Speech
Synthesis Based on Disentanglement
Between Prosody and Timbre . . . . . . . 1693--1705
Ruijie Tao and
Kong Aik Lee and
Rohan Kumar Das and
Ville Hautamäki and
Haizhou Li Self-Supervised Training of Speaker
Encoder With Multi-Modal Diverse
Positive Pairs . . . . . . . . . . . . . 1706--1719
Dongchao Yang and
Jianwei Yu and
Helin Wang and
Wen Wang and
Chao Weng and
Yuexian Zou and
Dong Yu Diffsound: Discrete Diffusion Model for
Text-to-Sound Generation . . . . . . . . 1720--1733
Paul Konstantin Krug and
Peter Birkholz and
Branislav Gerazov and
Daniel Rudolph van Niekerk and
Anqi Xu and
Yi Xu Artificial Vocal Learning Guided by
Phoneme Recognition and Visual
Information . . . . . . . . . . . . . . 1734--1744
Qian-Bei Hong and
Chung-Hsien Wu and
Hsin-Min Wang Decomposition and Reorganization of
Phonetic Information for Speaker
Embedding Learning . . . . . . . . . . . 1745--1757
Wenbin Jiang and
Kai Yu Speech Enhancement With Integration of
Neural Homomorphic Synthesis and
Spectral Masking . . . . . . . . . . . . 1758--1770
Shu'ang Li and
Xuming Hu and
Li Lin and
Aiwei Liu and
Lijie Wen and
Philip S. Yu A Multi-Level Supervised Contrastive
Learning Framework for Low-Resource
Natural Language Inference . . . . . . . 1771--1783
Xiaoqing Zheng Building Conventional "Experts" With a
Dialogue Logic Programming Language . . 1784--1796
Haitao Lin and
Junnan Zhu and
Lu Xiang and
Feifei Zhai and
Yu Zhou and
Jiajun Zhang and
Chengqing Zong Topic-Oriented Dialogue Summarization 1797--1810
Haohan Guo and
Fenglong Xie and
Xixin Wu and
Frank K. Soong and
Helen Meng MSMC-TTS: Multi-Stage Multi-Codebook
VQ-VAE Based Neural TTS . . . . . . . . 1811--1824
Bei Liu and
Zhengyang Chen and
Yanmin Qian Depth-First Neural Architecture With
Attentive Feature Fusion for Efficient
Speaker Verification . . . . . . . . . . 1825--1838
Ria Ghosh and
John H. L. Hansen Bilateral Cochlear Implant Processing of
Coding Strategies With CCi-MOBILE, an
Open-Source Research Platform . . . . . 1839--1850
Aolong Zhou and
Wen Zhang and
Guojun Xu and
Xiaoyong Li and
Kefeng Deng and
Junqiang Song DBSA-Net: Dual Branch Self-Attention
Network for Underwater Acoustic Signal
Denoising . . . . . . . . . . . . . . . 1851--1865
Weiwei Lin and
Man-Wai Mak Model-Agnostic Meta-Learning for Fast
Text-Dependent Speaker Embedding
Adaptation . . . . . . . . . . . . . . . 1866--1876
Andrea Galassi and
Marco Lippi and
Paolo Torroni Multi-Task Attentive Residual Networks
for Argument Mining . . . . . . . . . . 1877--1892
Yi Luo and
Jianwei Yu Music Source Separation With Band-Split
RNN . . . . . . . . . . . . . . . . . . 1893--1901
Keisuke Matsubara and
Takuma Okamoto and
Ryoichi Takashima and
Tetsuya Takiguchi and
Tomoki Toda and
Hisashi Kawai Harmonic-Net: Fundamental Frequency and
Speech Rate Controllable Fast Neural
Vocoder . . . . . . . . . . . . . . . . 1902--1915
Yi Zhou and
Zhizheng Wu and
Xiaohai Tian and
Haizhou Li Optimization of Cross-Lingual Voice
Conversion With Linguistics Losses to
Reduce Foreign Accents . . . . . . . . . 1916--1926
Qiu-Shi Zhu and
Jie Zhang and
Zi-Qiang Zhang and
Li-Rong Dai A Joint Speech Enhancement and
Self-Supervised Representation Learning
Framework for Noise-Robust Speech
Recognition . . . . . . . . . . . . . . 1927--1939
Siqi Sun and
Korin Richmond and
Hao Tang Improving Seq2Seq TTS Frontends With
Transcribed Speech Audio . . . . . . . . 1940--1952
Shih-Lun Wu and
Yi-Hsuan Yang MuseMorphose: Full-Song and Fine-Grained
Piano Music Style Transfer With One
Transformer VAE . . . . . . . . . . . . 1953--1967
Xiaoxue Gao and
Chitralekha Gupta and
Haizhou Li PoLyScriber: Integrated Fine-Tuning of
Extractor and Lyrics Transcriber for
Polyphonic Music . . . . . . . . . . . . 1968--1981
Zhicheng Lian and
Haonan Cheng and
Jiawan Zhang PQG-A2SA: Performance Quantification
Guided Audio-to-Score Alignment for
Orchestral Music . . . . . . . . . . . . 1982--1992
Jingen Ni and
Ningning Zhang and
Haofen Li Sparsity-Promoting Affine Projection
Algorithm With Periodically-Updated Gain
Matrix and Its Performance Analysis . . 1993--2003
Orchisama Das and
Sebastian J. Schlecht and
Enzo De Sena Grouped Feedback Delay Networks With
Frequency-Dependent Coupling . . . . . . 2004--2015
Xudong Zhao and
Gongping Huang and
Jingdong Chen and
Jacob Benesty Design of 2D and 3D Differential
Microphone Arrays With a Multistage
Framework . . . . . . . . . . . . . . . 2016--2031
Jia-Hao Hsu and
Jeremy Chang and
Min-Hsueh Kuo and
Chung-Hsien Wu Empathetic Response Generation Based on
Plug-and-Play Mechanism With Empathy
Perturbation . . . . . . . . . . . . . . 2032--2042
Aditya Dutt and
Paul Gader Wavelet Multiresolution Analysis Based
Speech Emotion Recognition System Using
1D CNN LSTM Networks . . . . . . . . . . 2043--2054
Arturo Morales and
Juan I. Yuz and
Juan P. Cortés and
Javier G. Fontanet and
Matías Zañartu Glottal Airflow Estimation Using Neck
Surface Acceleration and Low-Order
Kalman Smoothing . . . . . . . . . . . . 2055--2066
Yuya Hosoda and
Arata Kawamura and
Youji Iiguni Complex-Domain Pitch Estimation
Algorithm for Narrowband Speech Signals 2067--2078
Zhidong Liu and
Junhui Li and
Muhua Zhu Alleviating Exposure Bias for Neural
Machine Translation via Contextual
Augmentation and Self Distillation . . . 2079--2089
Hanan Beit-On and
Tom Shlomo and
Boaz Rafaely Weighted Frequency Smoothing for
Enhanced Speaker Localization . . . . . 2090--2099
Shan Gao and
Xihong Wu and
Tianshu Qu A Physical Model-Based Self-Supervised
Learning Method for Signal Enhancement
Under Reverberant Environment . . . . . 2100--2110
Xue Jiang and
Xiulian Peng and
Huaying Xue and
Yuan Zhang and
Yan Lu Latent-Domain Predictive Neural Speech
Coding . . . . . . . . . . . . . . . . . 2111--2123
Shumin Deng and
Jiacheng Yang and
Hongbin Ye and
Chuanqi Tan and
Mosha Chen and
Songfang Huang and
Fei Huang and
Huajun Chen and
Ningyu Zhang LOGEN: Few-Shot Logical
Knowledge-Conditioned Text Generation
With Self-Training . . . . . . . . . . . 2124--2133
Yuanzhi Liu and
Min He and
Qingqing Yang and
Gwanggil Jeon An Unsupervised Framework With Attention
Mechanism and Embedding Perturbed
Encoder for Non-Parallel Text Sentiment
Style Transfer . . . . . . . . . . . . . 2134--2144
Yang Ai and
Zhen-Hua Ling APNet: an All-Frame-Level Neural Vocoder
Incorporating Direct Prediction of
Amplitude and Phase Spectra . . . . . . 2145--2157
Fei Zhao and
Zhen Wu and
Liang He and
Xin-Yu Dai Label-Correction Capsule Network for
Hierarchical Text Classification . . . . 2158--2168
Cem Subakan and
Mirco Ravanelli and
Samuele Cornell and
François Grondin and
Mirko Bronzi Exploring Self-Attention Mechanisms for
Speech Separation . . . . . . . . . . . 2169--2180
Chenggang Zhang and
Jinjiang Liu and
Hao Li and
Xueliang Zhang Neural Multi-Channel and
Multi-Microphone Acoustic Echo
Cancellation . . . . . . . . . . . . . . 2181--2192
Zheng Liu and
Xin Kang and
Fuji Ren Dual-TBNet: Improving the Robustness of
Speech Features via
Dual-Transformer-BiLSTM for Speech
Emotion Recognition . . . . . . . . . . 2193--2203
Sandro Cumani and
Salvatore Sarni The Distributions of Uncalibrated
Speaker Verification Scores: a
Generative Model for Domain Mismatch and
Trial-Dependent Calibration . . . . . . 2204--2219
Xi Ai and
Bin Fang Cross-Modal Language Modeling in
Multi-Motion-Informed Context for Lip
Reading . . . . . . . . . . . . . . . . 2220--2232
Andreas Jonas Fuglsig and
Jesper Jensen and
Zheng-Hua Tan and
Lars Søndergaard Bertelsen and
Jens Christian Lindof and
Jan Østergaard Minimum Processing Near-End Listening
Enhancement . . . . . . . . . . . . . . 2233--2245
Zhiwen Xie and
Runjie Zhu and
Jin Liu and
Guangyou Zhou and
Jimmy Xiangji Huang TARGAT: a Time-Aware Relational Graph
Attention Model for Temporal Knowledge
Graph Embedding . . . . . . . . . . . . 2246--2258
Cuilian Zhang and
Derek F. Wong and
Eddy S. K. Lei and
Runzhe Zhan and
Lidia S. Chao Obscurity-Quantified Curriculum Learning
for Machine Translation Evaluation . . . 2259--2271
Yaxin Liu and
Yan Zhou and
Ziming Li and
Junlin Wang and
Wei Zhou and
Songlin Hu HIM: an End-to-End Hierarchical
Interaction Model for Aspect Sentiment
Triplet Extraction . . . . . . . . . . . 2272--2285
Yukoh Wakabayashi and
Kouei Yamaoka and
Nobutaka Ono Sound Field Interpolation for
Rotation-Invariant Multichannel Array
Signal Processing . . . . . . . . . . . 2286--2298
Jesper Kjær Nielsen and
Mads Græsbøll Christensen and
Jesper Bünsow Boldt An Analysis of Traditional Noise Power
Spectral Density Estimators Based on the
Gaussian Stochastic Volatility Model . . 2299--2313
Karen Gissell Rosero Jacome and
Felipe Leonel Grijalva and
Bruno Sanches Masiero Sound Events Localization and Detection
Using Bio-Inspired Gammatone Filters and
Temporal Convolutional Neural Networks 2314--2324
Lin Yuan and
Guoheng Huang and
Fenghuan Li and
Xiaochen Yuan and
Chi-Man Pun and
Guo Zhong RBA-GCN: Relational Bilevel Aggregation
Graph Convolutional Network for Emotion
Recognition . . . . . . . . . . . . . . 2325--2337
Samuel Poirot and
Stefan Bilbao and
Mitsuko Aramaki and
Sølvi Ystad and
Richard Kronland-Martinet A Perceptually Evaluated Signal Model:
Collisions Between a Vibrating Object
and an Obstacle . . . . . . . . . . . . 2338--2350
Julius Richter and
Simon Welker and
Jean-Marie Lemercier and
Bunlong Lay and
Timo Gerkmann Speech Enhancement and Dereverberation
With Diffusion-Based Generative Models 2351--2364
Siarhei Y. Barysenka and
Vasili I. Vorobiov SNR-Based Inter-Component Phase
Estimation Using Bi-Phase Prior
Statistics for Single-Channel Speech
Enhancement . . . . . . . . . . . . . . 2365--2381
Jiandian Zeng and
Jiantao Zhou and
Caishi Huang Exploring Semantic Relations for Social
Media Sentiment Analysis . . . . . . . . 2382--2394
Fotios Drakopoulos and
Sarah Verhulst A Neural-Network Framework for the
Design of Individualised Hearing-Loss
Compensation . . . . . . . . . . . . . . 2395--2409
Xinbei Ma and
Zhuosheng Zhang and
Hai Zhao Enhanced Speaker-Aware Multi-Party
Multi-Turn Dialogue Comprehension . . . 2410--2423
Tianrui Wang and
Weibin Zhu and
Yingying Gao and
Shilei Zhang and
Junlan Feng Harmonic Attention for Monaural Speech
Enhancement . . . . . . . . . . . . . . 2424--2436
Lei Lei and
Guoshun Yuan and
Hongjiang Yu and
Dewei Kong and
Yuefeng He Multilingual Customized Keyword Spotting
Using Similar-Pair Contrastive Learning 2437--2447
Shaokai Li and
Peng Song and
Wenming Zheng Multi-Source Discriminant Subspace
Alignment for Cross-Domain Speech
Emotion Recognition . . . . . . . . . . 2448--2460
Yeqing Ren and
Haipeng Peng and
Lixiang Li and
Xiaopeng Xue and
Yang Lan and
Yixian Yang Generalized Voice Spoofing Detection via
Integral Knowledge Amalgamation . . . . 2461--2475
Xing Chen and
Jie Wang and
Xiao-Lei Zhang and
Wei-Qiang Zhang and
Kunde Yang LMD: a Learnable Mask Network to Detect
Adversarial Examples for Speaker
Verification . . . . . . . . . . . . . . 2476--2490
Benjamin Yen and
Yameizhen Li and
Yusuke Hioka Rotor Noise-Aware Noise Covariance
Matrix Estimation for Unmanned Aerial
Vehicle Audition . . . . . . . . . . . . 2491--2506
Xuechen Liu and
Xin Wang and
Md Sahidullah and
Jose Patino and
Héctor Delgado and
Tomi Kinnunen and
Massimiliano Todisco and
Junichi Yamagishi and
Nicholas Evans and
Andreas Nautsch and
Kong Aik Lee ASVspoof 2021: Towards Spoofed and
Deepfake Speech Detection in the Wild 2507--2522
Zalán Borsos and
Raphaël Marinier and
Damien Vincent and
Eugene Kharitonov and
Olivier Pietquin and
Matt Sharifi and
Dominik Roblek and
Olivier Teboul and
David Grangier and
Marco Tagliasacchi and
Neil Zeghidour AudioLM: a Language Modeling Approach to
Audio Generation . . . . . . . . . . . . 2523--2533
Xingfeng Li and
Xiaohan Shi and
Desheng Hu and
Yongwei Li and
Qingchen Zhang and
Zhengxia Wang and
Masashi Unoki and
Masato Akagi Music Theory-Inspired Acoustic
Representation for Speech Emotion
Recognition . . . . . . . . . . . . . . 2534--2547
Jiachen Lian and
Chunlei Zhang and
Gopala K. Anumanchipalli and
Dong Yu Unsupervised TTS Acoustic Modeling for
TTS With Conditional Disentangled
Sequential VAE . . . . . . . . . . . . . 2548--2557
Arsalan Malik and
Nipun Agarwal and
Harshavardhan Settibhaktini and
Ananthakrishna Chintanpalli Predicting Level-Dependent Changes in
Concurrent Vowel Scores Using the
2D-CNN Models . . . . . . . . . . . . . 2558--2566
Michael Krause and
Meinard Müller Hierarchical Classification for
Instrument Activity Detection in
Orchestral Music Recordings . . . . . . 2567--2578
Julie Meyer and
Sebastian Prepeliță and
Ali Khajeh-Saeed and
Michael Smirnov and
Pablo Hoffmann Verification on Head-Related Transfer
Functions of a Snowman Model Simulated
Using the Finite-Difference Time-Domain
Method . . . . . . . . . . . . . . . . . 2579--2591
Darius Petermann and
Gordon Wichern and
Aswin Shanmugam Subramanian and
Zhong-Qiu Wang and
Jonathan Le Roux Tackling the Cocktail Fork Problem for
Separation and Transcription of
Real-World Soundtracks . . . . . . . . . 2592--2605
Hailong Cao and
Liguo Li and
Conghui Zhu and
Muyun Yang and
Tiejun Zhao Dual Word Embedding for Robust
Unsupervised Bilingual Lexicon Induction 2606--2615
Lin Xiao and
Pengyu Xu and
Mingyang Song and
Huafeng Liu and
Liping Jing and
Xiangliang Zhang Triple Alliance Prototype Orthotist
Network for Long-Tailed Multi-Label Text
Classification . . . . . . . . . . . . . 2616--2628
Juhua Liu and
Qihuang Zhong and
Liang Ding and
Hua Jin and
Bo Du and
Dacheng Tao Unified Instance and Knowledge Alignment
Pretraining for Aspect-Based Sentiment
Analysis . . . . . . . . . . . . . . . . 2629--2642
Yiming Zhang and
Hong Yu and
Ruoyi Du and
Zheng-Hua Tan and
Wenwu Wang and
Zhanyu Ma and
Yuan Dong ACTUAL: Audio Captioning With Caption
Feature Space Regularization . . . . . . 2643--2657
Jakob Abeßer and
Sascha Grollmisch and
Meinard Müller How Robust are Audio Embeddings for
Polyphonic Sound Event Tagging? . . . . 2658--2667
Wei Xia and
John H. L. Hansen Attention and DCT Based Global Context
Modeling for Text-Independent Speaker
Recognition . . . . . . . . . . . . . . 2668--2679
Takuya Hasumi and
Tomohiko Nakamura and
Norihiro Takamune and
Hiroshi Saruwatari and
Daichi Kitamura and
Yu Takahashi and
Kazunobu Kondo PoP-IDLMA: Product-of-Prior Independent
Deeply Learned Matrix Analysis for
Multichannel Music Source Separation . . 2680--2694
Ben Liu and
Jun Wang and
Guanyuan Yu and
Shaolei Chen CUPVC: a Constraint-Based Unsupervised
Prosody Transfer for Improving Telephone
Banking Services . . . . . . . . . . . . 2695--2706
Guinan Li and
Jiajun Deng and
Mengzhe Geng and
Zengrui Jin and
Tianzi Wang and
Shujie Hu and
Mingyu Cui and
Helen Meng and
Xunying Liu Audio-Visual End-to-End Multi-Channel
Speech Separation, Dereverberation and
Recognition . . . . . . . . . . . . . . 2707--2723
Jean-Marie Lemercier and
Julius Richter and
Simon Welker and
Timo Gerkmann StoRM: a Diffusion-Based Stochastic
Regeneration Model for Speech
Enhancement and Dereverberation . . . . 2724--2737
Yen-Ju Lu and
Chia-Yu Chang and
Cheng Yu and
Ching-Feng Liu and
Jeih-weih Hung and
Shinji Watanabe and
Yu Tsao Improving Speech Enhancement Performance
by Leveraging Contextual Broad Phonetic
Class Information . . . . . . . . . . . 2738--2750
Sungjae Kim and
Yewon Kim and
Jewoo Jun and
Injung Kim MuSE-SVS: Multi-Singer Emotional Singing
Voice Synthesizer That Controls
Emotional Intensity . . . . . . . . . . 2751--2764
Xinxin Su and
Zhen Huang and
Yunxiang Zhao and
Yifan Chen and
Yong Dou and
Hengyue Pan Recent Trends in Deep Learning Based
Textual Emotion Cause Extraction . . . . 2765--2786
Junyu Lu and
Hongfei Lin and
Xiaokun Zhang and
Zhaoqing Li and
Tongyue Zhang and
Linlin Zong and
Fenglong Ma and
Bo Xu Hate Speech Detection via Dual
Contrastive Learning . . . . . . . . . . 2787--2795
Diego Marques do Carmo and
Ricardo A. Borsoi and
Márcio Holsbach Costa Closed-Form Solution to the Multichannel
Wiener Filter With Interaural Level
Difference Preservation . . . . . . . . 2796--2811
Ya-Jie Zhang and
Chao Zhang and
Wei Song and
Zhengchen Zhang and
Youzheng Wu and
Xiaodong He Prosody Modelling With Pre-Trained
Cross-Utterance Representations for
Improved Speech Synthesis . . . . . . . 2812--2823
Ching-Yu Chiu and
Meinard Müller and
Matthew E. P. Davies and
Alvin Wen-Yu Su and
Yi-Hsuan Yang Local Periodicity-Based Beat Tracking
for Expressive Classical Piano Music . . 2824--2835
Feng Chen and
Ke Ma and
Yapeng Mao and
Desen Yang and
Yi Zhang and
Jie Shi and
Shiqi Mo and
Gui Chenyang and
Song Li A Novel Method to Design Steerable
Differential Beamformer Using Linear
Acoustics Vector Sensor Array . . . . . 2836--2849
Tianyu Huang and
Weisheng Dong and
Fangfang Wu and
Xin Li and
Guangming Shi Uncertainty-Driven Knowledge
Distillation for Language Model
Compression . . . . . . . . . . . . . . 2850--2858
Andrés Carofilis and
Enrique Alegre and
Eduardo Fidalgo and
Laura Fernández-Robles Improvement of Accent Classification
Models Through Grad-Transfer From
Spectrograms and Gradient-Weighted Class
Activation Mapping . . . . . . . . . . . 2859--2871
Jacob Hollebon and
Filippo Maria Fazi Higher-Order Stereophony . . . . . . . . 2872--2885
Jeremy H. M. Wong and
Huayun Zhang and
Nancy F. Chen Modelling Inter-Rater Uncertainty in
Spoken Language Assessment . . . . . . . 2886--2898
Qinghua Zheng and
Yuefei Wu and
Guangtao Wang and
Yanping Chen and
Wei Wu and
Zai Zhang and
Bin Shi and
Bo Dong Exploring Interactive and Contrastive
Relations for Nested Named Entity
Recognition . . . . . . . . . . . . . . 2899--2909
Dongyuan Shi and
Woon-Seng Gan and
Bhan Lam and
Zhengding Luo and
Xiaoyi Shen Transferable Latent of CNN-Based
Selective Fixed-Filter Active Noise
Control . . . . . . . . . . . . . . . . 2910--2921
Dorian Desblancs and
Vincent Lostanlen and
Romain Hennequin Zero-Note Samba: Self-Supervised Beat
Tracking . . . . . . . . . . . . . . . . 2922--2934
Nankai Lin and
Yingwen Fu and
Xiaotian Lin and
Dong Zhou and
Aimin Yang and
Shengyi Jiang CL-XABSA: Contrastive Learning for
Cross-Lingual Aspect-Based Sentiment
Analysis . . . . . . . . . . . . . . . . 2935--2946
Hanmeng Liu and
Jian Liu and
Leyang Cui and
Zhiyang Teng and
Nan Duan and
Ming Zhou and
Yue Zhang LogiQA 2.0 --- an Improved Dataset for
Logical Reasoning in Natural Language
Understanding . . . . . . . . . . . . . 2947--2962
Jiangyan Yi and
Jianhua Tao and
Ruibo Fu and
Tao Wang and
Chu Yuan Zhang and
Chenglong Wang Adversarial Multi-Task Learning for
Mandarin Prosodic Boundary Prediction
With Multi-Modal Embeddings . . . . . . 2963--2973
Ji Won Yoon and
Hyung Yong Kim and
Hyeonseung Lee and
Sunghwan Ahn and
Nam Soo Kim Oracle Teacher: Leveraging Target
Information for Better Knowledge
Distillation of CTC Models . . . . . . . 2974--2987
Sufeng Duan and
Hai Zhao and
Dongdong Zhang Syntax-Aware Data Augmentation for
Neural Machine Translation . . . . . . . 2988--2999
Tongzheng Liu and
Zhihua Lu and
João Paulo J. da Costa and
Tai Fei A Hybrid Reverberation Model and Its
Application to Joint Speech
Dereverberation and Separation . . . . . 3000--3014
Junjun Guo and
Junjie Ye and
Yan Xiang and
Zhengtao Yu Layer-Level Progressive Transformer With
Modality Difference Awareness for
Multi-Modal Neural Machine Translation 3015--3026
Qian Tao and
Zhihao Xiong and
Bocheng Han and
Xiaoyang Fan and
Lusi Li A Novel Unsupervised Approach for
Cross-Lingual Word Alignment in Low
Isomorphic Embedding Spaces . . . . . . 3027--3041
Jilu Jin and
Jacob Benesty and
Jingdong Chen and
Gongping Huang Differential Beamforming From a
Geometric Perspective . . . . . . . . . 3042--3054
Alberto Palomo-Alonso and
David Casillas-Pérez and
Silvia Jiménez-Fernández and
Jose A. Portilla-Figueras and
Sancho Salcedo-Sanz A Flexible Architecture Using Temporal,
Spatial and Semantic Correlation-Based
Algorithms for Story Segmentation of
Broadcast News . . . . . . . . . . . . . 3055--3069
Bolaji Yusuf and
Jan Černocký and
Murat Saraçlar End-to-End Open Vocabulary Keyword
Search With Multilingual Neural
Representations . . . . . . . . . . . . 3070--3080
Adrian Herzog and
Srikanth Raj Chetupalli and
Emanuël A. P. Habets AmbiSep: Joint Ambisonic-to-Ambisonic
Speech Separation and Noise Reduction 3081--3094
Po-chun Hsu and
Da-rong Liu and
Andy T. Liu and
Hung-yi Lee Parallel Synthesis for Autoregressive
Speech Generation . . . . . . . . . . . 3095--3111
Siddharth Dalmia and
Dmytro Okhonko and
Mike Lewis and
Sergey Edunov and
Shinji Watanabe and
Florian Metze and
Luke Zettlemoyer and
Abdelrahman Mohamed LegoNN: Building Modular Encoder-Decoder
Models . . . . . . . . . . . . . . . . . 3112--3126
Tom Gajecki and
Waldo Nogueira Deep Latent Fusion Layers for Binaural
Speech Enhancement . . . . . . . . . . . 3127--3138
Huawen Feng and
Zhenxi Lin and
Qianli Ma Perturbation-Based Self-Supervised
Attention for Attention Bias in Text
Classification . . . . . . . . . . . . . 3139--3151
Jiaxin Zhong and
Tao Zhuang and
Mengtong Li and
Ray Kirby and
Mahmoud Karimi and
Jing Lu and
Dong Zhang Sidelobe Suppression for a Steerable
Parametric Source Using the Sparse
Random Array Technique . . . . . . . . . 3152--3161
Yan Fang and
Wei Lu and
Xiaodong Liu and
Witold Pedrycz and
Qi Lang and
Jianhua Yang CircularE: a Complex Space Circular
Correlation Relational Model for Link
Prediction in Knowledge Graph Embedding 3162--3175
Jie Zhang and
Rui Tao and
Jun Du and
Li-Rong Dai SDW-SWF: Speech Distortion Weighted
Single-Channel Wiener Filter for Noise
Reduction . . . . . . . . . . . . . . . 3176--3189
Haozhou Li and
Qinke Peng and
Xu Mou and
Ying Wang and
Zeyuan Zeng and
Muhammad Fiaz Bashir Abstractive Financial News Summarization
via Transformer-BiLSTM Encoder and Graph
Attention-Based Decoder . . . . . . . . 3190--3205
Weitao Yuan and
Shengbei Wang and
Jianming Wang and
Masashi Unoki and
Wenwu Wang Unsupervised Deep Unfolded
Representation Learning for Singing
Voice Separation . . . . . . . . . . . . 3206--3220
Zhong-Qiu Wang and
Samuele Cornell and
Shukjae Choi and
Younglo Lee and
Byeong-Yeol Kim and
Shinji Watanabe TF-GridNet: Integrating Full- and
Sub-Band Modeling for Speech Separation 3221--3236
Marvin Tammen and
Simon Doclo Parameter Estimation Procedures for Deep
Multi-Frame MVDR Filtering for
Single-Microphone Speech Enhancement . . 3237--3248
Yi Lin and
Qingyang Wang and
Xincheng Yu and
Zichen Zhang and
Dongyue Guo and
Jizhe Zhou Towards Recognition for Radio-Echo
Speech in Air Traffic Control: Dataset
and a Contrastive Learning Approach . . 3249--3262
Diego Caviedes-Nozal and
Efren Fernandez-Grande Spatio-Temporal Bayesian Regression for
Room Impulse Response Reconstruction
With Spherical Waves . . . . . . . . . . 3263--3277
Xinyu Hu and
Xiaojun Wan RST Discourse Parsing as Text-to-Text
Generation . . . . . . . . . . . . . . . 3278--3289
Shun Lei and
Yixuan Zhou and
Liyang Chen and
Zhiyong Wu and
Xixin Wu and
Shiyin Kang and
Helen Meng MSStyleTTS: Multi-Scale Style Modeling
With Hierarchical Context Information
for Expressive Speech Synthesis . . . . 3290--3303
Pedro Izquierdo Lehmann and
Rodrigo F. Cádiz and
Carlos A. Sing Long Towards Maximizing a Perceptual
Sweet Spot for Spatial Sound With
Loudspeakers . . . . . . . . . . . . . . 3304--3319
Han Zhu and
Dongji Gao and
Gaofeng Cheng and
Daniel Povey and
Pengyuan Zhang and
Yonghong Yan Alternative Pseudo-Labeling for
Semi-Supervised Automatic Speech
Recognition . . . . . . . . . . . . . . 3320--3330
Junqing Zhang and
Liming Shi and
Mads Græsbøll Christensen and
Wen Zhang and
Lijun Zhang and
Jingdong Chen CGMM-Based Sound Zone Generation Using
Robust Pressure Matching With ATF
Perturbation Constraints . . . . . . . . 3331--3345
Erfan Loweimi and
Andrea Carmantini and
Peter Bell and
Steve Renals and
Zoran Cvetkovic Phonetic Error Analysis Beyond Phone
Error Rate . . . . . . . . . . . . . . . 3346--3361
Runxuan Yang and
Yuyang Peng and
Xiaolin Hu A Fast High-Fidelity Source-Filter
Vocoder With Lightweight Neural Modules 3362--3373
Yuxiang Zhang and
Zhuo Li and
Jingze Lu and
Hua Hua and
Wenchao Wang and
Pengyuan Zhang The Impact of Silence on Speech
Anti-Spoofing . . . . . . . . . . . . . 3374--3389
Philippe Gonzalez and
Tommy Sonne Alstrøm and
Tobias May Assessing the Generalization Gap of
Learning-Based Speech Enhancement
Systems in Noisy and Reverberant
Environments . . . . . . . . . . . . . . 3390--3403
Ziyi Xu and
Ziyue Zhao and
Tim Fingscheidt Coded Speech Quality Measurement by a
Non-Intrusive PESQ-DNN . . . . . . . . . 3404--3417
Tao Li and
Chenxu Hu and
Jian Cong and
Xinfa Zhu and
Jingbei Li and
Qiao Tian and
Yuping Wang and
Lei Xie DiCLET-TTS: Diffusion Model Based
Cross-Lingual Emotion Transfer for
Text-to-Speech --- a Study Between
English and Mandarin . . . . . . . . . . 3418--3430
Xuexin Xu and
Liang Shi and
Xunquan Chen and
Pingyuan Lin and
Jie Lian and
Jinhui Chen and
Zhihong Zhang and
Edwin R. Hancock Any-to-Any Voice Conversion With
Multi-Layer Speaker Adaptation and
Content Supervision . . . . . . . . . . 3431--3445
Chenpeng Du and
Yiwei Guo and
Xie Chen and
Kai Yu Speaker Adaptive Text-to-Speech With
Timbre-Normalized Vector-Quantized
Feature . . . . . . . . . . . . . . . . 3446--3456
Yash Kumar Atri and
Vikram Goyal and
Tanmoy Chakraborty Multi-Document Summarization Using
Selective Attention Span and
Reinforcement Learning . . . . . . . . . 3457--3467
Maochun Huang and
Chunmei Qing and
Junpeng Tan and
Xiangmin Xu Context-Based Adaptive Multimodal Fusion
Network for Continuous Frame-Level
Sentiment Prediction . . . . . . . . . . 3468--3477
Sebastian J. Schlecht and
Jon Fagerström and
Vesa Välimäki Decorrelation in Feedback Delay Networks 3478--3487
Jinliang Lu and
Jiajun Zhang Towards Unified Multi-Domain Machine
Translation With Mixture of Domain
Experts . . . . . . . . . . . . . . . . 3488--3498
Julien Hauret and
Thomas Joubaud and
Véronique Zimpfer and
Éric Bavu Configurable EBEN: Extreme Bandwidth
Extension Network to Enhance
Body-Conducted Speech Capture . . . . . 3499--3512
Wanli Peng and
Sheng Li and
Zhenxing Qian and
Xinpeng Zhang Text Steganalysis Based on Hierarchical
Supervised Learning and Dual Attention
Mechanism . . . . . . . . . . . . . . . 3513--3526
Lin Xu and
Qixian Zhou and
Jinlan Fu and
See-Kiong Ng CET2: Modelling Topic Transitions for
Coherent and Engaging Knowledge-Grounded
Conversations . . . . . . . . . . . . . 3527--3536
Vincent W. Neo and
Christine Evers and
Stephan Weiss and
Patrick A. Naylor Signal Compaction Using Polynomial EVD
for Spherical Array Processing With
Applications . . . . . . . . . . . . . . 3537--3549
Gerald Enzner and
Svantje Voit Hybrid-Frequency-Resolution Adaptive
Kalman Filter for Online Identification
of Long Acoustic Responses With Low
Input-Output Latency . . . . . . . . . . 3550--3563
Shang Gao and
Maoshen Jia and
Dingding Yao and
Jing Wang Multi-Source Localization Using
Optimized Time-Frequency Representation
and Sparsity Component Analysis . . . . 3564--3578
Qi He and
Mingjie Gao and
Ka Fai Cedric Yiu and
Sven Nordholm Distributed Microphone Array
Localization Problem via SDP-SOCP Method 3579--3588
Hiroshi Sawada and
Rintaro Ikeshita and
Keisuke Kinoshita and
Tomohiro Nakatani Multi-Frame Full-Rank Spatial Covariance
Analysis for Underdetermined Blind
Source Separation and Dereverberation 3589--3602
Hongyang Chang and
Hongfei Xu and
Josef van Genabith and
Deyi Xiong and
Hongying Zan JoinER-BART: Joint Entity and Relation
Extraction With Constrained Decoding,
Representation Reuse and Fusion . . . . 3603--3616
Xinqi Huang and
Yingsong Li and
Yuriy Zakharov and
Yongchun Miao and
Zhixiang Huang Squared Sine Adaptive Algorithm and Its
Performance Analysis . . . . . . . . . . 3617--3628
Andong Li and
Guochen Yu and
Chengshi Zheng and
Wenzhe Liu and
Xiaodong Li A General Unfolding Speech Enhancement
Method Motivated by Taylor's Theorem . . 3629--3646
Bin Gu and
Jie Zhang and
Wu Guo A Dynamic Convolution Framework for
Session-Independent Speaker Embedding
Learning . . . . . . . . . . . . . . . . 3647--3658
Daojian Zeng and
Chao Zhao and
Chao Jiang and
Jianling Zhu and
Jianhua Dai Document-Level Relation Extraction With
Context Guided Mention Integration and
Inter-Pair Reasoning . . . . . . . . . . 3659--3666
Lu Li and
Maoshen Jia and
Jing Wang and
Ruiyuan Cao Multiple-Speech-Source DOA Estimation
Based on Single-Source Cluster Detection 3667--3680
Xiaoxiao Miao and
Xin Wang and
Erica Cooper and
Junichi Yamagishi and
Natalia Tomashenko Speaker Anonymization Using Orthogonal
Householder Neural Network . . . . . . . 3681--3695
Zhengshan Xue and
Xiaolei Zhang and
Tingxun Shi and
Deyi Xiong DetTrans: a Lightweight Framework to
Detect and Translate Noisy Inputs
Simultaneously . . . . . . . . . . . . . 3696--3705
Chang Liu and
Zhen-Hua Ling and
Ling-Hui Chen Pronunciation Dictionary-Free
Multilingual Speech Synthesis Using
Learned Phonetic Representations . . . . 3706--3716
Reo Yoneyama and
Yi-Chiao Wu and
Tomoki Toda High-Fidelity and Pitch-Controllable
Neural Vocoder Based on Unified
Source-Filter Networks . . . . . . . . . 3717--3729
Stefan Thaleiser and
Gerald Enzner Binaural-Projection Multichannel Wiener
Filter for Cue-Preserving Binaural
Speech Enhancement . . . . . . . . . . . 3730--3745
Yixin Wang and
Wei Wei and
Xiangming Gu and
Xiaohong Guan and
Ye Wang Disentangled Adversarial Domain
Adaptation for Phonation Mode Detection
in Singing and Speech . . . . . . . . . 3746--3759
Yixuan Zhang and
Heming Wang and
DeLiang Wang $ F0 $ Estimation and Voicing Detection
With Cascade Architecture in Noisy
Speech . . . . . . . . . . . . . . . . . 3760--3770
Zhengdao Zhao and
Yuhua Wang and
Guang Shen and
Yuezhu Xu and
Jiayuan Zhang TDFNet: Transformer-Based Deep-Scale
Fusion Network for Multimodal Emotion
Recognition . . . . . . . . . . . . . . 3771--3782
Johannes M. Arend and
Christoph Pörschmann and
Stefan Weinzierl and
Fabian Brinkmann Magnitude-Corrected and Time-Aligned
Interpolation of Head-Related Transfer
Functions . . . . . . . . . . . . . . . 3783--3799
Desh Raj and
Daniel Povey and
Sanjeev Khudanpur SURT 2.0: Advances in Transducer-Based
Multi-Talker Speech Recognition . . . . 3800--3813
Jiaming An and
Zixiang Ding and
Ke Li and
Rui Xia Global-View and Speaker-Aware Emotion
Cause Extraction in Conversations . . . 3814--3823
Yuqin Lin and
Longbiao Wang and
Yanbing Yang and
Jianwu Dang CFDRN: a Cognition-Inspired Feature
Decomposition and Recombination Network
for Dysarthric Speech Recognition . . . 3824--3836
Rémi Blandin and
Simon Stone and
Angélique Remacle and
Vincent Didone and
Peter Birkholz A Comparative Study of $3$D and $1$D
Acoustic Simulations of the Higher
Frequencies of Speech . . . . . . . . . 3837--3847
Qing Wang and
Jixun Yao and
Li Zhang and
Pengcheng Guo and
Lei Xie Timbre-Reserved Adversarial Attack in
Speaker Identification . . . . . . . . . 3848--3858
Yachao Li and
Junhui Li and
Jing Jiang and
Shimin Tao and
Hao Yang and
Min Zhang P-Transformer: Towards Better
Document-to-Document Neural Machine
Translation . . . . . . . . . . . . . . 3859--3870
Chao Xie and
Tomoki Toda Noisy-to-Noisy Voice Conversion Under
Variations of Noisy Condition . . . . . 3871--3882
Zhichao Wang and
Xinsheng Wang and
Qicong Xie and
Tao Li and
Lei Xie and
Qiao Tian and
Yuping Wang MSM-VC: High-Fidelity Source Style
Transfer for Non-Parallel Voice
Conversion by Multi-Scale Style Modeling 3883--3895
Yilin Zhao and
Hai Zhao and
Sufeng Duan Multi-Grained Evidence Inference for
Multi-Choice Reading Comprehension . . . 3896--3907
Ye-Qian Du and
Jie Zhang and
Xin Fang and
Ming-Hui Wu and
Zhou-Wang Yang A Semi-Supervised Complementary Joint
Training Approach for Low-Resource
Speech Recognition . . . . . . . . . . . 3908--3921
Changheng Li and
Richard C. Hendriks Alternating Least-Squares-Based
Microphone Array Parameter Estimation
for a Single-Source Reverberant and
Noisy Acoustic Scenario . . . . . . . . 3922--3934
Kun Zhou and
Yuanhang Zhou and
Wayne Xin Zhao and
Ji-Rong Wen Learning to Perturb for Contrastive
Learning of Unsupervised Sentence
Representations . . . . . . . . . . . . 3935--3944
Georg Götz and
Sebastian J. Schlecht and
Ville Pulkki Common-Slope Modeling of Late
Reverberation . . . . . . . . . . . . . 3945--3957
Guanhua Chen and
Runzhe Zhan and
Derek F. Wong and
Lidia S. Chao Multi-Level Curriculum Learning for
Multi-Turn Dialogue Generation . . . . . 3958--3967
Yun-Yen Chuang and
Hung-Min Hsu and
Kevin Lin and
Ray-I. Chang and
Hung-Yi Lee MetaEx-GAN: Meta Exploration to Improve
Natural Language Generation via
Generative Adversarial Networks . . . . 3968--3980
Chuxuan Tong and
Xi Zheng and
Jianhua Li and
Xingjun Ma and
Longxiang Gao and
Yong Xiang Query-Efficient Black-Box Adversarial
Attacks on Automatic Speech Recognition 3981--3992
Xixin Wu and
Hui Lu and
Kun Li and
Zhiyong Wu and
Xunying Liu and
Helen Meng Hiformer: Sequence Modeling Networks
With Hierarchical Attention Mechanisms 3993--4003
Ante Wang and
Linfeng Song and
Lifeng Jin and
Junfeng Yao and
Haitao Mi and
Chen Lin and
Jinsong Su and
Dong Yu D$^2$PSG: Multi-Party Dialogue Discourse
Parsing as Sequence Generation . . . . . 4004--4013
Nan Gao and
Yongjian Wang and
Peng Chen and
Jijun Tang Boosting Short Text Classification by
Solving the OOV Problem . . . . . . . . 4014--4024
Jin Chu Wu and
Raghu N. Kacker Statistical Analysis for Speaker
Recognition Evaluation With Data
Dependence and Three Score Distributions 1--14
Yongwei Zhou and
Junwei Bao and
Youzheng Wu and
Xiaodong He and
Tiejun Zhao Operation-Augmented Numerical Reasoning
for Question Answering . . . . . . . . . 15--28
Anurenjan Purushothaman and
Debottam Dutta and
Rohit Kumar and
Sriram Ganapathy Speech Dereverberation With Frequency
Domain Autoregressive Modeling . . . . . 29--38
Leyuan Qu and
Taihao Li and
Cornelius Weber and
Theresa Pekarek-Rosin and
Fuji Ren and
Stefan Wermter Disentangling Prosody Representations
With Unsupervised Speech Reconstruction 39--54
Mathias Bach Pedersen and
Søren Holdt Jensen and
Zheng-Hua Tan and
Jesper Jensen Data-Driven Non-Intrusive Speech
Intelligibility Prediction Using Speech
Presence Probability . . . . . . . . . . 55--67
Yuanbo Hou and
Bo Kang and
Andrew Mitchell and
Wenwu Wang and
Jian Kang and
Dick Botteldooren Cooperative Scene-Event Modelling for
Acoustic Scene Classification . . . . . 68--82
Xiaotong Jiang and
Peiwen You and
Chen Chen and
Zhongqing Wang and
Guodong Zhou Exploring Scope Detection for
Aspect-Based Sentiment Analysis . . . . 83--94
Xuenan Xu and
Zeyu Xie and
Mengyue Wu and
Kai Yu Beyond the Status Quo: a Contemporary
Survey of Advances and Challenges in
Audio Captioning . . . . . . . . . . . . 95--112
Federico Miotello and
Mirco Pezzoli and
Luca Comanducci and
Fabio Antonacci and
Augusto Sarti Deep Prior-Based Audio Inpainting Using
Multi-Resolution Harmonic Convolutional
Neural Networks . . . . . . . . . . . . 113--123
Cristian-Lucian Stanciu and
Jacob Benesty and
Constantin Paleologu and
Ruxandra-Liana Costea and
Laura-Maria Dogariu and
Silviu Ciochină Decomposition-Based Wiener Filter Using
the Kronecker Product and Conjugate
Gradient Method . . . . . . . . . . . . 124--138
Huiyao Chen and
Yueheng Sun and
Meishan Zhang and
Min Zhang Automatic Noise Generation and Reduction
for Text Classification . . . . . . . . 139--150
Jiaming Xu and
Jian Cui and
Yunzhe Hao and
Bo Xu Multi-Cue Guided Semi-Supervised
Learning Toward Target Speaker
Separation in Real Environments . . . . 151--163
Yang Xiang and
Jesper Lisby Højvang and
Morten Højfeldt Rasmussen and
Mads Græsbøll Christensen A Two-Stage Deep Representation
Learning-Based Speech Enhancement Method
Using Variational Autoencoder and
Adversarial Training . . . . . . . . . . 164--177
Xiao Li and
Ruirui Liu and
Huichou Huang and
Qingyao Wu Contrastive Learning for Target Speaker
Extraction With Attention-Based Fusion 178--188
Xiaobo Liang and
Runze Mao and
Lijun Wu and
Juntao Li and
Min Zhang and
Qing Li Enhancing Low-Resource NLP by
Consistency Training With Data and Model
Perturbations . . . . . . . . . . . . . 189--199
Haisheng Lu and
Jiangnan Liang and
Chuang Shi Comments on ``Primary-Ambient Extraction
Using Ambient Spectrum Estimation for
Immersive Spatial Audio Reproduction'' 200--202
Szymon Drgas and
Lars Bramsløw and
Archontis Politis and
Gaurav Naithani and
Tuomas Virtanen Dynamic Processing Neural Network
Architecture for Hearing Loss
Compensation . . . . . . . . . . . . . . 203--214
Femke B. Gelderblom and
Tron Vedul Tronstad and
Torbjørn Svendsen and
Tor Andre Myrvoll On the Predictive Power of Objective
Intelligibility Metrics for the
Subjective Performance of Deep Complex
Convolutional Recurrent Speech
Enhancement Networks . . . . . . . . . . 215--226
Thomas Haubner and
Andreas Brendel and
Walter Kellermann End-to-End Deep Learning-Based
Adaptation Control for Linear Acoustic
Echo Cancellation . . . . . . . . . . . 227--238
Congcong Jiang and
Tieyun Qian and
Bing Liu One General Teacher for Multi-Data
Multi-Task: a New Knowledge Distillation
Framework for Discourse Relation
Analysis . . . . . . . . . . . . . . . . 239--249
Khandokar Md. Nayem and
Donald S. Williamson Attention-Based Speech Enhancement Using
Human Quality Perception Modeling . . . 250--260
Ying Zhang and
Fandong Meng and
Yufeng Chen and
Jinan Xu and
Jie Zhou Complex Question Enhanced Transfer
Learning for Zero-Shot Joint Information
Extraction . . . . . . . . . . . . . . . 261--275
Jingsong Yan and
Piji Li and
Haibin Chen and
Junhao Zheng and
Qianli Ma Does the Order Matter? A Random
Generative Way to Learn Label Hierarchy
for Hierarchical Text Classification . . 276--285
Georgios Paraskevopoulos and
Theodoros Kouzelis and
Georgios Rouvalis and
Athanasios Katsamanis and
Vassilis Katsouros and
Alexandros Potamianos Sample-Efficient Unsupervised Domain
Adaptation of Speech Recognition
Systems: a Case Study for Modern Greek 286--299
Ernesto Accolti and
Javier Gimenez and
Michael Vorländer Uncertainties of Room Acoustics
Simulation Due to Directivity Data of
Musical Instruments . . . . . . . . . . 300--309
Yoshiki Masuyama and
Kouei Yamaoka and
Yuma Kinoshita and
Taishi Nakashima and
Nobutaka Ono Causal and Relaxed-Distortionless
Response Beamforming for Online Target
Source Extraction . . . . . . . . . . . 310--324
Rohit Prabhavalkar and
Takaaki Hori and
Tara N. Sainath and
Ralf Schlüter and
Shinji Watanabe End-to-End Speech Recognition: a Survey 325--351
Yun Zhao and
Dexi Liu and
Changxuan Wan and
Xiping Liu and
Jian-yun Nie and
Jiaming Liu JMS-QA: a Joint Hierarchical
Architecture for Mental Health Question
Answering . . . . . . . . . . . . . . . 352--363
Shiwen Ni and
Jiawen Li and
Min Yang and
Hung-Yu Kao DropAttack: a Random Dropped Weight
Attack Adversarial Training for Natural
Language Understanding . . . . . . . . . 364--373
Tiantian Zhu and
Yang Qin and
Ming Feng and
Qingcai Chen and
Baotian Hu and
Yang Xiang BioPRO: Context-Infused Prompt Learning
for Biomedical Entity Linking . . . . . 374--385
Jiapu Wang and
Boyue Wang and
Junbin Gao and
Simin Hu and
Yongli Hu and
Baocai Yin Multi-Level Interaction Based Knowledge
Graph Completion . . . . . . . . . . . . 386--396
Qiangqiang Zhang and
Dongyuan Lin and
Yingying Xiao and
Yunfei Zheng and
Shiyuan Wang Error Reused Filtered-$X$ Least Mean
Square Algorithm for Active Noise
Control . . . . . . . . . . . . . . . . 397--412
Zengrui Jin and
Mengzhe Geng and
Jiajun Deng and
Tianzi Wang and
Shujie Hu and
Guinan Li and
Xunying Liu Personalized Adversarial Data
Augmentation for Dysarthric and Elderly
Speech Recognition . . . . . . . . . . . 413--429
Jun Kong and
Jin Wang and
Xuejie Zhang Adaptive Ensemble Self-Distillation With
Consistent Gradients for Fast Inference
of Pretrained Language Models . . . . . 430--442
Srđan Kitić and
Jérôme Daniel Blind Identification of Ambisonic
Reduced Room Impulse Response . . . . . 443--458
Qijie Shao and
Pengcheng Guo and
Jinghao Yan and
Pengfei Hu and
Lei Xie Decoupling and Interacting Multi-Task
Learning Network for Joint Speech and
Accent Recognition . . . . . . . . . . . 459--470
Han Zhu and
Gaofeng Cheng and
Jindong Wang and
Wenxin Hou and
Pengyuan Zhang and
Yonghong Yan Boosting Cross-Domain Speech Recognition
With Self-Supervision . . . . . . . . . 471--485
Yile Wang and
Yue Zhang and
Peng Li and
Yang Liu Gradual Syntactic Label Replacement for
Language Model Pre-Training . . . . . . 486--496
Penghui Ma and
Jianfeng Li and
Jingjing Pan and
Xiaofei Zhang and
Roberto Gil-Pita Coherent Signal DOA Estimation With
Coprime Array: Exploiting Signal
Subspace Reconstructing Strategy . . . . 497--508
Emma Hamel and
Nickvash Kani Factors That Influence Automatic
Recognition of African-American
Vernacular English in Machine-Learning
Models . . . . . . . . . . . . . . . . . 509--516
Jingbei Li and
Sipan Li and
Ping Chen and
Luwen Zhang and
Yi Meng and
Zhiyong Wu and
Helen Meng and
Qiao Tian and
Yuping Wang and
Yuxuan Wang Joint Multiscale Cross-Lingual Speaking
Style Transfer With Bidirectional
Attention Mechanism for Automatic
Dubbing . . . . . . . . . . . . . . . . 517--528
Bing Han and
Zhengyang Chen and
Yanmin Qian Self-Supervised Learning With
Cluster-Aware-DINO for High-Performance
Robust Speaker Verification . . . . . . 529--541
Kristina Tesch and
Timo Gerkmann Multi-Channel Speech Separation Using
Spatially Selective Deep Non-Linear
Filters . . . . . . . . . . . . . . . . 542--553
Hao-Chen Pei and
Hao Fang and
Xin Luo and
Xin-Shun Xu Gradformer: a Framework for Multi-Aspect
Multi-Granularity Pronunciation
Assessment . . . . . . . . . . . . . . . 554--563
Garima Sharma and
Karthikeyan Umapathy and
Sridhar Krishnan Time-Frequency Scattergrams for
Biomedical Audio Signal Representation
and Classification . . . . . . . . . . . 564--576
Zhibo Man and
Zengcheng Huang and
Yujie Zhang and
Yu Li and
Yuanmeng Chen and
Yufeng Chen and
Jinan Xu WDSRL: Multi-Domain Neural Machine
Translation With Word-Level
Domain-Sensitive Representation Learning 577--590
Chin-Po Chen and
Ho-Hsien Pan and
Susan Shur-Fen Gau and
Chi-Chun Lee Using Measures of Vowel Space for
Autistic Traits Characterization . . . . 591--607
Kevin Wilkinghoff and
Frank Kurth Why Do Angular Margin Losses Work Well
for Semi-Supervised Anomalous Sound
Detection? . . . . . . . . . . . . . . . 608--622
Aku Rouhe and
Tamás Grósz and
Mikko Kurimo Principled Comparisons for End-to-End
Speech Recognition: Attention vs Hybrid
at the $1000$-Hour Scale . . . . . . . 623--638
Yile Wang and
Yue Zhang Lost in Context? On the Sense-Wise
Variance of Contextualized Word
Embeddings . . . . . . . . . . . . . . . 639--650
Christoph Hold and
Ville Pulkki and
Archontis Politis and
Leo McCormack Compression of Higher-Order Ambisonic
Signals Using Directional Audio Coding 651--665
Shouhui Wang and
Biao Qin A Novel Joint Training Model for
Knowledge Base Question Answering . . . 666--679
Songbin Li and
Jingang Wang and
Peng Liu and
Ke Shi SANet: a Compressed Speech Encoder and
Steganography Algorithm Independent
Steganalysis Deep Neural Network . . . . 680--690
Tarek Kanan and
Amani AbedAlghafer and
Shadi AlZu'bi and
Bilal Hawashin and
Ala Mughaid and
Ghassan Kanaan and
M. M. Kamruzzaman An Intelligent Health Care System for
Detecting Drug Abuse in Social Media
Platforms Based on Low Resource Language 691--703
Alejandro Santorum Varela and
Svetlana Stoyanchev and
Simon Keizer and
Rama Doddipatla and
Kate Knill Entity Resolution in Situated Dialog
With Unimodal and Multimodal
Transformers . . . . . . . . . . . . . . 704--713
Huang He and
Hua Lu and
Siqi Bao and
Fan Wang and
Hua Wu and
Zheng-Yu Niu and
Haifeng Wang Learning to Select External Knowledge
With Multi-Scale Negative Sampling . . . 714--720
Hua Lu and
Zhen Guo and
Chanjuan Li and
Yunyi Yang and
Huang He and
Siqi Bao Towards Building an Open-Domain Dialogue
System Incorporated With Internet Memes 721--726
Jungwoo Lim and
Taesun Whang and
Dongyub Lee and
Heuiseok Lim Adaptive Multi-Domain Dialogue State
Tracking on Spoken Conversations . . . . 727--732
David Thulke and
Nico Daheim and
Christian Dugast and
Hermann Ney Task-Oriented Document-Grounded Dialog
Systems by HLTPR@RWTH for DSTC9 and
DSTC10 . . . . . . . . . . . . . . . . . 733--741
Han Wu and
Kun Xu and
Linqi Song Structure-Aware Dialogue Modeling
Methods for Conversational Semantic Role
Labeling . . . . . . . . . . . . . . . . 742--752
Zhe Chen and
Hongcheng Liu and
Yu Wang DialogMCF: Multimodal Context Flow for
Audio Visual Scene-Aware Dialog . . . . 753--764
Koichiro Yoshino and
Yun-Nung Chen and
Paul Crook and
Satwik Kottur and
Jinchao Li and
Behnam Hedayatnia and
Seungwhan Moon and
Zhengcong Fei and
Zekang Li and
Jinchao Zhang and
Yang Feng and
Jie Zhou and
Seokhwan Kim and
Yang Liu and
Di Jin and
Alexandros Papangelis and
Karthik Gopalakrishnan and
Dilek Hakkani-Tur and
Babak Damavandi and
Alborz Geramifard and
Chiori Hori and
Ankit Shah and
Chen Zhang and
Haizhou Li and
João Sedoc and
Luis F. D'Haro and
Rafael Banchs and
Alexander Rudnicky Overview of the Tenth Dialog System
Technology Challenge: DSTC10 . . . . . . 765--778
Shekhar Kumar Yadav and
Nithin V. George Joint Dereverberation and Beamforming
With Blind Estimation of the Shape
Parameter of the Desired Source Prior 779--793
Yanxiong Li and
Zhongjie Jiang and
Qisheng Huang and
Wenchang Cao and
Jialong Li Lightweight Speaker Verification Using
Transformation Module With Feature
Partition and Fusion . . . . . . . . . . 794--806
Yuhan Dai and
Zhirui Zhang and
Yichao Du and
Shengcai Liu and
Lemao Liu and
Tong Xu Datastore Distillation for Nearest
Neighbor Machine Translation . . . . . . 807--817
Changtao Li and
Feiran Yang and
Jun Yang A Two-Stage Approach to Quality
Restoration of Bone-Conducted Speech . . 818--829
Jie Zhou and
Yuanbiao Lin and
Qin Chen and
Qi Zhang and
Xuanjing Huang and
Liang He CausalABSC: Causal Inference for Aspect
Debiasing in Aspect-Based Sentiment
Classification . . . . . . . . . . . . . 830--840
Ruiying Lu and
Bo Chen and
Dandan Guo and
Dongsheng Wang and
Mingyuan Zhou Hierarchical Topic-Aware Contextualized
Transformers . . . . . . . . . . . . . . 841--852
Yaru Zhao and
Bo Cheng and
Yakun Huang and
Zhiguo Wan FluGCF: a Fluent Dialogue Generation
Model With Coherent Concept Entity Flow 853--867
Changhao Ding and
Zhangjie Fu and
Zhongliang Yang and
Qi Yu and
Daqiu Li and
Yongfeng Huang Context-Aware Linguistic Steganography
Model Based on Neural Machine
Translation . . . . . . . . . . . . . . 868--878
Zainab Alhakeem and
Se-In Jang and
Hong-Goo Kang Disentangled Representations in
Local-Global Contexts for Arabic Dialect
Identification . . . . . . . . . . . . . 879--890
Jae-Hong Lee and
Joon-Hyuk Chang Partitioning Attention Weight:
Mitigating Adverse Effect of Incorrect
Pseudo-Labels for Self-Supervised ASR 891--905
Ryo Fukuda and
Katsuhito Sudoh and
Satoshi Nakamura Improving Speech Translation Accuracy
and Time Efficiency With Fine-Tuned
wav2vec 2.0-Based Speech Segmentation 906--916
Seong-Gyun Leem and
Daniel Fulford and
Jukka-Pekka Onnela and
David Gard and
Carlos Busso Selective Acoustic Feature Enhancement
for Speech Emotion Recognition With
Noisy Speech . . . . . . . . . . . . . . 917--929
Alexander Bohlender and
Ann Spriet and
Wouter Tirry and
Nilesh Madhu Spatially Selective Speaker Separation
Using a DNN With a Location Dependent
Feature Extraction . . . . . . . . . . . 930--945
Matan Karo and
Arie Yeredor and
Itshak Lapidot Compact Time-Domain Representation for
Logical Access Spoofed Audio . . . . . . 946--958
Or Berebi and
Zamir Ben-Hur and
David Lou Alon and
Boaz Rafaely Analysis and Design of Head-Tracked
Compensation for Bilateral Ambisonics 959--972
Wei Wang and
Yanmin Qian Universal Cross-Lingual Data Generation
for Low Resource ASR . . . . . . . . . . 973--983
Davide Berghi and
Philip J. B. Jackson Leveraging Visual Supervision for
Array-Based Active Speaker Detection and
Localization . . . . . . . . . . . . . . 984--995
Daniel Aleksander Krause and
Guillermo García-Barrios and
Archontis Politis and
Annamaria Mesaros Binaural Sound Source Distance
Estimation and Localization for a Moving
Listener . . . . . . . . . . . . . . . . 996--1011
Seung-Bin Kim and
Sang-Hoon Lee and
Ha-Yeong Choi and
Seong-Whan Lee Audio Super-Resolution With Robust
Speech Representation Learning of Masked
Autoencoder . . . . . . . . . . . . . . 1012--1022
Omer Musa Battal and
Aykut Koç Automatic Construction of Sememe
Knowledge Bases From Machine Readable
Dictionaries . . . . . . . . . . . . . . 1023--1035
Varun Krishna and
Tarun Sai and
Sriram Ganapathy Representation Learning With Hidden Unit
Clustering for Low Resource Speech
Applications . . . . . . . . . . . . . . 1036--1047
Zhengding Luo and
Dongyuan Shi and
Woon-Seng Gan and
Qirui Huang Delayless Generative Fixed-Filter Active
Noise Control Based on Deep Learning and
Bayesian Filter . . . . . . . . . . . . 1048--1060
Zewen Chi and
Heyan Huang and
Luyang Liu and
Yu Bai and
Xiaoyan Gao and
Xian-Ling Mao Can Pretrained English Language Models
Benefit Non-English NLP Systems in
Low-Resource Scenarios? . . . . . . . . 1061--1074
Rui Liu and
Yifan Hu and
Haolin Zuo and
Zhaojie Luo and
Longbiao Wang and
Guanglai Gao Text-to-Speech for Low-Resource
Agglutinative Language With
Morphology-Aware Language Model
Pre-Training . . . . . . . . . . . . . . 1075--1087
Shu Jiang and
Zuchao Li and
Hai Zhao and
Weiping Ding Entity-Relation Extraction as Full
Shallow Semantic Dependency Parsing . . 1088--1099
Yoav Vered and
Stephen Elliott A Parallel Analog and Digital Adaptive
Feedforward Controller for Active Noise
Control . . . . . . . . . . . . . . . . 1100--1108
Puning Zhang and
Rongjian Zhao and
Boran Yang and
Yuexian Li and
Zhigang Yang Integrated Syntactic and Semantic Tree
for Targeted Sentiment Classification
Using Dual-Channel Graph Convolutional
Network . . . . . . . . . . . . . . . . 1109--1124
Xu Wang and
Hainan Zhang and
Shuai Zhao and
Hongshen Chen and
Zhuoye Ding and
Zhiguo Wan and
Bo Cheng and
Yanyan Lan Debiasing Counterfactual Context With
Causal Inference for Multi-Turn Dialogue
Reasoning . . . . . . . . . . . . . . . 1125--1132
Hoang Ngoc Chau and
Tien Dat Bui and
Huu Binh Nguyen and
Thanh Thi Hien Duong and
Quoc Cuong Nguyen A Novel Approach to Multi-Channel Speech
Enhancement Based on Graph Neural
Networks . . . . . . . . . . . . . . . . 1133--1144
Yuchen Hu and
Chen Chen and
Qiushi Zhu and
Eng Siong Chng Wav2code: Restore Clean Speech
Representations via Codebook Lookup for
Noise-Robust ASR . . . . . . . . . . . . 1145--1156
Tetsuya Ueda and
Tomohiro Nakatani and
Rintaro Ikeshita and
Keisuke Kinoshita and
Shoko Araki and
Shoji Makino Blind and Spatially-Regularized Online
Joint Optimization of Source Separation,
Dereverberation, and Noise Reduction . . 1157--1172
Vibhav Agarwal and
Sourav Ghosh and
Harichandana BSS and
Himanshu Arora and
Barath Raj Kandur Raja TrICy: Trigger-Guided Data-to-Text
Generation With Intent Aware
Attention-Copy . . . . . . . . . . . . . 1173--1184
Christoph Boeddeker and
Aswin Shanmugam Subramanian and
Gordon Wichern and
Reinhold Haeb-Umbach and
Jonathan Le Roux TS-SEP: Joint Diarization and Separation
Conditioned on Estimated Speaker
Embeddings . . . . . . . . . . . . . . . 1185--1197
Reza Varzandeh and
Simon Doclo and
Volker Hohmann Speech-Aware Binaural DOA Estimation
Utilizing Periodicity and Spatial
Features in Convolutional Neural
Networks . . . . . . . . . . . . . . . . 1198--1213
Yigitcan Özer and
Meinard Müller Source Separation of Piano Concertos
Using Musically Motivated Augmentation
Techniques . . . . . . . . . . . . . . . 1214--1225
Lior Frenkel and
Shlomo E. Chazan and
Jacob Goldberger Domain Adaptation Using Suitable Pseudo
Labels for Speech Enhancement and
Dereverberation . . . . . . . . . . . . 1226--1236
Jiahao Zhao and
Wenji Mao and
Daniel Dajun Zeng Disentangled Text Representation
Learning With Information-Theoretic
Perspective for Adversarial Robustness 1237--1247
Dong Zhou and
Fang Lei and
Lin Li and
Yongmei Zhou and
Aimin Yang Cross-Modal Interaction via
Reinforcement Feedback for Audio-Lyrics
Retrieval . . . . . . . . . . . . . . . 1248--1260
Xuechen Liu and
Md Sahidullah and
Kong Aik Lee and
Tomi Kinnunen Generalizing Speaker Verification for
Spoof Awareness in the Embedding Space 1261--1273
Shiyao Cui and
Jiangxia Cao and
Xin Cong and
Jiawei Sheng and
Quangang Li and
Tingwen Liu and
Jinqiao Shi Enhancing Multimodal Entity and Relation
Extraction With Variational Information
Bottleneck . . . . . . . . . . . . . . . 1274--1285
Yizhou Tan and
Haojun Ai and
Shengchen Li and
Mark D. Plumbley Acoustic Scene Classification Across
Cities and Devices via Feature
Disentanglement . . . . . . . . . . . . 1286--1297
Orel Ben Zaken and
Anurag Kumar and
Vladimir Tourbabin and
Boaz Rafaely Neural-Network-Based
Direction-of-Arrival Estimation for
Reverberant Speech --- The Importance of
Energetic, Temporal, and Spatial
Information . . . . . . . . . . . . . . 1298--1309
Changsheng Quan and
Xiaofei Li SpatialNet: Extensively Learning Spatial
Information for Multichannel Joint
Speech Separation, Denoising and
Dereverberation . . . . . . . . . . . . 1310--1323
Matthew Baas and
Herman Kamper Disentanglement in a GAN for
Unconditional Speech Synthesis . . . . . 1324--1335
Xian Li and
Nian Shao and
Xiaofei Li Self-Supervised Audio Teacher-Student
Transformer for Both Clip-Level and
Frame-Level Tasks . . . . . . . . . . . 1336--1351
Yifan Chen and
Gaofeng Cheng and
Runyan Yang and
Pengyuan Zhang and
Yonghong Yan Interrelate Training and Clustering for
Online Speaker Diarization . . . . . . . 1352--1364
Sheng Feng and
Xiaoqian Zhu and
Shuqing Ma Masking Hierarchical Tokens for
Underwater Acoustic Target Recognition
With Self-Supervised Learning . . . . . 1365--1379
Yangyang Zhao and
Kai Yin and
Zhenyu Wang and
Mehdi Dastani and
Shihan Wang Decomposed Deep $Q$-Network for Coherent
Task-Oriented Dialogue Policy Learning 1380--1391
Jayneel Parekh and
Sanjeel Parekh and
Pavlo Mozharovskyi and
Gaël Richard and
Florence d'Alché-Buc Tackling Interpretability in Audio
Classification Networks With
Non-negative Matrix Factorization . . . 1392--1405
Xiuying Chen and
Shen Gao and
Mingzhe Li and
Qingqing Zhu and
Xin Gao and
Xiangliang Zhang Write Summary Step-by-Step: a Pilot
Study of Stepwise Summarization . . . . 1406--1415
Changkai Lin and
Hongju Cheng and
Qiang Rao and
Yang Yang M$^3$SA: Multimodal Sentiment Analysis
Based on Multi-Scale Feature Extraction
and Multi-Task Learning . . . . . . . . 1416--1429
Rui-Chen Zheng and
Yang Ai and
Zhen-Hua Ling Incorporating Ultrasound Tongue Images
for Audio-Visual Speech Enhancement . . 1430--1444
Ritujoy Biswas and
Karan Nathwani and
Vinayak Abrol Statistically Guided Near-End Speech
Intelligibility Improvement Through
Voice Transformation and Transfer
Learning . . . . . . . . . . . . . . . . 1445--1456
Linhui Sun and
Shuo Yuan and
Aifei Gong and
Lei Ye and
Eng Siong Chng Dual-Branch Modeling Based on
State-Space Model for Speech Enhancement 1457--1467
Alkis Koudounas and
Eliana Pastor and
Giuseppe Attanasio and
Vittorio Mazzia and
Manuel Giollo and
Thomas Gueudre and
Elisa Reale and
Luca Cagliero and
Sandro Cumani and
Luca de Alfaro and
Elena Baralis and
Daniele Amberti Towards Comprehensive Subgroup
Performance Analysis in Speech Models 1468--1480
Wenmeng Xiong and
Changchun Bao and
Jing Zhou and
Maoshen Jia and
José Picheral Joint DOA Estimation and Dereverberation
Based on Multi-Channel Linear Prediction
Filtering and Azimuth Sparsity . . . . . 1481--1493
Yehav Alkaher and
Israel Cohen Howling Detection and Gain Control for
Speech Reinforcement in a Noisy Car
Cabin Environment . . . . . . . . . . . 1494--1505
Xinfa Zhu and
Yi Lei and
Tao Li and
Yongmao Zhang and
Hongbin Zhou and
Heng Lu and
Lei Xie METTS: Multilingual Emotional
Text-to-Speech by Cross-Speaker and
Cross-Lingual Emotion Transfer . . . . . 1506--1518
Myeonghun Jeong and
Minchan Kim and
Byoung Jin Choi and
Jaesam Yoon and
Won Jang and
Nam Soo Kim Transfer Learning for Low-Resource,
Multi-Lingual, and Zero-Shot
Multi-Speaker Text-to-Speech . . . . . . 1519--1530
Jiadi Yao and
Hong Luo and
Jun Qi and
Xiao-Lei Zhang Interpretable Spectrum Transformation
Attacks to Speaker Recognition Systems 1531--1545
Xiang Chen and
Lei Li and
Yuqi Zhu and
Shumin Deng and
Chuanqi Tan and
Fei Huang and
Luo Si and
Ningyu Zhang and
Huajun Chen Sequence Labeling as Non-Autoregressive
Dual-Query Set Generation . . . . . . . 1546--1558
Lei Liu and
Li Liu and
Haizhou Li Computation and Parameter Efficient
Multi-Modal Fusion Transformer for Cued
Speech Recognition . . . . . . . . . . . 1559--1572
Adrián Barahona-Ríos and
Tom Collins NoiseBandNet: Controllable Time-Varying
Neural Synthesis of Sound Effects Using
Filterbanks . . . . . . . . . . . . . . 1573--1585
Siyuan Wang and
Zhongyu Wei and
Jiarong Xu and
Taishan Li and
Zhihao Fan Unifying Structure Reasoning and
Language Pre-Training for Complex
Reasoning Tasks . . . . . . . . . . . . 1586--1595
Yijing Chu and
Sipei Zhao and
Feng Niu and
Yongzheng Dong and
Yuezhe Zhao A New Diffusion Filtered-$X$ Affine
Projection Algorithm: Performance
Analysis and Application in Windy
Environment . . . . . . . . . . . . . . 1596--1608
Yuquan Le and
Zhe Quan and
Jiawei Wang and
Da Cao and
Kenli Li $ R^2 $: a Novel Recall & Ranking
Framework for Legal Judgment Prediction 1609--1622
Xiaotong Jiang and
Ruirui Bai and
Zhongqing Wang and
Guodong Zhou Cross-Domain Aspect-Based Sentiment
Classification With Tripartite Graph
Modeling . . . . . . . . . . . . . . . . 1623--1635
Zhengyang Chen and
Bing Han and
Shuai Wang and
Yanmin Qian Attention-Based Encoder-Decoder
End-to-End Neural Diarization With
Embedding Enhancer . . . . . . . . . . . 1636--1649
Chenfeng Miao and
Qingying Zhu and
Minchuan Chen and
Jun Ma and
Shaojun Wang and
Jing Xiao EfficientTTS 2: Variational End-to-End
Text-to-Speech Synthesis and Voice
Conversion . . . . . . . . . . . . . . . 1650--1661
Orel Peretz and
Israel Cohen Constant Elevation-Beamwidth Beamforming
With Concentric Ring Arrays . . . . . . 1662--1672
Zhibin Quan and
Chi-Man Vong and
Weili Zeng and
Wankou Yang The MorPhEMe Machine: an Addressable
Neural Memory for Learning
Knowledge-Regularized Deep
Contextualized Chinese Embedding . . . . 1673--1686
Lijian Gao and
Qirong Mao and
Ming Dong On Local Temporal Embedding for
Semi-Supervised Sound Event Detection 1687--1698
Xuehao Zhou and
Mingyang Zhang and
Yi Zhou and
Zhizheng Wu and
Haizhou Li Accented Text-to-Speech Synthesis With
Limited Data . . . . . . . . . . . . . . 1699--1711
Vinay Kothapally and
John H. L. Hansen Monaural Speech Dereverberation Using
Deformable Convolutional Networks . . . 1712--1723
Taihui Wang and
Feiran Yang and
Jun Yang Multichannel Linear Prediction-Based
Speech Dereverberation Considering
Sparse and Low-Rank Priors . . . . . . . 1724--1735
Saurabh Kataria and
Jesús Villalba and
Laureano Moro-Velázquez and
Piotr Żelasko and
Najim Dehak Time-Domain Speech Super-Resolution With
GAN Based Modeling for Telephony Speaker
Verification . . . . . . . . . . . . . . 1736--1749
Marco Olivieri and
Amy Bastine and
Mirco Pezzoli and
Fabio Antonacci and
Thushara Abhayapala and
Augusto Sarti Acoustic Imaging With Circular
Microphone Array: a New Approach for
Sound Field Analysis . . . . . . . . . . 1750--1761
Tengfei Liu and
Yongli Hu and
Junbin Gao and
Yanfeng Sun and
Baocai Yin Hierarchical Multi-Granularity
Interaction Graph Convolutional Network
for Long Document Classification . . . . 1762--1775
Douglas O'Shaughnessy Review of Methods for Automatic Speaker
Verification . . . . . . . . . . . . . . 1776--1789
Etienne Thuillier and
Craig T. Jin and
Vesa Välimäki HRTF Interpolation Using a Spherical
Neural Process Meta-Learner . . . . . . 1790--1802
Xun Gong and
Yu Wu and
Jinyu Li and
Shujie Liu and
Rui Zhao and
Xie Chen and
Yanmin Qian Advanced Long-Content Speech Recognition
With Factorized Neural Transducer . . . 1803--1815
Yoshiki Masuyama and
Kouei Yamaoka and
Takao Kawamura and
Nobutaka Ono Efficient Joint Optimization of Sampling
Rate Offsets Using Entire Multichannel
Signal . . . . . . . . . . . . . . . . . 1816--1828
Takaaki Saeki and
Soumi Maiti and
Xinjian Li and
Shinji Watanabe and
Shinnosuke Takamichi and
Hiroshi Saruwatari Text-Inductive Graphone-Based Language
Adaptation for Low-Resource Speech
Synthesis . . . . . . . . . . . . . . . 1829--1844
Yingming Gao and
Peter Birkholz and
Ya Li Articulatory Copy Synthesis Based on the
Speech Synthesizer VocalTractLab and
Convolutional Recurrent Neural Networks 1845--1858
Théo Mariotte and
Anthony Larcher and
Silvio Montrésor and
Jean-Hugh Thomas Channel-Combination Algorithms for
Robust Distant Voice Activity and
Overlapped Speech Detection . . . . . . 1859--1872
Luciana M. X. de Souza and
Márcio H. Costa and
Renata Coelho Borges Envelope-Based Multichannel Noise
Reduction for Cochlear Implant
Applications . . . . . . . . . . . . . . 1873--1884
Linjian Li and
Yi Cai and
Xin Wu Unsupervised Disentanglement Learning
Model for Exemplar-Guided Paraphrase
Generation . . . . . . . . . . . . . . . 1885--1900
Amir Ivry and
Israel Cohen and
Baruch Berdugo A User-Centric Approach for Deep
Residual-Echo Suppression in Double-Talk 1901--1914
Geng Zhang and
Jin Liu and
Guangyou Zhou and
Kunsong Zhao and
Zhiwen Xie and
Bo Huang Question-Directed Reasoning With
Relation-Aware Graph Attention Network
for Complex Question Answering Over
Knowledge Graph . . . . . . . . . . . . 1915--1927
Yu Yao and
Peng Yang and
Guangzhen Zhao and
Guoshun Yin KGAgent: Learning a Deep Reinforced
Agent for Keyphrase Generation . . . . . 1928--1940
Jiahong Li and
Chenda Li and
Yifei Wu and
Yanmin Qian Unified Cross-Modal Attention: Robust
Audio-Visual Speech Recognition and
Beyond . . . . . . . . . . . . . . . . . 1941--1953
Mieszko Fraś and
Konrad Kowalczyk Reverberant Source Separation Using NTF
With Delayed Subsources and Spatial
Priors . . . . . . . . . . . . . . . . . 1954--1967
Rui Wang and
Li Li and
Tomoki Toda Dual-Channel Target Speaker Extraction
Based on Conditional Variational
Autoencoder and Directional Information 1968--1979