11/9/2023

Praat vocal toolkit

Relation to previous work: Voice conversion is a technique to modify a source speaker's speech so that it is perceived as if a target speaker had spoken it. This paper is motivated by this idea; however, the source and target speakers in our study are a group of healthy speakers and a group of ALS speakers. Speech transformation of dysarthric speech has also been studied previously. In contrast to those studies, the present study attempts to transform healthy speech to pathological speech for the purpose of data augmentation during model training rather than for improving intelligibility. Adversarial training has also recently been explored in the field of voice conversion. In the most recent work, the authors used a variational autoencoding Wasserstein generative adversarial network (VAW-GAN) to transform speech features, under the assumption that there is a latent variable representing the common phonetic content. However, this assumption fails for ALS speech, where the acoustic representation of phonemes deviates significantly from that of healthy speech. As an alternative, we use deep convolutional generative adversarial networks (DCGANs), which have been used for speech synthesis postfiltering, to transform speech features. In our study, we take advantage of adversarial training and use DCGANs to transform healthy speech to ALS speech. The structure of the model we used is shown in Figure 2. The upper panel is the discriminator and the lower panel is the generator. The architectures of the multilayer convolutional neural networks (CNNs) in the generator and the discriminator are shown in Table 1. Within each batch, at both the discriminator and the generator, we zero-pad the speech samples until all are of identical temporal length. In a pilot classification experiment, we show that by using the simulated speech samples to balance an existing dataset, the classification accuracy improves by about 10% after data augmentation.
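The per-batch zero-padding described above can be sketched in a few lines of numpy. This is an illustrative helper, not the paper's code; the function name, 1-D feature shapes, and toy data are assumptions.

```python
import numpy as np

def zero_pad_batch(batch):
    """Zero-pad a list of 1-D speech feature sequences to the longest
    length in the batch, so the generator and discriminator see inputs
    of identical temporal length. Illustrative sketch only."""
    max_len = max(len(x) for x in batch)
    padded = np.zeros((len(batch), max_len), dtype=np.float32)
    for i, x in enumerate(batch):
        padded[i, : len(x)] = x  # copy the sample; the tail stays zero
    return padded

# toy example: three "utterances" of different lengths
batch = [np.ones(3), np.ones(5), np.ones(2)]
out = zero_pad_batch(batch)
print(out.shape)  # (3, 5)
```

Padding is recomputed per batch, so the padded length tracks the longest sample in that batch rather than the longest sample in the whole dataset.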
Training machine learning algorithms for speech applications requires large, labeled training data sets. This is problematic for clinical applications, where obtaining such data is prohibitively expensive because of privacy concerns or lack of access. As a result, clinical speech applications are typically developed using small data sets with only tens of speakers. In this paper, we propose a method for simulating training data for clinical applications by transforming healthy speech to dysarthric speech using adversarial training. We evaluate the efficacy of our approach using both objective and subjective criteria. We present the transformed samples to five experienced speech-language pathologists (SLPs) and ask them to identify the samples as healthy or dysarthric. The results reveal that the SLPs identify the transformed speech as dysarthric 65% of the time.
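The adversarial training behind the transformation can be illustrated with a toy numpy sketch of the standard GAN objective. This is not the paper's DCGAN: the one-parameter "generator" and "discriminator", the Gaussian stand-in features, and all names here are assumptions made purely to show how the two losses are computed.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

# Hypothetical one-parameter stand-ins for the paper's CNNs:
# the generator maps source (healthy) features toward target-like
# features; the discriminator scores features as real (1) or fake (0).
w_g, w_d = rng.normal(size=2)

def generator(x):      # transforms source features
    return w_g * x

def discriminator(x):  # probability the input is real target speech
    return sigmoid(w_d * x)

real = rng.normal(loc=2.0, size=8)    # stand-in "ALS" features
source = rng.normal(loc=0.0, size=8)  # stand-in "healthy" features
fake = generator(source)

# Standard GAN objectives in binary cross-entropy form: the
# discriminator is rewarded for separating real from generated samples,
# the generator for fooling the discriminator.
eps = 1e-8
d_loss = -np.mean(np.log(discriminator(real) + eps)
                  + np.log(1.0 - discriminator(fake) + eps))
g_loss = -np.mean(np.log(discriminator(fake) + eps))
```

In an actual training loop these two losses would be minimized alternately with respect to the discriminator's and the generator's parameters; both are positive here by construction, since each is a negated log of a probability.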