Automatic Speech Recognition Using Deep Neural Networks: New Possibilities

Abdel-Hamid, Ossama Abdel-Hamid Mohamed

dc.contributor.advisor	Jiang, Hui
dc.creator	Abdel-Hamid, Ossama Abdel-Hamid Mohamed
dc.date.accessioned	2015-08-28T15:13:34Z
dc.date.available	2015-08-28T15:13:34Z
dc.date.copyright	2014-11-07
dc.date.issued	2015-08-28
dc.date.updated	2015-08-28T15:13:34Z
dc.degree.discipline	Computer Science
dc.degree.level	Doctoral
dc.degree.name	PhD - Doctor of Philosophy
dc.description.abstract	Recently, automatic speech recognition (ASR) systems that use deep neural networks (DNNs) for acoustic modeling have attracted huge research interest, driven by recent results that have significantly raised the state-of-the-art performance of ASR systems. This dissertation proposes a number of new methods to improve state-of-the-art ASR performance by exploiting the power of DNNs. The first method exploits domain knowledge in designing a special neural network (NN) structure called a convolutional neural network (CNN). This dissertation proposes to use the CNN in a way that applies convolution and pooling operations along the frequency axis to handle the frequency variations that commonly arise from speaker and pronunciation differences in speech signals. Moreover, a new CNN structure called limited weight sharing is proposed to better suit the special spectral characteristics of speech signals. Our experimental results show that the use of a CNN leads to a 6-9% relative reduction in error rate. The second proposed method deals with speaker variations more explicitly through a new speaker-code-based adaptation. This method adapts the speech acoustic model to a new speaker by learning a suitable speaker representation from a small amount of adaptation data from each target speaker. It avoids the need to modify any model parameters, as is done in other commonly used adaptation methods for neural networks. This greatly reduces the number of parameters to estimate during adaptation and hence allows rapid speaker adaptation. The third proposed method aims to handle the temporal structure within speech segments by using a deep segmental neural network (DSNN). The DSNN model avoids the need for an HMM, as it directly models the posterior probability of the label sequence. Moreover, a segment-aware NN structure is proposed that models the dependency among speech frames within each segment and performs better than conventional frame-based DNNs. Experimental results show that the proposed DSNN can significantly improve recognition performance compared with conventional frame-based models.
dc.identifier.uri	http://hdl.handle.net/10315/29980
dc.language.iso	en
dc.rights	Author owns copyright, except where explicitly noted. Please contact the author directly with licensing requests.
dc.subject	Computer science
dc.subject.keywords	Automatic speech recognition
dc.subject.keywords	Neural networks
dc.subject.keywords	Speaker adaptation
dc.subject.keywords	Convolutional neural networks
dc.subject.keywords	Hidden Markov models
dc.subject.keywords	Segmental speech recognition
dc.subject.keywords	Speaker code
dc.subject.keywords	Speaker representation
dc.title	Automatic Speech Recognition Using Deep Neural Networks: New Possibilities
dc.type	Electronic Thesis or Dissertation	en_US

Files

Original bundle

Name: Abdelhamid_Ossama_A_2014_Phd.pdf
Size: 4.04 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 1.83 KB
Format: Plain Text

Name: YorkU_ETDlicense.txt
Size: 3.38 KB
Format: Plain Text

Collections

Computer Science and Engineering