An Encoder-Decoder Based Basecaller for Nanopore DNA Sequencing

dc.contributor.advisor: Magierowski, Sebastian
dc.creator: Abbaszadegan, Mahdieh
dc.date.accessioned: 2019-07-02T16:13:29Z
dc.date.available: 2019-07-02T16:13:29Z
dc.date.copyright: 2019-02-12
dc.date.issued: 2019-07-02
dc.date.updated: 2019-07-02T16:13:29Z
dc.degree.discipline: Electrical and Computer Engineering
dc.degree.level: Master's
dc.degree.name: MASc - Master of Applied Science
dc.description.abstract: Nanopore DNA sequencing is a method in which DNA bases are determined (basecalled) using electric current signals generated by passing DNA through nanopore sensors. The raw measured signals can be aggregated into event data representing new bases entering the nanopore. This thesis has two contributions. First, we implemented RNN-based single- and double-strand basecallers for simulated event data to analyze the effect of signal noise. As the SNR decreased from 20 dB to 5 dB, the accuracy of the single-strand basecaller dropped 9% while the accuracy of the double-strand basecaller dropped only 0.5%. Second, we implemented an end-to-end single-strand basecaller that directly processes the raw signal using an encoder-decoder model with attention instead of the CTC-style approach used in available basecallers. We achieved an accuracy of 81.9% for a viral sample and an accuracy of 90.9% for a bacterial sample. Our accuracy is comparable to that of state-of-the-art basecallers, with a considerably smaller model.
dc.identifier.uri: http://hdl.handle.net/10315/36268
dc.language.iso: en
dc.rights: Author owns copyright, except where explicitly noted. Please contact the author directly with licensing requests.
dc.subject: Computer science
dc.subject.keywords: DNA Sequencing
dc.subject.keywords: Nanopore Sequencing
dc.subject.keywords: Deep Learning
dc.subject.keywords: Recurrent Neural Networks
dc.subject.keywords: Seq2seq
dc.subject.keywords: Attention Mechanism
dc.title: An Encoder-Decoder Based Basecaller for Nanopore DNA Sequencing
dc.type: Electronic Thesis or Dissertation

Files

Original bundle
Name: Abbaszadegan_Mahdieh_2019_Masters.pdf
Size: 1.78 MB
Format: Adobe Portable Document Format
License bundle
Name: YorkU_ETDlicense.txt
Size: 3.39 KB
Format: Plain Text
Name: license.txt
Size: 1.87 KB
Format: Plain Text