An Encoder-Decoder Based Basecaller for Nanopore DNA Sequencing

Date

2019-07-02

Authors

Abbaszadegan, Mahdieh

Abstract

Nanopore DNA sequencing is a method in which DNA bases are determined (basecalled) from the electric current signals generated as DNA passes through nanopore sensors. The raw measured signals can be aggregated into event data, where each event represents a new base entering the nanopore. This thesis makes two contributions. First, we implemented RNN-based single- and double-strand basecallers for simulated event data to analyze the effect of signal noise. As the SNR decreased from 20 dB to 5 dB, the accuracy of the single-strand basecaller dropped 9%, while the accuracy of the double-strand basecaller dropped only 0.5%. Second, we implemented an end-to-end single-strand basecaller that processes the raw signal directly, using an encoder-decoder model with attention instead of the CTC-style approach used in available basecallers. We achieved an accuracy of 81.9% for a viral sample and 90.9% for a bacterial sample. Our accuracy is comparable to that of state-of-the-art basecallers, with a considerably smaller model.
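
To make the SNR figures in the abstract concrete, the following is a minimal sketch of how simulated event data could be corrupted with noise at a chosen SNR in dB. It is not the thesis's simulation code. The additive white Gaussian noise model, the definition of signal power as variance around the mean, the event levels, and the function names `add_noise_at_snr` and `measured_snr_db` are all assumptions made for illustration.

```python
import math
import random


def add_noise_at_snr(signal, snr_db, rng=None):
    """Return the signal plus white Gaussian noise at the requested SNR (dB).

    Assumption: signal power is the variance of the clean signal around its
    mean; the thesis may define it differently.
    """
    rng = rng or random.Random()
    mean = sum(signal) / len(signal)
    signal_power = sum((x - mean) ** 2 for x in signal) / len(signal)
    # SNR_dB = 10 * log10(P_signal / P_noise), solved for P_noise
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in signal]


def measured_snr_db(clean, noisy):
    """Estimate the SNR (dB) of a noisy signal given its clean version."""
    mean = sum(clean) / len(clean)
    signal_power = sum((x - mean) ** 2 for x in clean) / len(clean)
    noise_power = sum((n - c) ** 2 for c, n in zip(clean, noisy)) / len(clean)
    return 10 * math.log10(signal_power / noise_power)


rng = random.Random(0)
# Hypothetical event current levels (illustrative values only)
clean = [rng.choice([60.0, 75.0, 90.0, 105.0]) for _ in range(20000)]

for snr_db in (20, 5):
    noisy = add_noise_at_snr(clean, snr_db, rng)
    print(snr_db, round(measured_snr_db(clean, noisy), 2))
```

Under these assumptions, going from 20 dB to 5 dB raises the noise power by a factor of about 31.6 relative to the signal, which is the range over which the abstract compares the single- and double-strand basecallers.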

Keywords

Computer science
