Optimization of Rabin Karp Pattern Matching Algorithm Based on Parallel Computing Techniques for DNA Sequence Analysis

Show simple item record

dc.contributor.author Anjalee, M.G.M.
dc.contributor.author Fernando, W.P.U.
dc.contributor.author Thambawita, D.R.V.L.B.
dc.date.accessioned 2019-04-06T06:33:07Z
dc.date.available 2019-04-06T06:33:07Z
dc.date.issued 2019-02
dc.identifier.isbn 9789550481255
dc.identifier.uri http://www.erepo.lib.uwu.ac.lk/bitstream/handle/123456789/109/70.pdf?sequence=1&isAllowed=y
dc.description.abstract String matching algorithms are used to discover the occurrences of a defined pattern in a given text or a pool of strings which is widely used in detecting plagiarism, spam filtering and most importantly in computational biology including DNA sequencing. The existence and the intensity of a muted sequence in DNA caused for various diseases can be identified using Rabin Karp string matching algorithm. The main contribution of the study is to bring an efficient version of Rabin Karp algorithm by minimizing the spurious hits while using both Central Processing Unit (CPU) parallel techniques and General Purpose Graphics Processing Unit (GPGPU) parallel techniques specifically for DNA sequence analysis. The improved Rabin Karp is implemented using C language with POSIX Threads library, OpenMP and MPI and using Compute Unified Device Architecture (CUDA). When accelerating computations based on GPU, a special consideration has given to global memory, shared memory and texture memory, the types of memories with particular importance offered in CUDA architecture. By experimental studies, we investigated a new method to eliminate brute force matching and the GPU optimization is presented with stencil method ensuring efficiency in terms of memory overhead due to redundant data access in the serial CPU implementation. We have compared these parallel implementations for evaluating the effect of varying number of threads per block as well as varying DNA file sizes. The results obtained in this study present that the proposed implementation provides acceleration surpassing 36x speedup for string size 220 characters compared to a sequential (CPU) implementation. Eventually, using the empirical results, we could conclude that the improved CUDA C implementation of shared memory version can achieve 35 times of performance than serial implementation for a large pool of DNA data in string matching. en_US
dc.language.iso en en_US
dc.publisher Uva Wellassa University of Sri Lanka en_US
dc.subject Computer Science en_US
dc.subject Information Science en_US
dc.subject Computing and Information Science en_US
dc.title Optimization of Rabin Karp Pattern Matching Algorithm Based on Parallel Computing Techniques for DNA Sequence Analysis en_US
dc.title.alternative International Research Conference 2019 en_US
dc.type Other en_US

Files in this item

This item appears in the following Collection(s)

Show simple item record

Search UWU eRepository


My Account