Journal of Proteomics & Bioinformatics

Open Access

ISSN: 0974-276X

Abstract

NLPARG: Comparative Neural Word Embedding Approaches for Antibiotic Resistance Prediction

Daniel Ananey Obiri and Kristen L Rhinehardt*

Antibiotic resistance has increasingly become a threat to global health, as it hampers the efficacy of current antibiotics and the development of new antibiotic drugs. Appropriate identification of Antibiotic Resistance Genes (ARGs) is fundamental to the administration of the right antibiotics and for epidemiological purposes. However, methods for identifying antimicrobial resistance, such as minimum inhibitory concentration testing, are tedious and time consuming. In addition, sequence similarity-based models have not been able to identify novel ARGs, which are highly diverse compared to known genes. To explore ARGs among genomic sequences, we comparatively applied three word embedding techniques, Global Vectors (GloVe), Skip-Gram (SG) and Continuous Bag of Words (CBOW), to bacterial sequences and subsequently used the word vectors as embedding layers in Bidirectional Recurrent Neural Networks (BiRNNs) to classify bacterial sequences as ARGs or not. Among our three models, the BiRNN with a GloVe embedding layer achieved the highest accuracy, >97% on the test dataset. Our models were able to identify novel resistance genes with high recall (>0.99) and precision (>91%). Our models outperformed commonly used bioinformatics baseline models: Basic Local Alignment Search Tool (BLAST), Resistance Gene Identifier (RGI), Fragmented Antibiotic Resistance Gene (fARGene) and HMMER. Deep learning models with word embedding layers provide efficient tools for identifying diverse and novel resistance genes.
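
As a rough illustration of how word embedding techniques can be applied to biological sequences, the sketch below splits a sequence into overlapping k-mer "words", builds an integer vocabulary suitable for indexing an embedding layer, and generates the (center, context) pairs on which a Skip-Gram model is trained. The abstract does not state how sequences were tokenized, so the overlapping k-mer scheme, the values of `k` and `window`, the function names and the toy sequences are all assumptions made for illustration, not the authors' pipeline.

```python
from collections import Counter


def kmer_tokens(sequence, k=3):
    """Split a sequence into overlapping k-mers (assumed tokenization)."""
    seq = sequence.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_vocab(token_lists):
    """Map each k-mer to an integer id, most frequent first.

    Id 0 is left unused so it can serve as a padding index.
    """
    counts = Counter(tok for toks in token_lists for tok in toks)
    return {tok: idx
            for idx, (tok, _) in enumerate(counts.most_common(), start=1)}


def skipgram_pairs(tokens, window=2):
    """Build (center, context) training pairs as used by Skip-Gram.

    CBOW uses the same windows but predicts the center from its context.
    """
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


def encode(tokens, vocab):
    """Convert k-mers to ids, skipping k-mers absent from the vocabulary."""
    return [vocab[t] for t in tokens if t in vocab]


if __name__ == "__main__":
    # Hypothetical toy sequences, not taken from any dataset in the paper.
    sequences = ["MKTAYIAKQR", "MKTLLVAKQR"]
    tokenized = [kmer_tokens(s, k=3) for s in sequences]
    vocab = build_vocab(tokenized)
    print(tokenized[0])
    print(encode(tokenized[0], vocab))
    print(skipgram_pairs(tokenized[0], window=1)[:4])
```

The encoded id sequences are the kind of input a bidirectional recurrent classifier would consume after an embedding lookup; the pretrained GloVe, SG or CBOW vectors would supply the rows of that embedding matrix.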

Published Date: 2023-12-06; Received Date: 2023-11-06
