Fake News Detection Using Naive Bayes Classifier: A Comparative Study

Abhinandan Yadav; Devaraju Venkata Rao

doi:10.54060/jmss.2023.22

Authors

Abhinandan Yadav, Amity School of Engineering and Technology, Amity University Uttar Pradesh, Lucknow Campus, India
Dr. Devaraju Venkata Rao, Department of Accounting and Management, Global Institute of Management, Mangalpally, Tehsil Ibrahimpatnam, Telangana, India

Keywords:

Classification Problem, Confusion Matrix, CountVectorizer, Fake News, Naïve Bayes Classifier, TF-IDF Vectorizer

Abstract

Machine learning is a subfield of artificial intelligence (AI) and computer science that uses data and algorithms to imitate how people learn, progressively improving in accuracy. It is an important component of the growing field of data science: statistical methods are used to train algorithms that make classifications or predictions and uncover key insights. Fake news is false or misleading information presented as news, and detecting it is a classification problem. Classification begins with dataset collection, followed by preprocessing, feature selection, training and testing on the dataset, and finally running the classifier. News consists largely of written text, which is processed using natural language processing (NLP). NLP can analyze large amounts of plain written text and generate insights from it, and it involves methods such as data preprocessing and feature selection. Data preprocessing involves data cleaning, that is, removing incorrect, duplicate, or incomplete data from a dataset. Feature selection is done using the CountVectorizer and the TF-IDF Vectorizer. The dataset is then split for training and testing; using similar data for training and testing reduces the impact of data inconsistencies. After the model is fitted on the training set, it is tested by making predictions against the test set. A confusion matrix is then used to assess the performance of the classification model on the test data. The primary purpose is to use the Naive Bayes (NB) classifier to build two classification models, one using CountVectorizer and the other using TF-IDF Vectorizer, and to compare their accuracy.
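The comparison described in the abstract can be sketched in Python with scikit-learn, whose CountVectorizer and TfidfVectorizer classes match the vectorizers named in the keywords. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy headlines and labels are invented, MultinomialNB is assumed as the Naive Bayes variant (the abstract does not specify one), and the split ratio and random seed are arbitrary.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy corpus invented for illustration only; label 1 = fake, 0 = real.
texts = [
    "miracle pill cures all diseases overnight doctors shocked",
    "aliens secretly control the world government insider claims",
    "celebrity reveals shocking secret to eternal youth",
    "scientists admit the moon landing was staged in a studio",
    "drinking bleach cures the flu according to secret report",
    "shocking proof that the earth is flat finally revealed",
    "secret cabal plans to ban all money next week",
    "one weird trick makes you a millionaire overnight",
    "city council approves budget for new public library",
    "central bank keeps interest rates unchanged this quarter",
    "local school wins regional science competition",
    "government publishes annual report on road safety",
    "university researchers publish study on crop yields",
    "parliament passes bill on renewable energy funding",
    "weather service forecasts heavy rain for the weekend",
    "hospital opens new wing for pediatric care",
]
labels = [1] * 8 + [0] * 8


def evaluate(vectorizer, x_train, x_test, y_train, y_test):
    """Fit vectorizer + Naive Bayes on the training set, score on the test set."""
    # MultinomialNB is an assumption; the abstract only says "Naive Bayes".
    model = make_pipeline(vectorizer, MultinomialNB())
    model.fit(x_train, y_train)
    predicted = model.predict(x_test)
    accuracy = accuracy_score(y_test, predicted)
    # Rows are true classes, columns are predicted classes, in the order [0, 1].
    matrix = confusion_matrix(y_test, predicted, labels=[0, 1])
    return accuracy, matrix


# Hold out 25% of the data for testing; the ratio and seed are arbitrary choices.
x_train, x_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, stratify=labels, random_state=0
)

results = {}
for name, vectorizer in [
    ("CountVectorizer", CountVectorizer()),
    ("TfidfVectorizer", TfidfVectorizer()),
]:
    results[name] = evaluate(vectorizer, x_train, x_test, y_train, y_test)
    accuracy, matrix = results[name]
    print(f"{name}: accuracy = {accuracy:.2f}")
    print(matrix)
```

Both models share the same train/test split, so any difference in accuracy comes from the feature representation alone. On a corpus this small the numbers are not meaningful; the sketch only shows the shape of the comparison.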
