Authors :
S.M. Yousuf Iqbal Tomal
Volume/Issue :
Volume 9 - 2024, Issue 5 - May
Google Scholar :
https://tinyurl.com/5mjr6peb
Scribd :
https://tinyurl.com/mreuf7kn
DOI :
https://doi.org/10.38124/ijisrt/IJISRT24MAY1625
Abstract :
This paper presents a sentiment analysis project on IMDb movie reviews that classifies each review as positive or negative based on its text. Using a dataset of 50,000 IMDb movie reviews sourced from Kaggle, the study addresses this binary classification task by applying preprocessing techniques such as TF-IDF vectorization. The dataset is split into training and testing sets, with models trained on the former and evaluated on the latter. Three machine learning algorithms (Logistic Regression, Random Forest, and Decision Tree) are implemented and compared using performance metrics including precision, recall, and F1-score. Results indicate that Logistic Regression outperforms the other two models on this task. The report concludes by highlighting the project's contributions and suggesting avenues for future research, emphasizing the potential benefits of expanding the range of sentiment types and the dataset size.
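As a rough illustration of two ingredients named in the abstract, the sketch below computes TF-IDF weights for a toy corpus and derives precision, recall, and F1-score for a binary positive/negative task. It is not the paper's implementation: the function names `tfidf` and `precision_recall_f1`, the whitespace tokenizer, the toy reviews, and the unsmoothed `tf * log(N / df)` weighting are all assumptions made for illustration. The abstract does not say which TF-IDF variant or library was used, and common libraries such as scikit-learn apply a smoothed, normalized variant by default.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document, using tf * log(N / df).

    Assumption: plain whitespace tokenization and the unsmoothed textbook
    formula; this is an illustrative variant, not the paper's exact setup.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: how many documents contain each term at least once.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append(
            {t: (c / total) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return vectors


def precision_recall_f1(y_true, y_pred, positive="positive"):
    """Compute precision, recall, and F1 for the class named by `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical toy data, not drawn from the IMDb dataset.
reviews = ["great movie", "terrible movie", "great acting"]
weights = tfidf(reviews)

y_true = ["positive", "positive", "positive", "negative", "negative"]
y_pred = ["positive", "positive", "negative", "negative", "negative"]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(weights[1], precision, recall, f1)
```

In the toy corpus, a term that appears in only one review ("terrible") receives a larger weight than a term shared across reviews ("movie"), which is the property that makes TF-IDF features useful as input to classifiers like the three compared in the paper.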