A Learning Based Approach for Automatic Text Document Classification


Authors : Ravi Prasad Ravuri

Volume/Issue : Volume 8 - 2023, Issue 6 - June

Google Scholar : https://bit.ly/3TmGbDi

Scribd : https://tinyurl.com/swrxmcfv

DOI : https://doi.org/10.5281/zenodo.8146672

Abstract : Text documents on the Internet, on social media, and in the internal applications of organizations such as the judiciary are increasing exponentially. Manually inspecting and classifying such documents for further processing is a tedious task, so there is a need for automatic text document classification. Traditional heuristics-based approaches do not scale to the volume of input documents. To overcome this problem, machine learning (ML) techniques are used, as they can learn from training data, perform classification, and handle large corpora. However, when existing ML models are used directly, their performance deteriorates owing to poor training quality. In this paper, we propose a framework with a hybrid approach that combines feature selection with ML models to improve prediction performance. The framework is named the Learning based Text Document Classification Framework (LbTDCF). We also propose an algorithm, the Intelligent Document Classification Algorithm (IDCA), to realize the framework.

Keywords : Machine Learning, Text Document Classification, Supervised Learning, Intelligent Document Classification
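
The abstract does not say which feature selection method or which classifier the framework uses, so the following is only a minimal sketch of the general idea (select informative terms first, then train a supervised model on them). The chi-square scoring, the multinomial Naive Bayes classifier, the toy documents, and every function name here are assumptions made for illustration; they are not taken from LbTDCF or IDCA.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    # Lowercase whitespace tokenizer; keeps purely alphabetic tokens.
    return [w for w in text.lower().split() if w.isalpha()]

def chi2_scores(docs, labels):
    # Chi-square score of each term against the labels (max over classes),
    # computed from document-level presence/absence counts.
    n = len(docs)
    class_count = Counter(labels)
    term_class = defaultdict(Counter)  # term -> class -> docs containing term
    for tokens, y in zip(docs, labels):
        for t in set(tokens):
            term_class[t][y] += 1
    scores = {}
    for t, per_class in term_class.items():
        df = sum(per_class.values())
        best = 0.0
        for cls, n_cls in class_count.items():
            a = per_class[cls]   # in class, contains term
            b = df - a           # other classes, contains term
            c = n_cls - a        # in class, lacks term
            d = n - df - c       # other classes, lacks term
            denom = (a + c) * (b + d) * (a + b) * (c + d)
            if denom:
                best = max(best, n * (a * d - b * c) ** 2 / denom)
        scores[t] = best
    return scores

def select_features(docs, labels, k):
    # Keep the k highest-scoring terms; ties are broken alphabetically.
    scores = chi2_scores(docs, labels)
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return set(ranked[:k])

def train_nb(docs, labels, vocab):
    # Multinomial Naive Bayes with add-one smoothing over the selected terms.
    class_docs = Counter(labels)
    term_counts = {cls: Counter() for cls in class_docs}
    for tokens, y in zip(docs, labels):
        term_counts[y].update(t for t in tokens if t in vocab)
    model = {}
    for cls in class_docs:
        total = sum(term_counts[cls].values()) + len(vocab)
        log_prior = math.log(class_docs[cls] / len(labels))
        log_lik = {t: math.log((term_counts[cls][t] + 1) / total) for t in vocab}
        model[cls] = (log_prior, log_lik)
    return model

def predict(model, vocab, tokens):
    # Return the class with the highest log posterior for the given tokens.
    def score(cls):
        log_prior, log_lik = model[cls]
        return log_prior + sum(log_lik[t] for t in tokens if t in vocab)
    return max(sorted(model), key=score)

# Hypothetical toy corpus, invented for this sketch.
TRAIN = [
    ("the court granted the appeal", "legal"),
    ("the judge dismissed the appeal", "legal"),
    ("the court heard the petition", "legal"),
    ("the team won the match", "sports"),
    ("the striker scored in the match", "sports"),
    ("the team lost the final", "sports"),
]

docs = [tokenize(text) for text, _ in TRAIN]
labels = [label for _, label in TRAIN]
vocab = select_features(docs, labels, k=4)
model = train_nb(docs, labels, vocab)
print(sorted(vocab))
print(predict(model, vocab, tokenize("the court rejected the appeal")))
```

On this toy corpus the selection step discards terms such as "the" that occur in every document, so the classifier sees only terms that help separate the classes. A real corpus would need a proper tokenizer, a held-out test set, and a tuned value of k.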

