Automate Data Classification in an Unstructured Data Flow using Self-Organizing Maps


Authors : Dilushinie Narmada Fernando; Dr. Lakmal Rupasinghe

Volume/Issue : Volume 7 - 2022, Issue 3 - March

Google Scholar : https://bit.ly/3IIfn9N

Scribd : https://bit.ly/3tRuIAo

DOI : https://doi.org/10.5281/zenodo.6395394

Abstract : When protecting an organization's information, professionals treat the confidentiality and sensitivity of its data as a major concern. Classification is usually a manual process in which data owners and other professionals label data according to their own ideas, decisions, and expectations. Because the outcome depends on human judgement, sensitive data can end up exposed to users who are not authorized to access or alter it. This research aims to reduce human involvement in data classification by grouping documents into clusters according to their level of confidentiality. The system divides documents into three major categories (confidential, sensitive, and public) using an unsupervised self-organizing map, an artificial neural network originally designed for clustering high-dimensional data samples onto a low-dimensional map.

Keywords : Information Technology, Intellectual Property, Self-Organizing Map, Information Retrieval, Statistical Natural Language Processing
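The abstract does not specify how documents are vectorized or how large the map is. As a rough illustration of the technique it names, the following is a minimal pure-Python sketch of a self-organizing map, assuming term-frequency document vectors and a hypothetical 1 x 3 grid (one node per intended category). The function names (`build_vectors`, `train_som`, `assign`) and every parameter (epochs, learning rate, neighbourhood width, linear decay) are illustrative assumptions, not the authors' implementation. Each training step finds the best-matching node c for a document x and pulls every node j toward x using the standard Kohonen update w_j <- w_j + lr * h(j, c) * (x - w_j), where h is a Gaussian neighbourhood on the grid.

```python
import math
import random
from collections import Counter


def tokenize(text):
    # Lowercase and keep only alphabetic tokens (a simplifying assumption).
    return "".join(c.lower() if c.isalpha() else " " for c in text).split()


def build_vectors(docs):
    # Hypothetical preprocessing: L2-normalized term-frequency vectors.
    vocab = sorted({w for d in docs for w in tokenize(d)})
    index = {w: i for i, w in enumerate(vocab)}
    vectors = []
    for d in docs:
        v = [0.0] * len(vocab)
        for w, count in Counter(tokenize(d)).items():
            v[index[w]] = float(count)
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        vectors.append([x / norm for x in v])
    return vocab, vectors


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def best_matching_unit(x, weights):
    # Index of the node whose weight vector is closest to x.
    return min(range(len(weights)), key=lambda j: sq_dist(x, weights[j]))


def train_som(vectors, n_nodes=3, epochs=200, lr0=0.5, sigma0=1.0, seed=0):
    # Nodes lie on a 1-D grid; 3 nodes stand in for the three categories.
    rng = random.Random(seed)
    weights = [list(rng.choice(vectors)) for _ in range(n_nodes)]
    total = epochs * len(vectors)
    step = 0
    for _ in range(epochs):
        order = list(range(len(vectors)))
        rng.shuffle(order)
        for i in order:
            x = vectors[i]
            frac = step / total
            lr = lr0 * (1 - frac)                      # linearly decaying learning rate
            sigma = max(sigma0 * (1 - frac), 1e-3)     # shrinking neighbourhood width
            c = best_matching_unit(x, weights)
            for j in range(n_nodes):
                h = math.exp(-((j - c) ** 2) / (2 * sigma ** 2))
                weights[j] = [w + lr * h * (xi - w) for w, xi in zip(weights[j], x)]
            step += 1
    return weights


def assign(vectors, weights):
    # Cluster index (0..n_nodes-1) for each document.
    return [best_matching_unit(x, weights) for x in vectors]


if __name__ == "__main__":
    docs = [
        "salary payroll employee bank account",
        "salary payroll employee bank account",
        "press release product launch event",
        "press release product launch event",
        "internal project roadmap draft",
    ]
    vocab, vecs = build_vectors(docs)
    weights = train_som(vecs)
    print(assign(vecs, weights))
```

Because the map is unsupervised, the node indices carry no meaning on their own; in this sketch a person would still have to inspect each cluster to decide which one corresponds to confidential, sensitive, or public documents.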
