International Journal of Innovative Science and Research Technology

Collating SQL Databases, No-SQL Databases and Machine Learning Algorithms for Data Analysis

Authors : Dylan Coelho; Cliff Machado; Leon Correia; Shree Jaswal; Neil Fernando

Volume/Issue : Volume 7 - 2022, Issue 4 - April

Google Scholar : https://bit.ly/3IIfn9N

DOI : https://doi.org/10.5281/zenodo.6562532

Abstract : Big Data tools and machine learning algorithms are frequently applied to data analytics and prediction. This paper evaluates and illustrates the differences between SQL and NoSQL databases for the storage and processing of Big Data, and compares various algorithms used for analysis and prediction. The paper presents our basic understanding of the Hadoop and Spark platforms and compares the two on parameters such as the time taken to input data, the time taken to output data, and the total memory used by the databases. The system implements the databases in both Hadoop and Spark. In Hadoop, Hive will be used for the SQL part and Cassandra for the NoSQL part. In Spark, the SQL part will be implemented using PostgreSQL and the NoSQL part using MongoDB. The results of comparing these parameters (input time, output time and total memory used) will be represented graphically, after which a user will be in a position to choose the appropriate database according to their requirements. Additionally, we will study and compare various machine learning algorithms by implementing them on the selected dataset, using Accuracy, Root Mean Square Error and Mean Absolute Error as the comparison parameters. Choosing the right machine learning algorithm can be difficult, but doing so is essential to answering the given question quickly and accurately. For the user to obtain the required insights, algorithms must be carefully analysed against parameters like these. The final results will be illustrated with graphs on a UI, which will help in understanding the results obtained on the dataset selected for this paper.
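The abstract names Accuracy, Root Mean Square Error and a mean absolute error measure as the parameters for comparing algorithms, but does not give their formulas. The following is a minimal Python sketch assuming the standard definitions: accuracy is the fraction of predictions that exactly match the true label (classification), RMSE = sqrt(mean((y - ŷ)²)) and MAE = mean(|y - ŷ|) (regression). The function names `accuracy`, `rmse` and `mae` and the toy values are hypothetical, for illustration only; they are not the paper's code or data.

```python
import math
from typing import Sequence


def _check(y_true: Sequence, y_pred: Sequence) -> int:
    """Validate that both sequences are non-empty and equally long; return n."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    return len(y_true)


def accuracy(y_true: Sequence, y_pred: Sequence) -> float:
    """Fraction of predictions equal to the true label (classification metric)."""
    n = _check(y_true, y_pred)
    return sum(t == p for t, p in zip(y_true, y_pred)) / n


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root Mean Square Error: square root of the mean squared residual."""
    n = _check(y_true, y_pred)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean Absolute Error: mean of the absolute residuals."""
    n = _check(y_true, y_pred)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n


if __name__ == "__main__":
    # Hypothetical toy values, not results from the paper.
    labels_true = ["yes", "no", "yes", "yes"]
    labels_pred = ["yes", "no", "no", "yes"]
    print(accuracy(labels_true, labels_pred))  # 0.75

    y_true = [3.0, 5.0, 2.0, 7.0]
    y_pred = [2.5, 5.0, 3.0, 8.0]
    print(rmse(y_true, y_pred))  # 0.75
    print(mae(y_true, y_pred))   # 0.625
```

Lower RMSE and MAE indicate a better fit. RMSE is never smaller than MAE, and the gap between them grows when a few predictions are far off, because squaring weights large residuals more heavily; reporting both therefore says something about how errors are distributed, not just their average size.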

Keywords : Hadoop, NoSQL, Spark, SQL.
