Authors : Dylan Coelho; Cliff Machado; Leon Correia; Shree Jaswal; Neil Fernando
Volume/Issue : Volume 7 - 2022, Issue 4 - April
Google Scholar : https://bit.ly/3IIfn9N
Scribd : https://bit.ly/3Nsllyl
DOI : https://doi.org/10.5281/zenodo.6562532
Big Data tools and machine learning algorithms
have frequently been applied to data analytics and
prediction. This paper evaluates and illustrates the
differences between SQL and NoSQL for the storage and
processing of Big Data, and compares various
algorithms used for analysis and prediction. The paper
presents our basic understanding of the Hadoop and Spark
cloud platforms and compares the two on parameters such
as the time taken to input data, the time taken to output
data, and the total memory used by the databases. The
system implements the
databases in Hadoop and Spark. In Hadoop, the Hive
database will be used for implementing the SQL part
and Cassandra for the NoSQL part. In Spark, the SQL part
will be implemented using PostgreSQL and the NoSQL part
using MongoDB. The end results, obtained by comparing
parameters such as the input time, the output time and
the total memory used, will be represented graphically,
after which a user will be in a position to choose the
appropriate database according to their requirements.
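
The comparison described above (input time, output time and memory used) could be measured with a small harness along the following lines. This is only a minimal sketch under stated assumptions: `benchmark`, `InMemoryStore` and the sample rows are hypothetical names introduced for illustration, the in-memory store merely stands in for a real Hive, Cassandra, PostgreSQL or MongoDB client, and `tracemalloc` reports the memory of the Python client process rather than the memory used by a database server.

```python
import time
import tracemalloc
from typing import Callable, Dict, Iterable, List, Tuple

Row = Tuple[int, str]


def benchmark(write_fn: Callable[[Iterable[Row]], None],
              read_fn: Callable[[], List[Row]],
              rows: List[Row]) -> Dict[str, float]:
    """Time one write pass and one read pass, and record peak client memory."""
    tracemalloc.start()
    t0 = time.perf_counter()
    write_fn(rows)                      # "input" phase
    t1 = time.perf_counter()
    fetched = read_fn()                 # "output" phase
    t2 = time.perf_counter()
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return {
        "input_seconds": t1 - t0,
        "output_seconds": t2 - t1,
        "peak_bytes": float(peak),      # Python-side allocations only
        "rows_read": float(len(fetched)),
    }


class InMemoryStore:
    """Hypothetical stand-in for a database client (assumption, not a real driver)."""

    def __init__(self) -> None:
        self._rows: List[Row] = []

    def write(self, rows: Iterable[Row]) -> None:
        self._rows.extend(rows)

    def read(self) -> List[Row]:
        return list(self._rows)


store = InMemoryStore()
sample_rows = [(i, f"value-{i}") for i in range(1000)]
result = benchmark(store.write, store.read, sample_rows)
print(result)
```

Replacing `store.write` and `store.read` with functions that call an actual database driver would give one row of measurements per backend, which could then be plotted side by side.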
Additionally, we will study and compare
various machine learning algorithms by implementing
them on the selected dataset. To compare the algorithms,
we will consider Accuracy, Root Mean Square Error and
Mean Absolute Value.
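
As an illustration of the comparison metrics named above, the following minimal sketch computes accuracy and root mean square error with the standard definitions. It also computes the mean absolute error (MAE), on the assumption that this is what the mean absolute measure refers to. The function names and the sample values are hypothetical and are not taken from the paper's dataset.

```python
import math
from typing import Sequence


def _check(y_true: Sequence, y_pred: Sequence) -> int:
    """Return the common length, rejecting empty or mismatched inputs."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return len(y_true)


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that exactly match the true labels."""
    n = _check(y_true, y_pred)
    return sum(t == p for t, p in zip(y_true, y_pred)) / n


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean square error: sqrt(mean((true - pred)^2))."""
    n = _check(y_true, y_pred)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error: mean(|true - pred|). Assumed reading of the third metric."""
    n = _check(y_true, y_pred)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n


# Hypothetical sample values, for illustration only.
labels_true = [1, 0, 1, 1]
labels_pred = [1, 0, 0, 1]
values_true = [3.0, 5.0, 2.0, 7.0]
values_pred = [2.0, 5.0, 4.0, 8.0]

acc = accuracy(labels_true, labels_pred)   # 3 of 4 labels match
err_rmse = rmse(values_true, values_pred)  # sqrt(6 / 4)
err_mae = mae(values_true, values_pred)    # 4 / 4
print(acc, err_rmse, err_mae)
```

Accuracy applies to classification outputs, while the two error measures apply to numeric predictions, so a comparison across algorithms would typically report whichever of these fits each task.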
Choosing the right machine learning algorithm can be
difficult, but doing so is essential to answering the given
question with speed and accuracy. For the user to obtain
the required insights, algorithms must be carefully
analysed and studied with parameters like these in mind.
The final research results will be illustrated with
graphs on a UI, which will help to better understand
the results obtained on the dataset selected for
this paper.
Keywords : Hadoop, NoSQL, Spark, SQL.