Big data: Case Study Analysis of Meteorological and Oceanographic Data on Hive

Tracking #: 531-1511

Authors:
Usman Abdullahi (ORCID: https://orcid.org/0000-0001-9645-3642)
Rohiza Ahmad
M Nordin Zakaria


Submission Type: 

Research Paper

Abstract: 

Tools and techniques for big data analytics are emerging and maturing, which has motivated industries and organizations to embrace and explore research in the big data area. These tools, techniques and systems offer solutions to the challenges that traditional database systems face in handling big data. Real-world big data datasets, benchmarks and workloads are also evolving. However, they are representative of only particular segments of the Information Technology industry, and datasets from other sources may differ in nature and complexity from those available in the literature. Thus, there is a need to use data from other domains to evaluate the performance and maturity of big data technologies. This paper presents a case study analysis (benchmarking) of Meteorological and Oceanographic data on Hive. Our aim is to present an approach to formatting and loading this type of data. The response time for indexed and non-indexed retrievals on the case study data was measured and compared with that of the benchmark data. The experimental results show that the response time of the case study data is higher than that of the benchmark data for both indexed and non-indexed retrievals. Indexed retrieval shows better response time for Type 1 'SELECT...WHERE' queries at all data sizes and for Type 3 'SELECT...WHERE...GROUP BY' queries at data sizes of 100 GB and less. Indexed retrieval incurs additional overhead for Type 2 'SELECT...JOIN...WHERE' queries and for Type 3 queries at 500 GB and 1 TB. The response time of the benchmark data follows a steady curve for both retrievals, with a slight increase for the Type 2 query at 500 GB and 1 TB. When properly formatted, the Meteorological and Oceanographic data can be analyzed efficiently with Hive compared to traditional database systems. The results of this study have the potential to attract oil and gas companies to adopt big data technologies for handling their exploration datasets.
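
The comparison described in the abstract (three query shapes, each timed with and without an index) can be illustrated with a minimal sketch. It is not the authors' setup: SQLite from the Python standard library stands in for Hive so that the example runs anywhere, and the table names, column names, synthetic rows and helper functions (`build_db`, `time_queries`, `run_benchmark`) are all hypothetical. The timings it prints say nothing about Hive at 100 GB to 1 TB; only the structure of the experiment is shown.

```python
import sqlite3
import time

# Assumption: a hypothetical two-table schema (station metadata plus
# observations). The real case study data and its Hive tables are not
# described in this record.
def build_db(n_rows=20000):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE station (station_id INTEGER, region TEXT)")
    conn.execute(
        "CREATE TABLE observation "
        "(station_id INTEGER, wave_height REAL, wind_speed REAL)"
    )
    conn.executemany(
        "INSERT INTO station VALUES (?, ?)",
        [(i, "north" if i % 2 else "south") for i in range(100)],
    )
    conn.executemany(
        "INSERT INTO observation VALUES (?, ?, ?)",
        [(i % 100, (i % 50) / 10.0, float(i % 30)) for i in range(n_rows)],
    )
    conn.commit()
    return conn

# One illustrative query per type named in the abstract.
QUERIES = {
    "type1_select_where":
        "SELECT wave_height FROM observation WHERE station_id = 7",
    "type2_select_join_where":
        "SELECT o.wave_height, s.region FROM observation o "
        "JOIN station s ON o.station_id = s.station_id "
        "WHERE s.region = 'north'",
    "type3_select_where_group_by":
        "SELECT station_id, AVG(wind_speed) FROM observation "
        "WHERE wave_height > 2.0 GROUP BY station_id",
}

def time_queries(conn):
    """Return {query name: (elapsed seconds, number of rows returned)}."""
    timings = {}
    for name, sql in QUERIES.items():
        start = time.perf_counter()
        rows = conn.execute(sql).fetchall()
        timings[name] = (time.perf_counter() - start, len(rows))
    return timings

def run_benchmark(n_rows=20000):
    conn = build_db(n_rows)
    non_indexed = time_queries(conn)  # first pass: no index exists yet
    conn.execute("CREATE INDEX idx_obs_station ON observation (station_id)")
    indexed = time_queries(conn)      # second pass: index on the filter/join key
    conn.close()
    return non_indexed, indexed

if __name__ == "__main__":
    non_indexed, indexed = run_benchmark()
    for name in QUERIES:
        print(name, "non-indexed:", non_indexed[name][0],
              "indexed:", indexed[name][0])
```

Each query returns the same rows in both passes, so any difference in elapsed time comes from the access path alone. A real measurement would repeat each query several times and report a mean, since a single run is sensitive to caching.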

Manuscript: 

Tags: 

  • Reviewed

Data repository URLs: 

http://prof.ict.ac.cn/BigDataBench
http://hpc.utp.edu.my

Date of Submission: 

Monday, April 2, 2018

Decision: 

Reject (Pre-Screening)