How Apache MapReduce Handles Big Data Query?
Faculty of Computer Science and Engineering, Hodeidah University, Al Hudaydah, Yemen
*Corresponding Author: Radhya Sahal, Faculty of Computer Science and Engineering, Hodeidah University, Al Hudaydah, Yemen.
December 20, 2021; Published: January 18, 2022
Apache MapReduce is among the most popular frameworks for batch data processing. Despite its merits, a critical challenge for Apache MapReduce is answering queries over large-scale data quickly. This review surveys the state of the art of Apache Hive, a widely used data warehouse system whose SQL-like language, HiveQL, handles big data queries on Apache MapReduce.
Keywords: Query Processing; Apache MapReduce; Hive; HiveQL