Query Optimization for Big Data Batch Processing and Stream Processing
Radhya Sahal*
CONFIRM Centre for Smart Manufacturing, University College Cork, Ireland
*Corresponding Author: Radhya Sahal, CONFIRM Centre for Smart Manufacturing, School of Computer Science and IT, University College Cork, Ireland.
Received: November 11, 2021; Published: December 13, 2021
Abstract
Big data refers to huge and complex data sets, made up of a variety of structured and unstructured data, that are too big, too fast, and too hard to be managed by traditional techniques; such data exceeds the processing capacity of conventional database systems. Recently, new technologies have been developed to analyze and query this massive data. In this work, we introduce two types of big data query optimization: batch processing and stream processing.
Keywords: Query; Optimization; DBMS; Batch Data Processing; MapReduce; Stream Data Processing
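To make the batch/stream distinction concrete, the following is a minimal, generic Python sketch of one textbook optimization in each setting. It is not the method of this work. In batch processing, a map-side combiner pre-aggregates each partition before the reduce step, which shrinks the data shuffled between stages in MapReduce-style systems. In stream processing, per-window results are maintained incrementally as events arrive instead of being recomputed from stored data. The function names are hypothetical. The stream part assumes events arrive in timestamp order with no late data.

```python
from __future__ import annotations

from collections import Counter
from typing import Iterable, Iterator


# --- Batch: MapReduce-style count with a map-side combiner (hypothetical names) ---
def map_with_combiner(partition: list[str]) -> Counter:
    # Pre-aggregate locally so each partition emits one (key, count) pair per key
    # instead of one record per input item.
    return Counter(partition)


def reduce_counts(partials: Iterable[Counter]) -> dict[str, int]:
    # Merge the partial counts produced by every partition.
    total: Counter = Counter()
    for partial in partials:
        total.update(partial)
    return dict(total)


def batch_count(partitions: list[list[str]]) -> dict[str, int]:
    # The whole (bounded) data set is available before the query runs.
    return reduce_counts(map_with_combiner(p) for p in partitions)


# --- Stream: incremental tumbling-window count (hypothetical names) ---
def tumbling_window_counts(
    events: Iterable[tuple[int, str]], window: int
) -> Iterator[tuple[int, dict[str, int]]]:
    # Assumes events arrive ordered by timestamp; late events are not handled.
    current_start: int | None = None
    counts: Counter = Counter()
    for ts, key in events:
        start = ts - ts % window  # start of the window this event falls into
        if current_start is None:
            current_start = start
        if start != current_start:
            # The previous window is complete: emit its result and reset state.
            yield current_start, dict(counts)
            counts = Counter()
            current_start = start
        counts[key] += 1  # update the open window incrementally
    if current_start is not None:
        yield current_start, dict(counts)  # flush the last open window
```

For example, `batch_count([["a", "b", "a"], ["b", "c"]])` returns `{"a": 2, "b": 2, "c": 1}`, while `list(tumbling_window_counts([(0, "a"), (3, "b"), (5, "a"), (9, "a"), (10, "b")], 5))` emits one result per 5-unit window: `[(0, {"a": 1, "b": 1}), (5, {"a": 2}), (10, {"b": 1})]`. The batch query sees all of its input before it runs, whereas the stream query must produce results while its input is still arriving. This is why the two settings call for different optimization strategies.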