Acta Scientific Computer Sciences

Research Article Volume 3 Issue 1

Query Optimization for Big Data Batch Processing and Stream Processing

Radhya Sahal*

CONFIRM Centre for Smart Manufacturing, University College Cork, Ireland

*Corresponding Author: Radhya Sahal, CONFIRM Centre for Smart Manufacturing, School of Computer Science and IT, University College Cork, Ireland.

Received: November 11, 2021; Published: December 13, 2021

Abstract

  Big data refers to large, complex data sets that combine structured and unstructured data and that are too big, too fast, and too hard to manage with traditional techniques; such data exceeds the processing capacity of conventional database systems. New technologies have recently been developed to analyze and query this data. This work introduces two types of big data query optimization: batch processing and stream processing.


Keywords: Query; Optimization; DBMS; Batch Data Processing; MapReduce; Stream Data Processing

Citation

Citation: Radhya Sahal. “Query Optimization for Big Data Batch Processing and Stream Processing”. Acta Scientific Computer Sciences 3.1 (2022): 04-07.

Copyright

Copyright: © 2022 Radhya Sahal. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.



