An Engineering Approach to Explainable Artificial Intelligence for SQL Query Optimization: Design and Evaluation of the OptiMind Framework

Authors

  • Hanan Abed Alwally Abed Allah Department of Computer Science, College of Science, Mustansiriyah University, Baghdad 10064, Iraq.

DOI:

https://doi.org/10.71229/fxy9xp82

Keywords:

SQL Query Optimization, Explainable Artificial Intelligence, Query Behavior Profiling, Adaptive Feature Intelligence

Abstract

Query optimization remains a critical challenge in Relational Database Management Systems (RDBMSs) because it directly affects execution efficiency, resource utilization, and overall database performance. Traditional cost-based and rule-based optimizers rely on statistical estimation and heuristic plan enumeration methods that often perform poorly when handling correlated predicates, skewed data distributions, and complex multi-join workloads. Although recent learned query optimization approaches have improved cost estimation accuracy, many existing solutions operate as black-box models with limited interpretability, weak workload adaptability, and insufficient support for database administrator (DBA) analysis and decision-making.

This paper presents OptiMind, an explainable intelligent framework for SQL query optimization that integrates semantic SQL analysis, workload-aware behavior profiling, adaptive feature intelligence, and interpretable optimization scoring into a unified recommendation system. Unlike traditional optimization pipelines, OptiMind emphasizes transparency by generating human-readable optimization recommendations through clause dependency analysis, operator behavior profiling, workload-aware feature ranking, and feature-attribution scoring. The framework improves both optimization accuracy and explainability while maintaining adaptability across varying workload environments and query structures.

OptiMind was evaluated using benchmark and enterprise workloads, including JOB, IMDB, SQLShare, Spider, and enterprise traces. Experimental results demonstrate that the framework reduces mean absolute error (MAE) by 18.4%–31.7% compared with PostgreSQL’s native optimizer and achieves execution speedups ranging from 1.31× to 2.14× on complex multi-join queries. Additionally, OptiMind achieved an interpretability score of 4.51/5.0, confirming strong explanation quality, recommendation usefulness, and practical effectiveness for transparent, workload-adaptive, and explainable SQL query optimization in modern database systems.

References

[1] V. Leis, A. Gubichev, A. Mirchev, P. Boncz, A. Kemper, and T. Neumann, “How Good Are Query Optimizers, Really?” Proc. VLDB Endowment, vol. 9, no. 3, pp. 1040–1052, 2015, https://www.vldb.org/pvldb/vol9/p204-leis.pdf

[2] R. Marcus, P. Negi, H. Mao, N. Tatbul, M. Alizadeh, and T. Kraska, “Bao: Making Learned Query Optimization Practical,” in Proc. 2022 Int. Conf. Management of Data (SIGMOD), New York, NY, USA: ACM, 2022, pp. 2180–2192. https://dl.acm.org/doi/10.1145/3514221.3526058

[3] A. Pavlo, M. Angulo, A. Arulraj, H. Lin, J. Lin, P. Ma, P. Menon, T. Mowry, M. Perron, I. Quah, S. Santurkar, A. Tomasic, M. Toor, D. Van Aken, Z. Wang, Y. Wu, R. Xian, and T. Zhang, “Self-Driving Database Management Systems,” in Proc. 8th Biennial Conf. Innovative Data Systems Research (CIDR), Chaminade, CA, USA, 2017. https://db.cs.cmu.edu/papers/2017/p42-pavlo-cidr17.pdf [4] L. Woltmann, C. Hartmann, M. Thiele, W. Lehner, and D. Habich, “Learned Cardinality Estimation: An In-Depth Study,” ACM Transactions on Database Systems, vol. 48, no. 2, Art. no. 7, pp. 1–41, Jun. 2023, doi: 10.1145/3589306.

[5] R. Zhu, Z. Wu, Y. Han, K. Zeng, A. Pfadler, Z. Qian, J. Zhou, and B. Cui, “FLAT: Fast, Lightweight and Accurate Method for Cardinality Estimation,” Proc. VLDB Endowment, vol. 14, no. 9, pp. 1489–1502, 2021. https://www.vldb.org/pvldb/vol14/p1489-zhu.pdf

[6] A. B. Arrieta, N. Díaz-Rodríguez, J. Del Ser, A. Bennetot, S. Tabik, A. Barbado, S. García, S. Gil-López, D. Molina, R. Benjamins, R. Chatila, and F. Herrera, “Explainable Artificial Intelligence (XAI): Concepts, Taxonomies, Opportunities and Challenges toward Responsible AI,” Inf. Fusion, vol. 58, pp. 82–115, 2020. https://www.sciencedirect.com/science/article/abs/pii/S1566253519308103

[7] P. G. Selinger, M. M. Astrahan, D. D. Chamberlin, R. A. Lorie, and T. G. Price, “Access Path Selection in a Relational Database Management System,” in Proc. ACM SIGMOD Int. Conf. Management of Data, Boston, MA, USA, 1979, pp. 23–34, doi: 10.1145/582095.582099.

[8] P. Negi, Z. Wu, A. Gupta, M. Alizadeh, R. Marcus, T. Kraska, S. Madden, and N. Tatbul, “Robust Query Driven Cardinality Estimation Under Changing Workloads,” Proc. VLDB Endowment, vol. 16, no. 7, pp. 1520–1533, 2023. https://www.vldb.org/pvldb/vol16/p1520-negi.pdf

[9] R. Marcus and O. Papaemmanouil, “Bao: Learning to Steer Query Optimizers,” IEEE Data Eng. Bull., vol. 44, no. 2, pp. 11–23, 2021. doi: 10.1145/3514221.3526058.

[10] D. V. Aken, D. Yang, S. Brillard, A. Fiorino, B. Zhang, C. Bilien, and A. Pavlo, “An Inquiry into Machine Learning-Based Automatic Configuration Tuning Services on Real-World Database Management Systems,” Proc. VLDB Endowment, vol. 14, no. 7, pp. 1241–1253, 2021. https://www.cs.cmu.edu/~pavlo/papers/p1241-aken.pdf

[11] L. Ma, W. Zhang, J. Jiao, W. Wang, M. Butrovich, W. S. Lim, P. Menon, and A. Pavlo, “MB2: Decomposed Behavior Modeling for Self-Driving Database Management Systems,” in Proc. 2021 Int. Conf. Management of Data (SIGMOD), New York, NY, USA: ACM, 2021, pp. 1248–1261. https://dl.acm.org/doi/abs/10.1145/3448016.3457276

[12] E. Triantaphyllou, Multi-Criteria Decision Making Methods: A Comparative Study. Boston, MA, USA: Springer, 2000, doi: 10.1007/978-1-4757-3157-6.

[13] T. Tervonen, Theory and Methods of Multi-Criteria Decision Analysis. Cham, Switzerland: Springer, 2022. https://link.springer.com/book/10.1007/978-3-030-99079-4

[14] D. B. Blumenthal, N. Boria, J. Gamper, S. Bougleux, and L. Brun, “Comparing Heuristics for Graph Edit Distance Computation,” VLDB J., vol. 29, no. 1, pp. 419–458, 2020. https://link.springer.com/article/10.1007/s00778-019-00544-1

[15] S. M. Lundberg, G. Erion, H. Chen, A. DeGrave, J. M. Prutkin, B. Nair, R. Katz, J. Himmelfarb, N. Bansal, and S.-I. Lee, “From Local Explanations to Global Understanding with Explainable AI for Trees,” Nat. Mach. Intell., vol. 2, no. 1, pp. 56–67, 2020. https://www.nature.com/articles/s42256-019-0138-9

[16] V. Leis, P. Boncz, A. Kemper, and T. Neumann, “Query Optimization through the Looking Glass, and What We Found Running the Join Order Benchmark,” VLDB J., vol. 27, no. 5, pp. 643–446, 668. https://link.springer.com/article/10.1007/s00778-017-0480-7

[17] IMDb Dataset of 50K Movie Reviews (Kaggle)

[18] A. Jain, H. Patel, L. Nagalapatti, N. Gupta, S. Mehta, S. Guttula, S. Mujumdar, S. Afzal, R. Sharma Mittal, and V. Munigala, “Overview and Importance of Data Quality for Machine Learning Tasks,” in Proc. 26th ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, 2020, pp. 3561–3562. https://dl.acm.org/doi/abs/10.1145/3394486.3406477

[19] StackOverflow Dataset (Kaggle)

[20] YU, Tao, et al. Spider: A large-scale human-labeled dataset for complex and cross-domain semantic parsing and text-to-sql task. In: Proceedings of the 2018 conference on empirical methods in natural language processing. 2018. p. 3911-3921.‏ https://aclanthology.org/D18-1425.pdf

fig 1

Downloads

Published

2026-06-14

Issue

Section

Original Articles

How to Cite

An Engineering Approach to Explainable Artificial Intelligence for SQL Query Optimization: Design and Evaluation of the OptiMind Framework. (2026). Al-Noor Journal of Engineering Management and Computer Science, 2(1), 87-105. https://doi.org/10.71229/fxy9xp82

Similar Articles

You may also start an advanced similarity search for this article.