An Engineering Approach to Explainable Artificial Intelligence for SQL Query Optimization: Design and Evaluation of the OptiMind Framework

hanan

doi:10.71229/fxy9xp82

Authors

Hanan Abed Alwally Abed Allah Department of Computer Science, College of Science, Mustansiriyah University, Baghdad 10064, Iraq. https://orcid.org/0000-0002-2943-369X

DOI:

https://doi.org/10.71229/fxy9xp82

Keywords:

SQL Query Optimization, Explainable Artificial Intelligence, Query Behavior Profiling, Adaptive Feature Intelligence

Abstract

Query optimization remains a critical challenge in Relational Database Management Systems (RDBMSs) because it directly affects execution efficiency, resource utilization, and overall database performance. Traditional cost-based and rule-based optimizers rely on statistical estimation and heuristic plan enumeration methods that often perform poorly when handling correlated predicates, skewed data distributions, and complex multi-join workloads. Although recent learned query optimization approaches have improved cost estimation accuracy, many existing solutions operate as black-box models with limited interpretability, weak workload adaptability, and insufficient support for database administrator (DBA) analysis and decision-making.

This paper presents OptiMind, an explainable intelligent framework for SQL query optimization that integrates semantic SQL analysis, workload-aware behavior profiling, adaptive feature intelligence, and interpretable optimization scoring into a unified recommendation system. Unlike traditional optimization pipelines, OptiMind emphasizes transparency by generating human-readable optimization recommendations through clause dependency analysis, operator behavior profiling, workload-aware feature ranking, and feature-attribution scoring. The framework improves both optimization accuracy and explainability while maintaining adaptability across varying workload environments and query structures.

OptiMind was evaluated using benchmark and enterprise workloads, including JOB, IMDB, SQLShare, Spider, and enterprise traces. Experimental results demonstrate that the framework reduces mean absolute error (MAE) by 18.4%–31.7% compared with PostgreSQL’s native optimizer and achieves execution speedups ranging from 1.31× to 2.14× on complex multi-join queries. Additionally, OptiMind achieved an interpretability score of 4.51/5.0, confirming strong explanation quality, recommendation usefulness, and practical effectiveness for transparent, workload-adaptive, and explainable SQL query optimization in modern database systems.

References

[1] V. Leis, A. Gubichev, A. Mirchev, P. Boncz, A. Kemper, and T. Neumann, “How Good Are Query Optimizers, Really?” Proc. VLDB Endowment, vol. 9, no. 3, pp. 1040–1052, 2015, https://www.vldb.org/pvldb/vol9/p204-leis.pdf DOI: https://doi.org/10.14778/2850583.2850594

[2] R. Marcus, P. Negi, H. Mao, N. Tatbul, M. Alizadeh, and T. Kraska, “Bao: Making Learned Query Optimization Practical,” in Proc. 2022 Int. Conf. Management of Data (SIGMOD), New York, NY, USA: ACM, 2022, pp. 2180–2192. https://dl.acm.org/doi/10.1145/3514221.3526058 DOI: https://doi.org/10.1145/3542700.3542703

[3] A. Pavlo, M. Angulo, A. Arulraj, H. Lin, J. Lin, P. Ma, P. Menon, T. Mowry, M. Perron, I. Quah, S. Santurkar, A. Tomasic, M. Toor, D. Van Aken, Z. Wang, Y. Wu, R. Xian, and T. Zhang, “Self-Driving Database Management Systems,” in Proc. 8th Biennial Conf. Innovative Data Systems Research (CIDR), Chaminade, CA, USA, 2017. https://db.cs.cmu.edu/papers/2017/p42-pavlo-cidr17.pdf [4] L. Woltmann, C. Hartmann, M. Thiele, W. Lehner, and D. Habich, “Learned Cardinality Estimation: An In-Depth Study,” ACM Transactions on Database Systems, vol. 48, no. 2, Art. no. 7, pp. 1–41, Jun. 2023, doi: 10.1145/3589306. DOI: https://doi.org/10.1145/3589306

[5] R. Zhu, Z. Wu, Y. Han, K. Zeng, A. Pfadler, Z. Qian, J. Zhou, and B. Cui, “FLAT: Fast, Lightweight and Accurate Method for Cardinality Estimation,” Proc. VLDB Endowment, vol. 14, no. 9, pp. 1489–1502, 2021. https://www.vldb.org/pvldb/vol14/p1489-zhu.pdf DOI: https://doi.org/10.14778/3461535.3461539

[6] A. B. Arrieta, N. Díaz-Rodríguez, J. Del Ser, A. Bennetot, S. Tabik, A. Barbado, S. García, S. Gil-López, D. Molina, R. Benjamins, R. Chatila, and F. Herrera, “Explainable Artificial Intelligence (XAI): Concepts, Taxonomies, Opportunities and Challenges toward Responsible AI,” Inf. Fusion, vol. 58, pp. 82–115, 2020. https://www.sciencedirect.com/science/article/abs/pii/S1566253519308103 DOI: https://doi.org/10.1016/j.inffus.2019.12.012

[7] P. G. Selinger, M. M. Astrahan, D. D. Chamberlin, R. A. Lorie, and T. G. Price, “Access Path Selection in a Relational Database Management System,” in Proc. ACM SIGMOD Int. Conf. Management of Data, Boston, MA, USA, 1979, pp. 23–34, doi: 10.1145/582095.582099. DOI: https://doi.org/10.1145/582095.582099

[8] P. Negi, Z. Wu, A. Gupta, M. Alizadeh, R. Marcus, T. Kraska, S. Madden, and N. Tatbul, “Robust Query Driven Cardinality Estimation Under Changing Workloads,” Proc. VLDB Endowment, vol. 16, no. 7, pp. 1520–1533, 2023. https://www.vldb.org/pvldb/vol16/p1520-negi.pdf DOI: https://doi.org/10.14778/3583140.3583164

[9] R. Marcus and O. Papaemmanouil, “Bao: Learning to Steer Query Optimizers,” IEEE Data Eng. Bull., vol. 44, no. 2, pp. 11–23, 2021. doi: 10.1145/3514221.3526058.

[10] D. V. Aken, D. Yang, S. Brillard, A. Fiorino, B. Zhang, C. Bilien, and A. Pavlo, “An Inquiry into Machine Learning-Based Automatic Configuration Tuning Services on Real-World Database Management Systems,” Proc. VLDB Endowment, vol. 14, no. 7, pp. 1241–1253, 2021. https://www.cs.cmu.edu/~pavlo/papers/p1241-aken.pdf DOI: https://doi.org/10.14778/3450980.3450992

[11] L. Ma, W. Zhang, J. Jiao, W. Wang, M. Butrovich, W. S. Lim, P. Menon, and A. Pavlo, “MB2: Decomposed Behavior Modeling for Self-Driving Database Management Systems,” in Proc. 2021 Int. Conf. Management of Data (SIGMOD), New York, NY, USA: ACM, 2021, pp. 1248–1261. https://dl.acm.org/doi/abs/10.1145/3448016.3457276 DOI: https://doi.org/10.1145/3448016.3457276

[12] E. Triantaphyllou, Multi-Criteria Decision Making Methods: A Comparative Study. Boston, MA, USA: Springer, 2000, doi: 10.1007/978-1-4757-3157-6. DOI: https://doi.org/10.1007/978-1-4757-3157-6

[13] T. Tervonen, Theory and Methods of Multi-Criteria Decision Analysis. Cham, Switzerland: Springer, 2022. https://link.springer.com/book/10.1007/978-3-030-99079-4

[14] D. B. Blumenthal, N. Boria, J. Gamper, S. Bougleux, and L. Brun, “Comparing Heuristics for Graph Edit Distance Computation,” VLDB J., vol. 29, no. 1, pp. 419–458, 2020. https://link.springer.com/article/10.1007/s00778-019-00544-1 DOI: https://doi.org/10.1007/s00778-019-00544-1

[15] S. M. Lundberg, G. Erion, H. Chen, A. DeGrave, J. M. Prutkin, B. Nair, R. Katz, J. Himmelfarb, N. Bansal, and S.-I. Lee, “From Local Explanations to Global Understanding with Explainable AI for Trees,” Nat. Mach. Intell., vol. 2, no. 1, pp. 56–67, 2020. https://www.nature.com/articles/s42256-019-0138-9 DOI: https://doi.org/10.1038/s42256-019-0138-9

[16] V. Leis, P. Boncz, A. Kemper, and T. Neumann, “Query Optimization through the Looking Glass, and What We Found Running the Join Order Benchmark,” VLDB J., vol. 27, no. 5, pp. 643–446, 668. https://link.springer.com/article/10.1007/s00778-017-0480-7 DOI: https://doi.org/10.1007/s00778-017-0480-7

[17] IMDb Dataset of 50K Movie Reviews (Kaggle)

[18] A. Jain, H. Patel, L. Nagalapatti, N. Gupta, S. Mehta, S. Guttula, S. Mujumdar, S. Afzal, R. Sharma Mittal, and V. Munigala, “Overview and Importance of Data Quality for Machine Learning Tasks,” in Proc. 26th ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, 2020, pp. 3561–3562. https://dl.acm.org/doi/abs/10.1145/3394486.3406477 DOI: https://doi.org/10.1145/3394486.3406477

[19] StackOverflow Dataset (Kaggle)

[20] YU, Tao, et al. Spider: A large-scale human-labeled dataset for complex and cross-domain semantic parsing and text-to-sql task. In: Proceedings of the 2018 conference on empirical methods in natural language processing. 2018. p. 3911-3921.‏ https://aclanthology.org/D18-1425.pdf DOI: https://doi.org/10.18653/v1/D18-1425