GaussDB-AISQL: a composable cloud-native SQL system with AI capabilities