SC-COO: A feedback-based service composition algorithm combining offline and online reinforcement learning (original) (raw)
Abstract
In today's dynamic service environment, rapid and efficient service composition has attracted considerable attention in recent years. Service composition reuses existing services, and its ultimate goal is to better satisfy users. However, interacting with the service environment to collect data is challenging in practical applications because of its high cost and risk. To overcome this limitation, this paper proposes SC-COO, a feedback-based service composition algorithm that combines offline and online reinforcement learning. SC-COO consists of two stages: an offline training module (SC-COO-offline) as the main stage and an online update module (SC-COO-online) as the auxiliary stage. The SC-COO-offline model is trained on previously collected offline data, avoiding the drawback of online learning requiring many iterations to converge, while SC-COO-online serves as an auxiliary stage that jointly makes decisions and recommends services to users, better adapting to dynamic environments. Furthermore, SC-COO incorporates users' score preferences into service composition by designing a feedback-based reward mechanism; continuous interactive feedback from humans can significantly improve the robustness of the service composition system. Finally, experiments on the RapidAPI dataset demonstrate that SC-COO outperforms other baselines in accuracy, scalability, and convergence, and ablation experiments further verify the efficiency and applicability of SC-COO.
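The two-stage decision process described above can be illustrated with a minimal sketch. This is not the paper's implementation: the Q-tables, the mixing weight `ALPHA`, the feedback weight `beta`, and all function names are hypothetical stand-ins for the trained SC-COO-offline model, the auxiliary SC-COO-online model, and the feedback-based reward.

```python
CANDIDATE_SERVICES = ["s1", "s2", "s3"]

# Hypothetical Q-values: the offline table stands in for the model
# trained on logged data (main stage); the online table is updated
# from live feedback (auxiliary stage).
q_offline = {"s1": 0.80, "s2": 0.55, "s3": 0.60}
q_online = {"s1": 0.10, "s2": 0.40, "s3": 0.20}

ALPHA = 0.7  # assumed weight of the offline (main) stage


def select_service():
    """Jointly score candidates with the offline and online models."""
    scores = {s: ALPHA * q_offline[s] + (1 - ALPHA) * q_online[s]
              for s in CANDIDATE_SERVICES}
    return max(scores, key=scores.get)


def feedback_reward(qos_reward, user_score, beta=0.5):
    """Feedback-based reward: blend an environment QoS reward with a
    user's score preference (both assumed normalized to [0, 1])."""
    return (1 - beta) * qos_reward + beta * user_score


def online_update(service, reward, lr=0.1):
    """TD-style update of the auxiliary online model from feedback."""
    q_online[service] += lr * (reward - q_online[service])


chosen = select_service()           # joint offline/online decision
r = feedback_reward(qos_reward=0.6, user_score=0.9)
online_update(chosen, r)            # only the online model adapts
```

The design choice this sketch reflects is the one the abstract states: the offline model carries most of the decision weight, while the lightweight online update lets recommendations drift with the dynamic environment and user feedback.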
Data Availability
Data are available upon request.
Acknowledgements
This work is supported by the Science and technology project of State Grid Corporation of China (Funding No. 5108-202218280A-2-402-XG), the State Key Laboratory of Software Development Environment (Funding No. SKLSDE-2020ZX-01), the Project of Beijing Wuzi University (Funding No. 2024XJKY25), and National Key R&D Program of China (Funding No. 2021ZD0110601).
Author information
Authors and Affiliations
- Xiaoming Yu: School of Logistics, Beijing Wuzi University, Beijing, 100000, China
- Wenjun Wu & Jiadong Wang: School of Computer Science and Engineering, Beihang University, Beijing, 100191, China
- Xin Ji: Big Data Centre of State Grid Corporation of China, Beijing, 100191, China
Authors
- Xiaoming Yu
- Wenjun Wu
- Jiadong Wang
- Xin Ji
Contributions
Xiaoming Yu: Conceptualization, Methodology, Validation, Writing - original draft. Wenjun Wu: Writing - review & editing. Jiadong Wang: Visualization, Investigation, Data curation. Xin Ji: Proofreading.
Corresponding author
Correspondence to Xiaoming Yu.
Ethics declarations
Conflicts of Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
A Appendix
Given space constraints, we highlight critical excerpts of the core algorithm implementation, specifically the offline and online models. Only a portion of the core code is extracted, as shown in Figs. 24 and 25.
Fig. 24
Some initialization of the model
Fig. 25
Screenshots of certain parts of the core code in the SC-COO method
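Since the screenshots in Figs. 24 and 25 are not reproduced here, the following is a rough, hedged stand-in for the kind of initialization the offline stage requires: wrapping logged transitions into a dataset so training never touches the live service environment. All field names, shapes, and the `build_offline_dataset` helper are illustrative assumptions, not the paper's actual code.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Transition:
    state: int        # index of the current composition step
    action: int       # index of the invoked candidate service
    reward: float     # feedback-based reward observed for the call
    next_state: int   # composition step reached afterwards
    done: bool        # whether the composition is complete


def build_offline_dataset(logs: List[Tuple]) -> List[Transition]:
    """Wrap logged (s, a, r, s', done) tuples so the offline trainer
    can consume them without interacting with the environment."""
    return [Transition(*row) for row in logs]


# Two hypothetical logged interactions.
logs = [(0, 2, 0.75, 1, False), (1, 0, 0.60, 2, True)]
dataset = build_offline_dataset(logs)
```

An offline RL library would typically consume such a dataset directly; the point of the sketch is only that the offline stage is trained entirely from previously collected data.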
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Yu, X., Wu, W., Wang, J. et al. SC-COO: A feedback-based service composition algorithm combining offline and online reinforcement learning. Appl Intell 55, 806 (2025). https://doi.org/10.1007/s10489-025-06683-z
- Accepted: 02 June 2025
- Published: 20 June 2025
- Version of record: 20 June 2025
- DOI: https://doi.org/10.1007/s10489-025-06683-z