SC-COO: A feedback-based service composition algorithm combining offline and online reinforcement learning (original) (raw)
Abstract
In today's dynamic service environment, rapid and efficient service composition has attracted considerable attention in recent years. Service composition reuses existing services, and its ultimate goal is to better satisfy users. However, interacting with the service environment to collect data is challenging in practical applications because of its high cost and risk. To overcome this limitation, this paper proposes SC-COO, a feedback-based service composition algorithm that combines offline and online reinforcement learning. SC-COO consists of two stages: an offline training module (SC-COO-offline) as the main stage and an online update module (SC-COO-online) as the auxiliary stage. The SC-COO-offline model is trained on previously collected offline data, avoiding the drawback of online learning requiring many iterations to converge, while SC-COO-online serves as an auxiliary stage that jointly makes decisions and recommends services to users, better adapting to dynamic environments. Furthermore, SC-COO incorporates users' score preferences into service composition by designing a feedback-based reward mechanism; continuous interactive feedback from humans can significantly improve the robustness of the service composition system. Finally, experiments on the RapidAPI dataset demonstrate that SC-COO outperforms other baselines in accuracy, scalability, and convergence, and ablation experiments further verify the efficiency and applicability of SC-COO.
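The two-stage decision process described above can be illustrated with a minimal sketch. This is not the paper's implementation: the Q-tables, the mixing weight `ALPHA`, the feedback weight `beta`, and all function names are hypothetical stand-ins for the trained SC-COO-offline model, the auxiliary SC-COO-online model, and the feedback-based reward.

```python
CANDIDATE_SERVICES = ["s1", "s2", "s3"]

# Hypothetical Q-values: the offline table stands in for the model
# trained on logged data (main stage); the online table is updated
# from live feedback (auxiliary stage).
q_offline = {"s1": 0.80, "s2": 0.55, "s3": 0.60}
q_online = {"s1": 0.10, "s2": 0.40, "s3": 0.20}

ALPHA = 0.7  # assumed weight of the offline (main) stage


def select_service():
    """Jointly score candidates with the offline and online models."""
    scores = {s: ALPHA * q_offline[s] + (1 - ALPHA) * q_online[s]
              for s in CANDIDATE_SERVICES}
    return max(scores, key=scores.get)


def feedback_reward(qos_reward, user_score, beta=0.5):
    """Feedback-based reward: blend an environment QoS reward with a
    user's score preference (both assumed normalized to [0, 1])."""
    return (1 - beta) * qos_reward + beta * user_score


def online_update(service, reward, lr=0.1):
    """TD-style update of the auxiliary online model from feedback."""
    q_online[service] += lr * (reward - q_online[service])


chosen = select_service()           # joint offline/online decision
r = feedback_reward(qos_reward=0.6, user_score=0.9)
online_update(chosen, r)            # only the online model adapts
```

The design choice this sketch reflects is the one the abstract states: the offline model carries most of the decision weight, while the lightweight online update lets recommendations drift with the dynamic environment and user feedback.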
Data Availability
Data are available upon request.
Acknowledgements
This work is supported by the Science and technology project of State Grid Corporation of China (Funding No. 5108-202218280A-2-402-XG), the State Key Laboratory of Software Development Environment (Funding No. SKLSDE-2020ZX-01), the Project of Beijing Wuzi University (Funding No. 2024XJKY25), and National Key R&D Program of China (Funding No. 2021ZD0110601).
Author information
Authors and Affiliations
- Xiaoming Yu: School of Logistics, Beijing Wuzi University, Beijing, 100000, China
- Wenjun Wu & Jiadong Wang: School of Computer Science and Engineering, Beihang University, Beijing, 100191, China
- Xin Ji: Big Data Centre of State Grid Corporation of China, Beijing, 100191, China
Authors
- Xiaoming Yu
- Wenjun Wu
- Jiadong Wang
- Xin Ji
Contributions
Xiaoming Yu: Conceptualization, Methodology, Validation, Writing - original draft. Wenjun Wu: Writing - review & editing. Jiadong Wang: Visualization, Investigation, Data curation. Xin Ji: Proofreading.
Corresponding author
Correspondence to Xiaoming Yu.
Ethics declarations
Conflicts of Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
A Appendix
Given space constraints, we highlight critical excerpts of the core algorithm implementation, specifically the offline and online models. Only a portion of the core code is extracted, as shown in Figs. 24 and 25.
Fig. 24
Some initialization of the model
Fig. 25
Screenshots of certain parts of the core code in the SC-COO method
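Since the screenshots in Figs. 24 and 25 are not reproduced here, the following is a rough, hedged stand-in for the kind of initialization the offline stage requires: wrapping logged transitions into a dataset so training never touches the live service environment. All field names, shapes, and the `build_offline_dataset` helper are illustrative assumptions, not the paper's actual code.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Transition:
    state: int        # index of the current composition step
    action: int       # index of the invoked candidate service
    reward: float     # feedback-based reward observed for the call
    next_state: int   # composition step reached afterwards
    done: bool        # whether the composition is complete


def build_offline_dataset(logs: List[Tuple]) -> List[Transition]:
    """Wrap logged (s, a, r, s', done) tuples so the offline trainer
    can consume them without interacting with the environment."""
    return [Transition(*row) for row in logs]


# Two hypothetical logged interactions.
logs = [(0, 2, 0.75, 1, False), (1, 0, 0.60, 2, True)]
dataset = build_offline_dataset(logs)
```

An offline RL library would typically consume such a dataset directly; the point of the sketch is only that the offline stage is trained entirely from previously collected data.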
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Yu, X., Wu, W., Wang, J. et al. SC-COO: A feedback-based service composition algorithm combining offline and online reinforcement learning. Appl Intell 55, 806 (2025). https://doi.org/10.1007/s10489-025-06683-z
- Accepted: 02 June 2025
- Published: 20 June 2025
- Version of record: 20 June 2025
- DOI: https://doi.org/10.1007/s10489-025-06683-z