SC-COO: A feedback-based service composition algorithm combining offline and online reinforcement learning

Abstract

Faced with today's dynamic service environment, rapid and efficient service composition has attracted much attention in recent years. Service composition enables the reuse of existing services, and its ultimate goal is to better satisfy users. However, interacting with the service environment to collect data is challenging in practical applications due to factors such as high cost and risk. To overcome this limitation, this paper proposes SC-COO, a feedback-based service composition algorithm combining offline and online reinforcement learning. SC-COO consists of two stages: an offline training module (SC-COO-offline) as the main stage and an online update module (SC-COO-online) as the auxiliary stage. The SC-COO-offline model is trained on collected offline data, avoiding the drawback of online learning, which requires many iterations to converge. The online stage (SC-COO-online) then assists in joint decision-making and recommends services to users, adapting better to dynamic environments. Furthermore, SC-COO incorporates users' score preferences into service composition through a feedback-based reward mechanism; continuous interactive feedback from humans can significantly improve the robustness of the service composition system. Finally, experiments on the RapidAPI dataset demonstrate that SC-COO outperforms other baselines in accuracy, scalability, and convergence, and ablation experiments further verify its efficiency and applicability.
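To make the two-stage structure concrete, the following is a minimal sketch of how an offline-trained policy can be combined with online, feedback-shaped updates. All names (`offline_train`, `online_step`, `feedback_reward`, the action set, and the reward-blending weight) are our own illustrative assumptions, not the authors' implementation.

```python
import random

ACTIONS = ["svc_a", "svc_b"]   # candidate services (illustrative)
ALPHA, GAMMA = 0.1, 0.9        # learning rate and discount factor

def q_update(q, s, a, r, s_next):
    """One tabular Q-learning update on the shared Q-table."""
    best_next = max(q.get((s_next, a2), 0.0) for a2 in ACTIONS)
    q[(s, a)] = q.get((s, a), 0.0) + ALPHA * (r + GAMMA * best_next - q.get((s, a), 0.0))

def offline_train(q, logged, epochs=100):
    """SC-COO-offline (main stage): learn only from previously collected
    transitions, with no interaction with the live environment."""
    for _ in range(epochs):
        for s, a, r, s_next in logged:
            q_update(q, s, a, r, s_next)

def feedback_reward(base_reward, user_score, weight=0.5):
    """Feedback-based reward: blend the environment reward with a
    user score in [0, 1] so user preferences shape the policy."""
    return (1 - weight) * base_reward + weight * user_score

def online_step(q, s, user_score, epsilon=0.1):
    """SC-COO-online (auxiliary stage): epsilon-greedy recommendation
    over the offline-learned Q-table, refined by a feedback-shaped update."""
    if random.random() < epsilon:
        a = random.choice(ACTIONS)
    else:
        a = max(ACTIONS, key=lambda a2: q.get((s, a2), 0.0))
    r = feedback_reward(base_reward=1.0, user_score=user_score)
    q_update(q, s, a, r, s_next="done")
    return a
```

In this sketch the offline stage does the bulk of the learning from logged data, while the online stage only nudges the Q-values with feedback-adjusted rewards, which mirrors the main/auxiliary division described in the abstract.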

Data Availability

Data are available upon request.

Notes

  1. https://github.com/PaddlePaddle/PaddleNLP
  2. https://rapidapi.com/hub

Acknowledgements

This work is supported by the Science and technology project of State Grid Corporation of China (Funding No. 5108-202218280A-2-402-XG), the State Key Laboratory of Software Development Environment (Funding No. SKLSDE-2020ZX-01), the Project of Beijing Wuzi University (Funding No. 2024XJKY25), and National Key R&D Program of China (Funding No. 2021ZD0110601).

Author information

Authors and Affiliations

  1. School of Logistics, Beijing Wuzi University, Beijing, 100000, China
    Xiaoming Yu
  2. School of Computer Science and Engineering, Beihang University, Beijing, 100191, China
    Wenjun Wu & Jiadong Wang
  3. Big Data Centre of STATE GRID Corporation of China, Beijing, 100191, China
    Xin Ji

Authors

  1. Xiaoming Yu
  2. Wenjun Wu
  3. Jiadong Wang
  4. Xin Ji

Contributions

Xiaoming Yu: Conceptualization, Methodology, Validation, Writing - original draft. Wenjun Wu: Writing - review editing. Jiadong Wang: Visualization, Investigation, Data curation. Xin Ji: Proofread content.

Corresponding author

Correspondence to Xiaoming Yu.

Ethics declarations

Conflicts of Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

A Appendix

Given space constraints, we highlight critical excerpts of the core algorithm implementation, specifically the offline and online models; only a portion of the core code is extracted, as shown in Figs. 24 and 25.

Fig. 24

Some initialization of the model

Fig. 25

Screenshots of certain parts of the core code in the SC-COO method
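In the spirit of the initialization excerpted in Fig. 24, the sketch below shows one plausible way the online module could be seeded from the offline-trained model. The class and function names (`QModel`, `init_sc_coo`) are hypothetical placeholders, not taken from the authors' code.

```python
import copy

class QModel:
    """Minimal stand-in for the Q-function shared by both modules (illustrative)."""
    def __init__(self):
        self.table = {}  # maps (state, action) -> Q-value

def init_sc_coo(offline_model):
    """Hypothetical initialization: the online module starts from a deep copy
    of the offline-trained parameters, so online updates refine the learned
    policy instead of relearning it from scratch."""
    online_model = copy.deepcopy(offline_model)
    return offline_model, online_model
```

Seeding the online module this way keeps the two stages decoupled: the offline model remains a fixed reference, while the online copy absorbs feedback-driven updates.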

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Yu, X., Wu, W., Wang, J. et al. SC-COO: A feedback-based service composition algorithm combining offline and online reinforcement learning. Appl Intell 55, 806 (2025). https://doi.org/10.1007/s10489-025-06683-z
