Agentic Commerce: A Unified Multi-Retrieval Framework for High-Fidelity E-Commerce Chatbots

MD Estihad Faysal; Wenfeng Feng; Esha Mony

doi:10.54097/2wmsj534

Keywords:

E-commerce, conversational AI, large language models, retrieval-augmented generation, chain-of-thought, multi-agent systems, agentic AI, reasoning, chatbot evaluation

Abstract

E-commerce chatbots face critical limitations that undermine customer trust: hallucinations from ungrounded responses, poor multi-turn coherence, and an inability to execute real-world actions such as processing refunds or verifying live inventory. Existing retrieval-augmented generation (RAG) and chain-of-thought (CoT) approaches address knowledge grounding and reasoning, respectively, yet remain fundamentally passive: they inform but cannot act. We present a unified agentic framework that integrates RAG for factual grounding, CoT for structured multi-step reasoning, and multi-agent collaboration for autonomous task execution. The modular architecture encompasses specialized agents for retrieval, reasoning, action generation, and safety enforcement, orchestrated through an LLM-based routing policy. This design enables the system to move beyond answering questions toward completing transactions, coordinating inventory checks, and resolving complex customer inquiries autonomously. Evaluated on a 10K-SKU e-commerce dataset spanning factual, comparative, and multi-turn query types, the framework achieves 96.2% response accuracy and 95.8% grounding reliability. The multi-agent architecture reduces errors in multi-turn interactions by 18% compared to single-agent baselines. The system operates with a median latency of 3.12 seconds, a deliberate safety-first design choice that prioritizes transactional reliability over conversational speed, ensuring business-critical accuracy in high-stakes operations where sub-second responses would compromise correctness.
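
The modular design described in the abstract (retrieval, reasoning, action generation, and safety agents behind a routing policy) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the catalog, the agent functions, the `Turn` record, and the keyword-based `route` function are all hypothetical stand-ins, and a simple keyword rule replaces the LLM-based routing policy purely to keep the example self-contained.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

# Hypothetical in-memory catalog standing in for the product knowledge base.
CATALOG = {
    "SKU-001": {"name": "trail running shoe", "stock": 4},
    "SKU-002": {"name": "road running shoe", "stock": 0},
}

@dataclass
class Turn:
    """State passed from agent to agent for one customer query."""
    query: str
    evidence: list = field(default_factory=list)  # SKUs that ground the reply
    steps: list = field(default_factory=list)     # explicit reasoning trace
    action: Optional[str] = None                  # proposed real-world action
    reply: str = ""

Agent = Callable[[Turn], Turn]

def tokens(text: str) -> set:
    return set(text.lower().replace("?", "").split())

def retrieval_agent(turn: Turn) -> Turn:
    # Assumption: word overlap stands in for a real dense or sparse retriever.
    words = tokens(turn.query)
    scored = [(len(words & set(item["name"].split())), sku)
              for sku, item in CATALOG.items()]
    best_score, best_sku = max(scored)
    if best_score > 0:
        turn.evidence = [best_sku]
    return turn

def reasoning_agent(turn: Turn) -> Turn:
    # Records intermediate steps, loosely mirroring chain-of-thought reasoning.
    if not turn.evidence:
        turn.steps = ["no matching product found"]
        return turn
    sku = turn.evidence[0]
    item = CATALOG[sku]
    turn.steps = [f"matched {sku} ({item['name']})",
                  f"stock level is {item['stock']}"]
    status = "in stock" if item["stock"] > 0 else "out of stock"
    turn.reply = f"The {item['name']} is {status}."
    return turn

def action_agent(turn: Turn) -> Turn:
    # Proposes an action only when the retrieved item is actually available.
    if turn.evidence and CATALOG[turn.evidence[0]]["stock"] > 0:
        turn.action = f"reserve:{turn.evidence[0]}"
    return turn

def safety_agent(turn: Turn) -> Turn:
    # Blocks any reply or action that is not backed by retrieved evidence.
    if not turn.evidence:
        turn.action = None
        turn.reply = "I could not find that product, so I cannot answer reliably."
    return turn

def route(query: str) -> list:
    # Assumption: a keyword rule stands in for the LLM-based routing policy.
    pipeline = [retrieval_agent, reasoning_agent]
    if tokens(query) & {"buy", "order", "refund"}:
        pipeline.append(action_agent)
    pipeline.append(safety_agent)
    return pipeline

def handle(query: str) -> Turn:
    turn = Turn(query)
    for agent in route(query):
        turn = agent(turn)
    return turn
```

Calling `handle("I want to buy the trail running shoe")` runs all four agents and proposes a reservation, while a purely informational query skips the action agent. Running the safety agent last on every route reflects the safety-first priority the abstract states, at the cost of an extra step per turn.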
