A Multimodal Conversational Agent for Co-Creating Teaching Slides and Lesson Plans

Huaxin Zheng

doi:10.54097/x3fxee85

Keywords:

Artificial intelligence in education, human-in-the-loop generation, lesson plan generation, multimodal references, retrieval-augmented generation

Abstract

This paper presents a multimodal conversational teaching agent that co-creates classroom slide decks and lesson plans with teachers through a staged review process. The prototype was motivated by a practical teaching-design scenario in which courseware preparation is fragmented across several tools and teacher intent is often captured only superficially. The proposed system combines three capabilities: multi-turn intent clarification, retrieval-augmented generation over a locally curated knowledge base, and optional reference-file grounding from uploaded PDF, DOCX, PPT, image, or video materials. Rather than exporting final artifacts after the first interaction, the agent follows a text-first pipeline. In Round 1 it produces a slide draft and a lesson-plan draft for review. In Round 2 it converts the approved slide draft into a structured slide object and then into PPTX. In Round 3 it converts the approved lesson-plan draft into Markdown and then into DOCX. This design keeps the teacher in control of pedagogical structure, terminology, and difficulty level while preserving the efficiency gains of large language models. The implementation separates chat-oriented intent handling from deterministic formatting and export steps, and it uses explicit state variables to support revision, regeneration, and artifact continuity across rounds. A case-based verification on networking topics such as VLAN, WLAN, and STP shows that the system can keep courseware drafts, exported slides, and exported lesson plans aligned within a single interaction loop. The paper contributes a reproducible low-code architecture, a multimodal fusion strategy that does not require writing uploaded references into the knowledge base, and a teacher-centered workflow for controllable educational content generation.
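The fusion strategy, in which uploaded references ground a request without being written into the knowledge base, can be illustrated with a minimal Python sketch. It assumes the reference files have already been converted to plain text, and it substitutes a toy term-overlap retriever for whatever retriever the actual system uses; the names `tokenize`, `retrieve`, and `build_context` and the bracketed section labels are hypothetical. What it shows is that reference excerpts exist only in the per-request context, so the curated knowledge base is read but never modified.

```python
import re
from typing import List, Sequence, Set, Tuple


def tokenize(text: str) -> Set[str]:
    """Lowercase alphanumeric terms; a crude stand-in for real text processing."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, kb: Sequence[str], k: int = 2) -> List[str]:
    """Toy lexical retriever (hypothetical): rank KB chunks by terms shared with the query."""
    query_terms = tokenize(query)
    scored: List[Tuple[int, int, str]] = []
    for index, chunk in enumerate(kb):
        overlap = len(query_terms & tokenize(chunk))
        if overlap:
            # Negative overlap sorts best matches first; the index breaks ties.
            scored.append((-overlap, index, chunk))
    scored.sort()
    return [chunk for _, _, chunk in scored[:k]]


def build_context(query: str, kb: Sequence[str], reference_texts: Sequence[str],
                  max_ref_chars: int = 500) -> str:
    """Assemble one request's context from KB hits plus uploaded-reference excerpts.

    Reference text is truncated and placed in the prompt only; kb is never written to.
    """
    parts = ["[Knowledge base]"]
    parts.extend(f"- {chunk}" for chunk in retrieve(query, kb))
    if reference_texts:
        parts.append("[Uploaded references]")
        parts.extend(f"- {text[:max_ref_chars]}" for text in reference_texts)
    parts.append("[Teacher request]")
    parts.append(query)
    return "\n".join(parts)
```

In this sketch, because references are scoped to a single request, a teacher could replace or drop an uploaded file between turns without re-indexing or cleaning up the knowledge base.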
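The three-round review loop and its explicit state variables can be pictured as a small state machine. The following Python sketch is illustrative only: the `Round` enum, the `SessionState` fields, the `revise` and `approve` helpers, and the rule that drafts are frozen once Round 1 is approved are assumptions rather than the paper's implementation. It captures the property a staged pipeline relies on: a round advances only on teacher approval, and only after the artifact that round is responsible for exists.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Optional


class Round(Enum):
    DRAFTS = 1       # Round 1: slide draft and lesson-plan draft for review
    SLIDES = 2       # Round 2: approved slide draft -> slide object -> PPTX
    LESSON_PLAN = 3  # Round 3: approved lesson-plan draft -> Markdown -> DOCX
    DONE = 4


# Artifact that must be exported before the teacher can approve each round.
REQUIRED_ARTIFACT = {Round.SLIDES: "pptx", Round.LESSON_PLAN: "docx"}


@dataclass
class SessionState:
    round: Round = Round.DRAFTS
    slide_draft: Optional[str] = None
    plan_draft: Optional[str] = None
    artifacts: Dict[str, str] = field(default_factory=dict)


def revise(state: SessionState, slide_draft: Optional[str] = None,
           plan_draft: Optional[str] = None) -> None:
    """Replace one or both drafts; allowed only while Round 1 is under review."""
    if state.round is not Round.DRAFTS:
        raise RuntimeError("drafts are frozen once Round 1 is approved")
    if slide_draft is not None:
        state.slide_draft = slide_draft
    if plan_draft is not None:
        state.plan_draft = plan_draft


def approve(state: SessionState) -> None:
    """Advance exactly one round after the teacher approves the current output."""
    if state.round is Round.DONE:
        raise RuntimeError("session already finished")
    if state.round is Round.DRAFTS and not (state.slide_draft and state.plan_draft):
        raise RuntimeError("Round 1 needs both drafts before approval")
    needed = REQUIRED_ARTIFACT.get(state.round)
    if needed and needed not in state.artifacts:
        raise RuntimeError(f"export {needed} before approving this round")
    state.round = Round(state.round.value + 1)


if __name__ == "__main__":
    state = SessionState()
    revise(state, slide_draft="Slide 1: VLAN basics", plan_draft="Objectives: ...")
    approve(state)                          # drafts approved, move to Round 2
    state.artifacts["pptx"] = "vlan.pptx"   # placeholder for the real export step
    approve(state)                          # slides approved, move to Round 3
    print(state.round.name)
```

Keeping the approved drafts on the state object, rather than regenerating them, is what lets Round 3 build its DOCX from the same lesson-plan text the teacher reviewed in Round 1.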
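Round 2's conversion of an approved slide draft into a structured slide object is the kind of deterministic step that can be separated from the chat model. A minimal Python sketch, assuming a hypothetical draft format in which each slide begins with a `Slide N: title` line followed by `- ` bullets, might look like this; `Slide`, `parse_slide_draft`, and `to_json` are illustrative names. Malformed lines raise an error instead of being guessed at, so a broken draft goes back for revision rather than being silently exported.

```python
import json
import re
from dataclasses import asdict, dataclass, field
from typing import List

# Assumed title-line format, e.g. "Slide 2: 802.1Q tagging" (case-insensitive).
TITLE_LINE = re.compile(r"^slide\s+\d+\s*:\s*(.+)$", re.IGNORECASE)


@dataclass
class Slide:
    title: str
    bullets: List[str] = field(default_factory=list)


def parse_slide_draft(draft: str) -> List[Slide]:
    """Convert an approved text draft into a list of Slide objects."""
    slides: List[Slide] = []
    for raw in draft.splitlines():
        line = raw.strip()
        if not line:
            continue  # blank lines only separate slides visually
        match = TITLE_LINE.match(line)
        if match:
            slides.append(Slide(title=match.group(1).strip()))
        elif line.startswith("- "):
            if not slides:
                raise ValueError(f"bullet appears before any slide title: {line!r}")
            slides[-1].bullets.append(line[2:].strip())
        else:
            # Fail loudly so the draft is revised instead of being guessed at.
            raise ValueError(f"unrecognized draft line: {line!r}")
    return slides


def to_json(slides: List[Slide]) -> str:
    """Serialize slide objects so a separate exporter can render them."""
    return json.dumps([asdict(s) for s in slides], ensure_ascii=False, indent=2)
```

The resulting objects could then be handed to a PPTX writer; with the python-pptx library, for instance, each `Slide` could be rendered onto a title-and-content layout, keeping all formatting decisions outside the language model.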
