ENHANCING CONVERSATIONAL AGENTS USING ROTATIONAL ATTENTION AND GATED SPLINE MODULES

VIGNESH ARUMUGAM; MUTHUKUMARAN NARAYANAPERUMAL

doi:10.59277/RRST-EE.2025.3.18

Authors

VIGNESH ARUMUGAM Sri Eshwar College of Engineering, Coimbatore – 641202, Tamil Nadu, India. Author
MUTHUKUMARAN NARAYANAPERUMAL Sri Eshwar College of Engineering, Coimbatore – 641202, Tamil Nadu, India. Author https://orcid.org/0000-0003-0592-6630

DOI:

https://doi.org/10.59277/RRST-EE.2025.3.18

Keywords:

Natural language understanding, Conversational AI, Enhanced T5 model, Contextual dual-axis rotational attention, Neural-spline gated linear units

Abstract

In natural language understanding, transformer models such as T5 and GPT have achieved excellent results in generating contextually relevant responses. However, limitations such as static self-attention in T5 and unidirectional context in GPT hinder their ability to capture deeper inter-token dependencies and nuanced semantics. To address these challenges, we propose an enhanced T5 architecture (ET5) that integrates two new modules: contextual dual-axis rotational attention (CDARA) and neural-spline gated linear units (NS-GLU). CDARA applies attention across both the token and the feature dimensions, while NS-GLU introduces adaptive, spline-activated gating for better nonlinear representation. Experiments on NarrativeQA, SQuAD, MultiWOZ, and DailyDialog show that ET5 consistently outperforms PEGASUS, GPT-3, and T5-LSTM FusionNet. ET5 achieves higher BERTScore (up to 0.971) and BLEU (up to 0.77) and a lower word error rate (WER, as low as 0.13), confirming its effectiveness in generating fluent, accurate, and semantically rich responses. These results position ET5 as a promising advance in transformer-based conversational AI systems.
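The abstract reports word error rate (WER) among its metrics. As a rough illustration of how such a figure is commonly computed, the sketch below uses the standard definition: the word-level edit distance between a reference and a generated response, divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenization are assumptions made for this example; the authors' evaluation pipeline is not described on this page and may differ.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative only: tokenization is plain whitespace splitting,
    which is an assumption and not taken from the paper.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# One reference word ("the") is dropped out of six: WER = 1/6.
print(round(word_error_rate("the cat sat on the mat", "the cat sat on mat"), 3))  # 0.167
```

Under this definition, lower values are better and 0 means the generated response matches the reference word for word.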

Author Biography

MUTHUKUMARAN NARAYANAPERUMAL, Sri Eshwar College of Engineering, Coimbatore – 641202, Tamil Nadu, India.

Dr. N. MUTHUKUMARAN was born in Kanniyakumari, Tamil Nadu, India, in 1984. He received the B.E. degree in Electronics and Communication Engineering, the M.E. degree in Applied Electronics, and the Ph.D. degree in Information and Communication Engineering from Anna University, Chennai, India, in 2007, 2010, and 2015, respectively. He is currently a professor in the Centre for Computational Imaging and Machine Vision, Department of ECE, Sri Eshwar College of Engineering, Coimbatore, Tamil Nadu, India, affiliated to Anna University, Chennai. His major research interests are digital image and signal processing, multimedia image and video processing and compression, and digital and analog very-large-scale integration (VLSI) circuit design. Since 2006 he has published more than 73 papers in international journals from publishers such as Springer, IEEE, and Elsevier, and 88 national and international conference papers. He has published 15 international books for engineering students and 27 innovation patents. He has participated in and organized more than 102 research-related events, including national and international workshops, faculty development programs, seminars, symposia, conferences, and short-term courses. He is a lifetime member of more than 19 professional bodies, including IEEE, ISI, WCECS, and UACEE.

References

(1) M. Ganga, G. Jasmine, N. Muthukumaran, M. Veluchamy, Red fox-based fractional order fuzzy PID controller for smart LED driver circuit, Rev. Roum. Sci. Techn. – Électrotechn. Et Énerg., 68, pp. 395–400 (2023).

(2) R. Ahmad, D. Siemon, U. Gnewuch, S. Robra-Bissantz, Designing personality-adaptive conversational agents for mental health care, Information Systems Frontiers, 24, pp. 923–943 (2022).

(3) A. Ramaiah, P. Devi Balasubramanian, A. Appathurai, N. Muthukumaran, Detection of Parkinson’s disease via Clifford gradient-based recurrent neural network using multi-dimensional data, Rev. Roum. Sci. Techn. – Électrotechn. et Énerg., 69 (2024).

(4) J. Balakrishnan, Y.K. Dwivedi, Conversational commerce: entering the next stage of AI-powered digital assistants, Ann Oper Res (2021).

(5) A. Appathurai, A.S.I. Tinu, N. Muthukumaran, MEG and PET images-based brain tumor detection using Kapur’s Otsu segmentation and sooty optimized MobileNet classification, Rev. Roum. Sci. Techn. – Électrotechn. et Énerg., 69, 3, pp. 363–368 (2024).

(6) W. Cai et al., Bandit algorithms to personalize educational chatbots, Mach Learn, 110, 9, pp. 2389–2418 (2021).

(7) T.Y. Chen, Y.C. Chiu, N. Bi, R.T.H. Tsai, Multi-modal Chatbot in intelligent manufacturing, IEEE Access (2021).

(8) S. Gong, M. Li, J. Feng, Z. Wu, L. Kong, DiffuSeq: Sequence to sequence text generation with diffusion models (2022).

(9) L. Grassi, C.T. Recchiuto, A. Sgorbissa, Knowledge-grounded dialogue flow management for social robots and conversational agents, Int J Soc Robot, 14, 5, pp. 1273–1293 (2022).

(10) H. Honda, M. Hagiwara, Question answering systems with deep learning-based symbolic processing, IEEE Access, 7, pp. 152368–152378 (2019).

(11) C. Hsu, C.C. Chang, Integrating machine learning and open data into social chatbot for filtering information rumor, J Ambient Intell Humaniz Comput, 12, 1, pp. 1023–1037 (2021).

(12) R.B. Lincy, R. Gayathri, Optimized convolutional neural network for Tamil handwritten character recognition, Intern J Pattern Recognit Artif Intell, 36, 11 (2022).

(13) M.M. Mohsan, M.U. Akram, G. Rasool, N.S. Alghamdi, M.A.A. Baqai, M. Abbas, Vision transformer and language model-based radiology report generation, IEEE Access, 11, pp. 1814–1824 (2023).

(14) K. Palasundram, N. Mohd Sharef, K.A. Kasmiran, A. Azman, Enhancements to the sequence-to-sequence-based natural answer generation models, IEEE Access, 8, pp. 45738–45752 (2020).

(15) Y. Park, A. Park, C. Kim, ALSI-Transformer: transformer-based code comment generation with aligned lexical and syntactic information, IEEE Access, 11, pp. 39037–39047 (2023).

(16) J. Zhang, Y. Zhao, M. Saleh, P.J. Liu, PEGASUS: Pre-training with extracted gap-sentences for abstractive summarization, arXiv (2019).

(17) B. Khan, M. Usman, I. Khan, J. Khan, D. Hussain, Y.H. Gu, Next-generation text summarization: A T5-LSTM FusionNet hybrid approach for psychological data, IEEE (2025).

(18) M.V. Namitha, G.R. Manjula, M.C. Belavagi, StegAbb: A cover-generating text steganographic tool using GPT-3 language modeling for covert communication across SDRs, IEEE Access, 12, pp. 82057–82067 (2024).

(19) D. Mylsamy, A. Appathurai, N. Muthukumaran, S. Kuppusamy, Mojo-based fuzzy agglomerative clustering algorithm with Ed 2 Mt strategy for large-scale wireless sensors networks, Rev. Roum. Sci. Techn. – Électrotechn. et Énerg., 69 (2024).

(20) W.T. Wang, N. Tan, J.A. Hanson, C.A. Crubaugh, A.K. Hara, Initial experience with a COVID-19 screening chatbot before radiology appointments, J Digit Imaging, 35, 5, pp. 1303–1307 (2022).

(21) M.R. Kumar, R. Sundaram, M. Rengasamy, R. Balakrishnan, Effective feature extraction method for unconstrained environment: local binary pattern or local ternary pattern, Rev. Roum. Sci. Techn. – Électrotechn. Et Énerg., 69, 4, pp. 443–448 (2024).
