Authors :
Jayapratha N; Vijaysurya M; Lingeshwaran G; Vema Naga Karish Gupta; Shivaprasanna
Volume/Issue :
Volume 9 - 2024, Issue 11 - November
Google Scholar :
https://tinyurl.com/26ez98fm
Scribd :
https://tinyurl.com/ta6dp7eh
DOI :
https://doi.org/10.38124/ijisrt/IJISRT24NOV1089
Abstract :
This paper presents a novel approach to
multilingual voice translation that integrates speech
emotion recognition, multi-speaker differentiation, and
voice cloning for cross-cultural applications. While
existing translation systems achieve basic linguistic
transformation, they often overlook critical elements like
speaker-specific identity and emotional tone. The
proposed system advances traditional models by
leveraging deep learning to distinguish multiple
speakers and recognize emotional states in multilingual
contexts, preserving vocal nuances across languages.
This study examines our model's architecture, evaluates
its components, and assesses the potential impact on
international communication, providing an innovative,
culturally sensitive translation solution.