Abstract: Multimodal speech emotion recognition (SER) has emerged as pivotal for improving human–machine interaction. Researchers are increasingly leveraging both speech and textual information ...