Event Time Slot (UTC+2)
Workshop Introduction 10:30 to 10:50
Keynote Speech: AI for Creators: Pushing Creative Abilities to the Next Level
Dr. Yuki Mitsufuji (Distinguished Engineer at Sony Group Corporation, Japan)
This talk explores how cutting-edge generative AI is transforming creative workflows in music, cinema, and gaming. Led by Dr. Yuki Mitsufuji, the Music Foundation Model Team at Sony AI has developed multimodal frameworks such as MMAudio, which generate high-quality, synchronized audio from video and text inputs. Their research, recognized at top venues like NeurIPS, ICLR, and CVPR, has contributed to both content creation and protection, with practical demos integrated into commercial products. The session will highlight key innovations, including sound restoration projects and the future of AI-powered media production.
10:50 to 11:35
Coffee Break 11:35 to 11:50
Oral presentations (20 minutes each, including Q&A)
  • Paper#1: Musitopia: Bridging Human Expertise and AI for Digital Music Therapy
  • Paper#2: Parametric analysis of feature specific neural coding during music imagery and perception
11:50 to 12:30
Special Talk: Subhrojyoti Roy Chaudhuri (TCS Research) 12:30 to 13:00
Lunch Break 13:00 to 14:30
Keynote Speech: AI & The Sound of Mental Health: Remixed Feelings
Professor Björn Schuller (Professor of Artificial Intelligence and Head of GLAM at Imperial College London, UK)
Mental health has a sound, and AI is beginning to hear and mix it. In voice and language, we find traces of affect, depression, and recovery that can open new pathways for scalable and personalised care. Starting there, we will explore how speech-based digital psychology is expanding from diagnosis toward intervention: from vocal and linguistic biomarkers of mental well-being, to large language models that support rather than simply assess, to generative music systems that enable closed-loop personalised emotional regulation. This convergence of speech AI, interventive language technology, and generative audio suggests a future in which intelligent systems can listen, understand, and respond in psychologically meaningful ways. Realising that future, however, requires more than technical performance. Beyond clinical relevance, it demands reliability, safety, explainability, and trust. I will discuss the promise, route, and responsibility of building AI that does not merely analyse the mind, but helps care and deejay for it.
14:30 to 15:15
Coffee Break 15:15 to 15:30
Oral presentations (20 minutes each, including Q&A)
  • Paper#3: voice2mode: Phonation Mode Classification in Singing using Self-Supervised Speech Models
  • Paper#4: Induced Mood Shift and Cognitive Adaptation During Free Play in a Changing Tonic Context
15:30 to 16:10
Pit Stop 16:10 to 16:20
Brainstorming Session/Panel Discussion 16:20 to 17:00
Workshop Conclusion and Takeaways 17:00 to 17:20