About


I am a research scientist at ByteDance Seed. Prior to joining ByteDance, I worked as a research intern at Microsoft Research Asia. I received my Ph.D. in speech & language processing from the Human-Computer Communications Laboratory (HCCL) at The Chinese University of Hong Kong (CUHK), supervised by Prof. Helen Meng.

My research interests focus on language modeling for speech synthesis and the integration of speech with large language models. I am also working on speech processing and recognition.

Highlights


[Aug 2025]  I started my career at ByteDance Seed as a Research Scientist
[Jul 2025]  I defended my Ph.D. thesis
[Jul 2025]  Two papers, [FELLE] & [PALLE], have been accepted to ACM Multimedia 2025
[May 2025]  We present [MELLE] (ACL 2025 Main), a pioneering effort in continuous-valued token-based language modeling for TTS
[May 2025]  Three papers have been accepted to ISCA INTERSPEECH 2025
[Apr 2025]  Our [paper] is the only winner of the 2025 IEEE Ganesh N. Ramaswamy Memorial Student Grant
[Jan 2025]  We present [ARLON] (ICLR 2025), boosting diffusion transformers with autoregressive models for long video generation
[Dec 2024]  Two papers have been accepted to IEEE ICASSP 2025, including one first-authored [paper]
[Dec 2024]  I participated in writing the excellent [survey] on Next Token Prediction Towards Multimodal Intelligence
[Jun 2024]  We propose [WavLLM], a robust and adaptive Speech LLM achieving SOTA performance on various speech-related tasks
[Jun 2024]  Three papers have been accepted to ISCA INTERSPEECH 2024, including one first-authored [paper]
[Jan 2024]  Two papers have been accepted to IEEE ICASSP 2024

Experiences


Selected Publications


[Google Scholar]


Challenges


Activities


I serve as a reviewer for top journals and conferences, including IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP), IEEE Journal of Selected Topics in Signal Processing (JSTSP), IEEE Journal of Biomedical and Health Informatics (JBHI), Pattern Recognition, International Conference on Learning Representations (ICLR), IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), ISCA INTERSPEECH, and ACM Multimedia, among others.

Teaching Assistance


2021 - 2025, SEEM 3440, Operations Research II
2021 - 2025, AIST 3510 / SEEM 3510, Human-Computer Interaction