About


I am a final-year Ph.D. candidate at the Human-Computer Communications Laboratory (HCCL), The Chinese University of Hong Kong (CUHK), supervised by Prof. Helen Meng. Previously, I received my M.Phil. degree from the Institute of Automation, Chinese Academy of Sciences (CASIA), where I was supervised by Prof. Jie Tian, and my Bachelor's degree from Harbin Institute of Technology (HIT). I was also a research intern at Microsoft Research Asia.

My research focuses on language modeling for speech synthesis and the integration of speech with large language models. I also work on speech processing and recognition.

🟢   I am currently on the job market. Please feel free to email me!

Highlights


[May 2025]  We present [MELLE] (ACL 2025 Main), a pioneering effort in continuous-value token-based language modeling for TTS
[May 2025]  Three papers have been accepted to ISCA INTERSPEECH 2025
[Apr 2025]  Our [paper] is the only winner of the 2025 IEEE Ganesh N. Ramaswamy Memorial Student Grant
[Jan 2025]  We present [ARLON] (ICLR 2025), boosting diffusion transformers with autoregressive models for long video generation
[Dec 2024]  Two papers have been accepted to IEEE ICASSP 2025, including one first-authored [paper]
[Dec 2024]  I participated in writing the excellent [survey] on Next Token Prediction Towards Multimodal Intelligence
[Jun 2024]  We propose [WavLLM], a robust and adaptive Speech LLM achieving SOTA performance on various speech-related tasks
[Jun 2024]  Three papers have been accepted to ISCA INTERSPEECH 2024, including one first-authored [paper]
[Jan 2024]  Two papers have been accepted to IEEE ICASSP 2024

Education


2021 - 2025, Ph.D., The Chinese University of Hong Kong (CUHK)
2018 - 2021, M.Phil., Pattern Recognition and Intelligent Systems, Institute of Automation, Chinese Academy of Sciences (CASIA)
2014 - 2018, B.Eng., Electrical Engineering, Harbin Institute of Technology (HIT)

Experiences


Selected Publications


[Google Scholar]

- Medical Image Analysis related journal papers:

Challenges


Activities


I serve as a reviewer for top journals and conferences, including IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP), IEEE Journal of Selected Topics in Signal Processing (JSTSP), IEEE Journal of Biomedical and Health Informatics (JBHI), Pattern Recognition, International Conference on Learning Representations (ICLR), IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), ISCA INTERSPEECH, and ACM Multimedia.

Teaching Assistance


SEEM 3440, Operations Research II
AIST 3510 / SEEM 3510, Human-Computer Interaction

Miscellaneous


Some awards from control algorithm and circuit design competitions: