Tri Ton

I am a second-year PhD student in Electrical Engineering at KAIST, South Korea, advised by Prof. Chang D. Yoo. I received my B.E. degree in Electrical Engineering from Ho Chi Minh City University of Technology, Vietnam. Email: huutri99.lhp (at) gmail (dot) com


News

  • March 2025: ITA-MDT has been accepted to CVPR 2025.
  • Jan 2025: MDSGen has been accepted to ICLR 2025.
  • April 2024: DualPath has been accepted to a CVPR 2024 workshop.

Publications

TARO: Timestep-Adaptive Representation Alignment with Onset-Aware Conditioning for Synchronized Video-to-Audio Synthesis

Under Review, 2025

Tackles video-guided audio synthesis using adaptive representation alignment and onset conditioning.

Advancing Temporal Coherence in Portrait Animation via Progressive Latent Temporal Diffusion Transformers

Under Review, 2025

Tackles portrait animation using progressive latent temporal diffusion transformers.

ITA-MDT: Image-Timestep-Adaptive Masked Diffusion Transformer Framework for Image-Based Virtual Try-On

CVPR, 2025

Tackles image-based virtual try-on using global garment context and fine-grained details.

MDSGen: Fast and Efficient Masked Diffusion Temporal-Aware Transformers for Open-Domain Sound Generation

ICLR, 2025

Tackles vision-guided open-domain sound generation using a masked diffusion transformer.

Zero-Shot Dual-Path Integration Framework for Open-Vocabulary 3D Instance Segmentation

CVPR workshop, 2024

Tackles open-vocabulary 3D instance segmentation using both 3D point clouds and 2D multi-view images.