Shih-Yang (Sean) Liu

Research

I'm interested in model compression and efficient deep learning. Most of my research focuses on accelerating either the inference or the training of deep learning models. Representative papers are highlighted.

DoRA: Weight-Decomposed Low-Rank Adaptation
Shih-Yang Liu, Chien-Yi Wang, Hongxu Yin, Pavlo Molchanov, Yu-Chiang Frank Wang, Kwang-Ting Cheng, Min-Hung Chen
Proceedings of the 41st International Conference on Machine Learning (ICML), 2024
project page / arXiv / code
We presented DoRA, a new parameter-efficient fine-tuning approach that consistently outperforms LoRA for fine-tuning LLMs without incurring additional inference cost. The improvements are particularly notable at smaller ranks, with a 37.2% improvement over LoRA at rank 8 and a 22.4% improvement at rank 4.
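
To illustrate the core idea, here is a minimal PyTorch-style sketch of a linear layer that decomposes a frozen pretrained weight into a learnable magnitude and a LoRA-updated direction. The class and parameter names (DoRALinear, rank, alpha) and the normalization axis are illustrative assumptions, not the released implementation; see the paper and code for the exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DoRALinear(nn.Module):
    """Sketch of weight-decomposed low-rank adaptation (DoRA) on a frozen
    pretrained linear weight. Names, initializations, and the normalization
    axis are illustrative choices, not the official release."""

    def __init__(self, weight: torch.Tensor, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        out_dim, in_dim = weight.shape
        # Frozen pretrained weight: the starting "direction" component.
        self.weight = nn.Parameter(weight, requires_grad=False)
        # Learnable magnitude, initialized from the pretrained weight's norm
        # (shown here per output channel; the paper defines a column-wise norm).
        self.magnitude = nn.Parameter(weight.norm(p=2, dim=1, keepdim=True))
        # Standard LoRA factors that update only the direction component.
        self.lora_A = nn.Parameter(torch.randn(rank, in_dim) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_dim, rank))
        self.scaling = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Direction = pretrained weight + low-rank update.
        direction = self.weight + self.scaling * (self.lora_B @ self.lora_A)
        # Normalize the direction and rescale by the learned magnitude, so the
        # magnitude and direction of the weight are adapted separately.
        norm = direction.norm(p=2, dim=1, keepdim=True)
        adapted_weight = self.magnitude * direction / norm
        return F.linear(x, adapted_weight)
```

At merge time the adapted weight can be folded back into a single dense matrix, which is why no extra inference cost is incurred.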
Oscillation-free Quantization for Low-bit Vision Transformers
Shih-Yang Liu, Zechun Liu, Kwang-Ting Cheng
Proceedings of the 40th International Conference on Machine Learning (ICML), 2023
Paper / Code
In this study, we address weight oscillation in quantization-aware training and its negative impact on model performance. We propose three techniques: statistical weight quantization (StatsQ), confidence-guided annealing (CGA), and query-key reparameterization (QKR). Together they improve quantization robustness and accuracy for ViT models. The resulting 2-bit DeiT-T and DeiT-S models outperform the previous state-of-the-art by 9.8% and 7.7%, respectively.
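
As a rough sketch of the statistics-driven quantization idea, the snippet below derives the weight quantization scale from the weight tensor's own statistics (mean absolute value is used here only as one plausible statistic) instead of treating the scale as an independently learned parameter, and uses a straight-through estimator for training. The function name and the exact statistic are assumptions; the paper's StatsQ quantizer may differ in detail.

```python
import torch

def quantize_weight_stats(w: torch.Tensor, n_bits: int = 2) -> torch.Tensor:
    """Illustrative statistics-driven fake quantizer for QAT.

    The scale comes from the weight tensor itself rather than from a
    separately learned parameter, which is the general idea behind
    StatsQ; the concrete statistic here is only an example.
    """
    qmax = 2 ** (n_bits - 1) - 1              # e.g. 1 for 2-bit symmetric quantization
    scale = w.abs().mean() / qmax             # statistics-derived scale, nothing learned
    w_int = torch.clamp(torch.round(w / scale), -qmax - 1, qmax)
    w_q = w_int * scale
    # Straight-through estimator: forward uses quantized weights,
    # backward treats the quantizer as the identity.
    return w + (w_q - w).detach()
```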
LLM-FP4: 4-Bit Floating-Point Quantized Transformers
Shih-Yang Liu, Zechun Liu, Xijie Huang, Pingcheng Dong, Kwang-Ting Cheng
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP Main), 2023
Paper / Code
We introduced LLM-FP4, a post-training quantization framework that, for the first time, quantizes both the activations and weights of LLMs to 4 bits without substantial loss in accuracy, outperforming previous methods by up to 13.1%.
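
For intuition, here is a minimal sketch of what rounding a tensor to a 4-bit floating-point grid looks like. The E2M1 value grid and the simple max-based per-tensor scale are illustrative assumptions; LLM-FP4 itself searches for better scales and exponent biases and handles activations per channel.

```python
import torch

# Representable magnitudes of a common 4-bit float format (E2M1).
# The exact format used in LLM-FP4 may differ; this is only a sketch.
FP4_GRID = torch.tensor([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def fake_quant_fp4(x: torch.Tensor) -> torch.Tensor:
    """Round a tensor to the nearest representable FP4 value after rescaling
    so its maximum magnitude maps to the largest FP4 value. A per-tensor
    max-based scale is used here for simplicity only."""
    scale = x.abs().max().clamp(min=1e-8) / FP4_GRID.max()
    magnitudes = (x / scale).abs()
    # Index of the nearest representable magnitude on the FP4 grid.
    idx = torch.argmin((magnitudes.unsqueeze(-1) - FP4_GRID).abs(), dim=-1)
    # Reapply the sign and the scale to get the dequantized values.
    return FP4_GRID[idx] * torch.sign(x) * scale
```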

Thanks to Jon Barron for the website template.