Abstract: With the prevailing Mixture-of-Experts (MoE) architecture pushing the performance of Large Language Models (LLMs) to new limits, fine-tuning MoE models presents a significant challenge due ...