awq-quantization

Orchestra-Research · Development

Activation-aware weight quantization (AWQ) compresses LLM weights to 4 bits with a reported 3x inference speedup and minimal accuracy loss. Use it when deploying large models (7B-70B) on limited GPU memory, when you need faster inference than GPTQ with better accuracy preservation, or when quantizing instruction-tuned and multimodal models. AWQ won the MLSys 2024 Best Paper Award.
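To make the "limited GPU memory" case concrete, here is a rough back-of-the-envelope sketch of weight storage at 16 bits versus 4 bits for the model sizes mentioned above. The helper name `weight_memory_gib` is hypothetical, and the estimate assumes weights only: it ignores the per-group scales and zero-points that 4-bit formats store, as well as the KV cache and activations, so real usage will be somewhat higher.

```python
def weight_memory_gib(num_params: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in GiB (hypothetical helper)."""
    bytes_total = num_params * bits_per_weight / 8
    return bytes_total / 2**30


# Parameter counts are nominal round numbers, not exact model sizes.
for name, num_params in [("7B", 7e9), ("13B", 13e9), ("70B", 70e9)]:
    fp16 = weight_memory_gib(num_params, 16)
    int4 = weight_memory_gib(num_params, 4)
    print(f"{name}: FP16 ~{fp16:.1f} GiB, 4-bit ~{int4:.1f} GiB")
```

Under these assumptions a 7B model drops from about 13 GiB of weights to about 3.3 GiB, which is the 4x reduction that lets larger models fit on a single consumer GPU.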

npx skills add https://github.com/Orchestra-Research/AI-Research-SKILLs --skill awq

Stars 9626 · Installs 1
