AWQ: The Enterprise Format

AWQ (Activation-aware Weight Quantization) is a quantization method that makes it easier to serve a model to multiple users at once. If you have a SOTA model and the resources to provide inference, you can quantize the model to AWQ and deploy it on your own servers.

To quantize a model in AWQ, see the project's GitHub page, which includes directions for installing and using the program.
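
As a rough illustration of what the quantization step can look like, here is a minimal sketch. It assumes the AutoAWQ Python package (`pip install autoawq`) together with Hugging Face `transformers`, which may not be the same tool as the GitHub page referenced above. The helper names `build_quant_config` and `quantize_to_awq`, the paths, and the config values (4-bit weights, group size 128) are illustrative assumptions, not recommendations from this guide.

```python
def build_quant_config(w_bit: int = 4, group_size: int = 128) -> dict:
    """Build a config dict in the shape used by AutoAWQ's examples (assumed values)."""
    return {
        "zero_point": True,           # use asymmetric (zero-point) quantization
        "q_group_size": group_size,   # number of weights sharing one scale
        "w_bit": w_bit,               # bits per quantized weight
        "version": "GEMM",            # kernel variant used at inference time
    }


def quantize_to_awq(model_path: str, quant_path: str) -> None:
    """Quantize the model at model_path and save the AWQ version to quant_path."""
    # Imported here so the heavy dependencies are only needed when quantizing.
    from awq import AutoAWQForCausalLM
    from transformers import AutoTokenizer

    model = AutoAWQForCausalLM.from_pretrained(model_path)
    tokenizer = AutoTokenizer.from_pretrained(model_path)

    # Runs calibration and replaces the weights with quantized ones.
    model.quantize(tokenizer, quant_config=build_quant_config())

    # Save the quantized weights and the tokenizer side by side.
    model.save_quantized(quant_path)
    tokenizer.save_pretrained(quant_path)


# Example call with hypothetical paths:
# quantize_to_awq("path/to/your-model", "path/to/your-model-awq")
```

Quantization typically needs a GPU and downloads a calibration dataset, so expect the call to take a while on larger models. Check the installation directions of whichever AWQ tool you use, since supported models and options differ between versions.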