Exllamav2: Turboderp's Fantastical Quantization Method

Exllamav2 is a quantization method designed to run on graphics cards (GPUs) and is aimed at individual use. It is a great choice for people who want high speed and intend to run only one instance at a time.

Thankfully, Turboderp, the creator, keeps his guide updated and releases it under an MIT license. You can find resources on how to install it and get it working for yourself here.