And the reason you run AI workloads on super-expensive GPUs is precisely that the GPUs have large quantities of extremely fast RAM.
If your RAM is running at 70 GB/s, which is a pretty good measured figure for fast DDR5 on current desktop platforms, then even at int4 you're not going to get more than about twenty tokens a second out of a 7B model, or two a second out of a 70B model, which needs more than 32 GB of platform memory.
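The arithmetic here is just bandwidth divided by model size. A rough back-of-envelope sketch, assuming each generated token has to stream the entire set of weights through RAM once and that int4 works out to about 0.5 bytes per parameter:

```python
# Back-of-envelope: decode speed when you're memory-bandwidth-bound.
# Assumption: every generated token reads all the weights once, and
# int4 quantisation costs roughly 0.5 bytes per parameter.

BANDWIDTH_GB_S = 70  # measured DDR5 bandwidth on a decent desktop

def tokens_per_second(params_billion: float, bytes_per_param: float = 0.5) -> float:
    model_gb = params_billion * bytes_per_param  # weights read per token
    return BANDWIDTH_GB_S / model_gb

for size in (7, 13, 70):
    print(f"{size:>3}B model: ~{tokens_per_second(size):.1f} tokens/s")
# 7B -> ~20 tokens/s, 13B -> ~11 tokens/s, 70B -> ~2 tokens/s
```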
(I don't have a very good idea why the model sizes are 7B, 13B, and 70B, rather than sitting just below the memory capacities of common GPUs. At two bytes per parameter for fp16, I'd have guessed that 7B was so you could fit the model and a bit of extra data in a 16 GB GPU, but the next bigger GPU is 24 GB and the one after that 40 GB, so I was expecting 11 and 18.)
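The same kind of arithmetic, run backwards: assuming fp16 at roughly two bytes per parameter and leaving a couple of gigabytes of headroom for activations and the KV cache (the headroom figure is a guess), common GPU sizes suggest parameter counts close to what I was expecting:

```python
# What parameter count would just about fit common GPU memory sizes at fp16?
# Assumptions: ~2 bytes per parameter, ~2 GB of headroom for activations
# and KV cache (a guessed figure, not a measured one).

def params_that_fit(vram_gb: float, bytes_per_param: float = 2.0,
                    headroom_gb: float = 2.0) -> float:
    return (vram_gb - headroom_gb) / bytes_per_param  # billions of parameters

for vram in (16, 24, 40):
    print(f"{vram} GB GPU: ~{params_that_fit(vram):.0f}B parameters")
# 16 GB -> ~7B, 24 GB -> ~11B, 40 GB -> ~19B
```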