If all you care about is response time, you can get there easily by running a smaller model. The quality of the responses will suffer, though, and it's not feasible to self-host a model on par with ChatGPT on consumer hardware.
For some quick math: a small Llama model has 7 billion parameters. Unquantized, that's 4 bytes per parameter (32-bit floats), so it needs 28 billion bytes (28 GB) of memory just for the weights. You can shrink that with quantization, which trades quality for memory: store each parameter in fewer than 32 bits, reducing both precision and footprint.
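If you want to sanity-check the math yourself, here's a rough back-of-envelope sketch (weights only; a real runtime also needs memory for the KV cache, activations, and framework overhead):

```python
# Rough model weight memory estimate: params * bits_per_param / 8.
# Weights only -- ignores KV cache, activations, and runtime overhead.

def model_size_gb(params_billions: float, bits_per_param: float) -> float:
    """Approximate weight memory in GB (decimal) at a given quantization level."""
    bytes_total = params_billions * 1e9 * bits_per_param / 8
    return bytes_total / 1e9

for bits in (32, 16, 8, 4):
    print(f"7B model @ {bits:>2}-bit: ~{model_size_gb(7, bits):.1f} GB")

# 7B @ 32-bit: ~28 GB, @ 16-bit: ~14 GB, @ 8-bit: ~7 GB, @ 4-bit: ~3.5 GB
```

Which is why a quantized 7B model fits comfortably on a midrange consumer GPU, while the full-precision version doesn't.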
Inference performance will still vary a lot depending on your hardware, even if you manage to fit it all in VRAM. A 5090 will be faster than an iPhone, obviously.
... But for a model competitive with ChatGPT, like DeepSeek R1, we're talking about 671 billion parameters. Even if you quantized down to a useless 1 bit per parameter, that's still over 83 GB just to hold the weights (unquantized at 32-bit it's roughly 2.7 TB). Running inference over that many parameters also takes serious compute, far more than a 5090 can deliver. That gets you into specialized high-end hardware, and it's not something a typical prosumer could build (or afford).
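Running the same weights-only estimate for a 671B-parameter model makes the gap obvious:

```python
# Same naive estimate (params * bits / 8), now for a 671B-parameter model.
PARAMS = 671e9

for bits in (32, 16, 4, 1):
    gb = PARAMS * bits / 8 / 1e9
    print(f"671B @ {bits:>2}-bit: ~{gb:,.0f} GB")

# 32-bit: ~2,684 GB (~2.7 TB)   16-bit: ~1,342 GB
#  4-bit:   ~336 GB              1-bit:    ~84 GB -> still won't fit in a 5090's 32 GB of VRAM
```

Even the absurd 1-bit case is more than double what a single top-end consumer card can hold, before you've allocated a single byte for the KV cache.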
So the TL;DR is: no.