Okay so I had a meltdown last year. I was staring down a startup that was circling the drain, I knew my time there was limited, and I was being bombarded daily with layoffs and friends not being able to find work, while hearing constantly that I was going to be left behind due to AI. (of course the layoffs were happening because tech CEOs heard AI and started frothing at the idea of getting rid of some of their most expensive staff)
So, I took it on myself to learn AI. I figured, well, if it's coming for my job I might as well learn how it works. And oh lorde, did I learn a lot. To the point where I'm running several LLMs at home now, in k3s, across multiple servers, and have built several apps to interact with them. I've fine-tuned LLMs, I've played with image generation and voices, I dove in headfirst. Eventually I did lose that job, and that gave me a couple of months to focus even more before finding my current one.
My biggest learnings, which I'm sure many of you know:
- AI is a very neat technology with several real applications, but those applications are tightly constrained by what it can't do reliably
- LLMs and AI are incredibly hard to control. You can't just say `if(nsfw) dont()`. You have to spend a lot of time forcing the LLM not to give weight to wherever the user is steering it, and it's to the point that it hardly seems worth it.
- Companies love the idea of LLMs for their chatbots, but given the above it's incredibly hard to prevent the chatbot from doing its own thing. You can say "We only have a return policy of 14 days" but if someone works at it hard enough and tells the LLM that it can still perform the return because they say so... is it really that useful? LLMs have no hard rules
- LLMs have very real hardware restrictions. At home, you're limited to a single GPU. In the cloud they have some clever tricks to share memory, but overall it's still mostly limited by the GPU. We'll see what shenanigans NVIDIA comes up with going forward, but LLMs and AI are essentially a brute force approach, and we can see that in how much power they soak up. You can see ChatGPT slow down once your conversation goes on long enough, since there's more and more context to hold in memory and process
- AI is not new. It's still ML under the hood, just with new ways of using it. Again with the brute force: I didn't realize that for every token (a word, for simplicity), your entire conversation is passed in to the model, which spits out one more word. Repeat for every word. That's all it is. It's just predicting the next word. Image generation is the same idea: predict a slightly better version of the image, then loop around again until it's done. There is no consciousness, there's no real nuance to it, that's it. A predictive engine surrounded with a `while` loop.
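
To make the "predictive engine surrounded with a `while` loop" point concrete, here's a toy sketch in Python. Big caveat: this is not how a real LLM predicts. The "model" here is just a word-follows-word lookup table I made up for illustration, and the names `train_bigram`, `predict_next`, and `generate` are hypothetical. A real model would actually use the whole context it's handed, while this toy only looks at the last word. The shape of the loop is the part that carries over: hand in everything so far, get one token back, append, repeat.

```python
from collections import Counter, defaultdict


def train_bigram(text):
    """Count which word follows which. A crude stand-in for a real model."""
    words = text.split()
    table = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        table[prev][nxt] += 1
    return table


def predict_next(table, context):
    """Take the entire context so far and return exactly one more token.

    Assumption for this toy: only the last word is consulted. A real LLM
    would attend over every token in `context`.
    """
    followers = table.get(context[-1])
    if not followers:
        return None  # nothing ever followed this word in the training text
    return followers.most_common(1)[0][0]


def generate(table, prompt, max_new_tokens=10):
    """The while loop: feed everything in, get one token out, repeat."""
    tokens = prompt.split()
    limit = len(tokens) + max_new_tokens
    while len(tokens) < limit:
        nxt = predict_next(table, tokens)  # the whole conversation goes in
        if nxt is None:
            break
        tokens.append(nxt)  # one word comes out, then we go around again
    return " ".join(tokens)


if __name__ == "__main__":
    table = train_bigram("the cat sat on the mat the cat sat on the rug")
    print(generate(table, "the", max_new_tokens=4))
```

Every pass through the loop re-reads the full token list, which is why longer conversations cost more per word.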
There's more but this is too long already. It's neat, it's useful, but the hype was just as intense as blockchain. We're going to see some really great uses come out of it; integration with something like Word or a browser to summarize things is honestly a good idea. But there are so so so many pitfalls.
For coding? I think it's a great place to get started, or to get an idea. I would never trust it in production. It will take a very long time for us to get to the point where you can say "Go build this feature" and I would blindly trust what it generated.