this post was submitted on 15 Nov 2023

Hello friends,

I’m pretty deep into self-hosting - especially on the home automation side. I’ve got a couple of options for self-hosted AI, but I don’t think they’ll meet my long term goals:

  • Coral TPUs: I have 2x processing my Frigate data. These seem fine for that purpose, but not useful for generative AIs?

  • Jetson Nano: Near as I can tell nothing supports these things except DeepStack, which appears to be abandoned. Bummed these haven’t gotten broader support in the community.

I’ve got plenty of rack space and my day job is managing thousands of machines, so not afraid of a more technical setup.

The used NVIDIA rack mounted Tesla GPU servers look interesting. What are y’all using?

Requirements:

  • Rack mounted
  • Supports local LLM and GenAI
  • Linux-based
  • Works with Docker
[–] [email protected] 1 points 1 year ago (1 children)

It depends.

What is your budget? And what hardware/hypervisor do you have?

And what specifically are you looking to do with “generative AI?” Ugh…I hate that term.

There are two key things to keep in mind about rack-mount GPUs: cooling and power. First, you need a server that was factory-built to host GPUs. Almost all of NVIDIA’s server-grade GPUs are passively cooled, so the server needs a fan configuration that moves enough air through the cards. Second, except for the lowest-end server GPUs (P4/T4/A2/L4 - all inference cards and over $1000 per card), which stay within the 75 watts provided by the PCIe slot, the GPUs draw 150 watts or more and need auxiliary power connectors and higher-wattage power supplies.
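
As a rough sanity check on the power side, here is a minimal Python sketch. The 75 W slot budget comes from the paragraph above. The function names and the example wattages in the comments are hypothetical, so plug in the numbers from your own server and card spec sheets.

```python
# Slot power budget quoted above; everything else here is illustrative.
PCIE_SLOT_WATTS = 75


def needs_aux_power(board_power_watts: float) -> bool:
    """True if the card draws more than the slot alone can supply."""
    return board_power_watts > PCIE_SLOT_WATTS


def psu_headroom(psu_watts: float, base_system_watts: float, gpu_watts: list[float]) -> float:
    """Watts left over after the base system and all GPUs are accounted for.

    A negative result means the power supply is undersized for this build.
    """
    return psu_watts - base_system_watts - sum(gpu_watts)


if __name__ == "__main__":
    # Hypothetical build: 1100 W PSU, 400 W base system, two 150 W cards.
    print(needs_aux_power(150))                  # card needs extra power cabling
    print(psu_headroom(1100, 400, [150, 150]))   # remaining watts
```

This only covers the power budget. It says nothing about airflow, which still depends on the chassis being built for passive cards.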

And most of the drivers and docker/kubernetes plugins for these GPUs are locked behind NVIDIA licensing.

You’d want something that is at least Pascal-generation, but the Turing or newer cards are better.
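
If you want to check which side of that line a card falls on, one option is to look at its CUDA compute capability (6.x is Pascal, 7.0 is Volta, 7.5 is Turing). The sketch below is a minimal example that assumes a reasonably recent driver where `nvidia-smi` supports the `compute_cap` query field. The helper names and the rating labels are my own, not anything NVIDIA defines.

```python
import subprocess


def parse_gpu_line(line: str) -> tuple[str, tuple[int, int]]:
    """Parse one CSV line such as 'Tesla P4, 6.1' into (name, (major, minor))."""
    name, cap = line.rsplit(",", 1)
    major, minor = cap.strip().split(".")
    return name.strip(), (int(major), int(minor))


def rate_gpu(cap: tuple[int, int]) -> str:
    """Rate a card against the 'at least Pascal, Turing or newer is better' rule."""
    if cap < (6, 0):
        return "too old"      # pre-Pascal
    if cap < (7, 5):
        return "minimum"      # Pascal or Volta
    return "preferred"        # Turing or newer


def query_gpus() -> list[str]:
    """Ask nvidia-smi for name and compute capability (needs a driver that supports compute_cap)."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,compute_cap", "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [line for line in out.splitlines() if line.strip()]


if __name__ == "__main__":
    for line in query_gpus():
        name, cap = parse_gpu_line(line)
        print(f"{name}: compute capability {cap[0]}.{cap[1]} -> {rate_gpu(cap)}")
```

The parsing and rating functions are kept separate from the `nvidia-smi` call so you can also feed them values read off a spec sheet before buying anything.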

Your better bet is to get a rack-mount workstation (which is basically a server anyway) and stick a higher-end Quadro or GeForce 30x0 card in there.

Edit: I never answered what I have - an R730 factory built for GPUs with a pair of Tesla P4 cards. I originally built it to play with GPUs for VDI.

[–] [email protected] 1 points 11 months ago

Much appreciated — I think the rack mounted desktop GPU approach is best for now. Another commenter suggested we should see better options in 1-2 years and I strongly suspect they’re correct.