sh.itjust.works
https://github.com/vllm-project/vllm

vLLM is a fast and easy-to-use library for LLM inference and serving.

vLLM is fast with:

- State-of-the-art serving throughput
- Efficient management of attention key and value memory with PagedAttention
- Continuous batching of incoming requests
- Optimized CUDA kernels

vLLM is flexible and easy to use with:

- Seamless integration with popular HuggingFace models
- High-throughput serving with various decoding algorithms, including parallel sampling, beam search, and more
- Tensor parallelism support for distributed inference
- Streaming outputs
- OpenAI-compatible API server
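Since the server is OpenAI-compatible, any client that can speak the standard `/v1/completions` JSON format can talk to it. A minimal sketch of building such a request in Python, using only the standard library (the model name, port, and parameter values here are assumptions for illustration, not taken from the post):

```python
import json

def build_completion_request(prompt, model="facebook/opt-125m",
                             max_tokens=64, temperature=0.8):
    """Build the JSON body for an OpenAI-style /v1/completions call.

    The field names follow the OpenAI completions schema that vLLM's
    API server mirrors; the default model is a hypothetical example.
    """
    return {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }

body = build_completion_request("San Francisco is a")
print(json.dumps(body, indent=2))

# Assuming a vLLM server is listening locally on port 8000, the same
# body could be sent with:
#   curl http://localhost:8000/v1/completions \
#     -H "Content-Type: application/json" \
#     -d "$(python this_script.py)"
```

Because the request shape matches OpenAI's, existing OpenAI client libraries can usually be pointed at a vLLM server just by changing the base URL.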

YouTube video describing it: https://youtu.be/1RxOYLa69Vw
