LocalLLaMA

2269 readers
3 users here now

Community to discuss LLaMA, the large language model created by Meta AI.

This is intended to be a replacement for r/LocalLLaMA on Reddit.

founded 1 year ago
MODERATORS
126
 
 

They still consider it a beta but there we go! It's happening :D

127
 
 

IMO, a very sane and well-informed explanation of the challenges of using generative AI to create specific imagery.

128
 
 

cross-posted from: https://lemmy.world/post/3439370

Empowering Vision-Language Models to Follow Interleaved Vision-Language Instructions

A wild new GitHub Repo has appeared!

Today we cover Cheetah - an exciting new take on interleaving image and text context & instruction.

For higher quality images, please visit the main project's repo to see their code and approach in all its glory.

I4 Benchmark

To facilitate research in interleaved vision-language instruction following, we build I4 (semantically Interconnected, Interleaved Image-Text Instruction-Following), an extensive large-scale benchmark of 31 tasks with diverse instructions in a unified instruction-response format, covering 20 diverse scenarios.

I4 has three important properties:

  • Interleaved vision-language context: all the instructions contain sequences of inter-related images and texts, such as storyboards with scripts or textbooks with diagrams.
  • Diverse forms of complex instructions: the instructions range from predicting dialogue for comics, to discovering differences between surveillance images, to conversational embodied tasks.
  • Vast range of instruction-following scenarios: the benchmark covers multiple application scenarios, including cartoons, industrial images, driving recordings, etc.

Cheetor: a multi-modal large language model empowered by controllable knowledge re-injection

Cheetor is a Transformer-based multi-modal large language model empowered by controllable knowledge re-injection, which can effectively handle a wide variety of interleaved vision-language instructions.

Cases

Cheetor demonstrates strong abilities to perform reasoning over complicated interleaved vision-language instructions. For instance, in (a), Cheetor is able to keenly identify the connections between the images and thereby infer the reason that causes this unusual phenomenon. In (b, c), Cheetor can reasonably infer the relations among the images and understand the metaphorical implications they want to convey. In (e, f), Cheetor exhibits the ability to comprehend absurd objects through multi-modal conversations with humans.

Getting Started

1. Installation

Clone our repository and create the conda environment:

git clone https://github.com/DCDmllm/Cheetah.git
cd Cheetah/Cheetah
conda create -n cheetah python=3.8
conda activate cheetah
pip install -r requirement.txt

2. Prepare Vicuna Weights and Llama2 weights

The current version of Cheetor supports Vicuna-7B and LLaMA2-7B as the language model. Please first follow the instructions to prepare Vicuna-v0 7B weights and follow the instructions to prepare LLaMA-2-Chat 7B weights.

Then modify the llama_model in the Cheetah/cheetah/configs/models/cheetah_vicuna.yaml to the folder that contains Vicuna weights and modify the llama_model in the Cheetah/cheetah/configs/models/cheetah_llama2.yaml to the folder that contains LLaMA2 weights.

3. Prepare the pretrained checkpoint for Cheetor

Download the pretrained checkpoint of Cheetah according to the language model you prepared:

  • Checkpoint Aligned with Vicuna 7B: Download
  • Checkpoint Aligned with LLaMA2 7B: Download

For the checkpoint aligned with Vicuna 7B, please set the path to the pretrained checkpoint in the evaluation config file in Cheetah/eval_configs/cheetah_eval_vicuna.yaml at Line 10.

For the checkpoint aligned with LLaMA2 7B, please set the path to the pretrained checkpoint in the evaluation config file in Cheetah/eval_configs/cheetah_eval_llama2.yaml at Line 10.

Besides, Cheetor reuses the pretrained Q-former from BLIP-2 that matches FlanT5-XXL.

4. How to use Cheetor

Examples of using our Cheetah model are provided in files Cheetah/test_cheetah_vicuna.py and Cheetah/test_cheetah_llama2.py. You can test your own samples following the format shown in these two files. And you can run the test code in the following way (taking the Vicuna version of Cheetah as an example):

python test_cheetah_vicuna.py --cfg-path eval_configs/cheetah_eval_vicuna.yaml --gpu-id 0

And in the near future, we will also demonstrate how to launch the gradio demo of Cheetor locally.


ChatGPT-4 Breakdown:

Imagine a brilliant detective who has a unique skill: they can understand stories told not just through spoken or written words, but also by examining pictures, diagrams, or comics. This detective doesn't just listen or read; they also observe and link the visual clues with the narrative. When given a comic strip without dialogues or a textbook diagram with some text, they can deduce what's happening, understanding both the pictures and words as one unified story.

In the world of artificial intelligence, "Cheetor" is that detective. It's a sophisticated program designed to interpret and respond to a mix of images and texts, enabling it to perform tasks that require both vision and language understanding.

Projects to Try with Cheetor:

Comic Story Creator: Input: A series of related images or sketches. Cheetor’s Task: Predict and generate suitable dialogues or narratives to turn those images into a comic story.

Education Assistant: Input: A page from a textbook containing both diagrams and some accompanying text. Cheetor’s Task: Answer questions based on the content, ensuring it considers both the visual and written information.

Security Analyst: Input: Surveillance footage or images with accompanying notes or captions. Cheetor’s Task: Identify discrepancies or anomalies, integrating visual cues with textual information.

Drive Safety Monitor: Input: Video snippets from a car's dashcam paired with audio transcriptions or notes. Cheetor’s Task: Predict potential hazards or traffic violations by understanding both the visual and textual data.

Art Interpreter: Input: Art pieces or abstract images with associated artist's notes. Cheetor’s Task: Explain or interpret the art, merging the visual elements with the artist's intentions or story behind the work.


This is a really interesting strategy and implementation! A model that can interpret both natural language text with high quality image recognition and computer vision can lead to all sorts of wild new applications. I am excited to see where this goes in the open-source community and how it develops the rest of this year.

129
130
 
 

So I was playing around with some coding models and getting disappointed in the responses. I started using starcoderplus-guanaco-gpt4, and after some tinkering I just wanted to share the importance of formatting your prompt correctly

I asked it to provide a way to rate limit a function in Python based on the function's input, so that it doesn't repeat identical output too often

I used the following prompt:

Create a python function that takes a string as input and prints that string. The function should be rate limited so that any specific string is not printed more than once every two minutes. This means it must keep track of the last time that it printed a specific string.

However, I used it in the chat-completion UI of text-generation-webui, and this was the useless reply I got:

(screenshot: chat-prompt-starcoder)

Obviously completely useless to me

But then I realized that this model expects to follow instructions, not a chat, so I went over to the instruction template, and now this was my "prompt":

### Instruction: Create a python function that takes a string as input and prints that string. The function should be rate limited so that any specific string is not printed more than once every two minutes. This means it must keep track of the last time that it printed a specific string.

### Response:

And lo and behold, a very competent useful reply!
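The reply isn't reproduced verbatim here, but it boiled down to something like this (a reconstruction of the kind of answer the instruct-formatted prompt produced, not the model's exact output; the names are mine):

import time

_last_printed = {}  # maps each string to the last time it was printed

def rate_limited_print(text, min_interval=120):
    """Print text, unless the same string was printed within the last min_interval seconds."""
    now = time.monotonic()
    last = _last_printed.get(text)
    if last is None or now - last >= min_interval:
        print(text)
        _last_printed[text] = now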

As you can see, even if you follow the proper concept for instruct (phrasing it as an instruction, 'Create a python function that..', rather than 'I need a function that..'), you still need to be sure to follow the proper template structure.

And most interestingly of all, giving the same prompt to ChatGPT gets what I consider to be a worse answer:

It's very similar, but to my eye distinctly overengineered. I find the solution from starcoder answers my question much more closely, with only a couple of lines of code to change in my existing function. YMMV, but the TL;DR is that you should make sure to follow the proper prompt and template formats to get the best replies from your model

131
132
28
submitted 1 year ago* (last edited 1 year ago) by [email protected] to c/localllama
 
 

Places above all 13Bs, as well as above llama1-65b on the HuggingFace leaderboard

133
 
 

Still pretty new to local LLMs, and there's been a lot of development since I dipped my toe in. Suffice it to say I'm fairly swamped and looking for guidance toward the right model for my use case

I want to feed the model sourcebooks, so I can ask it game mechanic questions and it will respond with reasonable accuracy (including page references). I tried this with privateGPT a month or two back, and it kinda worked but it was slow and wonky. It seems like things are a bit cleaner now
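For anyone pointing me in a direction: my understanding is that the retrieval side of what privateGPT was doing boils down to something like the sketch below — embed one chunk per page so answers can cite page numbers, then pull the most relevant pages into the prompt. The embedding model name and the sample pages are placeholders, not a specific recommendation.

import numpy as np
from sentence_transformers import SentenceTransformer

# pages[i] holds the extracted text of page i+1 of the sourcebook (extraction not shown)
pages = ["Grappling is resolved with an opposed Strength check ...",
         "Casting a spell requires one free hand and line of sight ..."]

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # small model, runs fine on CPU
page_vecs = embedder.encode(pages, normalize_embeddings=True)

def top_pages(question, k=3):
    # cosine similarity reduces to a dot product because the vectors are normalized
    q = embedder.encode([question], normalize_embeddings=True)[0]
    scores = page_vecs @ q
    best = np.argsort(scores)[::-1][:k]
    return [(int(i) + 1, pages[i]) for i in best]

# the returned (page number, text) pairs go into the local LLM's prompt,
# so the answer can include a page reference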

134
 
 

There's been a lot of good work done over the past week by several pivotal members, and now the boss is back and focused on it. It's going to be a very breaking change, but I'm really excited about where this will lead us!

135
 
 

I'm really sorry that this is probably out of place, as it's not strictly LLAMA, but I couldn't think of anywhere else to post it where people may be able to help.

**Sadly, my grandma passed away yesterday.** It prompted me to retrieve some old photos that my parents stashed in the loft over a decade ago, and they are just incredible! I've found so many pictures of her, going back to when she was really young, to a point where I'll have to check if they are all definitely her! But there are amazing ones of my dad and uncles when they were little, and even ones of my nan with my dad and me when I'd literally just been born!

There's lots of really wonderful family moments and slice-of-life history captured there, and I don't think anyone knows they exist.

Mostly, I have enough funny photos to have birthday cards sorted for several lifetimes!

I want to get these ALL scanned and digitised. But for now, I'm separating out ones with my nan in them to be top priority. Most of the services available have long turnarounds and, while the prices are far from extortionate, I'd be looking at something that prices by the kilogram. Seriously, there must be over 20 kg of photos in here. They're mostly 6x4, and you'd want to have them done at a high DPI, so the cost would be astronomical, especially as I already plan to spend a lot on printing.

So, I'd like to do this myself. I've dabbled with some LLM stuff, but I don't really know where to start with image manipulation, and I don't really have time to figure it out, so I'm asking for some guidance.

The rough idea is:

  1. Scan photos on a high-DPI flatbed scanner, filling up the bed with multiple photos each time for the highest speed

  2. 'Parse' the scan into multiple image files by identifying where the photo bounds are and cropping. This should be pretty simple; I think this may be possible in OpenCV? I've never tried it before. Otherwise, it has to be one of the simpler jobs for an ML tool. I understand how object recognition works in principle, but not in practice. (See the sketch after this list.)

  3. Run select images through an upscaler. Some we may want to display or print larger, and if we're going to digitise them anyway, we may as well make them as high-res as we can (I'll keep the originals). I know the usual 'zoomify' caveats, but obviously when I'm starting with 6x4 prints, I'll take all the help I can get.

  4. Ideally attempt to 'clean up' the images. I don't think I'd want to colourise anything, but it would be nice to remove obvious stains, creases and such from the print.

  5. The plan is to have some albums for people to go through at the wake and to print off plenty of extra 6x4 copies so people can just grab them and take them away. I'll also stick them all on a Google Drive or something and share the link so people can download them in higher res, and more people will be able to see and preserve them
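For step 2, here's a minimal OpenCV sketch of the kind of auto-cropping I have in mind: threshold against the white scanner lid, find the photo outlines, and save each bounding box. The threshold and minimum-size values are guesses that would need tuning for the actual scans, and the file names are placeholders.

import cv2

scan = cv2.imread("scan_001.png")
gray = cv2.cvtColor(scan, cv2.COLOR_BGR2GRAY)
# photos are darker than the white scanner lid: threshold and invert so photos become foreground
_, mask = cv2.threshold(gray, 230, 255, cv2.THRESH_BINARY_INV)
contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
count = 0
for c in contours:
    x, y, w, h = cv2.boundingRect(c)
    if w * h < 500 * 500:  # skip dust and specks; tune this for the scan DPI
        continue
    count += 1
    cv2.imwrite(f"scan_001_photo_{count}.png", scan[y:y + h, x:x + w])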

My main concerns are steps 2 and 3; step 4 would be a bonus. I suspect the shortlist will be 100-200 photos, but processing everything will be an ongoing process. I just want the shortlist done before the funeral.

I'd really rather run locally if viable. I have two machines that may be able to contribute:

Workstation

  • RTX 3090, 32GB DDR4 (upgrading to 64GB DDR5)
  • 5600X (upgrading to 7900X)

I'm actually planning an upgrade very soon, which I can push through fast if it will help.

R720XD

  • 128GB DDR3 RAM
  • 2x E5-2630 v2.

Probably not super useful, but it has lots of RAM and can be used as a slow workhorse. My workstation is on CAD for 8 hours a day and is used for gaming in the evening. I can obviously cut out the gaming, but not the CAD, so there could be some utility in handing off some things to the server if the workload doesn't require a GPU.

If you can point me in the right direction to get set up, that would be awesome. I'm new at this and learning fast, but I don't want to under-deliver. I'm very capable of learning the details, but I don't have enough experience to determine the 'best way' of doing something like this.

Any help would be incredible!

136
137
 
 

PR 3313 has been merged with this commit

This is pretty great for those wanting to use text-generation-webui who are looking for performant llama.cpp/falcon/mpt offloading, or just a good CPU inference tool in general.

For those unaware, this is the ctransformers repo

And for anyone looking for an updated docker image, I have provided an image here on dockerhub

and as always my git repo with instructions can be found here on github

Happy inferencing :)

138
14
submitted 1 year ago* (last edited 1 year ago) by [email protected] to c/localllama
 
 

I am just learning in this space and I could be wrong about this one, but... The GGML and GPTQ models are nice for getting started with AI in Oobabooga. The range of available models is kind of odd to navigate and understand, as far as how they compare and all the different quantization types, settings, and features. I still don't understand a lot of it. One of the main aspects I didn't (and still don't fully) understand is how some models do not have a quantization stated like GGML/GPTQ, but still work using Transformers. I tried some of these by chance at first, and avoided them because they take longer to load initially.

Yesterday I created my first LoRAs and learned through trial and error that the only models I can use to train a LoRA are the ones that use Transformers and can be set to 8-bit mode. Even using GGML/GPTQ models with 8-bit quantization, I could not use them to make a LoRA. It could be my software setup, but I think there is either a fundamental aspect of these models I haven't learned yet, or it is a limitation in Oobabooga's implementation. Either way, the key takeaway is to try making a LoRA with a Transformers-based model loaded in Oobabooga, and be sure the "load in 8 bit" box is checked.
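For those curious what that combination looks like outside the UI, here's a rough sketch with Hugging Face Transformers and PEFT: a Transformers-format model loaded in 8-bit with a LoRA adapter attached. This is not Oobabooga's actual code path, and the model path and LoRA hyperparameters are placeholders.

from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_path = "path/to/llama-7b-hf"  # a Transformers-format model, not GGML/GPTQ
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),  # the "load in 8 bit" checkbox
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

lora_cfg = LoraConfig(r=32, lora_alpha=64, lora_dropout=0.05,
                      target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM")
model = get_peft_model(model, lora_cfg)  # only the small LoRA matrices are trainable from here
# a normal Trainer run on the text dataset then produces the LoRA adapter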

I didn't know what to expect with this, and haven't come across many examples, so I put off trying this until now. I have a 12th-gen i7 with 20 logical cores and a 3080 Ti with 16GB of VRAM in a laptop. I can convert an entire novel into a text file and load this as raw text (that tab) for training in Oobabooga using the default settings. If my machine has some assistance with cooling, I can create the LoRA in 40 minutes using the default settings and a 7B model. This has a mild effect. IIRC the default weight of the LoRA network is 32. If this is turned up to 96-128, it will have a more noticeable effect on personality. It still won't substantially improve the Q&A accuracy, but it may improve the quality to some extent.

I first tested with a relatively small Wikipedia article on Leto II (Dune character) formatted for this purpose manually. This didn't change anything substantially. Then I tried with the entire God Emperor of Dune e-book as raw text. This had garbage results, probably due to all the nonsense before the book even starts, and the terrible text formatting extracted from an eBook. The last dataset I tried was the book text only, with everything reflowed using a Linux bash script I wrote to alter newline characters, spacing, and remove page gaps. Then I manually edited with find and replace to remove special characters and any formatting oddballs I could find. This was the first LoRA I made where the 7B model's tendency to hallucinate seemed more evident than issues with my LoRA. For instance, picking a random name of an irrelevant character that occurs 3 times in 2 sentences of the LoRA text and prompting about it results in random unrelated output. The overall character identity is also weak despite a strong character profile and a 1.8MB text file for the LoRA.
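The cleanup amounts to roughly the following, shown here as a Python sketch rather than the bash script I actually used; the patterns are approximations and the file names are placeholders.

import re

def reflow_ebook_text(raw):
    text = raw.replace("\f", "\n")                # drop form-feed page breaks
    text = re.sub(r"\n{3,}", "\n\n", text)        # collapse page gaps into one paragraph break
    text = re.sub(r"(?<!\n)\n(?!\n)", " ", text)  # join hard-wrapped lines within a paragraph
    text = re.sub(r"[ \t]{2,}", " ", text)        # squeeze repeated spaces
    return text

with open("book_raw.txt", encoding="utf-8") as f:
    cleaned = reflow_ebook_text(f.read())
with open("book_clean.txt", "w", encoding="utf-8") as f:
    f.write(cleaned)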

This is just the perspective from a beginner's first attempt. Actually tuning this with a bit of experience will produce far better results. I'm just trying to say, if you're new to this and just poking around, try making a LoRA. It is quite easy to do.

139
 
 

From the blog:

In this blog, we provide a thorough analysis and a practical guide for fine-tuning. We examine the Llama-2 models under three real-world use cases, and show that fine-tuning yields significant accuracy improvements across the board (in some niche cases, better than GPT-4). Experiments were carried out with this script.

140
 
 

txtai is an all-in-one embeddings database for semantic search, LLM orchestration and language model workflows

Seems like a great resource for all things embeddings related, give it a look!

141
18
submitted 1 year ago* (last edited 1 year ago) by noneabove1182 to c/localllama
 
 

These are the full weights; the quants are already incoming from TheBloke. I will update this post when they're fully uploaded.

From the author(s):

WizardLM-70B V1.0 achieves a substantial and comprehensive improvement on coding, mathematical reasoning and open-domain conversation capacities.

This model is license friendly, and follows the same license with Meta Llama-2.

Next version is in training and will be public together with our new paper soon.

For more details, please refer to:

Model weight: https://huggingface.co/WizardLM/WizardLM-70B-V1.0

Demo and Github: https://github.com/nlpxucan/WizardLM

Twitter: https://twitter.com/WizardLM_AI

GGML quant posted: https://huggingface.co/TheBloke/WizardLM-70B-V1.0-GGML

GPTQ quant repo posted, but still empty (GPTQ is a lot slower to make): https://huggingface.co/TheBloke/WizardLM-70B-V1.0-GPTQ

142
 
 

Refactored codebase - now a single unified turbopilot binary that provides support for codegen and starcoder style models.

Support for starcoder, wizardcoder and santacoder models

Support for CUDA 11 and 12

Seems interesting. It looks like it supports wizardcoder with GPU offloading; if starcoder also has GPU offloading, that would be great, but I would need to test. If it also works with the new StabilityAI coding models, that would be very interesting.

143
4
submitted 1 year ago* (last edited 1 year ago) by noneabove1182 to c/localllama
 
 

Text from them:

Calling all model makers, or would-be model creators! Chai asked me to tell you all about their open source LLM leaderboard:

Chai is running a totally open LLM competition. Anyone is free to submit a llama based LLM via our python-package 🐍 It gets deployed to users on our app. We collect the metrics and rank the models! If you place high enough on our leaderboard you'll win money 🥇

We've paid out over $10,000 in prizes so far. 💰

Come to our discord and check it out!

https://discord.gg/chai-llm

Link to latest board for the people who don't feel like joining a random discord just to see results:

https://cdn.discordapp.com/attachments/1134163974296961195/1138833170838589471/image1.png

144
 
 

Stability AI released three new 3b models for coding:

  • stablecode-instruct-alpha-3b (context length 4k)
  • stablecode-completion-alpha-3b-4k (context length 4k)
  • stablecode-completion-alpha-3b (context length 16k)

I didn't try any of them yet, since I'm waiting for the GGML files to be supported by llama.cpp, but I think especially the 16k model seems interesting. If anyone wants to share their experience with it, I'd be happy to hear it!

145
146
14
submitted 1 year ago* (last edited 1 year ago) by AsAnAILanguageModel to c/localllama
 
 

I think it's a good idea to share experiences about LLMs here, since benchmarks can only give a very rough overview on how well a model performs.

So please share how much you're using LLMs, what you use them for, and how well they perform at those tasks. For example, here are my answers to these questions:

Usage

I use LLMs daily for work and for random questions that I would previously use web search for.

I mainly use LLMs for reasoning heavy tasks, such as assisting with math or programming. Other frequent tasks include proofreading, helping with bureaucracy, or assisting with writing when it matters.

Models

The one I find most impressive at the moment is TheBloke/airoboros-l2-70B-gpt4-1.4.1-GGML/airoboros-l2-70b-gpt4-1.4.1.ggmlv3.q2_K.bin. It often manages to reason correctly on questions where most other models I tried fail, even though most humans wouldn't. I was surprised that something using only 2.5 bits per weight on average could produce anything but garbage. Downsides are that loading times are rather long, so I wouldn't ask it a question if I didn't want to wait. (Time to first token is almost 50s!). I'd love to hear how bigger quantizations or the unquantized versions perform.

Another one that made a good impression on me is Qwen-7B-Chat (demo). It manages to correctly answer some questions where even some llama2-70b finetunes fail, ~~but so far I'm getting memory leaks when running it on my M1 mac in fp16 mode, so I didn't use it a lot.~~ (this has been fixed it seems!)

All other models I briefly tried were not too useful. It's nice to be able to run them locally, but they were so much worse than ChatGPT that it's often not even worth considering them.

147
 
 

I want to train, or more likely fine-tune, a model on about 20 years' worth of email and text data that I've collected.

The goal would be to train it how to respond like me in simple cases.

Is there a particular base model I should start with?

I'm also interested in anyone's experience in doing this kind of thing themselves.

148
 
 

You are probably familiar with the long list of various benchmarks that new models are tested on and compared against. These benchmarks are supposedly designed to assess the model's ability to perform in various aspects of language understanding, logical reasoning, information recall, and so on.

However, while I understand the need for an objective and scientific measurement scale, I have long felt that these benchmarks are not particularly representative of the actual experience of using the models. For example, people will claim that a model performs at "some percentage of GPT-3" and yet not one of these models has ever been able to produce correctly-functioning code for any non-trivial task or follow a line of argument/reasoning. Talking to GPT-3 I have felt that the model has an actual in-depth understanding of the text, question, or argument, whereas other models that I have tried always feel as though they have only a superficial/surface-level understanding regardless of what the benchmarks claim.

My most recent frustration, and the one that prompted this post, is regarding the newly-released OpenOrca preview 2 model. The benchmark numbers claim that it performs better than other 13B models at the time of writing, supposedly outperforms Microsoft's own published benchmark results for their yet-unreleased model, and scores an "average" result of 74.0% against GPT-3's 75.7% while the LLaMa model that I was using previously apparently scores merely 63%.

I've used GPT-3 (text-davinci-003), and this model does not "come within comparison" of it. Even giving it as much of a fair chance as I can, giving it plenty of leeway and benefit of the doubt, not only can it still not write correct code (or even valid code in a lot of cases) but it is significantly worse at it than LLaMa 13B (which is also pretty bad). This model does not understand basic reasoning and fails at basic reasoning tasks. It will write a long step-by-step explanation of what it claims that it will do, but the answer itself contradicts the provided steps or the steps themselves are wrong/illogical. The model has only learnt to produce "step by step reasoning" as an output format, and has a worse understanding of what that actually means than any other model does when asked to "explain your reasoning" (at least, for other models that I have tried, asking them to explain their reasoning produces at least a marginal improvement in coherence).

There is something wrong with these benchmarks. They do not relate to real-world performance. They do not appear to be measuring a model's ability to actually understand the prompt/task, but possibly only measuring its ability to provide an output that "looks correct" according to some format. These benchmarks are not a reliable way to compare model performance and as long as we keep using them we will keep producing models that score higher on benchmarks and claim to perform "almost as good as GPT-3" but yet fail spectacularly in any task/prompt that I can think of to throw at them.

(I keep using coding as an example however I have also tried other tasks besides code as I realise that code is possibly a particularly challenging task due to requirements like needing exact syntax. My interpretation of the various models' level of understanding is based on experience across a variety of tasks.)

149
10
submitted 1 year ago* (last edited 1 year ago) by noneabove1182 to c/localllama
 
 

As some may know, I maintain a few docker images of some available tools, and I noticed I was suddenly getting NVML mismatch errors. For the life of me I could not figure out what the issue was and tried so many things. I finally noticed that the docker image had some special driver, 535.86.10, where my host had 535.86.05. After figuring that out, I looked into it and added this to my Dockerfile:

RUN apt-get update && apt-get remove --purge -y nvidia-* && \
    apt-get install -y --allow-downgrades nvidia-driver-535/jammy-updates

And voilà, problem solved! I'm not sure what driver the docker CUDA image was using; it might be some special dev driver, and it was causing a mismatch between the container and the host.

Only started happening as of the latest driver update released late last month

150
9
submitted 1 year ago* (last edited 1 year ago) by [email protected] to c/localllama
 
 

I wanted to make this post so we can share all the resources we have with each other on anything machine learning related.

Please feel free to add all of your resources as well even if they are duplicates.

PS: The best way to grow our Lemmy community is to produce high-quality posts.

Some ideas of things you could share:

  • Which people do you follow for AI (e.g., on YT, Twitter, etc.)?
  • What other social media forums provide great information?
  • What GUI do you use for local LLMs?
  • What parameters are "best"?
  • Is there a Wiki you use?
  • Where do you go to learn about LLMs/AI/Machine Learning?
  • How do you find quality models?
  • What Awesome github repositories do you know?
  • What do you think would be useful to share?

General Information - Awesome

LLM Leaderboards:

Places to Find Models

Training & Datasets

There are still many more resources out there I'm sure. Please share what you use to try to keep up with the fast pace of AI development.

I hope some of my resources have helped you! I'm eager to hear what other resources are out there!
