LocalLLaMA


Community to discuss LLaMA, the large language model created by Meta AI.

This is intended to be a replacement for r/LocalLLaMA on Reddit.

126
 
 

Meta just released a multimodal model for speech translation. It can perform speech recognition and translation into both text and speech, supporting nearly 100 input and output languages (35 for speech output). SeamlessM4T is released under CC BY-NC 4.0.

Abstract

What does it take to create the Babel Fish, a tool that can help individuals translate speech between any two languages? While recent breakthroughs in text-based models have pushed machine translation coverage beyond 200 languages, unified speech-to-speech translation models have yet to achieve similar strides. More specifically, conventional speech-to-speech translation systems rely on cascaded systems composed of multiple subsystems performing translation progressively, putting scalable and high-performing unified speech translation systems out of reach. To address these gaps, we introduce SeamlessM4T—Massively Multilingual & Multimodal Machine Translation—a single model that supports speech-to-speech translation, speech-to-text translation, text-to-speech translation, text-to-text translation, and automatic speech recognition for up to 100 languages. To build this, we used 1 million hours of open speech audio data to learn self-supervised speech representations with w2v-BERT 2.0. Subsequently, we created a multimodal corpus of automatically aligned speech translations, dubbed SeamlessAlign. Filtered and combined with human labeled and pseudo-labeled data (totaling 406,000 hours), we developed the first multilingual system capable of translating from and into English for both speech and text. On Fleurs, SeamlessM4T sets a new standard for translations into multiple target languages, achieving an improvement of 20% BLEU over the previous state-of-the-art in direct speech-to-text translation. Compared to strong cascaded models, SeamlessM4T improves the quality of into-English translation by 1.3 BLEU points in speech-to-text and by 2.6 ASR-BLEU points in speech-to-speech. On CVSS and compared to a 2-stage cascaded model for speech-to-speech translation, SeamlessM4T-Large’s performance is stronger by 58%. Preliminary human evaluations of speech-to-text translation outputs evinced similarly impressive results; for translations from English, XSTS scores for 24 evaluated languages are consistently above 4 (out of 5). For into English directions, we see significant improvement over WhisperLarge-v2’s baseline for 7 out of 24 languages. To further evaluate our system, we developed Blaser 2.0, which enables evaluation across speech and text with similar accuracy compared to its predecessor when it comes to quality estimation. Tested for robustness, our system performs better against background noises and speaker variations in speech-to-text tasks (average improvements of 38% and 49%, respectively) compared to the current state-of-the-art model. Critically, we evaluated SeamlessM4T on gender bias and added toxicity to assess translation safety. Compared to the state-of-the-art, we report up to 63% of reduction in added toxicity in our translation outputs. Finally, all contributions in this work—including models, inference code, finetuning recipes backed by our improved modeling toolkit Fairseq2, and metadata to recreate the unfiltered 470,000 hours of SeamlessAlign — are open-sourced and accessible at https://github.com/facebookresearch/seamless_communication.

127
 
 

Hugging Face released IDEFICS, an 80B open-access visual language model replicating DeepMind's unreleased Flamingo. Built entirely on public data, it's the first of its size available openly. Part of its training utilized OBELICS, a dataset with 141M web pages, 353M images, and 115B text tokens from Common Crawl.
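
For a rough sense of how this is used, here's a minimal inference sketch via transformers (the checkpoint name, prompt format, and generation settings are assumptions on my part; check the model card for the authoritative usage):

```python
import torch
from transformers import IdeficsForVisionText2Text, AutoProcessor

checkpoint = "HuggingFaceM4/idefics-9b-instruct"  # assumed smaller sibling of the 80B model
processor = AutoProcessor.from_pretrained(checkpoint)
model = IdeficsForVisionText2Text.from_pretrained(
    checkpoint, torch_dtype=torch.bfloat16, device_map="auto"
)

# Prompts interleave text and images (URLs or PIL images) in a single list.
prompts = [
    [
        "User: What is shown in this image?",
        "https://upload.wikimedia.org/wikipedia/commons/8/86/Id%C3%A9fix.JPG",
        "<end_of_utterance>",
        "\nAssistant:",
    ]
]
inputs = processor(prompts, return_tensors="pt").to(model.device)
generated_ids = model.generate(**inputs, max_new_tokens=64)
print(processor.batch_decode(generated_ids, skip_special_tokens=True)[0])
```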

128
 
 

With release 0.5.0, PEFT now officially supports fine-tuning quantized GPTQ models! This is a pretty big deal, as it allows you to download a much smaller model for fine-tuning!
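
A minimal sketch of what that enables (the model name and LoRA hyperparameters below are placeholders, not an official recipe; assumes transformers with the optimum/auto-gptq integration installed):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_id = "TheBloke/Llama-2-7B-GPTQ"  # placeholder GPTQ checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Freeze the quantized base weights and attach trainable LoRA adapters on top.
model = prepare_model_for_kbit_training(model)
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],  # placeholder; pick modules for your model
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the small adapter weights are trainable
```

From there the PEFT model can go through a normal training loop, and only the adapter weights need to be saved.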

129
 
 

They still consider it a beta but there we go! It's happening :D

130
 
 

IMO, a very sane and well-informed explanation of the challenges of using generative AI to create specific imagery.

131
 
 

cross-posted from: https://lemmy.world/post/3439370

Empowering Vision-Language Models to Follow Interleaved Vision-Language Instructions

A wild new GitHub Repo has appeared!

Today we cover Cheetah - an exciting new take on interleaving image and text context & instruction.

For higher quality images, please visit the main project's repo to see their code and approach in all of their glory.

I4 Benchmark

To facilitate research in interleaved vision-language instruction following, we build I4 (semantically Interconnected, Interleaved Image-Text Instruction-Following), an extensive large-scale benchmark of 31 tasks with diverse instructions in a unified instruction-response format, covering 20 diverse scenarios.

I4 has three important properties:

  • Interleaved vision-language context: all the instructions contain sequences of inter-related images and texts, such as storyboards with scripts, textbooks with diagrams.
  • Diverse forms of complex instructions: the instructions range from predicting dialogue for comics, to discovering differences between surveillance images, and to conversational embodied tasks.
  • Vast range of instruction-following scenarios: the benchmark covers multiple application scenarios, including cartoons, industrial images, driving recording, etc.

Cheetor: a multi-modal large language model empowered by controllable knowledge re-injection

Cheetor is a Transformer-based multi-modal large language model empowered by controllable knowledge re-injection, which can effectively handle a wide variety of interleaved vision-language instructions.

Cases

Cheetor demonstrates strong abilities to perform reasoning over complicated interleaved vision-language instructions. For instance, in (a), Cheetor is able to keenly identify the connections between the images and thereby infer the reason that causes this unusual phenomenon. In (b, c), Cheetor can reasonably infer the relations among the images and understand the metaphorical implications they want to convey. In (e, f), Cheetor exhibits the ability to comprehend absurd objects through multi-modal conversations with humans.

Getting Started

1. Installation

Git clone our repository and create the conda environment:

git clone https://github.com/DCDmllm/Cheetah.git
cd Cheetah/Cheetah
conda create -n cheetah python=3.8
conda activate cheetah
pip install -r requirement.txt

2. Prepare Vicuna Weights and Llama2 weights

The current version of Cheetor supports Vicuna-7B and LLaMA2-7B as the language model. Please first follow the instructions to prepare Vicuna-v0 7B weights and follow the instructions to prepare LLaMA-2-Chat 7B weights.

Then modify the llama_model in the Cheetah/cheetah/configs/models/cheetah_vicuna.yaml to the folder that contains Vicuna weights and modify the llama_model in the Cheetah/cheetah/configs/models/cheetah_llama2.yaml to the folder that contains LLaMA2 weights.

3. Prepare the pretrained checkpoint for Cheetor

Download the pretrained checkpoints of Cheetah according to the language model you prepare:

  • Checkpoint aligned with Vicuna 7B: Download
  • Checkpoint aligned with LLaMA2 7B: Download

For the checkpoint aligned with Vicuna 7B, please set the path to the pretrained checkpoint in the evaluation config file in Cheetah/eval_configs/cheetah_eval_vicuna.yaml at Line 10.

For the checkpoint aligned with LLaMA2 7B, please set the path to the pretrained checkpoint in the evaluation config file in Cheetah/eval_configs/cheetah_eval_llama2.yaml at Line 10.

Besides, Cheetor reuses the pretrained Q-former from BLIP-2 that matches FlanT5-XXL.

4. How to use Cheetor

Examples of using our Cheetah model are provided in the files Cheetah/test_cheetah_vicuna.py and Cheetah/test_cheetah_llama2.py. You can test your own samples following the format shown in these two files, and run the test code in the following way (taking the Vicuna version of Cheetah as an example):

python test_cheetah_vicuna.py --cfg-path eval_configs/cheetah_eval_vicuna.yaml --gpu-id 0

And in the near future, we will also demonstrate how to launch the gradio demo of Cheetor locally.


ChatGPT-4 Breakdown:

Imagine a brilliant detective who has a unique skill: they can understand stories told not just through spoken or written words, but also by examining pictures, diagrams, or comics. This detective doesn't just listen or read; they also observe and link the visual clues with the narrative. When given a comic strip without dialogues or a textbook diagram with some text, they can deduce what's happening, understanding both the pictures and words as one unified story.

In the world of artificial intelligence, "Cheetor" is that detective. It's a sophisticated program designed to interpret and respond to a mix of images and texts, enabling it to perform tasks that require both vision and language understanding.

Projects to Try with Cheetor:

Comic Story Creator: Input: A series of related images or sketches. Cheetor’s Task: Predict and generate suitable dialogues or narratives to turn those images into a comic story.

Education Assistant: Input: A page from a textbook containing both diagrams and some accompanying text. Cheetor’s Task: Answer questions based on the content, ensuring it considers both the visual and written information.

Security Analyst: Input: Surveillance footage or images with accompanying notes or captions. Cheetor’s Task: Identify discrepancies or anomalies, integrating visual cues with textual information.

Drive Safety Monitor: Input: Video snippets from a car's dashcam paired with audio transcriptions or notes. Cheetor’s Task: Predict potential hazards or traffic violations by understanding both the visual and textual data.

Art Interpreter: Input: Art pieces or abstract images with associated artist's notes. Cheetor’s Task: Explain or interpret the art, merging the visual elements with the artist's intentions or story behind the work.


This is a really interesting strategy and implementation! A model that can interpret natural language text together with high-quality image recognition and computer vision can lead to all sorts of wild new applications. I am excited to see where this goes in the open-source community and how it develops over the rest of this year.

132
133
 
 

So I was playing around with some coding models and getting disappointed in the responses. I started using starcoderplus-guanaco-gpt4, and after some tinkering I just wanted to share the importance of formatting your prompt correctly

I asked it to provide a way to rate limit a function in Python based on the input to the function, so that it doesn't repeat identical output too often.

I used the following prompt:

Create a python function that takes a string as input and prints that string. The function should be rate limited so that any specific string is not printed more than once every two minutes. This means it must keep track of the last time that it printed a specific string.

However, I used it in the chat-completion UI of text-generation-webui, and this was the useless reply I got:

[Screenshot: chat-prompt-starcoder]

Obviously completely useless to me

But then I realized that this model expects to follow instructions, not a chat, so I went over to the instruction template so now this was my "prompt":

### Instruction: Create a python function that takes a string as input and prints that string. The function should be rate limited so that any specific string is not printed more than once every two minutes. This means it must keep track of the last time that it printed a specific string.

### Response:

And lo and behold, a very competent useful reply!

As you can see, even if you follow the proper concept for instruct (providing it as instructions 'Create a python function that..' rather than 'I need a function that..'), you still need to be sure to follow the proper template structure.

And most interestingly of all, giving the same prompt to chatgpt gets what I consider to be a worse answer:

It's very similar, but to my eye distinctly overengineered; I find the solution from starcoder much more closely answers my question, with only a couple of lines of code to change in my existing function. YMMV, but the TL;DR is that you should make sure to follow the proper prompt and template formats to get the best replies from your model.
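
For reference, a minimal sketch of the kind of two-minute rate limiter the prompt asks for (my own illustration, not the model's actual output) might look like:

```python
import time

_last_printed: dict[str, float] = {}

def rate_limited_print(text: str, interval: float = 120.0) -> None:
    """Print `text` only if it hasn't been printed within the last `interval` seconds."""
    now = time.monotonic()
    last = _last_printed.get(text)
    if last is None or now - last >= interval:
        print(text)
        _last_printed[text] = now
```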

134
135
28
submitted 1 year ago* (last edited 1 year ago) by [email protected] to c/localllama
 
 

Places above all 13Bs, as well as above llama1-65b on the HuggingFace leaderboard

136
 
 

Still pretty new to local LLMs, and there's been a lot of development since I dipped my toe in. Suffice it to say I'm fairly swamped and looking for guidance to the right model for my use.

I want to feed the model sourcebooks, so I can ask it game mechanic questions and it will respond with reasonable accuracy (including page references). I tried this with privateGPT a month or two back, and it kinda worked but it was slow and wonky. It seems like things are a bit cleaner now
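
For context, what tools like privateGPT do under the hood boils down to embedding per-page chunks and retrieving the closest ones for the model to answer from. A minimal sketch of that retrieval step (assuming sentence-transformers; the page texts are hypothetical):

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

# Hypothetical sourcebook pages, keyed by page number so answers can cite pages.
pages = {
    12: "Grappling: make an opposed Athletics check against the target...",
    47: "Cover grants +2 to AC for half cover and +5 for three-quarters cover...",
}
page_numbers = list(pages.keys())
corpus = list(pages.values())
corpus_emb = model.encode(corpus, convert_to_tensor=True)

query = "How does grappling work?"
query_emb = model.encode(query, convert_to_tensor=True)
hits = util.semantic_search(query_emb, corpus_emb, top_k=2)[0]
for hit in hits:
    print(f"p.{page_numbers[hit['corpus_id']]}: {corpus[hit['corpus_id']][:60]}...")
```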

137
 
 

There's been a lot of good work done the past week by several pivotal members, and now the boss is back and focused on it. It's going to be a very breaking change, but I'm really excited about where this will lead us!

138
 
 

I'm really sorry that this is probably out of place, as it's not strictly LLAMA, but I couldn't think of anywhere else to post it where people may be able to help.

**Sadly, my grandma passed away yesterday.** It prompted me to retrieve some old photos that my parents stashed in the loft over a decade ago, and they are just incredible! I've found so many pictures of her, going back to when she was really young, to a point where I'll have to check if they are all definitely her! But there are amazing ones of my dad and uncles when they were little, even my nan with my dad and me when I'd literally just been born!

There's lots of really wonderful family moments and slice-of-life history captured there, and I don't think anyone knows they exist.

Mostly, I have enough funny photos to have birthday cards sorted for several lifetimes!

I want to get these ALL scanned and digitised. But for now, I'm separating out the ones with my nan in them as top priority. Most of the services available have long turnarounds and, while their per-photo prices are far from extortionate, I'd really need something that prices by the kilogram. Seriously, there must be over 20 kg of photos in here. They're mostly 6x4, and you'd want them done at a high DPI, so the cost would be astronomical, especially as I already plan to spend a lot on printing.

So, I'd like to do this myself. I've dabbled with some LLM stuff, but I don't really know where to start with image manipulation, and I don't really have time to figure it out, so I'm asking for some guidance.

The rough idea is:

  1. Scan photos on high DPI flatbed scanner. Fill up the bed each time with multiple photos for highest speed

  2. 'Parse' the scan into multiple image files by identifying where the photo bounds are and cropping. This should be pretty simple; I think it may be possible in OpenCV, though I've never tried it before (see the sketch after this list). Otherwise, it has to be one of the simpler jobs for an ML tool. I understand how object recognition works in principle, but not in practice.

  3. Run select images through an upscaler. Some we may want to display or print larger, and if it's not unreasonable, since we're going to digitise them, we may as well make them as high-res as we can (I'll keep the originals). I know the usual 'zoomify' caveats, but obviously when I'm starting with 6x4 prints, I'll take all the help I can get.

  4. Ideally attempt to 'clean up' the images. I don't think I'd want to colourise anything, but it would be nice to remove obvious stains, creases and such from the print.

  5. The plan is to have some albums for people to go through at the wake, print off plenty of extra 6x4 copies so people can just grab them and take them away, but I'll also stick them all on a google drive or something and make the link available so people can download them in higher res, and more people will be able to see and preserve them
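
For step 2, a minimal sketch of the crop-out-each-photo idea (assuming OpenCV; the threshold and size filter are guesses that will need tuning for your scanner's background):

```python
import cv2

scan = cv2.imread("scan_001.png")
gray = cv2.cvtColor(scan, cv2.COLOR_BGR2GRAY)

# Photos are darker than the white scanner lid: threshold, invert, and close gaps.
_, mask = cv2.threshold(gray, 230, 255, cv2.THRESH_BINARY_INV)
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (25, 25))
mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)

# Each external contour should correspond to one photo on the bed.
contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
count = 0
for c in contours:
    x, y, w, h = cv2.boundingRect(c)
    if w * h < 500 * 500:  # skip dust and specks; a 6x4 print at 600 DPI is far larger
        continue
    cv2.imwrite(f"scan_001_photo_{count}.png", scan[y:y + h, x:x + w])
    count += 1
```

If photos end up at a slight angle on the bed, cv2.minAreaRect on each contour plus a rotation would also let you deskew before cropping.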

My main concerns are 2 and 3; 4 would be a bonus. I suspect the shortlist will be 100-200 photos, but processing everything will be an ongoing process. I just want the shortlist done before the funeral.

I'd really rather run locally if viable. I have two machines that may be able to contribute:

Workstation

  • RTX 3090
  • 32GB DDR4 --> 64GB DDR5
  • 5600X --> 7900X

I'm actually planning an upgrade very soon, which I can push through fast if it will help.

R720XD

  • 128GB DDR3 RAM
  • 2x E5-2630 v2.

Probably not super useful, but it has lots of RAM and can be used as a slow workhorse. My workstation is on CAD for 8 hours a day and is used for gaming in the evening. I can obviously cut out the gaming, but not the CAD, so there could be some utility in handing off some things to the server if the workload doesn't require a GPU.

If you can point me in the right direction to get set up, that would be awesome. I'm new at this and learning fast, but I don't want to under-deliver. I'm very capable of learning the details, but I don't have enough experience to determine the 'best way' of doing something like this.

Any help would be incredible!

139
140
 
 

PR 3313 has been merged with this commit

This is pretty great for those wanting to use text-generation-webui who are looking for performant llama.cpp/falcon/MPT offloading, or just a good CPU inference tool in general.

For those unaware, this is the ctransformers repo
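
For a sense of what ctransformers gives you, a minimal sketch of loading a GGML model with GPU offloading outside the webui (the model path and layer count are placeholders):

```python
from ctransformers import AutoModelForCausalLM

# Load a local GGML file directly; gpu_layers controls how many layers are offloaded.
llm = AutoModelForCausalLM.from_pretrained(
    "/models/llama-2-7b.ggmlv3.q4_0.bin",  # placeholder path
    model_type="llama",
    gpu_layers=32,
)
print(llm("Q: What does llama.cpp do? A:", max_new_tokens=64))
```

The merged PR wires this same library up as a loader inside text-generation-webui.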

And for anyone looking for an updated docker image, I have provided an image here on dockerhub

And as always, my git repo with instructions can be found here on GitHub.

Happy inferencing :)

141
14
submitted 1 year ago* (last edited 1 year ago) by [email protected] to c/localllama
 
 

I am just learning in this space and I could be wrong about this one, but... The GGML and GPTQ models are nice for getting started with AI in Oobabooga. The range of models available is kind of odd to navigate and understand in terms of how they compare, with all the different quantization types, settings, and features. I still don't understand a lot of it. One of the main aspects I didn't (and still don't fully) understand is how some models do not have a quantization stated like GGML/GPTQ, but still work using Transformers. I tried some of these by chance at first, and avoided them because they take longer to initially load.

Yesterday I created my first LoRAs and learned through trial and error that the only models I can use to train a LoRA on are the ones that use Transformers and can be set to 8-bit mode. Even using GGML/GPTQ models with 8-bit quantization, I could not use them to make a LoRA. It could be my software setup, but I think there is either a fundamental aspect of these models I haven't learned yet, or it is a limitation in Oobabooga's implementation. Either way, the key takeaway is to try making a LoRA with a Transformers-based model loaded in Oobabooga, and be sure the "load in 8 bit" box is checked.

I didn't know what to expect with this, and haven't come across many examples, so I put off trying it until now. I have a 12th-gen i7 with 20 logical cores and a 16GB-VRAM 3080 Ti in a laptop. I can convert an entire novel into a text file and load this as raw text (the raw text file tab) for training in Oobabooga using the default settings. If my machine has some assistance with cooling, I can create the LoRA in 40 minutes using the default settings and a 7B model. This has a mild effect. IIRC the default weight of the LoRA network is 32. If this is turned up to 96-128, it will have a more noticeable effect on personality. It still won't substantially improve the Q&A accuracy, but it may improve the quality to some extent.

I first tested with a relatively small Wikipedia article on Leto II (Dune character) formatted for this purpose manually. This didn't change anything substantially. Then I tried with the entire God Emperor of Dune e-book as raw text. This had garbage results, probably due to all the nonsense before the book even starts, and the terrible text formatting extracted from an eBook. The last dataset I tried was the book text only, with everything reflowed using a Linux bash script I wrote to alter newline characters, spacing, and remove page gaps. Then I manually edited with find and replace to remove special characters and any formatting oddballs I could find. This was the first LoRA I made where the 7B model's tendency to hallucinate seemed more evident than issues with my LoRA. For instance, picking a random name of an irrelevant character that occurs 3 times in 2 sentences of the LoRA text and prompting about it results in random unrelated output. The overall character identity is also weak despite a strong character profile and a 1.8MB text file for the LoRA.
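
The kind of reflow described above can be sketched in Python (rather than the bash script mentioned; the filenames and exact rules are illustrative):

```python
import re

with open("book_raw.txt", encoding="utf-8") as f:  # placeholder filename
    text = f.read()

text = re.sub(r"\r\n?", "\n", text)           # normalize line endings
text = re.sub(r"\n{3,}", "\n\n", text)        # collapse page gaps into paragraph breaks
text = re.sub(r"(?<!\n)\n(?!\n)", " ", text)  # join hard-wrapped lines into paragraphs
text = re.sub(r"[^\x20-\x7E\n]", "", text)    # drop special-character oddballs
text = re.sub(r"[ \t]{2,}", " ", text)        # squeeze repeated spaces

with open("book_clean.txt", "w", encoding="utf-8") as f:
    f.write(text)
```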

This is just the perspective from a beginner's first attempt. Actually tuning this with a bit of experience will produce far better results. I'm just trying to say, if you're new to this and just poking around, try making a LoRA. It is quite easy to do.

142
 
 

From the blog:

In this blog, we provide a thorough analysis and a practical guide for fine-tuning. We examine the Llama-2 models under three real-world use cases, and show that fine-tuning yields significant accuracy improvements across the board (in some niche cases, better than GPT-4). Experiments were carried out with this script.

143
 
 

txtai is an all-in-one embeddings database for semantic search, LLM orchestration and language model workflows

Seems like a great resource for all things embeddings related, give it a look!
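
A minimal sketch of the basic index-and-search flow (the embedding model and data are placeholders; check the txtai docs for the current API):

```python
from txtai.embeddings import Embeddings

# Build an embeddings index over a few example documents.
embeddings = Embeddings({"path": "sentence-transformers/all-MiniLM-L6-v2"})
data = [
    "llama.cpp adds GPU offloading for quantized models",
    "PEFT supports fine-tuning GPTQ models with LoRA",
    "Stability AI releases 3B coding models",
]
embeddings.index([(i, text, None) for i, text in enumerate(data)])

# Semantic search returns (id, score) pairs for the best matches.
print(embeddings.search("parameter-efficient finetuning", 1))
```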

144
18
submitted 1 year ago* (last edited 1 year ago) by noneabove1182 to c/localllama
 
 

These are the full weights; the quants are incoming from TheBloke already. I will update this post when they're fully uploaded.

From the author(s):

WizardLM-70B V1.0 achieves a substantial and comprehensive improvement on coding, mathematical reasoning and open-domain conversation capacities.

This model is license friendly, and follows the same license with Meta Llama-2.

Next version is in training and will be public together with our new paper soon.

For more details, please refer to:

Model weight: https://huggingface.co/WizardLM/WizardLM-70B-V1.0

Demo and Github: https://github.com/nlpxucan/WizardLM

Twitter: https://twitter.com/WizardLM_AI

GGML quant posted: https://huggingface.co/TheBloke/WizardLM-70B-V1.0-GGML

GPTQ quant repo posted, but still empty (GPTQ is a lot slower to make): https://huggingface.co/TheBloke/WizardLM-70B-V1.0-GPTQ

145
 
 

Refactored codebase - now a single unified turbopilot binary that provides support for codegen and starcoder style models.

Support for starcoder, wizardcoder and santacoder models

Support for CUDA 11 and 12

Seems interesting. It looks like it supports WizardCoder with GPU offloading; if StarCoder also has GPU offloading, that would be great, but I would need to test. If it also works with the new Stability AI coding models, that would be very interesting.

146
4
submitted 1 year ago* (last edited 1 year ago) by noneabove1182 to c/localllama
 
 

Text from them:

Calling all model makers, or would-be model creators! Chai asked me to tell you all about their open source LLM leaderboard:

Chai is running a totally open LLM competition. Anyone is free to submit a llama based LLM via our python-package 🐍 It gets deployed to users on our app. We collect the metrics and rank the models! If you place high enough on our leaderboard you'll win money 🥇

We've paid out over $10,000 in prizes so far. 💰

Come to our discord and check it out!

https://discord.gg/chai-llm

Link to latest board for the people who don't feel like joining a random discord just to see results:

https://cdn.discordapp.com/attachments/1134163974296961195/1138833170838589471/image1.png

147
 
 

Stability AI released three new 3b models for coding:

  • stablecode-instruct-alpha-3b (context length 4k)
  • stablecode-completion-alpha-3b-4k (context length 4k)
  • stablecode-completion-alpha-3b (context length 16k)

I didn't try any of them yet, since I'm waiting for the GGML files to be supported by llama.cpp, but I think especially the 16k model seems interesting. If anyone wants to share their experience with it, I'd be happy to hear it!

148
149
14
submitted 1 year ago* (last edited 1 year ago) by AsAnAILanguageModel to c/localllama
 
 

I think it's a good idea to share experiences about LLMs here, since benchmarks can only give a very rough overview on how well a model performs.

So please share how much you're using LLMs, what you use them for, and how well they perform at those tasks. For example, here are my answers to these questions:

Usage

I use LLMs daily for work and for random questions that I would previously use web search for.

I mainly use LLMs for reasoning heavy tasks, such as assisting with math or programming. Other frequent tasks include proofreading, helping with bureaucracy, or assisting with writing when it matters.

Models

The one I find most impressive at the moment is TheBloke/airoboros-l2-70B-gpt4-1.4.1-GGML/airoboros-l2-70b-gpt4-1.4.1.ggmlv3.q2_K.bin. It often manages to reason correctly on questions where most other models I tried fail, even though most humans wouldn't. I was surprised that something using only 2.5 bits per weight on average could produce anything but garbage. Downsides are that loading times are rather long, so I wouldn't ask it a question if I didn't want to wait. (Time to first token is almost 50s!). I'd love to hear how bigger quantizations or the unquantized versions perform.

Another one that made a good impression on me is Qwen-7B-Chat (demo). It manages to correctly answer some questions where even some llama2-70b finetunes fail, ~~but so far I'm getting memory leaks when running it on my M1 mac in fp16 mode, so I didn't use it a lot.~~ (this has been fixed it seems!)

All other models I briefly tried were not too useful. It's nice to be able to run them locally, but they were so much worse than ChatGPT that it's often not even worth considering them.

150
 
 

I want to train, or more likely fine-tune, a model on about 20 years worth of email and text data that I've collected.

The goal would be to train it how to respond like me in simple cases.

Is there a particular base model I should start with?

I'm also interested in anyone's experience in doing this kind of thing themselves.
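
For the data-preparation side, a minimal sketch of turning an email archive into instruction/response pairs for fine-tuning (the mbox filename and address are placeholders):

```python
import json
import mailbox

ME = "me@example.com"  # placeholder: the address whose replies become the targets

def body(msg):
    """Return the text/plain body of a message, decoded to str."""
    if msg.is_multipart():
        for part in msg.walk():
            if part.get_content_type() == "text/plain":
                payload = part.get_payload(decode=True)
                return payload.decode(part.get_content_charset() or "utf-8", "replace")
        return ""
    payload = msg.get_payload(decode=True)
    return payload.decode(msg.get_content_charset() or "utf-8", "replace") if payload else ""

mbox = mailbox.mbox("archive.mbox")  # placeholder archive
by_id = {m["Message-ID"]: m for m in mbox}

pairs = []
for msg in mbox:
    # Keep messages I sent that reply to something else in the archive.
    if ME not in (msg["From"] or "") or msg["In-Reply-To"] not in by_id:
        continue
    parent = by_id[msg["In-Reply-To"]]
    pairs.append({"instruction": body(parent), "response": body(msg)})

with open("email_pairs.jsonl", "w", encoding="utf-8") as f:
    for p in pairs:
        f.write(json.dumps(p) + "\n")
```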
