this post was submitted on 12 Aug 2023
submitted 1 year ago* (last edited 1 year ago) by [email protected] to c/localllama
 

I am just learning in this space and I could be wrong about this one, but... GGML and GPTQ models are nice for getting started with AI in Oobabooga. The range of available models is odd to navigate and understand: how they compare, and all the different quantization types, settings, and features. I still don't understand a lot of it. One of the main things I didn't (and still don't fully) understand is how some models don't state a quantization like GGML/GPTQ but still work through the Transformers loader. I tried a few of these by chance early on, then avoided them because they take longer to load initially.

Yesterday I created my first LoRAs and learned through trial and error that the only models I can train a LoRA on are the ones loaded with Transformers that can be set to 8-bit mode. Even GGML/GPTQ models with 8-bit quantization would not work for making a LoRA. It could be my software setup, but I think there is either a fundamental aspect of these models I haven't learned yet, or it is a limitation of Oobabooga's implementation. Either way, the key takeaway is to make the LoRA with a Transformers-based model loaded in Oobabooga, with the "load in 8 bit" box checked.
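
Under the hood, Oobabooga's Transformers loader appears to map onto the usual Hugging Face transformers + peft calls; here is a minimal sketch of that path (the model name and hyperparameters are illustrative, not necessarily the webui's exact defaults):

```python
# Minimal sketch of "Transformers loader + load-in-8-bit + LoRA" using
# transformers, bitsandbytes, and peft. Values here are placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_name = "huggyllama/llama-7b"  # any full-precision (non-GGML/GPTQ) checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    load_in_8bit=True,   # the "load in 8 bit" checkbox; requires bitsandbytes
    device_map="auto",
)

model = prepare_model_for_kbit_training(model)  # freeze base weights, prep for 8-bit training

lora_config = LoraConfig(
    r=32,                          # LoRA rank, discussed below
    lora_alpha=64,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```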

I didn't know what to expect with this, and hadn't come across many examples, so I put off trying it until now. I have a 12th-gen i7 with 20 logical cores and a 16 GB 3080 Ti in a laptop. I can convert an entire novel into a text file and load it on the raw text file tab for training in Oobabooga with the default settings. If my machine has some help with cooling, I can create the LoRA in about 40 minutes using the default settings and a 7B model. This has a mild effect. IIRC the default LoRA rank is 32. Turning it up to 96-128 has a more noticeable effect on personality. It still won't substantially improve Q&A accuracy, but it may improve the quality to some extent.
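
For a rough sense of why the rank setting matters, here is a back-of-envelope calculation of trainable parameters. It assumes a LLaMA-7B-like shape (32 layers, hidden size 4096) and that only the q_proj and v_proj matrices are adapted, which may not match Oobabooga's actual defaults:

```python
# Back-of-envelope: how LoRA rank scales the number of trainable parameters.
hidden = 4096
layers = 32
adapted_matrices_per_layer = 2  # q_proj and v_proj, each 4096 x 4096

def lora_params(rank: int) -> int:
    # Each adapted matrix W (d_out x d_in) gains A (rank x d_in) and B (d_out x rank).
    per_matrix = rank * (hidden + hidden)
    return per_matrix * adapted_matrices_per_layer * layers

for r in (32, 96, 128):
    print(f"rank {r:>3}: {lora_params(r) / 1e6:.1f}M trainable parameters")
# rank  32: 16.8M trainable parameters
# rank  96: 50.3M trainable parameters
# rank 128: 67.1M trainable parameters
```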

I first tested with a relatively small Wikipedia article on Leto II (the Dune character), formatted for this purpose manually. That didn't change anything substantially. Then I tried the entire God Emperor of Dune e-book as raw text. That gave garbage results, probably due to all the front matter before the book even starts and the terrible formatting of text extracted from an e-book. The last dataset I tried was the book text only, reflowed with a Linux bash script I wrote to fix newline characters and spacing and remove page gaps, then manually edited with find-and-replace to strip special characters and any formatting oddballs I could find. This was the first LoRA I made where the 7B model's tendency to hallucinate seemed more of a problem than issues with my LoRA. For instance, picking the name of a minor character that occurs 3 times in 2 sentences of the LoRA text and prompting about it produces random, unrelated output. The overall character identity is also weak, despite a strong character profile and a 1.8 MB text file for the LoRA.
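
For reference, a rough Python equivalent of that reflow step might look like the sketch below. This is not my actual bash script, just the same idea: join hard-wrapped lines into paragraphs, collapse the blank runs left by page breaks, and normalize whitespace. The exact rules and filenames are assumptions.

```python
import re

def clean_ebook_text(raw: str) -> str:
    text = raw.replace("\r\n", "\n").replace("\f", "\n\n")  # normalize line endings, drop form feeds
    text = re.sub(r"(?<!\n)\n(?!\n)", " ", text)            # join hard-wrapped lines inside a paragraph
    text = re.sub(r"[ \t]+", " ", text)                     # collapse runs of spaces/tabs
    text = re.sub(r"\n{3,}", "\n\n", text)                  # squeeze page gaps down to one blank line
    return text.strip()

# Hypothetical filenames for illustration.
with open("god_emperor_raw.txt", encoding="utf-8") as f:
    cleaned = clean_ebook_text(f.read())
with open("god_emperor_clean.txt", "w", encoding="utf-8") as f:
    f.write(cleaned)
```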

This is just a beginner's perspective on a first attempt. Actually tuning this with a bit of experience should produce far better results. I'm just saying: if you're new to this and just poking around, try making a LoRA. It is quite easy to do.

[–] [email protected] 1 points 1 year ago (1 children)

I am using the mental model of what Stable Diffusion can generate with a LoRA. It all comes down to how often something appears in the training data. You can't ask for extremely specific reference details, but if a character says or does the same thing a dozen times, that thing will likely emerge when prompted.

I'm curious how things change by layering LoRAs. I may also try splitting the text up into linear order without all the separate plot threads interwoven. It may be possible to pull out all of the conversational parts of the book too. I just don't know how to add these in an embedded context where the conversation is between the main character the AI will assume and several other characters unrelated to the user. I don't know how that can be processed.
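
On the layering idea, peft does let you attach more than one adapter to a single base model. A hedged sketch is below; the adapter paths are placeholders, and I haven't verified how well the weighted-merge option behaves for this use case.

```python
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("huggyllama/llama-7b", device_map="auto")

# Attach a first LoRA (e.g. the book text), then a second (e.g. a character profile).
model = PeftModel.from_pretrained(base, "loras/god-emperor-text", adapter_name="book")
model.load_adapter("loras/leto-profile", adapter_name="character")

# Option 1: switch between adapters per generation.
model.set_adapter("character")

# Option 2: combine them into a single weighted adapter (behavior not verified here).
model.add_weighted_adapter(
    adapters=["book", "character"],
    weights=[0.7, 0.5],
    adapter_name="book_plus_character",
    combination_type="linear",
)
model.set_adapter("book_plus_character")
```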

[–] [email protected] 2 points 1 year ago* (last edited 1 year ago)

So if I understand, you want to train it to behave and speak like a specific character. I would format it so the instruction is a simple "Respond to the conversation as (character)", most of the convo is the input, and the last line the character says is the output. That's for the convos.
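
Something like this for a single record, in the usual alpaca-style instruction/input/output layout (the dialogue here is invented filler, not lines from the book):

```python
import json

record = {
    "instruction": "Respond to the conversation as Leto II.",
    "input": (
        "Moneo: The pilgrims are gathering on the road to Onn, Lord.\n"
        "Duncan: Why do they keep coming?"
    ),
    "output": "Leto II: They come because I am the only certainty left to them.",
}

# Append records one per line to build the training set.
with open("leto_conversations.jsonl", "a", encoding="utf-8") as f:
    f.write(json.dumps(record, ensure_ascii=False) + "\n")
```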

For the rest, I would first make a neat little package with some of the character's quotes and speech mannerisms, a brief description of him, etc. I would give the Dune info and the character package to ChatGPT and have it give me questions and answers where the answers are written in the character's voice. I would use the same technique to rewrite answers from other datasets to add diversity.
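
Roughly like this for the generation step. The prompt wording, file names, and model choice are placeholders, and this uses the 2023-era openai.ChatCompletion API:

```python
import openai  # assumes OPENAI_API_KEY is set in the environment

# Hypothetical files: the Dune background info and the character package.
with open("dune_lore.txt", encoding="utf-8") as f:
    dune_info = f.read()
with open("leto_character_package.txt", encoding="utf-8") as f:
    character_package = f.read()

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You write instruction-tuning data."},
        {"role": "user", "content": (
            "Using the background and character notes below, write 10 question/answer "
            "pairs about Dune where every answer is written in the character's voice.\n\n"
            f"Background:\n{dune_info}\n\nCharacter notes:\n{character_package}"
        )},
    ],
)
print(response.choices[0].message.content)
```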

That should give you a dataset that refines the LLM's Dune knowledge without making it too specific, while also making sure it always keeps the character's mannerisms. It also avoids bogus ChatGPT answers, since you are feeding it the info but the API still does most of the work. I would use datasets with quality answers like alpaca-gpt4 or LIMA. I would make sure the bigger part of the data is the Dune data though; rewriting all of Alpaca is like $150 anyway, not really worth it.