Hello, everyone! I wanted to share my experience of successfully running LLaMA on an Android device. The model that performed best for me was llama3.2:1b on a mid-range phone with around 8 GB of RAM, and I was also able to get it up and running on a lower-end phone with 4 GB of RAM. I also tested several other models that worked quite well, including qwen2.5:0.5b, qwen2.5:1.5b, qwen2.5:3b, smallthinker, tinyllama, deepseek-r1:1.5b, and gemma2:2b. I hope this helps anyone looking to experiment with these models on mobile devices!
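
For any of the other models listed above, the steps are exactly the same; only the model tag changes. As a quick illustration (these assume Ollama is already installed as described in Steps 1-4 below):

    ollama run qwen2.5:0.5b
    ollama run gemma2:2b

Smaller models need less RAM, so they are the safer choice on lower-end phones.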


Step 1: Install Termux

  1. Download and install Termux from the Google Play Store or F-Droid

Step 2: Set Up proot-distro and Install Debian

  1. Open Termux and update the package list:

    pkg update && pkg upgrade
    
  2. Install proot-distro:

    pkg install proot-distro
    
  3. Install Debian using proot-distro:

    proot-distro install debian
    
  4. Log in to the Debian environment:

    proot-distro login debian
    

    You will need to log in to the Debian environment every time you want to run Ollama. In other words, repeat this step and the steps below every time you want to run a model (excluding Step 3 and the first half of Step 4).
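
If retyping those commands gets tedious, you can wrap the whole "log in, start the server, open a chat" sequence into a single line run from Termux. This is just a rough convenience sketch (it assumes Ollama is already installed inside Debian as described in Step 4 below, and the 3-second sleep is only a guess at how long the server needs to come up):

    proot-distro login debian -- bash -c 'ollama serve >/dev/null 2>&1 & sleep 3; ollama run llama3.2:1b'

You could save that line as a small shell script in Termux and run it whenever you want to chat with the model.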


Step 3: Install Dependencies

  1. Update the package list in Debian:

    apt update && apt upgrade
    
  2. Install curl:

    apt install curl
    

Step 4: Install Ollama

  1. Run the following command to download and install Ollama:

    curl -fsSL https://ollama.com/install.sh | sh
    
  2. Start the Ollama server:

    ollama serve &
    

    After you run this command, the server's startup output will print to the terminal. Press ctrl + c (or just Enter) to get your prompt back; the trailing & keeps the server running in the background.
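
To check that the server actually came up before moving on, you can query it with curl (installed in Step 3); by default the Ollama server listens on localhost port 11434:

    curl http://localhost:11434

It should answer with a short "Ollama is running" message. If you get a connection error instead, give the server a few more seconds and try again.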


Step 5: Download and run the Llama3.2:1B Model

  1. Use the following command to download and run the Llama3.2:1B model:

    ollama run llama3.2:1b
    
    This step fetches and runs the lightweight 1-billion-parameter version of the Llama 3.2 model.
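
Besides the interactive prompt, the server you started in Step 4 also exposes an HTTP API, which is handy if you want to script the model instead of chatting with it. A minimal sketch using Ollama's /api/generate endpoint:

    curl http://localhost:11434/api/generate -d '{
      "model": "llama3.2:1b",
      "prompt": "Why is the sky blue?",
      "stream": false
    }'

Setting "stream" to false returns the whole answer as a single JSON object instead of a stream of tokens.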

Running LLaMA and other similar models on Android devices is definitely achievable, even with mid-range hardware. The performance varies depending on the model size and your device's specifications, but with some experimentation, you can find a setup that works well for your needs. I’ll make sure to keep this post updated if there are any new developments or additional tips that could help improve the experience. If you have any questions or suggestions, feel free to share them below!

– llama

all 18 comments
[–] beastlykings 4 points 4 days ago (1 children)

Very cool! I got it running. Though apparently I didn't need step 6 as it started running after I downloaded it. I was a bit confused, and so was the LLM as it started telling me how the run command works 🤦‍♂️

Good fun. Got me interested in running local LLM for the first time. What type of performance increase should I expect when I spin this up on my 3070 ti?

[–] [email protected] 1 points 4 days ago

Though apparently I didn't need step 6 as it started running after I downloaded it

Hahahha. It really is a little redundant, now that you mention it. I'll remove it from the post. Thank you!

Good fun. Got me interested in running local LLM for the first time.

I'm very happy to hear my post motivated you to run an LLM locally for the first time! Did you manage to run any other models? How was your experience? Let us know!

What type of performance increase should I expect when I spin this up on my 3070 ti?

That really depends on the model, to be completely honest. Make sure to check the model requirements. For llama3.2:1b you can expect a significant performance increase, at least.
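
If you want to confirm that Ollama is actually offloading to the GPU on your PC rather than falling back to CPU, a quick check (assuming the NVIDIA drivers are set up) is:

    ollama ps
    nvidia-smi

ollama ps shows which models are loaded and whether they sit on GPU or CPU, and nvidia-smi should show the model occupying VRAM while it is loaded.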

[–] [email protected] 3 points 4 days ago (1 children)

Is there an alternative Android app that enables downloading LLaMA locally (without using a terminal)?

[–] [email protected] 2 points 3 days ago (1 children)

There are a few. There's Private AI. It is free (as in beer) but it's not libre (or open source). The app is a bit sketchy too, so I would still recommend doing as the tutorial says.

Out of curiosity, why do you not want to use a terminal for that?

[–] [email protected] 1 points 3 days ago (1 children)

Thanks for the suggestion.

I’d like a GUI for the AI, like ChatGPT has. I’m currently in the process of setting up a local model that also lets me connect to the internet and works cross-platform.

[–] [email protected] 1 points 3 days ago

I see. I don't think there are many solutions on that front for Android. For PC there are a few, such as LM Studio.

[–] [email protected] 4 points 5 days ago (1 children)

I've run a local LLM on my PC for a while, so I'm familiar enough with Ollama to understand what's going on. I've tried this with my Samsung Tracfone, not really expecting a lot. Surprisingly I've gotten all the way to getting a prompt, but then things crash and I'm kicked back to the starting terminal. Pretty sure it's memory, so I'm now trying to use virtual memory to bump it up to the 4GB you've had success with (the phone looks to have 3GB actual memory, plenty of storage though).

If it doesn't work, I'll try some of the others, perhaps they're a bit smaller.

I did get the 0.5 Qwen to run well. I'm surprised how fast it is even using CPU mode, but maybe being smaller also helps with the processing.

Just a tip (maybe obvious to experienced users): while you do have to open the terminal, log in to Debian, start the server and then run the model each time, remember that you can use the arrow keys in the terminal to repeat past commands, so it's pretty quick to do. I actually missed the arrow keys the first time around because they aren't very distinct or highlighted, but then when I had to look for how to do CTRL, I realized they were right in front of me.
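
For anyone else bumping into memory limits like this, you can check how much RAM the environment actually sees straight from /proc, both in plain Termux and inside the Debian login:

    head -3 /proc/meminfo

That prints MemTotal, MemFree and MemAvailable, which gives a rough idea of whether a 1B-class model has any chance of fitting.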

[–] [email protected] 2 points 4 days ago

I have tried this on more or less 5 spare phones. None of them have less than 4 GB of RAM, however.

[–] [email protected] 1 points 5 days ago (2 children)

Does termux have a significant performance penalty?

[–] [email protected] 10 points 5 days ago (1 children)

None, it's just a terminal emulator like xterm or Konsole or alacritty or whatever else. It doesn't actually emulate anything.

It's all native ARM binaries shoved into a container, pretty much the same as Docker. The performance hit is basically zero. Android runs the Linux kernel; the container is just a fancy chroot that makes it look like regular Linux.
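
One quick way to see this for yourself: both plain Termux and the Debian proot report the same native CPU architecture, because the Debian userland is just ordinary ARM binaries running on Android's own kernel (no qemu, no VM):

    uname -m

Run it in Termux and again after proot-distro login debian; on a typical 64-bit phone both print aarch64.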

[–] [email protected] 2 points 4 days ago

Great explanation, Max!

[–] [email protected] 1 points 5 days ago* (last edited 4 days ago) (1 children)

The performance may feel somewhat limited, but this is due to Android devices usually having less processing power than computers. For smaller models like the ones I mentioned, however, you likely won't notice much of a difference compared to running them on a computer.

[–] [email protected] 0 points 5 days ago (2 children)

What has me curious is I'm wondering how much of an improvement would occur if the software was running natively. (Kinda like wine vs VMware)

[–] [email protected] 2 points 5 days ago (1 children)

I would argue there would not be any noticeable differences.

[–] [email protected] 0 points 5 days ago (1 children)

Yeah that could be the case. I'm not familiar enough with the Termux project to understand how efficient of an emulator it is.

[–] [email protected] 5 points 5 days ago (1 children)

It is running natively. In comparison to, for example, emulators that emulate old gaming consoles completely, Termux only emulates a terminal, so basically a keyboard and a display, but the actual applications are Linux binaries and run natively on Android's Linux kernel. There is no emulation or virtualization happening.

[–] [email protected] 3 points 4 days ago

Great, thanks for the clarification.