Uhh... Tool calling is built into their tokenizers (well, their chat templates), but Ollama/LangChain just ignore them because they're spaghetti abstractions. To be blunt, LangChain and Ollama are overhyped, buggy junk trying to reinvent wheels.
For any kind of STEM work, I'd run Llama 3.3 Nemotron Super 49B as an exl3 quant via TabbyAPI, which exposes a generic OpenAI-compatible endpoint anything can use:
https://huggingface.co/nvidia/Llama-3_3-Nemotron-Super-49B-v1
https://huggingface.co/turboderp/Llama-3.3-Nemotron-Super-49B-v1-exl3/tree/3.0bpw
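If you've never hit a TabbyAPI endpoint before, here's a minimal tool-calling sketch using the stock `openai` Python client. The port (TabbyAPI defaults to 5000), API key, model name, and the `get_weather` tool are all placeholders for illustration; swap in whatever your own config actually uses:

```python
from openai import OpenAI

# Point the standard OpenAI client at your local TabbyAPI instance.
# base_url/api_key are assumptions; match them to your TabbyAPI config.
client = OpenAI(base_url="http://localhost:5000/v1", api_key="your-tabby-key")

# A hypothetical tool, just to show the schema format.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="Llama-3.3-Nemotron-Super-49B-v1-exl3",  # whatever name your server reports
    messages=[{"role": "user", "content": "What's the weather in Boston?"}],
    tools=tools,
)

# If the model decided to call the tool, the arguments come back as JSON.
calls = resp.choices[0].message.tool_calls
if calls:
    print(calls[0].function.name, calls[0].function.arguments)
else:
    print(resp.choices[0].message.content)
```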
Nemotron models freaking rock at anything STEM-adjacent, and I can squeeze in 48K+ context on 24GB VRAM (depending on your cache quantization settings).
Otherwise, GLM-4 is very good at tool calling, as is Qwen3, and you can run them more comfortably as GGUFs if you don't want to leave the llama.cpp ecosystem, or as exl2s if you hit specific trouble with exl3 in TabbyAPI.
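And since it's all the same OpenAI protocol, the exact same snippet above works against llama.cpp if you go the GGUF route; just repoint the client at llama-server (launch it with `--jinja` so the chat template can actually emit tool calls):

```python
from openai import OpenAI

# llama-server speaks the same OpenAI protocol (port 8080 by default).
# api_key is a dummy value; llama-server doesn't require one out of the box.
client = OpenAI(base_url="http://localhost:8080/v1", api_key="none")
```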