Tether is seeking an AI Inference Engineer to build and optimize the C++ inference layer powering local and edge AI. You’ll enhance engines like llama.cpp, ggml, and ONNX to deliver fast, efficient model performance across diverse hardware.
What you’ll do
- Optimize and deploy LLM inference on edge devices
- Improve model load times, performance, and stability
- Collaborate with researchers to move models from research to production
- Integrate cutting-edge AI features into Tether products
What we’re looking for
- Strong C++ expertise with hands-on llama.cpp/ggml experience
- Solid understanding of LLMs, transformers, and deep learning
- Experience with ONNX; JavaScript is a plus
Join Tether and help push the boundaries of efficient, production-ready AI in global digital finance.