Start free. Scale when you need to.
LayerScale is free to use with full inference capabilities. Upgrade to Pro when you need unlimited sessions and production monitoring.
Free
LayerScale
$0
No license key required. Free forever.
- ✓ OpenAI and Anthropic API compatibility
- ✓ Tool calling (both formats)
- ✓ SSE streaming responses
- ✓ 1 persistent session
- ✓ Streaming data ingestion
- ✓ Flash Queries
- ✓ WebSocket support
- ✓ Full GPU acceleration
- ✓ Context length up to 32K tokens
- ✓ KV cache quantization (FP16, Q8_0, Q4_0)
- ✓ 2 context pool slots
License key required
LayerScale Pro
Contact us
30-day free trial available.
- ✓ Everything in LayerScale
- ✓ Unlimited concurrent sessions
- ✓ Context length up to 1M tokens
- ✓ Unlimited context pool
- ✓ Prometheus metrics endpoint
- ✓ Priority decode scheduler
- ✓ Production support
Feature comparison
| Feature | LayerScale | Pro |
|---|---|---|
| **Inference** | | |
| OpenAI API (/v1/chat/completions) | ✓ | ✓ |
| Anthropic API (/v1/messages) | ✓ | ✓ |
| Tool calling | ✓ | ✓ |
| SSE streaming | ✓ | ✓ |
| GPU acceleration (CUDA, ROCm) | ✓ | ✓ |
| **Stateful inference** | | |
| Persistent sessions | 1 | Unlimited |
| Streaming data ingestion | ✓ | ✓ |
| Flash Queries | ✓ | ✓ |
| WebSocket connections | ✓ | ✓ |
| **Infrastructure** | | |
| Context length (tokens) | Up to 32,768 | Up to 1M |
| Context pool slots | 2 | Unlimited |
| KV cache quantization | ✓ | ✓ |
| Prometheus metrics | — | ✓ |
| Priority decode scheduler | — | ✓ |
| **Support** | | |
| Documentation | ✓ | ✓ |
| Priority support | — | ✓ |
| License key required | No | Yes |