# Chat Templates
Every instruction-tuned language model expects prompts in a specific format. The model
was trained on conversations structured with special marker tokens, and it will only
follow instructions reliably if the prompt matches that format. A `ChatTemplate`
tells inference4j how to wrap a user message in the correct markers before passing
it to the tokenizer.
## Why models need different templates
During training, each model family learns to associate specific token sequences with roles in a conversation. These markers vary across families:
| Model | User marker | End marker | Assistant marker |
|---|---|---|---|
| Phi-3 | `<\|user\|>` | `<\|end\|>` | `<\|assistant\|>` |
| DeepSeek-R1 (Qwen) | `<\|User\|>` | — | `<\|Assistant\|>` |
| Llama / ChatML | `<\|im_start\|>user` | `<\|im_end\|>` | `<\|im_start\|>assistant` |
If you send a Phi-3 prompt to a DeepSeek model (or vice versa), the model won't recognize the role markers. It may ignore the instruction entirely, echo the markers back, or produce incoherent output.
## How it works
`ChatTemplate` is a functional interface with a single method:
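A minimal sketch of its shape (the method name here is an assumption; check the inference4j sources for the actual signature):

```java
@FunctionalInterface
public interface ChatTemplate {
    // Assumed method name; the real interface may differ.
    String format(String userMessage);
}
```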
It takes the raw user message and returns the fully formatted prompt string. For Phi-3, the implementation looks roughly like this (a sketch based on Phi-3's published prompt format; the bundled template may differ in whitespace details):
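```java
// Wrap the message in Phi-3's role markers, then open the assistant turn.
ChatTemplate phi3 = msg -> "<|user|>\n" + msg + "<|end|>\n<|assistant|>\n";
```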
For a prompt like "What is Java?", this produces:
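```text
<|user|>
What is Java?<|end|>
<|assistant|>
```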
The tokenizer then encodes this formatted string into token IDs that the model recognizes.
## Preconfigured templates

### onnxruntime-genai models
When you use `ModelSources` factory methods, the chat template is already configured:

```java
// Chat template is bundled — nothing extra to configure
TextGenerator.builder()
    .model(ModelSources.phi3Mini())
    .build();
```
`ModelSources.phi3Mini()` returns a `GenerativeModel` that pairs the model source
with the correct Phi-3 chat template. `ModelSources.deepSeekR1_1_5B()` does the same
for DeepSeek's format.
### Native generation models
`OnnxTextGenerator` presets for instruct models (SmolLM2, Qwen2.5) come with a
ChatML template preconfigured. GPT-2 is a base model (not instruction-tuned), so
it has no default template, but you can provide one:
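A sketch of what that could look like, assuming `OnnxTextGenerator` exposes the same builder-style `modelSource(...)` and `chatTemplate(...)` hooks as `TextGenerator` below (the names here are assumptions, not confirmed API):

```java
// GPT-2 never saw chat markers during training, so a "template" here is
// just plain-text scaffolding; a question/answer prefix is a simple choice.
// gpt2Source is a placeholder for however you obtain the GPT-2 model files.
OnnxTextGenerator.builder()
    .modelSource(gpt2Source)
    .chatTemplate(msg -> "Question: " + msg + "\nAnswer:")
    .build();
```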
## Custom templates
To use a model that isn't preconfigured in `ModelSources`, provide both a `ModelSource`
and a `ChatTemplate`:

```java
TextGenerator.builder()
    .modelSource(myModelSource)
    .chatTemplate(msg -> "<|im_start|>user\n" + msg + "<|im_end|>\n<|im_start|>assistant\n")
    .build();
```
The template format for a given model is defined in its `tokenizer_config.json` file
on HuggingFace, under the `chat_template` field. It's typically a Jinja2 template —
you only need to translate the user-message portion into a Java lambda.
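For example, a ChatML-style `chat_template` (simplified here, and stored as one long line in the config; real templates usually add system-prompt and tool handling) looks roughly like this:

```jinja
{% for message in messages %}{{ '<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>' + '\n' }}{% endfor %}{% if add_generation_prompt %}{{ '<|im_start|>assistant\n' }}{% endif %}
```

For a single user message with `add_generation_prompt` enabled, this reduces to exactly the string the lambda above builds.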
## Adding support for a new model
When a new model family uses a chat template that isn't covered by `ModelSources`,
you need two things:

- The model files — a `ModelSource` that resolves the ONNX weights, tokenizer, and config files
- The chat template — a `ChatTemplate` lambda that formats prompts with the correct markers
Look at the model's `tokenizer_config.json` for the `chat_template` field to find the
expected format. For a simple user message (no system prompt, no tools), the relevant
portion is usually a few lines of Jinja that map to a straightforward string
concatenation in Java.
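As a sketch, suppose the `chat_template` you find uses ChatML markers (as in the table above). Translating it and wiring both pieces together then looks like this; `myNewModelSource` is a placeholder for whatever `ModelSource` resolves the new model's files:

```java
// Marker strings copied from the model's chat_template; swap in the
// real ones for your model.
ChatTemplate template = msg ->
    "<|im_start|>user\n" + msg + "<|im_end|>\n<|im_start|>assistant\n";

TextGenerator.builder()
    .modelSource(myNewModelSource) // placeholder ModelSource for the new model
    .chatTemplate(template)
    .build();
```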