Supported Models

All supported models are hosted under the inference4j Hugging Face organization. They are downloaded automatically and cached on first use.

NLP

| Capability | Wrapper | Default Model ID | Size | API |
|---|---|---|---|---|
| Text Classification | DistilBertTextClassifier | inference4j/distilbert-base-uncased-finetuned-sst-2-english | ~260 MB | TextClassifier |
| Text Embeddings | SentenceTransformerEmbedder | inference4j/all-MiniLM-L6-v2 | ~90 MB | TextEmbedder |
| Search Reranking | MiniLMSearchReranker | inference4j/ms-marco-MiniLM-L-6-v2 | ~90 MB | SearchReranker |
| Text Generation | OnnxTextGenerator.gpt2() | inference4j/gpt2 | ~500 MB | TextGenerator |
| Text Generation | OnnxTextGenerator.smolLM2() | inference4j/smollm2-360m-instruct | ~700 MB | TextGenerator |
| Text Generation | OnnxTextGenerator.smolLM2_1_7B() | inference4j/smollm2-1.7b-instruct | ~3.4 GB | TextGenerator |
| Text Generation | OnnxTextGenerator.tinyLlama() | inference4j/tinyllama-1.1b-chat | ~2.2 GB | TextGenerator |
| Text Generation | OnnxTextGenerator.qwen2() | inference4j/qwen2.5-1.5b-instruct | ~3 GB | TextGenerator |
| Text Generation | OnnxTextGenerator.gemma2() | Gated (requires manual download) | ~5 GB | TextGenerator |
| Summarization / Translation / Grammar | FlanT5TextGenerator.flanT5Small() | inference4j/flan-t5-small | ~300 MB | TextGenerator, Summarizer, Translator, GrammarCorrector |
| Summarization / Translation / Grammar | FlanT5TextGenerator.flanT5Base() | inference4j/flan-t5-base | ~900 MB | TextGenerator, Summarizer, Translator, GrammarCorrector |
| Summarization / Translation / Grammar | FlanT5TextGenerator.flanT5Large() | inference4j/flan-t5-large | ~3 GB | TextGenerator, Summarizer, Translator, GrammarCorrector |
| Text-to-SQL | T5SqlGenerator.t5SmallAwesome() | inference4j/t5-small-awesome-text-to-sql | ~240 MB | TextGenerator, SqlGenerator |
| Text-to-SQL | T5SqlGenerator.t5LargeSpider() | inference4j/T5-LM-Large-text2sql-spider | ~4.6 GB | TextGenerator, SqlGenerator |
| Summarization | BartSummarizer.distilBartCnn() | inference4j/distilbart-cnn-12-6 | ~1.2 GB | TextGenerator, Summarizer |
| Summarization | BartSummarizer.bartLargeCnn() | inference4j/bart-large-cnn | ~1.6 GB | TextGenerator, Summarizer |
| Translation | MarianTranslator.builder() | User-specified (inference4j/opus-mt-*) | varies | TextGenerator, Translator |
| Grammar Correction | CoeditGrammarCorrector.coeditBase() | inference4j/coedit-base | ~900 MB | TextGenerator, GrammarCorrector |
| Grammar Correction | CoeditGrammarCorrector.coeditLarge() | inference4j/coedit-large | ~3 GB | TextGenerator, GrammarCorrector |

Vision

| Capability | Wrapper | Default Model ID | Size | API |
|---|---|---|---|---|
| Image Classification | ResNetClassifier | inference4j/resnet50-v1-7 | ~100 MB | ImageClassifier |
| Image Classification | EfficientNetClassifier | inference4j/efficientnet-lite4 | ~50 MB | ImageClassifier |
| Object Detection | YoloV8Detector | inference4j/yolov8n | ~25 MB | ObjectDetector |
| Object Detection | Yolo26Detector | inference4j/yolo26n | ~25 MB | ObjectDetector |
| Text Detection | CraftTextDetector | inference4j/craft-mlt-25k | ~80 MB | TextDetector |

Multimodal

| Capability | Wrapper | Default Model ID | Size | API |
|---|---|---|---|---|
| Zero-Shot Classification | ClipClassifier | inference4j/clip-vit-base-patch32 | ~595 MB | ZeroShotClassifier |
| Image Embeddings | ClipImageEncoder | inference4j/clip-vit-base-patch32 | ~340 MB | ImageEmbedder |
| Text Embeddings (CLIP) | ClipTextEncoder | inference4j/clip-vit-base-patch32 | ~255 MB | TextEmbedder |

Audio

| Capability | Wrapper | Default Model ID | Size | API |
|---|---|---|---|---|
| Speech-to-Text | Wav2Vec2Recognizer | inference4j/wav2vec2-base-960h | ~370 MB | SpeechRecognizer |
| Voice Activity Detection | SileroVadDetector | inference4j/silero-vad | ~2 MB | VoiceActivityDetector |

Generative AI

Generative models use a separate module (inference4j-genai) and a different builder pattern. See Generative AI for details.

| Capability | Wrapper | Model ID | Size | License |
|---|---|---|---|---|
| Text Generation | TextGenerator | inference4j/phi-3-mini-4k-instruct | ~2.7 GB | MIT |
| Text Generation | TextGenerator | inference4j/deepseek-r1-distill-qwen-1.5b | ~1 GB | MIT |
| Speech-to-Text / Translation (WIP) | WhisperSpeechModel | inference4j/whisper-small-genai | ~500 MB | MIT |
| Vision-Language | VisionLanguageModel | inference4j/phi-3.5-vision-instruct | ~3.3 GB | MIT |

Model reference

A comprehensive view of all supported models, organized by architecture:

Encoder-only (single-pass)

| Model | Tokenizer | Wrapper | Use Cases |
|---|---|---|---|
| DistilBERT SST-2 | WordPiece | DistilBertTextClassifier | Sentiment analysis, text classification |
| all-MiniLM-L6-v2 | WordPiece | SentenceTransformerEmbedder | Semantic search, embeddings |
| MiniLM-L-6 MS MARCO | WordPiece | MiniLMSearchReranker | Search reranking |

Decoder-only (autoregressive)

| Model | Tokenizer | Wrapper | Use Cases |
|---|---|---|---|
| GPT-2 | BPE | OnnxTextGenerator | Text completion |
| SmolLM2-360M | BPE | OnnxTextGenerator | Chat, instruction following |
| TinyLlama-1.1B | SentencePiece BPE | OnnxTextGenerator | Chat, instruction following |
| Qwen2.5-1.5B | BPE | OnnxTextGenerator | Chat, instruction following |
| Gemma 2-2B | SentencePiece BPE | OnnxTextGenerator | Chat, instruction following |

Encoder-decoder (autoregressive)

| Model | Tokenizer | Wrapper | Use Cases |
|---|---|---|---|
| Flan-T5 (Small / Base / Large) | SentencePiece Unigram | FlanT5TextGenerator | Summarization, translation, grammar |
| T5-small-awesome-text-to-sql | SentencePiece Unigram | T5SqlGenerator | Text-to-SQL (lightweight) |
| T5-LM-Large-text2sql-spider | SentencePiece Unigram | T5SqlGenerator | Text-to-SQL (high accuracy) |
| DistilBART CNN 12-6 | BPE | BartSummarizer | Summarization |
| BART Large CNN | BPE | BartSummarizer | Summarization |
| MarianMT (opus-mt-*) | SentencePiece BPE | MarianTranslator | Translation (fixed language pair) |
| CoEdIT (Base / Large) | SentencePiece Unigram | CoeditGrammarCorrector | Grammar correction |

Vision

| Model | Tokenizer | Wrapper | Use Cases |
|---|---|---|---|
| ResNet-50 | N/A | ResNetClassifier | Image classification |
| EfficientNet-Lite4 | N/A | EfficientNetClassifier | Image classification |
| YOLOv8n | N/A | YoloV8Detector | Object detection |
| YOLO26n | N/A | Yolo26Detector | Object detection |
| CRAFT | N/A | CraftTextDetector | Text detection in images |

Multimodal

| Model | Tokenizer | Wrapper | Use Cases |
|---|---|---|---|
| CLIP ViT-B/32 | BPE | ClipClassifier | Zero-shot image classification |

Audio

| Model | Tokenizer | Wrapper | Use Cases |
|---|---|---|---|
| Wav2Vec2 | CTC | Wav2Vec2Recognizer | Speech-to-text |
| Silero VAD | N/A | SileroVadDetector | Voice activity detection |

Model compatibility

YOLOv8 / YOLO11

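YoloV8Detector is compatible with both YOLOv8 and YOLO11 models, which share the same output layout ([1, 4+C, N], where C is the number of classes and N is the number of candidate boxes). It is not compatible with YOLOv5 (a different layout that includes an objectness column) or with YOLO26 (NMS-free architecture); use Yolo26Detector for YOLO26 models.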
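
To make the [1, 4+C, N] layout concrete, here is a minimal sketch of reading one candidate from a flattened, row-major output tensor. It is not inference4j's implementation: the class YoloLayoutSketch, the Candidate holder, and the decode method are hypothetical names introduced for illustration. The sketch assumes the first four attributes are box values and the remaining C are per-class scores, and it leaves the box coordinate convention opaque because the text above does not state it.

```java
final class YoloLayoutSketch {

    /** One decoded candidate: the four box values plus the best-scoring class. */
    static final class Candidate {
        final float[] box;   // four box values, in the order the model emits them
        final int classId;   // index of the highest class score
        final float score;   // that highest class score

        Candidate(float[] box, int classId, float score) {
            this.box = box;
            this.classId = classId;
            this.score = score;
        }
    }

    /**
     * Reads candidate i from a flattened, row-major [1, 4+C, N] tensor.
     * Attribute a of candidate i is stored at index a * N + i.
     */
    static Candidate decode(float[] output, int numClasses, int numCandidates, int i) {
        if (numClasses < 1 || output.length != (4 + numClasses) * numCandidates) {
            throw new IllegalArgumentException("tensor does not match [1, 4+C, N]");
        }
        if (i < 0 || i >= numCandidates) {
            throw new IndexOutOfBoundsException("candidate index " + i);
        }
        float[] box = new float[4];
        for (int a = 0; a < 4; a++) {
            box[a] = output[a * numCandidates + i];
        }
        // There is no objectness column in this layout: the class scores are used directly.
        int bestClass = 0;
        float bestScore = output[4 * numCandidates + i];
        for (int c = 1; c < numClasses; c++) {
            float s = output[(4 + c) * numCandidates + i];
            if (s > bestScore) {
                bestScore = s;
                bestClass = c;
            }
        }
        return new Candidate(box, bestClass, bestScore);
    }

    public static void main(String[] args) {
        int classes = 2;
        int candidates = 3;
        float[] output = new float[(4 + classes) * candidates];
        // Fill candidate 1 only: box values 10, 20, 30, 40 and class scores 0.2, 0.9.
        float[] values = {10f, 20f, 30f, 40f, 0.2f, 0.9f};
        for (int a = 0; a < values.length; a++) {
            output[a * candidates + 1] = values[a];
        }
        Candidate c = decode(output, classes, candidates, 1);
        System.out.println("class=" + c.classId + " score=" + c.score);
    }
}
```

A model whose layout adds an objectness column would need a different index calculation and an extra multiplication by the objectness value, which is why a wrapper written for one layout cannot simply be reused for the other.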

EfficientNet variants

EfficientNetClassifier is tested against EfficientNet-Lite4 (TensorFlow origin, softmax built-in). For PyTorch-exported EfficientNet models that output raw logits, override with .outputOperator(OutputOperator.softmax()).
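
To illustrate why the softmax step matters for models that emit raw logits, the sketch below converts logits into probabilities. It is a standalone illustration, not the library's OutputOperator implementation; the class SoftmaxSketch and its softmax method are hypothetical names.

```java
import java.util.Arrays;

final class SoftmaxSketch {

    /** Numerically stable softmax: subtract the largest logit before exponentiating. */
    static float[] softmax(float[] logits) {
        float max = Float.NEGATIVE_INFINITY;
        for (float v : logits) {
            max = Math.max(max, v);
        }
        double[] exps = new double[logits.length];
        double sum = 0.0;
        for (int i = 0; i < logits.length; i++) {
            exps[i] = Math.exp(logits[i] - max);
            sum += exps[i];
        }
        float[] probs = new float[logits.length];
        for (int i = 0; i < logits.length; i++) {
            probs[i] = (float) (exps[i] / sum);
        }
        return probs;
    }

    public static void main(String[] args) {
        // Raw logits are unbounded; after softmax the values are positive and sum to 1.
        float[] logits = {2.0f, 1.0f, 0.1f};
        System.out.println(Arrays.toString(softmax(logits)));
    }
}
```

Softmax preserves the ranking of classes, so the top prediction is the same either way. What changes is that the scores become comparable confidence values. Applying it a second time to a model that already has softmax built in would flatten those confidences, which is presumably why the operator is configurable per model.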

Custom models

Any ONNX-exported model works with the appropriate wrapper, as long as it follows the expected input/output layout. See the Custom Models guide for details.

Cache

Models are cached in ~/.cache/inference4j/ by default. Customize with:

  • System property: -Dinference4j.cache.dir=/path/to/cache
  • Environment variable: INFERENCE4J_CACHE_DIR=/path/to/cache

See Configuration for all options.
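
As a rough sketch of how the two settings above might be resolved, the following reads the system property and the environment variable and falls back to the default directory. The property and variable names come from this page. The precedence (system property first, then environment variable) is an assumption, as are the class CacheDirSketch and its resolve methods, so check the Configuration page for the actual behavior.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

final class CacheDirSketch {

    /**
     * Picks the cache directory from explicit inputs so the logic is easy to test.
     * Assumed precedence: system property, then environment variable, then default.
     */
    static Path resolve(String sysProp, String envVar, String userHome) {
        if (sysProp != null && !sysProp.isEmpty()) {
            return Paths.get(sysProp);
        }
        if (envVar != null && !envVar.isEmpty()) {
            return Paths.get(envVar);
        }
        // Default from this page: ~/.cache/inference4j/
        return Paths.get(userHome, ".cache", "inference4j");
    }

    /** Convenience overload that reads the real JVM property and process environment. */
    static Path resolve() {
        return resolve(
                System.getProperty("inference4j.cache.dir"),
                System.getenv("INFERENCE4J_CACHE_DIR"),
                System.getProperty("user.home"));
    }

    public static void main(String[] args) {
        System.out.println("Cache directory: " + resolve());
    }
}
```

Running with -Dinference4j.cache.dir=/path/to/cache, or with INFERENCE4J_CACHE_DIR exported in the shell, changes the printed directory accordingly.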

Planned models

| Domain | Model | Status |
|---|---|---|
| Text | TrOCR (text recognition) | Planned; enabled by encoder-decoder infrastructure |

See the Roadmap for details.