Supported Models
All supported models are hosted under the inference4j HuggingFace organization and are automatically downloaded and cached on first use.
NLP
| Capability |
Wrapper |
Default Model ID |
Size |
API |
| Text Classification |
DistilBertTextClassifier |
inference4j/distilbert-base-uncased-finetuned-sst-2-english |
~260 MB |
TextClassifier |
| Text Embeddings |
SentenceTransformerEmbedder |
inference4j/all-MiniLM-L6-v2 |
~90 MB |
TextEmbedder |
| Search Reranking |
MiniLMSearchReranker |
inference4j/ms-marco-MiniLM-L-6-v2 |
~90 MB |
SearchReranker |
| Text Generation |
OnnxTextGenerator.gpt2() |
inference4j/gpt2 |
~500 MB |
TextGenerator |
| Text Generation |
OnnxTextGenerator.smolLM2() |
inference4j/smollm2-360m-instruct |
~700 MB |
TextGenerator |
| Text Generation |
OnnxTextGenerator.smolLM2_1_7B() |
inference4j/smollm2-1.7b-instruct |
~3.4 GB |
TextGenerator |
| Text Generation |
OnnxTextGenerator.tinyLlama() |
inference4j/tinyllama-1.1b-chat |
~2.2 GB |
TextGenerator |
| Text Generation |
OnnxTextGenerator.qwen2() |
inference4j/qwen2.5-1.5b-instruct |
~3 GB |
TextGenerator |
| Text Generation |
OnnxTextGenerator.gemma2() |
Gated — requires manual download |
~5 GB |
TextGenerator |
| Summarization / Translation / Grammar |
FlanT5TextGenerator.flanT5Small() |
inference4j/flan-t5-small |
~300 MB |
TextGenerator, Summarizer, Translator, GrammarCorrector |
| Summarization / Translation / Grammar |
FlanT5TextGenerator.flanT5Base() |
inference4j/flan-t5-base |
~900 MB |
TextGenerator, Summarizer, Translator, GrammarCorrector |
| Summarization / Translation / Grammar |
FlanT5TextGenerator.flanT5Large() |
inference4j/flan-t5-large |
~3 GB |
TextGenerator, Summarizer, Translator, GrammarCorrector |
| Text-to-SQL |
T5SqlGenerator.t5SmallAwesome() |
inference4j/t5-small-awesome-text-to-sql |
~240 MB |
TextGenerator, SqlGenerator |
| Text-to-SQL |
T5SqlGenerator.t5LargeSpider() |
inference4j/T5-LM-Large-text2sql-spider |
~4.6 GB |
TextGenerator, SqlGenerator |
| Summarization |
BartSummarizer.distilBartCnn() |
inference4j/distilbart-cnn-12-6 |
~1.2 GB |
TextGenerator, Summarizer |
| Summarization |
BartSummarizer.bartLargeCnn() |
inference4j/bart-large-cnn |
~1.6 GB |
TextGenerator, Summarizer |
| Translation |
MarianTranslator.builder() |
User-specified (inference4j/opus-mt-*) |
varies |
TextGenerator, Translator |
| Grammar Correction |
CoeditGrammarCorrector.coeditBase() |
inference4j/coedit-base |
~900 MB |
TextGenerator, GrammarCorrector |
| Grammar Correction |
CoeditGrammarCorrector.coeditLarge() |
inference4j/coedit-large |
~3 GB |
TextGenerator, GrammarCorrector |
Vision
| Capability |
Wrapper |
Default Model ID |
Size |
API |
| Image Classification |
ResNetClassifier |
inference4j/resnet50-v1-7 |
~100 MB |
ImageClassifier |
| Image Classification |
EfficientNetClassifier |
inference4j/efficientnet-lite4 |
~50 MB |
ImageClassifier |
| Object Detection |
YoloV8Detector |
inference4j/yolov8n |
~25 MB |
ObjectDetector |
| Object Detection |
Yolo26Detector |
inference4j/yolo26n |
~25 MB |
ObjectDetector |
| Text Detection |
CraftTextDetector |
inference4j/craft-mlt-25k |
~80 MB |
TextDetector |
Multimodal
| Capability |
Wrapper |
Default Model ID |
Size |
API |
| Zero-Shot Classification |
ClipClassifier |
inference4j/clip-vit-base-patch32 |
~595 MB |
ZeroShotClassifier |
| Image Embeddings |
ClipImageEncoder |
inference4j/clip-vit-base-patch32 |
~340 MB |
ImageEmbedder |
| Text Embeddings (CLIP) |
ClipTextEncoder |
inference4j/clip-vit-base-patch32 |
~255 MB |
TextEmbedder |
Audio
| Capability |
Wrapper |
Default Model ID |
Size |
API |
| Speech-to-Text |
Wav2Vec2Recognizer |
inference4j/wav2vec2-base-960h |
~370 MB |
SpeechRecognizer |
| Voice Activity Detection |
SileroVadDetector |
inference4j/silero-vad |
~2 MB |
VoiceActivityDetector |
Generative AI
Generative models use a separate module (inference4j-genai) and a different builder pattern. See Generative AI for details.
| Capability |
Wrapper |
Model ID |
Size |
License |
| Text Generation |
TextGenerator |
inference4j/phi-3-mini-4k-instruct |
~2.7 GB |
MIT |
| Text Generation |
TextGenerator |
inference4j/deepseek-r1-distill-qwen-1.5b |
~1 GB |
MIT |
| Speech-to-Text / Translation (WIP) |
WhisperSpeechModel |
inference4j/whisper-small-genai |
~500 MB |
MIT |
| Vision-Language |
VisionLanguageModel |
inference4j/phi-3.5-vision-instruct |
~3.3 GB |
MIT |
Model reference
A comprehensive view of all supported models, organized by architecture:
Encoder-only (single-pass)
| Model |
Tokenizer |
Wrapper |
Use Cases |
| DistilBERT SST-2 |
WordPiece |
DistilBertTextClassifier |
Sentiment analysis, text classification |
| all-MiniLM-L6-v2 |
WordPiece |
SentenceTransformerEmbedder |
Semantic search, embeddings |
| MiniLM-L-6 MS MARCO |
WordPiece |
MiniLMSearchReranker |
Search reranking |
Decoder-only (autoregressive)
| Model |
Tokenizer |
Wrapper |
Use Cases |
| GPT-2 |
BPE |
OnnxTextGenerator |
Text completion |
| SmolLM2-360M |
BPE |
OnnxTextGenerator |
Chat, instruction following |
| TinyLlama-1.1B |
SentencePiece BPE |
OnnxTextGenerator |
Chat, instruction following |
| Qwen2.5-1.5B |
BPE |
OnnxTextGenerator |
Chat, instruction following |
| Gemma 2-2B |
SentencePiece BPE |
OnnxTextGenerator |
Chat, instruction following |
Encoder-decoder (autoregressive)
| Model |
Tokenizer |
Wrapper |
Use Cases |
| Flan-T5 (Small / Base / Large) |
SentencePiece Unigram |
FlanT5TextGenerator |
Summarization, translation, grammar |
| T5-small-awesome-text-to-sql |
SentencePiece Unigram |
T5SqlGenerator |
Text-to-SQL (lightweight) |
| T5-LM-Large-text2sql-spider |
SentencePiece Unigram |
T5SqlGenerator |
Text-to-SQL (high accuracy) |
| DistilBART CNN 12-6 |
BPE |
BartSummarizer |
Summarization |
| BART Large CNN |
BPE |
BartSummarizer |
Summarization |
| MarianMT (opus-mt-*) |
SentencePiece BPE |
MarianTranslator |
Translation (fixed language pair) |
| CoEdIT (Base / Large) |
SentencePiece Unigram |
CoeditGrammarCorrector |
Grammar correction |
Vision
| Model |
Tokenizer |
Wrapper |
Use Cases |
| ResNet-50 |
N/A |
ResNetClassifier |
Image classification |
| EfficientNet-Lite4 |
N/A |
EfficientNetClassifier |
Image classification |
| YOLOv8n |
N/A |
YoloV8Detector |
Object detection |
| YOLO26n |
N/A |
Yolo26Detector |
Object detection |
| CRAFT |
N/A |
CraftTextDetector |
Text detection in images |
Multimodal
| Model |
Tokenizer |
Wrapper |
Use Cases |
| CLIP ViT-B/32 |
BPE |
ClipClassifier |
Zero-shot image classification |
Audio
| Model |
Tokenizer |
Wrapper |
Use Cases |
| Wav2Vec2 |
CTC |
Wav2Vec2Recognizer |
Speech-to-text |
| Silero VAD |
N/A |
SileroVadDetector |
Voice activity detection |
Model compatibility
YOLOv8 / YOLO11
YoloV8Detector is compatible with both YOLOv8 and YOLO11 models — they share the same output layout ([1, 4+C, N]). It is not compatible with YOLOv5 (different layout with objectness column) or YOLO26 (NMS-free architecture).
EfficientNet variants
EfficientNetClassifier is tested against EfficientNet-Lite4 (TensorFlow origin, softmax built-in). For PyTorch-exported EfficientNet models that output raw logits, override with .outputOperator(OutputOperator.softmax()).
Custom models
Any ONNX-exported model works with the appropriate wrapper, as long as it follows the expected input/output layout. See the Custom Models guide for details.
Cache
Models are cached in ~/.cache/inference4j/ by default. Customize with:
- System property:
-Dinference4j.cache.dir=/path/to/cache
- Environment variable:
INFERENCE4J_CACHE_DIR=/path/to/cache
See Configuration for all options.
Planned models
| Domain |
Model |
Status |
| Text |
TrOCR (text recognition) |
Planned — enabled by encoder-decoder infrastructure |
See the Roadmap for details.