Supported Models
All supported models are hosted under the inference4j HuggingFace organization and are automatically downloaded and cached on first use.
NLP
| Capability | Wrapper | Default Model ID | Size | API |
|---|---|---|---|---|
| Text Classification | DistilBertTextClassifier | inference4j/distilbert-base-uncased-finetuned-sst-2-english | ~260 MB | TextClassifier |
| Named Entity Recognition | BertNerRecognizer | inference4j/distilbert-NER | ~260 MB | NamedEntityRecognizer |
| Named Entity Recognition | BertNerRecognizer | inference4j/bert-base-NER | ~431 MB | NamedEntityRecognizer |
| Text Embeddings | SentenceTransformerEmbedder | inference4j/all-MiniLM-L6-v2 | ~90 MB | TextEmbedder |
| Text Embeddings | SentenceTransformerEmbedder | inference4j/all-mpnet-base-v2 | ~430 MB | TextEmbedder |
| Text Embeddings | SentenceTransformerEmbedder | inference4j/bge-base-en-v1.5 | ~430 MB | TextEmbedder |
| Text Embeddings | SentenceTransformerEmbedder | inference4j/gte-base | ~430 MB | TextEmbedder |
| Search Reranking | MiniLMSearchReranker | inference4j/ms-marco-MiniLM-L-6-v2 | ~90 MB | SearchReranker |
| Text Generation | OnnxTextGenerator.gpt2() | inference4j/gpt2 | ~500 MB | TextGenerator |
| Text Generation | OnnxTextGenerator.smolLM2() | inference4j/smollm2-360m-instruct | ~700 MB | TextGenerator |
| Text Generation | OnnxTextGenerator.smolLM2_1_7B() | inference4j/smollm2-1.7b-instruct | ~3.4 GB | TextGenerator |
| Text Generation | OnnxTextGenerator.tinyLlama() | inference4j/tinyllama-1.1b-chat | ~2.2 GB | TextGenerator |
| Text Generation | OnnxTextGenerator.qwen2() | inference4j/qwen2.5-1.5b-instruct | ~3 GB | TextGenerator |
| Text Generation | OnnxTextGenerator.gemma2() | Gated (requires manual download) | ~5 GB | TextGenerator |
| Summarization / Translation / Grammar | FlanT5TextGenerator.flanT5Small() | inference4j/flan-t5-small | ~300 MB | TextGenerator, Summarizer, Translator, GrammarCorrector |
| Summarization / Translation / Grammar | FlanT5TextGenerator.flanT5Base() | inference4j/flan-t5-base | ~900 MB | TextGenerator, Summarizer, Translator, GrammarCorrector |
| Summarization / Translation / Grammar | FlanT5TextGenerator.flanT5Large() | inference4j/flan-t5-large | ~3 GB | TextGenerator, Summarizer, Translator, GrammarCorrector |
| Text-to-SQL | T5SqlGenerator.t5SmallAwesome() | inference4j/t5-small-awesome-text-to-sql | ~240 MB | TextGenerator, SqlGenerator |
| Text-to-SQL | T5SqlGenerator.t5LargeSpider() | inference4j/T5-LM-Large-text2sql-spider | ~4.6 GB | TextGenerator, SqlGenerator |
| Summarization | BartSummarizer.distilBartCnn() | inference4j/distilbart-cnn-12-6 | ~1.2 GB | TextGenerator, Summarizer |
| Summarization | BartSummarizer.bartLargeCnn() | inference4j/bart-large-cnn | ~1.6 GB | TextGenerator, Summarizer |
| Translation | MarianTranslator.builder() | User-specified (inference4j/opus-mt-*) | varies | TextGenerator, Translator |
| Grammar Correction | CoeditGrammarCorrector.coeditBase() | inference4j/coedit-base | ~900 MB | TextGenerator, GrammarCorrector |
| Grammar Correction | CoeditGrammarCorrector.coeditLarge() | inference4j/coedit-large | ~3 GB | TextGenerator, GrammarCorrector |
Vision
| Capability | Wrapper | Default Model ID | Size | API |
|---|---|---|---|---|
| Image Classification | ResNetClassifier | inference4j/resnet50-v1-7 | ~100 MB | ImageClassifier |
| Image Classification | EfficientNetClassifier | inference4j/efficientnet-lite4 | ~50 MB | ImageClassifier |
| Object Detection | YoloV8Detector | inference4j/yolov8n | ~25 MB | ObjectDetector |
| Object Detection | Yolo26Detector | inference4j/yolo26n | ~25 MB | ObjectDetector |
| Text Detection | CraftTextDetector | inference4j/craft-mlt-25k | ~80 MB | TextDetector |
Multimodal
| Capability | Wrapper | Default Model ID | Size | API |
|---|---|---|---|---|
| Zero-Shot Classification | ClipClassifier | inference4j/clip-vit-base-patch32 | ~595 MB | ZeroShotClassifier |
| Image Embeddings | ClipImageEncoder | inference4j/clip-vit-base-patch32 | ~340 MB | ImageEmbedder |
| Text Embeddings (CLIP) | ClipTextEncoder | inference4j/clip-vit-base-patch32 | ~255 MB | TextEmbedder |
Audio
| Capability | Wrapper | Default Model ID | Size | API |
|---|---|---|---|---|
| Speech-to-Text | Wav2Vec2Recognizer | inference4j/wav2vec2-base-960h | ~370 MB | SpeechRecognizer |
| Voice Activity Detection | SileroVadDetector | inference4j/silero-vad | ~2 MB | VoiceActivityDetector |
Generative AI
Generative models use a separate module (inference4j-genai) and a different builder pattern. See Generative AI for details.
| Capability | Wrapper | Model ID | Size | License |
|---|---|---|---|---|
| Text Generation | TextGenerator | inference4j/phi-3-mini-4k-instruct | ~2.7 GB | MIT |
| Text Generation | TextGenerator | inference4j/deepseek-r1-distill-qwen-1.5b | ~1 GB | MIT |
| Speech-to-Text / Translation (WIP) | WhisperSpeechModel | inference4j/whisper-small-genai | ~500 MB | MIT |
| Vision-Language | VisionLanguageModel | inference4j/phi-3.5-vision-instruct | ~3.3 GB | MIT |
Model reference
A comprehensive view of all supported models, organized by architecture:
Encoder-only (single-pass)
| Model | Tokenizer | Wrapper | Use Cases |
|---|---|---|---|
| DistilBERT SST-2 | WordPiece | DistilBertTextClassifier | Sentiment analysis, text classification |
| DistilBERT NER | WordPiece (cased) | BertNerRecognizer | Named entity recognition |
| BERT Base NER | WordPiece (cased) | BertNerRecognizer | Named entity recognition |
| all-MiniLM-L6-v2 | WordPiece | SentenceTransformerEmbedder | Semantic search, embeddings |
| all-mpnet-base-v2 | WordPiece | SentenceTransformerEmbedder | Semantic search, embeddings |
| BGE Base EN v1.5 | WordPiece | SentenceTransformerEmbedder | Semantic search, embeddings |
| GTE Base | WordPiece | SentenceTransformerEmbedder | Semantic search, embeddings |
| MiniLM-L-6 MS MARCO | WordPiece | MiniLMSearchReranker | Search reranking |
Decoder-only (autoregressive)
| Model | Tokenizer | Wrapper | Use Cases |
|---|---|---|---|
| GPT-2 | BPE | OnnxTextGenerator | Text completion |
| SmolLM2-360M | BPE | OnnxTextGenerator | Chat, instruction following |
| TinyLlama-1.1B | SentencePiece BPE | OnnxTextGenerator | Chat, instruction following |
| Qwen2.5-1.5B | BPE | OnnxTextGenerator | Chat, instruction following |
| Gemma 2-2B | SentencePiece BPE | OnnxTextGenerator | Chat, instruction following |
Encoder-decoder (autoregressive)
| Model | Tokenizer | Wrapper | Use Cases |
|---|---|---|---|
| Flan-T5 (Small / Base / Large) | SentencePiece Unigram | FlanT5TextGenerator | Summarization, translation, grammar |
| T5-small-awesome-text-to-sql | SentencePiece Unigram | T5SqlGenerator | Text-to-SQL (lightweight) |
| T5-LM-Large-text2sql-spider | SentencePiece Unigram | T5SqlGenerator | Text-to-SQL (high accuracy) |
| DistilBART CNN 12-6 | BPE | BartSummarizer | Summarization |
| BART Large CNN | BPE | BartSummarizer | Summarization |
| MarianMT (opus-mt-*) | SentencePiece BPE | MarianTranslator | Translation (fixed language pair) |
| CoEdIT (Base / Large) | SentencePiece Unigram | CoeditGrammarCorrector | Grammar correction |
Vision
| Model | Tokenizer | Wrapper | Use Cases |
|---|---|---|---|
| ResNet-50 | N/A | ResNetClassifier | Image classification |
| EfficientNet-Lite4 | N/A | EfficientNetClassifier | Image classification |
| YOLOv8n | N/A | YoloV8Detector | Object detection |
| YOLO26n | N/A | Yolo26Detector | Object detection |
| CRAFT | N/A | CraftTextDetector | Text detection in images |
Multimodal
| Model | Tokenizer | Wrapper | Use Cases |
|---|---|---|---|
| CLIP ViT-B/32 | BPE | ClipClassifier | Zero-shot image classification |
Audio
| Model | Tokenizer | Wrapper | Use Cases |
|---|---|---|---|
| Wav2Vec2 | CTC | Wav2Vec2Recognizer | Speech-to-text |
| Silero VAD | N/A | SileroVadDetector | Voice activity detection |
Model compatibility
YOLOv8 / YOLO11
YoloV8Detector is compatible with both YOLOv8 and YOLO11 models, which share the same output layout ([1, 4+C, N]). It is not compatible with YOLOv5 (different layout with an objectness column) or YOLO26 (NMS-free architecture; use Yolo26Detector for those models).
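To make the [1, 4+C, N] layout concrete, here is a minimal sketch of reading such a tensor once it has been flattened into a row-major float array. It is not the YoloV8Detector implementation: the class and method names are hypothetical, the assumption that the first four channels hold a box as (cx, cy, w, h) follows the common YOLOv8 export convention, and non-maximum suppression is left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of inference4j.
final class YoloLayoutSketch {

    // One decoded candidate: best class, its score, and the box (assumed cx, cy, w, h).
    record Candidate(int classId, float score, float cx, float cy, float w, float h) {}

    /**
     * Decodes a flattened [1, 4+C, N] output.
     * Channel k of candidate n is stored at index k * N + n (row-major).
     */
    static List<Candidate> decode(float[] output, int numClasses, int numCandidates, float threshold) {
        List<Candidate> result = new ArrayList<>();
        for (int n = 0; n < numCandidates; n++) {
            // Class scores occupy channels 4 .. 4+C-1; there is no objectness channel.
            int bestClass = -1;
            float bestScore = Float.NEGATIVE_INFINITY;
            for (int c = 0; c < numClasses; c++) {
                float score = output[(4 + c) * numCandidates + n];
                if (score > bestScore) {
                    bestScore = score;
                    bestClass = c;
                }
            }
            if (bestScore >= threshold) {
                result.add(new Candidate(
                        bestClass,
                        bestScore,
                        output[n],                      // channel 0
                        output[numCandidates + n],      // channel 1
                        output[2 * numCandidates + n],  // channel 2
                        output[3 * numCandidates + n])); // channel 3
            }
        }
        return result;
    }
}
```

A YOLOv5-style output carries an extra objectness value per candidate, so the channel offsets used here would not line up with it.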
EfficientNet variants
EfficientNetClassifier is tested against EfficientNet-Lite4 (TensorFlow origin, softmax built-in). For PyTorch-exported EfficientNet models that output raw logits, override with .outputOperator(OutputOperator.softmax()).
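For reference, the softmax step that turns raw logits into probabilities can be sketched as follows. This is a standalone illustration with a hypothetical class name, not the code behind OutputOperator.softmax(); it subtracts the maximum logit before exponentiating, a standard trick to avoid overflow.

```java
import java.util.Arrays;

// Hypothetical illustration only; not part of inference4j.
final class SoftmaxSketch {

    /** Converts raw logits into probabilities that sum to 1. */
    static double[] softmax(double[] logits) {
        // Find the largest logit so the exponentials below stay in range.
        double max = Double.NEGATIVE_INFINITY;
        for (double v : logits) {
            max = Math.max(max, v);
        }
        double[] out = new double[logits.length];
        double sum = 0.0;
        for (int i = 0; i < logits.length; i++) {
            out[i] = Math.exp(logits[i] - max);
            sum += out[i];
        }
        // Normalize so the values form a probability distribution.
        for (int i = 0; i < out.length; i++) {
            out[i] /= sum;
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(softmax(new double[] {2.0, 1.0, 0.1})));
    }
}
```

Applying softmax to a model that already has it built in (as with the TensorFlow-origin EfficientNet-Lite4) would flatten the scores, which is why the operator only needs overriding for exports that emit raw logits.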
Custom models
Any ONNX-exported model works with the appropriate wrapper, as long as it follows the expected input/output layout. See the Custom Models guide for details.
Cache
Models are cached in ~/.cache/inference4j/ by default. Customize with:
- System property: -Dinference4j.cache.dir=/path/to/cache
- Environment variable: INFERENCE4J_CACHE_DIR=/path/to/cache
See Configuration for all options.
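As a rough sketch of how these settings might be combined, the snippet below resolves a cache directory from the two overrides and the default location. The class and method names are hypothetical, and the precedence (system property first, then environment variable, then the default) is an assumption; the page does not state which setting wins when both are present.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical illustration only; not part of inference4j.
final class CacheDirSketch {

    /**
     * Picks the cache directory.
     * Assumed order: system property, then environment variable, then ~/.cache/inference4j.
     */
    static Path resolve(String systemProperty, String envValue, String userHome) {
        if (systemProperty != null && !systemProperty.isBlank()) {
            return Paths.get(systemProperty);
        }
        if (envValue != null && !envValue.isBlank()) {
            return Paths.get(envValue);
        }
        return Paths.get(userHome, ".cache", "inference4j");
    }

    public static void main(String[] args) {
        Path dir = resolve(
                System.getProperty("inference4j.cache.dir"),
                System.getenv("INFERENCE4J_CACHE_DIR"),
                System.getProperty("user.home"));
        System.out.println("Cache directory: " + dir);
    }
}
```

Passing the raw values in as parameters keeps the lookup logic easy to test without touching the real environment.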
Planned models
| Domain | Model | Status |
|---|---|---|
| Text | TrOCR (text recognition) | Planned — enabled by encoder-decoder infrastructure |
See the Roadmap for details.