# API Overview

## Package structure

### inference4j-core
| Package | Contents |
|---|---|
| io.github.inference4j | Core contracts: InferenceTask, Classifier, Detector, ZeroShotClassifier, ZeroShotInput, AbstractInferenceTask, Tensor, TensorType, InferenceSession, InferenceContext |
| io.github.inference4j.session | Session config: SessionConfigurer, SessionOptions |
| io.github.inference4j.model | Model resolution: ModelSource, HuggingFaceModelSource, LocalModelSource |
| io.github.inference4j.processing | Pre/post-processing: Preprocessor, Postprocessor, OutputOperator, MathOps |
| io.github.inference4j.exception | Custom exceptions: ModelLoadException, InferenceException |
| io.github.inference4j.tokenizer | Tokenizer, EncodedInput, WordPieceTokenizer, BpeTokenizer, DecodingBpeTokenizer, SentencePieceBpeTokenizer, UnigramTokenizer, TokenDecoder |
| io.github.inference4j.preprocessing.text | ModelConfig (HuggingFace config.json parser) |
| io.github.inference4j.preprocessing.image | Image transform pipeline: ImageTransformPipeline, ResizeTransform, CenterCropTransform, ImageLayout, Labels |
| io.github.inference4j.preprocessing.audio | AudioTransformPipeline, AudioTransform, AudioData, AudioLoader, AudioWriter, AudioProcessor |
| io.github.inference4j.vision | ResNetClassifier, EfficientNetClassifier, YoloV8Detector, Yolo26Detector, CraftTextDetector, ImageEmbedder, ImageAnnotator |
| io.github.inference4j.audio | Wav2Vec2Recognizer, SileroVadDetector |
| io.github.inference4j.nlp | DistilBertTextClassifier, SentenceTransformerEmbedder, MiniLMSearchReranker, OnnxTextGenerator, FlanT5TextGenerator, BartSummarizer, MarianTranslator, CoeditGrammarCorrector, T5SqlGenerator, TextGenerator, Summarizer, Translator, GrammarCorrector, SqlGenerator, Language, PoolingStrategy, QueryDocumentPair |
| io.github.inference4j.multimodal | ClipClassifier, ClipImageEncoder, ClipTextEncoder |
| io.github.inference4j.generation | GenerativeTask, GenerationEngine, GenerationResult, GenerativeSession, EncoderDecoderSession, ChatTemplate, GenerativeModel |
| io.github.inference4j.sampling | LogitsProcessor, LogitsSampler, CategoricalSampler, GreedySampler |
### inference4j-genai
| Package | Contents |
|---|---|
| io.github.inference4j.genai | AbstractGenerativeTask, ModelSources |
| io.github.inference4j.genai.nlp | TextGenerator (backed by onnxruntime-genai; not to be confused with the TextGenerator interface in io.github.inference4j.nlp) |
| io.github.inference4j.genai.audio | WhisperSpeechModel, WhisperTask |
| io.github.inference4j.genai.vision | VisionLanguageModel, VisionInput |
### inference4j-runtime
| Package | Contents |
|---|---|
| io.github.inference4j.routing | ModelRouter, Route, RoutingStrategy, WeightedRoutingStrategy, RouterMetrics |
### inference4j-spring-boot-starter
| Package | Contents |
|---|---|
| io.github.inference4j.autoconfigure | Auto-configuration, health indicators, Inference4jProperties |
## Interface hierarchy
```text
InferenceTask<I, O>                 // run(I) → O, extends AutoCloseable
├── Classifier<I, C>                // classify(I) → List<C>
│   ├── ImageClassifier             // classify(BufferedImage/Path) → List<Classification>
│   └── TextClassifier              // classify(String) → List<TextClassification>
├── ZeroShotClassifier<I, C>        // classify(I, List<String>) → List<C>, run(ZeroShotInput<I>) → List<C>
├── Detector<I, D>                  // detect(I) → List<D>
│   ├── ObjectDetector              // detect(BufferedImage/Path) → List<Detection>
│   ├── TextDetector                // detect(BufferedImage/Path) → List<TextRegion>
│   └── VoiceActivityDetector       // detect(Path/float[]) → List<VoiceSegment>
├── TextEmbedder                    // encode(String) → float[]
├── ImageEmbedder                   // encode(BufferedImage/Path) → float[]
├── SearchReranker                  // score(String, String) → float
├── SpeechRecognizer                // transcribe(Path) → Transcription
├── TextGenerator                   // generate(String) → GenerationResult
├── Summarizer                      // summarize(String) → String
├── Translator                      // translate(String) → String
├── GrammarCorrector                // correct(String) → String
└── SqlGenerator                    // generateSql(String, String) → String
```
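The tree reads as a set of generic refinements of a single contract. As a rough illustration only, the sketch below re-declares two of these interfaces under the assumption that `Classifier<I, C>` is an `InferenceTask<I, List<C>>` whose `classify()` simply delegates to `run()`. The real interfaces may be wired differently; `HierarchySketch` and `LengthClassifier` are hypothetical names.

```java
import java.util.List;

// Hypothetical, simplified mirror of the hierarchy above; not the library's source.
final class HierarchySketch {

    // Root contract: one input in, one output out, closeable.
    interface InferenceTask<I, O> extends AutoCloseable {
        O run(I input);

        // Assumption: nothing to release in this sketch, so close() is a no-op
        // and (unlike AutoCloseable.close) declares no checked exception.
        @Override
        default void close() {}
    }

    // Assumption: a classifier is an InferenceTask whose output is a list of
    // results, and classify(I) delegates to run(I).
    interface Classifier<I, C> extends InferenceTask<I, List<C>> {
        default List<C> classify(I input) {
            return run(input);
        }
    }

    // Toy text classifier: labels a string "long" or "short" by its length.
    static final class LengthClassifier implements Classifier<String, String> {
        @Override
        public List<String> run(String input) {
            return List.of(input.length() > 10 ? "long" : "short");
        }
    }

    public static void main(String[] args) {
        try (var classifier = new LengthClassifier()) {
            System.out.println(classifier.classify("hello"));  // [short]
        }
    }
}
```

Expressing each specialized interface through the type parameters of `InferenceTask` means any of them can still be handled generically through `run()`.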
### AbstractInferenceTask
Most task wrappers extend `AbstractInferenceTask<I, O>`, which defines a final `run()` method that executes three stages in order:

- Preprocess: `I → Map<String, Tensor>` (via `Preprocessor`)
- Infer: `Map<String, Tensor> → Map<String, Tensor>` (via `InferenceSession`)
- Postprocess: `InferenceContext<I> → O` (via `Postprocessor`)

`InferenceContext<I>` carries data across the stages: the original input, the preprocessed tensors, and the output tensors.

Not every wrapper follows this template: SileroVadDetector (which is stateful and carries hidden state between calls), MiniLMSearchReranker, ClipClassifier, and ClipTextEncoder do not extend AbstractInferenceTask.
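The three-stage template described above is a classic template method. Here is a minimal sketch under simplified, assumed stand-ins: `float[]` replaces `Tensor`, a one-method `Session` interface replaces `InferenceSession`, and `Context`, `AbstractTask`, and `DoubleAndSum` are hypothetical names. What matters is the `final run()`, which fixes the stage order while subclasses supply only preprocessing and postprocessing.

```java
import java.util.Map;

// Hypothetical sketch of the preprocess → infer → postprocess template.
// float[] stands in for Tensor and Session stands in for InferenceSession;
// neither is the library's real type.
final class TemplateSketch {

    // Carries data across stages: input, preprocessed tensors, output tensors.
    record Context<I>(I input, Map<String, float[]> inputs, Map<String, float[]> outputs) {}

    interface Session {
        Map<String, float[]> run(Map<String, float[]> inputs);
    }

    abstract static class AbstractTask<I, O> {
        private final Session session;

        AbstractTask(Session session) {
            this.session = session;
        }

        protected abstract Map<String, float[]> preprocess(I input);

        protected abstract O postprocess(Context<I> context);

        // final: subclasses customize the stages, never their order.
        public final O run(I input) {
            Map<String, float[]> inputs = preprocess(input);
            Map<String, float[]> outputs = session.run(inputs);
            return postprocess(new Context<>(input, inputs, outputs));
        }
    }

    // Toy task: the fake "model" doubles every value; postprocess sums them.
    static final class DoubleAndSum extends AbstractTask<float[], Float> {
        DoubleAndSum() {
            // Fake session: reads the "input" tensor, writes an "output" tensor.
            super(inputs -> {
                float[] x = inputs.get("input");
                float[] y = new float[x.length];
                for (int i = 0; i < x.length; i++) y[i] = 2 * x[i];
                return Map.of("output", y);
            });
        }

        @Override
        protected Map<String, float[]> preprocess(float[] input) {
            return Map.of("input", input);
        }

        @Override
        protected Float postprocess(Context<float[]> context) {
            float sum = 0;
            for (float v : context.outputs().get("output")) sum += v;
            return sum;
        }
    }

    public static void main(String[] args) {
        System.out.println(new DoubleAndSum().run(new float[] {1f, 2f, 3f}));  // 12.0
    }
}
```

Because `run()` cannot be overridden, the stage order stays identical across every task built on the template.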
### Generative AI hierarchy
```text
GenerativeTask<I, O>                // generate(I) → O, extends AutoCloseable
└── AbstractGenerativeTask<I, O>    // owns the autoregressive loop
    ├── TextGenerator               // generate(String) → GenerationResult
    ├── WhisperSpeechModel          // transcribe(Path) → Transcription
    └── VisionLanguageModel         // generate(VisionInput) → GenerationResult
```
`GenerativeTask` is the generative counterpart to `InferenceTask`. Where an `InferenceTask` performs a single forward pass, a `GenerativeTask` runs an iterative, autoregressive generation loop. In inference4j-genai, `AbstractGenerativeTask` owns that loop, which is backed by onnxruntime-genai, and exposes hooks for subclasses: `prepareGenerator()`, `createStream()`, `parseOutput()`, and `createParams()` (for model-specific generation options such as temperature and maximum length). See Generative AI for details.
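To make the loop concrete, here is a hypothetical, self-contained sketch of an autoregressive generator with subclass hooks. It does not use onnxruntime-genai. The hook names `parseOutput()` and `createParams()` echo the text, but their signatures are invented, as are the `encode()` and `nextLogits()` hooks, the `Params` record, and the toy `Counter` model. Decoding is greedy (argmax over the logits), and `maxLength` counts prompt plus generated tokens; both are assumptions of the sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of an autoregressive loop with subclass hooks. Hook names
// partly echo the text above, but every signature here is invented.
final class GenerationLoopSketch {

    // Assumed generation options: length cap (prompt + output) and stop token.
    record Params(int maxLength, int eosToken) {}

    abstract static class AbstractGenerator<I, O> {
        protected abstract List<Integer> encode(I input);            // prompt → token ids
        protected abstract float[] nextLogits(List<Integer> tokens);  // one forward pass
        protected abstract O parseOutput(List<Integer> generated);   // token ids → result
        protected abstract Params createParams();                   // generation options

        // The base class owns the loop; subclasses only fill in the hooks.
        public final O generate(I input) {
            Params params = createParams();
            List<Integer> tokens = new ArrayList<>(encode(input));
            List<Integer> generated = new ArrayList<>();
            while (tokens.size() < params.maxLength()) {
                int next = argmax(nextLogits(tokens));  // greedy decoding
                if (next == params.eosToken()) break;   // stop at end-of-sequence
                tokens.add(next);                       // feed the token back in
                generated.add(next);
            }
            return parseOutput(generated);
        }

        private static int argmax(float[] logits) {
            int best = 0;
            for (int i = 1; i < logits.length; i++) {
                if (logits[i] > logits[best]) best = i;
            }
            return best;
        }
    }

    // Toy "model": predicts last token + 1, and EOS (token 0) once it reaches 5.
    static final class Counter extends AbstractGenerator<Integer, List<Integer>> {
        @Override
        protected List<Integer> encode(Integer start) { return List.of(start); }

        @Override
        protected float[] nextLogits(List<Integer> tokens) {
            int last = tokens.get(tokens.size() - 1);
            float[] logits = new float[10];
            logits[last >= 5 ? 0 : last + 1] = 1f;
            return logits;
        }

        @Override
        protected List<Integer> parseOutput(List<Integer> generated) { return generated; }

        @Override
        protected Params createParams() { return new Params(20, 0); }
    }

    public static void main(String[] args) {
        System.out.println(new Counter().generate(2));  // [3, 4, 5]
    }
}
```

Keeping the loop final in the base class mirrors the division of labor described above: the loop lives in one place, and each model only supplies its own inputs, outputs, and options.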
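## Result types

QueryDocumentPair and VisionInput are input types rather than results; they are listed here alongside the records that tasks return.
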
| Type | Accessors | Used by |
|---|---|---|
| Classification | label(), index(), confidence() | ResNetClassifier, EfficientNetClassifier, ClipClassifier |
| TextClassification | label(), classIndex(), confidence() | DistilBertTextClassifier |
| Detection | label(), classIndex(), confidence(), box() | YoloV8Detector, Yolo26Detector |
| TextRegion | box(), confidence() | CraftTextDetector |
| BoundingBox | x1(), y1(), x2(), y2() | Embedded in Detection, TextRegion |
| Transcription | text(), segments() | Wav2Vec2Recognizer, WhisperSpeechModel |
| VoiceSegment | start(), end(), duration(), confidence() | SileroVadDetector |
| GenerationResult | text(), promptTokens(), generatedTokens(), duration() | OnnxTextGenerator, FlanT5TextGenerator, BartSummarizer, MarianTranslator, CoeditGrammarCorrector, TextGenerator, VisionLanguageModel |
| QueryDocumentPair | query(), document() | MiniLMSearchReranker, SearchReranker |
| VisionInput | imagePath(), prompt() | VisionLanguageModel |
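The accessor names in the table (`label()`, `box()`, `x1()`, and so on) match the style of Java records. The sketch below is hypothetical: it mirrors those accessor names, but the field types (`float`, `int`), the `width()`/`height()` helpers, and deriving `duration()` from `start()` and `end()` are assumptions, not the library's definitions. It shows a typical consumer-side step: filtering detections by confidence and reading their boxes.

```java
import java.util.List;

// Hypothetical records shaped like the result types above. Accessor names follow
// the table; field types and the extra helpers are assumptions of this sketch.
final class ResultTypesSketch {

    record BoundingBox(float x1, float y1, float x2, float y2) {
        // Helpers added for this sketch only.
        float width() { return x2 - x1; }
        float height() { return y2 - y1; }
    }

    record Detection(String label, int classIndex, float confidence, BoundingBox box) {}

    // Assumption: duration() is derived from start and end rather than stored.
    record VoiceSegment(float start, float end, float confidence) {
        float duration() { return end - start; }
    }

    // Keep only detections at or above a confidence threshold.
    static List<Detection> confident(List<Detection> detections, float threshold) {
        return detections.stream().filter(d -> d.confidence() >= threshold).toList();
    }

    public static void main(String[] args) {
        var detections = List.of(
                new Detection("person", 0, 0.91f, new BoundingBox(10, 20, 110, 220)),
                new Detection("dog", 16, 0.42f, new BoundingBox(50, 60, 90, 100)));
        for (Detection d : confident(detections, 0.5f)) {
            // prints: person 0.91 100x200
            System.out.printf("%s %.2f %.0fx%.0f%n",
                    d.label(), d.confidence(), d.box().width(), d.box().height());
        }
    }
}
```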
## Builder pattern
All task wrappers follow a consistent builder pattern:
```java
try (var task = SomeWrapper.builder()
        .modelId("org/model-name")                  // HuggingFace model ID
        .modelSource(new LocalModelSource(...))     // optional: custom model source
        .sessionOptions(opts -> opts.addCoreML())   // optional: ONNX Runtime config
        // ... task-specific options ...
        .build()) {
    var result = task.run(input);                   // I → O
}
```
Every builder also has a `.session(InferenceSession)` method, but it is package-private; the public API configures the session through `modelId`, `modelSource`, and `sessionOptions` instead.
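A self-contained sketch of how a builder in this style can be put together, using invented stand-ins (`DemoTask`, `Options`, `Configurer`) rather than inference4j or ONNX Runtime types. It illustrates two ideas from the template: required settings are validated in `build()`, and session customization is passed in as a lambda that is applied to a fresh options object before the task is created.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical builder in the style described above. DemoTask, Options and
// Configurer are invented stand-ins, not inference4j or ONNX Runtime types.
final class BuilderSketch {

    // Stand-in for a session options object; records enabled providers.
    static final class Options {
        final List<String> providers = new ArrayList<>();

        Options addProvider(String name) {
            providers.add(name);
            return this;
        }
    }

    @FunctionalInterface
    interface Configurer {
        void configure(Options options);
    }

    static final class DemoTask implements AutoCloseable {
        final String modelId;
        final List<String> providers;

        private DemoTask(String modelId, List<String> providers) {
            this.modelId = modelId;
            this.providers = providers;
        }

        static Builder builder() { return new Builder(); }

        String run(String input) { return modelId + ":" + input; }

        @Override
        public void close() {}  // nothing to release in this sketch

        static final class Builder {
            private String modelId;
            private Configurer configurer = options -> {};  // default: no changes

            Builder modelId(String modelId) { this.modelId = modelId; return this; }

            Builder sessionOptions(Configurer configurer) { this.configurer = configurer; return this; }

            DemoTask build() {
                Objects.requireNonNull(modelId, "modelId is required");
                Options options = new Options();
                configurer.configure(options);  // apply the caller's customization
                return new DemoTask(modelId, List.copyOf(options.providers));
            }
        }
    }

    public static void main(String[] args) {
        try (var task = DemoTask.builder()
                .modelId("org/model-name")
                .sessionOptions(opts -> opts.addProvider("CoreML"))
                .build()) {
            System.out.println(task.run("hi") + " " + task.providers);  // org/model-name:hi [CoreML]
        }
    }
}
```

Taking a configurer lambda instead of a prebuilt options object lets the builder decide when the options are created, which is one way to keep low-level session wiring out of the public API.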
## Functional interfaces
| Interface | Signature | Description |
|---|---|---|
| SessionConfigurer | void configure(SessionOptions) throws OrtException | ONNX Runtime session customization |
| ModelSource | Path resolve(String modelId) | Model file resolution |
| Preprocessor<I, O> | O process(I input) | Input transformation |
| Postprocessor<I, O> | O process(I input) | Output transformation |
| OutputOperator | float[] apply(float[] values) | Activation function (softmax, sigmoid, identity) |
| ChatTemplate | String format(String userMessage) | Prompt formatting for generative models |
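Because each of these is a single-method interface, implementations can be plain lambdas. As an illustration, the hypothetical sketch below re-declares an `OutputOperator` with the `float[] apply(float[] values)` shape from the table and writes the three listed activations; it is not the library's implementation. The softmax subtracts the maximum logit before exponentiating, a standard trick that avoids overflow without changing the result.

```java
import java.util.Arrays;

// Hypothetical re-declaration of the OutputOperator shape from the table
// (float[] apply(float[] values)), with three activations written as lambdas.
final class OutputOperatorSketch {

    @FunctionalInterface
    interface OutputOperator {
        float[] apply(float[] values);
    }

    // Identity: returns a copy so callers cannot mutate the input through it.
    static final OutputOperator IDENTITY = values -> values.clone();

    // Sigmoid: squashes each value independently into (0, 1).
    static final OutputOperator SIGMOID = values -> {
        float[] out = new float[values.length];
        for (int i = 0; i < values.length; i++) {
            out[i] = (float) (1.0 / (1.0 + Math.exp(-values[i])));
        }
        return out;
    };

    // Softmax: subtract the max before exponentiating for numerical stability,
    // then normalize so the outputs sum to 1.
    static final OutputOperator SOFTMAX = values -> {
        float max = Float.NEGATIVE_INFINITY;
        for (float v : values) max = Math.max(max, v);
        double sum = 0;
        float[] out = new float[values.length];
        for (int i = 0; i < values.length; i++) {
            out[i] = (float) Math.exp(values[i] - max);
            sum += out[i];
        }
        for (int i = 0; i < out.length; i++) out[i] /= sum;
        return out;
    };

    public static void main(String[] args) {
        System.out.println(Arrays.toString(SOFTMAX.apply(new float[] {2f, 1f, 0.1f})));
        System.out.println(Arrays.toString(SIGMOID.apply(new float[] {0f})));  // [0.5]
    }
}
```

Softmax suits mutually exclusive classes (the scores compete), while sigmoid scores each output on its own, which is why a classifier would pick one or the other depending on whether its labels are single- or multi-label.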