Spring Boot

inference4j provides a Spring Boot starter with opt-in auto-configuration for all supported models.

Setup

Add the starter dependency:

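Gradle:

implementation "io.github.inference4j:inference4j-spring-boot-starter:${inference4jVersion}"

Maven:

<dependency>
    <groupId>io.github.inference4j</groupId>
    <artifactId>inference4j-spring-boot-starter</artifactId>
    <version>${inference4jVersion}</version>
</dependency>

Here ${inference4jVersion} stands for the inference4j release you use; define it as a build property or substitute the version number directly. The Gradle line uses double quotes because Groovy does not interpolate ${...} inside single-quoted strings.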

Configuration

Every model is opt-in: nothing is downloaded for a model unless you set its enabled property to true. For example, to enable the text classifier:

inference4j:
  nlp:
    text-classifier:
      enabled: true

All properties

| Property | Type | Default | Description |
| --- | --- | --- | --- |
| inference4j.nlp.text-classifier.enabled | boolean | false | Enable DistilBERT text classifier |
| inference4j.nlp.text-classifier.model-id | String | inference4j/distilbert-base-uncased-finetuned-sst-2-english | Model ID |
| inference4j.nlp.text-embedder.enabled | boolean | false | Enable SentenceTransformer embedder |
| inference4j.nlp.text-embedder.model-id | String | (required) | Model ID (no default; must be specified) |
| inference4j.nlp.search-reranker.enabled | boolean | false | Enable MiniLM cross-encoder reranker |
| inference4j.nlp.search-reranker.model-id | String | inference4j/ms-marco-MiniLM-L-6-v2 | Model ID |
| inference4j.vision.image-classifier.enabled | boolean | false | Enable ResNet image classifier |
| inference4j.vision.image-classifier.model-id | String | inference4j/resnet50-v1-7 | Model ID |
| inference4j.vision.object-detector.enabled | boolean | false | Enable YOLOv8 object detector |
| inference4j.vision.object-detector.model-id | String | inference4j/yolov8n | Model ID |
| inference4j.vision.text-detector.enabled | boolean | false | Enable CRAFT text detector |
| inference4j.vision.text-detector.model-id | String | inference4j/craft-mlt-25k | Model ID |
| inference4j.audio.speech-recognizer.enabled | boolean | false | Enable Wav2Vec2 speech recognizer |
| inference4j.audio.speech-recognizer.model-id | String | inference4j/wav2vec2-base-960h | Model ID |
| inference4j.audio.vad.enabled | boolean | false | Enable Silero VAD |
| inference4j.audio.vad.model-id | String | inference4j/silero-vad | Model ID |

Usage

Beans are registered by their interface type, so you inject the interface — not the concrete implementation:

@RestController
public class SentimentController {
    private final TextClassifier classifier;

    public SentimentController(TextClassifier classifier) {
        this.classifier = classifier;
    }

    @PostMapping("/analyze")
    public List<TextClassification> analyze(@RequestBody String text) {
        return classifier.classify(text);
    }
}

Overriding beans

All auto-configured beans use @ConditionalOnMissingBean, so you can replace any model with your own implementation:

@Configuration
public class CustomModelConfig {

    @Bean
    public TextClassifier textClassifier() {
        return DistilBertTextClassifier.builder()
            .modelId("your-org/your-model")
            .sessionOptions(opts -> opts.addCoreML())
            .build();
    }
}

When you define your own bean, the auto-configured one is skipped.

Health indicator

An actuator health indicator is included automatically when Spring Boot Actuator is on the classpath. It reports the number of registered InferenceTask beans.

GET /actuator/health
{
  "status": "UP",
  "components": {
    "inference4j": {
      "status": "UP",
      "details": {
        "tasks": 3
      }
    }
  }
}

The health indicator is enabled by default. Disable it with:

inference4j:
  health:
    enabled: false

Example configuration

A typical production setup enabling sentiment analysis and semantic search:

inference4j:
  nlp:
    text-classifier:
      enabled: true
    text-embedder:
      enabled: true
      model-id: inference4j/all-MiniLM-L6-v2
    search-reranker:
      enabled: true

A service can then combine the embedder and the reranker:

@Service
public class SearchService {
    private final TextEmbedder embedder;
    private final SearchReranker reranker;

    public SearchService(TextEmbedder embedder, SearchReranker reranker) {
        this.embedder = embedder;
        this.reranker = reranker;
    }

    public List<SearchResult> search(String query, List<String> candidates) {
        // Stage 1: embed the query and retrieve
        float[] queryEmb = embedder.encode(query);
        // ... rank candidates by cosine similarity to queryEmb; keep the best as topCandidates ...

        // Stage 2: rerank top candidates
        float[] scores = reranker.scoreBatch(query, topCandidates);
        // ... sort topCandidates by score and collect them into results ...
        return results;
    }
}
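
The two elided ranking steps in this service can be sketched in plain Java. This is a minimal illustration, not part of inference4j: the RankingSketch class, the Scored record, and both helper methods are hypothetical names. It assumes that candidate embeddings are available as float[] vectors and that the reranker returns one score per candidate in input order, with higher scores meaning more relevant.

```java
import java.util.ArrayList;
import java.util.List;

/** Plain-Java helpers for the two ranking steps; no inference4j types are involved. */
final class RankingSketch {

    /** A text paired with its score (hypothetical helper type). */
    record Scored(String text, float score) {}

    /** Cosine similarity of two equal-length vectors; 0 if either vector has zero norm. */
    static float cosine(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vector lengths differ");
        }
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0 || normB == 0) {
            return 0f;
        }
        return (float) (dot / (Math.sqrt(normA) * Math.sqrt(normB)));
    }

    /** Pairs each text with its score and sorts by score, highest first. */
    static List<Scored> sortByScore(List<String> texts, float[] scores) {
        if (texts.size() != scores.length) {
            throw new IllegalArgumentException("Need exactly one score per text");
        }
        List<Scored> out = new ArrayList<>();
        for (int i = 0; i < scores.length; i++) {
            out.add(new Scored(texts.get(i), scores[i]));
        }
        out.sort((x, y) -> Float.compare(y.score(), x.score()));
        return out;
    }

    /** Stage 1: keep the k texts whose embeddings are most similar to the query embedding. */
    static List<String> topK(float[] queryEmb, List<String> texts,
                             List<float[]> embeddings, int k) {
        if (texts.size() != embeddings.size()) {
            throw new IllegalArgumentException("Need exactly one embedding per text");
        }
        float[] sims = new float[texts.size()];
        for (int i = 0; i < sims.length; i++) {
            sims[i] = cosine(queryEmb, embeddings.get(i));
        }
        return sortByScore(texts, sims).stream().limit(k).map(Scored::text).toList();
    }

    public static void main(String[] args) {
        float[] query = {1f, 0f};
        List<String> docs = List.of("a", "b", "c");
        List<float[]> embs = List.of(
                new float[] {0f, 1f}, new float[] {1f, 0f}, new float[] {1f, 1f});
        System.out.println(topK(query, docs, embs, 2)); // prints [b, c]
    }
}
```

In the service, topK would produce topCandidates for stage 1, and sortByScore(topCandidates, scores) would order the reranker output for stage 2. For large candidate sets, embeddings are normally computed ahead of time and stored rather than recomputed per query.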

Lazy loading

All inference4j beans are lazy by default — models are downloaded and ONNX sessions are created on first use, not at application startup. This keeps startup fast even when multiple models are enabled.

The trade-off is that the first request to each model may be slower due to the model download and session initialization.
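
To make the trade-off concrete, here is a minimal plain-Java sketch of load-on-first-use. It only illustrates the general pattern and is not how inference4j or Spring implements lazy beans; the LazySketch and Lazy names are hypothetical, and the loader stands in for the model download and session creation.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/** Illustrates load-on-first-use; not inference4j's actual mechanism. */
final class LazySketch {

    /** Wraps a loader so that it runs at most once, on the first call to get(). */
    static final class Lazy<T> implements Supplier<T> {
        private final Supplier<T> loader;
        private T value;
        private boolean loaded;

        Lazy(Supplier<T> loader) {
            this.loader = loader;
        }

        @Override
        public synchronized T get() {
            if (!loaded) {
                value = loader.get(); // the expensive step happens here, once
                loaded = true;
            }
            return value;
        }
    }

    public static void main(String[] args) {
        AtomicInteger loads = new AtomicInteger();
        Lazy<String> model = new Lazy<>(() -> {
            loads.incrementAndGet(); // stands in for download + session creation
            return "model";
        });
        System.out.println(loads.get()); // prints 0: nothing has been loaded yet
        model.get();                     // the first use pays the loading cost
        model.get();                     // later calls reuse the loaded value
        System.out.println(loads.get()); // prints 1
    }
}
```

Creating the wrapper is cheap, which is why startup stays fast; the cost moves to whichever caller touches the model first.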

Warming up specific models

If a model must be ready before the first real request (e.g., for latency-sensitive endpoints), run one inference at startup from an ApplicationReadyEvent listener:

@Component
public class ModelWarmup {
    private final TextClassifier classifier;

    public ModelWarmup(TextClassifier classifier) {
        this.classifier = classifier;
    }

    @EventListener(ApplicationReadyEvent.class)
    public void warmup() {
        classifier.classify("warmup");
    }
}

This triggers the lazy bean initialization during startup, so the model is ready when the first real request arrives.

Tips

  • Use model-id to swap models without changing code (e.g., switch from ResNet to EfficientNet).
  • The text embedder has no default model ID — you must specify one via model-id.
  • All beans implement AutoCloseable and are properly cleaned up on application shutdown.