Sentiment Analysis

Classify text as positive or negative (or any custom label set) using a fine-tuned DistilBERT model.

Quick example

try (var classifier = DistilBertTextClassifier.builder().build()) {
    List<TextClassification> results = classifier.classify("This movie was fantastic!");
    // [TextClassification[label=POSITIVE, confidence=0.9998]]
}

Full example

import io.github.inference4j.nlp.DistilBertTextClassifier;
import io.github.inference4j.nlp.TextClassification;
import java.util.List;

public class SentimentAnalysis {
    public static void main(String[] args) {
        try (var classifier = DistilBertTextClassifier.builder().build()) {
            List<String> reviews = List.of(
                "This movie was fantastic!",
                "Terrible experience, would not recommend.",
                "It was okay, nothing special."
            );

            for (String review : reviews) {
                List<TextClassification> results = classifier.classify(review);
                TextClassification top = results.get(0);
                System.out.printf("%-45s → %s (%.2f%%)%n",
                    review, top.label(), top.confidence() * 100);
            }
        }
    }
}

[Figure: screenshot from the showcase app]

Builder options

| Method | Type | Default | Description |
|---|---|---|---|
| .modelId(String) | String | inference4j/distilbert-base-uncased-finetuned-sst-2-english | Hugging Face model ID |
| .modelSource(ModelSource) | ModelSource | HuggingFaceModelSource | Model resolution strategy |
| .sessionOptions(SessionConfigurer) | SessionConfigurer | default | ONNX Runtime session config |
| .tokenizer(Tokenizer) | Tokenizer | auto-loaded WordPieceTokenizer | Custom tokenizer |
| .config(ModelConfig) | ModelConfig | auto-loaded from config.json | Model config with labels |
| .outputOperator(OutputOperator) | OutputOperator | auto-detected (softmax or sigmoid) | Output activation |
| .maxLength(int) | int | 512 | Maximum token sequence length |

Result type

TextClassification is a record with:

| Field | Type | Description |
|---|---|---|
| label() | String | Classification label (e.g., POSITIVE) |
| classIndex() | int | Numeric class index |
| confidence() | float | Confidence score (0.0 to 1.0) |

Using custom models

Any Hugging Face text classification model exported to ONNX can be used, as long as it ships with a vocab.txt (for the default WordPieceTokenizer) and a config.json containing id2label mappings.

try (var classifier = DistilBertTextClassifier.builder()
        .modelId("your-org/your-model")
        .build()) {
    classifier.classify("Some text");
}

The output activation (softmax vs sigmoid) is auto-detected from config.json:

  • problem_type: "multi_label_classification" → sigmoid
  • Everything else → softmax
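
To make the difference between the two activations concrete, here is a minimal, self-contained sketch of the underlying math. It is not inference4j's actual implementation: the class name ActivationSketch, the logit values, and the two-entry id2label map are illustrative assumptions chosen to resemble a binary sentiment model.

```java
import java.util.Map;

public class ActivationSketch {

    // Softmax: scores compete with each other and sum to 1 (single-label case).
    static float[] softmax(float[] logits) {
        float max = Float.NEGATIVE_INFINITY;
        for (float v : logits) {
            max = Math.max(max, v);
        }
        float sum = 0f;
        float[] out = new float[logits.length];
        for (int i = 0; i < logits.length; i++) {
            // Subtracting the max keeps exp() from overflowing on large logits.
            out[i] = (float) Math.exp(logits[i] - max);
            sum += out[i];
        }
        for (int i = 0; i < out.length; i++) {
            out[i] /= sum;
        }
        return out;
    }

    // Sigmoid: each score is squashed independently into (0, 1) (multi-label case).
    static float[] sigmoid(float[] logits) {
        float[] out = new float[logits.length];
        for (int i = 0; i < logits.length; i++) {
            out[i] = (float) (1.0 / (1.0 + Math.exp(-logits[i])));
        }
        return out;
    }

    // Index of the largest score; ties resolve to the lowest index.
    static int argmax(float[] scores) {
        int best = 0;
        for (int i = 1; i < scores.length; i++) {
            if (scores[i] > scores[best]) {
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Hypothetical raw model outputs and label mapping, for illustration only.
        float[] logits = {-2.0f, 3.0f};
        Map<Integer, String> id2label = Map.of(0, "NEGATIVE", 1, "POSITIVE");

        float[] probs = softmax(logits);
        int top = argmax(probs);
        System.out.printf("softmax: %s (%.4f)%n", id2label.get(top), probs[top]);

        float[] independent = sigmoid(logits);
        System.out.printf("sigmoid: %.4f, %.4f%n", independent[0], independent[1]);
    }
}
```

With softmax, raising one label's score necessarily lowers the others, which suits mutually exclusive labels. With sigmoid, each score stands on its own, so several labels can be close to 1.0 at once.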

Tips

  • The default model is fine-tuned on SST-2 (movie reviews). For other domains (product reviews, support tickets), use a model fine-tuned on relevant data.
  • Use .classify(text, topK) to limit the number of returned classifications.
  • For multi-label classification (where multiple labels can be true simultaneously), use a model with problem_type: "multi_label_classification" in its config.json.
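
The last two tips come down to two different ways of narrowing a list of scored labels. The sketch below illustrates both on plain data, without calling the library. The Scored record is a hypothetical stand-in that mirrors the documented TextClassification fields, the labels and scores are made up, and the 0.5 threshold is a common convention rather than anything inference4j prescribes.

```java
import java.util.List;

public class TopKSketch {

    // Hypothetical stand-in mirroring the fields documented for TextClassification.
    record Scored(String label, int classIndex, float confidence) {}

    // Single-label style: keep the k highest-confidence entries, best first.
    static List<Scored> topK(List<Scored> all, int k) {
        return all.stream()
                .sorted((a, b) -> Float.compare(b.confidence(), a.confidence()))
                .limit(k)
                .toList();
    }

    // Multi-label style: keep every label whose independent score clears the threshold.
    static List<Scored> aboveThreshold(List<Scored> all, float threshold) {
        return all.stream()
                .filter(s -> s.confidence() >= threshold)
                .toList();
    }

    public static void main(String[] args) {
        // Made-up sigmoid-style scores for a support-ticket model (illustration only).
        List<Scored> scores = List.of(
                new Scored("feature-request", 2, 0.05f),
                new Scored("billing", 0, 0.91f),
                new Scored("bug", 1, 0.78f));

        System.out.println("top 1:      " + topK(scores, 1));
        System.out.println("over 0.5:   " + aboveThreshold(scores, 0.5f));
    }
}
```

For a softmax model the scores sum to 1, so a top-K cut is usually the natural choice. For a multi-label model, a threshold lets the number of returned labels vary per input.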