Text Summarization

Summarize long articles and documents into concise text using BART or Flan-T5 encoder-decoder models.

Quick example

try (var summarizer = BartSummarizer.distilBartCnn().build()) {
    String summary = summarizer.summarize("Long article text here...");
    System.out.println(summary);
}

Full example

import io.github.inference4j.generation.GenerationResult;
import io.github.inference4j.nlp.BartSummarizer;

public class Summarization {
    public static void main(String[] args) {
        try (var summarizer = BartSummarizer.distilBartCnn()
                .maxNewTokens(150)
                .build()) {

            String article = """
                The Amazon rainforest, often referred to as the "lungs of the Earth",
                produces about 20% of the world's oxygen. Spanning across nine countries
                in South America, it is the largest tropical rainforest in the world,
                covering approximately 5.5 million square kilometers. The forest is home
                to an estimated 10% of all species on Earth, including over 40,000 plant
                species, 1,300 bird species, and 3,000 types of fish. Deforestation
                remains a critical threat, with an estimated 17% of the forest lost in
                the last 50 years due to logging, agriculture, and urban expansion.
                """;

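            // Stream each token to stdout as soon as it is generated.
            GenerationResult result = summarizer.summarize(article, token -> System.out.print(token));
            System.out.println();
            // Report how many tokens were generated and how long generation took.
            System.out.printf("%d tokens in %,d ms%n",
                    result.generatedTokens(), result.duration().toMillis());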
        }
    }
}

Using Flan-T5 as an alternative

FlanT5TextGenerator can also summarize text. It uses a different architecture but implements the same Summarizer interface:

import io.github.inference4j.nlp.FlanT5TextGenerator;
import io.github.inference4j.nlp.Summarizer;

// Both implement Summarizer — swap freely
Summarizer summarizer = FlanT5TextGenerator.flanT5Base()
        .maxNewTokens(150)
        .build();
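
Because both classes expose summarize(text), application code can depend on that single method and leave the model choice to the caller. The sketch below is only an illustration: SummaryService is a hypothetical class, not part of inference4j, and it accepts the summarizer as a Function<String, String> so that it compiles without the library on the classpath.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Hypothetical helper (not part of inference4j) that is agnostic to the model behind it. */
public class SummaryService {
    private final Function<String, String> summarize;

    public SummaryService(Function<String, String> summarize) {
        this.summarize = summarize;
    }

    /** Summarizes each document in order, reusing the same summarizer for every call. */
    public List<String> summarizeAll(List<String> documents) {
        List<String> summaries = new ArrayList<>();
        for (String document : documents) {
            summaries.add(summarize.apply(document));
        }
        return summaries;
    }
}
```

With a real model, this would presumably be wired up as new SummaryService(summarizer::summarize), where summarizer is either a BartSummarizer or a FlanT5TextGenerator.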

Model presets

BartSummarizer

| Preset | Model | Parameters | Size |
|---|---|---|---|
| BartSummarizer.distilBartCnn() | DistilBART CNN 12-6 | 306M | ~1.2 GB |
| BartSummarizer.bartLargeCnn() | BART Large CNN | 406M | ~1.6 GB |

FlanT5TextGenerator

| Preset | Model | Parameters | Size |
|---|---|---|---|
| FlanT5TextGenerator.flanT5Small() | Flan-T5 Small | 77M | ~300 MB |
| FlanT5TextGenerator.flanT5Base() | Flan-T5 Base | 250M | ~900 MB |
| FlanT5TextGenerator.flanT5Large() | Flan-T5 Large | 780M | ~3 GB |

Builder options

| Method | Type | Default | Description |
|---|---|---|---|
| .modelId(String) | String | Preset-dependent | HuggingFace model ID |
| .modelSource(ModelSource) | ModelSource | HuggingFaceModelSource | Model resolution strategy |
| .sessionOptions(SessionConfigurer) | SessionConfigurer | default | ONNX Runtime session config |
| .tokenizerProvider(TokenizerProvider) | TokenizerProvider | Preset-dependent | Tokenizer construction strategy |
| .maxNewTokens(int) | int | 256 | Maximum tokens to generate |
| .temperature(float) | float | 0.0 | Sampling temperature (higher = more random) |
| .topK(int) | int | 0 (disabled) | Top-K sampling |
| .topP(float) | float | 0.0 (disabled) | Nucleus sampling |
| .eosTokenId(int) | int | Auto-detected | End-of-sequence token ID |
| .addedToken(String) | String | | Register a special token for atomic encoding |
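
To make "higher = more random" concrete, the sketch below applies the standard temperature-scaled softmax to a small set of logits. It shows the general technique only. TemperatureDemo and its method are hypothetical names, and how inference4j applies the option internally is an assumption here, including whether the default of 0.0 means greedy decoding (the common convention).

```java
import java.util.Arrays;

/** Hypothetical demo of temperature scaling; not taken from inference4j. */
public class TemperatureDemo {
    /** Softmax over logits divided by the temperature; temperature must be positive. */
    public static double[] softmax(double[] logits, double temperature) {
        if (temperature <= 0) {
            throw new IllegalArgumentException("temperature must be positive");
        }
        double max = Double.NEGATIVE_INFINITY;
        for (double logit : logits) {
            max = Math.max(max, logit);
        }
        double[] probs = new double[logits.length];
        double sum = 0.0;
        for (int i = 0; i < logits.length; i++) {
            // Subtract the max before exponentiating for numerical stability.
            probs[i] = Math.exp((logits[i] - max) / temperature);
            sum += probs[i];
        }
        for (int i = 0; i < probs.length; i++) {
            probs[i] /= sum;
        }
        return probs;
    }

    public static void main(String[] args) {
        double[] logits = {2.0, 1.0, 0.0};
        for (double t : new double[] {0.5, 1.0, 2.0}) {
            System.out.printf("T=%.1f -> %s%n", t, Arrays.toString(softmax(logits, t)));
        }
    }
}
```

A low temperature concentrates probability on the highest-scoring token, while a high temperature flattens the distribution so that less likely tokens are sampled more often.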

Result type

Both summarize(text, tokenListener) and generate(text, tokenListener) return a GenerationResult record:

| Field | Type | Description |
|---|---|---|
| text() | String | The generated summary |
| promptTokens() | int | Number of tokens in the input |
| generatedTokens() | int | Number of tokens generated |
| duration() | Duration | Wall-clock generation time |

The convenience method summarize(text) returns the summary as a plain String.
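
The generatedTokens() and duration() fields are enough to derive a throughput figure. The sketch below is a minimal example of that arithmetic; Throughput is a hypothetical helper, not part of inference4j, and it takes plain values so that it runs without the library.

```java
import java.time.Duration;

/** Hypothetical helper that turns token count and duration into tokens per second. */
public class Throughput {
    /** Tokens generated per second of wall-clock time; 0 if the duration is not positive. */
    public static double tokensPerSecond(int generatedTokens, Duration duration) {
        long nanos = duration.toNanos();
        if (nanos <= 0) {
            return 0.0;
        }
        return generatedTokens * 1_000_000_000.0 / nanos;
    }

    public static void main(String[] args) {
        // Stand-in values; with the library these would come from
        // result.generatedTokens() and result.duration().
        System.out.printf("%.1f tokens/s%n", tokensPerSecond(50, Duration.ofMillis(2000)));
    }
}
```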

Tips

  • DistilBART CNN is fine-tuned specifically for summarization and generally produces better summaries than the general-purpose Flan-T5. Use it when summarization is your only task.
  • Flan-T5 is a general-purpose model that also handles translation, grammar correction, and SQL generation. Use it when you need multiple tasks from a single model.
  • Lower maxNewTokens for shorter summaries. The output stays coherent, although a cap well below the length the model would naturally produce may end the summary mid-sentence.
  • Use streaming (summarize(text, token -> ...)) for long inputs where generation takes several seconds.
  • Reuse instances across calls — each one holds the model and tokenizer in memory.
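
For the streaming tip, a listener does not have to print tokens directly; it can also collect them for later use. The sketch below is a minimal example under one assumption: that tokens are delivered as String values, as the System.out.print(token) example suggests. TokenBuffer is a hypothetical class, not part of inference4j.

```java
import java.util.function.Consumer;

/** Hypothetical listener that accumulates streamed tokens and counts them. */
public class TokenBuffer implements Consumer<String> {
    private final StringBuilder text = new StringBuilder();
    private int tokenCount;

    @Override
    public void accept(String token) {
        // Append each token in arrival order and keep a running count.
        text.append(token);
        tokenCount++;
    }

    /** Everything received so far, concatenated. */
    public String text() {
        return text.toString();
    }

    /** Number of tokens received so far. */
    public int tokenCount() {
        return tokenCount;
    }
}
```

If the library's token listener is a single-method interface that takes a String, passing buffer::accept to summarize(text, ...) should adapt to it; that listener type is not confirmed by this page.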