Text Summarization
Summarize long articles and documents into concise text using BART or Flan-T5 encoder-decoder models.
Quick example
try (var summarizer = BartSummarizer.distilBartCnn().build()) {
    String summary = summarizer.summarize("Long article text here...");
    System.out.println(summary);
}
Full example
import io.github.inference4j.generation.GenerationResult;
import io.github.inference4j.nlp.BartSummarizer;

public class Summarization {
    public static void main(String[] args) {
        // try-with-resources releases the model when the block exits.
        try (var summarizer = BartSummarizer.distilBartCnn()
                .maxNewTokens(150)
                .build()) {
            String article = """
                    The Amazon rainforest, often referred to as the "lungs of the Earth",
                    produces about 20% of the world's oxygen. Spanning across nine countries
                    in South America, it is the largest tropical rainforest in the world,
                    covering approximately 5.5 million square kilometers. The forest is home
                    to an estimated 10% of all species on Earth, including over 40,000 plant
                    species, 1,300 bird species, and 3,000 types of fish. Deforestation
                    remains a critical threat, with an estimated 17% of the forest lost in
                    the last 50 years due to logging, agriculture, and urban expansion.
                    """;

            // Print each token as it is generated, then report the totals.
            GenerationResult result = summarizer.summarize(article, token -> System.out.print(token));
            System.out.println();
            System.out.printf("%d tokens in %,d ms%n",
                    result.generatedTokens(), result.duration().toMillis());
        }
    }
}
Using Flan-T5 as an alternative
FlanT5TextGenerator can also summarize text. It uses a different architecture but implements the same Summarizer interface:
import io.github.inference4j.nlp.FlanT5TextGenerator;
import io.github.inference4j.nlp.Summarizer;

// Both implement Summarizer, so either one can be assigned here
Summarizer summarizer = FlanT5TextGenerator.flanT5Base()
        .maxNewTokens(150)
        .build();
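Because both classes implement Summarizer, calling code can depend on the interface alone. The sketch below illustrates that idea without loading a model. The nested Summarizer interface is a local stand-in that assumes only the summarize(String) method described on this page, and the lambda in main is a hypothetical placeholder for a real instance such as the one built above.

```java
import java.util.ArrayList;
import java.util.List;

final class SummarizerSwapSketch {

    /** Local stand-in for io.github.inference4j.nlp.Summarizer (assumed shape). */
    interface Summarizer {
        String summarize(String text);
    }

    /** Depends only on the interface, so any implementation can be passed in. */
    static List<String> summarizeAll(Summarizer summarizer, List<String> documents) {
        List<String> summaries = new ArrayList<>();
        for (String document : documents) {
            // The same instance is reused for every document.
            summaries.add(summarizer.summarize(document));
        }
        return summaries;
    }

    public static void main(String[] args) {
        // Hypothetical placeholder: keeps the first sentence instead of running a model.
        Summarizer firstSentence = text -> text.substring(0, text.indexOf('.') + 1);

        List<String> summaries = summarizeAll(firstSentence, List.of(
                "Rain is expected. Bring an umbrella.",
                "Markets rose. Analysts were surprised."));
        summaries.forEach(System.out::println);
    }
}
```

With the real library, the placeholder would be replaced by a built BartSummarizer or FlanT5TextGenerator, and summarizeAll would stay unchanged.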
Model presets
BartSummarizer
| Preset | Model | Parameters | Size |
|---|---|---|---|
| BartSummarizer.distilBartCnn() | DistilBART CNN 12-6 | 306M | ~1.2 GB |
| BartSummarizer.bartLargeCnn() | BART Large CNN | 406M | ~1.6 GB |
FlanT5TextGenerator
| Preset | Model | Parameters | Size |
|---|---|---|---|
| FlanT5TextGenerator.flanT5Small() | Flan-T5 Small | 77M | ~300 MB |
| FlanT5TextGenerator.flanT5Base() | Flan-T5 Base | 250M | ~900 MB |
| FlanT5TextGenerator.flanT5Large() | Flan-T5 Large | 780M | ~3 GB |
Builder options
| Method | Type | Default | Description |
|---|---|---|---|
| .modelId(String) | String | Preset-dependent | Hugging Face model ID |
| .modelSource(ModelSource) | ModelSource | HuggingFaceModelSource | Model resolution strategy |
| .sessionOptions(SessionConfigurer) | SessionConfigurer | default | ONNX Runtime session config |
| .tokenizerProvider(TokenizerProvider) | TokenizerProvider | Preset-dependent | Tokenizer construction strategy |
| .maxNewTokens(int) | int | 256 | Maximum tokens to generate |
| .temperature(float) | float | 0.0 | Sampling temperature (higher = more random) |
| .topK(int) | int | 0 (disabled) | Top-K sampling |
| .topP(float) | float | 0.0 (disabled) | Nucleus sampling |
| .eosTokenId(int) | int | Auto-detected | End-of-sequence token ID |
| .addedToken(String) | String | none | Register a special token for atomic encoding |
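The three sampling options interact, so a generic illustration may help. The sketch below shows the common technique of temperature scaling followed by top-K and nucleus (top-P) filtering over a toy logits array. It is not inference4j's implementation: the class name, the method name, and the order in which the filters are applied are assumptions made for illustration, as is treating a temperature of 0 as greedy (argmax) decoding.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.stream.IntStream;

/** Generic illustration of temperature, top-K and top-P; not inference4j's code. */
final class SamplingSketch {

    /**
     * Returns the renormalized distribution a sampler would draw from.
     * Tokens removed by a filter get probability 0. As in the table above,
     * topK <= 0 and topP <= 0 disable the respective filter.
     */
    static double[] filteredDistribution(double[] logits, double temperature, int topK, double topP) {
        int n = logits.length;
        double max = Arrays.stream(logits).max().orElseThrow();

        // Softmax with temperature; subtracting the max keeps exp() stable.
        // Assumption: temperature <= 0 means greedy decoding, handled below.
        double t = temperature > 0 ? temperature : 1.0;
        double[] probs = new double[n];
        double sum = 0;
        for (int i = 0; i < n; i++) {
            probs[i] = Math.exp((logits[i] - max) / t);
            sum += probs[i];
        }
        for (int i = 0; i < n; i++) {
            probs[i] /= sum;
        }

        // Token indices ordered from most to least likely.
        Integer[] order = IntStream.range(0, n).boxed().toArray(Integer[]::new);
        Arrays.sort(order, Comparator.comparingDouble((Integer i) -> -probs[i]));

        int keep = n;
        if (temperature <= 0) {
            keep = 1; // greedy: only the most likely token survives
        } else {
            if (topK > 0) {
                keep = Math.min(topK, n);
            }
            if (topP > 0) {
                // Smallest prefix whose cumulative probability reaches topP,
                // measured against the unfiltered distribution.
                double cumulative = 0;
                int nucleus = 0;
                while (nucleus < keep) {
                    cumulative += probs[order[nucleus]];
                    nucleus++;
                    if (cumulative >= topP) {
                        break;
                    }
                }
                keep = nucleus;
            }
        }

        // Renormalize the surviving tokens so they sum to 1.
        double kept = 0;
        for (int r = 0; r < keep; r++) {
            kept += probs[order[r]];
        }
        double[] out = new double[n];
        for (int r = 0; r < keep; r++) {
            out[order[r]] = probs[order[r]] / kept;
        }
        return out;
    }

    public static void main(String[] args) {
        double[] logits = {2.0, 1.0, 0.0, -1.0};
        System.out.println(Arrays.toString(filteredDistribution(logits, 1.0, 0, 0.0)));
        System.out.println(Arrays.toString(filteredDistribution(logits, 1.0, 2, 0.0)));
        System.out.println(Arrays.toString(filteredDistribution(logits, 0.5, 0, 0.9)));
    }
}
```

Lower temperatures and smaller topK or topP values concentrate probability on fewer tokens, which generally makes output more repeatable.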
Result type
Both summarize(text, tokenListener) and generate(text, tokenListener) return a GenerationResult record:
| Field | Type | Description |
|---|---|---|
| text() | String | The generated summary |
| promptTokens() | int | Number of tokens in the input |
| generatedTokens() | int | Number of tokens generated |
| duration() | Duration | Wall-clock generation time |
The convenience method summarize(text) returns the summary as a plain String.
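The fields of GenerationResult are enough to derive a throughput figure. The sketch below is a minimal illustration: the nested record is a local stand-in that mirrors the four accessors in the table so the snippet compiles without the library, tokensPerSecond is a hypothetical helper, and the values in main are made up rather than measured.

```java
import java.time.Duration;

final class ThroughputSketch {

    /** Stand-in mirroring the accessors listed above; the real type lives in inference4j. */
    record GenerationResult(String text, int promptTokens, int generatedTokens, Duration duration) {}

    /** Generated tokens divided by wall-clock seconds; 0 if no time was recorded. */
    static double tokensPerSecond(GenerationResult result) {
        long nanos = result.duration().toNanos();
        if (nanos <= 0) {
            return 0.0;
        }
        return result.generatedTokens() * 1_000_000_000.0 / nanos;
    }

    public static void main(String[] args) {
        // Illustrative values only, not a benchmark.
        var result = new GenerationResult("A short summary.", 180, 42, Duration.ofMillis(1400));
        System.out.printf("%d prompt tokens, %d generated, %.1f tokens/s%n",
                result.promptTokens(), result.generatedTokens(), tokensPerSecond(result));
    }
}
```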
Tips
- DistilBART CNN is purpose-built for summarization and generally produces better summaries than the general-purpose Flan-T5 models. Use it when summarization is your only task.
- Flan-T5 is a general-purpose model that also handles translation, grammar correction, and SQL generation. Use it when you need multiple tasks from a single model.
- Lower maxNewTokens for shorter summaries; the model will still produce coherent output.
- Use streaming (summarize(text, token -> ...)) for long inputs where generation takes several seconds.
- Reuse instances across calls, since each one holds the model and tokenizer in memory.
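The streaming tip can be combined with keeping the full text: a listener may print each token and also buffer it. The sketch below assumes the token listener receives one String per token, as the lambda token -> System.out.print(token) in the full example suggests, and models it as java.util.function.Consumer<String>. The actual parameter type in inference4j may differ. TokenBuffer is a hypothetical helper, and the tokens fed to it in main are made up.

```java
import java.util.function.Consumer;

/** Hypothetical listener that echoes tokens as they arrive and keeps a copy. */
final class TokenBuffer implements Consumer<String> {
    private final StringBuilder text = new StringBuilder();
    private int count;

    @Override
    public void accept(String token) {
        System.out.print(token); // show progress immediately
        text.append(token);      // keep the full text for later use
        count++;
    }

    /** Everything received so far, concatenated in arrival order. */
    String text() {
        return text.toString();
    }

    /** Number of tokens received so far. */
    int count() {
        return count;
    }

    public static void main(String[] args) {
        TokenBuffer buffer = new TokenBuffer();
        // Stand-in for a model emitting tokens one at a time.
        for (String token : new String[] {"The", " forest", " is", " shrinking", "."}) {
            buffer.accept(token);
        }
        System.out.println();
        System.out.println(buffer.count() + " tokens buffered");
    }
}
```

If the listener turns out to be a different functional interface, passing buffer::accept instead of buffer would presumably work the same way.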