Grammar Correction
Fix grammatical errors in text using CoEdIT or Flan-T5 encoder-decoder models.
Quick example

```java
try (var corrector = CoeditGrammarCorrector.coeditBase().build()) {
    String corrected = corrector.correct("She don't likes swimming.");
    System.out.println(corrected); // She doesn't like swimming.
}
```
Full example

```java
import io.github.inference4j.generation.GenerationResult;
import io.github.inference4j.nlp.CoeditGrammarCorrector;

public class GrammarCorrection {
    public static void main(String[] args) {
        // try-with-resources closes the corrector when the block exits
        try (var corrector = CoeditGrammarCorrector.coeditBase()
                .maxNewTokens(200)
                .build()) {
            String[] sentences = {
                "She don't likes swimming.",
                "Me and him went to the store yesterday.",
                "The informations is very useful for we."
            };
            for (String sentence : sentences) {
                // The second argument is a callback that receives each token as it is generated
                GenerationResult result = corrector.correct(sentence,
                        token -> System.out.print(token));
                System.out.println();
                System.out.printf(" → %d tokens in %,d ms%n",
                        result.generatedTokens(), result.duration().toMillis());
            }
        }
    }
}
```
Using Flan-T5 as an alternative

`FlanT5TextGenerator` can also correct grammar. It implements the same `GrammarCorrector` interface:

```java
import io.github.inference4j.nlp.FlanT5TextGenerator;
import io.github.inference4j.nlp.GrammarCorrector;

// Both implement GrammarCorrector, so they can be swapped freely
GrammarCorrector corrector = FlanT5TextGenerator.flanT5Base()
        .maxNewTokens(200)
        .build();
```
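
Because both classes share one interface, calling code can stay unaware of which model is loaded. The sketch below illustrates that pattern only. `TextCorrector` is a hypothetical stand-in that copies just the `correct(String)` method shown above (it is not the library's `GrammarCorrector`), and the two lambdas are fakes so the example runs without downloading a model.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical stand-in for the library interface: only correct(String) is modeled.
interface TextCorrector {
    String correct(String text);
}

class CorrectorSwapSketch {

    // Depends only on the interface, so any implementation can be passed in.
    static List<String> correctAll(TextCorrector corrector, List<String> sentences) {
        return sentences.stream()
                .map(corrector::correct)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Fake implementations standing in for a CoEdIT-backed and a Flan-T5-backed corrector.
        TextCorrector fakeCoedit = text -> text.replace("don't likes", "doesn't like");
        TextCorrector fakeFlanT5 = text -> text.replace("informations is", "information is");

        List<String> input = List.of("She don't likes swimming.", "The informations is useful.");
        System.out.println(correctAll(fakeCoedit, input));
        System.out.println(correctAll(fakeFlanT5, input));
    }
}
```

With the real library, a method like `correctAll` would presumably accept a `GrammarCorrector` and receive whichever builder result the application chose.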
Model presets

CoeditGrammarCorrector

| Preset | Model | Parameters | Size |
|---|---|---|---|
| `CoeditGrammarCorrector.coeditBase()` | CoEdIT Base | 250M | ~900 MB |
| `CoeditGrammarCorrector.coeditLarge()` | CoEdIT Large | 780M | ~3 GB |

FlanT5TextGenerator

| Preset | Model | Parameters | Size |
|---|---|---|---|
| `FlanT5TextGenerator.flanT5Small()` | Flan-T5 Small | 77M | ~300 MB |
| `FlanT5TextGenerator.flanT5Base()` | Flan-T5 Base | 250M | ~900 MB |
| `FlanT5TextGenerator.flanT5Large()` | Flan-T5 Large | 780M | ~3 GB |

Builder options

| Method | Type | Default | Description |
|---|---|---|---|
| `.modelId(String)` | `String` | Preset-dependent | HuggingFace model ID |
| `.modelSource(ModelSource)` | `ModelSource` | `HuggingFaceModelSource` | Model resolution strategy |
| `.sessionOptions(SessionConfigurer)` | `SessionConfigurer` | default | ONNX Runtime session config |
| `.tokenizerProvider(TokenizerProvider)` | `TokenizerProvider` | `SentencePieceBpeTokenizer` | Tokenizer construction strategy |
| `.maxNewTokens(int)` | `int` | `256` | Maximum tokens to generate |
| `.temperature(float)` | `float` | `0.0` | Sampling temperature |
| `.topK(int)` | `int` | `0` (disabled) | Top-K sampling |
| `.topP(float)` | `float` | `0.0` (disabled) | Nucleus sampling |
| `.eosTokenId(int)` | `int` | Auto-detected | End-of-sequence token ID |
| `.addedToken(String)` | `String` | none | Register a special token for atomic encoding |

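
To make the three sampling options concrete, here is a minimal sketch of how temperature, top-K, and top-P are conventionally combined when picking the next token. It assumes the common semantics (temperature 0 means greedy argmax, and a value of 0 disables the top-K or top-P filter) and is not taken from the library's source. `DecodingSketch` and `nextToken` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Random;

class DecodingSketch {

    // Greedy choice: index of the largest logit.
    static int argmax(float[] logits) {
        int best = 0;
        for (int i = 1; i < logits.length; i++) {
            if (logits[i] > logits[best]) best = i;
        }
        return best;
    }

    // Picks the next token id. temperature <= 0 means greedy;
    // topK <= 0 and topP <= 0 disable the respective filter.
    static int nextToken(float[] logits, float temperature, int topK, float topP, Random rng) {
        if (temperature <= 0f) {
            return argmax(logits);
        }
        // Candidate ids sorted by descending logit.
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < logits.length; i++) ids.add(i);
        ids.sort(Comparator.comparingDouble((Integer i) -> -logits[i]));
        if (topK > 0 && topK < ids.size()) {
            ids = ids.subList(0, topK);
        }
        // Softmax over the remaining candidates at the given temperature.
        double max = logits[ids.get(0)] / temperature;
        double[] probs = new double[ids.size()];
        double sum = 0;
        for (int j = 0; j < probs.length; j++) {
            probs[j] = Math.exp(logits[ids.get(j)] / temperature - max);
            sum += probs[j];
        }
        for (int j = 0; j < probs.length; j++) probs[j] /= sum;
        // Nucleus filter: keep the smallest prefix whose cumulative probability reaches topP.
        int keep = probs.length;
        if (topP > 0f && topP < 1f) {
            double cumulative = 0;
            for (int j = 0; j < probs.length; j++) {
                cumulative += probs[j];
                if (cumulative >= topP) { keep = j + 1; break; }
            }
        }
        // Sample from the kept candidates, scaling the draw by their total mass.
        double kept = 0;
        for (int j = 0; j < keep; j++) kept += probs[j];
        double r = rng.nextDouble() * kept;
        for (int j = 0; j < keep; j++) {
            r -= probs[j];
            if (r <= 0) return ids.get(j);
        }
        return ids.get(keep - 1);
    }

    public static void main(String[] args) {
        float[] logits = {0.1f, 2.5f, -1.0f, 1.9f};
        // Greedy decoding returns the same token every time for the same logits.
        System.out.println(nextToken(logits, 0f, 0, 0f, new Random(42))); // 1
        // With sampling enabled, repeated calls may return different tokens.
        Random rng = new Random(42);
        for (int i = 0; i < 5; i++) {
            System.out.print(nextToken(logits, 1.0f, 2, 0.95f, rng) + " ");
        }
        System.out.println();
    }
}
```

Under these assumptions, the defaults in the table (temperature `0.0`, both filters disabled) reduce to the `argmax` branch, which is why the default output is deterministic.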
Result type

`GenerationResult` is a record with:

| Field | Type | Description |
|---|---|---|
| `text()` | `String` | The corrected text |
| `promptTokens()` | `int` | Number of tokens in the input |
| `generatedTokens()` | `int` | Number of tokens generated |
| `duration()` | `Duration` | Wall-clock generation time |

The convenience method `correct(text)` returns the corrected text as a plain `String`.
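
The two numeric fields combine naturally into a throughput figure. The helper below is a small sketch, not part of the library. `ThroughputSketch` and `tokensPerSecond` are hypothetical names, and the inputs correspond to what `result.generatedTokens()` and `result.duration()` would return.

```java
import java.time.Duration;
import java.util.Locale;

class ThroughputSketch {

    // Generated tokens per second; returns 0 for a zero duration to avoid dividing by zero.
    static double tokensPerSecond(int generatedTokens, Duration duration) {
        long nanos = duration.toNanos();
        if (nanos == 0) return 0.0;
        return generatedTokens * 1_000_000_000.0 / nanos;
    }

    public static void main(String[] args) {
        // Example values: 8 tokens generated in 400 ms.
        double rate = tokensPerSecond(8, Duration.ofMillis(400));
        System.out.println(String.format(Locale.ROOT, "%.1f tokens/s", rate)); // 20.0 tokens/s
    }
}
```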
Tips

- CoEdIT is specifically trained for grammar correction (using the "Fix grammatical errors" instruction internally). It produces more reliable corrections than general-purpose models.
- Flan-T5 is a general-purpose model that also handles summarization, translation, and SQL generation. Use it when you need multiple tasks from a single model.
- Use greedy decoding (default `temperature=0`) for grammar correction, because sampling introduces random variations.
- CoEdIT automatically prepends the instruction prefix `"Fix grammatical errors in this sentence: "`, so just pass the raw text to `correct()`.
- For batch correction, reuse the same instance. Each call to `correct()` runs an independent generation.
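
The prefix tip is easy to get wrong in practice, so here is a small illustration of why callers should pass raw text. It assumes the prompt is built by plain string concatenation, which the documentation does not confirm. `InstructionPrefixSketch` and `buildPrompt` are hypothetical names.

```java
class InstructionPrefixSketch {

    // Prefix quoted in the tips above.
    static final String PREFIX = "Fix grammatical errors in this sentence: ";

    // Assumed prompt construction: the prefix followed by the caller's text.
    static String buildPrompt(String rawText) {
        return PREFIX + rawText;
    }

    public static void main(String[] args) {
        String raw = "She don't likes swimming.";

        // Passing raw text yields a single, well-formed instruction.
        System.out.println(buildPrompt(raw));

        // Adding the prefix yourself would duplicate the instruction in the prompt.
        System.out.println(buildPrompt(PREFIX + raw));
    }
}
```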