Grammar Correction

Fix grammatical errors in text using CoEdIT or Flan-T5 encoder-decoder models.

Quick example

try (var corrector = CoeditGrammarCorrector.coeditBase().build()) {
    String corrected = corrector.correct("She don't likes swimming.");
    System.out.println(corrected); // She doesn't like swimming.
}

Full example

import io.github.inference4j.generation.GenerationResult;
import io.github.inference4j.nlp.CoeditGrammarCorrector;

public class GrammarCorrection {
    public static void main(String[] args) {
        try (var corrector = CoeditGrammarCorrector.coeditBase()
                .maxNewTokens(200)
                .build()) {

            String[] sentences = {
                "She don't likes swimming.",
                "Me and him went to the store yesterday.",
                "The informations is very useful for we."
            };

            for (String sentence : sentences) {
                GenerationResult result = corrector.correct(sentence,
                        token -> System.out.print(token));
                System.out.println();
                System.out.printf("  → %d tokens in %,d ms%n",
                        result.generatedTokens(), result.duration().toMillis());
            }
        }
    }
}

Using Flan-T5 as an alternative

FlanT5TextGenerator can also correct grammar. It implements the same GrammarCorrector interface:

import io.github.inference4j.nlp.FlanT5TextGenerator;
import io.github.inference4j.nlp.GrammarCorrector;

// Both implement GrammarCorrector — swap freely
GrammarCorrector corrector = FlanT5TextGenerator.flanT5Base()
        .maxNewTokens(200)
        .build();

Model presets

CoeditGrammarCorrector

| Preset | Model | Parameters | Size |
|---|---|---|---|
| CoeditGrammarCorrector.coeditBase() | CoEdIT Base | 250M | ~900 MB |
| CoeditGrammarCorrector.coeditLarge() | CoEdIT Large | 780M | ~3 GB |

FlanT5TextGenerator

| Preset | Model | Parameters | Size |
|---|---|---|---|
| FlanT5TextGenerator.flanT5Small() | Flan-T5 Small | 77M | ~300 MB |
| FlanT5TextGenerator.flanT5Base() | Flan-T5 Base | 250M | ~900 MB |
| FlanT5TextGenerator.flanT5Large() | Flan-T5 Large | 780M | ~3 GB |

Builder options

| Method | Type | Default | Description |
|---|---|---|---|
| .modelId(String) | String | Preset-dependent | HuggingFace model ID |
| .modelSource(ModelSource) | ModelSource | HuggingFaceModelSource | Model resolution strategy |
| .sessionOptions(SessionConfigurer) | SessionConfigurer | default | ONNX Runtime session config |
| .tokenizerProvider(TokenizerProvider) | TokenizerProvider | SentencePieceBpeTokenizer | Tokenizer construction strategy |
| .maxNewTokens(int) | int | 256 | Maximum tokens to generate |
| .temperature(float) | float | 0.0 | Sampling temperature |
| .topK(int) | int | 0 (disabled) | Top-K sampling |
| .topP(float) | float | 0.0 (disabled) | Nucleus sampling |
| .eosTokenId(int) | int | Auto-detected | End-of-sequence token ID |
| .addedToken(String) | String | | Register a special token for atomic encoding |

Result type

GenerationResult is a record with:

| Field | Type | Description |
|---|---|---|
| text() | String | The corrected text |
| promptTokens() | int | Number of tokens in the input |
| generatedTokens() | int | Number of tokens generated |
| duration() | Duration | Wall-clock generation time |

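The convenience method correct(text) returns the corrected text as a plain String. The streaming overload used in the full example, which takes a token callback as its second argument, returns a GenerationResult.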
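
The fields above are enough to derive a throughput figure. The following is a minimal sketch, not library code: the Result record is a hypothetical stand-in that only mirrors the four documented fields so the example runs without inference4j on the classpath, and the sample numbers in main are made up for illustration.

```java
import java.time.Duration;

final class ThroughputSketch {

    // Hypothetical stand-in mirroring the documented GenerationResult fields.
    // It is NOT the library's record; it exists so the sketch is self-contained.
    record Result(String text, int promptTokens, int generatedTokens, Duration duration) {}

    // Generated tokens per second of wall-clock time.
    // Returns 0.0 for a zero or negative duration to avoid dividing by zero.
    static double tokensPerSecond(Result r) {
        long nanos = r.duration().toNanos();
        if (nanos <= 0) {
            return 0.0;
        }
        return r.generatedTokens() * 1_000_000_000.0 / nanos;
    }

    public static void main(String[] args) {
        // Illustrative values only, not measured results.
        Result r = new Result("She doesn't like swimming.", 12, 8, Duration.ofMillis(400));
        System.out.printf("%.1f tokens/s%n", tokensPerSecond(r));
    }
}
```

With a real GenerationResult, the same arithmetic should apply to result.generatedTokens() and result.duration() directly.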

Tips

  • CoEdIT is specifically trained for grammar correction (using the "Fix grammatical errors" instruction internally). It produces more reliable corrections than general-purpose models.
  • Flan-T5 is a general-purpose model that also handles summarization, translation, and SQL generation. Use it when you need multiple tasks from a single model.
  • Use greedy decoding (default temperature=0) for grammar correction — sampling introduces random variations.
  • CoEdIT automatically prepends the instruction prefix "Fix grammatical errors in this sentence: " — just pass the raw text to correct().
  • For batch correction, reuse the same instance — each call to correct() runs an independent generation.
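
The batch tip can be sketched as a small helper that applies one shared corrector to a list of sentences. This is an illustration under assumptions: the helper name correctAll is hypothetical, and the corrector is passed as a plain UnaryOperator<String> standing in for the single-argument correct(text) call, so the example runs without the library. The stub in main does a fixed string replacement and performs no real inference.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

final class BatchCorrectionSketch {

    // Runs every sentence through the same corrector, preserving input order.
    // UnaryOperator<String> is a stand-in for the library's correct(text) call.
    static List<String> correctAll(UnaryOperator<String> corrector, List<String> sentences) {
        List<String> corrected = new ArrayList<>(sentences.size());
        for (String sentence : sentences) {
            corrected.add(corrector.apply(sentence));
        }
        return corrected;
    }

    public static void main(String[] args) {
        // Stub corrector for demonstration only; it does not run a model.
        UnaryOperator<String> stub = s -> s.replace("don't likes", "doesn't like");

        List<String> input = List.of(
                "She don't likes swimming.",
                "He don't likes running.");

        // One corrector instance is created once and reused for the whole batch.
        for (String line : correctAll(stub, input)) {
            System.out.println(line);
        }
    }
}
```

Assuming the single-argument correct(text) described above, a real corrector built once inside a try-with-resources block could presumably be passed as a lambda such as s -> corrector.correct(s), which keeps model loading out of the loop.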