Installation¶
Requirements¶
- Java 17 or higher
- ONNX Runtime (included transitively)
Add the dependency¶
inference4j-core is the only dependency you need — it includes all task wrappers, preprocessing, and tokenizers.
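For Maven, the coordinates follow the same pattern used in the GPU snippet later on this page (substitute the latest release for the version property):

```xml
<dependency>
    <groupId>io.github.inference4j</groupId>
    <artifactId>inference4j-core</artifactId>
    <version>${inference4jVersion}</version>
</dependency>
```

For Gradle, the equivalent is a single `implementation 'io.github.inference4j:inference4j-core:<version>'` line.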
Generative AI¶
For text generation (Phi-3, DeepSeek-R1, etc.), add inference4j-genai instead:
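A sketch of the Maven coordinates, assuming the genai module shares the group and versioning scheme of `inference4j-core`:

```xml
<dependency>
    <groupId>io.github.inference4j</groupId>
    <artifactId>inference4j-genai</artifactId>
    <version>${inference4jVersion}</version>
</dependency>
```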
This is a separate module backed by onnxruntime-genai. See the Generative AI guide for details.
JVM flags¶
ONNX Runtime requires native access. Add this flag to your JVM arguments:
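On the command line this is passed directly to `java` (the application jar name here is a placeholder):

```bash
java --enable-native-access=ALL-UNNAMED -jar myapp.jar
```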
Or, if you're on the module path:
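On the module path, grant native access to the specific module rather than `ALL-UNNAMED`. The module and main-class names below are placeholders; check the `Automatic-Module-Name` declared by the ONNX Runtime jar you ship:

```bash
# <onnxruntime.module> is a placeholder -- use the module name declared by the ONNX Runtime jar
java --enable-native-access=<onnxruntime.module> -p mods -m com.example.app/com.example.app.Main
```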
Setting JVM flags in Gradle¶
```groovy
tasks.withType(JavaExec).configureEach {
    jvmArgs '--enable-native-access=ALL-UNNAMED'
}
tasks.withType(Test).configureEach {
    jvmArgs '--enable-native-access=ALL-UNNAMED'
}
```
```
Spring Boot¶
For Spring Boot applications, use the starter instead:
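A plausible set of coordinates, assuming the starter follows the naming convention of the other modules; verify the exact artifactId in the Spring Boot guide:

```xml
<dependency>
    <groupId>io.github.inference4j</groupId>
    <!-- assumed artifactId; confirm against the Spring Boot guide -->
    <artifactId>inference4j-spring-boot-starter</artifactId>
    <version>${inference4jVersion}</version>
</dependency>
```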
See the Spring Boot guide for configuration details.
GPU support¶
The default dependency includes CPU and CoreML (macOS) support. For CUDA (Linux/Windows), swap the ONNX Runtime dependency:
```xml
<dependency>
    <groupId>io.github.inference4j</groupId>
    <artifactId>inference4j-core</artifactId>
    <version>${inference4jVersion}</version>
    <exclusions>
        <exclusion>
            <groupId>com.microsoft.onnxruntime</groupId>
            <artifactId>onnxruntime</artifactId>
        </exclusion>
    </exclusions>
</dependency>
<dependency>
    <groupId>com.microsoft.onnxruntime</groupId>
    <artifactId>onnxruntime_gpu</artifactId>
    <version>${onnxruntimeVersion}</version>
</dependency>
```
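For Gradle builds, a sketch of the equivalent swap, assuming the same coordinates and that the version variables are defined in your build:

```groovy
implementation("io.github.inference4j:inference4j-core:${inference4jVersion}") {
    exclude group: 'com.microsoft.onnxruntime', module: 'onnxruntime'
}
implementation "com.microsoft.onnxruntime:onnxruntime_gpu:${onnxruntimeVersion}"
```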
See the Hardware Acceleration guide for usage details.