Hardware Acceleration¶

inference4j supports GPU and hardware acceleration via ONNX Runtime execution providers. The .sessionOptions() API is available on every model wrapper.

CoreML (macOS)¶

CoreML is bundled in the standard ONNX Runtime dependency on macOS — no additional setup needed.

try (var classifier = ResNetClassifier.builder()
        .sessionOptions(opts -> opts.addCoreML())
        .build()) {
    classifier.classify(Path.of("cat.jpg"));
}

CUDA (Linux/Windows)¶

For NVIDIA GPU acceleration, swap the ONNX Runtime dependency:

GradleMaven

implementation('io.github.inference4j:inference4j-core:${inference4jVersion}') {
    exclude group: 'com.microsoft.onnxruntime', module: 'onnxruntime'
}
implementation 'com.microsoft.onnxruntime:onnxruntime_gpu:${onnxruntimeVersion}'

<dependency>
    <groupId>io.github.inference4j</groupId>
    <artifactId>inference4j-core</artifactId>
    <version>${inference4jVersion}</version>
    <exclusions>
        <exclusion>
            <groupId>com.microsoft.onnxruntime</groupId>
            <artifactId>onnxruntime</artifactId>
        </exclusion>
    </exclusions>
</dependency>
<dependency>
    <groupId>com.microsoft.onnxruntime</groupId>
    <artifactId>onnxruntime_gpu</artifactId>
    <version>${onnxruntimeVersion}</version>
</dependency>

Then enable CUDA in the builder:

try (var classifier = ResNetClassifier.builder()
        .sessionOptions(opts -> opts.addCUDA(0))  // device ID 0
        .build()) {
    classifier.classify(Path.of("cat.jpg"));
}

The `sessionOptions` API¶

Every model wrapper exposes .sessionOptions(SessionConfigurer) in its builder. SessionConfigurer is a @FunctionalInterface that receives the ONNX Runtime SessionOptions:

@FunctionalInterface
public interface SessionConfigurer {
    void configure(OrtSession.SessionOptions options) throws OrtException;
}

This gives you full access to ONNX Runtime configuration:

.sessionOptions(opts -> {
    opts.addCoreML();
    opts.setIntraOpNumThreads(4);
})

Common options¶

Method	Description
`opts.addCoreML()`	Enable CoreML (macOS)
`opts.addCUDA(deviceId)`	Enable CUDA (Linux/Windows)
`opts.setIntraOpNumThreads(n)`	Set number of threads for intra-op parallelism
`opts.setInterOpNumThreads(n)`	Set number of threads for inter-op parallelism
`opts.setOptimizationLevel(level)`	Set graph optimization level

Benchmarks on Apple Silicon (M-series)¶

Model	Capability	CPU	CoreML	Speedup
ResNet-50	Image Classification	37 ms	10 ms	3.7x
CRAFT	Text Detection	831 ms	153 ms	5.4x

Measured with 3 warmup runs + 10 timed runs.

Tips¶

CoreML is available on macOS only. The addCoreML() call will fail on other platforms.
CUDA requires the onnxruntime_gpu artifact and a compatible NVIDIA driver + CUDA toolkit.
If the execution provider fails to initialize, ONNX Runtime silently falls back to CPU. Check logs for warnings.
For production workloads, benchmark both CPU and GPU — small models (like MiniLM) may be faster on CPU due to GPU data transfer overhead.
.sessionOptions() is composable — you can set multiple options in a single lambda.