Java gRPC and HTTP Clients for Python AI Inference Service
When deploying AI models in production, we often ask:
Should we call the inference server via gRPC or HTTP?
In this post, I build two Java clients —one using gRPC and one using HTTP— to call a Python AI inference service running a ResNet18 model.
The Python server provides two endpoints:
- A gRPC service for high-performance, structured communication
- An HTTP API for compatibility with tools like
curlor browser requests
Finally, we’ll see a working Java project that can talk to the Python server using either RPC or HTTP.
1. Project Overview
- Build a Java client to send image data to the server and receive Top-K classification predictions.
- Support both gRPC calls and HTTP requests.
Project Structure Below

2. Java Project Setup (Maven Dependencies)
Key libraries:
grpc-netty-shaded,grpc-protobuf,grpc-stubprotobuf-javaokhttp(HTTP client)jackson-databind(JSON parsing)
Example snippet from pom.xml:
<dependencies>
<!-- gRPC -->
<dependency>
<groupId>io.grpc</groupId>
<artifactId>grpc-netty-shaded</artifactId>
<version>1.63.0</version>
</dependency>
<dependency>
<groupId>io.grpc</groupId>
<artifactId>grpc-protobuf</artifactId>
<version>1.63.0</version>
</dependency>
<dependency>
<groupId>io.grpc</groupId>
<artifactId>grpc-stub</artifactId>
<version>1.63.0</version>
</dependency>
<!-- Protobuf runtime -->
<dependency>
<groupId>com.google.protobuf</groupId>
<artifactId>protobuf-java</artifactId>
<version>3.25.3</version>
</dependency>
<!-- HTTP client & JSON -->
<dependency>
<groupId>com.squareup.okhttp3</groupId>
<artifactId>okhttp</artifactId>
<version>4.12.0</version>
</dependency>
<dependency>
<groupId>com.fasterxml.jackson.core</groupId>
<artifactId>jackson-databind</artifactId>
<version>2.17.1</version>
</dependency>
</dependencies>
3. Implementing the gRPC Client in Java
Client.java reads an image file, sends it to the Python gRPC server, and prints predictions.
ManagedChannel channel = ManagedChannelBuilder
.forAddress("127.0.0.1", 50051)
.usePlaintext()
.build();
ClassifierGrpc.ClassifierBlockingStub stub = ClassifierGrpc.newBlockingStub(channel);
ImageRequest request = ImageRequest.newBuilder()
.setImage(ByteString.copyFrom(Files.readAllBytes(Paths.get(imagePath))))
.setTopk(topk)
.build();
PredictionResponse resp = stub.predict(request);
resp.getTopkList().forEach(p ->
System.out.printf("%-30s %.4f%n", p.getLabel(), p.getProb())
);
4. Implementing the HTTP Client in Java
Our FastAPI endpoint expects:
- POST to
/predict/file?topk=5 multipart/form-datawithfile=@image
HttpTester.java:
public void predictMultipart(String imagePath, int topk) throws IOException {
File img = new File(imagePath);
RequestBody fileBody = RequestBody.create(img, MediaType.parse("image/*"));
MultipartBody requestBody = new MultipartBody.Builder()
.setType(MultipartBody.FORM)
.addFormDataPart("file", img.getName(), fileBody)
.build();
String urlWithParam = String.format("%s?topk=%d", endpoint, topk);
Request request = new Request.Builder()
.url(urlWithParam)
.post(requestBody)
.addHeader("accept", "application/json")
.build();
try (Response resp = client.newCall(request).execute()) {
String json = resp.body().string();
System.out.println(json);
}
}
5. Results and Performance Insights
This project implements HTTP and gRPC clients against the same Python inference server.
- gRPC delivered faster responses with smaller payloads
- HTTP was simpler to test and integrate during development
With gRPC and HTTP clients in Java, we can connect to the same AI inference server in whichever way makes the most sense for our scenario.