The published Apache Druid container on Docker Hub is a linux/amd64-only image. Running it on Apple Silicon (an M1 or M2 chip) is slow.
Fortunately, it is easy to build your own image by leveraging the binary distribution and the existing druid.sh.
All of this is available in a Dockerfile and build script in the druid-m1 repository.
The build.sh script builds an arm64 image for the Druid version that the Dockerfile downloads.
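The script itself is not reproduced in this post, so the following is only a minimal sketch of what such a build step could look like, not the contents of the actual build.sh. It assumes the Dockerfile declares the version on an `ARG DRUID_VERSION=...` line, and the `druid-m1` image name is a hypothetical tag chosen for illustration.

```shell
#!/bin/sh
# Sketch only: not the actual build.sh from the druid-m1 repository.

# Print the default DRUID_VERSION declared in a Dockerfile,
# assuming a line of the form: ARG DRUID_VERSION=30.0.1
druid_version() {
  sed -n 's/^ARG DRUID_VERSION=//p' "$1" | head -n 1
}

# Build a linux/arm64 image tagged with that version.
# IMAGE_NAME is a hypothetical tag prefix; pick your own.
build_arm64_image() {
  dockerfile=${1:-Dockerfile}
  version=$(druid_version "$dockerfile")
  docker build --platform linux/arm64 \
    -f "$dockerfile" \
    -t "${IMAGE_NAME:-druid-m1}:${version}" \
    "$(dirname "$dockerfile")"
}
```

Calling `build_arm64_image` from the directory that holds the Dockerfile would then produce an image such as `druid-m1:30.0.1`.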
A linux/amd64 container-based deployment of Apache Druid on Apple M1 silicon takes about 2 minutes (1:58.58, tested on an Apple M1 Max with 64GB of memory and 32GB allocated) to start and become available for processing.
An image built on linux/arm64 base images takes only 18 seconds (0:17.79) to become available.
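How these timings were taken is not described here. One way to make a comparable measurement, sketched below under assumptions, is to poll until a health check succeeds and report the elapsed seconds. The `wait_until` helper is hypothetical, and the usage line assumes a Druid Router reachable on localhost:8888 answering Druid's `/status/health` endpoint; adjust both to your deployment.

```shell
#!/bin/sh
# wait_until CMD...: re-run CMD once per second until it succeeds,
# then print the elapsed whole seconds.
# Gives up (exit status 1) after TIMEOUT seconds (default 300).
wait_until() {
  start=$(date +%s)
  while ! "$@" >/dev/null 2>&1; do
    now=$(date +%s)
    if [ $((now - start)) -ge "${TIMEOUT:-300}" ]; then
      echo "timed out" >&2
      return 1
    fi
    sleep 1
  done
  echo $(( $(date +%s) - start ))
}

# Assumed usage, started right after bringing the containers up:
#   wait_until curl -fsS http://localhost:8888/status/health
```

Because it polls once per second, this only resolves whole seconds, which is enough to tell an 18-second start from a 2-minute one.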
If you just need an arm64/v8 image, download the druid-m1 project and run the build.sh script.
Want to know a little about how it was put together? Read on.
Updates
Updated to support Druid 30. The container is now based on a Java 17 image. The distribution also changed, which required some additional changes. These changes align closely with Druid’s standard Dockerfile.
Image
The process of creating this image isn’t complicated. Three major pieces went into its creation.
OS Architecture
First, find and use base images that provide an arm64/v8 variant. Both “openjdk:17-slim-bullseye” and “busybox” have arm64/v8 images.
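To check whether some other base image offers an arm64 variant, one option is to inspect its manifest list. The sketch below assumes a multi-platform image and the JSON shape printed by `docker manifest inspect`; the `has_arm64` helper is a hypothetical name, and it uses a plain text match rather than a real JSON parser.

```shell
#!/bin/sh
# Read `docker manifest inspect` JSON on stdin and succeed only if
# one of the listed platforms has architecture "arm64".
has_arm64() {
  grep -Eq '"architecture"[[:space:]]*:[[:space:]]*"arm64"'
}

# Usage (needs Docker and network access):
#   docker manifest inspect busybox | has_arm64 && echo "arm64 available"
```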
Software Installation
The Dockerfile downloads and installs Druid, and downloads and uses the druid.sh script that is maintained by Apache Druid.
ARG DRUID_VERSION=30.0.1
ADD https://dlcdn.apache.org/druid/${DRUID_VERSION}/apache-druid-${DRUID_VERSION}-bin.tar.gz /tmp
ADD https://raw.githubusercontent.com/apache/druid/${DRUID_VERSION}/distribution/docker/druid.sh /druid.sh
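When changing DRUID_VERSION, it can help to confirm that both URLs still resolve before starting a build, since download mirrors generally carry only current releases. The following is a small sketch for that check; `druid_download_urls` is a hypothetical helper that simply expands the two ADD URLs above for a given version.

```shell
#!/bin/sh
# Print the two URLs the ADD instructions resolve to for a given version.
druid_download_urls() {
  version=$1
  echo "https://dlcdn.apache.org/druid/${version}/apache-druid-${version}-bin.tar.gz"
  echo "https://raw.githubusercontent.com/apache/druid/${version}/distribution/docker/druid.sh"
}

# Usage (needs network access): report any URL that does not resolve.
#   druid_download_urls 30.0.1 | while read -r url; do
#     curl -fsIL "$url" >/dev/null || echo "not found: $url"
#   done
```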
Druid Extensions
Druid extensions are added with the pull-deps operation available with Druid.
For this build, the kafka-emitter extension is included, but others are easy to add.
RUN \
java -cp "/opt/druid/lib/*" \
-Ddruid.extensions.directory="/opt/druid/extensions/" \
-Ddruid.extensions.hadoopDependenciesDir="/opt/druid/hadoop-dependencies/" \
org.apache.druid.cli.Main tools pull-deps --no-default-hadoop \
-c "org.apache.druid.extensions.contrib:kafka-emitter"
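Adding more extensions comes down to repeating the `-c` flag, which pull-deps accepts multiple times. As a rough sketch, the same command can be wrapped in a shell function that takes any number of coordinates. Both helper names are hypothetical, and the statsd-emitter coordinate in the usage comment is only an illustration of a second contrib extension, not something this build includes.

```shell
#!/bin/sh
# Turn a list of Maven coordinates into repeated "-c <coordinate>" flags.
coordinate_flags() {
  flags=""
  for coordinate in "$@"; do
    flags="${flags:+$flags }-c $coordinate"
  done
  printf '%s\n' "$flags"
}

# Run pull-deps for every coordinate given (same paths as the RUN step above).
pull_deps() {
  # Left unquoted on purpose so the flags split into separate arguments;
  # this relies on coordinates containing no whitespace.
  java -cp "/opt/druid/lib/*" \
    -Ddruid.extensions.directory="/opt/druid/extensions/" \
    -Ddruid.extensions.hadoopDependenciesDir="/opt/druid/hadoop-dependencies/" \
    org.apache.druid.cli.Main tools pull-deps --no-default-hadoop \
    $(coordinate_flags "$@")
}

# Usage (inside the image, where Druid is installed under /opt/druid):
#   pull_deps "org.apache.druid.extensions.contrib:kafka-emitter" \
#             "org.apache.druid.extensions.contrib:statsd-emitter"
```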
Why The Difference?
The Dockerfile that is part of Apache Druid is all about building the software.
Since this image is built after a release has been published, its approach is to use the fact that the binaries are available for download.
New To Druid?
If you are new to Druid and want to see what it can do, check out the druid-late demonstration within dev-local-demos.
It leverages a container-based ecosystem provided at dev-local.
Update the .env file within the druid folder to point to your individually built arm64 image.
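The name of the variable in that .env file is not given here, so the sketch below uses `DRUID_IMAGE` purely as a placeholder; check the file for the real key. The hypothetical `set_env_var` helper replaces an existing `KEY=value` line or appends one if the key is missing, leaving the other entries untouched.

```shell
#!/bin/sh
# set_env_var FILE KEY VALUE: set KEY=VALUE in a .env style file,
# replacing any existing assignment for KEY.
set_env_var() {
  file=$1
  key=$2
  value=$3
  touch "$file"
  tmp=$(mktemp)
  # Keep every line except existing assignments for KEY
  # (grep exits non-zero when nothing is left, which is fine here).
  grep -v "^${key}=" "$file" > "$tmp" || true
  printf '%s=%s\n' "$key" "$value" >> "$tmp"
  # Write back in place so the file keeps its permissions.
  cat "$tmp" > "$file"
  rm -f "$tmp"
}

# Usage (DRUID_IMAGE and the tag are placeholders):
#   set_env_var druid/.env DRUID_IMAGE druid-m1:30.0.1
```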
Reach Out
Please contact us if you would like to talk about online analytic processing or event-streaming.