Optimizing the Docker Container Image for C++ Microservices
In previous posts, we covered the basics of a C++ Microservices deployment including:
- Deploying HydraExpress into a Docker container.
- Building a container that exposes a user-defined C++ Microservice with HydraExpress running inside a Docker container.
With those basics in place, this post focuses on optimizing the container in a C++ Microservices deployment. We'll examine how to structure the Dockerfile and the resulting Docker image to reduce the number of layers and the disk space used.
Optimizing a Docker Environment
Building the Dockerfile that we created in the last post in a clean environment (no cached layers) results in an image that, including all of its dependent layers, takes ~905 MB of disk space.
$ docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
hydraexpress latest 3626de2aaf27 About a minute ago 905MB
The image is composed of 19 layers of varying sizes, as docker history shows:
$ docker history hydraexpress
IMAGE CREATED CREATED BY SIZE COMMENT
3626de2aaf27 3 minutes ago /bin/sh -c #(nop) ENTRYPOINT ["/entrypoint.… 0B
ffcff24b7483 3 minutes ago /bin/sh -c #(nop) COPY file:3ea4017ec3012e11… 63B
b1d244ce4919 3 minutes ago /bin/sh -c mkdir -p ${RWSF_HOME}/apps/servle… 296B
e9c0c9190a67 3 minutes ago /bin/sh -c mkdir -p ${RWSF_HOME}/apps-lib &&… 16kB
156707926d63 3 minutes ago /bin/sh -c cd /build && cmake3 ../src && make 187kB
7241437ae952 4 minutes ago /bin/sh -c mkdir -p /build 0B
d6fc9a947aa2 4 minutes ago /bin/sh -c #(nop) COPY dir:993b3c8a984443a43… 1.52kB
b45339df3de8 4 minutes ago /bin/sh -c yum install -y gcc-c++ make cmake3 287MB
fa30f220f666 4 minutes ago /bin/sh -c yum install -y epel-release 91.6MB
378b34919a3f 4 minutes ago /bin/sh -c /opt/download/hydraexpress.run … 160MB
27b2cc792760 4 minutes ago /bin/sh -c #(nop) ENV RWSF_HOME=/opt/perfor… 0B
4823d877e6a8 4 minutes ago /bin/sh -c #(nop) COPY file:1cdab928b8ba6c2e… 122B
2465aefd3096 4 minutes ago /bin/sh -c chmod a+x /opt/download/hydraexpr… 35.2MB
98d08398c045 4 minutes ago /bin/sh -c wget -q -O /opt/download/hydraexp… 35.2MB
ff84decf28b1 4 minutes ago /bin/sh -c mkdir -p /opt/download 0B
6d662ccc0723 4 minutes ago /bin/sh -c yum install -y wget 92.2MB
7e6257c9f8d8 5 weeks ago /bin/sh -c #(nop) CMD ["/bin/bash"] 0B
<missing> 5 weeks ago /bin/sh -c #(nop) LABEL org.label-schema.sc… 0B
<missing> 5 weeks ago /bin/sh -c #(nop) ADD file:61908381d3142ffba… 203MB
While some of that would be compressed in transit to and from the hosts where it would be deployed, the distribution size is still pretty large. Let’s see what we can do to reduce the overall size and number of layers in the final image.
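As a rough cross-check of those numbers, the per-layer sizes can be totaled with a little shell. The sketch below is not part of the original build: `sum_layer_sizes` is a hypothetical helper that assumes sizes are printed in decimal units (B, kB, MB, GB), and the sample values are copied from the docker history listing above.

```shell
#!/bin/sh
# sum_layer_sizes: hypothetical helper, not part of Docker.
# Reads one human-readable size per line (e.g. 63B, 16kB, 35.2MB)
# and prints the total in MB, assuming decimal units.
sum_layer_sizes() {
  awk '
    {
      value = $1 + 0              # leading number, e.g. 35.2 from "35.2MB"
      unit = $1
      gsub(/[0-9.]/, "", unit)    # strip digits and dots, leaving the unit
      if      (unit == "B")  factor = 1
      else if (unit == "kB") factor = 1000
      else if (unit == "MB") factor = 1000000
      else if (unit == "GB") factor = 1000000000
      else                   factor = 0   # unknown unit: ignore the line
      total += value * factor
    }
    END { printf "%.1f MB\n", total / 1000000 }
  '
}

# The 19 layer sizes from the listing above, top to bottom.
layer_sizes="0B 63B 296B 16kB 187kB 0B 1.52kB 287MB 91.6MB 160MB
0B 122B 35.2MB 35.2MB 0B 92.2MB 0B 0B 203MB"

# Word splitting on $layer_sizes is intentional: one size per output line.
printf '%s\n' $layer_sizes | wc -l            # number of layers
printf '%s\n' $layer_sizes | sum_layer_sizes  # total size
```

The sample sums to 904.4 MB, in line with the 905MB that docker images reports once rounding of the individual layer sizes is taken into account. Against a live image, the same helper could be fed from `docker history --format '{{.Size}}' hydraexpress`.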
Reduce the Size and Number of Layers
Starting at the top of the Dockerfile, we’re downloading and installing HydraExpress into the container, but we aren’t removing the installation media or even wget, the tool we used to download it. This download and installation also accounts for seven of the layers in the image.
Let’s see if we can shrink this down using a multi-stage build of our Dockerfile. We’ll split the build into two steps.
First, we’ll create an image for downloading and installing HydraExpress. Second, we’ll create a fresh image that copies the HydraExpress installation space from the first image to the second. Since we’re only copying the installed product, all of the artifacts from the installation will be left behind, reducing the space required for the final image.
We’ll start by adjusting our Dockerfile by naming the first image:
Dockerfile |
---|
…
FROM centos:7 AS hydraexpress_install
…
Next, we’ll introduce a second image in our Dockerfile, after HydraExpress has been installed. This image will serve as the basis for the rest of the Dockerfile. We’ll then copy the HydraExpress installation directory from the previous image, and set the appropriate environment variables as we did before:
Dockerfile |
---|
…
RUN /opt/download/hydraexpress.run \
--mode unattended \
--prefix /opt/perforce/hydraexpress \
--license-file /opt/download/license.key
FROM centos:7
ENV RWSF_HOME /opt/perforce/hydraexpress
COPY --from=hydraexpress_install ${RWSF_HOME} ${RWSF_HOME}
RUN yum install -y epel-release
…
Note that the COPY command in our Dockerfile specifies the --from argument, referring back to the name that we assigned to the first image.
With those changes in place, we've trimmed the container:
- Overall image size dropped to 742 MB.
- The number of layers in the final image dropped to 14.
That’s a nice start, but let's move on to the next stage of the build and see if we can find similar savings there.
Compile and Link Servlet
The next stage compiles and links our servlet instance. This requires a C++ compiler and related tools and produces a number of build artifacts that aren’t needed in the final image. Let’s split this out into its own build stage as well and see how that affects our final image.
First, as we did before, we’ll name the second stage so that we can reference it later in our Dockerfile.
Dockerfile |
---|
…
RUN /opt/download/hydraexpress.run \
--mode unattended \
--prefix /opt/perforce/hydraexpress \
--license-file /opt/download/license.key
FROM centos:7 AS servlet_build
ENV RWSF_HOME /opt/perforce/hydraexpress
…
Second, we’ll introduce a new image after the build is complete. This will serve as the basis for the final image that will be produced. Similar to the build stage, we’ll set up the environment and copy HydraExpress to our new image:
Dockerfile |
---|
…
RUN cd /build && cmake3 ../src && make
FROM centos:7
ENV RWSF_HOME /opt/perforce/hydraexpress
COPY --from=hydraexpress_install ${RWSF_HOME} ${RWSF_HOME}
RUN mkdir -p ${RWSF_HOME}/apps-lib && \
cp -f /build/hello/libhello.so ${RWSF_HOME}/apps-lib
…
Finally, we need to copy the servlet files from the servlet_build stage to our final image. Since these files were already being copied into the appropriate locations under HydraExpress, we’ll simplify the steps by copying the files directly to their final locations:
Dockerfile |
---|
…
COPY --from=hydraexpress_install ${RWSF_HOME} ${RWSF_HOME}
COPY --from=servlet_build /build/hello/libhello.so ${RWSF_HOME}/apps-lib/
COPY --from=servlet_build /src/hello/WEB-INF ${RWSF_HOME}/apps/servlets/hello/WEB-INF/
COPY entrypoint.sh /entrypoint.sh
…
With those changes in place, the optimization provides substantial savings:
- Layer count dropped to 9.
- Overall image size dropped to 363 MB.
There’s still more that can be done to optimize the container.
Copy Only Required Files
We’ll focus next on where HydraExpress is copied into the container. Our HydraExpress installation is a full deployment, including debug libraries and development tools that aren’t required for deployment. Instead of copying everything from the HydraExpress installation, let’s target our copies to just those files that are required.
Similar to before, we’ll create a new staging image to pull the components that we need into our final image:
Dockerfile |
---|
…
RUN cd /build && cmake3 ../src && make
FROM centos:7 AS hydraexpress_deploy
FROM centos:7
…
Next, we’ll list the specific files that are needed from the base HydraExpress installation:
Dockerfile |
---|
…
FROM centos:7 AS hydraexpress_deploy
ENV RWSF_HOME /opt/perforce/hydraexpress
COPY --from=hydraexpress_install ${RWSF_HOME} /tmp
COPY --from=hydraexpress_install ${RWSF_HOME}/bin/rwagent \
${RWSF_HOME}/bin/rwsfserver* \
${RWSF_HOME}/bin/rwsfvars* \
${RWSF_HOME}/bin/
COPY --from=hydraexpress_install ${RWSF_HOME}/conf/loggers.xml \
${RWSF_HOME}/conf/rwagent.xml \
${RWSF_HOME}/conf/
COPY --from=hydraexpress_install ${RWSF_HOME}/conf/locale/ \
${RWSF_HOME}/conf/locale/
COPY --from=hydraexpress_install ${RWSF_HOME}/conf/servlet/ \
${RWSF_HOME}/conf/servlet/
COPY --from=hydraexpress_install ${RWSF_HOME}/lib/libcrypto.so.1.1 \
${RWSF_HOME}/lib/libicu*.so.58.2 \
${RWSF_HOME}/lib/librwsf_agent_methods20012d.so \
${RWSF_HOME}/lib/librwsf_core20012d.so \
${RWSF_HOME}/lib/librwsf_handlers20012d.so \
${RWSF_HOME}/lib/librwsf_icu20012d.so \
${RWSF_HOME}/lib/librwsf_message20012d.so \
${RWSF_HOME}/lib/librwsf_net20012d.so \
${RWSF_HOME}/lib/librwsf_rwagent*20012d.so \
${RWSF_HOME}/lib/librwsf_servlet20012d.so \
${RWSF_HOME}/lib/librwsf_servlet_xml20012d.so \
${RWSF_HOME}/lib/librwsf_ssl20012d.so \
${RWSF_HOME}/lib/librwsf_transport_http20012d.so \
${RWSF_HOME}/lib/librwsf_xmlbinding20012d.so \
${RWSF_HOME}/lib/libssl.so.1.1 \
${RWSF_HOME}/lib/
COPY --from=hydraexpress_install ${RWSF_HOME}/license/ \
${RWSF_HOME}/license/
FROM centos:7
…
Since we’re only exposing the HTTP interface from HydraExpress, we’ll also disable the other protocols. Not only does this reduce the number of libraries that we need to deploy, but it also reduces the startup time for HydraExpress, and reduces the number of potential attack vectors on the container (fewer open ports).
Dockerfile |
---|
…
COPY --from=hydraexpress_install ${RWSF_HOME}/license/ \
${RWSF_HOME}/license/
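RUN sed -i '/<rwsf:connector name="HTTPS/,/<\/rwsf:connector>/d' ${RWSF_HOME}/conf/rwagent.xml
RUN sed -i '/<rwsf:connector name="AJP/,/<\/rwsf:connector>/d' ${RWSF_HOME}/conf/rwagent.xml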
FROM centos:7
…
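The sed commands that strip the extra connectors (listed in full in the complete Dockerfile at the end of this post) rely on sed's range form, /start/,/end/d, which deletes every line from a match of the start pattern through the next match of the end pattern. The sketch below applies the same two expressions to a made-up configuration. The XML is a hypothetical stand-in (element layout, version strings, and the port value are all assumptions), not the real rwagent.xml, and `sample_config` and `strip_connectors` are illustrative names.

```shell
#!/bin/sh
# Hypothetical stand-in for rwagent.xml; the real file's layout may differ.
sample_config() {
  cat <<'EOF'
<rwsf:agent>
  <rwsf:connector name="HTTP/1.1">
    <rwsf:property name="port" value="8090"/>
  </rwsf:connector>
  <rwsf:connector name="HTTPS/1.1">
    <rwsf:property name="port" value="8443"/>
  </rwsf:connector>
  <rwsf:connector name="AJP/1.3">
    <rwsf:property name="port" value="8009"/>
  </rwsf:connector>
</rwsf:agent>
EOF
}

# Delete each HTTPS and AJP connector block, from its opening tag
# through the next closing </rwsf:connector> tag. Reads stdin, writes stdout.
strip_connectors() {
  sed -e '/<rwsf:connector name="HTTPS/,/<\/rwsf:connector>/d' \
      -e '/<rwsf:connector name="AJP/,/<\/rwsf:connector>/d'
}

sample_config | strip_connectors
```

Only the HTTP connector block survives. The HTTPS expression leaves it alone because the text `name="HTTP/` does not contain `HTTPS`.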
Finally, we’ll update our final image to leverage our new deployment image instead of the original installation image:
Dockerfile |
---|
…
ENV RWSF_HOME /opt/perforce/hydraexpress
COPY --from=hydraexpress_deploy ${RWSF_HOME} ${RWSF_HOME}
COPY --from=servlet_build /build/hello/libhello.so ${RWSF_HOME}/apps-lib/
…
With those changes in place, our build time has crept up to ~77s, but our image size has dropped to just 243 MB:
…
$ docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
hydraexpress latest 1487eb5f6c68 3 minutes ago 243MB
We’ve also seen our layer count drop from a high of 19 layers down to 9.
$ docker history hydraexpress
IMAGE CREATED CREATED BY SIZE COMMENT
1487eb5f6c68 5 minutes ago /bin/sh -c #(nop) ENTRYPOINT ["/entrypoint.… 0B
f58f257b5410 5 minutes ago /bin/sh -c #(nop) COPY file:3ea4017ec3012e11… 63B
3202357ad0a2 5 minutes ago /bin/sh -c #(nop) COPY dir:188a8bfd4cab69ae4… 296B
fc10d0cfd32a 5 minutes ago /bin/sh -c #(nop) COPY file:9a0d4bd22e3242de… 16kB
4225ac6d2ca6 5 minutes ago /bin/sh -c #(nop) COPY dir:9072495080d86316c… 39.9MB
576d25cd2ccd 6 minutes ago /bin/sh -c #(nop) ENV RWSF_HOME=/opt/perfor… 0B
7e6257c9f8d8 6 weeks ago /bin/sh -c #(nop) CMD ["/bin/bash"] 0B
<missing> 6 weeks ago /bin/sh -c #(nop) LABEL org.label-schema.sc… 0B
<missing> 6 weeks ago /bin/sh -c #(nop) ADD file:61908381d3142ffba… 203MB
We’ve significantly reduced the size of our Docker image, but there’s one part we haven’t tackled: the base OS.
Selecting a Base OS
We originally chose CentOS 7 as it is a supported platform with HydraExpress. Unfortunately, the base image for CentOS is relatively large (203 MB), and other supported operating systems have similarly large footprints.
While deploying on another operating system is technically unsupported (any reported issues will need to be reproducible on a supported OS), there are many distributions that are compatible enough with CentOS that we can deploy our HydraExpress container on them. Since our goal is to reduce our image size as much as possible, let’s try deploying HydraExpress on Alpine Linux, which boasts a base OS image of just 5 MB.
To run HydraExpress on Alpine Linux, we need to deploy some additional packages beyond the base operating system. Namely we’ll need bash (to support the HydraExpress environment scripts), libstdc++, and libc6-compat. We’ll replace the FROM statement in our final image to reflect these changes:
Dockerfile |
---|
…
RUN sed -i '/<rwsf:connector name="AJP/,/<\/rwsf:connector>/d' ${RWSF_HOME}/conf/rwagent.xml
FROM alpine:latest
RUN apk update --no-cache && apk upgrade --no-cache && apk add --no-cache bash libstdc++ libc6-compat
ENV RWSF_HOME /opt/perforce/hydraexpress
…
With those changes in place, our build time climbs to ~86s, but the size of our image drops to only 51 MB:
$ docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
hydraexpress latest 88530e028da6 About a minute ago 51MB
We can still start our HydraExpress container and verify that the “hello” service we wrote in the previous blog post is still up and running.
Successfully Reduce Container Size
With the steps outlined above, we’ve reduced the size of our HydraExpress container by over 70% on a supported operating system and, if we’re willing to deploy on an unsupported OS, by over 90%.
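Those percentages follow directly from the image sizes reported along the way. A small sketch of the arithmetic, where `reduction_pct` is a hypothetical helper name and the inputs are the sizes in MB quoted in this post:

```shell
#!/bin/sh
# reduction_pct BEFORE AFTER: percentage by which AFTER is smaller than BEFORE.
reduction_pct() {
  awk -v before="$1" -v after="$2" \
    'BEGIN { printf "%.1f\n", (before - after) / before * 100 }'
}

reduction_pct 905 742   # after splitting out the HydraExpress install
reduction_pct 905 363   # after splitting out the servlet build
reduction_pct 905 243   # after copying only required files (CentOS 7)
reduction_pct 905 51    # after switching the final image to Alpine Linux
```

The last two calls print 73.1 and 94.4, which is where the "over 70%" and "over 90%" figures come from.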
A similar process can be applied to other Docker containers to reduce their footprint as well. Next time we’ll continue to evolve our HydraExpress deployment, incorporating another C/C++ library.
Want to try this yourself? Contact us for an evaluation version.
For Reference and Further Reading
- Part 1: C++ Microservices in Docker
- Part 2: Building Custom Servlets
- Part 4: Adding Numerical Libraries
For reference, here is the complete Dockerfile after refactoring:
Dockerfile |
---|
FROM centos:7 AS hydraexpress_install
RUN yum install -y wget
RUN mkdir -p /opt/download
RUN wget -q -O /opt/download/hydraexpress.run \
https://dslwuu69twiif.cloudfront.net/hydraexpress/2020/hydraexpress_2020_eval_linux_x86-64_gcc_4.8.run
RUN chmod a+x /opt/download/hydraexpress.run
COPY license.key /opt/download/license.key
ENV RWSF_HOME /opt/perforce/hydraexpress
RUN /opt/download/hydraexpress.run \
--mode unattended \
--prefix /opt/perforce/hydraexpress \
--license-file /opt/download/license.key
FROM centos:7 AS servlet_build
ENV RWSF_HOME /opt/perforce/hydraexpress
COPY --from=hydraexpress_install ${RWSF_HOME} ${RWSF_HOME}
RUN yum install -y epel-release
RUN yum install -y gcc-c++ make cmake3
COPY src/ /src/
RUN mkdir -p /build
RUN cd /build && cmake3 ../src && make
FROM centos:7 AS hydraexpress_deploy
ENV RWSF_HOME /opt/perforce/hydraexpress
COPY --from=hydraexpress_install ${RWSF_HOME} /tmp
COPY --from=hydraexpress_install ${RWSF_HOME}/bin/rwagent \
${RWSF_HOME}/bin/rwsfserver* \
${RWSF_HOME}/bin/rwsfvars* \
${RWSF_HOME}/bin/
COPY --from=hydraexpress_install ${RWSF_HOME}/conf/loggers.xml \
${RWSF_HOME}/conf/rwagent.xml \
${RWSF_HOME}/conf/
COPY --from=hydraexpress_install ${RWSF_HOME}/conf/locale/ \
${RWSF_HOME}/conf/locale/
COPY --from=hydraexpress_install ${RWSF_HOME}/conf/servlet/ \
${RWSF_HOME}/conf/servlet/
COPY --from=hydraexpress_install ${RWSF_HOME}/lib/libcrypto.so.1.1 \
${RWSF_HOME}/lib/libicu*.so.58.2 \
${RWSF_HOME}/lib/librwsf_agent_methods20012d.so \
${RWSF_HOME}/lib/librwsf_core20012d.so \
${RWSF_HOME}/lib/librwsf_handlers20012d.so \
${RWSF_HOME}/lib/librwsf_icu20012d.so \
${RWSF_HOME}/lib/librwsf_message20012d.so \
${RWSF_HOME}/lib/librwsf_net20012d.so \
${RWSF_HOME}/lib/librwsf_rwagent*20012d.so \
${RWSF_HOME}/lib/librwsf_servlet20012d.so \
${RWSF_HOME}/lib/librwsf_servlet_xml20012d.so \
${RWSF_HOME}/lib/librwsf_ssl20012d.so \
${RWSF_HOME}/lib/librwsf_transport_http20012d.so \
${RWSF_HOME}/lib/librwsf_xmlbinding20012d.so \
${RWSF_HOME}/lib/libssl.so.1.1 \
${RWSF_HOME}/lib/
COPY --from=hydraexpress_install ${RWSF_HOME}/license/ \
${RWSF_HOME}/license/
RUN sed -i '/<rwsf:connector name="HTTPS/,/<\/rwsf:connector>/d' ${RWSF_HOME}/conf/rwagent.xml
RUN sed -i '/<rwsf:connector name="AJP/,/<\/rwsf:connector>/d' ${RWSF_HOME}/conf/rwagent.xml
FROM alpine:latest
RUN apk update --no-cache && apk upgrade --no-cache && apk add --no-cache bash libstdc++ libc6-compat
ENV RWSF_HOME /opt/perforce/hydraexpress
COPY --from=hydraexpress_deploy ${RWSF_HOME} ${RWSF_HOME}
COPY --from=servlet_build /build/hello/libhello.so ${RWSF_HOME}/apps-lib/
COPY --from=servlet_build /src/hello/WEB-INF ${RWSF_HOME}/apps/servlets/hello/WEB-INF/
COPY entrypoint.sh /entrypoint.sh
ENTRYPOINT ["/entrypoint.sh"]
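With four FROM lines in play, it can help to list the stages and their names before building. The following is a minimal sketch rather than anything from the original build: `list_stages` is a hypothetical helper, and the heredoc reproduces only the FROM lines of the Dockerfile above so that the example runs on its own.

```shell
#!/bin/sh
# list_stages FILE: print each build stage's index, base image, and name.
# Unnamed stages (such as the final one) are shown as "-".
list_stages() {
  awk 'toupper($1) == "FROM" {
         name = (toupper($3) == "AS") ? $4 : "-"
         printf "%d %s %s\n", n++, $2, name
       }' "$1"
}

# Stand-in file containing just the FROM lines from the Dockerfile above.
demo=$(mktemp)
cat > "$demo" <<'EOF'
FROM centos:7 AS hydraexpress_install
FROM centos:7 AS servlet_build
FROM centos:7 AS hydraexpress_deploy
FROM alpine:latest
EOF

list_stages "$demo"
rm -f "$demo"
```

Only the last, unnamed stage ends up in the tagged image; the three named stages exist to feed it through COPY --from. A named stage can also be built on its own with docker build --target (for example, --target servlet_build), which is handy when debugging the compile step.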