Posted by Neal Brooks on Jun 27, 2019

Compiling wkhtmltopdf for use inside an AWS Lambda function with Bref is easier than you'd think

Here at MyBuilder we recently had the chance to work on a green-field project, which of course meant we were able to play with all the shiny new toys we’ve been dying to try out for a while.

We needed to extract a standalone PDF generation service from some work we’d already done in another project. It was a perfect candidate for running inside AWS Lambda, which also gave us the chance to finally try Bref in production.

The previous project had used Snappy for generating reports based on templated HTML output, and we have a bunch more projects coming up which would also use the same solution.

Because we couldn’t predict the volume of traffic to the service at any given time (although night-times are likely to see zero traffic), we knew it would need to scale quickly both up and down. To keep things clean and separate, we also wanted to avoid putting this service on our regular web servers.

FaaS / Serverless

FaaS / serverless is a great candidate for these kinds of services (Amazon’s FaaS offering being known as Lambda). With FaaS you push your code into the cloud where it sits waiting for an event of some sort to come in (this could be an HTTP request, or an event such as an item being added to a queue, or any number of things AWS lets you react to). When the event you’re waiting for happens, the FaaS provider invokes your code for you and returns the response.

The key point is that while your code is sitting waiting for an event to happen, you don’t get charged for it. You only pay for the resources your code used while it was actually running. Furthermore, the cost of invocation is tiny. Conveniently, the FaaS provider will also handle scaling for you.

Considering that entire websites tend to sit idle on servers waiting for the next web request to come in, this approach could let you effectively shut down your web servers and substantially cut your hosting costs. For this use case, however, we simply needed to react to events coming through Amazon SNS.

Serverless PHP

Bref is an open-source project which aims to help PHP developers get their applications running on Lambda. Running PHP on Lambda relies on features AWS introduced at the end of 2018: custom runtimes and a system called layers.

The layer system sounds confusing at first, but the essential take-away is that a layer is a zip archive of files (a program, binary or library, for example) that you can ‘install’ into your Lambda function as a dependency, simply by referencing it in your function configuration. A single Lambda function can use up to five layers.

I’m going to assume a base level of familiarity with Lambda and layers for the rest of this article. If you’re finding it difficult to follow the concepts, I strongly advise you to try deploying a sample app with Bref first to get an understanding of how these things fit together.

Rolling our own layer

As mentioned above, we were using Snappy in our PHP application for converting HTML to PDF. However, Snappy has a dependency on having the wkhtmltopdf binary installed on the server the PHP application is running on.

Bref provides the base PHP layer which allows us to run our application code, but it doesn’t provide the wkhtmltopdf binary that we needed for doing the actual HTML-to-PDF conversion.

When we checked the wkhtmltopdf downloads page we saw that there was no Amazon Linux native version available. Therefore we had a few options open to us, which were:

  1. Find a pre-compiled Lambda-compatible binary on the internet & include it with our application code
  2. Find a pre-compiled Lambda-compatible binary on the internet & create our own Lambda layer from it
  3. Compile the wkhtmltopdf binary ourselves & create a layer from it

We were nervous about finding pre-compiled binaries on the internet & including those in our code for a couple of reasons:

  1. We couldn’t ever be 100% sure that whoever created the binary was completely trustworthy
  2. The Lambda-compatible pre-compiled binaries we found were all at least one minor version out-of-date

As you’ve probably guessed by the title of this article, we went for option 3. I was excited about this because although I’ve been contributing to Bref for a while, I’d not yet compiled any layers myself (despite it being one of the most-asked questions in the Bref Slack community). I should warn you that the first time we did this it took several hours and we went down the wrong road a few times. However, we got it working in the end. This is how we achieved our goal.

Beginning the journey

wkhtmltopdf has a separate repo for building and packaging the binary, along with instructions. Because the packaging scripts run their builds inside Docker containers (one image per target distribution), we need to make sure we’ve got Docker installed and running (as per the wkhtmltopdf instructions).

We also need to make sure we’ve got python-yaml installed (do pip install pyyaml if you’re on a Mac), along with git and p7zip.
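Before starting a build that can take hours, it may be worth confirming these prerequisites in one go. The sketch below is a hypothetical preflight helper, not part of the wkhtmltopdf tooling; it assumes p7zip provides a 7z command (some systems ship only 7za) and that the packaging scripts run under python3.

```shell
#!/bin/sh
# Preflight sketch: check for the tools the wkhtmltopdf packaging build needs.
# Assumptions: p7zip provides a `7z` command (some systems only ship `7za`),
# and the packaging scripts run under `python3`.
set -u

# have NAME: succeed if NAME resolves to a command on PATH (or a builtin).
have() {
  command -v "$1" >/dev/null 2>&1
}

# preflight: print every missing prerequisite to stderr; return 1 if any.
preflight() {
  missing=0
  for tool in docker git 7z python3; do
    if ! have "$tool"; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  # Docker must be running, not just installed; `docker info` fails
  # when the daemon cannot be reached.
  if have docker && ! docker info >/dev/null 2>&1; then
    echo "docker is installed but the daemon is not reachable" >&2
    missing=1
  fi
  # PyYAML is imported under the module name `yaml`.
  if have python3 && ! python3 -c 'import yaml' >/dev/null 2>&1; then
    echo "missing: PyYAML (pip install pyyaml)" >&2
    missing=1
  fi
  return "$missing"
}
```

Running preflight && echo "ready to build" prints each missing piece to stderr, so you can fix everything before kicking off the build.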

The build process requires you to have both the packaging and source repositories available on your host machine, so first create a wkhtmltopdf-build directory to keep everything together.

mkdir wkhtmltopdf-build && cd wkhtmltopdf-build

Next, clone both repositories into the wkhtmltopdf-build parent directory.

git clone git@github.com:wkhtmltopdf/packaging.git
git clone git@github.com:wkhtmltopdf/wkhtmltopdf.git

This should result in the following directory structure:

wkhtmltopdf-build/
    |- packaging
    |- wkhtmltopdf 

Making sure it’s self-contained

We need to create a static build, which means that wkhtmltopdf's dependencies (most importantly Qt) are compiled into the binary itself (basically, we can’t rely on the Amazon Linux OS to provide them for us). These are provided by wkhtmltopdf as ‘linked’ Git repositories (submodules). To fetch them, cd into the wkhtmltopdf directory and run:

git submodule update --init --recursive

This pulls the Qt library (in the patched form wkhtmltopdf needs) into the project.
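A plain git submodule update skips submodules that were never initialized, so it may be worth confirming that qt was actually fetched before going further. git submodule status prefixes an uninitialized submodule's line with '-'; the hypothetical helper below just filters for those lines.

```shell
#!/bin/sh
# Sketch: list submodules that are still uninitialized.
# `git submodule status` prints one line per submodule, prefixed with '-'
# when the submodule has not been initialized (its directory is empty).
set -eu

# Reads `git submodule status` output on stdin; prints each uninitialized path.
uninitialized_submodules() {
  awk '/^-/ { print $2 }'
}
```

From inside the wkhtmltopdf directory, git submodule status | uninitialized_submodules should print nothing once Qt has been pulled in.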

Convincing wkhtmltopdf/packaging to recognise Amazon Linux

By far the hardest part of creating this whole thing was figuring out how to use the ./build script to build our own binary. Once we’d figured out the rules we were able to boil it down to a few simple steps:

Everything we do from now on will be within the packaging directory, so let’s go straight there (we’re currently inside wkhtmltopdf):

cd ../packaging

Now we need to edit build.yaml and add our own Amazon Linux config. Amazon Linux is based on Red Hat / CentOS, and luckily there’s already a centos64 config in the file, so I should be able to copy and modify that.

So first I’ll create an entry called amazonLinux64. I can leave the source directive the same as the centos64 source, as the packages should be compatible.

However, I need to change the from argument, which sets the base image, to a version of the Amazon Linux OS. The community-maintained lambci/lambda Docker images replicate the Lambda environment, so we can just use one of those instead.

  amazonLinux64:
    source: docker/Dockerfile.centos64
    args:
      from: lambci/lambda:build

Everything else can be left as it is, so my final amazonLinux64 config should be as follows:

  amazonLinux64:
    source: docker/Dockerfile.centos64
    args:
      from: lambci/lambda:build
    output: rpm
    arch:   x86_64
    depend: >
      ca-certificates
      fontconfig
      freetype
      glibc
      libjpeg
      libpng
      libstdc++
      libX11
      libXext
      libXrender
      openssl
      xorg-x11-fonts-75dpi
      xorg-x11-fonts-Type1
      zlib   

Getting the engine running

Next we need to build the Docker image that will do the actual compilation of the wkhtmltopdf binary, which the build script handles for us:

./build docker-images amazonLinux64

Kicking off the build

Now we’re ready to actually make our binary. The build script provides a compile-docker command, which compiles the source code inside our Amazon Linux container. Note that there is also a package-docker command, which would create a .rpm file for us (or a .deb for a Debian / Ubuntu target), but we want the actual binary rather than an installer package.

It takes the name of the build target (amazonLinux64), the relative path to the source code (../wkhtmltopdf), and the directory where you want the output to be put. I chose to call it wkhtmltopdf-bin.

./build compile-docker amazonLinux64 ../wkhtmltopdf wkhtmltopdf-bin

What happens next is that Docker starts a container, installs all the dependencies, and runs the compiler. This could take some time, so it’s a good idea to kick this off just before you go to lunch or to bed…
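The depend list in the build.yaml entry names system libraries such as fontconfig and libXrender, which suggests the finished binary still expects some shared libraries at runtime. An optional check, sketched here under the assumption that ldd is available where you run it: filter ldd's output for libraries the system cannot resolve. Run it somewhere that mirrors the Lambda runtime, because on the build image everything may appear to resolve.

```shell
#!/bin/sh
# Sketch: print the shared libraries a binary needs that this system lacks.
# ldd marks unresolved dependencies as "libname.so.N => not found".
set -eu

# Reads ldd output on stdin; prints the name of each unresolved library.
missing_libs() {
  awk '/=> not found/ { print $1 }'
}
```

Usage: ldd "$BIN" | missing_libs, where BIN is a hypothetical variable holding the path to the compiled binary. Empty output means every dependency resolved; for a fully static binary, ldd instead reports that it is not a dynamic executable, and the filter prints nothing.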

Creating a layer from the binary

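Once the wkhtmltopdf binary has been produced (you should be able to find it in wkhtmltopdf-bin/wkhtmltox/bin/), you will need to package it up into a layer for inclusion in your Lambda. We don’t care about the wkhtmltoimage binary it also produced, so let’s just zip up the single PDF binary. Run zip from inside that bin directory so the binary sits at the root of the zip file; Lambda extracts layers into /opt, so it will end up at /opt/wkhtmltopdf.

cd wkhtmltopdf-bin/wkhtmltox/bin
zip wkhtmltopdf.zip wkhtmltopdf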
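If you would rather not run zip from inside the bin directory, Info-ZIP's -j (junk paths) option stores only the file name, which gives the same layout: the binary at the root of the archive, and so at /opt/wkhtmltopdf once Lambda extracts the layer. A minimal sketch, assuming zip is installed; package_layer is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: package a compiled binary as a Lambda layer zip.
# Assumes Info-ZIP's `zip` is installed. package_layer is a hypothetical helper.
set -eu

# package_layer BINARY ZIPFILE
package_layer() {
  bin_path=$1
  zip_out=$2
  # Lambda has to execute the file, so refuse anything that isn't an
  # executable regular file.
  if [ ! -f "$bin_path" ] || [ ! -x "$bin_path" ]; then
    echo "not an executable file: $bin_path" >&2
    return 1
  fi
  # zip adds to an existing archive, so start from a fresh one.
  rm -f "$zip_out"
  # -j stores just the file name, so the binary sits at the zip root
  # and Lambda extracts it to /opt/<name>.
  zip -j -q "$zip_out" "$bin_path"
}
```

For example, package_layer path/to/wkhtmltopdf wkhtmltopdf.zip, after which unzip -l wkhtmltopdf.zip should list a single wkhtmltopdf entry with no directory prefix.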

Now we need to upload the zip file to AWS for inclusion in our Lambda functions. You can use the Lambda console if you prefer a visual interface, but considering we’ve done everything else through the command line here, we might as well use the AWS CLI. Note the fileb:// prefix, which tells the CLI to read the zip as a binary file:

aws lambda publish-layer-version --layer-name mybuilder-wkhtmltopdf --description "standalone wkhtmltopdf binary" --zip-file fileb://wkhtmltopdf.zip

The command will send you back a response in JSON. The thing to pay attention to is the LayerVersionArn:

{
  "LayerVersionArn": "arn:aws:lambda:eu-west-1:<redacted>:layer:mybuilder-wkhtmltopdf:1"
}

(Note that there’s also a LayerArn, but we need the LayerVersionArn because it points to the actual version of the layer we created)
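Since a LayerVersionArn is just the LayerArn with :<version> appended, both can be derived from one string with POSIX parameter expansion. A small sketch; layer_arn and layer_version are hypothetical helper names.

```shell
#!/bin/sh
# Sketch: split a LayerVersionArn into its LayerArn and version number.
# A layer version ARN is the layer ARN with ":<version>" appended.
set -eu

layer_arn() {
  printf '%s\n' "${1%:*}"    # drop the trailing ":<version>"
}

layer_version() {
  printf '%s\n' "${1##*:}"   # keep only what follows the last ':'
}
```

To capture the ARN in the first place, the AWS CLI's global --query and --output options can be added to the publish command: --query LayerVersionArn --output text prints only that field, ready to assign to a shell variable.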

Now, in my template.yaml file I can include the layer we just published under my layers configuration along with the base PHP layer provided by Bref:

Resources:
    PdfService:
        Type: AWS::Serverless::Function
        Properties:
            # ...
            Layers:
                - 'arn:aws:lambda:eu-west-1:<redacted>:layer:mybuilder-wkhtmltopdf:1'
                - 'arn:aws:lambda:eu-west-1:209497400698:layer:php-73:7'

How to call the binary

AWS makes the content of your layers accessible in /opt. So we need to tell Snappy where to find the binary by pointing it to our own compiled version:

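<?php

require __DIR__ . '/vendor/autoload.php';

use Knp\Snappy\Pdf;

// Lambda extracts layers into /opt, so our binary lives at /opt/wkhtmltopdf
$snappy = new Pdf('/opt/wkhtmltopdf');
$pdf = $snappy->getOutputFromHtml('<h1>Hello from Lambda</h1>'); // raw PDF bytes as a string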

What you do with the PDF after it has been generated is up to you. One idea might be to push it to an S3 bucket so you can download it later on, or you could email it directly to your clients from within your Lambda service (AWS SES might come in handy for that).

Finishing up

Here we’ve delved into our first attempt at compiling a binary specifically for the Amazon Linux environment, and then creating a layer from that binary which we can include and use within our Lambda functions. Packaging the layer was far less complicated than compiling the binary.

We found this to be a really useful exercise for understanding how to create layers for Lambda, and how to use them in serverless PHP applications on top of Bref. This use case is about as complicated as building a layer gets, and of course I’ve omitted implementation details such as how we received the HTML we wanted to convert to PDF and how we decoded the event. Those aspects are better covered by general serverless articles and the AWS documentation.

Obviously for this project we needed to include the wkhtmltopdf program, but many use cases for extra layers don’t require any extra software; they might just need to alter the runtime configuration of how PHP is invoked. Those cases are much simpler than what we’ve covered here, because you can skip the entire compilation step and just zip up your own runtime configuration (we’ll follow up with another post about that!).

Go forth, and PDF-ify.
