Posted by Neal Brooks on Jun 27, 2019
Compiling wkhtmltopdf for use inside an AWS Lambda function with Bref is easier than you'd think
Here at MyBuilder we recently had the chance to work on a green-field project, which of course meant we were able to play with all the shiny new toys we’ve been dying to try out for a while.
We needed to extract a standalone PDF generation service from some work we’d already done in another project. It was a perfect candidate for running inside AWS Lambda, and thus also finally giving Bref a try in production.
The previous project had used Snappy for generating reports based on templated HTML output, and we have a bunch more projects coming up which would also use the same solution.
Because we knew we wouldn’t be able to predict the volume of traffic to the service at any given time (although night-times are likely to see zero traffic), we knew it would need to be able to quickly scale both up and down. In order to keep things clean and separate, we also wanted to avoid putting this service on our regular web servers.
FaaS / Serverless
FaaS / serverless is a great candidate for these kinds of services (Amazon’s FaaS offering being known as Lambda). With FaaS you push your code into the cloud where it sits waiting for an event of some sort to come in (this could be an HTTP request, or an event such as an item being added to a queue, or any number of things AWS lets you react to). When the event you’re waiting for happens, the FaaS provider invokes your code for you and returns the response.
The key point is that while your code is sitting waiting for an event to happen, you don’t get charged for it. You only pay for the resources your code used while it was actually running. Furthermore, the cost of invocation is tiny. Conveniently, the FaaS provider will also handle scaling for you.
Considering that entire websites tend to just sit idle on servers waiting for the next web request to come in, using this approach means you could effectively shut down all your web servers, largely eliminating your hosting costs. For this use-case, however, we simply needed to react to events coming through Amazon SNS.
Serverless PHP
Bref is an open-source project which aims to help PHP developers get their applications running on Lambda. Running PHP on Lambda leverages a system AWS introduced at the end of 2018, called layers.
The layer system sounds confusing at first, but the essential take-away is that a layer acts as a program or binary that you can ‘install’ into your Lambda function as a dependency, simply by including it in your FaaS config. You can include multiple layers within your Lambda function (up to 5).
I’m going to assume a base level of familiarity with Lambda and layers for the rest of this article. If you’re finding it difficult to follow the concepts, I strongly advise you to try deploying a sample app with Bref first to help you get an understanding of how these things play together.
Rolling our own layer
As mentioned above, we were using Snappy in our PHP application for converting HTML to PDF. However, Snappy has a dependency on having the wkhtmltopdf binary installed on the server the PHP application is running on.
Bref provides the base PHP layer which allows us to run our application code, but it doesn’t provide the wkhtmltopdf binary that we needed for doing the actual HTML-to-PDF conversion.
When we checked the wkhtmltopdf downloads page we saw that there was no Amazon Linux native version available.
Therefore we had a few options open to us, which were:
- Find a pre-compiled Lambda-compatible binary on the internet & include it with our application code
- Find a pre-compiled Lambda-compatible binary on the internet & create our own Lambda layer from it
- Compile the wkhtmltopdf binary ourselves & create a layer from it
We were nervous about finding pre-compiled binaries on the internet & including those in our code for a couple of reasons:
- We couldn’t ever be 100% sure that whoever created the binary was completely trustworthy
- The Lambda-compatible pre-compiled binaries we found were all at least one minor version out-of-date
As you’ve probably guessed by the title of this article, we went for option 3. I was excited about this because although I’ve been contributing to Bref for a while, I’d not yet compiled any layers myself (despite it being one of the most-asked questions in the Bref Slack community). I should warn you that the first time we did this it took several hours and we went down the wrong road a few times. However, we got it working in the end. This is how we achieved our goal.
Beginning the journey
wkhtmltopdf has a separate repo for building and packaging the binary, along with instructions. Because we know that we’re building for AmazonLinux, we need to make sure we’ve got Docker installed & running (as per the wkhtmltopdf instructions).
We also need to make sure we’ve got python-yaml installed (do pip install pyyaml if you’re on a Mac), along with git and p7zip.
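Before starting, it can save time to confirm these tools are actually on your PATH. The following is a minimal sketch rather than part of the wkhtmltopdf instructions: missing_tools is a hypothetical helper name, and which command names you pass to it is up to you (p7zip, for example, installs its executable as 7z, 7za, or 7zr depending on the platform).

```shell
# Sketch: print each named command that cannot be found on the PATH.
# Returns 0 when everything is present, 1 otherwise.
# missing_tools is a hypothetical helper, not part of wkhtmltopdf/packaging.
missing_tools() {
  mt_status=0
  for mt_tool in "$@"; do
    if ! command -v "$mt_tool" >/dev/null 2>&1; then
      printf '%s\n' "$mt_tool"
      mt_status=1
    fi
  done
  return "$mt_status"
}
```

You might call it as missing_tools docker git 7z, and separately run python -c 'import yaml' to confirm that PyYAML is importable.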
The build process requires you to have both the packaging and source repositories available on your host machine, so first create a wkhtmltopdf-build directory to keep everything together.
mkdir wkhtmltopdf-build && cd wkhtmltopdf-build
Next, clone both repositories into the wkhtmltopdf-build parent directory.
git clone git@github.com:wkhtmltopdf/packaging.git
git clone git@github.com:wkhtmltopdf/wkhtmltopdf.git
This should result in the following directory structure:
wkhtmltopdf-build/
|- packaging
|- wkhtmltopdf
Making sure it’s self-contained
We need to create a static build, which means that all of the dependencies for wkhtmltopdf are self-contained within the binary itself (basically, we can’t rely on the AmazonLinux OS to provide anything for us). These are provided by wkhtmltopdf as ‘linked’ Git repositories (submodules). To install them, cd into the wkhtmltopdf directory and run:
git submodule update --init --recursive
This pulls the qt library into the project.
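If the qt directory still looks empty after this step, the submodule may not have been initialised. As a rough check (a sketch, with uninitialised_submodules being a hypothetical helper name), git submodule status prefixes every submodule that has not been initialised with a minus sign, so those lines can be filtered out:

```shell
# Sketch: read `git submodule status` output on stdin and print the path of
# every submodule that has not been initialised (lines starting with "-").
# uninitialised_submodules is a hypothetical helper name.
uninitialised_submodules() {
  awk '/^-/ { print $2 }'
}

# Intended use, run from the wkhtmltopdf-build directory:
#   git -C wkhtmltopdf submodule status | uninitialised_submodules
```

No output means every submodule has been initialised; any path that is printed can be fetched with git submodule update --init.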
Convincing wkhtmltopdf/packaging to recognise Amazon Linux
By far the hardest part of creating this whole thing was figuring out how to use the ./build script to build our own binary. Once we’d figured out the rules we were able to boil it down to a few simple steps.
Everything we do from now on will be within the packaging directory, so let’s just go straight into there:
cd packaging
Now we need to edit build.yaml and add our own AmazonLinux config. I know that AmazonLinux is based upon RedHat / CentOS, and luckily there’s already a centos64 config in the file, so I should just be able to copy & modify that.
So first I’ll create an entry called amazonLinux64. I can leave the source directive the same as the centos64 source, as the packages should be compatible. However, I need to change the from container to be a version of the AmazonLinux OS. Docker images that replicate the Lambda environment are published as lambci/lambda, so we can just use the build variant of those instead.
amazonLinux64:
  source: docker/Dockerfile.centos64
  args:
    from: lambci/lambda:build
Everything else can be left as it is, so my final amazonLinux64 config should be as follows:
amazonLinux64:
  source: docker/Dockerfile.centos64
  args:
    from: lambci/lambda:build
  output: rpm
  arch: x86_64
  depend: >
    ca-certificates
    fontconfig
    freetype
    glibc
    libjpeg
    libpng
    libstdc++
    libX11
    libXext
    libXrender
    openssl
    xorg-x11-fonts-75dpi
    xorg-x11-fonts-Type1
    zlib
Getting the engine running
Next we need to create the Docker image in which the actual compilation of the wkhtmltopdf binary will happen. The build script provides a command for this:
./build docker-images amazonLinux64
Kicking off the build
Now we’re ready to actually make our binary. The build script provides us with a compile-docker option which takes the source code and compiles it inside our AmazonLinux container. Note that there is a package-docker command which would create a .rpm file for us (or .deb if it was a Debian / Ubuntu system), but we want the actual binary rather than an installer.
It asks for the name of the build target (amazonLinux64), the relative path to the source code (../wkhtmltopdf), and where you want the binary to be put. I chose to call it wkhtmltopdf-bin.
./build compile-docker amazonLinux64 ../wkhtmltopdf wkhtmltopdf-bin
What happens next is that Docker spins up an instance, installs all the dependencies, and runs the compiler. This could take some time, so it’s a good idea to kick this off just before you go to lunch or to bed…
Creating a layer from the binary
Once the wkhtmltopdf binary has been produced (you should be able to find it in wkhtmltopdf-bin/wkhtmltox/bin/), you will need to package it up into a layer for inclusion in your Lambda. We don’t care about the wkhtmltoimage binary it also produced, so let’s just zip up the single PDF binary from inside that bin directory.
zip -r wkhtmltopdf.zip wkhtmltopdf
Now we need to upload the zip file to AWS for inclusion in our Lambda functions. You can use the Lambda console if you prefer a visual interface, but considering we’ve done everything else through the command line here we might as well use the AWS CLI:
aws lambda publish-layer-version --layer-name mybuilder-wkhtmltopdf --description "standalone wkhtmltopdf binary" --zip-file fileb://wkhtmltopdf.zip
The command will send you back a response in JSON. The thing to pay attention to is the LayerVersionArn:
{
"LayerVersionArn": "arn:aws:lambda:eu-west-1:<redacted>:layer:mybuilder-wkhtmltopdf:1"
}
(Note that there’s also a LayerArn, but we need the LayerVersionArn because it points to the actual version of the layer we created)
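To make the difference concrete, here is a small sketch. The helper names and the 123456789012 account ID are placeholders for illustration; the only assumption is the ARN shape shown in the response above, where the version is the final colon-separated field.

```shell
# Sketch: derive the LayerArn and the version number from a LayerVersionArn.
# Both helper names are hypothetical.
layer_arn_of() {
  # Drop the trailing ":<version>" suffix.
  printf '%s\n' "${1%:*}"
}

layer_version_of() {
  # Keep only what follows the last colon.
  printf '%s\n' "${1##*:}"
}

# Placeholder account ID, for illustration only.
example_arn='arn:aws:lambda:eu-west-1:123456789012:layer:mybuilder-wkhtmltopdf:1'
layer_arn_of "$example_arn"      # arn:aws:lambda:eu-west-1:123456789012:layer:mybuilder-wkhtmltopdf
layer_version_of "$example_arn"  # 1
```

If you only want the LayerVersionArn from the CLI, adding the AWS CLI’s global --query LayerVersionArn --output text options to the publish command should print it on its own.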
Now, in my template.yaml file I can include the layer we just published under my layers configuration along with the base PHP layer provided by Bref:
Resources:
  PdfService:
    Type: AWS::Serverless::Function
    Properties:
      # ...
      Layers:
        - 'arn:aws:lambda:eu-west-1:<redacted>:layer:mybuilder-wkhtmltopdf:1'
        - 'arn:aws:lambda:eu-west-1:209497400698:layer:php-73:7'
How to call the binary
AWS makes the content of your layers accessible in /opt. So we need to tell Snappy where to find the binary by pointing it to our own compiled version:
<?php
use Knp\Snappy\Pdf;
$snappy = new Pdf('/opt/wkhtmltopdf');
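The /opt/wkhtmltopdf path only holds if the binary sits at the top level of the zip, because each entry in the layer archive is unpacked relative to /opt. A minimal sketch of that mapping follows; layer_paths is a hypothetical helper, and it assumes you feed it the entry names of your archive, for example from unzip -Z1 wkhtmltopdf.zip.

```shell
# Sketch: read zip entry names on stdin (one per line) and print where each
# one would appear inside the Lambda environment once the layer is attached.
# layer_paths is a hypothetical helper name.
layer_paths() {
  while IFS= read -r lp_entry; do
    printf '/opt/%s\n' "$lp_entry"
  done
}

printf '%s\n' 'wkhtmltopdf' | layer_paths      # /opt/wkhtmltopdf
printf '%s\n' 'bin/wkhtmltopdf' | layer_paths  # /opt/bin/wkhtmltopdf
```

So if you had zipped the bin directory rather than the binary itself, the path passed to Snappy would need to change to match.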
What you do with the PDF after it has been generated is up to you. One idea might be to push it to an S3 bucket so you can download it later on, or maybe you could email it directly to your clients from within your Lambda service (AWS SES might come in handy for that).
Finishing up
Here we’ve delved into our first attempt at compiling a binary package specifically for the AmazonLinux environment, and then creating a layer for that package which we can include and use within our Lambda functions. Packaging the layer was far less complicated than compiling the binary.
We found this to be a really useful exercise for understanding how to create layers for Lambda, and how to use them in serverless PHP applications on top of Bref. This use-case was about as complicated as we could make it, and of course I’ve omitted implementation details like how we received the HTML we wanted to convert to PDF, and decoding the event. These aspects are better covered in general serverless articles and the AWS documentation.
Obviously for this project we needed to include the wkhtmltopdf program, but a lot of the use-cases for extra layers don’t require any extra software; they might just need to alter the runtime configuration of how PHP is invoked. Those cases are much simpler than what we’ve gone into here, because you can skip the entire compilation step and just zip up your own runtime configuration (we’ll follow up with another post about that!).
Go forth, and PDF-ify.