June 27, 2023

How to Create an AWS Lambda Layer for Pandas with Docker + Python on macOS

Pandas is an extremely useful and popular library for data manipulation and processing. In this post, I would like to highlight a simple and straightforward method for generating a Lambda layer that contains it.

AWS Lambda Layers provide a robust framework for dependency management. Instead of bundling the dependencies with the Lambda code, which increases the size of the Lambda itself, you keep them in a layer and manage them separately from the code.

What is a Lambda Layer

A layer is just a set of libraries that is attached to a Lambda. This allows the libraries to exist independently of the Lambda code. It also allows two different Lambdas that use the same set of dependencies to share a single layer, instead of bundling the dependencies with each Lambda.

You can read more on Lambda Layers here: https://docs.aws.amazon.com/lambda/latest/dg/configuration-layers.html
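To make the mechanism a little more concrete: Lambda unpacks attached layers under /opt, and for Python runtimes the /opt/python folder is on the import path. The snippet below is a minimal sketch that lists which import-path entries come from that location. The function name layer_entries is just an illustrative choice, not part of any AWS API.

```python
import sys
from pathlib import PurePosixPath

# Lambda unpacks every attached layer under /opt.
LAYER_ROOT = PurePosixPath("/opt")


def layer_entries(search_path):
    """Return the entries of an import search path that live under /opt."""
    found = []
    for entry in search_path:
        if not entry:
            continue  # an empty string means "current directory"
        path = PurePosixPath(entry)
        if path == LAYER_ROOT or LAYER_ROOT in path.parents:
            found.append(entry)
    return found


if __name__ == "__main__":
    # Inside a Lambda function with a layer attached this is expected to
    # include /opt/python; on a laptop it will usually print an empty list.
    print(layer_entries(sys.path))
```

Printing this from inside a function is a quick way to confirm that a layer is actually attached before debugging import errors.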

Requirements

For this, we need to have Docker installed on our system. If you don't have it already, please visit Docker's website and download the relevant package for your system.

I personally find blogs with steps and screenshots the most helpful when following along, so I'll try to keep to the same approach here.

Step 1: Create a Dockerfile

Since we will be using a Docker container to create the layer, we first need to build the container image.

You can choose to use any pre-existing container, but it's better to have one that has all the prerequisites in place.

So go ahead and create a new directory and copy the following Dockerfile into it:

Now let's go through the contents of the Dockerfile one by one.

FROM amazonlinux:2.0.20230515.0

This line fetches the base container image, amazonlinux:2.0.20230515.0. Amazon Linux 2 is the operating system behind the Lambda Python 3.8 runtime, so compiled packages such as Pandas that are built on this image match what Lambda expects.

RUN ulimit -n 1024 && yum -y update && yum -y install \
    amazon-linux-extras 

RUN amazon-linux-extras enable python3.8
RUN ulimit -n 1024 && yum -y update && yum -y install \
    python38 \
    python38-pip \
    python38-devel \
    zip \
    && yum clean all

RUN rm -rf /python/lib/python3.8/site-packages/__pycache__
RUN python3.8 -m pip install --no-cache-dir pip==22.1.2
RUN pip install --no-cache-dir virtualenv==20.14.1

These commands install Python 3.8 and the pip version used for creating the layer.

While preparing the container, you can change the commands depending on the versions of Python and pip you require.

Step 2: Create a requirements file

Now we need to specify the libraries we want to add to the layer. For this example there is only one library that we need, and that is Pandas.

Go ahead and add it to a simple text file named requirements.txt.

I have pinned the version of the library to control which version gets installed. This can help avoid dependency issues.
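If the requirements file grows beyond one library, it is easy to forget a pin. Here is a small sketch that flags lines without an exact version. The helper name unpinned_requirements is hypothetical, the pattern only covers the simple "name==version" form, and the version number in the sample is a placeholder for illustration, not the one used in this post.

```python
import re

# Matches lines such as "pandas==1.5.3": a package name, "==", then a version.
PINNED = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*==[A-Za-z0-9.]+$")


def unpinned_requirements(text):
    """Return the requirement lines that do not pin an exact version."""
    problems = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if line and not PINNED.match(line):
            problems.append(line)
    return problems


if __name__ == "__main__":
    # The version number here is only a placeholder for illustration.
    sample = "pandas==1.5.3\nnumpy\n"
    print(unpinned_requirements(sample))
```

An empty list means every line is pinned, so rebuilding the layer later should install the same versions.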

Step 3: Create a bash file

Now you can go ahead and generate the Docker image using the following command:

docker build -t test_image .

But to automate the process of installing the dependencies and producing a nice little zip file that you can just upload to the Lambda console, the following script (runner.sh) is available:

On line 8, this script depends on another script named docker_install.sh.

Here is the code for that script:

As you can see, docker_install.sh actually runs the commands within the container that download and install the library dependencies. Once the libraries are downloaded, it bundles them into a nice little zip file named python.zip, ready to be uploaded as a Lambda layer.
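The part that matters most in the bundling step is the layout of the archive: for Python runtimes, Lambda looks for packages under a top-level python/ folder inside the zip. The snippet below is not docker_install.sh; it is only a Python sketch of that layout rule, assuming the packages were already installed into a folder named python inside some build directory. The names zip_layer and build_dir are made up for this illustration.

```python
import zipfile
from pathlib import Path


def zip_layer(build_dir, zip_path):
    """Zip build_dir/python into zip_path, keeping the leading python/ folder."""
    build_dir = Path(build_dir)
    package_root = build_dir / "python"
    if not package_root.is_dir():
        raise FileNotFoundError(f"expected a 'python' folder inside {build_dir}")

    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for file in sorted(package_root.rglob("*")):
            relative = file.relative_to(build_dir)
            # Skip bytecode caches; they only add weight to the layer.
            if file.is_file() and "__pycache__" not in relative.parts:
                # Paths are stored relative to build_dir, so they start with python/
                archive.write(file, relative.as_posix())
    return Path(zip_path)
```

Calling zip_layer("build", "python.zip") would then produce entries such as python/pandas/__init__.py, which is the shape Lambda expects.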

Step 4: Execute!

Now, make sure that you have all the files in the same folder

The directory structure should be something like this

layer_folder
  - Dockerfile
  - docker_install.sh
  - requirements.txt
  - runner.sh

Once you have the files in place, just execute the runner.sh through the following command:

./runner.sh

You should not need sudo privileges for this.

You should now see the output of the execution on stdout, that is, in your terminal session.

If you want to have logs for the execution stored in a logfile, you can use the following command:

./runner.sh >> logs.txt

This will append the execution events to the log file.

Assuming you have successfully executed the scripts, you should have a python.zip file in the same directory.

This is the zip file we need to upload to the AWS Lambda console.
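Before uploading, it can save a round trip to sanity-check the archive. The sketch below checks two things: that every entry sits under python/, and that the sizes are within the Lambda quotas as I understand them at the time of writing (50 MB zipped for a direct upload, 250 MB unzipped for a function plus its layers). The function name check_layer_zip is my own choice for this example.

```python
import os
import zipfile

# Lambda quotas at the time of writing; check the AWS documentation for current values.
MAX_DIRECT_UPLOAD_BYTES = 50 * 1024 * 1024
MAX_UNZIPPED_BYTES = 250 * 1024 * 1024


def check_layer_zip(zip_path):
    """Return a list of human-readable problems found in a layer archive."""
    problems = []
    with zipfile.ZipFile(zip_path) as archive:
        infos = archive.infolist()
    if not infos:
        problems.append("archive is empty")
    if any(not info.filename.startswith("python/") for info in infos):
        problems.append("some entries are not under python/")
    if sum(info.file_size for info in infos) > MAX_UNZIPPED_BYTES:
        problems.append("unzipped size exceeds 250 MB")
    if os.path.getsize(zip_path) > MAX_DIRECT_UPLOAD_BYTES:
        problems.append("zip exceeds 50 MB; upload it through S3 instead")
    return problems
```

An empty list from check_layer_zip("python.zip") means the archive passed these basic checks; it does not guarantee that the packages inside will import correctly.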

Now head on over to the AWS Management Console and open the AWS Lambda console.

We need to navigate to the Layers dashboard.

As you can see, there are no layers as of yet. We will be creating one in just a moment.

Click on Create Layer and let's upload the layer.

While uploading, you can either upload the zip file here, or keep it in an S3 bucket and provide the S3 URL of the object.

For the sake of example, we are going to upload it directly here.

Thus, select the file in your system and enter the following:

Since we built the layer for the x86_64 architecture and it only supports Python 3.8, we select those values in the corresponding fields.

Finally, press Create and this will create your first version of the layer.
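If you would rather script this than click through the console, the same upload can be done with boto3's publish_layer_version call. This is only a sketch under a few assumptions: boto3 is installed, AWS credentials are configured, and the helper names build_publish_request and publish_layer are made up for this example. The runtime and architecture values mirror the ones chosen in the console.

```python
def build_publish_request(layer_name, zip_bytes=None, s3_bucket=None, s3_key=None):
    """Build keyword arguments for boto3's Lambda publish_layer_version call."""
    if zip_bytes is not None:
        content = {"ZipFile": zip_bytes}  # direct upload of the archive bytes
    elif s3_bucket and s3_key:
        content = {"S3Bucket": s3_bucket, "S3Key": s3_key}  # archive stored in S3
    else:
        raise ValueError("provide zip_bytes or both s3_bucket and s3_key")
    return {
        "LayerName": layer_name,
        "Content": content,
        "CompatibleRuntimes": ["python3.8"],
        "CompatibleArchitectures": ["x86_64"],
    }


def publish_layer(layer_name, zip_path):
    """Publish zip_path as a new layer version and return its ARN."""
    import boto3  # third-party; imported here so the helper above has no dependencies

    with open(zip_path, "rb") as handle:
        request = build_publish_request(layer_name, zip_bytes=handle.read())
    response = boto3.client("lambda").publish_layer_version(**request)
    return response["LayerVersionArn"]
```

Something like publish_layer("pandas-layer", "python.zip") would then return the ARN of the new layer version, where "pandas-layer" is just a placeholder name.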

Now you just need to add this layer to your Lambda function and it's good to be used :)
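To check that the layer works once it is attached, a tiny handler along these lines should be enough. It is a minimal sketch: the event shape (a "rows" list of dictionaries) is an assumption for the example, and the import sits inside the handler only so that the snippet can be loaded on a machine without Pandas. In a real function you would normally import it at the top of the file.

```python
def lambda_handler(event, context):
    # pandas comes from the attached layer, not from the function's own package.
    # It is imported here only so this sketch loads on machines without pandas.
    import pandas as pd

    # "rows" is a hypothetical event field holding a list of dictionaries.
    rows = event.get("rows", [])
    frame = pd.DataFrame(rows)
    return {
        "statusCode": 200,
        "row_count": len(frame),
        "columns": list(frame.columns),
    }
```

If the layer is missing or built for the wrong runtime, the import line is where the function will fail, which makes the error easy to spot in the logs.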

Hope this post was helpful to you!

Feel free to comment down below whatever you think might be missing and stay tuned for further blogs such as this :)