AWS Lambda is Not Always the Right Answer! 5 Things to Consider Before Developing Your Function

01 Feb 2019 by Justin Ramsey

If you want to run a server that is always available to make a minimum number of requests, we have this product you might have heard of called EC2 ~ u/Scarface74

After reading the Benefits and Use Cases of AWS Lambda, the Lambda Team might have you believing that there is very little AWS Lambda is not a good candidate for.

Although AWS Lambda is an extremely powerful tool, it is not always the right tool for the job. Take the following 5 things into consideration when developing your Lambda function so you don't fall for the AWS Lambda hype.

1. If your function is response time sensitive

Although a “hot” AWS Lambda function can have runtimes of as little as 18ms, “cold starts” are a serious problem for Lambda functions.

Similar to your car on a cold Canadian morning, a cold start occurs when your AWS Lambda function has not been run for a period of time and your function’s container is removed from the internal Lambda cache.

When you try to start your function again after that period of time, Lambda has to load your function into a new container, which can add hundreds of milliseconds (and $$) to your function’s runtime.

This cold start delay can become deadly if:

Your function is run inside of a VPC: The cold start time to attach your Lambda function to a VPC can be in the 5-10 second range
Your function is written in Java or C#: The cold start time to initialize the JVM or CLR can add seconds if you are not running on a high-memory Lambda function

If your client is waiting 10 seconds for your cold lambda function to finally turn over, they may decide to click the back button and go somewhere else with their time and money.

You can “pre-warm” your function by automatically calling it periodically, but make sure you understand that you’re only pre-warming one instance.
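As a minimal sketch of the pre-warming idea, the handler below returns early when it sees a warm-up ping. It assumes the scheduled rule sends a custom payload containing a "warmup" key; that key name and the do_real_work helper are hypothetical, not anything Lambda defines.

```python
import time

# Module-level code runs once per container, i.e. on the cold start.
CONTAINER_STARTED_AT = time.time()


def do_real_work(event):
    """Hypothetical placeholder for the function's actual logic."""
    return {"statusCode": 200, "body": "hello"}


def handler(event, context):
    # "warmup" is an assumed custom key sent by the scheduled ping,
    # not something Lambda itself adds to the event.
    if isinstance(event, dict) and event.get("warmup"):
        # Return immediately so the ping costs as little as possible.
        return {
            "warmed": True,
            "container_age_s": time.time() - CONTAINER_STARTED_AT,
        }
    return do_real_work(event)
```

Each ping only keeps the one container that served it warm, so a burst of concurrent requests can still land on cold containers.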

2. If your function runs for more than 15 minutes

AWS Lambda can be an extremely powerful tool for managing background processing. Simply add a CloudWatch Cron Event Trigger (or Schedule Event in Serverless) and AWS will automatically run your function every XX minutes/hours/days/weeks.

I’ve found this to be extremely useful for ETL jobs:

Move data from this database to this database
Create this long-running results table
Transform this csv file to json
Check accuracy between these data sources

As well as misc maintenance tasks:

Turn off these services during the Redshift maintenance window
Check free disk space in a certain folder on the instance
Take and check backups of various instances and servers

You have to be careful, however, that your function does not need to run for more than 15 minutes at a time. This is a hard limit that cannot be increased by contacting support.
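One way to stay under the limit is to check how much time is left and stop cleanly before the timeout hits. The sketch below uses the get_remaining_time_in_millis() method of the Lambda context object; the 30 second safety margin, the "items" and "resume_from" event fields, and process_item are assumptions made for illustration.

```python
SAFETY_MARGIN_MS = 30_000  # assumed buffer; tune for your workload


def process_item(item):
    """Hypothetical placeholder for one unit of work."""
    return item * 2


def handler(event, context):
    items = event["items"]
    start = event.get("resume_from", 0)  # assumed checkpoint field
    results = []
    for index in range(start, len(items)):
        # get_remaining_time_in_millis() is provided by the Lambda context.
        if context.get_remaining_time_in_millis() < SAFETY_MARGIN_MS:
            # Stop cleanly and report where a follow-up run should resume.
            return {"done": False, "resume_from": index, "results": results}
        results.append(process_item(items[index]))
    return {"done": True, "resume_from": len(items), "results": results}
```

A follow-up invocation can pass the returned resume_from value back in to continue where the previous run stopped. If the work cannot be split into resumable units like this, Lambda is probably the wrong tool for it.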

3. If your API Gateway function runs for more than 29s

A much tighter limit that you will run into when developing an AWS Lambda function behind an API Gateway is API Gateway’s 29 second timeout.

Although 29 seconds is usually enough to handle most CRUD requests, sometimes you want to be able to run an analytics query from your website dashboard that takes 32 seconds instead of 29.

The proper way to handle web requests that take longer than 29 seconds would be to set up some sort of job management framework where the client submits a job, gets a token for that job, and then uses that token to poll the service until the job finishes in the background and the results are available.
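The submit, token, and poll flow can be sketched roughly as follows. The function names and the in-memory JOBS dictionary are hypothetical and for illustration only; a real deployment would need a durable store, since Lambda containers do not share memory.

```python
import uuid

# In-memory store for illustration only; real Lambda containers do not
# share memory, so this would have to live in a durable store.
JOBS = {}


def submit_job(query):
    """Register a job and return a token the client can poll with."""
    token = str(uuid.uuid4())
    JOBS[token] = {"status": "pending", "query": query, "result": None}
    return token


def complete_job(token, result):
    """Called by the background worker when the long-running work finishes."""
    JOBS[token].update(status="done", result=result)


def poll_job(token):
    """Return the job's status, plus the result once it is available."""
    job = JOBS.get(token)
    if job is None:
        return {"status": "unknown"}
    if job["status"] != "done":
        return {"status": job["status"]}
    return {"status": "done", "result": job["result"]}
```

The client calls submit_job once, then calls poll_job with the returned token until the status is "done". Each of those calls is short, so none of them comes near the 29 second limit.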

Setting up all of that infrastructure code to manage the long-running jobs can be a pain – especially if you are only barely over the 29 second timeout.

4. If your function’s Request or Response Payload is > 4MB

Another common use-case for a website is to securely manage uploading, downloading, and processing files from a client.

A straightforward solution to this would be to have your client send a multi-part form of the file to your Lambda function, and then have your Lambda function authenticate the request, process it, and upload it to S3.

Similarly, to download a file, the Lambda function would authenticate the request, download the file from S3, and return it to the client.

This becomes an issue, however, when you are dealing with files that are > 4MB (although the official limit is 6MB, headers can take up 2MB of the payload, so 4MB is usually a safe upper limit).

To get around this you can use S3 presigned URLs, but that will complicate your architecture and client.
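A minimal sketch of the presigned URL approach, assuming boto3: the Lambda function only hands out a short-lived URL, and the client then uploads to or downloads from S3 directly. The helper names and the 300 second default expiry are assumptions of this sketch; generate_presigned_url is the boto3 S3 client method.

```python
def make_upload_url(s3_client, bucket, key, expires_in=300):
    """Return a time-limited URL the client can PUT the file to directly."""
    # The file goes straight to S3, so it never counts against the
    # Lambda payload limit.
    return s3_client.generate_presigned_url(
        "put_object",
        Params={"Bucket": bucket, "Key": key},
        ExpiresIn=expires_in,
    )


def make_download_url(s3_client, bucket, key, expires_in=300):
    """Return a time-limited URL the client can GET the file from directly."""
    return s3_client.generate_presigned_url(
        "get_object",
        Params={"Bucket": bucket, "Key": key},
        ExpiresIn=expires_in,
    )
```

Inside a handler you would authenticate the request first, then call something like make_upload_url(boto3.client("s3"), bucket, key) and return the URL in the response body. The extra round trip on the client side is the added complexity mentioned above.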

5. If your function needs more than 512MB of local disk space

I’ve found while running ETL tasks it’s not uncommon to be working with files greater than 512MB. This, however, presents a problem in AWS Lambda as the scratch disk space is only 512MB.

If the file you are working with is streamable (such as a CSV file), you can get around this by only downloading and processing little chunks of the file at a time.
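As a rough sketch of chunked processing, the generator below reads a CSV from any line-yielding text stream and hands back a fixed number of rows at a time, so the whole file never sits on disk or in memory. The chunk size and the "amount" column in the example consumer are assumptions for illustration.

```python
import csv
import io


def iter_csv_chunks(text_stream, chunk_size=1000):
    """Yield lists of rows (as dicts) without loading the whole file."""
    reader = csv.DictReader(text_stream)
    chunk = []
    for row in reader:
        chunk.append(row)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:
        # Emit the final, possibly smaller, chunk.
        yield chunk


def total_amount(text_stream, chunk_size=1000):
    """Example consumer: sum an assumed 'amount' column chunk by chunk."""
    total = 0.0
    for chunk in iter_csv_chunks(text_stream, chunk_size):
        total += sum(float(row["amount"]) for row in chunk)
    return total


if __name__ == "__main__":
    sample = io.StringIO("id,amount\n1,1.5\n2,2.5\n3,3.0\n")
    print(total_amount(sample, chunk_size=2))
```

Here an in-memory StringIO stands in for the real source; in a Lambda function the stream would more likely wrap the body of an S3 download.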

But if you need to work with a non-streamable format (such as JSON), you’re kind of S.O.L. and should look into using AWS Batch or Fargate.

Next Steps

So next time you are considering AWS Lambda for your project, make sure that you keep these limitations in mind.

Stay tuned for my next article about serverless alternatives to AWS Lambda and when they are a better choice (AWS Batch, Fargate, Kubernetes)!
