Recently, I’ve noticed there’s an alarming, yet subtle, trend in the industry that characterizes serverless as something you absolutely must adopt in your engineering practices.
If you’re unfamiliar with serverless, I refer you to the excellent Serverless Architectures article by Mike Roberts. All of the big cloud providers (AWS, Azure, Google Cloud Platform) offer some variety of serverless products. For this post, we’re specifically examining AWS Lambda to orient our discussion.
While I think some advocates for serverless mean well enough, I believe this exuberance needs to be tempered with cautious optimism. Furthermore, I think we need to start analyzing the purported benefits of serverless and distill them into one of two categories—either true technical benefit, or marketing fluff provided to us by cloud providers.
The arguments for moving to serverless are almost always one of the following, in no particular order:
- Serverless is cheap.
- Serverless is easy.
- Serverless reduces, or in some cases, eliminates the need for internal DevOps / SRE.
Fascinatingly, if your motivation for moving to serverless falls into one of these categories, a quick search seems only to validate that rationale further.
Cloud providers are happy to provide pricing models. Developer advocates for said providers generate content that will show you how easy it is to get started in your own serverless deployments. And, if you’re still feeling unsure that serverless is a good move, head over to the twitterverse and you’ll hear plenty of loud voices claiming that the serverless revolution is NOW.
The unfortunate reality around these loud voices, though, is that almost none of them understand the specific demands your application, business, or service offering will require from a serverless environment. Yes, the cloud providers have done a great job in providing tools with relatively low ramp-up time, but the question remains—have they offered something of value to you, specifically?
It’s also easy to misinterpret the marketing speak and the obvious enthusiasm shared by those who are amped about serverless as absolute truth. The other unfortunate reality is that the marketing messages are so well formulated that you might think you’re actually doing something wrong if you don’t embrace serverless, and right away!
Before you cry foul, or assume I’m a fuddy-duddy who isn’t hip to the new way of doing things, I should mention that I work in serverless (specifically AWS Lambda) on a daily basis.
For the work I’m doing, it has provided most of the benefits supporters and aficionados claimed that it would. But the move to using lambda functions wasn’t something my team leapt into. We determined the fit for serverless with a considerable amount of effort, caution, and planning.
Can an engineer objectively determine if serverless is something worth using, while navigating the marketing traps? I think the answer lies in unpacking the purported benefits of serverless with a healthy dose of skepticism.
Let’s take a look at the “Serverless Manifesto” and examine the implications. For all of our examples and discussion below, we’re constraining our analysis to the AWS variety of serverless offerings, which typically entails API Gateway and Lambda Functions.
Destructuring the serverless manifesto
In 2016, David Potes and Ajay Nair presented “Building Complex Serverless Applications” at re:Invent, and a sort of “Serverless Manifesto” emerged, based on the main concepts of that talk.
The manifesto goes something like this:
- Functions are the unit of deployment and scaling.
- No machines, VMs, or containers visible in the programming model.
- Permanent storage lives elsewhere.
- Scales per request; Users cannot over- or under-provision capacity.
- Never pay for idle (no cold servers/containers or their costs).
- Implicitly fault-tolerant because functions can run anywhere.
- BYOC - Bring Your Own Code.
- Metrics and logging are a universal right.
So, what parts of this message are market doublespeak, and what parts are technically sound? Let’s take a look at the assumptions in the Serverless Manifesto, and see where they are applicable, and where they might fall flat.
Functions are the unit of deployment and scaling
The pessimist in me wants to reword this bullet point to read, “Anything can be a function if you try hard enough”. The assumption of this tenet is that your code can and should be organized as units of functions.
While you can certainly slam as much code as possible into a unit of functional work, the question remains whether that is really a net benefit for your codebase, your team, and your product or business as a whole.
This model is exactly why serverless works well for web services and event-driven architectures. Functional units of work make intuitive sense: given an input, a function produces an output, and the ability to scale those functions without really having to worry about tough concepts like concurrency does indeed make it relatively easy to stand up a distributed, fault-tolerant system.
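To make the model concrete, here’s a minimal sketch of a Lambda handler behind an API Gateway proxy integration. The names and payload shape are mine, not from the talk: one function, one unit of deployable, scalable work.

```python
import json

def handler(event, context):
    """Minimal AWS Lambda handler: one functional unit of work.

    With an API Gateway proxy integration, `event` carries the HTTP
    request, and the returned dict becomes the HTTP response.
    """
    name = (event.get("queryStringParameters") or {}).get("name", "world")
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"message": f"hello, {name}"}),
    }
```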
The tricky parts, though, come when your code begins to cut across multiple functional units. I’m not saying this is a deal breaker; yes, you can engineer around this problem for some use cases. But not all of them.
No machines, VMs, or containers visible in the programming model
I’m a member of the excellent AWS Developers Slack community; it’s a great place to get community support for your AWS questions. One interesting question came up recently:
“How can I see files in the /opt directory in AWS Lambda?”
Some community members responded with a few blog posts by the author of lambdash, while others proposed writing a dedicated Lambda function to walk the file system tree and report back with the necessary information.
lambdash is a tool that lets you execute shell commands inside your AWS Lambda environment. From trivial commands like printing your present working directory to inspecting the kernel version of your Lambda’s runtime, lambdash provides visibility into what is obfuscated by design.
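The “dedicated Lambda function” approach mentioned above might look something like the sketch below: a throwaway diagnostic you deploy, invoke, and then delete. The payload shape is my own invention.

```python
import json
import os

def handler(event, context):
    """Walk a directory inside the Lambda environment and report its contents.

    A disposable diagnostic in the spirit of the community suggestion.
    Defaults to /opt; pass {"path": "..."} in the event to look elsewhere.
    """
    root = event.get("path", "/opt")
    listing = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            listing.append(os.path.join(dirpath, name))
    # Cap the response size; deep trees can get large.
    return {"statusCode": 200, "body": json.dumps(listing[:1000])}
```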
I find the existence of these “hacks” to be extremely telling about the state of serverless in general. Invisible machines, VMs, and containers definitely seem like a good practice…for cloud providers. Eventually, though, a developer is going to need, or want, to understand exactly what is going on under the covers of the application code itself, and the existence of these sorts of hacks seems only to reinforce that abstracting away the lower-level mechanisms of an environment will only go so far.
What these hacks reveal is that you’re not really “serverless” either, so maybe we should start calling the entire serverless thing “rented CPU cycles on someone else’s infrastructure”.
The question remains, though: if we’re already pressing against the limits of this invisible layer in our serverless environments, will cloud providers give us tools to inspect and understand the environment our units of functional work live in? I’m not convinced this tenet of the manifesto will last forever, and my sense is that as serverless offerings mature, cloud providers will need to make the underlying system available to devs.
Permanent storage lives elsewhere
Your serverless environment is ephemeral: once an execution cycle completes, the underlying system from the point above is gone. You don’t have access to long-term machine, VM, or container storage. File system storage is temporary, so the more stateless your Lambda application code stays, the better you’ll be able to leverage the service.
Conveniently, cloud providers have designed several persistence mechanisms to fit your needs, allowing you to cobble together any and all persistence layers you might need for your application. Need a long term file system? Use S3. Need a database? Push data to Aurora, or even better, use the serverless version of it.
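As a minimal sketch of that pattern with boto3 (the bucket name is a hypothetical placeholder): anything you want to outlive the invocation has to be shipped to an external store before the handler returns.

```python
import json
import boto3

# Created outside the handler so warm invocations can reuse the connection.
s3 = boto3.client("s3")

BUCKET = "my-app-results"  # hypothetical bucket; /tmp vanishes with the container

def handler(event, context):
    """Persist the function's output to S3 before the ephemeral container dies."""
    result = {"input": event, "status": "processed"}
    s3.put_object(
        Bucket=BUCKET,
        Key=f"results/{context.aws_request_id}.json",
        Body=json.dumps(result).encode("utf-8"),
    )
    return {"statusCode": 200, "body": json.dumps(result)}
```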
But let’s not forget that you still have to pay for the addition of persistent storage. S3, Aurora, Dynamo, etc. are all ready to be used, at an additional cost.
This tenet of the manifesto is, in my mind, a great way to fully vendor-lock users into whatever flavor of cloud they’re working in. Sure, compute can be made ephemeral and rented like a utility. The real gold mine of applications, though, is data, and the data your application generates and uses is unique to your specific business. This commodity needs to be protected: if the function can be abstracted to a unit of work, persistence of data becomes an absolute necessity for any viable application.
To wit: your applications, your businesses, and your success in building meaningful work absolutely require long-term storage. You have to have data. And when you move to serverless, you need to factor the cost of storing that data into the overall cost of running a serverless ecosystem.
Scales per request; Users cannot over- or under-provision capacity
This is an undeniable advantage of AWS Lambda. Not having to fret about scaling up Lambda functions to meet demand is genuinely helpful, and it’s how adopters of serverless have been able to meet increasing demand without much up-front work trying to predict bursts in traffic to their Lambda functions.
For AWS specifically, though, if your function is connected to a Virtual Private Cloud (VPC), a Lambda that scales unchecked can quickly exhaust the available IP addresses in your VPC’s subnets (each concurrent execution environment claims its own elastic network interface and a private IP), which means any other services running in that VPC are out of luck for network connections should your Lambdas become swamped. Given that you’re going to need a VPC in AWS if you’re running a production-level system, this gotcha can easily bite you if you’re unaware of it. As an operator, you’ll need to dial things down and introduce throttling based on…you guessed it, your attempts to predict bursts in traffic to your Lambda functions within your VPC.
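One concrete throttle, sketched below with boto3 (the function name and the ceiling of 50 are placeholders of mine, not recommendations): reserved concurrency caps how far a single function can scale, which indirectly caps its network interface consumption.

```python
import boto3

lambda_client = boto3.client("lambda")

# Cap concurrent executions so a traffic burst can't exhaust the VPC's
# network interfaces and subnet IP addresses. The right ceiling depends
# on your subnets and the rest of your workload.
lambda_client.put_function_concurrency(
    FunctionName="my-vpc-function",
    ReservedConcurrentExecutions=50,
)
```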
We’ll see if this changes in the future; AWS does try to identify bottlenecks like this and improve them. But the fact remains that you’re still at the mercy of the provider when it comes to scale. This tenet of the manifesto suggests that capacity and scale are fire-and-forget in serverless, but in practice, that just isn’t the case.
Never pay for idle (no cold servers/containers or their costs)
No, you won’t pay for a function that isn’t running, but you will pay for a function that starts from a cold state. Cold starts introduce a frustrating amount of latency, and there are several approaches to combating them.
A typical approach is to keep your Lambda functions warm, which reduces cold-start latency, but you’ll still pay for the warm-up invocations.
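A hedged sketch of that warming pattern is below. The `warmup` marker key is my own convention for illustration, not a Lambda feature: a CloudWatch Events/EventBridge schedule invokes the function every few minutes, and the handler returns early when it sees the marker.

```python
def handler(event, context):
    """Short-circuit scheduled warm-up pings before doing real work.

    A scheduled rule fires every ~5 minutes with {"warmup": true}
    (a made-up marker, not an AWS convention) just to keep a container
    resident. Note that each ping is still a billed invocation.
    """
    if isinstance(event, dict) and event.get("warmup"):
        return {"warmed": True}

    # ... real request handling goes here ...
    return {"statusCode": 200, "body": "ok"}
```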
This tenet frustrates me because it feels a bit slimy. Sure, you’re not going to pay extra for a function that isn’t running. But in order to meet reasonable expectations for response times in your serverless layer, you’re going to have to pay your cloud provider to keep your lambdas warm. Which means your lambdas will usually be running. So where are the purported cost savings?
Warm-ups aren’t the only option available to you. You can also increase your function’s memory allocation, which scales up its CPU share and can shorten cold starts, but doing so means paying more for your infrastructure. It seems to me that this tenet needs a bit of rewording:
Never pay for idle, definitely pay for starting from idle.
“Implicitly fault-tolerant because functions can run anywhere” and “BYOC - Bring Your Own Code”
I’m tackling these two bullet points together because I can’t seem to make sense of them. Perhaps someone smarter than I will be able to provide a more pithy analysis of what these two tenets are advocating.
Intuitively, my guess is that the former is saying that should a Lambda environment fail (i.e., the underlying whatever that is running your Lambda code), it’s easy to swap in a duplicate of the same environment, since your application code is organized as a functional unit. I think this makes sense, but I’d feel a bit better if “anywhere” were better qualified.
Having said that, no, you’re not going to be able to just pluck your lambda functions out and move them to a container without some significant refactoring.
The other tenet, “Bring Your Own Code” is frankly just baffling to me. I’ve asked a few peers what they think this means in the context of serverless, and I have yet to really understand the motivation for this as a bullet point in the serverless manifesto.
Metrics and logging are a universal right
Lambdas in AWS automatically emit CloudWatch logs, and the Lambda web console provides some handy monitoring graphs and metrics for your functions.
The complexity arrives when your entire system depends on multiple services to run. So yes, Lambda logging is relatively simple and straightforward, but gaining insight into your entire infrastructure remains a challenge.
To be fair, this isn’t necessarily something that your favorite cloud provider should mandate. I appreciate the flexibility offered in tools like CloudWatch, and although there’s a barrier to entry to actually understanding it, I think it’s time well spent to try to figure it out.
The challenge remains, though: how do I configure these tools to get a holistic view of what’s going on when I’m working with many different layers of my cloud platform? Logging and metrics will always be a challenge as your services continue to break apart. A simple heuristic: the more service layers you enable and leverage, the more sophisticated your logging and metrics strategy needs to be.
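One common mitigation, sketched below, is to emit structured JSON logs that carry a correlation ID across functions, so a tool like CloudWatch Logs Insights can stitch together one request’s path. The field names here are my own convention, not an AWS standard.

```python
import json
import time
import uuid

def log(correlation_id, message, **fields):
    """Emit one structured JSON log line; CloudWatch captures stdout."""
    print(json.dumps({
        "ts": time.time(),
        "correlation_id": correlation_id,
        "message": message,
        **fields,
    }))

def handler(event, context):
    # Reuse an upstream correlation ID if one arrived; mint one otherwise.
    correlation_id = event.get("correlation_id") or str(uuid.uuid4())
    log(correlation_id, "request received", function=context.function_name)
    # ... real work, passing correlation_id along to any downstream calls ...
    log(correlation_id, "request complete")
    return {"statusCode": 200, "correlation_id": correlation_id}
```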
I’d say tackling this issue is something serverless doesn’t necessarily need to solve, so I’m ok with this bullet point as it stands.
A final warning around serverless
Hopefully by now you’ve learned that embracing serverless is more complicated than standing up some functions in the cloud.
My hope is that this post encourages you to dig deeper into the claims your favorite cloud provider makes about serverless, and arms you to discern between really well-designed marketing speak and what might actually benefit your team and services.
I’d be remiss if I didn’t mention a final warning about the messaging around serverless. I’ve encountered several folks who believe that by embracing serverless, they’ll be able to effectively outsource their DevOps or SRE teams to their cloud provider. I can’t emphasize enough that if you currently have a DevOps or SRE team, changing your infrastructure to move to serverless will only increase your reliance on their expertise. The challenge of creating meaningful metrics and logging alone, combined with the absolute necessity to include other infrastructure besides serverless to create a functional application, means you’re still going to want those folks around.