Scalable Kiosk AWS Infrastructure with Terraform and Terragrunt

Connor Verret

About Zenblen

Zenblen is a Chicago-based startup that has engineered a beautifully designed automated smoothie kiosk, complete with technologically impressive onboard systems for blending and vending and a 32-inch touchscreen display. Their kiosks are loved at the universities, hospitals, and office buildings across Chicago where they're deployed.

The Challenge

Zenblen's infrastructure had grown organically. Core resources like the database and compute were created manually in the AWS console, while some Lambda functions were managed with AWS SAM. This patchwork made it difficult to recreate the environment in a new account for automated testing, a prerequisite for confidently scaling from ~10 kiosks to 100+.

Security gaps in the legacy architecture:

PostgreSQL database deployed in public subnets, accessible to the internet
Database access relied on IP whitelisting
ECS services exposed directly to the internet

The scaling problem:

The Zenblen team wanted to start fresh, eliminating these issues while building a foundation that could support their expansion plans. Without reproducible infrastructure, the prospect of new kiosk rollouts created anxiety. The team also needed a way to simulate the load that an eventual scale-up to 100+ kiosks would create.

Solutions and Outcomes

Commerce Architects designed and implemented a complete AWS infrastructure platform using reusable Terraform modules orchestrated by Terragrunt, following Infrastructure as Code best practices. The solution is production-ready, templatized, reusable across environments, and scalable from the current sandbox workloads to large-scale capacity through configuration alone.

What We Delivered

Secure 3-tier VPC architecture

The legacy architecture had the database and ECS services exposed directly to the internet. We replaced this with a 3-tier VPC network model: an API Gateway serves as the entry point to the VPC, connecting through a VPC Link to an internal load balancer, which routes to ECS services running in private subnets. The database sits in an isolated data subnet and accepts connections only from ECS.
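As a rough illustration of the "only from ECS" rule, the sketch below shows one way to express it in Terraform. It is not the code delivered to Zenblen: the resource names and the vpc_id variable are hypothetical, and egress rules are left out for brevity.

```hcl
# Hypothetical names throughout; var.vpc_id is supplied by the caller.
variable "vpc_id" {
  type        = string
  description = "ID of the VPC that holds the private and data subnets"
}

# Security group attached to the ECS tasks in the private subnets.
# (Egress rules are omitted from this sketch.)
resource "aws_security_group" "ecs_tasks" {
  name_prefix = "ecs-tasks-"
  vpc_id      = var.vpc_id
}

# Security group attached to the PostgreSQL instance in the data subnets.
resource "aws_security_group" "database" {
  name_prefix = "database-"
  vpc_id      = var.vpc_id
}

# The only ingress rule on the database: PostgreSQL traffic whose source is
# the ECS security group. No CIDR block is involved.
resource "aws_vpc_security_group_ingress_rule" "db_from_ecs" {
  security_group_id            = aws_security_group.database.id
  referenced_security_group_id = aws_security_group.ecs_tasks.id
  ip_protocol                  = "tcp"
  from_port                    = 5432
  to_port                      = 5432
  description                  = "PostgreSQL from ECS tasks only"
}
```

Referencing a security group rather than an IP range means the rule keeps working as tasks are replaced or scaled, which is what makes it a cleaner replacement for IP whitelisting.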

The API Gateway enforces rate limiting, and a Lambda authorizer attached to it validates API keys before requests ever reach the VPC.
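For readers who want to picture the gateway side, here is a minimal sketch that assumes an API Gateway HTTP API (the article does not say which API type was used). The API name, the x-api-key header, the throttling numbers, and the authorizer_invoke_arn variable are all placeholders, and the VPC Link integration and the Lambda invoke permission are omitted.

```hcl
# Assumption: an HTTP API. All names and limits below are placeholders.
variable "authorizer_invoke_arn" {
  type        = string
  description = "Invoke ARN of the Lambda function that validates API keys"
}

resource "aws_apigatewayv2_api" "kiosk" {
  name          = "kiosk-api"
  protocol_type = "HTTP"
}

# Lambda authorizer that reads the key from a request header.
resource "aws_apigatewayv2_authorizer" "api_key" {
  api_id                            = aws_apigatewayv2_api.kiosk.id
  name                              = "api-key-authorizer"
  authorizer_type                   = "REQUEST"
  authorizer_uri                    = var.authorizer_invoke_arn
  identity_sources                  = ["$request.header.x-api-key"]
  authorizer_payload_format_version = "2.0"
  enable_simple_responses           = true
}

# Every request on this route must pass the authorizer first.
# (The route target, a VPC Link integration, is omitted from this sketch.)
resource "aws_apigatewayv2_route" "proxy" {
  api_id             = aws_apigatewayv2_api.kiosk.id
  route_key          = "ANY /{proxy+}"
  authorization_type = "CUSTOM"
  authorizer_id      = aws_apigatewayv2_authorizer.api_key.id
}

# Stage-level throttling applies to all routes unless overridden.
resource "aws_apigatewayv2_stage" "default" {
  api_id      = aws_apigatewayv2_api.kiosk.id
  name        = "$default"
  auto_deploy = true

  default_route_settings {
    throttling_rate_limit  = 50  # steady-state requests per second
    throttling_burst_limit = 100 # short burst allowance
  }
}
```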

Infrastructure-as-Code with isolated stacks

On the infrastructure-as-code side, we organized the environment into independent Terragrunt stacks, each with isolated state. This keeps plans small and drift visible. A change to the API Gateway can't accidentally affect the database. A failed deployment impacts one stack, not the entire environment. When needed, the Terragrunt workflow is still flexible enough to orchestrate the full environment with proper dependency ordering.
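A simplified sketch of what state isolation can look like in Terragrunt follows. The file layout, bucket name, region, module path, and output names are assumptions made for illustration, not Zenblen's actual configuration.

```hcl
# ---- root.hcl (hypothetical file at the repository root) ----
# Each stack that includes this file gets its own state object, keyed by
# its directory path relative to the root.
remote_state {
  backend = "s3"
  generate = {
    path      = "backend.tf"
    if_exists = "overwrite_terragrunt"
  }
  config = {
    bucket  = "example-terraform-state" # placeholder bucket name
    key     = "${path_relative_to_include()}/terraform.tfstate"
    region  = "us-east-2"               # placeholder region
    encrypt = true
  }
}

# ---- stacks/ecs-api/terragrunt.hcl (hypothetical stack) ----
include "root" {
  path = find_in_parent_folders("root.hcl")
}

terraform {
  source = "../../modules/ecs-service" # placeholder module path
}

# Declares that this stack reads outputs from the network stack, which also
# gives Terragrunt the ordering it needs when applying the full environment.
dependency "network" {
  config_path = "../network"
}

inputs = {
  vpc_id             = dependency.network.outputs.vpc_id
  private_subnet_ids = dependency.network.outputs.private_subnet_ids
}
```

Because the state key is derived from the stack's directory, a plan run inside one stack only ever reads and writes that stack's state.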

Operational resilience

Every ECS service includes a deployment circuit breaker: if a new version fails health checks, the deployment rolls back automatically. CloudWatch alarms monitor CPU and memory across ECS services, and SNS notifications alert the team when thresholds are breached.
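The two mechanisms can be sketched roughly as below. The service name, thresholds, and input variables are illustrative assumptions, and settings such as network configuration and load balancer attachment are omitted.

```hcl
# Hypothetical inputs; a real module would take more than these.
variable "cluster_arn" {
  type = string
}

variable "cluster_name" {
  type = string
}

variable "task_definition_arn" {
  type = string
}

variable "alerts_topic_arn" {
  type        = string
  description = "SNS topic that notifies the team"
}

resource "aws_ecs_service" "api" {
  name            = "api"
  cluster         = var.cluster_arn
  task_definition = var.task_definition_arn
  desired_count   = 2

  # Stop a deployment whose tasks keep failing and roll back to the last
  # deployment that completed successfully.
  deployment_circuit_breaker {
    enable   = true
    rollback = true
  }
}

# Alarm on sustained high CPU for the service; a matching alarm on
# MemoryUtilization would follow the same shape.
resource "aws_cloudwatch_metric_alarm" "api_cpu_high" {
  alarm_name          = "ecs-api-cpu-high"
  namespace           = "AWS/ECS"
  metric_name         = "CPUUtilization"
  statistic           = "Average"
  period              = 60
  evaluation_periods  = 5
  comparison_operator = "GreaterThanThreshold"
  threshold           = 80 # placeholder percentage

  dimensions = {
    ClusterName = var.cluster_name
    ServiceName = aws_ecs_service.api.name
  }

  alarm_actions = [var.alerts_topic_arn]
}
```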

Built to scale

The architecture spans three availability zones and supports read replicas, auto-scaling, and multi-AZ deployment without requiring changes to the module code. What runs cost-optimized today can scale to handle 10x the traffic by adjusting input variables.
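To make "adjusting input variables" concrete, here is a hedged sketch of a database module in which redundancy is purely a matter of inputs. The variable names, defaults, and identifiers are hypothetical, and networking, parameter groups, and snapshot settings are omitted.

```hcl
# Hypothetical module inputs; the defaults describe a lean environment.
variable "instance_class" {
  type    = string
  default = "db.t4g.micro"
}

variable "multi_az" {
  type    = bool
  default = false
}

variable "read_replica_count" {
  type    = number
  default = 0
}

resource "aws_db_instance" "primary" {
  identifier                  = "app-postgres"
  engine                      = "postgres"
  instance_class              = var.instance_class
  allocated_storage           = 20
  username                    = "app_admin"
  manage_master_user_password = true # password is generated and stored in Secrets Manager
  multi_az                    = var.multi_az
  backup_retention_period     = 7 # automated backups are required before replicas can be created
}

# Zero replicas by default; raising the count adds them without touching
# the module code.
resource "aws_db_instance" "replica" {
  count               = var.read_replica_count
  identifier          = "app-postgres-replica-${count.index}"
  replicate_source_db = aws_db_instance.primary.identifier
  instance_class      = var.instance_class
  skip_final_snapshot = true
}
```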

Why This Approach

Our team has spent years contributing to AWS infrastructure and championing DevOps practices. Throughout these experiences, we've developed an intuition for what organizations need to succeed with cloud infrastructure. It comes down to three principles:

1. Reusable Infrastructure Templates Across Environments

Environments should match structurally, but not necessarily in cost. Lower environments can run leaner: cheaper compute via spot instances, single-AZ database and NAT gateway deployments, and shorter log retention, to name a few. The key is to design modules that are configurable enough to serve both development and production while remaining structurally identical and tailored to the client’s needs. This keeps environments consistent. It also makes ephemeral environments for load testing or disaster recovery drills easier to stand up. Some teams even tear down non-production environments on nights and weekends to save costs.
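One way to picture this is two environments feeding different inputs to the same modules. The file paths and input names below are invented for illustration; only the kinds of knobs (spot compute, single-AZ database and NAT gateway, log retention) come from the list above.

```hcl
# ---- dev/env.hcl (hypothetical path and input names) ----
inputs = {
  use_spot_capacity  = true  # cheaper, interruptible compute
  database_multi_az  = false # single-AZ database
  single_nat_gateway = true  # one NAT gateway instead of one per AZ
  log_retention_days = 7
}

# ---- prod/env.hcl (hypothetical path and input names) ----
inputs = {
  use_spot_capacity  = false
  database_multi_az  = true
  single_nat_gateway = false
  log_retention_days = 90
}
```

The modules stay identical in both cases. Only the values differ, so a change validated in a lean environment exercises the same resource graph that production uses.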

2. Minimization of Blast Radius When Modifying Infrastructure

This is where Terragrunt comes in. We've seen it time and again: a team correctly creates reusable Terraform modules, neatly broken down by category (network, compute, data). But when it's time to provision an environment, these modules all get instantiated within a single monolithic configuration called "staging" or "production."

Now every change has to contend with drift elsewhere in the environment. Want to deploy a new container image? You have to plan and apply Terraform that spans your entire environment:

$ terraform plan

# module.elasticache.aws_elasticache_replication_group.main will be updated
  ~ engine_version = "6.2" -> "7.0"

# module.rds.aws_db_instance.main will be updated
  ~ parameter_group_name = "default.postgres14" -> "default.postgres15"

# module.ecs.aws_ecs_service.api will be updated
  ~ task_definition = "api:42" -> "api:43"
 
Plan: 0 to add, 3 to change, 0 to destroy.

With Terragrunt, you can be more specific (while retaining the flexibility to apply the entire environment):

$ cd stacks/ecs-api && terragrunt plan

# module.ecs_service.aws_ecs_service.api will be updated
  ~ task_definition = "api:42" -> "api:43"

Plan: 0 to add, 1 to change, 0 to destroy.

Why this matters:

For experienced engineers, a monolithic state is painful to work with. But for new team members, it can be downright paralyzing. Nobody wants their first infrastructure change to accidentally trigger a database engine upgrade. This fear leads to avoidance. Engineers route around IaC entirely, making manual changes in the console, which defeats the purpose. Junior engineers never build confidence with infrastructure, and suddenly your bus factor is one.

3. Infrastructure as Living Documentation

Terraform doesn't just provision infrastructure; it also serves as living documentation for it. Unlike a wiki page that's outdated the moment it’s written, this documentation is the infrastructure. When drift happens, terraform plan catches it.

Questions this approach answers instantly:

Which IAM permissions do our ECS tasks have? → They're defined in the code
Why did our AWS bill spike last month? → Review the git history for that period
How do we add a new environment variable to our ECS tasks? → Follow the well-defined pattern that's already there
How do we prove database access is restricted to application services only? → Grep the security group rules (port 5432 should reference the ECS security group, not a CIDR)
Is data encrypted in transit between the ALB and ECS? → Check the target group protocol (HTTPS or HTTP)

These questions touch cost, security, and operations. With infrastructure-as-code, the answers live in version-controlled code: auditable, reviewable, and current.
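As a small illustration of how two of these answers might read in code, consider the sketch below. The role name, policy contents, port, and variables are hypothetical examples rather than Zenblen's configuration.

```hcl
variable "vpc_id" {
  type = string
}

variable "config_parameter_arn" {
  type        = string
  description = "ARN of an SSM parameter the service is allowed to read"
}

# "Which IAM permissions do our ECS tasks have?" Read the task role.
resource "aws_iam_role" "ecs_task" {
  name = "api-task-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Action    = "sts:AssumeRole"
      Principal = { Service = "ecs-tasks.amazonaws.com" }
    }]
  })
}

resource "aws_iam_role_policy" "read_config" {
  name = "read-app-config"
  role = aws_iam_role.ecs_task.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect   = "Allow"
      Action   = ["ssm:GetParameter"]
      Resource = var.config_parameter_arn
    }]
  })
}

# "Is data encrypted in transit between the ALB and ECS?" Read the protocol.
resource "aws_lb_target_group" "api" {
  name        = "api"
  port        = 8443
  protocol    = "HTTPS"
  target_type = "ip"
  vpc_id      = var.vpc_id
}
```

A reviewer can answer both questions from a pull request diff, without console access.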

Outcomes

“Your team has gone above and beyond, I have no complaints.”

- Aleksandra Kukielka, Zenblen

The Takeaway

By implementing Infrastructure as Code with Terraform and Terragrunt, Commerce Architects delivered a complete, production-ready infrastructure platform that enables Zenblen to focus on what they do best: building great smoothie kiosks.

The client received 150+ tailored AWS resources with the flexibility to run cost-optimized today and scale to full redundancy (read replicas, multi-AZ failover, cross-AZ subnets, and auto-scaling) when demand requires it.