Automatic cloud lab build/teardown – AWS

Welcome to another cloud-focused multi-part guide! Today we start out with the following objectives:

  • We want a cloud lab environment at the top 3 cloud providers (AWS, GCP, Azure)
  • There should be an option to either reset to an empty lab, or spawn a small pre-configured lab
  • Lab cleanup should be automatic, periodically and/or on demand
  • We want to be able to use tags to preserve lab environments from automatic teardown

That sounds fairly straightforward. We will target one cloud provider at a time, starting with AWS.

As with many other projects, let’s create a concept drawing that visualizes our objectives:

Before we do anything else we need to create the actual lab account. In AWS we will accomplish this by using AWS Organizations, creating a sub-account under our regular production account. Because the setup and account creation are straightforward, and may change in the future, I will only post a few screenshots of the process without further details.

After switching AWS roles to the newly created account we can clearly see that we are in the lab account, as no instances are currently running:

OK, so with the lab account created we now need a good way to “reset” the environment by wiping out all resources and starting from scratch. Luckily for us someone else already thought of a solution for this problem, so we can use the appropriately named “aws-nuke” software (GitHub) for this purpose. I tried to get it working in AWS Lambda, but ran into problems and decided to build a small VM to run the script instead. Our scheduled task will prompt the VM to boot; the VM runs the script after boot and then shuts down again as its final task.

We launch a t2.micro instance in our production AWS account (not the newly created lab account) using the default “Amazon Linux” AMI:

Once the VM is running we SSH to it and download the latest binary of aws-nuke:

We then rename the file and make it executable:

Next up we need to create an IAM user in the lab account with Administrative rights. We will use this account to run the aws-nuke script:

We give the account full Administrative access:

Make sure to download the .csv file containing the credentials, as we will be needing them in the next step:

In the VM run “aws configure” and enter the access key ID and secret access key from the previous step.

Next we will need our aws-nuke config file. I will be using this file. The config file allows us to filter resources to preserve by tags; in this case “Permanent: True” means that the resource will not be deleted. Download the config file to the VM and modify the account IDs as needed. One important step is to add an alias to our lab account, because aws-nuke will not run against an account that has no alias. We do this under IAM > Dashboard > Customize.
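To make the tag rule concrete, here is a minimal Python sketch of the decision the filter expresses: anything tagged “Permanent: True” is kept, everything else is a candidate for deletion. This only illustrates the rule; it is not how aws-nuke implements filtering, and the function names and the resource dictionaries are made up for the example.

```python
def is_preserved(tags, key="Permanent", value="True"):
    """Return True when a resource carries the tag that protects it."""
    return tags.get(key) == value


def split_resources(resources):
    """Split resources into (kept_ids, deleted_ids) based on their tags.

    Each resource is a dict with an "id" and an optional "tags" dict.
    This shape is hypothetical and chosen only for the illustration.
    """
    kept, deleted = [], []
    for resource in resources:
        target = kept if is_preserved(resource.get("tags", {})) else deleted
        target.append(resource["id"])
    return kept, deleted


if __name__ == "__main__":
    lab = [
        {"id": "i-aaa", "tags": {"Permanent": "True"}},
        {"id": "i-bbb", "tags": {"Name": "scratch"}},
        {"id": "vol-ccc"},
    ]
    print(split_resources(lab))
```

Note that the comparison is an exact string match, so a tag value of “true” or “yes” would not protect a resource in this sketch.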

We can now do a test run of aws-nuke. By default nothing gets deleted unless we pass the “--no-dry-run” flag.

If everything looks good then we can proceed to create a simple script to automatically run aws-nuke and then shut down the server when completed. This script (boot.sh) consists of a few lines:

We add the five-minute sleep timer to allow us to log in to the server and troubleshoot in case there is a problem with the script. We make the script executable, add it to /etc/rc.d/rc.local, and reboot the VM to test. Once the VM comes back online we see the script is now running:
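For readers who prefer Python over shell, the same flow (wait five minutes, run aws-nuke, power off) could be sketched roughly like this. It is not the boot.sh used above, just an equivalent outline under a few assumptions: the binary path and config path are placeholders, placing the sleep before the nuke run is my guess at the ordering, and the “--force” flag is assumed to skip aws-nuke's interactive confirmation, so check it against the version you downloaded.

```python
import subprocess
import time

# Placeholder locations; adjust to wherever the binary and config live.
NUKE_BINARY = "/usr/local/bin/aws-nuke"
NUKE_CONFIG = "/home/ec2-user/nuke-config.yml"


def boot_steps(config_path, dry_run=True):
    """Return the commands to run after the delay, in order."""
    nuke = [NUKE_BINARY, "--config", config_path, "--force"]
    if not dry_run:
        # Without this flag aws-nuke only reports what it would delete.
        nuke.append("--no-dry-run")
    return [nuke, ["shutdown", "-h", "now"]]


def run_boot_sequence(config_path, delay_seconds=300, dry_run=True,
                      runner=subprocess.run, sleep=time.sleep):
    """Sleep, then run aws-nuke, then shut the VM down.

    check=False means the shutdown still runs if aws-nuke exits non-zero.
    """
    sleep(delay_seconds)  # window to log in and troubleshoot
    for command in boot_steps(config_path, dry_run):
        runner(command, check=False)


if __name__ == "__main__":
    run_boot_sequence(NUKE_CONFIG, dry_run=False)
```

Keeping `dry_run=True` as the default mirrors aws-nuke's own behavior, so a careless call does not delete anything.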

All the AWS resources got cleaned up and the lab account is now reset. For our next task we simply need to schedule the boot of this aws-nuke VM, as well as set up the manual activation method. Let’s start with the manual way.

To build or tear down the AWS lab manually it seems appropriate to use an AWS IoT Dash Button:

I started from a pre-configured Lambda function offered during setup of the IoT button, created it in our production account, and modified it slightly to launch our aws-nuke VM. We will also receive a text message confirming that we pushed the button. Make sure to give the Lambda function at least the “ec2:Start*” permission.
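As a rough idea of what such a function can look like, here is a minimal Python sketch that starts the nuke VM on a single click. It is not the modified function from this post: the `NUKE_INSTANCE_ID` environment variable and the `handle_click` helper are hypothetical names, the `clickType` field is assumed to be how the button reports a “SINGLE” press, and the text-message part is left out. The only AWS call used is boto3's `start_instances`, which is covered by the “ec2:Start*” permission.

```python
import os


def handle_click(event, ec2_client, instance_id):
    """Start the nuke VM on a SINGLE click; ignore any other click type.

    Returns the state name reported by EC2 (e.g. "pending"), or None
    when nothing was started.
    """
    if event.get("clickType") != "SINGLE":
        return None
    response = ec2_client.start_instances(InstanceIds=[instance_id])
    return response["StartingInstances"][0]["CurrentState"]["Name"]


def lambda_handler(event, context):
    """Lambda entry point; boto3 is provided by the Lambda Python runtime."""
    import boto3  # imported here so the helper above is testable without it

    ec2_client = boto3.client("ec2")
    # NUKE_INSTANCE_ID is a hypothetical environment variable on the function.
    state = handle_click(event, ec2_client, os.environ["NUKE_INSTANCE_ID"])
    return {"started": state is not None, "state": state}
```

Passing the client into `handle_click` keeps the decision logic separate from AWS, which makes it easy to try out locally with a stand-in client.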

To create a scheduled task to launch the VM we can utilize CloudWatch (again, in our production account):

We create a rule to run our nuke-VM every day at 7 PM EST, Monday – Friday. We select the Lambda function we created earlier and configure input to send the “SINGLE” variable to our function, which will tear down the lab.
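One detail worth spelling out: CloudWatch schedule expressions are evaluated in UTC, so 7 PM EST (UTC-5) is 00:00 UTC on the following day, and Monday through Friday becomes Tuesday through Saturday. The small Python helper below works through that arithmetic. The function name is made up for this example, it uses a fixed offset (so it ignores daylight saving time), and it refuses day ranges that would wrap past Saturday rather than guess at how to express them.

```python
DAYS = ["SUN", "MON", "TUE", "WED", "THU", "FRI", "SAT"]


def schedule_expression(local_hour, utc_offset_hours, first_day, last_day):
    """Build a CloudWatch cron expression for a local on-the-hour schedule.

    local_hour:        hour of day in local time (0-23)
    utc_offset_hours:  local offset from UTC, e.g. -5 for EST
    first_day/last_day: three-letter day names, e.g. "MON" and "FRI"
    """
    # Convert to UTC; divmod yields how many days we crossed and the UTC hour.
    day_shift, utc_hour = divmod(local_hour - utc_offset_hours, 24)
    start = (DAYS.index(first_day) + day_shift) % 7
    end = (DAYS.index(last_day) + day_shift) % 7
    if start > end:
        raise ValueError("day range wraps around the week; split it manually")
    # Fields: minutes hours day-of-month month day-of-week year
    return f"cron(0 {utc_hour} ? * {DAYS[start]}-{DAYS[end]} *)"


if __name__ == "__main__":
    print(schedule_expression(19, -5, "MON", "FRI"))
```

During daylight saving time the same expression fires at 8 PM local time, so adjust the rule if the exact local hour matters.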

We now reach the final task, which is to configure a pre-baked lab environment that we can spin up at will. We will revisit this step at a later date, as it will be a large undertaking to configure the launch scripts and the CloudFormation template for all those servers.

In part 2 we will look at automating a lab environment for Google Cloud!