Friday, March 12, 2010

Azure: The true cost of web and worker roles

Since February 1, Azure customers have been billed for their consumption of Azure resources. If you were an early adopter, you might have been spoiled by the free usage during the CTP over the past year, or even during January of this year, when “mock bills” were generated but no actual costs accrued.
Recently I’ve been fielding questions about the true expense of running Azure web roles and worker roles, including questions about Microsoft’s “free” account for MSDN Premium developers. Let me share a few tidbits here that will, hopefully, help you manage your Azure costs.

Roles and Virtual Machines

First things first: web and worker roles are hosted in virtual machines (VMs). Each role deploys to its own VM. So, if you have a web role for your Silverlight application, and a worker role for some background tasks (like periodically fetching data from an external source and storing it in SQL Azure), you will actually consume two VMs, one for each role.

Virtual Machines and Instances

Ok, next up: instances. At the most basic level, you could view a VM as synonymous with an instance, and in the example above, you’d have two instances. Things take a twist, though, when you consider that Azure offers multiple VM sizes. The smallest size sets the baseline, and is equivalent to 1 instance:
VM size: instance count, cost per hour (USD)
Small: 1 instance, $0.12
Medium: 2 instances, $0.24
Large: 4 instances, $0.48
Extra Large: 8 instances, $0.96
The VM sizes vary in number of CPU cores, RAM, and disk space. The price scales with the instance count, so an Extra Large VM costs 8x what a Small VM costs.
Instances running ‘round-the-clock effectively cost 24 hours x $0.12 x 30 days = $86.40 monthly for a Small VM, up to $691.20 monthly for an Extra Large VM.
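To double-check the table and the monthly figures, here is a small Python sketch. It assumes the 2010 Small rate of $0.12 per instance-hour and a 30-day month, as in the text; `SIZES`, `hourly_cost`, and `monthly_cost` are illustrative names, not part of any Azure API.

```python
# Rates from the 2010 table above; a 30-day month is assumed, as in the text.
SMALL_RATE = 0.12          # USD per instance-hour
HOURS_PER_MONTH = 24 * 30  # 720 hours

# VM size -> equivalent instance count (Small is the baseline of 1).
SIZES = {"Small": 1, "Medium": 2, "Large": 4, "Extra Large": 8}


def hourly_cost(size: str) -> float:
    """Cost per clock hour for one VM of the given size."""
    return SIZES[size] * SMALL_RATE


def monthly_cost(size: str) -> float:
    """Cost of one VM left running around the clock for a 30-day month."""
    return hourly_cost(size) * HOURS_PER_MONTH


for size in SIZES:
    print(f"{size:<12} ${hourly_cost(size):.2f}/hour  ${monthly_cost(size):,.2f}/month")
```

It prints $86.40 per month for a Small VM and $691.20 for an Extra Large VM.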
Microsoft bills in instance-hours. So you might be billed 48 instance-hours for a given 24-hour period, which means either two Small VMs ran for 24 hours or one Medium VM ran for 24 hours.
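Instance-hours are simply VM hours weighted by each size's instance count. A minimal sketch, using the instance counts from the table above (`instance_hours` is a hypothetical helper):

```python
# Instance counts per VM size, from the table above.
SIZES = {"Small": 1, "Medium": 2, "Large": 4, "Extra Large": 8}


def instance_hours(vms: list[tuple[str, float]]) -> float:
    """Sum instance-hours for a list of (size, hours_deployed) pairs."""
    return sum(SIZES[size] * hours for size, hours in vms)


# Both scenarios from the text come to the same 48 instance-hours.
two_small = instance_hours([("Small", 24), ("Small", 24)])
one_medium = instance_hours([("Medium", 24)])
print(two_small, one_medium)  # 48 48
```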

Instances and Lifetimes

Now we’re getting to the fun part. Let’s say you just created a brand new Azure service, setting up a new name (like MyCoolApp.cloudapp.net). You just finished your first code iteration, and you publish your code to that new service of yours. Just two roles, both set up with one Small VM each.
You might notice that it takes a few minutes to deploy your code to that new service. Why? Because Azure is spinning up new VMs for you. Why is that important? Hold that thought a moment…
Ok, you’re done uploading and deploying, and finally, the Azure dashboard tells you your app is running. You are officially on the clock. You are now being billed for your two instances.
Now let’s get back to the point about your two new VMs. You might decide that, after your first set of tests, you’re going to suspend your new service. That is, leave everything uploaded (so you can show off your new app to the Boss later today). Guess what: You are still officially on the clock. The simple truth is, Azure is still hosting two VMs for you, each taking up a “slot” in the cloud fabric. So you’re still being billed for it, whether the web and worker roles are running or suspended.

Lifetimes and Clock time

Alright – if you’ve gotten this far, you’ve realized that your newly-deployed app, even in a suspended state, is going to burn through hours pretty quickly. And if you’re an MSDN Premium subscriber, with 750 free compute-hours monthly, you might have just done some quick math:
2 instances x 24 hours x 30 days = 1,440 hours !!!
Essentially, your MSDN account is good for one Small VM (equal to 1 instance) running ‘round the clock, as that 1 instance burns just under 750 compute-hours.
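The MSDN math is easy to script. This sketch assumes, as the text does, that every deployed instance counts 24 hours a day, and uses the 750-hour allowance quoted above; the function names are illustrative.

```python
FREE_HOURS = 750  # MSDN Premium monthly allowance quoted in the text


def hours_per_month(instances: int, days: int = 30) -> int:
    """Compute-hours used when every instance stays deployed around the clock."""
    return instances * 24 * days


def days_until_exhausted(instances: int) -> float:
    """Days before the free allowance runs out with all instances deployed 24/7."""
    return FREE_HOURS / (instances * 24)


print(hours_per_month(2))        # 1440, the "2 x 24 x 30" above
print(hours_per_month(1, 31))    # 744, still under 750
print(days_until_exhausted(1))   # 31.25
print(days_until_exhausted(2))   # 15.625
```

Even a 31-day month is only 744 hours, so a single Small instance fits under 750; a second instance cuts the allowance to a little over 15 days.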
So you quickly scramble to delete your deployment from your service (don’t worry, your service definition, and fancy URL, stay in place; just your code and VMs get deleted). Lucky for you, your app was only live for about 10 minutes before you discovered this. So you can afford to deploy several times daily with little-to-no risk, as long as you delete the deployment after each test / demo cycle. Right?
Well, not quite right. Deployed VMs are billed by the hour, and a partial hour counts as a full one. So even if the VM is up for only 10 minutes, you are billed for a full compute-hour. Since our example uses two instances, we’ve consumed 2 compute-hours in a very short time.
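Here is a rough model of that rounding. It assumes, per the description above, that any started hour is billed as a full hour for every instance; it does not try to model finer details of how Microsoft aligned billing hours, and `billed_compute_hours` is a hypothetical helper.

```python
import math


def billed_compute_hours(minutes_deployed: float, instances: int) -> int:
    """Partial hours round up to a whole hour, per instance."""
    return math.ceil(minutes_deployed / 60) * instances


# The 10-minute test deployment with two Small instances:
print(billed_compute_hours(10, 2))   # 2

# Five short demo cycles a day, each deleted after 10 minutes:
daily = sum(billed_compute_hours(10, 2) for _ in range(5))
print(daily)                         # 10
```

At five 10-minute demo cycles a day, a month of demos uses 10 x 30 = 300 of the 750 free hours, even though the app was live for under an hour each day.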

Clock time vs CPU time

This final per-instance tidbit might be obvious by now, but I’ll call it out anyway, just for good measure. Each VM, whether suspended or active, is costing you compute-hours. And to take that one step further: even VMs sitting around mostly idle consume compute-hours at the same rate. Let’s go back to that worker role of ours. Maybe it wakes up every 24 hours, fetches temperature data, and stores it in your database. Maybe that service call takes, oh, 10 seconds to retrieve all needed data, and another few seconds to store it. Then it goes to sleep for the remaining 23.99 hours. You are still paying $0.12 (12 cents) per instance-hour, around the clock.
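To put a number on that idle time, here is a quick calculation for the temperature-fetching worker. The 15 seconds of daily work is an assumption (10 seconds to fetch plus roughly 5 to store); everything else comes from the Small rate above.

```python
RATE = 0.12                 # USD per Small instance-hour
BILLED_HOURS_PER_DAY = 24   # deployed all day, busy or not
BUSY_SECONDS_PER_DAY = 15   # assumption: 10 s fetch + ~5 s to store

daily_cost = RATE * BILLED_HOURS_PER_DAY
utilization = BUSY_SECONDS_PER_DAY / (BILLED_HOURS_PER_DAY * 3600)
cost_per_busy_hour = daily_cost / (BUSY_SECONDS_PER_DAY / 3600)

print(f"daily cost:         ${daily_cost:.2f}")
print(f"utilization:        {utilization:.4%}")
print(f"cost per busy hour: ${cost_per_busy_hour:,.2f}")
```

Under those assumptions the worker is busy less than 0.02% of the time, and each hour of actual work effectively costs about $691.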

Cost-based architecture

What to do today

So here we are, fully armed with the true cost of these web and worker roles. And here we are, with our elegant Azure application architecture, with a web role for our Silverlight app, a WCF web role for our service tier, and one worker role for each 3rd-party service we want to interact with (that way, we have a very modular system, with each role doing only what it needs to). You probably see where I’m going with this: While our web role is going to be super-busy (and we might need to scale it up to handle our web traffic), and our WCF web role might have to be scaled up as well, our worker roles will likely remain at 1 instance per 3rd-party service.
From a cost perspective, each of those worker roles, sitting almost completely idle, is burning $86.40 monthly. Just three external services and you’re up to about $260 per month.
Going back to the MSDN Premium accounts, you would need 5 instances running (the two web roles plus three worker roles) to test your application. That means you’d burn through your 750 hours in about 6 days (750 / (5 x 24) = 6.25).
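Here is that comparison as a sketch, assuming one Small instance per role, a 30-day month, and the $0.12 rate. The role names are illustrative; the "consolidated" layout rolls all three integrations into a single worker role.

```python
RATE = 0.12                 # USD per Small instance-hour
HOURS_PER_MONTH = 24 * 30   # 30-day month
FREE_HOURS = 750            # MSDN Premium allowance


def monthly_cost(instances: int) -> float:
    """Monthly cost with every instance deployed around the clock."""
    return instances * RATE * HOURS_PER_MONTH


def free_days(instances: int) -> float:
    """Days until the MSDN allowance runs out."""
    return FREE_HOURS / (instances * 24)


# One Small instance per role; the role names are hypothetical.
modular = ["web", "wcf", "worker-a", "worker-b", "worker-c"]
consolidated = ["web", "wcf", "worker-all"]

for name, roles in [("modular", modular), ("consolidated", consolidated)]:
    n = len(roles)
    print(f"{name:<13} {n} instances  ${monthly_cost(n):.2f}/month  "
          f"free hours last {free_days(n):.2f} days")
```

With five roles, the monthly cost is $432.00 and the free hours last 6.25 days; dropping to three roles brings that to $259.20 and about 10.4 days.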
You will want to consider Azure’s pricing model when developing your architecture. The cost structure associated with instances and their lifetimes may mean making compromises such as rolling all of your 3rd-party integration code into a single worker role, and managing your polling intervals accordingly.
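One way to make that compromise is a single worker that multiplexes all the polling jobs on one loop, instead of one role per service. The sketch below is a generic heap-based scheduler in Python, not an Azure API; the job names and intervals in the usage are hypothetical, and the clock is injectable so the loop can be exercised without sleeping.

```python
import heapq
import time
from typing import Callable

Job = Callable[[], None]


class PollingScheduler:
    """Runs several periodic polling jobs from one loop, i.e. one worker role."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        # Heap entries: (next_due, seq, interval_s, name, job).
        # seq is unique per job, so ties never fall through to comparing callables.
        self._heap: list[tuple[float, int, float, str, Job]] = []
        self._seq = 0

    def add(self, name: str, interval_s: float, job: Job) -> None:
        """Make job due immediately, then every interval_s seconds."""
        if interval_s <= 0:
            raise ValueError("interval_s must be positive")
        heapq.heappush(self._heap, (self._clock(), self._seq, interval_s, name, job))
        self._seq += 1

    def run_due(self) -> list[str]:
        """Run every job that is due now; return their names in run order."""
        now = self._clock()
        ran: list[str] = []
        while self._heap and self._heap[0][0] <= now:
            due, seq, interval_s, name, job = heapq.heappop(self._heap)
            job()
            ran.append(name)
            next_due = due + interval_s
            if next_due <= now:  # fell behind: skip missed runs instead of catching up
                next_due = now + interval_s
            heapq.heappush(self._heap, (next_due, seq, interval_s, name, job))
        return ran

    def seconds_until_next(self) -> float:
        """Time until the earliest job is due (0 if one is already overdue)."""
        if not self._heap:
            return float("inf")
        return max(0.0, self._heap[0][0] - self._clock())

    def run_forever(self) -> None:
        """Blocking loop, e.g. for a worker role's long-running entry point."""
        while self._heap:
            self.run_due()
            time.sleep(self.seconds_until_next())


# Usage with a fake clock; the feeds and intervals are hypothetical.
fake_now = [0.0]
sched = PollingScheduler(clock=lambda: fake_now[0])
sched.add("weather", 24 * 3600, lambda: None)
sched.add("stock-quotes", 15 * 60, lambda: None)
print(sched.run_due())  # ['weather', 'stock-quotes']
fake_now[0] = 15 * 60
print(sched.run_due())  # ['stock-quotes']
```

A job that falls behind (for example, after a slow fetch) is rescheduled from the current time rather than run repeatedly to catch up, which keeps one sluggish service from monopolizing the shared worker.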

Future-thinking

The Azure team is reaching out to the community for input about future ideas: you can suggest a new idea or vote on someone else’s idea (check out the voting site here). I want to draw your attention to a few ideas that could really help reduce cost:

What other great ideas would you add, to help create a more cost-effective model for your Azure rollout?

4 comments:

  1. Thank you David. This article is just what I needed. I'm in the middle of developing a simple SQL Azure website and just got our bill with overage hours because I was running two storage accounts with 3 instances each of the web role with our "FREE" developer account. Yikes! This also gives me some ammunition to present to our customer about why we need to change the direction of their project in the middle - it's not going to be cost effective for them!
    Thanks for the great article.
    Cheryl Estrella
    Simetra Systems, Inc

  2. Totally agree with your future thinking.
    What I do is have one worker role and many threads or background tasks, one for each process. The code gets ugly but I save money.

  3. Just an update on this article. You can now run a worker instance as "Extra small" which is only $14.40/month. If you have a bunch of services that only need to run once a day, you could run all of them on a single extra small worker. So, effectively your maximum cost for background services is $14.40/month.

  4. Alex - agreed! It's been nearly 3 years since I wrote this (wow - 3 days shy of 3 years) and lots has changed since. Indeed, Extra Small instances are $0.02 per hour, and are terrific for running timer tasks, scaling tasks, etc. The only thing I caution people about, regarding Extra Small instances: they have very limited network bandwidth (5 Mbps, vs. 100 Mbps per core on Small through Extra Large). So bandwidth-intensive tasks tend not to get along well with XS instances.
