
The Dark Side of Virtualized Servers and The Cloud

Virtual servers and the cloud both abstract away hardware and commoditize it. Both have benefits and good reasons to be used. But both also have a dark side that mostly no one talks about. This is a dissonance I have felt for some years: people only praise the cloud and virtual servers, while my own experience with those technologies shows significant downsides. To resolve this dissonance I decided to write this blog post. After the NoSQL dark side post, it seems I run into the dark side of things more often.

Virtual Servers

There is a trend towards virtualization. Most companies are changing their hardware setup to one that runs their servers in virtual machines on an abstraction layer across their hardware (like VMware). There are many benefits to using virtual machines:

  • Easier and faster to set up
  • Better utilization for services with low resource demands
  • You can put many servers on the same hardware
  • Easier to provide 24/7 operation
  • Migration to new hardware without downtime
  • and more

But there is a dark side to virtual machines: cost and performance. As I have seen many times, everything gets put on virtual machines. But virtual machines are not as fast as running natively; I/O-intensive servers in particular run much slower. From my own experience, high-traffic application servers running on Xen are 20% slower than running natively. So you will need more hardware for I/O-heavy servers to achieve the same throughput, which means more cost. Also, if you do not use an open source setup – which few people do for Windows servers or in the enterprise – you will need to pay a lot for licenses. The marketing machine runs at full steam and promises nearly no impact when virtualizing servers. This simply is not true. The Riak server configuration documentation notes:

Like most datastores, Riak will run best when not virtualized. Virtual machines (VMs) can suffer from poor I/O and network performance, depending on how they are configured and the environment in which they run.
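
To make the overhead concrete, here is a minimal sketch in Python of how a virtualization penalty translates into extra machines. The 20% figure is my Xen experience from above; the request rates are made-up illustrative numbers:

    import math

    def servers_needed(native_rps: float, load_rps: float,
                       overhead: float) -> int:
        """Servers required when each VM delivers only a fraction of
        native throughput (overhead 0.20 means 80% of native)."""
        effective_rps = native_rps * (1 - overhead)
        return math.ceil(load_rps / effective_rps)

    # 10,000 req/s of load, 1,500 req/s per native server:
    print(servers_needed(1500, 10000, 0.0))   # native: 7 servers
    print(servers_needed(1500, 10000, 0.2))   # virtualized: 9 servers

Two extra machines for the same load – and the gap grows with the load.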

Some admins also think they can magically put more VMs on the same hardware without adding resources. I have seen this over and over again: admins putting more and more VMs on the same hardware, often to go green and reduce power consumption, while users complain more and more about performance. This fails because all the VMs will run slowly, and it gets worse the more VMs share the same memory or CPUs. It is possible to put several low-resource VMs on one piece of physical hardware instead of giving each its own machine. And if you have VMs which need the most power at night (e.g. map reduce) and VMs which need the most power during the day, they can easily share hardware. But VMs are not a magic bullet to reduce your hardware costs. You need to balance the benefits of virtual servers against the costs. And never put more VMs on physical hardware than it can take.
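
A minimal sanity check, assuming you track each VM's vCPU and memory demands (the host sizes and VM list here are hypothetical; real hypervisors allow deliberate overcommit for bursty workloads, but it should be a conscious decision):

    def host_is_overcommitted(host_cores: int, host_ram_gb: int,
                              vms: list[tuple[int, int]]) -> bool:
        """True if the summed VM demands exceed the physical host.
        Each VM is a (vcpus, ram_gb) tuple; no overcommit factor."""
        total_vcpus = sum(cpus for cpus, _ in vms)
        total_ram = sum(ram for _, ram in vms)
        return total_vcpus > host_cores or total_ram > host_ram_gb

    # A 16-core / 64 GB host and a growing list of 4-vCPU / 16 GB VMs:
    vms = [(4, 16), (4, 16), (4, 16), (4, 16)]
    print(host_is_overcommitted(16, 64, vms))             # False: just fits
    print(host_is_overcommitted(16, 64, vms + [(2, 8)]))  # True: one VM too many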

The Cloud

Everyone seems to be moving to the cloud; Reddit is one example:

Last week we also decommissioned the last of our physical servers. We are now operating our entire website “in the cloud” as the kids would say. Specifically, we are using Amazon Web Services. If all went well, you didn’t notice a thing. If you want to, you can Ask me Anything about the move or our servers.

There is a deeper discussion on Reddit with some numbers about their move:

  • 218 Virtual CPUs
  • 380 GB of RAM
  • 9 TB of Block Storage
  • 2 TB of S3 Storage
  • 6.5 TB of Data Out / mo
  • 2 TB of Data In / mo

and says about the costs – which sound rather expensive for 156M cacheable pageviews:

Right now about $15K/mo.

and

Yes, it lowered the cost by about 30%, and with their new lower prices, should make it even cheaper still.

which made me curious.

“It’s cheaper, it’s cheaper,” the CIOs, CTOs and other IT managers chant in unison with the cloud providers. But the dark side of the cloud is its cost: for most users, the cloud is more expensive.

So I wanted to compare the cloud – in this case Amazon EC2 (there might be cheaper offerings from Linode and Rackspace) – with a provider I use. The hoster rents i7 machines for $95 or $120 per month. According to a HN thread, one commenter estimates an i7 at 12–15 EC2 Compute Units (CU). Comparing:

  • EQ6: i7-920, 12 CU, 12 GB RAM, 5 TB traffic, $95/month with $200 setup fee
  • EQ8: i7-920, 12 CU, 24 GB RAM, 5 TB traffic, $120/month with $200 setup fee
  • L: 4 CU, 7.5 GB RAM, 4 TB traffic out, 1 TB in
  • XL: 8 CU, 15 GB RAM, 4 TB traffic out, 1 TB in

To compare the costs, I have calculated three models: 5 servers running 100% of the time, 5 servers on demand, and 15 servers on demand with a short peak.

5 Application Servers Example

Model 0
5 app servers, 100% usage on EC2.

Model 1
5 app servers, which from my experience get you quite far as a startup.

Distribution: 5 servers during the 5-hour peak, 2 for 10 hours, 1 for 9 hours, which means:

  • 2 servers at 42%
  • 1 server at 100%
  • 2 servers at 21%
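
The percentages are simply each server's online hours divided by 24; a quick check in Python:

    # one server runs around the clock, two cover the 10-hour day,
    # two more only cover the 5-hour peak
    for servers, hours in [(1, 24), (2, 10), (2, 5)]:
        print(f"{servers} server(s) online {hours}h/day -> {hours / 24:.0%}")
    # 1 server(s) online 24h/day -> 100%
    # 2 server(s) online 10h/day -> 42%
    # 2 server(s) online 5h/day -> 21%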

[Chart with 5 servers]

15 Application Servers Example

The third model (Model 2) adds 10 servers for a 1-hour peak (5%).

[Chart with 15 servers]

Comparing all prices gets us this (no load balancer etc. taken into account, no storage, different payment models):

[Table with different pricings]
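
Because the multiples depend heavily on which rates and payment model you plug in, here is a minimal sketch of the calculation behind the table. The EC2 on-demand rates ($0.34/h for L, $0.68/h for XL) are my illustrative assumptions, roughly the list prices of the era; swap in current prices to redo the comparison yourself:

    # Rented servers are billed 24/7; EC2 on demand is billed per instance-hour.
    HOURS_PER_MONTH = 730            # average hours in a month
    SETUP_FEE, MONTHS = 200.0, 12    # amortize the setup fee over a year

    def rented_monthly(servers: int, fee: float) -> float:
        """Monthly cost of rented hardware, running or not."""
        return servers * (fee + SETUP_FEE / MONTHS)

    def ec2_monthly(hours_per_day: float, rate: float) -> float:
        """Monthly cost on demand: pay only for instance-hours used."""
        return hours_per_day / 24 * HOURS_PER_MONTH * rate

    # (daily instance-hours, rented servers needed to cover the peak)
    models = {
        "Model 0": (5 * 24, 5),                   # 5 servers, 100%
        "Model 1": (1*24 + 2*10 + 2*5, 5),        # 1 always, 2 for 10h, 2 for 5h
        "Model 2": (1*24 + 2*10 + 2*5 + 10, 15),  # plus 10 servers for a 1h peak
    }

    for pair, rate, fee in [("L vs EQ6", 0.34, 95), ("XL vs EQ8", 0.68, 120)]:
        for model, (hours, rented_servers) in models.items():
            cloud = ec2_monthly(hours, rate)
            rented = rented_monthly(rented_servers, fee)
            print(f"{pair}, {model}: EC2 ${cloud:,.0f}/mo vs "
                  f"rented ${rented:,.0f}/mo ({cloud / rented:.2f}x)")

With these assumed rates, Model 2's 10 extra burst servers cost roughly $100/month on L – the same ballpark as the $1,400 per year mentioned below.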

You can see that if you utilize the servers 100%, EC2 is between 2x and 3.3x more expensive than renting servers (Model 0). Additionally, looking at the CUs, the EC2 instances are less powerful than the rented hardware, so you will probably need more of them. If you run a diverse utilization setup (Model 1), the costs are between 1.75x and 2.6x higher. But the comparison also shows (Model 2) that if you have extreme peaks (500% of normal operations, 15 vs. 3 servers), adding those 10 servers only costs you about $1,400 more per year, and the L version is much cheaper than renting 15 servers. The margin is much smaller with the XL setup though – it is only around 3% cheaper (but with more flexibility). Only with even more extreme models does running in the cloud get significantly cheaper than renting servers. Perhaps you also need fewer people, so the TCO may be lower with the cloud – that depends on how much ops work you no longer need. And you need to take database servers, load balancers and other components into account, which need to run 24 hours a day, so one server might not be enough for 24h service. The same applies when you run a worldwide business, as you need the capacity nearly all the time, 24 hours a day.
(And some claim that buying servers and colocating them is the cheapest option, though with the biggest upfront costs.)

If you grow fast, from 5 to 10 to 50 servers in one year, it is much easier to add capacity in the cloud than on rented servers. Most providers need some days to set up your hardware – which can still work for you with a partially predictable business model and a little bit of planning.

And perhaps Joel York is right when he writes:

Let’s face it. CIO’s don’t win awards for saving money. CIO’s win the praise of their companies, their colleagues and their user communities when they deliver better capabilities faster than the competition. It is time savings, not cost savings that is driving cloud adoption.

Conclusion

Calculate hard what you do and whether it is cost-effective for you. For many companies the cloud is the future because of its benefits. But don’t lie to yourself: virtualization and the cloud are no magic bullet. If you do not need it, don’t buy it. Do not go to the cloud just because it’s hip.

If you have corrections, find errors in my calculations, have different calculations, can explain the 30% Reddit cost reduction, or have different opinions or insights, I would appreciate it if you left a comment. Thanks.

About the author

Stephan Schmidt has been working with internet technologies for the last 20 years. He has been a head of development, consultant and CTO, and is a speaker, author and blogger. He specializes in organizing and optimizing software development, helping companies increase productivity with lean software development and agile methodologies. Want to know more? All views are his own. You can find him on Google+.
