
AWS EC2 / RDS Pricing and Performance

AWS EC2 pricing seems complicated, and I have tried to figure it out many times. Recently I was there again, looking at it from the RDS perspective (the same pricing model applies to the underlying EC2 instances), so here we go.

Amazon/AWS calls their virtual machine offering Elastic Compute Cloud (EC2). I used to think about the "elastic" part in terms of being able to scale your infrastructure by adding and removing VMs as needed. But I guess scaling is also relevant in terms of allocating more or less compute on the same VM, or in terms of how many simultaneous compute tasks a single hardware host in AWS can handle. Let's see why…

Basic Terminology

What is compute in EC2? AWS used to use a term called Elastic Compute Unit (ECU). As far as I can tell, this has been largely phased out. Now they measure their VM performance in terms of virtual CPU (vCPU) units.

So what is a vCPU? At the time of writing, it is defined as "a thread of either an Intel Xeon core or an AMD EPYC core, except for M6g instances, A1 instances, T2 instances, and m3.medium." A1 and M6g use AWS Graviton and Graviton 2 (ARM) processors, which I guess is a different architecture (no hyperthreading?). T2 is not described in as much detail, except as an Intel 3.0 GHz or 3.3 GHz processor (older, no hyperthreading?). Anyway, I go with vCPU meaning a (hyper)thread allocated on an actual host CPU. Usually this would not even be a full core but a hyperthread.

There are different types of vCPUs on the instances, as they use different physical CPUs. But that is a minor detail. The instance types are more relevant here:

  • burstable standard,
  • burstable unlimited, and
  • fixed performance.

OK then, what are they?

Fixed Performance

The fixed performance instance type is the simplest. It is always allocated its vCPUs in full. A fixed performance instance with 2 vCPUs can run those 2 vCPUs (hyperthreads) at up to 100% CPU load at all times, with no extra charge. The price is always fixed. If you don't need the full 100% CPU power at all times, a burstable instance can be cheaper. But only if you don't "burst" too much, in which case the burstable type becomes more expensive.

Burstable Standard

The concept of a burstable instance is what I find a bit complex. There is something called the baseline performance. This is what you always get, and it is included in the price.

On top of the baseline performance, burstable instances have something called CPU credits. Different instance types get different numbers of credits. Here are a few examples (at the time of writing):

Instance type   Credits/h   Max. credits   vCPUs   Mem.   Baseline perf.
T2.micro        6           144            1       1GB    10%
T2.small        12          288            1       2GB    20%
T2.large        36          864            2       8GB    20% * 2
T3.micro        12          288            2       1GB    10% * 2
T3.small        24          576            2       2GB    20% * 2
T3.large        36          864            2       8GB    30% * 2
M4.large        -           -              2       8GB    200%

Baseline performance

I will use the T2.micro from above table as an example. The same concepts apply to other instance types as well, just change the numbers.

T2.micro baseline performance is 10%, and a single vCPU is allocated, referring to a single hyperthread. The 10% baseline means being able to use 10% of the maximum performance of this hyperthread (vCPU).

CPU credits

Every hour, a T2.micro gets 6 CPU credits. If the instance runs at or below the baseline performance (10% here), it saves these credits for later use, up to a maximum of 144 saved credits for a T2.micro. The credits are always awarded, but if your application load is such that the instance could use more than the 10% baseline performance, it will spike to that higher load as soon as a CPU credit is allocated, consuming the credit immediately.

A credit is used up in full if the instance runs at 100%, and partially if it runs above the baseline but below the 100% maximum. If multiple vCPUs are allocated to an instance and they all run above the baseline, they consume correspondingly multiple CPU credits.

Well, that is what the sites I linked above say. But here is an example, where I ran a task on a T2.micro instance after it had been practically idle for more than 24 hours. So it should have had the full 144 CPU credits at this point.

T2 load chart

In the above chart, the initial spike around midnight lasts about 144 minutes, although the chart timeline is too coarse to show it exactly. It is from an RDS T2.micro instance under heavy write load (I was writing as much as I could, all the time, from another EC2 T2.micro instance). The 144-minute duration seems consistent with the credit numbers. But the CPU percentage shown here is not, since 10% should be the baseline. It could also be that the EC2 instance responsible for loading the data into the RDS instance hits the same CPU credit limit, so the amount of data injected for writing is limited as well. I will have to investigate more later, but the shape illustrates the performance throttling and CPU credit concepts.

Considering the baseline, the T2.micro is practically an instance running at 10% of the single-thread performance of a modern server processor. That does not seem like much. To me, the 1 vCPU definition actually seems rather misleading, as you don't really get a vCPU but rather 10% of one. Given 60 minutes in an hour and 6 CPU credits awarded to a T2.micro per hour, you get about one credit every 60/6 = 10 minutes. If you save up by running at low load for 24 hours (144 * 10 = 1440 minutes = 24 hours), you can then run for 144 minutes (2 hours 24 minutes) at 100% CPU load. In shorter spikes, you get about one minute of 100% load for every 10 minutes of saved-up idling.
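To sanity-check that arithmetic, here is a minimal throwaway sketch (not any AWS API; the credit rate and cap are the T2.micro values from the table above, and one CPU credit is taken to mean one vCPU running at 100% for one minute, as the numbers above imply):

    // Back-of-the-envelope check of the T2.micro credit math above.
    // Assumption: one CPU credit = one vCPU at 100% for one minute.
    public class CreditMath {
      public static void main(String[] args) {
        double creditsPerHour = 6;  // T2.micro earn rate (table above)
        double maxCredits = 144;    // T2.micro credit cap (table above)

        // One credit earned every 60 / 6 = 10 minutes.
        double minutesPerCredit = 60 / creditsPerHour;
        // Idling at or below baseline for 144 / 6 = 24 hours fills the bucket.
        double hoursToFillBucket = maxCredits / creditsPerHour;
        // A full bucket then buys 144 minutes of 100% load on the one vCPU.
        double burstMinutesWhenFull = maxCredits;

        System.out.printf("1 credit per %.0f min, full after %.0f h, "
            + "%.0f min of 100%% burst%n",
            minutesPerCredit, hoursToFillBucket, burstMinutesWhenFull);
      }
    }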

T2.micro instances are described as "High frequency Intel Xeon processors" with "up to 3.3 GHz Intel Scalable Processor". So the EC2 T2.micro instance is actually 10% of a single hyperthread on a 3.3 GHz processor. About equal to a 330 MHz single hyperthread.

The bigger instances can have multiple vCPUs allocated, as shown in the table above. They also get a few more credits and have a higher baseline performance percentage. The performance percentage is per vCPU, so an instance with 2 vCPUs and a baseline performance of 20% actually has a baseline performance of 2 * 20%. In this case, you are getting two hyperthreads at 20% of the CPU's maximum capacity.

I still have various questions about this type, such as: do you actually use a fraction of a CPU credit, or is it consumed in full when going over the baseline? Can the different threads (over multiple vCPUs) share the total of 2 * 20% = 40%, or is it just 20% per vCPU, with anything above that counting as over baseline regardless of whether the other thread is idling? But I guess I have to settle for this: burstable is complicated, fixed is simpler to use. Moving on.

Burstable Unlimited

The burstable instances can also be set to unlimited burstable mode.

In this mode, the instance can run (burst) at full performance all the time, not limited by accumulated CPU credits. You still gain CPU credits as with standard burstable instances, but in comparison, if you use more CPU credits than you have, in unlimited mode you are simply billed extra for them. You will not be throttled by available credits; instead you can rack up nice extra bills.

If the average utilization rate is higher than the baseline plus available CPU credits, over a rolling 24-hour window (or the instance lifetime, if less than 24 hours), you will be billed for each vCPU-hour used over that measure (baseline average + CPU credits).

Each vCPU-hour above the extra billing threshold costs $0.05 (5 US cents). Considering the cost difference, this seems potentially quite expensive. Let's see why.
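As a rough sketch of how that surcharge adds up: the $0.05 per surplus vCPU-hour is the price above, the 2 vCPUs with a 30% baseline are the T3.large figures from the earlier table, and the 80% average utilization is a made-up illustration. This ignores the credits earned during the window, so it is an upper-bound estimate:

    // Sketch of the unlimited-mode surcharge: $0.05 per vCPU-hour above
    // what the baseline covers. Ignores credits earned during the window.
    public class UnlimitedSurcharge {
      public static void main(String[] args) {
        double vcpus = 2;             // e.g. T3.large
        double baseline = 0.30;       // per-vCPU baseline (30%)
        double pricePerSurplusVcpuHour = 0.05;

        double avgUtilization = 0.80; // made-up average load per vCPU
        double hours = 24;            // the rolling measurement window

        double surplusVcpuHours =
            Math.max(0, avgUtilization - baseline) * vcpus * hours;
        double surcharge = surplusVcpuHours * pricePerSurplusVcpuHour;

        System.out.printf("%.1f surplus vCPU-hours -> $%.2f extra per day%n",
            surplusVcpuHours, surcharge);
      }
    }

With these made-up numbers that is $1.20 per day on top of the roughly $2 per day base price, which is how the "nice extra bills" sneak in.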

Comparing Prices

What do you actually get for the different instances? I used the following as the basis for my calculations:

  • T2: 3.0/3.3 GHz Xeon. AWS describes T2.small and T2.medium as an "Intel Scalable (Xeon) Processor running at 3.3 GHz", and T2.large at 3.0 GHz. Somewhat odd numbers, but I guess there is some legacy there (more cores at a lower clock?).
  • T3: 3.1 GHz Xeon. AWS describes this as "1st or 2nd generation Intel Xeon Platinum 8000" with a "sustained all core Turbo CPU clock speed of up to 3.1 GHz". My interpretation of 3.1 GHz might be a bit high, as the description says "boost" and "up to", but I don't have anything better to go with.
  • M5: 3.1 GHz Xeon. Described the same as T3: "1st or 2nd generation Intel Xeon Platinum 8000", and "up to 3.1 GHz".
Instance type   CPU GHz   Base perf   Instance MHz    vCPUs   Mem.   Price/h
T2.micro        3.3       10%         330 MHz         1       1GB    $0.0116
T2.small        3.3       20%         660 MHz         1       2GB    $0.0230
T2.large        3.0       20% * 2     600 MHz * 2     2       8GB    $0.0928
T2.large.unl    3.0       200%        3000 MHz * 2    2       8GB    $0.1428
T3.micro        3.1       10% * 2     310 MHz * 2     2       1GB    $0.0104
T3.small        3.1       20% * 2     620 MHz * 2     2       2GB    $0.0208
T3.large        3.1       30% * 2     930 MHz * 2     2       8GB    $0.0832
T3.large.unl    3.1       200%        3100 MHz * 2    2       8GB    $0.1332
M5.large        3.1       200%        3100 MHz * 2    2       8GB    $0.0960

I took the above prices from the AWS EC2 pricing page at the time of writing. Interestingly, AWS pricing seems so complicated that they cannot keep track of it themselves. For example, T3 has one price on the above page and another on the T3 instance page. The latter lists the T3.micro price at $0.0209 per hour, as opposed to the $0.0208 above. Yes, it is a minimal difference, but it just shows how complicated this gets.

The table above represents the worst-case scenario where you run your instance at 100% performance as much as possible. It does not account for the burstable instances being able to run at up to 100% CPU load for short periods as they accumulate CPU credits. And with the unlimited burstable types, you can get by with less if you run at or under the baseline. But, as the AWS docs note, an unlimited burstable instance at full load is about 1.5 times more expensive than the equivalent fixed performance instance (T3 vs M5).
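To get a feel for where the break-even point sits, here is a small sweep using the prices from the table above. The simplified surcharge model (average utilization minus baseline, times vCPUs, times $0.05, ignoring banked credits) is my own assumption:

    // Hourly cost of T3.large in unlimited mode vs M5.large, swept over
    // average utilization. Prices from the table above; surcharge model
    // simplified to (utilization - baseline) * vCPUs * $0.05.
    public class BreakEven {
      public static void main(String[] args) {
        double t3Base = 0.0832, m5 = 0.0960;   // $/h from the table
        double baseline = 0.30, vcpus = 2, surplus = 0.05;

        for (int pct = 30; pct <= 100; pct += 10) {
          double util = pct / 100.0;
          double t3 = t3Base + Math.max(0, util - baseline) * vcpus * surplus;
          System.out.printf("util %3d%%: T3.large.unl $%.4f/h, M5.large $%.4f/h%n",
              pct, t3, m5);
        }
      }
    }

Under this simplified model the lines cross at around 43% average utilization: below that the unlimited T3.large stays cheaper, above it the fixed M5.large wins.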

Strangely, T2 is more expensive than T3, while T3 is more powerful. So I guess that, other than for free tier use, there should be absolutely no reason to use T2, ever. Unless maybe for some legacy dependency, or limited availability.

Conclusions

I always thought it was so nice of AWS to offer a free tier, and wondered how they could afford to give everyone a CPU to play with. Well, it turns out they don't. They just give you one tenth of a single thread on a hyperthreaded CPU. This is what a T2.micro is in practice. I guess it can be useful for playing around and getting familiar with AWS, but yeah, the marketing is a bit of.. marketing? Cute.

Still, the price difference per hour from T2.large ($0.0928) or T3.large ($0.0832) to M5.large ($0.0960) seems small. Especially the difference between T2 and M5 is so small it seems to make no sense. So why go bursty, ever? With the T3 you are saving about 15%. If you have bursty workloads and need to be able to handle large spikes, on a really large set of servers, maybe it makes sense. Or if your load is very low, you can get smaller (fraction of a CPU) instances using the burstable mode. But it seems to me it requires a lot of effort to profile your loads, make predictions, and monitor and manage it all.

In most cases, I would actually expect something like Lambda functions to be the best fit for those kinds of cases. Scaling according to need, clear pricing (which seems like a miracle in AWS), and a simple operational model. Sounds just great to me.

In the end, comparing the burstable vs fixed performance instances, it just seems silly to me to pay almost the same price for such a complicated burstable model with seemingly much worse performance. But like I said, for big shops and big projects, maybe it makes more sense. I would be really interested to hear some concrete and practical experiences and examples of why to use one over the other (especially the burstable instances).

Performance testing with InfluxDB + Grafana + Telegraf, Part 2

Previously I ran my performance tests in a local test environment. However, the actual server should run in the Amazon EC2 cloud on one of the free tier micro instances (yes, I am cheap, but it is more of a hobby project, bwaabwaa.. 🙂 ). So I installed the server on my EC2 micro instance and ran the same tests against it. The test client(s) on my laptop, the server in EC2.

So what did I expect to get? Maybe a bit slower, but mainly the same performance. Well, let's see..

Here is the start of the session:

init_2freq

A couple of things to note. I added a chart showing how many different IP addresses the sessions are coming from (the middle line here). This is just one, due to hitting them all from my laptop. In the production version this should be more interesting. Also, the rate at which the tester is able to create new sessions and keep them running is much bumpier and slower than when running the same tests locally. Finally, there is the fact that the tester is observing a much bigger update frequency variation.

To examine whether this delay variation is due to server load or the network connection, I tried turning off the tester frequency line and showing only the server line:

init_1freq

And sure enough, on the server end there is very little variation. So the variation is due to network load/speed. Not a real problem here, since most likely there would not be hundreds of clients connecting over the same IP line/NAT router. However, the rate at which my simple tester and single-IP approach scales up is much slower, and it gets slower over time:

1300sessions

So I ran it up to about 1300 clients, as shown above. For more extensive tests I should really provision some short-term EC2 micro instances and use those to load test my server instance. But let's look at the resource use for now:

1300sessions_mem

So the EC2 micro instance has a maximum of 1GB memory, out of which the operating system takes its own chunk. I started the server JVM in this case without remembering to set the JVM heap limit explicitly, so here it defaults to about 256MB. However, from my previous tests this was enough for pretty much the 4000 clients I tested with, so I will just go with that for now.

And how about the system resources? Let's see:

byte_count

CPU load never goes above 30%, so that is fine. The system has memory left, so I can allocate more for the JVM if I need to. Also fine. Actually it has more than shown above, as I learned while looking at this that there is a bunch of cached and buffered memory that is also available, although not listed as free. At least by "top".. 🙂

But more interestingly, the "no title" chart at the bottom is again the network usage chart that did not work on my local OSX instance. In the EC2 environment, Telegraf seems to be able to capture this data. This is very useful, as in EC2 they also charge you for your network traffic. So I need to be able to monitor it and figure out how much network traffic I will be generating. This chart shows that I initially used much more inbound traffic (uploading my server JAR files etc.). However, as the number of active clients rises to 1300+, the amount of transferred "data out" also rises quite fast and passes "data in" towards the end. This tells me that if the app ever got really popular, I would probably be hitting the free tier limits here. But not a real problem at this point, fortunately or unfortunately (I guess more unfortunately 🙂 ).

That is it for the EC2 experiment at this time. However, one final experiment. This time I tried it in my local network, with the server on one host and the tester on another. Meaning the traffic actually has to hit the router. Would that be interesting? I figured not, but there was something..

It looked like this:

mini

What is interesting here? The fact that the session creation rate starts to slow down already, and the frequency variation for the tester gets big fast after about 2000 active sessions. So I guess the network issue is not so much specific to EC2, but applies to anything going beyond localhost. Comforting, in the sense that the connection to the much more distant EC2 instance does not seem to make as big a difference as I initially thought from my EC2 tests.

Finally, an interesting direction to investigate would be to run this local network test while also using SNMP to query the router for various metrics. As I have previously written a Python script to do just that, I might try it at some point. Just because I am too curious.. But not quite now. Cheers.

Configuring EC2 instance connection firewall rules within the VPC (security groups)

OK, again with them Minecraft servers. This time I already had a bunch of them up and running on different free tier instances on EC2. Now I wanted to add a database and connect some of the server plugins to it. The DB should run in its own EC2 instance and the Minecraft servers in their own. So how do I configure my instances so the Minecraft server instances can talk to the DB server, but nothing outside my instances can talk to it?

EC2 uses Security Groups to define the firewall configuration. These are simple kinds of rules, such as "Allow TCP from IP X". But I cannot configure this with the other instances' public IP addresses, since all the instances keep changing their IP addresses as they are shut down and started again. So how can I do it? The solution is to put all the Minecraft servers in a specific Security Group. Then, in the rule where the DB server instance allows communications from a source address, write the name of the Security Group the Minecraft servers are in instead of an IP address. And of course the port the DB server listens on.

The weirdest part of this is the poor documentation, as I would not expect to write textual group names in a box expecting an IP address (numbers and dots). The only way I finally figured this out was to look at the "default" security group, which is configured with such a policy by default.
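For reference, the same kind of rule can also be created programmatically instead of through the console box. Below is a sketch using the AWS Java SDK (v1, same as the code later on this page); the group names "minecraft-servers" and "db-server" and the MySQL port 3306 are hypothetical, and referencing groups by name like this works in a default VPC (otherwise you would use group IDs):

    // Sketch: allow members of the Minecraft security group to reach the
    // DB server's security group on the DB port.
    import com.amazonaws.auth.profile.ProfileCredentialsProvider;
    import com.amazonaws.services.ec2.AmazonEC2;
    import com.amazonaws.services.ec2.AmazonEC2Client;
    import com.amazonaws.services.ec2.model.AuthorizeSecurityGroupIngressRequest;
    import com.amazonaws.services.ec2.model.IpPermission;
    import com.amazonaws.services.ec2.model.UserIdGroupPair;

    public class AllowDbFromGroup {
      public static void main(String[] args) {
        AmazonEC2 ec2 = new AmazonEC2Client(new ProfileCredentialsProvider());

        // The traffic source is a security group, not an IP address.
        UserIdGroupPair minecraftGroup =
            new UserIdGroupPair().withGroupName("minecraft-servers"); // hypothetical

        IpPermission dbPort = new IpPermission()
            .withIpProtocol("tcp")
            .withFromPort(3306).withToPort(3306) // hypothetical DB port
            .withUserIdGroupPairs(minecraftGroup);

        // Attach the rule to the DB server's security group.
        ec2.authorizeSecurityGroupIngress(new AuthorizeSecurityGroupIngressRequest()
            .withGroupName("db-server") // hypothetical
            .withIpPermissions(dbPort));
      }
    }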

And how do you rename an existing security group? You don't (sheesh). But you can take an existing one, clone it, and give the clone a new name. Then go to the EC2 instance management console and choose something like "network -> change security group" to set the new security group for the instances. Then delete the old group.

Most of this is only possible if the instances are in the same VPC (virtual private cloud). The good thing is, AWS seems to create a VPC and link everything to it by default these days.

Updating configuration files for EC2 dynamic IPs

I needed to connect together some of the EC2 instances running my kids' Minecraft servers. The problem is, I am cheap and run a bunch of free tier instances, and I stop and start them all the time to save on hosting costs. This results in all of them getting new IP addresses all the time, which makes a mess of the application server configurations. I can use a dynamic DNS service to cover the basic requirement of giving the main server an address. But then I also needed to connect several of them together, and their addresses keep changing..

The solution is actually very simple (as usual). The AWS development kit provides the means to query all my AWS instance information, including names and IP addresses. The name is actually just a tag given to the instance, so any other tag could be used as well. To get a simple interface for redoing the configurations, I just query the data, collect all the addresses using their tag information as keys, and use a template engine to generate a list of the IP addresses, preformatted for the server configuration. The server in this case is BungeeCord, which needs to be able to forward players to the different EC2 instances.

Java code:

    // AWS SDK v1 and Apache Velocity imports needed to make this compile.
    import com.amazonaws.AmazonClientException;
    import com.amazonaws.auth.AWSCredentials;
    import com.amazonaws.auth.profile.ProfileCredentialsProvider;
    import com.amazonaws.regions.Region;
    import com.amazonaws.regions.Regions;
    import com.amazonaws.services.ec2.AmazonEC2;
    import com.amazonaws.services.ec2.AmazonEC2Client;
    import com.amazonaws.services.ec2.model.*;
    import org.apache.velocity.VelocityContext;
    import org.apache.velocity.app.VelocityEngine;
    import java.io.StringWriter;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Paths;

    public class ConfigGenerator {
      public static void main(String[] args) throws Exception {
        // Load credentials from the local AWS profile (~/.aws/credentials).
        AWSCredentials credentials;
        try {
          credentials = new ProfileCredentialsProvider().getCredentials();
        } catch (Exception e) {
          throw new AmazonClientException("Cannot load AWS credentials.", e);
        }
        AmazonEC2 ec2 = new AmazonEC2Client(credentials);
        ec2.setRegion(Region.getRegion(Regions.EU_CENTRAL_1));

        // The Velocity context collects the <name>-ip -> address mappings.
        VelocityContext vc = new VelocityContext();

        // List all instances in the region and map each Name tag to its public IP.
        DescribeInstancesResult result = ec2.describeInstances();
        for (Reservation res : result.getReservations()) {
          for (Instance ins : res.getInstances()) {
            String ip = ins.getPublicIpAddress();
            for (Tag tag : ins.getTags()) {
              if (tag.getKey().equals("Name")) {
                vc.put(tag.getValue() + "-ip", ip);
              }
            }
          }
        }

        // Fill in the template with the current IPs and write the config file.
        VelocityEngine velocity = new VelocityEngine();
        velocity.init();
        StringWriter sw = new StringWriter();
        velocity.mergeTemplate("template.vm", "UTF-8", vc, sw);
        Files.write(Paths.get("generated.txt"),
            sw.toString().getBytes(StandardCharsets.UTF_8));
      }
    }

With this, the code gets the names of all of my EC2 instances, loads a Velocity template, and for each instance sets a key-value pair with the key being the instance name plus an "-ip" suffix, and the IP address as the value. I then run Velocity to generate a configuration file with the current instance IP addresses filled in. Now the kids can just run the command to update the server config.
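For illustration, a hypothetical template.vm could look something like the following, where the variable names match the Name tags of my instances. The server names and port here are made up, and a real BungeeCord config.yml has more fields:

    ## template.vm -- ${lobby-ip} etc. are filled from the VelocityContext above
    servers:
      lobby:
        address: ${lobby-ip}:25565
      survival:
        address: ${survival-ip}:25565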