Thursday, April 1, 2010

Stress Testing

Prem Phulara had commented on my earlier post, What knowledge do you need to have in order to do good or best performance testing asking me to explain stress testing.

This post is dedicated to Prem. In this post, we will discuss stress testing.

Stress testing is a special type of performance testing. It is used to stress (or in other words, put a high load on) the system to determine its highest operational capacity. Stress tests establish the highest load that the system is able to withstand while still being operational (and not throwing too many errors). The additional benefit of stress testing is that it exposes the resources that run out the fastest when the load on the system approaches its maximum sustainable value. It is common to design a series of load tests with increasing loads to find out the maximum sustainable load.

Here are the tips to perform stress testing correctly but quickly:

1. Before you begin work on stress testing your system, you should be aware of the goals of your stress test. For example, your goal could be to determine the maximum throughput of your application for a given business transaction/ workflow on the given infrastructure or your goal could be to determine the maximum number of concurrent users with a given workload and the given infrastructure.

2. It is crucial that you identify the correct performance testing tool for your stress test. Otherwise, you would be forced to make compromises in your tests. See How to evaluate automated software test tools quickly?

3. You would need to model test script(s) according to the business transactions specified or implied in your goals.

4. If your scripts are correct, a large number of business transactions would execute during your tests. Therefore, ensure that you have a large set of test data that does not exhaust during the test. You may consider sourcing any existing test data from your application's database. Another way of quickly generating some test data is to use Microsoft Excel to extend a series. There are also free test data generators available on the web.

5. As with other tests, you should have an isolated (as far as possible) environment for conducting your tests. Unless you stress test a system on a stand-alone machine, your test environment may consist of the client machine(s), the load generator machine(s), the server(s) (web server(s), application server(s), database server(s) and so on) and the connectivity between the above. You would need your performance testing tool installed on at least one machine. You would set up your test, run it and analyze the test results from this machine. Depending on your choice of the performance testing tool, other load generator machine(s) may or may not require the full performance testing tool installed on them.

6. When you set up your first load test, you should model it realistically. For example, you should add the required scripts to the test, set up the specifications of the Virtual Users (e.g. initial VU, ramp-up, constant load, ramp-down and distribution of VU assigned to each script etc.) and give the run-time settings (e.g. test duration, time lag between iterations, network bandwidth distribution of the VU, their browser distribution etc.). You would really be limited by the features provided by the chosen performance testing tool.
You should specify a low load in your first load test. This is when you could check if your script(s) work correctly (e.g. the script reads the test data, it accesses the system and interacts with it, it is able to send data to the system which is accepted by the system and is able to send/ receive data to/ from other script(s) as required). If you have multiple scripts in your test, you could check if the scripts run in the correct sequence or in parallel, as specified. When your test is running, you should check the performance of each machine in your
test environment e.g. check that the server(s) and load generator(s) have become busier but not to their limit. On Microsoft Windows machines, you may use the PerfMon (Start > Run > PerfMon) tool to add your desired counters and monitor them during the test. See other tips in the post, Performance testing using HP LoadRunner software.

7. After establishing that your load test is valid, you should increase the load systematically. Do not go "big bang" and deploy the load that you think is the maximum sustainable load. For example, if you think that your system may support up to 1000 concurrent Virtual Users, start with a load test with 100 VU and keep on adding 100 VU every 5 minutes. You may find that your system supports 800 concurrent VU but crashes with a load of 900 concurrent VU. After resetting your test environment, you should then start with a load test with 800 VU and keep on adding 10 VU every 30 seconds. Then you may find that your system supports 850 VU but crashes with a load of 860 concurrent VU. Repeat the last test at least another couple of times to confirm your test result.

8. If you recall, I mentioned that the additional benefit of a stress test is to identify bottlenecks or resources that run out the fastest when the load is increased. You should analyze the test results of several of your last tests to find out about these bottlenecks. Maybe it is the network which is clogged, or your web server runs out of available memory. Confirm your finding by repeating the last few tests. Your system's capacity may be improved by increasing the resource that runs out first. Ensure that you include the details of the important bottlenecks observed during your stress test in your test report.