Tuesday, July 24, 2007

Why I'll continue to recommend and use LoadRunner instead of eLoad

I've raised this concern in meetings with Empirix and haven't heard that it's being treated as a priority. It is a major reason why I'll continue to recommend and use LoadRunner instead of eLoad:

Lack of ability to generate a specified transactional load in eLoad, and why it's very important

Background:
Web-based (3- and n-tier) applications are different from client-server applications in how they deal with user load (specifically, how the system performs as users interact with it), as explained below:

Load in web-based applications (3- or n-tier) is generally expressed as the number of transactions expected per unit of time (seconds, minutes, etc.) rather than the number of users per unit of time. This is because in web/application systems, a user can generate a varying amount of load depending on how active that user is. For example, a single user who submits 10 transactions every second will use much more system resources than 100 users who each submit 1 transaction every minute (assuming the 'transaction' as defined is equal in both cases, and ruling out caching): the single user generates 600 TPM of load while the 100 users together generate only 100 TPM. Even though those 100 users consume some fixed amount of resources on the system (for maintaining session information, other objects, etc.) that is 100 times more than what the single user consumes, significant resources are only used when users are actively interacting with the system or waiting for a response from it.

So if, while planning performance tests for an application, I get business requirements stating that X users are expected to use the system in a day, I have to work with the business to refine these requirements into a user activity profile (or scenario profile) and the frequency of those actions: how many users will log on every hour/minute, how many will navigate to certain web pages or consume a service every hour/minute, and how many will log off every hour/minute. For simple scenarios, a transaction can be 1 to 3 steps - logon, stepA and logoff. Based on those answers, I then restate the requirements as the number of transactions expected to be executed every minute (TPM). For complex scenarios, multiple transaction types may have to be defined - usertype1(stepA, stepB...stepX), usertype2(stepB...stepY), etc. In that case I ask the same questions for each transaction type, i.e., what is its expected transactional load in TPM or TPS (transactions per second). Then I go ahead and create the load test scenarios based on these transactional loads.
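As a rough illustration of this refinement (all numbers below are made up for the example, not from any real requirement), the arithmetic per transaction type is just a unit conversion:

    /* Made-up example: refine hourly user activity into per-transaction-type TPM. */
    #include <stdio.h>

    int main(void)
    {
        double users_per_hour = 300.0;   /* users logging on each hour (hypothetical) */
        double stepA_per_user = 4.0;     /* stepA submissions per user session        */

        double logon_tpm  = users_per_hour / 60.0;                   /*  5 TPM */
        double stepA_tpm  = users_per_hour * stepA_per_user / 60.0;  /* 20 TPM */
        double logoff_tpm = logon_tpm;   /* assume every session ends with a logoff   */

        printf("logon : %.1f TPM\n", logon_tpm);
        printf("stepA : %.1f TPM\n", stepA_tpm);
        printf("logoff: %.1f TPM\n", logoff_tpm);
        return 0;
    }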

This kind of transaction-based load specification is even more important for XML-over-HTTP/web services applications because there is no concept of a user. Rather, the system deals with requests (or transactions) that can come from a third-party web application, a custom client, etc., generally referred to as service consumers. There is rarely a need to maintain session information, and each transaction is idempotent (at least from the web/application server perspective). Note that the overall business transaction may be non-idempotent yet still be made up of a series of idempotent requests. For example, a credit check service may be non-idempotent because it writes the number of inquiries to the user's credit profile and gives them a weight in calculating the user's credit score. But from the web and application server perspective, and especially in test environments where limited test data is available, each transaction can be considered idempotent, since we don't care about the test user's credit score and need to generate a production-level load using limited data.

How the 2 tools handle (or do not handle) this:
Coming back to the point of why I prefer to use LoadRunner over eLoad in these scenarios (there are other reasons too but let's just focus on this one for now)...
Both eLoad and LoadRunner let me specify the iteration delay (or VU pacing in LoadRunner terminology - don't confuse it with VU pacing in eLoad, which actually means think time!) that controls how long a Virtual User (VU) waits before starting the next iteration. However, there is only one way to specify this in eLoad: from the time the previous iteration ends (see Figure 1). This creates a problem because the time at which the next iteration starts depends on how long the previous transaction took. For example, if you specify this delay to be 30 seconds and the previous transaction took 30 seconds, the next iteration will start at the 60th second (30 + 30). If the previous transaction took only 5 seconds, the next one will start at the 35th second (5 + 30).

Now suppose you want to generate a transactional load of 10 TPM. You use 10 virtual users and specify the delay to be 55 seconds, expecting each transaction to take around 5 seconds. That way each of the 10 VUs submits 1 transaction every minute, generating a load of 10 TPM. But when you run the load, the server (or the application under test) gets busy and takes 30 seconds or more to return a response. eLoad's virtual users will still wait 30 + 55 seconds before starting their next iterations, reducing the overall transactional load by almost 30% (10 * (1/85) * 60 ≈ 7 TPM as compared to the required 10 * (1/60) * 60 = 10 TPM). But you really need to find out how the system behaves at the production load of 10 TPM even during busy periods: do requests keep queuing up and ultimately make the system unresponsive, or does it return to a stable state soon after? Well, hard luck, because eLoad is going to decrease the load whenever the system starts taking longer to return responses. You could add more virtual users to the scenario to compensate when this happens, but I don't want to sit and watch the load test for this to happen when I'm running a 12-hour test and am only allowed to run tests in non-business hours. And I'm not even sure I could calculate fast enough how many users to add or remove every time it happens.
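To make the arithmetic above concrete, here's a small stand-alone C sketch (it is not eLoad or LoadRunner code, just the numbers from the example) comparing the effective load from the two pacing strategies when responses slow down to 30 seconds:

    /* One VU's cycle time and the resulting load for 10 VUs, using the
       example numbers: 55 s post-iteration delay vs. a fixed 60 s interval. */
    #include <stdio.h>

    int main(void)
    {
        int    vusers     = 10;
        double response   = 30.0;   /* server response time under load (s)       */
        double post_delay = 55.0;   /* eLoad-style delay after iteration end (s) */
        double interval   = 60.0;   /* LoadRunner-style fixed interval (s)       */

        double cycle_eload = response + post_delay;                       /* 85 s */
        double cycle_fixed = (response < interval) ? interval : response; /* 60 s */

        printf("delay after end : %.0f s cycle -> %.1f TPM\n",
               cycle_eload, vusers * 60.0 / cycle_eload);   /* ~7.1 TPM */
        printf("fixed interval  : %.0f s cycle -> %.1f TPM\n",
               cycle_fixed, vusers * 60.0 / cycle_fixed);   /* 10.0 TPM */
        return 0;
    }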

Figure 1: eLoad VU Settings


LoadRunner gives 3 options for setting the iteration delay/pacing (see Figure 2):
a) As soon as the previous iteration ends
b) After the previous iteration ends: with a fixed/random delay of XXX seconds (In case of random delay, it lets you specify a range)
c) At fixed/random intervals, every XXX seconds.

Figure 2: LoadRunner VU Settings



So the above situation is handled very easily by selecting the 3rd option and choosing an interval of 60 seconds. In the above example, if the previous iteration took 5 seconds, the VU will wait 55 seconds, and if it took 30 seconds, it will wait another 30 seconds before starting the next iteration. No matter how long the previous iterations take (as long as they stay under 60 seconds), subsequent iterations always start at fixed intervals of 60 seconds from the start of the previous ones, keeping the load stable at 10 TPM. If I expect transactions to take longer than 60 seconds, I can start with more VUs and increase the interval. For example, I can use 20 VUs and set the interval to 120 seconds; this still generates a load of 10 TPM, provided I specify the ramp-up correctly.
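The sizing rule behind this is simple: pick a pacing interval longer than the slowest transaction you expect, then choose the number of VUs so that VUs * 60 / interval equals the target TPM. A quick check of the two cases above (this is just the arithmetic, not tool configuration):

    /* VUs needed for a 10 TPM target at two pacing intervals. */
    #include <stdio.h>

    int main(void)
    {
        double target_tpm  = 10.0;
        double intervals[] = { 60.0, 120.0 };   /* seconds */
        int i;

        for (i = 0; i < 2; i++)
            printf("interval %5.0f s -> %4.0f VUs\n",
                   intervals[i], target_tpm * intervals[i] / 60.0); /* 10, 20 */
        return 0;
    }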

Conclusion:
So my conclusion is that I will use LoadRunner as much as I'm able to. In case you're wondering why my company has two load testing tools when buying one is costly enough: my team is under a business unit that owns eLoad licenses because LoadRunner was considered too expensive. Another business unit has LoadRunner licenses, and even though I had to go through a lengthy procedure, I got them to agree to let us use LoadRunner and charge us for the usage.

---------------------------------------

Note 1: You can argue that load in a real-life production scenario is never stable. But when it comes to characterizing the system's performance, I prefer to use multiple scenarios with increasing loads. For example, if the business expects about 10,000 transactions per day, then over a 10-hour business day this comes to 16.67 TPM. I will run load tests at 16 TPM (1x), around 40 TPM (2.5x) and around 80 TPM (5x) to give them numbers on how the system can be expected to perform as the transactional load varies from 16 to 80 TPM. I will probably also run some stress tests with a stable background load of 16 TPM while submitting a batch of multiple requests (100/200, etc.) to see how the system recovers. Again, this will depend on business requirements and expectations. Also, if I really need to vary the load, I would rather use the random option in LoadRunner and vary the pacing while still keeping the overall load stable.
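For reference, the 1x/2.5x/5x numbers above come from a quick calculation like this (just the arithmetic, using the note's own figures):

    /* 10,000 transactions over a 10-hour business day, tested at 1x/2.5x/5x. */
    #include <stdio.h>

    int main(void)
    {
        double per_day  = 10000.0;
        double hours    = 10.0;
        double base_tpm = per_day / (hours * 60.0);     /* ~16.67 TPM */
        double levels[] = { 1.0, 2.5, 5.0 };
        int i;

        for (i = 0; i < 3; i++)
            printf("%.1fx -> %.1f TPM\n", levels[i], base_tpm * levels[i]);
        return 0;                                       /* 16.7, 41.7, 83.3 TPM */
    }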

Note 2: I am not implying that defining load in terms of the number of concurrent users is not important. For some applications (e.g., Citrix or remote desktop applications) it is the most important load-defining criterion. Even for web-based applications, you may want to find out how many users you can support concurrently before the server runs out of memory; this helps you determine when you'll need to buy extra hardware. But any commercial load testing tool also has to support generating a specified transactional load, now that XML and web services are becoming more and more common.

Note 3: The eLoad version I'm currently using is 8.10, and LoadRunner is 8.1.4.

Friday, July 13, 2007

WebScarab

There have been times during LoadRunner scripting when I needed to see the low-level HTTP request being sent from my client (the one I'm using to record, e.g., a browser or a custom client) to the server. Earlier I used Ethereal (http://www.ethereal.com/) successfully, but the problem is that it doesn't handle SSL directly: if the communication is over SSL, all I would see is encrypted data, with no way to see the headers/data being transferred. This made me look for alternatives. I recently came across WebScarab, and while I wouldn't say it's free of bugs, I'm sticking with this tool for the foreseeable future. Here's how I was able to solve some problems in LoadRunner scripting using it.

Solution 1: (SSL Intercept)
First things first: the WebScarab proxy is able to observe both HTTP and encrypted HTTPS traffic, by negotiating an SSL connection between WebScarab and the browser instead of simply connecting the browser to the server and allowing an encrypted stream to pass through it.
This is a major advantage. So I no longer have to hope that one of the test environments will not have SSL implemented and will let me observe the non-SSL HTTP traffic. Watching browser traffic was just as easy as starting the WebScarab proxy and pointing the browser to the local proxy. It gives options to intercept requests and/or responses and lets you modify the requests in any way before passing them on to the server. For a custom client, I can capture the exact headers being sent and add them to my web_custom_request:
    web_add_header("Cache-Control", "no-cache");
web_add_header("SOAPAction", "\"\"");
web_add_header("Accept-Encoding", "gzip, deflate");


And if it's an HTTP XML POST, for example, I can also copy the body to get the exact XML data being sent and put it in the body of the request:
    web_custom_request("SampleService",
"URL={URL}",
"Method=POST",
"EncType=text/xml; charset=utf-8",
"TargetFrame=",
"Resource=0",
"RecContentType=text/xml",
"Mode=HTTP",
"Body="…

and so on. See screenshot.
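Put together, a complete request ends up looking roughly like the sketch below (the URL, headers and XML body here are placeholders I've made up, not the ones from my actual script):

    web_add_header("Cache-Control", "no-cache");
    web_add_header("SOAPAction", "\"\"");
    web_add_header("Accept-Encoding", "gzip, deflate");

    web_custom_request("SampleService",
        "URL=https://host.example.com/services/SampleService",
        "Method=POST",
        "EncType=text/xml; charset=utf-8",
        "TargetFrame=",
        "Resource=0",
        "RecContentType=text/xml",
        "Mode=HTTP",
        "Body=<?xml version=\"1.0\" encoding=\"utf-8\"?>"
             "<soapenv:Envelope xmlns:soapenv=\"http://schemas.xmlsoap.org/soap/envelope/\">"
             "<soapenv:Body><sampleRequest><id>12345</id></sampleRequest></soapenv:Body>"
             "</soapenv:Envelope>",
        LAST);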


Solution 2: (Reverse Proxy/Act as a web server)
As it happened, the client I was using (for more details, see my previous post) had pre-configured options for selecting the URL, so I had no way to point it to the local WebScarab proxy. I looked through the help contents and found this:
WebScarab allows you to specify a "base address" for a Listener. The base address instructs the Listener to operate as a reverse proxy, and should be formatted as a HTTP or HTTPS URL. In this mode, it will act as a web server, rather than as a proxy server, and will construct the URL by concatenating the base URL and the path that appears in the request line. If the base URL is an HTTPS URL, it will immediately negotiate an SSL tunnel prior to trying to read the request from the browser. This is useful for the situation where you are using a custom HTTP-based client that does not support configuring an upstream proxy. Simply change the hosts file on the computer on which the custom client is running to point the site in question to the computer on which WebScarab is running on, and WebScarab will receive requests for the targeted website.
This meant that I could use the hosts file to point the client to the proxy and specify the base address on the proxy listener to intercept the requests. After a few tries, I was able to intercept the SSL requests over a non-local base address. Again, I could get the headers and body and use them in the web_custom_request. See screenshot.
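For illustration, the hosts-file side of this looks something like the entry below (the hostname and IP address are hypothetical); the listener's base address is then set to the real site's URL so WebScarab knows where to forward the requests:

    # hosts file on the machine running the custom client
    # (C:\WINDOWS\system32\drivers\etc\hosts on Windows, /etc/hosts elsewhere)
    # 10.0.0.25 = machine running the WebScarab listener (hypothetical)
    10.0.0.25    secure.example.com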





Solution 3: (SSL Server Certificate)
Another client that I was recording my script against had a similar issue. The client didn't support pointing to an upstream proxy, so I configured the hosts file to point to the listener proxy. However, when I ran the client, it threw this exception:

{http://xml.apache.org/axis/}stackTrace:javax.net.ssl.SSLHandshakeException: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target.

I tried adding the WebScarab certificate to the keystore through the Java Control Panel, but no luck. After googling a little more, I came across this forum thread: http://forum.java.sun.com/thread.jspa?threadID=220329&tstart=165
Apparently, Java uses its default keystore, and to use another keystore, it has to be created and passed in as a command-line argument. So after importing the WebScarab certificate into the IE trusted certificates, exporting it to a .cer file, creating a keystore with that certificate, and modifying the batch run file to add
-Djavax.net.ssl.trustStore= -Djavax.net.ssl.trustStorePassword=

I had my fingers crossed when running the client again. Fortunately, this time it worked as expected and intercepted the HTTPS requests without any errors. Once again, I used the captured headers and the XML body to create the LR web_custom_request.
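For anyone trying to reproduce this, the steps boil down to something like the commands below (the file names, alias and password are placeholders; the certificate is the one exported from IE after trusting WebScarab's):

    keytool -import -alias webscarab -file webscarab.cer -keystore webscarab.jks -storepass changeit

    rem added to the client's batch file (keystore path and password are placeholders):
    java -Djavax.net.ssl.trustStore=webscarab.jks -Djavax.net.ssl.trustStorePassword=changeit ... (rest of the client's original command line)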