Monday, October 29, 2007

LoadRunner Scripting Challenge - LiveJournal

My last few posts have unabashedly been about LoadRunner even though I vowed to diversify my writing into more distinct areas. But this one is again about LoadRunner, and it is too interesting to pass up.

Background:

I had an old journal in LiveJournal back from the times when 'blogging' wasn't a very familiar word or activity and most of the popular blogging sites like Blogger and WordPress (these are the only free ones that I've liked so far) didn't exist. I came across LiveJournal and was immediately hooked. I wrote prolifically at that time and I should also mention that my creative writing abilities then were much superior to my current showcase. But over time, I've outgrown LiveJournal because their free version doesn't provide the most needed features and the one that does is ad-supported. To get rid of the ads, you have to pay a nominal amount. I don't think that was the only reason, since I was a paid member for some time, supporting their efforts. But I thought I needed a more traditional 'blog' rather than a journal. I also haven't been writing much lately and have no communities that I would like to keep track of in LiveJournal.

The result was that some of my most cherished writings were stored in LiveJournal's post history, and I wanted to import all of them into a WordPress blog. WordPress allows importing an XML file of LiveJournal-formatted entries. The problem, however, is that LiveJournal allows exporting posts only one month at a time. So if I wanted to export my posts going back to 2002, I would have to export them month by month for over 6 years' worth of posts. That was certainly not very exciting. But it soon became pretty exciting when I decided to use LoadRunner to export all my posts month by month in XML format, and I learned something very interesting in the process - which is of course the whole idea of this post.

Challenge:

When I recorded the script by going to the export page and logging in, I noticed in the script that my password wasn't hard coded anywhere. I expected to find it with the request when I submitted the form containing my username and password. Instead, there were the "chal" field and the "response" field and the password field was blank.

web_submit_data("login.bml",
    "Action=http://www.livejournal.com/login.bml?ret=1",
    "Method=POST",
    "RecContentType=text/html",
    "Referer=http://www.livejournal.com/?returnto=/export.bml",
    "Snapshot=t5.inf",
    "Mode=HTML",
    ITEMDATA,
    "Name=returnto", "Value=/export.bml", ENDITEM,
    "Name=mode", "Value=login", ENDITEM,
    "Name=chal", "Value=c0:1193421600:2223:300:DsqGiIDCk0A4hs9sNsmH:c7f09b46d0ba2d82bb68945ea84532fc", ENDITEM,
    "Name=response", "Value=b84434e3c2d193c4a1c345d7874a89a3", ENDITEM,
    "Name=user", "Value=myusername", ENDITEM,
    "Name=password", "Value=", ENDITEM,
    LAST);

Honestly, that was surprising to me. At that time, I couldn't come up with any explanation of how my user ID could be authenticated and successfully logged in without the password being sent in the form submission. I could smell the hint of another challenge. If you want to try it yourself before reading on, record a script on LiveJournal (http://www.livejournal.com) by logging in, and then try to make it work on replay.

Solution:

On thinking a little more and looking at the script again, I realized that they could be using JavaScript to generate the response which was some kind of hashed form of the password. On viewing the source of the initial page, I found a js file with this code:

var pass = pass_field.value;
var chal = chal_field.value;
var res = MD5(chal + MD5(pass));
resp_field.value = res;
pass_field.value = ""; // dont send clear-text password!

On further investigation, I figured out what was happening:

1. On initial navigation to any page with the login form, a challenge string is sent to the client. I guess it has to have some kind of expiration time because if you use it sometime later with the correct response, it'll return that the challenge has expired.

2. An event handler is registered with the submit button of the login form.

3. When the submit button is pressed, the event handler takes the field values including the hidden challenge string. It takes care of creating an MD5 digest of the challenge string concatenated with the MD5 digest of the password and clearing the actual password field value. This is the value that you see in the "response" form parameter in the code above. The values are then submitted and the server takes care of validating the response value.

This page explains the approach at a high level: http://blog.paranoidferret.com/index.php/2007/07/22/secure-authentication-without-ssl-using-javascript/

So to make the LR script work, I had to do 2 things:

1. Get the value of the challenge by correlating appropriately

2. Once I have the challenge string, create the 'response' value by getting the MD5 digest of the challenge string concatenated with the MD5 digest of the password string.

The MD5 digest calculation is based on the MD5 Message Digest Algorithm by Ron Rivest of MIT. There was no way I was going to write an MD5 function myself. A quick search revealed some implementations of this algorithm. The one I used was from http://www.fourmilab.ch/md5/ because it was in C and saved me from including a lot of legalese. All I had to do was include the header and C file and use the MD5 functions. So here is the outline of the final script. I have left out the details, which should serve as an interesting exercise.

1. Firstly, get the MD5 digest of the password. This will later be used to concatenate to the challenge string.

MD5Init(&md5c);
MD5Update(&md5c,(unsigned char *)mesg,strlen(mesg));
MD5Final(signature,&md5c);

where mesg is the password string, md5c is the MD5 context, and signature is a 16-byte buffer that receives the raw digest.

2. Correlate the script to save the challenge string at the appropriate place. This is the string that is sent by the server as a hidden form parameter of the login form.

3. Once you have the challenge string, concatenate the MD5 digest of the password (calculated in Step 1) to it. Remember that this digest has to be concatenated as a lower-case hexadecimal number. Get the MD5 digest of this concatenated string.

4. Pass this MD5 digest calculated in Step 3 as the value to 'response' parameter in the login step:

"Name=chal", "Value={chal}", ENDITEM,
"Name=response", "Value={response}", ENDITEM,

That should do it. Now I can parameterize the year and month values in the export step and export all my journal entries by month as XML files. It was a simple task to concatenate all the files into 1 big XML file and import it through WordPress's import feature. I'm a happy WordPress blog user now...we'll just have to see how long that lasts before I start looking for other options.

I guess I should also add that the exercise gave me a great opportunity to learn how web site authentication can be handled without using SSL and how I can script that in LoadRunner.
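
For anyone attempting the exercise, the fiddly part of Step 3 is turning the raw 16-byte digest into lower-case hexadecimal text before appending it to the challenge. Here is a minimal sketch of that piece in plain C. The helper names digest_to_hex and build_response_input are hypothetical (they are not part of LoadRunner or the fourmilab code), and the MD5 calculation itself is left to whichever library you include.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: write a 16-byte digest as 32 lower-case hex
   characters followed by a terminating NUL. */
void digest_to_hex(const unsigned char digest[16], char out[33])
{
    static const char hex[] = "0123456789abcdef";
    int i;

    for (i = 0; i < 16; i++) {
        out[2 * i]     = hex[digest[i] >> 4];    /* high nibble */
        out[2 * i + 1] = hex[digest[i] & 0x0f];  /* low nibble */
    }
    out[32] = '\0';
}

/* Hypothetical helper: build chal + hex(MD5(password)) into out.
   Returns 0 on success, -1 if out is too small. */
int build_response_input(const char *chal,
                         const unsigned char pass_digest[16],
                         char *out, size_t out_size)
{
    char hex[33];
    size_t chal_len = strlen(chal);

    if (chal_len + 32 + 1 > out_size)
        return -1;

    digest_to_hex(pass_digest, hex);
    memcpy(out, chal, chal_len);
    memcpy(out + chal_len, hex, 33);  /* copies the NUL as well */
    return 0;
}
```

The string produced by build_response_input is what gets fed to the second MD5 pass. The digest of that pass, formatted with digest_to_hex again, is the value to save into the {response} parameter.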

Note 1: Since creating this script, I explored LiveJournal's server protocol documentation and found that they have documented the authentication mechanism pretty well. Great work in encouraging other users to build custom clients.

Note 2: I felt I should mention that the LoadRunner licensing agreement specifically prohibits using the software to run tests on public domain sites. It basically means that you cannot load up your LR Controller with the script you created above and run a load test with any number of virtual users.

Tuesday, October 9, 2007

Random Virtual User Pacing in LoadRunner

I guess there's always a first time. I had never used LoadRunner's random virtual user (VU) pacing earlier, but am working on a project that will need to use this feature for the first time. And as I thought about it a little more, I may start using it more frequently now. Here's how it happened:

This is one of the rare projects that provided me with excellent documentation - not only system documentation like the system specs, API Guides etc but also performance requirements like actual production volume reports and capacity models that estimated the projected volumes.

The capacity models estimated the maximum transaction load by hour as well as by minute (max TPM). What I needed to do was take the maximum hourly load, divide it by 60 to get a per-minute transactional load, and use this as the average TPM. The idea was to vary the VU pacing so that over the whole duration of the test, the average load stays at this average TPM but also reaches the max TPM at random times.

For example, if the maximum hourly transaction rate is 720 requests and maximum TPM is 20, the average TPM will be 720/60 = 12 and I will need to vary the pacing so that the load varies between 4TPM and 20TPM and averages to around 12TPM.

The Calculation:

To vary the transactional load, I knew I had to vary the VU pacing randomly. Taking the above example, I had to achieve 12TPM and I knew the transactions were taking around 1-2 seconds to complete. So to generate a fixed load of 12TPM with 24 users, I could use a pacing of 120 seconds (24 users * 60 / 120 seconds = 12TPM), with a ramp-up of 1 VU every 5 seconds.

Script     TPM        Number of VUs   Pacing (sec)          Ramp Up
Script 1   12         24              120                   1 VU/5sec

So now to vary the TPM to x with the same 24 virtual users, I will need to have a pacing of 24*60/x. I got this from some old-fashioned logic which goes in my head this way:

24 users with a pacing of 60 seconds generate a TPM of 24
24 users with a pacing of 120 seconds generate a TPM of 24 * 60/120
24 users with a pacing of p seconds generate a TPM of 24 * 60/p, so a TPM of x needs a pacing of 24 * 60/x
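
The same logic can be written as a pair of small C functions. This is only a sketch of the arithmetic: the function names are made up for illustration, and it assumes one transaction per iteration with an iteration time well below the pacing interval (true here, since the transactions take 1-2 seconds).

```c
/* Pacing (seconds between iteration starts) that `vus` virtual users
   need in order to produce `tpm` transactions per minute. */
double pacing_for_tpm(int vus, double tpm)
{
    return vus * 60.0 / tpm;
}

/* The inverse: TPM produced by `vus` virtual users at a fixed pacing. */
double tpm_for_pacing(int vus, double pacing_sec)
{
    return vus * 60.0 / pacing_sec;
}
```

With 24 users, pacing_for_tpm(24, 12) gives the 120 seconds used in the table above, and tpm_for_pacing(24, 60) gives 24TPM, matching the first line of the reasoning.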

So using the above formula, to vary the load from 20TPM down to 4TPM, I will need to vary the VU pacing from 72 to 360 seconds. So now we have:

Script     TPM        Number of VUs   Pacing (sec)          Ramp Up
Script 1   4 to 20    24              Random (72 to 360)    1 VU/5sec


Of course, there's a caveat. The range of 72 to 360 seconds has an arithmetic mean of 216. 120 is actually the harmonic mean of the 2 numbers. So the actual variation in TPM will depend on the distribution of random numbers that LoadRunner generates within the given range. If it generates the numbers with a uniform distribution around the arithmetic mean of the range, then we have a problem: an average pacing of 216 seconds works out to 24 * 60/216, or roughly 6.7TPM, instead of the 12TPM we want.
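
To make the difference between the two means concrete, here is a small sketch in C. The function names are hypothetical, and average_tpm_uniform rests on the assumption that pacing values are drawn uniformly from the range, in which case the long-run load is governed by the arithmetic mean of the pacing.

```c
/* Arithmetic mean of the two ends of a range. */
double arithmetic_mean(double a, double b)
{
    return (a + b) / 2.0;
}

/* Harmonic mean of the two ends of a range. */
double harmonic_mean(double a, double b)
{
    return 2.0 * a * b / (a + b);
}

/* Long-run average TPM for `vus` users, ASSUMING pacing is uniformly
   distributed between min_sec and max_sec. */
double average_tpm_uniform(int vus, double min_sec, double max_sec)
{
    return vus * 60.0 / arithmetic_mean(min_sec, max_sec);
}
```

For the 72 to 360 range, arithmetic_mean returns 216 and harmonic_mean returns 120, so a uniform draw would leave the average load well short of the target.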

I ran a quick test to find this out. I created an LR script and used the rand() function to generate 1000 numbers within the range, on the assumption that LR uses a similar function to generate the random pacing values.

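int i;

srand(time(NULL));   /* seed the generator once per run */
for (i = 0; i < 1000; i++) {
    /* rand() % 289 yields 0..288, so each value falls in 72..360 inclusive */
    lr_output_message("%d\n", rand() % 289 + 72);
}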

And of course, the average came out close to the arithmetic mean of 72 and 360, which is 216.

So, assuming that the function LoadRunner uses for random pacing generates values that are uniformly distributed around the arithmetic mean of the range, we'll need to modify the range of pacing values so that its arithmetic mean is the pacing that gives us the average TPM we want...phew. What it means is that the above pacing values need to be modified from 72 to 360 (arithmetic mean = 216) to 72 to 168 (arithmetic mean = 120). However, this gives us a TPM range of 20 down to 8.6TPM (24 * 60/168), with a harmonic mean of 12TPM.
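
The adjusted range can be derived mechanically: keep the minimum pacing (which sets the peak TPM) and pick the maximum so that the arithmetic mean lands on the pacing for the target average TPM. The sketch below uses a hypothetical function name and assumes uniformly distributed pacing values, as discussed above.

```c
/* Upper pacing bound (seconds) such that a range starting at min_sec
   has an arithmetic mean equal to the pacing needed for target_tpm.
   ASSUMES uniformly distributed pacing and one transaction per iteration. */
double max_pacing_for_average(int vus, double target_tpm, double min_sec)
{
    double mean_sec = vus * 60.0 / target_tpm;  /* pacing for the average load */
    return 2.0 * mean_sec - min_sec;            /* mean = (min + max) / 2 */
}
```

For 24 users, a 12TPM average and a 72-second minimum, this returns 168 seconds. The lowest load in the range is then 24 * 60/168, or about 8.6TPM.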

But I'll live with it. I would rather have the average load stay around 12TPM. So here are the new values. Note the asterisk on TPM. I need to mention in the test plan that the actual TPM will vary from 8.6 to 20TPM with an average of 12TPM.

Script     TPM*       Number of VUs   Pacing (sec)          Ramp Up
Script 1   4 to 20    24              Random (72 to 168)    1 VU/5sec

* Actual TPM will vary from 8.6 to 20TPM with an average of 12TPM.