Random Sync: 2007

Wednesday, December 19, 2007

Business Case for Testing Tools & Automation

This post had been lying in my drafts for a long time. For some reason, I decided not to publish it. But it's still relevant...except that I have submitted the business case since then and it's lying at somebody at Finance's desk or email Inbox for review/approvals.

----------------------------------------------------------------------------------

I'm currently working on a business case to invest in testing tools and functional test automation. The case will be sent to upper management and finance for approval and once they do, we're going to start working on building an automation framework. I foresee a lot of hurdles and am not keeping my hopes high based on the tightening of budget and how the business case is coming out to be.

One of the things that I have to add in the business case is financial analysis which includes an estimate on return of investment. And it has been a very "enlightening" process. Based on the current estimates on the capital and labor costs, it'll take us 2.9 years to get a return on investment. And of course, it's based on a lot of assumptions and I'm kind of skeptic when claiming that number for ROI.

You see, one of things that are never in short supply at my workplace is the list outstanding tasks. And with that comes extreme work pressure and very little time to invest in exploring new technologies and coming up with new ideas. I'm quite sure that designing the automation framework will be a time consuming and learning process and I will have to be devoted at least half-time (if not more) in this effort. And I'm quite sure that when we actually start working on that effort, the estimates that I put in for labor in the business case will start looking like best case scenario.

The brighter side is that I have now under my belt the experience of creating a business case. And once/if approved, we'll be undertaking the effort of functionally automating the test cases for our applications. We're also planning to expand our QA service by offering the testing tools and/or automation to other teams.

My colleague and I had been talking about the design of the framework since we first conceptualized the idea of automation. We had this whole idea in our minds on how to make it reusable, to maximize customizability with least rework. He has experience in building these kind of frameworks and I have had a brief encounter at my last job. I recently looked up online and found that it is now one of the established practices, one of the buzzwords surrounding test automation process - Keyword Driven Test Automation Frameworks. It must have been a long time since both of us were in automation business because what I read matched what we had been planning to do. Seems like a lot of people have written their experiences with building automation frameworks with this idea. Seems like we'll be able to use some of the knowledge to our benefit.

Coming up...

A few weeks ago, I spent some time going over JMeter and building some advanced web test plans. But then I got distracted and buried under some intense projects, including a 1-week training on a product that I didn't know much about and that I have to be an advanced user of within a few weeks. (BTW...if you want to know why attending a 1-week advanced training on a product without having a solid background is a bad idea, let me know).

Now I have some time again but it's right before my vacation and I just don't feel like getting immersed into it again. But when I come back, I plan on posting some of my notes on how to build an advanced web test plan in JMeter. I plan on using a more advanced scenario than the one with basic Get requests, probably something similar to my previous post about using an MD5 library to create a LoadRunner script. If I can't find something else, I'll use that very example. I also plan on exploring some webservices and pure XML over HTTP kind of scripts. Stay tuned...

Oh... and I should also mention my reasons for looking into JMeter. It has been increasingly frustrating to borrow time on LoadRunner Controller from an external team, partially because of having to justify to the project team the additional charge back incurred to the project. I can't believe how many emails I have to write to justify a few 100 dollars. But anyway, turning misery into a learning experience, I wanted to explore if JMeter provides enough capabilities and is robust enough to be used for production performance testing.

Monday, October 29, 2007

LoadRunner Scripting Challenge - LiveJournal

My last few posts have unabashedly been about LoadRunner even though I vowed to diversify my writings into more distinct areas. But this one again is about LoadRunner and is too interesting to pass...

Background:

I had an old journal in LiveJournal back from the times when 'blogging' wasn't a very familiar word or activity and most of the popular blogging sites like Blogger and WordPress (these are the only free ones that I've liked so far) didn't exist. I came across LiveJournal and was immediately hooked. I wrote prolifically at that time and I should also mention that my creative writing abilities then were much superior to my current showcase. But over time, I've outgrown LiveJournal because their free version doesn't provide the most needed features and the one that does is ad-supported. To get rid of the ads, you have to pay a nominal amount. I think that wasn't the only reason and I was a paid member for sometime supporting their efforts. But I thought I needed a more traditional 'blog' rather than a journal. I also haven't been writing much lately and have no communities that I would like to keep track of in LiveJournal.

So the result was some of my most cherished writings were stored in LiveJournal's history of posts and I wanted to import all of them to a WordPress blog. WordPress allows importing an XML file of LiveJournal formatted entries. The problem however is that LiveJournal allows exporting the posts only by month. So if I wanted to export my posts from back in 2002, I would have to export it month by month for over 6 years worth of posts. That was certainly not very exciting. But soon it became pretty exciting when I decided to use LoadRunner to export all my posts month by month in XML format and I learned something very interesting in the process - which is of course the whole idea of this post.

Challenge:

When I recorded the script by going to the export page and logging in, I noticed in the script that my password wasn't hard coded anywhere. I expected to find it with the request when I submitted the form containing my username and password. Instead, there were the "chal" field and the "response" field and the password field was blank.

web_submit_data("login.bml",
"Action=http://www.livejournal.com/login.bml?ret=1",
"Method=POST",
"RecContentType=text/html",
"Referer=http://www.livejournal.com/?returnto=/export.bml",
"Snapshot=t5.inf",
"Mode=HTML",
ITEMDATA,
"Name=returnto", "Value=/export.bml", ENDITEM,
"Name=mode", "Value=login", ENDITEM,
"Name=chal", "Value=c0:1193421600:2223:300:DsqGiIDCk0A4hs9sNsmH:c7f09b46d0ba2d82bb68945ea84532fc", ENDITEM,
"Name=response", "Value=b84434e3c2d193c4a1c345d7874a89a3", ENDITEM,
"Name=user", "Value=myusername", ENDITEM,
"Name=password", "Value=", ENDITEM,

Honestly, that was surprising to me. At that time, I couldn't come up with any explanation of how my user ID can be authenticated and successfully logged in without sending the password in form submission. I could smell the hint of another challenge. If you want to try it before reading on to get a better handle, try recording a script on LiveJournal (http://www.livejournal.com) by logging in and making it work.

Solution:

On thinking a little more and looking at the script again, I realized that they could be using JavaScript to generate the response which was some kind of hashed form of the password. On viewing the source of the initial page, I found a js file with this code:

var pass = pass_field.value;
var chal = chal_field.value;
var res = MD5(chal + MD5(pass));
resp_field.value = res;
pass_field.value = ""; // dont send clear-text password!

So on further investigation, I figured what was happening:

1. On initial navigation to any page with the login form, a challenge string is sent to the client. I guess it has to have some kind of expiration time because if you use it sometime later with the correct response, it'll return that the challenge has expired.

2. An event handler is registered with the submit button of the login form.

3. When the submit button is pressed, the event handler takes the field values including the hidden challenge string. It takes care of creating an MD5 digest of the challenge string concatenated with the MD5 digest of the password and clearing the actual password field value. This is the value that you see in the "response" form parameter in the code above. The values are then submitted and the server takes care of validating the response value.

This page explains this in high-level: http://blog.paranoidferret.com/index.php/2007/07/22/secure-authentication-without-ssl-using-javascript/

So to make the LR script work, I had to do 2 things:

1. Get the value of the challenge by correlating appropriately

2. Once I have the challenge string, create the 'response' value by getting the MD5 digest of the challenge string concatenated with digest of the password string.

The MD5 digest calculation is based on the MD5 Message Digest Algorithm by Ron Rivest of MIT. There was no way I was going to write an MD5 function myself. A quick search revealed some implementations of this algorithm. The one I used was from http://www.fourmilab.ch/md5/ because this was in C and saved me from including a lot of legalese. All I had to do was include the header and C file and use the MD5 functions. So this is the outline of final script. I have left out the details which should serve as an interesting exercise.

1. Firstly, get the MD5 digest of the password. This will later be used to concatenate to the challenge string.

MD5Init(&md5c);
MD5Update(&md5c,(unsigned char *)mesg,strlen(mesg));
MD5Final(signature,&md5c);

where mesg is the password string.

2. Correlate the script to save the challenge string at the appropriate place. This is the string that is sent by the server as a hidden form parameter of the login form.

3. Once you have the challenge string, concatenate the MD5 digest of the password (calculated in Step 1) to it. Remember that this digest has to be concatenated as a lower-case hexadecimal number. Get the MD5 digest of this concatenated string.

4. Pass this MD5 digest calculated in Step 3 as the value to 'response' parameter in the login step:

"Name=chal", "Value={chal}", ENDITEM,
"Name=response", "Value={response}", ENDITEM,

That should do it. Now I can parameterize the year and month values in export step and export all my journal entries by month in XML files. It was a simple task to concatenate all the files into 1 big XML file and importing it through WordPress blogging features. I'm a happy WordPress blog user now...we'll just have to see how long that lasts before I start looking for other options.

I guess I should also add that the exercise gave me a great opportunity to learn how web site authentication can be handled without using SSL and how I can script that in LoadRunner.

Note 1: Since creating this script, I explored LiveJournal's server protocol documentation and found that they have documented the authentication mechanism pretty well. Great work in encouraging other users to build custom clients.

Note 2: I felt I should mention that LoadRunner licensing agreement specifically prohibits using the software to run tests on public domain sites. It basically means that you cannot load up your LR Controller with the script you created above and run a load test with any number of virtual users.

Tuesday, October 9, 2007

Random Virtual User Pacing in LoadRunner

I guess there's always a first time. I had never used LoadRunner's random virtual user (VU) pacing earlier, but am working on a project that will need to use this feature for the first time. And as I thought about it a little more, I may start using it more frequently now. Here's how it happened:

This is one of the rare projects that provided me with excellent documentation - not only system documentation like the system specs, API Guides etc but also performance requirements like actual production volume reports and capacity models that estimated the projected volumes.

The capacity models estimated the maximum transaction load by hour as well as by minute (max TPM). What I needed to do was take maximum hourly load, divide it by 60 to get a per minute transactional load and use this as the average TPM. The idea was to vary the VU pacing so that over the whole duration of test, average load stays at this Average TPM but it also reaches the Max TPM randomly.

For example, if the maximum hourly transaction rate is 720 requests and maximum TPM is 20, the average TPM will be 720/60 = 12 and I will need to vary the pacing so that the load varies between 4TPM and 20TPM and averages to around 12TPM.

The Calculation:

To vary the transactional load, I knew I had to vary the VU Pacing randomly. Taking above example, I had to achieve 12TPM and I knew the transactions were taking around 1-2 seconds to complete. So I could have the pacing of around 120 seconds if I needed to generate a fixed load of 12TPM with a 5 second Ramp-up and 24 users.

Script	TPM	Number of VUs	Pacing (sec)	Ramp Up
Script 1	12	24	120	1 VU/5sec

So now to vary the TPM to x with the same 24 virtual users, I will need to have a pacing of 24*60/x. I got this from an old-fashioned logic which goes in my head this way:

24 users with a pacing of 60 seconds generate a TPM of 24
24 users with a pacing of 120 seconds generate a TPM of 24 * 60/120
24 users with a pacing of x seconds generate a TPM of 24 * 60/x

So using above formula, to vary the load from 20 to 4TPM I will need to vary the VU pacing from 72 to 360. So now we have:

Script	TPM	Number of VUs	Pacing (sec)	Ramp Up
Script 1	4 to 20	24	Random (72 to 360)	1 VU/5sec

Of course, there's a caveat. The range of 72 to 360 seconds has an arithmetic mean of 216. 120 is actually the harmonic mean of the 2 numbers. So the actual variation in TPM will depend on the distribution of random numbers that LoadRunner generates within the given range. If it generates the numbers with a uniform distribution around the arithmetic mean of the range, then we have a problem.

I ran a quick test to find this out. I created an LR script and used the rand() function to generate 1000 numbers between the range with the assumption that LR uses a similar function to generate the random pacing values.

int i;
srand(time(NULL));
for (i=0;i<1000;i++){
lr_output_message("%d\n", rand() % 289 + 72);
}

And of course, the average came out close to the arithmetic mean of 72 and 360, which is 216.

So with the assumption that the function used by LoadRunner for generating random pacing values generates numbers that are uniformly distributed around the arithmetic mean of the range, we'll need to modify the range of pacing values so that the arithmetic mean of the range gives us the arithmetic mean of the TPM that we want...phew. What it means is that the above pacing values need to be modified from 72 to 360 (arithmetic mean = 216) to 72 to 168 (arithmetic mean = 120). However, this gives us the TPM range of 20 to 8.6 TPM with a harmonic mean of 12TPM.

But I'll live with it. I would rather have the average load stay around 12TPM. So here are the new values. Note the asterisk on TPM. I need to mention in the test plan that the actual TPM will vary from 8.6 to 20TPM with an average of 12TPM.

Script	TPM*	Number of VUs	Pacing (sec)	Ramp Up
Script 1	4 to 20	24	Random (72 to 168)	1 VU/5sec

Wednesday, September 26, 2007

TextPad Regular Expression

\x20+,\|,\x20+

If you use LoadRunner for creating load test scripts and TextPad to view/modify its parameters file AND you get a file with 3000 test records to be used for parameterizing the script with blank spaces before and after the comma delimiter that need to be trimmed, this is the regular expression to use in TextPad to search for all such occurrences and replacing them with a comma instead.

Tuesday, July 24, 2007

Why I'll continue to recommend and use LoadRunner instead of eLoad

I've raised this concern in meetings with Empirix and haven't heard this as being a priority. However, this is a major reason why I'll continue to recommend and use LoadRunner instead of eLoad:

Lack of ability to generate specified transactional load in eLoad and why it's very important

Background:
Web based (3 and n-tier) applications are different from client-server types of applications in terms of how they deal with user load (specifically how the system performs when users use the system) as explained below:

Load in web-based applications (3 or n-tier) is generally represented in terms of number of transactions expected per unit time (seconds/minutes etc) and not in terms of users per unit time. This is because in web/application systems, a user using the system can generate varying amount of load on the system depending on how active the user is. For example, a single user who submits 10 transactions every second will use much more system resources than 100 users who submit 1 transaction every minute (given the 'transaction' as defined is equal in both cases, and let's rule out caching as well). Even though those 100 users in latter case are using some fixed amount of resources on the system (for maintaining session information, other objects etc) that is 100 times more than that being use by the single user, significant resources are only consumed when the users are actively interacting with the system or waiting for response from the system.

So if while planning performance tests for an application, I get business requirements stating that X users are expected to use the system in a day, I have to work with them to further refine these requirements to specify what is the user activity profile (or scenario profile) and what is the frequency of their actions. For example, how many users will logon every hour/minute, how many will navigate to certain web pages or consume a service every hour/minute and how many will logoff every hour/minute. For simple scenarios, a transaction can be 1 to 3 steps - logon, stepA and logoff. Based on answers from previous questions, I will then refine the requirements to state how many transactions are expected to be executed every minute (or TPM). For complex scenarios, multiple transaction types may have to be defined - usertype1(stepA, stepB...stepX), usertype2(stepB...stepY) etc. In this case as well, I will have to use same questions to refine the requirements for each transaction type, i.e., for each transaction type, what is the expected transactional load in TPM or TPS (transactions per second). Then I'll go ahead and create the load test scenarios based on these transactional loads.

This kind of transaction based load specification is even more important in XML over HTTP/web services applications because there is no concept of user. Rather the system deals with requests (or transactions) that can come from a third-party web application, a custom client etc. that are generally referred to as service consumers. There is rarely a need to maintain session information and each transaction is idempotent (at least from the web/application server perspective). Note that the whole transaction may be non-idempotent but still be made up of a series of idempotent transactions. For example, a credit check service may be non-idempotent because it writes the number of inquiries to the user's credit profile and gives it a weight in calculating the user's credit score. But from the web and application server perspective and especially in test environments where limited test data is available, each transaction can be considered idempotent since we don't care about test user's credit score and need to generate a production load using limited data.

How the 2 tools handle (or do not handle) this:
Coming back to the point of why I prefer to use LoadRunner over eLoad in these scenarios (there are other reasons too but let's just focus on this one for now)...
Both eLoad and LoadRunner let me specify the iteration delay (or VU pacing in LoadRunner terminology, don't confuse it with VU pacing in eLoad which actually means think time!) that controls how long a Virtual User (VU) waits before starting the next iteration. However there is only one way to specify this in eLoad - from the time previous iteration ends (see Figure 1). This creates a problem because the time when next iteration will be started depends on how long the previous transaction took. For example, if you specify this delay to be 30 seconds and the previous transaction took 30 seconds, next iteration will be started after the end of 30 + 30 = 60th second. However, if the previous transaction took 5 seconds to complete, next transaction will start after 5 + 30 = 35th second. Now suppose you want to generate a transactional load of 10 TPM. You use 10 virtual users and specify the delay to be 55 seconds expecting each transaction to take around 5 seconds. This way you can have each of the 10 VUs submitting 1 transaction every minute, thus generating a load of 10 TPM. But when you run the load, server (or the application under test) gets busy and takes 30 seconds or more to return a response. eLoad's virtual users are still going to wait 30 + 55 seconds before starting subsequent iterations, thus reducing the overall transactional load by almost 30% (or 10 * (1/85) * 60 = 7 TPM as compared to required 10 * (1/60) * 60 = 10 TPM). But you really need to find out how the system behaves at production load of 10 TPM even at busy periods...do the requests keep queuing up and ultimately cause the system to become unresponsive or the system returns to stable state soon after! Well...hard luck, because eLoad is going to decrease the load if the system starts taking longer to return the responses. You can possibly add more virtual users to the scenario to increase the load when this happens but I don't want to have to sit and watch the load test for this to happen when I'm running a 12 hour test and am only allowed to run the tests in non-business hours. And I'm not even sure if I'll be able to calculate that fast how many users to increase/decrease everytime this happens.

Figure 1: eLoad VU Settings

LoadRunner gives 3 options in setting the iteration delay/pacing (see Figure 2):
a) As soon as the previous iteration ends
b) After the previous iteration ends: with a fixed/random delay of XXX seconds (In case of random delay, it lets you specify a range)
c) At fixed/random intervals, every XXX seconds.

Figure 2: LoadRunner VU Settings

So the above situation is handled very easily by selecting the 3rd option and choosing a delay of 60 seconds. In the above example, if the previous iteration took 5 seconds, it'll wait for 55 seconds and if it took 30 second, it'll wait another 30 seconds before starting next iteration. No matter how long the previous iterations take (as long as they are less than 60 seconds), it will always start subsequent transactions at specified intervals of 60 seconds from the start of previous transactions. thus keeping the load stable at 10 TPM. If I expect the transactions to take longer than 60 seconds, I can start with more VUs and increase the delay. For example, I can use 20 VUs and set the delay to 120 seconds. This will still generate a load of 10 TPM if I specify the ramp-up time correctly.

Conclusion:
So my conclusion is that I will use LoadRunner as much as I'm able to. In case you're wondering why my company has 2 load testing tools when buying 1 is costly enough, my team is under a different business unit that owns eLoad licenses because LoadRunner was considered too expensive. However, there is another business unit that has LoadRunner licenses and even though I had to go through a lengthy procedure, I got them to agree on letting us use LoadRunner and charge us for the usage.

---------------------------------------

Note 1: You can argue that load in real-life production scenario is never stable. But when it comes to defining the system's performance, I prefer to use multiple scenarios with increasingly different loads. For example, if the business expects about 10,000 transactions per day, considering a 10 hour business day this comes to 16.67 TPM. I will run load tests at 16 TPM (1x), around 40 TPM (2.5x) and around 80 TPM (5x) to give them numbers on how the system can be expected to perform if the transactional load varies from 16 to 80 TPM. I will probably also run some stress tests by running a background load of stable 16 TPM and then submitting a batch of multiple requests (100/200 etc.) to see how the system recovers in this case. Again, this will depend on business requirements and expectations. Also, If I really need to vary the load, I would rather use the random option in LoadRunner and vary the load but still keep the overall load stable.

Note 2: I am not implying that defining load in terms of number of concurrent users is not important. For some applications (e.g., citrix or remote desktop applications) it is the most important load defining criteria. Even for web based applications, you may want to find out how many users you can concurrently support before the server runs out of memory. This will help you determine when you'll need to buy extra hardware. But any commercial load testing tool has to support the ability to generate transactional load as well now that XML and webservices are becoming more and more common.

Note 3: Current eLoad version that I'm using is 8.10 and LoadRunner is 8.1.4

Friday, July 13, 2007

WebScarab

There have been times during LoadRunner scripting that I needed to see the low-level HTTP request that is being sent from my client (which I am using to record, e.g., a browser, or a custom client) to the server. Earlier, I used Ethereal (http://www.ethereal.com/) successfully but the problem is that it doesn't support SSL directly. So if the communication is over SSL, all I would see is encrypted data and there was no way to see the headers/data being transferred. This made me look for alternatives. I recently came across WebScarab and I wouldn't say it's free of bugs but I'm sticking with this tool for as far as I can see in future. Here's how I was able to solve some of the problems in LoadRunner scripting using this tool.

Solution 1: (SSL Intercept)
First things first, WebScarab proxy

is able to observe both HTTP and encrypted HTTPS traffic, by negotiating an SSL connection between WebScarab and the browser instead of simply connecting the browser to the server and allowing an encrypted stream to pass through it.

This is a major advantage. So I no longer have to hope that one of the test environments will not have SSL implemented and will let me observe the non-SSL HTTP traffic. Watching browser traffic was just as easy as starting the WebScrarab proxy and pointing the browser to the local proxy. It gives the options to intercept request and/or responses and lets you modify the requests in any way before passing them on the server. For a custom client, I can capture the exact headers being sent and add them to my web_custom_request:

    web_add_header("Cache-Control", "no-cache");
    web_add_header("SOAPAction", "\"\"");
    web_add_header("Accept-Encoding", "gzip, deflate");

And I can also copy the body if it's an HTTP XML post for example, to get the exact XML data being sent and put that in the body of the request:

    web_custom_request("SampleService",
        "URL={URL}",
        "Method=POST",
        "EncType=text/xml; charset=utf-8",
        "TargetFrame=",
        "Resource=0",
        "RecContentType=text/xml",
        "Mode=HTTP",
        "Body="…

and so on. See screenshot.

Solution 2: (Reverse Proxy/Act as a web server)
As it happened, the client I was using (for more details on this, see my previous post) had pre-configured options of selecting the URL. So I didn't have any way to point the client to the local WebScarab proxy. I looked through the help contents and found this:

WebScarab allows you to specify a "base address" for a Listener. The base address instructs the Listener to operate as a reverse proxy, and should be formatted as a HTTP or HTTPS URL. In this mode, it will act as a web server, rather than as a proxy server, and will construct the URL by concatenating the base URL and the path that appears in the request line. If the base URL is an HTTPS URL, it will immediately negotiate an SSL tunnel prior to trying to read the request from the browser. This is useful for the situation where you are using a custom HTTP-based client that does not support configuring an upstream proxy. Simply change the hosts file on the computer on which the custom client is running to point the site in question to the computer on which WebScarab is running on, and WebScarab will receive requests for the targeted website.

This meant that I could use the hosts file to point to the proxy and specify the base address in the proxy listener to intercept the requests. In a few tries, I was able to intercept the SSL requests over a non-local base address. Again, I could get the headers and body and use it in the web_custom_request. See screenshot.

Solution 3: (SSL Server Certificate)
Another client that I was recording my script against had a similar issue. The client didn't support pointing to an upstream proxy so I configured the hosts file to point to the listener proxy. However when running the client, it threw this exception:

{http://xml.apache.org/axis/}stackTrace:javax.net.ssl.SSLHandshakeException: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target.

I tried adding the WebScarab certificate to the keystore through Java Control Panel but no luck. After googling a little more, I came across this forum thread: http://forum.java.sun.com/thread.jspa?threadID=220329&tstart=165
Apparently, Java uses its default keystore and to use another keystore, it has to be created and provided in one of the arguments. So after importing the WebScarab certificates into IE trusted certificates and then exporting it into .cer, creating a keystore with the certificates and modifying the batch run file to add
-Djavax.net.ssl.trustStore= -Djavax.net.ssl.trustStorePassword=

I had my fingers crossed when running the client again. But fortunately, this time it worked as expected and intercepted the HTTPS requests without any errors. Once again, I used the custom headers and the XML body to create LR web_custom_header request.

Friday, June 29, 2007

LoadRunner function: web_convert_param

After spending about 6 hours trying to figure out a solution to the problem of parameterizing URL encoded strings in my LoadRunner script, I finally found the solution: a built-in function provided by LR.

The problem:
I recently recorded a script against an Java application that ultimately sends a HTTP POST request to the server being tested with an XML in the Body of the request. The script came out fine, except that the body (XML content) was URL encoded. It would have been fine if I didn't need to parameterize the data in the body but since I had to, I realized in a few seconds that I would have to do some extra work to convert the data from parameter file to URL encoded format. The data in parameter file is of the form:

Fname,Mname,Lname,Gen,SSN,A1,A2,A3,A4,A5,City,State,Zip
John,,Smith,,123456789,123,Main,St,#,88,TestCity,TestState,987654321

However, in the script, address is one field that is a concatenated string of A1 - A5 parameters with a space between each of them.

Further Research:
A1 - A5 are the components that make up the street address that is supposed to be concatenated to one field in the script. No problem.
sprintf(as,"%s %s %s %s %s", lr_eval_string("{a1}"), lr_eval_string("{a2}"), lr_eval_string("{a3}"), lr_eval_string("{a4}"), lr_eval_string("{a5}"));
lr_save_string(as, "addressStreet");

…but the problem is there are spaces in between this components, which we all know get converted to a '+' character in URL encoding. Ok, so I could've just used sprintf(as,"%s+%s+%s+%s+%s",…
But some of the addresses also have '#' signs that get converted into '%23'. Now I had 2 options:
1. remove all the addresses that have a '#' sign
2. write a URLEncode(char *) function that does the obvious.

1st option wasn't very tempting because however unlikely it may be, the addresses can have other characters that need to get URL encoded as well. And if a sizeable chunk of the data given to me has these characters, I could lose a lot of records. Also I didn't want to have to modify the data every time I get a new 10,000 record file.
2nd option seemed the way to go. But it was late Friday and I had to start the tests in a short time so I was lacking the much needed patience. Anyways, I ended up running my tests by removing the records with '#' character. Luckily, there were few.

Solution:
On Monday after running all the tests, I got back to writing the URLEncode function. After researching online on what the characters should get converted to and trying to find an already published function in C that I could tweak to my purpose, and stumbling on PHP, Java, JavaScript functions, I started getting a sense that LR may have some kind of built-in function that does that or something similar. I don't know why I hadn't thought of using Mercury Knowledgebase before that. So a simple search of "url encoding" brought up 3 articles - none of which seemed helpful. But reading through the 2nd one (KB Problem ID: 18880), I couldn't believe what I saw:
Pass the 'XMLSource' parameter as an input to the web_convert_param function, and store the result as 'TargetXML'
web_convert_param("TargetXML", "SourceString={XMLSource}", "SourceEncoding=HTML", "TargetEncoding=URL", LAST);

So this web_convert_param function seemed to do exactly what I needed. I tried it out with a couple of different strings and it did as promised. So my new script had:
sprintf(as,"%s %s %s %s %s", lr_eval_string("{a1}"), lr_eval_string("{a2}"), lr_eval_string("{a3}"), lr_eval_string("{a4}"), lr_eval_string("{a5}"));
lr_save_string(as, "addressStreet");

web_convert_param("encAddressStreet", "SourceString={addressStreet}", "SourceEncoding=PLAIN", "TargetEncoding=URL", LAST);
lr_output_message("%s", lr_eval_string("{encAddressStreet}"));

It took me only a few minutes to make that change. And I didn't need to worry about the '#' characters or any other unsafe characters in the address string. Phew…

Morale:
Use the Mercury Knowledgebase more often. If you think there should be any other, let me know.

Random Sync