Friday, May 21, 2010

Goodbye, multiple PuTTY windows!

If you use PuTTY for your remote telnet or SSH needs, chances are you have had to open and manage multiple PuTTY windows at once. I used to do that and it was sometimes confusing. But recently, purely by luck (my laptop had crashed and I was looking for ways to export/import PuTTY connection profiles), I found this: PuTTY Connection Manager. It's a tabbed version of the PuTTY client and provides a solution for managing multiple PuTTY instances. I've tried it and it works great!

Since I'm endorsing some good tools that I use every day, here are a few others:

- Agent Ransack: a file-searching utility for Windows with great features like searching by regular expression, searching within files, etc. I haven't used Windows' built-in Search since I found this.

- SketchPath: an XPath tool that you can use to view XML documents and run XPath queries against them.

- Sysinternals: a collection of utilities for various troubleshooting tasks in Windows, including Process Explorer (an advanced version of Task Manager), TCPView (shows all open connections along with their owning processes), etc.

- WinMD5: an MD5 utility for Windows.

That’s it for now. I’ll update this post if I can remember more tools to recommend.

Saturday, May 15, 2010

LoadRunner Scripting Challenge – KodakGallery.com (AJAX, JSON, REST & XML)

If you are not familiar with KodakGallery.com, it is an online photo publishing and sharing site offered by Kodak. It lets you store and share your photos online, print them, or order certain photo products, just like competing sites such as Flickr, Snapfish, Picasa, etc. What they don't offer is an API for interacting with your account (uploading/downloading pictures, etc.) without going through their website. Flickr (the site I use) does have a published API.

Background:

Last year, KodakGallery changed their storage policy so that users would have to make a minimum purchase from the site, based on their storage size, in order to continue storing pictures. Even though the cost is minimal for the storage they offer, it was still considered worthwhile to explore other options. The direct impact on me was that I was tasked with downloading all the pictures that had been stored there over the last few years. Manual download was out of the question because of the number of pictures that had accumulated, and scripting it with Perl or LoadRunner (my weapon of choice) was very viable, given my experience of finding myself in these kinds of situations and the thrill of being faced with a challenge and learning something new from it.

Note: If you navigated here looking for an automated way to download all your pictures from KodakGallery.com, I'm planning to create a Perl script to automate that. You can safely ignore the rest of the post and write me a comment, which will give me the motivation to stay up a few extra hours at night.

Challenge:

So here’s what we needed to do: Build a script that logs in to KodakGallery.com website and downloads all stored pictures to the local disk categorized by their album folders.

If you're a LoadRunner user who has to use it in your daily life (at least while at work) and want to give it a shot, please do so before reading through the rest of the post. You'll need a KodakGallery account and a few albums with pictures in them. I promise that it'll be fun and challenging. I also have to point out that scripting/automating interactions with a website has to be done with some caution: you may be using the website's resources in a way they were not intended to be used. Sean Burke, author of Perl & LWP, puts it very succinctly and precisely here:

…the data on the Web was put there with the assumption (sometimes implicit, sometimes explicit) that it would be looked at directly in a browser. When you write an LWP program that downloads that data, you are working against that assumption. The trick is to do this in as considerate a way as possible.

Scripting:

With that in mind, let's get to it. Scripting a live website comes with its own unknowns. The fact that you have no idea about the underlying technologies used to transfer the data and present it to the user is a great opportunity to learn, not only about some new technologies but also about the tool you use, because you may have to use it (or some of its features) in ways you never have before.

The first step in LoadRunner scripting, of course, is to record the user's interaction with the website. For the record, I only have access to LoadRunner 8.0, which is what I'm using for the scripting here. I've heard that newer versions have better support for the web technologies that have come up in recent years, but in the last few years in my current role there has never been a time when I couldn't deliver a script because I was on an older version, and I've never felt that I was missing something.

I created a script using my preferred recording options:

- Recording Mode: HTML-based script containing explicit URLs only
- Not using correlation
- Not recording headers
- Not excluding content types
- Do not record these content types as a resource: text/html, text/xml
- Record non-HTML elements in the current HTML function

I put a comment in the script before every action so that when editing the script later on, I would know clearly where each action starts and ends. But after recording, the script in this case was not very intuitive. Some of the actions didn't correspond to my script comments. For example, where I put the comment for login, I didn't see any web request that would match a login request or any form-based submission with login parameters; instead, there was a web_url request to "storagestatus.jsp". And where I put a comment for clicking on an album, there were no steps at all.

So after scanning through the recorded script and the recording log, I realized that the login and other actions were being submitted through JavaScript, and that the content-type of these requests was non-HTML. My recording settings specified that only requests with a content-type of "text/html" or "text/xml" get recorded as their own steps; everything else is recorded as a resource of the current step and included after the EXTRARES parameter of the original request. Here's the initial request. The login request is included in the original step (I later found that it has Content-Type: text/javascript), and the authentication itself is handled by submitting the username and password in an HTTP cookie called "ssoCookies":


web_add_cookie("ssoCookies=%7B%22email%22%3A%22abc%40example.com%22%2C%22password%22%3A%22test%22%7D; DOMAIN=www.kodakgallery.com");
web_url("welcome.jsp",
"URL=http://www.kodakgallery.com/gallery/welcome.jsp",
"TargetFrame=",
"Resource=0",
"RecContentType=text/html",
"Referer=",
"Snapshot=t1.inf",
"Mode=HTML",
EXTRARES,
... //lots of images
"Url=https://www.kodakgallery.com/gallery/account/login.jsp?&uid=791701608",
ENDITEM,
LAST);

So with that information, I saved the current script and recorded another one with different recording options. This time I used a URL-based script, which records all the content (including non-HTML content like CSS, JS, GIF files, etc.) in separate web_url functions. That made the script longer and a little harder to navigate, but at least I could scan through it and figure out which requests were being submitted by which actions.

After recording the actions (login, navigating to an album, downloading a full-resolution image, etc.), scanning through the script multiple times, and using WebScarab (related post) to examine the HTTP traffic, I found a lot of interesting things:

  1. Login is handled through JavaScript. A lot of JavaScript. They use the MooTools JavaScript libraries for a lot of the functionality, but for login, the page creates the cookie "ssoCookies" and sends a request to "login.jsp", which (after successful authentication) returns some user information (ssId: probably some sort of unique user identifier, firstname: the first name of the user) in a script (Content-Type: text/javascript). That script executes another JavaScript function ("callSignInComplete"), which sends the user to "/gallery/storagestatus.jsp", which then ultimately redirects (HTTP 302) to "/gallery/creativeapps/photoPicker/albums.jsp".
    >>Here’s some interesting information on how this lazy JavaScript loading works: http://ajaxpatterns.org/On-Demand_Javascript
  2. The site uses JavaScript and AJAX extensively to request resources and to present the response. For example, the album list (the images it uses to show the links to individual albums) and the pictures within the albums (once you click on an album) are retrieved asynchronously using XMLHttpRequest object.
  3. Complementary to AJAX, it uses JSON to exchange the data. For example, the request for the list of albums returns a JSON response with the album list and its details. Here are the HTTP request headers for the album list; as you can see, x-request and x-requested-with are custom HTTP headers used by the app (a sketch of how to reproduce this request in LR follows after this list):
    "GET /site/rest/v1.0/albumList HTTP/1.1\r\n"
    "Accept-Language:
    en-us\r\n"
    "Accept-Encoding: gzip, deflate\r\n"
    "Connection:
    Keep-Alive\r\n"
    "Accept: application/json\r\n"
    "x-request:
    JSON\r\n"
    "x-requested-with: XMLHttpRequest\r\n"


    And here's part of the JSON response:

    {"AlbumList":{"TOS":{"storageSize":"11111111","storageStartDate":
    "2010-05-05T08:54:54.676-07:00","tosTotalTransactionsAmt":"0",
    "tosStatus":"4","tosComplianceDate":"2010-08-27T00:00:00.000-07:00",
    "warningZone":"false"},"Album":[{"id":"12355"...

  4. It implements the services using a REST architecture. Representational State Transfer (REST) is an architectural style for exposing services on the web, and you can read more about it online…but to a LoadRunner scripter, XML services exposed RESTfully are no different from any other XML-based service over HTTP.
  5. It includes some tracking cookies and requests to third-party sites that can be safely ignored and commented out.
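
For reference, here's a rough sketch of how one could reproduce that browser-style JSON request in LR with web_add_header() (the header values come from the capture above; the step name and RecContentType are just illustrative, and the script below ends up taking the XML route instead):

// Reproduce the captured custom headers; each web_add_header() call applies
// to the next request only.
web_add_header("Accept", "application/json");
web_add_header("x-request", "JSON");
web_add_header("x-requested-with", "XMLHttpRequest");

web_url("albumList_json",
    "URL=http://www.kodakgallery.com/site/rest/v1.0/albumList",
    "Resource=0",
    "RecContentType=application/json",
    "Referer=",
    "Mode=HTML",
    LAST);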

So with all that information, it was fairly easy to visualize the script’s high-level steps:

  1. Navigate to home page & login.
  2. Send a REST-style request to get the list of Albums (name, URI etc)
  3. For each Album in the list
    1. Get the name of the Album and create a corresponding folder on the local disk
    2. Send a request to get the list of all photos in the album
    3. For each photo
      1. Send a request to download the image file
      2. Save the image file in the album folder
  4. Logout

Step 1:

The first step is to navigate to the "welcome.jsp" page, which returns the session cookies that will be used throughout the session. I deleted all the extra requests in the script for images, CSS files, etc. The next step is to log in, and to do that, we need to send a request to login.jsp with a random 9-digit number, along with the username and password in the "ssoCookies" cookie. All other session cookies are handled automatically by LR.


web_url("welcome.jsp",
"URL=http://www.kodakgallery.com/gallery/welcome.jsp",
...
LAST);

web_add_cookie("ssoCookies=%7B%22email%22%3A%22abc%40example.com%22%2C%22password%22%3A%22test%22%7D; DOMAIN=www.kodakgallery.com");

web_url("login.jsp",
"URL=https://www.kodakgallery.com/gallery/account/login.jsp?&uid={randnum}",
...
LAST);
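
As a side note, the recorded cookie has my credentials hard-coded and URL-encoded into it. If you'd rather parameterize it, here's a hedged sketch, assuming {email} and {password} are parameters defined in the script's parameter list and {randnum} is a LoadRunner Random Number parameter (100000000 to 999999999):

// Build the JSON the site expects in the ssoCookies cookie, URL-encode it,
// and register it for the kodakgallery.com domain. {email} and {password}
// are assumed script parameters, not part of the original recording.
char ssoJson[256];
char ssoCookie[512];

sprintf(ssoJson, "{\"email\":\"%s\",\"password\":\"%s\"}",
    lr_eval_string("{email}"), lr_eval_string("{password}"));
lr_save_string(ssoJson, "ssoJson");

// Percent-encode the JSON in place (the recorded cookie value was URL-encoded).
web_convert_param("ssoJson", "SourceEncoding=PLAIN", "TargetEncoding=URL", LAST);

sprintf(ssoCookie, "ssoCookies=%s; DOMAIN=www.kodakgallery.com",
    lr_eval_string("{ssoJson}"));
web_add_cookie(ssoCookie);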


Step 2:

Once login is successful, we need to get the list of albums that the user has. This is done by sending a GET request directly to the REST-style URL "http://www.kodakgallery.com/site/rest/v1.0/albumList". We also need to save the response body in a parameter, which we'll then parse to get the album details.

web_reg_save_param("albumList", "LB=", "RB=", "Search=Body", LAST);
web_url("albumList",
"URL=http://www.kodakgallery.com/site/rest/v1.0/albumList",
...
LAST);

Now here's the beauty of how the service has been implemented. When you visit the website through a browser, the response to this request is returned as a JSON-formatted string. But we can send the request in such a way that it returns the album list in XML rather than JSON: all we have to do is not include the "Accept: application/json" header and just send "Accept: */*" instead. And since it's easier to use LR's built-in XML functions to parse XML strings, we do exactly that. LR's web_url() function sends "Accept: */*" by default, so we get an XML response with the album list.
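
If you'd rather not rely on the default (or a newer LR version behaves differently), the Accept header can be made explicit with web_add_header(); this is just a belt-and-suspenders sketch:

// Explicitly send the "give me anything" Accept header so the service falls
// back to XML. web_add_header() applies only to the next request, so place it
// right before the albumList step.
web_add_header("Accept", "*/*");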

Step 3:

So once we have the album list in XML, I use lr_xml_get_values() to get the ids of all the albums in the list.

numAlbums = lr_xml_get_values("Xml={albumList}", "Query=/AlbumList/Album/id","SelectAll=yes", "ValueParam=albumId", LAST); 

It returns the number of matches for the XPath query, which is the number of albums the user has. The parameter "albumId" holds all of these ids and will be used to get the list of photos in each album.
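
To make that concrete, here's roughly what the script sees afterwards (the XML fragment is only illustrative, inferred from the XPath queries used in this script; the first album id is the one from the JSON snippet above):

/* Illustrative shape of the XML response (not the exact KodakGallery payload):

       <AlbumList>
           <Album><id>12355</id><name>Some Album</name>...</Album>
           <Album><id>12356</id><name>Another Album</name>...</Album>
       </AlbumList>

   With SelectAll=yes, lr_xml_get_values() saves each match with a numeric
   suffix, so the script can refer to {albumId_1}, {albumId_2}, and so on,
   while the function's return value (numAlbums) tells us how many there are. */
lr_output_message("Found %d albums", numAlbums);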

Step 3.1:

Now, for each of these albums, we get the id from the parameter "albumId" and then get the name of the album by calling lr_xml_get_values() again with that id in the XPath. Then we go ahead and create the directory as specified.

    for (j = 1; j <= numAlbums; j++) {
        sprintf(sfx, "{albumId_%d}", j);
        lr_save_string(lr_eval_string(sfx), "aid");

        lr_xml_get_values("Xml={albumList}", "Query=/AlbumList/Album[id='{aid}']/name",
            "ValueParam=albumName", "NotFound=Continue", LAST);

        sprintf(dname, "%s\\%s", baseDir, lr_eval_string("{albumName}"));
        if (mkdir(dname)) { //works, but need better error handling
            lr_output_message("Create directory %s failed", dname);
            return -1;
        }
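
One caveat from my side (it didn't come out of the recording): album names can contain characters that aren't legal in Windows directory names, so it may be worth sanitizing the name before building the path. A small helper, assuming it lives outside Action() (e.g., in the script's globals):

// Replace characters that Windows doesn't allow in folder names with '_'.
void sanitize_dirname(char *s)
{
    char *p;
    for (p = s; *p != '\0'; p++) {
        if (strchr("\\/:*?\"<>|", *p) != NULL)
            *p = '_';
    }
}

Then, in the loop above, copy {albumName} into a local buffer, run it through sanitize_dirname(), and use that buffer in the sprintf() that builds dname.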

Step 3.2:

With the album id, we send another GET request, this time to the URL "http://www.kodakgallery.com/site/rest/v1.0/album/{aid}". Just like we did above for the album list, we save the response body, which is a list of all the photos in this album.


        //---get album details
web_reg_save_param("albumDetails", "LB=", "RB=", "Search=Body", LAST);
web_url("albumDetails",
"URL=http://www.kodakgallery.com/site/rest/v1.0/album/{aid}",
...
LAST);

numPics = lr_xml_get_values("Xml={albumDetails}", "Query=/Album/pictures/photoUriFullResJpeg","SelectAll=yes", "ValueParam=fullResURI", LAST);

And again, just like we did above, we use lr_xml_get_values() to get the URIs of the full-resolution pictures. It returns the number of pictures and saves the URIs in a parameter.
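
A small defensive check I'd add right after this (my addition, not part of the recorded script): skip albums that come back with no pictures, so the picture loop in the next step doesn't run against a stale parameter:

        // Nothing to download in this album; move on to the next one.
        if (numPics <= 0) {
            lr_output_message("Album %s has no pictures, skipping",
                lr_eval_string("{albumName}"));
            continue;   // continues the album loop from Step 3.1
        }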

Step 3.3:

Now, for each of the pictures, we get the URI of the full-resolution image from the parameter (lines 3-4 below). We need to get the filename, which is returned in the "Content-Disposition" HTTP header (line 7), and we also have to save the whole body (the binary image data) in a parameter (line 9) that we can later use to store the file on the local disk.

   1:         //get all photos in the album
   2:         for (i=1;i<=numPics;i++){
   3:             sprintf(sfx,"{fullResURI_%d}", i);
   4:             lr_save_string(lr_eval_string(sfx), "uri");
   5:  
   6:             //save the file name that's part of Content-Disposition header
   7:             web_reg_save_param("filename", "LB=Content-Disposition: attachment;filename=", "RB=\r\n", "Search=Headers", LAST);
   8:             //save the whole HTTP body of request
   9:             web_reg_save_param("body", "LB=", "RB=", "Search=Body", LAST);
  10:  
  11:             web_url("FS",
  12:                 "URL={uri}",
  13:                 "TargetFrame=",
  14:                 "Resource=1",
  15:                 "RecContentType=image/jpeg",
  16:                 "Referer=",
  17:             LAST);
  18:  
  19:             lr_eval_string_ext("{body}",strlen("{body}"), &buf, &prmLen, 0, 0, -1);


Then we send a GET request to the URI, which returns the full-resolution image. But since this is a dynamically generated response ("Transfer-Encoding: chunked"), the server doesn't return the size of the file in the Content-Length HTTP header, which we could otherwise have used to write the contents to a file. Instead, we have to use lr_eval_string_ext() (line 19) to save the value in a buffer and get the length of that buffer.
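
One more caveat worth calling out here (it won't show up in the recording): LR limits how much data it will save into a parameter, and a full-resolution JPEG is far bigger than the default, so the limit most likely has to be raised before registering the body capture. A sketch with an arbitrary ceiling:

    // Raise the maximum size of saved parameters so the whole image body fits.
    // The default is only 256 bytes; 20000000 is an arbitrary ceiling, so pick
    // something larger than your biggest picture. Place this before the
    // web_reg_save_param("body", ...) call on line 9 above.
    web_set_max_html_param_len("20000000");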

Now we have everything we need to save the final file: its name, size, and contents. We use standard C file-handling functions for that, and finally free the memory with lr_eval_string_ext_free():

            sprintf(fname, "%s\\%s", dname, lr_eval_string("{filename}"));
            if ((file = fopen(fname, "wb")) == NULL) {
                lr_output_message("Unable to create %s", fname);
                return -1;
            }
            fwrite(buf, prmLen, 1, file);
            fclose(file);
            lr_eval_string_ext_free(&buf);
        } //end of the loop over pictures (Step 3.3)
    } //end of the loop over albums (Step 3.1)


The code then loops to save each image in each album locally.

Step 4:

The last step is to log out, which is just another request to logout.jsp with a random number. It goes through similar steps as the login and finally redirects back to the home page.
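
Here's roughly what that step looks like in the script (the /gallery/account/ path is assumed to mirror the login URL above; adjust it to whatever your own recording shows):

// Log out: a single request to logout.jsp with a random uid, mirroring the
// login step. The exact path is an assumption based on the login URL.
web_url("logout.jsp",
    "URL=https://www.kodakgallery.com/gallery/account/logout.jsp?&uid={randnum}",
    "Resource=0",
    "RecContentType=text/html",
    "Referer=",
    "Mode=HTML",
    LAST);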

Epilogue:

This was a great challenge, and if you actually made it this far, I hope you enjoyed reading it and trying it yourself. I got to learn a lot from this and I hope you do too. Specifically, I learned about REST, JSON, lazy JavaScript loading, and HTTP chunked transfer encoding. I also learned a little bit more about digging through LoadRunner recording logs, saving an HTTP response to a file, and the XML functions in LR.

As I noted earlier, if you went through all this just looking for an automated way to download your pictures from KodakGallery, please leave me a comment and I’ll work on a Perl script to automate that.