Friday, July 1, 2011

Friday Lunch: Content-Type, Content-Disposition HTTP Headers and MIME type handling in IE

We have a scripting tool that is used to create web-based scripts that simulate application scenarios. These scripts are run continuously through a monitoring tool to assess the health of the applications. The scripts and the scripting tool are also used to validate the scenarios when the monitoring tool issues an alert. The scripting tool uses IE to run the HTTP request/response scenarios. While recording the script for a new application, we noticed that after submitting the request, which was an HTTP POST with XML data, the tool brought up a dialog box to save the response as a file. This is the same dialog you get when you click a link in the browser to a file (a Word .doc, for example) and no application is registered to open that kind of file. We checked the response, and it contained ordinary XML of the kind that is usually displayed within the browser (as happens when you open the URL of one of Amazon’s AWS services). The download prompt was a problem because the non-technical team that uses the tool to validate the application would not be able to tell whether the script had run successfully.

The application is a servlet that takes the initial POST request and returns an XML response. After further investigation with the tool and a look at the response headers, we found that the application was erroneously returning “application/x-www-form-urlencoded” in the “Content-Type” header. I haven’t found anything specific that says this content type is invalid for an HTTP response, but it is normally used on requests that submit URL-encoded form data (see here), not on responses that carry XML.

Investigating with WebScarab

My first conjecture was that the file download prompt was being displayed because of the value of the Content-Type header. If we could intercept the response and replace that value with something like “text/xml”, the tool (or the browser) should display the content inline instead of bringing up the file download prompt. To validate that, I used WebScarab to intercept the requests and responses. But for some reason (maybe because of the unexpected Content-Type header), it didn’t intercept the response. It would intercept the request, and once I accepted it, the exchange would simply complete and we would end up with the file download prompt in the tool again. Clearing the “Only MIME-Types matching:” field didn’t help.

So the workaround was to write a BeanShell script within WebScarab to replace the “Content-Type” header with a proper value instead of manually intercepting the response. That turned out to be quite straightforward. All we needed to do was put:

response.setHeader("Content-Type","text/xml");

in WebScarab’s BeanShell interface, inside the fetchResponse method. This replaces the value of the Content-Type header in the response with “text/xml”.

Once we tried it out, it worked and the browser displayed the XML response without bringing up the file download prompt. This proved that the Content-Type header was the problem. But what exactly was the problem? What makes the browser display a prompt to the user to download a file instead of just displaying the content? Further searching revealed this article: http://www.jtricks.com/bits/content_disposition.html

So basically, the Content-Type header specifies the MIME type of the returned content, and the Content-Disposition header tells the browser whether to display that content or prompt the user to download it. The RFC has all the details, but the gist is that setting Content-Disposition to “attachment” makes the browser display the file download prompt. But the response we were getting didn’t have a Content-Disposition header, and IE was still displaying the file download prompt. Further reading was warranted.
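To make the inline-versus-attachment distinction concrete, here is a minimal Java sketch. The class name DispositionSketch and the method isAttachment are hypothetical names made up for illustration, not part of any browser or servlet API, and the parsing is deliberately simplified: it only looks at the disposition type before the first semicolon.

```java
import java.util.Locale;

// Hypothetical helper for illustration only; real browsers apply more rules.
class DispositionSketch {

    /** Returns true when the header value asks for a download prompt. */
    static boolean isAttachment(String contentDisposition) {
        if (contentDisposition == null) {
            // No header at all: the browser falls back to its own MIME handling.
            return false;
        }
        // The disposition type is the token before the first ';'
        // (parameters such as filename come after it).
        String type = contentDisposition.split(";", 2)[0].trim();
        return type.toLowerCase(Locale.ROOT).equals("attachment");
    }

    public static void main(String[] args) {
        System.out.println(isAttachment("attachment; filename=\"report.xml\""));
        System.out.println(isAttachment("inline"));
        System.out.println(isAttachment(null));
    }
}
```

In our case the header was missing altogether, which corresponds to the null branch: nothing in the response asked for a download, so the prompt had to come from somewhere else.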

How does IE handle MIME types in a response, anyways?

I kept on searching and reading on how and when a browser brings up the file dialog prompt. The Content-Type header still bothered me. Then I came across this MSDN article: MIME Type Detection in IE. It provides details on what steps IE takes to detect the content-type of the content before deciding what to do with the file:

The purpose of MIME type detection, or data sniffing, is to determine the MIME type (also known as content type or media type) of downloaded content using information from the following four sources:

  • The server-supplied MIME type, if available
  • An examination of the actual contents associated with a downloaded URL
  • The file name associated with the downloaded content (assumed to be derived from the associated URL)
  • Registry settings (file name extension/MIME type associations or registered applications) in effect during the download

If you look at the known MIME types for FindMimeFromData, “application/x-www-form-urlencoded” is not one of them. It is not one of the ambiguous MIME types ("text/plain," "application/octet-stream," an empty string, or null) either. So IE treats this MIME type as unknown, and for that case the article says:

If the "suggested" (server-provided) MIME type is unknown (not known and not ambiguous), FindMimeFromData immediately returns this MIME type as the final determination.
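The three categories described above can be sketched in Java as follows. This is only an illustration of the classification as the MSDN article describes it, not IE's actual implementation: the class MimeSketch and its names are hypothetical, the list of known types is a small subset of the documented one, and stripping parameters such as a charset before the lookup is my own assumption.

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch of the known / ambiguous / unknown split; not IE's code.
class MimeSketch {

    enum Kind { KNOWN, AMBIGUOUS, UNKNOWN }

    // Partial list only: a few of the types the MSDN article lists as known.
    static final Set<String> KNOWN_TYPES = Set.of(
            "text/html", "text/xml", "image/gif", "image/jpeg", "application/pdf");

    // Ambiguous types per the article; null is handled separately below.
    static final Set<String> AMBIGUOUS_TYPES = Set.of(
            "text/plain", "application/octet-stream", "");

    static Kind classify(String suggested) {
        if (suggested == null) {
            return Kind.AMBIGUOUS;
        }
        // Assumption: ignore parameters such as "; charset=utf-8".
        String type = suggested.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);
        if (AMBIGUOUS_TYPES.contains(type)) {
            return Kind.AMBIGUOUS;
        }
        if (KNOWN_TYPES.contains(type)) {
            return Kind.KNOWN;
        }
        // Neither known nor ambiguous: returned as the final determination.
        return Kind.UNKNOWN;
    }

    public static void main(String[] args) {
        System.out.println(classify("text/xml"));
        System.out.println(classify("text/plain"));
        System.out.println(classify("application/x-www-form-urlencoded"));
    }
}
```

With this split, the Content-Type our servlet returned falls into the last branch, which is the case where the server-supplied value is taken as-is with no content sniffing.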

What IE does once FindMimeFromData has returned is covered in this MSDN article: Handling MIME Types in IE. It doesn’t say exactly what happens when the MIME type is unknown, no application is associated with it in the registry, and there is no Content-Disposition header. My guess is that the only thing left for IE to do is let the user decide by showing the file download prompt, which is what was happening in our case. And we were able to confirm that. I looked up the registry key for “text/xml” under HKEY_CLASSES_ROOT\MIME\Database\Content Type and copied its values into a new key named “application/x-www-form-urlencoded”. The new registry key looked like this:

[HKEY_CLASSES_ROOT\MIME\Database\Content Type\application/x-www-form-urlencoded]
"CLSID"="{48123BC4-99D9-11D1-A6B3-00C04FD91555}"
"Extension"=".xml"
"Encoding"=hex:08,00,00,00

This tells IE that responses received with Content-Type equal to “application/x-www-form-urlencoded” should be treated as having the extension “.xml” and opened the way XML files are opened (which is to display them inline). Once the key was added, we went back to the tool and ran the script. This time, it displayed the response XML without bringing up the download prompt.

So through this exercise, I learned about how IE displays different MIME types and about Content-Type and Content-Disposition headers.
